Liquid AI Unveils LFM2.5 Retrieval Models: High-Performance Dense Bi-Encoder and Late-Interaction Architectures
Liquid AI has expanded its LFM family with the release of LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M, two bidirectional models designed for efficient multilingual and cross-lingual search across 11 languages.
Expanding the LFM Ecosystem: From Causal to Bidirectional
Liquid AI has introduced the first bidirectional members of the Liquid Foundation Model (LFM) family. The new retrievers were developed by patching the existing LFM2.5-350M-Base model, converting it from a causal decoder into a bidirectional encoder. In a causal decoder, each token sees only the tokens before it; in a bidirectional encoder, each token's representation draws on context both before and after it, which matters for high-precision document retrieval and semantic search.
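The announcement does not describe how the patching was done. As a general illustration only, the difference between causal and bidirectional attention can be sketched as a visibility mask; the function name and the toy sequence length below are assumptions made for this example, not part of Liquid AI's implementation.

```python
def attention_mask(seq_len, bidirectional):
    """Build a seq_len x seq_len visibility matrix.

    mask[i][j] is True when token i is allowed to attend to token j.
    Hypothetical helper for illustration; not taken from the LFM codebase.
    """
    return [
        [bidirectional or j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Causal decoder: token i sees only positions 0..i (lower-triangular mask).
causal = attention_mask(4, bidirectional=False)

# Bidirectional encoder: every token sees every position.
full = attention_mask(4, bidirectional=True)

for name, mask in (("causal", causal), ("bidirectional", full)):
    print(name)
    for row in mask:
        print("  " + " ".join("1" if visible else "0" for visible in row))
```

In the causal mask, the first token can see only itself, so its representation cannot reflect anything later in the document. The bidirectional mask removes that restriction.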
Model Architectures and Technical Specifications
The release consists of two distinct architectural approaches to address different retrieval needs:
LFM2.5-Embedding-350M (Dense Bi-Encoder)
This model operates as a dense bi-encoder, condensing each document into a single 1024-dimensional vector. Because every document is reduced to one fixed-size vector, the collection can be embedded ahead of time and searched by vector similarity, which keeps lookups fast and lets the approach scale to very large datasets stored in vector databases.
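A minimal sketch of how single-vector retrieval works, assuming cosine similarity as the scoring function (the announcement does not state which similarity measure the model is meant to be used with). The 4-dimensional vectors are made-up stand-ins for the 1024-dimensional embeddings, and the helper names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return document ids ordered from most to least similar to the query."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings (assumed values); a real index would hold one
# 1024-dimensional vector per document.
docs = {
    "doc_a": [1.0, 0.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0, 0.0],
}
query = [0.9, 0.1, 0.0, 0.0]

ranking = rank(query, docs)
print(ranking)
```

Since the document vectors do not depend on the query, they can be computed once and stored; only the query needs to be embedded at search time.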
LFM2.5-ColBERT-350M (Late-Interaction)
This model uses late interaction: instead of one vector per document, it produces a 128-dimensional vector for every token. Relevance is scored with the MaxSim operator, which matches each query token to its most similar document token and sums those maximum similarities. The result sits between the efficiency of bi-encoders and the precision of cross-encoders, allowing more granular token-level matching at retrieval time.
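The MaxSim operator can be sketched in a few lines. The example below uses dot-product similarity and invented 2-dimensional token vectors in place of the model's 128-dimensional ones; the function names are hypothetical and this is an illustration of the general ColBERT-style scoring rule, not Liquid AI's code.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: for each query token vector, take the highest
    similarity against any document token vector, then sum over query tokens."""
    return sum(
        max(dot(q, d) for d in doc_tokens)
        for q in query_tokens
    )


# Toy token embeddings (assumed values).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]

doc_x = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # covers both query tokens
doc_y = [[1.0, 0.0], [0.6, 0.0]]              # covers only the first

score_x = maxsim(query_tokens, doc_x)
score_y = maxsim(query_tokens, doc_y)
print(score_x, score_y)
```

Here doc_x scores higher because each query token finds a close match among its tokens, while doc_y has nothing resembling the second query token. A single pooled vector can blur exactly this kind of token-level distinction.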
Performance and Benchmarks
According to initial reports, both models lead their parameter class on the NanoBEIR and MKQA-11 benchmarks, and they outperform larger models such as Qwen on these specific retrieval tasks.
Both models are built for multilingual use and support cross-lingual search across 11 languages.
Note: Detailed performance metrics and the specific list of the 11 supported languages were not provided in the source material.