Jina Colbert v2 - 128 dimensions

Jina AI

ColBERT multi-vector embedding model for multilingual text inputs of up to 8192 tokens.

  • jina-colbert-v2 is an open-source multilingual ColBERT-style embedding model that supports sequences of up to 8192 tokens.
  • The model produces one 128-dimensional vector for each token in the input. ColBERT (Contextualized Late Interaction over BERT) builds on BERT's language understanding and adds an interaction mechanism called late interaction: queries and documents are encoded separately, and their token vectors are compared only at the final scoring stage, which keeps retrieval both efficient and precise.
  • This state-of-the-art embedding model supports applications such as document clustering, classification, content personalization, vector search, and retrieval-augmented generation.
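
To make late interaction concrete, here is a minimal sketch of the MaxSim scoring rule commonly used with ColBERT-style models: each query token vector is matched to its most similar document token vector, and the per-token maxima are summed. The function name `maxsim_score` and the random arrays standing in for model output are assumptions for illustration; this is not the model's own API.

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction score between one query and one document.

    query_vecs: (num_query_tokens, dim) token embeddings
    doc_vecs:   (num_doc_tokens, dim) token embeddings
    """
    # Normalize rows so the dot product equals cosine similarity.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    # Similarity of every query token to every document token.
    sim = q @ d.T
    # For each query token keep its best match, then sum over query tokens.
    return float(sim.max(axis=1).sum())

# Hypothetical stand-ins for model output: 128-dimensional token vectors.
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))
doc_a = rng.normal(size=(50, 128))
# doc_b contains the query's token vectors, so it should score higher.
doc_b = np.vstack([query, rng.normal(size=(46, 128))])

scores = {
    "doc_a": maxsim_score(query, doc_a),
    "doc_b": maxsim_score(query, doc_b),
}
```

Because the document vectors do not depend on the query, they can be computed and indexed ahead of time; only the inexpensive similarity step runs at query time.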

Highlights:
  • Trained from scratch with support for 89 major world languages.
  • Matryoshka embeddings, which let users trade precision for efficiency by working with shorter vectors.
  • Ability to process significantly longer contexts (up to 8192 tokens) than the original ColBERT. This capability is crucial for handling long documents and yields more detailed, context-aware search results.
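
As a rough sketch of how Matryoshka embeddings are typically used, the snippet below keeps only the leading dimensions of each token vector and re-normalizes the result. The helper `truncate_embeddings` is hypothetical, the input is random data rather than model output, and the target size of 64 is an illustrative assumption; consult the model documentation for the sizes it was actually trained to support.

```python
import numpy as np

def truncate_embeddings(token_vecs: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each vector and re-normalize."""
    if not 0 < dim <= token_vecs.shape[1]:
        raise ValueError("dim must be between 1 and the original vector size")
    cut = token_vecs[:, :dim]
    norms = np.linalg.norm(cut, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return cut / np.clip(norms, 1e-12, None)

# Hypothetical stand-in for ten 128-dimensional token vectors.
rng = np.random.default_rng(0)
full = rng.normal(size=(10, 128))
small = truncate_embeddings(full, 64)  # half the storage per token
```

Shorter vectors reduce index size and scoring cost, usually at some cost in retrieval quality, which is the efficiency and precision trade-off described above.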