Jina Embeddings v3

Jina AI

New State-of-the-Art Multilingual Embeddings With Task LoRA

  • jina-embeddings-v3 is a multilingual multi-task text embedding model designed for a variety of NLP applications.
  • Based on the Jina-XLM-RoBERTa architecture, the model uses Rotary Position Embeddings (RoPE) to handle input sequences of up to 8192 tokens.
  • Additionally, it features 5 LoRA adapters to generate task-specific embeddings efficiently.

Highlights:
  • Extended Sequence Length: Supports up to 8192 tokens with RoPE.

  • Task-Specific Embedding: Customize embeddings through the task argument with the following options:

- retrieval.query: Used for query embeddings in asymmetric retrieval tasks
- retrieval.passage: Used for passage embeddings in asymmetric retrieval tasks
- separation: Used for embeddings in clustering and re-ranking applications
- classification: Used for embeddings in classification tasks
- text-matching: Used for embeddings in tasks that quantify similarity between two texts, such as STS or symmetric retrieval tasks

  • Matryoshka Embeddings: Supports flexible embedding sizes (32, 64, 128, 256, 512, 768, 1024), so embeddings can be truncated to fit your application.
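
The task options and Matryoshka sizes listed above can be illustrated with a minimal Python sketch. It does not call the model: the helper names `check_task` and `truncate_embedding` are hypothetical, the input vector is a stand-in for a real 1024-dimensional embedding, and re-normalizing to unit length after truncation is an assumption (a common choice when comparing vectors by cosine similarity), not something this listing specifies.

```python
import math

# Task names and embedding sizes copied from the highlights above.
TASKS = {
    "retrieval.query",
    "retrieval.passage",
    "separation",
    "classification",
    "text-matching",
}
MATRYOSHKA_DIMS = (32, 64, 128, 256, 512, 768, 1024)


def check_task(task):
    """Hypothetical helper: return the task name if it is one of the listed options."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASKS)}")
    return task


def truncate_embedding(vector, dim):
    """Hypothetical helper: keep the first `dim` values of an embedding.

    Assumption: the result is re-normalized to unit length so that dot
    products between truncated vectors behave like cosine similarity.
    """
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"unsupported size {dim}; expected one of {MATRYOSHKA_DIMS}")
    if dim > len(vector):
        raise ValueError("cannot truncate to more dimensions than the vector has")
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero vector: nothing to normalize
    return [x / norm for x in head]


if __name__ == "__main__":
    task = check_task("retrieval.query")
    full = [0.5] * 1024  # stand-in for a real model output
    small = truncate_embedding(full, 256)
    print(task, len(small))
```

For asymmetric retrieval, queries would use `retrieval.query` and the documents being searched would use `retrieval.passage`; both sides should then be truncated to the same size before they are compared.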