
Jina Embeddings v2 Base - zh

Jina AI

Base-size text embedding model for Chinese and English input of up to 8192 tokens.

  • jina-embeddings-v2-base-zh is an open-source bilingual Chinese-English embedding model that supports sequences of up to 8192 tokens.
  • This state-of-the-art AI embedding model enables many applications, such as document clustering, classification, content personalization, vector search, and retrieval-augmented generation.
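As a rough illustration of the vector search use case, the minimal sketch below ranks documents by cosine similarity to a query vector. The 4-dimensional vectors, the document names, and the helper functions `cosine_similarity` and `rank_documents` are made up for this example; in a real application the vectors would be the embeddings produced by the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for embeddings (assumption: real vectors would come
# from the embedding model and have far more dimensions).
doc_vecs = {
    "doc_zh": [0.9, 0.1, 0.0, 0.2],
    "doc_en": [0.8, 0.2, 0.1, 0.1],
    "doc_other": [0.0, 0.1, 0.9, 0.3],
}
query_vec = [1.0, 0.0, 0.0, 0.1]

ranking = rank_documents(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Because a bilingual model places Chinese and English text in the same vector space, the same ranking step can in principle serve cross-lingual queries without a separate translation stage.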

Highlights:
  • State-of-the-art: This model is designed for high performance in monolingual and cross-lingual applications and has been trained specifically to support mixed Chinese-English input without bias toward either language.
  • Extended Context: An 8192-token context length enables jina-embeddings-v2-base-zh to support longer texts and document fragments, far surpassing models that only support a few hundred tokens at a time.
  • Compact Size: jina-embeddings-v2-base-zh is built for high performance on standard computer hardware. With 161 million parameters, the entire model is only 322MB. The embeddings themselves have 768 dimensions, a relatively small vector size compared to many models, saving space and run-time for applications.
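The size figures in the last highlight can be checked with some quick arithmetic. The sketch below uses the numbers stated above (161 million parameters, 322MB, 768 dimensions). It assumes that MB means 10^6 bytes and that each embedding value is stored as a 4-byte float, neither of which the listing states. The helper `index_size_bytes` is a hypothetical name introduced for illustration.

```python
# Figures taken from the listing.
PARAMS = 161_000_000
MODEL_BYTES = 322_000_000  # 322MB, assuming 1MB = 10**6 bytes
EMBED_DIM = 768

# Bytes per parameter implied by the two figures above. A value of 2
# is consistent with 16-bit weights; that is an inference, not a
# statement from the listing.
bytes_per_param = MODEL_BYTES / PARAMS


def index_size_bytes(num_vectors, bytes_per_value=4):
    """Raw storage for num_vectors embeddings (assumes float32 values)."""
    return num_vectors * EMBED_DIM * bytes_per_value


print(f"bytes per parameter: {bytes_per_param}")
print(f"one embedding: {index_size_bytes(1)} bytes")
print(f"one million embeddings: {index_size_bytes(1_000_000) / 1e9} GB")
```

Under these assumptions, one embedding takes about 3KB, so a collection of a million documents needs roughly 3GB of raw vector storage before any index overhead or compression.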