Jina CLIP v1
Jina AI

Categories: AI + Machine Learning, Compute

Embedding model for cross-modal and multimodal retrieval over text and image data

With jina-clip-v1, users get a single embedding model that delivers state-of-the-art performance in both text-only and text-image cross-modal retrieval.
Jina AI has improved on OpenAI CLIP’s performance by 165% in text-only retrieval and by 12% in image-to-image retrieval, with identical or slightly better performance on text-to-image and image-to-text tasks.

Highlights:

Superior performance on all combinations of modalities, with especially large improvements in text-only embedding performance.
Support for much longer text inputs. Jina Embeddings’ 8k-token input support makes it possible to process detailed textual information and correlate it with images.
Large net savings in space, compute, code maintenance, and complexity, because this multimodal model is highly performant even in non-multimodal scenarios.
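
As a rough illustration of why a single embedding model simplifies retrieval, the sketch below ranks a mix of image and text items against one text query using cosine similarity. It does not call jina-clip-v1 or any Jina AI API. The three-dimensional vectors, the item names, and the helper functions `cosine_similarity` and `rank_by_similarity` are hypothetical placeholders standing in for real embeddings, which would come from the model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors (assumption): placeholders for embeddings that a shared
# text-image model would produce. They are NOT real model outputs.
text_query = [0.9, 0.1, 0.0]
candidates = {
    "image:beach.jpg": [0.8, 0.2, 0.1],
    "image:invoice.png": [0.0, 0.1, 0.9],
    "text:product FAQ": [0.1, 0.9, 0.2],
}

ranking = rank_by_similarity(text_query, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because text and images share one vector space in this setup, the same ranking function serves text-to-text, text-to-image, and image-to-image lookups, with no need for a separate model or index per modality.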