Building MuRAG (Multimodal RAG)? We've made 𝐉𝐢𝐧𝐚 𝐂𝐋𝐈𝐏 open source and available on Hugging Face! You can now use it via 𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬 or 𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬.𝐣𝐬. The biggest improvement of Jina CLIP over OpenAI's CLIP is that you can use this single model 𝐟𝐨𝐫 𝐭𝐞𝐱𝐭-𝐭𝐞𝐱𝐭 𝐫𝐞𝐭𝐫𝐢𝐞𝐯𝐚𝐥 along with text-image, image-text, and image-image retrieval! So one model, two modalities, four search directions! You can check out our paper on arXiv, and if you want a quick taste of the model, feel free to try our Embedding API.
🤗 : https://lnkd.in/e9jVd4cS
arXiv: https://lnkd.in/e8mYHNYJ
API: https://lnkd.in/da3gBd6v
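If you want to try the Transformers route right away, here's a minimal sketch. The repo ID `jinaai/jina-clip-v1` and the `encode_text`/`encode_image` helpers follow the Hugging Face model card; double-check there if anything differs:

```python
from transformers import AutoModel

# trust_remote_code loads the custom encoding helpers shipped with the repo.
model = AutoModel.from_pretrained("jinaai/jina-clip-v1", trust_remote_code=True)

# One model, two modalities: both calls embed into the same vector space.
text_embs = model.encode_text(["A photo of an Indian elephant"])
image_embs = model.encode_image(["https://example.com/elephant.jpg"])  # hypothetical path/URL

print(text_embs.shape, image_embs.shape)
```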
Our latest embedding and reranker models, 𝐉𝐢𝐧𝐚-𝐂𝐋𝐈𝐏 and 𝐉𝐢𝐧𝐚-𝐑𝐞𝐫𝐚𝐧𝐤𝐞𝐫-𝐕𝟐-𝐌𝐮𝐥𝐭𝐢𝐥𝐢𝐧𝐠𝐮𝐚𝐥, are now optimized for AWS SageMaker! For enterprise users, deploying these models within your own AWS or Azure account offers significant advantages for compliance and data management. By integrating directly with your existing cloud infrastructure, you get enhanced security, better control over your data, and lower latency than calling our public API. Explore our search foundation models and get started by checking out our integration page below. https://lnkd.in/e2ynp_kE
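For reference, a deployment with the SageMaker Python SDK could look roughly like the sketch below. The model package ARN, IAM role, and instance type are placeholders; grab the real values from the AWS Marketplace listing via the integration page:

```python
from sagemaker import ModelPackage, Session

session = Session()

# Placeholders: substitute your own IAM role and the Jina model package ARN
# from the Marketplace listing.
model = ModelPackage(
    role="arn:aws:iam::<account-id>:role/<sagemaker-execution-role>",
    model_package_arn="arn:aws:sagemaker:<region>:<acct>:model-package/<jina-package>",
    sagemaker_session=session,
)

# Deploy inside your own account: requests stay on your endpoint.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",  # assumption: pick from the listing's supported types
    endpoint_name="jina-reranker-v2",
)
```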
One cool thing about ColBERT-based search vs. cosine-based vector retrieval is that you get 𝐢𝐧𝐭𝐞𝐫𝐩𝐫𝐞𝐭𝐚𝐛𝐢𝐥𝐢𝐭𝐲 for free as a byproduct of the MaxSim computation. It's a bit like the Lucene highlighter: it lets you grab the most relevant snippets from a long document and show users where their query matches. With 𝐉𝐢𝐧𝐚-𝐂𝐨𝐥𝐁𝐄𝐑𝐓-𝐯𝟏, which supports inputs of up to 8K tokens and was released by Jina AI earlier this February, the visualization of the late interaction between a query and a document is almost... artistic. The video shows the late interaction between the query "Elephants eat 150 kg of food per day." and the Wikipedia article about the Indian elephant. Darker colors indicate stronger interactions. The darkest area corresponds to "The species is classified as a megaherbivore and consume up to 150 kg (330 lb) of plant matter per day." in the original article.
You can use Jina-ColBERT-v1 via our Embedding API at https://jina.ai/embeddings (make sure to select this model in the model dropdown).
Don't forget to check out this article where we explain how those graphs were made.
https://lnkd.in/eQCHbTsN
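If you want to reproduce the highlighting effect yourself, the core computation is tiny. A toy sketch, with random tensors standing in for Jina-ColBERT query and document token embeddings:

```python
import torch

# Toy sketch of ColBERT-style MaxSim with token-level attribution.
n_q, n_d, dim = 8, 120, 128
Q = torch.nn.functional.normalize(torch.randn(n_q, dim), dim=-1)  # query tokens
D = torch.nn.functional.normalize(torch.randn(n_d, dim), dim=-1)  # document tokens

sim = Q @ D.T                        # (n_q, n_d) token-token similarities
score = sim.max(dim=1).values.sum()  # MaxSim: best doc token per query token, summed

# The "free" interpretability: per-document-token interaction strength,
# usable as highlight intensity (darker = stronger) like in the video.
highlight = sim.max(dim=0).values    # (n_d,)
top_tokens = highlight.topk(5).indices
print(score.item(), top_tokens.tolist())
```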
Can't we just use an LLM for reranking? Just throw the 𝐪𝐮𝐞𝐫𝐲, 𝐝𝐨𝐜𝟏, 𝐝𝐨𝐜𝟐, ... 𝐝𝐨𝐜𝐍 into the context window and let the LLM figure out the top-K? It turns out we can, but not in the way you might think. Announcing our latest research, "𝐋𝐞𝐯𝐞𝐫𝐚𝐠𝐢𝐧𝐠 𝐏𝐚𝐬𝐬𝐚𝐠𝐞 𝐄𝐦𝐛𝐞𝐝𝐝𝐢𝐧𝐠𝐬 𝐟𝐨𝐫 𝐄𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐭 𝐋𝐢𝐬𝐭𝐰𝐢𝐬𝐞 𝐑𝐞𝐫𝐚𝐧𝐤𝐢𝐧𝐠 𝐰𝐢𝐭𝐡 𝐋𝐚𝐫𝐠𝐞 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐬," a joint work with Renmin University of China, where we introduce PE-Rank, a new LLM-based reranker for efficient listwise passage reranking. Instead of feeding raw text into the context window, we represent each passage with an embedding model as a special token <𝐝𝟏>, <𝐝𝟐>, ..., and feed those special tokens to the LLM. At inference time, PE-Rank restricts the output space to these special tokens, enabling more efficient decoding. Overall, PE-Rank cuts the latency of reranking 100 documents with an LLM from 21s to 3s. Learn more about this work at https://lnkd.in/eTf8_UFG
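To make the idea concrete, here's a toy sketch of the two pieces (dimensions and names are made up; see the paper for the real architecture): each passage is compressed to one projected embedding "token", and decoding is constrained to choose among those tokens.

```python
import torch

d_emb, d_model, n_passages = 768, 4096, 5

# 1. One dense vector per passage from an embedding model (random stub here).
passage_embs = torch.randn(n_passages, d_emb)

# 2. Project into the LLM's input space: each passage becomes a single
#    special token <d_i> instead of hundreds of raw text tokens.
project = torch.nn.Linear(d_emb, d_model)
passage_tokens = project(passage_embs)        # (n_passages, d_model)

# 3. Constrained decoding: at each step, score only the passage tokens
#    against the LLM's current state and emit the best unranked one.
ranking, used = [], torch.zeros(n_passages, dtype=torch.bool)
for _ in range(n_passages):
    hidden = torch.randn(d_model)             # stand-in for the LLM hidden state
    logits = passage_tokens @ hidden          # (n_passages,)
    logits = logits.masked_fill(used, float("-inf"))
    nxt = int(logits.argmax())
    ranking.append(nxt)
    used[nxt] = True

print("decoded ranking:", ranking)            # e.g. [3, 0, 4, 1, 2]
```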
Check out Reranker v2 in our Search Foundation API: https://jina.ai/reranker
An Easter egg hidden on the Jina AI website is the use of our Reranker for article recommendations. Go to any blog post page, hit 𝐒𝐡𝐢𝐟𝐭+𝟐, and you will get the top 5 related articles from all the posts we've published so far. The implementation is dead simple: it grabs the current post as the `query` and uses the content of all 240 posts as `documents`, sending them to our Reranker API with `top_k=5`. Inefficient? Maybe, but our reranker is built for long context and low latency, so let it be! With the recent release of Reranker V2, we gave this Easter egg another try, and the results are interesting. Comparing V2 to V1, we observe the following:
1. V2 is faster than V1. The video below is not sped up. Remember, we simply send the raw content of all articles on the site directly to the Reranker API, which is ~1M tokens in one rerank request.
2. V2 gives much better recommendations than V1. We examined a challenging case, "Artificial General Intelligence is Cursed, And Science Fiction Isn't Helping." Many rerankers fail to recommend related articles from the pool; our V2 recommends 5/5 related articles, whereas V1 recommends only 3/5, and V1's relevance scores are less accurate.
https://lnkd.in/eUMe9U8U
This is how we implemented this feature. It's incredibly simple (probably too simple), but hey, it works! We're not big fans of over-engineering anyway.
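In spirit, the whole feature is one API call. A minimal sketch (endpoint and field names follow the public Reranker API docs; the API key and the post-fetching logic are placeholders you'd supply yourself):

```python
import requests

def related_articles(current_post: str, all_posts: list[str], api_key: str) -> list[str]:
    resp = requests.post(
        "https://api.jina.ai/v1/rerank",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": "jina-reranker-v2-base-multilingual",
            "query": current_post,   # the post currently being viewed
            "documents": all_posts,  # raw content of all ~240 posts
            "top_n": 5,              # the post above calls this top_k
        },
        timeout=60,
    )
    resp.raise_for_status()
    # Response shape per the public API docs: a "results" list with the
    # reranked documents and their relevance scores.
    return [r["document"]["text"] for r in resp.json()["results"]]
```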
Reranker v2 API: https://jina.ai/reranker
Reranker v2 Release Note: https://lnkd.in/eZWQtJYU
Today, we are releasing 𝗝𝗶𝗻𝗮 𝗥𝗲𝗿𝗮𝗻𝗸𝗲𝗿 𝘃𝟮 (jina-reranker-v2-base-multilingual), the latest and most powerful neural reranker in the Jina AI search foundation family. With Reranker v2, developers of RAG/search systems can enjoy:
- 𝐌𝐮𝐥𝐭𝐢𝐥𝐢𝐧𝐠𝐮𝐚𝐥: More relevant search results in 100+ languages, outperforming bge-reranker-v2-m3 at half its size;
- 𝐀𝐠𝐞𝐧𝐭𝐢𝐜: State-of-the-art function-calling and text-to-SQL-aware document reranking for agentic RAG;
- 𝐂𝐨𝐝𝐞 𝐫𝐞𝐭𝐫𝐢𝐞𝐯𝐚𝐥: Top performance on code retrieval tasks;
- 𝐔𝐥𝐭𝐫𝐚-𝐟𝐚𝐬𝐭: 15x higher document throughput than bge-reranker-v2-m3, and 6x higher than our v1 model.
You can get started using Jina Reranker v2 via our Reranker API, where we are offering 1M free tokens to all new users.
API: https://jina.ai/reranker
Hugging Face: https://lnkd.in/eRa7FWPN
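If you'd rather run it yourself from Hugging Face, a minimal sketch assuming the `compute_score` helper described on the model card:

```python
from transformers import AutoModelForSequenceClassification

# trust_remote_code loads the custom scoring helper shipped with the repo.
model = AutoModelForSequenceClassification.from_pretrained(
    "jinaai/jina-reranker-v2-base-multilingual",
    torch_dtype="auto",
    trust_remote_code=True,
)

pairs = [
    ["What is the capital of France?", "Paris is the capital of France."],
    ["What is the capital of France?", "SELECT name FROM capitals WHERE country = 'FR';"],
]
scores = model.compute_score(pairs, max_length=1024)  # one relevance score per pair
print(scores)
```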
There's been recent drama around the Llama3-V project from Stanford University, which was found to have plagiarized MiniCPM-Llama3-V 2.5 from Tsinghua University. Aside from the LLM hype itself, there are several interesting angles to this drama: 1. It occurs amidst geopolitical tensions between the US and China in the AI race; 2. It involves two top universities from the East and West; 3. The perceived copycat, team China, has now become a victim (I've already seen quite a bit of excitement in Chinese mainstream media about this aspect); 4. It raises questions about whether traditional open-source licensing, designed for software, applies to AI models.
I'm not sure new founders today 𝙜𝙚𝙣𝙪𝙞𝙣𝙚𝙡𝙮 want their AI businesses to be "shovel businesses." 𝗜𝗠𝗢, 𝘆𝗼𝘂 𝗱𝗼𝗻'𝘁! This "shovel-is-the-best" concept is largely imposed by a few SaaS investors with a pre-AI mindset, creating an echo chamber that other investors and founders have followed for years. But if you think again, is this still true today? Are you among the GPU-poor trying to compete with the GPU-rich? Shouldn't we focus more on end-user experience than on tools and models? And why do devs need another opinionated AI framework? I really like what I heard from Mikey Shulman in a recent episode of the Unsupervised Learning podcast: "It is the same people building and investing in AI companies as it was building and investing in SaaS companies a decade ago. We and everyone else in this game are almost blindly adapting the SaaS pricing to a world that is not super relevant... I think it is entirely reasonable to have done this, but it probably won't look like this in 10 years."
CLIP bridges text and image, but literally nobody used it for text retrieval—𝙪𝙣𝙩𝙞𝙡 𝙣𝙤𝙬. We're excited to introduce 𝐉𝐢𝐧𝐚 𝐂𝐋𝐈𝐏: a CLIP-like model that's great at text-text, text-image, image-text, and image-image retrieval. From now on, your Jina CLIP model 𝐢𝐬 𝐚𝐥𝐬𝐨 your text retriever. No need to switch between different embedding models when building MuRAG (Multimodal RAG): one model, two modalities, four search directions. Not to mention it also handles an 8K context length. So how did we do it? Read more: https://lnkd.in/e8mYHNYJ
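To make "four search directions" concrete, here's a sketch scoring all four with one similarity function (same assumed model-card helpers as in the earlier loading example; image paths are hypothetical):

```python
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-clip-v1", trust_remote_code=True)

texts = ["Elephants eat up to 150 kg of plants per day.",
         "CLIP aligns text and images in one embedding space."]
images = ["elephant.jpg", "diagram.png"]  # hypothetical local files

t = np.asarray(model.encode_text(texts))    # (2, d)
v = np.asarray(model.encode_image(images))  # (2, d)

def cos(a, b):
    # Cosine similarity matrix between two sets of embeddings.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

print(cos(t, t))  # text-text: the direction OpenAI CLIP was never used for
print(cos(t, v))  # text-image
print(cos(v, t))  # image-text
print(cos(v, v))  # image-image
```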