🤔 Are you waiting for the next embedding model? Don't! Just tell us what you want your embeddings to excel at, e.g., car insurance claims, financial news, or Spanish dialogs. Specify your wish in a prompt, and remember: this is your only input to our API. In about 30 minutes, we deliver a ready-to-use, fine-tuned embedding model that can be loaded via SentenceTransformers. Behind the scenes, we take care of everything else, from generating useful synthetic data to managing the train-eval-test ML workflow and, finally, uploading the fine-tuned model to the Hugging Face Hub. Yep, so much happens under this very minimal UI abstraction!
This is a new feature that we are alpha-testing with invited users. It offers a minimalistic fine-tuning UX that eliminates the need for uploading reference data and for manual triplet/hard-negative mining. As a user, you simply specify your expectations. For instance, "I want my embeddings to excel at biomedical literature" or, for a more detailed instruction, "Please make it more effective on various subfields of artificial intelligence, particularly focusing on distinctions between machine learning, deep learning, and neural networks."
But how can we ensure the quality of the fine-tuned models? By feeding them high-quality data! To be frank, it’s not an easy job, especially when we talk about synthetic data from LLMs. It’s easy to get started but hard to get right. Simple prompting can give some ("boring") results, but finding diverse, effective triplets with hard negatives requires significant prompt engineering.
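To make the triplet and hard-negative idea concrete, here is a minimal sketch in plain Python. It is not our pipeline: the `Triplet` class, the toy 2-D vectors, and the margin of 0.2 are all assumptions chosen for illustration. It shows why a hard negative (one that sits close to the anchor) still produces a training signal under a standard margin-based triplet loss, while an easy negative produces none.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One training example: the anchor should end up closer to the positive than to the negative."""
    anchor: str
    positive: str
    negative: str


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss using cosine distance (1 - cosine similarity)."""
    d_pos = 1.0 - cosine_similarity(anchor, positive)
    d_neg = 1.0 - cosine_similarity(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical example texts; a real triplet would come from generated data.
example = Triplet(
    anchor="What does my car insurance cover after a collision?",
    positive="Collision coverage pays for damage to your own vehicle.",
    negative="Home insurance covers damage caused by burst pipes.",
)

# Toy 2-D embeddings (made-up numbers, not model output).
anchor_vec = [1.0, 0.0]
positive_vec = [0.9, 0.1]
easy_negative_vec = [0.0, 1.0]   # far from the anchor
hard_negative_vec = [0.8, 0.3]   # close to the anchor

easy_loss = triplet_loss(anchor_vec, positive_vec, easy_negative_vec)
hard_loss = triplet_loss(anchor_vec, positive_vec, hard_negative_vec)

print(f"easy negative loss: {easy_loss:.4f}")
print(f"hard negative loss: {hard_loss:.4f}")
```

With these toy numbers, the easy negative already lies beyond the margin, so its loss is zero and it teaches the model nothing, whereas the hard negative yields a positive loss. That is the reason boring, easily separable triplets add little during fine-tuning.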
We proposed a Stochastic Augmented Generation framework, which has proven highly effective at generating training data for embedding models under a configurable budget. Have a look at the graphic to find out more.
1. I, as half Greek, approve Berlin's relocation to Thessaloniki. 2. I disagree with the other commenters and believe that Shenzhen will soon have grown and expanded to reflect your positioning. 3. Your left trouser leg lies improperly on your shoe. 4. Europe's and Asia's proportions are massively wrong. 5. Your AI avatar's left hand was rendered incorrectly: just look at that exaggerated trigger finger! 6. Your cactus needs water! 7. Your heater elements are mounted very NSFW! It looks like they're tongue-kissing! #obscene 8. Bad hair day.