Cosmin Negruseri’s Post

Cosmin Negruseri

cofounder at stealth startup

Sharing something that may be obvious to people working in the search space, but maybe a bit less obvious to people new to GenAI, search or RAG. Traditional search is getting updated in the post-ChatGPT era and is getting democratized.

Traditional systems involve many steps:
- Crawling, Indexing, Filtering, Deduplication
- Query Understanding
- User Understanding/Personalization
- Document Understanding
- Multiple Candidate Generators
- Multiple Ranking Layers
- Business Logic Integration

These steps produce a list of 10 blue links.

In the post-ChatGPT world, LLMs can improve or replace many search components:
- Better Document Understanding: use LLMs to summarize and extract information from documents.
- User Understanding: use LLMs to understand what users like and engage with.
- Ranking: let LLMs rank answers based on relevance and quality.
- Quality Rating and Evaluation: use LLMs to rate the quality of answers, reducing the need for manual raters.
- Answer Generation (or rather Retrieval Augmented Generation): provide direct answers with references, not just blue links, plus the ability to keep chatting while keeping the context.

Think of it as having a smart high schooler available to improve or replace any part of the old search system; a rough sketch of the ranking and answer-generation pieces is below. This intelligence is democratized: building a search system is no longer only for big companies with lots of resources and a history of user actions.
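For readers newer to the space, here is roughly what the LLM-as-reranker plus RAG answer step can look like in code. This is a minimal sketch under assumptions, not a reference implementation: the `llm` callable stands in for whatever chat-completion endpoint you use, and the `Doc` class, prompts, and 0-10 scoring scheme are illustrative placeholders.

```python
# Minimal sketch of "LLM as reranker + answer generator" for a search/RAG stack.
# Everything here is illustrative: `llm` is a placeholder for any chat-completion
# endpoint, and the prompts/scoring are assumptions, not a production recipe.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Doc:
    doc_id: str
    text: str


def rerank(query: str, candidates: List[Doc],
           llm: Callable[[str], str], top_k: int = 5) -> List[Doc]:
    """Score each candidate with the LLM (0-10 relevance) and keep the best."""
    scored = []
    for doc in candidates:
        prompt = (
            "Rate how well the document answers the query on a 0-10 scale. "
            "Reply with a single number.\n"
            f"Query: {query}\n"
            f"Document: {doc.text[:2000]}"  # truncate to keep the prompt small
        )
        try:
            score = float(llm(prompt).strip())
        except ValueError:
            score = 0.0  # unparsable reply -> treat as irrelevant
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def answer_with_references(query: str, docs: List[Doc],
                           llm: Callable[[str], str]) -> str:
    """RAG step: generate a direct answer that cites the retrieved documents."""
    context = "\n".join(f"[{i + 1}] ({d.doc_id}) {d.text}" for i, d in enumerate(docs))
    prompt = (
        "Answer the question using only the numbered documents below, "
        "citing sources inline as [n].\n"
        f"Question: {query}\n\nDocuments:\n{context}"
    )
    return llm(prompt)


if __name__ == "__main__":
    # Dummy LLM so the sketch runs without any API key.
    def fake_llm(prompt: str) -> str:
        if "0-10 scale" in prompt:
            return "1" if "cooking" in prompt else "8"
        return "LLMs can rerank results and draft cited answers. [1]"

    docs = [Doc("a", "LLMs can rerank search results and generate answers."),
            Doc("b", "Unrelated text about cooking pasta.")]
    top = rerank("How do LLMs change search?", docs, fake_llm, top_k=1)
    print(answer_with_references("How do LLMs change search?", top, fake_llm))
```

In practice you would batch or parallelize the scoring calls and, as the comments below point out, watch latency and cost closely; scoring several candidates per call is a common way to cut the number of LLM requests.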

John Milinovich

Head of GenAI Product at Canva


While I think this will be true in a few years, we’re not there yet for doing inference at query time (i.e. query understanding and ranking) due to latency issues. Search p99 needs to be measured in milliseconds, whereas LLM p99 can still be 10s+. I’m optimistic that fine-tuned models and SLMs will help us get there soon-ish, though.

Cătălin Moraru

Software Engineer at Google


But won’t big companies have the best, most complex GenAI? Maybe it won’t be the same companies, but it still doesn’t seem like a democracy.

Robb Beal

Digital Product and Experience Leader | Startups: $75M Total Investment and 3 Exits


For a user-centric perspective, see also: https://dl.acm.org/doi/pdf/10.1145/3649468


The cost curve is still an issue for such an LLM-centric search stack to fully manifest at scale. I do expect that to happen as smaller models like Gemma, Phi and Mistral 7B become smarter and serving infrastructure is further optimized.
