H2O.ai Confidential
JERRY LIU
CEO, LlamaIndex
GenAI - Enterprise Use-cases
● Document Processing (Tagging & Extraction)
○ Document: Topic, Summary, Author
● Knowledge Search & QA
○ Knowledge Base: Answer, Sources
● Conversational Agent
○ Agent: … / Human: … / Agent: …
● Workflow Automation
○ Workflow: Read latest messages from user A; send email suggesting next-steps
○ Inbox (read), Email (write)
RAG Stack

Current RAG Stack for building a QA System
[Diagram: Doc, Chunks, Vector Database, LLM. Stages: Data Ingestion; Data Querying (Retrieval + Synthesis)]
5 Lines of Code in LlamaIndex!
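The two stages above (data ingestion, then retrieval + synthesis) can be sketched in plain Python. This is a minimal illustration, not the LlamaIndex API: the bag-of-words `embed`, the in-memory list standing in for the vector database, and the names `ingest`, `retrieve`, and `build_prompt` are all assumptions made for the example. The LLM call is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ingest(doc, chunk_size=8):
    """Data ingestion: split a document into word chunks and index their embeddings."""
    words = doc.split()
    chunks = [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, query, k=2):
    """Retrieval: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, chunks):
    """Synthesis input: the prompt that would be sent to an LLM."""
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


doc = (
    "The vector database stores one embedding per chunk. "
    "Paris is the capital of France and hosts the Louvre. "
    "Top-k lookup returns the most similar chunks for a query."
)
index = ingest(doc)
top_chunks = retrieve(index, "What is the capital of France?", k=1)
prompt = build_prompt("What is the capital of France?", top_chunks)
print(prompt)
```

Swapping `embed` for a real embedding model and the list for a vector store changes the quality of retrieval but not the shape of the pipeline.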
Challenges with “Naive” RAG
Challenges with Naive RAG (Response Quality)
● Bad Retrieval
○ Low Precision: Not all chunks in retrieved set are relevant
■ Hallucination + Lost in the Middle Problems
○ Low Recall: Not all relevant chunks are retrieved.
■ Lacks enough context for LLM to synthesize an answer
○ Outdated information: The data is redundant or out of date.
● Bad Response Generation
○ Hallucination: Model makes up an answer that isn’t in the context.
○ Irrelevance: Model makes up an answer that doesn’t answer the question.
○ Toxicity/Bias: Model makes up an answer that’s harmful/offensive.

What do we do?
• Data: Can we store additional information beyond raw text chunks?
• Embeddings: Can we optimize our embedding representations?
• Retrieval: Can we do better than top-k embedding lookup?
• Synthesis: Can we use LLMs for more than generation?
[Diagram: Doc, Chunks, Vector Database, LLM. Stages: Data, Embeddings, Retrieval, Synthesis]
But before all this…
We need evals
Evaluation
● How do we properly evaluate a RAG system?
○ Evaluate in isolation (retrieval, synthesis)
○ Evaluate e2e
[Diagram: Vector Database, Chunks, LLM. Stages: Retrieval, Synthesis]

Evaluation in Isolation (Retrieval)
● Evaluate quality of retrieved chunks given user query
● Create dataset
○ Input: query
○ Output: the “ground-truth” documents relevant to the query
● Run retriever over dataset
● Measure ranking metrics
○ Success rate / hit-rate
○ MRR (mean reciprocal rank)
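The two ranking metrics named on this slide can be computed directly from a retriever's ranked output. A minimal sketch, assuming each dataset entry pairs the ranked document ids returned for one query with the set of ground-truth ids; the function names and the sample ids are illustrative only.

```python
def hit_rate(results, k=5):
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(
        1 for retrieved, relevant in results
        if any(doc_id in relevant for doc_id in retrieved[:k])
    )
    return hits / len(results)


def mean_reciprocal_rank(results):
    """Average of 1/rank of the first relevant document (0 if none is retrieved)."""
    total = 0.0
    for retrieved, relevant in results:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(results)


# Each entry: (ranked ids returned by the retriever, set of ground-truth ids).
results = [
    (["d1", "d7", "d3"], {"d1"}),  # relevant doc at rank 1
    (["d4", "d2", "d9"], {"d2"}),  # relevant doc at rank 2
    (["d5", "d6", "d8"], {"d0"}),  # relevant doc not retrieved
]

print("hit rate @3:", hit_rate(results, k=3))
print("MRR:", mean_reciprocal_rank(results))
```

Hit rate only asks whether a relevant document appears at all, while MRR also rewards placing it near the top, so the two can disagree when comparing retrievers.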
Evaluation E2E
● Evaluation of final generated response given input
● Create dataset
○ Input: query
○ [Optional] Output: the “ground-truth” answer
● Run through full RAG pipeline
● Collect evaluation metrics:
○ If no labels: label-free evals
○ If labels: with-label evals
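The end-to-end loop above can be sketched as a small harness. The deck does not say which metrics to use, so both metrics here are assumptions chosen for illustration: token-overlap F1 as a with-label eval, and a crude "groundedness" ratio (share of answer tokens found in the retrieved context) as a label-free eval. `fake_pipeline` is a hypothetical stand-in for a real RAG pipeline.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(prediction, reference):
    """With-label metric: token-overlap F1 between prediction and ground truth."""
    pred, ref = tokens(prediction), tokens(reference)
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)


def groundedness(answer, context):
    """Label-free proxy: share of answer tokens that also appear in the context."""
    ans, ctx = tokens(answer), set(tokens(context))
    return sum(t in ctx for t in ans) / len(ans) if ans else 0.0


def evaluate(pipeline, dataset):
    """Run every query through the pipeline and collect per-example metrics."""
    rows = []
    for example in dataset:
        answer, context = pipeline(example["query"])
        row = {"query": example["query"], "groundedness": groundedness(answer, context)}
        if "answer" in example:  # the ground-truth label is optional
            row["f1"] = token_f1(answer, example["answer"])
        rows.append(row)
    return rows


def fake_pipeline(query):
    """Hypothetical stand-in for a full RAG pipeline: returns (answer, context)."""
    return "Paris is the capital", "Paris is the capital of France."


dataset = [
    {"query": "What is the capital of France?", "answer": "Paris"},
    {"query": "Where is the Louvre?"},  # no label: only label-free metrics apply
]
report = evaluate(fake_pipeline, dataset)
for row in report:
    print(row)
```

In practice label-free evals are often done with an LLM acting as judge; the token check above only shows where such a metric would plug in.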
Optimizing RAG Systems
From Simple to Advanced
Simple: Less Expressive, Easier to Implement, Lower Latency/Cost
Advanced: More Expressive, Harder to Implement, Higher Latency/Cost
● Table Stakes 🛠️
○ Better Parsers
○ Chunk Sizes
○ Prompt Engineering
○ Customizing Models
● Advanced Retrieval 🔎
○ Metadata Filtering
○ Recursive Retrieval
○ Embedded Tables
○ Small-to-big Retrieval
● Agentic Behavior
○ Routing
○ Query Planning
○ Multi-document Agents
● Fine-tuning ⚙️
○ Embedding fine-tuning
○ LLM fine-tuning

Table Stakes: Chunk Sizes
Tuning your chunk size can have outsized impacts on performance
Not obvious that more retrieved tokens == higher performance!
Note: Reranking (shuffling context order) isn’t always beneficial.
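To make the chunk-size knob concrete, here is a minimal sketch of a word-based chunker with overlap. Splitting on whitespace and the `chunk_words` name are assumptions for illustration; real pipelines usually split on tokens or sentences. The loop at the end shows how the number of chunks changes with chunk size, which is the quantity one would sweep while measuring retrieval and response metrics.

```python
def chunk_words(text, chunk_size, overlap=0):
    """Split text into chunks of `chunk_size` words, sharing `overlap` words between neighbors."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(100))
for size in (10, 25, 50):
    chunks = chunk_words(text, chunk_size=size, overlap=size // 5)
    print(f"chunk_size={size}: {len(chunks)} chunks")
```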
Table Stakes: Prompt Engineering
RAG uses core Question-Answering (QA) prompt templates
Ways you can customize:
• Adding few-shot examples
• Modifying template text
• Adding emotions
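The first two customizations listed above (few-shot examples and modified template text) can be sketched with ordinary string formatting. The template wording and the names `QA_TEMPLATE` and `build_qa_prompt` are assumptions for illustration, not the prompt any particular library ships.

```python
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "{examples}"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query}\n"
    "Answer: "
)


def build_qa_prompt(context, query, examples=(), template=QA_TEMPLATE):
    """Fill the QA template, optionally adding few-shot (query, answer) pairs."""
    shots = "".join(
        f"Example query: {q}\nExample answer: {a}\n\n" for q, a in examples
    )
    return template.format(context=context, examples=shots, query=query)


prompt = build_qa_prompt(
    context="Paris is the capital of France.",
    query="What is the capital of France?",
    examples=[("What is the capital of Italy?", "Rome")],
)
print(prompt)
```

Passing a different `template` string changes the instruction text without touching the calling code, which keeps prompt variants easy to compare in evals.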
Table Stakes: Customizing LLMs
Task performance on easy-to-hard tasks (RAG, agents) varies wildly among LLMs
Table Stakes: Customizing Embeddings
Your embedding model + reranker affects retrieval quality
Source: https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83

Advanced Retrieval: Small-to-Big
Intuition: Embedding a big text chunk feels suboptimal.
Solution: Embed text at the sentence level, then expand that window during LLM synthesis.
Advanced Retrieval: Small-to-Big
● Naive Retrieval (k=5): Only one out of the 5 chunks is relevant (“lost in the middle” problem)
● Sentence Window Retrieval (k=2): Leads to more precise retrieval. Avoids “lost in the middle” problems.
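Sentence window retrieval can be sketched in a few lines: score single sentences against the query, then hand the LLM each hit together with its neighbors. This is a toy illustration, assuming a regex sentence splitter and a word-overlap score in place of real embeddings; `sentence_window_retrieve` and its parameters are hypothetical names.

```python
import re


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (assumed good enough for a demo)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_score(query, sentence):
    """Toy relevance score: number of shared lowercase words."""
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    s = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    return len(q & s)


def sentence_window_retrieve(text, query, k=2, window=1):
    """Rank single sentences, then return each hit with `window` neighbors on both sides."""
    sentences = split_sentences(text)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: overlap_score(query, sentences[i]),
        reverse=True,
    )
    results = []
    for i in ranked[:k]:
        lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
        results.append(" ".join(sentences[lo:hi]))
    return results


text = (
    "RAG has two stages. Retrieval finds relevant chunks. "
    "Synthesis sends them to the LLM. Evaluation measures both stages. "
    "Fine-tuning is optional."
)
windows = sentence_window_retrieve(text, "How does retrieval find relevant chunks?", k=1)
print(windows[0])
```

The match is made on one short sentence, but the text passed on for synthesis is the wider window, so the LLM still sees the surrounding context.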
Advanced Retrieval: Small-to-Big
Intuition: Embedding a big text chunk feels suboptimal.
Solution: Embed a smaller reference to the parent chunk. Use the parent chunk for synthesis.
Examples: Smaller chunks, summaries, metadata
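The parent-reference variant can be sketched the same way: index small child chunks, each carrying the id of its parent, match the query against the children, and return the parents. This assumes word-overlap scoring instead of embeddings and uses "smaller chunks" as the reference type; `build_small_to_big_index` and `retrieve_parents` are hypothetical names.

```python
def build_small_to_big_index(parents, child_size=5):
    """Map each small child chunk to the id of the parent chunk it came from."""
    index = []
    for parent_id, parent in enumerate(parents):
        words = parent.split()
        for start in range(0, len(words), child_size):
            child = " ".join(words[start:start + child_size])
            index.append((child, parent_id))
    return index


def retrieve_parents(index, parents, query, k=2):
    """Match the query against child chunks, then return de-duplicated parent chunks."""
    q = set(query.lower().split())
    ranked = sorted(
        index,
        key=lambda item: len(q & set(item[0].lower().split())),
        reverse=True,
    )
    seen, results = set(), []
    for _, parent_id in ranked[:k]:
        if parent_id not in seen:  # several children may point at one parent
            seen.add(parent_id)
            results.append(parents[parent_id])
    return results


parents = [
    "Hit rate counts queries with a relevant chunk in the top k results",
    "Chunk size controls how much text each embedding has to represent",
]
index = build_small_to_big_index(parents)
print(retrieve_parents(index, parents, "chunk size", k=1))
```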
Data Agents - LLM-powered knowledge workers
[Diagram: a Data Agent connected to Email (read latest emails), Knowledge Base (retrieve context), Analysis Agent (analyze file), and Slack (send update)]

Data Agents - Core Components
Agent Reasoning Loop
● ReAct Agent (any LLM)
● OpenAI Agent (only OAI)
Tools via LlamaHub
● Code interpreter
● Slack
● Notion
● Zapier
● … (15+ tools, ~100 loaders)
Agentic Behavior: Multi-Document Agents
Intuition: There are certain questions that “top-k” RAG can’t answer.
Solution: Multi-Document Agents
● Fact-based QA and summarization over any subset of documents
● Chain-of-thought and query planning
Fine-Tuning: Embeddings
Credits: Jo Bergum, vespa.ai
Intuition: Embedding Representations are not optimized over your dataset
Solution: Generate a synthetic query dataset from raw text chunks using LLMs
Use this synthetic dataset to finetune an embedding model.
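The dataset-generation step can be sketched as a loop that pairs every chunk with questions it should answer. The question generator is passed in as a callable so that any LLM can be plugged in; `stub_generator` below is a hypothetical placeholder that only mimics the shape of an LLM response, and the field names (`query`, `positive_id`, `positive`) are assumptions for the example.

```python
def build_synthetic_pairs(chunks, generate_questions, per_chunk=2):
    """Pair every chunk with generated questions that the chunk should answer."""
    pairs = []
    for chunk_id, chunk in enumerate(chunks):
        for question in generate_questions(chunk, per_chunk):
            pairs.append({"query": question, "positive_id": chunk_id, "positive": chunk})
    return pairs


def stub_generator(chunk, n):
    """Hypothetical stand-in for an LLM call; a real one would prompt a model with the chunk."""
    topic = chunk.split(".")[0]
    return [f"Question {i + 1} about: {topic}?" for i in range(n)]


chunks = [
    "Chunk sizes affect retrieval. Smaller chunks are more precise.",
    "Rerankers reorder retrieved chunks. They run after the first-stage retriever.",
]
pairs = build_synthetic_pairs(chunks, stub_generator, per_chunk=2)
for pair in pairs:
    print(pair["positive_id"], pair["query"])
```

The resulting (query, positive chunk) pairs are the usual input format for contrastive fine-tuning of an embedding model, and the same pairs can double as a retrieval evaluation set.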
Fine-Tuning: LLMs
Intuition: Weaker LLMs are worse at response synthesis, reasoning, structured outputs, etc.
Solution: Generate a synthetic dataset from raw chunks (e.g. using GPT-4) and fine-tune the weaker LLM on it.
This can help fix all of the above!

Resources
Production RAG
https://docs.llamaindex.ai/en/stable/end_to_end_tutorials/dev_practices/production_rag.html
Fine-tuning
https://docs.llamaindex.ai/en/stable/end_to_end_tutorials/finetuning.html
Thanks!

More Related Content

What's hot

AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
DianaGray10
 
Customizing LLMs
Customizing LLMsCustomizing LLMs
Customizing LLMs
Jim Steele
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
Daniel Zivkovic
 
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
ssuser4edc93
 
Prompting is an art / Sztuka promptowania
Prompting is an art / Sztuka promptowaniaPrompting is an art / Sztuka promptowania
Prompting is an art / Sztuka promptowania
Michal Jaskolski
 
And then there were ... Large Language Models
And then there were ... Large Language ModelsAnd then there were ... Large Language Models
And then there were ... Large Language Models
Leon Dohmen
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models Bootcamp
Data Science Dojo
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Po-Chuan Chen
 
LLMs Bootcamp
LLMs BootcampLLMs Bootcamp
LLMs Bootcamp
Fiza987241
 
Prompt Engineering by Dr. Naveed.pdf
Prompt Engineering by Dr. Naveed.pdfPrompt Engineering by Dr. Naveed.pdf
Prompt Engineering by Dr. Naveed.pdf
Naveed Ahmed Siddiqui
 
Transformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGITransformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGI
SynaptonIncorporated
 
LanGCHAIN Framework
LanGCHAIN FrameworkLanGCHAIN Framework
LanGCHAIN Framework
Keymate.AI
 
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
DataScienceConferenc1
 
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve OmohundroOpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
Numenta
 
Use Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdfUse Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdf
M Waleed Kadous
 
How will development change with LLMs
How will development change with LLMsHow will development change with LLMs
How will development change with LLMs
Microsoft, InfuseAI, Appier, IBM, KaiOS
 
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Mihai Criveti
 
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapEpisode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Anant Corporation
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Sri Ambati
 
Challenges in AI LLMs adoption in the Enterprise
Challenges in AI LLMs adoption in the EnterpriseChallenges in AI LLMs adoption in the Enterprise
Challenges in AI LLMs adoption in the Enterprise
George Bara
 

Building, Evaluating, and Optimizing your RAG App for Production

  • 2. GenAI - Enterprise Use-cases
      ● Document Processing (Tagging & Extraction): Document → Topic, Summary, Author
      ● Knowledge Search & QA: Knowledge Base → Answer, Sources: …
      ● Conversational Agent: Agent: … Human: … Agent: …
      ● Workflow Automation: Workflow: read latest messages from user A (Inbox, read); send email suggesting next-steps (Email, write)
  • 5. Current RAG Stack for building a QA System
      ● Data Ingestion: Doc, split into Chunks, stored in a Vector Database
      ● Data Querying (Retrieval + Synthesis): retrieved Chunks are passed to the LLM
      ● 5 Lines of Code in LlamaIndex!
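The two stages on this slide (ingestion, then retrieval plus synthesis) can be sketched in plain Python. This is a toy illustration, not the LlamaIndex API: `ToyVectorStore`, the bag-of-words `embed` function, and the sample document are assumptions made for the example, and the final LLM call is left out, so the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Chunk a document by sentence (a stand-in for a real text splitter)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def embed(text):
    """Toy embedding: bag-of-words counts (a stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store holding (embedding, chunk) rows."""

    def __init__(self):
        self.rows = []

    def add(self, chunk):
        self.rows.append((embed(chunk), chunk))

    def top_k(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(query, chunks):
    """Synthesis input: retrieved chunks followed by the question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Data ingestion: doc -> chunks -> vector store
doc = ("LlamaIndex connects LLMs to external data. "
       "A vector database stores chunk embeddings. "
       "Retrieval selects the top-k chunks for a query. "
       "Synthesis asks the LLM to answer using those chunks.")
store = ToyVectorStore()
for chunk in split_sentences(doc):
    store.add(chunk)

# Data querying: retrieval, then the prompt that synthesis would send to an LLM
query = "What does the vector database store?"
retrieved = store.top_k(query, k=2)
prompt = build_prompt(query, retrieved)
```

A real system would swap `embed` for an embedding model and send `prompt` to an LLM; the control flow stays the same.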
  • 7. Challenges with Naive RAG (Response Quality)
      ● Bad Retrieval
          ○ Low Precision: Not all chunks in the retrieved set are relevant
              ■ Hallucination + Lost in the Middle Problems
          ○ Low Recall: Not all relevant chunks are retrieved.
              ■ Lacks enough context for the LLM to synthesize an answer
          ○ Outdated information: The data is redundant or out of date.
  • 8. Challenges with Naive RAG (Response Quality)
      ● Bad Retrieval
          ○ Low Precision: Not all chunks in the retrieved set are relevant
              ■ Hallucination + Lost in the Middle Problems
          ○ Low Recall: Not all relevant chunks are retrieved.
              ■ Lacks enough context for the LLM to synthesize an answer
          ○ Outdated information: The data is redundant or out of date.
      ● Bad Response Generation
          ○ Hallucination: Model makes up an answer that isn’t in the context.
          ○ Irrelevance: Model makes up an answer that doesn’t answer the question.
          ○ Toxicity/Bias: Model makes up an answer that’s harmful/offensive.
  • 9. What do we do?
      • Data: Can we store additional information beyond raw text chunks?
      • Embeddings: Can we optimize our embedding representations?
      • Retrieval: Can we do better than top-k embedding lookup?
      • Synthesis: Can we use LLMs for more than generation?
      (Diagram: Doc, Chunks, Vector Database, LLM, labeled with the stages Data, Embeddings, Retrieval, Synthesis)
  • 10. What do we do?
      • Data: Can we store additional information beyond raw text chunks?
      • Embeddings: Can we optimize our embedding representations?
      • Retrieval: Can we do better than top-k embedding lookup?
      • Synthesis: Can we use LLMs for more than generation?
      But before all this… we need evals.
  • 12. Evaluation
      ● How do we properly evaluate a RAG system?
          ○ Evaluate in isolation (retrieval, synthesis)
          ○ Evaluate end-to-end (e2e)
      (Diagram: Vector Database, Chunks, LLM, labeled with the stages Retrieval and Synthesis)
  • 13. Evaluation in Isolation (Retrieval)
      ● Evaluate quality of retrieved chunks given user query
      ● Create dataset
          ○ Input: query
          ○ Output: the “ground-truth” documents relevant to the query
      ● Run retriever over dataset
      ● Measure ranking metrics
          ○ Success rate / hit-rate
          ○ MRR (mean reciprocal rank)
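The two ranking metrics named on this slide can be computed as below. This is a minimal sketch: the `results` layout (a ranked list of retrieved IDs paired with a set of ground-truth IDs) and the document IDs are assumptions made for illustration.

```python
def hit_rate(results, k):
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(
        1 for retrieved, relevant in results
        if any(doc_id in relevant for doc_id in retrieved[:k])
    )
    return hits / len(results)


def mean_reciprocal_rank(results):
    """Average of 1/rank of the first relevant document (0 when none is retrieved)."""
    total = 0.0
    for retrieved, relevant in results:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(results)


# Each entry: (ranked retrieved IDs, set of ground-truth relevant IDs)
results = [
    (["d1", "d7", "d3"], {"d1"}),  # relevant document at rank 1
    (["d4", "d2", "d9"], {"d2"}),  # relevant document at rank 2
    (["d5", "d6", "d8"], {"d0"}),  # relevant document not retrieved
]

hit_at_3 = hit_rate(results, k=3)    # 2 of 3 queries hit
mrr = mean_reciprocal_rank(results)  # (1 + 1/2 + 0) / 3
```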
  • 14. Evaluation E2E
      ● Evaluation of final generated response given input
      ● Create dataset
          ○ Input: query
          ○ [Optional] Output: the “ground-truth” answer
      ● Run through full RAG pipeline
      ● Collect evaluation metrics:
          ○ If no labels: label-free evals
          ○ If labels: with-label evals
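The split between label-free and with-label evals can be sketched as follows. The two metrics here are simple lexical stand-ins chosen so the example runs without a model: `context_support` (share of answer tokens found in the retrieved context) plays the label-free role and `token_f1` the with-label role. Both, along with `fake_pipeline`, are assumptions for illustration; in practice an LLM-based judge is often used instead.

```python
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(prediction, reference):
    """With-label metric: token-overlap F1 against the ground-truth answer."""
    pred, ref = Counter(tokens(prediction)), Counter(tokens(reference))
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def context_support(answer, context):
    """Label-free proxy: share of answer tokens that appear in the retrieved context."""
    answer_tokens = tokens(answer)
    context_tokens = set(tokens(context))
    if not answer_tokens:
        return 0.0
    return sum(t in context_tokens for t in answer_tokens) / len(answer_tokens)


def evaluate(example, rag_pipeline):
    """Run one example end-to-end; add the with-label score only when a label exists."""
    answer, context = rag_pipeline(example["query"])
    scores = {"context_support": context_support(answer, context)}
    if example.get("reference") is not None:
        scores["token_f1"] = token_f1(answer, example["reference"])
    return scores


def fake_pipeline(query):
    """Hypothetical stand-in for the full RAG pipeline: returns (answer, context)."""
    return "Paris is the capital", "Paris is the capital of France."


labeled = evaluate(
    {"query": "What is the capital of France?", "reference": "The capital is Paris"},
    fake_pipeline,
)
unlabeled = evaluate({"query": "What is the capital of France?"}, fake_pipeline)
```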
  • 16. From Simple to Advanced
      Spectrum: Less Expressive, Easier to Implement, Lower Latency/Cost → More Expressive, Harder to Implement, Higher Latency/Cost
      ● Table Stakes 🛠️: Better Parsers, Chunk Sizes, Prompt Engineering, Customizing Models
      ● Advanced Retrieval 🔎: Metadata Filtering, Recursive Retrieval, Embedded Tables, Small-to-big Retrieval
      ● Agentic Behavior: Routing, Query Planning, Multi-document Agents
      ● Fine-tuning ⚙️: Embedding fine-tuning, LLM fine-tuning
  • 17. Table Stakes: Chunk Sizes. Tuning your chunk size can have outsized impacts on performance. It is not obvious that more retrieved tokens == higher performance! Note: Reranking (shuffling context order) isn’t always beneficial.
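Tuning chunk size usually means re-chunking the same corpus at several sizes and re-running the evals for each. A minimal word-based chunker, with `chunk_size` and `overlap` as illustrative parameter names rather than any library's settings, might look like this:

```python
def chunk_words(text, chunk_size, overlap=0):
    """Split text into chunks of up to `chunk_size` words, sharing `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    # Stop once a chunk reaches the end, so no trailing chunk is pure overlap.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


text = " ".join(f"w{i}" for i in range(10))

# Sweep candidate sizes; in a real run each setting would be re-indexed and evaluated.
chunk_counts = {size: len(chunk_words(text, size)) for size in (2, 5, 10)}
overlapping = chunk_words(text, chunk_size=4, overlap=1)
```

Counting words is a simplification; production splitters typically count tokens and respect sentence boundaries.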
  • 18. Table Stakes: Prompt Engineering
      RAG uses core Question-Answering (QA) prompt templates. Ways you can customize:
      • Adding few-shot examples
      • Modifying template text
      • Adding emotions
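Two of these customizations, modifying the template text and adding few-shot examples, can be sketched with plain string formatting. The template wording below is illustrative and is not claimed to match LlamaIndex's built-in QA prompt; `build_qa_prompt` and its parameters are hypothetical names.

```python
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Using only the context above, answer the question.\n"
    "{examples}"
    "Question: {query}\n"
    "Answer: "
)


def format_examples(examples):
    """Render (question, answer) pairs as few-shot demonstrations."""
    return "".join(f"Question: {q}\nAnswer: {a}\n\n" for q, a in examples)


def build_qa_prompt(query, context, examples=(), template=QA_TEMPLATE):
    """Fill the template; pass a different `template` to change the wording."""
    return template.format(
        context=context,
        examples=format_examples(examples),
        query=query,
    )


prompt = build_qa_prompt(
    query="When was the policy updated?",
    context="The refund policy was updated in March.",
    examples=[("Who wrote the report?", "The finance team.")],
)
```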
  • 19. Table Stakes: Customizing LLMs. Task performance on easy-to-hard tasks (RAG, agents) varies wildly among LLMs.
  • 20. Table Stakes: Customizing Embeddings. Your embedding model + reranker affect retrieval quality. Source: https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83
  • 21. Advanced Retrieval: Small-to-Big. Intuition: Embedding a big text chunk feels suboptimal. Solution: Embed text at the sentence level, then expand the window around each retrieved sentence during LLM synthesis.
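A minimal sketch of sentence-window retrieval, assuming a toy word-overlap `score` in place of embedding similarity: matching happens on single sentences, and each hit is expanded by `window` neighboring sentences before it is handed to the LLM. The function names and sample text are made up for the example.

```python
import re


def split_sentences(text):
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def score(query, sentence):
    """Toy relevance: number of shared words (a stand-in for embedding similarity)."""
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    s = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    return len(q & s)


def sentence_window_retrieve(query, sentences, k=2, window=1):
    """Rank single sentences, then return each top hit with its surrounding window."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(query, sentences[i]),
        reverse=True,
    )
    results = []
    for i in ranked[:k]:
        lo = max(0, i - window)
        hi = min(len(sentences), i + window + 1)
        results.append(" ".join(sentences[lo:hi]))
    return results


text = ("Alpha opened the meeting. The budget was cut by ten percent. "
        "Staff were told on Friday. Lunch was served late.")
sentences = split_sentences(text)
windows = sentence_window_retrieve("How much was the budget cut?", sentences, k=1, window=1)
```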
  • 22. Advanced Retrieval: Small-to-Big
      ● Sentence Window Retrieval (k=2): Leads to more precise retrieval. Avoids “lost in the middle” problems.
      ● Naive Retrieval (k=5): Only one out of the 5 chunks is relevant - “lost in the middle” problem
  • 23. Advanced Retrieval: Small-to-Big. Intuition: Embedding a big text chunk feels suboptimal. Solution: Embed a smaller reference to the parent chunk, and use the parent chunk for synthesis. Examples of references: smaller chunks, summaries, metadata.
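The parent-reference variant can be sketched as follows, using smaller child chunks as the references. Each child carries its parent's ID; matching runs over children, and the deduplicated parent texts are what synthesis would see. The word-overlap `score`, the helper names, and the sample data are assumptions made for illustration.

```python
import re


def score(query, text):
    """Toy relevance: number of shared words (a stand-in for embedding similarity)."""
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    t = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(q & t)


def build_child_index(parents, child_size=6):
    """Split each parent chunk into small children that point back to the parent."""
    children = []
    for parent_id, text in parents.items():
        words = text.split()
        for i in range(0, len(words), child_size):
            children.append((" ".join(words[i:i + child_size]), parent_id))
    return children


def retrieve_parents(query, children, parents, k=2):
    """Match on children, then return the distinct parent chunks they reference."""
    ranked = sorted(children, key=lambda child: score(query, child[0]), reverse=True)
    seen, results = set(), []
    for _, parent_id in ranked[:k]:
        if parent_id not in seen:
            seen.add(parent_id)
            results.append(parents[parent_id])
    return results


parents = {
    "p1": "The refund policy allows returns within thirty days of purchase with a receipt.",
    "p2": "Shipping takes five business days for domestic orders and longer abroad.",
}
children = build_child_index(parents, child_size=6)
context = retrieve_parents("refund policy returns", children, parents, k=2)
```

Summaries or metadata strings could replace the child chunks without changing `retrieve_parents`.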
  • 24. Data Agents - LLM-powered knowledge workers
      A Data Agent connects to:
      ● Email: Read latest emails
      ● Knowledge Base: Retrieve context
      ● Analysis Agent: Analyze file
      ● Slack: Send update
  • 25. Data Agents - Core Components
      Agent Reasoning Loop
      ● ReAct Agent (any LLM)
      ● OpenAI Agent (only OAI)
      Tools via LlamaHub
      ● Code interpreter
      ● Slack
      ● Notion
      ● Zapier
      ● … (15+ tools, ~100 loaders)
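The reasoning loop behind a ReAct-style agent can be sketched in a few lines. This is not LlamaIndex's implementation: the `Action: tool[input]` and `Final Answer:` text conventions, the `run_react_agent` function, and the scripted stand-in for the LLM are all assumptions chosen to keep the example runnable without a model.

```python
import re


def run_react_agent(llm, tools, question, max_steps=5):
    """Alternate LLM steps and tool calls until the LLM emits a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        final = re.search(r"Final Answer:\s*(.*)", step)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if action:
            name, arg = action.groups()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool {name}"
            # Feed the tool result back so the next LLM step can use it.
            transcript += f"Observation: {observation}\n"
    return None  # step budget exhausted without a final answer


def scripted_llm(transcript):
    """Hypothetical stand-in for an LLM: calls one tool, then answers with its result."""
    if "Observation:" not in transcript:
        return "Thought: I need the word count.\nAction: word_count[hello agent world]"
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have the result.\nFinal Answer: {observation}"


tools = {"word_count": lambda text: str(len(text.split()))}
answer = run_react_agent(scripted_llm, tools, "How many words are in 'hello agent world'?")
```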
  • 26. Agentic Behavior: Multi-Document Agents
      Intuition: There are certain questions that “top-k” RAG can’t answer.
      Solution: Multi-Document Agents
      ● Fact-based QA and summarization over any subset of documents
      ● Chain-of-thought and query planning
  • 27. Fine-Tuning: Embeddings (Credits: Jo Bergum, vespa.ai). Intuition: Embedding representations are not optimized over your dataset. Solution: Generate a synthetic query dataset from raw text chunks using LLMs, then use this synthetic dataset to fine-tune an embedding model.
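The dataset-generation step can be sketched as a loop over chunks that asks a query generator for questions each chunk answers, then records (query, positive chunk) pairs. The `generate_queries` callable stands in for an LLM call; it, the record layout, and the stub below are assumptions for illustration, and the fine-tuning step itself is not shown.

```python
def build_synthetic_dataset(chunks, generate_queries, queries_per_chunk=2):
    """Pair each generated query with the chunk it was generated from."""
    dataset = []
    for chunk_id, text in chunks.items():
        for query in generate_queries(text, queries_per_chunk):
            dataset.append({"query": query, "positive_id": chunk_id})
    return dataset


def stub_generate_queries(text, n):
    """Hypothetical stand-in for an LLM prompted to write n questions about `text`."""
    first_words = " ".join(text.split()[:4])
    return [f"Question {i + 1} about: {first_words}" for i in range(n)]


chunks = {
    "c1": "Refunds are issued within thirty days of purchase.",
    "c2": "Domestic shipping takes five business days.",
}
dataset = build_synthetic_dataset(chunks, stub_generate_queries, queries_per_chunk=2)
```

Each record gives the embedding model a query that should land close to its `positive_id` chunk after fine-tuning.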
  • 28. Fine-Tuning: LLMs. Intuition: Weaker LLMs are worse at response synthesis, reasoning, structured outputs, etc. Solution: Generate a synthetic dataset from raw chunks (e.g. using GPT-4) and fine-tune the weaker LLM on it. This helps fix all of the above!