Use case patterns for LLM Apps: What to look for
M Waleed Kadous
Chief Scientist, Anyscale
K1st
Oct 11, 2023
- Certain “use case patterns” we see
- Equip you to spot LLM opportunities in your organization
- Patterns (in rough order of difficulty)
- Summarization
- The RAG Family
- Knowledge Base Question Answering
- Document Question Answering
- Talk to your data
- Talk to your system
- In-context assistance family
- Co-creator
- Diagnostician
- Bonus material:
- Thoughts on Dr Nguyen’s presentation
- Waleed’s Hard Won Heuristics
Key points
- Company behind the Open Source project Ray
- Widely used, scalable AI platform adopted by many
companies
- What scalable means:
- Distributed: Up to 4,000 nodes, 16,000 GPUs
- Efficient: Keep costs down through efficient resource management
- Reliable: Fault tolerant, highly available
- Widely used by GenAI companies e.g. OpenAI, Cohere
- ChatGPT trained using Ray
Who is Anyscale? Why should you listen to us?
We provide LLMs as a service (Llama models)
We use LLMs to make our products better
We help our customers deploy LLMs on Ray and on the
managed version of Ray (Anyscale Platform)
What’s our experience with LLMs?
Anyscale Endpoints
LLMs served via API
LLMs fine-tuned via API
Anyscale Endpoints
LLM Serving Price (per million tokens):
- Llama2 70B: $1.00
- Codellama 34B: $1.00
- Llama2 13B: $0.25
- Llama2 7B: $0.15
endpoints.anyscale.com
Anyscale Endpoints
Cost efficiency touches every layer of the stack
Anyscale Endpoints
Single GPU optimizations
Multi-GPU modeling
Inference server
Autoscaling
Multi-region, multi-cloud
$1 / million
tokens
(Llama-2 70B)
End-to-end LLM privacy, customization and control
Anyscale Endpoints
LLMs served via API
LLMs fine-tuned via API
Serve your LLMs from your Cloud
Fine-tune & customize in your Cloud
Anyscale Private
Endpoints
(Chart: cost vs. quality)

How all the pieces fit together
AI app serving & routing
Model training & continuous tuning
Python-native Workspaces
GPU/CPU optimizations
Multi-Cloud, auto-scaling
Anyscale AI Platform
Anyscale Endpoints
LLMs served via API
LLMs fine-tuned via API
Ray AI Libraries Ray Core
Ray Open Source
Serve your LLMs from your Cloud
Fine-tune & customize in your Cloud
Anyscale Private
Endpoints
Summarization
LLMs are very good at summarizing
When GPT-3 came out, it outperformed existing engineered
solutions
Easy: prompt is
- Please summarize this into x bullet points
- Stick to the facts in the document
- Leave out irrelevant parts
- [Optional] Particularly focus on topics A, B and C
Summarization
- Summarize:
- Research papers
- Product updates
- Business contracts
- Latest industry news
- Legislative changes
- Quality control reports
Practical examples

Anyscale Customer using Summarization: Merlin
Merlin
“We use Anyscale Endpoints to power
consumer-facing services that have
reach to millions of users … Anyscale
Endpoints gives us 5x-8x cost
advantages over alternatives, making
it easy for us to make Merlin even more
powerful while staying affordable for
millions of users.”
Watch out for cost!
Summarization: Lesson 1
30x!
Summary ranking: an evaluation method established in the literature.
“insiders say the row brought simmering
tensions between the starkly contrasting
pair -- both rivals for miliband's ear --
to a head.”
A: insiders say the row brought tensions between
the contrasting pair.
B: insiders say the row brought simmering tensions
between miliband's ear.
Example of comparable quality: Factuality eval

For the summarization task, Llama 2 70b is about as good
as GPT-4 (on factuality)
Dropping to GPT-3.5-Turbo doesn’t work, significant drop in
quality
Llama 2 70b costs 30x less
Cheaper not always worse
Summarization: Lesson 3
One issue is context window size
Most LLMs can take 4000-8000 tokens (3000-6000 words)
as input
2 solutions
- Long Context Window LLMs (e.g. Claude 2: 75,000 words)
- Split-and-merge approach:
- LangChain and others already have chains to do this
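The split-and-merge approach can be sketched as a map-reduce over word chunks. The `summarize` argument stands in for any LLM call; a trivial stub is used below so the control flow is runnable end to end.

```python
# Sketch: split-and-merge summarization for documents that exceed the
# context window. Summarize each chunk, then recursively summarize the
# concatenated partial summaries.

def split_into_chunks(words: list, max_words: int) -> list:
    """Split a word list into chunks that fit the model's input budget."""
    return [words[i:i + max_words] for i in range(0, len(words), max_words)]

def map_reduce_summarize(text: str, summarize, max_words: int = 3000) -> str:
    words = text.split()
    if len(words) <= max_words:
        return summarize(text)  # fits: one direct call
    # Map: summarize each chunk independently.
    partials = [summarize(" ".join(c))
                for c in split_into_chunks(words, max_words)]
    # Reduce: recurse on the joined partial summaries.
    return map_reduce_summarize(" ".join(partials), summarize, max_words)
```

LangChain's summarize chain with `chain_type="map_reduce"` implements the same pattern with real LLM calls.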
The Retrieval Augmented
Generation Family

Retrieval Augmented Generation
Solves 2 problems:
- How do I add knowledge to an LLM that’s already trained
without retraining the whole thing?
- How do I stop the LLM from simply making stuff up
(hallucination)?
Basic approach: use a secondary source (e.g. a vector
database) to augment the prompt with context
Source \ Timing   Pre-indexed           Real-time
Text              Knowledge Base QA     Document QA
Data              Talk to data          Talk to system
- Knowledge Base Question Answering
- Source: existing documentation (e.g. wikis, intranets,
slack records, etc)
- Document Question Answering
- Source: a new document
- Talk to data
- Source: an existing SQL, CSV or similar
- Talk to system
- Source: a live engine or source of data
Four flavors
Pre-index stage: Build the index of “chunks” of text
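A toy version of the pre-index-and-retrieve loop, using word overlap as a stand-in for embeddings; a production system would embed chunks with an embedding model and store them in a vector database.

```python
# Sketch: chunk documents at index time, then retrieve the most
# relevant chunks and use them to augment the prompt.
from collections import Counter

def build_index(docs: list, chunk_words: int = 100) -> list:
    """Pre-index stage: chunk each document and index word counts."""
    index = []
    for doc in docs:
        words = doc.lower().split()
        for i in range(0, len(words), chunk_words):
            chunk = " ".join(words[i:i + chunk_words])
            index.append((chunk, Counter(chunk.split())))
    return index

def retrieve(index, query: str, k: int = 3) -> list:
    """Return the k chunks sharing the most words with the query."""
    q = Counter(query.lower().split())
    scored = sorted(index, key=lambda item: -sum((item[1] & q).values()))
    return [chunk for chunk, _ in scored[:k]]

def augmented_prompt(index, question: str) -> str:
    context = "\n\n".join(retrieve(index, question))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```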

This is a mini search engine that provides snippets
- Customer support
- Internal company knowledge chatbot
- Sales search: “Who is Customer X?”
- Technical documentation
Knowledge base example applications
Endless possibilities for AI innovation.

Erik Brynjolfsson: Professor here at Stanford
- Introduced a RAG-based customer support system
- 14% increase in resolved customer issues per hour
- 35% increase for lowest skilled worker
“Generative AI at work”, Brynjolfsson et al,
https://www.nber.org/system/files/working_papers/w31161/w31161.pdf
Real Measured Results
- Easy incremental step if you already have an existing knowledge base
- Real challenge is not the synthesis stage, but building a good search
engine
- GIGO: Garbage in, Garbage out.
- If the retrieved results are garbage, LLMs won’t fix it
- Example startup in this space: glean.com
- You don’t need GPT-4 for synthesis. Llama 2 70b or GPT-3.5-Turbo is
good enough.
Knowledge base QA: Lessons
- Example:
- Upload a 20,000 word contract.
- Ask: Does this contract give us any rights if the
customer files Chapter 11?
- Main difference: have not seen document before
- Blog post in preparation that looks at 3 approaches:
- Index it in real-time
- Use Large Context Window and shove it in (Claude 2)
- Divide into paragraphs then scatter-gather
Document Question Answering
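The third approach, divide into paragraphs then scatter-gather, can be sketched as follows; crude word overlap stands in for the paragraph scoring an embedding model would do in practice.

```python
# Sketch: scatter-gather document QA for a document the model has
# never seen. Score each paragraph against the question, gather the
# best ones, and ask the question over just those.

def scatter_gather_prompt(document: str, question: str,
                          top_k: int = 5) -> str:
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    q_words = set(question.lower().split())
    # Scatter: score every paragraph by word overlap with the question.
    scored = sorted(paragraphs,
                    key=lambda p: -len(q_words & set(p.lower().split())))
    # Gather: pack only the most relevant paragraphs into one prompt.
    context = "\n\n".join(scored[:top_k])
    return f"Based on these excerpts:\n\n{context}\n\nQuestion: {question}"
```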
- Example:
- You have a database of sales numbers
- You ask a natural language query:
- “Which salesperson in the East Coast has seen the
greatest monthly sales?”
- Usual approach
- Translate natural language to SQL or similar
- Note: You have to be really careful with SQL from an LLM
– it could contain injection attacks.
Talk to data
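One way to guard LLM-generated SQL, sketched below: execute only a single SELECT statement, rejecting everything else. The query string stands in for whatever the LLM translated from the natural language question; the table and column names are made up for the example.

```python
# Sketch: defensive execution of model-generated SQL.
import sqlite3

def run_llm_sql(conn: sqlite3.Connection, sql: str):
    """Run LLM-generated SQL only if it is a single SELECT."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement injection payloads
        raise ValueError("multiple statements are not allowed")
    if not stripped.lower().startswith("select"):
        raise ValueError("only SELECT queries are allowed")
    return conn.execute(stripped).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (rep TEXT, region TEXT, monthly REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("Ana", "East", 90.0), ("Bo", "East", 120.0),
                  ("Cy", "West", 200.0)])
# e.g. hypothetical LLM output for "Which East Coast salesperson has
# seen the greatest monthly sales?"
rows = run_llm_sql(conn, "SELECT rep FROM sales WHERE region = 'East' "
                         "ORDER BY monthly DESC LIMIT 1")
# rows == [("Bo",)]
```

Opening the database read-only (or installing a deny-all write authorizer) adds defense in depth on top of the statement check.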

A small fine-tuned open source model
can outperform the best available general model
in some cases
The Power of Fine-tuning in Cost Reduction
Anyscale Endpoints - fine-tuning
Llama-2-7B (base): 3%
GPT-4: 78%
Llama-2-7B (fine-tuned): 86%
Superior task-specific performance at 1/300th the cost of GPT-4.
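Fine-tuning starts from task-specific examples. A sketch of packaging them in the chat-style JSONL format used by OpenAI-compatible fine-tuning APIs follows; the exact schema any given service expects should be checked against its documentation.

```python
# Sketch: serialize (user, assistant) example pairs into chat-format
# JSONL for an OpenAI-compatible fine-tuning API.
import json

def to_jsonl(examples: list, system: str) -> str:
    """One JSON record per line, each a full chat transcript."""
    lines = []
    for user_msg, assistant_msg in examples:
        record = {"messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_msg},
            {"role": "assistant", "content": assistant_msg},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)

data = to_jsonl([("Summarize: revenue up 12%.", "- Revenue rose 12%.")],
                system="You are a terse summarizer.")
```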

- Similar to talk to data, but instead of a database you talk
to a live system
- Example (Wireless Network):
- Q: “Any area seeing wifi congestion?”
- A: “Yes, floor 7 is. I see that there are a large number of
visitors trying to use the guest network.”
Talk to system
- Define functions for querying your system
- E.g. get_congestion_status(),
get_network_usage_type()
- Translate your queries into those functions
Basic approach
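The define-functions-then-translate loop can be sketched as below. The wifi answers are canned stand-ins for live system queries, and the keyword match stands in for the model's tool/function-calling support.

```python
# Sketch: expose system queries as functions, then route the user's
# question to one of them.

def get_congestion_status() -> str:
    return "Floor 7 is congested."           # would query the live network

def get_network_usage_type() -> str:
    return "Mostly guest-network visitors."  # would query the live network

TOOLS = {"congestion": get_congestion_status,
         "usage": get_network_usage_type}

def answer(query: str) -> str:
    # Stand-in for the LLM's function selection step.
    name = "congestion" if "congestion" in query.lower() else "usage"
    return TOOLS[name]()

print(answer("Any area seeing wifi congestion?"))  # Floor 7 is congested.
```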
This is easy, but we only know of one company …
In-context assistance

- Tools that help you get your job done while you are working on it
- Automatically analyze the content you are working on
- Example:
- Smart code completion
- Looks at surrounding code, environment, etc.
In-context assistance
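At its core, "smart code completion" is prompt assembly from the code around the cursor. A minimal sketch of that idea follows; the <PRE>/<SUF>/<MID> markers are illustrative placeholders, not any specific model's tokens (real infilling models such as CodeLlama define their own special tokens).

```python
def build_completion_prompt(prefix, suffix, file_path):
    """Assemble an infilling prompt from the code surrounding the cursor.

    The <PRE>/<SUF>/<MID> markers are illustrative; actual code models
    define their own special tokens for fill-in-the-middle completion.
    """
    return "# File: %s\n<PRE>%s<SUF>%s<MID>" % (file_path, prefix, suffix)

# Complete the body of a half-written function:
prompt = build_completion_prompt("def mean(xs):\n    return ", "\n", "stats.py")
```

An IDE plugin would send this prompt to a code model and splice the generated text in at the <MID> position.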
- “Autocomplete on steroids” for software developers
- 95 developers, randomized controlled trial:
Copilot users were 55% faster
- 96% of Copilot users were faster on repetitive tasks
- 74% said it allowed them to focus on more satisfying work
source:
https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/
GitHub Copilot
- Our internal diagnostic tool
- Anyscale has an IDE, including Jupyter notebooks
Anyscale Doctor
- Took a lot of trial and error to build
- Built on top of the RAG system for additional analysis
- In the end, not that complicated
- But lots of experimentation
- Now going to be deployed in the product
Our experiences
Anyscale Doctor
[Pipeline diagram] User Input → Summarize (Llama 2 70B) → Categorize (Llama 2 70B)
→ one of: Dependency Error / Python Error / Infra Error
→ handled by CodeLlama models, with CodeLlama QA checking the answers
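The control flow shown in the diagram above (summarize, then categorize, then route) can be sketched as a small pipeline. The model names, prompts, and keyword-based routing below are illustrative assumptions for offline demonstration, not Anyscale's actual implementation; the LLM call is stubbed.

```python
CATEGORIES = ["Dependency Error", "Python Error", "Infra Error"]

def call_llm(model, prompt):
    """Placeholder for a chat-completion call (the real tool uses Llama 2 70B
    and CodeLlama via an API). The fake categorizer below just lets the
    pipeline run offline."""
    if prompt.startswith("Categorize"):
        text = prompt.lower()
        if "modulenotfounderror" in text or "pip install" in text:
            return "Dependency Error"
        if "traceback" in text:
            return "Python Error"
        return "Infra Error"
    return prompt[:200]  # stand-in "summary"

def diagnose(user_input):
    # One LLM call per job: summarize first, then categorize the summary.
    summary = call_llm("llama-2-70b", "Summarize this error report:\n" + user_input)
    category = call_llm("llama-2-70b",
                        "Categorize the report into %s:\n%s" % (CATEGORIES, summary))
    return summary, category

summary, category = diagnose("ModuleNotFoundError: No module named 'ray'")
# With this stub, category is "Dependency Error"
```

Swapping `call_llm` for a real client turns this stub into the two-stage pipeline the slide describes; the category then selects a specialized CodeLlama handler.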
1. Prototype with GPT-4 (or Claude if you need big context windows).
If GPT-4 doesn’t work, nothing else is likely to.
2. One LLM call does one job. Don’t ask an LLM to summarize and
classify. Do two LLM calls: one to summarize, one to classify.
3. Llama 2 70B can be useful as a “day to day” LLM if you remember
Rule 2. GPT-4 is less sensitive to dual tasks.
4. Fine-tuning is for form, not facts. RAG is for facts.
5. If you can, avoid self-hosting. It’s more difficult than it looks (e.g.
dealing with traffic peaks cost-effectively), especially for multi-GPU LLMs
like Llama 2 70B. If you have to, use RayLLM.
Bonus: Waleed’s Hard-won Heuristics
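Rules 1 and 3 are easier to follow when model choice is a parameter. OpenAI and Anyscale Endpoints both speak an OpenAI-compatible chat-completions protocol, so a prototype can swap GPT-4 for Llama 2 70B by changing only the base URL and model name. The base URLs and environment-variable names below are assumptions to verify against each provider's current docs; the sketch only builds client settings, it does not send requests.

```python
import os

def chat_config(model):
    """Return client settings for an OpenAI-compatible chat endpoint.

    Base URLs and env-var names are assumptions; check provider docs."""
    backends = {
        "gpt-4": ("https://api.openai.com/v1", "OPENAI_API_KEY"),
        "meta-llama/Llama-2-70b-chat-hf": (
            "https://api.endpoints.anyscale.com/v1", "ANYSCALE_API_KEY"),
    }
    base_url, key_var = backends[model]
    return {"base_url": base_url,
            "api_key": os.environ.get(key_var, ""),
            "model": model}

# Rule 1: prototype against GPT-4 first...
cfg = chat_config("gpt-4")
# ...then swap to Llama 2 70B without touching the rest of the app.
cfg = chat_config("meta-llama/Llama-2-70b-chat-hf")
```

Keeping the rest of the application ignorant of which backend is in use makes the GPT-4-first, Llama-2-later workflow a one-line change.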

- Certain “use case patterns” we see
- Equip you to spot LLM opportunities in your organization
- Patterns (in rough order of difficulty)
- Summarization
- The RAG Family
- Knowledge Base Question Answering
- Document Question Answering
- Talk to your data
- Talk to your system
- In-context assistance family
- Co-creator
- Diagnostician
Key points
Thank You!
Endpoints: endpoints.anyscale.com
RayLLM: github.com/ray-project/ray-llm
Details: anyscale.com/blog
Numbers: llm-numbers.ray.io
Ray: ray.io
Anyscale: anyscale.com
Me: mwk@anyscale.com

  • 5. We provide LLMs as a service (Llama models) We use LLMs to make our products better We help our customers deploy LLMs on Ray and on the managed version of Ray (Anyscale Platform) What’s our experience with LLMs?
  • 6. Anyscale Endpoints LLMs served via API LLMs fine-tuned via API Anyscale Endpoints Llama2 70B Codellama 34B $1.00 Llama2 13B $0.25 Llama2 7B $0.15 LLM Serving Price (per million tokens) endpoints.anyscale.com
  • 7. Anyscale Endpoints Cost efficiency touches every layer of the stack Anyscale Endpoints Single GPU optimizations Multi-GPU modeling Inference server Autoscaling Multi-region, multi-cloud $1 / million tokens (Llama-2 70B)
  • 8. End-to-end LLM privacy, customization and control Anyscale Endpoints LLMs served via API LLMs fine-tuned via API Serve your LLMs from your Cloud Fine-tune & customize in your Cloud Anyscale Private Endpoints Cost Quality
  • 9. How all the pieces fit together AI app serving & routing Model training & continuous tuning Python-native Workspaces GPU/CPU optimizations Multi-Cloud, auto-scaling Anyscale AI Platform Anyscale Endpoints LLMs served via API LLMs fine-tuned via API Ray AI Libraries Ray Core Ray Open Source Serve your LLMs from your Cloud Fine-tune & customize in your Cloud Anyscale Private Endpoints
  • 11. LLMs are very good at summarizing When GPT-3 came out, it outperformed existing engineered solutions Easy: prompt is - Please summarize this into x bullet points - Stick to the facts in the document - Leave out irrelevant parts - [Optional] Particularly focus on topics A, B and C Summarization
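The prompt recipe on this slide can be sketched as a small helper. This is a minimal sketch: the function name, wording, and the `call_llm` placeholder are illustrative, not part of any specific API; the actual LLM call is left out.

```python
def build_summary_prompt(document: str, n_bullets: int = 3, focus_topics=None) -> str:
    """Assemble the summarization prompt described on the slide."""
    lines = [
        f"Please summarize this into {n_bullets} bullet points.",
        "Stick to the facts in the document.",
        "Leave out irrelevant parts.",
    ]
    if focus_topics:  # optional: steer the summary toward named topics
        lines.append("Particularly focus on topics: " + ", ".join(focus_topics))
    return "\n".join(lines) + "\n\n---\n" + document

# Usage: pass the resulting prompt to whatever completion endpoint you use.
prompt = build_summary_prompt("Q3 revenue grew 12%.", n_bullets=5, focus_topics=["revenue"])
```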
  • 12. - Summarize: - Research papers - Product updates - Business contracts - Latest industry news - Legislative changes - Quality control reports Practical examples
  • 13. Anyscale Customer using Summarization: Merlin
  • 14. Merlin “We use Anyscale Endpoints to power consumer-facing services that have reach to millions of users … Anyscale Endpoints gives us 5x-8x cost advantages over alternatives, making it easy for us to make Merlin even more powerful while staying affordable for millions of users.”
  • 15. Watch out for cost! Summarization: Lesson 1 30x!
  • 16. Summary ranking is an established method in the literature. “insiders say the row brought simmering tensions between the starkly contrasting pair -- both rivals for miliband's ear -- to a head.” A: insiders say the row brought tensions between the contrasting pair. B: insiders say the row brought simmering tensions between miliband's ear. Example of comparable quality: Factuality eval
  • 18. For the summarization task, Llama 2 70b is about as good as GPT-4 (on factuality) Dropping to GPT-3.5-Turbo doesn’t work, significant drop in quality Llama 2 70b costs 30x less Cheaper not always worse
  • 19. Summarization: Lesson 3 One issue is context window size Most LLMs can take 4000-8000 tokens (3000-6000 words) as input 2 solutions - Long Context Window LLMs (e.g. Claude 2: 75,000 words) - Split-and-merge approach: - LangChain et al. already have chains to do this
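The split-and-merge approach can be sketched as a map-reduce over word chunks. This is a sketch only: `llm` is a placeholder for whatever completion function you use, and the chunk size is a rough stand-in for a token-aware splitter.

```python
def split_into_chunks(text: str, max_words: int = 3000) -> list[str]:
    """Split a long document into word-count-bounded chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def summarize_long(text: str, llm) -> str:
    """Map: summarize each chunk independently. Reduce: summarize the summaries."""
    partials = [llm("Summarize:\n" + chunk) for chunk in split_into_chunks(text)]
    if len(partials) == 1:
        return partials[0]
    return llm("Combine these partial summaries into one:\n" + "\n".join(partials))
```

For very long documents you may need more than one reduce round, which is exactly what the existing LangChain chains handle for you.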
  • 21. Retrieval Augmented Generation Solves 2 problems: - How do I add knowledge to an LLM that’s already trained without retraining the whole thing? - How do I stop the LLM from simply making stuff up (hallucination)? Basic approach: use a secondary source (e.g. vector database) to augment the prompt with context
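The basic approach, augmenting the prompt with retrieved context, might look like this. A minimal sketch: the retrieval step is assumed to happen elsewhere, and the prompt wording is illustrative.

```python
def build_rag_prompt(question: str, snippets: list[str]) -> str:
    """Ground the answer in retrieved context to curb hallucination."""
    context = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The "say you don't know" instruction is the anti-hallucination half of the pattern; the numbered snippets also make it easy to ask the model to cite its sources.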
  • 22. Source Timing Pre-indexed Real-time Text Knowledge Base QA Document QA Data Talk to data Talk to system
  • 23. - Knowledge Base Question Answering - Source: existing documentation (e.g. wikis, intranets, slack records, etc) - Document Question Answering - Source: a new document - Talk to data - Source: an existing SQL, CSV or similar - Talk to system - Source: a live engine or source of data Four flavors
  • 24. Pre-index stage: Build the index of “chunks” of text
  • 25. This is a mini search engine that provides snippets
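The pre-index and snippet-retrieval stages can be sketched with naive keyword overlap standing in for a real vector database. This is an illustrative toy, not a production retriever: real systems use embeddings and approximate nearest-neighbor search.

```python
def make_chunks(text: str, size: int = 50) -> list[str]:
    """Pre-index stage: split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def top_snippets(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Mini search engine: rank chunks by keyword overlap, return top-k snippets."""
    q = set(query.lower().split())
    return sorted(chunks, key=lambda c: -len(q & set(c.lower().split())))[:k]
```

The returned snippets are what gets pasted into the augmented prompt in the synthesis stage.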
  • 27. - Customer support - Internal company knowledge chatbot - Sales search: “Who is Customer X?” - Technical documentation Knowledge base example applications
  • 28. Endless possibilities for AI innovation. AI app serving & routing Model training & continuous tuning Python-native Workspaces GPU/CPU optimizations Multi-Cloud, auto-scaling Anyscale AI Platform Anyscale Endpoints LLMs served via API LLMs fine-tuned via API Ray AI Libraries Ray Core Ray Open Source Serve your LLMs from your Cloud Fine-tune & customize in your Cloud Anyscale Private Endpoints
  • 29. Erik Brynjolfsson: Professor here at Stanford - Introduced a RAG-based customer support system - 14% increase in resolved customer issues per hour - 35% increase for lowest skilled worker “Generative AI at work”, Brynjolfsson et al, https://www.nber.org/system/files/working_papers/w31161/w31161.pdf Real Measured Results
  • 30. - Easy incremental step if you already have an existing knowledge base - Real challenge is not the synthesis stage, but building a good search engine - GIGO: Garbage in, Garbage out. - If the retrieved results are garbage, LLMs won’t fix it - Example startup in this space: glean.com - You don’t need GPT-4 for synthesis. Llama 2 70b or GPT-3.5-Turbo is good enough. Knowledge base QA: Lessons
  • 31. - Example: - Upload a 20,000 word contract. - Ask: Does this contract give us any rights if the customer files Chapter 11? - Main difference: have not seen document before - Blog post in preparation that looks at 3 approaches: - Index it in real-time - Use Large Context Window and shove it in (Claude 2) - Divide into paragraphs then scatter-gather Document Question Answering
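The third approach on this slide, divide into paragraphs then scatter-gather, could be sketched as follows. A sketch only: `llm` is a placeholder completion function, and the chunking is by word count rather than true paragraphs.

```python
def scatter_gather_qa(document: str, question: str, llm, chunk_words: int = 1500) -> str:
    """Scatter: ask the question of every chunk. Gather: merge the per-chunk answers."""
    words = document.split()
    chunks = [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]
    partial = [
        llm(f"Context:\n{c}\n\nQ: {question}\nIf not answered here, reply NONE.")
        for c in chunks
    ]
    relevant = [a for a in partial if a.strip() != "NONE"]
    return llm(f"Combine into one answer to '{question}':\n" + "\n".join(relevant))
```

Unlike knowledge base QA, there is no pre-built index: the whole document is processed at query time.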
  • 32. - Example: - You have a database of sales numbers - You ask a natural language query: - “Which salesperson in the East Coast has seen the greatest monthly sales?” - Usual approach - Translate natural language to SQL or similar - Note: You have to be really careful with SQL from an LLM – could contain injection attacks. Talk to data
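Given the injection warning on this slide, one minimal safeguard is to accept only a single read-only SELECT from the LLM before executing anything. This is a sketch of the idea, not a complete defense: real systems should also run generated SQL under a read-only database role with strict permissions.

```python
import re

# Must start with SELECT...
ALLOWED = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
# ...and must not contain write/DDL keywords or a second stacked statement.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|GRANT)\b|;.+",
    re.IGNORECASE | re.DOTALL,
)

def is_safe_select(sql: str) -> bool:
    """Reject anything that isn't a single read-only SELECT statement."""
    return bool(ALLOWED.match(sql)) and not FORBIDDEN.search(sql)
```

A regex filter like this is a coarse gate; a proper SQL parser is stricter, but even this catches the obvious stacked-statement injections.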
  • 34. A small fine-tuned open source model can outperform the best available general model in some cases The Power of Fine-tuning in Cost Reduction
  • 36. Anyscale Endpoints – fine-tuning. Superior task-specific performance at 1/300th the cost of GPT-4. (Chart compares Llama-2-7B and GPT-4; values shown: 3%, 78%, 86%.)
  • 37. - Similar to talk to data but instead of a database talk to a live system - Example (Wireless Network): - Q: “Any area seeing wifi congestion?” - A: “Yes, floor 7 is. I see that there are a large number of visitors trying to use the guest network.” Talk to system
  • 38. - Define functions for querying your system - E.g. get_congestion_status(), get_network_usage_type() - Translate your queries into those functions Basic approach
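The basic approach above can be sketched as a function registry. The function names come from the slide, but their bodies here are hypothetical stubs; the key point is dispatching only to explicitly registered functions, never executing raw LLM output.

```python
# Hypothetical stubs for the slide's example query functions.
def get_congestion_status():
    return {"floor_7": "congested"}

def get_network_usage_type():
    return {"guest_network": "heavy visitor traffic"}

# Registry of safe, pre-defined functions the LLM is allowed to invoke.
REGISTRY = {
    "get_congestion_status": get_congestion_status,
    "get_network_usage_type": get_network_usage_type,
}

def dispatch(call_name: str):
    """Execute only registered functions -- never eval() text from an LLM."""
    fn = REGISTRY.get(call_name)
    if fn is None:
        raise ValueError(f"Unknown function: {call_name}")
    return fn()
```

The LLM's job is reduced to picking a function name (and arguments, in a fuller version); the system keeps control over what actually runs.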
  • 39. This is easy, but we only know of one company doing it …
  • 41. - Tools that help you get your job done while you are working on it - Automatically analyze the content - Example: - Smart code completion - Looks at surrounding code, environment etc In-context assistance
  • 42. - “Autocomplete on steroids” for software developers - 95 developers, randomized controlled trial. Copilot users 55% faster - 96% of Copilot users faster on repetitive tasks - 74% said it allowed them to focus on more satisfying work source: https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivit y-and-happiness/ Github Copilot
  • 43. - Our internal diagnostic tool - Anyscale has an IDE, including a Jupyter notebook Anyscale Doctor
  • 46. - Took a lot of trial and error to build - Built on top of the RAG system for additional analysis - At the end, not that complicated - But lots of experimentation - Now going to be deployed in the product Our experiences
  • 47. Anyscale Doctor User Input Summarize Categorize Dependency Error Python Error Infra Error 🦙🦙 70B 🦙🦙 70B 🦙🦙 Code 🦙🦙 Code 🦙🦙 Code QA 🦙🦙 Code QA
  • 48. 1. Prototype with GPT-4 (or Claude if you need big context windows). If GPT-4 doesn’t work, nothing else is likely to. 2. One LLM call does one job. Don’t ask an LLM to summarize and classify. Do 2 LLM calls: one to summarize, one to classify. 3. Llama 2 70b can be useful as a “day to day” LLM if you remember Rule 2. GPT-4 is less sensitive to dual tasks. 4. Fine-tuning is for form, not facts. RAG is for facts. 5. If you can, avoid self-hosting. It’s more difficult than it looks (e.g. dealing with traffic peaks cost effectively), esp. multi-GPU LLMs like Llama 70b. If you have to, use RayLLM. Bonus: Waleed’s Hard-won Heuristics
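Rule 2, one LLM call does one job, might look like this in code. A sketch: `llm` is a placeholder completion function and the prompts are illustrative.

```python
def summarize_then_classify(text: str, llm, labels: list[str]):
    """Two focused calls instead of one overloaded 'summarize AND classify' call."""
    summary = llm("Summarize in one sentence:\n" + text)                 # call 1: summarize only
    label = llm(f"Classify the text into one of {labels}:\n{summary}")   # call 2: classify only
    return summary, label
```

Splitting the tasks costs an extra call but, per Rule 3, lets a cheaper model like Llama 2 70b handle each step reliably.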
  • 49. - Certain “use case patterns” we see - Equip you to spot LLM opportunities in your organization - Patterns (in rough order of difficulty) - Summarization - The RAG Family - Knowledge Base Question Answering - Document Question Answering - Talk to your data - Talk to your system - In-context assistance family - Co-creator - Diagnostician Key points
  • 50. Thank You! Endpoints: endpoints.anyscale.com RayLLM: github.com/ray-project/ray-llm Details: anyscale.com/blog Numbers: llm-numbers.ray.io Ray: ray.io Anyscale: anyscale.com Me: mwk@anyscale.com