I'm new to this and would like to know the tech stack needed to fine-tune an LLM and the tech stack needed to create a RAG system.

A good overview with the full code to set it up is at Huggingface - Transformers - RAG:

Retrieval-augmented generation (“RAG”) models combine the powers of pretrained dense [passage, see Huggingface - Transformers - DPR] retrieval (DPR) and sequence-to-sequence models. RAG models retrieve documents, pass them to a seq2seq model, then marginalize to generate outputs. The retriever and seq2seq modules are initialized from pretrained models, and fine-tuned jointly, allowing both retrieval and generation to adapt to downstream tasks.

It is based on the paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.
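
For reference, the Hugging Face docs show how to load such a pretrained RAG model in a few lines. A minimal sketch, assuming the datasets and faiss-cpu packages are installed (use_dummy_dataset=True avoids downloading the full Wikipedia index):

from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

# Retriever and seq2seq generator, jointly fine-tuned on Natural Questions
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained("facebook/rag-sequence-nq",
                                         index_name="exact", use_dummy_dataset=True)
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq",
                                                 retriever=retriever)

# Encode a question, retrieve supporting passages, and generate an answer
inputs = tokenizer("who holds the record in 100m freestyle", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))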

Another good guide is at Retrieval Augmented Generation (RAG) Explained: Understanding Key Concepts, with this key insight:

Retrieval models bring the "what"—the factual content—while generative models contribute the "how"—the art of composing these facts into coherent and meaningful language.
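
In code terms, that division of labor boils down to something like the following sketch (the retriever and llm objects and their methods are hypothetical placeholders, not a real API):

def rag_answer(question, retriever, llm):
    # The "what": fetch factual content relevant to the question
    docs = retriever.search(question, k=3)
    # The "how": have the generative model compose the facts into coherent language
    prompt = f"Answer using only this context:\n{docs}\n\nQuestion: {question}"
    return llm.generate(prompt)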

But even with this overview and code at hand, I do not know which tech stack is needed to set this up.

1 Answer

Regarding RAG:

Python + the LangChain lib for the retriever + GPT for the answer generator is a common choice, but many alternatives exist.

Here is a code example of RAG-based QA that answers questions about how to use Stack Exchange. The code uses Python 3.11 + the LangChain lib + FAISS + GPT (via Azure OpenAI):

import os
import pprint
import pandas
import yaml
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.docstore.document import Document
from openai import AzureOpenAI

# Load the Azure OpenAI credentials and deployment settings
with open('parameters.yaml') as f:
    parameters = yaml.safe_load(f)
pprint.pprint(parameters)

model = "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
embeddings = HuggingFaceEmbeddings(model_name = model)

def create_index():
    # https://python.langchain.com/docs/integrations/vectorstores/faiss
    # Uncomment the following line if you need to initialize FAISS with no AVX2 optimization
    # os.environ['FAISS_NO_AVX2'] = '1'

    # Convert each CSV row into a LangChain Document, keeping the metadata
    langchain_documents = []
    raw_documents_df = pandas.read_csv(os.path.join('data', 'help.csv'), header=0)
    for raw_document_index, raw_document in raw_documents_df.iterrows():
        print(raw_document)
        new_doc = Document(page_content=raw_document['section_body'],
                           metadata={"document_id": str(raw_document['document_id']),
                                     "url": str(raw_document['url']),
                                     "header": str(raw_document['section_header']),
                                     })
        langchain_documents.append(new_doc)

    # Embed the documents and persist the FAISS index to disk
    db = FAISS.from_documents(langchain_documents, embeddings)
    db.save_local("faiss_index")

def test_index():
    query = "Where can I see my deleted questions?"
    print('Query:', query)
    # Reload the persisted index and retrieve the 3 most similar documents
    new_db = FAISS.load_local("faiss_index", embeddings)
    relevant_documents = new_db.similarity_search_with_score(query, k=3)
    pprint.pprint(relevant_documents)
    # Convert the retrieved (document, score) pairs into a JSON-like context
    context = []
    for relevant_document in relevant_documents:
        document_dictionary = {}
        print('relevant_document', relevant_document)
        document_dictionary['content'] = relevant_document[0].page_content
        document_dictionary['score'] = relevant_document[1]
        context.append(document_dictionary)

    print('\n\n\nContext:')
    pprint.pprint(context)
    answer = generate_answer(context, query)

def generate_answer(context, query):
    pprint.pprint(context)
    # Create the Azure OpenAI client from the settings in parameters.yaml
    client = AzureOpenAI(
        api_key=parameters['azure']['api_key'],
        azure_endpoint=parameters['azure']['azure_endpoint'],
        api_version=parameters['azure']['api_version']
    )

    llm_user_prompt = llm_user_prompt_template.format(question=query, context=context)
    print('llm_user_prompt', llm_user_prompt)

    messages = [{"role":"system",
                 "content": llm_system_prompt},
                {"role": "user", "content": llm_user_prompt}]

    chat_completion = client.chat.completions.create(messages=messages, model=parameters['azure']['model'], temperature=0)
    print(chat_completion)
    answer = chat_completion.choices[0].message.content
    print('\n\n\nanswer', answer)
    return answer

llm_system_prompt = '''You are an assistant to help people use Stack Exchange based on the information given to you. When asked about anything that does not relate to Stack Exchange, only reply with 'Content not found'.'''

llm_user_prompt_template = """You are asked a question by the user and you must write an answer using only the data provided in the variable 'Context'.
You must use only the provided data in 'Context' to see if any of the text is relevant to answer the question.
You must not use any other information from any other source or from prior knowledge beyond the provided 'Context'.
Return 'Content not found' if nothing relevant is found in the provided Context or if no Context was given.
Context is an array of JSON objects in the following format:
{{
  "content": the retrieved context text,
  "score": the retrieval score of the context
}}
Consider the value of the "content" key as the context.
Question: "{question}"
Context: "{context}"
Constraints:
1. The user is already on Stack Exchange: don't suggest opening Stack Exchange.
2. Users will always be using the Stack Exchange application: don't assume or mention any other website.
3. Format answers as a numbered list.
4. Keep answers specific and list all the steps.
5. If an answer is found, present it as a list with bullet points.
"""

if __name__ == '__main__':
    if not os.path.isdir('faiss_index'):
        print('Creating index')
        create_index()
    test_index()
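
As a quick sanity check that the embedding model loads correctly, you can also embed a query directly (this assumes the same embeddings object as above; multi-qa-MiniLM-L6-cos-v1 produces 384-dimensional vectors):

# Embed a query directly to verify the model returns a vector
vector = embeddings.embed_query("Where can I see my deleted questions?")
print(len(vector))  # 384 for multi-qa-MiniLM-L6-cos-v1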

Assuming that data/help.csv contains a list of documents on how to use Stack Exchange, e.g.:

document_id,section_header,section_body,url
1,See one's deleted questions,"blah blah
",https://genai.stackexchange.com/help/on-topic
2,Account suspensions,"blah blah
",https://genai.stackexchange.com/help/dont-ask
3,Editing answers,"blah blah
",https://genai.stackexchange.com/help/closed-questions

and parameters.yaml:

azure:
    api_key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    azure_endpoint: https://xxxxxx.openai.azure.com/
    api_version: 2023-07-01-preview
    model: xxxxxxxxx

Tested with Python 3.11 with:

pip install langchain==0.1.1 langchain_openai==0.0.2.post1 sentence-transformers==2.2.2 langchain_community==0.0.13 faiss-cpu==1.7.4

Examples of common IDEs: PyCharm, Visual Studio.
