Questions tagged [llm]
Large Language Models (LLMs) are pretrained models that probabilistically generate natural-language text. The underlying model is typically a deep learning model. Examples include GPT models.
27
questions
0
votes
0
answers
7
views
Choosing an evaluation model for an LLM for question answering
I was learning the basics of LLM evaluation, and the framework introduced in one of the DeepLearning.AI short courses was to generate question-and-answer samples that act as the ground truth. ...
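For context, the pattern the question describes is often called "LLM as judge": generate answers, then have a model grade them against the ground truth. A minimal sketch, where `ask_llm` is a hypothetical stand-in for whatever API client you use:

```python
# Minimal sketch of ground-truth-based QA evaluation with an LLM judge.
# `ask_llm` is a hypothetical helper; wire up your own model client.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("call your LLM API here")

ground_truth = [
    {"question": "What is 2 + 2?", "answer": "4"},
]

def evaluate(qa_pairs):
    graded = []
    for pair in qa_pairs:
        predicted = ask_llm(pair["question"])
        verdict = ask_llm(
            "Grade the predicted answer against the reference.\n"
            f"Question: {pair['question']}\n"
            f"Reference: {pair['answer']}\n"
            f"Predicted: {predicted}\n"
            "Reply with exactly CORRECT or INCORRECT."
        )
        graded.append(verdict.strip().upper() == "CORRECT")
    return sum(graded) / len(graded)  # fraction judged correct
```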
0
votes
0
answers
8
views
How to construct a class-proportion confidence interval for an LLM classifier with known bias, precision, and recall?
Let's say I have a dataset, $D$, with known ground-truth labels. I nonetheless use a few-shot LLM classifier on this dataset to predict one of $k$ classes for each example.
From the LLM results, I get ...
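For the binary special case, one standard approach is the Rogan-Gladen correction, which inverts known sensitivity and specificity and widens the interval accordingly; the multi-class version needs the full confusion matrix. A sketch under those assumptions:

```python
import math

def adjusted_proportion_ci(p_obs, n, sensitivity, specificity, z=1.96):
    """Rogan-Gladen bias correction with a Wald-style interval.

    p_obs: raw fraction of positive LLM predictions
    n:     number of classified items
    """
    denom = sensitivity + specificity - 1.0  # Youden's J; must be > 0
    p_adj = (p_obs + specificity - 1.0) / denom
    # The correction is linear in p_obs, so the standard error of
    # p_obs is simply scaled by 1/denom (delta method).
    se = math.sqrt(p_obs * (1.0 - p_obs) / n) / denom
    lo = max(0.0, p_adj - z * se)
    hi = min(1.0, p_adj + z * se)
    return p_adj, (lo, hi)

print(adjusted_proportion_ci(0.30, n=1000, sensitivity=0.9, specificity=0.95))
```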
0
votes
0
answers
16
views
How to report few-shot accuracy for LLMs?
I am comparing three prompting techniques in LLMs to check which one is best. All prompting strategies include three examples for in-context learning (few-shot only, no fine-tuning).
If I do greedy ...
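With greedy decoding each test item is deterministic for a fixed prompt, so the remaining variability comes from the choice of in-context examples and the finite test set. One common report is mean accuracy over several exemplar draws plus a bootstrap interval over test items; a sketch of the latter:

```python
import random

def bootstrap_accuracy_ci(correct_flags, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI over test items for one fixed prompt.

    correct_flags: list of 0/1 per test example (greedy decoding,
    so each item is deterministic given the prompt).
    """
    rng = random.Random(seed)
    n = len(correct_flags)
    accs = sorted(
        sum(rng.choices(correct_flags, k=n)) / n for _ in range(n_boot)
    )
    lo = accs[int(alpha / 2 * n_boot)]
    hi = accs[int((1 - alpha / 2) * n_boot) - 1]
    return sum(correct_flags) / n, (lo, hi)
```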
4
votes
0
answers
80
views
What advantage do sinusoidal positional encodings have over binary positional encodings in transformer LLMs?
I've recently come across an article that discusses the reasons why large language models use sinusoidal functions to generate positional encodings — as per the famous paper Attention Is All You Need (...
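For reference, the encoding defined in that paper is $PE_{(pos,2i)} = \sin(pos/10000^{2i/d})$ and $PE_{(pos,2i+1)} = \cos(pos/10000^{2i/d})$; a minimal NumPy implementation (even $d_{model}$ assumed):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal PE from "Attention Is All You Need" (d_model even)."""
    pos = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)  # (seq_len, d_model/2)
    pe = np.empty((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosine
    return pe

print(sinusoidal_positional_encoding(4, 8).round(3))
```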
0
votes
0
answers
20
views
Where does the equation $ C = 6 \times N \times T $ come from for Large Language Models, ideally with a simple explanation of both passes?
Why $ C = 6 \times N \times T $?
I'm trying to understand the computational steps, specifically during the backward pass of neural networks, in relation to the widely cited formula $C = 6 \times N \times T$ ...
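The usual accounting behind that formula (as in Kaplan et al., 2020, ignoring attention-specific and embedding terms) counts one multiply-accumulate per parameter per token in the forward pass, and roughly twice that in the backward pass:

```latex
\begin{align*}
C_{\text{fwd}} &\approx 2NT && \text{one multiply + one add per parameter per token}\\
C_{\text{bwd}} &\approx 4NT && \text{gradients w.r.t.\ activations and weights, $\approx 2\times$ forward}\\
C &= C_{\text{fwd}} + C_{\text{bwd}} \approx 6NT
\end{align*}
```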
0
votes
0
answers
11
views
Gradient flow through sampled tokens when training RNNs (but without teacher forcing)
Suppose we want to train an autoregressive generative language model based on a recurrent neural network (RNN) architecture without teacher forcing:
At each timestep, the RNN takes an input token $x_t$...
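The sampling step itself has no gradient, so the usual answers are score-function (REINFORCE) estimators or a continuous relaxation. A minimal PyTorch sketch of the straight-through Gumbel-softmax relaxation (the vocabulary and embedding sizes below are made up):

```python
import torch
import torch.nn.functional as F

# Straight-through Gumbel-softmax: the forward pass uses a hard one-hot
# sample, while gradients flow through the underlying soft distribution.
logits = torch.randn(1, 32000, requires_grad=True)       # vocab-size logits
one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)   # differentiable sample

# Feed the next timestep an embedding of the sampled token:
embedding = torch.nn.Embedding(32000, 512)
next_input = one_hot @ embedding.weight   # (1, 512); grads flow via softmax
next_input.sum().backward()
print(logits.grad is not None)            # True: gradient reaches the logits
```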
0
votes
0
answers
72
views
Are LLMs stateful or stateless during the generation process?
Are LLMs like OpenAI's gpt-* stateful or stateless during the generation of the response? I've read a couple of articles like this but am still not quite sure about that. I understand they are ...
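Roughly: within a single response the model is effectively stateful via its KV cache, but across API calls services like the chat completions API are stateless, which is why clients re-send the whole conversation each turn. A sketch, where `complete` is a hypothetical stand-in for a real client:

```python
# The server keeps no memory between calls, so the client re-sends
# the full history every turn.

def complete(messages: list[dict]) -> str:
    raise NotImplementedError("call your chat API here")

history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = complete(history)  # the entire conversation is sent each time
    history.append({"role": "assistant", "content": reply})
    return reply
```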
0
votes
0
answers
27
views
Why does my (Mistral) LLM (almost completely) stop learning on my synthetic data after the first epoch, yet not overfit?
I am creating synthetic task-oriented dialogs that are rather complex. Training and validation losses suggest that the model (almost completely) stops learning, but does not start overfitting:
How ...
0
votes
0
answers
70
views
Why did OpenAI's scaling-law paper underestimate the importance of data in model scaling?
The Chinchilla paper (Hoffmann, Jordan, et al. "Training compute-optimal large language models." arXiv preprint arXiv:2203.15556 (2022).) famously found that when scaling a model, you should ...
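For a concrete sense of the difference: Chinchilla's fit puts roughly equal exponents on parameters and tokens, landing near 20 training tokens per parameter at the optimum. A back-of-the-envelope sketch using $C \approx 6ND$:

```python
# Chinchilla-style allocation using C ~ 6*N*D and the ~20-tokens-per-
# parameter rule of thumb (D ~ 20*N), so C ~ 6*20*N^2.
def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla itself: ~5.76e23 FLOPs -> ~70B params, ~1.4T tokens
n, d = chinchilla_optimal(5.76e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```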
0
votes
0
answers
208
views
block_size in transformers: does it dictate effective context length in LLMs?
I would like to understand how the block_size parameter in the huggingface transformers library works, particularly in comparison with model_max_length. I am interested in models being able to attend ...
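In the Hugging Face causal-LM example scripts, block_size is simply the length the tokenized corpus is chunked into before training; roughly (a simplified sketch of the example's group_texts step):

```python
# Roughly what the Hugging Face run_clm example does with block_size:
# concatenate all tokenized documents, then cut into fixed-length chunks.
def group_texts(examples: dict, block_size: int) -> dict:
    concatenated = {k: sum(examples[k], []) for k in examples}
    total_len = (len(concatenated["input_ids"]) // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_len, block_size)]
        for k, t in concatenated.items()
    }
    result["labels"] = result["input_ids"].copy()  # causal LM: labels = inputs
    return result
```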
0
votes
0
answers
22
views
Does Positional Interpolation Change Llama's Architecture?
I'm currently exploring Meta's positional interpolation method, which aims to increase the context size in their large language model. This method extends the context length from $n \times n$ to $n' \times n'$. ...
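The short answer suggested by the paper (Chen et al., 2023) is no: positional interpolation leaves the architecture, including RoPE, unchanged; it only rescales position indices back into the original training range, followed by a brief fine-tune. A sketch of the rescaling:

```python
import numpy as np

# Position Interpolation in one line: keep RoPE as-is but rescale
# positions so an n'-token context maps into the original [0, n) range
# the model was trained on.
def interpolated_positions(new_len: int, train_len: int) -> np.ndarray:
    scale = train_len / new_len       # e.g. 2048 / 8192 = 0.25
    return np.arange(new_len) * scale # fed to the unchanged RoPE

print(interpolated_positions(8, 4))   # [0. 0.5 1. 1.5 2. 2.5 3. 3.5]
```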
0
votes
0
answers
24
views
Evaluation metrics for the chunking and synthesis steps of a Q&A system
I am doing research and am interested in the following question:
What are the evaluation metrics for the chunking, retrieval, and synthesis steps for Q&A, when I do NLQ with LLMs? I am looking for ...
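As one concrete example for the retrieval step, recall@k measures the fraction of queries whose gold chunk appears in the top k retrieved chunks. A sketch assuming one gold chunk per query:

```python
# recall@k: fraction of queries whose gold chunk is in the top-k results.
def recall_at_k(retrieved: list[list[str]], gold: list[str], k: int) -> float:
    hits = sum(g in r[:k] for r, g in zip(retrieved, gold))
    return hits / len(gold)

print(recall_at_k([["c1", "c7", "c3"], ["c2", "c9"]], ["c3", "c4"], k=3))  # 0.5
```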
0
votes
0
answers
34
views
LLMs' latency and their usability for inference
I am trying to use a transformer decoder (LLM, for simplicity) to label a collection of texts, later to be used for training a classifier.
I tried multiple 7B models, which I can save on my local ...
0
votes
1
answer
642
views
Are embeddings in GPT models trainable model parameters? [closed]
I have tried to search a few sources, but I did not see any of them specifically talking about this issue. For example, this blog post seems to imply that the embedding used in transformers is ...
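In GPT-style models the answer is yes: the token embedding is an ordinary weight matrix learned by backpropagation (unlike the fixed sinusoidal positional encodings of the original Transformer). A minimal PyTorch illustration with GPT-2-like sizes:

```python
import torch

# The token embedding is an nn.Embedding, i.e. a plain weight matrix
# updated by backprop like any other layer's weights.
emb = torch.nn.Embedding(num_embeddings=50257, embedding_dim=768)
print(emb.weight.requires_grad)                   # True: trainable parameter
print(sum(p.numel() for p in emb.parameters()))   # 50257 * 768 weights
```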
1
vote
0
answers
47
views
Why are LLMs generative models? [duplicate]
According to Wikipedia:
A generative model is a statistical model of the joint probability distribution $P(X, Y)$ on given observable variable $X$ and target variable $Y$;
A discriminative model ...
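The connection usually drawn is that an autoregressive LLM models the joint distribution of a token sequence via the chain rule, which makes it generative over sequences $x_{1:T}$:

```latex
P(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} P\!\left(x_t \mid x_1, \dots, x_{t-1}\right)
```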