
Questions tagged [llm]

Large Language Models (LLMs) are pretrained models that probabilistically generate natural-language text. The underlying model is typically a deep learning model. Examples include the GPT family of models.

0 votes
0 answers
7 views

Choosing an evaluation model for LLM question answering

I was learning the basics of LLM evaluation, and the framework introduced in one of the DeepLearning.AI short courses is to generate question-and-answer samples that act as the ground truth. ...
ShengXue
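For the question above, a minimal sketch of how generated answers are commonly scored against such ground-truth samples (SQuAD-style exact match and token-level F1); the function names are illustrative, not from the course.

```python
# Sketch: scoring a generated answer against a ground-truth answer
# (exact match and token-level F1). Names are illustrative.
from collections import Counter

def exact_match(pred: str, gold: str) -> float:
    return float(pred.strip().lower() == gold.strip().lower())

def token_f1(pred: str, gold: str) -> float:
    pred_toks, gold_toks = pred.lower().split(), gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(token_f1("Paris is the capital", "The capital is Paris"))
```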
0 votes
0 answers
8 views

How to construct a class-proportion confidence interval for an LLM classifier with known bias, precision, and recall?

Let's say I have a dataset, $D$, with known ground-truth labels. I nonetheless use a few-shot LLM classifier on this dataset to predict one of $k$ classes for each example. From the LLM results, I get ...
Estimate the estimators
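For the corrected-proportion question above, a minimal sketch under the assumption of a binary classifier with known sensitivity (recall) and specificity: the classical Rogan-Gladen correction plus a delta-method Wald interval. The multi-class, precision-based version needs the full confusion matrix; all names below are illustrative.

```python
# Sketch: bias-corrected class proportion (Rogan-Gladen) with a Wald-style CI,
# assuming a binary classifier with known sensitivity and specificity.
import math

def corrected_prevalence(p_hat, n, sensitivity, specificity, z=1.96):
    denom = sensitivity + specificity - 1.0
    pi_hat = (p_hat + specificity - 1.0) / denom             # bias-corrected proportion
    se = math.sqrt(p_hat * (1.0 - p_hat) / n) / abs(denom)   # delta-method standard error
    lo, hi = pi_hat - z * se, pi_hat + z * se
    return pi_hat, max(0.0, lo), min(1.0, hi)

print(corrected_prevalence(p_hat=0.30, n=500, sensitivity=0.9, specificity=0.95))
```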
0 votes
0 answers
16 views

How to report few-shot accuracy for LLMs?

I am comparing three prompting techniques in LLMs to check which one is best. All prompting strategies include three examples for in-context learning (few-shot only, no fine-tuning). If I do greedy ...
Jader Martins
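One common reporting convention for few-shot results, sketched under the assumption that each prompting strategy is rerun over several seeds or example orderings: report the mean accuracy with a bootstrap interval. The numbers below are placeholders.

```python
# Sketch: mean few-shot accuracy over several seeds/orderings with a
# bootstrap percentile interval. `accuracies` would come from your own runs.
import numpy as np

rng = np.random.default_rng(0)
accuracies = np.array([0.71, 0.69, 0.73, 0.70, 0.72])  # e.g. 5 seeds / orderings

mean = accuracies.mean()
boot = rng.choice(accuracies, size=(10_000, len(accuracies)), replace=True).mean(axis=1)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"accuracy = {mean:.3f} (95% bootstrap CI [{lo:.3f}, {hi:.3f}])")
```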
4 votes
0 answers
80 views

What advantage do sinusoidal positional encodings have over binary positional encodings in transformer LLMs?

I've recently come across an article that discusses the reasons why large language models use sinusoidal functions to generate positional encodings — as per the famous paper Attention Is All You Need (...
Philip Voinea
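For reference, the encoding defined in Attention Is All You Need, $PE(pos, 2i) = \sin(pos / 10000^{2i/d})$ and $PE(pos, 2i+1) = \cos(pos / 10000^{2i/d})$, reproduced as a small NumPy sketch.

```python
# Sinusoidal positional encoding from "Attention Is All You Need".
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    positions = np.arange(max_len)[:, None]                    # (max_len, 1)
    div_term = 10000.0 ** (np.arange(0, d_model, 2) / d_model) # 10000^(2i/d)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(positions / div_term)                 # even dimensions
    pe[:, 1::2] = np.cos(positions / div_term)                 # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(max_len=128, d_model=64)
print(pe.shape)  # (128, 64)
```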
0 votes
0 answers
20 views

Where does the equation $C = 6 \times N \times T$ come from for Large Language Models, ideally with a simple explanation for both passes?

Why $C = 6 \times N \times T$? I'm trying to understand the computational steps, specifically during the backward pass of neural networks, in relation to the widely cited formula $C = 6 \times N \times T$ ...
Charlie Parker
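A rough accounting that is usually given for this formula, ignoring attention-score FLOPs and other lower-order terms:

```latex
\begin{aligned}
\text{forward pass:}  &\quad \approx 2N \ \text{FLOPs per token (a multiply and an add per weight)}\\
\text{backward pass:} &\quad \approx 4N \ \text{FLOPs per token (gradients w.r.t. activations and w.r.t. weights)}\\
\text{total:}         &\quad C \approx (2N + 4N)\,T = 6\,N\,T
\end{aligned}
```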
0 votes
0 answers
11 views

Gradient flow through sampled tokens when training RNNs (but without teacher forcing)

Suppose we want to train an autoregressive generative language model based on a recurrent neural network (RNN) architecture without teacher forcing: At each timestep, the RNN takes an input token $x_t$...
Ben JW • 101
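One workaround often discussed for exactly this issue, sketched in PyTorch: a straight-through Gumbel-softmax sample lets gradients flow back through the "sampled" token; REINFORCE-style estimators are the usual alternative. The sizes below are arbitrary.

```python
# Sketch: straight-through Gumbel-softmax sampling so gradients reach the logits.
import torch
import torch.nn.functional as F

vocab_size, emb_dim = 100, 32
embedding = torch.nn.Embedding(vocab_size, emb_dim)

logits = torch.randn(1, vocab_size, requires_grad=True)   # RNN output at step t
one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)    # differentiable "sample"
next_input = one_hot @ embedding.weight                   # (1, emb_dim), input for step t+1

next_input.sum().backward()
print(logits.grad is not None)  # True: gradient flows through the sampled token
```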
0 votes
0 answers
72 views

Are LLMs stateful or stateless during the generation process?

Are LLMs like OpenAI's gpt-* stateful or stateless during the generation of the response? I've read a couple of articles like this but am still not quite sure about that. I understand they are ...
Dr. Hans-Peter Störr
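A sketch of the pattern most chat APIs follow: the service is typically stateless across requests, so the client resends the whole conversation each call, even though generating a single response is stateful internally via the KV cache. `generate_reply` below is a placeholder, not a real API.

```python
# Illustration of the "stateless API, client-kept history" pattern.
def generate_reply(messages: list[dict]) -> str:
    # Placeholder: call your LLM here; it only sees what is in `messages`.
    return "(model reply)"

messages = [{"role": "system", "content": "You are a helpful assistant."}]
for user_turn in ["Hi!", "What did I just say?"]:
    messages.append({"role": "user", "content": user_turn})
    reply = generate_reply(messages)   # the full history goes in every call
    messages.append({"role": "assistant", "content": reply})
```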
0 votes
0 answers
27 views

Why does my (Mistral) LLM (almost completely) stop learning on my synthetic data after the first epoch, yet not overfit?

I am creating synthetic task-oriented dialogs that are rather complex. Training and validation losses suggest that the model (almost completely) stops learning, but does not start overfitting: How ...
DaveFar • 93
0 votes
0 answers
70 views

Why did OpenAI's scaling-law paper underestimate the importance of data in model scaling?

The Chinchilla paper (Hoffmann, Jordan, et al. "Training compute-optimal large language models." arXiv preprint arXiv:2203.15556 (2022).) famously found that when scaling a model, you should ...
user35734 • 406
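For context, the headline Chinchilla numbers, stated roughly (with training compute $C \approx 6ND$): both parameters and tokens should grow about as $C^{0.5}$, i.e. roughly 20 training tokens per parameter, whereas the earlier Kaplan et al. fits put far more of the compute budget into parameters than into data.

```latex
N_{\text{opt}} \propto C^{0.5}, \qquad
D_{\text{opt}} \propto C^{0.5}, \qquad
D_{\text{opt}} \approx 20\, N_{\text{opt}}
```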
0 votes
0 answers
208 views

block_size in transformers: does it dictate effective context length in LLMs?

I would like to understand how the block_size parameter in the huggingface transformers library works, particularly in comparison with model_max_length. I am interested in models being able to attend ...
Nucular • 453
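A sketch of what block_size usually means in the Hugging Face causal-LM example scripts (e.g. run_clm.py): tokenized texts are concatenated and split into chunks of block_size tokens, and that chunk length is what the model attends over during training, while model_max_length is the tokenizer/model limit that block_size is typically capped at. The helper below is a simplified stand-in for the scripts' group_texts.

```python
# Simplified version of the "group into block_size chunks" step.
def group_texts(token_ids: list[int], block_size: int) -> list[list[int]]:
    total = (len(token_ids) // block_size) * block_size   # drop the remainder
    return [token_ids[i : i + block_size] for i in range(0, total, block_size)]

chunks = group_texts(list(range(2500)), block_size=1024)
print(len(chunks), len(chunks[0]))  # 2 chunks of 1024 tokens each
```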
0 votes
0 answers
22 views

Does Positional Interpolation Change Llama's Architecture?

I'm currently exploring Meta's positional interpolation method, which aims to increase the context size in their large language model. This method extends the context length from $n \times n$ into $n' \times n'$. ...
user219313
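A sketch of the core idea in Position Interpolation (Chen et al., 2023), as I understand it: the architecture stays the same, and only the positions fed to RoPE are rescaled so that an extended context maps back into the position range seen during training. The helper below is illustrative.

```python
# Rescale position indices so an extended context fits the trained range.
import numpy as np

def interpolated_positions(seq_len: int, trained_len: int) -> np.ndarray:
    scale = trained_len / seq_len if seq_len > trained_len else 1.0
    return np.arange(seq_len) * scale   # fractional positions within [0, trained_len)

print(interpolated_positions(8192, trained_len=2048)[:5])  # [0.   0.25 0.5  0.75 1.  ]
```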
0 votes
0 answers
24 views

Evaluation metrics for chunking and synthesis steps for Q&A system

I am doing research and am interested in the following question: what are the evaluation metrics for the chunking, retrieval, and synthesis steps of a Q&A system when I do NLQ with LLMs? I am looking for ...
Anakin Skywalker
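Two standard retrieval-step metrics (recall@k and reciprocal rank), sketched below; synthesis is usually scored separately with answer-level metrics, and chunking mostly shows up indirectly through retrieval quality. Identifiers are illustrative.

```python
# Sketch: retrieval metrics over retrieved chunk IDs vs. known relevant IDs.
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    hits = len(set(retrieved_ids[:k]) & relevant_ids)
    return hits / max(1, len(relevant_ids))

def reciprocal_rank(retrieved_ids: list[str], relevant_ids: set[str]) -> float:
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

print(recall_at_k(["c3", "c7", "c1"], {"c1", "c9"}, k=3))  # 0.5
print(reciprocal_rank(["c3", "c7", "c1"], {"c1", "c9"}))   # 0.333...
```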
0 votes
0 answers
34 views

LLMs' latency and their usability for inference

I am trying to use a transformer decoder (LLM, for simplicity) to label a collection of texts, later to be used for training a classifier. I tried multiple 7B models, which I can save on my local ...
David Harar
0 votes
1 answer
642 views

Are embeddings in GPT models trainable model parameters? [closed]

I have tried searching a few sources, but I did not see any of them specifically address this issue. For example, this blog post seems to imply that the embedding used in the transformer is ...
Sam • 403
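A quick empirical check in PyTorch, assuming GPT-2-like vocabulary and hidden sizes: the token-embedding matrix is an ordinary trainable parameter and is updated during (pre)training.

```python
# nn.Embedding registers its weight as a trainable parameter.
import torch

emb = torch.nn.Embedding(num_embeddings=50257, embedding_dim=768)  # GPT-2-like sizes
print(emb.weight.requires_grad)                   # True
print(sum(p.numel() for p in emb.parameters()))   # 50257 * 768 = 38,597,376 parameters
```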
1 vote
0 answers
47 views

Why are LLMs generative models? [duplicate]

According to Wikipedia: A generative model is a statistical model of the joint probability distribution $P(X, Y)$ on given observable variable $X$ and target variable $Y$; a discriminative model ...
Sam • 403
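The short version usually given: an autoregressive LM factorizes the joint distribution of the token sequence itself via the chain rule, which is what makes it a generative model over $X$:

```latex
P(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} P(x_t \mid x_1, \dots, x_{t-1})
```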
