Questions tagged [llm]
Large Language Models (LLMs) are pretrained models that probabilistically generate natural-language text. The underlying model is typically a deep learning model. Examples include GPT models.
27
questions
0
votes
0
answers
7
views
Choosing an evaluation model for an LLM for question answering
I was learning the basics of LLM evaluation, and the framework introduced in one of the DeepLearning.AI short courses was to generate question-and-answer samples that act as the ground truth. ...
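For context, the pattern the question describes is often called "LLM as judge": generate answers, then have a model grade them against the ground truth. A minimal sketch, where `ask_llm` is a hypothetical stand-in for whatever API client you use:

```python
# Minimal sketch of ground-truth-based QA evaluation with an LLM judge.
# `ask_llm` is a hypothetical helper; wire up your own model client.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("call your LLM API here")

ground_truth = [
    {"question": "What is 2 + 2?", "answer": "4"},
]

def evaluate(qa_pairs):
    graded = []
    for pair in qa_pairs:
        predicted = ask_llm(pair["question"])
        verdict = ask_llm(
            "Grade the predicted answer against the reference.\n"
            f"Question: {pair['question']}\n"
            f"Reference: {pair['answer']}\n"
            f"Predicted: {predicted}\n"
            "Reply with exactly CORRECT or INCORRECT."
        )
        graded.append(verdict.strip().upper() == "CORRECT")
    return sum(graded) / len(graded)  # fraction judged correct
```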
0
votes
0
answers
8
views
How to construct a class-proportion confidence interval for an LLM classifier with known bias, precision, and recall?
Let's say I have a dataset, $D$, with known ground-truth labels. I nonetheless use a few-shot LLM classifier on this dataset to predict one of $k$ classes for each example.
From the LLM results, I get ...
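For the binary special case, one standard approach is the Rogan-Gladen correction, which inverts known sensitivity and specificity and widens the interval accordingly; the multi-class version needs the full confusion matrix. A sketch under those assumptions:

```python
import math

def adjusted_proportion_ci(p_obs, n, sensitivity, specificity, z=1.96):
    """Rogan-Gladen bias correction with a Wald-style interval.

    p_obs: raw fraction of positive LLM predictions
    n:     number of classified items
    """
    denom = sensitivity + specificity - 1.0  # Youden's J; must be > 0
    p_adj = (p_obs + specificity - 1.0) / denom
    # The correction is linear in p_obs, so the standard error of
    # p_obs is simply scaled by 1/denom (delta method).
    se = math.sqrt(p_obs * (1.0 - p_obs) / n) / denom
    lo = max(0.0, p_adj - z * se)
    hi = min(1.0, p_adj + z * se)
    return p_adj, (lo, hi)

print(adjusted_proportion_ci(0.30, n=1000, sensitivity=0.9, specificity=0.95))
```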
0
votes
0
answers
16
views
How to report few-shot accuracy for LLMs?
I am comparing three prompting techniques in LLMs to check which one is best. All prompting strategies include three examples for in-context learning (few-shot only, no fine-tuning).
If I do greedy ...
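With greedy decoding each test item is deterministic for a fixed prompt, so the remaining variability comes from the choice of in-context examples and the finite test set. One common report is mean accuracy over several exemplar draws plus a bootstrap interval over test items; a sketch of the latter:

```python
import random

def bootstrap_accuracy_ci(correct_flags, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI over test items for one fixed prompt.

    correct_flags: list of 0/1 per test example (greedy decoding,
    so each item is deterministic given the prompt).
    """
    rng = random.Random(seed)
    n = len(correct_flags)
    accs = sorted(
        sum(rng.choices(correct_flags, k=n)) / n for _ in range(n_boot)
    )
    lo = accs[int(alpha / 2 * n_boot)]
    hi = accs[int((1 - alpha / 2) * n_boot) - 1]
    return sum(correct_flags) / n, (lo, hi)
```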
4
votes
0
answers
80
views
What advantage do sinusoidal positional encodings have over binary positional encodings in transformer LLMs?
I've recently come across an article that discusses the reasons why large language models use sinusoidal functions to generate positional encodings — as per the famous paper Attention Is All You Need (...
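For reference, the encoding defined in that paper is $PE_{(pos,2i)} = \sin(pos/10000^{2i/d})$ and $PE_{(pos,2i+1)} = \cos(pos/10000^{2i/d})$; a minimal NumPy implementation (even $d_{model}$ assumed):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal PE from "Attention Is All You Need" (d_model even)."""
    pos = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)  # (seq_len, d_model/2)
    pe = np.empty((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosine
    return pe

print(sinusoidal_positional_encoding(4, 8).round(3))
```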
0
votes
0
answers
20
views
Where does the equation $ C = 6 \times N \times T $ come from for Large Language Models, ideally with a simple explanation of both passes?
Why $ C = 6 \times N \times T $?
I'm trying to understand the computational steps, specifically during the backward pass of neural networks, in relation to the widely cited formula $C = 6 \times N \times T$ ...
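The usual accounting behind that formula (as in Kaplan et al., 2020, ignoring attention-specific and embedding terms) counts one multiply-accumulate per parameter per token in the forward pass, and roughly twice that in the backward pass:

```latex
\begin{align*}
C_{\text{fwd}} &\approx 2NT && \text{one multiply + one add per parameter per token}\\
C_{\text{bwd}} &\approx 4NT && \text{gradients w.r.t.\ activations and weights, $\approx 2\times$ forward}\\
C &= C_{\text{fwd}} + C_{\text{bwd}} \approx 6NT
\end{align*}
```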
0
votes
0
answers
11
views
Gradient flow through sampled tokens when training RNNs (but without teacher forcing)
Suppose we want to train an autoregressive generative language model based on a recurrent neural network (RNN) architecture without teacher forcing:
At each timestep, the RNN takes an input token $x_t$...
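The sampling step itself has no gradient, so the usual answers are score-function (REINFORCE) estimators or a continuous relaxation. A minimal PyTorch sketch of the straight-through Gumbel-softmax relaxation (the vocabulary and embedding sizes below are made up):

```python
import torch
import torch.nn.functional as F

# Straight-through Gumbel-softmax: the forward pass uses a hard one-hot
# sample, while gradients flow through the underlying soft distribution.
logits = torch.randn(1, 32000, requires_grad=True)       # vocab-size logits
one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)   # differentiable sample

# Feed the next timestep an embedding of the sampled token:
embedding = torch.nn.Embedding(32000, 512)
next_input = one_hot @ embedding.weight   # (1, 512); grads flow via softmax
next_input.sum().backward()
print(logits.grad is not None)            # True: gradient reaches the logits
```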
0
votes
0
answers
72
views
Are LLMs stateful or stateless during the generation process?
Are LLMs like OpenAI's gpt-* stateful or stateless during the generation of the response? I've read a couple of articles like this but am still not quite sure about that. I understand they are ...
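Roughly: within a single response the model is effectively stateful via its KV cache, but across API calls services like the chat completions API are stateless, which is why clients re-send the whole conversation each turn. A sketch, where `complete` is a hypothetical stand-in for a real client:

```python
# The server keeps no memory between calls, so the client re-sends
# the full history every turn.

def complete(messages: list[dict]) -> str:
    raise NotImplementedError("call your chat API here")

history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = complete(history)  # the entire conversation is sent each time
    history.append({"role": "assistant", "content": reply})
    return reply
```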
0
votes
0
answers
27
views
Why does my (Mistral) LLM (almost completely) stop learning on my synthetic data after the first epoch, yet not overfit?
I am creating synthetic task-oriented dialogs that are rather complex. Training and validation losses suggest that the model (almost completely) stops learning, but does not start overfitting:
How ...
0
votes
0
answers
70
views
Why did OpenAI's scaling-law paper underestimate the importance of data in model scaling?
The Chinchilla paper (Hoffmann, Jordan, et al. "Training compute-optimal large language models." arXiv preprint arXiv:2203.15556 (2022).) famously found that when scaling a model, you should ...
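For a concrete sense of the difference: Chinchilla's fit puts roughly equal exponents on parameters and tokens, landing near 20 training tokens per parameter at the optimum. A back-of-the-envelope sketch using $C \approx 6ND$:

```python
# Chinchilla-style allocation using C ~ 6*N*D and the ~20-tokens-per-
# parameter rule of thumb (D ~ 20*N), so C ~ 6*20*N^2.
def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla itself: ~5.76e23 FLOPs -> ~70B params, ~1.4T tokens
n, d = chinchilla_optimal(5.76e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```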
0
votes
0
answers
208
views
block_size in transformers: does it dictate effective context length in LLMs?
I would like to understand how the block_size parameter in the huggingface transformers library works, particularly in comparison with model_max_length. I am interested in models being able to attend ...
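In the Hugging Face causal-LM example scripts, block_size is simply the length the tokenized corpus is chunked into before training; roughly (a simplified sketch of the example's group_texts step):

```python
# Roughly what the Hugging Face run_clm example does with block_size:
# concatenate all tokenized documents, then cut into fixed-length chunks.
def group_texts(examples: dict, block_size: int) -> dict:
    concatenated = {k: sum(examples[k], []) for k in examples}
    total_len = (len(concatenated["input_ids"]) // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_len, block_size)]
        for k, t in concatenated.items()
    }
    result["labels"] = result["input_ids"].copy()  # causal LM: labels = inputs
    return result
```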
0
votes
0
answers
22
views
Does Positional Interpolation Change Llama's Architecture?
I'm currently exploring Meta's positional interpolation method, which aims to increase the context size in their large language model. This method extends the context length from $n \times n$ to $n' \times n'$. ...
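The short answer suggested by the paper (Chen et al., 2023) is no: positional interpolation leaves the architecture, including RoPE, unchanged; it only rescales position indices back into the original training range, followed by a brief fine-tune. A sketch of the rescaling:

```python
import numpy as np

# Position Interpolation in one line: keep RoPE as-is but rescale
# positions so an n'-token context maps into the original [0, n) range
# the model was trained on.
def interpolated_positions(new_len: int, train_len: int) -> np.ndarray:
    scale = train_len / new_len       # e.g. 2048 / 8192 = 0.25
    return np.arange(new_len) * scale # fed to the unchanged RoPE

print(interpolated_positions(8, 4))   # [0. 0.5 1. 1.5 2. 2.5 3. 3.5]
```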
0
votes
0
answers
24
views
Evaluation metrics for the chunking and synthesis steps of a Q&A system
I am doing research and am interested in the following question:
What are the evaluation metrics for the chunking, retrieval, and synthesis steps for Q&A, when I do NLQ with LLMs? I am looking for ...
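As one concrete example for the retrieval step, recall@k measures the fraction of queries whose gold chunk appears in the top k retrieved chunks. A sketch assuming one gold chunk per query:

```python
# recall@k: fraction of queries whose gold chunk is in the top-k results.
def recall_at_k(retrieved: list[list[str]], gold: list[str], k: int) -> float:
    hits = sum(g in r[:k] for r, g in zip(retrieved, gold))
    return hits / len(gold)

print(recall_at_k([["c1", "c7", "c3"], ["c2", "c9"]], ["c3", "c4"], k=3))  # 0.5
```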
0
votes
0
answers
34
views
LLMs' latency and their usability for inference
I am trying to use a transformer decoder (LLM, for simplicity) to label a collection of texts, later to be used for training a classifier.
I tried multiple 7B models, which I can save on my local ...
0
votes
1
answer
642
views
Are embeddings in GPT models trainable model parameters? [closed]
I have tried to search a few sources, but I did not see any of them specifically talking about this issue. For example, this blog post seems to imply that the embedding used in transformers is ...
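In GPT-style models the answer is yes: the token embedding is an ordinary weight matrix learned by backpropagation (unlike the fixed sinusoidal positional encodings of the original Transformer). A minimal PyTorch illustration with GPT-2-like sizes:

```python
import torch

# The token embedding is an nn.Embedding, i.e. a plain weight matrix
# updated by backprop like any other layer's weights.
emb = torch.nn.Embedding(num_embeddings=50257, embedding_dim=768)
print(emb.weight.requires_grad)                   # True: trainable parameter
print(sum(p.numel() for p in emb.parameters()))   # 50257 * 768 weights
```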
1
vote
0
answers
47
views
Why are LLMs generative models? [duplicate]
According to Wikipedia:
A generative model is a statistical model of the joint probability distribution $P(X, Y)$ on given observable variable $X$ and target variable $Y$;
A discriminative model ...
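The connection usually drawn is that an autoregressive LLM models the joint distribution of a token sequence via the chain rule, which makes it generative over sequences $x_{1:T}$:

```latex
P(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} P\!\left(x_t \mid x_1, \dots, x_{t-1}\right)
```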