
I am comparing three prompting techniques for LLMs to determine which one performs best. All three strategies include three examples for in-context learning (few-shot only, no fine-tuning).
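
For context, the setup looks roughly like this (a minimal sketch; the `build_prompt` helper, the example pairs, and the Q/A format are hypothetical placeholders, not my actual task):

```python
# Hypothetical few-shot setup: three in-context examples prepended to
# each test question. The examples and format stand in for the real task.
FEW_SHOT_EXAMPLES = [
    ("What is 2 + 2?", "4"),
    ("What is 3 * 3?", "9"),
    ("What is 10 - 7?", "3"),
]

def build_prompt(question: str) -> str:
    """Prepend the three in-context examples to the test question."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT_EXAMPLES)
    return f"{shots}\nQ: {question}\nA:"
```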

If I use greedy decoding, I get a single deterministic accuracy score, which I believe does not capture the true error.

I can also sample with temperature = 0.7 to get a distribution of accuracy scores. But then, to compare methods, I should check what kind of distribution I have. I sampled five times (I could sample more, but a large N gets expensive). I checked, and the scores look normal, so I am just doing a t-test.
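
Concretely, the check and the test look something like this (a minimal sketch using scipy; the accuracy values below are made up for illustration):

```python
# Five sampled accuracy scores per method (hypothetical numbers).
from scipy import stats

acc_a = [0.71, 0.74, 0.69, 0.72, 0.70]  # method A, one accuracy per run
acc_b = [0.66, 0.68, 0.65, 0.69, 0.67]  # method B, one accuracy per run

# Shapiro-Wilk is one way to check normality, though with N = 5 it has
# very little power to detect departures from normality.
print(stats.shapiro(acc_a))
print(stats.shapiro(acc_b))

# Welch's t-test (equal_var=False) avoids assuming equal variances.
t, p = stats.ttest_ind(acc_a, acc_b, equal_var=False)
print(f"t = {t:.3f}, p = {p:.4f}")
```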

Is my approach correct? Should I apply something like a Bonferroni correction, since I am comparing more than two methods? Does this sampling approach better approximate the true error?
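
For the multiple-comparison part, what I have in mind is something like the following (again a sketch with made-up numbers; the significance level alpha = 0.05 is an assumption):

```python
# Bonferroni correction over the three pairwise comparisons among
# methods A, B, C. Accuracy lists are hypothetical.
from itertools import combinations
from scipy import stats

runs = {
    "A": [0.71, 0.74, 0.69, 0.72, 0.70],
    "B": [0.66, 0.68, 0.65, 0.69, 0.67],
    "C": [0.73, 0.75, 0.71, 0.74, 0.72],
}

pairs = list(combinations(runs, 2))  # 3 pairwise tests for 3 methods
alpha = 0.05 / len(pairs)            # Bonferroni-adjusted threshold

for a, b in pairs:
    t, p = stats.ttest_ind(runs[a], runs[b], equal_var=False)
    print(f"{a} vs {b}: p = {p:.4f}, significant at corrected alpha: {p < alpha}")
```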
