Preface
I am fine-tuning transformer-based models (an LM and an LLM) to classify whether a text contains condescending language (binary classification). The LM is DeBERTa and the LLM is LLaMA 3. For both, I use `AutoModelForSequenceClassification`, which adds a classification head on top of the base model.
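A minimal sketch of loading both models this way (the checkpoint names are assumptions; substitute the ones you actually use):

```python
from transformers import AutoModelForSequenceClassification

# LM: DeBERTa with a 2-class classification head (full fine-tuning)
lm = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base",  # assumed checkpoint
    num_labels=2,
)

# LLM: LLaMA 3 with the same kind of head. LLaMA has no pad token
# by default, so one must be set for batched classification.
llm = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # assumed checkpoint (gated on the Hub)
    num_labels=2,
)
llm.config.pad_token_id = llm.config.eos_token_id
```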
Implementation
Dataset:
- Amount: about 10,000 texts, each labeled `0` (not condescending) or `1` (condescending). The proportion is 1:10 (condescending : not condescending).
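Given the 1:10 imbalance, one common mitigation (not stated as used in this setup, just an option) is to weight the loss by inverse class frequency. A minimal sketch of computing such weights:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by total / (num_classes * count),
    so the rarer class receives a proportionally larger weight."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}

# Toy labels with the 1:10 ratio described above
labels = [1] * 1 + [0] * 10
weights = inverse_frequency_weights(labels)
# Minority class (label 1) gets weight 5.5; majority class gets 0.55
```

These weights could then be passed to a weighted cross-entropy loss instead of the default unweighted one.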
Parameters

Parameter | LM | LLM |
---|---|---|
Batch size | 32 | 16 (per_device_train_batch_size = 4, gradient_accumulation_steps = 4) |
Epochs / steps | 2 epochs | 1000 steps (20% used as validation set) |
Learning rate | linear schedule (2e-5) | constant (2e-5) |
Optimizer | AdamW (lr = 2e-5, eps = 1e-8) | paged_adamw_32bit |
Fine-tuning | Full fine-tuning | LoRA (rank = 32, dropout = 0.5, alpha = 8) with 8-bit quantization |
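For the LLM, the LoRA and 8-bit quantization settings listed above might look like the following `peft`/`bitsandbytes` configuration (a sketch; the target modules are an assumption and should be adjusted to the model):

```python
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 8-bit quantization, as in the table above
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

# LoRA hyperparameters from the table: rank 32, alpha 8, dropout 0.5
lora_config = LoraConfig(
    r=32,
    lora_alpha=8,
    lora_dropout=0.5,
    target_modules=["q_proj", "v_proj"],  # assumed; model-dependent
    task_type="SEQ_CLS",
)
```

Note that LoRA scales its update by alpha / r, so r = 32 with alpha = 8 gives a scaling factor of 0.25, and a dropout of 0.5 is on the high side; both are worth keeping in mind when interpreting the results below.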
Results (test set)

Metric | LM | LLM |
---|---|---|
Precision | 0.659 | 0.836 |
Recall | 0.47 | 0.091 |
F1-score | 0.549 | 0.164 |
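As a sanity check, the F1 values in the table follow from the harmonic mean of precision and recall:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values from the results table above
lm_f1 = f1(0.659, 0.47)    # rounds to 0.549
llm_f1 = f1(0.836, 0.091)  # rounds to 0.164
```

The very low LLM F1 is driven almost entirely by its recall of 0.091.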
Question and Issue
Here is the training log of one sample run of the fine-tuned LLM. The validation F1-score is always > 0.6, but the validation loss is stuck at 0.24.
- Why does the test-set F1-score range only from 0 to 0.2 for some of the parameter variations I tried, when the validation-set F1-score is always above 0.6? Is that reasonable, and why?
- Is it common for an LM to beat an LLM on a particular task? If so, what is the rationale?