
All Questions

0 votes
0 answers
18 views

Is it common for an LM (hundreds of millions of parameters) to beat an LLM (billions of parameters) on a binary classification task?

Preface: I am trying to fine-tune transformer-based models (an LM and an LLM). The LM I used is DeBERTa, and the LLM is LLaMA 3. The task is to classify whether a text contains condescending language ...
asked by sempraEdic
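As a rough reference for the setup described above, here is a minimal sketch of fine-tuning an encoder LM for binary classification with Hugging Face Transformers; the microsoft/deberta-v3-base checkpoint, the toy examples, and the hyperparameters are illustrative assumptions, not the asker's actual configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=2  # binary head, randomly initialized
)

texts = ["You probably wouldn't understand this.", "Thanks for the help!"]
labels = torch.tensor([1, 0])  # 1 = condescending, 0 = not (toy labels)

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
out = model(**batch, labels=labels)  # cross-entropy loss computed internally
out.loss.backward()
optimizer.step()
```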
1 vote
1 answer
62 views

Improving GPU Utilization in LLM Inference System

I'm trying to build a distributed LLM inference platform with Hugging Face support. The implementation uses Python for model processing and Java for interfacing with external systems. ...
asked by Cardstdani
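One common lever for the utilization problem described here is batching concurrent requests into a single forward pass. A minimal sketch, with gpt2 standing in for the actual model:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token   # gpt2 defines no pad token
tokenizer.padding_side = "left"             # required for batched generation
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)

# Requests arriving in the same time window are padded together, so the
# GPU runs one forward pass per decode step instead of one per request.
prompts = ["Translate to French: hello", "Summarize: the cat sat on the mat"]
batch = tokenizer(prompts, padding=True, return_tensors="pt").to(device)
out = model.generate(**batch, max_new_tokens=32,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```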
0 votes
1 answer
37 views

How do transformer-based architectures generate contextual embeddings?

How do transformer-based architectures, such as RoBERTa, generate contextual embeddings? The issue is that I haven't found any articles that explain this process.
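In short: every encoder layer applies self-attention, so each output vector depends on all tokens in the sentence, not just the token's own identity. A minimal sketch of extracting those vectors from roberta-base:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

def embed(sentence):
    batch = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    return out.last_hidden_state[0]  # (seq_len, 768): one vector per token

a = embed("I deposited cash at the bank")
b = embed("We sat on the bank of the river")
# The vectors for "bank" differ between the two sentences because each
# vector is computed from the whole context via self-attention.
```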
0 votes
1 answer
59 views

Fine-tuning, feature extraction, or both using RoBERTa?

I'm reading a program that uses the pre-trained RoBERTa model (roberta-base). The code first extracts word embeddings from each caption in the batch, using the last hidden state of the RoBERTa model. ...
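For reference, the two regimes differ only in which parameters receive gradient updates. A sketch with roberta-base and an illustrative linear head (not the program's actual head):

```python
import torch
import torch.nn as nn
from transformers import AutoModel

encoder = AutoModel.from_pretrained("roberta-base")
head = nn.Linear(encoder.config.hidden_size, 2)

# Feature extraction: freeze RoBERTa, train only the head.
for p in encoder.parameters():
    p.requires_grad = False
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

# Fine-tuning: unfreeze and train everything, usually at a smaller LR.
for p in encoder.parameters():
    p.requires_grad = True
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(head.parameters()), lr=2e-5
)
```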
1 vote
1 answer
197 views

What are special tokens used for in RoBERTa?

When I use this code: ...
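For context, RoBERTa's tokenizer inserts markers like <s> (start of sequence, playing the CLS role), </s> (end and separator), and <pad> (batch padding). A quick way to inspect them:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
ids = tok("Hello world", "Second segment")["input_ids"]
print(tok.convert_ids_to_tokens(ids))
# ['<s>', 'Hello', 'Ġworld', '</s>', '</s>', 'Second', 'Ġsegment', '</s>']
print(tok.special_tokens_map)
# maps bos/eos/sep/cls/pad/unk/mask to <s>, </s>, <pad>, <unk>, <mask>
```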
0 votes
1 answer
57 views

Why was the learning rate decreased for RoBERTa compared to the LSTM?

I'm reading the codebase of a project that uses a Bidirectional-LSTM. Its learning rate is 0.02. Later, someone improved the project by replacing the LSTM with RoBERTa and decreasing the learning rate ...
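The usual reasoning: RoBERTa starts from pretrained weights, and large gradient steps would overwrite what pretraining learned, so fine-tuning rates are typically in the 1e-5 to 5e-5 range, while a from-scratch LSTM can tolerate 0.02. An illustrative contrast (the 2e-5 value is a conventional default, not the project's actual number):

```python
import torch
import torch.nn as nn
from transformers import AutoModel

# Trained from scratch: random init, nothing to preserve.
lstm = nn.LSTM(input_size=300, hidden_size=256, bidirectional=True)
lstm_opt = torch.optim.Adam(lstm.parameters(), lr=0.02)

# Fine-tuned: small steps so pretrained knowledge survives.
roberta = AutoModel.from_pretrained("roberta-base")
roberta_opt = torch.optim.AdamW(roberta.parameters(), lr=2e-5)
```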
2 votes
1 answer
213 views

What do these terms mean in the context of RoBERTa?

When I read articles about RoBERTa, I often encounter the terms "transfer learning" and "fine-tuning". Additionally, they also mention "feature extraction". What are the ...
1 vote
0 answers
228 views

Why do the Llama 2 weights have eight different files?

I downloaded the weights for Llama 2 (70B-chat). This process created a folder titled "llama-2-70b-chat", which contained eight files named consolidated.00.pth, consolidated.01.pth, and so on ...
asked by jskattt797
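For reference, the eight consolidated.NN.pth files are model-parallel shards of a single checkpoint: Meta's 70B reference weights are split for 8-way tensor parallelism, so each file holds a slice of most weight matrices and all eight are needed to reconstruct the full model. A sketch of peeking at one shard (path assumed from the question):

```python
import glob
import torch

shards = sorted(glob.glob("llama-2-70b-chat/consolidated.*.pth"))
print(shards)  # consolidated.00.pth ... consolidated.07.pth

state = torch.load(shards[0], map_location="cpu")  # one shard only
for name, tensor in list(state.items())[:3]:
    print(name, tuple(tensor.shape))  # sharded dims are 1/8 of the full size
```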
0 votes
1 answer
111 views

What are the differences between an Embedding Layer and RoBERTa embeddings?

I'm reading an article about the Embedding Layer: The Embedding Layer learns word embeddings from raw text. It is initialized with small random numbers and can be learned simultaneously with a neural ...
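Mechanically both are lookup tables; the differences are the initialization (small random numbers vs pretrained weights) and that RoBERTa then contextualizes the looked-up vectors with its encoder stack. A small sketch:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

vocab_size, dim = 50265, 768
fresh = nn.Embedding(vocab_size, dim)        # randomly initialized
roberta = AutoModel.from_pretrained("roberta-base")
pretrained = roberta.get_input_embeddings()  # learned during pretraining

ids = torch.tensor([[0, 31414, 2]])          # <s> Hello </s> (roberta-base ids)
print(fresh(ids).shape, pretrained(ids).shape)  # both (1, 3, 768)
# Same lookup mechanics; the difference is where the weights come from,
# and that RoBERTa's full forward pass mixes in context afterwards.
```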
0 votes
2 answers
164 views

What are the differences between the contextual embeddings of a Bidirectional-LSTM and a Transformer?

A Transformer like RoBERTa can generate contextual embeddings using its encoder, similar to a Bidirectional-LSTM that concatenates hidden states. What are the differences between them? Are ...
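One concrete way to see both the similarity and the difference: each model produces one vector per token, but the biLSTM builds context from two directional scans while the transformer mixes all positions via self-attention in every layer. Dimensions below are illustrative:

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

x = torch.randn(1, 7, 300)                     # (batch, seq_len, features)
bilstm = nn.LSTM(300, 384, bidirectional=True, batch_first=True)
lstm_out, _ = bilstm(x)
print(lstm_out.shape)                          # (1, 7, 768): 2 x 384 concatenated

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")
batch = tok("a short example sentence", return_tensors="pt")
print(model(**batch).last_hidden_state.shape)  # (1, seq_len, 768)
```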
0 votes
1 answer
90 views

Questions about hidden states of bidirectional LSTMs

I read this in an article about bidirectional LSTMs: In a bidirectional LSTM, each word corresponds to two hidden states, one for each direction. Thus, we concatenate these two hidden states to ...
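A sketch of exactly the concatenation the article describes; with bidirectional=True, PyTorch already returns the two directions concatenated along the feature dimension:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=100, hidden_size=50,
               bidirectional=True, batch_first=True)
x = torch.randn(2, 9, 100)            # (batch, seq_len, features)
out, (h_n, c_n) = lstm(x)

print(out.shape)                      # (2, 9, 100): [forward_50 | backward_50]
forward, backward = out[..., :50], out[..., 50:]
# For word t, forward[:, t] summarizes words 0..t and backward[:, t]
# summarizes words t..end, so their concatenation sees both sides.
```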
2 votes
1 answer
1k views

What are the differences between BPE and byte-level BPE?

In RoBERTa, I'm not sure whether the model uses BPE or byte-level BPE tokenization. Are these techniques different or the same? Can someone explain? Thanks
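RoBERTa uses the GPT-2 byte-level BPE: the merge algorithm is ordinary BPE, but it operates on bytes rather than Unicode characters, so the base alphabet of 256 bytes covers any string and no input ever falls back to <unk>. A quick demonstration (the exact subword splits in the comments may vary):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
print(tok.tokenize("unbelievable"))  # subword merges, e.g. ['un', 'bel', 'iev', 'able']
print(tok.tokenize("héllo 🤗"))      # rare characters split into byte-level
                                     # pieces instead of an unknown token
```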
0 votes
1 answer
92 views

Anomaly Detection in Log Data using LSTM

Problem Overview: I am currently working on a project involving anomaly detection in log data. The anomalies are defined by deviations from historical patterns. The log data has a simple structure: [...
asked by Raj
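One common recipe for this setup (in the spirit of DeepLog) is next-event prediction: train an LSTM on windows of normal log-event ids, then flag an event as anomalous when it falls outside the model's top-k predictions. A sketch with assumed sizes, not the asker's actual schema:

```python
import torch
import torch.nn as nn

NUM_EVENTS, WINDOW, K = 50, 10, 5  # assumed vocabulary/window/threshold

class NextEventLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(NUM_EVENTS, 32)
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.head = nn.Linear(64, NUM_EVENTS)

    def forward(self, x):            # x: (batch, WINDOW) event ids
        h, _ = self.lstm(self.emb(x))
        return self.head(h[:, -1])   # logits over the next event id

model = NextEventLSTM()              # train on normal logs first
window = torch.randint(0, NUM_EVENTS, (1, WINDOW))
next_event = torch.tensor([3])
topk = model(window).topk(K).indices
anomalous = next_event.item() not in topk[0].tolist()
```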
0 votes
0 answers
7 views

How can I label a dataset of text pairs so it serves as a universal benchmark for calculating the precision@k metric across different models?

I am facing a semantic search problem. I am fine-tuning different NLU models and I want to use precision@k as my main metric. Is it possible to label a dataset of text pairs to use it as a universal ...
asked by Ir8_mind
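Whatever labeling scheme is chosen, note that precision@k itself is model-agnostic: per query it only needs a ranked list from the model and a shared set of ids labeled relevant, which is what would make one labeled dataset reusable across models. A minimal sketch with hypothetical ids:

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved ids that are labeled relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for i in top_k if i in relevant_ids) / k

ranked = ["d7", "d2", "d9", "d1", "d4"]       # model-specific ranking
relevant = {"d2", "d4", "d5"}                 # model-independent labels
print(precision_at_k(ranked, relevant, k=3))  # 1 hit in top 3 -> 0.333...
```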
1 vote
1 answer
63 views

Why do my validation loss and accuracy decay over epochs?

I'm trying to build two simple networks on a cleaned dataset for tweet sentiment classification (0/1): one with all dense layers (binary bag of words), the other with an RNN layer (embedding layer). But both ...
asked by emily
