Questions tagged [transformer]
Use for questions related to the Transformer (encoder-decoder based) architecture in machine learning.
482 questions
0 votes · 0 answers · 9 views
Implementing pytorch temporal fusion transformer on time series
I am trying to run the Temporal Fusion Transformer from the PyTorch package, and to compare its output, on like terms, with the TensorFlow output on p. 15 of this paper: https://arxiv.org/pdf/1912....
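For reference, a minimal loading sketch, assuming the asker means the TemporalFusionTransformer from the pytorch-forecasting package; the data-frame layout and hyperparameters below are hypothetical:

```python
import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer

# hypothetical panel data: two series, integer time index, one target column
df = pd.DataFrame({
    "time_idx": list(range(100)) * 2,
    "series_id": ["a"] * 100 + ["b"] * 100,
    "value": [float(i % 10) for i in range(200)],
})

dataset = TimeSeriesDataSet(
    df,
    time_idx="time_idx",
    target="value",
    group_ids=["series_id"],
    max_encoder_length=24,
    max_prediction_length=6,
    time_varying_unknown_reals=["value"],
)
# from_dataset infers the input sizes from the dataset definition
tft = TemporalFusionTransformer.from_dataset(dataset, hidden_size=16)
```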
0 votes · 0 answers · 18 views
Is it common for an LM (hundreds of millions of parameters) to beat an LLM (billions of parameters) on a binary classification task?
Preface
I am trying to fine-tune transformer-based models (an LM and an LLM). The LM I used is DeBERTa, and the LLM is LLaMA 3. The task is to classify whether a text contains condescending language ...
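As a point of reference, a minimal sketch of the LM side, assuming a HuggingFace setup; "microsoft/deberta-v3-base" is one published DeBERTa checkpoint, and the example text is made up:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=2  # binary: condescending or not
)

inputs = tokenizer("Well, aren't you clever.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index (0 or 1)
```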
0 votes · 0 answers · 16 views
Training a transformer CNN for image output from scratch
I'm trying to train a Transformer-CNN model from scratch. The Transformer is comparable to ViViT's Model 2. The CNN takes the output of the second (temporal) transformer and is ...
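A rough sketch of the general transformer-into-CNN wiring, not the asker's actual model; every shape and layer size here is an assumption:

```python
import torch
import torch.nn as nn

class TransformerCNN(nn.Module):
    def __init__(self, dim=64, tokens=256):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.cnn = nn.Sequential(
            nn.Conv2d(dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),  # single-channel image output
        )
        self.side = int(tokens ** 0.5)

    def forward(self, x):  # x: (batch, tokens, dim)
        x = self.encoder(x)
        # fold the token sequence back into a 2D grid for the CNN head
        x = x.transpose(1, 2).reshape(x.size(0), -1, self.side, self.side)
        return self.cnn(x)

out = TransformerCNN()(torch.randn(2, 256, 64))
print(out.shape)  # (2, 1, 16, 16)
```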
0 votes · 0 answers · 13 views
Do a transformer's embeddings self-organise the same way as word2vec embeddings?
Word2vec embeddings are well known for supporting vector arithmetic: King - Queen ≈ Man - Woman, or Germany - Berlin ≈ France - Paris.
When I first learned about transformers, one of ...
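For concreteness, this classic arithmetic can be reproduced with gensim's pretrained vectors; the GloVe checkpoint name below is just one available option:

```python
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")
# king - man + woman should land near queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```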
2 votes · 0 answers · 45 views
Transformer model conditional probability distribution of sub-sentences
I have a simple transformer model (decoder only) which is trained on some dataset containing sentences to do next-word prediction. The model captures a probability distribution $P_{\theta}(\mathbf{a})$...
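A minimal sketch of reading such a distribution out of a decoder-only model, using GPT-2 as a stand-in for the asker's model: the log-probability of a sentence is the sum of the per-token log-probabilities.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("the cat sat on the mat", return_tensors="pt").input_ids
with torch.no_grad():
    logits = lm(ids).logits
log_probs = logits[:, :-1].log_softmax(-1)            # position t predicts token t+1
token_lp = log_probs.gather(2, ids[:, 1:, None]).squeeze(-1)
print(token_lp.sum().item())                          # log P(sentence)
```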
1 vote · 0 answers · 30 views
Can Transformers predict periodic time series data?
I want to use Transformers to predict a noise-free periodic 2D signal $f(t)$. The signal has a period of $T=10$, and since there is no noise, future predictions can be made perfectly from the past 5 ...
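A minimal data-preparation sketch for this setup; the period matches the question, while the window sizes are assumptions:

```python
import numpy as np

t = np.arange(0, 200, 0.1)
signal = np.sin(2 * np.pi * t / 10)  # noise-free, period T = 10

ctx, horizon = 50, 10
n = len(signal) - ctx - horizon
X = np.stack([signal[i:i + ctx] for i in range(n)])               # context windows
y = np.stack([signal[i + ctx:i + ctx + horizon] for i in range(n)])  # targets
print(X.shape, y.shape)  # (n_windows, 50) (n_windows, 10)
```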
0 votes · 1 answer · 31 views
attentions not returned from transformers ViT model when using output_attentions=True
I'm using this code snippet from the docs of the HuggingFace ViT classification model, with one addition: I pass the output_attentions=True parameter. Nevertheless, ...
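For reference, a minimal sketch of the pattern described, using a published ViT checkpoint and a random tensor in place of a preprocessed image; when output_attentions=True reaches the forward call, outputs.attentions should be a tuple with one tensor per layer:

```python
import torch
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

pixels = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
with torch.no_grad():
    outputs = model(pixel_values=pixels, output_attentions=True)
print(len(outputs.attentions))        # one tensor per layer
print(outputs.attentions[0].shape)    # (batch, heads, tokens, tokens)
```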
1 vote · 0 answers · 20 views
The real world implementations of RAG vs the methods explained in the paper
While building a RAG application we:
1. Encode the query
2. Retrieve k docs
3. Concatenate them before the query
4. Pass the entire thing to an LLM and it completes it for you (see the sketch below)
I do not think this is either of RAG-...
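A sketch of that naive pipeline as a single function; the embedder, index, and generator are placeholders for whatever stack is in use:

```python
def naive_rag(query, embed, index, llm, k=5):
    q_vec = embed(query)                          # 1. encode the query
    docs = index.search(q_vec, k)                 # 2. retrieve k docs
    prompt = "\n\n".join(docs) + "\n\n" + query   # 3. concatenate before the query
    return llm(prompt)                            # 4. let the LLM complete it
```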
0 votes · 0 answers · 18 views
How to interpret the token embeddings from decoders?
I am having trouble reasoning about the token embeddings that come out of masked (causal) attention compared to BERT's.
Let's say we have 5 tokens. The embedding of the first token will be used to predict the second token, ...
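A small sketch of the mechanism behind this: the causal mask sets attention scores to -inf above the diagonal, so token i can only attend to positions 0..i.

```python
import torch

n = 5
mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
scores = torch.randn(n, n).masked_fill(mask, float("-inf"))
attn = scores.softmax(-1)  # row i mixes only positions 0..i
print(attn)
```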
0 votes · 0 answers · 18 views
Apply Swin transformer to 1d arrays
My input features are 1D arrays of shape (1000,).
I can tokenize the arrays using tf.extract_patches
...
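One hedged sketch of the tokenization step: tf.signal.frame can cut a (1000,) array into fixed-size patches before a learned linear embedding; the patch size and embedding width below are assumptions.

```python
import tensorflow as tf

x = tf.random.normal([1000])
patches = tf.signal.frame(x, frame_length=20, frame_step=20)  # (50, 20) patches
tokens = tf.keras.layers.Dense(64)(patches)                   # (50, 64) patch embeddings
print(tokens.shape)
```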
0 votes · 0 answers · 14 views
How contextual embeddings learned during training a transformer are applied to the input sequence at inference time
I'm trying to understand contextual word embeddings better, and how they are applied at inference time.
Embeddings are learned as parameters while training a transformer. Are the ...
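A minimal sketch of the distinction, using BERT: the input-embedding table is a fixed learned lookup, while the contextual embeddings are recomputed by the forward pass for every new input.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

batch = tok("the bank of the river", return_tensors="pt")
with torch.no_grad():
    out = model(**batch)
static = model.get_input_embeddings()(batch.input_ids)  # context-free lookup
contextual = out.last_hidden_state                      # depends on the whole sentence
print(static.shape, contextual.shape)
```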
0 votes · 0 answers · 16 views
In the Swin Transformer, is each token (pre-embedding) value an integer?
The Swin Transformer transforms the image into tokens that are input to the transformer.
Is each token value (before embedding) an integer?
In practice, where is this done? https://github.com/microsoft/Swin-...
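A sketch of the patch-embedding pattern Swin uses, a strided Conv2d (the 4x4 patch size and 96-dim embedding match the Swin-T configuration): the pixel values and the resulting token values are floats, not integers.

```python
import torch
import torch.nn as nn

patch_embed = nn.Conv2d(3, 96, kernel_size=4, stride=4)  # 4x4 patches -> dim 96
img = torch.rand(1, 3, 224, 224)                          # float pixels in [0, 1]
tokens = patch_embed(img).flatten(2).transpose(1, 2)      # (1, 3136, 96)
print(tokens.dtype, tokens.shape)
```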
0 votes · 3 answers · 80 views
Why do we use similarity/cosine between Query and Key in attention?
Let's take an example sentence for translation:
I am going to my home and play with toy house.
For translating 'home', as per my understanding, the Query will be 'house'...
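For reference, a minimal sketch of scaled dot-product attention: the $QK^T$ dot products measure how similar each query is to each key, and the softmax turns those similarities into mixing weights over the values.

```python
import torch

d = 8
Q, K, V = (torch.randn(5, d) for _ in range(3))
scores = Q @ K.T / d ** 0.5   # similarity of every query to every key
weights = scores.softmax(-1)  # each row sums to 1
out = weights @ V             # weighted average of the values
print(out.shape)              # (5, 8)
```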
0 votes · 0 answers · 31 views
Instruction LLM for extracting data from text wrongly continues generating
I'm trying to fine-tune open sourced LLMs, for now let's stick with Mistral-7b-instruct model.
My task is as follows: I have emails that represent "price requests" for shipments sent by our ...
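A hedged sketch of one common mitigation, letting the model stop at its end-of-sequence token; the model name comes from the question, but the prompt and the stopping setup are assumptions:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "mistralai/Mistral-7B-Instruct-v0.1"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "[INST] Extract origin, destination and weight as JSON. [/INST]"
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=128, eos_token_id=tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))
```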
2 votes · 1 answer · 38 views
Practical Experiments on Self-Attention Mechanisms: QQ^T vs. QK^T
I'm currently exploring the self-attention mechanism used in models like Transformers, and I have a question about the necessity of using a separate key matrix (K) instead of just using the query ...
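A small numerical experiment on exactly this point: with K = Q the raw score matrix $QQ^T$ is symmetric, while a separate key projection makes "how much token i attends to token j" asymmetric.

```python
import torch

torch.manual_seed(0)
X = torch.randn(6, 16)
Wq, Wk = torch.randn(16, 16), torch.randn(16, 16)

Q, K = X @ Wq, X @ Wk
qq, qk = Q @ Q.T, Q @ K.T
print(torch.allclose(qq, qq.T))  # True: symmetric scores
print(torch.allclose(qk, qk.T))  # False: asymmetric scores
```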