
Questions tagged [transformer]

Use for questions related to the Transformer architecture (encoder-decoder based) in machine learning.

0 votes · 0 answers · 9 views

Implementing pytorch temporal fusion transformer on time series

I am trying to run the Temporal Fusion Transformer from the PyTorch package, and to compare its output on like terms to the TensorFlow output on p. 15 of this paper: https://arxiv.org/pdf/1912....
Anna-Lise Nicholas
0 votes · 0 answers · 18 views

Is it common for an LM (hundreds of millions of parameters) to beat an LLM (billions of parameters) on a binary classification task?

Preface: I am trying to fine-tune transformer-based models (an LM and an LLM). The LM I used is DeBERTa, and the LLM is LLaMA 3. The task is to classify whether a text contains condescending language ...
sempraEdic
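
A minimal sketch of the LM side of such a comparison, using HuggingFace transformers; the checkpoint name is an assumption, since the question does not give one:

```python
# Hypothetical setup: DeBERTa as a binary sequence classifier.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "microsoft/deberta-v3-base"  # assumed checkpoint, not from the question
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

batch = tokenizer(["an example sentence"], return_tensors="pt")
logits = model(**batch).logits  # shape (1, 2): one score per class
```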
0 votes · 0 answers · 16 views

Training a Transformer-CNN for image output from scratch

I'm trying to train a Transformer-CNN model from scratch. The Transformer is comparable to that of ViViT Model 2. The CNN takes the output of the second (temporal) transformer and is ...
SwingNoob
0 votes · 0 answers · 13 views

Do a transformer's embeddings self-organise the same way as word2vec embeddings?

Word2vec embeddings are well-known for being able to do vector arithmetic on them. So King - Queen ≈ Man - Woman. Or Germany - Berlin ≈ France - Paris. When I first learned about transformers, one of ...
Darren Cook • 1,104
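
For reference, the vector arithmetic the question mentions can be reproduced with gensim's pretrained vectors; a minimal sketch (the model name is an assumption, any word2vec model containing these words would do):

```python
import gensim.downloader as api

# Pretrained word2vec vectors (the download is large, ~1.6 GB).
wv = api.load("word2vec-google-news-300")

# king - man + woman ≈ queen
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```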
2 votes · 0 answers · 45 views

Transformer model conditional probability distribution of sub-sentences

I have a simple transformer model (decoder only) which is trained on some dataset containing sentences to do next-word prediction. The model captures a probability distribution $P_{\theta}(\mathbf{a})$...
JazzJammer
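
For context, a decoder-only model trained on next-word prediction represents $P_{\theta}(\mathbf{a})$ through the chain-rule factorization, which is what makes probabilities of sub-sentences computable from its per-step outputs:

$$P_{\theta}(\mathbf{a}) = \prod_{t=1}^{T} P_{\theta}(a_t \mid a_1, \dots, a_{t-1})$$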
1 vote · 0 answers · 30 views

Can Transformers predict periodic time series data?

I want to use Transformers to predict a noise-free periodic 2D signal $f(t)$. The signal has a period of $T=10$, and since there is no noise, future predictions can be made perfectly from the past 5 ...
nemy • 111
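
A sketch of the kind of setup described, with a 1-D sine standing in for the periodic signal and sliding next-step windows as training pairs (the window length is an assumption):

```python
import numpy as np

T = 10                                    # period, as in the question
t = np.arange(0, 100, 0.1)
f = np.sin(2 * np.pi * t / T)             # noise-free periodic signal

window = 50                               # past steps used to predict the next value
X = np.stack([f[i:i + window] for i in range(len(f) - window)])
y = f[window:]
print(X.shape, y.shape)                   # (950, 50) (950,)
```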
0 votes · 1 answer · 31 views

Attentions not returned from transformers ViT model when using output_attentions=True

I'm using this code snippet from the docs of the HuggingFace ViT classification model, with one addition: I'm passing the output_attentions=True parameter. Nevertheless, ...
OfirD • 91
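
For reference, a minimal sketch of requesting attention maps from the HuggingFace ViT classifier (the checkpoint name is assumed, and a random tensor stands in for a real preprocessed image):

```python
import torch
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
pixel_values = torch.randn(1, 3, 224, 224)  # dummy batch in place of a real image

with torch.no_grad():
    outputs = model(pixel_values, output_attentions=True)

# Tuple with one tensor per layer, each (batch, heads, tokens, tokens).
print(len(outputs.attentions), outputs.attentions[0].shape)
```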
1 vote · 0 answers · 20 views

Real-world implementations of RAG vs. the methods explained in the paper

While building a RAG application we:
1. Encode the query
2. Retrieve k docs
3. Concatenate them before the query
4. Pass the entire thing to an LLM, and it completes it for you
I do not think this is either of RAG-...
figs_and_nuts
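
The loop the question describes, as a sketch; embed, index, and llm are hypothetical stand-ins, not any particular library's API:

```python
def naive_rag(query, index, embed, llm, k=5):
    q_vec = embed(query)                          # 1. encode the query
    docs = index.search(q_vec, k=k)               # 2. retrieve k docs
    prompt = "\n\n".join(docs) + "\n\n" + query   # 3. concatenate before the query
    return llm(prompt)                            # 4. the LLM completes it
```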
0 votes · 0 answers · 18 views

How to interpret the token embeddings from decoders?

I am having trouble thinking about the token embeddings from masked attention compared to BERT. Let's say we have 5 tokens. The embedding of the first token will be used to predict the second token, ...
BPDev • 101
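
A small sketch of the masking the question refers to: with 5 tokens, position t may only attend to positions ≤ t, so the embedding at position t is the one used to predict token t+1.

```python
import torch

T = 5
causal_mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
print(causal_mask)
# Row t (a query position) is True only in columns 0..t (the keys it may see),
# unlike BERT, where every position attends to the full sequence.
```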
0 votes · 0 answers · 18 views

Apply Swin transformer to 1d arrays

My input features are 1-D arrays of shape (1000,). I can tokenize the arrays using tf.extract_patches ...
Alex • 1
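
A sketch of the tokenization step for a (1000,) array; a plain reshape stands in for the patch extraction, and the patch size is an assumption:

```python
import numpy as np

x = np.random.randn(1000).astype("float32")
patch = 10
tokens = x.reshape(-1, patch)   # (100, 10): 100 non-overlapping patches of 10 values
print(tokens.shape)
```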
0 votes · 0 answers · 14 views

How are contextual embeddings, learned while training a transformer, applied to the input sequence at inference time?

I'm trying to understand contextual word embeddings better, and how they are applied at inference time. When training a transformer, the embeddings are learned as parameters. Are the ...
Last_neutrin0
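
A sketch of the distinction at inference time: the learned embedding table is a plain lookup by token id, and the transformer layers then turn those static vectors into contextual ones (all sizes here are made up):

```python
import torch
import torch.nn as nn

vocab, d_model = 1000, 64
embed = nn.Embedding(vocab, d_model)                       # learned during training
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)

ids = torch.tensor([[5, 42, 7]])       # token ids of the input sequence
static = embed(ids)                    # same vector for id 42 in any context
contextual = layer(static)             # now depends on the surrounding tokens
print(static.shape, contextual.shape)  # both torch.Size([1, 3, 64])
```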
0 votes · 0 answers · 16 views

In the Swin Transformer, is each token (pre-embedding) value an integer?

The Swin Transformer converts the image into tokens that are input to the transformer. Is each token value an integer before embedding? In practice, where is this done? https://github.com/microsoft/Swin-...
CoderOnly • 711
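
For reference, a sketch of where this happens in Swin-style models: patch embedding is a strided convolution over normalized float pixel values, so token values are floats, not integers (Swin-T's 4x4 patches and embed dim 96 assumed):

```python
import torch
import torch.nn as nn

patch_embed = nn.Conv2d(3, 96, kernel_size=4, stride=4)  # 4x4 patches -> 96-dim tokens
img = torch.rand(1, 3, 224, 224)                          # float pixel values
tokens = patch_embed(img).flatten(2).transpose(1, 2)
print(tokens.shape, tokens.dtype)  # torch.Size([1, 3136, 96]) torch.float32
```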
0 votes · 3 answers · 80 views

Why do we use similarity/cosine between Query and Key in attention?

Let's take an example sentence for translation: "I am going to my home and play with toy house." For translating 'home', as per my understanding, the Query will be 'house'...
Pratham
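
A worked sketch of the similarity in question: the Q·Kᵀ dot products score each query against every key, and softmax turns those scores into attention weights (shapes and values here are made up):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 4
Q = np.random.randn(3, d)   # 3 query tokens
K = np.random.randn(5, d)   # 5 key tokens
V = np.random.randn(5, d)

scores = Q @ K.T / np.sqrt(d)        # similarity of each query with each key
weights = softmax(scores, axis=-1)   # each row sums to 1
output = weights @ V                 # weighted mix of the values
print(weights.shape, output.shape)   # (3, 5) (3, 4)
```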
0 votes · 0 answers · 31 views

Instruction LLM for extracting data from text wrongly continues

I'm trying to fine-tune open-source LLMs; for now let's stick with the Mistral-7B-Instruct model. My task is as follows: I have emails that represent "price requests" for shipments sent by our ...
sagi • 101
2 votes · 1 answer · 38 views

Practical Experiments on Self-Attention Mechanisms: QQ^T vs. QK^T

I'm currently exploring the self-attention mechanism used in models like Transformers, and I have a question about the necessity of using a separate key matrix (K) instead of just using the query ...
Peyman • 1,175
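
One quick observation relevant to this question, as a sketch: with a shared projection the score matrix QQᵀ is symmetric, while a separate key projection gives a generally asymmetric QKᵀ:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 4, 8
X = rng.standard_normal((T, d))
W_q = rng.standard_normal((d, d))
W_k = rng.standard_normal((d, d))

Q, K = X @ W_q, X @ W_k
qq = Q @ Q.T   # token i scores token j exactly as j scores i
qk = Q @ K.T   # no such constraint
print(np.allclose(qq, qq.T), np.allclose(qk, qk.T))  # True False
```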
