
Questions tagged [natural-language-processing]

For questions related to natural language processing (NLP), which is concerned with the interactions between computers and human (or natural) languages, in particular how to create programs that process and analyze large amounts of natural language data.

0 votes
0 answers
15 views

Combining the output of two different machine learning models for accurate invoice data extraction: is this a viable approach?

I am working (trying to work) on a project to extract relevant information from invoices. Currently I am not achieving good accuracy, so I am trying to come up with some new ideas. I am considering ...
rowor
0 votes
0 answers
12 views

How do I load BERT-base-uncased to perform fine-tuning?

I'm learning how to fine-tune LLMs without using Huggingface tools. I understand that Huggingface provides tools for streamlining the process of building, deploying, and training ML models. That's ...
Майкл Шодеке
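
For the fine-tuning question above, a minimal sketch of what loading bert-base-uncased involves, assuming the transformers and torch packages (the very tooling the question wants to avoid, but it makes the loaded pieces explicit); the classification head and the 2-class task are illustrative assumptions:

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Load the pretrained encoder weights and the matching WordPiece tokenizer.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    backbone = AutoModel.from_pretrained("bert-base-uncased")  # 12-layer encoder, hidden size 768

    # Hypothetical task head for fine-tuning on a 2-class problem.
    head = torch.nn.Linear(backbone.config.hidden_size, 2)

    inputs = tokenizer("hello world", return_tensors="pt")
    hidden = backbone(**inputs).last_hidden_state   # (batch, seq_len, 768)
    logits = head(hidden[:, 0])                     # [CLS] representation feeds the head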
0 votes
0 answers
16 views

Normalizing the embedding space of an encoder language model with respect to categorical data

Suppose we have a tree/hierarchy of categories (e.g. categories of products in an e-commerce website), each node being assigned a title. Assume that the title of each node is semantically accurate, ...
mtcicero
0 votes
0 answers
13 views

How to quantify the tone of a textual paragraph? If there is historical communication available, how to check for consistency in tonality for new input?

Certain aspects of NLP, such as basic polarity, subjectivity, and positivity, can be obtained with ease, but consistent keyword usage and the "style" or "tone" of writing ...
rushit palesha
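
For the polarity/subjectivity part of the question above, a minimal sketch assuming the TextBlob package; the example messages and the simple notion of "tone consistency" as a gap in scores are illustrative assumptions:

    from textblob import TextBlob

    old_messages = ["We appreciate your patience.", "Thanks for flagging this so quickly!"]
    new_message = "Fix it now."

    # Polarity in [-1, 1] and subjectivity in [0, 1] per paragraph.
    history = [TextBlob(t).sentiment for t in old_messages]
    current = TextBlob(new_message).sentiment

    avg_polarity = sum(s.polarity for s in history) / len(history)
    print("historical avg polarity:", avg_polarity)
    print("new message polarity   :", current.polarity)
    # A large gap between the two could be flagged as a tonal inconsistency.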
0 votes
1 answer
25 views

How to generate synthetic text data for LLM fine-tuning?

Given a corpus of data like a log of Slack conversations, I want to be able to use it to generate fictitious conversations, e.g., given 10 conversations, I want to be able to scale this up to 100 ...
morpheus
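
For the question above, a minimal sketch of one common approach: few-shot prompting a chat model with the seed conversations and asking it for new ones in the same style. The OpenAI client is used only for illustration; the model name and prompt wording are assumptions:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    seed_conversations = ["A: deploy failed again\nB: rolling back now"]  # real Slack seeds go here

    prompt = (
        "Here are some example Slack conversations:\n\n"
        + "\n\n".join(seed_conversations)
        + "\n\nWrite 5 new fictitious conversations in the same style, "
          "with different participants and topics."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content)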
0 votes
0 answers
17 views

Do sinusoidal positional embeddings fundamentally limit context length?

In principle, it seems like we can generate endless sinusoidal positional embeddings. However, with respect to context window maximums, is there some limitation that sinusoidal positional embeddings ...
Victor M
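
For the question above, a minimal sketch of the sinusoidal encoding from "Attention Is All You Need": the formula itself extends to arbitrary positions, so any hard context limit comes from training length and attention cost rather than from the encoding (function name and shapes are illustrative; d_model is assumed even):

    import numpy as np

    def sinusoidal_positions(num_positions: int, d_model: int) -> np.ndarray:
        """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...)."""
        pos = np.arange(num_positions)[:, None]            # (P, 1)
        i = np.arange(d_model // 2)[None, :]               # (1, d/2)
        angles = pos / np.power(10000.0, 2 * i / d_model)  # (P, d/2)
        enc = np.zeros((num_positions, d_model))
        enc[:, 0::2] = np.sin(angles)
        enc[:, 1::2] = np.cos(angles)
        return enc

    print(sinusoidal_positions(100_000, 512).shape)  # works for arbitrarily long sequences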
0 votes
0 answers
14 views

Hugging Face model with large context window

I'm looking for a model that accepts input of around 50k characters plus a prompt, and answers a question based on that text. Is there something like that available? I'm not sure how to find it; I'm new to AI.
repo
0 votes
0 answers
25 views

How do causal and padding masks work in decoder-only models?

I am trying to implement a decoder-only model from scratch using PyTorch, but I am confused about how the masking works. From what I understand, when we have an encoder-decoder architecture, the padding ...
Nadya Koleva
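
For the question above, a minimal plain-PyTorch sketch (names are illustrative) of combining the two masks in a decoder-only model: position i may attend to position j only if j ≤ i and token j is not padding:

    import torch

    def combined_attention_mask(input_ids: torch.Tensor, pad_id: int) -> torch.Tensor:
        batch, seq_len = input_ids.shape
        causal = torch.tril(torch.ones(seq_len, seq_len)).bool()  # (T, T) lower-triangular
        not_pad = input_ids != pad_id                             # (B, T) real (key) tokens
        # (1, T, T) & (B, 1, T) -> (B, T, T); True means "allowed to attend"
        return causal.unsqueeze(0) & not_pad.unsqueeze(1)

    ids = torch.tensor([[5, 6, 7, 0, 0]])          # 0 assumed to be the pad id
    mask = combined_attention_mask(ids, pad_id=0)  # set masked-out scores to -inf before softmax
    print(mask[0].int())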
0 votes
0 answers
21 views

Efficient Matching of Sample Requests to Sample Offers Using Large Language Models

I want to discuss an interesting matching problem. We aim to match sample requests with corresponding sample offers. Here are some examples: Sample Requests: Need help installing Linux on my old ...
GGT
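
For the matching question above, a minimal sketch of a common baseline before reaching for a full LLM: embed requests and offers with a sentence encoder and rank by cosine similarity (the encoder name and the toy texts are assumptions):

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # example encoder

    requests = ["Need help installing Linux on my old laptop"]
    offers = ["I can assist with Ubuntu installations", "Offering guitar lessons"]

    req_emb = model.encode(requests, convert_to_tensor=True)
    off_emb = model.encode(offers, convert_to_tensor=True)

    scores = util.cos_sim(req_emb, off_emb)  # (num_requests, num_offers)
    best = scores.argmax(dim=1)
    print(offers[int(best[0])])              # best-matching offer for the first request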
0 votes
1 answer
36 views

What is the exact purpose of the input modulation gate in LSTMs?

Basically, I was learning about LSTMs and found that they are made up of three gates: the forget gate, the input gate, and the output gate. However, I came across some sources that state there is a fourth gate ...
MrIzzat
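
For the LSTM question above, the standard cell equations in their common formulation, given here as a reference; the $\tanh$ candidate $g_t$ is what some sources call the input modulation gate, counted as a fourth gate alongside forget, input, and output:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) && \text{(input modulation / candidate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t, \qquad h_t = o_t \odot \tanh(c_t)
\end{aligned}
$$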
1 vote
1 answer
54 views

How are perplexities over multiple instances aggregated?

The perplexity of the $i^{th}$ token in the $k^{th}$ sequence is $$ P_{ki} = \frac{1}{p(t_{ki})} $$ The perplexity aggregated for the $k^{th}$ sequence is then $$ P_{k} = \left(\prod_{i=1}^N P_{ki}\...
Borun Chowdhury
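
The excerpt above is cut off; under the common convention (stated here as an assumption, not as the asker's exact formula), both the per-sequence and the corpus-level perplexity are geometric means over tokens, i.e. token log-probabilities are pooled before exponentiating rather than averaging per-sequence perplexities arithmetically:

$$
P_k = \left(\prod_{i=1}^{N_k} P_{ki}\right)^{1/N_k},
\qquad
P_{\text{corpus}} = \exp\!\left(\frac{1}{\sum_k N_k}\sum_k \sum_{i=1}^{N_k} \log P_{ki}\right)
$$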
0 votes
2 answers
48 views

Classifier-Free-Guidance with Transformers

I'm working on music generation with transformers, using the decoder part for the audio tokens with text conditioning from the T5 encoder. In classifier-free guidance, the text conditioning is randomly ...
qmzp
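
For the question above, a minimal sketch of classifier-free guidance at sampling time (the model interface, the null conditioning, and the guidance scale are illustrative assumptions): run the decoder once with the T5 text conditioning and once with the dropped/null conditioning, then push the logits toward the conditional prediction:

    import torch

    def cfg_logits(model, audio_tokens, text_cond, null_cond, guidance_scale=3.0):
        # model(...) is assumed to return next-token logits given audio tokens
        # and an encoder conditioning sequence (e.g. T5 encoder outputs).
        cond = model(audio_tokens, encoder_hidden_states=text_cond)
        uncond = model(audio_tokens, encoder_hidden_states=null_cond)
        # guidance_scale = 1.0 recovers the purely conditional model.
        return uncond + guidance_scale * (cond - uncond)

    # During training, randomly replacing the text conditioning with null_cond
    # (e.g. ~10% of the time) is what makes the unconditional branch meaningful.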
0 votes
0 answers
37 views

Why do decoder-only models require left padding?

We used the Gemma 2B model for inference and tried both left and right padding. "Right" padding gives us a different answer compared to left padding. Why do we use left padding for decoder-only models, ...
Aamod Thakur
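
For the question above, a minimal sketch of the usual reason: generation continues from the last position of each row, so right padding would leave pad tokens between the prompt and the newly generated tokens (transformers API; the Gemma 2B checkpoint name follows the question):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
    tokenizer.padding_side = "left"  # with "right", pads would sit between prompt and continuation
    model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

    batch = tokenizer(
        ["a short prompt", "a much longer prompt than the first one"],
        return_tensors="pt",
        padding=True,
    )
    out = model.generate(**batch, max_new_tokens=20)
    print(tokenizer.batch_decode(out, skip_special_tokens=True))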
0 votes
0 answers
19 views

Why am I getting a different KV cache?

I took the SQuAD 2.0 dataset for running inference with the Gemma 2B model. When I provided the model with the 1st datapoint truncated to 36 tokens and the same datapoint truncated to 80 tokens, I got slightly ...
Aamod Thakur
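
For the question above, a minimal sketch of how to compare the caches directly (transformers API; the checkpoint name and the 36/80-token truncations follow the question, the context string is a placeholder). With causal attention, the cache entries for the shared 36-token prefix should agree up to floating-point and kernel nondeterminism, so small differences are expected:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("google/gemma-2b")
    model = AutoModelForCausalLM.from_pretrained("google/gemma-2b").eval()

    text = "..."  # placeholder: the SQuAD 2.0 context goes here
    short = tok(text, return_tensors="pt", truncation=True, max_length=36)
    long = tok(text, return_tensors="pt", truncation=True, max_length=80)

    with torch.no_grad():
        kv_short = model(**short, use_cache=True).past_key_values
        kv_long = model(**long, use_cache=True).past_key_values

    # Layer-0 keys over the shared 36-token prefix; shape (batch, heads, seq, head_dim).
    k_short = kv_short[0][0][..., :36, :]
    k_long = kv_long[0][0][..., :36, :]
    print((k_short - k_long).abs().max())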
4 votes
2 answers
56 views

Any popular diffusion model for language modeling?

Is there a popular diffusion-model-based framework for language modelling? If not, is it because of the difficulty of sampling from discrete distributions?
Ayan Sengupta
