
Questions tagged [natural-language-processing]

For questions related to natural language processing (NLP), which is concerned with the interactions between computers and human (or natural) languages, in particular how to create programs that process and analyze large amounts of natural language data.

0 votes
0 answers
15 views

Combining the output of two different machine learning models for accurate invoice data extraction: is this a viable approach?

I am working (trying to work) on a project to extract relevant information from invoices. Currently I am not achieving good accuracy, so I am trying to come up with some new ideas. I am considering ...
rowor
0 votes
0 answers
12 views

How do I load BERT-base-uncased to perform fine-tuning?

I'm learning how to fine-tune LLMs without using Huggingface tools. I understand that Huggingface provides tools for streamlining the process of building, deploying, and training ML models. That's ...
Майкл Шодеке
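
For the fine-tuning question above, a minimal sketch of what loading bert-base-uncased involves, assuming the transformers and torch packages (the very tooling the question wants to avoid, but it makes the loaded pieces explicit); the classification head and the 2-class task are illustrative assumptions:

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Load the pretrained encoder weights and the matching WordPiece tokenizer.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    backbone = AutoModel.from_pretrained("bert-base-uncased")  # 12-layer encoder, hidden size 768

    # Hypothetical task head for fine-tuning on a 2-class problem.
    head = torch.nn.Linear(backbone.config.hidden_size, 2)

    inputs = tokenizer("hello world", return_tensors="pt")
    hidden = backbone(**inputs).last_hidden_state   # (batch, seq_len, 768)
    logits = head(hidden[:, 0])                     # [CLS] representation feeds the head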
0 votes
0 answers
16 views

Normalizing the embedding space of an encoder language model with respect to categorical data

Suppose we have a tree/hierarchy of categories (e.g. categories of products in an e-commerce website), each node being assigned a title. Assume that the title of each node is semantically accurate, ...
mtcicero
0 votes
0 answers
13 views

How to quantify the tone of a textual paragraph? If there is historical communication available, how to check for consistency in tonality for new input?

Certain aspects of NLP, such as basic polarity, subjectivity, and positivity, can be obtained with ease, but consistent keyword usage and the "style" or "tone" of writing ...
rushit palesha
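
For the polarity/subjectivity part of the question above, a minimal sketch assuming the TextBlob package; the example messages and the simple notion of "tone consistency" as a gap in scores are illustrative assumptions:

    from textblob import TextBlob

    old_messages = ["We appreciate your patience.", "Thanks for flagging this so quickly!"]
    new_message = "Fix it now."

    # Polarity in [-1, 1] and subjectivity in [0, 1] per paragraph.
    history = [TextBlob(t).sentiment for t in old_messages]
    current = TextBlob(new_message).sentiment

    avg_polarity = sum(s.polarity for s in history) / len(history)
    print("historical avg polarity:", avg_polarity)
    print("new message polarity   :", current.polarity)
    # A large gap between the two could be flagged as a tonal inconsistency.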
0 votes
1 answer
25 views

How to generate synthetic text data for LLM fine-tuning?

Given a corpus of data like a log of Slack conversations, I want to be able to use it to generate fictitious conversations, e.g., given 10 conversations, I want to be able to scale this up to 100 ...
morpheus
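
For the question above, a minimal sketch of one common approach: few-shot prompting a chat model with the seed conversations and asking it for new ones in the same style. The OpenAI client is used only for illustration; the model name and prompt wording are assumptions:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    seed_conversations = ["A: deploy failed again\nB: rolling back now"]  # real Slack seeds go here

    prompt = (
        "Here are some example Slack conversations:\n\n"
        + "\n\n".join(seed_conversations)
        + "\n\nWrite 5 new fictitious conversations in the same style, "
          "with different participants and topics."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content)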
0 votes
0 answers
17 views

Do sinusoidal positional embeddings fundamentally limit context length?

In principle, it seems like we can generate endless sinusoidal positional embeddings. However, with respect to context window maximums, is there some limitation that sinusoidal positional embeddings ...
Victor M
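
For the question above, a minimal sketch of the sinusoidal encoding from "Attention Is All You Need": the formula itself extends to arbitrary positions, so any hard context limit comes from training length and attention cost rather than from the encoding (function name and shapes are illustrative; d_model is assumed even):

    import numpy as np

    def sinusoidal_positions(num_positions: int, d_model: int) -> np.ndarray:
        """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...)."""
        pos = np.arange(num_positions)[:, None]            # (P, 1)
        i = np.arange(d_model // 2)[None, :]               # (1, d/2)
        angles = pos / np.power(10000.0, 2 * i / d_model)  # (P, d/2)
        enc = np.zeros((num_positions, d_model))
        enc[:, 0::2] = np.sin(angles)
        enc[:, 1::2] = np.cos(angles)
        return enc

    print(sinusoidal_positions(100_000, 512).shape)  # works for arbitrarily long sequences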
0 votes
0 answers
14 views

Hugging Face model with large context window

I'm looking for a model that accepts input of around 50k characters plus a prompt, and answers a question based on that text. Is there something like that available? I'm not sure how to find it; I'm new to AI.
repo
0 votes
0 answers
25 views

How do causal and padding masks work in decoder-only models?

I am trying to implement a decoder-only model from scratch using PyTorch, but I am confused about how the masking works. From what I understand, when we have an encoder-decoder architecture, the padding ...
Nadya Koleva
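
For the question above, a minimal plain-PyTorch sketch (names are illustrative) of combining the two masks in a decoder-only model: position i may attend to position j only if j ≤ i and token j is not padding:

    import torch

    def combined_attention_mask(input_ids: torch.Tensor, pad_id: int) -> torch.Tensor:
        batch, seq_len = input_ids.shape
        causal = torch.tril(torch.ones(seq_len, seq_len)).bool()  # (T, T) lower-triangular
        not_pad = input_ids != pad_id                             # (B, T) real (key) tokens
        # (1, T, T) & (B, 1, T) -> (B, T, T); True means "allowed to attend"
        return causal.unsqueeze(0) & not_pad.unsqueeze(1)

    ids = torch.tensor([[5, 6, 7, 0, 0]])          # 0 assumed to be the pad id
    mask = combined_attention_mask(ids, pad_id=0)  # set masked-out scores to -inf before softmax
    print(mask[0].int())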
0 votes
0 answers
21 views

Efficient Matching of Sample Requests to Sample Offers Using Large Language Models

I want to discuss an interesting matching problem. We aim to match sample requests with corresponding sample offers. Here are some examples: Sample Requests: Need help installing Linux on my old ...
GGT
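
For the matching question above, a minimal sketch of a common baseline before reaching for a full LLM: embed requests and offers with a sentence encoder and rank by cosine similarity (the encoder name and the toy texts are assumptions):

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # example encoder

    requests = ["Need help installing Linux on my old laptop"]
    offers = ["I can assist with Ubuntu installations", "Offering guitar lessons"]

    req_emb = model.encode(requests, convert_to_tensor=True)
    off_emb = model.encode(offers, convert_to_tensor=True)

    scores = util.cos_sim(req_emb, off_emb)  # (num_requests, num_offers)
    best = scores.argmax(dim=1)
    print(offers[int(best[0])])              # best-matching offer for the first request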
0 votes
1 answer
36 views

What is the exact purpose of the input modulation gate in LSTMs?

Basically, I was learning about LSTMs and found that they are made up of three gates: the forget gate, the input gate, and the output gate. However, I came across some sources that state there is a fourth gate ...
MrIzzat
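
For the LSTM question above, the standard cell equations in their common formulation, given here as a reference; the $\tanh$ candidate $g_t$ is what some sources call the input modulation gate, counted as a fourth gate alongside forget, input, and output:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) && \text{(input modulation / candidate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t, \qquad h_t = o_t \odot \tanh(c_t)
\end{aligned}
$$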
1 vote
1 answer
54 views

How are perplexities over multiple instances aggregated?

The perplexity of the $i^{th}$ token in the $k^{th}$ sequence is $$ P_{ki} = \frac{1}{p(t_{ki})} $$ The perplexity aggregated for the $k^{th}$ sequence is then $$ P_{k} = \left(\prod_{i=1}^N P_{ki}\...
Borun Chowdhury
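
The excerpt above is cut off; under the common convention (stated here as an assumption, not as the asker's exact formula), both the per-sequence and the corpus-level perplexity are geometric means over tokens, i.e. token log-probabilities are pooled before exponentiating rather than averaging per-sequence perplexities arithmetically:

$$
P_k = \left(\prod_{i=1}^{N_k} P_{ki}\right)^{1/N_k},
\qquad
P_{\text{corpus}} = \exp\!\left(\frac{1}{\sum_k N_k}\sum_k \sum_{i=1}^{N_k} \log P_{ki}\right)
$$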
0 votes
2 answers
48 views

Classifier-Free-Guidance with Transformers

I'm working on music generation with transformers, using the decoder part for the audio tokens with text conditioning from the T5 encoder. In classifier-free guidance, the text conditioning is randomly ...
qmzp
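
For the question above, a minimal sketch of classifier-free guidance at sampling time (the model interface, the null conditioning, and the guidance scale are illustrative assumptions): run the decoder once with the T5 text conditioning and once with the dropped/null conditioning, then push the logits toward the conditional prediction:

    import torch

    def cfg_logits(model, audio_tokens, text_cond, null_cond, guidance_scale=3.0):
        # model(...) is assumed to return next-token logits given audio tokens
        # and an encoder conditioning sequence (e.g. T5 encoder outputs).
        cond = model(audio_tokens, encoder_hidden_states=text_cond)
        uncond = model(audio_tokens, encoder_hidden_states=null_cond)
        # guidance_scale = 1.0 recovers the purely conditional model.
        return uncond + guidance_scale * (cond - uncond)

    # During training, randomly replacing the text conditioning with null_cond
    # (e.g. ~10% of the time) is what makes the unconditional branch meaningful.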
0 votes
0 answers
37 views

Why do decoder-only models require left padding?

We used the Gemma 2B model for inference and tried both left and right padding. "Right" padding gives us a different answer compared to left padding. Why do we use left padding for decoder-only models, ...
Aamod Thakur
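
For the question above, a minimal sketch of the usual reason: generation continues from the last position of each row, so right padding would leave pad tokens between the prompt and the newly generated tokens (transformers API; the Gemma 2B checkpoint name follows the question):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
    tokenizer.padding_side = "left"  # with "right", pads would sit between prompt and continuation
    model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

    batch = tokenizer(
        ["a short prompt", "a much longer prompt than the first one"],
        return_tensors="pt",
        padding=True,
    )
    out = model.generate(**batch, max_new_tokens=20)
    print(tokenizer.batch_decode(out, skip_special_tokens=True))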
0 votes
0 answers
19 views

Why am I getting a different KV cache?

I took the SQuAD 2.0 dataset for running inference with the Gemma 2B model. When I provided the model with the 1st datapoint truncated to 36 tokens and the same datapoint truncated to 80 tokens, I got slightly ...
Aamod Thakur
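
For the question above, a minimal sketch of how to compare the caches directly (transformers API; the checkpoint name and the 36/80-token truncations follow the question, the context string is a placeholder). With causal attention, the cache entries for the shared 36-token prefix should agree up to floating-point and kernel nondeterminism, so small differences are expected:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("google/gemma-2b")
    model = AutoModelForCausalLM.from_pretrained("google/gemma-2b").eval()

    text = "..."  # placeholder: the SQuAD 2.0 context goes here
    short = tok(text, return_tensors="pt", truncation=True, max_length=36)
    long = tok(text, return_tensors="pt", truncation=True, max_length=80)

    with torch.no_grad():
        kv_short = model(**short, use_cache=True).past_key_values
        kv_long = model(**long, use_cache=True).past_key_values

    # Layer-0 keys over the shared 36-token prefix; shape (batch, heads, seq, head_dim).
    k_short = kv_short[0][0][..., :36, :]
    k_long = kv_long[0][0][..., :36, :]
    print((k_short - k_long).abs().max())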
4 votes
2 answers
56 views

Any popular diffusion model for language modeling?

Is there a popular diffusion-model-based framework for language modelling? If not, is it because of the difficulty of sampling from discrete distributions?
Ayan Sengupta
