Questions tagged [n-gram]


An N-gram is an ordered sequence of N elements of the same kind, usually drawn from a large collection of many other similar N-grams. The individual elements are commonly natural language words, though N-grams have been applied to many other data types, such as numbers, letters, nucleotides in DNA sequences, etc. Statistical N-gram analysis is commonly performed as part of natural language processing, bioinformatics, and information theory.
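As a rough illustration of the definition above, here is a minimal Python sketch that slides a window of size N over a sequence and counts the resulting tuples. The helper name `ngrams` and the sample inputs are made up for this example; they do not come from any of the questions listed below.

```python
from collections import Counter
from typing import Sequence


def ngrams(items: Sequence, n: int) -> list[tuple]:
    """Return every run of n consecutive items as a tuple, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L yields L - n + 1 n-grams (none if L < n).
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Word bigrams, with their frequencies (hypothetical sample sentence).
words = "the cat sat on the cat".split()
bigram_counts = Counter(ngrams(words, 2))

# Character trigrams of a single word work the same way.
char_trigrams = ngrams("table", 3)
```

The same function handles words, characters, or any other sliceable sequence, which is why the element type is left open.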

880 questions

0 votes

1 answer

66 views

How to save n-gram output

A hopefully simple question. How can I save the ngram output from the following code? library("quanteda") ## Package version: 2.1.2 data(data_corpus_inaugural) toks <- ...

bgreen

asked Jul 11 at 23:36

0 votes

1 answer

36 views

letter and bigram composition for each word in the dataframe

I have a data frame with words and I want to extract the letter and bigram composition for each word. Data: df$text [1] "table" [2] "run" [3] "mug" And in the end I ...

Oksana Ts.

asked Jun 18 at 6:06

1 vote

1 answer

38 views

How do I determine the weight? depending on what?

I'm trying to calculate the n-gram using Python. The weight I used for uni-gram, bi-gram, tri-gram, and 4-gram is (0.25, 0.25, 0, 0). When I run the script for the first reference it gives me a ...

user20003920

asked Jun 14 at 19:22

0 votes

0 answers

18 views

How to calculate the frequency of bigrams on fixed size windows

I am computing the frequency of bigrams given a list of token files tokenized_corpus = ['tokens_A.pickle', 'tokens_B.pickle', ...] where every tokens_X file unpickles as ['x', 'a', 'b', 'a', 'b', 'd', ...

Mustafa

asked May 7 at 5:24

1 vote

0 answers

32 views

Better performance and results for autocomplete search edge_ngram or search_as_you_type elasticsearch

I was testing and researching the use of edge_ngrams and the search_as_you_type field in Elasticsearch to improve search results, but I see that they are very similar and I would like to know ...

Andry Hernandez

asked Apr 25 at 3:40

0 votes

0 answers

20 views

How to find pmi and phrase-count for everygrams?

Using NLTK's library I can find metrics about bigrams and trigrams. Now I want to find all the possible phrases and find their occurrence count and PMI score as I did with the bigrams and trigrams like ...

98fly

asked Apr 17 at 1:20

0 votes

0 answers

27 views

Bitextor/Bicleaner MAX_ORDER Issue

I am trying to analyze a translation file (with English-French sentence pairs) using Bicleaner (https://github.com/bitextor/bicleaner). I have a "test corpus" with ten sentence pairs ...

DevNoob_21

asked Mar 26 at 11:18

0 votes

0 answers

48 views

String Matching Function Not Matching Strings Despite Threshold Set to 0

I have implemented a string matching function in Python utilizing n-grams and similarity ratios. The function signature is as follows: # concise version of the function def match_strings(...

NIDHI SHASTRY

asked Feb 26 at 16:06

-2 votes

1 answer

52 views

Incorporating Phone Number Matching into Existing String based Name Matching Function

I have a Python function, match_strings, which is designed to match names from two different data sources. Here is the function definition: python def match_strings(strings1, strings2, ngram_n=2, ...

Rahul T

asked Feb 20 at 7:27

0 votes

0 answers

12 views

Ideal number of <BOS> tags in N-gram Language Model

Let us assume there is a sentence "There is a monkey". Now, let us try to create trigrams after appending Beginning of String and End of String (<BOS>, <EOS>) tags to the string. ...

Anant Kumar

asked Jan 30 at 4:39

1 vote

1 answer

91 views

How to count char tuples efficiently in PHP

I need to quickly count char tuples (or N-grams) in huge files/strings (from 10MB+ up to 1GB+) within a PHP project (a file classifier). The current implementation is made for single characters count (N=...

Crypto

asked Jan 3 at 11:15

0 votes

1 answer

62 views

How does elasticsearch count tf-idf? That looks weird

I have an index with documents that store system information and searchable fields that are copied into the searchable_keys field. In this case, there is only one such field - name. Here's the definition ...

Prosto_Oleg

asked Dec 14, 2023 at 11:23

0 votes

0 answers

26 views

BERTopic n-gram phrases are not adjacent to each other

The ngram_range parameter of BERTopic is outputting n-grams with words far away from each other. After setting ngram_range=(2,2), the trained BERTopic model generates topics with 2-gram phrases such as ...

David

asked Dec 7, 2023 at 5:34

0 votes

2 answers

212 views

Python IntelliJ style 'search everywhere' algorithm

I have a list of file names in python like this: HelloWorld.csv hello_windsor.pdf some_file_i_need.jpg san_fransisco.png Another.file.txt A file name.rar I am looking for an IntelliJ style search ...

Adam Griffiths

asked Nov 23, 2023 at 10:19

1 vote

1 answer

135 views

bigram calculation - Memory error, large file problem

Here is a code for bigram calculation from the text corpus: import sys import csv import string import nltk from nltk import word_tokenize from nltk.tokenize import RegexpTokenizer from nltk.util ...

XTRUST.ORG

asked Nov 22, 2023 at 19:42
