
Questions tagged [corpus-linguistics]

The tag has no usage guidance.

1 vote
0 answers
16 views

Quantitative/Statistical comparison between unequal corpora

I have created a corpus of 400,000 words, consisting exclusively of governmental administrative documents. I am focusing on the usage of rare words and I want to prove that my corpus has increased ...
vigilantius22
0 votes
0 answers
13 views

Given variable A and B containing data of lemma sentiments, what is the correct term for the variable containing average of var A and var B?

I have a data visualization, showing the sentiment of two lemmas "гей" (var a) and "трансгендер" (var b) in a news corpus throughout the year. Here is the dataframe sample of my ...
pindakazen
1 vote
1 answer
35 views

Can log2 be substituted with ln in logDice association measure?

I am currently doing collocational analysis in the Russian National Corpus, to be precise the Russian national news subcorpus, to see what the most significant collocates of the lemma "gay" ...
pindakazen
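
A minimal sketch of what the question above is asking, assuming the commonly cited definition logDice = 14 + log2(2·f_xy / (f_x + f_y)). The function name `log_dice`, the collocate labels, and all counts are invented for illustration and are not taken from the post. Because the logarithm is monotone in any base, swapping log2 for ln should leave the ranking of collocates unchanged; what changes is the scale, so "one point higher" no longer corresponds to a doubling of the Dice coefficient.

```python
import math

def log_dice(f_xy, f_x, f_y, log=math.log2):
    """logDice with a pluggable logarithm; base 2 is the usual definition."""
    return 14 + log(2 * f_xy / (f_x + f_y))

# Hypothetical counts: (co-occurrence, node frequency, collocate frequency).
candidates = {
    "collocate_a": (30, 1000, 200),
    "collocate_b": (5, 1000, 50),
    "collocate_c": (120, 1000, 9000),
}

def ranking(log):
    """Collocates sorted from strongest to weakest under the given log base."""
    return sorted(
        candidates,
        key=lambda w: log_dice(*candidates[w], log=log),
        reverse=True,
    )

rank_log2 = ranking(math.log2)  # standard logDice
rank_ln = ranking(math.log)     # natural-log variant

print(rank_log2 == rank_ln)     # same ordering under both bases
for word, counts in candidates.items():
    print(word, log_dice(*counts), log_dice(*counts, log=math.log))
```

The two variants differ only by the constant factor ln 2 on the logarithmic term, so scores computed with ln are not directly comparable with published logDice values.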
1 vote
1 answer
40 views

Poisson regressions in ratio: why is the counterpart not significant?

I am not a statistician and have limited knowledge about the underlying mathematics behind models but I am curious about something I found. I have count data, something like this: out of 150 words in ...
Giuseppe Magistro
2 votes
2 answers
88 views

Is a binomial logistic regression valid in this case, and how do I use it / interpret its results?

I am facing the unusual problem that my $p$ values are too good. They are so good that I must be doing something wrong, but I don't know what. I am working with natural language data from a text ...
Keelan
1 vote
0 answers
11 views

Do I need to normalize corpus frequency if I am not comparing between corpora?

Do I need to normalize my corpus frequency, given that I am not comparing corpora? For example, if I am going to compare collocations of lexicons A, B and C in a corpus with 13 million tokens, can I ...
0 votes
0 answers
8 views

(Quantitative) research methods for collocation analysis in corpus linguistics?

I am trying to conduct a collocation analysis. This is corpus linguistics research, which requires me to test statistical significance using Mutual Information, as well as frequency normalization ...
1 vote
0 answers
31 views

Does Mutual Information by the power of 3 for the numerator really exist?

I am currently trying to measure collocational strength using Mutual Information (MI). MI gives an edge for exclusive and infrequent words. As stated by Brezina (2018), in measuring collocational ...
0 votes
0 answers
6 views

What is the best statistical measurement for collocational analysis?

I am a beginner in statistics and corpus linguistics, so sorry in advance if my explanation has gaps in it. I want to perform a collocational strength analysis on a given corpus,...
2 votes
1 answer
31 views

Why use log per-million count when analyzing corpora?

This might be a trivial question for you, but please bear with me as I don't have a background in statistics. I am curious about corpus linguistics, and especially in this case how corpora are ...
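
As a rough illustration of the quantity the question above refers to, here is a sketch of per-million normalization followed by a log transform. The helper name `per_million`, the counts, and the corpus sizes are hypothetical. The idea is that raw counts from corpora of different sizes are put on a common scale, and the logarithm compresses the heavily skewed range of word frequencies.

```python
import math

def per_million(count, corpus_size):
    """Relative frequency expressed as occurrences per million tokens."""
    return count * 1_000_000 / corpus_size

# Hypothetical raw counts of the same word in a small and a large corpus.
small = per_million(52, 400_000)
large = per_million(1_690, 13_000_000)

# The raw counts differ by a factor of more than 30,
# but the normalized rates can be compared directly.
print(small, large)

# Log-transformed rates, as often used for plotting or modelling.
print(math.log10(small), math.log10(large))
```

A word with a count of zero has no finite logarithm, so in practice some smoothing constant is usually added before the transform; which constant to use is a separate modelling choice.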
3 votes
3 answers
506 views

Countering t-test "any feature is significant" results for large sample size datasets

I'm doing some analysis over natural language data, which basically entails: Computing some feature over all samples. Evaluating if this feature statistically significantly discriminates between ...
Andre Ye
0 votes
0 answers
7 views

The right test for word usage in corpus linguistics

I need to find the right statistical test to use here: the words on the left are prepositions, and the groups are beginner, intermediate, and advanced speakers of Swedish. My hypothesis is that ...
Aaron Hernandez
0 votes
1 answer
133 views

Estimating exponent of Zipf distribution using MLE vs fitting linear regression on log-transformed rank and frequency data

I'm having trouble understanding why I get radically different results if I try to find the parameter of a Zipf distribution when I use the methods proposed by Clauset et al. (2009) as opposed to ...
MarcoLin8
0 votes
0 answers
91 views

predicting the value of a nominal variable from the value of a ratio scale variable

I am doing a corpus study on the influence of subject length on the choice between SVO and VSO word order. To calculate that, I am using a generalized linear model (glm). glm(formula = WordOrder ~ ...
1 vote
1 answer
71 views

How do I tell if word frequencies are changing over time?

I have a collection of texts that span about 1000 years. I am interested in the frequency of a particular word in these texts. Specifically, I want to know whether the frequency of the word increased ...
Namenlos
