Questions tagged [corpus-linguistics]
The corpus-linguistics tag has no usage guidance.
28
questions
1
vote
0
answers
16
views
Quantitative/Statistical comparison between unequal corpora
I have created a corpus of 400,000 words, consisting exclusively of governmental administrative documents. I am focusing on the usage of rare words and I want to prove that my corpus has increased ...
0
votes
0
answers
13
views
Given variables A and B containing lemma sentiment data, what is the correct term for the variable containing the average of A and B?
I have a data visualization showing the sentiment of two lemmas, "гей" (var a) and "трансгендер" (var b), in a news corpus throughout the year.
Here is the dataframe sample of my ...
1
vote
1
answer
35
views
Can log2 be substituted with ln in logDice association measure?
I am currently doing collocational analysis in the Russian National Corpus, specifically the Russian national news subcorpus, to see which are the most significant collocates of the lemma "gay"...
1
vote
1
answer
40
views
Poisson regressions in ratio: why is the counterpart not significant?
I am not a statistician and have limited knowledge of the mathematics underlying the models, but I am curious about something I found. I have count data, something like this: out of 150 words in ...
2
votes
2
answers
88
views
Is a binomial logistic regression valid in this case, and how do I use it / interpret its results?
I am facing the unusual problem that my $p$ values are too good. They are so good that I must be doing something wrong, but I don't know what.
I am working with natural language data from a text ...
1
vote
0
answers
11
views
Do I need to normalize corpus frequency if I am not comparing between corpora?
Do I need to normalize my corpus frequency, given that I am not comparing corpora? For example, if I am going to compare collocations of lexicons A, B and C in a corpus with 13 million tokens, can I ...
0
votes
0
answers
8
views
(Quantitative) research methods for collocation analysis in corpus linguistics?
I am trying to conduct a collocation analysis. This is corpus linguistics research, and doing so requires me to test statistical significance using Mutual Information, as well as frequency normalization ...
1
vote
0
answers
31
views
Does Mutual Information with the numerator raised to the power of 3 really exist?
I am currently trying to measure collocational strength using Mutual Information (MI). MI favors exclusive and infrequent words. As stated by Brezina (2018), in measuring collocational ...
0
votes
0
answers
6
views
What is the best statistical measurement for collocational analysis?
I am a beginner with statistics and corpus linguistics, so sorry in advance if my explanation has gaps in it.
So I want to perform corpus linguistics collocational strength analysis on a given corpus,...
2
votes
1
answer
31
views
Why use log per-million count when analyzing corpora?
This might be a trivial question for you, but please bear with me, as I don't have a background in statistics. I am curious about corpus linguistics, and especially in this case how corpora are ...
3
votes
3
answers
506
views
Countering t-test "any feature is significant" results for large sample size datasets
I'm doing some analysis over natural language data, which basically entails:
Computing some feature over all samples.
Evaluating if this feature statistically significantly discriminates between ...
0
votes
0
answers
7
views
The right test for word usage in corpus linguistics
I need to find the right statistical test to use here:
the words on the left are prepositions.
the groups are beginner, intermediate, and advanced-level speakers of Swedish
My hypothesis is that ...
0
votes
1
answer
133
views
Estimating exponent of Zipf distribution using MLE vs fitting linear regression on log-transformed rank and frequency data
I'm having trouble understanding why I get radically different results if I try to find the parameter of a Zipf distribution when I use the methods proposed by Clauset et al. (2009) as opposed to ...
0
votes
0
answers
91
views
Predicting the value of a nominal variable from the value of a ratio-scale variable
I am doing a corpus study on the influence of subject length on the choice between SVO and VSO word orders. To calculate that, I am using a generalized linear model (glm).
glm(formula = WordOrder ~ ...
1
vote
1
answer
71
views
How do I tell if word frequencies are changing over time?
I have a collection of texts that span about 1000 years. I am interested in the frequency of a particular word in these texts. Specifically, I want to know whether the frequency of the word increased ...