Questions tagged [distance]
Measure of distance between distributions or variables, such as Euclidean distance between points in n-space.
742
questions
0
votes
0
answers
17
views
GLMM with cumulative distances
I have calculated the cumulative distances travelled for each individual. The objective is to plot the data with a trend curve using the R package ggplot2. However, the data are too ...
2
votes
1
answer
32
views
Analytical asymptotic approximation of the expected maximum, mean, and minimum nearest-neighbour distance in the unit ball
Say I uniformly at random distribute $x = n^3$ (independent identically distributed) points in a ball of radius $r=1$ in $\mathbb{R}^3$.
What can be said about the expected maximum, minimum, and mean ...
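A Monte Carlo check can complement an analytical approximation for the setup described in this excerpt. The sketch below is only an illustration under stated assumptions: the names `sample_unit_ball`, `nearest_neighbour_distances`, and `simulate` are hypothetical, `n_points` stands for the total number of points (the excerpt's $x = n^3$), and the brute-force search is only practical for small point counts.

```python
import math
import random


def sample_unit_ball(n_points, rng):
    """Draw points uniformly from the unit ball in R^3 by rejection sampling."""
    points = []
    while len(points) < n_points:
        p = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
        # Keep only candidates from the enclosing cube that fall inside the ball.
        if p[0] ** 2 + p[1] ** 2 + p[2] ** 2 <= 1.0:
            points.append(p)
    return points


def nearest_neighbour_distances(points):
    """Brute-force O(N^2) distance from each point to its nearest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def simulate(n_points, n_trials=20, seed=0):
    """Average the min, mean, and max nearest-neighbour distance over trials."""
    rng = random.Random(seed)
    mins, means, maxs = [], [], []
    for _ in range(n_trials):
        d = nearest_neighbour_distances(sample_unit_ball(n_points, rng))
        mins.append(min(d))
        means.append(sum(d) / len(d))
        maxs.append(max(d))

    def average(values):
        return sum(values) / len(values)

    return average(mins), average(means), average(maxs)
```

Calling `simulate(n_points)` for increasing `n_points` gives empirical estimates that a proposed asymptotic formula could be compared against.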
0
votes
0
answers
28
views
Wasserstein distance to assess the degree of normality
The Wasserstein distance between two probability measures with quantile functions $F^{-1}$ and $G^{-1}$ is given by
\begin{align}
W(F,G) = \int_{[0,1]} |F^{-1}(t) - G^{-1}(t)|dt
\end{align}
Now let's ...
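For two empirical distributions with the same number of observations, the quantile functions in the formula above are step functions that jump at multiples of $1/n$, so the integral reduces to the mean absolute difference between the sorted samples. A minimal sketch under that equal-sample-size assumption (the function name `wasserstein_1d` is hypothetical):

```python
def wasserstein_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-size samples.

    With equal sample sizes, the integral of |F^{-1}(t) - G^{-1}(t)| over
    [0, 1] equals the mean absolute difference of the order statistics.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    if len(x) == 0:
        raise ValueError("samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)
```

To use this as a rough normality check, `y` could be a sample drawn from a fitted normal distribution, or a set of normal quantiles evaluated at the same plotting positions as `x`.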
0
votes
0
answers
21
views
Looking for a suitable way to find groups of events
I have an Excel file with three columns. The first is the name of an event, the second is the time at which the event starts, and the third is the time at which it ends. Let's say ...
1
vote
1
answer
24
views
After matching: How do I interpret the value of the type ‘distance’ (= propensity score) in the balance measures table produced by bal.tab in the R package cobalt?
I have used the R package ‘MatchIt’ to perform (1) a nearest neighbour propensity score matching (NNM) based on the Framingham Heart Study and (2) for comparison, an optimal PS matching (OM) for the ...
1
vote
1
answer
58
views
Hypothesis testing by asymptotic distribution
Consider the following hypothesis testing problem:
under $H_0$: $(X_1,\cdots,X_n) \sim P_n,$
under $H_1$: $(X_1,\cdots,X_n) \sim Q_n.$
We want to show that the minimum testing error goes to zero when $...
1
vote
1
answer
33
views
Distance between two multivariate distributions
I have data structured as follows: 2 years -> several metrics per year (let's say 2000) -> several measurements per metric (let's say 1000 for year 1 and 800 for year 2). So I have a 2000x1000 ...
1
vote
1
answer
45
views
Hierarchical clustering of a distance matrix with element weights
I am computing a hierarchical clustering of some geospatial data. I need to add in an element weighting to the approach.
My current approach is:
I compute temporal cross-correlations between my N ...
2
votes
1
answer
44
views
What are downsides to "genetic matching," particularly outside of causal inference settings?
Multivariate matching methods typically involve two steps. First the user computes $D$, a matrix of the multivariate distances between units. Second, the user applies a matching function (e.g., 1:1 ...
1
vote
0
answers
71
views
Understanding $\chi^2$ in plain language [closed]
Suppose I was trying to explain what $\chi^2$ is and why it's important to my grandma. I want to give core intuition to this formula:
$$\chi^2 = \sum_{i}\frac{(O_i-E_i)^2}{E_i}$$
I would tell her ...
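A small worked example can make the formula above concrete: each term measures how far an observed count is from its expected count, scaled by the expected count. This is a minimal sketch; the function name `chi_square_statistic` and the example counts are made up for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 60 draws over three categories, 20 expected in each.
statistic = chi_square_statistic([10, 20, 30], [20, 20, 20])
print(statistic)  # 100/20 + 0/20 + 100/20 = 10.0
```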
1
vote
0
answers
48
views
Why not use the $L^2$ norm as the distance between two probability distributions (as opposed to KL divergence and others)? [closed]
So I was wondering why not just use:
$$dist(p,q)=\bigg(\int_{x \in X} |p(x)-q(x)|^2 dx\bigg)^{1/2}$$ instead of the commonly used KL-Divergence, which isn't even a distance measure and therefore not ...
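The contrast raised in this excerpt can be illustrated on a finite support, where the integral becomes a sum. This is a sketch under that discretisation assumption; the function names `l2_distance` and `kl_divergence` are hypothetical. It shows that the $L^2$ distance is symmetric while the KL divergence generally is not.

```python
import math


def l2_distance(p, q):
    """Discrete analogue of the L2 distance between two probability vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.25, 0.75]

print(l2_distance(p, q) == l2_distance(q, p))      # symmetric
print(kl_divergence(p, q), kl_divergence(q, p))    # the two values differ
```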
0
votes
0
answers
10
views
Distance to find similar samples in a multivariate dataset
Apart from the Euclidean and Mahalanobis distance metrics, is there a way to find the samples that are similar to a given sample with multivariate values?
Does KNN clustering find the ...
0
votes
0
answers
85
views
PCA and Gower's Distance
I have a dataset with nutrient information about different ingredients. There are a total of 70 nutrients (numeric features) and 3 categorical features for a total of around 550 ingredients. I am ...
0
votes
0
answers
24
views
Is it possible to convert a similarity (distance) index to correlation coefficient?
I have two cases, each of which has values on a series of variables (e.g. A, B & C). Is it possible to calculate a distance or similarity index between these two cases and then convert it to a ...
0
votes
0
answers
18
views
Distance metric for dummy and continuous variables
I'm trying to apply the KNN regression model to the data I have at my disposal which contains one dummy variable and two continuous variables (which I have normalized). I was wondering if it is okay ...