Questions tagged [clustering]
Clustering is grouping (partitioning) a set of objects so that items in the same group are more similar to each other than to items in different groups, where the notion of similarity may be variously defined.
324
questions
0
votes
1
answer
19
views
Clustering for a real problem - location matters!
I am working on a clustering problem and need some help to develop an appropriate mathematical model. Here are the details of my problem:
Locations: I have a set of 141 locations, each defined by ...
0
votes
0
answers
16
views
Clustering a sequence of Bernoulli random variables
Let $Z_1$, ..., $Z_n$ be a sequence of independent Bernoulli random variables such that
for all $i\in\left\{1,..,n\right\}$ $Z_i\sim\mathcal{B}(p_i)$ where $p_i < 1/2$.
Define $\ell(x_{1:n}, y_{1:n}...
0
votes
0
answers
5
views
Splitting upon insertion in hierarchical clustering
It's my understanding that, upon the insertion of a new element, complete-link hierarchical clustering can lead to the splitting of a cluster so as to maintain its "spherical compactness". Do ...
0
votes
0
answers
9
views
Quantifying the distance between two discrete fuzzy sets
I am looking to use fuzzy sets to represent several collections of data points. Then, given a crisp set, I'd like to determine which collection the crisp set is most similar to.
Each collection is ...
0
votes
0
answers
30
views
Spectral Clustering: Finding the normalized minimum cut using the Laplacian
I am trying to prove that finding the minimum $Ncut(A,B)$ for an edge-weighted graph $W$ with the diagonal matrix of edge degrees $D$ is equivalent to solving for $f \in \{a,b\}^n$ with the constraint that $...
1
vote
0
answers
18
views
Maximum number of local minima in k-means
Suppose $\mathcal{Z} = \{z_1, \dots, z_n\}$ is the set of points in $d$-dimensional Euclidean space. The aim is to partition the dataset into $(K\leq n)$ distinct clusters $R_1,\dots, R_K$ where $R_i\...
0
votes
0
answers
13
views
Metrics for document clustering with measure of synonyms
I asked this question on Data Science stack exchange, but didn't get any responses there.
I have a (finite) vocabulary which is a metric space, where the metric measures how antonymous the words are. ...
3
votes
1
answer
222
views
Why do randomly drawn numbers tend to repeat themselves?
I track the behavior of random numbers and I have discovered that once a number appears, it tends to reappear again shortly thereafter. For example, I've been tracking the Red Powerball in the ...
1
vote
0
answers
29
views
References for a statistics question relating to clustering
I am interested in references for the following research topic. It was mentioned to me that this may be a classically studied question, but I'm unsure what line of work or references to begin looking ...
0
votes
0
answers
23
views
Notation for clusters of 2D data points
Is there any convention about the notation to use for clusters of $2$-D data points?
I have a set of clusters of $2$-D data points. I can denote each cluster with $c_i$, where $i = 1, 2, ..., n$, and $...
1
vote
1
answer
39
views
Derivation of a function - GBM
Why does the sum disappear in this derivation:
derivation of the Mean Squared Error loss. It comes from the following Wikipedia page: https://en.wikipedia.org/wiki/Gradient_boosting. It is the last ...
0
votes
0
answers
14
views
Eigenvectors corresponding to eigenvalue 1 in the Normalized Laplacian - Why does it represent clusters?
Consider the Normalized Laplacian associated to a similarity graph
$$
L = D^{-1/2}SD^{-1/2}
$$
I have two sources stating that, in the "ideal case of zero noise", the eigenvectors ...
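A minimal sketch of one way to read the "ideal case of zero noise", assuming it means that the similarity matrix $S$ is block diagonal (no similarity between clusters). Under that assumption, for each cluster $C$ the vector $D^{1/2}\mathbf{1}_C$ satisfies $Lv = v$ for the matrix $L$ above. The matrix `S`, the list `clusters`, and the helper names are made up for illustration.

```python
import math

# Hypothetical block-diagonal similarity matrix: two clusters, {0, 1} and {2, 3, 4},
# with no similarity across clusters (the assumed "zero noise" case).
S = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 2.0, 0.0, 3.0],
    [0.0, 0.0, 1.0, 3.0, 0.0],
]
clusters = [[0, 1], [2, 3, 4]]


def normalized_matrix(S):
    """Return (D^{-1/2} S D^{-1/2}, degrees), where D is the diagonal of row sums."""
    deg = [sum(row) for row in S]
    n = len(S)
    L = [[S[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    return L, deg


def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def cluster_vector(cluster, deg):
    """D^{1/2} applied to the indicator vector of one cluster."""
    members = set(cluster)
    return [math.sqrt(d) if i in members else 0.0 for i, d in enumerate(deg)]


L, deg = normalized_matrix(S)

# For each cluster, measure how far L v is from v (it should be rounding error only).
residuals = []
for c in clusters:
    v = cluster_vector(c, deg)
    Lv = matvec(L, v)
    residuals.append(max(abs(a - b) for a, b in zip(Lv, v)))

print(residuals)
```

Each such vector is supported on exactly one cluster, which is one way to see why the eigenvalue-1 eigenvectors can encode cluster membership in this idealized setting.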
0
votes
0
answers
16
views
Minimizing Earth Mover's Distance
So I have a discretized magnitude spectrum $S \in \mathbb{R}^n$ ($n$ number of bins), and a set of frequencies $f_1, f_2, ..., f_m$ (not necessarily corresponding to any of the discretized bin ...
1
vote
0
answers
379
views
What is the correct formula for Within Cluster Sum of Squares
I am studying clustering with the K-Means algorithm and I stumbled on the "inertia", or "within cluster sum of squares" part. First I would appreciate it if anyone could explain to me ...
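For reference, a minimal sketch of one common definition of the within-cluster sum of squares: the sum, over all points, of the squared Euclidean distance to the centroid (mean) of the cluster the point is assigned to. This is an assumption about which formula is meant, since the excerpt is truncated. The function name `wcss` and the sample data are hypothetical.

```python
def wcss(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    # Group points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster's points.
        centroid = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        # Add the squared distance of every member to that centroid.
        total += sum((p[j] - centroid[j]) ** 2 for p in members for j in range(dim))
    return total


# Made-up data: two clusters of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]

print(wcss(points, labels))  # 2.0 from cluster 0 plus 8.0 from cluster 1 = 10.0
```

Some texts instead write the quantity as a sum of pairwise squared distances within each cluster, scaled by the cluster size; that form differs from the one above only by a constant factor per cluster, which may be the source of the confusion.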
1
vote
0
answers
96
views
Modeling a similarity measure between numbers based on predictive probability
Suppose I'm trying to predict a number $v_p \in \mathbb{R}$ and, thanks to sampling, I know that the prediction $v_p=a$ is true in $P(v_p)=P(a)$ percent of cases. In other words, $P(a)$ percent of the ...