Questions tagged [clustering]

Clustering is grouping (partitioning) a set of objects so that items in the same group are more similar to each other than to items in different groups, where the notion of similarity may be variously defined.

0 votes
1 answer
19 views

Clustering for a real problem - location matters!

I am working on a clustering problem and need some help to develop an appropriate mathematical model. Here are the details of my problem: Locations: I have a set of 141 locations, each defined by ...
juasmilla's user avatar
  • 101
0 votes
0 answers
16 views

Clustering a sequence of Bernoulli random variables

Let $Z_1$, ..., $Z_n$ be a sequence of independent Bernoulli random variables such that for all $i\in\left\{1,..,n\right\}$ $Z_i\sim\mathcal{B}(p_i)$ where $p_i < 1/2$. Define $\ell(x_{1:n}, y_{1:n}...
Ibra's user avatar
  • 175
0 votes
0 answers
5 views

Splitting upon insertion in hierarchical clustering

It's my understanding that, upon the insertion of a new element, complete-link hierarchical clustering can lead to the splitting of a cluster so as to maintain its "spherical compactness". Do ...
Tfovid's user avatar
  • 153
0 votes
0 answers
9 views

Quantifying the distance between two discrete fuzzy sets

I am looking to use fuzzy sets to represent several collections of data points. Then, given a crisp set, I'd like to determine which collection the crisp set is most similar to. Each collection is ...
Alex 's user avatar
0 votes
0 answers
30 views

Spectral Clustering: Finding the normalized minimum cut using the laplacian

I am trying to prove that finding the minimum $Ncut(A,B)$ for an edge-weighted graph $W$ with the diagonal degree matrix $D$ is equivalent to solving for $f \in \{a,b\}^n$ with the constraint that $...
bluesquare's user avatar
1 vote
0 answers
18 views

Maximum number of local minima in k-means

Suppose $\mathcal{Z} = \{z_1, \dots, z_n\}$ is the set of points in $d$-dimensional Euclidean space. The aim is to partition the dataset into $(K\leq n)$ distinct clusters $R_1,\dots, R_K$ where $R_i\...
entropy's user avatar
  • 147
0 votes
0 answers
13 views

Metrics for document clustering with measure of synonyms

I asked this question on Data Science stack exchange, but didn't get any responses there. I have a (finite) vocabulary which is a metric space, where the metric measures how antonymous the words are. ...
3 votes
1 answer
222 views

Why do randomly drawn numbers tend to repeat themselves?

I track the behavior of random numbers and I have discovered that once a number appears, it tends to reappear shortly thereafter. For example, I've been tracking the Red Powerball in the ...
steveK's user avatar
  • 137
1 vote
0 answers
29 views

References for a statistics question relating to clustering

I am interested in references for the following research topic. It was mentioned to me that this may be a classically studied question, but I'm unsure what line of work or references to begin looking ...
spectrum's user avatar
0 votes
0 answers
23 views

notation for clusters of 2D data points

Is there any convention about the notation to use for clusters of $2$-D data points? I have a set of clusters of $2$-D data points. I can denote each cluster with $c_i$, where $i = 1, 2, ..., n$, and $...
Ommo's user avatar
  • 349
1 vote
1 answer
39 views

Derivation of a function - GBM

Why does the sum disappear in this derivation: derivation of the Mean Squared Error loss. It comes from the following Wikipedia page: https://en.wikipedia.org/wiki/Gradient_boosting. It is the last ...
F.I.'s user avatar
  • 15
0 votes
0 answers
14 views

Eigenvectors corresponding to eigenvalue 1 in the Normalized Laplacian - Why does it represent clusters?

Consider the Normalized Laplacian associated to a similarity graph $$ L = D^{-1/2}SD^{-1/2} $$ I have two sources stating that, in the "ideal case of zero noise", the eigenvectors ...
ygh's user avatar
  • 121
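
A small numerical sketch may help make the "ideal case of zero noise" concrete. It uses the matrix exactly as written in the excerpt, $L = D^{-1/2}SD^{-1/2}$, and assumes a hypothetical toy similarity matrix `S` with two disconnected groups (the values are made up for illustration, not taken from the question). In that setting the eigenvalue 1 appears once per group, and the rows of the matching eigenvectors coincide within a group.

```python
import numpy as np

# Hypothetical toy similarity matrix: points 0-2 form one group,
# points 3-4 form another, with no similarity across groups.
S = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 2.0, 0.0],
])

# Degree of each node and the matrix L = D^{-1/2} S D^{-1/2} from the excerpt.
degrees = S.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(degrees))
L = D_inv_sqrt @ S @ D_inv_sqrt

# L is symmetric, so eigh applies; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(L)

# Keep the eigenvectors whose eigenvalue is (numerically) 1.
top = eigvecs[:, np.isclose(eigvals, 1.0)]

print("eigenvalues:", np.round(eigvals, 6))
print("rows of the eigenvalue-1 eigenvectors:")
print(np.round(top, 6))
```

The solver is free to return any orthonormal basis of the eigenvalue-1 eigenspace, so the individual columns need not be indicator vectors. What stays the same is that rows belonging to the same group are identical and rows from different groups are orthogonal, which is what a subsequent k-means step on the rows would exploit.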
0 votes
0 answers
16 views

minimizing Earth Mover Distance

So I have a discretized magnitude spectrum $S \in \mathbb{R}^n$ ($n$ number of bins), and a set of frequencies $f_1, f_2, ..., f_m$ (not necessarily corresponding to any of the discretized bin ...
SmoothKen's user avatar
  • 429
1 vote
0 answers
379 views

What is the correct formula for Within Cluster Sum of Squares

I am studying clustering with the K-Means algorithm and I stumbled on the "inertia", or "within cluster sum of squares" part. First I would appreciate if anyone could explain to me ...
Artur Juan Dantas's user avatar
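
For reference, a minimal sketch of one common definition of the within-cluster sum of squares: the sum, over clusters, of squared Euclidean distances from each point to its cluster's centroid, $\sum_k \sum_{x \in C_k} \lVert x - \mu_k \rVert^2$. The function name `wcss` and the sample data are assumptions made for illustration; the excerpt is truncated, so this may not be the exact variant the question asks about.

```python
import numpy as np

def wcss(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        members = points[labels == k]       # all points assigned to cluster k
        centroid = members.mean(axis=0)     # centroid = coordinate-wise mean
        total += ((members - centroid) ** 2).sum()
    return total

# Hypothetical data: two clusters of two points each.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 4.0]]
labels = [0, 0, 1, 1]

result = wcss(points, labels)
print(result)
```

Here the centroids are $(0, 1)$ and $(10, 2)$, so the clusters contribute $1 + 1$ and $4 + 4$ respectively. Note that the distances are squared but not averaged; dividing by cluster size would give a different quantity.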
1 vote
0 answers
96 views

Modeling a similarity measure between numbers based on predictive probability

Suppose I'm trying to predict a number $v_p \in \mathbb{R}$ and, thanks to sampling, I know that the prediction $v_p=a$ is true in $P(v_p)=P(a)$ percent of cases. In other words, $P(a)$ percent of the ...
Ben W's user avatar
  • 31
