
All Questions

2 votes · 1 answer · 34 views

Grouping similar classes to improve accuracy, whilst maximising the number of classes

Suppose I have a large number of distinct classes, some of which are related. My model has high classification accuracy for some classes, whilst other classes are hard to predict. How could I group ...
MuhammedYunus
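One possible heuristic is sketched below. It is an illustration, not necessarily what the answer to this question suggests. The idea is to repeatedly merge the pair of classes the model confuses most often, as read off a confusion matrix, until accuracy on the merged classes reaches a target. The function names, the toy sparrow/finch/dog matrix and the 0.95 target are all hypothetical. Greedy merging is also not guaranteed to keep the largest possible number of classes.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]  # rows = true class, columns = predicted class


def accuracy(cm: Matrix) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def most_confused_pair(cm: Matrix) -> Tuple[int, int]:
    """Pair (i, j), i < j, with the largest symmetric confusion cm[i][j] + cm[j][i]."""
    n = len(cm)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return max(pairs, key=lambda p: cm[p[0]][p[1]] + cm[p[1]][p[0]])


def merge(cm: Matrix, i: int, j: int) -> Matrix:
    """Fold class j into class i by summing their rows and then their columns."""
    out = [row[:] for row in cm]
    for k in range(len(out)):
        out[i][k] += out[j][k]  # true class j now counts as true class i
    for row in out:
        row[i] += row[j]  # predictions of j now count as predictions of i
    del out[j]
    for row in out:
        del row[j]
    return out


def group_classes(cm: Matrix, labels: Sequence[str], target: float):
    """Greedily merge the most-confused classes until accuracy >= target."""
    groups = [[label] for label in labels]
    cm = [row[:] for row in cm]
    while len(cm) > 1 and accuracy(cm) < target:
        i, j = most_confused_pair(cm)  # i < j, so removing j keeps index i valid
        cm = merge(cm, i, j)
        groups[i].extend(groups.pop(j))
    return groups, accuracy(cm)


# Toy example with made-up counts: the two bird classes are often confused.
confusion = [
    [40, 10, 0],  # sparrow
    [12, 38, 0],  # finch
    [1, 0, 49],   # dog
]
groups, acc = group_classes(confusion, ["sparrow", "finch", "dog"], target=0.95)
print(groups, round(acc, 3))  # [['sparrow', 'finch'], ['dog']] 0.993
```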
0 votes · 0 answers · 13 views

Need to compare results using Ward's method

So I create clusters like this and standardize them with StandardScaler ...
Poyo
0 votes · 1 answer · 25 views

Group/cluster semantically similar classes in reports?

I'm fine-tuning BERT models for binary classification of reports. For example, a report can be about 'birds' or not about 'birds'. This works really well, but now I want to do multi-label classification, ...
Rob Audenaerde
0 votes · 0 answers · 14 views

How to classify/recognize postage stamp varieties?

As a hobbyist stamp collector, I often need to classify stamps based on minute differences, such as these: Now, I literally have thousands of them (in Ziploc bags) and I am planning ...
René Becker
0 votes · 0 answers · 37 views

Classifying Players as winners or losers

I have a dataset that I curated from a game that I play. There are currently 130 instances (i.e. players) and a practically unlimited number of features. Experience tells me fewer than 10 features would be sufficient...
Shawn
0 votes · 0 answers · 9 views

Unsupervised learning with bags of words with a word metric

I would like to perform clustering on a collection of documents with the assumption that I have a metric $\rho$ which tells me how close two words are to being synonyms. If $\mathcal{W}$ is our ...
jwhite
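One simple way to turn a word metric $\rho$ into a distance between bags of words is sketched below. It is a hedged illustration, not taken from the question. Each word is moved to its nearest word in the other document and weighted by its relative frequency, and the larger of the two directions is kept. This is similar to the relaxed lower bound used with Word Mover's Distance. The toy $\rho$ and the synonym list are hypothetical, and the result need not satisfy the triangle inequality. The pairwise distances could then be fed to any clustering method that accepts a precomputed distance matrix.

```python
from collections import Counter
from typing import Callable, Iterable

WordMetric = Callable[[str, str], float]


def one_sided(a: Counter, b: Counter, rho: WordMetric) -> float:
    """Each word of `a`, weighted by its relative frequency, travels to its
    closest word in `b`. Assumes both bags are non-empty."""
    total = sum(a.values())
    return sum(n / total * min(rho(w, v) for v in b) for w, n in a.items())


def bag_distance(doc_a: Iterable[str], doc_b: Iterable[str], rho: WordMetric) -> float:
    """Symmetric distance: the larger of the two one-sided transport costs."""
    a, b = Counter(doc_a), Counter(doc_b)
    return max(one_sided(a, b, rho), one_sided(b, a, rho))


# Hypothetical metric: 0 for identical words, 0.1 for listed synonyms, 1 otherwise.
SYNONYMS = {frozenset({"car", "automobile"}), frozenset({"fast", "quick"})}


def toy_rho(w: str, v: str) -> float:
    if w == v:
        return 0.0
    return 0.1 if frozenset({w, v}) in SYNONYMS else 1.0


print(bag_distance("fast car".split(), "quick automobile".split(), toy_rho))  # 0.1
print(bag_distance("fast car".split(), "slow boat".split(), toy_rho))  # 1.0
```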
0 votes · 0 answers · 23 views

Choosing a cluster validation measure for graph clustering algorithm

I am currently solving a clustering problem. The objects to be clustered are represented as sparse vectors in $\mathbb{R}^N$, $N=10$, and there are about 1 million of them. To cluster them, I build a graph keeping the largest ...
Sergey Tkachenko
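The question does not name a measure. One widely used internal measure for graph clusterings is Newman-Girvan modularity:

$Q = \sum_c \left[ L_c/m - (d_c/2m)^2 \right]$

Here $m$ is the total edge weight, $L_c$ is the weight of edges inside community $c$, and $d_c$ is the summed degree of its nodes. The minimal pure-Python sketch below assumes an undirected edge list of `(u, v, weight)` triples. The function name and the toy graph are hypothetical, and higher $Q$ indicates denser-than-random communities.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (node_u, node_v, weight), undirected


def modularity(edges: Iterable[Edge], community: Dict[Hashable, int]) -> float:
    """Newman-Girvan modularity Q = sum_c [ L_c / m - (d_c / (2m))^2 ],
    where m is the total edge weight, L_c the weight of edges inside
    community c, and d_c the summed (weighted) degree of c's nodes."""
    m = 0.0
    internal: Dict[int, float] = defaultdict(float)
    degree: Dict[int, float] = defaultdict(float)
    for u, v, w in edges:
        m += w
        degree[community[u]] += w  # each edge adds w to both endpoints' degree
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge, split into their natural communities.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(edges, split), 4))  # 0.3571 (= 5/14)
```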
0 votes · 0 answers · 12 views

How to solve a classification problem in which elements must be clustered, using multinomial classification from CS229?

I just learned about multinomial classification (CS229 lecture notes; what I learned is on page 24) and attempted to solve the Obesity classification problem from Kaggle. I tried to ...
Gosu Choi
1 vote · 2 answers · 96 views

Cluster/Similarity problem with two datasets of different cardinality

I want to cluster financial products according to their similarity. I have two datasets of different cardinality. One-to-one dataset: each ID has one attribute/feature per column; it describes a ...
Maeaex1
4 votes · 1 answer · 329 views

Solve tough clustering problem with overlapping clusters

I'm having some trouble solving a hard clustering problem. I have a 2D dataset characterized by non-spherical, partially overlapping clusters with different densities. I've read a lot about ...
Lorenço Santos
0 votes · 2 answers · 53 views

Text Classification Taking too long

I have a sample of 135k preprocessed documents, for which I computed TF-IDF. I tried clustering with KMeans, which ran into a memory problem (20GB). Then I tried MiniBatch K-Means ...
ayowhatthedogdoin
0 votes · 2 answers · 88 views

Different Algorithms for 50-50 A/B Testing

We are running A/B tests on web app customers, given a customerId. Each customer will see different web-feature designs. We are trying to avoid using feature flags, as they are not currently set up in our ...
mattsmith5
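A common way to get a stable 50-50 split without a feature-flag service is to hash the customerId together with an experiment name and bucket the result. The minimal Python sketch below assumes string IDs. The experiment name `checkout-redesign`, the function name and the 100-bucket granularity are hypothetical choices. Salting with the experiment name keeps assignments independent across experiments, and the same customer always lands in the same variant.

```python
import hashlib


def assign_variant(customer_id: str, experiment: str = "checkout-redesign",
                   treatment_percent: int = 50) -> str:
    """Deterministically map a customer to variant 'A' (control) or 'B' (treatment).

    The experiment name acts as a salt, so different experiments split
    customers independently. Buckets are 0..99; the first
    `treatment_percent` buckets receive the treatment.
    """
    key = f"{experiment}:{customer_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "B" if bucket < treatment_percent else "A"


ids = [f"cust-{i}" for i in range(10_000)]
share_b = sum(assign_variant(c) == "B" for c in ids) / len(ids)
print(f"share in B: {share_b:.3f}")  # close to 0.5; the exact value depends on the hash
```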
1 vote · 1 answer · 24 views

Movement in cohorts

I am working on user sales data that is updated week over week. Based on the sales made in each week, each user is assigned to segment A, B or C. This means the size of each segment could change ...
Sham
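One way to quantify movement between segments is a week-over-week transition matrix. For users present in both weeks, count how many moved from each segment to each other segment, then normalise each row. The minimal sketch below uses made-up data and assumes a hypothetical layout of one dictionary per week, mapping user ID to segment label. Users who appear in only one of the two weeks are ignored here, so new and churned users would need separate tracking.

```python
from collections import Counter
from typing import Dict


def transition_counts(prev: Dict[str, str], curr: Dict[str, str]) -> Counter:
    """Count (previous_segment, current_segment) pairs for users seen in both weeks."""
    return Counter((prev[u], curr[u]) for u in prev.keys() & curr.keys())


def transition_rates(counts: Counter) -> Dict[str, Dict[str, float]]:
    """Row-normalise the counts: rates[src][dst] = share of src users now in dst."""
    totals: Counter = Counter()
    for (src, _dst), n in counts.items():
        totals[src] += n
    rates: Dict[str, Dict[str, float]] = {}
    for (src, dst), n in counts.items():
        rates.setdefault(src, {})[dst] = n / totals[src]
    return rates


week_1 = {"u1": "A", "u2": "A", "u3": "B", "u4": "C"}
week_2 = {"u1": "A", "u2": "B", "u3": "B", "u4": "B", "u5": "A"}  # u5 is new
counts = transition_counts(week_1, week_2)
# e.g. {'A': {'A': 0.5, 'B': 0.5}, 'B': {'B': 1.0}, 'C': {'B': 1.0}} (key order may vary)
print(transition_rates(counts))
```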
0 votes · 0 answers · 23 views

Determine unusual occurrence of words in classes

I am working on a project with 20+ classes/groups. Each of these groups performs certain text searches. I am looking for specific keywords, for example 'code', whose occurrence is an anomaly. The challenge is ...
kruparulz14
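One simple way to flag an unusual keyword rate is to compare each class's rate with the rate in all other classes, using a binomial z-score:

$z_c = (k_c - n_c p)/\sqrt{n_c p (1-p)}$

Here $k_c$ is the keyword count in class $c$, $n_c$ is the number of searches in class $c$, and $p$ is the leave-one-out rate. The minimal sketch below makes several assumptions not stated in the question. Each class is a list of search strings, a keyword "occurs" when it appears as a whitespace-separated token, and $z > 3$ counts as unusual. The class names and data are made up.

```python
import math
from typing import Dict, List


def keyword_zscores(searches: Dict[str, List[str]], keyword: str) -> Dict[str, float]:
    """Binomial z-score of each class's keyword count against the keyword
    rate observed in all *other* classes (leave-one-out baseline)."""
    keyword = keyword.lower()
    hits = {c: sum(keyword in q.lower().split() for q in qs) for c, qs in searches.items()}
    sizes = {c: len(qs) for c, qs in searches.items()}
    total_hits, total_n = sum(hits.values()), sum(sizes.values())
    scores = {}
    for c in searches:
        rest_n = total_n - sizes[c]
        rest_rate = (total_hits - hits[c]) / rest_n if rest_n else 0.0
        p = min(max(rest_rate, 1e-6), 1 - 1e-6)  # clamp so the variance is never zero
        expected = sizes[c] * p
        sd = math.sqrt(sizes[c] * p * (1 - p))
        scores[c] = (hits[c] - expected) / sd if sizes[c] else 0.0
    return scores


searches = {
    "dev": ["fix code bug"] * 10 + ["deploy service"] * 10,
    "sales": ["code of conduct"] + ["quarterly report"] * 39,
    "hr": ["dress code"] + ["holiday policy"] * 39,
}
scores = keyword_zscores(searches, "code")
flagged = [c for c, z in scores.items() if z > 3]
print(flagged)  # ['dev']
```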
1 vote · 1 answer · 476 views

In DBSCAN, can the distance between a Noise Point and Border Point be less than Epsilon?

In DBSCAN: A core point is a point which has at least "MinPts" points inside its Epsilon radius. A border point is a point inside the Epsilon radius of a core point, but it has a number of ...
SuperFluo
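The definitions above can be checked on a small example. The sketch below labels points as core, border or noise directly from those definitions. It assumes, as in the original DBSCAN paper and scikit-learn's `min_samples`, that a point's Epsilon-neighbourhood includes the point itself. In the one-dimensional example, the last point (x = 2.4) is noise, yet it lies within Epsilon of the border point at x = 1.5. That border point is not a core point, so having it nearby does not make x = 2.4 a border point. The coordinates, `eps = 1.0` and `min_pts = 4` are made-up values.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dbscan_roles(points: Sequence[Point], eps: float, min_pts: int) -> List[str]:
    """Label each point 'core', 'border' or 'noise' per the DBSCAN definitions.
    A point's neighbourhood is every point within distance eps, itself included."""
    neighbours = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps] for p in points
    ]
    is_core = [len(nbrs) >= min_pts for nbrs in neighbours]
    roles = []
    for i, nbrs in enumerate(neighbours):
        if is_core[i]:
            roles.append("core")
        elif any(is_core[j] for j in nbrs):
            roles.append("border")  # within eps of at least one core point
        else:
            roles.append("noise")
    return roles


# Points on the line y = 0: a dense group, then two points further out.
points = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0), (0.6, 0.0), (1.5, 0.0), (2.4, 0.0)]
roles = dbscan_roles(points, eps=1.0, min_pts=4)
print(roles)  # ['core', 'core', 'core', 'core', 'border', 'noise']
print(math.dist(points[4], points[5]) < 1.0)  # True: noise point within eps of a border point
```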
