All Questions
Tagged with classification machine-learning
1,359
questions
0
votes
0
answers
12
views
Does it make sense to have an object detection model followed by a classification model?
I was working with the SKU110k dataset and was required to identify the different items on the shelf as well, but the SKU110k dataset only annotates shelf items and does not identify them. So I ...
0
votes
0
answers
11
views
NER with custom tags and no training data: help with a zero-shot approach
I am building a "field tagger" for documents. Basically, a document, in my case something like a proposal or sales quote, would have a bunch of entities scattered throughout it, and we want ...
0
votes
0
answers
32
views
How does one handle a dataset with groups of features and groups of labels in classification?
I have a large dataset (1.8 million samples). There are 15 features: x1, y1, z1, e1, d1, x2, ..., d3. (x, y, z) are coordinates, e is energy, and d is a derived feature, the Euclidean distance between the ...
1
vote
1
answer
24
views
Everything is classified as background by a segmentation model
I am training a U-Net model for medical image segmentation. The problem is that the binary masks that I'm using to train the model consist mostly of background pixels and a very small region of the whole ...
0
votes
0
answers
23
views
How to choose thresholds to discretize target for binary classification
My group is using logistic regression to investigate the most predictive features in a dataset. Our target variable is actually a continuous variable that we discretized using two cutoff thresholds (...
1
vote
1
answer
77
views
How do I compute and plot Bias and Variance of a classifier in Python?
I'm new to Machine Learning and I understand bias and variance in theory but I can't seem to find a single source that explains how bias or variance can be computed. I'd like to do it in Python and ...
0
votes
1
answer
25
views
Fixing class imbalance vs Over-detecting in test data
In my experience, binary classifiers tend to do better in terms of F1 score when the class imbalance is at least reduced. However, this leads to over-predicting on the test data.
(Thought) Example: If ...
0
votes
0
answers
16
views
How to choose the segment in the Grouped AUC metric?
Background
In binary classification, AUC is a common metric. However, Group-AUC performs better in some scenarios, such as recommendation systems, where AUC is grouped by user.
In the examples below, I ...
1
vote
1
answer
26
views
Feature Engineering a Recency feature
I'm working on a customer scoring problem, specifically predicting conversion and producing a probability score for conversion (currently using an XGBoost classifier). There's a feature I want to ...
0
votes
0
answers
13
views
Modeling spatial data
I have the following dataset. For every time point (at a frequency of 1 hour), we can construct a graph consisting of 20 nodes representing countries. Each country (node) is characterized by 5 ...
0
votes
0
answers
37
views
Determining VCdim for union of subspaces $H_i$ - short question
Consider $\mathcal{H} = \mathcal{H}_1 \cup \mathcal{H}_2 \cup \mathcal{H}_3$, where:
$\mathcal{H}_1 = \{h_{a} : \mathbb{R} \rightarrow \{0,1\} \ | \ h_{a}(x) = 1_{[x \geq a]}(x) = 1_{[a, +\infty)}(x), ...
0
votes
1
answer
23
views
Should I standardise time series data for deep learning classification?
Say I have time series data for classifying stars using deep learning based on stellar variability, with each time series measuring the flux of the star over time. For each star, I have the data ...
1
vote
1
answer
39
views
Data binning for interval data
I am trying to create an ML model for salary classification into 5 categories (0-90k, 90-120k, 120-180k and so on).
The problem is that in my dataset almost all salary data is presented in intervals. ...
0
votes
0
answers
45
views
When is sampling bias acceptable?
Overview: The dataset is small and a bit messy, and the task is to classify 5 classes where the targets are ordinal.
Feature engineering and selection, model tuning, etc. did not produce acceptable ...
1
vote
0
answers
7
views
Is GroupKFold needed if some samples have some of their feature values equal?
I am given a dataset $D$ of 10k enzyme-substrate complexes having a lock-key relationship, with each sample (complex) being characterized by enzyme features $x_e$ and substrate features $x_s$. That is, ...