All Questions
Tagged with classification dataset
120 questions
0 votes, 1 answer, 52 views
Train/test split of data, stratified based on label, but ensuring no athletes are in both train/test sets
I’m working on a project that uses data from wearable tech for activity classification. However, I’m having trouble deciding on how to do the train/test split. I’m currently doing the split based on ...
0 votes, 0 answers, 45 views
When is sampling bias acceptable?
Overview: The dataset is small and a bit messy, and the task is to classify 5 classes whose targets are ordinal.
Feature Engineering and Selection, Model Tuning, etc. did not produce acceptable ...
0 votes, 0 answers, 8 views
What is the reference time point relative to which the capital-gain and capital-loss features of the UC Irvine Adult dataset are measured?
The Adult dataset available in the UC Irvine Machine Learning Repository is based on the 1994 census data (USA census, I presume).
The dataset has two features named ...
0 votes, 0 answers, 12 views
Building a dataset for classification
I'm thinking of building a PowerShell script classifier using different neural network architectures. I have approximately 6k PowerShell scripts (3k malicious, 3k benign). My questions are: How ...
1 vote, 1 answer, 26 views
Public Email Classification Dataset but not Spam vs Ham
Context
Working to deliver a POC on automated email classification (in a customer service context) to tag emails as related to feedback, complaints, lost and found, etc. The tags are not entirely exclusive,...
0 votes, 0 answers, 9 views
Which dataset could be a good choice to train an Environment Sound Classification model for user environment awareness while wearing earbuds?
Which dataset could be a good choice to train an Environment Sound Classification model for the following use case: use the model in the earbuds/earphones to detect important sound events in the user'...
0 votes, 1 answer, 29 views
Where can I get 5000+ classified images of zoo animals? [closed]
Please help! We are college students doing this for a project. The project uses neural networks, and we want to build a model that takes a color image of an animal as input and outputs the ...
0 votes, 0 answers, 35 views
How can I identify coverage types in NFL games using computer vision?
I am currently working on a project that classifies coverage types from sports highlights using advanced computer vision techniques. Next Gen Stats effectively utilizes tracking data to identify ...
0 votes, 0 answers, 92 views
Seeking datasets for training a Language Model on U.S. mortgage loan processes
I'm in the process of training a Large Language Model (LLM) and require datasets that encompass various aspects of the U.S. mortgage loan process. The model's aim is to understand and simulate decision-...
0 votes, 0 answers, 32 views
Optimal ML classification approach
Background: I have app data (impressions, user activities) that I can use as features for a multiclass classifier (5 classes).
I just want to discuss some things that our team is having a ...
0 votes, 1 answer, 31 views
Binary Classification of Images - CNN
I am learning ML and am working on a CNN problem where I need to classify images of cats and dogs.
The way I have set up the labels is that cats are 1 and dogs are 0. I have made the final output layer ...
1 vote, 2 answers, 96 views
Cluster/Similarity problem with two datasets of different cardinality
I want to cluster financial products according to their similarity. I have two datasets of different cardinality:
One-to-One dataset: One ID has One attribute/feature per column - Describes a ...
1 vote, 1 answer, 107 views
How to know the confidence of a classification on unlabeled data generated after model training?
I have written (in Python) the code for a Random Forest classification model for a labeled dataset using sklearn. The model works very well.
...
0 votes, 0 answers, 45 views
Open source dataset (manufacturing, machine operations)
I am looking for an open source dataset from the manufacturing domain (sensor data, time series) with specific traits. It should stem from a process consisting of a sequence of distinct machine ...
0 votes, 1 answer, 12 views
Classification Problems: Feature Number Variance & Feature Repetition
I have what seems to me a difficult case study.
The problem is that I need to perform binary classification of Quality of Service (good or bad). I have quality feedback for groups of devices belonging to a company. I ...