
All Questions

0 votes
1 answer
52 views

Train/test split of data, stratified based on label, but ensuring no athletes are in both train/test sets

I’m working on a project that uses data from wearable tech for activity classification. However, I’m having trouble deciding on how to do the train/test split. I’m currently doing the split based on ...
Shane O Mahony
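A minimal sketch of the kind of split the title describes, assuming each sample carries an athlete identifier and a class label (the excerpt is truncated, so the `groups` and `labels` arrays and the function name `group_stratified_split` are hypothetical). It uses a simple greedy heuristic, not the asker's method: whole athletes are moved to the test set one at a time, choosing whichever athlete brings the test label counts closest to the requested fraction of each label. With scikit-learn available, `StratifiedGroupKFold` is a ready-made alternative.

```python
import random
from collections import Counter, defaultdict


def group_stratified_split(labels, groups, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with no group on both sides.

    Greedy heuristic (an illustrative assumption, not a library API):
    repeatedly move the whole group to the test set that brings the
    per-label test counts closest to test_fraction of each label's total.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)

    total = Counter(labels)
    target = {lab: test_fraction * n for lab, n in total.items()}

    def cost(counts):
        # Squared distance between current test counts and the targets.
        return sum((counts.get(lab, 0) - target[lab]) ** 2 for lab in target)

    remaining = list(by_group)
    random.Random(seed).shuffle(remaining)  # random tie-breaking order

    test_counts = Counter()
    test_groups = set()
    while remaining:
        best_g, best_cost = None, cost(test_counts)
        for g in remaining:
            trial = test_counts + Counter(labels[i] for i in by_group[g])
            c = cost(trial)
            if c < best_cost:
                best_g, best_cost = g, c
        if best_g is None:  # no group improves the balance any further
            break
        test_counts += Counter(labels[i] for i in by_group[best_g])
        test_groups.add(best_g)
        remaining.remove(best_g)

    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


# Hypothetical toy data: four athletes, two activities each.
groups = ["a", "a", "b", "b", "c", "c", "d", "d"]
labels = ["run", "walk", "run", "walk", "run", "walk", "run", "walk"]
train_idx, test_idx = group_stratified_split(labels, groups, test_fraction=0.25)
```

The heuristic only approximates stratification; with few athletes or very uneven label counts per athlete, an exact label balance may not exist.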
0 votes
0 answers
45 views

When is sampling bias acceptable?

Overview: The dataset is small and a bit messy, and the task is to classify 5 classes whose targets are ordinal. Feature Engineering and Selection, Model Tuning, etc. did not produce acceptable ...
easymoneysniper
0 votes
0 answers
8 views

What is the reference time point relative to which the capital-gain and capital-loss features of the UC Irvine Adults dataset are measured?

The Adults dataset available in the UC Irvine Machine Learning Repository is based on the 1994 census data (USA census, I presume). The dataset has two features named ...
Evan Aad
  • 175
0 votes
0 answers
12 views

Building a dataset for classification

I'm thinking of building a powershell script classifier using different architectures of neural networks. I have approximately 6k powershell scripts (3k malicious, 3k benign). My questions are: How ...
freaks
  • 1
1 vote
1 answer
26 views

Public Email Classification Dataset but not Spam vs Ham

Context: Working to deliver a POC on automated email classification (in a customer service context) to tag emails as related to feedback, complaints, lost and found, etc. The tags are not entirely exclusive,...
Della
  • 335
0 votes
0 answers
9 views

Which dataset could be a good choice to train an Environment Sound Classification model for user environment awareness while wearing earbuds?

Which dataset could be a good choice to train an Environment Sound Classification model for the following use case: use the model in the earbuds/earphones to detect important sound events in the user'...
Danijel
  • 173
0 votes
1 answer
29 views

Where can I get 5000+ classified images of zoo animals? [closed]

Please help! We are college students doing this for a project. The project uses neural networks, and we want to build a model that takes a color image of an animal as input and outputs the ...
user90061
0 votes
0 answers
35 views

How can I identify coverage types in NFL games using computer vision?

I am currently working on a project that classifies coverage types from sports highlights using advanced computer vision techniques. Next Gen Stats effectively utilizes tracking data to identify ...
Shah Zeb
0 votes
0 answers
92 views

Seeking datasets for training a Language Model on U.S. mortgage loan processes

I'm in the process of training a Language Model (LLM) and require datasets that encompass various aspects of the U.S. mortgage loan process. The model's aim is to understand and simulate decision-...
Anand
0 votes
0 answers
32 views

Optimal ML classification approach

Background: I have app data (impressions, user activities) that I can use as features for a multiclass classifier (5 classes). I just want to discuss some things that our team is having a ...
easymoneysniper
0 votes
1 answer
31 views

Binary Classification of Images - CNN

I am learning ML and am working on a CNN problem where I need to classify images of cats and dogs. The way I have set up the labels is that cats are 1 and dogs are 0. I have made the final output layer ...
Hussain Bhavnagarwala
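For readers unsure how a 1/0 labelling like this maps onto a network output, here is a minimal sketch. It assumes a single output unit with a sigmoid activation, which the truncated excerpt does not confirm; the function names and the 0.5 threshold are illustrative choices. Under that assumption the output is read as the probability of the class labelled 1 (cats), and binary cross-entropy is the matching loss.

```python
import math


def sigmoid(z):
    """Squash a raw score (logit) into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, p, eps=1e-12):
    """Loss for one sample; y_true is 1 (cat) or 0 (dog), p is P(cat)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def predict_label(logit, threshold=0.5):
    """Return 1 (cat) if P(cat) reaches the threshold, else 0 (dog)."""
    return 1 if sigmoid(logit) >= threshold else 0


# Hypothetical logits from the final layer for two images.
cat_like_logit = 2.0
dog_like_logit = -2.0
cat_prediction = predict_label(cat_like_logit)
dog_prediction = predict_label(dog_like_logit)
```

If the final layer instead has two softmax units, the labels would be treated as class indices and the loss would be categorical cross-entropy.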
1 vote
2 answers
96 views

Cluster/Similarity problem with two datasets of different cardinality

I want to cluster financial products according to their similarity. I have two datasets of different cardinality: One-to-One dataset: One ID has one attribute/feature per column - Describes a ...
Maeaex1
  • 550
1 vote
1 answer
107 views

How to know the confidence of a classification on unlabeled data generated after model training?

I have created (in Python) the code for a Random Forest classification model for a labeled dataset using sklearn. The model works very well. ...
Daniel Vieira
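One common way to attach a confidence to a forest's prediction on a new, unlabeled sample is to average the class probabilities produced by the individual trees and report the largest average. The sketch below shows that idea in plain Python; the per-tree probabilities are made-up inputs and `forest_confidence` is a hypothetical helper, not part of sklearn. In sklearn itself, `RandomForestClassifier.predict_proba` returns this kind of averaged estimate.

```python
def forest_confidence(tree_probas):
    """Average per-tree class probabilities for one sample.

    tree_probas: list with one entry per tree, each a list of class
    probabilities. Returns (predicted_class_index, confidence, mean_probas).
    """
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    mean = [sum(t[k] for t in tree_probas) / n_trees for k in range(n_classes)]
    best = max(range(n_classes), key=mean.__getitem__)
    return best, mean[best], mean


# Hypothetical outputs of four trees for a single unlabeled sample.
tree_probas = [[1, 0], [1, 0], [0, 1], [1, 0]]
predicted, confidence, mean_probas = forest_confidence(tree_probas)
```

These averages are not guaranteed to be well calibrated, so they are best read as a relative ranking of certainty unless calibration has been checked on held-out labeled data.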
0 votes
0 answers
45 views

Open source dataset (manufacturing, machine operations)

I am looking for an open source dataset from the manufacturing domain (sensor data, time series) with specific traits. It should stem from a process consisting of a sequence of distinct machine ...
sinpalabras
0 votes
1 answer
12 views

Classification Problems: Feature Number Variance & Feature Repetition

I have a difficult case study (in my mind). The problem is that I need to make a binary classification of Quality of Service (good or bad). I have feedback on quality for groups of devices belonging to a company. I ...
secuf
  • 1
