Newest 'classification+data-mining' Questions

0 votes

0 answers

92 views

Seeking datasets for training a Language Model on U.S. mortgage loan processes

I'm in the process of training a large language model (LLM) and need datasets that cover various aspects of the U.S. mortgage loan process. The model's aim is to understand and simulate decision-...

Anand

1

asked Nov 30, 2023 at 7:53

0 votes

2 answers

74 views

Been stuck on a DS problem. Just need to know whether this problem statement is solvable or not

I've been stuck on the following problem for weeks now. To be clear, I'm not asking the community for a full solution, just a few ideas or at least confirmation on whether this problem statement ...

Aakash Dusane

173

asked Jul 23, 2023 at 4:52

0 votes

2 answers

109 views

How do I separate periodic data from time series data?

I am currently working on classifying gym exercises from accelerometer data. I am trying to modularize window extraction so I can train my model on metrics within a window (which ...

David Chen

1

asked Jun 26, 2023 at 7:44
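The question describes training on metrics computed within windows of accelerometer data. A minimal sketch of overlapping sliding windows with a per-window mean and standard deviation, assuming the signal is a plain list of floats; `sliding_windows`, `window_features`, and the window size and step values are hypothetical choices, not taken from the question.

```python
from statistics import mean, pstdev


def sliding_windows(signal, size, step):
    """Yield consecutive windows of `size` samples, advancing by `step` samples.

    A trailing partial window shorter than `size` is dropped.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]


def window_features(signal, size, step):
    """Return one (mean, population standard deviation) pair per window."""
    return [(mean(w), pstdev(w)) for w in sliding_windows(signal, size, step)]


# 8 samples, windows of 4 samples with 50% overlap (step of 2) -> 3 windows
samples = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
print(list(sliding_windows(samples, 4, 2)))
print(window_features(samples, 4, 2))
```

Dropping the trailing partial window keeps every feature vector computed over the same number of samples; whether that suits periodic exercise data depends on the window length relative to one repetition.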

1 vote

2 answers

553 views

What qualifies as data leakage?

I am currently working on a binary classification problem using imbalanced data. The algorithm that I am using is random forest. The problem is about predicting whether each sales project will meet ...

The Great

2,585

asked May 15, 2023 at 1:12

5 votes

1 answer

351 views

Are imbalanced data problems solvable? [closed]

I have been working as a data scientist for the past 2 years, on problems related to binary classification, revenue prediction, etc. In the past two years, I have had 2 problems that ...

The Great

2,585

asked May 12, 2023 at 13:06

0 votes

1 answer

85 views

Classification of noisy data

What method can be used to classify the data in the following example? There is a table (hundreds of rows and hundreds of columns). Several columns in this table uniquely allow you to classify each row:...

Mic

1

asked Feb 24, 2023 at 14:13

1 vote

1 answer

48 views

How to increase retention?

As you might already know, there is a concept of retention. Let's say I have created a game and today a hundred people have downloaded it. Let's say tomorrow 47 of yesterday's hundred people are ...

Narek

121

asked Jan 22, 2023 at 15:49
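Working on retention starts with measuring it consistently. Day-N retention is commonly defined as the share of an install cohort that is active N days after installing. A minimal sketch, assuming a hypothetical schema (a dict of install dates and a dict of active dates per user) and reading the excerpt's 47 of 100 users as the ones who return on day 1:

```python
from datetime import date, timedelta


def day_n_retention(installs, activity, cohort_day, n):
    """Fraction of users who installed on `cohort_day` and were active exactly `n` days later.

    installs: dict user_id -> install date (hypothetical schema)
    activity: dict user_id -> set of dates on which the user was active
    """
    cohort = [user for user, day in installs.items() if day == cohort_day]
    if not cohort:
        return 0.0
    target = cohort_day + timedelta(days=n)
    retained = sum(1 for user in cohort if target in activity.get(user, set()))
    return retained / len(cohort)


# 100 installs on one day, 47 of those users active the next day
day0 = date(2023, 1, 1)
installs = {user: day0 for user in range(100)}
activity = {user: {day0 + timedelta(days=1)} for user in range(47)}
print(day_n_retention(installs, activity, day0, 1))  # 0.47
```

Fixing the cohort by install day keeps the metric comparable across days, so the effect of a game change can be read as a shift in day-1 or day-7 retention between cohorts.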

0 votes

1 answer

33 views

Classification for choice data

It is essentially a choice modelling problem, but hopefully it can be addressed by classification. Suppose one needs to choose a route to drive to work from among many candidates in mind. These candidates ...

GDI

101

asked Oct 19, 2022 at 20:18

2 votes

0 answers

52 views

Should credit be given to AI model - low data scenario [closed]

In my office, we recently built an AI model for project success prediction using binary classification. Though the dataset size was small (977 records), my boss still wanted to go ahead with the POC ...

The Great

2,585

asked Sep 14, 2022 at 15:21

1 vote

1 answer

176 views

Labelling for churn measurement

I have 3 domains of supplier data (Jan 2017 to Jan 2022), as follows: a) Purchase data - contains all the purchases (of products) made by the suppliers with us. It contains columns such ...

The Great

2,585

asked Apr 19, 2022 at 14:07

0 votes

1 answer

78 views

Comparing two groups at large scale

Suppose we have two datasets, "A" and "B". Dataset "A" has two columns: Supplier_id and "Status" (pass and fail are values for status ...

The Great

2,585

asked Apr 6, 2022 at 1:08

1 vote

1 answer

30 views

Looking for in-depth knowledge of evaluation metrics

I am dealing with an imbalanced dataset. The total number of instances in my dataset is 1273: 174 in the Yes class and 1099 in the No class, so the imbalance ratio is roughly 1:6. Now I know ...

Encipher

361

asked Mar 4, 2022 at 7:21
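With 174 Yes and 1099 No instances, accuracy alone can look strong for a useless model. A minimal sketch computing accuracy, precision, recall and F1 for the positive (Yes) class from confusion-matrix counts; `binary_metrics` is a hypothetical helper, and the "always predict No" model is an illustrative baseline, not a model from the question.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive ('Yes') class.

    Precision, recall and F1 fall back to 0.0 when their denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


yes, no = 174, 1099
print(round(no / yes, 2))  # 6.32, i.e. roughly 1:6

# Baseline that always predicts "No": every Yes is a false negative
print(binary_metrics(tp=0, fp=0, fn=yes, tn=no))
```

The baseline reaches about 86% accuracy while its recall and F1 on the Yes class are zero, which is why minority-class metrics are usually reported alongside accuracy on data like this.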

1 vote

0 answers

467 views

Intuitive explanation of FOIL's gain in Rule-based classification

I encountered the following formula for calculating FOIL's gain: $$\text{FOIL's gain} = p_0\left(\log_2\frac{p_1}{p_1+n_1} - \log_2\frac{p_0}{p_0+n_0}\right)$$ unlike information gain or the Gini index used to measure ...

tmo

11

asked Mar 4, 2022 at 0:37
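One way to read the formula: $-\log_2\frac{p}{p+n}$ is the number of bits needed to signal that an example covered by the rule is positive, so the bracket is how many bits adding a condition saves (the change in log-precision), and the leading factor scales that saving by a count of positives so that rules covering many positives are favored. A minimal sketch of the formula exactly as written in the question, with $p_0, n_0$ taken to be the positives and negatives covered before adding a condition and $p_1, n_1$ after; `foil_gain` is a hypothetical name. Some textbook statements (e.g. Han, Kamber and Pei) weight the bracket by the positives covered after the extension ($p_1$) instead, so it is worth checking which convention your source uses.

```python
from math import log2


def foil_gain(p0, n0, p1, n1):
    """FOIL's gain as written in the question:
        p0 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

    p0, n0: positives/negatives covered by the rule before adding a condition
    p1, n1: positives/negatives covered after adding the condition
    """
    if p0 == 0 or p1 == 0:
        raise ValueError("FOIL's gain is undefined when a rule covers no positives")
    return p0 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))


# Precision goes from 8/16 = 0.5 to 8/8 = 1.0: one bit saved per positive
print(foil_gain(8, 8, 8, 0))  # 8.0
# Precision unchanged: no gain
print(foil_gain(8, 8, 8, 8))  # 0.0
```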

0 votes

1 answer

409 views

How to interpret the score output by a binary classifier when using a threshold < 0.5?

My understanding is that the score a binary classifier (e.g. logistic regression) outputs for an input instance is interpreted as the probability that the instance belongs to class 1. The threshold 0.5 ...

David Tian

86

asked Feb 24, 2022 at 17:59
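The threshold only changes the hard label derived from a score; it does not change the score itself, which can be read as an estimate of P(class 1) to the extent the model is calibrated. A minimal sketch, assuming scores in [0, 1] and the convention that a score equal to the threshold maps to class 1; `predict_labels` and the example scores are hypothetical.

```python
def predict_labels(scores, threshold=0.5):
    """Map scores in [0, 1] to hard 0/1 labels; the scores themselves are unchanged."""
    return [1 if s >= threshold else 0 for s in scores]


scores = [0.10, 0.30, 0.45, 0.80]
print(predict_labels(scores))        # [0, 0, 0, 1]
print(predict_labels(scores, 0.25))  # [0, 1, 1, 1]
```

With the lower threshold, the instance scored 0.30 is labelled 1 even though its estimated probability of being class 1 is still about 0.30; the lower threshold expresses a willingness to accept more false positives in exchange for higher recall.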

1 vote

0 answers

1k views

SMOTE before categorical encoding vs SMOTE after categorical encoding

I have a small dataset of 977 rows with a class proportion of 77:23. For the sake of metrics improvement, I have kept my minority class ('default') as class 1 (and 'not default' as class 0). My input ...

The Great

2,585

asked Feb 20, 2022 at 14:53
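One concrete consequence of the ordering: SMOTE creates a synthetic point by interpolating between a minority sample and one of its minority-class neighbours, x + λ(neighbour − x) with 0 ≤ λ ≤ 1. Applied after one-hot encoding, that interpolation can produce fractional "categories". A minimal sketch of just the interpolation step on hypothetical rows (one numeric feature followed by a one-hot encoded category); it omits the neighbour search and is not the imbalanced-learn implementation. imbalanced-learn's `SMOTENC` is designed for datasets that mix numeric and categorical features.

```python
def smote_interpolate(x, neighbor, lam):
    """SMOTE-style synthetic point: x + lam * (neighbor - x), with 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]


# Hypothetical rows: [numeric feature, one-hot A, one-hot B, one-hot C]
x = [30.0, 1, 0, 0]         # category A
neighbor = [40.0, 0, 1, 0]  # category B
print(smote_interpolate(x, neighbor, 0.5))  # [35.0, 0.5, 0.5, 0.0]
```

The numeric feature interpolates sensibly, but the synthetic row is "half A, half B", which corresponds to no real category.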
