Questions tagged [exploratory-data-analysis]

EDA stands for "exploratory data analysis". Tukey developed it as a contrast to confirmatory data analysis (CDA), the formal testing of hypotheses. EDA is typically concerned with describing data numerically and graphically to make the data easier to understand and to yield new insights.

0 votes
0 answers
55 views

Which statistical model will be best for this data?

I'm trying to identify the relationship between the dependent variable and the independent variables. I've utilized linear regression, but I'm not sure if it's suitable given the distribution of my ...
Chemokine1
0 votes
0 answers
35 views

Average Variance Extracted and Factor loading cutoff not aligning

The cutoff for factor loadings is generally around 0.4 (Stevens 1992), Fidell (2007), or 0.32 (poor), 0.45 (fair), and 0.55 (good) following Comrey and Lee (1992). But for convergent validity you require ...
Rahul Kiroriwal
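
For context on why the two cutoffs can disagree: AVE is commonly computed as the mean of the squared standardized loadings of a construct's indicators, and 0.5 is a frequently cited threshold for convergent validity. The sketch below assumes that definition and that threshold (the excerpt is truncated before stating either), and the loadings are hypothetical values chosen only for illustration.

```python
import math


def average_variance_extracted(loadings):
    """Mean of squared standardized loadings (a common AVE definition)."""
    return sum(loading ** 2 for loading in loadings) / len(loadings)


# Hypothetical loadings: every one passes a 0.4 loading cutoff.
loadings = [0.45, 0.55, 0.60, 0.50]
ave = average_variance_extracted(loadings)

# Assumed AVE threshold of 0.5: if all indicators loaded equally,
# each loading would need to be at least sqrt(0.5) to reach it.
required_uniform_loading = math.sqrt(0.5)

print(f"AVE = {ave:.3f}, uniform loading needed = {required_uniform_loading:.3f}")
```

Under these assumptions, loadings that clear a 0.4 cutoff can still give an AVE well below 0.5, because squaring shrinks them: a loading of 0.4 contributes only 0.16 to the average.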
0 votes
0 answers
19 views

Vocabulary in EDA

I've found myself struggling to find appropriate vocabulary to describe a scatterplot as part of exploratory data analysis. I've found this paper about graph theory and although it's interesting I'...
Lefty
  • 499
1 vote
0 answers
37 views

Valid forms of exploratory data analysis for time series that don't assume stationarity?

Let's say we are given a time series sample and want to create a model to forecast its future values. When trying to build a model to forecast time series data, many statistics ...
QMath
  • 451
3 votes
1 answer
91 views

Exploratory analysis to find out characteristics of low scorers

I'm currently looking at three specific questions of a feedback survey and have been tasked with finding out the characteristics of the lowest scorers, to see if there are any patterns or common ...
sixfortyseven
0 votes
0 answers
48 views

Outliers in EDA - With or without?

I'm trying to carry out my first EDA on a Student Performance dataset. The dataset has 395 samples and consists of 33 attributes. After drawing the boxplots and doing some tests I detected outliers in ...
Christina Kataki
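
Boxplots usually flag points that fall outside Tukey's fences, that is, more than 1.5 times the interquartile range beyond the quartiles. The sketch below shows that rule using only the standard library. The `absences` values are hypothetical and are not taken from the Student Performance dataset, and plotting libraries may compute quartiles slightly differently, so the flagged points can differ at the margins.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical attribute with one extreme value.
absences = [0, 2, 2, 4, 4, 6, 6, 8, 10, 56]
flagged = tukey_outliers(absences)
print(flagged)
```

A flagged point is only a candidate for inspection. Whether to keep it depends on whether it is a recording error or a genuine, if unusual, observation.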
2 votes
0 answers
51 views

Variable Selection for Longitudinal Data with a Binary Outcome

I have a large longitudinal dataset (100,000 observations) with firm IDs and Years with about 1000 features (most numeric and ...
thatsroughbuddy
2 votes
1 answer
148 views

EDA and Model Selection for Forecasting while avoiding Data Leakage

How to do EDA and model selection for time series forecasting without data leakage? I'm assuming just checking for missing values is ok. But is graphing the entire time series considered data leakage? ...
pandashelp
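
One common way to limit leakage is to set aside the most recent observations before exploring anything, and then run plots and summaries only on the earlier portion. The sketch below assumes that approach. The function name, the 20% holdout fraction, and the toy series are all hypothetical choices made for illustration.

```python
def chronological_split(series, holdout_fraction=0.2):
    """Split an ordered series into an early part and a late holdout part."""
    holdout_size = round(len(series) * holdout_fraction)
    cut = len(series) - holdout_size
    return series[:cut], series[cut:]


# Toy series, already sorted by time (oldest first).
series = list(range(100))
train, holdout = chronological_split(series)

# Exploration (summaries, plots, model selection) touches only `train`.
train_mean = sum(train) / len(train)
print(len(train), len(holdout), train_mean)
```

The split is never shuffled, so every holdout point comes after every training point, which mirrors how a forecast is actually used.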
0 votes
0 answers
19 views

Calculate when a time series will reach a specific value

I have a time series at day level granularity:

Date        Value
2023-10-01  78945
2023-10-02  78990
2023-10-03  79005
2023-10-04  78999
...

While there are some ...
Ricky
  • 101
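
As a rough illustration of one way to approach this, the sketch below fits a least-squares straight line to the four values shown in the excerpt and solves for the day on which the fitted line reaches a target. The linear trend is an assumption, the target of 80000 is hypothetical, and four points are far too few for a reliable estimate.

```python
import math
from datetime import date, timedelta


def linear_crossing_date(start, values, target):
    """Fit y = a + b*t by least squares (t in days since start) and
    return the first date on which the fitted line reaches the target.
    Returns None if the fitted trend is flat or decreasing."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = y_mean - slope * t_mean
    days = math.ceil((target - intercept) / slope)
    return start + timedelta(days=days)


values = [78945, 78990, 79005, 78999]  # values from the excerpt
crossing = linear_crossing_date(date(2023, 10, 1), values, target=80000)
print(crossing)
```

If the series has dips or seasonality, a straight line will mislead, and a forecasting model with prediction intervals would give a range of plausible dates rather than a single one.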
0 votes
0 answers
18 views

Multivariate analysis for seasonal patterns among rivers

I sampled three different rivers over one year during the four seasons. In addition, in the autumn and winter seasons I sampled both during an event of high (drought period) and low water (flood ...
Rudy Benetti
1 vote
0 answers
27 views

What kind of machine learning model could I use on this dataset?

I am a beginner to data science. I found this dataset that covers natural disaster incidents in Afghanistan from 2016 - present. Here are the 13 columns: REGION (South West, North, etc), PROV_CODE (...
Mas
  • 11
1 vote
0 answers
34 views

How do I prioritize which features to use in my machine learning model before the feature engineering stage?

I am encountering a probably fairly common problem where I have too many features, let's say 500 possible features. I only want to pick the top 10-50 features that would be the most predictive of y, or ...
Katsu
  • 1,021
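
One simple starting point for this kind of problem is univariate screening: score each feature on its own against y and keep the top k. The sketch below ranks features by absolute Pearson correlation. The feature names and the simulated data are hypothetical, and this is only one of several screening criteria.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def top_k_features(features, y, k):
    """Rank feature names by |correlation with y| and keep the top k."""
    scores = {name: abs(pearson(col, y)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


random.seed(0)
y = [random.gauss(0, 1) for _ in range(200)]
features = {
    "signal": [v + random.gauss(0, 0.1) for v in y],  # closely tracks y
    "weak": [v + random.gauss(0, 3) for v in y],      # mostly noise
    "noise": [random.gauss(0, 1) for _ in range(200)],
}
ranked = top_k_features(features, y, k=2)
print(ranked)
```

Univariate screening ignores interactions and nonlinear effects, and it should be run inside each cross-validation fold rather than on the full dataset, otherwise the selection step itself leaks information about y.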
0 votes
0 answers
20 views

Finding functional forms and state space in a state-space model

Let's say that we have a time series: $$\{y_i\}_{i=0}^{n-1}, y_i \in \mathcal{Y} \subseteq \mathbb{R}^m, m \in \mathbb{N}_1$$ that we would like to model with some sort of state-space model: $$x_i = f(...
QMath
  • 451
0 votes
0 answers
97 views

Exploratory data analysis on a large dataset with hundreds of variables

I'm working on longitudinal data with repeated measures for each subject and hundreds of variables. I would like to use a linear mixed model to look at the mean response of each dependent variable at ...
Ed9012
  • 391
0 votes
0 answers
37 views

Best way to perform a reliability growth analysis and answer questions like "is the reliability getting better?"

everyone! I'm a reliability engineer and a project was given to me two weeks ago. I need to answer some questions about the reliability of a specific item in a train. There are 189 items, one for each ...
Caio Rodrigues
