Questions tagged [exploratory-data-analysis]
EDA stands for "exploratory data analysis". It was developed by Tukey as a contrast to confirmatory data analysis, or CDA (the formal testing of hypotheses). EDA is typically concerned with describing data numerically and graphically to make the data easier to understand and to yield new insights.
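As a small illustration of "describing data numerically", here is a minimal sketch of a five-number summary (minimum, quartiles, maximum), a summary commonly associated with Tukey's approach. The function name `five_number_summary` and the sample values are made up for illustration, and the quartile convention used (`method="inclusive"` in Python's `statistics.quantiles`) is only one of several in common use.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of at least two numbers.

    Hypothetical helper for illustration; quartiles use the "inclusive"
    interpolation method, which is an assumption, not the only convention.
    """
    data = sorted(values)
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Made-up sample data, not taken from any question on this page.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]
summary = five_number_summary(sample)
print(summary)
```

The same five numbers are what a boxplot draws, so this is often a first step before plotting.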
326 questions
0 votes, 0 answers, 55 views
Which statistical model will be best for this data?
I'm trying to identify the relationship between the dependent variable and the independent variables. I've utilized linear regression, but I'm not sure if it's suitable given the distribution of my ...
0 votes, 0 answers, 35 views
Average Variance Extracted and Factor loading cutoff not aligning
The cutoff for factor loadings is generally around 0.4 (Stevens 1992; Fidell 2007), or 0.32 (poor), 0.45 (fair), 0.55 (good) following Comrey and Lee (1992). But for convergent validity you require ...
0 votes, 0 answers, 19 views
Vocabulary in EDA
I've found myself struggling to find appropriate vocabulary to describe a scatterplot as part of exploratory data analysis.
I've found this paper about graph theory and although it's interesting I'...
1 vote, 0 answers, 37 views
Valid forms of exploratory data analysis for time series that don't assume stationarity?
Let's say we are given a time series sample and want to create a model to forecast future values of that time series.
When trying to build a model to forecast time series data, many statistics ...
3 votes, 1 answer, 91 views
Exploratory analysis to find out characteristics of low scorers
I'm currently looking at three specific questions of a feedback survey and have been tasked with finding out the characteristics of the lowest scorers, to see if there are any patterns or common ...
0 votes, 0 answers, 48 views
Outliers in EDA - With or without?
I'm trying to carry out my first EDA on a Student Performance dataset. The dataset has 395 samples and consists of 33 attributes. After drawing the boxplots and doing some tests I detected outliers in ...
2 votes, 0 answers, 51 views
Variable Selection for Longitudinal Data with a Binary Outcome
I have a large longitudinal dataset (100,000 observations) with firm IDs and Years with about 1000 features (most numeric and ...
2 votes, 1 answer, 148 views
EDA and Model Selection for Forecasting while avoiding Data Leakage
How to do EDA and model selection for time series forecasting without data leakage?
I'm assuming that just checking for missing values is OK. But is graphing the entire time series considered data leakage?
...
0 votes, 0 answers, 19 views
Calculate when a time series will reach a specific value
I have a time series at day level granularity:
Date       Value
2023-10-01 78945
2023-10-02 78990
2023-10-03 79005
2023-10-04 78999
...
While there are some ...
0 votes, 0 answers, 18 views
Multivariate analysis for seasonal patterns among rivers
I sampled three different rivers over one year during the four seasons. In addition, in the autumn and winter seasons I sampled both during an event of low (drought period) and high water (flood ...
1 vote, 0 answers, 27 views
What kind of machine learning model could I use on this dataset?
I am a beginner to data science. I found this dataset that covers natural disaster incidents in Afghanistan from 2016 - present. Here are the 13 columns: REGION (South West, North, etc), PROV_CODE (...
1 vote, 0 answers, 34 views
How do I prioritize which features to use in my machine learning model before the feature engineering stage?
I am encountering what is probably a fairly common problem: I have too many features, let's say 500 possible features.
I only want to pick the top 10-50 features that would be the most predictive of y, or ...
0 votes, 0 answers, 20 views
Finding functional forms and state space in a state-space model
Let's say that we have a time series:
$$\{y_i\}_{i=0}^{n-1}, y_i \in \mathcal{Y} \subseteq \mathbb{R}^m, m \in \mathbb{N}_1$$
that we would like to model with some sort of state-space model:
$$x_i = f(...
0 votes, 0 answers, 97 views
Exploratory data analysis on a large dataset with hundreds of variables
I'm working on longitudinal data with repeated measures for each subject and hundreds of variables. I would like to use a linear mixed model to look at the mean response of each dependent variable at ...
0 votes, 0 answers, 37 views
Best way to perform a reliability growth analysis and answer questions like "Is the reliability getting better?"
everyone!
I'm a reliability engineer and a project was given to me two weeks ago. I need to answer some questions about the reliability of a specific item in a train. There are 189 items, one for each ...