Questions tagged [synthetic-data]
101
questions
0
votes
0
answers
18
views
Is this a reasonable way to check the quality of simulated data in MCMC inference?
I have a hierarchical Bayesian model that looks like this:
$\alpha_i \sim \mathcal{N}\left(\mu_\alpha, \sigma_\alpha\right) \tag{1}$
$\beta_i \sim \mathcal{N}\left(\mu_\beta, \sigma_\beta\right) \tag{...
4
votes
1
answer
41
views
Generating synthetic data with multiple records per ID
I would like to generate a synthetic dataset where there are multiple records per ID, and self-consistency is maintained among records of each ID.
For example, imagine a dataset where the ID is a ...
3
votes
0
answers
45
views
References for Generation of Synthetic Data
What are some of the introductory textbooks/references specifically on the task of generating synthetic data (from real data)? If possible, such a text is expected to cover a range of methods, be it ...
0
votes
0
answers
31
views
Can I generate a time series with same features as a given dataset, but add a known linear trend coefficient (not just trend strength)?
I want to generate data that matches features of environmental data (which is often analysed with nonparametric tests due to non-normality, skewness, etc.). I want to know how to best capture any linear ...
1
vote
0
answers
12
views
Creating a dataset which provides specific regression output
I need to create a synthetic dataset with 1000 rows for two variables X and Y. I need the relationship between X and Y to be set up such that when I run a regression model, it provides specific output,...
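One possible approach, sketched in Python under assumptions: draw X, draw noise, strip from the noise everything an intercept and X can explain, then build Y from the target coefficients. Because the remaining residual is orthogonal to the design, ordinary least squares recovers the target intercept and slope exactly (up to floating point). The target values below (2.0, 0.5, 1.0) and the function name `make_regression_data` are hypothetical placeholders, since the excerpt does not state the required output.

```python
import numpy as np

def make_regression_data(n, intercept, slope, resid_sd, seed=0):
    """Return x, y whose OLS fit reproduces the given intercept, slope and residual SE."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    noise = rng.normal(size=n)
    design = np.column_stack([np.ones(n), x])
    # Remove the part of the noise explained by the intercept and x,
    # so the residual is exactly orthogonal to the design matrix.
    coef, *_ = np.linalg.lstsq(design, noise, rcond=None)
    resid = noise - design @ coef
    # Rescale so the residual standard error (n - 2 degrees of freedom) equals resid_sd.
    resid *= resid_sd / np.sqrt(resid @ resid / (n - 2))
    y = intercept + slope * x + resid
    return x, y

# Hypothetical targets: intercept 2.0, slope 0.5, residual standard error 1.0.
x, y = make_regression_data(1000, intercept=2.0, slope=0.5, resid_sd=1.0)
slope_hat, intercept_hat = np.polyfit(x, y, 1)  # polyfit returns highest degree first
resid_hat = y - (intercept_hat + slope_hat * x)
resid_se = np.sqrt(resid_hat @ resid_hat / (len(x) - 2))
```

The same idea does not by itself fix other outputs such as a particular R-squared; that would additionally depend on the variance of x.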
1
vote
1
answer
52
views
Are synthetic data produced by Gretel, YData, MostlyAI, etc. of higher quality than sdv-dev CTGAN?
There are some online services that we can use to generate synthetic data.
On the other hand, we can also use sdv-dev CTGAN from GitHub.
Are synthetic data produced by Gretel, YData, MostlyAI, etc. of ...
0
votes
0
answers
15
views
Sampling correlated random variables using a copula
I have a very small set of data, which is a collection of vectors with 4 elements (for the sake of simplicity). The 4 marginal distributions are quite diverse (they are Gaussian-like or sine-like). ...
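A minimal sketch of one way this could be done, assuming a Gaussian copula with empirical marginals (the excerpt does not say which copula is intended): convert each column to normal scores via ranks, estimate the correlation of those scores, sample from the fitted multivariate normal, and map back through each column's empirical quantiles. The toy `data` array is a made-up stand-in for the real vectors, and `gaussian_copula_sample` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

def gaussian_copula_sample(data, n_samples, seed=0):
    """Sample rows with the data's empirical marginals and a Gaussian-copula dependence."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    # Ranks -> pseudo-observations in (0, 1) -> normal scores.
    u = stats.rankdata(data, axis=0) / (n + 1)
    z = stats.norm.ppf(u)
    corr = np.corrcoef(z, rowvar=False)
    # Draw correlated normals and push them back to the (0, 1) scale.
    z_new = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u_new = stats.norm.cdf(z_new)
    # Invert each marginal with its empirical quantile function.
    cols = [np.quantile(data[:, j], u_new[:, j]) for j in range(d)]
    return np.column_stack(cols)

# Made-up toy data: two Gaussian-like and two sine-like columns.
rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, size=50)
base = rng.normal(size=50)
data = np.column_stack([
    base,
    0.8 * base + 0.6 * rng.normal(size=50),
    np.sin(t),
    np.sin(t) + 0.1 * rng.normal(size=50),
])
samples = gaussian_copula_sample(data, n_samples=500)
```

With a very small dataset the empirical quantiles never leave the observed range, so a smoothed or parametric marginal may be preferable if extrapolation matters.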
0
votes
0
answers
41
views
Imputing missing observations of zip code level data
I am looking for a sufficient imputation method for missing observations in my zip code level data, using R.
I have a random sample consisting of households which live in different zip codes within ...
0
votes
0
answers
32
views
How can I create realistic noisy data from distributions?
I want to create synthetic data from stitched distributions in order to test some models on them (for example Gaussian stitched with a GPD at quantile q).
I'm currently simply sampling N*q points from ...
0
votes
0
answers
71
views
Stata command for the synthetic control method with multiple outcomes and multiple treated units
I am trying to apply the synthetic control method with 2 treated provinces, 10 non-treated provinces, and 7 outcomes. Can anyone let me know which Stata command I can use?
0
votes
0
answers
17
views
Generate Synthetic Bivariate Data for Testing Simple Linear Regression Using Excel
I came up with the solution below. What do you think? Is there another way?
The linear regression is modeled as:
y(i) = a + b*x(i) + e(i)
a and b are constants, thus ...
1
vote
0
answers
200
views
How to generate random values based on mean, standard deviation, skew and kurtosis in Python?
Given these values, is it possible to generate random values that conform to this distribution (using Python, but preferably without the SciPy package)?
Statistic    Value
Mean         1.518
Std Dev      24.827
...
2
votes
0
answers
28
views
How can I introduce dependence (to varying degrees) into a synthetic dataset to measure the effect on my method?
I'm using a synthetic dataset in which I sample from three independent Bernoulli random variables x1, x2 and x3 with p=p1, p=p2 and p=p3 respectively. I wish to "introduce dependence," or ...
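One simple way to dial dependence up and down while keeping the Bernoulli marginals fixed, sketched here as an assumption rather than as the asker's method: threshold uniforms, and let each variable use a row-level shared uniform with probability `lam` and its own independent uniform otherwise. Since every uniform is still Uniform(0, 1), each x_j stays Bernoulli(p_j); `lam = 0` gives independence and `lam = 1` gives the strongest positive dependence this scheme allows. The name `dependent_bernoulli` and the probabilities below are hypothetical.

```python
import numpy as np

def dependent_bernoulli(n, probs, lam, seed=0):
    """Draw n rows of Bernoulli variables with fixed marginals and tunable dependence."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    k = probs.size
    shared = rng.random((n, 1))            # one uniform per row, shared across variables
    own = rng.random((n, k))               # independent uniforms, one per variable
    use_shared = rng.random((n, k)) < lam  # each variable picks the shared draw w.p. lam
    u = np.where(use_shared, shared, own)
    return (u < probs).astype(int)

# Hypothetical marginals p1, p2, p3.
probs = [0.2, 0.5, 0.7]
x_indep = dependent_bernoulli(200_000, probs, lam=0.0)
x_dep = dependent_bernoulli(200_000, probs, lam=0.8)
corr_indep = np.corrcoef(x_indep[:, 0], x_indep[:, 1])[0, 1]
corr_dep = np.corrcoef(x_dep[:, 0], x_dep[:, 1])[0, 1]
```

Sweeping `lam` over a grid then gives a family of datasets with identical marginals and increasing pairwise correlation, which can be used to measure how a method degrades.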
1
vote
0
answers
62
views
Synthetic data for PCA [closed]
I was trying to evaluate different algorithms for PCA, such as eigenvalue decomposition, SVD, the Lanczos algorithm, and power iteration.
I also want to do some other analysis.
I couldn't find any papers concerning ...
0
votes
0
answers
123
views
How can I model the multivariate probability distribution of a dataset with both continuous and discrete variables for sampling?
This might seem like a duplicate of the following link, but I think that one is asking how to create a completely new dataset with specific distributions, rather than how to model an existing dataset ...