
Questions tagged [synthetic-data]

The tag has no usage guidance.

0 votes
0 answers
18 views

Is this a reasonable way to check the quality of simulated data in MCMC inference?

I have a hierarchical Bayesian model that looks like this: $\alpha_i \sim \mathcal{N}\left(\mu_\alpha, \sigma_\alpha\right) \tag{1}$ $\beta_i \sim \mathcal{N}\left(\mu_\beta, \sigma_\beta\right) \tag{...
chesslad
  • 211
4 votes
1 answer
41 views

Generating synthetic data with multiple records per ID

I would like to generate a synthetic dataset where there are multiple records per ID, and self-consistency is maintained among records of each ID. For example, imagine a dataset where the ID is a ...
user12138762
3 votes
0 answers
45 views

References for Generation of Synthetic Data

What are some introductory textbooks or references specifically on the task of generating synthetic data (from real data)? Ideally, such a text would cover a range of methods, be it ...
0 votes
0 answers
31 views

Can I generate a time series with the same features as a given dataset, but add a known linear trend coefficient (not just trend strength)?

I want to generate data that matches features of environmental data (which is often analysed with nonparametric tests due to non-normality, skewness, etc.). I want to know how to best capture any linear ...
Justin Murphy
1 vote
0 answers
12 views

Creating a dataset which provides specific regression output

I need to create a synthetic dataset with 1000 rows for two variables X and Y. I need the relationship between X and Y to be set up such that when I run a regression model, it provides specific output,...
mp9828
  • 11
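
One possible approach to the question above, as a minimal sketch: the excerpt is truncated, so which regression output is required is unknown. The sketch assumes the targets are the OLS intercept, slope, and residual standard error, and the function name `make_regression_data` and its default values are hypothetical. The idea is to draw noise, project it off the constant and X so that it is exactly orthogonal to both, rescale it, and then build Y from the target line.

```python
import numpy as np

def make_regression_data(n=1000, intercept=2.0, slope=0.5, resid_sd=1.0, seed=0):
    """Return x, y whose OLS fit of y on x reproduces the target intercept,
    slope and residual standard error (all targets are illustrative values)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    e = rng.normal(size=n)

    # Remove from the noise its projection onto the columns [1, x], so the
    # noise is orthogonal to both the constant and x.
    design = np.column_stack([np.ones(n), x])
    e = e - design @ np.linalg.lstsq(design, e, rcond=None)[0]

    # Rescale so the residual standard error (n - 2 degrees of freedom)
    # equals resid_sd.
    e = e * (resid_sd / np.sqrt(e @ e / (n - 2)))

    y = intercept + slope * x + e
    return x, y

x, y = make_regression_data()
slope_hat, intercept_hat = np.polyfit(x, y, 1)  # highest degree first
print(slope_hat, intercept_hat)
```

Because the noise is orthogonal to the design matrix, the fitted coefficients match the targets up to floating-point error rather than only on average. The distribution of X is left free here; other targets (for example a specific R-squared) would need additional constraints.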
1 vote
1 answer
52 views

Are synthetic data produced by Gretel, YData, MostlyAI, etc. of higher quality than sdv-dev CTGAN?

There are some online services that we can use to generate synthetic data. On the other hand, we can also use sdv-dev CTGAN from GitHub. Are synthetic data produced by Gretel, YData, MostlyAI, etc. of ...
user366312
  • 2,201
0 votes
0 answers
15 views

Sampling correlated random variables using a copula

I have a very small set of data, which is a collection of vectors with 4 elements (for the sake of simplicity). The 4 marginal distributions are quite diverse (they are Gaussian-like or sine-like). ...
Physics Student
0 votes
0 answers
41 views

Imputing missing observations of zip code level data

I am looking for a suitable imputation method for missing observations in my zip code level data, using R. I have a random sample consisting of households that live in different zip codes within ...
Ottibanane123
0 votes
0 answers
32 views

How can I create realistic noisy data from distributions?

I want to create synthetic data from stitched distributions in order to test some models on them (for example Gaussian stitched with a GPD at quantile q). I'm currently simply sampling N*q points from ...
Philippe Ear
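
A hedged sketch of one way to sample from such a stitched distribution, not the asker's method (the excerpt is cut off before the method is described). It assumes a normal body below the q-quantile and a generalized Pareto tail above it with nonzero shape; the function name `sample_spliced` and all parameter values are hypothetical. It uses inverse-transform sampling on the combined CDF, so the number of body points is random (binomial) instead of being fixed at N*q.

```python
from statistics import NormalDist
import numpy as np

def sample_spliced(n, q=0.95, mu=0.0, sigma=1.0, xi=0.2, beta=0.5, seed=0):
    """Draw n points from a normal body spliced with a GPD tail at quantile q.
    Assumes xi != 0; all parameter defaults are illustrative."""
    rng = np.random.default_rng(seed)
    body = NormalDist(mu, sigma)
    u = body.inv_cdf(q)          # splice point: the q-quantile of the body

    p = rng.random(n)            # uniform draws on [0, 1)
    out = np.empty(n)
    in_body = p < q

    # Body: invert the normal CDF for draws below q (guard against p == 0).
    out[in_body] = [body.inv_cdf(max(v, 1e-12)) for v in p[in_body]]

    # Tail: rescale the remaining draws to [0, 1) and invert the GPD CDF,
    # G(y) = 1 - (1 + xi * y / beta) ** (-1 / xi), for the excess y over u.
    w = (p[~in_body] - q) / (1.0 - q)
    out[~in_body] = u + beta / xi * ((1.0 - w) ** (-xi) - 1.0)
    return out, u

x, u = sample_spliced(100_000)
print(np.mean(x <= u))           # fraction of points in the body, close to q
```

The density of this spliced distribution is generally discontinuous at the splice point unless beta is chosen to match the body density there; the sketch does not enforce that.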
0 votes
0 answers
71 views

Which Stata command supports the synthetic control method with multiple outcomes and multiple treated units?

I am trying to apply the synthetic control method with 2 treated provinces, 10 non-treated provinces, and 7 outcomes. Can anyone tell me which Stata command I can use?
Kimdung
0 votes
0 answers
17 views

Generate Synthetic Bivariate Data for Testing Simple Linear Regression Using Excel

I came up with the solution below. What do you think? Is there another way? Linear regression is modeled as y(i) = a + b*x(i) + e(i), where a and b are constants, thus ...
Maaruf Bussri
1 vote
0 answers
200 views

How to generate random values based on mean, standard deviation, skew and kurtosis in Python?

Given these values, is it possible to generate random values that conform to this distribution (using Python, but preferably without the SciPy package)? Statistic Value Mean 1.518 Std Dev 24.827 ...
m01010011
  • 111
2 votes
0 answers
28 views

How can I introduce dependence (to varying degrees) into a synthetic dataset to measure the effect on my method?

I'm using a synthetic dataset in which I sample from three independent Bernoulli random variables x1, x2 and x3 with p=p1, p=p2 and p=p3 respectively. I wish to "introduce dependence," or ...
Vance
  • 21
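
A minimal sketch of one common way to introduce tunable dependence between Bernoulli variables while keeping their marginal probabilities: threshold a correlated latent Gaussian vector (a Gaussian copula). This is an assumption about what would suit the question, since the excerpt is truncated; the function name `correlated_bernoulli` and the single equicorrelation parameter `rho` are hypothetical.

```python
from statistics import NormalDist
import numpy as np

def correlated_bernoulli(n, probs, rho, seed=0):
    """Sample n rows of Bernoulli variables with marginals `probs`, made
    dependent through an equicorrelated latent Gaussian with correlation rho.
    rho = 0 gives independence; larger rho gives stronger dependence."""
    rng = np.random.default_rng(seed)
    k = len(probs)

    # Latent covariance: 1 on the diagonal, rho everywhere else.
    cov = np.full((k, k), rho) + (1.0 - rho) * np.eye(k)
    z = rng.multivariate_normal(np.zeros(k), cov, size=n)

    # x_j = 1 when z_j falls below the p_j quantile, so P(x_j = 1) = p_j.
    thresholds = np.array([NormalDist().inv_cdf(p) for p in probs])
    return (z < thresholds).astype(int)

x = correlated_bernoulli(100_000, [0.2, 0.5, 0.7], rho=0.6)
print(x.mean(axis=0))                 # empirical marginals, close to probs
print(np.corrcoef(x, rowvar=False))   # empirical pairwise correlations
```

Note that `rho` is the correlation of the latent Gaussians, not of the binary variables themselves; the resulting binary correlations are smaller and depend on the marginals, so they are best measured empirically as in the last line. For three variables the equicorrelation matrix is valid only for rho between -0.5 and 1.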
1 vote
0 answers
62 views

Synthetic data for PCA [closed]

I was trying to evaluate different algorithms for PCA, such as eigenvalue decomposition, SVD, the Lanczos algorithm, and power iteration. I also want to do some other analysis. I could not find any papers concerning ...
pppp_prs
0 votes
0 answers
123 views

How can I model the multivariate probability distribution of a dataset with both continuous and discrete variables for sampling?

This might seem like a duplicate of the following link, but I think that one is asking how to create a completely new dataset with specific distributions, rather than how to model an existing dataset ...
quanty
  • 252
