
Let's assume a stochastic simulation or test with a control variable. The task is to visualize the distribution of the results to demonstrate the effect being researched. The objective is to get a smooth plot without losing resolution on the important effects. For smoothness, the plot needs to be interpolated.

The simulation or test can be run at different values of a continuous control variable (predictor) $X$. The samples are noisy, because there is natural variation with an unknown distribution.

The control variable is of interest over a reasonable range. Within that range, a grid of $M$ values of the control variable is defined, and each grid point is sampled $N$ times, so the total number of samples is $MN$.
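A minimal sketch of this sampling design (the range $[0, 1]$, the values of $M$ and $N$, and the toy `run_simulation` function are all assumptions for illustration):

```python
import numpy as np

# Assumed example values: M grid points, N replicates per point, budget M * N.
M, N = 20, 50
x_grid = np.linspace(0.0, 1.0, M)   # "reasonable range" assumed to be [0, 1]

def run_simulation(x, rng):
    """Stand-in for the real stochastic simulation: smooth trend plus noise."""
    return np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal()

rng = np.random.default_rng(0)
# samples[i, j] is the j-th replicate at grid point x_grid[i]; shape (M, N).
samples = np.array([[run_simulation(x, rng) for _ in range(N)] for x in x_grid])
```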

There are two options:

  • Calculate a denser grid (larger $M$), with fewer replicates per point (smaller $N$)

  • Calculate a sparser grid (smaller $M$), with more replicates per point (larger $N$)

Question: Are there any rules of thumb or theorems that would help in making this decision for better-looking results?

A denser grid will not help if the data are too noisy, but reducing the noise by averaging over a large $N$ leads to poor resolution in the control variable.
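A rough sketch of that trade-off, assuming a fixed budget of runs and the same toy `run_simulation` as above (redeclared here so the snippet stands alone); the precision-weighted smoothing spline through the per-point means is just one possible choice of smoother:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def run_simulation(x, rng):
    """Stand-in for the real stochastic simulation: smooth trend plus noise."""
    return np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal()

def allocate(M, N, rng):
    """Spend a budget of M * N runs on M grid points with N replicates each."""
    x_grid = np.linspace(0.0, 1.0, M)
    y = np.array([[run_simulation(x, rng) for _ in range(N)] for x in x_grid])
    means = y.mean(axis=1)
    sems = y.std(axis=1, ddof=1) / np.sqrt(N)   # standard error of each mean
    # Smoothing spline through the per-point means, weighted by their precision.
    return UnivariateSpline(x_grid, means, w=1.0 / sems)

rng = np.random.default_rng(1)
dense  = allocate(M=100, N=10,  rng=rng)   # dense grid, few replicates per point
sparse = allocate(M=10,  N=100, rng=rng)   # sparse grid, many replicates per point

# Compare how well each allocation recovers the known toy truth on a fine grid.
x_fine = np.linspace(0.0, 1.0, 1000)
truth = np.sin(2 * np.pi * x_fine)
for name, fit in [("dense", dense), ("sparse", sparse)]:
    rmse = np.sqrt(np.mean((fit(x_fine) - truth) ** 2))
    print(f"{name:6s}: RMSE vs. toy truth = {rmse:.3f}")
```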

Will I even gain anything with a larger $N$? The answer also depends on the choice of interpolation method, but are there, in general, any preferable approaches for this kind of analysis?

Background: In my case the results come from a simulation of a complex system, so there is not necessarily any analytic solution that is known to be right. Without an analytic solution, the interpolation has no correct formula to estimate; without an analytic form whose parameters could be fitted, the methods must rely only on the data, i.e. be nonparametric.

The interpolation methods that make sense are thus splines and nearest-data-point approaches. The results are fairly continuous: small changes in the control variable lead to small changes in the outcomes, so interpolation is reasonable.
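For the interpolation step itself, here is a small sketch of the two nonparametric options mentioned (a cubic spline vs. nearest data point), applied to the per-point means; the toy data generation is again an assumption:

```python
import numpy as np
from scipy.interpolate import CubicSpline, interp1d

# Toy per-point means on a sparse grid (assumed data, standing in for the
# averaged simulation results at each of the M control-variable values).
rng = np.random.default_rng(2)
x_grid = np.linspace(0.0, 1.0, 10)
means = np.sin(2 * np.pi * x_grid) + 0.05 * rng.standard_normal(x_grid.size)

# Two nonparametric interpolators through the same points.
spline = CubicSpline(x_grid, means)                # smooth, passes through the means
nearest = interp1d(x_grid, means, kind="nearest")  # piecewise constant, no smoothing

x_fine = np.linspace(0.0, 1.0, 500)
y_spline = spline(x_fine)    # smooth curve for plotting
y_nearest = nearest(x_fine)  # step-like curve; keeps the data exactly but looks blocky
```

The interpolating spline will look smoother but can overshoot when the per-point means are still noisy, which is where a smoothing spline (as in the earlier sketch) or a larger $N$ per grid point would help.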

  • Your post is a bit unclear. You have a response random variable $Y$ and a predictor $X$. For each value of $X$, you can run a simulation and compute the corresponding $Y$. Right? Then where does the random noise come from? If it's a deterministic simulation (CFD, FEM, combustion, etc.) you'll always get the same result for a given value of $X$. Is it a stochastic simulation? Also, what exactly is $N$? You say it's the number of samples of $X$ inside a certain range. But then you say that a denser grid corresponds to a smaller $N$, i.e., fewer samples? How is that possible?
    – DeltaIV
    Commented Oct 4, 2016 at 6:10
  • Finally, how is interpolation related to all this? If you can compute the value of $Y$ for each value of $X$, you don't need to interpolate. Maybe the simulation is expensive, so, given a set of points $\{(x_i, y_i)\}_{i=1,\dots,N}$, you want to predict $Y$ at all possible values of $X$ in the "reasonable" range, without having to run new simulations. Is this correct?
    – DeltaIV
    Commented Oct 4, 2016 at 6:14
  • When you say "the data are nonparametric", what do you mean?
    – Glen_b
    Commented Oct 4, 2016 at 11:00
  • Yes, my problem is exactly as @DeltaIV says. I edited my question for clarity. Commented Oct 5, 2016 at 6:03
  • Let's assume for the moment that the statement "the methods need to be nonparametric" is correct. That doesn't make the data in any sense nonparametric. (Data are just data; they never have parameters.) ... It is better to explain what you mean rather than coin a term very likely to be misinterpreted -- e.g. if you include in your question something along the lines of the explanation you just gave in the comments (but without the conclusion that the data are nonparametric), that might help.
    – Glen_b
    Commented Oct 5, 2016 at 9:07
