
Let's assume a stochastic simulation or test with a control variable. The task is to visualize the distribution of the results to demonstrate the effect being researched. The objective is to get a smooth plot without losing resolution on the important effects. For smoothness, the plot needs to be interpolated.

The simulation or test can be run at different values of a continuous control variable (predictor) $X$. The samples are noisy, because there is natural variation with an unknown distribution.

The control variable is of interest over a reasonable range. Within that range, a grid of $M$ values of the control variable is defined, and each grid point is sampled $N$ times, so the total number of samples is $MN$.
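A minimal sketch of this sampling design (the range $[0, 1]$, the values of $M$ and $N$, and the toy `run_simulation` function are all assumptions for illustration):

```python
import numpy as np

# Assumed example values: M grid points, N replicates per point, budget M * N.
M, N = 20, 50
x_grid = np.linspace(0.0, 1.0, M)   # "reasonable range" assumed to be [0, 1]

def run_simulation(x, rng):
    """Stand-in for the real stochastic simulation: smooth trend plus noise."""
    return np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal()

rng = np.random.default_rng(0)
# samples[i, j] is the j-th replicate at grid point x_grid[i]; shape (M, N).
samples = np.array([[run_simulation(x, rng) for _ in range(N)] for x in x_grid])
```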

There are two options:

  • Calculate a denser grid (larger $M$), with fewer replicates per point (smaller $N$)

  • Calculate a sparser grid (smaller $M$), with more replicates per point (larger $N$)

Question: Are there any rules of thumb or theorems that would help in making this decision for better-looking results?

A denser grid will not help if the data are too noisy, but reducing the noise by averaging over a large $N$ leads to poor resolution in the control variable.
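A rough sketch of that trade-off, assuming a fixed budget of runs and the same toy `run_simulation` as above (redeclared here so the snippet stands alone); the precision-weighted smoothing spline through the per-point means is just one possible choice of smoother:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def run_simulation(x, rng):
    """Stand-in for the real stochastic simulation: smooth trend plus noise."""
    return np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal()

def allocate(M, N, rng):
    """Spend a budget of M * N runs on M grid points with N replicates each."""
    x_grid = np.linspace(0.0, 1.0, M)
    y = np.array([[run_simulation(x, rng) for _ in range(N)] for x in x_grid])
    means = y.mean(axis=1)
    sems = y.std(axis=1, ddof=1) / np.sqrt(N)   # standard error of each mean
    # Smoothing spline through the per-point means, weighted by their precision.
    return UnivariateSpline(x_grid, means, w=1.0 / sems)

rng = np.random.default_rng(1)
dense  = allocate(M=100, N=10,  rng=rng)   # dense grid, few replicates per point
sparse = allocate(M=10,  N=100, rng=rng)   # sparse grid, many replicates per point

# Compare how well each allocation recovers the known toy truth on a fine grid.
x_fine = np.linspace(0.0, 1.0, 1000)
truth = np.sin(2 * np.pi * x_fine)
for name, fit in [("dense", dense), ("sparse", sparse)]:
    rmse = np.sqrt(np.mean((fit(x_fine) - truth) ** 2))
    print(f"{name:6s}: RMSE vs. toy truth = {rmse:.3f}")
```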

Will I even gain anything with a larger $N$? The answer also depends on the choice of interpolation method, but are there, in general, any preferable approaches for this kind of analysis?

Background: In my case the results come from a simulation of a complex system, so there is not necessarily any analytic solution that is known to be right. Without an analytic solution, the interpolation has no correct formula to estimate; without an analytic form whose parameters could be fitted, the methods must rely only on the data, i.e. be nonparametric.

The interpolation methods that make sense are thus splines and nearest-data-point approaches. The results are fairly continuous: small changes in the control variable lead to small changes in the outcomes, so interpolation is reasonable.
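For the interpolation step itself, here is a small sketch of the two nonparametric options mentioned (a cubic spline vs. nearest data point), applied to the per-point means; the toy data generation is again an assumption:

```python
import numpy as np
from scipy.interpolate import CubicSpline, interp1d

# Toy per-point means on a sparse grid (assumed data, standing in for the
# averaged simulation results at each of the M control-variable values).
rng = np.random.default_rng(2)
x_grid = np.linspace(0.0, 1.0, 10)
means = np.sin(2 * np.pi * x_grid) + 0.05 * rng.standard_normal(x_grid.size)

# Two nonparametric interpolators through the same points.
spline = CubicSpline(x_grid, means)                # smooth, passes through the means
nearest = interp1d(x_grid, means, kind="nearest")  # piecewise constant, no smoothing

x_fine = np.linspace(0.0, 1.0, 500)
y_spline = spline(x_fine)    # smooth curve for plotting
y_nearest = nearest(x_fine)  # step-like curve; keeps the data exactly but looks blocky
```

The interpolating spline will look smoother but can overshoot when the per-point means are still noisy, which is where a smoothing spline (as in the earlier sketch) or a larger $N$ per grid point would help.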

  • Your post is a bit unclear. You have a response random variable $Y$ and a predictor $X$. For each value of $X$, you can run a simulation and compute the corresponding $Y$. Right? Then where does the random noise come from? If it's a deterministic simulation (CFD, FEM, combustion, etc.) you'll always get the same result for a given value of $X$. Is it a stochastic simulation? Also, what exactly is $N$? You say it's the number of samples of $X$ inside a certain range. But then you say that a denser grid corresponds to a smaller $N$, i.e., fewer samples? How is that possible?
    – DeltaIV
    Commented Oct 4, 2016 at 6:10
  • Finally, how is interpolation related to all this? If you can compute the value of $Y$ for each value of $X$, you don't need to interpolate. Maybe the simulation is expensive, so, given a set of points $\{(x_i, y_i)\}_{i=1,\dots,N}$, you want to predict $Y$ at all possible values of $X$ in the "reasonable" range, without having to run new simulations. Is this correct?
    – DeltaIV
    Commented Oct 4, 2016 at 6:14
  • When you say "the data are nonparametric", what do you mean?
    – Glen_b
    Commented Oct 4, 2016 at 11:00
  • Yes, my problem is exactly as @DeltaIV says. I edited my question for clarity. Commented Oct 5, 2016 at 6:03
  • Let's assume for the moment that the statement "the methods need to be nonparametric" is correct. That doesn't make the data in any sense nonparametric. (Data are just data; they never have parameters.) ... It is better to explain what you mean rather than coin a term very likely to be misinterpreted -- e.g. if you include in your question something along the lines of the explanation you just gave in the comments (but without the conclusion that the data are nonparametric), that might help.
    – Glen_b
    Commented Oct 5, 2016 at 9:07
