2
$\begingroup$

For a piece of software I am writing I need to find the area under a curve that is collected as a list of points that make it up. I am trying to determine the fastest way to get the area.

The only option I see is to approximate a function that represents the list of points and integrate over it, though I was wondering if there were any methods that worked on the points directly? Hopefully being a bit faster and possible more exact?

About the data: The $Y$ values of the points will always be positive, and can possibly be wildly far apart in value. The $X$ values will almost never be evenly spaced and usually be some floating point number.

EDIT: Also. The data is first recorded as a list of points where the $X$ values are spaced evenly. This data is then calibrated based on some defined calibration function. This is what causes the $X$ values to change their spacing.

Knowing this function and the pre-calibration values, could there be some sort of combination of the Trapezoidal Rule and this calibration function to get an even more accurate area than simply using the Trapezoidal Rule on the calibrated data?

$\endgroup$
5
  • $\begingroup$ Are the $x$ values evenly spaced? $\endgroup$ Commented Mar 23, 2016 at 13:15
  • $\begingroup$ No, the $X$ values will almost never be evenly spaced. $\endgroup$
    – KDecker
    Commented Mar 23, 2016 at 13:17
  • $\begingroup$ The Monte Carlo method seems to work well in this case. $\endgroup$
    – James
    Commented Mar 23, 2016 at 13:18
  • $\begingroup$ en.wikipedia.org/wiki/Shoelace_formula $\endgroup$ Commented Mar 23, 2016 at 13:28
  • $\begingroup$ Regarding the question in your edit, Do you know anything else about the function between the sampling points? Do you know the function is positive everywhere? increasing or decreasing? continuous? differentiable? smooth? slowly varying? If not, then I want you to imagine that the function wiggles around wildly everywhere and think about how we could ever know that just from the set of samples. $\endgroup$ Commented Mar 23, 2016 at 23:19

5 Answers 5

4
$\begingroup$

It really depends on the accuracy and what you know about the function. If you can assume your function is fairly smooth$^*$ between your sampled points, you can use the Trapezoidal Rule to quickly find the area. That is, if you have a bunch of points ${x_i, y_i}$, you would first sort them in increasing $x$ order and then write:

$$\text{Area} = \sum_i \frac{(x_{i+1}-x_i)(y_{i+1} + y_i)}{2}$$

$^*$ Here "smooth" means that the second derivative is small, or in other words, that you function is reasonably approximated by passing a series of straight lines through your points.

$\endgroup$
2
  • 1
    $\begingroup$ If the $x_i$ are evenly spaced, as is often the case, this simplifies considerably. Let $\Delta x$ be the spacing, then it is $\Delta x ((\sum_i y_i)-\frac 12(y_1+y_{i+1}))$. A good approach $\endgroup$ Commented Mar 23, 2016 at 14:12
  • $\begingroup$ @RossMillikan - of course, though the OP stated that his data is not evenly spaced. $\endgroup$ Commented Mar 23, 2016 at 14:16
2
$\begingroup$

If the $x$ values are evenly spaced, average the $y$ values and multiply by the $x$ spacing and then by one less than the number of points.

For instance, if your $y$ values are $1$ and $3$ and you $x$ spacing is $5$, you get $\dfrac{1+3}{2} \times 5 \times (2-1) = 10$.

If your $x$-spacings are not even, you should do each trapezoid by itself -- basically summing up each neighbouring pair of points in exactly the way I did the pair in the example above (but you can drop the "$(2-1)$" because they're all lists of two points).

$\endgroup$
2
  • $\begingroup$ I assume this method is properly named the Trapezodial Rule as nbubis stated in his answer? I guess this would make the most sense. Seeing as the data is never computed from a known function, but measured from some hardware as a list of points, I guess the best I could do in accuracy is the Trapezoidal Rule after reading about it just now. $\endgroup$
    – KDecker
    Commented Mar 23, 2016 at 13:22
  • $\begingroup$ I have also added a small edit to the question. $\endgroup$
    – KDecker
    Commented Mar 23, 2016 at 13:26
2
$\begingroup$

There are a few ways that I see this being possible. Some methods:

Riemann Sums
A very simple way would be to of course just do a simple Riemann Sum on the data points. Say you have $n$ data points in the form $(x_i,y_i)$. Then a right Riemann Sum would look something like $$\sum_{i=2}^{n}(x_i - x_{i-1})y_i$$
Polynomial Interpolation
Polynomial interpolation can be used to fit a polynomial of degree $n$ to $n+1$ data points. From there you could just integrate the polynomial from the lowest to highest x value.

Least Squares
Say you have a specific function in mind, maybe a quadratic, then least squares tells you that you should minimize the sum of the square deviations from the best fit function and your data points. So you would want to minimize (for a quadratic) $$S(a,b,c)=\sum_{i=1}^{n} ([ax_i^2+bx_i+c] - y_i)^2$$ which just comes down to minimizing a three variable function (which comes down to solving a linear system in the end for quadratics). Then get your function and integrate.

$\endgroup$
1
$\begingroup$

My experience with numerical integration is that one of the fastest way of integration is the trapezoidal rule. However, if you have a huge ammount of points it becomes slow. In this case, if the function is smooth, you can interpolate (using splines, for example) an smaller collection of points and to integrate them (using again the trapezoidal rule).

If you are more concern about precision, you can use the Simpson integration method, which are slower but more accurate.

$\endgroup$
1
$\begingroup$

A good answer to this question depends on the situation where the data came from. If these points are coming from a smooth function, then a classical quadrature rule like the trapezoidal rule or Simpson rule will fare well. If not, then these will fare as well as, if not worse than, a simple rectangle rule. Monte Carlo integration will most likely not perform well, it is usually not a good technique in one dimension.

All of these techniques are based on constructing an interpolant of some kind and then integrating that (either exactly or approximately). None of them really integrate a discrete set of points, because that is not really a well-defined problem. Also, these methods can all be defined with non-uniform spacing in the domain, with slightly more difficulty than in the uniform context.

One thing you should absolutely not do, provided your number of points is larger than about 5, is construct a single polynomial interpolant and then integrate that. This is basically because high-degree polynomial interpolation is unstable in a certain precise sense.

I am not sure how to use the pre- and post-calibration information without knowing exactly what the calibration operation entails.

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .