Fastest way to find the area under a curve which is represented by a list of points?

Question

For a piece of software I am writing I need to find the area under a curve that is collected as a list of points that make it up. I am trying to determine the fastest way to get the area.

The only option I see is to approximate a function that represents the list of points and integrate over it, though I was wondering if there were any methods that worked on the points directly? Hopefully being a bit faster and possible more exact?

About the data: The $Y$ values of the points will always be positive, and can possibly be wildly far apart in value. The $X$ values will almost never be evenly spaced and usually be some floating point number.

EDIT: Also. The data is first recorded as a list of points where the $X$ values are spaced evenly. This data is then calibrated based on some defined calibration function. This is what causes the $X$ values to change their spacing.

Knowing this function and the pre-calibration values, could there be some sort of combination of the Trapezoidal Rule and this calibration function to get an even more accurate area than simply using the Trapezoidal Rule on the calibrated data?

Regarding the question in your edit, Do you know anything else about the function between the sampling points? Do you know the function is positive everywhere? increasing or decreasing? continuous? differentiable? smooth? slowly varying? If not, then I want you to imagine that the function wiggles around wildly everywhere and think about how we could ever know that just from the set of samples. — Eric Towers, Commented Mar 23, 2016 at 23:19

Nathaniel Bubis · Accepted Answer · 2016-03-23 13:17:54Z

4

It really depends on the accuracy and what you know about the function. If you can assume your function is fairly smooth$^*$ between your sampled points, you can use the Trapezoidal Rule to quickly find the area. That is, if you have a bunch of points ${x_i, y_i}$, you would first sort them in increasing $x$ order and then write:

$$\text{Area} = \sum_i \frac{(x_{i+1}-x_i)(y_{i+1} + y_i)}{2}$$

$^*$ Here "smooth" means that the second derivative is small, or in other words, that you function is reasonably approximated by passing a series of straight lines through your points.

answered Mar 23, 2016 at 13:17

Nathaniel Bubis

33.1k10 gold badges84 silver badges143 bronze badges

1

$\begingroup$ If the $x_i$ are evenly spaced, as is often the case, this simplifies considerably. Let $\Delta x$ be the spacing, then it is $\Delta x ((\sum_i y_i)-\frac 12(y_1+y_{i+1}))$. A good approach $\endgroup$
– Ross Millikan
Commented Mar 23, 2016 at 14:12
$\begingroup$ @RossMillikan - of course, though the OP stated that his data is not evenly spaced. $\endgroup$
– Nathaniel Bubis
Commented Mar 23, 2016 at 14:16

Add a comment |

Eric Towers · Accepted Answer · 2016-03-23 13:18:05Z

2

If the $x$ values are evenly spaced, average the $y$ values and multiply by the $x$ spacing and then by one less than the number of points.

For instance, if your $y$ values are $1$ and $3$ and you $x$ spacing is $5$, you get $\dfrac{1+3}{2} \times 5 \times (2-1) = 10$.

If your $x$-spacings are not even, you should do each trapezoid by itself -- basically summing up each neighbouring pair of points in exactly the way I did the pair in the example above (but you can drop the "$(2-1)$" because they're all lists of two points).

answered Mar 23, 2016 at 13:18

Eric Towers

67.8k3 gold badges50 silver badges120 bronze badges

$\begingroup$ I assume this method is properly named the Trapezodial Rule as nbubis stated in his answer? I guess this would make the most sense. Seeing as the data is never computed from a known function, but measured from some hardware as a list of points, I guess the best I could do in accuracy is the Trapezoidal Rule after reading about it just now. $\endgroup$
– KDecker
Commented Mar 23, 2016 at 13:22
$\begingroup$ I have also added a small edit to the question. $\endgroup$
– KDecker
Commented Mar 23, 2016 at 13:26

Add a comment |

Will Fisher · Accepted Answer · 2016-03-23 13:25:45Z

There are a few ways that I see this being possible. Some methods:

Riemann Sums
A very simple way would be to of course just do a simple Riemann Sum on the data points. Say you have $n$ data points in the form $(x_i,y_i)$. Then a right Riemann Sum would look something like $$\sum_{i=2}^{n}(x_i - x_{i-1})y_i$$
Polynomial Interpolation
Polynomial interpolation can be used to fit a polynomial of degree $n$ to $n+1$ data points. From there you could just integrate the polynomial from the lowest to highest x value.

Least Squares
Say you have a specific function in mind, maybe a quadratic, then least squares tells you that you should minimize the sum of the square deviations from the best fit function and your data points. So you would want to minimize (for a quadratic) $$S(a,b,c)=\sum_{i=1}^{n} ([ax_i^2+bx_i+c] - y_i)^2$$ which just comes down to minimizing a three variable function (which comes down to solving a linear system in the end for quadratics). Then get your function and integrate.

user3209698 · Accepted Answer · 2016-03-23 13:23:15Z

My experience with numerical integration is that one of the fastest way of integration is the trapezoidal rule. However, if you have a huge ammount of points it becomes slow. In this case, if the function is smooth, you can interpolate (using splines, for example) an smaller collection of points and to integrate them (using again the trapezoidal rule).

If you are more concern about precision, you can use the Simpson integration method, which are slower but more accurate.

Ian · Accepted Answer · 2016-03-23 13:35:03Z

A good answer to this question depends on the situation where the data came from. If these points are coming from a smooth function, then a classical quadrature rule like the trapezoidal rule or Simpson rule will fare well. If not, then these will fare as well as, if not worse than, a simple rectangle rule. Monte Carlo integration will most likely not perform well, it is usually not a good technique in one dimension.

All of these techniques are based on constructing an interpolant of some kind and then integrating that (either exactly or approximately). None of them really integrate a discrete set of points, because that is not really a well-defined problem. Also, these methods can all be defined with non-uniform spacing in the domain, with slightly more difficulty than in the uniform context.

One thing you should absolutely not do, provided your number of points is larger than about 5, is construct a single polynomial interpolant and then integrate that. This is basically because high-degree polynomial interpolation is unstable in a certain precise sense.

I am not sure how to use the pre- and post-calibration information without knowing exactly what the calibration operation entails.

Stack Exchange Network

Fastest way to find the area under a curve which is represented by a list of points?

5 Answers 5

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged
integration
.

Hot Network Questions

Fastest way to find the area under a curve which is represented by a list of points?

5 Answers 5

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged integration.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
integration
.