Numerical approximation of the integral by using data

Question

I want to use the numerical approximation of the integral of a function given a list of data:

$$\int_a^bf(x)dx\approx\sum_{k=1}^N\frac{f(x_{k})+f(x_{k-1})}{2}(x_{k}-x_{k-1}),$$

where $f(x_0)=f(a)$ and $f(x_N)=f(b)$. I have the following list of data $(x_k,f(x_k))$:

Data1:={{0.0708461, 3.81099}, {0.152771, 6.5486}, {0.334735,4.56075}, {0.361943,3.44458}, {0.734831, -13.7032}, {0.881342, -15.2212}, {0.9695,-14.8131}, {1.23953, -13.3162}, {1.26711, -13.4099}, {1.33873, -13.9118}, {1.34442, -13.9659}, {1.50195,-15.9091}, {1.54436,-16.4444}, {1.66381, -17.5282}, {1.99997, -15.3791}, {2.12293,-13.5003}, {2.32644, -11.0439}, {2.42306, -10.3641}, {2.81243,-8.57635}, {3.02891, -6.8037}, {3.07009, -6.42764}, {4.06738,-2.08671}, {4.31106, -1.35298}, {4.49278, -1.02712}, {4.50307,-1.01467}, {4.63782, -0.892636}, {4.66641, -0.873371}, {4.68833,-0.859377}, {4.87293, -0.740146}, {4.95529, -0.67409}}

which I'd like to take to construct the result. I've tried RecurrenceTable and Interpolation, but I want to use only the given data.

EDIT:

I've tried

Sum[(Data1[[k, 2]] + Data1[[k - 1, 2]])/2(Data1[[k, 1]]-Data1[[k-1,1]]),{k,2,Length[Data1]}]

$\begingroup$ Mathematica v12.2 evalutes your code correctly! $\endgroup$
– Ulrich Neumann
Commented Jun 24 at 7:06 — Ulrich Neumann, Commented Jun 24 at 7:06
$\begingroup$ Your code gives the correct answer in V14.0.0, too! $\endgroup$
– Michael E2
Commented Jun 24 at 23:44 — Michael E2, Commented Jun 24 at 23:44

zeraoulia rafik · Accepted Answer · 2024-06-23 18:22:55Z

To compute the numerical approximation of the integral of a function given a list of data using the trapezoidal rule, you may need to follow the approach you described. Your method is essentially correct; however, there are a few minor issues in your code. The expression (Data1[[k, 2]] + Data1[[k - 1, 2]])/2(Data1[[k, 1]]-Data1[[k-1,1]]) is missing a multiplication operator between 1/2 and the interval length.

Here is the improved code:

Data1 = {{0.0708461, 3.81099}, {0.152771, 6.5486}, {0.334735, 4.56075}, 
         {0.361943, 3.44458}, {0.734831, -13.7032}, {0.881342, -15.2212}, 
         {0.9695, -14.8131}, {1.23953, -13.3162}, {1.26711, -13.4099}, 
         {1.33873, -13.9118}, {1.34442, -13.9659}, {1.50195, -15.9091}, 
         {1.54436, -16.4444}, {1.66381, -17.5282}, {1.99997, -15.3791}, 
         {2.12293, -13.5003}, {2.32644, -11.0439}, {2.42306, -10.3641}, 
         {2.81243, -8.57635}, {3.02891, -6.8037}, {3.07009, -6.42764}, 
         {4.06738, -2.08671}, {4.31106, -1.35298}, {4.49278, -1.02712}, 
         {4.50307, -1.01467}, {4.63782, -0.892636}, {4.66641, -0.873371}, 
         {4.68833, -0.859377}, {4.87293, -0.740146}, {4.95529, -0.67409}};

integral = Sum[(Data1[[k, 2]] + Data1[[k - 1, 2]])/2 * (Data1[[k, 1]] - Data1[[k - 1, 1]]), {k, 2, Length[Data1]}]

Print["The integral approximation is: ", integral]

"(Data1[[k, 2]] + Data1[[k - 1, 2]])/2(Data1[[k, 1]]-Data1[[k-1,1]]) is missing a multiplication operator between 1/2 and the interval length" — This statement is false. There is no need to insert the multiplication operator in Mathematica though there often is such a need in other languages. InputForm[] will show all multiplication operators, though. Check InputForm[ Hold[(Data1[[k, 2]] + Data1[[k - 1, 2]])/2 (Data1[[k, 1]] - Data1[[k - 1, 1]])]] to see that the multiplication is automatically interpreted. — Michael E2, Commented Jun 24 at 23:48

Jocelyn Minini · Accepted Answer · 2024-06-23 20:06:12Z

11

Another approach is to combine Interpolation with NIntegrate.

Data1 = {{0.0708461, 3.81099}, {0.152771, 6.5486}, {0.334735, 
4.56075}, {0.361943, 
3.44458}, {0.734831, -13.7032}, {0.881342, -15.2212}, {0.9695, \
-14.8131}, {1.23953, -13.3162}, {1.26711, -13.4099}, {1.33873, \
-13.9118}, {1.34442, -13.9659}, {1.50195, -15.9091}, {1.54436, \
-16.4444}, {1.66381, -17.5282}, {1.99997, -15.3791}, {2.12293, \
-13.5003}, {2.32644, -11.0439}, {2.42306, -10.3641}, {2.81243, \
-8.57635}, {3.02891, -6.8037}, {3.07009, -6.42764}, {4.06738, \
-2.08671}, {4.31106, -1.35298}, {4.49278, -1.02712}, {4.50307, \
-1.01467}, {4.63782, -0.892636}, {4.66641, -0.873371}, {4.68833, \
-0.859377}, {4.87293, -0.740146}, {4.95529, -0.67409}};

f = Interpolation[Data1, InterpolationOrder -> 1];

Int = NIntegrate[f[x], Flatten@{x, MinMax[Data1[[All, 1]]]}]

Print["The integral approximation is: ", Int]

answered Jun 23 at 20:06

Jocelyn Minini

7223 silver badges9 bronze badges

3

$\begingroup$ I'd suggest you use Integrate instead of NIntegrate on interpolations. The results will be more accurate because no sampling is involved. Integrate[f[x], Flatten@{x, f["Domain"]}] $\endgroup$
– Roman
Commented Jun 24 at 10:41
2

$\begingroup$ What @Roman suggests is in effect the built-in trapezoidal rule and should be the accepted answer. In fact, Integrate[f[x], x] produces an InterpolatingFunction antiderivative of an InterpolatingFunction f[x] over its domain (with f[a] == 0, where a is the lower limit of the domain). This useful fact in not sufficiently emphasized in the documentation, it seems, although it is now pointed out, once for an indefinite integral and once for this type of definite integral in the documentation for Integrate[]. $\endgroup$
– Michael E2
Commented Jun 24 at 14:16
$\begingroup$ Yes, exactly. Very well put, @MichaelE2! $\endgroup$
– Roman
Commented Jun 24 at 15:23

Add a comment |

ubpdqn · Accepted Answer · 2024-06-24 10:23:31Z

Using:

Data1 = {{0.0708461, 3.81099}, {0.152771, 6.5486}, {0.334735, 4.56075}, 
         {0.361943, 3.44458}, {0.734831, -13.7032}, {0.881342, -15.2212}, 
         {0.9695, -14.8131}, {1.23953, -13.3162}, {1.26711, -13.4099}, 
         {1.33873, -13.9118}, {1.34442, -13.9659}, {1.50195, -15.9091}, 
         {1.54436, -16.4444}, {1.66381, -17.5282}, {1.99997, -15.3791}, 
         {2.12293, -13.5003}, {2.32644, -11.0439}, {2.42306, -10.3641}, 
         {2.81243, -8.57635}, {3.02891, -6.8037}, {3.07009, -6.42764}, 
         {4.06738, -2.08671}, {4.31106, -1.35298}, {4.49278, -1.02712}, 
         {4.50307, -1.01467}, {4.63782, -0.892636}, {4.66641, -0.873371}, 
         {4.68833, -0.859377}, {4.87293, -0.740146}, {4.95529, -0.67409}};

You can use Differences and BlockMap, e.g.

Differences[#1] . BlockMap[Mean, #2, 2, 1] & @@ Transpose[Data1]

-> -35.8367

Nice answer. Not posting as my own answer because it's not unique enough, but you could also swap BlockMap for MovingAverage if you wanted to. — ydd, Commented Jun 24 at 12:16

Michael E2 · Accepted Answer · 2024-06-24 18:25:09Z

I think @Roman's modification of @JocelynMinini's use of Interpolation + Integrate is easy to code, easy to understand, and accurate; and on small data sets, it is probably the best way to code the solution.

If performance matters, then the following is fast and presented in functional style, with each step following the postfix operator //:

Developer`ToPackedArray@data1 // (* see below about packing *)
  Transpose //
 Apply[{x, y} |-> (Most@y + Rest@y).Differences@x / 2] //

(* OR *)

Developer`ToPackedArray@data2 //
  Transpose //
 Apply[{x, y} |-> ListConvolve[{0.5, 0.5}, y] . Differences@x]

Developer`ToPackedArray may be omitted if the data is known to be packed. If the data has mixed types, but can be represented in machine floating-point, then use Developer`ToPackedArray@N@data1 or Developer`ToPackedArray[data1, Real].

ListConvolve[{0.5, 0.5}, y] is nearly as efficient as Most@y + Rest@y, sometimes beating it in timings. I omit it from the discussion of performance below. It can be sped up slightly by packing the kernel {0.5, 0.5}. And ListConvolve[{0.5, 0.5}, y].Differences[x] may (or may not) be a familiar trapezoidal rule formula.

Performance issues.

Developer`ToPackedArray@data1 //
   Transpose //
  Apply[{x, y} |-> (Most@y + Rest@y).Differences@x / 2] //
 RepeatedTiming

Interpolation[data1, InterpolationOrder -> 1] // (* @Roman/@Jocelyn *)
  Integrate[#[x], Flatten@{x, #["Domain"]}] & //
 RepeatedTiming

Sum[                                             (* @zeraoulia rafik *)
 (data1[[k, 2]] + data1[[k - 1, 2]])/
    2*(data1[[k, 1]] - data1[[k - 1, 1]]), {k, 2, Length[data1]}] //
 RepeatedTiming

Total[(data1[[;; -2, 2]] + data1[[2 ;;, 2]])/    (* @so_sure *)
    2  (data1[[2 ;;, 1]] - data1[[;; -2, 1]])] //
 RepeatedTiming

Differences[#1] . BlockMap[Mean, #2, 2, 1] & @@  (* @ubpdqn *)
  Transpose[data1] //
 RepeatedTiming
(*
{5.72664*10^-6, -35.8367}  <-- Dot[]
{0.0000518135, -35.8367}   <-- My rec of @Jocelyn's: slowest, but not slow
{0.0000450536, -35.8367}   <-- Sum[]
{0.000042, -35.8367}       <-- Total[] is usually faster than Sum[]
{0.0000493315, -35.8367}   <-- BlockMap[]
*)

A larger data set, in which performance is more evident:

SeedRandom[0];
data2 = SortBy[RandomReal[{-1, 1.1}, {10^4, 2}], First];

Developer`ToPackedArray@data2 // (* ToPackedArray[] not needed *)
   Transpose //
  Apply[{x, y} |-> (Most@y + Rest@y).Differences@x / 2] //
RepeatedTiming

Interpolation[data2, InterpolationOrder -> 1] //
  Integrate[#[x], Flatten@{x, #["Domain"]}] & //
 RepeatedTiming

Sum[(data2[[k, 2]] + data2[[k - 1, 2]])/
    2*(data2[[k, 1]] - data2[[k - 1, 1]]), {k, 2, Length[data2]}] //
 RepeatedTiming

Total[(data2[[;; -2, 2]] + data2[[2 ;;, 2]])/
    2  (data2[[2 ;;, 1]] - data2[[;; -2, 1]])] // RepeatedTiming

Differences[#1] . BlockMap[Mean, #2, 2, 1] & @@ Transpose[data2] //
 RepeatedTiming
(*    |
{0.0000663423, 0.125917}     <-- Very fast   (1st pl.)
{0.0158068, 0.125917}        <-- My rec is starting to look slow
{0.0239918, 0.125917}        <-- Sum[], quite the slowest
{0.000302133, 0.125917}      <-- Total[]     (2nd pl.)
{0.00250427, 0.125917}       <-- BlockMap[]  (3rd pl.)
      |                                                       *)

Note on Sum[]. Sum[] is principally a symbolic summation tool. It will symbolically analyze its argument and try to choose a summation method. In the case of a definite sum, it will choose the "Procedural" method if the number of terms is small, which method sums terms step by step. It is not very fast at this, at present, as one can see in the timings. If the number of terms is large, Sum[] may try to find a general formula it can apply instead of adding up all the terms (for instance, Sum[2. i, {i, 10^12}]). Sum[] falls back on "Procedural" if all else fails, which is what happens in the cases above. (In fact, it probably does that quickly here, since the summand depends on the data list.) In any case, the function provided for summing data in Mathematica is Total[]. If one thinks the advantage of Sum[] is to use a formula for the summand, then Total@Table[formula, {k, a, b}] will do the same thing, usually much faster. For instance, compare 0.0005 sec. of Total@Table[] below with the 0.024 sec. of Sum[] above, almost 50 times the speed:

Total@Table[
   (data2[[k, 2]] + data2[[k - 1, 2]])/
     2*(data2[[k, 1]] - data2[[k - 1, 1]]),
   {k, 2, Length[data2]}] // RepeatedTiming

(*  {0.000506291, 0.125917}  *)

The documentation for Sum[] points out the equivalence of Sum[..] and Total[Table[..]], but it does not give advice on use-case for each. Hence this somewhat lengthy answer for an elementary problem.

One can find the bits of advice given in this answer in other posts here over the last decade, which is how I learned most of it. We have many new members since then, and with almost a hundred thousand Q&As, I'm not sure how to share the accumulated knowledge contained in the site. Over 2,000 have a score of 20 or more; 8,000 with 10+ votes. It's getting hard for me to find things. I can hardly expect others to get up to speed in a short time by going through the site.

Very well-written answer. It seems to me that WRI need to be more explicit about their implementation details, or at least rewrite some of the documentation with an eye towards best practices for performance. — Glenn Welch, Commented Jun 24 at 19:49
@MichaelE2 thank you for this practical insight. I had noticed my answer had poor performance. Your answer provides the insights. +1 of course. What is “under the hood” affects choices and I need to think about it more. — ubpdqn, Commented Jun 26 at 23:12

so_sure · Accepted Answer · 2024-06-24 06:57:20Z

Here's a faster approach using Total.

Data1 = {{0.0708461, 3.81099}, {0.152771, 6.5486}, {0.334735, 
    4.56075}, {0.361943, 
    3.44458}, {0.734831, -13.7032}, {0.881342, -15.2212}, {0.9695, \
-14.8131}, {1.23953, -13.3162}, {1.26711, -13.4099}, {1.33873, \
-13.9118}, {1.34442, -13.9659}, {1.50195, -15.9091}, {1.54436, \
-16.4444}, {1.66381, -17.5282}, {1.99997, -15.3791}, {2.12293, \
-13.5003}, {2.32644, -11.0439}, {2.42306, -10.3641}, {2.81243, \
-8.57635}, {3.02891, -6.8037}, {3.07009, -6.42764}, {4.06738, \
-2.08671}, {4.31106, -1.35298}, {4.49278, -1.02712}, {4.50307, \
-1.01467}, {4.63782, -0.892636}, {4.66641, -0.873371}, {4.68833, \
-0.859377}, {4.87293, -0.740146}, {4.95529, -0.67409}};
 

Total[(Data1[[;; -2, 2]] + Data1[[2 ;;, 2]])/
    2 (Data1[[2 ;;, 1]] - Data1[[;; -2, 1]])] // AbsoluteTiming

which gives

{0.00003, -35.8367}

In comparison,

Sum[(Data1[[k, 2]] + Data1[[k - 1, 2]])/
     2 (Data1[[k, 1]] - Data1[[k - 1, 1]]), {k, 2, Length[Data1]}] // 
  AbsoluteTiming

gives

{0.000065, -35.8367}

If you are dealing with large datasets, using Total will be more efficient.

E. Chan-López · Accepted Answer · 2024-06-24 19:30:53Z

Data1 = {{0.0708461, 3.81099}, {0.152771, 6.5486}, {0.334735, 4.56075}, 
     {0.361943, 3.44458}, {0.734831, -13.7032}, {0.881342, -15.2212}, 
     {0.9695, -14.8131}, {1.23953, -13.3162}, {1.26711, -13.4099}, 
     {1.33873, -13.9118}, {1.34442, -13.9659}, {1.50195, -15.9091}, 
     {1.54436, -16.4444}, {1.66381, -17.5282}, {1.99997, -15.3791}, 
     {2.12293, -13.5003}, {2.32644, -11.0439}, {2.42306, -10.3641}, 
     {2.81243, -8.57635}, {3.02891, -6.8037}, {3.07009, -6.42764}, 
     {4.06738, -2.08671}, {4.31106, -1.35298}, {4.49278, -1.02712}, 
     {4.50307, -1.01467}, {4.63782, -0.892636}, {4.66641, -0.873371}, 
     {4.68833, -0.859377}, {4.87293, -0.740146}, {4.95529, -0.67409}};

Following @ubpdqn's idea, you can also use Differences and MovingMap, e.g.

Differences[#1] . MovingMap[Mean, #2, 1] & @@ Transpose[Data1]

-35.8367

Or, as @ydd points out, you can also use MovingAverage as follows:

With[{diff = Differences[Data1[[All, 1]]],                              
      aver = MovingAverage[Data1[[All, 2]], 2]
      }, Total[MapThread[(#1*#2) &, {diff, aver}]]]

-35.8367

eldo · Accepted Answer · 2024-06-25 00:45:38Z

data = {{0.0708461, 3.81099}, {0.152771, 6.5486}, {0.334735, 4.56075}, 
     {0.361943, 3.44458}, {0.734831, -13.7032}, {0.881342, -15.2212}, 
     {0.9695, -14.8131}, {1.23953, -13.3162}, {1.26711, -13.4099}, 
     {1.33873, -13.9118}, {1.34442, -13.9659}, {1.50195, -15.9091}, 
     {1.54436, -16.4444}, {1.66381, -17.5282}, {1.99997, -15.3791}, 
     {2.12293, -13.5003}, {2.32644, -11.0439}, {2.42306, -10.3641}, 
     {2.81243, -8.57635}, {3.02891, -6.8037}, {3.07009, -6.42764}, 
     {4.06738, -2.08671}, {4.31106, -1.35298}, {4.49278, -1.02712}, 
     {4.50307, -1.01467}, {4.63782, -0.892636}, {4.66641, -0.873371}, 
     {4.68833, -0.859377}, {4.87293, -0.740146}, {4.95529, -0.67409}};

A variant of E. Chan-López' second answer using Query

Dot @@ Query[{1 -> Differences, 2 -> (MovingAverage[#, 2] &)}] @ Transpose[data]

-35.8367

$\begingroup$ This variant is the nicest! ;) $\endgroup$
– E. Chan-López
Commented Jun 25 at 1:31 — E. Chan-López, Commented Jun 25 at 1:31

eldo · Accepted Answer · 2024-07-06 21:51:28Z

data = {{0.0708461, 3.81099}, {0.152771, 6.5486}, {0.334735, 4.56075}, 
     {0.361943, 3.44458}, {0.734831, -13.7032}, {0.881342, -15.2212}, 
     {0.9695, -14.8131}, {1.23953, -13.3162}, {1.26711, -13.4099}, 
     {1.33873, -13.9118}, {1.34442, -13.9659}, {1.50195, -15.9091}, 
     {1.54436, -16.4444}, {1.66381, -17.5282}, {1.99997, -15.3791}, 
     {2.12293, -13.5003}, {2.32644, -11.0439}, {2.42306, -10.3641}, 
     {2.81243, -8.57635}, {3.02891, -6.8037}, {3.07009, -6.42764}, 
     {4.06738, -2.08671}, {4.31106, -1.35298}, {4.49278, -1.02712}, 
     {4.50307, -1.01467}, {4.63782, -0.892636}, {4.66641, -0.873371}, 
     {4.68833, -0.859377}, {4.87293, -0.740146}, {4.95529, -0.67409}};

Using DiscreteIntegralPlot by Dennis M Schneider

DiscreteIntegralPlot = ResourceFunction["DiscreteIntegralPlot"];

DiscreteIntegralPlot[data, "Trap"]

To only get the area value:

DiscreteIntegralPlot[data, "Trap", "DrawGraph" -> False]

-35.8367

One of the other methods:

DiscreteIntegralPlot[data, "Left"]

Stack Exchange Network

Numerical approximation of the integral by using data

8 Answers 8

Not the answer you're looking for? Browse other questions tagged
calculus-and-analysis
numerical-integration
algorithm
or ask your own question.

Linked

Hot Network Questions

Numerical approximation of the integral by using data

8 Answers 8

Not the answer you're looking for? Browse other questions tagged calculus-and-analysisnumerical-integrationalgorithm or ask your own question.

Linked

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
calculus-and-analysis
numerical-integration
algorithm
or ask your own question.