I think @Roman's modification of @JocelynMinini's use of Interpolation + Integrate is easy to code, easy to understand, and accurate; on small data sets, it is probably the best way to code the solution.
If performance matters, the following is fast. It is written in functional style, with each step following the postfix operator //:
Developer`ToPackedArray@data1 // (* see below about packing *)
Transpose //
Apply[{x, y} |-> (Most@y + Rest@y).Differences@x / 2]
(* OR *)
Developer`ToPackedArray@data1 //
Transpose //
Apply[{x, y} |-> ListConvolve[{0.5, 0.5}, y] . Differences@x]
Developer`ToPackedArray may be omitted if the data is known to be packed. If the data has mixed types but can be represented in machine floating-point numbers, then use Developer`ToPackedArray@N@data1 or Developer`ToPackedArray[data1, Real].
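To see what the packing remarks mean in practice, here is a small sketch. The list mixed is made-up data (exact integers together with a machine real), not the question's data1. Developer`PackedArrayQ reports whether an array is packed.

```mathematica
(* "mixed" is hypothetical example data: integers and a machine real *)
mixed = {{0, 1}, {1, 2.5}, {2, 3}};
Developer`PackedArrayQ[mixed]  (* a literal list like this is not packed *)

(* N makes every entry a machine real, so the result can be packed *)
packed1 = Developer`ToPackedArray@N@mixed;
Developer`PackedArrayQ[packed1]

(* the two-argument form converts entries to the requested type while packing *)
packed2 = Developer`ToPackedArray[mixed, Real];
Developer`PackedArrayQ[packed2]
```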
ListConvolve[{0.5, 0.5}, y] is nearly as efficient as Most@y + Rest@y, sometimes beating it in timings, so I omit it from the discussion of performance below. It can be sped up slightly by packing the kernel {0.5, 0.5} (e.g., Developer`ToPackedArray@{0.5, 0.5}).
And ListConvolve[{0.5, 0.5}, y].Differences[x] may (or may not) be a familiar form of the trapezoidal rule.
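For reference, here is the formula both one-liners compute, written for points $(x_1,y_1),\dots,(x_n,y_n)$ sorted by $x$ (the symbols are my own notation, not from the question):

$$
\int_{x_1}^{x_n} y\,dx \;\approx\; \sum_{k=2}^{n} \frac{y_{k-1}+y_k}{2}\,\bigl(x_k - x_{k-1}\bigr) \;=\; \bar{\mathbf y}\cdot\Delta\mathbf x,
\qquad \bar y_k = \frac{y_{k-1}+y_k}{2},\quad \Delta x_k = x_k - x_{k-1},\quad k = 2,\dots,n.
$$

The vector $\bar{\mathbf y}$ is what ListConvolve[{0.5, 0.5}, y] and (Most@y + Rest@y)/2 produce, and $\Delta\mathbf x$ is Differences[x]. The sum is also exactly the integral of the piecewise-linear interpolant through the points, which is why Interpolation with InterpolationOrder -> 1 followed by Integrate gives the same value.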
Performance issues.
Developer`ToPackedArray@data1 //
Transpose //
Apply[{x, y} |-> (Most@y + Rest@y).Differences@x / 2] //
RepeatedTiming
Interpolation[data1, InterpolationOrder -> 1] // (* @Roman/@Jocelyn *)
Integrate[#[x], Flatten@{x, #["Domain"]}] & //
RepeatedTiming
Sum[ (* @zeraoulia rafik *)
(data1[[k, 2]] + data1[[k - 1, 2]])/
2*(data1[[k, 1]] - data1[[k - 1, 1]]), {k, 2, Length[data1]}] //
RepeatedTiming
Total[(data1[[;; -2, 2]] + data1[[2 ;;, 2]])/ (* @so_sure *)
2 (data1[[2 ;;, 1]] - data1[[;; -2, 1]])] //
RepeatedTiming
Differences[#1] . BlockMap[Mean, #2, 2, 1] & @@ (* @ubpdqn *)
Transpose[data1] //
RepeatedTiming
(*
{5.72664*10^-6, -35.8367} <-- Dot[]
{0.0000518135, -35.8367} <-- My rec of @Jocelyn's: slowest, but not slow
{0.0000450536, -35.8367} <-- Sum[]
{0.000042, -35.8367} <-- Total[] is usually faster than Sum[]
{0.0000493315, -35.8367} <-- BlockMap[]
*)
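All five methods return -35.8367 on data1. As an extra sanity check, the trapezoidal rule is exact for a linear function, so on samples of a straight line every method should return the exact integral. A sketch, assuming made-up data lineData sampled from y = 2 x + 1 at unevenly spaced x; the integral from 0 to 3 is 12:

```mathematica
(* hypothetical data: y = 2 x + 1 sampled at uneven x; exact integral on [0, 3] is 12 *)
lineData = {{0., 1.}, {0.5, 2.}, {1.5, 4.}, {3., 7.}};

dotResult = Developer`ToPackedArray@lineData //
   Transpose //
   Apply[{x, y} |-> (Most@y + Rest@y) . Differences@x/2];

interpResult = Interpolation[lineData, InterpolationOrder -> 1] //
   Integrate[#[x], Flatten@{x, #["Domain"]}] &;

sumResult = Sum[(lineData[[k, 2]] + lineData[[k - 1, 2]])/
     2*(lineData[[k, 1]] - lineData[[k - 1, 1]]), {k, 2, Length[lineData]}];

totalResult = Total[(lineData[[;; -2, 2]] + lineData[[2 ;;, 2]])/
     2 (lineData[[2 ;;, 1]] - lineData[[;; -2, 1]])];

blockResult = Differences[#1] . BlockMap[Mean, #2, 2, 1] & @@ Transpose[lineData];

(* each difference from 12 should vanish after Chop *)
Chop[{dotResult, interpResult, sumResult, totalResult, blockResult} - 12]
```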
A larger data set, on which the performance differences are more evident:
SeedRandom[0];
data2 = SortBy[RandomReal[{-1, 1.1}, {10^4, 2}], First];
Developer`ToPackedArray@data2 // (* ToPackedArray[] not needed *)
Transpose //
Apply[{x, y} |-> (Most@y + Rest@y).Differences@x / 2] //
RepeatedTiming
Interpolation[data2, InterpolationOrder -> 1] //
Integrate[#[x], Flatten@{x, #["Domain"]}] & //
RepeatedTiming
Sum[(data2[[k, 2]] + data2[[k - 1, 2]])/
2*(data2[[k, 1]] - data2[[k - 1, 1]]), {k, 2, Length[data2]}] //
RepeatedTiming
Total[(data2[[;; -2, 2]] + data2[[2 ;;, 2]])/
2 (data2[[2 ;;, 1]] - data2[[;; -2, 1]])] // RepeatedTiming
Differences[#1] . BlockMap[Mean, #2, 2, 1] & @@ Transpose[data2] //
RepeatedTiming
(*
{0.0000663423, 0.125917} <-- Very fast (1st place)
{0.0158068, 0.125917} <-- My rec is starting to look slow
{0.0239918, 0.125917} <-- Sum[], quite the slowest
{0.000302133, 0.125917} <-- Total[] (2nd place)
{0.00250427, 0.125917} <-- BlockMap[] (3rd place)
*)
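If the Dot[] version is to be reused, it may be convenient to wrap it in a function. This is a sketch: the name trapezoidIntegrate and its input check are my own choices, not built-ins. It does not sort, so it assumes the points are already ordered by x, as data2 is after SortBy.

```mathematica
(* trapezoidIntegrate is a hypothetical helper wrapping the Dot[] pipeline;
   it assumes an n x 2 numeric matrix of {x, y} pairs already sorted by x *)
trapezoidIntegrate[data_?(MatrixQ[#, NumericQ] && Last@Dimensions[#] == 2 &)] :=
  Developer`ToPackedArray@N@data //
   Transpose //
   Apply[{x, y} |-> (Most@y + Rest@y) . Differences@x/2]

(* usage: 1001 samples of Sin on [0, Pi]; the exact integral is 2,
   and the trapezoidal error at this spacing is on the order of 10^-6 *)
xs = N@Subdivide[0, Pi, 1000];
trapezoidIntegrate[Transpose[{xs, Sin[xs]}]]
```

Input that is not a two-column numeric matrix simply stays unevaluated, which makes mistakes easy to spot.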
Note on Sum[].
Sum[] is principally a symbolic summation tool. It symbolically analyzes its argument and tries to choose a summation method. For a definite sum with a small number of terms, it chooses the "Procedural" method, which adds the terms one at a time. At present it is not very fast at this, as the timings show. If the number of terms is large, Sum[] may try to find a general formula it can apply instead of adding up all the terms (for instance, Sum[2. i, {i, 10^12}]). Sum[] falls back on "Procedural" if all else fails, which is what happens in the cases above. (In fact, it probably reaches that fallback quickly here, since the summand depends on the data list.) In any case, the function Mathematica provides for summing data is Total[].
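A small illustration of Sum[] acting as a symbolic tool: with a symbolic upper limit it returns a closed form instead of adding terms, and the 10^12-term example mentioned above is handled by a formula in the same spirit. For values already in a list, Total[] is the direct tool. The list vals is made-up data for illustration.

```mathematica
(* symbolic limit: Sum returns a closed-form formula *)
Sum[k, {k, 1, n}]
(* 1/2 n (1 + n) *)

(* 10^12 terms: Sum uses a formula rather than adding terms one by one *)
Sum[2. i, {i, 10^12}]

(* hypothetical data already in a list: add it up with Total[] *)
vals = RandomReal[1, 10^5];
Total[vals]
```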
If one thinks the advantage of Sum[] is that it lets one write a formula for the summand, then Total@Table[formula, {k, a, b}] does the same thing, usually much faster. For instance, compare the 0.0005 sec. of Total@Table[] below with the 0.024 sec. of Sum[] above, almost 50 times faster:
Total@Table[
(data2[[k, 2]] + data2[[k - 1, 2]])/
2*(data2[[k, 1]] - data2[[k - 1, 1]]),
{k, 2, Length[data2]}] // RepeatedTiming
(* {0.000506291, 0.125917} *)
The documentation for Sum[] points out the equivalence of Sum[..] and Total[Table[..]], but it does not give advice on when to use each. Hence this somewhat lengthy answer to an elementary problem.
The bits of advice in this answer can be found in other posts here from the last decade, which is how I learned most of it. We have gained many new members since then, and with almost a hundred thousand Q&As, I'm not sure how best to share the knowledge accumulated on the site. Over 2,000 have a score of 20 or more, and 8,000 have 10 or more votes. It's getting hard for me to find things, and I can hardly expect others to get up to speed quickly by going through the site.