Why is sum so much faster than inject(:+)?

Question

So I was running some benchmarks in Ruby 2.4.0 and realized that

(1...1000000000000000000000000000000).sum

calculates immediately whereas

(1...1000000000000000000000000000000).inject(:+)

takes so long that I just aborted the operation. I was under the impression that Range#sum was an alias for Range#inject(:+) but it seems like that is not true. So how does sum work, and why is it so much faster than inject(:+)?

N.B. The documentation for Enumerable#sum (which is implemented by Range) does not say anything about lazy evaluation or anything along those lines.

Eric Duminil · Accepted Answer · 2017-01-04 10:55:46Z

Short answer

For an integer range :

Enumerable#sum returns (range.max-range.min+1)*(range.max+range.min)/2
Enumerable#inject(:+) iterates over every element.

Theory

The sum of integers between 1 and n is called a triangular number, and is equal to n*(n+1)/2.

The sum of integers between n and m is the triangular number of m minus the triangular number of n-1, which is equal to m*(m+1)/2-n*(n-1)/2, and can be written (m-n+1)*(m+n)/2.

Enumerable#sum in Ruby 2.4

This property in used in Enumerable#sum for integer ranges :

if (RTEST(rb_range_values(obj, &beg, &end, &excl))) {
    if (!memo.block_given && !memo.float_value &&
            (FIXNUM_P(beg) || RB_TYPE_P(beg, T_BIGNUM)) &&
            (FIXNUM_P(end) || RB_TYPE_P(end, T_BIGNUM))) { 
        return int_range_sum(beg, end, excl, memo.v);
    } 
}

int_range_sum looks like this :

VALUE a;
a = rb_int_plus(rb_int_minus(end, beg), LONG2FIX(1));
a = rb_int_mul(a, rb_int_plus(end, beg));
a = rb_int_idiv(a, LONG2FIX(2));
return rb_int_plus(init, a);

which is equivalent to:

(range.max-range.min+1)*(range.max+range.min)/2

the aforementioned equality!

Complexity

Thanks a lot to @k_g and @Hynek-Pichi-Vychodil for this part!

sum

(1...1000000000000000000000000000000).sum requires three additions, a multiplication, a substraction and a division.

It's a constant number of operations, but multiplication is O((log n)²), so Enumerable#sum is O((log n)²) for an integer range.

inject

(1...1000000000000000000000000000000).inject(:+)

requires 999999999999999999999999999998 additions!

Addition is O(log n), so Enumerable#inject is O(n log n).

With 1E30 as input, inject with never return. The sun will explode long before!

Test

It's easy to check if Ruby Integers are being added :

module AdditionInspector
  def +(b)
    puts "Calculating #{self}+#{b}"
    super
  end
end

class Integer
  prepend AdditionInspector
end

puts (1..5).sum
#=> 15

puts (1..5).inject(:+)
# Calculating 1+2
# Calculating 3+3
# Calculating 6+4
# Calculating 10+5
#=> 15

Indeed, from enum.c comments :

Enumerable#sum method may not respect method redefinition of "+" methods such as Integer#+.

This is a really good optimization to have since calculating the sum of a range of numbers is trivial if you use the right formula, and excruciating if you go about it iteratively. It's like trying to implement multiplication as a series of addition operations. — tadman, Commented Jan 3, 2017 at 18:16
So the performance boost is for n+1 ranges only? I don't have 2.4 installed or I would test myself but are other Enumerable Objects handled by basic addition as they would be in inject(:+) minus the overhead of symbol to proc. — engineersmnky, Commented Jan 3, 2017 at 18:36
Readers, recall from your high-school math that n, n+1, n+2, .., m constitutes an arithmetic series whose sum equals (m-n+1)*(m+n)/2. Similarly, the sum of a geometric series, n, (α^1)n, (α^2)n, (α^3)n, ... , (α^m)n. can be computed from a closed-form expression. — Cary Swoveland, Commented Jan 3, 2017 at 18:42
\begin{nitpick} Enumerable#sum is O((log n)^2) and inject is O(n log n) when your numbers are allowed to be unbounded. \end{nitpick} — k_g, Commented Jan 4, 2017 at 4:52
@EliSadoff: It means really big numbers. It means numbers which don't fit architecture word i.e. can't be computed by one instruction and one operation in CPU core. Number of size N could be encoded by log_2 N bits so addition is O(logN) operation and multiplication is O((logN)^2) but could be O((logN)^1.585) (Karasuba) or even O(logN*log(logN)*log(log(LogN)) (FFT). — Hynek -Pichi- Vychodil, Commented Jan 4, 2017 at 8:56

Collectives™ on Stack Overflow

Why is sum so much faster than inject(:+)?

1 Answer 1

Short answer

Theory

Enumerable#sum in Ruby 2.4

Complexity

sum

inject

Test

Not the answer you're looking for? Browse other questions tagged
ruby
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Short answer

Theory

Enumerable#sum in Ruby 2.4

Complexity

sum

inject

Test

Not the answer you're looking for? Browse other questions tagged ruby or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
ruby
or ask your own question.