13

I wrote a test that attempts to test two things:

  • Whether the size of a buffer array affects its performance, even if you don't use the whole buffer
  • The relative performance of arrays and ArrayList

I was kind of surprised with the results

  • Boxed arrays (i.e. Integer vs int) are not very much slower than the primitive version
  • The size of the underlying array doesn't matter very much
  • ArrayLists are more than twice as slow as the corresponding array.

The Questions

  1. Why is ArrayList so much slower?
  2. Is my benchmark written well? In other words, are my results accurate?

The Results

 0% Scenario{vm=java, trial=0, benchmark=SmallArray} 34.57 ns; ?=0.79 ns @ 10 trials
17% Scenario{vm=java, trial=0, benchmark=SmallBoxed} 40.40 ns; ?=0.21 ns @ 3 trials
33% Scenario{vm=java, trial=0, benchmark=SmallList} 105.78 ns; ?=0.09 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=BigArray} 34.53 ns; ?=0.05 ns @ 3 trials
67% Scenario{vm=java, trial=0, benchmark=BigBoxed} 40.09 ns; ?=0.23 ns @ 3 trials
83% Scenario{vm=java, trial=0, benchmark=BigList} 105.91 ns; ?=0.14 ns @ 3 trials

 benchmark    ns linear runtime
SmallArray  34.6 =========
SmallBoxed  40.4 ===========
 SmallList 105.8 =============================
  BigArray  34.5 =========
  BigBoxed  40.1 ===========
   BigList 105.9 ==============================

vm: java
trial: 0

The Code

This code was written in Windows using Java 7 and Google caliper 0.5-rc1 (because last I checked 1.0 doesn't work in Windows yet).

Quick outline: in all 6 tests, in each iteration of the loop, it adds the values in the first 128 cells of the array (no matter how big the array is) and adds that to a total value. Caliper tells me how many times the test should run, so I loop through that addition 128 times.

The 6 tests have a big (131072) and a small (128) version of int[], Integer[], and ArrayList<Integer>. You can probably figure out which is which from the names.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import com.google.caliper.Runner;
import com.google.caliper.SimpleBenchmark;

public class SpeedTest {    
    public static class TestBenchmark extends SimpleBenchmark {
        int[] bigArray = new int[131072];
        int[] smallArray = new int[128];
        Integer[] bigBoxed = new Integer[131072];
        Integer[] smallBoxed = new Integer[128];
        List<Integer> bigList = new ArrayList<>(131072);
        List<Integer> smallList = new ArrayList<>(128);

        @Override
        protected void setUp() {
            Random r = new Random();
            for(int i = 0; i < 128; i++) {
                smallArray[i] = Math.abs(r.nextInt(100));
                bigArray[i] = smallArray[i];
                smallBoxed[i] = smallArray[i];
                bigBoxed[i] = smallArray[i];
                smallList.add(smallArray[i]);
                bigList.add(smallArray[i]);
            }
        }

        public long timeBigArray(int reps) {
            long result = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < 128; j++) {
                    result += bigArray[j];
                }
            }
            return result;
        }

        public long timeSmallArray(int reps) {
            long result = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < 128; j++) {
                    result += smallArray[j];
                }
            }
            return result;
        }

        public long timeBigBoxed(int reps) {
            long result = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < 128; j++) {
                    result += bigBoxed[j];
                }
            }
            return result;
        }

        public long timeSmallBoxed(int reps) {
            long result = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < 128; j++) {
                    result += smallBoxed[j];
                }
            }
            return result;
        }

        public long timeBigList(int reps) {
            long result = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < 128; j++) {
                    result += bigList.get(j);
                }
            }
            return result;
        }

        public long timeSmallList(int reps) {
            long result = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < 128; j++) {
                    result += smallList.get(j);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Runner.main(TestBenchmark.class, new String[0]);
    }
}
14
  • what is that runtime a runtime of? Commented Dec 9, 2013 at 22:36
  • @SamIam I'm not sure I understand the question; the runtime is of the benchmark code below it.
    – durron597
    Commented Dec 9, 2013 at 22:41
  • Is it the amount of time that an entire timeBigBoxed or similar method took? Is it the amount of time that a single iteration of one of the for loops took? Commented Dec 9, 2013 at 22:43
  • 2
    Random.nextInt(n) always return a positive number. BTW Math.abs(n) doesn't always return a positive number. Commented Dec 9, 2013 at 23:02
  • 2
    @PeterLawrey Actually Random.nextInt(n) returns a non-negative number, but your point is taken that it's pointless to call Math.abs() on the result. +1 anyway. :-) Commented Dec 10, 2013 at 0:52

4 Answers 4

9

Firstly ...

Are ArrayLists more than twice as slow as arrays?

As a generalization, no. For operations that potentially involve "changing" the length of the list / array, an ArrayList will be faster than an array ... unless you use a separate variable to represent the array's logical size.

For other operations, the ArrayList is likely to be slower, though the performance ratio will most likely depend on the operation and the JVM implementation. Also note that you have only tested one operation / pattern.

Why is ArrayList so much slower?

Because an ArrayList has a distinct array object inside of it.

  • Operations typically involve extra indirections (e.g. to fetch the list's size and inner array) and there are extra bounds checks (e.g. checking the list's size and the array's length). A typical JIT compiler is (apparently) not able to optimize these away. (And in fact, you would NOT want to optimize away the inner array because that's what allows an ArrayList to grow.)

  • For array's of primitives, the corresponding list types involve wrapped primitive types / objects, and that adds an overhead. For example your result += ... involves unboxing, in the "list" cases.

Is my benchmark written well? In other words, are my results accurate?

There's nothing wrong with it technically. But it is not sufficient to demonstrate your point. For a start, you are only measuring one kind of operation: array element fetching and its equivalent. And you are only measuring for primitive types.


Finally, this largely misses the point of using List types. We use them because they are almost always easier to use than plain arrays. A difference in performance of (say) 2 is typically not important to overall application performance.

2
  • Great answer @StephenC. Though, I will say that in my use case (the reason I wrote the benchmark), that operation is the one that matters; things like insertion and deletion are far less important.
    – durron597
    Commented Dec 9, 2013 at 23:25
  • 1
    Fair enough. But then you should not be trying to generalize ... in the way that your Question's text does.
    – Stephen C
    Commented Dec 9, 2013 at 23:51
4

Keep in mind that in using ArrayList, you are actually calling a function, which in the case of get() actually makes two other function calls. (One of which is a range check, which I suspect may be part of the delay).

The important thing with ArrayList, is not so much how much faster or slower it is compared with straight arrays, but that it's access time always constant (like arrays). In the real world, you'll almost always find that the added delay is negligible. Especially if you have an application that even thinks about connecting to a database. :)

In short, I think your test (and the results) are legit.

4
  • I think you're right about the range check. And ArrayList has to check two different ranges, because even if the backing array is, say, 100 elements long, if the effective size is only 3, you can't just rely on array bounds.
    – durron597
    Commented Dec 9, 2013 at 22:59
  • Right - which is the only real reason it has to do the check. Otherwise, it could just rely on the array access to throw the IndexOutOfBoundsException Commented Dec 9, 2013 at 23:03
  • The compiler will optimize the method calls so that the calls themselves aren't costly, but the important thing is that the call to get() results in 2 actions instead of one. The range check is most likely what is resulting in the slower performance.
    – LINEMAN78
    Commented Dec 9, 2013 at 23:05
  • ArrayList's methods will most likely be inlined
    – Steve Kuo
    Commented Dec 10, 2013 at 1:33
0

These results don't surprise me. List.get(int) involves a cast, which is slow. Java's generics are implemented via type erasure, meaning that any type of List<T> is actually a List<Object>, and the only reason you get your type out is because of a cast. The source for ArrayList looks like this:

public E get(int index) {
    rangeCheck(index);

    return elementData(index);
}
// snip...
private void rangeCheck(int index) {
    if (index >= size)
        throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
}
// snip...
@SuppressWarnings("unchecked")
E elementData(int index) {
    return (E) elementData[index];
}

The rangeCheck and the overhead of the function calls are trivial, it's that cast to E that kills you.

12
  • Cast is fairly cheap. And the cast to (E) in the above code is totally fictitious.
    – Hot Licks
    Commented Dec 9, 2013 at 22:52
  • Is there a source that you know of that indicates type casts are particularly slow? Commented Dec 9, 2013 at 22:52
  • @HotLicks Actually he is correct about the code. ArrayList is backed by an Object[] not an E[]. What I'm uncertain of is whether the cast is the real problem. I think it's far more likely to be the rangeCheck
    – durron597
    Commented Dec 9, 2013 at 22:55
  • 3
    @durron597 - But I stand by my statement that the above cast is fictitious. The real cast occurs on the calling side.
    – Hot Licks
    Commented Dec 9, 2013 at 22:58
  • 1
    I'm not even sure this particular cast (since it is casting to a parameterized type - not an actual type) even does anything at the bytecode level at all? (More of a question than a statement.) Commented Dec 9, 2013 at 23:04
0

If you store millions of objects, then the Add or Contains functions will be super slow. Best way is to split it using hashMap of Arrays. Although similar algorithms can be used for other types of objects, this is how I improved 1000 times faster the processing of 10 million strings (the memory taken is 2-3 times more)

public static class ArrayHashList  {
    private String temp1, temp2;
    HashMap allKeys = new HashMap();
    ArrayList curKeys;  
    private int keySize;
    public ArrayHashList(int keySize) {
        this.keySize = keySize;
    }   
    public ArrayHashList(int keySize, String fromFileName) {
        this.keySize = keySize;
        String line;
        try{
            BufferedReader br1 = new BufferedReader(new FileReader(fromFileName));        
            while ((line = br1.readLine()) != null) 
                addString(line);
            br1.close();
        }catch(Exception e){
            e.printStackTrace();
        }
    }   
    public boolean addString(String strToAdd) {  
        if (strToAdd.length()<keySize)
            temp1 = strToAdd;
        else
            temp1 = strToAdd.substring(0,keySize);
        if (!allKeys.containsKey(temp1))
            allKeys.put(temp1,new ArrayList());
        curKeys =  (ArrayList)allKeys.get(temp1);
        if (!curKeys.contains(strToAdd)){
            curKeys.add(strToAdd);
            return true;
        }
        return false;
    }
    public boolean haveString(String strCheck) { 
        if (strCheck.length()<keySize)
            temp1 = strCheck;
        else
            temp1 = strCheck.substring(0,keySize);
        if (!allKeys.containsKey(temp1))
            allKeys.put(temp1,new ArrayList());
        curKeys =  (ArrayList)allKeys.get(temp1);
        return curKeys.contains(strCheck);
    }
}

to init and use it:

ArrayHashList fullHlist = new ArrayHashList(3, filesPath+"\\allPhrases.txt");       
ArrayList pendingList = new ArrayList();
BufferedReader br1 = new BufferedReader(new FileReader(filesPath + "\\processedPhrases.txt"));
while ((line = br1.readLine()) != null) {
    wordEnc = StrUtil.GetFirstToken(line,",~~~,");
    if (!fullHlist.haveString(wordEnc))
        pendingList.add(wordEnc);
}
br1.close();    
1
  • This answers a different question. My question was "why?", your answer answers "okay so now what"?
    – durron597
    Commented Apr 20, 2018 at 20:28

Not the answer you're looking for? Browse other questions tagged or ask your own question.