
Let's assume we have a method that we want to run as fast as possible, and it needs to loop over a 2D array. Naturally, we would use a nested for loop, like so:

    int[][] arr = new int[3][3];
    for(int i = 0; i < arr.length; i++) {
        for(int j = 0; j < arr[i].length; j++) {
            arr[i][j] = j;
        }
    }

Assuming we wanted to get rid of the O(n^2), we can do something like this:

    int j = 0;
    for(int i = 0; i < arr.length; ) {
        if (j == arr[i].length - 1) {
            i++;
            if(i == arr.length) {
                break;
            }
            j = 0;
        }
        j++;
        arr[i][j] = j;
    }

Stating the obvious, it is bad practice to change the value of the loop counter inside the loop body, and it is less readable. But is there a scenario where this would be okay? Would this still be considered O(n^2)?

  • It's not considered O(n^2), it is O(n^2) (if n is the size of each dimension), O(n) (if n is the total number of elements), or O(1) (if the array is always 3x3). Commented Aug 18, 2022 at 19:59
  • I am curious to know how you think performance works. There are surely other people with the same misunderstanding, and if I understand the misunderstanding, maybe I can teach you and them. Commented Aug 18, 2022 at 20:00
  • As you can tell, I am not fully proficient with algorithms yet, and I obviously wouldn't ever actually use this while coding. I thought getting rid of the nested for loop might increase performance since we are removing a loop, but I had my doubts and decided to ask, and you made it clear why it is not a performance improvement.
    – Yoh
    Commented Aug 18, 2022 at 20:04
  • Complexity is (roughly speaking) about how many times your code runs - not how many times you write "for". Commented Aug 18, 2022 at 20:08
  • @Yoh the computer breaks the loop down into basic steps - I added it to my answer. Commented Aug 18, 2022 at 20:15

3 Answers


Both are O(N) where N is the number of elements in the array. The second one is just more confusing and has bugs.

Assuming you fix the bugs, they're both the same loop. They'll compile into similar assembly code (or Java bytecode). The second one is just more confusing (and has bugs).

So there's no good reason to write the second one.

O(n^2) isn't when you have two nested loops. O(n^2) is when the amount of time your algorithm takes is proportional to the square of the amount of input data (or, more generally, the square of something). The first code runs for arr.length*arr[i].length iterations; the second code also runs for arr.length*arr[i].length iterations if you fix the bugs. It's just more confusing (and has bugs).
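For reference, a bug-free single-loop version might look something like this. This is a sketch of one possible fix (the name `fill` is mine, and it assumes every row is non-empty); it performs exactly the same arr.length*arr[i].length assignments as the nested version:

```java
public class FlattenedLoop {
    // One possible bug-free rewrite of the single-loop version.
    // Assumes every row is non-empty. Does the same work, in the same
    // order, as the nested for loops.
    static void fill(int[][] arr) {
        int i = 0, j = 0;
        while (i < arr.length) {
            arr[i][j] = j;            // same assignment as the nested version
            j++;
            if (j == arr[i].length) { // row finished: move to the next row
                j = 0;
                i++;
            }
        }
    }

    public static void main(String[] args) {
        int[][] arr = new int[3][3];
        fill(arr);
        // every row of arr is now {0, 1, 2}
    }
}
```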


CPUs don't understand loops, only basic instructions like goto and if+goto. Loops were one of the first shortcuts that programmers invented to make it easier to write programs. When you write

    for(int j = 0; j < arr[i].length; j++) {
        arr[i][j] = j;
    }

the compiler actually turns it into something like this:

    j = 0;
    start_of_loop:
    if (j >= arr[i].length) goto end_of_loop;
    arr[i][j] = j;
    j++;
    goto start_of_loop;
    end_of_loop:

and when you write this:

    if (j == arr[i].length - 1) {
        i++;
        if(i == arr.length) {
            break;
        }
        j = 0;
    }

the compiler actually turns it into something like this:

    if(j != arr[i].length - 1) goto end_of_if;
    i++;
    if(i == arr.length) goto end_of_loop;
    j = 0;
    end_of_if:

so you can see the CPU isn't going to care which way you write it - it's (approximately) the same code by the time the CPU actually runs it.

The CPU takes some time to run each instruction, so what matters is (approximately) the number of instructions it runs, and that's what we are trying to say with big-O notation: if n is twice as big, does the CPU run the same number of instructions, or a fixed amount extra, or twice as many, or 4 times as many? Big-O notation tells us that, in a useful approximate way, without caring about the exact number of instructions.
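To make that concrete, here is a small sketch (the class and method names are mine, not from the answer) that counts how many times the loop body runs. Doubling the number of rows doubles the count, which is exactly what O(N) in the number of elements means:

```java
public class IterationCount {
    // Counts how many times the body of the nested-loop version executes
    // for a rows-by-cols array.
    static int countNested(int rows, int cols) {
        int[][] arr = new int[rows][cols];
        int count = 0;
        for (int i = 0; i < arr.length; i++) {
            for (int j = 0; j < arr[i].length; j++) {
                arr[i][j] = j;
                count++;          // one body execution per element
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNested(3, 3)); // 9 iterations for 9 elements
        System.out.println(countNested(6, 3)); // twice the elements, twice the work
    }
}
```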

  • PowerPC and POWER actually have a loop instruction. You load a counter into a special register and end a loop with a “decrement and branch if not zero” instruction. The biggest difference to a normal branch is that the loop instruction is not predicted, it is like an unconditional branch until the last one which is not executed at all.
    – gnasher729
    Commented Aug 18, 2022 at 21:31
  • @gnasher729 that is a form of dec+if+goto. Yes, sometimes a CPU's "basic" instructions are designed to actually do a few different things in only one instruction Commented Aug 19, 2022 at 13:48
  • You should have mentioned the second snippet has bugs ;-)
    – Doc Brown
    Commented Aug 22, 2022 at 8:45
  • It might be worth mentioning the importance of defining what n in O(n) is, i.e. the total number of elements or the side length of a square 2D array. Either might be appropriate depending on the context.
    – JonasH
    Commented Aug 22, 2022 at 14:09

It’s the number of operations that counts, not the number of loops. Your code is supposed to set all array elements. Therefore if your code is correct the number of operations is proportional to the total number of array elements, no matter how you do it.


Both loops will turn into roughly the same thing. In Java, the nested version may be slightly faster because the JIT compiler may have an easier time tracking what you are doing.

From a code development perspective, readability is important. If nested for loops capture the story you need, use them because everyone can read them.

I have written code that looked like the second example. It was a case where the incrementing pattern was a lot less regular. I had an algorithm with multiple modes, labeled 0, 1, 2, 3, etc. It started in mode 0 and processed data until it came across too many errors. Then it incremented the mode, and tried again. The resulting code could have been treated as a nested pair of for loops, but the conditions for terminating the inner loop would be rather awkward, and not very for-loop-like. So instead I wrote out the if logic, as you did, to make it very clear that this was a more interesting behavior.
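A rough sketch of that shape in Java (the mode count, error threshold, and `process` stand-in here are all hypothetical, not the original code):

```java
public class ModeProcessor {
    static final int MAX_ERRORS = 3; // hypothetical error threshold
    static final int NUM_MODES = 4;  // modes 0..3, as in the description

    // Returns the first mode that processes all the data without exceeding
    // the error threshold, or -1 if every mode fails. Written with explicit
    // if-logic rather than nested for loops, because the "inner loop" can
    // restart and its termination condition isn't for-loop-like.
    static int findWorkingMode(int[] data) {
        int mode = 0;
        int errors = 0;
        int pos = 0;
        while (mode < NUM_MODES) {
            if (pos == data.length) {
                return mode;               // processed everything in this mode
            }
            if (!process(mode, data[pos])) {
                errors++;
                if (errors > MAX_ERRORS) { // too many errors: next mode, start over
                    mode++;
                    errors = 0;
                    pos = 0;
                    continue;
                }
            }
            pos++;
        }
        return -1; // every mode hit too many errors
    }

    // Hypothetical stand-in for the real processing: higher modes
    // tolerate larger values.
    static boolean process(int mode, int value) {
        return value <= mode;
    }
}
```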

It's the same logic that suggests the famous C nugget while(*dest++ = *src++); should not be used. That little snippet copies a string from src to dest, and looks pretty snazzy while doing it. However, it's terribly hard to follow, because most while loop conditions don't contain an assignment, nor do they have multiple increment operations. For readability, it's better to write that one out as a nice clear loop.
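For illustration, the written-out version might look like this. It's translated into Java with '\0'-terminated char arrays standing in for C strings, which is an artificial setup since Java strings don't actually work that way:

```java
public class ClearCopy {
    // The C idiom while(*dest++ = *src++); written out as a plain loop,
    // using '\0'-terminated char arrays to imitate C strings (an artificial
    // setup for illustration only; dest must be large enough).
    static void copyString(char[] src, char[] dest) {
        int i = 0;
        while (src[i] != '\0') { // stop when we reach the terminator
            dest[i] = src[i];
            i++;
        }
        dest[i] = '\0';          // copy the terminator as well
    }
}
```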

  • About 20 years ago I compared a memcpy copied from an old book with the memcpy in the MacOS Standard C library. The library was about ten times faster. Actually, when the machine was booted, it detected the processor and copied one of several versions into a fixed location. And it did very interesting things prefetching cache lines and so on for large amounts of data.
    – gnasher729
    Commented Aug 19, 2022 at 1:24
  • That old nugget would be written while((*dest++ = *src++)) /**/; at least, to avoid a warning about the assignment used for its truth-value, and to make the empty instruction obvious. After that, it's digestible enough, though the lib-function is faster unless the strings are really short. Commented Aug 19, 2022 at 13:52
  • From what I saw, for up to 64 bytes the library function would load all the data into registers at once and spit it out to the right place. Usually memcpy has to check for overlap between src and dst; this one didn't for up to 64 bytes.
    – gnasher729
    Commented Aug 22, 2022 at 16:55
