6
$\begingroup$

In Java, the size of arrays are immutable. However, to my understanding, they are still allocated on the heap, because Java allocates almost everything on the heap. Even then, Java arrays are still unchanging in size. My question is, if at any point a language implementation (e.g., the Java Virtual Machine) could simply call realloc() or the equivalent, why do they still not support resizing an array without creating a new one?

Another example is in C++, there are new/delete which are equivalent to malloc()/free(), but there is no corresponding equivalent of realloc(). At the same time, both languages have standard library classes that behave as arrays that can be resized (java.util.ArrayList and std::vector). Is there a reason a language would bar the programmer from manually performing reallocations/resizes?

$\endgroup$
2
  • 2
    $\begingroup$ It might be worth noting that realloc is, essentially, a fancy wrapper around free and malloc with some corner cases. While it might resize in-place in favorable conditions, it gives no guarantees. Java and C++ still allow you to do the same thing manually, which is what ArrayList and std::vector do. $\endgroup$
    – abel1502
    Commented Jul 11, 2023 at 19:45
  • 7
    $\begingroup$ It could also be said that those corner cases are the reason why realloc exists. $\endgroup$
    – ojs
    Commented Jul 12, 2023 at 9:20

3 Answers 3

14
$\begingroup$

There is potentially unknown number of variables pointing to ("referencing") the same Java array. It cannot be "reallocated" because these references may end up pointing to nothing if it is moved in the memory. A wrapper object is required that is not reallocated so references to it remain valid. This wrapper object can then manage the internal array to what only it has access to.

With C++, it is still possible to get this wrong by creating and retaining pointers into internal values inside the vector: you add one more element, the vector grows and using your pointers is now undefined behavior. However you mostly just need to be aware.

And, yes, C++ does have std::realloc that you may use if find reasonable, even if there are restrictions how to allocate the region, see comment below.

$\endgroup$
4
  • 2
    $\begingroup$ C++ std::realloc is for use with what was std::malloced (or std::calloced), it's incompatible with new. $\endgroup$
    – Ruslan
    Commented Jul 12, 2023 at 10:40
  • 3
    $\begingroup$ Note that a stop-the-world garbage collector can reallocate storage for objects and update all references to them, and in fact that is how many such collectors work. Such reallocation can only occur during a stop-the-world event, however, and not at other times. $\endgroup$
    – supercat
    Commented Jul 14, 2023 at 14:56
  • $\begingroup$ @supercat - yes, but running a stop-the-world collection generally takes multiple milliseconds or even tens of milliseconds in typical applications. And a hypothetical Java Arrays.resize would need to run one every time it was called, which is likely to be substantially less efficient than just copying the array's contents into a new one, thus the entire point of realloc existing (that in some unusual circumstances it's faster, and isn't ever noticeably slower than making a new copy) is essentially negated. $\endgroup$
    – occipita
    Commented Jul 23, 2023 at 18:50
  • $\begingroup$ @occipita: Agreed that stop-the-world GC systems that globally update direct references can't accommodate operations that might require expanding objects in place. Such a system could allow objects to be shrunk, however, with the proviso that storage wouldn't become available until the next GC cycle. That's not really a problem, though, since nobody would care whether the memory was available before the next GC cycle. $\endgroup$
    – supercat
    Commented Jul 23, 2023 at 19:17
8
$\begingroup$

Java couldn't allow this because of how its memory is laid out. There are four pools, each of increasing size. Every time an object is created, it is added to the smallest pool, in a contiguous manner. Objects that go out of scope don't immediately get deallocated, instead waiting for the next GC cycle. When that pool is full, all objects in the pool are checked for aliveness. If they are, they move to the next larger pool. The pool is then cleared at the end of the GC cycle. This happens for each larger pool, until the largest, which requires the most work to clean up, but rarely ever does. As such, a reallocation would require all members of the pool to potentially move to make space, or the object would have to be placed at the end of the pool, and everything else would move, and it may even trigger an early GC cycle in that pool if there's insufficient space.

C++ has a different memory layout, but the ideas are kind of the same. Memory is allocated contiguously, so increasing the size almost always requires a move operation. However, since we don't control the pointer, only references, and since multiple references can exist (which is also true for Java), that means that the language would have to specify that references are also updated automatically. This could be a major performance problem. One way to work around this problem is by using smart pointers. Your code would only ever interact with a pointer that is guaranteed to not be reallocated, and that pointer would point to the data that can move. This way, you maintain just one reference to the object, which improves performance.

Other strategies are possible, such as intentionally leaving space after every object, but this would trade the risk of movement in exchange for wasted memory usage. The language could also just keep track of the references in some kind of list, and therefore know how to update the references. Either way, such languages would likely be wasteful with memory or processing time. The fastest way to access data is direct access, and that typically means limiting what you can do with that memory once allocated.

$\endgroup$
3
  • 1
    $\begingroup$ I'm not that much into current Java GC strategy, but if I recall correctly, they moved away from having fixed pools and all survining GC strategies are mark&copy anyway. But I could be mistaken. $\endgroup$
    – feldentm
    Commented Jul 16, 2023 at 10:02
  • $\begingroup$ @feldentm This answer is intentionally written as more abstract, rather than being overly technical. You are correct that some details have changed wrt exactly how pools are allocated and cleared, but these differences don't matter at a high level. There's still young and old generations, promotion still occurs, etc. It's been incremental changes, like concurrent cleanup, aggressive optimization of the memory maps used, and so on, that make GC more efficient these days. $\endgroup$
    – phyrfox
    Commented Jul 16, 2023 at 13:46
  • 1
    $\begingroup$ An essential feature of Java is that as long as any reference exists and isn't null, it will identify an object of the same type and size. Reallocating arrays with different sizes would break this. $\endgroup$
    – supercat
    Commented Jul 17, 2023 at 16:57
5
$\begingroup$

In C, the memory location of the object affected by calling realloc may change, meaning you have to update your pointers to the object after calling it. The canonical way of using it is:

ptr = realloc(ptr, newSize);

If there are other copies of the pointer, you need to make sure they're updated too. In reality, almost all code that uses realloc works on pointers that only have one copy.

In a language with garbage collection, copying the contents and updating the references will cause the original object to be deleted. This means that the Java code:

array = Arrays.copyOf (array, newSize)

is actually directly equivalent in the case where there is no other reference to array. Unfortunately, we cannot directly modify the allocation because in most implementations there is no way of telling that array is the only copy of the reference: the garbage collector can only find out when there are no copies, but not identify how many copies there are. A reference counting GC may be able to achieve this, but Java was not designed with using such a GC in mind.

$\endgroup$
3
  • $\begingroup$ I wouldn't call that the "canonical" way to use realloc(). A common wrong way, perhaps... $\endgroup$ Commented Jul 18, 2023 at 9:16
  • $\begingroup$ @TobySpeight they're just omitting the error checking for brevity here. The important point is that you have to reassign the original pointer. $\endgroup$
    – Barmar
    Commented Jul 24, 2023 at 4:50
  • $\begingroup$ I hope so @Barmar. Unfortunately, I've seen almost exactly that line in real code, so it bears mention that we mustn't reassign ptr when realloc() has failed and left the existing memory block untouched. $\endgroup$ Commented Jul 27, 2023 at 7:22

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .