I have some knowledge of what you call “stacking” (I don't know if there's a standard term), from having programmed with classic Mac OS graphics (QuickDraw). To keep the explanation straightforward, I'm going to write as if that system is the one single way to do things.
> I guess it uses dirty regions and asks the owners of the windows to redraw their content.
Yes, dirty regions are critical, and there is a further refinement: the dirty region is also used as a clipping region, built into the graphics system, such that drawing operations do not modify any pixels outside the region. This region is automatically set so that it contains only the pixels that need to be redrawn and are visible.
Thus, when some window A is moved to uncover part of a window B, the program responsible for window B can draw as carefully or as sloppily as it likes: it will not accidentally overwrite window A or paint over its own previous drawing, and all graphics primitives automatically skip invisible pixels. This was a very important performance optimization back when processors were slower and there were no GPUs; it meant that the only pixels that had to be touched were the ones that actually needed to appear on the screen.
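A minimal sketch of that clip-on-write idea (not actual QuickDraw; the class and method names here are invented for illustration): every drawing primitive is filtered through the current clip region, so even a sloppy full-window repaint only touches the dirty pixels.

```python
class Framebuffer:
    def __init__(self, width, height, fill=0):
        self.width, self.height = width, height
        self.pixels = [[fill] * width for _ in range(height)]
        # Clip region, represented here as a list of (x, y, w, h) rectangles.
        self.clip = [(0, 0, width, height)]

    def set_clip(self, rects):
        self.clip = list(rects)

    def fill_rect(self, x, y, w, h, color):
        # Intersect the requested rectangle with each clip rectangle and
        # write only the pixels inside the intersection.
        for cx, cy, cw, ch in self.clip:
            x0, y0 = max(x, cx), max(y, cy)
            x1, y1 = min(x + w, cx + cw), min(y + h, cy + ch)
            for py in range(y0, y1):
                for px in range(x0, x1):
                    self.pixels[py][px] = color

fb = Framebuffer(8, 4)
fb.set_clip([(2, 1, 3, 2)])   # only this area is "dirty" and visible
fb.fill_rect(0, 0, 8, 4, 9)   # the app sloppily repaints everything
assert fb.pixels[1][2] == 9   # inside the dirty area: repainted
assert fb.pixels[0][0] == 0   # outside: untouched
```

The point is that the sloppiness lives entirely in the application; the system-enforced clip is what guarantees neighbors' pixels survive.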
Also, the foreground window A would be moved by copying its pixels within the video RAM (a.k.a. “blitting”), so that A does not need to be redrawn whenever it is moved. (While faster than redrawing, this copy was still slow enough to be visible on-screen — I remember noticing that depending on which direction I moved a window, the copy would either proceed top-to-bottom or bottom-to-top!)
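The direction-dependence is a consequence of copying between overlapping source and destination in the same buffer (the same reason C's `memmove` exists). A one-dimensional sketch, with an invented `blit` helper:

```python
def blit(buf, src, dst, length):
    """Copy buf[src:src+length] to buf[dst:dst+length], handling overlap."""
    if dst > src:
        # Moving toward higher addresses: copy back-to-front, or the
        # overlapping source cells would be overwritten before being read.
        for i in reversed(range(length)):
            buf[dst + i] = buf[src + i]
    else:
        # Moving toward lower addresses: front-to-back is the safe order.
        for i in range(length):
            buf[dst + i] = buf[src + i]

screen = list("ABCDE...")
blit(screen, 0, 2, 5)          # slide the "window" two cells to the right
assert "".join(screen) == "ABABCDE."
```

A window moving left versus right on screen maps to exactly these two loop directions, which is why the sweep was visibly top-to-bottom in one case and bottom-to-top in the other.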
So, the window system made sure that every pixel only needs to be drawn once. An application might or might not be clever enough to figure out how to draw its part of the screen without multiple operations touching the same pixel, but it has the opportunity. (The application was given, if I remember correctly, a list of dirty rectangles to draw.)
Furthermore, since nothing is double-buffered and all drawing is immediately visible on-screen, minimizing overdraw also minimized the amount of flickering visible to the user.
(However, classic Mac OS actually did do one extra draw: it would always immediately fill the newly-visible areas with white, before invoking any applications' redraw routines. This is a noticeable difference from Windows, which would instead leave the old screen contents in place — thus resulting in visible copies of a dragged window, or other stale pixels, if the window underneath did not redraw promptly.)
> Using the painter's algorithm for the composition seems to be a rather brute-force approach.
Yes, it is. It is feasible today because GPUs are dedicated hardware optimized for copying pixels around. Even so, your window compositor probably has special cases, such as not drawing a window whose bounding rectangle lies completely behind another opaque window.
(But for text-mode windows it is completely reasonable to do this with a CPU, since the amount of data is so much smaller!)
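A sketch of that back-to-front composition with the occlusion special case mentioned above (the window representation and function names are invented for illustration):

```python
def contains(outer, inner):
    """True if rectangle `inner` lies entirely within rectangle `outer`."""
    ox, oy, ow, oh = outer[:4]
    ix, iy, iw, ih = inner[:4]
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

def composite(windows, draw):
    """windows: back-to-front list of (x, y, w, h, opaque). draw: callback."""
    for i, win in enumerate(windows):
        # Special case: skip a window whose bounding rectangle is entirely
        # covered by a single opaque window above it.
        hidden = any(w[4] and contains(w, win) for w in windows[i + 1:])
        if not hidden:
            draw(win)

drawn = []
composite(
    [(0, 0, 4, 4, True),    # bottom window
     (0, 0, 10, 10, True),  # opaque window completely covering it
     (2, 2, 3, 3, False)],  # translucent window on top
    drawn.append,
)
assert len(drawn) == 2      # the fully covered bottom window was culled
```

Real compositors go further (e.g. testing against the union of several opaque windows), but even this single-rectangle test removes the most common wasted draws.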
> I was considering whether windows could use some kind of backbuffer to store the content they obscure and redraw the revealed parts when they move away; this sounds somewhat too complicated.
Yes, that will create complexity for not a lot of benefit. If you're going to store copies of window contents, you might as well keep a full buffer for every window (with today's hardware), rather than partial storage that you have to reshape all the time. And what happens if an obscured window wants to update its contents? Does it have to draw half to the screen and half to the obscured-content buffer?
In the old days, we avoided keeping copies of window contents entirely, because
- memory was precious, so you didn't store extra copies of what was already on screen, and
- every content change would have cost three memory operations instead of one (draw to the back buffer, read the back buffer, write to graphics memory).
The one place where storing a copy of the window contents does make sense is a paint program, where those pixels are the document, so you need a copy that is never affected by being obscured.
> I started looking into segmentation and interval trees to see whether they could be useful for identifying the obscured regions.
You might want to look at the QuickDraw “region” data structure, which was used for clipping. It is optimized for compactly representing irregular rectangular-ish shapes by storing a list of only their corner points. I don't know if it was actually used for computing/remembering visibility, but there were definitely region union/intersection/subtraction operations that could have been used for that.