From the early days of the IBM PC and its clones, the display adapter hardware was very simple: a small block of memory was dedicated to a grid of character cells (80x25 in the standard text mode), with two bytes of memory per cell. One byte selected the character, and the other selected its "attributes": foreground and background colors plus blink control on color adapters; bold, underline, blink, or reverse video on monochrome adapters. The video output hardware looked up pixels from a ROM table of character shapes according to the contents of character memory.
To offer some degree of hardware independence, the BIOS exposed the character map through a software interrupt (INT 10h) that had to be executed for each character cell you wanted to set. This was slow and inefficient. But the character memory was also directly addressable by the CPU, so if you knew what hardware was present, you could write to it directly instead. Either way, once set, a character stayed on screen until changed, and the total character memory you had to work with was 4000 bytes - about the size of a single 32x32 full-color texture today!
In the graphics modes, the situation was similar: each pixel on screen corresponded to a particular location in memory, and there was a BIOS set-pixel interface, but high-performance work required writing directly to memory. Later standards like VESA let the system do a few slow BIOS-based queries to learn the memory layout of the hardware, then work directly with memory from there. This is how an OS can display graphics without a specialized driver, although modern OSes also include basic drivers for every major GPU manufacturer's hardware. Even the newest NVIDIA card will support several backwards-compatibility modes, probably all the way back to IBM CGA.
One important difference between 3D graphics and 2D is that in 2D you don't generally need to redraw the entire screen every frame. In 3D, if the camera moves even a tiny bit, every pixel on the screen might change; in 2D, if you aren't scrolling, most of the screen will be unchanged frame-to-frame, and even if you are scrolling, you can generally do a fast memory-to-memory copy instead of recomposing the whole scene. So it's nothing like having to execute INT 10h for every pixel every frame.
Source: I'm really old