Was there ever a compiler type that was just large enough to contain a memory segment?

Question

From this answer,

Hark back to the days of segmented 16-bit architectures for example: an array might be limited to a single segment (so a 16-bit size_t would do) BUT you could have multiple segments (so a 32-bit intptr_t type would be needed to pick the segment as well as the offset within it). ...

This gives me a conceptual model of a type system whereby

uintptr_t = XXXsegment_t + size_t

on a system that supports memory segmentation. Did an XXXsegment_t or the like ever get standardized into a type on such systems? Was there ever a segment_t?

@Raffzahn that's ok if it gets closed or migrated, but it would be nice for the community to fill out the FAQ (like the other sites) retrocomputing.stackexchange.com/help/on-topic retrocomputing.stackexchange.com/help/dont-ask Seems like it would be on topic. — Evan Carroll, Commented Jul 7, 2018 at 23:57
@Raffzahn Segmented memory architectures are pretty much only found on retro systems (e.g. MS-DOS). I can't think of a single modern architecture that uses them. — Alex Hajnal, Commented Jul 8, 2018 at 0:10
@Raffzahn I think we've agreed that computing history is on-topic; there was an Area 51 proposal but it got sort of merged into this site because its scope was entirely contained within either this site or HSM. — wizzwizz4, Commented Jul 8, 2018 at 3:44
@EvanCarroll Yeah, we need to do that; could you post another Retrocomputing Meta question about it? — wizzwizz4, Commented Jul 8, 2018 at 3:47
Found the Turbo C manual: pp. 237-; there were new keywords near, far, huge and register prefix keywords to modify pointer types, and you'd use macros to get segment/offset. — dirkt, Commented Jul 8, 2018 at 5:56

user722 · Accepted Answer · 2018-07-08 19:07:02Z

No C compiler targeting x86 CPUs that I'm aware of ever defined an integral type corresponding to the value of a segment in the way that uintptr_t corresponds to the value of a pointer. For that matter, I don't know whether any 16-bit segmented C compiler for x86 CPUs ever implemented the uintptr_t type. You just had to know that if you assigned a 32-bit long integer value to a far pointer, the most significant 16 bits of the value would be used as the segment part of the address and the least significant 16 bits as the offset part.

However, there was at least one implementation of pointer-like segment types, by Borland, and one of segment-like pointer types, by Microsoft. Borland's implementation used the _seg keyword to create segment pointer types, while Microsoft's implementation used the _based keyword to create something called based pointers. Microsoft's based pointers were also supported by Watcom's C compiler.

The Borland implementation was the most straightforward. It allowed the _seg keyword to be used in pointer types just like the near, far and huge keywords. For example, you could do something like:

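unsigned short _seg *video_seg = 0xb800;   /* segment pointer: stores only the segment, offset 0 is implied */
video_seg[80 * row + column] = character + attribute * 256;   /* character in low byte, attribute in high byte */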

The advantage of using a segment pointer over a far pointer here is that the variable only takes 2 bytes of space rather than 4. Also, it should help the fairly primitive Borland compiler generate better code.

The Microsoft implementation used the _based keyword to create based pointers. In theory these were more flexible and not specific to the x86 segmented architecture; in fact, they're still supported by current Microsoft compilers. The basic idea is that you can use the _based keyword to make a pointer "based" on some other variable. So, for example, you can do:

void far *video_base = 0xb8000000;
unsigned short __based(video_base) *video_seg = 0;

video_seg[row * 80 + column] = character + attribute * 256;

In practice, though, Microsoft's implementation of based pointers in its 16-bit segmented compilers was pretty buggy and didn't really offer an advantage over using far pointers to refer to segments. They were modestly used in combination with the _segname keyword to specify which segment statically allocated variables should be placed in. For example:

unsigned char _based(_segname("OFFSCRBUF")) offscreen_buffer[320U * 200];

The only other compiler that I know of that did anything like this was Watcom's. They took Microsoft's based pointers and added several extensions. In particular, they created a _segment keyword, which is almost what you're asking for but much more inconvenient, as you need to use it with both based pointers and a special :> operator. For example:

_segment video_seg = 0xb800;
unsigned short _based(void) *offset = row * 80 + column;
*(video_seg:>offset) = character + attribute * 256;

Note that all the above examples omit casts for brevity. Without the casts, these examples will all generate warnings, but the compilers still generate correct code.

The DOS port of gcc, djgpp, supports uintptr_t, which as you know did not become part of the C standard until 1999. So do the successors to classic DOS compilers such as OpenWatcom and C++Builder, and so does Digital Mars. — Davislor, Commented Jul 8, 2018 at 22:48
Of course, C programmers in the 20th century, before uintptr_t existed, generally used unsigned long. (On the Tiny, Small or Medium models, pointers were 16-bit instead, so OpenWatcom changes the definition of uintptr_t accordingly.) — Davislor, Commented Jul 8, 2018 at 23:15

Alex Hajnal · Accepted Answer · 2018-07-08 07:46:00Z


To summarize what @traal linked to above, the recommended practice for DOS systems was as follows:

Pointers are 32-bit (unsigned long)
To get the segment for a pointer, use the FP_SEG macro from DOS.H
To get a pointer's offset within a segment, use the FP_OFF macro, also from DOS.H
Both the segment and offset are unsigned, i.e. uint16_t

The macros are defined as:

#define FP_OFF(fp)  ((unsigned)(fp))
#define FP_SEG(fp)  ((unsigned)((unsigned long)(fp)>>16))

It makes sense if you think about it since a pointer_t, well, points to a location in memory. Without the segment information the actual value it points to is ambiguous. Likewise, the segment number on its own is not terribly useful¹; it only becomes useful when paired with an offset. In short, segment_t was a uint16_t but pointers were always² 32-bit³.

¹ Unless one is writing memory management code (e.g. paging to disk). In that case you would probably be working strictly with uint16_ts.

² Unless you kept all of your data in a single 64 kB segment.

³ This is all a bit simplified, but the gist of it is that the CPU's segment and offset registers are each 16 bits wide, so it makes sense to treat memory addresses as atomic 32-bit values (at the C level) with, under the hood, the 16 MSBs of an address being the segment and the 16 LSBs being the offset. Note that under this scheme a pointer_t's value is not the physical linear address (except in segment 0).

edited Jul 8, 2018 at 7:46

answered Jul 8, 2018 at 4:26


Yes, this is kinda what I was thinking, but I think it only makes sense so long as the segment and size_t are 16-bit. I just wasn't sure if that was always the case (or if anyone ever planned for it not to be, i.e. after you had a 16/16 split and before memory segmentation went out the window).
– Evan Carroll
Commented Jul 8, 2018 at 4:30
@EvanCarroll On x86 the segment and offset were each always 16-bit. At the processor level, things got a bit messy (a single memory location could have multiple valid addresses). A given address lived in two separate registers, each 16 bits wide, which were combined ((Segment << 4) + Offset) to form the linear address.
– Alex Hajnal
Commented Jul 8, 2018 at 4:50
That's for real mode. Protected mode is conceptually similar but differs in the low-level details.
– Alex Hajnal
Commented Jul 8, 2018 at 5:12
@AlexHajnal Minor nit: You actually can force 32-bit addressing with the 67H byte prefix in real and v86 modes, but as the segment limits are 64K, you'll get interesting hardware exceptions. There's also "unreal mode", which has its own can of worms! :-)
– ErikF
Commented Jul 8, 2018 at 8:28

@AlexHajnal My guess is that allocating memory on 16-byte boundaries wastes far less memory than doing the same on 64KB ones if most programs don't use the entire segment, and makes it easier to port CP/M-style programs. For example, if a system had 128KB total RAM and the OS used 32KB, a 32KB program could be loaded that still could access 64KB of data without ever having to alter any segments.
– ErikF
Commented Jul 8, 2018 at 9:20


Leo B. · Accepted Answer · 2018-07-08 03:30:29Z


No; size_t has to cover every possible size of a data structure that could be defined. As soon as a model allowed an array larger than 64 KB, size_t had to be a 32-bit type, and there was no point in having a separate segment type.

answered Jul 8, 2018 at 3:30



What does "cover" mean there? Why couldn't you have a 32-bit type with a 24-bit size_t and an 8-bit segment selector?
– Evan Carroll
Commented Jul 8, 2018 at 3:56

@EvanCarroll Because the physical registers in the processor are 16 bits wide.
– Alex Hajnal
Commented Jul 8, 2018 at 6:01
Perfect answer.
– Raffzahn
Commented Jul 8, 2018 at 8:55
@EvanCarroll cover means that for every type that a compiler can compile, sizeof(type) can be represented by a size_t value. Converting pointers to size_t is undefined behavior, as well as subtracting pointers to different data structures. The need for a separate segment_t type doesn't arise.
– Leo B.
Commented Jul 8, 2018 at 17:23

@LeoB Why couldn't data structures in a 32-bit memory space be limited to 64 KiB?
– snips-n-snails
Commented Jul 8, 2018 at 19:17


rcgldr · Accepted Answer · 2018-07-13 20:19:21Z

For 80286 (or higher) processors, there is a Windows 3.x memory model that runs in protected mode to go beyond the real-mode 1 MB limit, using selectors instead of real-mode segments. A block of selectors is used to access blocks of memory larger than 64 KB, and incrementing a selector by 8 points to the next 64 KB block of memory. The increment value was set via a define named _AHINCR. (For the real-mode huge memory model, _AHINCR was 0x1000, used as the increment value for a segment register.) The "AH" in the name may come from /AH, the command-line switch for the huge model with Microsoft compilers, while "INCR" stands for increment value. On the 80286, protected mode allowed 24-bit addressing, which translates into 16 MB of memory.

This was superseded with the 80386 and Windows 3.x support for a 32-bit flat model (32-bit offsets) via win32s or winmem32. Watcom 10 C/C++ was the only compiler to support winmem32 as one of its standard memory models.
