Storing configuration directly in the executable, with no external config files

Question

Back in the days when dinos spoke fluent English, Arabic, Spanish and a bunch of other languages, and when one could change the tapes of a PDP-11 on the fly, there was a programming language called Turbo Pascal.

I remember a technique (with the aid of some obscure helper functions) whereby Turbo Pascal variables were stored in the EXE (as if all variables were put in a file). One could change a variable while the program was running and then store the changed value back in the EXE. Essentially, this technique made external config files unnecessary.

Does anyone remember what this technique was called? (In fact, I'm searching for something similar for a compiled Python program, but I guess knowing the name of the technique is a start.)

computer-programming-forum.com/29-pascal/b712f2e3b6be3f84.htm — Sneftel, Commented Apr 15, 2023 at 11:39
Presumably, the program could store any configuration after the executable image as defined by MZ headers. I don’t think this technique had any particular name, though you may look for the broader term ‘self-modifying executable’. Not sure though if DOS-specific techniques will be of much use with Python… — user3840170, Commented Apr 15, 2023 at 14:21
It was quite common for early programs (e.g. CP/M Wordstar, and ofc Turbo Pascal uses Wordstar keys) to get patched directly into specific locations in the assembly to set "configuration variables". — dirkt, Commented Apr 15, 2023 at 17:41
This is a pretty bad idea for modern programs. Back in the DOS days, it made more sense, because no directory was guaranteed to exist, not even a C: drive. MS-DOS and its stock programs could live almost anywhere. Nowadays, nearly every operating system gives a user a home directory, and dedicated subdirectories for user data (Windows has AppData, Mac has Library, Linux has .config.) — VGR, Commented Apr 16, 2023 at 5:35
On some systems, it is possible to "snapshot the executable" after initialization. For instance, it would load up a whole load of data into memory after asking you some configuration questions and save an image of the memory. The next time round, you just run that image. If you run the original, you'd have to go through the whole process again. I saw this being used back in 2006 on a Java program that took 4 hours to load its data so it is still possible on a modern OS (W2K) but I don't know how it was done. — cup, Commented Apr 16, 2023 at 9:51

tofro · Accepted Answer · 2023-04-21 07:53:40Z

Well, I don't think there ever was a specific term for that.

What Turbo Pascal calls "typed constants" (which are, in fact, variables, and a non-standard Turbo Pascal extension of the language) are put as one single block into the program executable by the compiler in the exact order the compiler comes across them at compile time.

CONST 
   beaconStr : String [20] = 'config starts here';
   configVar : INTEGER = 1;

"Normal" constants are spread all over the executable by the compiler and are not nearly as easy to locate in the binary. Typed constants, however, are collected into one single block at compile time and loaded by the runtime at the bottom of the data segment.

What you did, just as shown above, was put a magic marker (or beacon) at the beginning of the typed constants so that you could find them in the executable, then patch what you found after it. Note that this was a fairly unique feature of Turbo Pascal; other languages and compilers don't necessarily have everything so conveniently put in one place. This "one place" was required because the older MS-DOS binary formats for .COM and .EXE files did not include enough (or rather, any) information on where in the binary the contents of the initialized data segment are to be found.
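
Since the question mentions Python, here is a minimal sketch of the beacon idea in that language rather than in Turbo Pascal. It searches a binary file for the marker text and overwrites the 16-bit integer that follows it. The function name `patch_config` and the `GAP` value are assumptions for illustration: `GAP` stands for the two unused bytes left in a `String[20]` holding an 18-character text, and the real layout in a given executable depends on the compiler and its alignment settings.

```python
import struct

BEACON = b"config starts here"  # marker text, as in beaconStr above
GAP = 2  # assumed: unused bytes left in the String[20] after the 18-character text


def patch_config(path, new_value, beacon=BEACON, gap=GAP):
    """Overwrite the 16-bit little-endian integer stored after the beacon.

    Returns the value that was stored before patching.
    """
    with open(path, "r+b") as f:
        image = f.read()
        start = image.find(beacon)
        if start < 0:
            raise ValueError("beacon not found")
        if image.find(beacon, start + 1) >= 0:
            # A second match would make the patch location ambiguous.
            raise ValueError("beacon is not unique")
        offset = start + len(beacon) + gap
        if offset + 2 > len(image):
            raise ValueError("file ends before the value")
        old_value = struct.unpack_from("<h", image, offset)[0]
        f.seek(offset)
        f.write(struct.pack("<h", new_value))
    return old_value
```

The uniqueness check matters because the marker text could also appear elsewhere in the file, for example in a string table. Whether a running program may rewrite its own file at all depends on the operating system and on file permissions.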

I doubt this technique would be easy to reproduce on today's operating systems. Program files are typically write-protected for security reasons, which makes the technique much harder today: you need admin privileges to patch an executable. On the other hand, modern object formats actually mark the initialized data segments specifically for the loader, so constants to patch would be easier to find in a documented way, even without a beacon.

Typed constants in Turbo Pascal 3.x and earlier lived in the code segment, a fact which was very useful when trying to use a piece of inline machine code to e.g. read a list of tiles from one table, look up the shape data in a second table, and render the results to the screen, all without having to perform any segment-register loads within the loop. Putting the tile shapes in the code segment meant one didn't need to keep them within the size-limited data segment, nor have the ES register point to it. — supercat, Commented Apr 17, 2023 at 18:15
@supercat TP 3.0 created .COM files - code, data and stack segments are all the same here, and have all the same size limit. — tofro, Commented Apr 17, 2023 at 19:14
When DOS loads a COM file, CS, DS, and SS are initialized to the same value, but the startup code for Turbo Pascal 3.x on x86 allocated additional areas of storage and made SS and DS point to them. — supercat, Commented Apr 17, 2023 at 19:22
Well, I was searching for a way to store the date when a program was first run in the program itself (as rudimentary time-limited protection) — HermDP, Commented Apr 20, 2023 at 17:34
@HermDP Then you should probably do that in your setup, something you typically do with admin privileges anyways. — tofro, Commented Apr 21, 2023 at 7:55

Justme · Accepted Answer · 2023-04-15 17:07:56Z

There may be no specific name for it.

DOS .EXE programs are just files on disk, and any program, such as a hex editor, can be used to change the contents of an .EXE file in any way.

But a Turbo Pascal program can find out at run time at which memory address a variable is stored, and it can also find out at which memory address the program itself was loaded. From these two pieces of information it can calculate where in the .EXE file the contents of the variable are, and simply overwrite the data for that variable.
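
The arithmetic can be sketched as follows, in Python purely for illustration. It assumes an MZ executable whose header size is stored, in 16-byte paragraphs, in the word at header offset 0x08, and a load image placed directly after the 256-byte PSP. The function and parameter names are made up for this sketch; in a real Turbo Pascal program the inputs would presumably come from something like PrefixSeg, Seg() and Ofs().

```python
import struct


def exe_file_offset(header, psp_seg, var_seg, var_ofs):
    """Map a run-time segment:offset address to an offset in the MZ .EXE file."""
    if header[:2] not in (b"MZ", b"ZM"):
        raise ValueError("not an MZ executable")
    # Word at offset 0x08: size of the header in 16-byte paragraphs.
    header_paragraphs = struct.unpack_from("<H", header, 0x08)[0]
    # The load image starts right after the 256-byte (0x10 paragraph) PSP.
    load_seg = psp_seg + 0x10
    image_offset = (var_seg - load_seg) * 16 + var_ofs
    if image_offset < 0:
        raise ValueError("address lies below the load image")
    return header_paragraphs * 16 + image_offset


def com_file_offset(var_ofs):
    """For a .COM file the image is loaded at offset 0x100 of its segment."""
    if var_ofs < 0x100:
        raise ValueError("address lies inside the PSP")
    return var_ofs - 0x100
```

This mapping only holds for data that is actually part of the load image, and, as noted below, only while the file on disk is the original uncompressed executable.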

This obviously works only for the original uncompressed .EXE file.

Another way to do it is to just leave the original executable part untouched, and just add custom data of known size after it. Many programs like demos used this method if they had a single large .EXE file for code and resources, but of course used it only for reading. This way the .EXE can be first compressed before appending any user data to it.
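
A minimal Python sketch of the append approach, under stated assumptions: the configuration is stored as JSON after the program image, followed by a small trailer holding a tag and the payload length. The tag `CFG1`, the trailer layout and both function names are invented for this example. Using a length field instead of a fixed, known size is a variation on what is described above.

```python
import json
import struct

MAGIC = b"CFG1"                  # hypothetical 4-byte tag marking appended data
TRAILER = struct.Struct("<4sI")  # tag + payload length, stored at the very end


def read_appended(path):
    """Return (config, size of the untouched program part); config may be None."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) < TRAILER.size:
        return None, len(data)
    end = len(data) - TRAILER.size
    magic, length = TRAILER.unpack_from(data, end)
    if magic != MAGIC or length > end:
        return None, len(data)  # nothing appended yet
    return json.loads(data[end - length:end]), end - length


def write_appended(path, config):
    """Replace any previously appended config, leaving the program part as-is."""
    _, base_size = read_appended(path)
    payload = json.dumps(config).encode("utf-8")
    with open(path, "r+b") as f:
        f.truncate(base_size)
        f.seek(base_size)
        f.write(payload + TRAILER.pack(MAGIC, len(payload)))
```

Because the trailer sits at the end of the file, the reader never needs to know the size or internal format of the program in front of it, which is why this also works after the executable has been compressed.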

The "Another way to do it" is also more or less how self-extracting archives work. You can make your own Zip self-extractor stub just by writing a Zip extraction tool that uses the path to itself for the input zip file, concatenating the Zip file onto the end, and then using Info-ZIP's zip -A to fix up the offsets in the archive data. – ssokolow, Commented Apr 16, 2023 at 1:10
Updating COM files was even easier in MS-DOS and CP/M. If a piece of initialized data was stored at address CS:0x100+x (MS-DOS), or hardware address 0x100+x (CP/M), its initial value would be stored starting at offset x in the COM file. – supercat, Commented Apr 17, 2023 at 18:18
However, CP/M doesn't tell a program the filename it was loaded from, so if a program was renamed or just loaded from an unexpected drive, it wouldn't be able to update itself. – john_e, Commented Apr 19, 2023 at 7:50

user20985 · Accepted Answer · 2023-04-17 18:44:50Z

It wasn't special then

DOS had none of the protection mechanisms known from modern operating systems. There was no distinction between data and code: you could mix them and jump into the data segment or wherever you wanted, and executables were not write-protected. An EXE was just a file like any other, and you could write to the file from which your code was loaded. Since the executable's file name was given to you, you could open it for writing without any problems. Writing to your own executable seemed like a great idea and, just like writing to a separate config file, needed no special name other than 'writing to the file'.

Of course, it was a paradise for homegrown virus creators, since it required no more knowledge than the structure of an EXE file and how the code gets executed.

Davislor · Accepted Answer · 2023-04-16 13:18:42Z

Many operating systems have some concept like a “resource” that can be attached to an executable, but read and updated like a file. This would be the most natural way to save configuration settings in the executable file itself under Windows or MacOS. But since a compiler and linker know where in the executable the constant data is stored, a toolchain could, if its designers wanted, let you write a self-modifying program.

If you tried to alter the executable file itself, most modern operating systems would at least make you use Administrator permissions to do it. Some would only update a shadow copy in your own personal directory.

This answers a different question, not OP’s question. – RonJohn, Commented Apr 16, 2023 at 16:55

Polluks · Accepted Answer · 2023-04-18 22:59:57Z

There was no termcap file, only tinst.com. This installer patched the executable, which is a kind of "self-modification".
