The x86 architecture provides a hardware single-step trap for debugging. How much does it slow down the running program?
If, say, a Linux kernel function were created to do nothing but single-step a process, how much slower would that process run? Does anybody have a good estimate?
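For a rough number you can measure from user space today: ptrace's PTRACE_SINGLESTEP resumes the tracee for exactly one instruction and then stops it again, so every instruction costs a trap into the kernel plus a round trip to the tracer. Here's a minimal harness sketch (my own, not anything established; the child workload is arbitrary):

```c
/* Single-step a child process and count instructions. Timing this
   against an untraced run of the same workload gives a ballpark
   slowdown figure for user-space single-stepping. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);  /* let the parent trace us */
        execl("/bin/true", "true", NULL);       /* any short workload */
        _exit(1);
    }

    int status;
    long steps = 0;
    waitpid(child, &status, 0);                 /* child stops at exec */
    while (WIFSTOPPED(status)) {
        ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
        waitpid(child, &status, 0);             /* one trap per instruction */
        steps++;
    }
    printf("single-stepped %ld instructions\n", steps);
    return 0;
}
```

In practice this is orders of magnitude slower than native execution, since each step pays two context switches plus signal delivery. An in-kernel stepper would skip the round trip to the tracer but still takes the trap on every instruction, so this measurement is an upper bound on the slowdown, not the answer for the kernel-function case.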
I'm wondering this after spending a week tracking down a threading bug. It'd be nice if these bugs could be reproduced. How about a feature that executed two threads sequentially, alternating between executing an instruction on one thread and an instruction on the other, in a predictable manner? I'm thinking of a pseudo-random number generator that would produce a string of bits: 0 means execute an instruction on thread 1, 1 means execute an instruction on thread 2.
Then you could seed the PRNG and get a reproducible interleaving of instructions. Different PRNG seeds would produce different interleaving patterns. You could run a test case under a bunch of PRNG seeds, and if you found one that triggered a failure, reproduce it.
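The deterministic schedule itself is the easy part. A minimal sketch of the bit-stream idea (the xorshift generator and all names here are my own choices, just for illustration):

```c
/* A seeded PRNG emits one bit per scheduling decision:
   0 -> step thread 1 next, 1 -> step thread 2 next.
   The same seed always yields the same interleaving. */
#include <stdint.h>
#include <stdio.h>

static uint64_t prng_state;

static void sched_seed(uint64_t seed) { prng_state = seed ? seed : 1; }

/* xorshift64: cheap, and fully determined by the seed */
static int next_thread(void)
{
    prng_state ^= prng_state << 13;
    prng_state ^= prng_state >> 7;
    prng_state ^= prng_state << 17;
    return prng_state & 1;
}

int main(void)
{
    sched_seed(42);                      /* same seed => same schedule */
    for (int i = 0; i < 32; i++)
        putchar(next_thread() ? 'B' : 'A');
    putchar('\n');                       /* prints the same string every run */
    return 0;
}
```

Any PRNG works as long as it's seedable and deterministic; then a seed that triggers the failure *is* the reproducible test case.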
Anybody heard of anything like this being done?
Update:
How could it be done?
Assume we're running on something like a Core i5, where you've got four hardware thread contexts on two cores. We're using the single-step trap to bounce a process back and forth between its user space and its kernel space. So that's two of the contexts, right? Then we've got the other thread running on the other core with its own user-space and kernel-space contexts, right? Something like a spinlock (probably two spinlocks) synchronizes the two kernel threads: each one spins while the other steps its user-space thread a few instructions, then they synchronize and exchange roles.
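A user-space analogue of that handoff, as a sketch (real kernel code would wrap the single-step loop in actual spinlocks; this just stands in for the structure, with a print standing in for stepping a few instructions):

```c
/* Two workers alternate strictly: each spins until it holds the
   "turn", does one quantum of work, then hands the turn over.
   Compile with: cc -pthread handoff.c */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_int turn = 0;   /* whose turn it is: 0 or 1 */

static void *worker(void *arg)
{
    int me = *(int *)arg;
    for (int step = 0; step < 8; step++) {
        while (atomic_load(&turn) != me)
            ;                                      /* spin while the other side steps */
        printf("thread %d: quantum %d\n", me, step);
        atomic_store(&turn, 1 - me);               /* hand off to the other thread */
    }
    return NULL;
}

int main(void)
{
    pthread_t t0, t1;
    int id0 = 0, id1 = 1;
    pthread_create(&t0, NULL, worker, &id0);
    pthread_create(&t1, NULL, worker, &id1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}
```

This shows strict alternation; in the full scheme, the PRNG bit stream above would decide who gets each quantum instead of simply flipping the turn.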
Sounds like we've got just the right number of threads and cores so that everything fits on the chip at once. But how fast does it run?
We could just try it. Somebody could write some kernel code. Or maybe somebody knows.
With all the fancy stuff these new chips do, I'd be impressed, and not totally surprised, if it ran fast.