The event scheduler is a recurrent theme in event-driven architectures: something triggers an event that needs to be checked in the future one or more times. I has particular use in trading where you want to correlate one market event with price movements in the future at several time deltas (1ms, 100ms, 1s, 1 min, etc). During strategy discovery and backtesting the scheduling can reach billions of events.
A parenthesis here: this is a general algorithm challenge that can be implemented in any language. We provide an example implementation in C++ but it does not need to be written in that language necessarily.
The scheduler problem can be summarized in the following interface:
using Time = std::uint64_t; ///! nanos since epoch
/** Base class for all events to be handled by the scheduler */
struct Event {
virtual void fire( Time scheduled, Time now ) = 0;
};
/** A class that functions as a container for events to be fired in the future.
The same event can be added multiple times into the scheduler.
Distinct events can be added at the same time.
*/
class Scheduler {
public:
virtual ~Scheduler() = default;
/** Schedules one event */
virtual void schedule( Event* ev, Time tm ) =0;
/** Checks events and fires accordingly.
"t" is guaranteed ascending */
virtual bool check( Time t ) =0;
};
As a note, we decided to use raw points so not to introduce noise around smart pointers, move semantics, etc. These can be added later on.
The scheduler can be trivially coded using the standard containers but it will lead to log(N) algorithms for scheduling. Other trivial solutions will lead to O(1) for scheduling but O(logN) for checking. For example:
class StandardScheduler : public Scheduler {
public:
StandardScheduler();
virtual void schedule( Event* ev, Time tm ) override;
virtual bool check( Time t ) override;
private:
using EventMap = std::multimap<Time,Event*>;
EventMap events;
Time current;
};
Our goal in this test is to generate a solution that remains fast even for very large number of events, say 1E8. Therefore it helps if the solution is constant time O(1) in both scheduling and checking, in average.
The main requirements are:
Both scheduling and checking need to be very fast for extremely large number of events (eg 1E8). It is acceptable if it is slower at times (hiccups) for rebalance for example.
Individual events can be submitted multiple times
Distinct events can be submitted under the same time (same key in a map)
Events need to be fired in strict order by time.
The best solution will display the best execution times to schedule and iterate over a very large number of samples, which we set arbitrarily at 100,000,000 events with 3 random reposts of the same event. If two solutions come in first with 10% between them, I would like to consider the code that is smaller in size.
As per suggestion in the sandbox, I would like to propose winners per language: C++, Java, Python, etc since the commenter highlighted that this task is bread and butter for C/C++. So other languages can also compete.
The times need to be randomized and distributed over a span of time between 0 and 10 times the number of samples. The iteration is done every 5 time units.
A basic implementation for this challenge is provided on Github: https://github.com/HFTrader/scheduler.git.
The sample code is all standard C++ with no platform-specific features and is provided with a CMakeLists.txt which should make it able to compile in all major platforms, although it has only been tested on Linux.
If you choose to do it C++, you might clone, subclass the Scheduler class and replace it in main(). An example StandardScheduler is provided. In other languages, please replace with equivalent constructs.
git clone https://github.com/HFTrader/scheduler.git
cd scheduler
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=release ..
make
./scheduler 10000000 3
The timings in this C++ code provide the number of nanoseconds for scheduling and checking (iteration through time). A sample run can be like:
$ ./scheduler
Usage:
./scheduler <numsamples> <numreposts>
$ ./scheduler 10000000 3
Timings schedule:1196 check:126
Success!
The above code ran in 1196 and 126 nanoseconds per event, which is the key result. The above was ran in an AMD Ryzen Threadripper 3960X with 64Gb RAM under Ubuntu 20.04LTS and compiled with g++ 9.3 under "-O3".
If you are able to submit your solution to that Github repository, I will make sure to run all the solutions and post results.