Low latency
persistence, logging, IPC
and more
Peter Lawrey
CEO, Principal Consultant
Higher Frequency Trading
Agenda
Who are we?
Libraries designed to be ultra-low GC.
When would you use them?
Sample code.
Who are we
Higher Frequency Trading is a small consulting
and software development house specialising in

Low latency, high throughput software

8 developers in Europe and USA.

Sponsor HFT related open source projects

Core Java engineering
Who am I?
Peter Lawrey
- CEO and Principal Consultant
- 3rd on Stack Overflow for Java,
with the most Java performance answers.
- Founder of the Performance Java User's Group
- An Australian, based in the U.K.
What is our OSS
Key OpenHFT projects

Chronicle, low latency logging, event store and
IPC. (record / log everything)

HugeCollections, cross process embedded
persisted data stores. (only need the latest)
Millions of operations per second.
Micro-second latency.
What is HFT?

No standard definition.

Trading faster than a human can see.

Being fast can make the difference between
making and losing money.

For different systems this means typical
latencies of between
− 10 micro-seconds and
− 10 milli-seconds.
(Latencies external to the provider)
Event driven processing
Trading systems use event driven processing to
minimise latency in a system.

Any data needed should already be loaded in
memory, not go off to a slow SQL database.

Each input event triggers a response, unless
there is a need to limit the output.
Simple Trading System
Critical Path
A trading system is designed around the critical
path. This has to be as short in terms of
latency as possible.

Critical path has a tight latency budget which
excludes many traditional databases.

Even the number of network hops can be
minimised.

The non-critical path can use traditional databases.
What is Chronicle?
Very fast embedded persistence for Java.
Functionality is simple and low level by design
Where does Chronicle come from

Low latency, high frequency trading
– Applications which are sub 100 micro-second
external to the system.
Where does Chronicle come from

High throughput trading systems
– Hundreds of thousands of events per second
Where does Chronicle come from

Modes of use
– GC free
– Lock-less
– Shared memory
– Text or binary
– Replicated over TCP
– Supports thread affinity
Use for Chronicle

Synchronous text logging
– support for SLF4J coming.

Synchronous binary data logging
Use for Chronicle

Messaging between processes
via shared memory

Messaging across systems
Use for Chronicle

Supports recording micro-second timestamps
across the systems

Replay for production data in test
Writing to Chronicle
IndexedChronicle ic = new IndexedChronicle(basePath);
Appender excerpt = ic.createAppender();
for (int i = 1; i <= runs; i++) {
    excerpt.startExcerpt();
    excerpt.writeUnsignedByte('M'); // message type
    excerpt.writeLong(i);           // e.g. time stamp
    excerpt.writeDouble(i);
    excerpt.finish();               // publish the excerpt
}
ic.close();
Reading from Chronicle
IndexedChronicle ic = new IndexedChronicle(basePath);
ic.useUnsafe(true); // for benchmarks
Tailer excerpt = ic.createTailer();
for (int i = 1; i <= runs; i++) {
    while (!excerpt.nextIndex()) {
        // busy wait
    }
    char ch = (char) excerpt.readUnsignedByte();
    long l = excerpt.readLong();
    double d = excerpt.readDouble();
    assert ch == 'M';
    assert l == i;
    assert d == i;
    excerpt.finish();
}
ic.close();
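The write and read loops above can be mimicked with nothing but the JDK, which may help show why persistence through a memory-mapped file is cheap. This is a minimal sketch under stated assumptions: the class `MappedLogSketch`, its method names and the fixed-size record layout are hypothetical, and it is not how Chronicle itself is implemented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustration only: plain JDK memory-mapped I/O, not Chronicle's implementation.
final class MappedLogSketch {
    // One record: type byte + long + double, packed back to back.
    static final int RECORD_SIZE = 1 + Long.BYTES + Double.BYTES;

    // Writes `runs` records through a memory mapping of the file.
    static void write(Path file, int runs) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) runs * RECORD_SIZE);
            for (int i = 1; i <= runs; i++) {
                buf.put((byte) 'M'); // message type
                buf.putLong(i);      // e.g. time stamp
                buf.putDouble(i);
            }
        }
    }

    // Reads the records back, checks them, and returns the sum of the long fields.
    static long readSum(Path file, int runs) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, (long) runs * RECORD_SIZE);
            long sum = 0;
            for (int i = 1; i <= runs; i++) {
                char type = (char) (buf.get() & 0xFF);
                long l = buf.getLong();
                double d = buf.getDouble();
                if (type != 'M' || l != i || d != i) {
                    throw new IllegalStateException("bad record " + i);
                }
                sum += l;
            }
            return sum;
        }
    }

    // Writes then reads a temporary file; returns the sum of the long fields.
    static long roundTrip(int runs) {
        try {
            Path file = Files.createTempFile("mapped-log", ".dat");
            file.toFile().deleteOnExit();
            write(file, runs);
            return readSum(file, runs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1000)); // sum of 1..1000
    }
}
```

The writer never issues an explicit write call per record; it stores into mapped memory and leaves flushing to the OS.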
Chronicle code
VanillaChronicle chronicle = new VanillaChronicle(baseDir);
// one per thread
ExcerptAppender appender = chronicle.createAppender();
// once per message
appender.startExcerpt();
appender.appendDateMillis(System.currentTimeMillis())
        .append(" - ").append(finalT)
Chronicle and replication
Replication is point to point (TCP)
Server A records an event
– replicates to Server B
Server B reads local copy
– B processes the event
Server B stores the result.
– replicates to Server A
Server A replies.
Round trip: 25 micro-seconds, 99% of the time
GC-free
Lock-less
Off heap
Unbounded
How does it recover?
Once finish() returns, the OS will do the rest.
If an excerpt is incomplete, it will be pruned.
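One way to picture "incomplete excerpts are pruned" is a record whose length header is written last. The sketch below is an assumption for illustration only: the class `CommitMarkerSketch` and its record format are hypothetical and are not Chronicle's actual file layout. A reader treats a zero header as "not finished" and stops there.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustration only: a hypothetical record format, not Chronicle's.
final class CommitMarkerSketch {
    // Writes the payload first and the length header last.
    // With commit == false the header stays 0, as if the writer died mid-record.
    // Note: in this sketch an empty message is indistinguishable from an incomplete one.
    static void append(ByteBuffer log, String message, boolean commit) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        int header = log.position();
        log.putInt(0);       // reserve the header; the record is still "incomplete"
        log.put(payload);
        if (commit) {
            log.putInt(header, payload.length); // publish: the record is now visible
        }
    }

    // Counts records from the start of the buffer up to the first incomplete one.
    static int countComplete(ByteBuffer log) {
        int pos = 0;
        int count = 0;
        while (pos + Integer.BYTES <= log.limit()) {
            int len = log.getInt(pos);
            if (len == 0) {
                break; // never finished: ignore it and everything after it
            }
            pos += Integer.BYTES + len;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        ByteBuffer log = ByteBuffer.allocate(64);
        append(log, "a", true);
        append(log, "b", true);
        append(log, "c", false); // simulated crash before finish
        System.out.println(countComplete(log)); // prints 2
    }
}
```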
Cache friendly
Data is laid out contiguously, naturally packed.
You can compress some types. Each entry
starts at the byte immediately after the previous one.
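As a sketch of what "compress some types" can mean, the code below packs non-negative longs with a variable-length encoding (7 data bits per byte, top bit set when more bytes follow), so small values take one byte instead of eight. The class `StopBitSketch` and its method names are hypothetical and written against a plain `ByteBuffer`; this is not taken from the Chronicle source.

```java
import java.nio.ByteBuffer;

// Illustration only: a generic variable-length integer encoding.
final class StopBitSketch {
    // Writes a non-negative long, 7 bits per byte, least significant group first.
    // The top bit of each byte is set while more bytes follow.
    static void write(ByteBuffer buf, long value) {
        if (value < 0) {
            throw new IllegalArgumentException("non-negative values only");
        }
        while (value > 0x7F) {
            buf.put((byte) (0x80 | (value & 0x7F)));
            value >>>= 7;
        }
        buf.put((byte) value); // final byte has the top bit clear
    }

    // Reads a value written by write().
    static long read(ByteBuffer buf) {
        long value = 0;
        int shift = 0;
        byte b;
        while ((b = buf.get()) < 0) { // negative byte means the top bit is set
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return value | ((long) b << shift);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(32);
        write(buf, 5);
        write(buf, 300);
        write(buf, 1_000_000);
        System.out.println(buf.position()); // prints 6 (1 + 2 + 3 bytes, versus 24 unpacked)
    }
}
```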
Consumer insensitive
No matter how slow the consumer is, the
producer never has to wait. It never needs to
clean messages before publishing (as a ring
buffer does).
You can start a consumer at the end of the day,
e.g. for reporting. The consumer can be more
than main memory's worth of data behind the
producer, as a Chronicle is not limited by main
memory.
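The difference from a ring buffer can be sketched with an append-only log in which each reader keeps its own index. The class `AppendOnlyLogSketch` is hypothetical, single-threaded and held on the heap purely for illustration; a Chronicle keeps the data in files instead, which is what lets a reader fall further behind than main memory.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: single-threaded, on-heap stand-in for an unbounded log.
final class AppendOnlyLogSketch {
    private final List<String> entries = new ArrayList<>();

    // The producer only ever appends; it never checks where any reader is.
    void append(String entry) {
        entries.add(entry);
    }

    // A reader can be created at any time and always starts from the first entry.
    Tailer createTailer() {
        return new Tailer();
    }

    final class Tailer {
        private int index; // private to this reader, invisible to the producer

        // Returns the next entry, or null if this reader has caught up.
        String next() {
            return index < entries.size() ? entries.get(index++) : null;
        }
    }

    public static void main(String[] args) {
        AppendOnlyLogSketch log = new AppendOnlyLogSketch();
        log.append("open");
        log.append("trade");
        log.append("close");
        Tailer endOfDay = log.createTailer(); // started after all the writes
        for (String e = endOfDay.next(); e != null; e = endOfDay.next()) {
            System.out.println(e);
        }
    }
}
```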
How does it collect garbage?
There is an assumption that your application has a daily
or weekly maintenance cycle.
This is implemented by closing the files and creating new ones,
i.e. the whole lot is moved, compressed or deleted.
Anything which must be retained can be copied to the new Chronicle.
Is there a lower level API?
Chronicle 2.0 is based on the OpenHFT Java Lang
library, which supports access to 64-bit native
memory.

Has long size and offsets.

Supports serialization and deserialization

Thread safe access including locking
Is there a higher level API?
You can hide the low level details with an
interface.
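A minimal sketch of the idea, assuming a hypothetical `PriceListener` interface and a plain `ByteBuffer` standing in for an excerpt: the writer implements the interface by encoding each call, and the reader decodes each message and calls the same interface on the real handler. None of these names come from Chronicle.

```java
import java.nio.ByteBuffer;

// Illustration only: hypothetical names, with a ByteBuffer standing in for an excerpt.
final class InterfaceSketch {
    // The only type application code needs to see.
    interface PriceListener {
        void onPrice(long timestamp, double price);
    }

    // Writer side: each method call becomes one binary message.
    static final class PriceWriter implements PriceListener {
        private final ByteBuffer buf;

        PriceWriter(ByteBuffer buf) {
            this.buf = buf;
        }

        @Override
        public void onPrice(long timestamp, double price) {
            buf.put((byte) 'P').putLong(timestamp).putDouble(price);
        }
    }

    // Reader side: each message becomes one method call; returns the message count.
    static int replay(ByteBuffer buf, PriceListener listener) {
        int count = 0;
        while (buf.hasRemaining()) {
            byte type = buf.get();
            if (type != 'P') {
                throw new IllegalStateException("unknown message type " + type);
            }
            long timestamp = buf.getLong();
            double price = buf.getDouble();
            listener.onPrice(timestamp, price);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        PriceListener out = new PriceWriter(buf);
        out.onPrice(1L, 100.5);
        out.onPrice(2L, 101.0);
        buf.flip();
        replay(buf, (ts, px) -> System.out.println(ts + " " + px));
    }
}
```

Because both sides share the interface, the same handler can be driven directly in a unit test or from recorded data.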
Is there a higher level API?
There is a demo program with a simple interface.
This models a “hub” process which takes in events,
processes them and publishes results.
HugeCollections
HugeCollections provides key-value storage.

Persisted (by the OS)

Embedded in multiple processes

Concurrent reads and writes

Off heap accessible without serialization.
Creating a SharedHashMap

Uses a builder to create the map as there are
a number of options.
Updating an entry in the SHM

Create an off heap reference from an interface
and update it as if it were on the heap
Accessing a SHM entry

Accessing an entry looks like normal Java
code, except arrays use a method xxxAt(n)
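A hand-written sketch of what an off heap reference behind an interface might look like, including an `xxxAt(n)` style accessor for an array field. The `Position` interface, its field layout and the `at` factory are assumptions made for illustration, and a direct `ByteBuffer` stands in for the shared memory; the slides do not show how SharedHashMap produces such references.

```java
import java.nio.ByteBuffer;

// Illustration only: hypothetical interface and layout over a direct ByteBuffer.
final class OffHeapSketch {
    // Looks like a normal bean, except the array field uses getPriceAt(n) / setPriceAt(n, v).
    interface Position {
        long getQuantity();
        void setQuantity(long quantity);
        double getPriceAt(int n);
        void setPriceAt(int n, double price);
    }

    static final int PRICES = 4;
    // Layout of one entry: 8 bytes quantity, then PRICES doubles.
    static final int SIZE = Long.BYTES + PRICES * Double.BYTES;

    // Returns a view of the entry at `offset`; no data is copied onto the heap.
    static Position at(ByteBuffer memory, int offset) {
        return new Position() {
            @Override public long getQuantity() { return memory.getLong(offset); }
            @Override public void setQuantity(long quantity) { memory.putLong(offset, quantity); }
            @Override public double getPriceAt(int n) { return memory.getDouble(priceOffset(n)); }
            @Override public void setPriceAt(int n, double price) { memory.putDouble(priceOffset(n), price); }

            private int priceOffset(int n) {
                if (n < 0 || n >= PRICES) {
                    throw new IndexOutOfBoundsException("n=" + n);
                }
                return offset + Long.BYTES + n * Double.BYTES;
            }
        };
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocateDirect(2 * SIZE); // room for two entries
        Position second = at(memory, SIZE);
        second.setQuantity(42);
        second.setPriceAt(3, 1.5);
        System.out.println(second.getQuantity() + " @ " + second.getPriceAt(3));
    }
}
```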
Why use SHM?

Shared between processes

Persisted, or “written” to tmpfs e.g. /dev/shm

Can be GC-less, so no impact on pause
times.

As little as 1/5th of the memory of
ConcurrentHashMap

TCP/UDP multi-master replication planned.
HugeCollections and throughput
SharedHashMap tested on a machine with 128
GB, 16 cores, 32 threads.
String keys, 64-bit long values.

10 million key-values updated at 37 M/s

500 million key-values updated at 23 M/s

On tmpfs, 2.5 billion key-values at 26 M/s
HugeCollections and latency
For a Map of small key-values (both 64-bit longs)
With an update rate of 1 M/s, one thread.
Percentile              100K entries   1 M entries   10 M entries
50% (typical)           0.1 μsec       0.2 μsec      0.2 μsec
90% (worst 1 in 10)     0.4 μsec       0.5 μsec      0.5 μsec
99% (worst 1 in 100)    4.4 μsec       5.5 μsec      7 μsec
99.9%                   9 μsec         10 μsec       10 μsec
99.99%                  10 μsec        12 μsec       13 μsec
worst                   24 μsec        29 μsec       26 μsec
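For readers who want to produce a latency distribution like this for their own code, here is a minimal sketch of computing percentiles from recorded samples using the nearest-rank method. The class `PercentileSketch` is hypothetical, and the slides do not say how the figures above were measured or calculated.

```java
import java.util.Arrays;

// Illustration only: nearest-rank percentiles over recorded latencies.
final class PercentileSketch {
    // Returns the smallest sample such that at least pct% of samples are <= it.
    static long percentile(long[] samplesNanos, double pct) {
        if (samplesNanos.length == 0 || pct <= 0 || pct > 100) {
            throw new IllegalArgumentException("need samples and 0 < pct <= 100");
        }
        long[] sorted = samplesNanos.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] samples = new long[1000];
        for (int i = 0; i < samples.length; i++) {
            long start = System.nanoTime();
            Math.sqrt(i); // stand-in for the operation being timed
            samples[i] = System.nanoTime() - start;
        }
        for (double pct : new double[] {50, 90, 99, 99.9, 100}) {
            System.out.println(pct + "%: " + percentile(samples, pct) + " ns");
        }
    }
}
```

Reporting the high percentiles and the worst case, as the table does, shows the outliers that an average would hide.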
Performance of CHM
With a 30 GB heap, 12 updates per entry
Performance of SHM
With a 64 MB heap, 12 updates per entry, no GCs
Bonus topic: Units
At peak times an application writes 49 “mb/s” to a
disk which supports 50 “mb/s” and is replicated
over a 100 “mb/s” network.
What units were probably intended and where
would you expect buffering if any?
Bonus topic: Units
At peak times an application writes 49 MiB/s to a
disk which supports 50 MB/s and is replicated
over a 100 Mb/s network.
MiB = 1024^2 bytes
MB = 1000^2 bytes
Mb = 125,000 bytes
The 49 MiB/s is the highest rate and 100 Mb/s is
the lowest.
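The comparison is easier to check once all three rates are in bytes per second. A small worked sketch (the class and constant names are only for illustration):

```java
// Illustration only: converts the three rates to bytes per second.
final class UnitsSketch {
    static final long MIB = 1024L * 1024L;     // mebi-byte, in bytes
    static final long MB = 1000L * 1000L;      // mega-byte, in bytes
    static final long MBIT = 1000L * 1000L / 8; // mega-bit, in bytes (125,000)

    static long bytesPerSecond(long amount, long unitInBytes) {
        return amount * unitInBytes;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerSecond(49, MIB));   // prints 51380224 (application writes)
        System.out.println(bytesPerSecond(50, MB));    // prints 50000000 (disk)
        System.out.println(bytesPerSecond(100, MBIT)); // prints 12500000 (network)
    }
}
```

At peak the application produces slightly more than the disk can absorb and about four times what the network can carry, so data would build up ahead of the disk and, much faster, ahead of the network.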
Bonus topic: Units
Unit                         Bandwidth                                Used for
mb - milli-bit               mb/s - milli-bits per second             ?
mB - milli-byte              mB/s - milli-bytes per second            ?
kb - kilo-bit (1000)         kb/s - kilo-bits (baud) per second       Dial up bandwidth
kB - kilo-byte (1000)        kB/s - kilo-bytes per second             ?
Mb - mega-bit (1000^2)       Mb/s - mega-bits (baud) per second       Cat 5 ethernet
MB - mega-byte (1000^2)      MB/s - mega-bytes per second             Disk bandwidth
Mib - mebi-bit (1024^2)      Mib/s - mebi-bits per second             ?
MiB - mebi-byte (1024^2)     MiB/s - mebi-bytes per second            Memory bandwidth
Gb - giga-bit (1000^3)       Gb/s - giga-bits (baud) per second       High speed networks
GB - giga-byte (1000^3)      GB/s - giga-bytes per second             -
Gib - gibi-bit (1024^3)      Gib/s - gibi-bits per second             -
GiB - gibi-byte (1024^3)     GiB/s - gibi-bytes per second            Memory bandwidth
Q & A
https://github.com/OpenHFT/OpenHFT
@PeterLawrey
peter.lawrey@higherfrequencytrading.com

Open HFT libraries in @Java
