Performance Optimization Techniques of MessagePack-Ruby
Sadayuki Furuhashi
RubyKaigi 2019
Tweet your msgpack usage with #MyMessagePack
About me
A founder of Treasure Data, Inc.
Located in Silicon Valley, USA.
OSS Hacker. Github: @frsyuki
Basics of MessagePack
What’s MessagePack?
It’s like JSON, but fast and small.
JSON (27 bytes):
{"compact":true,"schema":0}
MessagePack (18 bytes, 33% smaller):
82          2-element map
A7 compact  7-byte string
C3          true
A6 schema   6-byte string
00          0
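The byte counts on this slide can be checked with a few lines of Ruby. This is a minimal sketch that uses only the standard library: the MessagePack bytes are assembled by hand from the breakdown above rather than produced by the msgpack gem, and the variable names are illustrative.

```ruby
require "json"

# The object from the slide.
obj = { "compact" => true, "schema" => 0 }

# JSON text without optional whitespace.
json = JSON.generate(obj)

# MessagePack bytes, assembled by hand from the slide's breakdown.
msgpack = [
  0x82,                   # fixmap header: 2 entries
  0xA7, *"compact".bytes, # fixstr header (7 bytes) followed by the payload
  0xC3,                   # true
  0xA6, *"schema".bytes,  # fixstr header (6 bytes) followed by the payload
  0x00                    # positive fixint 0
].pack("C*")

saving = 100.0 * (json.bytesize - msgpack.bytesize) / json.bytesize

puts "JSON:        #{json.bytesize} bytes"
puts "MessagePack: #{msgpack.bytesize} bytes"
puts format("Saving:      %.1f%%", saving)
```

The saving is (27 - 18) / 27, or one third of the JSON size.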
What’s MessagePack?
It’s like JSON, but fast and small.
JSON
> Self-descriptive, Schema-on-Read semantics
> Everyone knows it
> De facto standard data format
> Human-readable
MessagePack
> Self-descriptive, Schema-on-Read semantics
> Everyone who uses JSON can use it
> Drop-in improvement over JSON
> Machine-readable
Language Agnostic Type System
Language Types (Ruby, Swift, Java, Go, …): String, Timestamp
MessagePack Type System (JSON compatible): String, Timestamp
MessagePack Format:
  String: fixstr, str 8, str 16, str 32
  Timestamp: timestamp 32, timestamp 64, timestamp 96
Convert: Language Types <-> MessagePack Type System <-> MessagePack Format
Supported by Super Skilled Engineers All Over The World
Ruby /msgpack
MessagePack-Ruby Major Committers
Real World MessagePack
We import over 2,000,000 records/sec and store 30PB of data in MessagePack format. 15 trillion rows processed every day.
Sada Furuhashi, Arm Treasure Data
MessagePack is an essential component of Fluentd to achieve high performance and flexibility at the same time.
Sada Furuhashi, Initial Creator of Fluentd
Adoption of MessagePack Today
Mobile Apps
Microprocessors
Automotive Telematics
Sensors
Cloud Infrastructure
Middleware
Machine Learning
Games
Analytical Databases
MessagePack
Zero-overhead,
heterogeneous
data exchange
MessagePack-Ruby implementation
MessagePack::Packer
Object:
[
1,
2,
{"k" => "v"},
nil
]
Buffer (written one element at a time):
94      4-element array
01      Integer 1
02      Integer 2
81      1-element map
A1 'k'  1-byte string
A1 'v'  1-byte string
C0      nil
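The walk over the object can be sketched as a small recursive method. This is not the MessagePack::Packer implementation (which is written in C); `tiny_pack` is a hypothetical helper that only handles the compact "fix" formats appearing on the slide.

```ruby
# Minimal packer for the types on the slide: nil, booleans, integers 0..127,
# strings up to 31 bytes, and arrays/hashes with up to 15 entries.
def tiny_pack(obj, out = String.new(encoding: Encoding::BINARY))
  case obj
  when nil   then out << 0xC0
  when true  then out << 0xC3
  when false then out << 0xC2
  when Integer
    raise ArgumentError, "only 0..127 supported" unless (0..127).cover?(obj)
    out << obj                                # positive fixint: the value itself
  when String
    raise ArgumentError, "string too long" if obj.bytesize > 31
    out << (0xA0 | obj.bytesize) << obj.b     # fixstr header, then the bytes
  when Array
    raise ArgumentError, "array too long" if obj.size > 15
    out << (0x90 | obj.size)                  # fixarray header
    obj.each { |element| tiny_pack(element, out) }
  when Hash
    raise ArgumentError, "map too large" if obj.size > 15
    out << (0x80 | obj.size)                  # fixmap header
    obj.each do |key, value|
      tiny_pack(key, out)
      tiny_pack(value, out)
    end
  else
    raise ArgumentError, "unsupported type: #{obj.class}"
  end
  out
end

packed = tiny_pack([1, 2, { "k" => "v" }, nil])
puts packed.unpack1("H*").scan(/../).join(" ").upcase
```

Each container writes its header first and then recurses into its children, which is why the buffer fills in the same order as the slide's animation.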
MessagePack::Unpacker
Buffer: 94 01 02 81 A1 'k' A1 'v' C0
Stack after each read:
94      4-element array   [ □, □, □, □ ]
01      Integer 1         complete object -> [ 1, □, □, □ ]
02      Integer 2         complete object -> [ 1, 2, □, □ ]
81      1-element map     [ 1, 2, □, □ ]  { □ => □ }
A1 'k'  1-byte string     complete object -> [ 1, 2, □, □ ]  { "k" => □ }
A1 'v'  1-byte string     complete object -> [ 1, 2, □, □ ]  { "k" => "v" }
        { "k" => "v" }    complete object -> [ 1, 2, {"k"=>"v"}, □ ]
C0      nil               complete object -> [ 1, 2, {"k"=>"v"}, nil ]
        [1, 2, {"k"=>"v"}, nil]  Complete object
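The stack discipline above can be sketched in plain Ruby. This is an illustrative reimplementation under simplifying assumptions, not the gem's C code: `tiny_unpack` is a hypothetical name, and only the "fix" formats, nil, and booleans are handled.

```ruby
def tiny_unpack(bytes)
  input = bytes.b
  pos = 0
  stack = []           # one frame per container still waiting for children
  no_key = Object.new  # sentinel: "the next item read is a map key"

  loop do
    raise EOFError, "truncated input" if pos >= input.bytesize
    head = input.getbyte(pos)
    pos += 1
    obj = nil

    case head
    when 0x00..0x7F                  # positive fixint
      obj = head
    when 0x80..0x9F                  # fixmap (0x80..0x8F) or fixarray (0x90..0x9F)
      is_map = head < 0x90
      count = head & 0x0F
      container = is_map ? {} : []
      if count > 0
        slots = is_map ? count * 2 : count   # a map waits for keys and values
        stack.push({ container: container, remaining: slots, key: no_key })
        next                                 # read the first child
      end
      obj = container                        # empty containers are complete
    when 0xA0..0xBF                  # fixstr
      len = head & 0x1F
      raise EOFError, "truncated string" if pos + len > input.bytesize
      obj = input.byteslice(pos, len).force_encoding(Encoding::UTF_8)
      pos += len
    when 0xC0 then obj = nil
    when 0xC2 then obj = false
    when 0xC3 then obj = true
    else
      raise ArgumentError, format("unsupported type byte 0x%02X", head)
    end

    # A complete object: hand it to the container on top of the stack.
    # Completing a container pops it and repeats one level up.
    loop do
      return obj if stack.empty?
      frame = stack.last
      if frame[:container].is_a?(Array)
        frame[:container] << obj
      elsif frame[:key].equal?(no_key)
        frame[:key] = obj
      else
        frame[:container][frame[:key]] = obj
        frame[:key] = no_key
      end
      frame[:remaining] -= 1
      break if frame[:remaining] > 0
      obj = stack.pop[:container]
    end
  end
end

p tiny_unpack(["94010281a16ba176c0"].pack("H*"))
```

Using an explicit stack instead of recursion means parsing can stop and resume whenever the buffer runs dry, which is what a streaming unpacker needs.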
Optimization
MessagePack::Buffer
Object:
{
"count" => 1,
"page" => 0,
"body" => [
"LONG TEXT"
]
}
MessagePack::Buffer is a linked list of chunks (fields: next, mapped_string, mem).
Chunk 1: 83 A5 'c' 'o' 'u' 'n' 't' 01 A4 'p' 'a' 'g' 'e' 00 A4 'b' 'o' 'd' 'y'
Chunk 2: 91 A9 'L' 'O' 'N' 'G' ' ' 'T' 'E' 'X' 'T'
Add more buffer chunks instead of calling realloc().
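The idea of growing by adding chunks can be sketched in Ruby. The class below is a hypothetical illustration, not the gem's C structure: it models chunks as an Array of Strings rather than a linked list, and the 16-byte capacity is chosen only so that the slide's example spills into a second chunk.

```ruby
# A write buffer made of fixed-capacity chunks. When the current chunk is
# full, a fresh chunk is appended; bytes already written are never moved.
class ChunkedBuffer
  CHUNK_CAPACITY = 16 # deliberately tiny for the demonstration

  attr_reader :chunks

  def initialize
    @chunks = [new_chunk]
  end

  # Appends bytes, spilling into new chunks as needed.
  def write(bytes)
    data = bytes.b
    offset = 0
    while offset < data.bytesize
      @chunks << new_chunk if @chunks.last.bytesize >= CHUNK_CAPACITY
      room = CHUNK_CAPACITY - @chunks.last.bytesize
      part = data.byteslice(offset, room)
      @chunks.last << part
      offset += part.bytesize
    end
    self
  end

  # Concatenates the chunks only when the caller asks for the whole payload.
  def to_s
    @chunks.join.b
  end

  private

  def new_chunk
    String.new(capacity: CHUNK_CAPACITY, encoding: Encoding::BINARY)
  end
end

header = ["83a5636f756e7401a47061676500a4626f6479"].pack("H*") # map, "count", "page", "body"
body   = ["91a9"].pack("H*") + "LONG TEXT".b                     # 1-element array, 9-byte string

buf = ChunkedBuffer.new
buf.write(header).write(body)
puts "chunks: #{buf.chunks.map(&:bytesize).inspect}"
puts "total:  #{buf.to_s.bytesize} bytes"
```

With realloc() a growing buffer may have to be copied to a new location each time it runs out of room; with chunks, the cost of growing is one small allocation.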
Zero-copy write optimization
Object:
{
"count" => 1,
"page" => 0,
"body" => [
"LONG TEXT"
]
}
MessagePack::Buffer (chunk fields: next, mapped_string, mem)
Chunk 1: 83 A5 'c' 'o' 'u' 'n' 't' 01 A4 'p' 'a' 'g' 'e' 00 A4 'b' 'o' 'd' 'y'
Chunk 2: 91 A9
Chunk 3: "LONG TEXT", referenced through rb_str_dup() (fast copy-on-write)
Zero-copy read optimization
Source String:
"\x83\xA5count\x01\xA4page\x00\xA4body\x91\xA9LONG TEXT"
MessagePack::Buffer (chunk fields: next, mapped_string, mem)
The buffer maps the source string with rb_str_dup(): fast copy-on-write.
Object:
{
"count" => 1,
"page" => 0,
"body" => [
"LONG TEXT"
]
}
"LONG TEXT" is taken from the mapped string with rb_str_substr(): fast copy-on-write
※ if SHARABLE_SUBSTRING_P() returns true
Reserved memory pool
MessagePack::Buffer (chunk fields: next, mapped_string, mem)
Chunk data: 83 A5 'c' 'o' 'u' 'n' 't' 01 A4 'p' 'a' 'g' 'e' A4 'b' 'o' 'd' 'y' 91 A9
Global memory pool:
4KB 4KB
4KB 4KB
4KB 4KB
4KB 4KB
Benchmark & further optimization
Benchmark Data Sets
DB (http://data.consumerfinance.gov/api/views.json)
[{
"id" : "gfmg-6ppu",
"name" : "Mortgage Complaints",
"averageRating" : 0,
"createdAt" : 1433953219,
"moderationStatus" : true,
"numberOfComments" : 0,
"description" : "Each week we …",
…
Integers
100_000.times.map {|i| r.rand(i+1) }
[
0,0,0,1,3,0,0,
…,
42991,26906,18655,7015
]
Blogs (https://www.justice.gov/api/v1/blog_entries.json)
{
"results": [
{
"attachments":[],
"body":
"Dear Friends and Colleagues,\n\n
I always look forward to the…
Benchmark code available at
https://gist.github.com/frsyuki/9777c4adba2b5c957695b64f17b64ba1
MessagePack vs JSON
[Bar charts: serialization time and deserialization time for the DB, Integers, and Blogs data sets, comparing MessagePack, JSON (Oj), and JSON (JSON). Axis from 0 to 3000, with a 100% mark.]
MessagePack without optimization
[Bar charts: serialization time and deserialization time for the DB, Integers, and Blogs data sets, comparing Default, No memory pool, and No zero-copy read. Axis from 0 to 150, with a 100% mark.]
Zero-copy read optimization (recap)
Source String -> MessagePack::Buffer: rb_str_dup(), fast copy-on-write
MessagePack::Buffer -> Object: rb_str_substr(), fast copy-on-write
※ if SHARABLE_SUBSTRING_P() returns true
Copy-on-write substring
SHARABLE_SUBSTRING_P() returns true only when the substring shares the trailing \0 terminator with the original string.
# The slice runs to the end of s
s = "a" * 1_000_100
10_000.times do
  s.slice(100, 1_000_000)
end
#=> 0.002 sec
# The slice stops one character short of the end of s
s = "a" * 1_000_100
10_000.times do
  s.slice(100, 1_000_000 - 1)
end
#=> 2.7 sec
Not including the last character disables copy-on-write.
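To reproduce the comparison, the two loops can be wrapped in a timing harness. This is a sketch using the standard `benchmark` library with fewer iterations than the slide; absolute numbers, and possibly the size of the gap, depend on the Ruby version and on how the binary was built.

```ruby
require "benchmark"

length = 1_000_000
source = "a" * (length + 100)
rounds = 1_000

# Slice that runs to the end of the source string.
to_end = Benchmark.realtime do
  rounds.times { source.slice(100, length) }
end

# Slice that stops one character short of the end.
short_of_end = Benchmark.realtime do
  rounds.times { source.slice(100, length - 1) }
end

puts format("slice to end:        %.4f sec", to_end)
puts format("slice short of end:  %.4f sec", short_of_end)
```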
Copy-on-write substring
SHARABLE_SUBSTRING_P() returns true only when the substring shares the trailing \0 terminator with the original string, OR when Ruby is compiled with the SHARABLE_MIDDLE_SUBSTRING flag (not enabled by default).
s = "a" * 1_000_100
10_000.times do
  s.slice(100, 1_000_000)
end
#=> 0.002 sec
s = "a" * 1_000_100
10_000.times do
  s.slice(100, 1_000_000 - 1)
end
#=> 2.7 sec by default
#=> 0.002 sec using a ruby binary compiled with SHARABLE_MIDDLE_SUBSTRING
Deserialization with copy-on-write substring
[Bar charts: serialization time and deserialization time for the DB, Integers, and Blogs data sets, comparing Default and With SHARABLE_MIDDLE_SUBSTRING. Axis from 0 to 150, with a 100% mark.]
What is deserialization bottleneck?
[Bar chart: deserialization objects/sec, axis from 0 to 70,000,000, for DB, Integers, Blogs, Boolean, (2^62)-1, and (2^62).]
Immediate: Integers, Boolean, (2^62)-1
On-Heap: DB, Blogs, (2^62)
=> Object allocation is slow
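The immediate versus on-heap split can be observed from Ruby itself. The sketch below assumes 64-bit CRuby, where integers up to (2^62)-1 are stored inside the reference; `heap_allocated?` is a hypothetical helper that relies on `ObjectSpace.memsize_of` reporting 0 for such special constants.

```ruby
require "objspace"

# Heuristic: special constants (small integers, true, false, nil) occupy no
# heap slot, so memsize_of reports 0 for them.
def heap_allocated?(value)
  ObjectSpace.memsize_of(value) > 0
end

samples = {
  "true"      => true,
  "nil"       => nil,
  "1"         => 1,
  "(2**62)-1" => (2**62) - 1,
  "2**62"     => 2**62,
  "2**64"     => 2**64,
  '"text"'    => "text".dup,
  "{}"        => {}
}

samples.each do |label, value|
  kind = heap_allocated?(value) ? "on-heap" : "immediate"
  puts format("%-10s %s", label, kind)
end
```

Every on-heap value costs an allocation during deserialization, which matches the slide's conclusion that object allocation is the slow part.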
Reusing Hash key objects
data: [
{
"id": "s6ew-h6mp",
"name": "Consumer Complaints",
…
},
{
"id": "nsyy-je5y",
"name": "Beta Consumers",
…
},
{
"id": "wkue-ycpk",
"name": "Survey",
…
}
]
The same keys repeat.
String keys of a Hash are always frozen.
=> We can reuse objects!
How to reuse Hash key objects?
Use fstring. Ruby uses it to reuse the same object for equal immutable strings (but the C API is not available…yet):
p "a".object_id == "a".object_id
#=> false
# frozen_string_literal: true
p "a".object_id == "a".object_id
#=> true
p((-"a").object_id == (-"a").object_id)
#=> true
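The effect on repeated Hash keys can be sketched at the Ruby level with `String#-@`, which returns a frozen, deduplicated string on Ruby 2.5 and later. This illustrates the idea only; the talk's measurements used a modified Ruby binary, and the variable names here are made up for the example.

```ruby
# Pretend these keys were just decoded from three records: each record
# yields a fresh, unfrozen String for "id" and for "name".
decoded_keys = 3.times.flat_map { ["id".dup, "name".dup] }

# Without interning, every decoded key is its own object.
distinct_before = decoded_keys.map(&:object_id).uniq.size

# With interning, equal keys collapse into one shared frozen object each.
interned_keys = decoded_keys.map { |key| -key }
distinct_after = interned_keys.map(&:object_id).uniq.size

puts "distinct key objects without interning: #{distinct_before}"
puts "distinct key objects with interning:    #{distinct_after}"

# Hash#[]= stores an already-frozen String key as-is, so the shared object
# ends up in the Hash without another copy.
record = {}
record[interned_keys.first] = "s6ew-h6mp"
puts "Hash key is the shared object: #{record.keys.first.equal?(interned_keys.first)}"
```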
Reusing Hash key objects using fstring
[Bar charts: serialization time (axis 0 to 150, with a 100% mark) and deserialization time (axis 0 to 150) for the DB, Integers, and Blogs data sets, comparing Default and Hash key fstring using hacked Ruby binary.]
Link Time Optimization
[Bar charts: serialization time (axis 0 to 150, with a 100% mark) and deserialization time (axis 0 to 150) for the DB, Integers, and Blogs data sets, comparing All optimization and And -flto=thin.]
Reading other MessagePack implementations
msgpack-java: TypeProfile bypassing
This code has significant overhead:
interface MessageBuffer {
    public void putInt(int index, int value);
}
class MessageBufferBE implements MessageBuffer {
    public void putInt(int index, int value)
    {
        byteBuffer.putInt(index, value);
    }
}
Dynamic method lookup (slow even after JIT)
msgpack-java: TypeProfile bypassing
Much faster code:
class MessageBuffer {
    public void putInt(int index, int value)
    {
        // JVM intrinsics (1-to-1 mapping to CPU instruction, no function call)
        int v = Integer.reverseBytes(value);
        unsafe.putInt(base, address + index, v);
    }
    public static MessageBuffer newInstance() {
        // Load inherited class lazily.
        // No override on little-endian machine.
        // …
    }
}
No override. JVM JIT inlines them.
MessagePack-CSharp: Lookup cache of optimized Ser/De
class User {
    public int Age { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}
[
31,
"Sadayuki",
"Furuhashi"
]
Mapping between class and semi-structured data
Java: Jackson databind, JAXB, …
C#: MessagePack-CSharp, System.Runtime.Serialization, …
Swift: Codable, SwiftMsgPack, …
…
MessagePack-CSharp: Native LZ4 integration
It's like JSON, but fast and small.
Sadayuki Furuhashi
Tweet your msgpack usage with #MyMessagePack

More Related Content

Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019

  • 1. Performance Optimization Techniques of MessagePack-Ruby Sadayuki Furuhashi RubyKaigi 2019 #MyMessagePack Tweet your msgpack usage with
  • 2. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 About me A founder of Treasure Data, Inc. Located in Silicon Valley, USA. OSS Hacker. Github: @frsyuki
  • 3. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 1 1 0 0 0 1 1 0 Basics of MessagePack
  • 4. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 What’s MessagePack? { “compact”: true, “schema”: 0 } 82 A7 compact C3 A6 schema 00 JSON MessagePack It’s like JSON, but fast and small. 7-byte string2-element map true 6-byte string 0 27 bytes 18 bytes (34% smaller)
  • 5. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 What’s MessagePack? It’s like JSON, but fast and small. > Self-descriptive, Schema-on-Read semantics > Everyone knows > De facto standard data format > Human-readable > Self-descriptive, Schema-on-Read semantics > Everyone who uses JSON can use > Drop-in improvement of JSON > Machine-readable JSON MessagePack
  • 6. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 timestamp 32 timestamp 64 timestamp 96 Language Agnostic Type System MessagePack
 Type System MessagePack
 Format String Timestamp Language Types fixstr str 8 str 16 str 32 String Timestamp (JSON compatible)(Ruby, Swift, Java, Go, …) Convert
  • 7. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 Supported by Super Skilled Engineers All Over The World Ruby /msgpack
  • 8. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack-Ruby Major Committers
  • 9. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 Real World MessagePack We import over 2,000,000 records/sec and
 store 30PB of data in MessagePack format. 15 trillion rows processed every day. Sada Furuhashi, Arm Treasure Data Sada Furuhashi, Initial Creator of Fluentd MessagePack is an essential component of Fluentd to achieve high performance and flexibility at the same time.
  • 10. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 Adoption of MessagePack Today Mobile Apps Microprocessors Automotive Telematics Sensors Cloud Infrastructure Middleware Machine Learning Games Analytical Databases MessgePack Zero-overhead, heterogeneous data exchange
  • 11. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 1 1 0 0 0 1 1 0 MessagePack-Ruby implementation
  • 12. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Packer Object Buffer [ 1, 2, {“k” => “v”}, nil ] 94 4-element array
  • 13. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Packer Object Buffer [ 1, 2, {“k” => “v”}, nil ] 94 4-element array 01 Integer 1
  • 14. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Packer Object Buffer [ 1, 2, {“k” => “v”}, nil ] 94 4-element array 01 Integer 1 02 Integer 2
  • 15. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Packer Object Buffer [ 1, 2, {“k” => “v”}, nil ] 94 4-element array 01 Integer 1 02 Integer 2 81 1-element map
  • 16. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Packer Object Buffer [ 1, 2, {“k” => “v”}, nil ] 94 01 02 81 A1 1-byte string 4-element array Integer 1 Integer 2 1-element map
  • 17. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Packer Object Buffer [ 1, 2, {“k” => “v”}, nil ] 94 01 02 81 ‘k’ ‘k’ A1 1-byte string 4-element array Integer 1 Integer 2 1-element map
  • 18. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Packer Object Buffer [ 1, 2, {“k” => “v”}, nil ] 94 01 02 81 ‘k’ ‘k’ A1 1-byte string ‘v’ ‘v’ A1 1-byte string 4-element array Integer 1 Integer 2 1-element map
  • 19. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Packer Object Buffer [ 1, 2, {“k” => “v”}, nil ] 94 01 02 81 ‘k’ A1 ‘v’ A1 C0 nil 4-element array Integer 1 Integer 2 1-element map ‘k’ 1-byte string ‘v’ 1-byte string
  • 20. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Unpacker Buffer Stack 94 01 02 81 ‘k’ A1 ‘v’ A1 C0 [ □, □, □, □ ] 4-element array
  • 21. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Unpacker Buffer Stack 94 01 02 81 ‘k’ A1 ‘v’ A1 C0 [ □, □, □, □ ] 1 Integer 1
  • 22. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Unpacker Buffer Stack 94 01 02 81 ‘k’ A1 ‘v’ A1 C0 [ 1, □, □, □ ] 1 complete object
  • 23. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Unpacker Buffer Stack 94 01 02 81 ‘k’ A1 ‘v’ A1 C0 [ 1, □, □, □ ] 2 Integer 2
  • 24. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Unpacker Buffer Stack 94 01 02 81 ‘k’ A1 ‘v’ A1 C0 [ 1, 2, □, □ ] 2 complete object
  • 25. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Unpacker Buffer Stack 94 01 02 81 ‘k’ A1 ‘v’ A1 C0 [ 1, 2, □, □ ] { □ => □ }1-element map
  • 26. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Unpacker Buffer Stack 94 01 02 81 ‘k’ A1 ‘v’ A1 C0 [ 1, 2, □, □ ] { □ => □ } “k”1-byte string
  • 27. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Unpacker Buffer Stack 94 01 02 81 ‘k’ A1 ‘v’ A1 C0 [ 1, 2, □, □ ] { “k” => □ } “k” complete object
  • 28. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Unpacker Buffer Stack 94 01 02 81 ‘k’ A1 ‘v’ A1 C0 [ 1, 2, □, □ ] { “k” => □ } “v” 1-byte string
  • 29. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Unpacker Buffer Stack 94 01 02 81 ‘k’ A1 ‘v’ A1 C0 [ 1, 2, □, □ ] { “k” => “v” } “v” complete object
  • 30. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Unpacker Buffer Stack 94 01 02 81 ‘k’ A1 ‘v’ A1 C0 [1, 2,{“k”=>“v”}, □ ] { “k” => “v” } complete object
  • 31. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Unpacker Buffer Stack 94 01 02 81 ‘k’ A1 ‘v’ A1 C0 [1, 2,{“k”=>“v”}, □ ] nil nil
  • 32. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Unpacker Buffer Stack 94 01 02 81 ‘k’ A1 ‘v’ A1 C0 [1, 2,{“k”=>“v”},nil] nil complete object
  • 33. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 MessagePack::Unpacker Buffer Stack 94 01 02 81 ‘k’ A1 ‘v’ A1 C0 [1, 2,{“k”=>“v”},nil] Complete object
  • 34. 1 1 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 1 1 0 0 0 1 1 0 Optimization
  • 35. MessagePack::Buffer. Object: { “count” => 1, “page” => 0, “body” => [ “LONG TEXT” ] }. One chunk (next, mapped_string, mem). Bytes: 83 A5 ‘c’ ‘o’ ‘u’ ‘n’ ‘t’ 01 A4 ‘p’ ‘a’ ‘g’ ‘e’ A4 ‘b’ ‘o’ ‘d’ ‘y’
  • 36. MessagePack::Buffer. Object: { “count” => 1, “page” => 0, “body” => [ “LONG TEXT” ] }. Two chunks (each: next, mapped_string, mem). Bytes: 83 A5 ‘c’ ‘o’ ‘u’ ‘n’ ‘t’ 01 A4 ‘p’ ‘a’ ‘g’ ‘e’ A4 ‘b’ ‘o’ ‘d’ ‘y’ 91 A9 ‘L’ ‘O’ ‘N’ ‘G’ ‘ ’ ‘T’ ‘E’ ‘X’ ‘T’. Add more buffer chunks instead of realloc()
  • 37. Zero-copy write optimization. MessagePack::Buffer. Object: { “count” => 1, “page” => 0, “body” => [ “LONG TEXT” ] }. Three chunks (each: next, mapped_string, mem). Bytes: 83 A5 ‘c’ ‘o’ ‘u’ ‘n’ ‘t’ 01 A4 ‘p’ ‘a’ ‘g’ ‘e’ A4 ‘b’ ‘o’ ‘d’ ‘y’ 91 A9, then “LONG TEXT” via rb_str_dup() (fast copy-on-write)
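A minimal Ruby sketch of the two ideas on slides 36 and 37: append chunks instead of growing one buffer, and reference long strings instead of copying them. The class `ChunkedBuffer`, its threshold, and its chunk capacity are assumptions made for this sketch; the real buffer is written in C and uses rb_str_dup() where this sketch uses String#dup.

```ruby
# Hypothetical sketch of a chunked write buffer (not msgpack-ruby's C code).
class ChunkedBuffer
  # Assumption for the sketch: strings of at least this many bytes are
  # referenced as their own chunk instead of being copied.
  REFERENCE_THRESHOLD = 8
  CHUNK_CAPACITY = 4096

  attr_reader :chunks

  def initialize
    @chunks = [new_chunk]
  end

  def write(data)
    if data.bytesize >= REFERENCE_THRESHOLD
      # Zero-copy path: the chunk is a dup of the caller's string. In CRuby a
      # dup of a long string can share the underlying bytes (copy-on-write).
      @chunks << data.dup.force_encoding(Encoding::BINARY).freeze
    else
      # Small writes are copied into a writable tail chunk; a new chunk is
      # added when the tail is a referenced (frozen) string.
      @chunks << new_chunk if @chunks.last.frozen?
      @chunks.last << data.b
    end
    self
  end

  def to_s
    @chunks.join.b
  end

  private

  def new_chunk
    String.new(capacity: CHUNK_CAPACITY, encoding: Encoding::BINARY)
  end
end

buffer = ChunkedBuffer.new
buffer.write("\x83")                        # 3-element map
buffer.write("\xA5count").write("\x01")     # "count" => 1
buffer.write("\xA4page").write("\x00")      # "page" => 0 (positive fixint)
buffer.write("\xA4body").write("\x91\xA9")  # "body" => [ 9-byte string header
buffer.write("LONG TEXT")                   # referenced, not copied
p buffer.chunks.size
p buffer.to_s
```

The design trade-off is that reading the result requires walking a list of chunks, in exchange for never moving bytes that were already written.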
  • 38. Zero-copy read optimization. Source String: “\x83\xA5count\x01\xA4page\xA4body\x91\xA9LONG TEXT”. MessagePack::Buffer chunk (next, mapped_string, mem) holds the Source String via rb_str_dup() (fast copy-on-write). Object: { “count” => 1, “page” => 0, “body” => [ “LONG TEXT” ] }, with strings taken via rb_str_substr() (fast copy-on-write) ※ if SHARABLE_SUBSTRING_P() returns true
  • 39. Reserved memory pool. MessagePack::Buffer chunks (each: next, mapped_string, mem). Global memory pool: 4KB × 8. Bytes: 83 A5 ‘c’ ‘o’ ‘u’ ‘n’ ‘t’ 01 A4 ‘p’ ‘a’ ‘g’ ‘e’ A4 ‘b’ ‘o’ ‘d’ ‘y’ 91 A9
  • 40. Reserved memory pool. MessagePack::Buffer chunks (each: next, mapped_string, mem). Global memory pool: 4KB × 8
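The slides show the pool only as a diagram, so the following is a rough Ruby sketch of the general idea under stated assumptions: fixed-size 4KB blocks are handed out for chunk memory and returned for reuse rather than freed. `PooledChunk`, `ChunkPool`, and the limit of eight pooled blocks are hypothetical names and values chosen to mirror the picture.

```ruby
# Hypothetical sketch of a reserved pool of fixed-size blocks
# (illustrative only; the real pool lives in msgpack-ruby's C extension).
class PooledChunk
  attr_reader :used

  def initialize(size)
    @mem = ("\0" * size).b   # preallocated block
    @used = 0
  end

  def capacity
    @mem.bytesize
  end

  # Overwrites preallocated bytes in place; the block never grows.
  def write(data)
    raise ArgumentError, "chunk full" if data.bytesize > capacity - @used
    @mem[@used, data.bytesize] = data.b
    @used += data.bytesize
    self
  end

  def contents
    @mem.byteslice(0, @used)
  end

  def reset
    @used = 0
    self
  end
end

class ChunkPool
  CHUNK_SIZE = 4096   # 4KB blocks, as on the slide
  MAX_POOLED = 8      # assumption: keep at most eight free blocks

  attr_reader :allocated

  def initialize
    @free = []
    @allocated = 0
  end

  # Reuse a free block when one is available; allocate only otherwise.
  def acquire
    @free.pop || begin
      @allocated += 1
      PooledChunk.new(CHUNK_SIZE)
    end
  end

  def release(chunk)
    @free.push(chunk.reset) if @free.size < MAX_POOLED
    nil
  end
end

pool = ChunkPool.new
chunk = pool.acquire
chunk.write("\x83\xA5count\x01")
pool.release(chunk)
again = pool.acquire            # same block comes back, no new allocation
p again.equal?(chunk)
p pool.allocated
```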
  • 41. Benchmark & further optimization
  • 42. Benchmark Data Sets
    DB (http://data.consumerfinance.gov/api/views.json): [{ "id" : "gfmg-6ppu", "name" : "Mortgage Complaints", "averageRating" : 0, "createdAt" : 1433953219, "moderationStatus" : true, "numberOfComments" : 0, "description" : "Each week we …", …
    Integers (100_000.times.map {|i| r.rand(i+1) }): [ 0,0,0,1,3,0,0, …, 42991,26906,18655,7015 ]
    Blogs (https://www.justice.gov/api/v1/blog_entries.json): { "results": [ { "attachments":[], "body": "Dear Friends and Colleagues,\n\nI always look forward to the…
    Benchmark code available at https://gist.github.com/frsyuki/9777c4adba2b5c957695b64f17b64ba1
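As a rough illustration of how the Integers data set can be built and timed, here is a sketch that uses only Ruby's standard library. It is not the benchmark code from the gist above: the seed value and the choice of the stdlib JSON module as the only codec are assumptions made so the example runs without extra gems.

```ruby
require "benchmark"
require "json"

# Integers data set as described on the slide. The seed is an assumption;
# the slide does not say how `r` was created.
r = Random.new(42)
integers = 100_000.times.map { |i| r.rand(i + 1) }

encoded = nil
serialize_sec = Benchmark.realtime { encoded = JSON.generate(integers) }

decoded = nil
deserialize_sec = Benchmark.realtime { decoded = JSON.parse(encoded) }

puts format("JSON serialize:   %.4f sec (%d bytes)", serialize_sec, encoded.bytesize)
puts format("JSON deserialize: %.4f sec", deserialize_sec)
```

Swapping in another serializer only requires replacing the two blocks passed to Benchmark.realtime.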
  • 43. MessagePack vs JSON. [Charts: Serialization time and Deserialization time for DB, Integers, Blogs; series: MessagePack, JSON (Oj), JSON (JSON); axis 0 to 3000, 100% reference line]
  • 44. MessagePack without optimization. [Charts: Serialization time and Deserialization time for DB, Integers, Blogs; series: Default, No memory pool, No zero-copy read; axis 0 to 150, 100% reference line]
  • 45. Zero-copy read optimization. Source String: “\x83\xA5count\x01\xA4page\xA4body\x91\xA9LONG TEXT”. MessagePack::Buffer chunk (next, mapped_string, mem) holds the Source String via rb_str_dup() (fast copy-on-write). Object: { “count” => 1, “page” => 0, “body” => [ “LONG TEXT” ] }, with strings taken via rb_str_substr() (fast copy-on-write) ※ if SHARABLE_SUBSTRING_P() returns true
  • 46. Copy-on-write substring. SHARABLE_SUBSTRING_P() returns true only when the substring shares the last 0 termination with the original string.
    s = "a" * 1_000_100; 10_000.times do s.slice(100, 1_000_000) end #=> 0.002 sec
    s = "a" * 1_000_100; 10_000.times do s.slice(100, 1_000_000 - 1) end #=> 2.7 sec
    Not including the last character disables copy-on-write.
  • 47. Copy-on-write substring. SHARABLE_SUBSTRING_P() returns true only when the substring shares the last 0 termination with the original string OR Ruby is compiled with the SHARABLE_MIDDLE_SUBSTRING flag (not enabled by default).
    s = "a" * 1_000_100; 10_000.times do s.slice(100, 1_000_000) end #=> 0.002 sec
    s = "a" * 1_000_100; 10_000.times do s.slice(100, 1_000_000 - 1) end #=> 2.7 sec; 0.002 sec using a ruby binary compiled with SHARABLE_MIDDLE_SUBSTRING
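The two measurements on these slides can be reproduced with a small script along the following lines. It is a sketch, not the speaker's benchmark: the iteration count is reduced to keep the run short, and the absolute times (and even the size of the gap) depend on the Ruby version and build flags, so no expected numbers are given.

```ruby
require "benchmark"

source = "a" * 1_000_100
iterations = 1_000   # the slide uses 10_000; reduced here to keep the run short

# Substring that ends at the end of the source string: it can share the
# source's trailing terminator, so it is eligible for copy-on-write sharing.
tail_sec = Benchmark.realtime do
  iterations.times { source.slice(100, 1_000_000) }
end

# Substring that stops one byte earlier: per the slide, a default build has
# to copy the bytes because the terminator cannot be shared.
middle_sec = Benchmark.realtime do
  iterations.times { source.slice(100, 1_000_000 - 1) }
end

puts format("substring ending at the tail:   %.4f sec", tail_sec)
puts format("substring ending before tail:   %.4f sec", middle_sec)
```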
  • 48. Deserialization with copy-on-write substring. [Charts: Serialization time and Deserialization time for DB, Integers, Blogs; series: Default, With SHARABLE_MIDDLE_SUBSTRING; axis 0 to 150, 100% reference line]
  • 49. What is the deserialization bottleneck? [Chart: Deserialization objects/sec, axis 0 to 70,000,000, for DB, Integers, Blogs, Boolean, (2^62)-1, (2^62); bar labels shown so far: Immediate, On-Heap, On-Heap]
  • 50. What is the deserialization bottleneck? [Chart: Deserialization objects/sec, axis 0 to 70,000,000, for DB, Integers, Blogs, Boolean, (2^62)-1, (2^62); three bars labeled Immediate, three labeled On-Heap] => Object allocation is slow
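The Immediate versus On-Heap distinction can be observed from Ruby itself. The sketch below relies on CRuby behavior: `ObjectSpace.memsize_of` reports 0 for immediate values, which occupy no heap slot. The helper name `on_heap?` is made up for this example, and the classification of (2**62)-1 and 2**62 assumes a 64-bit CRuby build.

```ruby
require "objspace"

# CRuby-specific check: immediate values (true, nil, small Integers, ...)
# are encoded in the reference itself and report a memory size of 0.
def on_heap?(obj)
  ObjectSpace.memsize_of(obj) > 0
end

samples = {
  "true"        => true,
  "nil"         => nil,
  "1"           => 1,
  "(2**62) - 1" => (2**62) - 1,   # largest immediate Integer on 64-bit CRuby
  "2**62"       => 2**62,         # one past that limit
  "2**64"       => 2**64,
  "long string" => "a" * 100,
}

samples.each do |label, value|
  puts "#{label.ljust(12)} #{on_heap?(value) ? 'on-heap' : 'immediate'}"
end
```

Every on-heap value costs an object allocation during deserialization, which is the bottleneck the slide points at.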
  • 51. Reusing Hash key objects. data: [ { "id": "s6ew-h6mp", "name": "Consumer Complaints", … }, { "id": "nsyy-je5y", "name": "Beta Consumers", … }, { "id": "wkue-ycpk", "name": "Survey", … } ] Same key repeats. Keys of Hash are always frozen. => We can reuse objects!
  • 52. How to reuse Hash key objects? Use fstring. Ruby uses it to reuse the same objects for immutable strings (but the C API is not available…yet):
    p "a".object_id == "a".object_id #=> false
    # frozen_string_literal: true
    p "a".object_id == "a".object_id #=> true
    p (-"a").object_id == (-"a").object_id #=> true
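At the Ruby level the same idea can be sketched with String#-@, which returns a frozen, deduplicated string (Ruby 2.5 and later). The `decode_rows` helper and the sample rows are hypothetical stand-ins for a deserializer that produces many hashes with the same keys; this is not how msgpack-ruby itself is implemented.

```ruby
# Hash#[]= stores a frozen copy of an unfrozen String key, so the key
# object inside the Hash is never the mutable string that was passed in.
key = "id".dup
row = {}
row[key] = "s6ew-h6mp"
stored = row.keys.first
p stored.frozen?
p stored.equal?(key)

# Hypothetical decoder step: each row arrives as [key, value] pairs with
# freshly allocated key strings. Interning keys with String#-@ makes every
# row share one frozen String object per distinct key.
def decode_rows(raw_rows)
  raw_rows.map do |pairs|
    pairs.each_with_object({}) { |(k, v), hash| hash[-k] = v }
  end
end

raw = [
  [["id".dup, "s6ew-h6mp"], ["name".dup, "Consumer Complaints"]],
  [["id".dup, "nsyy-je5y"], ["name".dup, "Beta Consumers"]],
  [["id".dup, "wkue-ycpk"], ["name".dup, "Survey"]],
]
rows = decode_rows(raw)
p rows[0].keys[0].equal?(rows[2].keys[0])
```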
  • 53. Reusing Hash key objects using fstring. [Charts: Serialization time (axis 0 to 150, 100% reference line) and Deserialization time (axis 0 to 150) for DB, Integers, Blogs; series: Default, Hash key fstring using hacked Ruby binary]
  • 54. Link Time Optimization. [Charts: Serialization time (axis 0 to 150, 100% reference line) and Deserialization time (axis 0 to 150) for DB, Integers, Blogs; series: All optimization, And -flto=thin]
  • 55. Reading other MessagePack implementations
  • 56. msgpack-java: TypeProfile bypassing. This code has significant overhead:
    interface MessageBuffer {
      public void putInt(int index, int value);
    }
    class MessageBufferBE implements MessageBuffer {
      public void putInt(int index, int value) {
        byteBuffer.putInt(index, value);
      }
    }
  • 57. msgpack-java: TypeProfile bypassing. This code has significant overhead:
    interface MessageBuffer {
      public void putInt(int index, int value);
    }
    class MessageBufferBE implements MessageBuffer {
      public void putInt(int index, int value) {
        byteBuffer.putInt(index, value);
      }
    }
    Annotations: Dynamic method lookup (slow even after JIT); Dynamic method lookup
  • 58. msgpack-java: TypeProfile bypassing. Much faster code:
    class MessageBuffer {
      public void putInt(int index, int v) {
        v = Integer.reverseBytes(v);
        unsafe.putInt(base, address + index, v);
      }
      public static MessageBuffer newInstance() {
        // …
      }
    }
    No override. JVM JIT inlines them.
  • 59. msgpack-java: TypeProfile bypassing. Much faster code:
    class MessageBuffer {
      public void putInt(int index, int v) {
        v = Integer.reverseBytes(v);
        unsafe.putInt(base, address + index, v);
      }
      public static MessageBuffer newInstance() {
        // …
      }
    }
    Annotations: JVM intrinsics (1-to-1 mapping to CPU instruction, no function call); JVM intrinsics; Load inherited class lazily. No override on little-endian machine; No override. JVM JIT inlines them.
  • 60. MessagePack-CSharp: Lookup cache of optimized Ser/De
    class User {
      public int Age { get; set; }
      public string FirstName { get; set; }
      public string LastName { get; set; }
    }
    [ 31, “Sadayuki”, “Furuhashi” ]
    Mapping between class and semi-structured data. Java: Jackson databind, JAXB, … C#: MessagePack-CSharp, System.Runtime.Serialization, … Swift: Codable, SwiftMsgPack, … …
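The slide gives only the title and the class-to-array example, so the following Ruby sketch is a loose analogy rather than a description of MessagePack-CSharp's internals. It assumes the "lookup cache" means building the conversion logic for a class once and reusing it on later calls; `ArrayMapper` and `formatter_for` are hypothetical names.

```ruby
# Same shape as the slide's User class.
User = Struct.new(:age, :first_name, :last_name)

# Hypothetical mapper: converts objects to and from plain arrays and caches
# the per-class conversion lambdas so they are built only once per class.
class ArrayMapper
  def initialize
    @cache = {}
  end

  def formatter_for(klass)
    @cache[klass] ||= begin
      members = klass.members   # resolved once, then captured by the lambdas
      {
        dump: ->(obj) { members.map { |m| obj[m] } },
        load: ->(array) { klass.new(*array) },
      }
    end
  end

  def dump(obj)
    formatter_for(obj.class)[:dump].call(obj)
  end

  def load(klass, array)
    formatter_for(klass)[:load].call(array)
  end
end

mapper = ArrayMapper.new
array = mapper.dump(User.new(31, "Sadayuki", "Furuhashi"))
p array
p mapper.load(User, array)
```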
  • 61. MessagePack-CSharp: Native LZ4 integration
  • 62. MessagePack-CSharp: Native LZ4 integration
  • 63. It's like JSON, but fast and small. Sadayuki Furuhashi. Tweet your msgpack usage with #MyMessagePack