Gopher
in performance tales
Mateusz Szczyrzyca
mateusz@szczyrzyca.pl
http://mateusz.szczyrzyca.pl
picture from: https://golangtraining-in-web.appspot.com/
It’s about…
• Performance in general
• Some important basics
• Interesting performance case studies
• From the Gopher world’s perspective:
• general basics and tips
• pprof & tracer
• recommendations
Preface
Things are not as simple as they look.
Especially numbers.
Trade-offs are always part of software
performance engineering
Performance (Software)
How an application utilizes the available
hardware resources
An application's ability to operate under certain
conditions (scarce hardware resources,
heavy traffic, etc.)
Basic terms
• algorithm
• runtime / compile time
• stack & heap
• GC (garbage collector)
• real/user/sys time
• Big-O notation
• concurrency (multithreading)
• parallelism
• IOPS
• Throughput, Latency, Response time
• utilization
• saturation
• bottleneck
• Workload
Stack vs Heap
source: https://stackoverflow.com/questions/79923/what-and-where-are-th
Throughput, Latency, Response Time, Saturation
Root of all evil
Premature optimizations
When you waste time and effort on unnecessary
optimizations or choices (e.g. changing the tech stack)
because you imagine they will be needed in the future.
Premature optimization
….is like a casual car for everyday city driving
The most difficult part
Benchmark
Wrong benchmarks
Source: https://benchmarksgame-team.pages.debian.net/bench
Typical benchmark
Real world app
Real world project timeline
Better benchmarks
source: https://www.techempower.com/benc
Fast (cpu-bound) languages
• Assembler?
• C?
• C++?
• Rust?
• Java?
• Go?
• Python?
Case study: Python (Japronto)
source: https://github.com/squeaky-pl/j
Case study: Go (fasthttp)
source: https://github.com/valyala/fasth
Case study: chess engines
The most important factors:
1) playing strength (Elo rating)
2) analysis speed (nodes per second),
especially in alpha-beta pruning
engines
Case study: stockfish
• it’s a chess engine (the strongest alpha-beta pruning engine)
• it’s written in C++
• it uses multithreading efficiently
• it has many derivatives; asmFish is one of them (written in x86
assembly)
Case study: stockfish
Surprisingly, asmFish is neither the strongest nor
the "fastest" Stockfish variant.
Stockfish: the strongest because of...
• Evaluation speed? No: the Houdini 6.03 chess
engine (alpha-beta pruning) is faster (nodes per second on the same
machine), yet Houdini 6.03 is a slightly weaker
engine.
• Better chess algorithms? Yes (although the main
algorithm is the same)
• Leela Chess Zero: NN-based chess engine,
1000x slower in terms of nodes per second.
Currently slightly weaker than Stockfish.
Stockfish vs Leela Chess Zero
• Leela Chess Zero (LC0): NN-based (MCTS) chess
engine, built to reproduce DeepMind's AlphaZero
ideas (using NN and MCTS in chess)
• Self-learning algorithm (games of LC0 vs
LC0 and LC0 vs the rest of the world)
• More than 1000x slower than Stockfish in terms of nodes per
second, due to the different
algorithm
• Its playing strength is very close to Stockfish's,
especially in very fast games (bullet and
blitz chess)
Case study: fibonacci numbers
Task: get the 50th number of the Fibonacci sequence
C vs Python
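The comparison shown on the slide is not preserved in this transcript. As a minimal sketch of how the choice of algorithm can dominate the choice of language for this task, the two functions below compute the same number in Go. The names fibNaive and fibIter are illustrative, and the sequence is assumed to start at F(0) = 0, F(1) = 1:

```go
package main

import "fmt"

// fibNaive recomputes the same subproblems over and over,
// so its running time grows exponentially with n.
func fibNaive(n int) uint64 {
	if n < 2 {
		return uint64(n)
	}
	return fibNaive(n-1) + fibNaive(n-2)
}

// fibIter walks the sequence once: linear time, constant memory.
func fibIter(n int) uint64 {
	var a, b uint64 = 0, 1
	for i := 0; i < n; i++ {
		a, b = b, a+b
	}
	return a
}

func main() {
	fmt.Println(fibIter(50)) // 12586269025
	// A small n on purpose: the naive version needs far longer for n = 50.
	fmt.Println(fibNaive(30)) // 832040
}
```

A slow interpreter running the iterative version will usually finish long before a fast compiled language running the naive one.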
Performance eaters
• algorithms
• doing unnecessary work (GC, logging)
• non cpu-bound waiting
• not using multithreading
• using too many threads
• slow (cpu-bound) programming language?
Gopher Performance World
Go: benchmarking
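The benchmark code from the slides is not preserved here. A minimal sketch of a benchmark written with the standard testing package follows; joinWords is a hypothetical function under test. In a real project the benchmark function would live in a file ending in _test.go and be run with go test -bench=. The main function below only exists to keep the sketch self-contained:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// joinWords is a hypothetical function under test.
func joinWords(words []string) string {
	return strings.Join(words, " ")
}

// BenchmarkJoinWords has the signature that `go test -bench=.` looks for.
func BenchmarkJoinWords(b *testing.B) {
	words := []string{"gopher", "in", "performance", "tales"}
	b.ReportAllocs()
	// The framework picks b.N so that the loop runs long enough to measure.
	for i := 0; i < b.N; i++ {
		joinWords(words)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	res := testing.Benchmark(BenchmarkJoinWords)
	fmt.Println(res.String(), res.MemString())
}
```

The printed result shows iterations, time per operation, and allocations per operation. The numbers depend on the machine and Go version.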
Go: pprof
Package pprof writes runtime profiling data in the format expected by
the pprof visualization tool. Useful for profiling CPU & Memory.
Go: trace
Useful for tracing the execution of a program over time, including its goroutines
Performance profiling steps
1. Measurement:
• make benchmark and get results,
• do profiling to determine a bottleneck,
2. Make appropriate changes in your code
3. Repeat 1) and 2) if the results are still not acceptable
Go: string concatenation
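The code from these slides is not preserved in the transcript. As a sketch of the usual comparison (not necessarily the variants shown in the talk), the two hypothetical functions below build the same string with += and with strings.Builder:

```go
package main

import (
	"fmt"
	"strings"
)

// concatPlus builds the result with +=. Strings are immutable, so each
// iteration may allocate a new string and copy the old contents into it.
func concatPlus(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// concatBuilder appends into a single buffer that is sized up front.
func concatBuilder(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	var sb strings.Builder
	sb.Grow(n)
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

func main() {
	parts := []string{"go", "pher", " ", "tales"}
	fmt.Println(concatPlus(parts))    // gopher tales
	fmt.Println(concatBuilder(parts)) // gopher tales
}
```

For a handful of short strings the difference is usually negligible. A benchmark on the real input is what should decide.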
Go: GC
The garbage collector (GC) lets you focus on business logic instead of
memory management. However, it comes with
performance trade-offs in some cases.
Turning the GC off completely is strongly discouraged unless you really
know what you are doing (you risk crashing your app).
The GC usually does what you would otherwise have to do yourself in your
code without such a mechanism.
The GC usually improves slightly with every new Go version, but don’t treat
this statement as a general rule.
Go: GC
runtime/debug – some tuning/stats options
Disabling the GC may improve performance if there are many short-lived
memory allocations, but it’s not recommended overall due to its side
effects
Go: pointer vs value
Performance dilemma
pointer:
• stack or heap (mostly); the GC traces it
(exception: unsafe.Pointer values kept as uintptr)
• for passing bigger data structures
• the underlying value can be modified
• not thread-safe (synchronization needed)
value:
• stack, no GC pressure
• for passing small values
• the value is for read-only use (the callee gets a copy)
• thread-safe
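A minimal sketch of the semantic difference behind this dilemma. The type Point and both functions are hypothetical:

```go
package main

import "fmt"

// Point is a small struct that is cheap to copy.
type Point struct{ X, Y int }

// movedCopy receives a copy, so the caller's value stays untouched.
func movedCopy(p Point, dx int) Point {
	p.X += dx
	return p
}

// moveInPlace receives a pointer, so it modifies the caller's value.
func moveInPlace(p *Point, dx int) {
	p.X += dx
}

func main() {
	orig := Point{X: 1, Y: 2}

	shifted := movedCopy(orig, 5)
	fmt.Println(orig.X, shifted.X) // 1 6

	moveInPlace(&orig, 5)
	fmt.Println(orig.X) // 6
}
```

Which variant is faster depends on the size of the struct and on whether the pointer forces a heap allocation, so it is worth measuring rather than guessing.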
Go: array vs slice
[...]array
The size is known at compile time, so it is not flexible. The compiler checks the validity of constant indexes.
Good for performance (can stay on the stack) if you know the exact size of the array.
[]slice
For all remaining cases where an array does not fit. Accessing an element out of range results
in a runtime panic. Preallocated slices are better for performance.
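A minimal sketch of both cases: a fixed-size array whose length is part of its type, and a slice preallocated with make so that append never has to grow the backing array. The function names are illustrative:

```go
package main

import "fmt"

// sumArray takes a fixed-size array; the length 4 is part of the type.
func sumArray(a [4]int) int {
	total := 0
	for _, v := range a {
		total += v
	}
	return total
}

// squares preallocates the full capacity up front, so append
// does not need to reallocate while the slice is filled.
func squares(n int) []int {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, i*i)
	}
	return out
}

func main() {
	a := [4]int{1, 2, 3, 4}
	fmt.Println(sumArray(a)) // 10

	s := squares(5)
	fmt.Println(s, len(s), cap(s)) // [0 1 4 9 16] 5 5

	// a[4] would be rejected at compile time;
	// s[5] would compile and panic at run time.
}
```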
Go: Escape analysis
The compiler can report (go build -gcflags=-m) which variables will be stored on the heap.
This applies, for example, to dynamic data structures whose size cannot be
determined at compile time
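The examples from the following slides are not preserved in this transcript. A minimal sketch of one common cause of a heap allocation is returning the address of a local variable. The type user and both constructors are hypothetical. Building with go build -gcflags=-m prints the compiler's escape decisions, and the exact wording of that output varies between Go versions:

```go
package main

import "fmt"

type user struct {
	id   int
	name string
}

// newUserValue returns a copy, so the local variable can stay on the stack.
func newUserValue(id int) user {
	u := user{id: id, name: "gopher"}
	return u
}

// newUserPointer returns the address of a local variable. The variable
// must outlive the call, so the compiler typically moves it to the heap.
func newUserPointer(id int) *user {
	u := user{id: id, name: "gopher"}
	return &u
}

func main() {
	v := newUserValue(1)
	p := newUserPointer(2)
	fmt.Println(v.id, p.id) // 1 2
}
```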
Go: too few/many goroutines
Sometimes using too few threads (goroutines in Go) hurts
performance.
The opposite scenario is also possible: using too many goroutines
can hurt performance as well
https://golang.org/pkg/runtime/#GOMAXP
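A minimal sketch of one way to keep the number of goroutines bounded: a fixed pool of workers sized from runtime.GOMAXPROCS. The function sumSquares is hypothetical, and the per-item work is deliberately trivial. In real code each job should be coarse enough to outweigh the channel overhead:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumSquares distributes the numbers over a fixed number of workers
// instead of starting one goroutine per element.
func sumSquares(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	partial := make([]int, workers)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			local := 0
			for n := range jobs {
				local += n * n
			}
			// Each worker writes only its own slot, so no lock is needed.
			partial[id] = local
		}(w)
	}

	for _, n := range nums {
		jobs <- n
	}
	close(jobs)
	wg.Wait()

	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	// GOMAXPROCS(0) only reads the current setting.
	workers := runtime.GOMAXPROCS(0)
	fmt.Println(sumSquares([]int{1, 2, 3, 4}, workers)) // 30
}
```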
Go: sync.Pool
Use it to reduce memory allocations by reusing temporary objects that can be
stored and retrieved later
Fewer allocations mean less GC activity
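A minimal sketch of the pattern with a pool of bytes.Buffer values. The names bufPool and greet are hypothetical:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers instead of allocating one per call.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// greet builds its result in a pooled buffer.
func greet(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold data from an earlier use
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	// String copies the contents, so the result stays valid
	// after the buffer goes back into the pool.
	return buf.String()
}

func main() {
	fmt.Println(greet("gopher")) // hello, gopher
	fmt.Println(greet("world"))  // hello, world
}
```

The pool may drop its objects at any time, so it suits temporary scratch objects only, not anything that has to persist.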
Go: interface{} problem
Why?
… let's find out using pprof
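The code and profiles from these slides are not preserved in the transcript. As a sketch of one common form of the problem (not necessarily the one shown in the talk), the two hypothetical functions below sum the same numbers through []interface{} and through []int:

```go
package main

import "fmt"

// sumInterface works on boxed values: every element needs a type
// assertion, and storing values in interface{} can cost an allocation.
func sumInterface(vals []interface{}) int {
	total := 0
	for _, v := range vals {
		if n, ok := v.(int); ok {
			total += n
		}
	}
	return total
}

// sumInts works on a concrete type: no boxing and no type assertions.
func sumInts(vals []int) int {
	total := 0
	for _, n := range vals {
		total += n
	}
	return total
}

func main() {
	fmt.Println(sumInterface([]interface{}{1, 2, 3})) // 6
	fmt.Println(sumInts([]int{1, 2, 3}))              // 6
}
```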
Go: faster libs than std
std lib          non-std (fast) lib
net/http         fasthttp
html/template    fasttemplate
encoding/json    gojay
Go: not discussed here
• channels vs mutexes
• Advanced GC tuning and GC-aware
programming
• struct padding (memory saving)
• unsafe.Pointer and the unsafe package
Conclusions
Focus on your business logic
and on the architecture
Conclusions
Write idiomatic and clean Go code
Conclusions
Use appropriate algorithms
Conclusions
Avoid interface{} where possible and
use a specific type instead
Conclusions
Use tests, linters, code reviews, etc
Conclusions
Detect your bottlenecks and profile your
code when it’s needed
Conclusions
Use micro-optimizations if they are required
Conclusions
Rewrite a performance-critical
part in another programming language
if it offers the functionality you need.
Do it when the rewrite does not cost too
much time and/or a library for your purpose
already exists in that language
Conclusions
Rewrite the entire app in a language that performs well on
CPU-bound work if it won’t take too
much time, all required libs
are available and the effort is really
worth it
Links
• https://golang.org/pkg/runtime/pprof/
• https://golang.org/pkg/runtime/trace/
• https://blog.golang.org/profiling-go-programs
• https://github.com/golang/go/wiki/Performance
• http://www.brendangregg.com/
• https://github.com/dgryski/go-perfbook
• https://dave.cheney.net/tag/performance
• http://bigocheatsheet.com/
Q&A
https://github.com/mateusz-szczyrzyca/gocracow3
