High-Performance Python
Python is fast!
• Python is fast to write, but natively 10x - 100x slower than C.
• Python has great C interop, so you can use C for the slow parts.
• This makes Python competitive with C.
Before you try this at home…
• “Premature optimization is the root of all evil.”
• Use external standards for how fast your code needs to be.
• Remember: performance is a tradeoff against readability, maintainability, and developer time.
Part 1:
General Optimization
Profile Your Code
• 95%+ of your code is irrelevant to performance.
• A profiler will tell you which 5% is important.
Profile Your Code
In Python, use cProfile:
source: https://ymichael.com/2014/03/08/profiling-python-with-cprofile.html
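The profiler output shown on the original slide did not survive extraction. As a minimal sketch of the idea, assuming a hypothetical `slow_function` as the workload, the standard-library `cProfile` and `pstats` modules can be used roughly like this:

```python
import cProfile
import io
import pstats


def slow_function(n):
    """Hypothetical workload: sum of squares computed in a plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and show the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


result, report = profile_call(slow_function, 100_000)
print(report)
```

For a whole script, `python -m cProfile -s cumulative script.py` produces the same kind of report without touching the code.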
Basics
• Make sure your Big-O performance is optimal.
• Move operations outside of loops.
• Use caching for repeated calculations.
• Apply algebraic simplifications.
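Two of these basics, hoisting work out of a loop and caching repeated calculations, can be sketched together. The function `expensive` below is a hypothetical stand-in for any costly pure function; `functools.lru_cache` is the standard-library cache used here as one possible choice.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def expensive(x):
    """Hypothetical stand-in for a costly pure function."""
    return sum(i * x for i in range(1000))


def total(values, scale):
    # Hoisted: this factor does not depend on the loop variable,
    # so it is computed once instead of on every iteration.
    factor = scale ** 2 + 1
    result = 0
    for v in values:
        # Repeated values hit the cache instead of being recomputed.
        result += expensive(v) * factor
    return result


values = [3, 7, 3, 7, 3]
print(total(values, 2))
print(expensive.cache_info())
```

Caching like this only helps when the function is pure and its arguments are hashable.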
Accidentally Quadratic
The *most* common issue:
def find_intersection(list_one, list_two):
    intersection = []
    for a in list_one:
        if a in list_two:
            intersection.append(a)
    return intersection
def find_intersection(list_one, list_two):
    intersection = []
    # A set makes each membership test O(1) on average instead of O(n).
    list_two = set(list_two)
    for a in list_one:
        if a in list_two:
            intersection.append(a)
    return intersection
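To see the difference in practice, the two approaches can be timed with the standard-library `timeit` module. This is a rough sketch: the function names and input sizes are made up for illustration, and the measured numbers will vary by machine.

```python
import timeit


def intersection_list(list_one, list_two):
    # O(len(list_one) * len(list_two)): each `in` scans list_two.
    return [a for a in list_one if a in list_two]


def intersection_set(list_one, list_two):
    # O(len(list_one) + len(list_two)): build the set once,
    # then each lookup is O(1) on average.
    lookup = set(list_two)
    return [a for a in list_one if a in lookup]


list_one = list(range(0, 2000, 2))
list_two = list(range(0, 2000, 3))

slow = timeit.timeit(lambda: intersection_list(list_one, list_two), number=5)
fast = timeit.timeit(lambda: intersection_set(list_one, list_two), number=5)
print(f"list lookup: {slow:.4f}s, set lookup: {fast:.4f}s")
```

The set version requires the elements to be hashable; for unhashable items a different strategy (such as sorting) would be needed.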
Business Logic
Leverage business logic. You’ll often have NP-Complete optimizations to make. The underlying business reasoning should guide your approximations.
Part 2:
Python Optimization
Libraries
• Use numpy, scipy, pandas, scikit-learn, etc.
• Incredible built-in functionality. If you need something esoteric, try combining built-ins or adapting a more general built-in approach.
• Extremely fast, thoroughly optimized, and best of all, already written.
Pure Python Tips
• Function calls are expensive. Avoid them and avoid recursion.
• Check the time complexity of operations on built-in data types.
• Make variables local. Global lookups are expensive.
• Use map/filter/reduce instead of for loops; their looping is implemented in C.
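A rough sketch of the local-variable and `map` tips follows. The three function names are hypothetical, and the size of any speedup depends on the Python version, so the timings are printed rather than assumed.

```python
import math
import timeit


def roots_global(values):
    out = []
    for v in values:
        # Looks up the global `math`, then the attribute `sqrt`,
        # and the method `out.append` on every iteration.
        out.append(math.sqrt(v))
    return out


def roots_local(values):
    # Bind the callables to local names once, outside the loop.
    sqrt = math.sqrt
    out = []
    append = out.append
    for v in values:
        append(sqrt(v))
    return out


def roots_map(values):
    # map drives the loop in C and calls the C function math.sqrt directly.
    return list(map(math.sqrt, values))


values = list(range(10_000))
for func in (roots_global, roots_local, roots_map):
    elapsed = timeit.timeit(lambda: func(values), number=20)
    print(f"{func.__name__}: {elapsed:.4f}s")
```

Note that `map` with a Python `lambda` still pays the function-call cost on every element, so the benefit is largest when the mapped function is itself a built-in.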
Mixed Tips
• Vectorize! numpy arrays are much faster than lists.
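import numpy as np

# List version: every element is handled in the Python interpreter.
def complex_sum(in_list):
    in_list = [(a + 2) for a in in_list]
    # more transformations
    return sum(in_list)

# numpy version: the same arithmetic runs over the whole array in C.
def complex_sum(in_list):
    in_list = np.array(in_list)
    in_list += 2
    # more transformations
    return in_list.sum()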
• Array allocation can be a bottleneck. Try moving it outside of loops.
# Slow: allocates a new array on every iteration.
n = 10 ** 3
output = 0
for i in range(10**9):
    result = np.zeros(n)
    ## calculations ##
    output += result.sum()

# Faster: allocate once, then reset the array in place.
result = np.zeros(n)
output = 0
for i in range(10**9):
    result[:] = 0 # zero out array
    ## calculations ##
    output += result.sum()
Last Resort: C
• Cython: add C type declarations to Python code and compile it to C.
def fib(n):
    a = 0
    b = 1
    while b < n:
        temp = b
        b = a + b
        a = temp
    return b
def fib(int n):
    cdef int a, b, temp
    a = 0
    b = 1
    while b < n:
        temp = b
        b = a + b
        a = temp
    return b
• C extensions: write C and call it from Python.
• Limit these techniques to hot loops.
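Writing a full extension module is beyond a short example, but the idea of calling compiled C from Python can be sketched with the standard-library `ctypes` module. This is a different route than the C extensions named above, and it assumes a POSIX system (Linux or macOS) where the C standard library can be loaded this way.

```python
import ctypes
import ctypes.util

# Locate the C standard library. find_library may return None, in which
# case CDLL(None) falls back to the symbols already loaded into the
# process (this works on Linux and macOS, not on Windows).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# Call the C function directly with a bytes object.
length = libc.strlen(b"hot loop")
print(length)
```

Each call through `ctypes` has its own overhead, so this only pays off when the C function does a substantial amount of work per call.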
Things I haven’t mentioned
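• multithreading: in CPython, the global interpreter lock (GIL) keeps threads from running Python code in parallel, so it rarely speeds up CPU-bound work
• pypy: a Python implementation with a JIT compiler and a different ecosystem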
Warning
Optimization is addictive.
Conclusions
• Avoid premature optimizations! Have objective benchmarks you’re trying to hit.
• Profile your code. You will be surprised by the results.
• The gold standard for performance is highly-tuned C (that’s already been written by someone else).
Resources
General
• Programming Pearls (Jon Bentley)
• accidentallyquadratic.tumblr.com
• Performance Engineering of Software Systems, 6.172, MIT OpenCourseWare
Python Specific
• cProfile Docs
• Cython Docs
• Guido van Rossum’s advice: python.org/doc/essays/list2str
Contact me: ben@caffeinatedanalytics.com
