PyGotham 2014
Introduction to Profiling
Perrin Harkins
“We should forget about small efficiencies, say
about 97% of the time: premature optimization is
the root of all evil. Yet we should not pass up our
opportunities in that critical 3%. A good
programmer will not be lulled into complacency
by such reasoning, he will be wise to look carefully
at the critical code; but only after that code has
been identified.”
–Donald Knuth
“Bottlenecks occur in surprising places, so don't
try to second guess and put in a speed hack until
you have proven that's where the bottleneck is.”
–Rob Pike
What will a profiler tell us?
❖ Function execution time
❖ Memory usage, etc. are possible, but for another day
❖ More about line profiling later
❖ Real (wall clock) time
❖ Inclusive vs exclusive time
❖ Number of calls, primitive and recursive
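To make the inclusive/exclusive and primitive/recursive distinctions concrete, here is a minimal sketch. The functions `fib` and `outer` are hypothetical, and the raw `stats` dictionary of `pstats.Stats` is read directly for brevity; treat it as an illustration rather than a recommended workflow.

```python
import cProfile
import pstats


def fib(n):
    # Deliberately recursive so primitive and total call counts differ.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def outer():
    # Nearly all of outer()'s time is spent inside fib(): large
    # inclusive (cumulative) time, tiny exclusive (total) time.
    return fib(15)


pr = cProfile.Profile()
pr.enable()
result = outer()
pr.disable()

stats = pstats.Stats(pr)
# stats.stats maps (file, line, name) to
# (primitive calls, total calls, exclusive time, inclusive time, callers).
by_name = {key[2]: value for key, value in stats.stats.items()}
fib_prim, fib_total, fib_excl, fib_incl, _ = by_name["fib"]
out_prim, out_total, out_excl, out_incl, _ = by_name["outer"]

print("fib: %d calls, %d primitive" % (fib_total, fib_prim))
print("outer: exclusive %.4fs, inclusive %.4fs" % (out_excl, out_incl))
```

A primitive call is one that was not made recursively, so `fib` is entered many times but has a single primitive call.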

cProfile
❖ Generates profile data that can be read in shell or GUI tools
❖ 30% or more speed penalty
From command line:
$ python -m cProfile -o <output_file> <script.py>
Or, in your program:
import cProfile
cProfile.run('slow_function()', '<output_file>')
Or, even more flexible:
pr = cProfile.Profile()
pr.enable()
… thing you want to profile …
pr.disable()
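As a runnable version of the enable/disable pattern, the sketch below profiles a hypothetical `slow_function` and writes the data to a file, which is roughly what `-o` does on the command line. The file name `example.prof` and the temporary directory are assumptions made for the example.

```python
import cProfile
import os
import tempfile


def slow_function():
    # Hypothetical stand-in for the code under investigation.
    return sum(i * i for i in range(200_000))


pr = cProfile.Profile()
pr.enable()
total = slow_function()  # only code between enable() and disable() is measured
pr.disable()

# Save the collected data so it can be examined later with pstats.
prof_path = os.path.join(tempfile.mkdtemp(), "example.prof")
pr.dump_stats(prof_path)
print("wrote", prof_path, os.path.getsize(prof_path), "bytes")
```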

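pstats
import pstats
profile = pstats.Stats('<output_file>')
profile.add('myscript.prof2')
profile.strip_dirs()
profile.sort_stats('cumulative')
profile.print_stats(20)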
12192418 function calls (11990470 primitive calls) in 84.268 seconds
Ordered by: cumulative time
List reduced from 1211 to 20 due to restriction <20>
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.000    0.000   84.402   84.402  <string>:1(<module>)
     1    0.021    0.021   84.402   84.402
   500    0.096    0.000   84.381    0.169
   500    0.007    0.000   35.874    0.072
   500    0.066    0.000   33.431    0.067
   500    0.160    0.000   22.684    0.045
 10501    0.175    0.000   21.963    0.002
 14001    0.286    0.000   21.472    0.002
  6501    0.047    0.000   14.200    0.002
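A report like the one above can be reproduced end to end with a small sketch. Everything named here (`work`, `run1.prof`, `run2.prof`) is hypothetical; the two files stand in for two separate profiling runs that get merged before reporting.

```python
import cProfile
import io
import os
import pstats
import tempfile


def work():
    # Hypothetical workload so there is something to report on.
    return sorted(str(i) for i in range(50_000))


# Produce two profile files, standing in for two separate runs.
tmpdir = tempfile.mkdtemp()
paths = []
for name in ("run1.prof", "run2.prof"):
    path = os.path.join(tmpdir, name)
    pr = cProfile.Profile()
    pr.runcall(work)
    pr.dump_stats(path)
    paths.append(path)

out = io.StringIO()
profile = pstats.Stats(paths[0], stream=out)
profile.add(paths[1])             # merge the second run into the first
profile.strip_dirs()              # drop directory prefixes from file names
profile.sort_stats("cumulative")  # order by inclusive time
profile.print_stats(5)            # show only the top five rows
report = out.getvalue()
print(report)
```

Passing `stream=` is optional; without it the report goes to standard output.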
profile.print_callees('full_clean', 10)
List reduced from 1211 to 2 due to restriction <'full_clean'>
Function called...
    ncalls  tottime  cumtime
->     500    0.177    2.855  277(_clean_fields)
       500    0.003    0.030
       500    0.031    2.784  393(_post_clean)
->     500    0.001    0.001
       500    0.096    2.399
profile.print_callers('full_clean')
List reduced from 1211 to 2 due to restriction <'full_clean'>
Function was called by...
    ncalls  tottime  cumtime
<-     500    0.009    5.678
<-     500    0.005    2.405  393(_post_clean)
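The callee and caller listings above can be tried on a toy call graph. In this sketch the function names only echo the ones in the output; their bodies are hypothetical placeholders, not the code that was profiled in the talk.

```python
import cProfile
import io
import pstats


def clean_fields():
    return [str(i).strip() for i in range(20_000)]


def post_clean():
    return sum(range(20_000))


def full_clean():
    # Hypothetical function named after the one in the output above.
    clean_fields()
    post_clean()


def save():
    full_clean()


pr = cProfile.Profile()
pr.runcall(save)

out = io.StringIO()
profile = pstats.Stats(pr, stream=out).strip_dirs()
profile.print_callees("full_clean")  # what does full_clean call?
profile.print_callers("full_clean")  # who calls full_clean?
report = out.getvalue()
print(report)
```

The string argument is treated as a regular expression matched against the `filename:lineno(function)` label, which is why a bare function name is enough.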

KCacheGrind
❖ GUI for viewing profile data
❖ Run your profile output through pyprof2calltree
❖ On a Mac, qcachegrind is easier to install
RunSnakeRun
❖ Squaremap of call tree
❖ Maybe useful for spotting large exclusive time functions

Using your results
❖ Bottom-up approach
❖ Start with a function that has a large exclusive time
❖ Climb up call graph to find something you can affect
❖ "We're spending a lot of time in deepcopy(). What's
calling that so much?"
❖ Might miss higher-level fixes
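A bottom-up pass might look like the following sketch: sort by exclusive time, then ask who calls the expensive function. The functions `load_settings`, `handle_request`, and `main` are hypothetical and exist only to give `deepcopy()` a caller to find.

```python
import copy
import cProfile
import io
import pstats


def load_settings(base):
    # Hypothetical hot spot: a defensive deepcopy on every call.
    return copy.deepcopy(base)


def handle_request(base):
    return load_settings(base)["name"]


def main():
    base = {"name": "demo", "values": list(range(200))}
    for _ in range(200):
        handle_request(base)


pr = cProfile.Profile()
pr.runcall(main)

out = io.StringIO()
profile = pstats.Stats(pr, stream=out).strip_dirs()
profile.sort_stats("tottime").print_stats(5)  # largest exclusive times first
profile.print_callers(r"\(deepcopy\)")          # then climb: who calls it?
report = out.getvalue()
print(report)
```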
Using your results
❖ Top-down approach
❖ Start with a function that has a large inclusive time
❖ Walk down call graph to find something you can affect
❖ "We're spending a lot of time in this validate() method.
What's it doing that takes so long?"
❖ Look for structural changes
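A top-down pass, sketched under the same caveats: sort by inclusive time, then list what the expensive function calls. The `validate` function and its two checks are hypothetical examples.

```python
import cProfile
import io
import pstats
import re


def check_email(value):
    # Hypothetical check called once per row.
    return re.match(r"^[^@\s]+@[^@\s]+$", value) is not None


def check_length(value):
    return len(value) < 100


def validate(rows):
    results = []
    for row in rows:
        results.append(check_email(row) and check_length(row))
    return results


rows = ["user%d@example.com" % i for i in range(5_000)]
pr = cProfile.Profile()
results = pr.runcall(validate, rows)

out = io.StringIO()
profile = pstats.Stats(pr, stream=out).strip_dirs()
profile.sort_stats("cumulative").print_stats(5)  # largest inclusive times first
profile.print_callees(r"\(validate\)")            # then walk down: what does it call?
report = out.getvalue()
print(report)
```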

Line profiling
❖ line_profiler does exist
❖ Results are not very actionable
❖ If you get this far, you probably should stop (or refactor
your methods!)
Good profiling technique
❖ Create a repeatable benchmark test
❖ Allows you to measure progress
❖ Iterations/second
❖ Time for n iterations
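One way to get a repeatable benchmark is the standard `timeit` module. This is a minimal sketch; `build_report` is a hypothetical unit of work and the iteration counts are arbitrary choices.

```python
import timeit


def build_report(n=2_000):
    # Hypothetical unit of work; swap in the code path being tuned.
    return ",".join(str(i) for i in range(n))


def benchmark(func, iterations=200, repeats=3):
    # Taking the best of several repeats reduces noise from other processes.
    elapsed = min(timeit.repeat(func, number=iterations, repeat=repeats))
    return elapsed, iterations / elapsed


iterations = 200
elapsed, per_second = benchmark(build_report, iterations=iterations)
print("%d iterations in %.3fs (%.0f iterations/second)"
      % (iterations, elapsed, per_second))
```

Run it before and after each change, with the profiler off, so the numbers reflect real speed rather than profiler overhead.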
What usually helps
❖ Removing unnecessary work
❖ “We load that config data every time, even when we don’t
use it.”
❖ Using a more efficient algorithm
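The config example can be sketched as follows. `load_config`, `handle_eager`, and `handle_lazy` are hypothetical names, and the counter exists only to show how much work each version does.

```python
import json

counter = {"loads": 0}


def load_config():
    # Hypothetical expensive load (a file read, a network call, ...).
    counter["loads"] += 1
    return json.loads('{"retries": 3}')


def handle_eager(request):
    config = load_config()  # paid on every request, used on few
    if request.get("retry"):
        return config["retries"]
    return 0


def handle_lazy(request):
    if request.get("retry"):
        return load_config()["retries"]  # paid only when needed
    return 0


requests = [{"retry": i % 10 == 0} for i in range(100)]

counter["loads"] = 0
eager = [handle_eager(r) for r in requests]
eager_loads = counter["loads"]

counter["loads"] = 0
lazy = [handle_lazy(r) for r in requests]
lazy_loads = counter["loads"]

print(eager_loads, "loads vs", lazy_loads, "loads, same results:", eager == lazy)
```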
What usually helps
❖ Batching I/O (disk or net) operations
❖ Database stuff
❖ SQL tuning
❖ Indexes
❖ Transactions
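The batching, index, and transaction points can be illustrated with the standard `sqlite3` module. This sketch uses an in-memory database and a hypothetical `items` table; a real database server will behave differently in detail, but the shape of the change is the same.

```python
import sqlite3

rows = [(i, "item %d" % i) for i in range(1_000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

# Unbatched: one statement and one commit per row.
for row in rows[:500]:
    conn.execute("INSERT INTO items VALUES (?, ?)", row)
    conn.commit()

# Batched: one executemany() call inside a single transaction.
with conn:
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows[500:])

# An index lets lookups by name avoid a full table scan.
conn.execute("CREATE INDEX idx_items_name ON items (name)")
count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM items WHERE name = ?", ("item 7",)
).fetchall()
print(count, plan)
```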

What usually helps
❖ Caching
❖ Easy to add, hard to live with
❖ Code complexity
❖ Invalidation calls
❖ Dependency tracking
❖ Business customers care about data freshness
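The "easy to add, hard to live with" trade-off shows up even in a tiny sketch. Here `prices`, `get_price`, and `set_price` are hypothetical; `functools.lru_cache` is used only as one convenient way to add a cache.

```python
import functools

prices = {"widget": 10}   # hypothetical backing store
lookups = {"count": 0}    # counts real (uncached) fetches


@functools.lru_cache(maxsize=128)
def get_price(name):
    # Hypothetical slow lookup.
    lookups["count"] += 1
    return prices[name]


def set_price(name, value):
    prices[name] = value
    get_price.cache_clear()  # the invalidation call every writer must remember


first = get_price("widget")
second = get_price("widget")   # served from the cache
prices["widget"] = 12          # a write that bypasses set_price()
stale = get_price("widget")    # the cache still returns the old value
set_price("widget", 15)
fresh = get_price("widget")    # correct again after invalidation
print(first, second, stale, fresh, lookups["count"])
```

Adding the cache took one line; the stale read is the part that is hard to live with.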
Thank you!

