Getting started with
   Concurrency
  ...using Multiprocessing and Threading

         PyWorks, Atlanta 2008
            Jesse Noller
Who am I?
•   Just another guy.
•   Wrote PEP 371- “Addition of the multiprocessing package”
•   Did a lot of the integration, now the primary point of
    contact.
    •   This means I talk a lot on the internet.
•   Test Engineering is my focus - many of those systems are
    distributed and/or concurrent.
    •   This stuff is not my full time job.
What is concurrency?


•   Simultaneous execution

•   Potentially interacting tasks

•   Uses multi-core hardware

•   Includes Parallelism.
What is a thread?

•   Share the memory and state of the parent.
•   Are “light weight”
•   Each gets its own stack
•   Do not use Inter-Process Communication or messaging.

•   POSIX “Threads” - pthreads.
What are they good for?

•   Adding throughput and reducing latency within most
    applications.
    •   Throughput: adding threads allows you to process
        more information faster.
    •   Latency: adding threads makes the application react
        faster, such as GUI actions (see the sketch below).

•   Algorithms which rely on shared data/state
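A minimal sketch of the latency point, assuming two placeholder
URLs (example.com and example.org are stand-ins): the blocking
urlopen() calls overlap, so total wall time is roughly the slowest
fetch rather than the sum.

# Hedged sketch: overlap blocking I/O with threads
from threading import Thread
import urllib2

URLS = ["http://example.com/", "http://example.org/"]  # placeholders

def fetch(url):
    data = urllib2.urlopen(url).read()  # blocks; other threads keep going
    print "%s: %d bytes" % (url, len(data))

threads = [Thread(target=fetch, args=(u,)) for u in URLS]
for t in threads:
    t.start()
for t in threads:
    t.join()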
What is a process?

•   An independent path of execution.
•   Processes are “share nothing”.
•   Must use some form of Inter-Process
    Communication to communicate/coordinate.

•   Processes are “big”.
Uses for Processes.

•   When you don’t need to share lots of state and want a large
    amount of throughput.
•   Shared-nothing-or-little is “safer” than shared-everything.
•   Processes automatically run on multiple cores.
•   Easier to turn into a distributed application.
The Difference

•   Threads are implicitly “share everything” - this forces the
    programmer to protect (lock) anything which will be shared
    between threads (see the locking sketch below).
•   Processes are “share nothing” - programmers must explicitly
    share any data/state - this means that the programmer is
    forced to think about what is being shared.
•   Explicit is better than Implicit.
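A minimal sketch of that implicit sharing: the counter below is
visible to every thread, so the read-modify-write has to be locked
by hand. The bump() helper is made up for illustration.

from threading import Thread, Lock

counter = 0
lock = Lock()

def bump(n):
    global counter
    for _ in xrange(n):
        with lock:          # protect the shared read-modify-write
            counter += 1

threads = [Thread(target=bump, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print counter  # 400000 with the lock; unpredictable without it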
Python Threads


•   Python has threads; they are real, OS/kernel-level POSIX (p)
    threads.
    •   When you use threading.Thread, you get a pthread.
2.6 changes

•   camelCase method names are now foo_bar() style, e.g.:
    active_count, current_thread, is_alive, etc.
•   Attributes of threads and processes have been turned into
    properties.
    •   E.g.: daemon is now Thread.daemon = <bool>
•   For threading, these changes are optional: the old methods
    still exist (both spellings are shown below).
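A quick sketch showing both spellings side by side; every name
here exists in 2.6.

import threading

t = threading.Thread(target=lambda: None)
t.daemon = True                   # property; was t.setDaemon(True)
t.start()
print t.is_alive()                # was t.isAlive()
print threading.active_count()    # was threading.activeCount()
print threading.current_thread()  # was threading.currentThread()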
...


Python does not use “green threads”. It has real threads - OS
ones. Stop saying it doesn’t.
...


But: Python only allows a single thread to be executing within
the interpreter at once. This restriction is enforced by the
GIL.
The GIL


•   GIL: “Global Interpreter Lock” - this is a lock which must be
    acquired for a thread to enter the interpreter’s space.
•   Only one thread may be executing within the Python
    interpreter at once.
Yeah but...
•   No, it is not a bug.
•   It is an implementation detail of the CPython interpreter.
•   It makes interpreter maintenance easier.
•   It makes creation of new C extension modules easier.
•   It is (mostly) sidestepped if the app is I/O (file, socket)
    bound - see the sketch below.
•   A threaded app which makes heavy use of sockets won’t see
    a huge GIL penalty: it is still there, though.
•   It is not going away right now.
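A rough sketch of the I/O-bound case, using time.sleep() as a
stand-in for a blocking socket read (both release the GIL while
blocked): four one-second waits overlap and finish in about one
second, not four.

import time
from threading import Thread

def blocking_io():
    time.sleep(1)   # the GIL is released while this thread blocks

start = time.time()
threads = [Thread(target=blocking_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print "took %.1f seconds" % (time.time() - start)  # ~1.0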
The other guys

•   Jython: No GIL, allows “free” threading by using the
    underlying Java threading system.
•   IronPython: No GIL - uses the underlying CLR, threads all
    run concurrently.
•   Stackless: Has a GIL. But has micro threads.
•   PyPy: Has a GIL... for now (dun dun dun)
Enter Multiprocessing
What is multiprocessing?

•   Follows the threading API closely but uses Processes and
    inter-process communication under the hood
•   Also offers distributed-computing facilities.
•   Allows the side-stepping of the GIL for CPU bound
    applications.
•   Allows for data/memory sharing.
•   CPython only.
Why include it?

•   Covered in PEP 371, we wanted to have something which was
    fast and “freed” many users from the GIL restrictions.
•   Wanted to add to the concurrency toolbox for Python as a
    whole.
    •   It is not the “final” answer, nor is it “feature complete”.
•   Oh, and it beats the threading module in speed.*
                          * lies, damned lies and benchmarks
How much faster?


•   It depends on the problem.
•   For example: for number crunching, it’s significantly faster
    than adding threads.
•   It is also faster in wide-finder/map-reduce situations.
•   Process creation can be sluggish: create the workers up front.
Example: Crunching Primes


•       Yes, I picked something embarrassingly parallel.
•       Sum all of the primes in a range of integers starting from
        1,000,000 and going to 5,000,000.
•       Run on an 8 Core Mac Pro with 8 GB of ram with Python 2.6,
        completely idle, except for iTunes.
    •     The single threaded version took so long I needed music.
# Single threaded version
import math

def isprime(n):
    """Returns True if n is prime and False otherwise"""
    if not isinstance(n, int):
        raise TypeError("argument passed to isprime is not of 'int' type")
    if n < 2:
        return False
    if n == 2:
        return True
    max = int(math.ceil(math.sqrt(n)))
    i = 2
    while i <= max:
        if n % i == 0:
            return False
        i += 1
    return True

def sum_primes(n):
    """Calculates sum of all primes below given integer n"""
    return sum([x for x in xrange(2, n) if isprime(x)])

if __name__ == "__main__":
    for i in xrange(100000, 5000000, 100000):
        print sum_primes(i)
# Multi Threaded version
from threading import Thread
from Queue import Queue, Empty
...
def do_work(q):
    while True:
        try:
             x = q.get(block=False)
             print sum_primes(x)
        except Empty:
             break

if __name__ == "__main__":
    work_queue = Queue()
    for i in xrange(100000, 5000000, 100000):
        work_queue.put(i)

    threads = [Thread(target=do_work, args=(work_queue,)) for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
# Multiprocessing version
from multiprocessing import Process, Queue
from Queue import Empty
...

if __name__ == "__main__":
    work_queue = Queue()
    for i in xrange(100000, 5000000, 100000):
        work_queue.put(i)

    processes = [Process(target=do_work, args=(work_queue,)) for i in range(8)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
Results

•   All results are in wall-clock time.
•   Single Threaded: 41 minutes, 57 seconds
•   Multi Threaded (8 threads): 106 minutes, 29 seconds
•   Multiprocessing (8 processes): 6 minutes, 22 seconds
•   This is a trivial example. More benchmarks/data were
    included in the PEP.
The catch.
•   Objects that are shared between processes must be
    serializable (picklable) - a quick check is sketched below.
    •   40,921 objects/sec versus 24,989 objects/sec.
•   Processes are “heavy-weight”.
•   Processes can be slow to start (on Windows).
•   Supported on Linux, Solaris, Windows, OS X - but not *BSD,
    and possibly others.
•   If you are creating and destroying lots of workers, processes
    carry a significantly higher cost than threads.
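The pickle requirement can be checked up front; a sketch, with a
made-up is_picklable() helper:

import pickle

def is_picklable(obj):
    """Hypothetical helper: True if obj can cross the process boundary."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError):
        return False

print is_picklable({"work": [1, 2, 3]})  # True
print is_picklable(lambda x: x)          # False: lambdas don't pickle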
API Time
It starts with a Process

•   Exactly like threading:
    •    Thread(target=func, args=(args,)).start()
    •    Process(target=func, args=(args,)).start()
•   You can subclass multiprocessing.Process exactly as you
    would with threading.Thread.
from threading import Thread
threads = [Thread(target=do_work, args=(q,)) for i in range(8)]

from multiprocessing import Process
processes = [Process(target=do_work, args=(q,)) for i in range(8)]
# Multiprocessing version
from multiprocessing import Process

class MyProcess(Process):
    def __init__(self):
        Process.__init__(self)
    def run(self):
        a, b = 0, 1
        for i in range(100000):
            a, b = b, a + b

if __name__ == "__main__":
    p = MyProcess()
    p.start()
    print p.pid
    p.join()
    print p.exitcode

# Threading version
from threading import Thread

class MyThread(Thread):
    def __init__(self):
        Thread.__init__(self)
    def run(self):
        a, b = 0, 1
        for i in range(100000):
            a, b = b, a + b

if __name__ == "__main__":
    t = MyThread()
    t.start()
    t.join()
Queues


•   multiprocessing includes two Queue implementations - Queue
    and JoinableQueue.
•   Queue is modeled after Queue.Queue but uses pipes
    underneath to transmit the data.
•   JoinableQueue is the same as Queue except it adds a .join()
    method and .task_done(), à la Queue.Queue in Python 2.5
    (sketched below).
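A minimal JoinableQueue sketch: join() on the queue itself blocks
until every item that was put() has been marked with task_done().
The worker() function is invented for the example.

from multiprocessing import JoinableQueue, Process

def worker(q):
    while True:
        item = q.get()
        print "processed", item
        q.task_done()

if __name__ == "__main__":
    q = JoinableQueue()
    p = Process(target=worker, args=(q,))
    p.daemon = True      # the worker dies with the parent
    p.start()

    for i in range(5):
        q.put(i)
    q.join()             # returns once all 5 items are task_done()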
Queue.warning

•   First: if you call .terminate() on (or kill) a process which is
    currently accessing a queue, that queue may become corrupted.
•   Second: any Queue that a Process has put data on must be
    drained prior to joining the processes which have put data
    there; otherwise, you’ll get a deadlock.
    •   Avoid this by calling Queue.cancel_join_thread() in
        the child process.
    •   Or just eat everything on the results pipe before calling
        join (e.g. work_queue, results_queue) - see the sketch below.
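One safe pattern, sketched with a made-up doubling worker: eat
everything off the results queue before joining, so no child blocks
while flushing its queue buffer. A None sentinel per worker marks
the end of the work.

from multiprocessing import Process, Queue

def worker(work_q, result_q):
    while True:
        item = work_q.get()   # blocking get
        if item is None:      # sentinel: no more work
            break
        result_q.put(item * 2)

if __name__ == "__main__":
    work_q, result_q = Queue(), Queue()
    for i in range(100):
        work_q.put(i)
    for _ in range(4):        # one sentinel per worker
        work_q.put(None)

    procs = [Process(target=worker, args=(work_q, result_q))
             for _ in range(4)]
    for p in procs:
        p.start()

    results = [result_q.get() for _ in range(100)]  # drain first...
    for p in procs:
        p.join()                                    # ...then join safely
    print sum(results)   # 9900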
Pipes and Locks

•   Multiprocessing supports communication primitives.
    •   multiprocessing.Pipe() returns a pair of Connection
        objects which represent the ends of the pipe (a minimal
        example follows this list).
    •   The data sent on the connection must be pickle-able.
•   Multiprocessing has clones of the threading module’s
    Lock/RLock, Event, Condition and Semaphore objects.
    •   Most of these support timeout arguments, too!
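A minimal Pipe() sketch: each Connection end can send() and recv()
pickle-able objects.

from multiprocessing import Process, Pipe

def child(conn):
    conn.send({"hello": "parent"})   # must be pickle-able
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=child, args=(child_end,))
    p.start()
    print parent_end.recv()          # {'hello': 'parent'}
    p.join()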
Shared Memory
•   Multiprocessing has a sharedctypes module.
•   This module allows you to create a ctypes object in shared
    memory and share it with other processes.
•   The sharedctypes module offers some safety through the use/
    allocation of locks which prevent simultaneous accessing/
    modification of the shared objects (the lock can also be used
    explicitly, as sketched below).

from multiprocessing import Process
from multiprocessing.sharedctypes import Value
from ctypes import c_int

def modify(x):
    x.value += 1

if __name__ == "__main__":
    x = Value(c_int, 7)
    p = Process(target=modify, args=(x,))
    p.start()
    p.join()
    print x.value
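The locks mentioned above can also be taken by hand; a sketch,
assuming the default lock=True, which exposes the lock via
get_lock():

from multiprocessing import Process, Value
from ctypes import c_int

def bump(counter):
    for _ in range(1000):
        with counter.get_lock():   # serialize the read-modify-write
            counter.value += 1

if __name__ == "__main__":
    counter = Value(c_int, 0)
    procs = [Process(target=bump, args=(counter,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print counter.value            # 4000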
Pools


•   One of the big “ugh” moments when using threading is when
    you have a simple problem you just want to pass to a pool of
    workers to hammer out.
•   Fact: There are more thread pool implementations out there
    than stray cats in my neighborhood.
Process Pools!
•   Multiprocessing has the Pool object. This supports the up-front
    creation of a number of processes and a number of methods of
    passing work to the workers.
•   Pool.apply() - this is a clone of the built-in apply() function.
    •   Pool.apply_async() - which can call a callback for you
        when the result is available.
•   Pool.map() - again, a parallel clone of the built-in function.
    •   Pool.map_async() - which can also take a callback to
        ring up when the results are done.
•   Fact: Functional programming people love this!
Pools raise insurance rates!

  from multiprocessing import Pool

  def f(x):
      return x*x

  if __name__ == '__main__':
      pool = Pool(processes=2)
      result = pool.apply_async(f, (10,))
      print result.get()


The output is 100. Note that apply_async() returns an AsyncResult;
its .get() blocks until the value is ready. (A map()/map_async()
sketch follows.)
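The same pool doing the map-style calls, sketched: map() blocks and
returns the whole list, while map_async() hands the list to a
callback.

from multiprocessing import Pool

def f(x):
    return x * x

def on_done(results):
    print "callback got:", results

if __name__ == '__main__':
    pool = Pool(processes=2)
    print pool.map(f, range(5))                       # [0, 1, 4, 9, 16]
    r = pool.map_async(f, range(5), callback=on_done)
    r.wait()        # block until the callback has fired
    pool.close()
    pool.join()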
Managers

•   Managers are a network and process-based way of sharing data
    between processes (and machines).
•   The primary manager type is the BaseManager - this is the
    basic Manager object, and it can easily be subclassed to share
    data remotely.
•   A proxy object is the type returned when accessing a
    shared object - this is a reference to the actual object being
    exported by the manager.
Sharing a queue (server)
# Manager Server
from Queue import Empty
from multiprocessing import managers, Queue

_queue = Queue()
def get_queue():
    return _queue

class QueueManager(managers.BaseManager): pass

QueueManager.register('get_queue', callable=get_queue)

m = QueueManager(address=('127.0.0.1', 8081), authkey="lol")
_queue.put("What's up remote process")

s = m.get_server()
s.serve_forever()
Sharing a queue (client)

# Manager Client
from multiprocessing import managers

class QueueManager(managers.BaseManager): pass

QueueManager.register('get_queue')

m = QueueManager(address=('127.0.0.1', 8081), authkey="lol")
m.connect()
remote_queue = m.get_queue()
print remote_queue.get()
Gotchas

•   Processes which feed into a multiprocessing.Queue will block
    waiting for all the objects they put there to be removed.
•   Data must be pickle-able: this means some objects (for
    instance, GUI ones) can not be shared.
•   Arguments to proxy (manager) methods must be pickle-able
    as well.
•   While it supports locking/semaphores: using those means
    you’re sharing something you may not need to be sharing.
In Closing

•   Multiple processes are not mutually exclusive with using
    Threads
•   Multiprocessing offers a simple and known API
    •   This lowers the barrier to entry significantly
    •   Sidesteps the GIL
•   In addition to “just processes”, multiprocessing offers the
    start of grid-computing utilities
Questions?
