Concurrency in Python - concepts, frameworks and best practices

This talk discusses:

  • concurrency concepts (e.g. atomicity, race conditions and deadlocks)

  • frameworks for concurrency and their use (`threading`, `multiprocessing`, `concurrent.futures`, `asyncio`)

  • higher abstractions, e.g. queues and active objects

  • best practices for writing concurrent code

Tags: Parallel Programming, Programming, Python

Scheduled on Friday at 10:30 in room lecture

Speaker

Stefan Schwarzer

Stefan has been using Python professionally for more than 15 years, the last 10+ years as a freelance software developer and consultant (https://sschwarzer.com/en/). He has published articles on Python and given talks on Python at several conferences. He's also the maintainer of the ftputil library (https://pypi.org/project/ftputil/).

Description

Have you run into situations where concurrent execution could speed up your Python code? Are you using a GUI toolkit?

This talk gives you the background to use concurrency in your code without shooting yourself in the foot - which is quite easy if you don't understand how concurrent execution differs from linear execution!

The presentation starts by explaining concepts such as concurrency, parallelism, resources, atomic operations, race conditions and deadlocks.
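To make the last of these concrete: a deadlock typically arises when two threads each hold one lock and wait for a lock the other thread holds. The following minimal sketch (not taken from the talk; `lock_a`, `lock_b`, `worker_1` and `worker_2` are illustrative names) deliberately hangs. The usual fix is to acquire locks in the same order everywhere.

```python
import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def worker_1():
    # Acquires lock_a first, then lock_b.
    with lock_a:
        time.sleep(0.1)       # make the unfortunate interleaving likely
        with lock_b:
            print("worker_1 got both locks")

def worker_2():
    # Acquires the locks in the *opposite* order - that's the bug.
    with lock_b:
        time.sleep(0.1)
        with lock_a:
            print("worker_2 got both locks")

threads = [threading.Thread(target=worker_1), threading.Thread(target=worker_2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()             # never returns: each thread waits for the other's lock
```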

Then we discuss the commonly used approaches to concurrency: multithreading with the `threading` module, multiprocessing with the `multiprocessing` module, and event loops (including the `asyncio` framework). Each of these approaches has its typical use cases, which are explained.
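As a rough illustration of the first and third approach, here's a sketch (not taken from the talk) that runs the same simulated I/O-bound task once with threads and once with `asyncio`. For CPU-bound work you'd reach for `multiprocessing` instead, because CPython's GIL prevents threads from running Python bytecode in parallel.

```python
import asyncio
import threading
import time

def blocking_task(name):
    """A blocking, I/O-bound job, simulated with a sleep."""
    time.sleep(1.0)           # stands in for e.g. a network request
    print("thread version done:", name)

# 1) threading: each blocking task runs in its own OS thread.
threads = [threading.Thread(target=blocking_task, args=(name,))
           for name in ("a", "b", "c")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()             # roughly 1 second in total, not 3

# 2) asyncio: one thread, one event loop; coroutines yield at `await` points.
async def async_task(name):
    await asyncio.sleep(1.0)  # non-blocking counterpart of time.sleep()
    print("asyncio version done:", name)

async def main():
    await asyncio.gather(*(async_task(name) for name in ("a", "b", "c")))

asyncio.run(main())           # also roughly 1 second in total

# 3) For CPU-bound work, threads don't help in CPython because of the GIL;
#    `multiprocessing` (or a process pool) runs the work in separate processes.
```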

You can implement concurrency on a number of abstraction levels. The lowest level consists of primitives like locks, events, semaphores and so on. A higher abstraction level is using queues, typically with worker threads or processes. Even higher abstraction levels are active objects (hiding primitives or queues behind an API; this includes "actors", if you've heard of them), the thread and process pools in `concurrent.futures`, and the `asyncio` framework. Finally, you can "outsource" concurrency by leaving it to a message broker, a distinct process that receives and distributes messages.
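To give an idea of the queue-based level, here's a minimal sketch, not taken from the talk: a few worker threads consume items from a `queue.Queue`, followed by the same job expressed with a `concurrent.futures` thread pool, which hides the queue and the workers behind a small API.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

work_queue = queue.Queue()

def worker():
    """Consume items from the queue until a `None` sentinel arrives."""
    while True:
        item = work_queue.get()
        if item is None:          # sentinel: no more work for this worker
            break
        print("processed", item)

workers = [threading.Thread(target=worker) for _ in range(3)]
for thread in workers:
    thread.start()
for item in range(10):
    work_queue.put(item)
for _ in workers:
    work_queue.put(None)          # one sentinel per worker thread
for thread in workers:
    thread.join()

# The same idea, one abstraction level higher: the thread pool from
# `concurrent.futures` hides the queue and the worker threads behind an API.
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(lambda item: item * 2, range(10)))
print(results)
```

A side effect of the worker pattern: if only a single worker touches a given resource, all access to that resource is automatically serialized, which connects to the best practices below.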

The talk closes with some tips and best practices, mainly:

  • Don't use concurrency if you don't have to.

  • Keep it simple. "Simple" usually doesn't mean using primitives like locks, but rather using higher abstractions if you can.

  • Operations that look atomic may not be atomic. For example, if descriptors are used, an "attribute access" may do arbitrarily complex things. If there's any doubt, assume an operation is not atomic (see the sketch after this list).

  • Try to hide concurrency behind an API. In particular, serialize access to a resource by letting a single thread or process (if you work with threads or processes) handle that resource.

  • Defects in concurrent code are often difficult to expose. If your code seems to work, it doesn't mean it will work on a different computer, on a complex network, under high load etc. Think about what you're doing and what could go wrong. (Of course, this applies to coding in general, but even more so to concurrent code.)
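To illustrate the atomicity tip: the seemingly atomic `counter += 1` is really a read-modify-write sequence. The sketch below (not taken from the talk) spells the steps out and inserts `time.sleep(0)` between them only to make thread switches, and hence the race, easy to observe; the lock in the safe version makes the sequence effectively atomic.

```python
import threading
import time

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    """Increment the global counter n times without any locking."""
    global counter
    for _ in range(n):
        value = counter      # read ...
        time.sleep(0)        # ... yield to other threads, so the race shows up ...
        counter = value + 1  # ... write back; concurrent updates get lost

def safe_increment(n):
    """Same operation, but with the read-modify-write step inside a lock."""
    global counter
    for _ in range(n):
        with lock:
            counter += 1

def run(target, n_threads=4, n=1000):
    global counter
    counter = 0
    threads = [threading.Thread(target=target, args=(n,)) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter

print("without lock:", run(unsafe_increment))  # usually far less than 4000
print("with lock:   ", run(safe_increment))    # always 4000
```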