Deadlocks in non-hierarchical CSP

Last updated 272 days ago by Roman Elizarov

kotlin

It is well-known that programs with shared mutable state protected by locks are prone to deadlocks, but it is less known that programs without shared mutable state, based either on Communicating Sequential Processes (CSP) or based on an actor model can deadlock, too, despite them not using locks at all. They deadlock in a different way, known as communication deadlock.

See, the formal definition of a deadlock is not directly tied to locks. The set of processes (threads/coroutines/actors) is in a deadlock when each one is waiting for the next one in a loop. We’ll see a concrete example soon, but first I need to apologize for not highlighting this issue before.

The stage

During KotlinConf 2018 I gave a talk titled “Kotlin Coroutines in Practice” (video and slides) in which I’ve shown the importance of structured concurrency to build reliable software in practice and gave a glimpse of some advanced concepts of CSP programming style as implemented in [kotlinx.coroutines](https://github.com/Kotlin/kotlinx.coroutines) library. In that section of my talk lies a problem. If you write all the code from my presentation verbatim, you are bound to get into a deadlock. Somehow I fell into this trap, thinking it is safe in my case, but no. So let us dissect it.

To start, note that all CSP-like architectures (including actor model) are safe if an application is structured as a data processing pipeline, where incoming messages enter into the system, get processed by different coroutines in sequence, and sent to the outside world at the end. More broadly, the code is trivially deadlock-free as long as it has a hierarchy (or DAG — directed acyclic graph) of coroutines, where higher-level coroutines only send messages to lower-level ones, but not vice versa, and each coroutine receives all incoming messages in one place, reacting to them by sending messages downstream.

Read full Article