Go Sync Mutex Implementation¶

This blog analyzes the implementation of Go sync/mutex after briefing the background knowledge in the previous blog Prerequisite of Go Sync Mutex Implementation.

This blog only concerns mutex.go. It introduces the internal logic of mutex and explains why the mutex shouldn't be copied and why the mutex isn't reentrant.

Brief¶

Mutex is the Mut-ual Ex-clusion, a property of concurrency control, which is instituted for the purpose of preventing race conditions. According to the implementation, the mutex contains several ways during Lock.

The implementation has three paths to acquire the mutex, and the author named slow path and fast path.

The slow paths have both busy-waiting(also called spinning) and a blocked model, which waits to be awoken and then continues to execute.

Fast Path: Try CAS¶

CAS is used directly to see whether it could lock the mutex. Otherwise it enters the slow path.

    if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
        if race.Enabled {
            race.Acquire(unsafe.Pointer(m))
        }
        return
    }

Slow path: Spin¶

After entering slow path, the go routine which tries to lock the mutex spins active_spin(4) times. Spinning consumes the CPU to avoid sleeping and go routine switching. The spinning itself doesn't try to acquire the mutex, and it's the way to wait a bit for the next try.

Slow Path: Stuck by semacquire¶

Spinning is a way to wait for the next try, and the semacquire is the same. Different from spinning, semacquire stuck the go routine until the go routine is awoken again once the Unlock is called.

More about the Slow Path¶

Unlike the fast path which tries to lock the mutex only, slow path contains more logics. It handles the mutex states, manages the go routines attempting to lock either spinning or semacquire, and let the go routine try to lock. The slow paths above are only the management of the go routines, which defines rules to schedule go routines in the mutex package level by utilizing the runtime.

Later, we will see the states of the mutex and the logic for locking attempt.

Mutex Representation¶

The definition of a mutex is very straightforward: a structure holds only two integers.

type Mutex struct {
    state int32
    sema  uint32
}

However, more complexities are hidden inside the implementation. The integers are required by the CAS operations. They shouldn't be treated as integer variables at all.

This hints the convention that Mutex must not be copied after first use as well. Generally, the bits of integers is not the only part of a mutex, the go routines related with the mutex make sense as well.

Mutex Field "state"¶

The field state is used as a bitmap to maintain the states of a mutex in a batch and could be changed through a CAS call. It is consisted by the four fields:

+--------+-------------------------------+----------+-------+--------+
| Syntax | -------------waiters--------- | Starving | Woken | Locked |
+--------+-------------------------------+----------+-------+--------+
|  Bit   | 31--------------------------3 |     2    |   1   |    0   |
+--------+-------------------------------+----------+-------+--------+

Waiters¶

The waiting number of go routines that want to acquire the mutex. This again demonstrates the state of integers and the corresponding go routines assemble a completed mutex.

Starving¶

Starving is not a concept at the mutex level, instead, it's a concept at the go routine level. It represents a situation where a mutex remains continuously locked, either by the same thread or by multiple threads, in such a way that another thread waiting to acquire the mutex is unable to do so for a long time.

Analogy of the Starving Concept

Making an analogy here, assume multiple customers joined a queue for a restaurant, and the servant will call the queue number with an algorithm. The starving here means one of the customers has joined the queue for a long time(beyond a certain time), but has had to wait for the servant to call him. In such a case, we can say this unlucky customer is starving. The starving is a concept at the customer(go routine) level instead of the restaurant(mutex) level. But at the same time, we can mark the restaurant is in starving mode because one of customers is starving, because the servant must take action for this customer with a higher priority.

The Starving bit is stored inside the field state to synchronize the starving information to all go routines who attempt to lock the mutex. The starving is designed for the mutex fairness.

Woken¶

Woken is the state of a mutex, which refers the available active go routines to attempt locking. It's used to communicate to help Unlock release the slept go routines through semarelase properly. For example, if the woken bit is 1, unlock won't release new go routines because there are active go routines already to avoid thundering herd.

Locked¶

A locked state means the mutex is locked.

Constant Definition¶

To represent those cases, Go defines some constants in src/sync/mutex.go:

const (
    // resolved value is 0b001, added by blog author
    mutexLocked = 1 << iota // mutex is locked
    // resolved value is 0b010, added by blog author
    mutexWoken
    // resolved value is ob100, added by blog author
    mutexStarving
    // resolved value is 3, added by blog author
    mutexWaiterShift = iota

    starvationThresholdNs = 1e6
)

Mutex Field "sema"¶

The field sema is used by runtime_SemacquireMutex to interact with the go routines who stuck there.

Generally, the lock operation tries to reduce its value and the unlock tries to increase it atomically both.

Lock: Quick Path¶

The quick path uses a CAS to try to lock the mutex:

func (m *Mutex) Lock() {
    if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
        // ignore lines, added by blog author
        return
    }
    // Slow path (outlined so that the fast path can be inlined)
    m.lockSlow()
}

The implication is only the contents of the state are 0, as the diagram shows below, and the logic executes.

If the three left parts(waiters, starving, and woken) have a non-zero value, the fast path will be skipped, and the slow path will be entered directly.

+------------+-------------------------------+----------+-------+--------+
| Syntax     | -------------waiters--------- | Starving | Woken | Locked |
+------------+-------------------------------+----------+-------+--------+
| Bit Value  | 00000000000000000000000000000 |     0    |   0   |    0   |
+------------+-------------------------------+----------+-------+--------+

Schema for CAS¶

The CAS operation has three typical operation: copy, processor* and writing back. The step of writing back requires to check the original value to ensure the other go routines haven't changed it yet. If the value is modified, the CAS returns false and the go routine needs to try again.

The mutex uses this idea to modify the state field as well, so understanding it correctly helps us to further reading the implementation code.

Lock: Slow Path¶

Spin and Check Woken¶

Spin will spin 4 times and then try to acquire the mutex with certain conditions. The awoken here is used to communicate with the Unlock call for the same mutex.

If the mutex is locked and not starving(no go routine is starving), and the go routine could spin, it will spin while setting the woken bit true to refer its activeness.

        // Don't spin in starvation mode, ownership is handed off to waiters
        // so we won't be able to acquire the mutex anyway.
        if old&(mutexLocked|mutexStarving) == mutexLocked && runtime_canSpin(iter) {
            // Active spinning makes sense.
            // Try to set mutexWoken flag to inform Unlock
            // to not wake other blocked goroutines.
            if !awoke && old&mutexWoken == 0 && old>>mutexWaiterShift != 0 &&
                atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken) {
                awoke = true
            }
            runtime_doSpin()
            iter++
            old = m.state
            continue
        }

Then, the continue yields and the turn starts. In normal case, it will spin 4 times and then try to lock the mutex again.

Starving Check for Locking¶

Starving check prevents current go routine from acquiring the mutex due to the starvation of the other go routine.

After spinning, a new variable new, which is the copied value of state mentioned in the schema of CAS section. is defined for the lock acquiring later.

    new := old
    // Don't try to acquire starving mutex, new arriving goroutines must queue.
    if old&mutexStarving == 0 {
        new |= mutexLocked
    }

The modification of the new variable will later be applied to the mutex through CAS to really acquire the mutex. Hence, we could discuss in three cases:

The starving mode: the starved go routines need to retrieve the mutex first, so we don't update the lock bit which refers whether the mutex is locked.
the locked bit is 0: it's possible to acquire the mutex, so set the locked bit to 1 for future CAS. If the CAS returns true, it succeeds locking the mutex. Otherwise, it needs to try again.
the locked bit is 1: because or operation here doesn't change the value of new, it won't take effects later.

Increase Waiter Number¶

If either Locked or Starving flag is 1, the waiter will increase one because we know this go routine won't acquire the mutex now.

    if old&(mutexLocked|mutexStarving) != 0 {
        new += 1 << mutexWaiterShift
    }

Starving Switching¶

The starving variable refers to whether the current go routine is under starving mode. It's assigned after awoken from the semacquire call.

If the mutex is locked and the function scope variable starving variable is true, it refers the current go routine is starving.

As a result, the starving bit is set into 1 to notify the other go routines of its starving.

    // The current goroutine switches mutex to starvation mode.
    // But if the mutex is currently unlocked, don't do the switch.
    // Unlock expects that starving mutex has waiters, which will not
    // be true in this case.
    if starving && old&mutexLocked != 0 {
        new |= mutexStarving
    }

Awoken Setting¶

Then slowPath defines a function scope variable awoke to refer whether the current go routine is awoken from sleeping by runtime_semacquire. It's assigned later after the runtime_semacquire stops blocking.

    if awoke {
        // The goroutine has been woken from sleep,
        // so we need to reset the flag in either case.
        if new&mutexWoken == 0 {
            throw("sync: inconsistent mutex state")
        }
        new &^= mutexWoken
    }

The awoke bit marks the activeness of the current go routines. If the current go routine is awoken from sleeping, the woken bit is reset as once it acquires the lock the mutex doesn't have active go routines.

Acquire Mutex by CAS¶

All logics mentioned above is the processor shown in the schema, and in this section we will try to write the modified copied value back, which means the go routine tries to acquire the mutex.

        if atomic.CompareAndSwapInt32(&m.state, old, new) {
            // ignore lines, added by blog authors
        } else {
            old = m.state
        }

Then, the new state is swapped into the state of mutex if the state equals to the old value(no other go routine has updated the state yet).

Here, we need to be aware of the change between the old and the new state, which is a collection of the state changes in the previous topics:

The new lock flag is always 1 if no starving
The new waiter number increases 1
The new starving flag is true if there are starving
The new awoke flag is flipped if the old could awoke(TODO: check the usage of awoke)

Here we need to highlight the mutex acquiring and go routine unblocking(by the semacquire) are two different things. Unblocking at the runtime_SemacquireMutex doesn't mean the successfully retrieving lock. It just stands for the case the go routine has woken to try locking.

Unblock if the condition is satisfied¶

If neither locked nor starving bit is 1, this endless loop is yielded to break directly. It again demonstrates that the CAS doesn't manage the go routines and controls to unleash blocked go routines. The internal state transfer of the mutex decides the lock logic.

Here, the CAS makes sure the unlocked case can be occupied by a single go routine; after that all other go routines will read the new loaded state, which has been locked already.

if atomic.CompareAndSwapInt32(&m.state, old, new) {
    if old&(mutexLocked|mutexStarving) == 0 {
        break // locked the mutex with CAS
    }
    // ignore lines, added by blog author
}

Besides this logic, the other logic focuses on the state transferring for the mutex.

Blocked at runtime_SemacquireMutex¶

Each Lock attempt in a separate go routine holds its own for loop, so each holds a waitStartTime to mark the waited time. We can utilize it to decide the priority when joining the queue, which might be released when the value of &m.sema is larger than 0.

    // If we were already waiting before, queue at the front of the queue.
    queueLifo := waitStartTime != 0
    if waitStartTime == 0 {
        waitStartTime = runtime_nanotime()
    }
    runtime_SemacquireMutex(&m.sema, queueLifo, 1)

Again, the unblocking doesn't have any meaning of succeeding locking. It just means the current awoke from the sleep state. The successful locking is marked when the current(old) state is locked and not starving, which is mentioned above.

Back to Spin¶

After unblocked from the runtime_SemacquireMutex, the awoke and iter is set so it could spin again:

    awoke = true
    iter = 0

Handle Starving¶

If the current go routine is under the starving mode, it acquires the lock at the highest priority.

    if old&mutexStarving != 0 {
        // If this goroutine was woken and mutex is in starvation mode,
        // ownership was handed off to us but mutex is in somewhat
        // inconsistent state: mutexLocked is not set and we are still
        // accounted as waiter. Fix that.
        if old&(mutexLocked|mutexWoken) != 0 || old>>mutexWaiterShift == 0 {
            throw("sync: inconsistent mutex state")
        }
        delta := int32(mutexLocked - 1<<mutexWaiterShift)
        if !starving || old>>mutexWaiterShift == 1 {
            // Exit starvation mode.
            // Critical to do it here and consider wait time.
            // Starvation mode is so inefficient, that two goroutines
            // can go lock-step infinitely once they switch mutex
            // to starvation mode.
            delta -= mutexStarving
        }
        atomic.AddInt32(&m.state, delta)
        break
    }

Conclusion of the Slow Path¶

As we can see, Lock's logic is not as hard as imagined. After failing to lock in the fast path, all go routines switch their states between sleeping into runtime_SemacquireMutex and spinning to retrieve the mutex when the mutex is unlocked and un-starving. This check is protected by the CAS operation so only one check could happen simultaneously.

Besides the normal case, there is a fallback logic for the starving go routines.

Unlock¶

Fast Path: Drop to Unlock¶

The Unlock drops the lock bit to unlock the mutex. If the updated value isn't zero, we need to handle more cases such as waking the other blocked go routines and so on.

func (m *Mutex) Unlock() {
    // Fast path: drop lock bit.
    new := atomic.AddInt32(&m.state, -mutexLocked)
    if new != 0 {
        // Outlined slow path to allow inlining the fast path.
        // To hide unlockSlow during tracing
        // we skip one extra frame when tracing GoUnblock.
        m.unlockSlow(new)
    }
}

Slow Path: Handle Starving and Normal Cases¶

The unlockSlow focuses on the invoking of the other sleeping go routines by setting up the woken bits.

In non-starving mode, if there are no waiters or a goroutine has already been woken or grabbed the lock, no need to wake anyone. Otherwise, it tries to reduce one waiter, uses CAS to change the state, and call runtime_Semrelease.

Under starving mode, it calls runtime_Semrelease with handoff as true to release the first waiter.

FAQ¶

Why mutex shouldn't be copied¶

As we mentioned, the two integers stand for a lot of states of a mutex. Combining the semacquire and semrelease APIs, the mutex internal state(field state and sema) is updated on time correctly.

Hence, if we simply copy the mutex(with its value), the internal state might have a problem when we try to use it as a new mutex.

Reentrant¶

A reentrant mutex (also known as a recursive mutex) allows the same thread to acquire the mutex multiple times without causing a deadlock. As the source code above demonstrates, go mutex doesn't record the go routine identify and uses the semacquire to control when the mutex is acquired.

As a result, go mutex isn't reentrant. Unpaired lock-unlocked calls will cause problems. For example, docker has encountered such an issue before.