Is Mutex+Chan Version of Once Better Than sync.Once?

In my previous blog, packages.Load jitters, I said the jitters were caused by spawning too many goroutines, which made synchronization take a lot of time.

However, at the beginning I thought the Lock in sync.Once costs a lot, so I tried to replace sync.Once with a Mutex+Chan implementation. The result: sync.Once is still better.

Every call of sync.Once.Do tries to lock

After checking the atomic done flag, Do locks the mutex before executing f to ensure that only one call actually runs f. It also means that when Do is called simultaneously and f takes a long time to execute, the other goroutines keep trying to acquire the mutex until f finishes.

func (o *Once) Do(f func()) {
    if atomic.LoadUint32(&o.done) == 0 {
        // Outlined slow-path to allow inlining of the fast-path.
        o.doSlow(f)
    }
}
func (o *Once) doSlow(f func()) {
    o.m.Lock()
    defer o.m.Unlock()
    if o.done == 0 {
        defer atomic.StoreUint32(&o.done, 1)
        f()
    }
}
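
To make the blocking behavior concrete, here is a minimal sketch (my own illustration; slowInit and the one-second delay are invented): every goroutine that calls Do while f is still running blocks inside doSlow until f returns, so all callers observe roughly the same wait.

package main

import (
    "fmt"
    "sync"
    "time"
)

var once sync.Once

func slowInit() {
    time.Sleep(time.Second) // simulate an expensive one-time setup
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            start := time.Now()
            once.Do(slowInit) // every caller blocks here until slowInit returns
            fmt.Printf("goroutine %d returned after %v\n", id, time.Since(start))
        }(i)
    }
    wg.Wait()
}

All four goroutines report roughly one second, because Do does not return until the single execution of f completes.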

Usually a goroutine spins for a while before the mutex puts it to sleep (see my previous blog about mutex), but it can also go to sleep immediately if it fails to acquire the lock.

Use Mutex+Chan to implement Once

Here, I use TryLock to pick the goroutine that runs f, and let the state channel record whether initialization has completed. TryLock attempts to acquire the mutex only once and never blocks, while the channel receive lets the losing goroutines sleep immediately and wait to be woken.

type Once struct {
    mu    *sync.Mutex
    state chan struct{} // closed once f has finished
}

func NewOnce() *Once {
    return &Once{
        state: make(chan struct{}, 1),
        mu:    &sync.Mutex{},
    }
}

func (o *Once) Do(f func()) {
    if o.mu.TryLock() {
        // Winner: run f, then wake every waiter by closing the channel.
        // The mutex is deliberately never unlocked, so later TryLock
        // calls always fail and fall through to the channel receive.
        f()
        close(o.state)
        return
    }
    // Loser: sleep until the winner closes the channel.
    <-o.state
}
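
A minimal usage sketch (the main function and the print statement are my own illustration, assuming the Once type above is in the same package): f runs exactly once, and late callers return only after it completes.

package main

import (
    "fmt"
    "sync"
)

func main() {
    once := NewOnce()
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            once.Do(func() { fmt.Println("initialized") }) // printed exactly once
        }()
    }
    wg.Wait()
}

Note one behavioral difference from sync.Once: if f panics, close(o.state) never runs and every waiting goroutine blocks forever, whereas sync.Once marks itself done via the deferred atomic store even when f panics.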

Conclusion

The benchmark result shows that the Mutex+Chan version is about 2.5x slower than sync.Once. Repeatedly acquiring the mutex inside sync.Once is not the problem; the channel synchronization is actually slower.

I had overlooked the cost of channels and the efficiency of the mutex lock.

To reduce synchronization overhead, focus on not spawning unnecessary goroutines.

BenchmarkMutexChan
BenchmarkMutexChan-10              1    2210972750 ns/op
BenchmarkSyncOnce
BenchmarkSyncOnce-10               2     746455666 ns/op
BenchmarkMutexChan
BenchmarkMutexChan-10              1    2269286375 ns/op
BenchmarkSyncOnce
BenchmarkSyncOnce-10               2     767407875 ns/op
BenchmarkMutexChan
BenchmarkMutexChan-10              1    1914167458 ns/op
BenchmarkSyncOnce
BenchmarkSyncOnce-10               2     762076584 ns/op
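
Results of this shape come from go test -bench .; the -10 suffix in the benchmark names is GOMAXPROCS on the test machine. Absolute numbers will vary by machine, but the ratio between the two benchmarks should hold.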
Benchmark Code

// The package clause and imports are added so this listing compiles as a
// standalone once_test.go file.
package once

import (
    "sync"
    "testing"
    "time"
)

const (
    TURN    = 1_000_000
    SleepNs = 1_000_000
)

func f() {
    time.Sleep(SleepNs * time.Nanosecond)
}

type Once struct {
    mu    *sync.Mutex
    state chan struct{}
}

func NewOnce() *Once {
    return &Once{
        state: make(chan struct{}, 1),
        mu:    &sync.Mutex{},
    }
}

func (o *Once) Do(f func()) {
    if o.mu.TryLock() {
        f()
        close(o.state)
        return
    }
    <-o.state
}

func BenchmarkMutexChan(b *testing.B) {
    for n := 0; n < b.N; n++ {
        once := NewOnce()
        wg := sync.WaitGroup{}
        barrier := sync.WaitGroup{}
        barrier.Add(1)
        for i := 0; i < TURN; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                barrier.Wait() // park until every goroutine has been spawned
                once.Do(f)
            }()
        }
        barrier.Done() // release all goroutines at once to maximize contention
        wg.Wait()
    }
}

func BenchmarkSyncOnce(b *testing.B) {
    for n := 0; n < b.N; n++ {
        once := sync.Once{}
        wg := sync.WaitGroup{}
        barrier := sync.WaitGroup{}
        barrier.Add(1)
        for i := 0; i < TURN; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                barrier.Wait()
                once.Do(f)
            }()
        }
        barrier.Done()
        wg.Wait()
    }
}