
Iceoryx2 Lockfree Structures Benchmark

This page lists benchmark results for the iceoryx2 lockfree structures. I started with the assumption that the lockfree structures would be faster. That assumption did not hold: in my test cases, there is no significant difference between the locked and lockfree implementations.

You can view my benchmark code here.

Rust's built-in bench harness is not stable yet, so the nightly toolchain is required:

rustup install nightly
rustup default nightly  

BitSet

  • Spawn 3 threads:
test sets_bench_test::bench_lock_set     
    ... bench:     142,236.56 ns/iter (+/- 45,850.27)
test sets_bench_test::bench_lockfree_set 
    ... bench:     140,587.12 ns/iter (+/- 16,576.66)
test sets_bench_test::bench_lock_set     
    ... bench:     140,909.46 ns/iter (+/- 30,108.68)
test sets_bench_test::bench_lockfree_set 
    ... bench:     143,902.79 ns/iter (+/- 38,403.69)
  • Spawn 10 threads:
test sets_bench_test::bench_lock_set     
    ... bench:     510,333.57 ns/iter (+/- 105,315.37)
test sets_bench_test::bench_lockfree_set 
    ... bench:     476,959.17 ns/iter (+/- 106,816.62)
test sets_bench_test::bench_lock_set     
    ... bench:     509,264.11 ns/iter (+/- 104,714.02)
test sets_bench_test::bench_lockfree_set 
    ... bench:     564,790.84 ns/iter (+/- 231,446.76)
test sets_bench_test::bench_lock_set     
    ... bench:     479,311.18 ns/iter (+/- 32,482.02)
test sets_bench_test::bench_lockfree_set 
    ... bench:     476,138.56 ns/iter (+/- 42,120.09)
  • Spawn 100 threads:
test sets_bench_test::bench_lock_set     
    ... bench:   9,093,653.00 ns/iter (+/- 1,062,443.96)
test sets_bench_test::bench_lockfree_set 
    ... bench:   9,030,541.00 ns/iter (+/- 1,306,544.63)
test sets_bench_test::bench_lock_set     
    ... bench:   8,135,253.90 ns/iter (+/- 2,495,481.05)
test sets_bench_test::bench_lockfree_set 
    ... bench:   7,607,292.65 ns/iter (+/- 1,182,462.61)
test sets_bench_test::bench_lock_set     
    ... bench:   8,221,662.15 ns/iter (+/- 2,181,111.38)
test sets_bench_test::bench_lockfree_set 
    ... bench:   7,282,570.10 ns/iter (+/- 543,363.17)
  • Spawn 200 threads:
test sets_bench_test::bench_lock_set     
    ... bench:  17,611,426.80 ns/iter (+/- 3,339,318.58)
test sets_bench_test::bench_lockfree_set 
    ... bench:  17,492,900.80 ns/iter (+/- 4,899,939.38)
test sets_bench_test::bench_lock_set     
    ... bench:  19,397,568.90 ns/iter (+/- 5,940,216.88)
test sets_bench_test::bench_lockfree_set 
    ... bench:  18,374,333.40 ns/iter (+/- 5,099,306.18)
test sets_bench_test::bench_lock_set     
    ... bench:  17,886,734.80 ns/iter (+/- 3,985,477.79)
test sets_bench_test::bench_lockfree_set 
    ... bench:  18,035,225.70 ns/iter (+/- 4,363,520.09)
test sets_bench_test::bench_lock_set     
    ... bench:  17,759,799.80 ns/iter (+/- 3,051,784.73)
test sets_bench_test::bench_lockfree_set 
    ... bench:  18,572,965.40 ns/iter (+/- 6,688,336.47)
  • Spawn 500 threads:
test sets_bench_test::bench_lock_set     
    ... bench:  47,106,102.50 ns/iter (+/- 8,619,192.46)
test sets_bench_test::bench_lockfree_set 
    ... bench:  47,488,284.80 ns/iter (+/- 11,977,784.07)
test sets_bench_test::bench_lock_set     
    ... bench:  46,907,406.90 ns/iter (+/- 6,424,154.53)
test sets_bench_test::bench_lockfree_set 
    ... bench:  47,813,943.50 ns/iter (+/- 13,783,177.07)
test sets_bench_test::bench_lock_set     
    ... bench:  44,824,857.90 ns/iter (+/- 7,468,876.95)
test sets_bench_test::bench_lockfree_set 
    ... bench:  40,346,977.60 ns/iter (+/- 3,527,602.85)
test sets_bench_test::bench_lock_set     
    ... bench:  40,863,622.80 ns/iter (+/- 5,528,543.32)
test sets_bench_test::bench_lockfree_set 
    ... bench:  40,516,263.30 ns/iter (+/- 2,060,834.97)
  • Spawn 10000 threads:
test sets_bench_test::bench_lock_set     
    ... bench:  870,365,424.80 ns/iter (+/- 32,854,496.28)
test sets_bench_test::bench_lockfree_set 
    ... bench:  863,838,287.80 ns/iter (+/- 39,979,505.39)
test sets_bench_test::bench_lock_set     
    ... bench: 860,196,524.50 ns/iter (+/- 35,809,552.75)
test sets_bench_test::bench_lockfree_set 
    ... bench: 859,823,755.30 ns/iter (+/- 47,843,411.23)
  • Match the thread count to the CPU count (8 threads):
test sets_bench_test::bench_lock_set                       
    ... bench:     677,967.95 ns/iter (+/- 566,810.22)
test sets_bench_test::bench_lockfree_set                 
    ... bench:     628,967.59 ns/iter (+/- 186,699.22)
test sets_bench_test::bench_lock_set                       
    ... bench:     592,269.71 ns/iter (+/- 222,827.07)
test sets_bench_test::bench_lockfree_set                 
    ... bench:     607,693.88 ns/iter (+/- 1,065,052.54)
test sets_bench_test::bench_lock_set                       
    ... bench:     503,961.33 ns/iter (+/- 87,196.11)
test sets_bench_test::bench_lockfree_set                 
    ... bench:     502,034.43 ns/iter (+/- 78,611.69)
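
For a rough picture of what two BitSet variants like the ones compared above might look like, here is a minimal sketch. It is not the iceoryx2 implementation and not the benchmark code linked earlier: `LockedBitSet`, `AtomicBitSet`, and `run_workers` are hypothetical names, and the lockfree variant is assumed to be nothing more than a `Vec<AtomicU64>` updated with `fetch_or`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Locked variant (hypothetical): every `set` takes the mutex.
struct LockedBitSet {
    words: Mutex<Vec<u64>>,
}

impl LockedBitSet {
    fn new(bits: usize) -> Self {
        Self { words: Mutex::new(vec![0; (bits + 63) / 64]) }
    }

    /// Sets `bit`; returns true if this call flipped it from 0 to 1.
    fn set(&self, bit: usize) -> bool {
        let mut words = self.words.lock().unwrap();
        let mask = 1u64 << (bit % 64);
        let was_clear = words[bit / 64] & mask == 0;
        words[bit / 64] |= mask;
        was_clear
    }

    fn count(&self) -> u32 {
        self.words.lock().unwrap().iter().map(|w| w.count_ones()).sum()
    }
}

/// Lockfree variant (hypothetical): one atomic read-modify-write per `set`.
struct AtomicBitSet {
    words: Vec<AtomicU64>,
}

impl AtomicBitSet {
    fn new(bits: usize) -> Self {
        Self { words: (0..(bits + 63) / 64).map(|_| AtomicU64::new(0)).collect() }
    }

    /// Sets `bit`; returns true if this call flipped it from 0 to 1.
    fn set(&self, bit: usize) -> bool {
        let mask = 1u64 << (bit % 64);
        let previous = self.words[bit / 64].fetch_or(mask, Ordering::AcqRel);
        previous & mask == 0
    }

    fn count(&self) -> u32 {
        self.words.iter().map(|w| w.load(Ordering::Acquire).count_ones()).sum()
    }
}

/// Spawns `threads` workers; worker `t` sets bits `t * per_thread ..` onward.
fn run_workers<S: Send + Sync + 'static>(
    set: Arc<S>,
    threads: usize,
    per_thread: usize,
    op: fn(&S, usize) -> bool,
) {
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let set = Arc::clone(&set);
            thread::spawn(move || {
                for i in 0..per_thread {
                    op(&set, t * per_thread + i);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}

fn main() {
    let locked = Arc::new(LockedBitSet::new(3 * 64));
    run_workers(Arc::clone(&locked), 3, 64, LockedBitSet::set);

    let atomic = Arc::new(AtomicBitSet::new(3 * 64));
    run_workers(Arc::clone(&atomic), 3, 64, AtomicBitSet::set);

    println!("locked: {} bits, atomic: {} bits", locked.count(), atomic.count());
}
```

In a sketch like this each `set` call does very little work, so the cost of spawning and joining threads can easily outweigh any difference between the two `set` paths.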

Container Structure

  • Turn 50:
test mpmc_container_bench_test::bench_lock_container     
    ... bench:   4,012,460.55 ns/iter (+/- 955,348.97)
test mpmc_container_bench_test::bench_lockfree_container 
    ... bench:   3,892,901.50 ns/iter (+/- 910,201.54)

test mpmc_container_bench_test::bench_lock_container     
    ... bench:   4,030,926.75 ns/iter (+/- 1,021,361.06)
test mpmc_container_bench_test::bench_lockfree_container 
    ... bench:   4,031,596.70 ns/iter (+/- 983,474.72)

test mpmc_container_bench_test::bench_lock_container     
    ... bench:   4,407,091.10 ns/iter (+/- 3,062,364.63)
test mpmc_container_bench_test::bench_lockfree_container 
    ... bench:   4,375,045.45 ns/iter (+/- 1,896,113.58)
  • Turn 200:
test mpmc_container_bench_test::bench_lock_container     
    ... bench:  16,154,605.90 ns/iter (+/- 2,156,485.65)
test mpmc_container_bench_test::bench_lockfree_container 
    ... bench:  16,495,923.70 ns/iter (+/- 4,157,138.94)

test mpmc_container_bench_test::bench_lock_container     
    ... bench:  16,041,073.60 ns/iter (+/- 2,208,656.21)
test mpmc_container_bench_test::bench_lockfree_container 
    ... bench:  17,949,996.00 ns/iter (+/- 14,665,121.08)

test mpmc_container_bench_test::bench_lock_container     
    ... bench:  16,637,231.00 ns/iter (+/- 3,968,024.97)
test mpmc_container_bench_test::bench_lockfree_container 
    ... bench:  15,872,389.40 ns/iter (+/- 1,970,470.06) 

test mpmc_container_bench_test::bench_lock_container     
    ... bench:  18,006,051.60 ns/iter (+/- 20,525,639.88)
test mpmc_container_bench_test::bench_lockfree_container 
    ... bench:  19,730,836.00 ns/iter (+/- 12,648,645.82)
  • Turn 100:
test mpmc_container_bench_test::bench_lock_container     
    ... bench:  85,431,352.30 ns/iter (+/- 6,288,049.68)
test mpmc_container_bench_test::bench_lockfree_container 
    ... bench:  85,097,561.40 ns/iter (+/- 5,134,950.25)

test mpmc_container_bench_test::bench_lock_container     
    ... bench: 106,241,775.30 ns/iter (+/- 60,110,865.27)
test mpmc_container_bench_test::bench_lockfree_container 
    ... bench:  99,199,730.10 ns/iter (+/- 26,088,050.60)
  • Turn 500:
test mpmc_container_bench_test::bench_lock_container       
    ... bench:  50,230,632.80 ns/iter (+/- 13,979,756.29)
test mpmc_container_bench_test::bench_lockfree_container   
    ... bench:  50,988,633.90 ns/iter (+/- 14,443,455.11)

test mpmc_container_bench_test::bench_lock_container       
    ... bench:  51,880,709.90 ns/iter (+/- 16,580,897.33)
test mpmc_container_bench_test::bench_lockfree_container   
    ... bench:  51,842,761.20 ns/iter (+/- 22,942,782.72)

Queue

test spsc_queue_bench_test::bench_lock_queue             
    ... bench:     674,299.66 ns/iter (+/- 888,429.97)
test spsc_queue_bench_test::bench_lockfree_queue         
    ... bench:     574,698.44 ns/iter (+/- 194,670.78)

test spsc_queue_bench_test::bench_lock_queue             
    ... bench:     710,897.90 ns/iter (+/- 686,012.18)
test spsc_queue_bench_test::bench_lockfree_queue         
    ... bench:     690,330.20 ns/iter (+/- 425,587.03)

test spsc_queue_bench_test::bench_lock_queue             
    ... bench:     406,889.06 ns/iter (+/- 265,046.73)
test spsc_queue_bench_test::bench_lockfree_queue         
    ... bench:     472,730.90 ns/iter (+/- 265,581.04)

test spsc_queue_bench_test::bench_lock_queue             
    ... bench:   1,434,549.48 ns/iter (+/- 1,658,126.61)
test spsc_queue_bench_test::bench_lockfree_queue         
    ... bench:     457,031.25 ns/iter (+/- 53,033.70)

test spsc_queue_bench_test::bench_lock_queue             
    ... bench:     336,386.72 ns/iter (+/- 193,513.12)
test spsc_queue_bench_test::bench_lockfree_queue         
    ... bench:     613,891.67 ns/iter (+/- 327,383.84)

test spsc_queue_bench_test::bench_lock_queue             
    ... bench:   1,015,869.80 ns/iter (+/- 1,622,771.10)
test spsc_queue_bench_test::bench_lockfree_queue         
    ... bench:     458,995.83 ns/iter (+/- 68,525.64)
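
The `+/-` figures in the Queue runs are often as large as the reported values themselves, so it helps to check whether a gap between two results is larger than the noise. The sketch below is one crude way to do that: it treats the `+/-` figure as a spread around the reported value and calls two results indistinguishable when their gap is smaller than the larger spread. `BenchResult`, `within_noise`, and `relative_diff_percent` are hypothetical helpers, and this is a rule of thumb, not a statistical test.

```rust
/// One line of bench output: the reported ns/iter and its `+/-` figure.
#[derive(Debug, Clone, Copy)]
struct BenchResult {
    ns_per_iter: f64,
    spread: f64,
}

/// True when the gap between the two results is no larger than the
/// bigger of the two spreads, i.e. the difference could be noise.
fn within_noise(a: BenchResult, b: BenchResult) -> bool {
    (a.ns_per_iter - b.ns_per_iter).abs() <= a.spread.max(b.spread)
}

/// Difference of `b` relative to `a`, in percent (negative: `b` is faster).
fn relative_diff_percent(a: BenchResult, b: BenchResult) -> f64 {
    (b.ns_per_iter - a.ns_per_iter) / a.ns_per_iter * 100.0
}

fn main() {
    // Numbers copied from the first Queue run above.
    let lock = BenchResult { ns_per_iter: 674_299.66, spread: 888_429.97 };
    let lockfree = BenchResult { ns_per_iter: 574_698.44, spread: 194_670.78 };

    println!("within noise: {}", within_noise(lock, lockfree));
    println!("lockfree vs lock: {:.1}%", relative_diff_percent(lock, lockfree));
}
```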

Using Perf to Analyze

After the benchmarks, I wanted to see how the CPU time splits between user and system, so I ran the test under time with TURN=200.

# lock free
1.96s user 9.22s system 213% cpu 5.230 total
1.94s user 9.25s system 219% cpu 5.105 total
# lock
1.98s user 9.53s system 213% cpu 5.389 total
2.06s user 9.62s system 202% cpu 5.769 total
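
To make the user/system split in these lines explicit, the arithmetic can be sketched as follows. `TimeLine`, `parse_time_line`, `system_share`, and `cpu_percent` are hypothetical helpers that only assume the line layout shown above (`<user>s user <system>s system <pct>% cpu <total> total`).

```rust
/// Fields of one `time` line in the layout shown above.
struct TimeLine {
    user: f64,
    system: f64,
    total: f64,
}

/// Parses "<user>s user <system>s system <pct>% cpu <total> total".
fn parse_time_line(line: &str) -> Option<TimeLine> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    if fields.len() != 8 || fields[1] != "user" || fields[3] != "system" || fields[7] != "total" {
        return None;
    }
    Some(TimeLine {
        user: fields[0].strip_suffix('s')?.parse().ok()?,
        system: fields[2].strip_suffix('s')?.parse().ok()?,
        total: fields[6].parse().ok()?,
    })
}

/// Fraction of CPU time spent in the kernel.
fn system_share(t: &TimeLine) -> f64 {
    t.system / (t.user + t.system)
}

/// CPU utilisation: (user + system) / wall-clock time, in percent.
fn cpu_percent(t: &TimeLine) -> f64 {
    (t.user + t.system) / t.total * 100.0
}

fn main() {
    let lines = [
        "1.96s user 9.22s system 213% cpu 5.230 total", // lock free
        "1.98s user 9.53s system 213% cpu 5.389 total", // lock
    ];
    for line in lines {
        if let Some(t) = parse_time_line(line) {
            println!(
                "system share: {:.1}%, cpu: {:.0}%",
                system_share(&t) * 100.0,
                cpu_percent(&t)
            );
        }
    }
}
```

In both variants, system time makes up more than 80% of the CPU time, which is why the perf output is dominated by kernel symbols.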

Next, I used perf to compare the lock and lockfree runs:

# lockfree
Samples: 850K of event 'cycles:P', Event count (approx.): 40072157908
Overhead  Command          Shared Object          Symbol
   2.94%  demo-d326a53a0e  [unknown]              [k] 0xffffffffa6424721
   2.37%  demo-d326a53a0e  [unknown]              [k] 0xffffffffa76001c6
   2.03%  demo-d326a53a0e  [unknown]              [k] 0xffffffffa7387717
   1.63%  demo-d326a53a0e  [unknown]              [k] 0xffffffffa760190e
   1.53%  demo-d326a53a0e  [unknown]              [k] 0xffffffffa7600151
   1.43%  demo-d326a53a0e  [unknown]              [k] 0xffffffffa741ffc1
   1.42%  demo-d326a53a0e  [unknown]              [k] 0xffffffffa639c5b7

Samples: 848K of event 'cycles:P', Event count (approx.): 40285286987
Overhead  Command          Shared Object          Symbol
   2.92%  demo-d326a53a0e  [kernel.kallsyms]      [k] 0xffffffffa6424721
   2.39%  demo-d326a53a0e  [kernel.kallsyms]      [k] 0xffffffffa76001c6
   1.96%  demo-d326a53a0e  [kernel.kallsyms]      [k] 0xffffffffa7387717
   1.54%  demo-d326a53a0e  [kernel.kallsyms]      [k] 0xffffffffa760190e
   1.49%  demo-d326a53a0e  [kernel.kallsyms]      [k] 0xffffffffa7600151
   1.47%  demo-d326a53a0e  [kernel.kallsyms]      [k] 0xffffffffa639c5b7
   1.45%  demo-d326a53a0e  [kernel.kallsyms]      [k] 0xffffffffa741ffc1

# lock
Samples: 720K of event 'cycles:P', Event count (approx.): 35274681204
Overhead  Command          Shared Object          Symbol
   2.89%  demo-d326a53a0e  [unknown]              [k] 0xffffffffa6424721          
   2.32%  demo-d326a53a0e  [unknown]              [k] 0xffffffffa76001c6                   
   2.16%  demo-d326a53a0e  [unknown]              [k] 0xffffffffa7387717 
   1.55%  demo-d326a53a0e  [unknown]              [k] 0xffffffffa760190e     
   1.54%  demo-d326a53a0e  [unknown]              [k] 0xffffffffa741ffc1             
   1.38%  demo-d326a53a0e  [unknown]              [k] 0xffffffffa7600151  

Samples: 859K of event 'cycles:P', Event count (approx.): 40447755632
Overhead  Command          Shared Object          Symbol
   3.10%  demo-d326a53a0e  [kernel.kallsyms]      [k] 0xffffffffa6424721
   2.39%  demo-d326a53a0e  [kernel.kallsyms]      [k] 0xffffffffa76001c6
   2.21%  demo-d326a53a0e  [kernel.kallsyms]      [k] 0xffffffffa7387717
   1.60%  demo-d326a53a0e  [kernel.kallsyms]      [k] 0xffffffffa760190e
   1.46%  demo-d326a53a0e  [kernel.kallsyms]      [k] 0xffffffffa639c5b7
   1.42%  demo-d326a53a0e  [kernel.kallsyms]      [k] 0xffffffffa7600151
   1.34%  demo-d326a53a0e  [kernel.kallsyms]      [k] 0xffffffffa741ffc1

Use sudo cat /proc/kallsyms | grep -B5 -A5 "ffffffffa6424" to find the function that contains the address. A sampled address is an offset inside a function, so the match is the nearest symbol that starts at or below it:

ffffffffa64245e0 t smp_call_function_many_cond
ffffffffa6424b60 T __pfx_smp_call_function_many

The top 3 functions are:

  • smp_call_function_many_cond
  • entry_SYSRETQ_unsafe_stack
  • rep_stos_alternative
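
The manual kallsyms lookup described above amounts to finding the last symbol whose start address is not greater than the sampled address. A minimal sketch, assuming the symbol table is already sorted by address (`resolve` is a hypothetical helper, and the two entries are the ones from the grep output above):

```rust
/// Returns the symbol containing `addr` and the offset into it.
/// `symbols` must be sorted by start address in ascending order.
fn resolve<'a>(symbols: &[(u64, &'a str)], addr: u64) -> Option<(&'a str, u64)> {
    // Index of the first symbol that starts after `addr`.
    let idx = symbols.partition_point(|&(start, _)| start <= addr);
    // The symbol just before it is the one that contains `addr`.
    let (start, name) = *symbols.get(idx.checked_sub(1)?)?;
    Some((name, addr - start))
}

fn main() {
    let symbols: [(u64, &str); 2] = [
        (0xffffffffa64245e0, "smp_call_function_many_cond"),
        (0xffffffffa6424b60, "__pfx_smp_call_function_many"),
    ];

    // Top sample from the perf output.
    if let Some((name, offset)) = resolve(&symbols, 0xffffffffa6424721) {
        println!("{name}+{offset:#x}"); // prints "smp_call_function_many_cond+0x141"
    }
}
```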