Iceoryx2 Lockfree Structures Benchmark
This page lists benchmark results for the iceoryx2 lockfree structures. I started with the assumption that the lockfree structures would be faster, but that assumption did not hold: in my test cases there is no significant difference between the locked and lockfree implementations.
You can view my benchmark code here.
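The benchmark code itself is linked rather than reproduced here, so the following is only a minimal sketch of the kind of comparison involved, not the iceoryx2 implementation. LockedBitSet, LockFreeBitSet and run_workers are hypothetical names, and the worker pattern (each thread sets an interleaved slice of bits) is an assumption. It times the runs with std::time::Instant so that it builds on stable Rust, which is not how the figures on this page were produced.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Instant;

/// Lock-based bit set (hypothetical): every access takes the mutex.
struct LockedBitSet {
    words: Mutex<Vec<u64>>,
}

impl LockedBitSet {
    fn new(bits: usize) -> Self {
        Self { words: Mutex::new(vec![0; (bits + 63) / 64]) }
    }

    /// Sets `bit`; returns true if it was previously clear.
    fn set(&self, bit: usize) -> bool {
        let mut words = self.words.lock().unwrap();
        let mask = 1u64 << (bit % 64);
        let was_clear = words[bit / 64] & mask == 0;
        words[bit / 64] |= mask;
        was_clear
    }

    fn count(&self) -> u32 {
        self.words.lock().unwrap().iter().map(|w| w.count_ones()).sum()
    }
}

/// Lock-free bit set (hypothetical): one atomic fetch_or per access.
struct LockFreeBitSet {
    words: Vec<AtomicU64>,
}

impl LockFreeBitSet {
    fn new(bits: usize) -> Self {
        Self { words: (0..(bits + 63) / 64).map(|_| AtomicU64::new(0)).collect() }
    }

    /// Sets `bit`; returns true if it was previously clear.
    fn set(&self, bit: usize) -> bool {
        let mask = 1u64 << (bit % 64);
        let previous = self.words[bit / 64].fetch_or(mask, Ordering::AcqRel);
        previous & mask == 0
    }

    fn count(&self) -> u32 {
        self.words.iter().map(|w| w.load(Ordering::Acquire).count_ones()).sum()
    }
}

/// Spawns `threads` workers (must be > 0); worker `t` sets bits t, t + threads, ...
fn run_workers<S: Send + Sync + 'static>(
    set: Arc<S>,
    threads: usize,
    bits: usize,
    set_bit: fn(&S, usize) -> bool,
) {
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let set = Arc::clone(&set);
            thread::spawn(move || {
                for bit in (t..bits).step_by(threads) {
                    set_bit(&*set, bit);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}

fn main() {
    let (threads, bits) = (8, 1 << 16);

    let locked = Arc::new(LockedBitSet::new(bits));
    let start = Instant::now();
    run_workers(Arc::clone(&locked), threads, bits, LockedBitSet::set);
    println!("locked:   {:?} ({} bits set)", start.elapsed(), locked.count());

    let lock_free = Arc::new(LockFreeBitSet::new(bits));
    let start = Instant::now();
    run_workers(Arc::clone(&lock_free), threads, bits, LockFreeBitSet::set);
    println!("lockfree: {:?} ({} bits set)", start.elapsed(), lock_free.count());
}
```

Note that each timed run includes spawning and joining the threads, so with very little work per thread that cost can easily outweigh the bit operations themselves.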
Rust's built-in benchmark harness is not stable yet, so running the benchmarks requires the nightly toolchain.
BitSet
- Spawn 3 threads:
test sets_bench_test::bench_lock_set
... bench: 142,236.56 ns/iter (+/- 45,850.27)
test sets_bench_test::bench_lockfree_set
... bench: 140,587.12 ns/iter (+/- 16,576.66)
test sets_bench_test::bench_lock_set
... bench: 140,909.46 ns/iter (+/- 30,108.68)
test sets_bench_test::bench_lockfree_set
... bench: 143,902.79 ns/iter (+/- 38,403.69)
- Spawn 10 threads:
test sets_bench_test::bench_lock_set
... bench: 510,333.57 ns/iter (+/- 105,315.37)
test sets_bench_test::bench_lockfree_set
... bench: 476,959.17 ns/iter (+/- 106,816.62)
test sets_bench_test::bench_lock_set
... bench: 509,264.11 ns/iter (+/- 104,714.02)
test sets_bench_test::bench_lockfree_set
... bench: 564,790.84 ns/iter (+/- 231,446.76)
test sets_bench_test::bench_lock_set
... bench: 479,311.18 ns/iter (+/- 32,482.02)
test sets_bench_test::bench_lockfree_set
... bench: 476,138.56 ns/iter (+/- 42,120.09)
- Spawn 100 threads:
test sets_bench_test::bench_lock_set
... bench: 9,093,653.00 ns/iter (+/- 1,062,443.96)
test sets_bench_test::bench_lockfree_set
... bench: 9,030,541.00 ns/iter (+/- 1,306,544.63)
test sets_bench_test::bench_lock_set
... bench: 8,135,253.90 ns/iter (+/- 2,495,481.05)
test sets_bench_test::bench_lockfree_set
... bench: 7,607,292.65 ns/iter (+/- 1,182,462.61)
test sets_bench_test::bench_lock_set
... bench: 8,221,662.15 ns/iter (+/- 2,181,111.38)
test sets_bench_test::bench_lockfree_set
... bench: 7,282,570.10 ns/iter (+/- 543,363.17)
- Spawn 200 threads:
test sets_bench_test::bench_lock_set
... bench: 17,611,426.80 ns/iter (+/- 3,339,318.58)
test sets_bench_test::bench_lockfree_set
... bench: 17,492,900.80 ns/iter (+/- 4,899,939.38)
test sets_bench_test::bench_lock_set
... bench: 19,397,568.90 ns/iter (+/- 5,940,216.88)
test sets_bench_test::bench_lockfree_set
... bench: 18,374,333.40 ns/iter (+/- 5,099,306.18)
test sets_bench_test::bench_lock_set
... bench: 17,886,734.80 ns/iter (+/- 3,985,477.79)
test sets_bench_test::bench_lockfree_set
... bench: 18,035,225.70 ns/iter (+/- 4,363,520.09)
test sets_bench_test::bench_lock_set
... bench: 17,759,799.80 ns/iter (+/- 3,051,784.73)
test sets_bench_test::bench_lockfree_set
... bench: 18,572,965.40 ns/iter (+/- 6,688,336.47)
- Spawn 500 threads:
test sets_bench_test::bench_lock_set
... bench: 47,106,102.50 ns/iter (+/- 8,619,192.46)
test sets_bench_test::bench_lockfree_set
... bench: 47,488,284.80 ns/iter (+/- 11,977,784.07)
test sets_bench_test::bench_lock_set
... bench: 46,907,406.90 ns/iter (+/- 6,424,154.53)
test sets_bench_test::bench_lockfree_set
... bench: 47,813,943.50 ns/iter (+/- 13,783,177.07)
test sets_bench_test::bench_lock_set
... bench: 44,824,857.90 ns/iter (+/- 7,468,876.95)
test sets_bench_test::bench_lockfree_set
... bench: 40,346,977.60 ns/iter (+/- 3,527,602.85)
test sets_bench_test::bench_lock_set
... bench: 40,863,622.80 ns/iter (+/- 5,528,543.32)
test sets_bench_test::bench_lockfree_set
... bench: 40,516,263.30 ns/iter (+/- 2,060,834.97)
- Spawn 10000 threads:
test sets_bench_test::bench_lock_set
... bench: 870,365,424.80 ns/iter (+/- 32,854,496.28)
test sets_bench_test::bench_lockfree_set
... bench: 863,838,287.80 ns/iter (+/- 39,979,505.39)
test sets_bench_test::bench_lock_set
... bench: 860,196,524.50 ns/iter (+/- 35,809,552.75)
test sets_bench_test::bench_lockfree_set
... bench: 859,823,755.30 ns/iter (+/- 47,843,411.23)
- Set the thread count equal to the number of CPUs (8 threads):
test sets_bench_test::bench_lock_set
... bench: 677,967.95 ns/iter (+/- 566,810.22)
test sets_bench_test::bench_lockfree_set
... bench: 628,967.59 ns/iter (+/- 186,699.22)
test sets_bench_test::bench_lock_set
... bench: 592,269.71 ns/iter (+/- 222,827.07)
test sets_bench_test::bench_lockfree_set
... bench: 607,693.88 ns/iter (+/- 1,065,052.54)
test sets_bench_test::bench_lock_set
... bench: 503,961.33 ns/iter (+/- 87,196.11)
test sets_bench_test::bench_lockfree_set
... bench: 502,034.43 ns/iter (+/- 78,611.69)
Container Structure
- Turn 50:
test mpmc_container_bench_test::bench_lock_container
... bench: 4,012,460.55 ns/iter (+/- 955,348.97)
test mpmc_container_bench_test::bench_lockfree_container
... bench: 3,892,901.50 ns/iter (+/- 910,201.54)
test mpmc_container_bench_test::bench_lock_container
... bench: 4,030,926.75 ns/iter (+/- 1,021,361.06)
test mpmc_container_bench_test::bench_lockfree_container
... bench: 4,031,596.70 ns/iter (+/- 983,474.72)
test mpmc_container_bench_test::bench_lock_container
... bench: 4,407,091.10 ns/iter (+/- 3,062,364.63)
test mpmc_container_bench_test::bench_lockfree_container
... bench: 4,375,045.45 ns/iter (+/- 1,896,113.58)
- Turn 200:
test mpmc_container_bench_test::bench_lock_container
... bench: 16,154,605.90 ns/iter (+/- 2,156,485.65)
test mpmc_container_bench_test::bench_lockfree_container
... bench: 16,495,923.70 ns/iter (+/- 4,157,138.94)
test mpmc_container_bench_test::bench_lock_container
... bench: 16,041,073.60 ns/iter (+/- 2,208,656.21)
test mpmc_container_bench_test::bench_lockfree_container
... bench: 17,949,996.00 ns/iter (+/- 14,665,121.08)
test mpmc_container_bench_test::bench_lock_container
... bench: 16,637,231.00 ns/iter (+/- 3,968,024.97)
test mpmc_container_bench_test::bench_lockfree_container
... bench: 15,872,389.40 ns/iter (+/- 1,970,470.06)
test mpmc_container_bench_test::bench_lock_container
... bench: 18,006,051.60 ns/iter (+/- 20,525,639.88)
test mpmc_container_bench_test::bench_lockfree_container
... bench: 19,730,836.00 ns/iter (+/- 12,648,645.82)
- Turn 100:
test mpmc_container_bench_test::bench_lock_container
... bench: 85,431,352.30 ns/iter (+/- 6,288,049.68)
test mpmc_container_bench_test::bench_lockfree_container
... bench: 85,097,561.40 ns/iter (+/- 5,134,950.25)
test mpmc_container_bench_test::bench_lock_container
... bench: 106,241,775.30 ns/iter (+/- 60,110,865.27)
test mpmc_container_bench_test::bench_lockfree_container
... bench: 99,199,730.10 ns/iter (+/- 26,088,050.60)
- Turn 500:
test mpmc_container_bench_test::bench_lock_container
... bench: 50,230,632.80 ns/iter (+/- 13,979,756.29)
test mpmc_container_bench_test::bench_lockfree_container
... bench: 50,988,633.90 ns/iter (+/- 14,443,455.11)
test mpmc_container_bench_test::bench_lock_container
... bench: 51,880,709.90 ns/iter (+/- 16,580,897.33)
test mpmc_container_bench_test::bench_lockfree_container
... bench: 51,842,761.20 ns/iter (+/- 22,942,782.72)
Queue
test spsc_queue_bench_test::bench_lock_queue
... bench: 674,299.66 ns/iter (+/- 888,429.97)
test spsc_queue_bench_test::bench_lockfree_queue
... bench: 574,698.44 ns/iter (+/- 194,670.78)
test spsc_queue_bench_test::bench_lock_queue
... bench: 710,897.90 ns/iter (+/- 686,012.18)
test spsc_queue_bench_test::bench_lockfree_queue
... bench: 690,330.20 ns/iter (+/- 425,587.03)
test spsc_queue_bench_test::bench_lock_queue
... bench: 406,889.06 ns/iter (+/- 265,046.73)
test spsc_queue_bench_test::bench_lockfree_queue
... bench: 472,730.90 ns/iter (+/- 265,581.04)
test spsc_queue_bench_test::bench_lock_queue
... bench: 1,434,549.48 ns/iter (+/- 1,658,126.61)
test spsc_queue_bench_test::bench_lockfree_queue
... bench: 457,031.25 ns/iter (+/- 53,033.70)
test spsc_queue_bench_test::bench_lock_queue
... bench: 336,386.72 ns/iter (+/- 193,513.12)
test spsc_queue_bench_test::bench_lockfree_queue
... bench: 613,891.67 ns/iter (+/- 327,383.84)
test spsc_queue_bench_test::bench_lock_queue
... bench: 1,015,869.80 ns/iter (+/- 1,622,771.10)
test spsc_queue_bench_test::bench_lockfree_queue
... bench: 458,995.83 ns/iter (+/- 68,525.64)
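One rough way to read the results above is to treat each line as an interval, value plus or minus the reported spread, and check whether the lock and lockfree intervals overlap. The sketch below does that for one pair of queue results. BenchResult, parse_bench_line and overlaps are hypothetical helper names, and the overlap test is a crude heuristic rather than a statistical test. As far as I understand, the bench harness reports the median and the "+/-" figure is the max-min range of the samples rather than a standard deviation, so treat that reading as an assumption.

```rust
/// One bench result: the reported time and the "+/-" spread, both in ns.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BenchResult {
    ns_per_iter: f64,
    spread: f64,
}

/// Parses a number such as "142,236.56" by dropping the thousands separators.
fn parse_number(text: &str) -> Option<f64> {
    text.trim().replace(',', "").parse().ok()
}

/// Parses a line like "... bench: 142,236.56 ns/iter (+/- 45,850.27)".
fn parse_bench_line(line: &str) -> Option<BenchResult> {
    let rest = line.split_once("bench:")?.1;
    let (value, rest) = rest.split_once("ns/iter")?;
    let spread = rest.split_once("+/-")?.1.trim().trim_end_matches(')');
    Some(BenchResult {
        ns_per_iter: parse_number(value)?,
        spread: parse_number(spread)?,
    })
}

/// True if [value - spread, value + spread] of `a` and `b` overlap.
fn overlaps(a: BenchResult, b: BenchResult) -> bool {
    (a.ns_per_iter - b.ns_per_iter).abs() <= a.spread + b.spread
}

fn main() {
    // The queue pair with the largest gap between the two reported values.
    let lock = parse_bench_line("... bench: 1,434,549.48 ns/iter (+/- 1,658,126.61)").unwrap();
    let lock_free = parse_bench_line("... bench: 457,031.25 ns/iter (+/- 53,033.70)").unwrap();
    println!("intervals overlap: {}", overlaps(lock, lock_free));
}
```

Even for this pair, where the reported values differ by about a factor of three, the spread on the lock run is wide enough that the two intervals still overlap.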
Using Perf to Analyze
After benchmarking, I wanted to see how the CPU time splits between user and system, so I ran the test with TURN=200 under time.
# lock free
1.96s user 9.22s system 213% cpu 5.230 total
1.94s user 9.25s system 219% cpu 5.105 total
# lock
1.98s user 9.53s system 213% cpu 5.389 total
2.06s user 9.62s system 202% cpu 5.769 total
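To make the user/system split in these lines easier to compare, the share of CPU time spent in the kernel can be computed as system / (user + system). The sketch below parses one such line and prints that share for the four runs. parse_time_line and system_share are hypothetical helper names, and the parser assumes exactly the "N.NNs user N.NNs system" layout shown above.

```rust
/// Parses a line such as "1.96s user 9.22s system 213% cpu 5.230 total"
/// into (user seconds, system seconds).
fn parse_time_line(line: &str) -> Option<(f64, f64)> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    // The value belonging to a label is the field right before it, e.g. "1.96s user".
    let seconds_before = |label: &str| -> Option<f64> {
        let idx = fields.iter().position(|f| *f == label)?;
        fields.get(idx.checked_sub(1)?)?.trim_end_matches('s').parse().ok()
    };
    Some((seconds_before("user")?, seconds_before("system")?))
}

/// Fraction of the total CPU time that was spent in the kernel.
fn system_share(user: f64, system: f64) -> f64 {
    system / (user + system)
}

fn main() {
    let runs = [
        ("lock free", "1.96s user 9.22s system 213% cpu 5.230 total"),
        ("lock free", "1.94s user 9.25s system 219% cpu 5.105 total"),
        ("lock", "1.98s user 9.53s system 213% cpu 5.389 total"),
        ("lock", "2.06s user 9.62s system 202% cpu 5.769 total"),
    ];
    for (label, line) in runs {
        if let Some((user, system)) = parse_time_line(line) {
            println!("{label}: {:.1}% system", 100.0 * system_share(user, system));
        }
    }
}
```

For all four runs the share comes out at roughly 82 to 83 percent, so both variants spend most of their CPU time in the kernel rather than in the data structure code.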
I also used perf to profile the lock and lockfree versions:
# lockfree
Samples: 850K of event 'cycles:P', Event count (approx.): 40072157908
Overhead Command Shared Object Symbol
2.94% demo-d326a53a0e [unknown] [k] 0xffffffffa6424721
2.37% demo-d326a53a0e [unknown] [k] 0xffffffffa76001c6
2.03% demo-d326a53a0e [unknown] [k] 0xffffffffa7387717
1.63% demo-d326a53a0e [unknown] [k] 0xffffffffa760190e
1.53% demo-d326a53a0e [unknown] [k] 0xffffffffa7600151
1.43% demo-d326a53a0e [unknown] [k] 0xffffffffa741ffc1
1.42% demo-d326a53a0e [unknown] [k] 0xffffffffa639c5b7
Samples: 848K of event 'cycles:P', Event count (approx.): 40285286987
Overhead Command Shared Object Symbol
2.92% demo-d326a53a0e [kernel.kallsyms] [k] 0xffffffffa6424721
2.39% demo-d326a53a0e [kernel.kallsyms] [k] 0xffffffffa76001c6
1.96% demo-d326a53a0e [kernel.kallsyms] [k] 0xffffffffa7387717
1.54% demo-d326a53a0e [kernel.kallsyms] [k] 0xffffffffa760190e
1.49% demo-d326a53a0e [kernel.kallsyms] [k] 0xffffffffa7600151
1.47% demo-d326a53a0e [kernel.kallsyms] [k] 0xffffffffa639c5b7
1.45% demo-d326a53a0e [kernel.kallsyms] [k] 0xffffffffa741ffc1
# lock
Samples: 720K of event 'cycles:P', Event count (approx.): 35274681204
Overhead Command Shared Object Symbol
2.89% demo-d326a53a0e [unknown] [k] 0xffffffffa6424721
2.32% demo-d326a53a0e [unknown] [k] 0xffffffffa76001c6
2.16% demo-d326a53a0e [unknown] [k] 0xffffffffa7387717
1.55% demo-d326a53a0e [unknown] [k] 0xffffffffa760190e
1.54% demo-d326a53a0e [unknown] [k] 0xffffffffa741ffc1
1.38% demo-d326a53a0e [unknown] [k] 0xffffffffa7600151
Samples: 859K of event 'cycles:P', Event count (approx.): 40447755632
Overhead Command Shared Object Symbol
3.10% demo-d326a53a0e [kernel.kallsyms] [k] 0xffffffffa6424721
2.39% demo-d326a53a0e [kernel.kallsyms] [k] 0xffffffffa76001c6
2.21% demo-d326a53a0e [kernel.kallsyms] [k] 0xffffffffa7387717
1.60% demo-d326a53a0e [kernel.kallsyms] [k] 0xffffffffa760190e
1.46% demo-d326a53a0e [kernel.kallsyms] [k] 0xffffffffa639c5b7
1.42% demo-d326a53a0e [kernel.kallsyms] [k] 0xffffffffa7600151
1.34% demo-d326a53a0e [kernel.kallsyms] [k] 0xffffffffa741ffc1
I used sudo cat /proc/kallsyms | grep -B5 -A5 "ffffffffa6424" to look up the symbols in the same page as each address (a sampled address is an offset inside a function, so it does not match a symbol's start address exactly).
The top three functions are:
- smp_call_function_many_cond
- entry_SYSRETQ_unsafe_stack
- rep_stos_alternative
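The manual lookup above amounts to finding the symbol with the greatest start address that is still less than or equal to the sampled address. A minimal sketch of that rule over kallsyms-style lines follows. parse_kallsyms and resolve are hypothetical helpers, and the start addresses and the two neighbor symbols in SAMPLE are invented for illustration; only the sampled address and the function name come from this page.

```rust
use std::collections::BTreeMap;

/// Invented excerpt in /proc/kallsyms format: "<hex address> <type> <name>".
const SAMPLE: &str = "\
ffffffffa6424000 T hypothetical_prev_symbol
ffffffffa6424400 T smp_call_function_many_cond
ffffffffa6424a00 T hypothetical_next_symbol
";

/// Builds a map from symbol start address to symbol name.
fn parse_kallsyms(text: &str) -> BTreeMap<u64, String> {
    text.lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            let addr = u64::from_str_radix(fields.next()?, 16).ok()?;
            let _kind = fields.next()?; // symbol type, e.g. "T" or "t"
            let name = fields.next()?;
            Some((addr, name.to_string()))
        })
        .collect()
}

/// Returns the symbol containing `addr` and the offset of `addr` inside it:
/// the entry with the greatest start address that is <= `addr`.
fn resolve(symbols: &BTreeMap<u64, String>, addr: u64) -> Option<(&str, u64)> {
    symbols
        .range(..=addr)
        .next_back()
        .map(|(start, name)| (name.as_str(), addr - *start))
}

fn main() {
    let symbols = parse_kallsyms(SAMPLE);
    let sampled = 0xffffffffa6424721u64;
    match resolve(&symbols, sampled) {
        Some((name, offset)) => println!("{sampled:#x} is {name}+{offset:#x}"),
        None => println!("{sampled:#x} is below every known symbol"),
    }
}
```

This rule does not know where a function ends, so an address past the last symbol in the table still resolves to that last symbol.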