False sharing: A vicious problem when multi-threading
Multi-threading is a lightweight way to bring parallelism to your code. Under ideal conditions, your computing time is divided by the number of threads you use. In practice, there is always some overhead in spawning threads, plus some post-processing to merge the results. But the real problem with multi-threaded programs is that they are notoriously difficult to implement properly. Data races and false sharing are the main concerns; I will focus on false sharing in this post.

Simply put, false sharing happens when two threads work on different variables (which is actually fine!), but those variables sit on the same cache line (that is the bad part!). Every write by one thread invalidates the cache line for the other, leading to cache miss after cache miss, and a huge performance penalty! False sharing differs from a data race in that the results you get at the end are not corrupted; there is no logical problem with the code per se. The only issue is that your threads could have run way faster with some adjustments, so it is definitely worth checking for! The following code seems perfectly fine, yet it contains a false sharing problem:
#include <iostream>
#include <chrono>
#include <thread>

using namespace std;
using namespace std::chrono;

struct Data
{
    long a;
    long b; // sits on the same L1 cache line as the "a" member
    Data() : a(0l), b(0l) { }
};

const long nb_iter = 100000000l;

void work(long &e)
{
    while(e < nb_iter)
    {
        e++;
    }
}

int main()
{
    const auto tic = high_resolution_clock::now();

    Data d;
    std::thread t1(work, ref(d.a));
    std::thread t2(work, ref(d.b));
    t1.join();
    t2.join();

    const auto toc = high_resolution_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = toc - tic;

    cout << "a = " << d.a << endl;
    cout << "b = " << d.b << endl;
    cout << "Computation time: " << elapsed.count() << "ms" << endl;

    return 0;
}
I have run 100 executions and recorded the timings; they average around 600 ms. Now let us add seemingly useless variables between the data members and see whether it has any effect on performance. My computer has L1 cache lines of 64 bytes, and the long type occupies 8 bytes, so I will insert 7 padding longs to ensure the b member sits on a different L1 cache line than a:
struct Data
{
    long a;
    long padding[7]; // seemingly useless variables
    long b;          // now sits on a different L1 cache line
    Data() : a(0l), padding(), b(0l) { }
};
Running another 100 rounds of executions and compiling the results shows an astonishing outcome: the average computing time drops significantly, to around 200ms!!! If you read my post regarding benchmarking, you know that it is wise to use the Student test to ensure this observation is not by random chance. So I did exactly that using Scipy. The Student test applied to my benchmark data showed that there was only about one chance in 10^250 for the measurements to be pure coincidence... So the effect of the padding variables is indeed real! How in the world is that possible?! Adding useless code speeds things up?! Yes it does! Because the two counters now live on distinct L1 cache lines, one thread's writes no longer invalidate the cache line the other thread is working on. After reading this post, you might be tempted to have another look at your data structures!