Data-Oriented Design
Posted on
There is a common pattern when dealing with a huge number of objects. It consists in looping over the array of objects, and invoking a method depending on a condition. The code could look like this:
class Agent { bool _active; char _data[256]; // a big member public: Agent() : _active(false) { } inline bool is_active() const { return _active; } inline void activate() { _active = true; } void work() { [...] // work on the _data member } }; int main() { Agent array[1000]; [...] // Assume there is code here activating some elements of the array. for(int i = 0; i < 1000; ++i) { if(array[i].is_active()) { array[i].work(); } } return 0; }
An array of C++ objects is a contiguous arrangement in memory. Each object sits next to another. In this example, there is a gap of 256 bytes between every two "_active" boolean members. Therefore, the loop essentially jumps 256 bytes in memory at each iteration. This causes an L1 cache miss on any modern architecture, as they usually retain only 64 bytes of data. Now, if all booleans are true and work() is invoked, that call's cost may outweight the cache miss. And if all booleans are false, the hardware may predict the access pattern (After a few iterations) and actually prefetch the next boolean ahead of time. But when true and false booleans are mangled, the cache misses are likely to dominate the execution time. Except if work() is very time consuming. This is a real-world problem, completely related to OOP (Object-Oriented Programming). And in fact, it has a solution, which requires a bit of refactoring. Precisely, converting the code to DOD (Data-Oriented Design). The goal is to make the most out of the cache lines, by removing potiental "bubbles". In our example, the for loop could just use one boolean from the cache line. Because the rest was occupied by the "_data" member. Let us make the loop use all of the cache line instead!
// Refactor work() to accept the _data array as an in-out parameter void work(char array_of_data[]) { ... } int main() { bool array_of_is_active[1000]; char array_of_data[1000][256]; [...] // Assume there is code here activating some elements of the array. for(int i = 0; i < 1000; ++i) { if(array_of_is_active[i]) { work(array_of_data[i]); } } return 0; }
This looks more like C than C++ indeed. But it is way more efficient in practical cases. So it should not be overlooked when looking for faster C++!