In previous posts, I talked about entity systems (though incompletely), but there’s one big beneficial side effect of using entity systems which I hadn’t considered until now. Recently, I’ve been spending a lot of time building a production-quality component system, using Intel’s Threading Building Blocks to efficiently parallelize message handling across available processor cores, and much of that time has gone into making efficient use of the processor cache (minimizing cache misses, reducing false sharing, etc.). In single-threaded applications, the processor cache works in the background, transparently reducing the memory latency of your code by storing frequently accessed data in fast memory inside the processor. In multithreaded programs, however, this can work against you, because each core’s cache must be kept synchronized. That means that when a thread executing on one core modifies memory, the changes must be reflected in the caches of the other cores, which takes time and increases processor interconnect traffic. Too much traffic and the bus gets saturated, decreasing the performance of all threads, even ones which do not access the shared data. So efficient use of the cache is pretty important, and it must be done (more or less) manually.
In short, the processor cache works by keeping a copy of values accessed from RAM locally in the processor core. If you read memory location A, the processor stores A and the N bytes following it in the cache (A plus those N bytes make up one cache line). I’ve simplified this by assuming that A is aligned to the beginning of a cache line; in real life, it could be in the middle, or even at the end, of a cache line. That means that if you store a number of related items next to each other, e.g. in an array, and iterate through them, you only need to access RAM once: the other items will already be loaded into the cache (assuming the total size is no larger than the cache line). This is based on the principle of locality of reference, that is, memory near a location that has just been accessed is likely to be accessed soon as well.
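To put rough numbers on this, here is a small C++ sketch of the arithmetic. The 64-byte line size is an assumption (it is common on current x86 processors, but not universal), and the names `kCacheLineSize`, `items_per_line` and `lines_touched` are made up for illustration.

```cpp
#include <cstddef>

// Assumed cache line size in bytes. 64 is typical on x86, but check your target.
constexpr std::size_t kCacheLineSize = 64;

// How many whole items of the given size fit in one cache line.
constexpr std::size_t items_per_line(std::size_t item_size) {
    return kCacheLineSize / item_size;
}

// How many cache lines a contiguous array of `count` items spans,
// assuming the array starts at the beginning of a cache line.
constexpr std::size_t lines_touched(std::size_t item_size, std::size_t count) {
    return (item_size * count + kCacheLineSize - 1) / kCacheLineSize;
}

// Sixteen 4-byte values share one 64-byte line.
static_assert(items_per_line(4) == 16, "4-byte items per 64-byte line");
```

Under these assumptions, iterating over 1000 tightly packed 4-byte values touches `lines_touched(4, 1000)` cache lines rather than performing 1000 separate trips to RAM.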
Recently, I came across some slides which talk about the pitfalls of object oriented programming, and they point out how classes and objects are actually pretty cache-unfriendly. This is because an object bundles unrelated properties together in memory, so a cache line loaded for one property is mostly filled with others that the current operation doesn’t need. For example, in a game, you might have an Actor class, which encapsulates the idea of a player or computer controlled character. Actors may have the following properties:
- position
- health
- inventory
- geometry
One common operation on Actors may be to update their positions in some way. Something like the following pseudocode:
    class Actor
    {
        Position position;
        Health health;
        Inventory inventory;
        Geometry geometry;
    };
    ...
    Actor[] global_actors;
    ...
    function update_positions ()
    {
        for_each (Actor a in global_actors)
        {
            a.position = update(a.position);
        }
    }
This seems like a reasonable approach, right?
Except it has a serious flaw. Step by step, what actually happens is something like this: the first Actor is accessed, which is a cache miss, so it’s loaded from RAM into the cache. The position is read, updated and written back to the cache (the processor will flush the change to RAM in the background). Then the second Actor is accessed, but this is also a cache miss, so it’s loaded from RAM too, and so on for each Actor. This is because the cache line is taken up by the other, unrelated properties of the Actor class, so the next Actor’s position is never already in the cache.
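For a more concrete picture, here is a rough C++ rendering of the Actor layout. The field types and their sizes are placeholders I have invented for illustration, and `update` is a stand-in for whatever the real position update would be.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical field types; the sizes are placeholders chosen for illustration.
struct Position  { float x, y, z; };
struct Health    { int current, max; };
struct Inventory { int item_ids[32]; };
struct Geometry  { float bounds[6]; };

struct Actor {
    Position  position;
    Health    health;
    Inventory inventory;
    Geometry  geometry;
};

// Stand-in for the update() in the pseudocode: nudge the position along x.
inline Position update(Position p) {
    p.x += 1.0f;
    return p;
}

// Array-of-structures traversal: each iteration jumps sizeof(Actor) bytes,
// even though only sizeof(Position) bytes of each Actor are actually used.
inline void update_positions(std::vector<Actor>& actors) {
    for (Actor& a : actors) {
        a.position = update(a.position);
    }
}

// Distance in bytes between the positions of two consecutive Actors.
constexpr std::size_t kActorStride = sizeof(Actor);
```

With these placeholder fields, `kActorStride` is comfortably larger than a 64-byte cache line (assuming that line size), so no two Actors’ positions can share a line and most of every line fetched is data the loop never reads.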
If, instead, the positions of all Actors were stored together (and the health of all Actors stored together, and likewise the inventories, geometry and so forth), let’s say as four separate arrays of structures, something like the following:
    Position[] actor_positions;
    Health[] actor_health;
    Inventory[] actor_inventories;
    Geometry[] actor_geometry;
    ...
    function update_positions ()
    {
        for_each (Position p in actor_positions)
        {
            p = update(p);
        }
    }
the update function would be much more cache friendly. This is because the position objects are adjacent in memory. By accessing the first one, the second and third and so on are loaded into the cache (how many are loaded depends on the size of the Position structure and the size of a cache line). So instead of having to access RAM for every Actor, it now only needs to do so once for however many Position structures fit in a cache line. Fewer cache misses equals higher performance.
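The same idea in C++ might look like the sketch below. As before, the three-float `Position` and the `update` stand-in are my own assumptions, not anything prescribed by the slides.

```cpp
#include <vector>

// Hypothetical position type: three floats, tightly packed.
struct Position { float x, y, z; };

// Stand-in for the update() in the pseudocode: nudge the position along x.
inline Position update(Position p) {
    p.x += 1.0f;
    return p;
}

// The positions live in their own contiguous array, so the loop walks
// memory sequentially and every byte of each fetched line is useful.
inline void update_positions(std::vector<Position>& positions) {
    for (Position& p : positions) {
        p = update(p);
    }
}
```

If `Position` is 12 bytes and a cache line is 64 bytes (both assumptions), roughly five positions arrive with each line fetched, so the loop misses about once every five elements instead of once per Actor.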
So back to entity systems. In an entity system, an entity (like the Actor entity) is a collection of traits (like Position). Since the traits are separated from the entities which possess them, all the traits of one kind can be stored together, such as in an array. Processing them iteratively, for example within a system which updates or handles them, then conforms to the principle of locality of reference, and is therefore cache friendly.
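As a minimal sketch of what that could look like, assume an entity is nothing more than an integer id and each kind of trait gets its own store. `TraitStore` and `movement_system` are hypothetical names, and a production design would need removal, duplicate handling and so on.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Entity = std::uint32_t;  // an entity is just an identifier

struct Position { float x, y, z; };

// Hypothetical dense storage for one kind of trait: the values are
// contiguous, and a side table maps each entity to its slot.
template <typename Trait>
class TraitStore {
public:
    // Assumes the entity does not already have this trait.
    void add(Entity e, const Trait& value) {
        index_of_[e] = traits_.size();
        traits_.push_back(value);
    }

    // Returns nullptr if the entity does not have this trait.
    Trait* find(Entity e) {
        auto it = index_of_.find(e);
        return it == index_of_.end() ? nullptr : &traits_[it->second];
    }

    std::vector<Trait>& all() { return traits_; }

private:
    std::vector<Trait> traits_;
    std::unordered_map<Entity, std::size_t> index_of_;
};

// A "system" walks one trait array from front to back.
inline void movement_system(TraitStore<Position>& positions, float dx) {
    for (Position& p : positions.all()) {
        p.x += dx;
    }
}
```

The lookup by entity goes through a hash map and is not especially cache friendly, but the hot path (the system’s loop) only ever touches the contiguous array.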
Entity systems have a number of benefits as a software paradigm, architectural pattern and abstraction, including:
- Separation of concerns – features and functionality are split into distinct traits (the properties/data) and systems (the code implementing the features)
- Encapsulation – entity systems promote well-defined, potentially restricted interfaces for external systems to change traits (such as through message passing), essentially a form of encapsulation and data hiding. Similarly, since traits can be considered independently, they offer an abstraction over specific entities
- Concurrency – as systems are independent, operating on their traits should be an independent procedure, allowing systems to be executed concurrently
- Polymorphism – entities can be treated as objects which are a union of their traits, that is, entities essentially function as a duck-typed object system
- Inheritance – the concept of an entity system doesn’t provide inheritance in itself, but an implementation could, for example, allow entities to be used as templates for new entities (create a new entity with the same traits as an existing entity, as in prototype-based OO, which can then be extended by adding other traits if required)
- Code reuse – traits and systems offer an appealing method of code reuse
- Cache friendliness – as described above, the nature of entity systems provides opportunities for cache friendly implementations
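To illustrate the inheritance item, here is one way prototype-style cloning could be sketched. The `World` type and its members are hypothetical, and it uses one hash map per trait purely to keep the example short; a cache friendly implementation would use dense arrays instead.

```cpp
#include <cstdint>
#include <unordered_map>

using Entity = std::uint32_t;

struct Position { float x, y, z; };
struct Health   { int current, max; };

// Hypothetical world holding one map per kind of trait.
struct World {
    Entity next_id = 0;
    std::unordered_map<Entity, Position> positions;
    std::unordered_map<Entity, Health>   health;

    Entity create() { return next_id++; }

    // Create a new entity carrying copies of every trait the prototype has.
    Entity clone(Entity prototype) {
        Entity e = create();
        copy_trait(positions, prototype, e);
        copy_trait(health, prototype, e);
        return e;
    }

private:
    // Copy one trait from `from` to `to`, if `from` has it.
    template <typename Map>
    static void copy_trait(Map& traits, Entity from, Entity to) {
        auto it = traits.find(from);
        if (it != traits.end()) {
            auto value = it->second;  // copy before inserting into the map
            traits[to] = value;
        }
    }
};
```

The clone gets its own copies of the traits, so it can then diverge from the prototype or pick up extra traits without affecting it.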
I’m sure I missed more. As can be seen from the list above, an entity system shares most, if not all, of the desirable traits of object oriented programming, as well as some which are difficult in object oriented software, such as concurrency. One can easily see that entity systems are an appealing paradigm for many kinds of software development.




