A “free” benefit to using entity systems

In previous posts, I talked about entity systems (though incompletely), but there’s one big beneficial side-effect of using entity systems which I hadn’t considered until now. Recently, I’ve been spending a lot of time building a production-quality component system, using Intel’s Threading Building Blocks to efficiently parallelize message handling across the available processor cores, and much of that time has gone into making efficient use of the processor cache (minimizing cache misses, reducing false sharing, etc.). In single-threaded applications, the processor cache works in the background, transparently reducing the memory latency of your code by storing frequently accessed data in fast memory inside the processor. In multithreaded programs, however, this can work against you, because each core’s cache must be kept coherent: when a thread executing on one core modifies memory, the change must be reflected in the caches of the other cores, which takes time and increases processor interconnect traffic. Too much traffic and the bus gets saturated, decreasing the performance of all threads, even ones which do not touch the shared data. So, efficient use of the cache is pretty important, and it must be managed (more or less) manually.
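To make the false-sharing point concrete, here is a minimal, illustrative C++ sketch (it is not the component system’s code, and the 64-byte cache-line size is an assumption): two threads each increment their own counter, and aligning each counter to its own cache line keeps one thread’s writes from repeatedly invalidating the other core’s cached copy.

#include <atomic>
#include <thread>

// Each counter gets its own (assumed 64-byte) cache line, so the two threads
// below do not fight over a shared line while incrementing.
struct PaddedCounter {
    alignas(64) std::atomic<long> value{0};
};

PaddedCounter counters[2];

void bump (int index) {
    for (int i = 0; i < 10000000; ++i) {
        counters[index].value.fetch_add(1, std::memory_order_relaxed);
    }
}

int main () {
    std::thread a(bump, 0);
    std::thread b(bump, 1);
    a.join();
    b.join();
}

Removing the alignas (so both counters share one line) is a simple way to observe the slowdown caused by false sharing.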

In short, the processor cache works by keeping a copy of values accessed from RAM locally in the processor core. If you read memory location A, the processor stores A and the N following bytes in the cache (together they fill a cache line). I have simplified this by assuming that A is aligned to the beginning of a cache line – in real life, it could fall in the middle, or even at the end, of a cache line. This means that if you store a number of related items next to each other, e.g. in an array, and iterate through them, you only need to access RAM once – the other items will already have been loaded into the cache (assuming the total size is no larger than a cache line). This is based on the principle of locality of reference: memory near other memory which is being accessed is itself likely to be accessed soon.

Recently, I came across some slides which talk about the pitfalls of object oriented programming and point out that classes and objects are actually pretty cache-unfriendly. This is because objects encapsulate different properties, which fragments the cache. For example, in a game, you might have an Actor class, which encapsulates the idea of a player or computer-controlled character. Actors may have the following properties:

  • position
  • health
  • inventory
  • geometry

One common operation on Actors may be to update their positions in some way. Something like the following pseudocode:

class Actor
{
    Position position;
    Health health;
    Inventory inventory;
    Geometry geometry;
};
...
Actor[] global_actors;
...
function update_positions ()
{
    for_each (Actor a in global_actors)
    {
        a.position = update(a.position);
    }
}

This seems like a reasonable approach, right?

Except it has a serious flaw. Step by step, what actually happens is something like this: the first Actor is loaded – a cache miss, so it’s loaded from RAM into the cache. The position is read, updated and written back to the cache (the processor will flush the change to RAM in the background). Then the second Actor is loaded, but this is also a cache miss, so it’s loaded from RAM… and so on for each Actor. This is because each cache line is taken up by the other, unrelated properties of the Actor class.

If, instead, the positions of all Actors were stored together (and the health of all Actors stored together, and the inventories and geometry and so forth), let’s say as four arrays of structures – something like the following:

Position[] actor_positions;
Health[] actor_health;
Inventory[] actor_inventories;
Geometry[] actor_geometry;
...
function update_positions ()
{
    for_each (Position p in actor_positions)
    {
        p = update(p);
    }
}

the update function would be much more cache friendly. This is because the Position objects are adjacent in memory: by accessing the first one, the second, third and so on are loaded into the cache (how many depends on the size of the Position structure and the size of a cache line). So instead of having to access RAM for every Actor, it now only needs to do so once for however many Position structures fit in a cache line. Fewer cache misses equals higher performance.

So back to entity systems. In an entity system, an entity (like the Actor entity) is a collection of traits (like Position). Since the traits are separated from the entities which possess them, they can be stored together, such as in an array. And since the traits can be stored together, processing them iteratively – for example, within a system which updates or handles them – conforms to the principle of locality of reference and is therefore cache friendly.
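As a minimal illustration of that storage layout (the names below are invented for this example, not taken from any particular entity system), each trait gets its own contiguous array and an entity is simply an index into those arrays:

#include <cstddef>
#include <vector>

struct Position { float x, y, z; };
struct Health   { int current, maximum; };

// All traits of one kind are packed together; entity i owns positions[i] and health[i].
struct World {
    std::vector<Position> positions;
    std::vector<Health>   health;
};

// A "system" walks only the trait array it cares about, front to back -
// exactly the cache-friendly access pattern described above.
void update_positions (World& world, float dt) {
    for (std::size_t i = 0; i < world.positions.size(); ++i) {
        world.positions[i].x += 1.0f * dt; // placeholder update rule
    }
}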

Entity systems have a number of benefits as a software paradigm, architectural pattern and abstraction, including:

  • Separation of concerns – features and functionality are split into distinct traits (the properties/data) and systems (the code implementing the features)
  • Encapsulation – entity systems promote well-defined, potentially restricted interfaces through which external systems change traits (such as message passing), essentially a form of encapsulation and data-hiding. Similarly, since traits can be considered independently, they offer an abstraction over specific entities
  • Concurrency – as systems are independent, operating on their traits should be an independent procedure, allowing systems to be executed concurrently
  • Polymorphism – entities can be treated as objects which are a union of their traits, that is, entities essentially function as a duck-typed object system
  • Inheritance – the concept of an entity system doesn’t provide inheritance in itself; however, an implementation could, for example, by allowing entities to be used as templates for new entities (create a new entity with the same traits as an existing entity – prototype-based OO – which can then be extended by adding other traits, if required)
  • Code reuse – traits and systems offer an appealing method of code reuse
  • Cache friendliness – as described above, the nature of entity systems provides opportunities for cache-friendly implementations

I’m sure I’ve missed some. As can be seen from the list above, an entity system shares most, if not all, of the desirable traits of object oriented programming, as well as some which are difficult in object oriented software, such as concurrency. One can easily see that entity systems are an appealing paradigm for many kinds of software development.


ANI: implicit, safe, guaranteed deadlock-free parallelism

I’ve been involved in the ANI community lately and have wanted to blog about it, but couldn’t bring myself to come up with some code samples, since it’s such an early work-in-progress. I finally managed to muster up enough motivation to talk about some of the existing sample code instead, so here we are.

The most important question is “What is ANI and why is it interesting?” and this is exactly what I hope to answer.

ANI is a (very early) work-in-progress parallel programming language based on the dataflow paradigm. Currently, anic (the ANI compiler) can parse the source code, but cannot yet generate any runtime code. Hopefully I will be able to give a real, live demonstration soon.

What makes ANI interesting is that it aims to be fast, safe and easy to use, and it achieves this by building on a solid dataflow-based core language. The compiler and runtime are meant to ensure that an ANI program will make use of the available processing cores, in parallel, and that this implicit parallelism will never deadlock. That is, ANI is parallel and safe.

ANI programs also manage memory in such a way that the programmer does not have to manually handle allocation and freeing of memory, yet the runtime does not need a complex or slow garbage collector. This is achieved through the way ANI manages the dataflow core – it always knows what data is in use and when it can be freed.

Because of these features, ANI is a very different language from the imperative languages we are used to and may therefore seem quite complex and difficult, but once the core concepts are understood, the language becomes surprisingly easy. So, let’s take a look at these core concepts.

The ANI language is built around a handful of base concepts, namely latches, streams, filters and pipes. I will introduce each in turn.

  • A pipe is a sequence of instructions through which data flows and is the core construct through which ANI moves data from operation to operation. A pipe executes sequentially, and multiple pipes execute in parallel, side by side. When a pipe requires data from a latch or a stream, it may block while waiting for the latch or stream to produce data – that is, to become non-empty.
  • A latch is a placeholder for a value, not unlike a variable in imperative languages. Latches may be empty – containing no value at all – or they may be initialized. You can, of course, put a value into a latch later, and you can take the value out again, making the latch empty once more. The important part, however, is that when a latch contains a value, any pipes which are blocked waiting to receive its value become runnable and are scheduled for execution (a rough C++ analogy is sketched after this list).
  • A stream is a queue of latches. Values can be fed into a stream and on into a pipe, where they can be processed.
  • A filter is a piece of code through which values may flow, as part of a pipe. Filters may alter or consume values from the pipe. For example, a filter could take in a pair of numbers and add them.
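To give imperative programmers a mental model of the latch behaviour, here is a very rough C++ analogy (this is only an analogy, not how ANI is specified or implemented): a slot holding at most one value, which a consumer can block on until a producer fills it, and which becomes empty again when the value is taken.

#include <condition_variable>
#include <mutex>

// Rough analogy only: take() blocks until a value is present, then removes it
// (like unlatching); put() fills the slot and wakes blocked consumers (like a
// blocked pipe becoming runnable).
template <typename T>
class LatchAnalogy {
public:
    void put (const T& value) {
        std::lock_guard<std::mutex> guard(mutex);
        stored = value;
        full = true;
        ready.notify_all();
    }

    T take () {
        std::unique_lock<std::mutex> guard(mutex);
        ready.wait(guard, [this] { return full; });
        full = false;
        return stored;
    }

private:
    std::mutex mutex;
    std::condition_variable ready;
    T stored;
    bool full = false;
};

Again, this is only to anchor the intuition; ANI’s compiler and runtime take care of all of this for you.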

So, what does ANI look like? The following sample program demonstrates the major constructs which make up the ANI language. This program, taken from the ANI test code, runs a combination of a realtime clock and a simple textual calculator, running in parallel. The user is prompted to enter a number, followed by one of +, -, * or /, and then another number. The calculation is performed and the result is output on the next clock tick. The three parts – input, calculation and clock tick/output – can all execute in parallel.

@std.in;

a  = [[int\]]  <- 0;
op = [[char\]] <- ' ';
b  = [[int\]]  <- 0;
r  = [[int\]]  <- 0;

0 {
    clock => [[int ms]] {
        ("\r" + ms/1000.0 + ": " + a + op + b + "=" + r) ->std.out;
        1 std.delay
        (ms+1) clock
    }
};

inLoop => {
    \in ->a
    \in ->op
    \in ->b
    inLoop
};

\\op ?? {
    '+': (\a + \b)
    '-': (\a - \b)
    '*': (\a * \b)
    '/': (\a / \b)
       : 0
} ->r;

I will now dissect the program and explain each part in turn.

@std.in;

a  = [[int\]]  <- 0;
op = [[char\]] <- ' ';
b  = [[int\]]  <- 0;
r  = [[int\]]  <- 0;

The @std.in; adds the in latch from the std package to the current namespace. The rest of this code snippet defines and initializes latches for the program’s internal use. int\ means a latch holding an integer, char\ means a latch holding a single character. [[datatype]] is how objects are constructed in ANI, so, for example, [[int\]] constructs a latch which can hold an integer. The <- value assigns the specified value to the newly constructed object, and the = binds it to an identifier so it can be referred to later. So, the first line creates an integer latch, initializes it to zero and binds the name a to it.

The next code block is a little more involved.

0 {
    clock => [[int ms]] {
        ("\r" + ms/1000.0 + ": " + a + op + b + "=" + r) ->std.out;
        1 std.delay
        (ms+1) clock
    }
};

The easiest way to understand this code snippet is by starting with clock => [[int ms]] {. This line creates a pipe named clock, which takes an integer value (bound to the name ms). The => means that any values to the left of the name (in this case 0) will flow into the pipe – that is, it is defined and used in place. If = had been used instead, then 0 would have had to have been sent to clock with 0 clock, somewhere outside of its definition. The code inside { … } is the body of the clock pipe.

The first line in the clock pipe concatenates a bunch of strings (implicitly converting the values of a, op, b and r to strings); ms is divided by 1000.0 before being converted to a string. For example, if ms contains 2345, a contains 2, b contains 5, r contains 7 and op contains '+', then the expression inside the parentheses would evaluate to "\r2.345: 2+5=7" (where \r is a carriage return). The final part of this line, ->std.out, sends this string to std.out, which causes it to be output to the screen. The semicolon at the end of the line means that this is the end of the pipe and a new pipe is started (but both are sub-pipes of the clock pipe, which continues until the } is reached – basically, clock forks into two parallel pipes). This pipe and the next may be executed concurrently, every time clock is sent a value.

The second pipe sends 1 to std.delay, causing this pipe to be delayed (or paused) for (at least) 1 millisecond and then the pipe continues. The final part of this pipe adds 1 to the value of ms and passes it to clock – this causes these two pipes to be executed again, but this time ms has been incremented by one. Essentially, this defines a feedback loop, which causes clock (and therefore its two sub-pipes) to be executed in 1 millisecond increments, printing the current state of the calculator (a, b, op and r) each time.

The next code block prompts the user for input to the calculator:

inLoop => {
    \in ->a
    \in ->op
    \in ->b
    inLoop
};

Just like with clock, this defines a named pipe, but this time it doesn’t take in any values. Instead it executes three input commands before recursively calling itself, causing the program to loop endlessly, asking for input on each iteration. \in unlatches the in latch, which causes the program to wait for user input, which it returns when the user presses the return key. The ->a sends this value to the latch, a. This is repeated for op and b. Note that the in latch is a node\ and its values are implicitly converted to int and char, as appropriate.

\\op ?? {
    '+': (\a + \b)
    '-': (\a - \b)
    '*': (\a * \b)
    '/': (\a / \b)
       : 0
} ->r;

This final code block executes every time op has received a new value. This is done by destreaming the op latch. Simply unlatching it would have removed the value once, but it would not trigger the following pipe again the next time it receives a value.

The ?? initializes a pattern matching construct – essentially, the value on the pipe (the value destreamed from op) is matched against each of the values specified before the : on each line in the following block and the code on the right is executed for whichever value is matched – or the last one, where the value is omitted, is executed if none match. The code that is executed on a match simply unlatches a and b and applies the appropriate operator to them, depending on what the value of op was. The result of this calculation is then sent into r.

So, we can see how the calculator has received its values, performed its calculations and put the result into r. On its next iteration, the clock picks up these values and outputs them. While all of this is happening, the user may be typing the next set of inputs for the calculator. In fact, the input, the clock/output and the calculations may all be occurring in parallel, at the same time.

Language concept: elastic scopes

I was playing around with the idea of having global variables.. with nested scopes.. in a threaded environment. Yes, I’m aware that a global variable with scope isn’t a global variable – but it works, even though it doesn’t really make sense. Actually, it does make sense, but the terminology is off. Basically, it’s a global variable whose value depends on the scope in which it’s used. Quick code example?

ScopedGlobal<int> global = 5; // Global scoped global.

void foo () {
    ScopedGlobal<int> global  = 4; // Local scoped global.
    output("Value in foo: ", global); // Print 4
}

void bar () {
    output("Value in bar: ", global); // Print 5
}

Ok, that code clearly won’t compile in C++, but it’s just an example. Also, it proves nothing – the local variant simply shadows the global. What I had in mind is a little more complex than this – what if the local version could be passed to a different scope and still be valid? Perhaps to another thread? Like how a closure captures its enclosing environment, a local variable could capture its environment and carry it along, to another scope or another thread. Capturing the entire environment could be a bit of effort, but a simple variable doesn’t need to – it’s not a function which references its environment, it’s just a value which needs to retain its state in various scopes. This can actually be implemented in C++ with a little effort, and even made thread safe so it can be passed between threads. You can take a look at my test code here. Of course, my little test only allows the creation of one of these, since the global access is handled through static variables – but you could keep a table in which a number of these variables are stored, so it can actually be implemented in plain C++.
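Since the linked test code is not reproduced here, the following is a rough, single-threaded sketch of how the basic mechanism could look in plain C++ (a simplification for illustration, not the actual test code): a static stack of values, where each ScopedGlobal pushes a value for the lifetime of its scope and every read sees the innermost live value.

#include <iostream>
#include <vector>

template <typename T>
class ScopedGlobal {
public:
    explicit ScopedGlobal (const T& v) { stack().push_back(v); } // enter a scope
    ~ScopedGlobal () { stack().pop_back(); }                     // leave the scope
    operator T () const { return stack().back(); }               // read the innermost value

private:
    static std::vector<T>& stack () {
        static std::vector<T> values;
        return values;
    }
};

ScopedGlobal<int> global(5); // Global scope: value 5.

void bar () {
    std::cout << "Value in bar: " << int(global) << "\n";
}

void foo () {
    ScopedGlobal<int> local(4); // Shadows the global within this scope.
    std::cout << "Value in foo: " << int(global) << "\n"; // Prints 4.
    bar(); // With this scheme bar also sees 4 here.
}

int main () {
    foo();
    bar(); // Prints 5 again, since foo's scope has ended.
}

Note that this is essentially dynamic scoping: anything called while foo’s local is alive sees 4, which is exactly the behaviour that becomes interesting (and dangerous) once threads enter the picture.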

But that’s beside the point of this article. It is, however, the background information for my newest and craziest language concept. I’m unsure if I should add an “esoteric language” tag to this post or not 😉

What if scopes were elastic? They could change and move around, altering the state of your variables and program transparently to its execution. Perhaps elastic scopes could even be seen as a computation paradigm? Then again, I’m not going to go to the effort of figuring out how to turn elastic scopes into a Turing-complete execution model – but I am going to theorize about how they could be used as part of an execution environment in a highly parallel system.

Instead of functions, I am going to use named scopes, which can be called asynchronously and whose environments can be read from or sent to by other scopes. Here’s some possible sample code:

scope1 = {
    var val = "in scope1"
    print "scope1 " val

    async {
        scope2 = {
            val = "in scope2"
            print "scope2 " val
            sleep 500
            print "scope2 " val
        }
        scope3 = {
            print "scope3 " val
            sleep 100
            print "scope3 " val
            val = "in scope3"
            print "scope3 " val
            send scope3 to scope2
        }
    }
    print "scope1 " val
}

And the output would be something like the following:

scope1 in scope1
scope3 in scope1
scope2 in scope2
scope3 in scope1
scope3 in scope3
scope2 in scope3
scope1 in scope1

Spaghetti code, of course, but overall, simple enough so far.

So far, it’s nothing too special, besides perhaps sending the value of val from scope3 to scope2. But imagine, for a moment, what would happen if the value was passed, as an argument, to a function which is executed in its own thread. What if it was passed in such a way that its scope was retained? Let’s see an example:

function f1 (var v) {
    print v
    sleep 500
    print v
}
scope1 = {
    var val = 5
    thread t = f1
    t.start(val)
    sleep 100
}

What does the second print output? In a normal language, it would be 5, but if val retains its scope – scope1 – then it will have gone out of scope before reaching this print. Imagine that.. scopes changing in other threads could affect your variables! Of course, I’m assuming that accessing a variable is always atomic. This would be extremely important in such a language.

Am I going to design a language around this idea? No, of course not, and I shudder to think of all the changing state! Shared state is evil and makes concurrency hard, yet here I take shared state and make it even more unpredictable than before. But the idea is still interesting. Could it be used to do wonderful things? Can it be used with immutable data structures? What is an immutable data structure in a mutable, elastic scope? Is it, essentially, mutable? Can this be used as part of a dataflow language to do weird and wonderful things? Or was this a pointless exercise? These are all open questions and I invite people to play with them and see what happens. I will amend this post after toying with the concept some more.

Flat hierarchies in C++: Part 2

This post expands on the previous post in the series by decoupling the components and event handlers and allowing the execution environment to be easily redefined by replacing small sections of the code, without having to change anything else. It also lays the groundwork for a thread safe parallel execution environment.

First, we include the standard C++ library headers which we will be using.

#include <map>
#include <vector>
#include <deque>
#include <string>
#include <utility>
#include <iostream>

This next block of code defines a typedef for a mutex and a class for creating atomic blocks of code through RAII. Neither is really used yet, but in the next post, when we introduce threads and parallel execution, this will be modified to refer to an actual mutex and to lock and unlock it, allowing the rest of the code to be made thread safe without any other changes (besides implementing the threads).

// Replace this with a Mutex or lock datatype.
typedef int Lock;

// Use RAII to lock and unlock a mutex.
class Atomic {
private:
    Lock& lock; // A reference, so the shared lock itself will be locked and unlocked.
public:
    Atomic (Lock& l)
        : lock(l) {
        // Lock the lock.
    }
    ~Atomic () {
        // Unlock the lock.
    }
};
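For reference, here is one possible shape the threaded variant could take (just a sketch using std::mutex and assuming C++11 – the actual version will appear when threads are introduced):

#include <mutex>

// Sketch only: Lock becomes a real mutex and Atomic holds it for its lifetime.
typedef std::mutex Lock;

class Atomic {
private:
    Lock& lock; // A reference, so the shared mutex itself is locked, not a copy.
public:
    Atomic (Lock& l) : lock(l) { lock.lock(); }
    ~Atomic () { lock.unlock(); }
};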

The event base class is unchanged from the previous version.

// Base class for event data type.
class Event {
public:
    Event (unsigned int t) : type(t) {}
    virtual ~Event () {}

    const unsigned int type;
};

The component base class is almost the same as in the previous version. In fact, it’s simpler, as it no longer stores a map of event ids to components. It does still contain the methods for registering to receive events and for sending events, but these are merely convenience functions which delegate the actual work to the execution manager.

// Base class for a component.
class Component {
protected:
    // Register for all events of type "type".
    void registerForEvent (const unsigned type);
    // Send an event.
     void send (const Event* const event);

public:
     // Event handler, called when an event, which is being listened for, is received.
     virtual void eventReceived(const Event* const event)=0;
};

The following is an abstract class defining the execution manager. The job of the execution manager is to map events to the desired components and schedule the event handlers. This is an abstract class so that the actual execution model can be defined in a subclass. This lets us swap in different execution models later without having to change much code.

// Abstract class used to manage execution of components.
class Manager {
private:
    static Manager* manager;

public:
    // Provide access to the stored manager.
    static void set (Manager* man) {manager = man;}
    static Manager* const get () {return manager;}

    virtual ~Manager () {}

    // Register for all events of type "type".
    virtual void registerForEvent (Component* component, const unsigned type)=0;
    // Send an event.
    virtual void send (const Event* const event)=0;
    // Execute component system.
    virtual void exec ()=0;
};

Manager* Manager::manager = 0;

We now implement the component methods by delegating the work to the execution manager.

void Component::registerForEvent (const unsigned int type) {
    // Delegate to manager.
    Manager::get()->registerForEvent(this, type);
}
void Component::send (const Event* const event) {
    // Delegate to manager.
    Manager::get()->send(event);
}

The following code defines a sample event and component and remains largely unchanged from the previous version. It does, however, add another event type, to demonstrate how events may be sent from inside the event handler, and it removes the quit event, which is no longer needed.

// Keep a list of event types.
enum EventIdentifiers {
    PRINT_HELLO = 0,
    PRINT_MESSAGE,
    SEND_MESSAGE
};

// A sample event.
class MessageEvent : public Event {
public:
    MessageEvent () : Event(PRINT_MESSAGE) {}
    virtual ~MessageEvent () {}

    std::string message;
};

// A sample component.
class HelloPrinter : public Component {
public:
    HelloPrinter () {
        registerForEvent(PRINT_HELLO);
        registerForEvent(PRINT_MESSAGE);
        registerForEvent(SEND_MESSAGE);
    }
    virtual ~HelloPrinter () {}

    // Sample event handler.
    void eventReceived(const Event* const event) {
        if (event->type == PRINT_HELLO) {
            std::cout << "Hello!\n";
        }
        else if (event->type == PRINT_MESSAGE) {
            std::cout << "Message: " << static_cast<const MessageEvent*>(event)->message << "\n";
        }
        else if (event->type == SEND_MESSAGE) {
            // Demonstrate sending events from inside the event handler.
            sendMessage("Send message event sent a message");
        }
    }

    // Convenience functions for sending events.
    void sendHello () {
        Event* event = new Event(PRINT_HELLO);
        send(event);
    }
    void sendMessage (const std::string& message) {
        MessageEvent* event = new MessageEvent;
        event->message = message;
        send(event);
    }
    void sendEvent(unsigned int id) {
        Event* event = new Event(id);
        send(event);
    }
};

The reference counting class is a utility class which allows us to reference count events, so that if an event is sent to multiple components, we do not delete it until all components have had a chance to process it. This means that an event can be allocated, sent and forgotten about, as the reference counter will let the manager know when the event should be deleted.

// Class to keep a thread safe reference count for a pointer.
template <class T>
class ReferenceCounter {
private:
    const T* const item;
    unsigned int count;
    Lock lock;

public:
    ReferenceCounter (const T* const pointer)
        : item(pointer), count(0)
    {
    }
    // Delete the pointer when the reference counter is destroyed.
    ~ReferenceCounter () {
        delete item;
    }
    // Provide access to the internal lock, in case access to addRef() needs to be synchronised.
    Lock& getLock () {return lock;}

    // Provide access to the stored pointer - addRef() must have been called once for each time this is used and release() should be called once after each call - if release() returns true, then this pointer is now invalid and the ReferenceCounter should be destroyed.
    const T* const get () {return item;}

    // Increase the reference count.
    ReferenceCounter<T>* addRef () {
        ++count;
        return this;
    }

    // Decrease the reference count and return true is the reference counter is ready to be destroyed.
    bool release () {
        Atomic atom(lock);
        return --count <= 0;
    }
};

We now implement the execution manager. This execution manager processes events sequentially, in the order that they were sent, and the components process them in the order that they registered for the events. The execution manager stores a queue of pairs – an event and the component which should process that event – which will be processed in turn. Events are added to the back and taken from the front, for FIFO processing. Also, a map from event types to components is stored, so that we can keep track of which component is interested in which event.

// Sample execution manager used to execute components in a synchronous fashion.
class SynchronousExecutionManager : public Manager {
private:
    Lock queueLock;
    // Queue of events needing to be processed.
    std::deque<std::pair<Component*, ReferenceCounter<Event>* > > eventQueue;

    Lock mapLock;
    // Map of event types to components registered for those events.
    std::map<unsigned int, std::vector<Component*> > eventMap;

public:
    SynchronousExecutionManager () {}
    virtual ~SynchronousExecutionManager () {}

    // Register for all events of type "type".
    void registerForEvent (Component* component, const unsigned type);
    // Send an event.
    void send (const Event* const event);
    // Execute component system.
    void exec ();
};

Registering for an event is a simple matter of adding the component to the list of components registered for a given event type. This is done in an atomic block, so that the event map cannot be modified concurrently.

void SynchronousExecutionManager::registerForEvent (Component* component, const unsigned type) {
    Atomic atom(mapLock);
    eventMap[type].push_back(component);
}

The send method ensures the event is reference counted and then adds it to the queue once for each component that is registered to receive that event. Again, this method locks the shared mutexes so that the queue and map cannot be left in an inconsistent state, and it deletes events which nobody has registered for, so they do not leak.

void SynchronousExecutionManager::send (const Event* const event) {
    Atomic atom1(queueLock);
    Atomic atom2(mapLock);

    // Create a new reference counter for the event.
    ReferenceCounter<Event>* ref = new ReferenceCounter<Event>(event);

    // Get list of components which should receive this event.
    std::vector<Component*>& components = eventMap[event->type];

    // If nobody is registered for this event, destroy it immediately so it does not leak.
    if (components.empty()) {
        delete ref;
        return;
    }

    // Add the event to the queue, once for each component - add to the reference count for each one.
    for (std::vector<Component*>::iterator i = components.begin(); i != components.end(); i++) {
        eventQueue.push_back(std::pair<Component*, ReferenceCounter<Event>* >(*i, ref->addRef()));
    }
}

This method handles actually calling the event handler for each event. It does this by taking the next event off the queue, in a thread safe manner, and then calling the event handler. Finally, the reference count for the event is decremented and, if need be, the event is deleted. This ensures that events do not leak memory.

void SynchronousExecutionManager::exec () {
    ReferenceCounter<Event>* ref;
    Component* component;
    // As long as there are events to be processed, keep processing.
    while (!eventQueue.empty()) {
        {
            Atomic atom(queueLock);
            // Get the next event/component pair from the queue.
            std::pair<Component*, ReferenceCounter<Event>* > next = eventQueue.front();
            eventQueue.pop_front();
            ref = next.second;
            component = next.first;
        }
        // Handle the event.
        component->eventReceived(ref->get());
        // Release reference count for the event just processed.
        if (ref->release()) {
            // Nobody else is holding any more references to this or will be touching this in any way
            delete ref;
        }
    }
}

Finally, a small test program to exercise the event system.

// Sample run
int main (int argc, char** argv) {
    // Create the execution manager.
    Manager::set(new SynchronousExecutionManager());

    // Seed the system with some events.
    HelloPrinter a, b, c;
    a.sendHello();
    b.sendMessage("Hi from b");
    c.sendEvent(SEND_MESSAGE); // Sent to all 3, which each send another event to all 3, causing 9 string messages to be sent.

    // Run the event system.
    Manager* manager = Manager::get();
    manager->exec();

    // Clean up.
    delete manager;
}

Hopefully this has helped you understand how the components from the previous post can be decoupled in such a way that the execution environment can be easily controlled without requiring code modification. In the next part, we will implement a new execution manager which changes the exec() method to process the event handlers in a thread pool, for parallel execution.

Flat hierarchies in C++: Part 1

I have decided that the time has come to finish the mini-series on flat hierarchies in C++. This part, Part 1 of the new and improved series, is simply a reiteration of my earlier post on the subject, with a full, compilable code listing. For an explanation, please read the other post. The next part of the series will decouple the components in preparation for adding concurrency.

#include <map>
#include <vector>
#include <string>
#include <iostream>
#include <cstdlib>

class Event {
public:
    Event (unsigned int t) : type(t) {}
    virtual ~Event () {}

    const unsigned int type;
};

class Component {
private:
     static std::map<unsigned int, std::vector<Component*> > eventMap;
protected:
     // Register for all events of type "type".
     void registerForEvent (const unsigned type);
     // Send an event.
     void send (const Event* const event);

public:
     // Event handler, called when an event, which is being listened for, is received.
     virtual void eventReceived(const Event* const event)=0;
};

std::map<unsigned int, std::vector<Component*> > Component::eventMap;

void Component::registerForEvent (const unsigned int type) {
    eventMap[type].push_back(this);
}
void Component::send (const Event* const event) {
    std::vector<Component*>& components = eventMap[event->type];
    for (std::vector<Component*>::iterator i = components.begin(); i != components.end(); i++) {
        (*i)->eventReceived(event);
    }
}

enum EventIdentifiers {
    QUIT_EVENT=0,
    PRINT_HELLO,
    PRINT_MESSAGE
};

class MessageEvent : public Event {
public:
    MessageEvent () : Event(PRINT_MESSAGE) {}
    virtual ~MessageEvent () {}

    std::string message;
};

class HelloPrinter : public Component {
public:
    HelloPrinter () {
        registerForEvent(QUIT_EVENT);
        registerForEvent(PRINT_HELLO);
        registerForEvent(PRINT_MESSAGE);
    }
    virtual ~HelloPrinter () {}

    void eventReceived(const Event* const event) {
        if (event->type == QUIT_EVENT) {
            std::exit(0);
        }
        else if (event->type == PRINT_HELLO) {
            std::cout << "Hello!\n";
        }
        else if (event->type == PRINT_MESSAGE) {
            std::cout << "Message: " << static_cast<const MessageEvent*>(event)->message << "\n";
        }
    }

    void sendQuit () {
        Event event(QUIT_EVENT);
        send(&event);
    }
    void sendHello () {
        Event event(PRINT_HELLO);
        send(&event);
    }
     void sendHelloMessage (const std::string& message) {
        MessageEvent event;
        event.message = message;
        send(&event);
    }
};

int main (int argc, char** argv) {
    HelloPrinter a, b, c;
    a.sendHello();
    b.sendHelloMessage("Hi from b");
    c.sendQuit();
}

Language concept: mixing functional, imperative and dataflow

See formatted code here.

fun do-op a b op {
    match op
    with '+: a + b
    with '-: a - b
    with '*: a * b
    with '/: a / b
    with  _: error
}

flow gen a b ops -> res {
    (do-op a>> b>> ops>>) -> >>res
}

do main {
    commit (reduce print
                   (gen [1 2 3]
                        [3 2 1]
                        ['+ '- '*]))
}

As I’ve stated before, I feel that a truly long-lasting and useful language will have to merge imperative and dataflow programming styles, since both have their own sets of advantages and disadvantages, and locking programmers into one or the other only limits their expressive ability. This language concept is a largely incomplete idea I had for a textual programming language that does just that, by providing a cohesive hybrid of imperative, pure functional and dataflow programming.

This language consists of three types of code blocks:

  1. do – these are imperative blocks with standard imperative features: mutable data, I/O interface and manual synchronization. They can call other do blocks as well as flow and fun blocks.
  2. flow – these are dataflow blocks and feature dataflow semantics: streams, pipelined processing, automatic memory management and automatic parallelization. They cannot mutate external data and can call only other flow blocks and fun blocks.
  3. fun – these are pure functional blocks and they cannot produce side-effects, so may only operate on immutable data and data structures. They can only call other fun blocks.

This enforced split between the different paradigms ensures that they do not get inappropriately polluted by conflicting features. It is perfectly safe for pure functions to be called in a dataflow pipeline, for example, as they cannot affect any concurrently running code, but calling impure functions from within a fun block would taint that block’s purity, so this cannot be allowed.

It may make sense to allow calling do blocks from within flow blocks, but this could lead to synchronisation issues, race conditions and undefined behaviour, so disallowing this would solve a lot of issues. With these restrictions we can make assumptions which simplify compilation, runtime and optimisation as well as development.

Back to the code. Besides the three basic blocks, this code demonstrates a number of different features:

  • support for symbols. The code references '+, '-, '* and '/. This demonstrates support for a symbolic data type.
  • pattern matching. The match … with … construct demonstrates a Haskell/ML-like pattern matching construct. To make full use of this, a proper parametric type system would be desirable.
  • streams. Dataflow blocks naturally work well with lists and streams of data and this demonstrates a possible means of taking values from and putting values onto streams, as well as directing flow of data.
  • functional building blocks – reduce, in this case.
  • pure functional I/O handling. In this case, print is passed to reduce. Since fun blocks can only call other fun blocks (and calling reduce inside fun blocks is something I’d expect people to do often), reduce must itself be a fun block. That means any function or closure passed into reduce must also be a fun block – so print must be a fun block. Since print produces output (a side effect), it cannot be a fun block – so instead it doesn’t actually produce side effects, but generates an immutable thunk. The commit function (which is a do block) then commits the recorded state of the thunk – producing the side effects dictated by print.

That last feature is my answer to Haskell’s I/O monad. By allowing restricted state (through do blocks) and providing a mechanism for recording state in immutable data structures, through thunks and other means, most code can be kept side-effect free and pure, meaning it can be safely used in parallel environments and auto-parallelised; it also simplifies memory management and allows for more aggressive optimisation.
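To make the thunk idea a little more concrete, here is a loose C++ analogy (print_thunk and commit are illustrative names, not part of the concept language): the pure part of the program only records what should be printed, and a single impure step performs the recorded output.

#include <functional>
#include <iostream>
#include <string>
#include <vector>

// A thunk is just a deferred action; building one has no side effects.
typedef std::function<void()> Thunk;

Thunk print_thunk (const std::string& text) {
    return [text] () { std::cout << text << "\n"; }; // records intent only
}

// commit plays the role of the impure "do" step: it runs the recorded actions.
void commit (const std::vector<Thunk>& actions) {
    for (const Thunk& action : actions) action();
}

int main () {
    std::vector<Thunk> recorded;
    recorded.push_back(print_thunk("4")); // 1 + 3
    recorded.push_back(print_thunk("0")); // 2 - 2
    recorded.push_back(print_thunk("3")); // 3 * 1
    commit(recorded); // all output happens here, in one place
}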

Since this is a very early concept, that’s about as far as I got. I will post more as I think of it and will update the blog if I take any steps towards developing this concept. I would also be interested in hearing thoughts and opinions on it.

Building a flat hierarchical component system in Clojure

I’ve been playing around with Clojure a lot lately and, just for fun, I implemented a simple system for constructing a component based software architecture where each component runs completely independently and asynchronously. Components communicate by sending events to each other, achieving a flat software hierarchy.

 

; Define a function to send events
(defn event [k v] nil)

; Define the data structure used to model a component
(defstruct component-s :name :handlers :state)

; Create a component by creating an instance of the data structure
(defn make-component [name handlers]

(struct component-s name handlers {}))

; Run a collection of components, start by sending an :init event to each component
(defn run-components [& comps]
   (loop [components comps
          events     (atom [])
          key        :init
          value      nil]
      (if (= key nil)
         ; Component processing has completed - no new events have been generated
         'Done
         ; Components have not completed processing - events are still left to be processed
         (let [updated ; Updated list of components
            ; Bind the event function to a closure which can update the vector of events
            (binding [event (fn [k v] (swap! events concat (vector (list k v))))]
               ; Map the update function accross each component in parallel
               (doall (pmap #(conj % (if ((:handlers %) key)
                                                      {:state (conj (:state %)
                                                                    (((:handlers %) key) value (:state %)))}
                                                      {}))
                                    components)))]
            ; Recursively process components
            (recur updated                        ; Updated list of components
                   (atom (rest @events))          ; Remaining events
                   (ffirst @events)               ; Next event type
                   (second (first @events)))))))  ; Next event message

We can test this by creating and running some test components:

(run-components
   (make-component "Comp1"
      {:init (fn [v s]
            (event :set-msg "Ho")
            (event :print nil)
            {:msg "Hi"})
       :print (fn [v s]
          (println (:msg s)))
       :set-msg (fn [v s]
          {:msg v})})
   (make-component "Comp2"
      {:init (fn [v s]
         {:msg "Hello"})
       :print (fn [v s]
         (println (:msg s)))}))

Comp1 contains three event handlers – :init, :print and :set-msg. Comp2 only contains two event handlers – :init and :print.
:init is called to set up the component’s state and trigger any initial events. :print will print the contents of the state’s :msg slot to the terminal, and :set-msg will change the :msg slot to the value of the event’s message.

Running this sample will output the following:

Ho
Hello
Done

It becomes more interesting if the components do a little more, for example:

(run-components
   (make-component "Comp1"
      {:init (fn [v s]
            (event :print nil) ;; <-- the newly added event
            (event :set-msg "Ho")
            (event :print nil)
            {:msg "Hi"})
       :print (fn [v s]
          (println (:msg s)))
       :set-msg (fn [v s]
          {:msg v})})
   (make-component "Comp2"
      {:init (fn [v s]
         {:msg "Hello"})
       :print (fn [v s]
         (println (:msg s)))}))

Running this will output:

Hi
Hello
Ho
Hello
Done

Note how the first :print event gets processed before the :set-msg event – i.e., Comp1’s :msg has not yet been modified when it is first printed.

Of course, you can set up all sorts of crazy chains of events with many event types, complex logic and so on.

What would be really nice is a DSL designed for writing these component-based programs, since calling run-components and make-component manually is a little messy. Something like the following would be nice:

(program
    (component "Comp1"
        (on :init [v s]
               (event :print nil)
               (event :set-msg "Ho")
               (event :print nil)
               {:msg "Hi"})

        (on :print [v s]
            (println (:msg s)))

        (on :set-msg [v s]
            {:msg v}))

    (component "Comp2"
        (on :init [v s]
            {:msg "Hello"})

        (on :print [v s]
            (println (:msg s)))))

Luckily, this is the kind of thing Lisp is good at and Clojure, being a Lisp dialect, has everything we need to make it happen:

(defmacro program [& components]
    `(run-components ~@components))

(defmacro component [name & handlers]
    `(make-component ~name (conj {} ~@handlers)))

(defmacro on [handler args & body]
    {handler `(fn ~args ~@body)})

These macros transform our convenient little DSL into a set of calls to run-components and make-component. Success! We can now write fun little component-based programs, arranged in a flat hierarchy, using an anonymous (the sender never knows who receives the events; the receiver never knows where the events come from) event-based messaging system.