tl;dr

Building a multi-dimensional spatially configured feed forward nodal compute network.

  •  

    February 2010
    M T W T F S S
    « Dec    
    1234567
    891011121314
    15161718192021
    22232425262728
  • Archives

  • Subscribe

  • Other Blogs

  • RSS Latest Posts on Vorlaths Blog

    • Out Of Memory Checks (in DataFlow?)
      Just saw this short blog entry about not checking if you're out of memory. I don't normally check for this, but in some cases, I do check for it. I tend to do this in production code or in certain cases where I request large chunks of memory. If it's just allocating objects here and there, then I agree there's not much point considering t […]
    • Failing Drastically or Quietly Produce Incorrect Results?
      The paraphrased words in the title were uttered by Andrew Koenig who blogs on the Dobbs Code Talk web site. The specific comment is located in his own blog article titled What Dijkstra said was harmful about goto statements. This is a followup from a previous blog entry where he asks readers for their opinions about goto statements. Yes, it's on older t […]
    • The Scientific Method (Updated: New 2008 map)
      I'm going to take a theory that is going to seem rather extreme to most people for this demonstration. Now, I haven't been able complete the experiment, but perhaps some new ideas will form by posting this article. The point being that the scientific method should not concern itself with how ridiculous a certain claim appears to be. We will see in […]
    • Assembly: Interlaced Zig Zag x264 (Updated Dec 8,2009)
      Via reddit, we find this article about implementing zig zag in assembly. See zig zag figure in section 1.6.6 of this article for more info. The author implemented the interlaced version of zigzag (16bit word elements) as it was not already available. This version is for more recent processors as can be seen by this quote. Furthermore, as should be clear by t […]
  • RSS Latest LtU Posts

    • Resolving and Exploiting the k-CFA Paradox
      Resolving and Exploiting the k-CFA Paradox, Matthew Might, Yannis Smaragdakis, and David Van Horn. To appear in PLDI 2010. Low-level program analysis is a fundamental problem, taking the shape of "flow analysis" in functional languages and "points-to" analysis in imperative and object-oriented (OO) languages. Despite the similarities, the […]
    • Continuity Analysis of Programs
      Continuity Analysis of Programs, Swarat Chaudhuri, Sumit Galwani, and Roberto Lublinerman. POPL 2010. We present an analysis to automatically determine if a program represents a continuous function, or equivalently, if infinitesimal changes to its inputs can only cause infinitesimal changes to its outputs. The analysis can be used to verify the robustness of […]
    • Monads in Action
      Monads in Action, Andrzej Filinski, POPL 2010. In functional programming, monadic characterizations of computational effects are normally understood denotationally: they describe how an effectful program can be systematically expanded or translated into a larger, pure program, which can then be evaluated according to an effect-free semantics. Any effect-spec […]
    • HipHop: Facebook runs compiled PHP on its servers
      While PHP deservedly gets a terrible rep around programming language folks, this is still an interesting announcement: HipHop compiles PHP down to C++ and gets about a 2x speedup. HipHop will be released as open source, and is currently in production use, serving 90% of Facebook's traffic. It makes me wish Facebook used Python: a large-scale deployment […]
  • RSS Latest ACM Queue Posts

    • A Conversation with Arthur Whitney
      Can code ever be too terse? The designer of the K and Q languages discusses this question and many more with Queue editorial board member Bryan Cantrill.
    • Purpose-Built Languages
      While often breaking the rules of traditional language design, the growing ecosystem of purpose-built "little" langages is an essential part of systems development.
    • Realtime Garbage Collection
      Traditional computer science deals with the computation of correct results. Realtime systems interact with the physical world, so they have a second correctness criterion: they have to compute the correct result within a bounded amount of time. Simply building functionally correct software is hard enough. When timing is added to the requirements, the cost an […]
    • How Not to Write Fortran in Any Language
      There are characteristics of good coding that transcend all programming languages. Programmers have debated the merits of different programming languages since the dawn of programming. Every coder has a favorite general-purpose programming language, and many have an unfavorite language, too. If the coder is old enough, often that unfavorite language is Fortr […]

Building a flat hierarchial component system in Clojure

Posted by dublindan on December 1, 2009

I’ve been playing around with Clojure a lot lately and, just for fun, I implemented a simple system for constructing a component based software architecture where each component runs completely independently and asynchronously. Components communicate by sending events to each other, achieving a flat software hierarchy.

 

; Define a function to send events
(defn event [k v] nil)

; Define the data structure used to model a component
(defstruct component-s :name :handlers :state)

; Create a component by creating an instance of the data structure
(defn make-component [name handlers]

(struct component-s name handlers {}))

; Run a collection of components, start by sending an :init event to each component
(defn run-components [& comps]
   (loop [components comps
          events     (atom [])
          key        :init
          value      nil]
      (if (= key nil)
         ; Component processing has completed - no new events have been generated
         'Done
         ; Components have not completed processing - events are still left to be processed
         (let [updated ; Updated list of components
            ; Bind the event function to a closure which can update the vector of events
            (binding [event (fn [k v] (swap! events concat (vector (list k v))))]
               ; Map the update function accross each component in parallel
               (doall (pmap #(conj % (if ((:handlers %) key)
                                                      {:state (conj (:state %)
                                                                    (((:handlers %) key) value (:state %)))}
                                                      {}))
                                    components)))]
            ; Recursively process components
            (recur updated                        ; Updated list of components
                   (atom (rest @events))          ; Remaining events
                   (ffirst @events)               ; Next event type
                   (second (first @events)))))))  ; Next event message

We can test this by creating and running some test components:

(run-components
   (make-component "Comp1"
      {:init (fn [v s]
            (event :set-msg "Ho")
            (event :print nil)
            {:msg "Hi"})
       :print (fn [v s]
          (println (:msg s)))
       :set-msg (fn [v s]
          {:msg v})})
   (make-component "Comp2"
      {:init (fn [v s]
         {:msg "Hello"})
       :print (fn [v s]
         (println (:msg s)))}))

Comp1 contains three event handlers – :init, :print and :set-msg. Comp2 only contains two event handlers – :init and :print.
:init is called to setup the components state and trigger any initial events. :print will print the contents of the states :msg slot to the terminal and :set-msg will change the :msg slot to the value of the events message.

Running this sample will output the following:

Ho
Hello
Done

It becomes more interesting if the components do a little more, for example:

(run-components
   (make-component "Comp1"
      {:init (fn [v s]
            (event :print nil) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            (event :set-msg "Ho")
            (event :print nil)
            {:msg "Hi"})
       :print (fn [v s]
          (println (:msg s)))
       :set-msg (fn [v s]
          {:msg v})})
   (make-component "Comp2"
      {:init (fn [v s]
         {:msg "Hello"})
       :print (fn [v s]
         (println (:msg s)))}))

Running this will output:

Hi
Hello
Ho
Hello
Done

Note how the first event gets processed before the second one – ie, Comp1’s :msg has not yet been modified.

Of course, you can set up all sorts of crazy chains of events with many event types, complex logic and so on.

What would be really nice is a DSL designed for writing these component-based programs, since calling run-components and make-components manually is a little messy. Something like the following would be nice:

(program
    (component "Comp1"
        (on :init [v s]
               (event :print nil)
               (event :set-msg "Ho")
               (event :print nil)
               {:msg "Hi"})

        (on :print [v s]
            (println (:msg s)))

        (on :set-msg [v s]
            {:msg v}))

    (component "Comp2"
        (on :init [v s]
            {:msg "Hello"})

        (on :print [v s]
            (println (:msg s))))))

Luckily, this is the kind of thing Lisp is good at and Clojure, being a Lisp dialect, has everything we need to make it happen:

(defmacro program [& components]
    `(run-components ~@components))

(defmacro component [name & handlers]
    `(make-component ~name (conj {} ~@handlers)))

(defmacro on [handler args & body]
    {handler `(fn ~args ~@body)})

These macros transform our convenient little DSL into a set of calls to run-components and make-components. Success! We can now write fun little component-based programs, arranged in a flat hierarchy, using an anonymous (sender never knows who receives the events; receiver never knows where the events come from) event-based messaging system.

Posted in Concurrent Programming, Software | Tagged: , , | Leave a Comment »

Brainstorming the dataflow language “syntax” (Part 1)

Posted by dublindan on August 9, 2009

I’ve spent quite a lot of time, during the past few weeks, working on the representation of my dataflow language. In this post I will attempt to outline my current thoughts and hopefully generate some feedback on what I got right and wrong, what I could improve, how the language could be made more intuitive, easier (and faster) to develop in and less bug-prone.

For this post, I will define some basic temrinology. This will help me in explaining my ideas in a consistent way, but may not match up with existing terminology. If there is such a mismatch, please feel free to let me know and I’ll update the post. Later, I will probably make a list of terminology specific to my language, using existing temrinology where possible, as a reference to help people who are viewing my language for the first time. For now, however, I’ll simply use these:

  • operation – a fundamental instruction or external block of code, which acts as a single node in the dataflow network
  • component – either an operation or a collection of operations, or a collection of collections of.. That is, a componecan refer to a single operation, or collection of components (which themselves refer to either a single operation or a collection of components).
  • expression – a deterministic calculation whos output is a function of its inputs, calculated using the substitution model of computation. Expressions consist of a limited set of predefined (usually mathematical or logical) operators.
  • pre-condition – a logical expression which evaluates to either True or False and must be evaluated as True for a component to be considered executable. May optionally triggeran an arror state if this expression evaluates to False.
  • post-condition – a logical expression which evaluates to either True or False and must be evaluated as True for a component to propogate its output to connected nodes. May optionally trigger an arror state if this expression evaluates to False.

For now, I have defined a set of data types for use in the language, though what the final set of supported data types will be is, so far, undeicded as I need to do more real-world tests to see what is useful and what is not. The data types I have currently defined are:

  • integer – a signed whole number whos range is defined by the host systems native sized integers (ie 32bit on a 32bit machine).
  • real – a signed floating point number, whos range is defined by the host systems native floating point support.
  • byte – an unsigned 8bit value used to store binary data.
  • character – a single unicode character.
  • pair – a container of two values, where each value may be of a different data type.
  • array – a container of a fixed number of values, where each value is of the same data type.
  • record – a container of a predetermined set of named fields, where each field may be of a different data type.

Additional, theres a special data type: the sequence. A sequence is a stream of values of the same data type and is the connection between components – that is, I am naming the flow of data between nodes a sequence.

I am also contemplating whether a variant data type should also be provided as this could be useful, together with a guard or pattern matching mechanism, for receiving data from external sources.

To demonstrate the language syntax, I will construct a network to perform the following sample pseudocode:

input a, b and c
value d = (a * b) - 1
if d > 10 then
    value e = c - 1

This simple piece of code would be represented in my dataflow language as:

Sample Code

This simple dataflow network shows two “merge” components connected together. Before explaining how this network works, I will briefly introduce the merge components.

A basic merge component takes one of the following two forms:

Merge Component

A merge component has two or three inputs and a single output. It also has an accosiated expression. The output is the result of applying the expression to the inputs.

There also exist additional variants of most operations – those with optional pre- and/or post-conditions attached. Pre-conditions are marked as a green block on the left and post-conditions are marked as a red block on the right. In the sample code, presented above, the second merge node has a pre-condition assigned to it.

In the code editor, it would be possible to collapse, and therefore view, the pre-conditions, expressions and post-conditions. The diagram used the hidden view to demonstrate that a more compact view of code, which does not display all available information, would be available in an editor. The collapsed view would look as follows:

Sample code (full)

Note the use of coloured markers where the inputs are used. This is done because forcing inputs to be named, when it may not make sense to name them, programmers, being as lazy as we are, tend to give them meaningless names. This only adds to the noise and makes the program more dificult to read and understand. Using coloured markers makes it immediately obvious which inputs are used where without polluting the code with meaningless identifiers. It should, however, be possible to optionally label the markers, if it makes sense to do so. Perhaps the labels are displayed as a tooltip when hovering the mouse over the marker. It may also be useful to highlight each occurance of an input within the component when hovering the mouse over it.

This code shows that, in the first merge component, the expression (a * b) – 1 is computed. The result of this expression is passed to the next node, which contains the pre-condition that its first input (d in the pseudocode) must be greater than 10. If it is not, then that input (and the input to d) are discarded and no furether action is taken. If the pre-condition passes, however, then the expression is evaluated, subtracting 2 from input c.

The example below is a bit more complex and introduces variants of the “merge” component (multiple expressions, single input, etc) and a new node: the “route” component. The route component evaluates a condition and then routes the inputs to one of two possible sets of outputs. Route components may have a pre-condition, but cannot have any post-conditions, as the inputs are never modified by this node – the same effect may be achieved through pre-conditions.

This example deserves a more in-depth explanation than I hae time for, so I will edit this post over the next week or so to add in additional detail and explain my ideas better.

Fibonacci Sequence sample code

Fibonacci Sequence sample code

Posted in Dataflow | Tagged: , , , | 4 Comments »

The dataflow virtual machine, first look

Posted by dublindan on August 9, 2009

Since my last post, I’ve been mainly working on two things regarding this blog: a simple dataflow virtual machine and the graphical “syntax” or representation for a dataflow language.

I’ll briefly write about the virtual machine in this post and then talk about the language “syntax” in the next.

The virtual machine is a very simple system of connected operations/nodes with zero or more inputs and outputs. An output can be connected to an input and then data is sent to the output by the node; this data is forwarded along to the connected inputs. Inputs and outputs are statically typed and, during network construction, the types are checked for compatibility ensuring that type mismatches do not occur at runtime. A node is processed when all of its inputs are met and processing is decoupled from the rest of the logic in such a way that it can be performed in a dedicated thread or even pool of threads. Currently no threads are used, so everything occurs sequentially, although later this will be changed.

Nodes can be set up to receive a constant stream of inputs, without actually running the code to do this. For example, if a nodes input is attached to a constant data source, like a static number, this would not actually have to keep generating the values, instead it is implied that it happens and the input simply retains the value after using it, instead of clearing it.

Though it has yet to be implemented, I’m thinking a copy-on-write system should be used for data, so that if data is duplicated and sent to multiple nodes, a single copy is shared until a node tries to modify its value, at which point a new copy of the data is created. This makes duplicating/copying data cheaper.

I have also been thinking about dynamic runtime switching between a synchronous and asynchronous operation evaluation systems, where the executing program can choose to switch between these. In the synchronous model, concurrency is synchronised into cycles and any new operations which become executable during a cycle must wait until the next cycle to be run. In an asynchronous model, the new operations are queued and everything essentally runs in a single (although large) cycle. The asynchronous mode potentially tields better utilisation of processing resources, while the synchronous model may be useful to provide runtime guarantees for realtime systems. I’ve also been thinking about how a profiler could generate utilisation graphs for both – hopefully further aiding in coming up with runtime guarantees. I definitely need to work on this more, however.

As an aside, I found this great C++ library for dataflow, which I might be able to use for my own implementation! I’ll see. For now, I’ll continue with my own, since its almost done and allows me to do things how I want. Could speed up work, though, by using existing code.

Posted in Dataflow, Software | Tagged: , , | Leave a Comment »

Some thoughts on flat hierarchies in C++

Posted by dublindan on July 23, 2009

The top search terms for this blog seem to be “flat hierarchy C++”, so this is something I should address.

I had originally intended on writing a crude implementation in Python, to demonstrate the consept, leaving an actual efficient implementation as an exercise for the reader (possibly implementing it myself for a future post), but perhaps it would be more useful to think about it sooner. In this post, I will write a few short sentences to outline my thoughts on implementing a flat hierarchy in C++, which will hopefully provide readers with some ideas to implement this themselves.

There are two ways of implementing flat hierarchies which I am currently interested in exploring, though I’m sure theres more. I will briefly explain these here:

  1. Independent components which communicate over an anonymous shared-nothing messaging system. This works because the messaging is transparent – a component never knows which other components will receive its messages, if any. A receiving component likewise never knows which components sent the messages which it is receviing. This anonymous message sending means that components may be added or removed at will and other components will never know. Obviously it may not logically make sense to do this – if a component relies on certain messages to work and you remove the component which generates the messages, then obviously the first component cannot fnction – but it will not fail either, it will wait around until you add a component which does generate those messages back into the system. Each component is a node in a flat hierarchy.
  2. An entity system. In this context, an entity system is an alternative to object oriented systems where an object is simply a collection of traits and is a technique borrowed from the game development community. How this object behaves is determined by the traits it has associated with it. Traits may be added or removed at runtime. Each object is independent and can only communicate through its traits. Since traits can be added and removed dynamically, the system maintains a flat hierarchy. Objects can be processed independently and concurrently. The building of entity systems is the subject of my “Building a Flat Hierarchy” series of posts, so be sure to check those out, if interested.

Since entity systems are reasonably complex and are being covered in their own series of blog posts, I will not talk about them here. Instead I will briefly talk about the independent component method of building a flat software hierarchy.

Depicts how components communicate with each other

Depicts how components communicate with each other

Ok, so hopefully at this stage you have a reasonably good idea of why this system works. If theres still confusion, please leave a comment and I’ll try and clear things up. Time to move on to implementation details.

I’m not going to implement the entire thing and just give out the code for people to use because that would take more time than I have to write this post. Also, different projects have different requirements and a system I build now may not be ideal for whatever you have in mind. Best to leave it up to you to fine tune the details to your needs. The version presented here whould give you ideas, but it won’t be terribly good or efficient.

Lets start with a component:

class Component {
private:
     static std::map<unsigned int, std::vector<Component*> eventMap;
protected:
     // Register for all events of type "type".
     void registerForEvent (const unsigned type);
     // Send an event.
     void send (const Event* const event);

public:
     // Event handler, called when an event, which is being listened for, is received.
     virtual void eventReceived(const Event* const event)=0;
};

And the implementation:

std::map<unsigned int, std::vector<Component*> Component::eventMap;

void Component::registerForEvent (const unsigned int type) {
    eventMap[type].push_back(this);
}
void Component::send (const Event* const event) {
    std::vector<Component*>& components = eventMap[event->type];
    for (std::vector<Component*>::iterator i = components.begin(); i != components.end(); i++) {
        (*i)->eventReceived(event);
    }
}

All events are derived from the Event base type:

class Event {
public:
    Event (unsigned int t) : type(t) {}
    virtual ~Event () {}

    const unsigned int type;
};

Lets test this with some stupid events:

enum EventIdentifiers {
    QUIT_EVENT=0,
    PRINT_HELLO,
    PRINT_MESSAGE
};

class MessageEvent : public Event {
public:
    MessageEvent () : Event(PRINT_MESSAGE) {}
    virtual ~MessageEvent {}

    std::string message;
};

Now I’ll implement some very simple and crude components, just to have something that can be run:

class HelloPrinter : public Component {
public:
    HelloPrinter () {
        registerForEvent(QUIT_EVENT);
        registerForEvent(PRINT_HELLO);
        registerForEvent(PRINT_MESSAGE);
    }
    virtual ~HelloPrinter () {}

    void eventReceived(const Event* const event) {
        if (event->type == QUIT_EVENT) {
            std::exit(0);
        }
        else if (event->type == PRINT_HELLO) {
            std::cout << "Hello!\n";
        }
        else if (event->type == PRINT_MESSAGE) {
            std::cout << "Message: " << reinterpret_cast<MessageEvent*>(event)->message << "\n";
        }
    }

    void sendQuit () {
        Event event(QUIT_EVENT);
        send(&event);
    }    
    void sendHello () {
        Event event(PRINT_HELLO);
        send(&event);
    }
     void sendHelloMessage (const std::string&amp; message) {
        MessageEvent event;
        event.message = message;
        send(&event);
    }
};

Ok, almost done. Lets tie it all together:

int main (int argc, char** argv) {
    HelperPrinter a, b, c;
    a.sendHello();
    b.sendMessage("Hi from b");
    c.sendQuit();
}

This should print “Hello” three times followed by “Hi from b” three times and then it should quit.

Ok, I’ll be the first to admit that this implementation is extremely limited and suffers from a bunch of flaws – BUT hopefully you can see the ideas and use that to implement a much more powerful version. Before I leave you to experiment, let me briefly mention some improvements which could be made to the above code. I’m sure you can think of many more too:

  • Should the sending component receive its own events? Perhaps the sender should be able to choose.
  • Should events be processed right away? This is useful for simple programs, but for large systems, you may end up in situations where some components never get to process events because other events are always processed first. A good solution would be to queue up all events, instead of sending them right away and then dispatch them when all event handlers have returned.
  • Should events be processed sequentially? It may be better to process them in threads. One good solution would be to have a pool of threads which will execute the event handlers. Since communication is done through immutable events, the handlers can all run at once.
  • The events themselves can be constructed from a memory pool to make creating and destroying events (event large events) a fast and cheap operation.

At some point in the future, I will implement (because I will need an implementation soon) and explain an implementation which has all of those features and probably some others too. I cannot commit to a date though, could be tomorrow or it could be next year.. who knows.

I hope that seeing the idea in code has helped you understand what it is I’m talking about and hopefully this will spark your imagination and creativity and perhaps motivate you to implement your own, more advanced, efficient and powerful versions. I’d love to hear what people come up with, so if you do implement this yourself, I’d appreciate a quick comment!

Posted in Software | Tagged: , , , , | 3 Comments »

What is dataflow?

Posted by dublindan on July 20, 2009

I was reading through old blog posts and came upon this gem in a 3 year old post on Cleo Saulniers blog, which I think is a near perfect description of what dataflow actually is. I know some of you were itching at a real discussion about what dataflow is and how it helps us. Well, here you go:

Many still can’t distinguish the difference between control statements and the traces on a circuit board. These are two and completely unrelated concepts.

  1. One is used to control what next component or instruction gets executed.
  2. The other determines what next component (or instruction) the data goes to.

Read the last two sentences again. Do you see the difference?

This is, in my opinion, the clearest, simplest contrast between dataflow and conventional programming paradigms that I have found. However, just in case this didn’t sink in, I’ll attempt to clarify:

  • Dataflow: material flows from person to person on an assembly line, where each person always performs the same actions
  • Not dataflow: material is passed to a person, who performs a number of different operations on it before passing it to the next.

It is interesting to look at other systems in nature, because software is not, after all, unique. Software builds on hardware, which builds on physical properties of the world. The theories are built upon abstract mathematical models, on chemestry, on biology, on architecture, on art.. Computer scientists draw inspiration for new concepts, theories, ideas and designs from a vast array of other fields. Software is not unique at all. So, lets look at nature, at biological organisms.

Biology is interesting to us, because it works in exactly the same way as computer hardware, but instead of electrical processes, they rely on biological and chemical processes. They have the interesting property that they are, essentially, dataflow systems (just like computer hardware, under the hood). They are spatially sequenced (instead of temporally), just like dataflow systems (and spreadsheets, pixel shaders etc) and they are structured in flat hierarchies. Do you see the buzzwords? Flat hierarchies, spatial software, dataflow. Exactly what I’ve been ranting about here, on this blog, all along.

Lets look at a concrete example:

Cell based organisms. A cell is, essentially, a biological compute node. They are fed data, through the exchange of proteins, hormones and electrical and chemical signals, do something with this data and feed it on. Input, Process, Output. They are spatially configured, in multiple dimensions, in a flat hierarchy. That is, if you take out a cell, the entire system does not collapse. Sure, it is now missing a compute node and it won’t be able to work quite the same as before, but the system is still intact. The cells are not temporally sequential structures; they are spatial – their sequence depends on their spatial position in relation to each other.

Neural networks (or, to avoid confusion, networks of neurons) are also spatially configured compute nodes, leveraging a flat hierarchical architecture, composed of a dataflow network. The links are the synapses, which feed data through a dataflow network of neurons. The neurons are positioned spatially and the synapses depend upon their spatial position in relation to one another. The overall hierarchy is a flat one – if you remove a subsection of the network, the rest of the network does not collapse. Most biological systems appear to have these properties.

So back to dataflow programming. Our dataflow system will also feed data from node to node, in a spatially configured flat hierarchy. They are no different from biological systems. Can we infer anything from this?

Of course we can. Biological systems have other interesting properties, which we would love to achieve in software. Since our dataflow system exhibits the other properties of biological systems, logically we could infer that it may also exhibit the other desirable properties. At the risk of falling into the correlation = causation trap (that is, just because our system has similar properties to biological systems, does not mean ours will have all properties present in biological systems), we can see the potential for other desirable properties: fault tolerance, tolerance of incomplete data, redundancy, replaceable parts (that is, hot-swappable components – thanks to the flat hierarchy).

I hope you can see the huge potential a dataflow system provides.

I guess you can describe the programming system, which I am trying to describe here in this blog, as a multi-dimensional spatially configured feed forward nodal compute network. Now that’s a buzzword loaded sentence!

Posted in Concurrent Programming, Dataflow, Software | Tagged: , , | 5 Comments »

Building a Flat Hierarchy (Part 1)

Posted by dublindan on July 14, 2009

To understand this post, it may be necessary to read about the differences between stacked and flat software hierarchies. Yes, its a long piece of required reading, but its worth it. I’ll wait for you here.

Still here? In this post, I am not going to talk about what stacked or flat hierarchies are. I’m not going to talk about their advantages or disadvantages. Instead, I will talk about a little software architecture I learned about from the game development community known as Entity Systems. I will also show a simple sample implementation of such an entity system – a flat hierarchy. My implementation will be crude and simple and not really a good representative of flat hierarchies, component systems or entity systems, but I’m hoping that it will be enough to give you a practical taster of how one can make use of these concepts. My hope is that others will expand on these ideas to build some useful software tools for the programmer community to make use of. So, put simply, please don’t complain that this implementation is inefficient and useless, instead use the concepts to build an efficient and useful variant. In a future post, I may develop such a system, but for now, a simple proof-of-concept implementation is more suitable, as it allows us to talk and reason about the concepts. Anyway, here goes…

My definitions and terminology may be slightly different from those used in the game development communities and if anyone spots such discrepancies, please point them out so that I can fix it. The concepts are simple enough – instead of of having our software composed as objects in the object oriented manner, we have entities. This is more than just a simple name change, however, as an Entity is really just a name or a tag. It does not contain data, it does not contain actions. Its just a named placeholder so that our system knows that it exists and can find it. An entity may be tagged with zero or more traits. A trait provides two things to an entity: a system and a state. The system is a function of code which continuously processes all entities with its particular trait. For example, a Renderable trait may exist, which tells the software that an entity can be rendered on the screen, that traits system will continuosly process all Renderable entities and render them to the screen. By tagging an entity as Renderable, it instantly becomes seen and processed by the system and if the tag is removed, it will no longer be processed by that system. A trait will also give an entity a state, which is the collection of all data required for that traits system to process it. In the case of a Renderable trait, the state could contain important data such as the representation (image data, polygons, etc) and position on screen.

Time for a quick recap. Our entity based software will consist of the following components:

  • Entity – an identifier which may have zero or more traits attached to it.
  • Trait – a behaviour or characteristic applied to an entity.
  • System – a processing system which updates or otherwise processes the states for a given trait.
  • State – the data asscociated with a trait.

In the next post (or perhaps I’ll edit this one instead), I’ll introduce a simple prototype implementation of these ideas, so that people can play around with it and see what they can do. This should help spark new ideas. Hopefully in the future, I will then implement a proper and efficient version which can be used to build real softweare, though for now a prototype should be good enough.

Posted in Software | Tagged: , , , | Leave a Comment »

The Internet is Broken

Posted by dublindan on June 25, 2009

After a discussion with Diarmuid about the state of computing, its become pretty clear to me that the Internet is broken. Not only that, but applications are broken, software frameworks are broken, operating systems are broken and hardware is broken too.

Sure, they all work, but they work in an I-need-to-sit-beside-my-toaster-with-duct-tape sense of work. A true appliance does not bother the user with so much low level detail, so much implementation detail and so much meaningless management. A real appliance just does its work without bothering the user about how. Computers are far from being an appliance, in this sense.

Mobile phones are a lot better here, but even they could be better still.

Ok, let me start from the beginning. Why is the Internet broken?

Come see The Dark Knight today at 7pm in Cineworld.

Every time I see a message like this, I’m reminded that the Internet is broken.Every time I’m shown old, outdated or plain irrelevant information, I know that the Internet is broken. Its 8pm! Theres no Cineworld near me. The Dark Night hasn’t been screened in a year! Yep, the Internet is definitely broken.

I have nothing against archived data. In fact, I think its a good thing, but the Internet (or rather, the applications and protocols used to implement the Internet) should be smart enough to know when I require archived information or up-to-date information. Alongside this, the Internet should be a lot less manual, I don’t care about web sites and URL’s and all those meaningless details. I just want to accomplish my task, get my information. We’re already heading this way – the average user (who really does not want to be bothered with low level details at all) already seems to use Google to find their websites instead of remembering the URL and typing it into the address bar. Does your girlfriend/grandparents/dog even know what an address bar is? Hell, 92% of people don’t know what an Internet browser is… (My apologies to the tech savvy girlfriends/grandparent/dogs out there)

I’d love to rant about the Internet longer (actually, I wouldn’t, which is why I’m promptly changing the subject), but I feel its time to take a step back and watch how this phenomenon affects the rest of the computing environment. In fact, this should really be evaluated and fixed before the Internet is tackled.

My biggest problem with computers is that the bother me with details which I just don’t care about. What do you mean I need to start an application to edit my presentation? Why should I have to care about that! Applications, programs, services, web sites, filesystems and all those other things which take up such a large portion of my computer time – why am I constantly being bothered about this? Surely the computer is in a better position to manage these boring, monotonous operations for me, transparently, in the background – letting me get on with my work so that I can leave the office sooner and enjoy some sunshine for once…

Lets say I want to calculate some tax returns, generate a pretty table and graph, put some information underneath, in bullet point format, email it to my boss and send a copy to some archive in the office. On a current system, the procedure to accomplish this task might be something like this:

  1. Start the spreadsheet application
  2. Enter the data and formulas
  3. Generate a graph
  4. Export data and graph as an image (Yeah, I’m aware that often I can actually embed the spreadsheet in other tools and this is certainly a step in the right direction, but it suffers severe interoperability issues currently…)
  5. Start the presentation application
  6. Add the text
  7. Import the image
  8. Save/export in some suitable format
  9. Open my email application or web browser to access my emails
  10. Write an email to the boss, attach the presentation, send email
  11. Mount shared drive/start FTP client/whatever I need to do
  12. Send file to archive

Thats 12 steps – without dealing with the issues of closing applications after I;m done with them… I also left out details such as where do I save/export these files to? and load them from again.. This is ridiculous. Thats too much management taking up my time. Obviously these things still need to happen, but they should be as transparent as possible. Maybe something like this:

  1. I browse to my new work item (more commonly known as a file, I am renaming it because I want to emphasise that its a higher level idea) widget (the meaning of browse depends on the user interface, of course)
  2. I begin entering spreadsheet data, as I enter, the system tries to detect what kind of data I’m entering and provides me with an unobtrusive menu, in case it guessed incorrectly
  3. From inside this environment, I should be able to perform all types of editing tasks. An unobtrusive panel or menu should allow me to switch between tasks. The system may start and stop applications behind the scenes to accommodate these tasks.
  4. When I’m done, I simply browse away from the work item. The system stores it for me, without my interaction, unless I tell it not to. There should also be some means of naming, tagging and categorizing work items.
  5. I tell the system, perhaps through some glorified terminal, to “email work_item_1 to boss” and it will try and predict who this is and provide a prompt of relevant choices.
  6. I tell the system, again, using my glorified terminal, or whatever the user interface would be, to “archive work_item_1

Actually, we can do better than that. Because we know that the user was working on that item, we can assume that this is the item they want to interact with, unless told otherwise. So a simple “email to boss” and “archive” (or “send to archive” or whatever) should be enough. Of course, theres no reason all of this cannot be done with some graphical interface for the typing-averse people out there.

The point is that the computer should worry about loading and saving my work for me (ie orthogonal persistence), starting and stopping programs, applications and services and determining what I want using some context sensitive heuristics.

The entire system stack should be built like this: from hardware, to operating system, to inter-application communication, to application framework, to application to internet. Everything should be context sensitive and should be tuned to my needs. Think Google search with its suggestions, only applied to everything. The computer should simply work.

One possible user interface could be something like this

  • A zoomable desktop with a Google-like search bar at the bottom of the screen which doubles as a temrinal command input area.
  • When I finish editing a work item, a screenshot of it is placed on the desktop, where I was working on it. I can drag this screenshot to another place on the desktop, if I should want to.
  • I can pan around the desktop as well as zoom in and out, hence zoomable desktop.
  • When I zoom in on a work item screenshot, the application needed to edit this type of item is automagically started up (transparently, behind the scenes) and I can edit that item.
  • When I zoom out again or pan away, then an updated screenshot is generated and the application is stopped again,
  • I can tag these items with blog-like tags and categories to aid sorting and searching and add different metadata to it so that the system knows how to deal with it.
  • The search box at the bottom of the screen allows me to search for items by name, pattern, tag, metadata, category, general location on the desktop etc.
  • The search box can also be used to enter specific commands, ike you would in a temrinal now.
  • The search box would also allow you to enter context sensitive smart-commands, like “send to bob”, in which case the send command would send the last work item which you edited/viewed to bob. Bob would be looked up in my user profile list of destinations (email contacts, IM contacts, FTP server, whatever) to find where it should be sent to. An unobstrusive popup dialog would prompt me with the matching items, so I can correct it if it guessed incorrectly.

Theres obviously more to it than that, but it should be a good enough start to get everyone thinking about the problem and potenital solutions a bit. If you have any ideas, comments, queries or feedback, please comment! I’d love to hear other peoples ideas on this.

Of course, the real problem can be seen with a simple question, which will be asked when trying to transition users to a new, fixed and working system:

Thats all very well and good, but does it run Microsoft Word?

Which is completely in line with what I have already said.

Posted in Issues, Software, Technology | Tagged: , , | 3 Comments »

Language Concept

Posted by dublindan on June 12, 2009

I was digging through my external drive yesterday and I discovered some notes on a programming language concept. I will briefly talk about it here.

There are a few things I really like in programming languages:

  1. Parametric recursive data types
  2. Automatically checked type constraints
  3. Pattern matching and guards
  4. Automatic event management
  5. A concise and simple syntax (point free, if possible)
  6. Restartable/Resumable exceptions
  7. Type inference

This language concept features, at a minimum, numbers 4, 5 and 7.

In this language, just like in Java, everything is an object, of sorts. Also, just like in Java, you define each “class” inside its own file and the file name must match the class name. In the spirit of being short and concise, however, the language eliminates the redundancy and boilerplate: you do not need to define the class or give it a name, we already know its name is the same as the file name. We also already know that everything inside that file belongs in that class.

This is only a minor, cosmetic change, though. The more interesting features are the automatic event and dependency management.

Lets start with some sample code:

[Filename: ButtonClick.lang]

public
    reset click -> false
    click_count <- local_click_count
private
    local local_click_count = 0
    react click: local_click_count += 1

in some other file:

import io
import .ButtonClick
private
    local clicker = ButtonClick
    display: io.print “Button clicked ”
                      clicker.click_count
                      “ times.\n”
    main: times [clicker.click = true, yield] 3

What this does is creates a number of cells which contain data:

  • click, a cell which will reset to false every time it gets set
     >>> clicker.click = true
     >>> io.print clicker.click
     --- false
  • click_count, a cell which is bound to the value of local_click_count
  • local_click_count, a local variable. This is a standard integer variable, “bound” to click_count
  • react click, a function which reacts to click being modified

When click is set to true, the react click code is run. This will change the value of the local state variable. In turn, this causes the public cell to change.
The other file creates an instance of this class as the object clicker and defines two functions: display, which displays the current click count and main which is run on program start. Because of the automatic event management system, any time clicker.click_count changes, the display function gets rerun. As I stated above, setting click to true, will increment click_count, so if you set clicker.click to true, the value for click_count is incremented, click is reset to false and display is executed.

The main function simply sets clicker.click to true three times, waiting for events to be handled in between.

The concept is simple enough – modules/objects may contain mutable local state, but communications between modules can only be done through automatically managed events. Any code which relies on these events (by referencing a cell or because its a reactor) will be automatically evaluated any time the event occurs (that is, any time the dependents change).

The language could be visually enhanced too:

Visual representation of code

Visual representation of code

At this point, I’d like to point out that this is not the language I have envisioned and is not what this blog aims to be about, but it is a step in that direction. So bear with me, I will slowly transition into dataflow and hopefully design a useful, intuitive and powerful dataflow language in the process. Or at the very least, learn something useful.

Posted in Software | Leave a Comment »

Concurrent or Parallel?

Posted by dublindan on June 10, 2009

I just realised that I made a bit of a mistake with my “Concurrent Programming” post. Concurrent programming and parallel programming are not really the same things, yet in my post, I talked about both under the one term: concurrent programming.

Instead of editing the post, I’ll just clarify here:

  • Concurrent programming is programming in a way that multiple events happen, conceptually, at the same time. That is, we impose no sequence or order on operations because we do not need to or because we want to avoid one long-running action from blocking others. For example, when you save a large file, it may be useful to run the user interface concurrently to the code which writes the file to disk. On mult core or CPU systems, these things could actually run in parallel, but they don’t need to – they could be interleaved, for example.
  • Parallel programming is about executing tasks in parallel as a performance optimisation or for redundancy. The goal of parallel programming is either to perform calculations quicker, by splitting them into sub-calculations which can be run at the same time or to achieve fault-tolerance through redundant calculations happening at the same time.

The difference between the two is that concurrent programming doesn’t actually need to be parallel programming, but must give that illusion because the concurrency serves some other purpose. Parallel programming does this for performance or redundancy and therefore does need to be run in parallel. Obviously, the two may be mixed and mashed together. It is worth noting that there is a difference, however.

In this blog, when I speak of concurrent programming and concurrency, I’m really referring to both of these and I also usually mean that tasks will actually run in parallel, at the same time, though this may not always be a requirement.

Just something to be aware of when reading my insanity ;-)

Posted in Concurrent Programming, Software | Tagged: , , , | Leave a Comment »

Upcoming…

Posted by dublindan on June 10, 2009

Ok, its been a few days since my last post. I really want to post about dataflow, but not until I have a prototype interpreter up and running. I had hoped to have a simple one written over the weekend, but other things got in the way. I’ve been insanely busy with work at the moment too.. (we are in the middle of a deployment in the middle-east, so its a bit hectic here at the moment).

Since I’m not quite ready to write about datalow and I don’t really have the time to write a detailed post about anything else right now, I’ll instead try to outline the topics I hope to cover over the coming weeks in order to give you a taster of what I have planned.

So here goes:

  • Dataflow, concepts, ideas, techniques, advantages
  • Dataflow implementation based on my prototype interpreter, how it can be improved, what I intend on doing with it
  • Mixing dataflow and other paradigms, compiling other code to dataflow
  • Hierarchies, flat vs stacked (credit goes to Cleo Saulnier for this one)
  • Anonymous message passing event system for component based software
  • Ideas for operating systems, application interoperability, user interfaces, etc
  • Some unrelated stuff:
    • Musca tiled window manager
    • Vim scripting
    • Python programming
    • Factor programming
    • Miscellaneous programming topics

I will edit more into this post as I think of it. Also, if anyone has any requests or questions, feel fre to comment on this post and let me know.

Posted in Uncategorized | Leave a Comment »