tl;dr

Building a multi-dimensional spatially configured feed forward nodal compute network.

  •  

    December 2009
    M T W T F S S
    « Aug    
     123456
    78910111213
    14151617181920
    21222324252627
    28293031  
  • Archives

  • Subscribe

  • Other Blogs

  • RSS Latest Posts on Vorlaths Blog

    • Assembly: Interlaced Zig Zag x264 (Updated Dec 8,2009)
      Via reddit, we find this article about implementing zig zag in assembly. See zig zag figure in section 1.6.6 of this article for more info. The author implemented the interlaced version of zigzag (16bit word elements) as it was not already available. This version is for more recent processors as can be seen by this quote.Furthermore, as should be clear by th […]
    • Intel 48 Core Processor (Update: Larrabee canceled)
      Link to article.I like what they did with the routing. I wonder how it works at the low level. They have 24 dual core IA32 processing units arranged in tiles. There is a 2D grid routing system. From what I gather, there is a buffer for incoming data (and outgoing?) and each L2 cache is independent. There is a central (up to) 64GB of RAM.I like the buffering […]
    • Climategate
      If you only watch CNN, MSNBC, CBS, ABC, BBC or CBC for your news, you probably didn't hear about one of the biggest scientific, economic and global scandals ever. I'm a big fan of conspiracy theories for their entertainment values, but this has real consequences. Still, it does have a lot of comedic value in some parts. Specifically, the coding par […]
    • Generic List
      I don't know if this is possible, but it'd be cool if it was. In Project V, I use a generic list with the objective that no operation takes more than O(log n). However, it has one drawback which causes it to be just short of that objective. Before going into that, a description of the list is required.I will simplify the scenario since Project V ha […]
  • RSS Latest LtU Posts

    • An Innocent Model of Linear Logic
    • Back to the Future: Lisp as a Base for a Statistical Computing System
      Back to the Future: Lisp as a Base for a Statistical Computing System by Ross Ihaka and Duncan Temple Lang, and the accompanying slides. This paper was previously discussed on comp.lang.lisp, but apparently not covered on LtU before. The application of cutting-edge statistical methodology is limited by the capabilities of the systems in which it is implement […]
    • Why API Design Matters
      Michi Henning, Why API Design Matters, Communications of the ACM, May 2009. After more than 25 years as a software engineer, I still find myself underestimating the time it takes to complete a particular programming task. Sometimes, the resulting schedule slip is caused by my own shortcomings: as I dig into a problem, I simply discover it is a lot more diffi […]
    • ActorScript(TM): Industrial strength integration of local and nonlocal concurrency for Client-cloud Computing
      ActorScript(TM): Industrial strength integration of local and nonlocal concurrency for Client-cloud Computing by Carl Hewitt, 2009. ActorScript is based on a mathematical model of computation that treats “Actors” as the universal primitives of concurrent digital computation [Hewitt, Bishop, and Steiger 1973; Hewitt 1977]. Actors been used both as a framework […]
  • RSS Latest ACM Queue Posts

    • A Conversation with Arthur Whitney
      Can code ever be too terse? The designer of the K and Q languages discusses this question and many more with Queue editorial board member Bryan Cantrill.
    • Purpose-Built Languages
      While often breaking the rules of traditional language design, the growing ecosystem of purpose-built "little" langages is an essential part of systems development.
    • Realtime Garbage Collection
      Traditional computer science deals with the computation of correct results. Realtime systems interact with the physical world, so they have a second correctness criterion: they have to compute the correct result within a bounded amount of time. Simply building functionally correct software is hard enough. When timing is added to the requirements, the cost an […]
    • How Not to Write Fortran in Any Language
      There are characteristics of good coding that transcend all programming languages. Programmers have debated the merits of different programming languages since the dawn of programming. Every coder has a favorite general-purpose programming language, and many have an unfavorite language, too. If the coder is old enough, often that unfavorite language is Fortr […]

Archive for the ‘Concurrent Programming’ Category

Building a flat hierarchial component system in Clojure

Posted by dublindan on December 1, 2009

I’ve been playing around with Clojure a lot lately and, just for fun, I implemented a simple system for constructing a component based software architecture where each component runs completely independently and asynchronously. Components communicate by sending events to each other, achieving a flat software hierarchy.

 

; Define a function to send events
(defn event [k v] nil)

; Define the data structure used to model a component
(defstruct component-s :name :handlers :state)

; Create a component by creating an instance of the data structure
(defn make-component [name handlers]

(struct component-s name handlers {}))

; Run a collection of components, start by sending an :init event to each component
(defn run-components [& comps]
   (loop [components comps
          events     (atom [])
          key        :init
          value      nil]
      (if (= key nil)
         ; Component processing has completed - no new events have been generated
         'Done
         ; Components have not completed processing - events are still left to be processed
         (let [updated ; Updated list of components
            ; Bind the event function to a closure which can update the vector of events
            (binding [event (fn [k v] (swap! events concat (vector (list k v))))]
               ; Map the update function accross each component in parallel
               (doall (pmap #(conj % (if ((:handlers %) key)
                                                      {:state (conj (:state %)
                                                                    (((:handlers %) key) value (:state %)))}
                                                      {}))
                                    components)))]
            ; Recursively process components
            (recur updated                        ; Updated list of components
                   (atom (rest @events))          ; Remaining events
                   (ffirst @events)               ; Next event type
                   (second (first @events)))))))  ; Next event message

We can test this by creating and running some test components:

(run-components
   (make-component "Comp1"
      {:init (fn [v s]
            (event :set-msg "Ho")
            (event :print nil)
            {:msg "Hi"})
       :print (fn [v s]
          (println (:msg s)))
       :set-msg (fn [v s]
          {:msg v})})
   (make-component "Comp2"
      {:init (fn [v s]
         {:msg "Hello"})
       :print (fn [v s]
         (println (:msg s)))}))

Comp1 contains three event handlers – :init, :print and :set-msg. Comp2 only contains two event handlers – :init and :print.
:init is called to setup the components state and trigger any initial events. :print will print the contents of the states :msg slot to the terminal and :set-msg will change the :msg slot to the value of the events message.

Running this sample will output the following:

Ho
Hello
Done

It becomes more interesting if the components do a little more, for example:

(run-components
   (make-component "Comp1"
      {:init (fn [v s]
            (event :print nil) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            (event :set-msg "Ho")
            (event :print nil)
            {:msg "Hi"})
       :print (fn [v s]
          (println (:msg s)))
       :set-msg (fn [v s]
          {:msg v})})
   (make-component "Comp2"
      {:init (fn [v s]
         {:msg "Hello"})
       :print (fn [v s]
         (println (:msg s)))}))

Running this will output:

Hi
Hello
Ho
Hello
Done

Note how the first event gets processed before the second one – ie, Comp1’s :msg has not yet been modified.

Of course, you can set up all sorts of crazy chains of events with many event types, complex logic and so on.

What would be really nice is a DSL designed for writing these component-based programs, since calling run-components and make-components manually is a little messy. Something like the following would be nice:

(program
    (component "Comp1"
        (on :init [v s]
               (event :print nil)
               (event :set-msg "Ho")
               (event :print nil)
               {:msg "Hi"})

        (on :print [v s]
            (println (:msg s)))

        (on :set-msg [v s]
            {:msg v}))

    (component "Comp2"
        (on :init [v s]
            {:msg "Hello"})

        (on :print [v s]
            (println (:msg s))))))

Luckily, this is the kind of thing Lisp is good at and Clojure, being a Lisp dialect, has everything we need to make it happen:

(defmacro program [& components]
    `(run-components ~@components))

(defmacro component [name & handlers]
    `(make-component ~name (conj {} ~@handlers)))

(defmacro on [handler args & body]
    {handler `(fn ~args ~@body)})

These macros transform our convenient little DSL into a set of calls to run-components and make-components. Success! We can now write fun little component-based programs, arranged in a flat hierarchy, using an anonymous (sender never knows who receives the events; receiver never knows where the events come from) event-based messaging system.

Posted in Concurrent Programming, Software | Tagged: , , | Leave a Comment »

What is dataflow?

Posted by dublindan on July 20, 2009

I was reading through old blog posts and came upon this gem in a 3 year old post on Cleo Saulniers blog, which I think is a near perfect description of what dataflow actually is. I know some of you were itching at a real discussion about what dataflow is and how it helps us. Well, here you go:

Many still can’t distinguish the difference between control statements and the traces on a circuit board. These are two and completely unrelated concepts.

  1. One is used to control what next component or instruction gets executed.
  2. The other determines what next component (or instruction) the data goes to.

Read the last two sentences again. Do you see the difference?

This is, in my opinion, the clearest, simplest contrast between dataflow and conventional programming paradigms that I have found. However, just in case this didn’t sink in, I’ll attempt to clarify:

  • Dataflow: material flows from person to person on an assembly line, where each person always performs the same actions
  • Not dataflow: material is passed to a person, who performs a number of different operations on it before passing it to the next.

It is interesting to look at other systems in nature, because software is not, after all, unique. Software builds on hardware, which builds on physical properties of the world. The theories are built upon abstract mathematical models, on chemestry, on biology, on architecture, on art.. Computer scientists draw inspiration for new concepts, theories, ideas and designs from a vast array of other fields. Software is not unique at all. So, lets look at nature, at biological organisms.

Biology is interesting to us, because it works in exactly the same way as computer hardware, but instead of electrical processes, they rely on biological and chemical processes. They have the interesting property that they are, essentially, dataflow systems (just like computer hardware, under the hood). They are spatially sequenced (instead of temporally), just like dataflow systems (and spreadsheets, pixel shaders etc) and they are structured in flat hierarchies. Do you see the buzzwords? Flat hierarchies, spatial software, dataflow. Exactly what I’ve been ranting about here, on this blog, all along.

Lets look at a concrete example:

Cell based organisms. A cell is, essentially, a biological compute node. They are fed data, through the exchange of proteins, hormones and electrical and chemical signals, do something with this data and feed it on. Input, Process, Output. They are spatially configured, in multiple dimensions, in a flat hierarchy. That is, if you take out a cell, the entire system does not collapse. Sure, it is now missing a compute node and it won’t be able to work quite the same as before, but the system is still intact. The cells are not temporally sequential structures; they are spatial – their sequence depends on their spatial position in relation to each other.

Neural networks (or, to avoid confusion, networks of neurons) are also spatially configured compute nodes, leveraging a flat hierarchical architecture, composed of a dataflow network. The links are the synapses, which feed data through a dataflow network of neurons. The neurons are positioned spatially and the synapses depend upon their spatial position in relation to one another. The overall hierarchy is a flat one – if you remove a subsection of the network, the rest of the network does not collapse. Most biological systems appear to have these properties.

So back to dataflow programming. Our dataflow system will also feed data from node to node, in a spatially configured flat hierarchy. They are no different from biological systems. Can we infer anything from this?

Of course we can. Biological systems have other interesting properties, which we would love to achieve in software. Since our dataflow system exhibits the other properties of biological systems, logically we could infer that it may also exhibit the other desirable properties. At the risk of falling into the correlation = causation trap (that is, just because our system has similar properties to biological systems, does not mean ours will have all properties present in biological systems), we can see the potential for other desirable properties: fault tolerance, tolerance of incomplete data, redundancy, replaceable parts (that is, hot-swappable components – thanks to the flat hierarchy).

I hope you can see the huge potential a dataflow system provides.

I guess you can describe the programming system, which I am trying to describe here in this blog, as a multi-dimensional spatially configured feed forward nodal compute network. Now that’s a buzzword loaded sentence!

Posted in Concurrent Programming, Dataflow, Software | Tagged: , , | 5 Comments »

Concurrent or Parallel?

Posted by dublindan on June 10, 2009

I just realised that I made a bit of a mistake with my “Concurrent Programming” post. Concurrent programming and parallel programming are not really the same things, yet in my post, I talked about both under the one term: concurrent programming.

Instead of editing the post, I’ll just clarify here:

  • Concurrent programming is programming in a way that multiple events happen, conceptually, at the same time. That is, we impose no sequence or order on operations because we do not need to or because we want to avoid one long-running action from blocking others. For example, when you save a large file, it may be useful to run the user interface concurrently to the code which writes the file to disk. On mult core or CPU systems, these things could actually run in parallel, but they don’t need to – they could be interleaved, for example.
  • Parallel programming is about executing tasks in parallel as a performance optimisation or for redundancy. The goal of parallel programming is either to perform calculations quicker, by splitting them into sub-calculations which can be run at the same time or to achieve fault-tolerance through redundant calculations happening at the same time.

The difference between the two is that concurrent programming doesn’t actually need to be parallel programming, but must give that illusion because the concurrency serves some other purpose. Parallel programming does this for performance or redundancy and therefore does need to be run in parallel. Obviously, the two may be mixed and mashed together. It is worth noting that there is a difference, however.

In this blog, when I speak of concurrent programming and concurrency, I’m really referring to both of these and I also usually mean that tasks will actually run in parallel, at the same time, though this may not always be a requirement.

Just something to be aware of when reading my insanity ;-)

Posted in Concurrent Programming, Software | Tagged: , , , | Leave a Comment »

Concurrent Programming

Posted by dublindan on June 4, 2009

Since the advent of inexpensive multicore processors, there has been a lot of buzz about how concurrent programming is hard. So hard, in fact, that we are in a software crisis because Moores Law is coming to an end, being replaced by systems with many parallel cores. This means that, in order to take advantage of these systems, concurrent programming is a must. Yet concurrent programming is extremely difficult.

Let me ask you, is concurrent programming actually hard? Does it need to be?

My answers are yes, concurrent programming IS, indeed, hard and no, it DOESN’T need to be. Let me explain.

In my own professional experience, concurrent programming has proven itself to be hard.

I work on core network services for the telecommunications industry, with clients in the middle-east and southeast asia, amongst other areas. We develop value added services for SMS networks for mobile operators – SMS anti spam, blacklisting, throttling and so on. On a medium sized network, we must be able to cope with 2000+ messages per second – this includes a large number of rules and processes which must be applied to each message.

When our current product was still in its infancy, the architecture was very much different than it is now. We wrapped each rule and process inside a “task” and we had a “Sequential Task Processor” which processed these tasks individually, but sequentially. One day it was decided that its time to “optimise” this system by “parallelising it”. Time estimate? About 2 hours – the tasks are there, they “only” needed to be made execute concurrently! To cut a long and painful story short, about two weeks later, we were still ironing out bugs and issues and wondering why it was slow and why we had so many intermittent problems. Not to mention how difficult this made tracing through logs – the entries could now be in any order! Eventually, months later, the entire task code was thrown out and replaced with a much much simpler system where messages are processed concurrently, but the rules were all sequential. The code was suddenly easy to understand, fast and stable. Logs were now predictable too!

Another example, again from work, was that we run our system in a clustered environment. Both to increase overall throughput and to provide redundancy. To achieve this, we had a number of data structures (hash tables, trees, etc) which were globally accessible between all nodes. These “shared caches” were used to store all sorts of interesting information, including messages themselves (so that if the node died, another one can process the message instead). Conceptually, the system was brilliant! Fault tolerant, scalable, highly distributed… except not. All of these data structures needed to be manually locked. Changes needed to be pushed out to other nodes. It became our biggest source of problems and bugs. Again, eventually, after a lot of pain, we scrapped the entire concept and redesigned the system so that each node is as independent as possible. Again, doing this made the code easier to understand, faster and a lot more stable!

What did I learn from this? Simple – concurrency is extremely hard to get right! Well, I knew this already, since I read a lot of programming blogs, but this was the first time it really sunk in.

I’m sure that anyone who has ever worked on a large concurrent system has experienced similar pain. Anyone who still feels that traditional (monitor based shared state) concurrency is not hard probably hasn’t written a terribly large or complex concurrent system.

The question, however, is: does it have to be hard?

I don’t think it does. There are some techniques which can ease the pain of concurrency. The first is Transactional Memory. A good introduction can be found over on the Intel website. Transactional Memory aims to remove explicit locking by handling all synchronised accesses as a transaction and if theres a clash, the transaction can be rolled back and run again on clean data. This certainly simplifies concurrent programming, but it solves the symptom, not the cause by hiding the complexity. Brushing it under the rug, so to speak. One still needs to deal with the same issues.

Another method, which, IMHO, greatly improves concurrent programming, is a shared-nothing message passing system where messages are immutable. This works really well, because the logic can be completely lockless! If the data cannot change, once created, then theres no danger of concurrent writes. Brilliant! Concurrency is not so hard now, is it? Highly concurrent languages like Erlang use this model and they do it pretty well.

But no, shared-nothing message passing is far from a “Silver Bullet” of concurrency. Firstly, not all problems are suited to a shared-nothing model. Secondly, and more importantly, the programmer must still manually handle the concurrency. Must still manually decide what should be concurrent and what should be sequential. The programmer must still decide when to process messages or when to post messages. In fact, this is the real problem with concurrent programming – its too manual. Locking causes problems, but its not the root problem – as I said, locking can be elliminated using Transactional Memory or message passing.

So, is concurrency still hard? Yes, it most certainly is. How can this be solved? The answer to that is by eliminating the programmer from the decisions of what should be concurrent, when and how. The compiler should determine this and concurrency should be automatic and transparent.

Easier said than done, of course and this is the holy grail of concurrent programming research! Just take a look at recent ACM SIGPLAN publications. So far, there has been little real success…

But I said that I believe concurrency doesn’t need to be hard! I still believe this, even after all that. In fact, I actually believe that concurrency is a solved problem! Not only that, but that it was solved a long long time ago.

Outrageous claim? Maybe, but let me explain.

In the book, Concepts, Techniques and Models of Computer Programming , Peter Van Roy writes a few interesting words on the topic. In the chapter which deals with internal state (for example, in object oriented programming, objects contain internal mutable state) he writes that in the next chapter he will reintroduce concurrency to produce a stateful concurrent model not unlike what we are used to in mainstream languages. He then states that this is an extremely dificult model to program in. He agrees that concurrent programming is hard!

In the introduction to the chapter on declarative concurrency, however, he asserts that “concurrent programming need not be hard”. Wow! Where did that come from!? Reading on (or understanding the chapter title: “declarative concurrency”) provides the answer: dataflow.

Dataflow is an old, yet often forgotten or ignored, programming paradigm and I believe it holds the answers, where concurrent programming is concerned. Why? Because it can handle concurrency automatically, implicitly and transparently. Programmer not required! In effect, dataflow programming is the holy grail of concurrent programming. I won’t explain here why – dataflow deserves its own dedicated blog post – but I will assure you that dataflow is real and exists RIGHT NOW.

In fact, in certain circles, its even quite popular:

  • LabVIEW, commonly used for data acquisition, instrument control, and industrial automation by industry and by the scientific community.
  • SCADE & Lustre, this is used in the defence and aerospace industries and is also used to run big industry operations like nuclear power plants.
  • VHDL & Verilog, hardware design languages used to program FPGA’s and ASIC’s.
  • I almost forgot the most well known dataflow systems of all!! One everybody is familiar with: spreadsheets.

LabVIEW and SCADE are graphical dataflow systems. VHDL and Verilog are textual dataflow languages. All four are used fairly heavily and have existed for quite some time. There are many more. I find it interesting that dataflow was the chosen paradigm for critical applications such as defence, aerospace, hardware design and power plants! If its good enough for them, its surely good enough for us mere mortals.

This proves that not only is dataflow real and available RIGHT NOW, but it also implies that there is, indeed, value in using dataflow, or these languages wouldn’t be popular in their domains.

The next step, in my opinion, is a general purpose dataflow language which is tailored for mainstream programmers and caters for general programming tasks. This langauge must automatically scale across multiple cores (or even multiple computers), be easy to learn and use and be a productive language to develop software in.

I will talk about this in a lot more detail in future posts.

Posted in Concurrent Programming | Leave a Comment »