It’s always nice to get an eBay perspective on things – especially when you totally agree with what is being said. For a long time I have felt that if we really want to embrace the power of large-scale distributed systems we have to accept that we just do not know what is going on – life is non-deterministic, messy, chaotic and generally random. Most technically minded people are used to thinking in a single linear fashion: first the processor does this, then it does that. Nice – but not the way the world works. Things actually happen at the same time, just not necessarily on your processor or in your address space. For many people that is a problem, something that should somehow be hidden away or, at the very least, forced into a 2PC harness. That might work in a small system, but not when you reach internet-scale systems. Forget the linear world and start thinking about multiple agents doing multiple things at the same time.
This is a thought process that is increasingly going to dominate the way we build systems.
We had the inaugural Next Net meeting last night, held at the Betfair.com offices in Hammersmith. It was a well-attended event, with about 40 people there. Dan Creswell gave an excellent talk on the work he has been doing with Amazon’s EC2 service (slides are here).
The aim of the group is to bring together people who are interested in creating next-generation distributed systems – focusing on how you build, manage and develop large-scale, high-performance, highly resilient, self-healing distributed systems out in the real world, not in buzzword land.
We are looking at holding the next meeting at Brunel University on January 18th. If you are interested in coming along, or have an idea you want to talk about – then please drop me a mail.
One of the things I want to spend some time thinking about in this new blog is how we might build massive-scale (as opposed to merely large-scale) distributed systems, and what we can learn about building these applications from other areas of knowledge such as biological, organic and social systems.
Last year Werner Vogels gave a talk (related pdf) about how you might build VERY large systems (million+ nodes) and why these systems just will not scale with our current deterministic way of building systems.
Systems of this size are highly fluid in nature. Individual nodes within the system will come and go almost constantly as hardware, software or user failure/action/error causes localised problems. Even with highly reliable hardware and a mean time between failures measured in hundreds of thousands (or even millions) of hours, with millions of nodes in a system the law of averages means you will get failures hourly. Add in coding errors, support screw-ups and end-user errors and it very quickly ends up looking like a digital massacre. Somehow the application sitting on top of this quantum flux of failure must be able to deal with all the chaos and provide a stable and coherent user experience.
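The back-of-envelope arithmetic behind that claim is easy to check. A sketch (the figures below are illustrative, not from any real deployment):

```python
# Rough aggregate failure rate for a large fleet of machines.
# Assumptions: failures are independent and the per-node MTBF
# (mean time between failures) is a very optimistic one million hours.
nodes = 1_000_000
mtbf_hours = 1_000_000

# Each node fails on average 1/mtbf_hours times per hour, so the
# fleet as a whole sees roughly nodes/mtbf_hours failures per hour.
failures_per_hour = nodes / mtbf_hours
print(failures_per_hour)  # 1.0 -> about one node failing every hour
```

Even before counting software bugs or operator error, hardware alone guarantees a steady drumbeat of failure at this scale.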
With all this failure it raises the question: “Is it even possible to build systems of this size and complexity?”
Trying to deal with this flux in a deterministic, synchronous, Turing-style organised manner is clearly a non-starter. The management overhead would be horrendous, and the overall system would be highly brittle and subject to the most extreme strain.
If we are unable to muscle the system into the desired shape, we need to think about different approaches. The application needs to accept the flux as a fact of life and embrace it. As Bloglines found out this afternoon, even systems of the scale most of us build could take some of this thinking on board.
As Werner points out in his talk, there are many highly complex biological and organic systems that are capable of scaling to massive degrees with virtually no centralised control mechanism in place. These systems are probabilistic and self-organising in nature.
The classic example is the ant colony or beehive, where each individual ant or drone goes about its own business and yet adds to the collective good. Cells within the human body are capable of acting in a highly consistent and coherent manner, displaying highly complex behaviour (you try programming a system to fight viruses, or even breathe!) despite minimal directed management. Even humans are capable of forming complex, self-organising systems with minimal direct interaction – think of markets and other forms of large-scale crowd behaviour.
How do these systems come about? How do they manage to create such stable environments? How do they fail and what are their weak points?