I'm not big on cloud computing. This is amusing in its own way because I built many services around massive clusters in the late 90s and early 00's. On the desktop I was an early advocate of SMP processors (1991), and felt (rightly, as it turned out) that multiple CPUs there made a big difference in interactivity and robustness.
1) Who the heck needs all their information OUT THERE, rather than IN HERE?
2) The massive efforts at parallelization going on in the clustered database field - stuff like BigTable - discard what we've learned from decades of relational database design.
3) Starting up whole instances of entire multi-user operating systems to run single applications seems like overkill. What was so wrong about multi-user OSes and the schemes we had in place to regulate their cpu usage?
I'm far more interested in what can be done using resources that are on the edge of the network, on devices that are cheap and fast, using code that is designed to be fast. Every home connected to the Internet has a perfectly capable, always on, computer attached to it, called a wireless router. Many people keep a small NAS around, too.
I get excited about new, low cost, low power chips that can be embedded on devices inside the home, or on your person. For the last year I've been hacking on an openrd box - built around a 1.2GHz ARM processor. I have it doing genuinely useful stuff - it's running this blog, DNS, a web server, a chat server, email, and storing about 3TB of data for an infinitesimal cost per joule, and a one time acquisition cost of a little over a hundred bucks.
That little box eats 11 watts (I plan to replace it with a guruplug, next week, which eats 5 watts) - the hard disk, when spun up, eats another 6.
It's WAY faster than the wireless router it replaced. The local DNS server smokes the ones supplied by comcast. The web server serves up content 10x faster, and at 1/6th the latency, of anything on the internet.
Downloads via the old router never cracked 11Mbit (even through the wired interface), now - the internet runs at 24Mbit, and via wireless, inside the house, it's pushing 150Mbits or more at sub 60 ms latencies.
I know it would be hard for "normal" users to use the openrd box (it's running debian) but I sure would like to see efforts being made to make immensely powerful devices like this - under your full control and in your home - easier to use.
But all the development money seems to be being sucked into the cloud. Progress on making things like the guruplug and related "plugs" non-hacker-only devices has been really slow.
For the past 6 months I've averaged 2 emails a month from recruiters from companies doing stuff in the cloud. They all look at my cluster experience and get hot and horny...
EVERY last company doing cloud stuff, at least in America, has people working for it that I've never heard of, and very few of those companies are releasing any code that I can run on my own machine. I was shocked to realize that every single piece of code I've been working with lately originated outside of America, actually. It seems like there is a giant intellectual black hole here - code goes in, and never comes out, except as a service.
For me the fascination of computers came from having one of my own that I could control and make do interesting stuff - unleashed, uncontrolled. If I did something to mess it up, it was just my fault, and my fault alone. Running stuff in the cloud scares me: one accidental infinite loop and I'm facing a massive bill, instead of my laptop or openrd merely heating up a little.
I hear, off in the distance, the VC's and their accountants chortling at the financial prospect of me coding and computing in their cloud...
The last cloud company I interviewed with asked me what I would do with 10,000 servers. I said, "Use them as very big and heavy Christmas ornaments?"
- Underutilized processors
The vast numbers of smart cell phones being sold are in "clouds" of their own, almost entirely cut off from each other, even with CPU's and memory allotments that I would have paid 10s of thousands of dollars for in 2000.
Cell phones are admittedly battery limited, but there is so much MORE that could be done on our cell phones, if only they could be effectively used. It boggles my mind that I can set an iphone down next to an android phone - both capable of communicating at 54Mbits/sec - and only be able to transfer files directly (via bluetooth) at 64KB/sec.
I'd love it if there was a BOINC client for my phone.
I wish I could switch the cloud conversation over to some other items that matter, namely latency, security, energy use, and ease of content creation, in the hope that perhaps more devices and services could exist at the edge of the network and inside the home.
My personal bugaboo is latency. I've ranted on this recently as everything that costs me latency costs me think time. I've gone to great extremes to cut my latency between tasks down by funnelling as many of the web driven applications I HAVE to use - like facebook, and twitter, and chat, into customized emacs-and-process-driven tools.
To get this blog, from the aforementioned openrd box - takes 60ms. From my server in SF - 600ms. If you use RSS to read this - the store and forward architecture of RSS cuts your latency again, to nearly 0. Nearly 0 is good. More than 100ms is bad. Why don't people "get" this?
I am processing over 30 RSS feeds via RSS2Email - scanning nearly 5000 messages per day from various mailing lists - in less than an hour. Via the web, it would take me all day, and I'd get nothing important done. Recently I switched to mostly reading^H^H^H^H^H^H processing RSS via gwene.org - I can scan ALL the interesting stuff in seconds, just read what I want to read by hitting return, catch up on everything by hitting "c", and then stuff "Expires" so I never have to look at it again.
Search takes a half a second via the web. A Xapian search of my entire dual core laptop takes about 50ms.
I really wish we could move the depth of the web conversation about our ever increasing bandwidth to a good discussion about our ever increasing latency, and what we could do to decrease it. I'm VERY happy that there is an application market for intelligent cell phones - the idea of being tied to a server ALL THE TIME for everything my handheld does is nuts. I LOVE internet radio for example, but 3G isn't fast or reliable enough to stream podcasts or mp3s, so I store those up on the phone via some automated utilities for later - offline - playback. I shudder to think of the day that I'd have to pay-per-byte for data on the darn phone.
(not that I'd mind if my cell phone supported X11(NX) based applications, but that's a rant for another day)
Here's a case where the constraints of the cloud make me a little crazy: NetFlix Streaming.
Netflix's video on demand service is great, but the streaming video quality sucks. Netflix HAS A QUEUE of stuff they will automatically send you via mail OR you can stream it. I have about 84 items in my queue.
Given that I have a NAS, AND that queue - piled up - I could be downloading high quality videos at night, when I'm not using the net for anything else.
I know there are weird legal concerns about you having a cache of videos that you don't "own", but paying the latent price for "streaming low quality videos" or abusing the postal system seems really silly to me. Yes, I could use bittorrent to get full quality videos, overnight, instead of using Netflix.
The web's security model is broken. It's just plain broken; it's almost unfixable. Worse, we've been dumbing down all our tools, from desktops to handhelds, to servers, to make them secure enough to run apps in the cloud and AS A RESULT making them less useful to run on our own machines AND sharing our data with people we probably don't want to share data with.
Here's an example: recently I switched from filtering my mail via the crufty old procmail utility and thunderbird's inherent filters, to sieve. Sieve has some nice features - the language is free of regular expressions, unless you want to use them, it's human readable, and compilable - it's much nicer than procmail in most respects.
But - in the name of security - sieve lost one feature of procmail that I used a LOT to make it easier for me to process my email - piping. I can - have, do - pipe email through a bunch of other filters that do interesting stuff, everything from automagically maintaining my bug databases to generating text to speech and chat notifications. I'm almost mad enough about this to start hacking at dovecot to make it "do the right thing", to re-enable myself to do what I've been doing for nearly two decades - processing vast amounts of email into useful content.
The core word here is “process”. I don't “browse” the web. I process it. I process email. I process words and code. Browsing - or grazing - is what herd animals do.
I have a personal email server - D AT DONTSPAMME TAHT.NET. It just gets email to me and email from me, unbelievably fast compared to what it takes to download my email via imap or worse, read it with a web browser.
Anybody remember when having “Your personal webserver” was cool? I STILL run a webserver on my laptop - prototyping this blog, various other services, like rt, as well as a local search engine. What happens when I unplug from the network? Ubuntu AUTOMATICALLY puts my browser in offline mode and I CAN'T get to http://localhost anymore.
My laptop is a PERFECTLY, fully capable web server, far faster (and far less latent) than anything that can exist in the cloud. It comes, setup, out of the box. There are all kinds of cool server applications you can run with a local web server, like accounting or sales related tools. Why turn the network off so completely? Why assume you aren't going to use the web, personally, on your own machine?
My cell phone would be a perfectly capable web server, if only it could register itself somewhere sane with dynamic dns.
Stuff that's inside your firewall and home has CLEAR legal boundaries that cloud stuff does not. I am comforted by the fact that my email comes to my house, just like my regular mail does. Same goes for the stuff I have under development. Physical security is knowing only one person has the key to the front door and the computer room.
- Content creation
There are so many things that respond badly to latency. I don't think anyone seriously thinks art (photoshop) creation, or music production can move into the cloud. Some (non-writers I hope) keep thinking that the act of writing can be moved into the cloud - (and to a distressing extent, that does work, for short, twittery stuff)
A boss once called me an IP generator. He meant it as a compliment. As one of the people generating the IP, I'd dearly like more people to be enabling the stuff directly under my hands. There are many wonderful tools - like Ardour and emacs - that can help you do cool stuff, and escape the cloud. It's a bitch being one of the .004%.
- In conclusion:
I'm not against "the cloud" for when it's appropriate. In fact I have a good use for cloud computing now. My friend Craig does a lot of high-end graphical rendering. The minimum time a typical render takes is about 4 minutes. Big ones take hours. Even if he had to upload an entire 160MB file to a render farm, he'd be able to cut that basic time to about a minute using a 10 machine cluster - and given the ~60 cent cost of 4 minutes of 10 machines' compute time on Amazon EC2, he could get 3 minutes of his life back - at a bill rate of $95/hr - which he could use for other things. He could get a lot more life back for the big renders.
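To put rough numbers on that trade, here is a back-of-the-envelope sketch in Python. It only uses the figures quoted above (a 4 minute render cut to about 1 minute, roughly 60 cents of EC2 time, a bill rate of 95/hr, which I'm assuming is in dollars). The function name is made up for illustration, and it ignores everything else, licensing included.

```python
def render_offload_value(local_min, cluster_min, bill_rate_per_hr, cloud_cost):
    """Return (minutes_saved, dollar value of that time, net benefit in dollars)."""
    minutes_saved = local_min - cluster_min
    # Value the saved minutes at the hourly bill rate.
    value_saved = bill_rate_per_hr * minutes_saved / 60.0
    return minutes_saved, value_saved, value_saved - cloud_cost


# Figures from the paragraph above; the dollar unit on the bill rate is an assumption.
saved, value, net = render_offload_value(
    local_min=4.0, cluster_min=1.0, bill_rate_per_hr=95.0, cloud_cost=0.60
)
print(f"{saved:.0f} min saved, worth ${value:.2f}, net ${net:.2f} per render")
```

On those numbers each small render hands back 3 minutes worth about $4.75 for about 60 cents of compute, so the arithmetic works as long as nothing else gets in the way.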
Unfortunately, the licensing scheme for the maya rendering software renders this idea prohibitive. I've tried a few other renderers, like luxrender, with less satisfying results. Secondly, the time it would take to "spin up" 10 virtual machines for a 1 minute job would probably be well in excess of 1 minute.
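The spin-up problem is easy to see with a toy model: wall-clock time on the cluster is a fixed overhead (booting instances, uploading the file) plus the render time split evenly across machines. This is a sketch under generous assumptions, not a measurement: it assumes the render divides perfectly across machines, and the overhead values and the 120 minute "big render" are hypothetical numbers I picked for illustration.

```python
def cluster_wall_time(render_min, machines, overhead_min):
    """Wall-clock minutes: fixed overhead plus perfectly divided render time."""
    return overhead_min + render_min / machines


def speedup(render_min, machines, overhead_min):
    """How many times faster the cluster is than rendering locally."""
    return render_min / cluster_wall_time(render_min, machines, overhead_min)


MACHINES = 10
for overhead in (0.0, 1.0, 3.0):      # hypothetical spin-up + upload minutes
    for render in (4.0, 120.0):       # the small render, and a hypothetical big one
        total = cluster_wall_time(render, MACHINES, overhead)
        gain = speedup(render, MACHINES, overhead)
        print(f"render={render:5.0f} min  overhead={overhead:.0f} min  "
              f"cluster={total:5.1f} min  speedup={gain:4.1f}x")
```

With a 3 minute overhead the 4 minute render takes 3.4 minutes on the cluster, which is barely worth the trouble, while the 120 minute render drops to 15. Fixed overhead hurts the short jobs and barely matters for the long ones.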
The rendering problem I note above is different from most cloud applications in that what I want to do is fully utilize (rent), briefly, a LOT more processors than I want to own, and then return them to normal use. I hope that there is some service out there offering maya based rendering already, that uses some other method besides pure virtualization to apply to Craig's situation.
Hey buddy, can you spare a few cpu cycles?
I'd like to put Amdahl's Law up in big red letters and have latency get discussed, seriously and knowledgeably, every time someone talks about moving an essential service into the cloud rather than into your own hands. Maybe if there was a cacophony of reporters asking: “how latent is your new service?”, “how does it save me time?”, “will it work when I'm offline or outside the USA?” and “how can I integrate this with my workflow?” at every new cloud based product launch...
Maybe, just maybe, everybody'd get focused again, on enabling people, on their own hardware, to work better, and smarter.
I'm not holding my breath.