Thursday, November 11, 2010 Beating the speed of light on the web by Dave Täht

I started writing this piece this morning to talk about two things - bandwidth - which is pretty well understood - and latency, which is not - in the context of getting better performance out of humanity's synergistic relationship with web based applications.

The problem is the speed of light!

    “For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.” - Richard Feynman

Yesterday, I accidentally introduced a triangular routing situation on my network, which effectively put me on the moon, in time and space, relative to google. I was a good 3+ seconds away from their servers, where normally I'm about 70ms away.
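
As a rough sanity check on the "on the moon" comparison, here is a small sketch of the minimum round trip the speed of light allows over a given distance. The mean Earth-Moon distance of about 384,400 km, the straight-line vacuum path, and the function name are all assumptions brought in for illustration; none of them come from my network.

```python
# Back-of-the-envelope sketch: the lower bound physics puts on round-trip time.
# Assumptions (not from the post): mean Earth-Moon distance of ~384,400 km,
# and a signal travelling in a straight line at the vacuum speed of light.

SPEED_OF_LIGHT_KM_PER_S = 299_792.458
EARTH_MOON_KM = 384_400.0


def min_rtt_seconds(distance_km: float) -> float:
    """Return the smallest possible round-trip time over distance_km."""
    return 2.0 * distance_km / SPEED_OF_LIGHT_KM_PER_S


if __name__ == "__main__":
    print(f"Moon and back: {min_rtt_seconds(EARTH_MOON_KM):.2f} s")
```

A lunar round trip works out to roughly two and a half seconds, so a 3+ second path to google really was in moon territory.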

It made clear the source of the latency problems I'd seen while travelling in Australia and in Nicaragua, where google's servers (in 2008) were over 300ms and 170ms RTT, respectively.

Everybody outside the USA notices the KER... CHUNK of time they lose between a click and the web access it triggers... and even in the USA this sort of latency is a problem.

Programmers try really, really hard to mask latency - web browsers spawn threads that do DNS lookups asynchronously, they make connections to multiple sites simultaneously, and they try to render as much of the page as possible while it is still streaming. For all that, the best most web sites can do is deliver their content in a little over half a second, and most are adding additional layers of redirects and graphical gunk that make matters worse. All they are doing is trying to mask latency that is unavoidable.

It then takes me FAR more than half a second to process all the gunk on a typical web page.

Web based desktop environments have limited utility, despite the accolades they get in the press.

The speed of light is unbeatable. The Net is getting perilously close to the speed of light, and until we come up with a tachyon based networking system, the only way to outsource your desktop is to have the network resources EXTREMELY close to the user, less than 40 ms away, a couple hundred miles at most, and doing that costs. Even 40ms is far too much: your home network and computer typically have latencies in the sub-2ms or even sub-.2ms range, and significantly higher bandwidth than the Net can ever offer.

A trendy method for lowering search times is to do a json search (via javascript) on EVERY character the end user types...

This is where living outside of the USA, or on the moon, becomes a problem. I'm going to pick on google here, but this applies to nearly every darn website out there that strives for better interactivity.

Google's front page now issues an http query after EVERY character you type, spitting out a new page on EVERY freaking character. As you might imagine, with a 3 second RTT as I had yesterday, this didn't work very well. Not only do you have setup and teardown of the tcp connection, but the content is fairly big... and there are DOZENS of DNS lookups that are all now taking place on EVERY character you type.

I looked at the packet trace of what was going on. I was horrified. This is what typing the second of two characters does to my (now repaired) network:

And that's only HALF the packet traffic a single character typed into google generates. I couldn't fit it all on the screen! Talk about SLAMMING the DNS server and the Internet! Don't these guys ever leave their cubies?

Especially outside of the USA...

  • Extra content costs...

  • DNS lookup costs...

  • And TCP's three way handshake costs...

  • And web redirects cost...

Not just bandwidth, but MY TIME. YOUR TIME. EVERYBODY'S TIME.
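
To put rough numbers on those costs, here is a deliberately crude sketch that counts round trips for one fetch. It assumes one round trip per uncached DNS lookup, one for TCP's three way handshake, and one for the request and response, with every redirect repeating the handshake and the request. It also assumes every round trip costs the same RTT and ignores server time, transfer time, caching and parallel connections. The function and parameter names are made up for illustration.

```python
# Crude latency model: count round trips, multiply by the RTT.
# Assumptions (labelled, not measured): 1 RTT per uncached DNS lookup,
# 1 RTT for the TCP handshake, 1 RTT for the HTTP request/response,
# and each redirect repeats the handshake and the request.


def fetch_time_ms(rtt_ms: float, dns_lookups: int = 1, redirects: int = 0) -> float:
    """Estimate the minimum wall-clock time for one fetch, in milliseconds."""
    hops = 1 + redirects                   # the original URL plus each redirect
    round_trips = dns_lookups + 2 * hops   # handshake + request on every hop
    return round_trips * rtt_ms


if __name__ == "__main__":
    # RTTs from this post: home, Nicaragua, Australia, and "the moon".
    for rtt in (70, 170, 300, 3000):
        print(f"RTT {rtt:>4} ms -> at least {fetch_time_ms(rtt):.0f} ms per fetch")
```

With one uncached lookup and no redirects that is three round trips: 210 ms at a 70 ms RTT, and 9 full seconds at a 3 second RTT. Do that on every character typed and the KERCHUNK adds up fast.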

Having an ADHD-causing distraction for every damn character I type is driving me bats. Am I alone in this?

This is why I spend so much time ranting here - and fighting back - Every KERCHUNK of my time lost to all this extraneous BS is time I'll never have back. Those few hundred milliseconds of wait are too long for my focus and too short to form any new thoughts.

I was upset when tv went to commercials every 13 minutes instead of 15.

Now I get a commercial on every character! My screen flashes madly as I type at 120WPM...

The Net is for me, NOT them. I want my web time to be as minimal as possible so I have time for ME, my stuff, etc. I hate sacrificing half a second on every click to everyone else.

I wouldn't mind if there was some other way of keeping score as to your intellectual contributions to the world - even if it was a fictional currency, like Whuffie. Reading and searching should cost you Whuffie. Writing - and making searching better - should earn it.

Solutions

<rant continue="y">

    “Even though the Web is slowly being eroded into the usual consumer-based, mindless dreck that every other form of media is... there are still 65,534 other ports to play on.” -- elf

I collect odd, obscure protocols that solve difficult problems in interesting ways. The world is a lot bigger than just the web's port 80 and port 443, and there's lots of useful stuff you can have on your own machine or network, as elf alludes above.

Some are still doing new interesting things at the lowest levels of the IP stack - take multicast for example: Dennis Bush's multicast uftp meets a real need for distributing data over satellite links. The Babel protocol uses link-local multicast and host routing to make it possible for users to have multiple links to the Internet and choose the best one, reliably.

There are hundreds of other protocols that people use every day, without even noticing they are using them. I'm not going to talk about ARP or DHCP... or stuff layered on top of HTTP, like json, today.

My dictionary server uses a standardized protocol and runs directly on my laptop. I have multiple dictionaries (english, spanish, eng-spa, spa-eng) installed. I LOVE having my own dictionary integrated into chat via Emacs's erc. Being able to spel stuff correctly while in a coding environment is cool too. Web based spell checkers bug me a lot.
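
For the curious, here is a minimal sketch of what talking to a local dictionary server can look like. It ASSUMES the standardized protocol is DICT (RFC 2229) on its usual port 2628, which is what dictd speaks. The host, port, database default and function names are illustrative guesses, not a description of my actual setup.

```python
import socket

# Sketch of a tiny DICT (RFC 2229) client. ASSUMPTION: a DICT server such as
# dictd is listening on localhost:2628; adjust host/port for your own machine.


def parse_definitions(lines):
    """Collect definition bodies from DICT response lines.

    A "151 ..." status line opens a definition and a line holding only "."
    closes it. Everything in between is the definition text.
    """
    definitions, current = [], None
    for line in lines:
        if current is None:
            if line.startswith("151 "):
                current = []
        elif line == ".":
            definitions.append("\n".join(current))
            current = None
        else:
            # The protocol doubles a leading "." in body text; undo that.
            current.append(line[1:] if line.startswith("..") else line)
    return definitions


def define(word, host="localhost", port=2628, database="*"):
    """Send DEFINE and QUIT, read until the server closes, return definitions."""
    request = f'DEFINE {database} "{word}"\r\nQUIT\r\n'.encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    text = b"".join(chunks).decode("utf-8", errors="replace")
    return parse_definitions(text.splitlines())
```

Calling `define("hola")` against a server on the same machine never leaves the loopback interface, which is the whole point: the answer comes back without a single trip OUT THERE.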

I run X11 over the wire, using its native protocol, using one keyboard/mouse for multiple computers. See x2x or Synergy for details. Darn useful tools - and synergy works with macs and windows, too.

Rsync is blazingly fast, using its native protocol - it mirrors this website in 1.2 seconds flat. I am not sure why people keep re-inventing rsync or use anything other than rsync.net for backups.

Git has its own protocol. I've moved all my source code and writing into git. Git lets me work offline in a way that no other source code control system can match, and have GOOD backups.

I use samba (CIFS) for filesharing - for copying large files it outperforms lighttpd on the same openrd box by a factor of about 3 on a gigE network - and, unlike a webserver, lets you copy things both ways and drag and drop - anybody remember drag and drop?

When I can't use samba, I use ssh and sshfs A LOT. I like drag and drop on the desktop. You simply can't do that easily over a browser. You can grab single files but simple stuff like mv * /mnt/mybigdisk (move all the files and directories to this other directory, which happens to be over the network), doesn't happen unless you have a virtual filesystem to do it with, which tends to be slower than ssh and much slower than samba. I remember - :sniff: - when SUN seriously proposed WebNFS, an nfs:// url type. Vestiges of many older protocols exist as url types on browsing tools (smb:// and afp:// for example), too.

I would like common filesystem semantics like user based file locking and permissions to be part of the web's security infrastructure, too.

Not that this sort of stuff matters to most people nowadays. Most people share big files by swapping big usb sticks, for example, or upload stuff to web servers via sftp, or some form of a http post. Filesharing has become nearly synonymous with bittorrent, tainted with the stench of illegitimacy, when it's what we used to use LANs for in the first place!

Databases are all client/server and the database takes care of the locking problems so the loss of the old file system semantics is lost on everybody... except me and the system administrators that have to deal with the problems created on their side of the connection. And most everybody runs the database server locally nowadays and has no idea how to back it up, or optimize it.

I ran a dns benchmark recently. My local DNS server (running bind9 on an openrd box) smoked every other server outside my network, including comcast's, for speed, latency and reliability.
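
This is not the benchmark I ran, just a minimal sketch of the idea: time a batch of lookups and take the median. The names `time_lookup_ms`, `benchmark` and `system_resolver` are hypothetical. Note that `system_resolver` only exercises whatever resolver the operating system is configured to use, and any OS-level caching will flatter the numbers. Comparing specific servers against each other would need a DNS library that can target a given server.

```python
import socket
import statistics
import time

# Sketch of a lookup timer. ASSUMPTION: the median over a few repeats is a
# good enough summary for a rough comparison.


def time_lookup_ms(resolve, name):
    """Time a single call to resolve(name), in milliseconds."""
    start = time.perf_counter()
    resolve(name)
    return (time.perf_counter() - start) * 1000.0


def benchmark(resolve, names, repeats=3):
    """Return the median lookup time in ms over repeats passes through names."""
    samples = [time_lookup_ms(resolve, name)
               for _ in range(repeats)
               for name in names]
    return statistics.median(samples)


def system_resolver(name):
    """Resolve through the OS-configured resolver (cache effects included)."""
    return socket.getaddrinfo(name, None)
```

Something like `benchmark(system_resolver, ["example.com", "example.org"])` gives one number per resolver configuration to compare.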

Now... I wouldn't have a problem with using the web for everything, if these other protocols weren't so efficient and useful on their own. Not just faster - more importantly - they are very low latency, as most of them run on your own machine or inside your network, and give you a useful result in much less than .1 second.

Back when I was still doing GUI design, anything over .1 seconds was viewed as too much latency, and over 3 seconds, anathema. Wikipedia tells me that the web moved the outside limit to 8 seconds. In the USA, wikipedia delivers a typical page in about 600ms. Not bad, but still about 6x more than I can stand, and far longer than what is theoretically required. Other sites are worse, and far worse the farther you get away from the cloud.

Too many forces are trying to put everything useful OUT THERE, on the web, rather than IN HERE, on my own machine. I am grateful for my android which - while it uses the Internet heavily - at LEAST caches my email so I can read it when I'm away from a good signal. Why can't I do that with my voicemail?

Yes, there are compelling advantages to having everything OUT there, on the web, for information oriented businesses. You don't ever have to release your application's code to the public, and go through a release cycle, test multiple platforms, nor support older versions in the field - these are huge cost savings for application developers.

On my really bad days, I think the web, as it has evolved to have ever more monolithic and centralized services like Amazon, Bing, google, salesforce.com, etc., is also partially the GPL's fault. One of its loopholes (closed by the 2007 release of the AGPLv3, which few projects use yet) is that you CAN keep GPL based code on your own servers and never release it.

It's easy to leverage and enhance GPL'd code if you run a service, rather than distribute an application. But I've come to believe that the combination of all these outsourced services is not good for people.

Gradually, the money has moved nearly every service formerly done inside your firewall, onto the web. Mucho marketing dollars are expended making these applications shiny, sexy and desirable.

The browser has grown until it is often the only application a person runs. It takes over a full screen, and then you layer dozens of little apps and widgets over it until you have a tiny workspace full of tabs for your stuff, your thoughts, and your life. The screen-space consumption problem with a typical web page is so terrible (and, with 1600x1200 screens becoming more common, leaves so much spare space simply wasted) that VCs are now producing specialized WHOLE browsers, like Rockmelt, targeted at the facebookers and twitterers... perfect consumers who apparently do nothing but chat all day, read ads, and buy stuff.

People with purpose driven lives run something like Outlook, or ACT instead. In my case I do as much as I possibly can, within emacs, using org-mode for gtd and time management.

The browser based desktop is not the right answer. There are huge disadvantages to having everything out on the Web - not just the privacy issues, but the human interface issues, that cannot be solved unless the data is moved closer to the user. Moving everything OUT THERE slows down the thought processes of the internet mind.

More protocols and tools need to migrate stuff to IN HERE.

Worse, moving everything OUT THERE can't beat the speed of light.

</rant>

In ranting so far today, I've tried to identify and explain the latency and bandwidth problem on today's web in various scopes. I have been carrying around solutions to some of them for a long time, in addition to using specialized, local protocols, like my own dictionary server, using a jabber chat client for facebook and gmail... keeping the web browser a tiny part of my desktop, and all the stuff I mentioned in the early part of this blog entry.

My laptop runs its own bind9 DNS server, usually. It caches a huge portion of DNS, and I USED to configure it to take the dhcp provided DNS server as a forwarder, so as to lighten the load on the main DNS servers, until people (like comcast!) started breaking DNS by redirecting you to a web page chock full of ads on a cache miss. Bind9 is a LOT smaller than most virus scanners and more useful too.

I also - until google started hammering it - ran a web proxy server (Squid) for everything. In Nicaragua I used that for additional web filtering and for specialized services that enforced USA copyright restrictions - a US citizen, overseas, should be able to access US content - shouldn't he?

Running a web proxy server used to speed things up in the general case - I turned it off yesterday because google was hammering it so hard, and noticed that ubuntu's ever-so-user-friendly implementation of firefox was preferring ipv4 urls over ipv6, for some reason. Sigh. But that's another rant... middling sized corporations and educational institutions still use proxy servers, don't they?

I use a custom little command line search client that does searches via json, that gives me JUST results, on keywords, inside of Emacs. I'll have to get around to releasing that one day.

But all that - and other stuff - are just workarounds for trying to fool nature a little too much, using protocols that are inadequate and using tools that are OUT THERE, rather than IN here.

There's an old joke about the engineer facing the guillotine:

    On a beautiful Sunday afternoon in the midst of the French Revolution the revolting citizens led a priest, a drunkard and an engineer to the guillotine. They ask the priest if he wants to face up or down when he meets his fate. The priest says he would like to face up so he will be looking towards heaven when he dies. They raise the blade of the guillotine and release it. It comes speeding down and suddenly stops just inches from his neck. The authorities take this as divine intervention and release the priest.

    The drunkard comes to the guillotine next. He also decides to die face up, hoping that he will be as fortunate as the priest. They raise the blade of the guillotine and release it. It comes speeding down and suddenly stops just inches from his neck. Again, the authorities take this as a sign of divine intervention, and they release the drunkard as well.

    Next is the engineer. He, too, decides to die facing up. As they slowly raise the blade of the guillotine, the engineer suddenly says, "Hey, I see what your problem is ..."

I can't help but think that DNS is getting overloaded, and TCP is overstressed for the kind of workloads we are giving it today. TCP's three-way handshake and tear-down costs over a hundred ms in the USA, time that could be used for something else...

Now that I'm done ranting I'll hopefully get around to discussing some alternate protocols, after I get the code finished, and more stuff, IN HERE.

More in a week or two. I'll be calmer, too. Probably.

You know you can turn that feature off, right? Using https://www.google.com/webhp?complete=0 turns off autocomplete, and there's a little option beside the search box to turn off instant lookup.
Comment by mr-cellaneous [livejournal.com] Thursday, November 11, 2010

There's a setting in Firefox to enable IPv6 sites (about:config, search for ipv6, it's pretty obvious) and I'm finding that I'm having to toggle that feature. Locally, all my sites run over IPv6 (site local addresses) but I found the occasional site (oddly enough, a few web comics!) that also have an IPv6 address, but sadly, I don't have an IPv6 tunnel so I have to wait for such sites (like yours) to timeout with the IPv6 address before it tries the IPv4 address. Sigh.

But yes, I'm running IPv6 locally for some stuff (even my custom syslogd supports IPv6 and multicast http://www.conman.org/software/syslogintr/ ). I still run my own server, not wanting to give my data to large third parties that might have other interests. I'll be curious to see what you have to replace TCP with.

Comment by boston [conman.org] Friday, November 12, 2010

Welcome to my really retro corner of the universe! (I'm really puzzled now as to why myopenid works and google/yahoo don't... now I gotta figure out whitelists. Or find a commenting system I like more than this.)

@mr-cellaneous: How many people turn it off? How many people just complain about the speed? I'd like to have "off" be the default if you are more than 70ms away from google. And, as I noted, it's not just google that's doing it.

(And I note that I didn't know how to turn it off until you mentioned it, I was too overwhelmed by all the other stuff on the page)

@boston conman: I've had that firefox setting on for ages (sorry for not mentioning it in my rant). At least with my ubuntu 9.10, it seems to be ignoring it: given a choice between an ipv4 url and an ipv6 url, it goes for the ipv4 url every time, according to my server logs for nex-6, which has 2 A and 2 AAAA records. Sites like: http://ipv6-speedtest.net/ report I'm connecting with an ipv4 address. If it's just a v6 url (like www.v6.facebook.com) it works fine. Chromium is doing the same thing - so maybe it's something in my resolver? Or maybe my mtu... I have a sniffer going, maybe enlightenment will strike....

I'm actually doing a build of firefox from scratch right now (don't ask me why!)

Comment by dtaht [myopenid.com] Friday, November 12, 2010
UFTP has since moved to http://uftp-multicast.sourceforge.net
Comment by Dennis Tuesday, February 11, 2014