A Server in Every Home

This is it. This is the idea that's been bouncing around in my head for years. It brings together my longtime interests in web development, networks, distributed systems, and self-hosted software. If, years from now, I become "known" for this idea (not as its inventor, since I am not, but as its chief promoter), then career-wise, I can die a happy man. If I could work on anything, this would probably be the project that I work on: A Server in Every Home.

What is "A Server in Every Home"? It's not a hardware or software product. I will not try to sell you anything by the end of this article (except, of course, the idea itself!). I do think specific hardware and software are involved in the solution, and they may end up as one or more products, but I'll get to that. "A Server in Every Home" is my (temporary) name for a philosophy - one that I believe our society will have to adopt in the years to come.

It relates - among other things - to social media and communications.

As we all know, right now, almost all social media and most of our internet communication are mediated by a few large companies, like Facebook and Google. People want to connect with other people and share things, and these companies perform this service for us. They do it free of charge - meaning you don't pay money - but there are other costs to you, and I believe these costs are too high. I don't want to take too long discussing them - each one can be and is the subject of numerous whole articles - so I'll summarize only a few.

What's Wrong with What We Have Now?

One big one is content ownership. [1] Social media sites, in their terms of service, give themselves sweeping usage rights to anything you post on their sites. This is mainly to protect their legal behinds in our ridiculously litigious world, but it also means that you may one day see your own smiling face advertising Facebook. Of course, they would never do that... right? No court on earth could stop them from using your content, or anyone else's, for that matter. [1a] But I'm... 99% sure they would never do that. At any rate, is giving them the power to do it worth the use of their site? (And don't get me started on their proposed anti-revenge porn program. [2])

Another is privacy. [3] Privacy, tracking, data selling, metadata collection... There's only one reason that these companies do this: advertising. Advertising is an old and noble profession, but advertising in the internet age has gone crazy. There is so much data and so much noise out there that everybody's head is spinning. The moment we started believing that all sorts of demographic, behavioural, and historical data could be number-crunched into the perfect marketing target, companies very quickly became spy agencies, willing to engage in any number of morally questionable activities just to get the right dataset. And we all bear responsibility in this - the service is free, but hosting and distributing our content is not. What other business model can there be?

Other concerns like data leaks [4], emotional manipulation [5], and content censorship [6] are just more nails in the coffin, reminding us, virtually begging us, to stop turning over all our data to a handful of tech giants just to share links and photos with our friends. Our data must be distributed.

Distributed Data

My particular interest within my field of computer science and software engineering is in distributed computing. I have an almost romantic obsession with the idea of networks and decentralized systems (which I would love to expound upon on another occasion). However, when I say our data must be "distributed", I do not mean distributed computing in the traditional "computer science" sense.

"Distributed computing" refers to processing data on more than one computer that are connected together through a network. This increases the system's complexity, but gives benefits in return such as processing speed and data availability. But we already have that now. Tech companies like facebook have data centers all over the world. If I, in Quebec, upload a photo to facebook, it may be stored principally in a server in North Carolina (as an example), with a backup or two stored somewhere else. Presumably, most of my friends also live in or near Quebec, and when they want to see my photo, their browser queries that same data center. On the other hand, if Hamish in British Columbia uploaded a photo, it may be stored in the Oregon data center. If I wanted to see that photo, facebook would send me a copy of the photo from Oregon, possibly even caching it in North Carolina, in case I wanted to see it again a few days later. This process is completely transparent, and the end user is unaware that this is happening. It makes the whole experience faster, since data is closer to where it needs to be.

For the purposes of this discussion, I am inventing a new sense for the term "distributed". Rather than being based on the question, "on what machines is the data being processed?", I am asking, "who owns the data being processed?". It has almost nothing to do with geo-replication, or network latency, or scheduling algorithms, but rather with data ownership. Data is not "distributed" across many machines; it is "distributed" across many owners. This applies not only to the data processed on the machines, but also to the machines themselves.

This is all well and good, but it brings back the original problem that people wanted to solve: "how do I share my baby photos and cat videos with my friends?"

The way this was always done on the internet - and still is today - is through servers. A server is a program, running on a computer. It's constantly listening to the network for incoming connections. Another program, called a client (a web browser is a perfect example of a client program) sends a request to that program, and asks for content (a website, a photo, a video, etc). The server wakes up and says, "oh great! I get to do what I was programmed for!" and happily provides a copy of the web page or photo. Sometimes the word "server" refers to the physical machine on which the server program runs.
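
To make the server and client roles concrete, here is a minimal sketch using only Python's standard library. It assumes a made-up piece of content at the path /cat.txt and runs both programs on the same machine. A real home server would need much more (authentication, encryption, and so on).

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical content that this server hosts; the path and text are made up.
CONTENT = {"/cat.txt": b"pretend this is a cat photo"}


class ContentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A client asked for something: provide a copy if we have it.
        body = CONTENT.get(self.path)
        if body is None:
            self.send_error(404, "Not found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_server():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), ContentHandler)
    # serve_forever() is the "constantly listening" part; run it in the background.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port, path):
    # The client side: send a request and read back the server's response.
    connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        connection.request("GET", path)
        response = connection.getresponse()
        return response.status, response.read()
    finally:
        connection.close()


if __name__ == "__main__":
    server = start_server()
    print(fetch(server.server_port, "/cat.txt"))
    server.shutdown()
    server.server_close()
```

A web browser pointed at the same address and port would play the role of fetch() here.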

My proposal is that every individual runs their own server.

What Will This Look Like?

The way I see it, in the long term, every house, every apartment will have a device: a physical box that contains a small processor, a large hard drive, and an internet connection. This device is always powered on, so that it can be accessed at any time, and so that it can synchronize data during off hours. It will serve as a social media hub, a fileserver, document storage, a secure email server, and a home IoT control center - among other things.

The device might be one that you buy and set up yourself. It might instead be offered to you by your internet provider as a modem/router/server combo device. Those who are technically savvy but don't want to buy a device, or who change location a lot, could install the program on a virtual private server and operate it remotely. If worse comes to worst, you could install it on your laptop, and synchronize data whenever you turn it on.

To access social media - rather than log in to the usual sites - you would log in to your home server. You would post your photos to your server, and from there they will be available to all your contacts. Your server will connect with the servers of your friends, and they will synchronize data with each other. By the time you log in, your server will have already downloaded and cached content from the friends on your "favourites" list, and you will see their updates first. Everyone in your family will share the server in your house, with every user's data carefully partitioned. You could even host your friends if they don't have their own server yet.
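
As a rough illustration of the flow just described, here is a small in-memory sketch. Everything in it is hypothetical: the HomeServer class, its fields, and the plain dictionary standing in for the network are assumptions made for the example, not a proposed design or protocol.

```python
from dataclasses import dataclass, field


@dataclass
class HomeServer:
    owner: str
    posts: list = field(default_factory=list)       # content this owner published
    contacts: set = field(default_factory=set)      # who may read this owner's posts
    favourites: list = field(default_factory=list)  # friends to fetch before login
    cache: dict = field(default_factory=dict)       # friend name -> copies of their posts

    def publish(self, text):
        self.posts.append(text)

    def posts_for(self, requester):
        # The owner's own server decides who gets a copy.
        if requester not in self.contacts:
            return []
        return list(self.posts)

    def sync(self, network):
        # Pull updates from favourites ahead of time, e.g. during off hours.
        for friend in self.favourites:
            server = network.get(friend)
            if server is not None:
                self.cache[friend] = server.posts_for(self.owner)

    def feed(self):
        # What the owner sees on login: cached posts, in favourites order.
        return [(friend, post)
                for friend in self.favourites
                for post in self.cache.get(friend, [])]


alice = HomeServer("alice", favourites=["hamish"])
hamish = HomeServer("hamish", contacts={"alice"})
mallory = HomeServer("mallory", favourites=["hamish"])  # not one of Hamish's contacts
hamish.publish("Photo from British Columbia")

# Stand-in for the internet: a lookup from owner name to that owner's server.
network = {"alice": alice, "hamish": hamish, "mallory": mallory}
alice.sync(network)
mallory.sync(network)

print(alice.feed())    # [('hamish', 'Photo from British Columbia')]
print(mallory.feed())  # []
```

The point of the sketch is where the decision lives: Hamish's posts only leave Hamish's server, and only for the people Hamish has listed.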

The whole time, you (or a family member or trusted friend) are in control of the server, and the data that resides on it. There is no need for powerful server farms, since your server only handles your data, not the data of a million people. Because your data is stored on a machine that is physically close to you, and since most of your friends are geographically close to you, network latency is reduced for most of your contacts, and there is no need for geo-replication.

There's a lot to take into account. I will endeavour to discuss all the considerations in future articles, such as what exactly such a system would and would not be used for (not just social media!), the costs and practicalities of operating such a system, methods of deployment, the pros and cons of distributed data (as I have explained it), the technical approaches to implementing it, and possible business models that could be associated with it.

Can It Be Done?

I will say at the outset that for now, this will be a hard sell. I don't expect anyone to adopt this philosophy in the near future, if only because of network effects. Very few people care about the issues I mentioned earlier. What kind of breach of trust (that hasn't already happened) will it take to wrench people away from the internet data oligarchy? I don't know, but if some people agree with me on the reasoning, I think we need to get started and have something ready for when others are ready to make the switch.

Is this feasible? I believe it is! It doesn't have to be a physical box in everyone's house, but with decreasing hardware and internet costs, and increasing ease of maintenance, it can be. Many families these days are buying Amazon Echoes, Google Homes, and other "smart speaker" devices, which are just little, always-on computers. This is essentially "a client in every home". The hardware requirements for "a server in every home" are almost the same: remove the espionage microphone and loudspeaker, add a hard drive, and you're done! I already mentioned a few alternatives, including sharing servers with family and friends, running the server software on a VPS, or even running the program on your laptop or phone.

All I'm saying is that if we want to have control over our own data, then it is possible to create a system that is almost as convenient as, and in the long run less costly than, the one in place now. In an age where there is a computer in every pocket, I don't see why there couldn't be a server in every home.

Notes and References:

A special thanks to Arthur Prats Ladous and Vincent Cloutier for some of the following references and help with editing.

[1] content ownership
- No one would use your photos without your knowledge, for money, just because of some broad wording in the terms of use? Don't be so sure.
  - https://money.cnn.com/2015/05/28/technology/do-i-own-my-instagram-photos/
  - https://www.telegraph.co.uk/technology/2019/04/09/facebook-plans-pass-photographs-advertisers-make-users-stars/

[2] anti-revenge porn
- https://fossbytes.com/facebook-anti-revenge-porn-tool/

[3] privacy, tracking
- https://www.eff.org/deeplinks/2009/12/google-ceo-eric-schmidt-dismisses-privacy
- https://www.facebook.com/about/privacy/update
- See also: the attention economy
  - https://en.wikipedia.org/wiki/Attention_economy

[4] data leaks
- Yahoo's leak was so big, it has its own Wikipedia page!
  - https://en.wikipedia.org/wiki/Yahoo!_data_breaches
- https://thenextweb.com/facebook/2019/04/03/facebook-amazon-third-party-data-leak-again/

[5] emotional manipulation

[6] tech giants and censorship
- https://www.engadget.com/2018/04/27/social-media-has-a-censorship-problem-of-its-own-making/
- https://www.eff.org/deeplinks/2019/04/mark-zuckerberg-does-not-speak-internet