COSS: Operating Systems

August 31, 2021

As I wrote in the first installment of this series, one of my theses about open-source companies is that there are a few defined categories, and companies belonging to a category tend to follow similar paths.

There, we explored commercial open source database companies. Here, we’ll explore operating systems. Both databases and operating systems are where commercial open source started.

The commonality is that these are infrastructure deep in the stack, developer-facing rather than consumer-facing (as we’ll see, open source took off in server OSs but never really dented the consumer desktop market).

The historical approach is probably most useful here.

1980-1991: beginnings

In the 1960s and 1970s, mainframes and minicomputers coupled together hardware and software fairly tightly.

That changed in the 1980s. Computer usage exploded, with PC purchases growing from 1M to 20M / year over the decade. Microsoft struck its (in)famous licensing deal with IBM, and grew into the first truly successful operating system company.

The growing commercialization of personal computing worried many in the hacker community, which in the 80s made up a sizable minority of computer users.

In 1985, an ornery, manifesto-prone hacker at MIT named Richard Stallman outlined a “free” operating system which he called GNU. Over the next few years Stallman contributed a compiler (gcc), a text editor (emacs), and a build automator (make), but not the most complex piece, a kernel.

In 1991, a 21-year-old Finnish student, Linus Torvalds, posted the source code to a kernel he had built, mostly as a toy, to a relevant Usenet group. Over the next few years, a team of around 100 volunteers formed to contribute patches and integrate the kernel with the GNU tools, getting what became known as Linux to a 1.0 release.

The Linux Companies: Red Hat & SUSE

Two of the first prominent distributors of Linux were the German firm SUSE and the American firm Red Hat. These firms created their own Linux distributions, which they then sold on shrink-wrapped CDs and floppy disks.

Both Red Hat and SUSE focused commercially on their “enterprise server” offerings — a distribution of Linux specifically adapted to run in enterprise datacenters.

(The “enterprise datacenter” was itself a fairly new concept. Networking servers together to create an alternative to mainframes had emerged in the 1980s: Sun invented the Network File System (NFS), 3Com commercialized Ethernet networking, and so on.)

Among Linux users, SUSE was more popular in Europe; Red Hat was more popular in the US (the bigger market). Both players offered support & services; SUSE’s comparative advantage was German engineering, while Red Hat’s was American sales & marketing.

They competed with a number of proprietary vendors — Unix-based systems like Sun’s Solaris, Microsoft’s Windows, and older systems like Novell NetWare and IBM’s OS/2.

There were two value propositions: price (the obvious one), and customizability:

In talking with other Linux users, [Red Hat CEO Bob] Young was told time and time again that sure, “Solaris was much better than Linux, but it was only by using Linux that they could tweak the operating system to meet their needs.”

There are three distinct time periods since 1995:

  • The dotcom bubble (1995-2000). Data centers were the “picks and shovels” of the Internet era, so during the late 90s dotcom bubble the market grew significantly. Linux burst onto the scene, going from basically zero to around 20%, while Windows grabbed share from Novell and IBM, whose systems were not well-suited to the web era.

  • The internet consolidation (2001 - 2010). Unix-based proprietary vendors began to decay rapidly, with new installations in the mid-2000s shifting almost exclusively to Windows and Linux-based solutions.

  • Rise of the public clouds (2011 - 2021). Over the 2010s, Windows and Linux switched positions, with Linux rising from 25% to 75% market share over the course of the decade, in a quickly growing market.

With the rise of Linux, providing Linux services and support became an increasingly lucrative business for the top players:

At some point, Linux usage kicked off a virtuous cycle, where more users meant successful vendors meant more investment into the open source ecosystem, which meant a better product which in turn generated more users…

Red Hat, in particular, didn’t confine itself to operating systems. As subsequent layers were built on top of the OS (we’ll get more into these in a bit), Red Hat had its finger in each of them.

It made significant investments in cloud infrastructure through OpenStack, in virtualization through the KVM hypervisor, and in containers through acquiring CoreOS and building OpenShift, a Kubernetes-based platform.

Red Hat leveraged its position as the leading Linux vendor, and expertise throughout the emerging open source stack, to build a strong solutions & services business that could go up against the likes of IBM and Accenture.

This effort came full circle in 2018 when IBM acquired Red Hat for $34 billion. SUSE, meanwhile, after changing hands several times, IPO-ed earlier this year and is now valued at around $6 billion.

New Layers: VMs, Containers, Functions

What an operating system meant would change again and again over the following decades — virtualization in the early 2000s, containerization in the early 2010s, serverless functions today.

Think of each of these as a layer on top of the last, working in tandem with Moore’s Law.

Roughly speaking, a physical server went from running one operating system to ten virtual machines to a hundred containers to a few hundred functions.

Each of these layers changed the nature of the businesses you could build on top.

Virtualization

In 1998, the most important paper to come out of the Stanford computer science department was probably an article called “The Anatomy of a Large-Scale Hypertextual Web Search Engine” by a couple of grad students who then took their insights to found a company called Google.

But the second most important paper was called “Performance isolation: sharing and isolation in shared-memory multiprocessors” by a CS professor, Mendel Rosenblum, and his grad students.

This paper demonstrated that you could run multiple workloads in parallel on a single machine while isolating them from one another, so that data could not leak between processes.

By the early 2000s, advances in computer hardware had created an odd situation:

Thanks to Moore’s Law, processor clock speeds had doubled every 18 months and processors had moved to multiple cores — yet the software stack was unable to effectively utilize the newer processors and all those cores. [As a result], enterprises were faced with data centers full of expensive servers that were running at very low utilization levels.

Rosenblum teamed up with his wife, the technology executive Diane Greene, and they started VMware to solve this problem.

Productized virtualization allowed enterprises to dramatically increase server utilization by packing multiple applications onto a single physical server.

In addition, by decoupling a logical operating system from a physical machine, it allowed both Windows and Linux VMs to run on the same servers, simplifying datacenter management.

Both VMware and virtualization would go on to take over the enterprise datacenter in the 2000s. By 2008, VMware had reached over 80% market share, with the rest taken by open source hypervisors like Xen and KVM.

XenSource, an open-source company, was acquired by Citrix in 2007 for $500M, a day after VMware’s $20B IPO. KVM was folded into the Red Hat suite of products.

But in the 2010s, while Linux overtook Windows in the enterprise OS market, VMware held steady: in enterprise servers, it still holds around 80% of the hypervisor market.

Even the large cloud providers have tended to build with custom code rather than the popular open source tools: both GCP and Azure did so from the start, and AWS used a fork of Xen until 2017, when it switched to KVM.

The virtuous cycle that kicked in for Linux and (spoiler alert) Kubernetes, where heavy use by the big clouds fed investment back into the open source project, never materialized here.

Containerization

By the early 2010s, the popularity of frameworks like Ruby on Rails had inspired a wave of startups providing easy cloud hosting — EngineYard, Heroku, Rackspace, and so on.

One of the key challenges these companies faced was margin pressure — they sold to customers at $5-10 per application instance per month, and had to run those applications cheaply enough to still turn a profit.

Some of these companies developed innovative packaging techniques in order to securely run several applications on a single virtual machine. They generally relied on Linux kernel primitives: namespaces for isolation, and control groups (cgroups) for resource limits.
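
To make the primitive concrete, here’s a minimal sketch in Go (the language Docker itself was written in) of namespace-based isolation. It assumes a Linux host with a /bin/sh shell and root privileges; a real container runtime would layer cgroup limits, an isolated root filesystem, and network namespaces on top.

```go
package main

// Minimal sketch of namespace isolation on Linux: launch a shell in
// fresh UTS, PID, and mount namespaces. A real container runtime would
// also apply cgroup resource limits, pivot into an isolated root
// filesystem, and set up networking. Linux-only; requires root.

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		// Give the child its own hostname, PID, and mount namespaces.
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	if err := cmd.Run(); err != nil {
		os.Exit(1)
	}
}
```

Inside that shell, `echo $$` prints 1 (the process believes it is PID 1), which is the basic trick these vendors used to pack many customer apps onto one virtual machine.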

One of the smaller players, a company called DotCloud, had a framework that they called Docker to manage these “containers”.

In March 2013, they decided to open-source Docker, and its popularity skyrocketed. By the 1.0 release in June 2014, it had 10,000 GitHub stars, a Red Hat certification program, and official support on AWS’s Elastic Beanstalk.

Both containerization and the rise of apps with enough users that they had to run as distributed systems changed the nature of the game in a way that virtualization never really did.

Here’s Paul Biggar, founder of CircleCI, writing a popular post in 2015:

Docker is at the merge point of two disciplines: web applications and distributed systems. For the past decade, we in the web community have largely been building web applications by writing some HTML and some JavaScript and some Rails.

But then something interesting happened. Web applications got large enough that they started to need to scale. Enough people arrived on the internet that web apps could no longer sit on a single VPS. And as we started to scale, we started seeing all these bugs in our applications, bugs with interesting names like “race conditions” and “network partitions” and “deadlock”.

In the early years of this scalability crisis, Heroku happened. And Heroku made it really easy to scale infrastructure horizontally, which bought [us] as an industry, maybe 5 years.

We’ve now hit the limit of that, [and now] we find ourselves trying to build scalability early, and re-architecting broken things so that they can scale. We come up with phrases like “pets vs cattle”, microservices, and a whole set of practices to try and make this easier.

At this point, during this shift, Docker comes in. But instead of telling us we can keep doing things in basically the same way, like Heroku did, Docker tells us that distributed systems are what we’ve been doing all along, and so we need to accept it and start to work within this model. Instead of dealing with simple things like web frameworks, databases, and operating systems, we are now presented with tools like Swarm and Kubernetes, tools that don’t pretend that everything is simple.

As an open-source project, Docker has wildly succeeded; as a business, it’s fallen far short of its 2015 promise. Development and interest quickly shifted to Kubernetes and the orchestration layer.

In late 2019, Docker sold its enterprise business to Mirantis for a reported $35 million.

Other companies emerged with competing container solutions and corresponding commercial offerings, among them CoreOS and Mesosphere. They didn’t fare much better: Mesosphere renamed itself (to D2iQ), and Red Hat bought CoreOS.

Serverless

In November 2014, AWS introduced a new service called Lambda, with a remarkable billing structure: Lambda let you buy cloud compute not at the application level, but at the function level. You could specify a function to run, along with any secret keys it needed, and then invoke that function on demand.
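
Lambda launched with Node.js; the Go runtime came later, but AWS’s official runtime library (github.com/aws/aws-lambda-go) shows the function-as-deployment-unit model compactly. A minimal sketch, with a made-up event payload:

```go
package main

// A minimal Lambda handler using AWS's official Go runtime library
// (github.com/aws/aws-lambda-go). The unit of deployment is a single
// function; AWS handles provisioning, scaling, and per-invocation
// billing. The event struct is an illustrative payload, not part of
// the Lambda API.

import (
	"context"
	"fmt"

	"github.com/aws/aws-lambda-go/lambda"
)

type greetEvent struct {
	Name string `json:"name"`
}

func handleRequest(ctx context.Context, e greetEvent) (string, error) {
	return fmt.Sprintf("Hello, %s!", e.Name), nil
}

func main() {
	// Hands control to the Lambda runtime, which invokes the handler
	// once per event.
	lambda.Start(handleRequest)
}
```

You compile this into a binary, upload it, and invoke it; there is no server process for you to manage.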

While similar services had been launched before, AWS Lambda was the first to gain widespread traction.

Adopting containerization requires a significant refactor of the deployment pipeline and a rethinking of service boundaries. But adopting serverless requires more or less a whole-app rewrite, which is why serverless adoption has been much more gradual than containerization: it’s primarily being used for a subset of greenfield development.

Because adoption has been slow, the jury is still out on whether large open source companies can be built on the serverless paradigm.

If serverless functions remain one part of an application, website, or infrastructure, rather than the whole thing, it seems unlikely that a pure-play winner will emerge.

Instead, the beneficiaries will be app and website development frameworks like Gatsby and Next/Vercel, and deployment frameworks like Terraform, for whom serverless functions are one feature among many.

Conclusion

While virtualization let applications run in a more or less unchanged state, both containerization and serverless have prompted application re-architectures to take advantage of the new patterns, sending shockwaves up the stack.

With each wave of virtualization, the coordination layer has grown thicker and more complex: first VMware’s hypervisors, then container orchestration tools like Kubernetes, and finally serverless frameworks.

Meanwhile, the “operating system” layer has gotten thinner and thinner: first an OS on a physical machine, then an OS on a virtual machine, then a container, then a mere function.

The action, in other words, is shifting from the operating system to the orchestration layer. That will be the subject of our next post.

Thanks to Keith Adams for feedback on this post