Will Kubernetes Collapse Under the Weight of Its Complexity?
By Paul Dix / Use Cases, Developer / May 24, 2018
Thoughts About the Importance of Serving Application Developers
A few weeks ago, I attended and spoke at KubeCon EU. It was a massive event attended by around 4,700 people. It reminded me of the OpenStack Summit in Paris in November 2014, with the same level of crazy hype, the same vendor displays, and the same massive conference party at a public venue taken over by programmers. However, I felt there was an underlying problem with the whole spectacle: everyone I talked to was either an operator or an SRE. Where were all the application developers? Aren’t those the people that all this complex infrastructure is supposed to serve? Is this community really connected with the needs of its users? And it made me wonder: is Kubernetes too complex? Will it end up collapsing under the weight of its own complexity? Will it fade away as OpenStack has seemed to since 2014?
OK, now that I’ve gotten a bit dramatic and hyperbolic, I should say that I ultimately don’t think that will be the case. Kubernetes has a few advantages going for it that OpenStack never had. First, it’s already more scalable, so it can actually deliver on the dream of cluster scheduling at scale in environments of thousands of servers. Second, all three of the major cloud vendors now offer hosted Kubernetes services. In just under four short years, Kubernetes has positioned itself to become the lingua franca of the cloud and infrastructure world. Yet the problem of complexity is still a vexing one.
Most Developers Don't Have Google-Scale Problems
At the core of this is the idea that Kubernetes was developed to solve Google-scale complexity and Google-scale types of problems. At that scale, you want infrastructure that is self-healing, horizontally scalable, and declarative (as in infrastructure as code). However, the vast majority of applications and application developers don’t work in this kind of world. Most applications are either much more modest in terms of scale (both project size and user base) or they are at the beginning of a search for product/market fit.
In the case of applications that simply don’t have the scale problem, they usually don’t need the added complexity of a self-healing, horizontally scalable system. It’s much easier to think of a single server database and application server that is able to handle your load. And if it’s available 99.5% of the time with decent alerting for operators to kick it, that’s fine for many applications and workloads. The complexity and increased time to market that putting up a distributed system creates simply aren’t worth the hassle and investment.
For applications that are brand new, their biggest risk is that they don’t find product/market fit. That is, they get deployed and never used. It’s code that ends up getting thrown away because it was something built that no one actually wanted. This represents the vast majority of application code that gets written. It’s certainly where I’ve spent a good deal of my career, iterating on code and features in a search for the right set. Once you find that application and feature set, you can then scale it out. But to do so before that point is a premature optimization.
The problem I see with Kubernetes is that the cognitive load in the early parts of a project is simply too high. The number of things you need to create and worry about is a barrier to starting a project and iterating on it quickly. In the early days of a project, feature velocity and iteration speed are the most important factors. I view the Heroku model as the ideal development model. You have a managed, hosted Postgres database and you just git push to get new code deployed and out the door. There’s very little to think about. It may not scale to infinity and it may get expensive, but you can worry about those things once you’ve actually got a hit on your hands.
Put simply, Kubernetes makes the simple things hard and the hard things possible (credit to Tom Wilkie for that line, which he used in reference to a different project). If Kubernetes is really going to be successful, it needs to make the early parts of a project easy. It needs to lower the cognitive load for application developers. It’s fine if developers need to learn all the ins, outs, and intricacies of Kubernetes at some point. At scale, everything is complex and difficult. But a successful development platform doesn’t put those decisions and that intellectual strain in front of the developer until it’s necessary. David Heinemeier Hansson referred to this as “JIT Learning” in his RailsConf 2018 keynote.
Speaking of DHH, I’d like to use Rails as a model here. The first thing I’d like to explore is: Why was Rails so successful at the time? It wasn’t because of Ruby’s popularity (it was mostly an unknown thing from Japan), and it wasn’t because Rails and Ruby were the fastest at serving dynamic web pages. It’s because they enabled massive developer productivity gains. When DHH gave his talk on creating a blog in 15 minutes, it blew the minds of Java and .NET developers who were taking weeks to months to do the same thing. Developers picked up Rails because they could ship features faster to their users, which in the end is what all these frameworks and infrastructure are really all about.
One of the other great things that Rails did was generate the shell of an application for you and give you helpers to generate scaffolds and other parts of a project. This spared developers from having to make a ton of decisions at the start of a project. How the application code should be organized, where database models go, what the asset pipeline looks like, how to run database migrations: these and many other decisions were just made for you. If you wanted to go out of bounds, you could, but you didn’t have to think about it right out of the gate.
I think Kubernetes could benefit from more of these kinds of generators. There are already generators for specific resources, but that falls well short of what I think is needed. It would be great if there were templates for different kinds of applications, or even just for the most common one. The combination of a relational database, application tier, cache server, message queue, and worker pool probably encompasses the complexity required for 90% or more of all applications that get built. That simple structure is able to scale to an incredible number of requests and a great deal of complexity. A template or generator that created everything out of the box, with the organizational structure for the Kubernetes resources and code, would be very helpful.
Scaffold generators for common elements would be great. Need a MySQL database? Having a command like
kubectl scaffold mysql --generate
to create a stateful set, service and everything else necessary would go a long way. Then just a single command to deploy the scaffold into your k8s environment and you’d have a production-worthy database in just a few console commands. The same could be created for the popular application frameworks, message brokers, and anything else we can think of.
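To make the idea concrete, here is a rough sketch of the kind of output such a scaffold might produce. The `scaffold` subcommand above is a wish, not an existing kubectl feature, so everything in this example is an assumption for illustration: the function names `scaffold_mysql` and `as_list`, the default image tag and storage size, and the name of the Secret that holds the root password are all hypothetical. It builds a headless Service and a single-replica StatefulSet as plain Python dictionaries and prints them as JSON, which kubectl can read just like YAML.

```python
import json


def scaffold_mysql(name="mysql", storage="10Gi", image="mysql:5.7"):
    """Return [Service, StatefulSet] manifests for a single-replica MySQL.

    Hypothetical helper: the defaults and the Secret name are illustrative
    assumptions, not values any real tool prescribes.
    """
    labels = {"app": name}

    # Headless Service: gives the StatefulSet pod a stable DNS name.
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "clusterIP": "None",
            "selector": labels,
            "ports": [{"name": "mysql", "port": 3306}],
        },
    }

    # StatefulSet: one MySQL pod with a persistent volume claim for its data.
    stateful_set = {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "serviceName": name,
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "mysql",
                        "image": image,
                        "ports": [{"name": "mysql", "containerPort": 3306}],
                        # Assumes a Secret named "<name>-secret" already exists.
                        "env": [{
                            "name": "MYSQL_ROOT_PASSWORD",
                            "valueFrom": {"secretKeyRef": {
                                "name": f"{name}-secret",
                                "key": "root-password",
                            }},
                        }],
                        "volumeMounts": [{
                            "name": "data",
                            "mountPath": "/var/lib/mysql",
                        }],
                    }],
                },
            },
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }
    return [service, stateful_set]


def as_list(manifests):
    """Wrap manifests in a v1 List so they can be applied as one document."""
    return {"apiVersion": "v1", "kind": "List", "items": manifests}


if __name__ == "__main__":
    print(json.dumps(as_list(scaffold_mysql()), indent=2))
```

Saved under a name such as scaffold_mysql.py, the output could be piped straight into the cluster with `python scaffold_mysql.py | kubectl apply -f -`. This is only a starting point: a database that is actually production-worthy would also need backups, resource limits, and probes, which is exactly the kind of detail a built-in scaffold could supply by default.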
Maybe the combination of operators and Helm charts covers this, but I don’t think that will cut it. Then we’re forcing developers to learn about two other things in addition to Kubernetes. Even if it’s just increasing the vocabulary and installing a new command line tool, it’s extra effort and thought. These things need to be first-class citizens and part of the Kubernetes out-of-the-box experience. They need to be accessible via kubectl.
The CNCF Project Landscape is Big and Getting Bigger
This isn’t specifically an issue with Kubernetes, but the CNCF project landscape is huge. It makes the KubeCon/CNCF conferences extremely fragmented. There were 14 talk tracks during the sessions at the conference. As a developer, which of these tools do I need for my project? Take a look at this Cloud Native Landscape.
Figure: Cloud Native Landscape. Image source: https://twitter.com/dankohn1/status/989956137603747840
It’s impossible to make sense of it. Application developers would be better served by having a happy path to follow with the tools preselected. If they want to plug different tools in, great, but they shouldn’t have to think about it up front. CNCF’s increasing complexity and broader reach might dilute the focus and brand of Kubernetes as a platform for getting things built. I’m not sure what the answer might be to this or if I’m overblowing it, but from my perspective at the conference, it was like tool porn. Why bother with solving user problems when you can spend your entire career learning about and building new tools for infrastructure?
I can’t state it enough: infrastructure is a tool to help application developers solve real users’ problems. It should optimize their efficiency and ability to deliver. The open platform that does this well will win over all others.
Required Knowledge
At some point, there is a requirement for developers to learn the infrastructure tools more deeply. Most developers that work in AWS are familiar with some of the components from that cloud, just as developers on GCP or Azure are familiar with those. Having Kubernetes as a base layer may be more promising since developers can learn this one paradigm and take it with them to any public or private cloud out there.
This portability is the reason we’ve selected Kubernetes as the infrastructure base layer for the 2.0 version of our new cloud offering that is under development. But the learning curve is there, so our development team is getting ramped up. To that end I’ve made Kubernetes: Up and Running required reading for our entire development team.
I’m hopeful that the learning curve is gentle, but Kubernetes as an ecosystem needs to embrace the needs of application developers to be truly successful. At the end of the day, infrastructure is a support tool in service of solving a customer problem that is represented in user-facing application code. Enabling application developer productivity gains is the best and surest way to ensure the widespread and successful adoption of a platform.