The Kubernetes Test for Modernization

Coming of age as a programmer around the dot-com boom of the late 90s and early 2000s, I was shaped as a developer by one blogger in particular: Joel Spolsky of Fog Creek Software. One of his posts, the Joel Test, went viral and became a cornerstone of what good software development was supposed to look like.

More than 20 years after he published his test, most of the points he raised have been adopted across industries, from software/SaaS to traditional brick-and-mortar businesses such as insurance. Even a majority of open source projects follow his checklist. What’s in it? Things like whether your organization uses source control, a daily build process, and an issue tracker, and whether specs are written for new features. This may not sound revolutionary, but at the time you were considered lucky if someone even used a spreadsheet to track the work in flight and the outstanding issues.

Why am I thinking about this now, and what does it have to do with Kubernetes? I was recently reflecting on engagements where I’ve been helping a few clients lay out a strategy for modernizing their stacks. When it came to Kubernetes, they fell into two camps:

  1. Those who avoid it like the plague because it appears too complex.
  2. Those who are experimenting with it in a non-production or skunkworks fashion.

To be honest, the concern about complexity isn’t entirely wrong. Kubernetes is complex, but that is because there is quite a bit going on under the covers, and it also requires some organizational infrastructure to be in place. To help gauge whether you’re ready to explore Kubernetes, keep your evangelists at bay, and speed up your adoption to see what the fuss is about, I’ll offer some guidance in the form of a few questions inspired by Joel’s test.

  1. Are you using Docker?
  2. Are you utilizing the microservices architecture pattern?
  3. Do your services report their health and metrics?
  4. Do your services need to persist data?
  5. Do your services communicate solely over TCP or HTTP?
  6. Are your services event driven?
  7. Are you comfortable living on the bleeding edge?

Are you using Docker?

At its core, Kubernetes is a service orchestration platform. Originally built around Docker, it can use any container runtime that implements the Kubernetes Container Runtime Interface (CRI), with containerd being the most common. This common interface is also what lets cloud providers manage containerized workloads within their own platforms, such as AWS’s Fargate and Cloud Foundry.

What this means for you and your organization is that your applications must be containerized before you can think about leveraging the Kubernetes ecosystem. That involves not only building images, but also having a registry to host them. Luckily, platforms like JFrog’s Artifactory and Sonatype’s Nexus can serve that role.
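To make that concrete, here is a minimal, hypothetical Go service standing in for one of your applications. The service itself is deliberately boring; the point is that anything you want Kubernetes to run has to be packaged into an image and pushed to a registry first.

```go
// main.go - a hypothetical, minimal service used as a stand-in for one of
// your applications. Nothing here is Kubernetes-specific; it just needs to
// be packaged into a container image before the cluster can run it.
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	// Read configuration from the environment so the same image can run
	// unchanged in any cluster or namespace.
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello from a containerized service")
	})

	log.Printf("listening on :%s", port)
	log.Fatal(http.ListenAndServe(":"+port, nil))
}
```

From there you would build and tag the image with docker build, using your registry’s hostname in the tag, and publish it with docker push so the cluster can pull it.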

Are you utilizing the microservices architecture pattern?

Kubernetes really shines when your applications are built using the microservice architecture. It can handle the routing and isolation between services, ensuring each one is reachable only by the services that need it. It can also keep the required number of instances running and routable without your team managing its own service registry, automatically scheduling them across the nodes in the cluster and replacing any that fail.

However, for this to work, each service must be treated as ephemeral: Kubernetes may stop it, move it, or replace it at any time.
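In practice, treating a service as ephemeral mostly comes down to shutting down cleanly when asked. Below is a minimal sketch, assuming a plain Go HTTP service: Kubernetes sends SIGTERM when it evicts or reschedules a pod, and the process is expected to drain in-flight requests and exit before the termination grace period runs out.

```go
// A minimal sketch of graceful shutdown. Kubernetes sends SIGTERM when it
// stops a pod; the service finishes in-flight requests and exits on its own
// rather than waiting to be killed.
package main

import (
	"context"
	"log"
	"net/http"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}

	// Cancel this context when SIGTERM (or Ctrl-C locally) arrives.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	<-ctx.Done() // block until Kubernetes asks us to stop

	// Give in-flight requests a bounded window to complete, then exit.
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		log.Printf("shutdown: %v", err)
	}
}
```

The 10-second drain window here is an assumption; whatever you pick should be shorter than the pod’s terminationGracePeriodSeconds so the process exits before Kubernetes resorts to SIGKILL.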

Do your services report their health and metrics?

By default, Kubernetes performs a basic health check on every container it runs to make sure the underlying service is behaving; however, that check only verifies the process is running. You can define your own liveness and readiness probes to verify that the service really is up and accepting traffic. Additionally, Kubernetes can be paired with Prometheus or other metrics and tracing components that adhere to the OpenTelemetry APIs, which gives you finer-grained metrics and alerting when your services are failing.
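On the application side, this usually amounts to exposing a couple of cheap HTTP endpoints for the probes and a metrics endpoint for scraping. A minimal sketch, where the /healthz and /readyz paths and port 8080 are assumptions that would have to match the probes declared in your pod spec:

```go
// A minimal sketch of the application side of health checks and metrics.
// The paths and port are assumptions; whatever you choose must match the
// livenessProbe and readinessProbe configured for the pod.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	mux := http.NewServeMux()

	// Liveness: "the process is not wedged" - keep this check cheap.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Readiness: "I can actually serve traffic" - check real dependencies
	// (databases, caches, downstream services) before answering yes.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Expose Prometheus-format metrics for a scraper to collect.
	mux.Handle("/metrics", promhttp.Handler())

	log.Fatal(http.ListenAndServe(":8080", mux))
}
```

Prometheus (or an OpenTelemetry collector) would then scrape /metrics, while the kubelet hits /healthz and /readyz according to the probe configuration.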

One important point about why this matters: Kubernetes supports the concept of “self-healing”, so if it detects that a service is misbehaving, it will try to restart it or reschedule it elsewhere based on the rules you provide.

Do your services need to persist data?

While Kubernetes is great at providing a service orchestration platform, until fairly recently the expectation was that all workloads were ephemeral. Don’t worry, Docker had the same limitation for a time as well. This prevented organizations from benefiting from some of Kubernetes’ features, such as self-healing and service isolation, if they tried to Dockerize their database applications.

To resolve this, the CNCF created a working group dedicated to storage. Today a workload uses a persistent volume claim to obtain a persistent block of storage, but the area is still a work in progress, with the complexity and stability issues one would expect.
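From the application’s point of view, the claim ultimately shows up as a directory mounted into the container. The sketch below assumes the volume behind the claim is mounted at /data (with a hypothetical DATA_DIR override); the claim and the mount themselves are declared in the pod spec, not in code.

```go
// A minimal sketch of the application side of persistent storage. The /data
// path is an assumption: it is wherever the volume backed by the persistent
// volume claim is mounted into the container.
package main

import (
	"log"
	"os"
	"path/filepath"
)

func main() {
	dataDir := os.Getenv("DATA_DIR") // hypothetical override; defaults below
	if dataDir == "" {
		dataDir = "/data"
	}

	// Anything written here outlives container restarts, because the bytes
	// live in the claimed volume rather than the ephemeral container.
	stateFile := filepath.Join(dataDir, "state.txt")
	if err := os.WriteFile(stateFile, []byte("last-run marker\n"), 0o644); err != nil {
		log.Fatalf("writing state: %v", err)
	}

	contents, err := os.ReadFile(stateFile)
	if err != nil {
		log.Fatalf("reading state: %v", err)
	}
	log.Printf("recovered state: %s", contents)
}
```

Whether that data also survives the pod being rescheduled onto another node depends on the storage class and access mode behind the claim, which is exactly where the complexity mentioned above tends to live.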

Do your services communicate solely over TCP or HTTP?

Kubernetes is heavily oriented around TCP for its services, and by extension HTTP. This is primarily historical: Google is effectively a web company, so HTTP is its bread and butter, and the same is true for most companies looking at adopting Kubernetes. However, I still recall a time when applications were written with their own protocols and ports in mind instead of using HTTP as the common carrier.

What this means is that if your application makes use of UDP, or opens ephemeral ports for communication (as SIP servers do), then additional care and investigation may be required to see whether Kubernetes is the right platform for that application, as some of these limitations are baked into the underlying framework.
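To illustrate the easier end of that spectrum, here is a minimal sketch of a service listening on a single, well-known UDP port. A Kubernetes Service can expose this, since UDP is a supported protocol; it is the applications that negotiate ephemeral ports at runtime, such as SIP signaling handing off to media streams, that require the extra investigation.

```go
// A minimal sketch of a UDP listener on a fixed, well-known port. The fixed
// port is what keeps this tractable on Kubernetes; dynamically negotiated
// ports are where the platform's TCP/HTTP bias starts to show.
package main

import (
	"log"
	"net"
)

func main() {
	conn, err := net.ListenPacket("udp", ":5060") // 5060 is the conventional SIP port
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	buf := make([]byte, 2048)
	for {
		n, addr, err := conn.ReadFrom(buf)
		if err != nil {
			log.Printf("read: %v", err)
			continue
		}
		// Echo the datagram back to the sender.
		if _, err := conn.WriteTo(buf[:n], addr); err != nil {
			log.Printf("write: %v", err)
		}
	}
}
```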

Are your services event driven?

Kubernetes, by its nature, doesn’t care whether a service is event driven or not, but there are sub-projects such as Knative that bring the style of serverless computing found in services like AWS Lambda or OpenFunction to Kubernetes. Additionally, through Knative, there is an eventing platform that can be backed by systems such as AWS Kinesis or Confluent’s Kafka.

The caveat with this ecosystem is that it requires the event-driven service to be written as an HTTP service (see the previous question about whether your services communicate solely over TCP or HTTP).
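Concretely, Knative delivers events to your service as CloudEvents over HTTP POST, so the “function” is really just another HTTP handler. A minimal sketch, assuming the binary content mode where event attributes travel as ce-* headers:

```go
// A minimal sketch of an event-driven service in the shape Knative expects:
// events arrive as CloudEvents over HTTP POST, with attributes carried in
// ce-* headers (binary content mode) and the payload in the request body.
package main

import (
	"io"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		defer r.Body.Close()

		// CloudEvents attributes in binary content mode travel as headers.
		eventType := r.Header.Get("Ce-Type")
		eventSource := r.Header.Get("Ce-Source")
		log.Printf("received event type=%q source=%q payload=%d bytes", eventType, eventSource, len(body))

		// A 2xx response tells the delivery channel the event was accepted.
		w.WriteHeader(http.StatusAccepted)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```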

Are you comfortable living on the bleeding edge?

This is perhaps one of the most important questions. Kubernetes, and the underlying software it depends upon, is constantly changing; Kubernetes itself releases on a cadence of roughly every three to four months. While the project does a decent job of ensuring patch releases don’t break things, features come and go quite frequently in the minor releases. This holds especially true for some of the more active components such as Knative and Tekton. Unless you or members of your team are tracking the active development announcement lists, be prepared to encounter breaking changes during the upgrade process.

Some would say this is a feature, since it shows the project is active and constantly maintained. But putting on my dusty sustaining-engineer hat, it is not a recipe for success: it forces organizations using the software to maintain an even more rapid cadence for updating their own systems to account for the breaking changes, on top of delivering value to the business they are supposed to support.

The biggest advantage of Kubernetes is that it provides a common and very powerful abstraction on top of several dependent technologies. It is agnostic about how many of the underlying pieces are integrated, giving you the ultimate say in what to use, but with that power comes the responsibility of understanding how the underlying tools function and keeping everything in sync.

Kubernetes itself has been around for about 10 years and its core has stabilized, but it is the edges where the true flexibility comes from, and they can be quite powerful, if also a little jagged.