
Our 2022 infrastructure project recommendation: Link your Kubernetes with your new data layer

Written by David Walker, Field CTO, EMEA, Yugabyte

Database veteran David Walker has an intriguing budget proposal for 2022 that’s worth considering if you are a ‘K8s’ user.

Hugely popular in the Fortune 500 enterprise space, Kubernetes has proven itself as the orchestration layer of choice for working with containers. Paralleling the growth of microservices and cloud computing for large-scale e-commerce and digital transformation, big retailers have been using Kubernetes (or ‘K8s’ in developer-speak) since 2018 to help them deal with huge peak workloads (like Black Friday) across the world.

The Kubernetes backstory is a drive to optimise infrastructure. IT went from on-premises, bare-metal servers as the default to virtual machines, which made better use of the same hardware. From there, popularised by Docker’s commercial tooling, it moved to containers: neat little bundles of a program and everything it needs to run, which use even fewer resources and are more efficient than virtual machines.

Finally, along came open source Kubernetes, which lets developers run and manage (‘orchestrate’) all the containers they need, however many that turns out to be. Most usefully, K8s groups containers together for us in ‘pods’. A ‘pod’ is the smallest deployable unit of computing that you can create and manage in the system. A pod comprises a group of one or more containers, with shared storage and network resources, and a specification for how to run them. To give you an idea of the potential scale, some businesses run thousands of pods at once in production.
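
To make that concrete, here is a minimal sketch of defining and creating a single-container pod with the official Kubernetes Python client. The pod name, image, and namespace are purely illustrative, and it assumes a kubeconfig pointing at a running cluster.

```python
# Minimal sketch: define and create a single-container pod.
# The pod name, image, and namespace are illustrative assumptions.
from kubernetes import client, config

config.load_kube_config()  # use the local kubeconfig credentials

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="hello-pod"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="hello",
                image="nginx:1.25",  # the bundled program this container runs
            )
        ]
    ),
)

core_v1 = client.CoreV1Api()
core_v1.create_namespaced_pod(namespace="default", body=pod)
```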

The stateless versus stateful conundrum

So, we like containers! We’d like to use them more, and we’d also like Kubernetes to take away some of the hard work of getting them to do what we want at scale. But there are some complications, which come down to the difference between stateless and stateful applications. Put simply, a stateless application provides just one function or service (an IoT device, say), so it’s essentially a containerized microservices application. Because the process inside it works only with the data supplied to it on each request, it never needs to look back at earlier requests.

So, stateless applications don’t “store” data, unlike stateful applications, which do. Typically, stateful apps include a data store, or are themselves the database being accessed by other apps to look at what happened before, e.g. the transactions in a financial services app.
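
As a purely illustrative sketch (the function names and the transaction store below are hypothetical), the difference is whether a request can be answered from its own inputs alone, or needs data remembered from earlier requests:

```python
# Illustrative only: contrasting a stateless handler with a stateful one.
# The function names and the 'transactions' store are hypothetical.

def convert_reading_stateless(raw_celsius: float) -> float:
    """Stateless: the answer depends only on the data supplied with this call."""
    return raw_celsius * 9 / 5 + 32

def account_balance_stateful(account_id: str, transactions: dict) -> float:
    """Stateful: the answer depends on data stored from earlier requests,
    e.g. the transaction history in a financial services app."""
    return sum(transactions.get(account_id, []))
```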

As it stands, the most popular mode of working with K8s is with ‘stateless’ workloads. That’s great; there are lots of things we want to do that way. But as businesses move more of their work to the cloud, more of that work involves transaction-based processes, and those are stateful.

You want to keep the great stuff that cloud and containers offer for such use cases. Previously, you had one application and it ran on your servers or clusters. These days, you have many microservice applications, with hundreds of copies of each running. So, instead of five application servers meeting your needs, if demand suddenly peaks you can scale elastically and instantly have 15 running. This is what horizontal scalability looks like: the ability to create more containers, and so more copies of the application, to give you an immediate resource boost cheaply and automatically.
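
Under Kubernetes, that elasticity is just a change to a deployment’s replica count. Here is a sketch using the official Python client; the deployment name and namespace are assumptions.

```python
# Sketch: scale a deployment from 5 to 15 replicas.
# The deployment name "shop-frontend" and namespace "default" are assumptions.
from kubernetes import client, config

config.load_kube_config()
apps_v1 = client.AppsV1Api()

# Patch only the replica count; Kubernetes schedules the extra pods for us.
apps_v1.patch_namespaced_deployment_scale(
    name="shop-frontend",
    namespace="default",
    body={"spec": {"replicas": 15}},
)
```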

Logic suggests that it would be handy if you could get Kubernetes to help with big stateful applications too. We need to move from stateless applications to being able to store the data required by stateful services.

So far in its (admittedly brief) life as an enterprise IT-level tool, Kubernetes has worked around the data issue by either using old-style RDBMS/relational databases, or relying on ‘NoSQL’ databases instead. That’s been fine up until now, but we’ve ended up in a bit of a cul-de-sac. Relational databases (often referred to as ‘monolithic’ databases) are fantastic for any topology except cloud. NoSQL databases like Cassandra and MongoDB are great in the cloud (with Kubernetes, for some use cases), but aren’t great at replicating the transactional capabilities of relational databases.

Additionally, if you have microservices, you want to be able to deploy loads of different database formats, ideally relational and NoSQL, for different uses. For a modern application, you may need a transactional database; you may have an analytical store – something like Snowflake; you may have some events or messaging – something like Kafka. In a shopping application, you might have the ‘next best offer’ serviced from your analytical data store; you may have the ‘delivery’ serviced from your messaging store; ‘shopping basket’ serviced by a NoSQL database; and ‘payment’ serviced by a relational database.
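
One way to picture that ‘right store for each job’ idea in the shopping example is a simple routing table; every name below is hypothetical.

```python
# Hypothetical sketch of polyglot persistence in the shopping example:
# each capability is routed to the kind of store that suits it best.
SERVICE_TO_STORE = {
    "next_best_offer": "analytical store (e.g. Snowflake)",
    "delivery":        "events/messaging store (e.g. Kafka)",
    "shopping_basket": "NoSQL database",
    "payment":         "relational (transactional) database",
}

def store_for(capability: str) -> str:
    """Look up which kind of data store backs a given capability."""
    return SERVICE_TO_STORE[capability]
```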

If Kubernetes can’t give you this, the way forward starts to look like a very high-effort, software-heavy programming exercise, or an impossible task.

The good news is that there is an answer. It’s another ‘turn of the wheel’, like the one that moved us from bare metal to containers: the idea of a data layer. If containers are about horizontal scaling and elasticity for applications, why not do the same for data? The data layer resolves the issue by intelligently working with data, one step below the application.

Linking the power of K8s and distributed databases as part of a data layer

The data layer is where all the different storage components you need will go. The more of that work you can push into Kubernetes, the less operational management you’ll have to do, so the objective is to get as much of the data layer as possible into Kubernetes.

The more that we can automate using a data layer approach, the easier it is to scale. Indeed, automation is key to working with cloud for real business at scale; it’s the only way you can drive down costs and increase the productivity of your resources without breaking the bank.

Some parts of the data layer can already be hosted in K8s; Kafka is an obvious one. But the challenge most organisations are trying to address is overcoming the limitations of their monolithic (relational) database.
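
For instance, an application pod can already publish events to a Kafka service running in the same cluster. Here is a sketch using the kafka-python library; the service address and topic name are assumptions.

```python
# Sketch: publish an event to a Kafka broker running as a service inside the
# same Kubernetes cluster. Requires the kafka-python package; the service
# address "kafka:9092" and the topic "orders" are assumptions.
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="kafka:9092")
producer.send("orders", b'{"order_id": 42, "status": "placed"}')
producer.flush()  # make sure the event is actually delivered before exiting
```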

If we can move from a monolithic database outside Kubernetes to an elastic database running inside Kubernetes, that’s great. Even better would be getting a fully-distributed database on the same platform as the application, so that you have one method of managing it all. This allows you to further lower your operating costs and makes the process easier to manage.
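
Because distributed SQL databases such as YugabyteDB speak the PostgreSQL wire protocol, the application side of that move can stay familiar. Here is a sketch of an application pod connecting to a database service in the same cluster through a standard PostgreSQL driver; the service name, port, and credentials are assumptions.

```python
# Sketch: an application pod talking to a distributed SQL database deployed in
# the same Kubernetes cluster, via the standard psycopg2 PostgreSQL driver.
# The service name "yb-tservers", port 5433, and credentials are assumptions.
import psycopg2

conn = psycopg2.connect(
    host="yb-tservers",   # Kubernetes service fronting the database pods
    port=5433,
    user="yugabyte",
    password="yugabyte",
    dbname="yugabyte",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version()")
    print(cur.fetchone()[0])
conn.close()
```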

It’s still early days for this project, but several of our customers have already reported significant successes linking the power of K8s and distributed databases as part of a data layer. This suggests to me that many CIOs will have earmarked this as a top 2022 project. Maybe you should, too.