Nov 24, 2019 — https://sf.gl/1795
Takeaways from KubeCon North America 2019
Field Report from San Diego
Let me start by giving some praise to the event organizers of KubeCon. With attendee counts more than doubling every year – more than 12,000 attendees at KubeCon + CloudNativeCon in San Diego, California, up from around 4,000 in Copenhagen just last year – it was amazing how well it was run. Even the rain – San Diego only gets it a few days a year – couldn't derail things: a power outage at the convention center forced some breakout sessions to move, but the conference happened without any significant hitches.
It's also very exciting to see that most of the Fortune 500 are using Kubernetes in some way, presumably for production systems. One of the key moments of the conference was China Mobile's live demo: Faraday cage and all, they set up a complete end-to-end 5G cell phone stack and made a video call across the ocean to an endpoint in Europe. The connection quality wasn't great, but all the control software ran on Kubernetes.
The same goes for Walmart, who now run a Kubernetes cluster in each of their stores, with two-way sync over Kafka and complete replication to multiple data centers in case something goes wrong. That makes for an impressive Grafana dashboard.
Kubernetes is the App Store of the Enterprise
It's not hard to understand that the CNCF – and the hundreds of sponsors – are pushing Kubernetes as the Enterprise application platform. The deployment needs of modern software stacks have evolved so quickly, and are now so complex, that tools like Kubernetes are among the only stacks that fulfill these demands while also providing a huge ecosystem (and therefore developer adoption).
These increasing business needs have resulted in exactly that ecosystem: everything you could possibly want from your cluster is available in a Kubernetes-compatible solution.
Statefulness on Kubernetes is Still Not Trivial
But that's not to say you can't do it. Slack, the chat app everyone seemed to be furiously typing away on at KubeCon, has engineered a truly impressive database system using Vitess/MySQL. It's not running on Kubernetes, but they are evaluating it. The stats speak for themselves: 53 billion queries per day, 7500 TB of storage. (Slides)
If you're a smaller shop, it's not impossible either, but you might see your cluster becoming more like pets and less like cattle. For those of you looking into this, Operators are the new thing: they take care of running database servers for you, without the manual setup, and there are operators available for most open-source database systems these days, as the sketch below illustrates.
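As an illustration of the pattern: with Zalando's postgres-operator, for example, you declare an entire database cluster as a single custom resource, and the operator handles provisioning, replication, and failover. The manifest below is a sketch from memory of their documented examples – check the operator's docs for the exact schema:

```yaml
# A declarative Postgres cluster, managed by an operator.
# Schema roughly follows Zalando's postgres-operator examples;
# verify field names against the operator's own documentation.
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  teamId: acid
  numberOfInstances: 2      # one primary, one replica; failover is handled for you
  volume:
    size: 10Gi
  postgresql:
    version: "11"
```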
Security is Still Lacking
It seems like every few weeks we read about a company having exposed a large amount of data through an open-to-the-internet Elasticsearch cluster. That's caused by how easy Elasticsearch is to install and make accessible over the network – for a long time it didn't even ship with any user management or authorization. Kubernetes is the same way, just more complicated, and the CNCF wants to keep enterprises from making the same mistake.
A security audit was performed by Trail of Bits, with a big focus on the trust zones (there are many) and the ergonomics of Kubernetes – how do you manage it day to day? – and it found 37 issues during the source review. Slides here.
Users are confused about how to secure their Kubernetes clusters. Doing it right requires third-party functionality, which makes it yet another hurdle to clear. Security also depends greatly on how you run it: GKE? AWS? Kops on AWS? On-prem? Each security solution is different.
By default, there is no way to give other users limited control of your cluster, secrets are not encrypted at rest, and every service you're running can talk to everything else (including Kubernetes system services). Something to be aware of if you're not running Kubernetes with a large security team behind you.
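Both of those defaults can be tightened with built-in primitives, though you have to know to do it. A RoleBinding grants another user scoped, read-only access instead of full control, and a default-deny NetworkPolicy shuts off the everything-can-talk-to-everything behavior. A minimal sketch – the namespace and user names are placeholders:

```yaml
# Give a user read-only access to one namespace by binding
# the built-in "view" ClusterRole.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jane-view
  namespace: my-namespace        # placeholder
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: User
    name: jane@example.com       # placeholder
---
# Deny all ingress and egress for every pod in the namespace;
# allowed traffic is then whitelisted with further NetworkPolicies.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-namespace        # placeholder
spec:
  podSelector: {}                # empty selector matches all pods
  policyTypes:
    - Ingress
    - Egress
```

Note that NetworkPolicies are only enforced if your CNI plugin supports them (Calico and Cilium do; some default installs silently ignore them).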
Kubernetes is Simple to Set Up, but Easy to Take Down
Employees at Airbnb gave a great talk detailing some of the ways complexity was creating problems in their Kubernetes clusters. With 700 services and thousands of nodes running the platform, they are among the largest users of Kubernetes, and have experienced their share of issues, ranging from sudden out-of-memory errors to service reliability during releases. The video of their talk is up, and you should watch it!
Being a system that thousands of organizations all use to run their business means complexity. This was demonstrated by Leigh Capili of Weaveworks (great intro, by the way). Building such a system requires a certain feature set, which businesses depend on. What has surprised many Kubernetes users is how easy it is to misconfigure those features in your deployment in such a way that, say, every time you release, your users experience 503 errors and timeouts – and you have to do some really unexpected things to circumvent this. Most companies only notice when they have really good monitoring (which, again, is not trivial to implement), or when they start to release many times a day.
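The specifics are in the talk, but the commonly recommended mitigation is along these lines: a readiness probe, so new pods only receive traffic once they answer health checks, plus a short preStop sleep, so terminating pods keep serving while load balancers deregister them. A sketch, with names, ports, and paths as placeholders:

```yaml
# Rolling updates without 503s: gate traffic on readiness,
# and buy terminating pods time to drain before SIGTERM arrives.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service                          # placeholder
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: app
          image: example.com/my-service:1.2.3   # placeholder
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz                    # placeholder health endpoint
              port: 8080
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "10"]        # drain window before SIGTERM
```

The preStop sleep is exactly the kind of "really unexpected thing" mentioned above: it does nothing except buy time for the endpoint removal to propagate to the load balancers.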
Kubernetes is Simple to Set Up, but Hard to Upgrade
The Kubernetes upgrade story is brutal (I'd like to have asked the Walmart folks about that; see above), as evidenced in a talk by Puneet Pruthi from Lyft (titled Handling Risky Business) that detailed their in-house tool, which carries out upgrades by evicting pods from a node until it is empty, so a new one can be started. They also covered typical scenarios that can be catastrophic: etcd losing quorum (I can hear you wince), apiserver overload, cloud provider capacity and rate limits, and more.
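Lyft's tool is their own, but the built-in guardrail for this kind of pod-evicting upgrade dance is the PodDisruptionBudget, which caps how many replicas a voluntary disruption (a node drain, for instance) may take down at once. A minimal sketch, with a placeholder label:

```yaml
# Never let voluntary disruptions (drains, upgrades) take the
# service below two ready replicas.
apiVersion: policy/v1beta1   # policy/v1 on newer clusters
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-service        # placeholder
```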
For smaller shops, I would recommend creating a completely new cluster, moving the traffic over, and deleting the old cluster. It ensures that a) you keep your configuration in e.g. Git (you never directly manipulate your cluster with kubectl, right? ... right?), b) your monitoring works well, and c) you're ready for disaster recovery.
Another tool, Velero, was also the subject of a talk; it provides an easy CLI utility for backing up all your Kubernetes objects – and their storage backends – to a service like Amazon S3.
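You normally drive Velero through its CLI (velero backup create and friends), but backups are themselves custom resources, so they can live in Git with the rest of your configuration. Roughly like this – do verify the exact field names against Velero's documentation:

```yaml
# Back up every namespace, including volume snapshots,
# and keep the backup for 30 days.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: full-cluster-backup
  namespace: velero
spec:
  includedNamespaces:
    - "*"
  snapshotVolumes: true
  ttl: 720h0m0s
```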
The Kubernetes Development Story is Still Under Development
The number one reason for migrating to Kubernetes is that it's hard to manage all those (micro)services – and if you think it's hard to run them on millions of dollars of enterprise server-grade hardware, it's even harder to run them all on an overheating Apple MacBook Pro.
At the conference, I saw multiple MacBook Pro owners look up in bewilderment as their workstations lifted off and flew out the window, fans spinning at takeoff speed. Okay, that last part is not completely true, but it's no surprise that dev environments were mentioned in the keynote. It's about the only complaint developers have about Kubernetes at my job.
But things are looking less bleak now that some serious effort is being invested in the issue. We're seeing several potential solutions, including minimal Kubernetes distributions like MicroK8s and K3s, as well as development tools like Telepresence, Skaffold, Tilt, Garden, and Azure Draft (see the sketch below). But I'm still waiting for a definitive answer here.
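To make the tooling side concrete, here's roughly what the inner loop looks like with one of those tools, Skaffold: running skaffold dev with a config like the one below rebuilds the image and redeploys your manifests on every source change. The image name and manifest path are placeholders:

```yaml
# Minimal Skaffold config: build the image, deploy the manifests,
# and loop on file changes with `skaffold dev`.
apiVersion: skaffold/v1
kind: Config
build:
  artifacts:
    - image: example.com/my-service   # placeholder image name
deploy:
  kubectl:
    manifests:
      - k8s/*.yaml                    # placeholder manifest path
```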
A big feature for developer friendliness was announced at the keynote: debugging using ephemeral sidecar containers. This lets you debug production images that don't ship with the typical debug tools, so you don't have to manually install them in your running pod. A nice feature (and potentially nice for attackers too, so keep that in mind!)
On another note, take a look at this huge list of kubectl productivity hacks from Daniel Weibel at Learnk8s.io.
Full list of videos
Lastly, it seems like all the talks are already available on YouTube, so go watch them here.
What are your opinions and experiences of Kubernetes as of late? Let me know in the comments.