Upgrades:
In general, any upgrade or update can fail, so your team needs more than one strategy. Here are a few suggestions:
- Test upgrades on a non-production environment first (a must).
- AKS Upgrade API: The in-place upgrade is the easiest option and should ideally just work. The risk is that it can sometimes fail.
- New Node Pool: To minimise risk, create a new node pool in the same cluster on the latest version and then drain the older one. This is a more risk-averse approach.
- Blue Green with Node Pool: The idea is to run workloads on two active node pools, so instead of one node pool holding all the VMs you use two node pools that split the same number of VMs between them. When upgrading, you apply the in-place upgrade to the first node pool and, once that succeeds, to the second. If an upgrade fails, you can scale up the healthy node pool and scale down the failed one.
- New Cluster: Creating a new cluster is a good approach, but it needs to be managed carefully because it brings in dependencies on other components such as ingress traffic, persistent volumes, etc.
- Blue Green with AKS Cluster: Another approach is to always have two clusters running workloads, so that when it comes to upgrading you can upgrade one cluster at a time.
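As a rough illustration of the New Node Pool approach, here is a minimal Python sketch that only builds the ordered list of CLI commands and does not run anything. The function name and every argument value are hypothetical. It assumes the control plane has to be on the target version before a pool on that version can be added, and that nodes carry the `agentpool=<pool name>` label; check both against your own cluster before relying on it.

```python
from typing import List


def new_node_pool_upgrade_plan(
    resource_group: str,
    cluster: str,
    old_pool: str,
    new_pool: str,
    target_version: str,
    node_count: int,
) -> List[List[str]]:
    """Return the commands, in order, for a new-node-pool upgrade (illustrative only)."""
    # Assumption: nodes in a pool are labelled agentpool=<pool name>.
    old_nodes = f"agentpool={old_pool}"
    return [
        # 1. Move the control plane to the target version first (assumed prerequisite).
        ["az", "aks", "upgrade",
         "--resource-group", resource_group,
         "--name", cluster,
         "--kubernetes-version", target_version,
         "--control-plane-only"],
        # 2. Add a new pool that already runs the target version.
        ["az", "aks", "nodepool", "add",
         "--resource-group", resource_group,
         "--cluster-name", cluster,
         "--name", new_pool,
         "--kubernetes-version", target_version,
         "--node-count", str(node_count)],
        # 3. Stop new pods from being scheduled on the old pool.
        ["kubectl", "cordon", "--selector", old_nodes],
        # 4. Evict running pods so they reschedule onto the new pool.
        ["kubectl", "drain", "--selector", old_nodes,
         "--ignore-daemonsets", "--delete-emptydir-data"],
        # 5. Remove the old pool once workloads are healthy on the new one.
        ["az", "aks", "nodepool", "delete",
         "--resource-group", resource_group,
         "--cluster-name", cluster,
         "--name", old_pool],
    ]
```

Printing each entry with `shlex.join` gives you a checklist to review and run by hand, which keeps a human in the loop between the drain and the delete steps.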
I came across an interesting MS Ignite talk that covers some of the approaches I discussed here.
Security Patches / Reboots:
Linux
- AKS automatically downloads and installs security fixes on each Linux node but does not reboot the node.
- You can run kured (KUbernetes REboot Daemon) as a DaemonSet on your cluster; it detects nodes that need a reboot and restarts them.
- https://docs.microsoft.com/en-us/azure/aks/operator-best-practices-cluster-security#process-linux-node-updates-and-reboots-using-kured
- In exceptional circumstances, such as a node experiencing a permanent failure whilst rebooting, manual intervention may be required to remove the cluster lock; see https://github.com/weaveworks/kured#manual-unlock
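To make the detection step a little more concrete, here is a minimal sketch of the kind of check kured performs on each node. It assumes the default sentinel file `/var/run/reboot-required`, which the package manager creates when an installed update needs a restart. The function name is hypothetical and this is not kured's actual code.

```python
from pathlib import Path

# Assumption: kured's default sentinel path; it can be configured differently.
DEFAULT_SENTINEL = Path("/var/run/reboot-required")


def reboot_required(sentinel: Path = DEFAULT_SENTINEL) -> bool:
    """Return True if the node has flagged that it needs a reboot."""
    return sentinel.exists()
```

Running this on a node (or against a copy of the file in a test directory) tells you whether installed patches are still waiting on a restart.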
Windows
- Microsoft does not apply patches to existing Windows nodes, so to get the latest patches you can either upgrade the node pool or upgrade the node image.
- This is the user's responsibility; details on how to do it are at https://docs.microsoft.com/en-us/azure/aks/windows-faq#how-do-i-patch-my-windows-nodes
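The two Windows patching options map onto the same CLI command with different flags. The sketch below only builds the command line and does not run it. The function name and argument values are hypothetical, so treat it as a starting point and follow the FAQ linked above for the authoritative steps.

```python
from typing import List, Optional


def windows_patch_command(
    resource_group: str,
    cluster: str,
    pool: str,
    kubernetes_version: Optional[str] = None,
) -> List[str]:
    """Build the az command that patches a Windows node pool (illustrative only)."""
    cmd = [
        "az", "aks", "nodepool", "upgrade",
        "--resource-group", resource_group,
        "--cluster-name", cluster,
        "--name", pool,
    ]
    if kubernetes_version is None:
        # Option 1: keep the Kubernetes version and move to the latest node image.
        cmd.append("--node-image-only")
    else:
        # Option 2: upgrade the whole node pool to a newer Kubernetes version.
        cmd += ["--kubernetes-version", kubernetes_version]
    return cmd
```

Calling it without a version gives the node-image-only upgrade; passing a version gives the full node pool upgrade.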
Useful AKS Resources
AKS Current preview features: https://aka.ms/aks/previewfeatures
AKS Release notes: https://aka.ms/aks/releasenotes
AKS Public roadmap: http://aka.ms/aks/roadmap
AKS Known issues: https://aka.ms/aks/knownissues