Maintenance Guide¶
This guide provides instructions for regular maintenance tasks necessary to ensure the smooth and secure operation of the system.
Evacuating Nodes for Maintenance¶
When you need to perform maintenance on a node, you will need to evacuate the node to ensure that no workloads are running on it. Depending on the type of node you are evacuating, you will need to use different commands.
Control Plane Node¶
To evacuate a control plane node, you will need to drain the node. This will cause all the control plane components to be moved to other nodes in the cluster. To drain a control plane node, run the following command against the node you want to drain:
$ kubectl drain <node-name> --ignore-daemonsets --delete-local-data <node-name>
In the example above, you would replace <node-name>
with the name of the
node you want to drain. Once this process is complete, you can safely perform
maintenance on the node.
When you are done with the maintenance, you can uncordon the node by running the following command:
$ kubectl uncordon <node-name>
Compute Node¶
In order to evacuate a compute node, you will need to start by disabling the OpenStack compute service on the node. This will prevent new workloads from being scheduled on the node. To disable the OpenStack compute service, run the following command against the node you want to evacuate:
$ openstack compute service set --disable <node-name> nova-compute
In the example above, you would replace <node-name>
with the name of the
node you want to evacuate. Once the OpenStack compute service has been
disabled, you will need to evacuate all the virtual machines running on the
node. To do this, run the following command:
$ nova host-evacuate-live <node-name>
In the example above, you would replace <node-name>
with the name of the
node you want to evacuate. This command will live migrate all the virtual
machines running on the node to other nodes in the cluster.
Note
It is generally not recommended to use the nova
client however the
nova host-evacuate-live
command is not available in the openstack
client (see bug 2055552).
You can monitor the progress of this operation by seeing if there are any VMs left on the node by running the following command:
$ openstack server list --host <node-name>
Once all the virtual machines have been evacuated, you can safely perform maintenance on the node. When you are done with the maintenance, you can reenable the OpenStack compute service by running the following command:
$ openstack compute service set --enable <node-name> nova-compute
Note
Once you enable the compute service, the node will start accepting new VMs but it will not automatically move the VMs back to the node. You will need to manually move the VMs back to the node if you want them to run there.
Renewing Certificates¶
The certificates used by the Kubernetes cluster are valid for one year. They are automatically renewed when the cluster is upgraded to a new version of Kubernetes. However, if you are running the same version of Kubernetes for more than a year, you will need to manually renew the certificates.
To renew the certificates, run the following command on each one of your control plane nodes:
$ sudo kubeadm certs renew all
Once the certificates have been renewed, you will need to restart the Kubernetes control plane components to pick up the new certificates. You need to do this on each one of your control plane nodes by running the following command one at a time on each node:
$ ps auxf | egrep '(kube-(apiserver|controller-manager|scheduler)|etcd)' | awk '{ print $2 }' | xargs sudo kill