This stumped me for a while since I did not find any article that explained it in a TL;DR way. Frankly, I still don’t understand it fully, but it works, and for now I’m going to leave it at that. 🙂
As always, if you see something that is not correct or can be improved upon or just want to leave a thanks, please leave a comment below.
Why should we do this?
Previously, cloud provider code was built into the Kubernetes code base (“in-tree”). Since a vendor like VMware had to wait for a new Kubernetes release to get new code out there, it was not a very efficient process. On top of that, the code base gets bulkier with all these features shipped in every release. I would guess most people don’t use more than one cloud provider in a cluster, so that means a larger footprint for features that are never used.
That said, there is a functioning provider “in-tree”, but it won’t be getting new features, so VMware has made it clear where the future is.
Some prep work
First we need to create a VM Storage Policy in vCenter. This abstracts the way storage is chosen when someone asks for a persistent volume. Go to Menu -> Policies and Profiles and then click on VM Storage Policies.
There are likely already a few of them there. I created my own, but if you have one you like you can just skip this step. The wizard was pretty simple: in my case I gave it a name, enabled host based rules, chose custom encryption and left it at the default, noted that all my local disks were compatible and then clicked Finish.
All you need here is one storage policy to use as the default way for VMware to choose disks for the persistent volumes, but if you have different tiers of disks, like slow disks and fast disks, you can work with e.g. tagging and have different storage policies named e.g. Fast and Slow. Gold, Silver and Bronze also seem popular with the cool kids.
Installing the vSphere provider
First, the installation. I simply followed the guide that Rancher provides, but read the three sections below before you do anything so you’re well prepared. The Marketplace installation is really easy. Warning: if you’re adding this to an existing cluster you will probably cause disruptions. I did it, but this is my home lab so I did not care much. I did not lose any data, but then again, experiences may vary!
If you have an existing cluster
If you have an existing cluster you need to pre-taint your nodes with node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule. The reason behind this is that the vSphere cloud controller will double-check that the nodes are OK before they enter the cluster.
# Taint every node (tail -n +2 skips the header line of kubectl get nodes)
for node in $(kubectl get nodes | awk '{print $1}' | tail -n +2); do
  kubectl taint node "$node" node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
done
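To verify that the taint landed, a quick check like this should list it on every node:
# Each node’s Taints line should now show the uninitialized taint
kubectl describe nodes | grep Taints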
If you have an existing cluster you also need to change rancher_kubernetes_engine_config.cloud_provider.name to external. This re-configured my cluster but did not destroy my nodes.
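For reference, this is roughly where that setting lives if you edit the cluster as YAML in Rancher (a minimal sketch, everything else in the cluster config omitted):
# Rancher cluster YAML, abbreviated
rancher_kubernetes_engine_config:
  cloud_provider:
    name: external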
Notes regarding the installation
- Login with the full user notation, i.e. domain.local\rancher as opposed to just rancher
- In the step where you configure the Storage class in the CSI installation you will refer to the VM Storage Policy you created earlier. I left the storage class name at the default and entered my vSphere storage policy (see the sketch after this list).
- I left the Node configuration blank. My nodes are managed by Rancher and based on Ubuntu + RKE.
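For the curious, the Kubernetes StorageClass that the installation creates ends up looking roughly like this. This is a sketch, with my-storage-policy standing in for whatever policy you created earlier; the provisioner name is the one the vSphere CSI driver registers:
# Illustrative StorageClass tying Kubernetes to the VM Storage Policy
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-csi-sc
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "my-storage-policy"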
VM Machine version
The last thing you might want to check is whether your nodes are running on a machine version of 13 or higher. If not, there will be issues later on when trying to attach the persistent volumes to the nodes. You can check this in the VM details under Compatibility.
If it’s below 13 I would advise you to schedule an upgrade by right-clicking on the VM, choosing Compatibility and scheduling an upgrade. After this is done, cordon off the node in the Rancher cluster manager and reboot it to upgrade it.
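If you prefer the CLI over the Rancher UI for the cordoning part, kubectl works too (the node name is just an example):
# Stop new pods from being scheduled, then evict running pods before the reboot
kubectl cordon rancher-prod3
kubectl drain rancher-prod3 --ignore-daemonsets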
You might also want to change the default machine version by right-clicking on the Data Center in vCenter and clicking Edit Default VM Compatibility. This did not help me when creating new nodes though, as the template I was using had a compatibility version of 10. In order to fix this I had to convert my template to a virtual machine, upgrade the machine version and then convert it back to a template.
You also might want to keep an eye on this GitHub issue for a way to do this via the Node template in Rancher.
Creating and using a Persistent Volume
Woah, this turned into a longer article than I originally planned. Now to the fun stuff: actually putting what we just prepared into practice. From what I’ve read, the normal way to provision persistent storage in Kubernetes is to first pre-allocate disk space by creating a volume, then create a PersistentVolumeClaim to claim a part of, or the whole, volume.
With the CSI provider, vSphere will take care of the PersistentVolume for you, and all you need is to define a PersistentVolumeClaim with the correct storage class name.
Example Persistent Volume Claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: grafana
spec:
  storageClassName: vsphere-csi-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
Note that the storage class here is the default storage class name that was chosen for you when installing CSI. Don’t confuse the Kubernetes StorageClass with the vSphere Storage Policy!
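Once you apply the claim you can watch the dynamic provisioning happen: the PVC should go to Bound and a PersistentVolume appears for you behind the scenes (output will of course vary):
# The PVC binds, and a matching PV is created automatically
kubectl -n grafana get pvc grafana-pvc
kubectl get pv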
Using the volume
There are probably thousands of examples out there, but to make this post complete, here’s one example of how to run Grafana with the PVC above:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - image: grafana/grafana:7.5.2
          name: grafana
          ports:
            - containerPort: 3000
              name: http
          volumeMounts:
            - name: grafana-storage
              mountPath: /var/lib/grafana
      volumes:
        - name: grafana-storage
          persistentVolumeClaim:
            claimName: grafana-pvc
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
        fsGroup: 472
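Assuming you saved the two manifests to files, deploying is the usual routine (the file names are just examples):
# Create the namespace, then apply the claim and the deployment
kubectl create namespace grafana
kubectl apply -f grafana-pvc.yaml
kubectl apply -f grafana-deployment.yaml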
Chaos monkey-ish
Find out where the pod is running:
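The -o wide flag does the trick; the NODE column shows where the pod landed:
kubectl -n grafana get pods -o wide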
Kill the node using vCenter:
Grafana still lives! Let’s delete the node:
Rancher starts to spin up a new machine directly, and Grafana lives:
And when rancher-prod3 is shutting down, vCenter attaches the disk to the next available node:
Success!
Some final words
When you create the persistent volume claim above, vSphere will create a disk and attach it to the node running the pod in question. This happens automatically, and vSphere will move the disk and pods around for you if nodes are restarted. You can see all of this happening in real time in vCenter!
One last thing that you might want to note is that newly provisioned nodes will have the aforementioned taint for a while before the vSphere controller marks them as ready.
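If a fresh node seems to refuse workloads, it’s worth checking whether the taint is still in place (the node name is just an example); an empty result means the cloud controller has initialized the node:
kubectl get node rancher-prod4 -o jsonpath='{.spec.taints}'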