The previous article covered getting Rancher up and running using their Docker container. That is great for testing it out, but for production use an HA (high-availability) setup is the way to go.
I’m not going to cover the whole process, because Rancher has good guides for that. Instead, I’m going to cover how to get it up and running with LetsEncrypt certificates without a publicly available endpoint, using Google Cloud DNS and LetsEncrypt’s DNS-01 challenge. This guide is a mishmash of a bunch of articles out there. I did not find any that covered my specific needs, so here we are!
Again, I’m no expert. Just documenting my learnings in case it helps someone else and also to help the future me if I have to do this again.
As usual, if you see something wrong or something that could be improved, or if you simply want to leave a thank-you note, please do leave a comment.
Then follow the Rancher HA installation guide up until the step Choose your SSL Configuration. There, choose to use your own certificate, and install cert-manager using this command:
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v1.1.0 \
  --set installCRDs=true
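The command above assumes that the Jetstack chart repository is registered and that the cert-manager namespace already exists. Helm 3 does not create it unless you pass --create-namespace. The Rancher guide covers both steps. Below is a minimal sketch of them plus a quick check that the pods came up. The function names are hypothetical wrappers. The helm and kubectl commands inside them are the standard ones.

```shell
# Hypothetical wrapper: register the Jetstack chart repo and create the target namespace.
prepare_cert_manager() {
  helm repo add jetstack https://charts.jetstack.io
  helm repo update
  # helm install --namespace cert-manager expects this namespace to exist already.
  kubectl create namespace cert-manager
}

# Hypothetical wrapper: list the cert-manager pods after the chart is installed.
check_cert_manager() {
  kubectl get pods --namespace cert-manager
}
```

Run prepare_cert_manager before the helm install and check_cert_manager after it. The cert-manager, cert-manager-cainjector and cert-manager-webhook pods should all reach Running.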
Then, when you get to the step where you install Rancher, use this command:
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.my.org \
  --set ingress.tls.source=secret
Note that the value of ingress.tls.source is literally the word secret. This threw me off and had me troubleshooting for way longer than I would like to admit.
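The Rancher guide also has you add the rancher-latest chart repository and create the cattle-system namespace before that helm install. Afterwards you can wait for the deployment to finish rolling out. Here is a sketch of both, with hypothetical wrapper names around the commands the guide uses. Rancher’s pods can come up before the certificate exists. The browser will only get a valid certificate once the cert-manager steps below are done.

```shell
# Hypothetical wrapper: register the rancher-latest chart repo and create Rancher's namespace.
prepare_rancher() {
  helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
  helm repo update
  kubectl create namespace cattle-system
}

# Hypothetical wrapper: block until the rancher deployment has finished rolling out.
wait_for_rancher() {
  kubectl --namespace cattle-system rollout status deploy/rancher
}
```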
Adding Google Cloud DNS to the mix
Since Google Cloud DNS is supported by cert-manager, is cheap, and I know the platform pretty well, I went with this provider. If you use another provider, check out cert-manager’s list of supported DNS-01 providers.
Let Google Cloud DNS manage your domain
First, point your domain’s NS records at Google’s name servers. You can delegate the whole domain or just a subdomain. As long as Google Cloud DNS manages the domain you want to sign certificates for, you’re good to go.
The change might take an hour or more to propagate, depending on the TTL of your existing NS records.
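If you prefer the command line over the console, here is a rough sketch using gcloud and dig. It creates a managed zone, prints the name servers Google assigned to it, and shows what public DNS currently returns for your domain. The zone name my-zone, the domain my.org and the function names are hypothetical examples. Pass your own values.

```shell
# Hypothetical helpers; "my-zone" and "my.org" are example defaults.
# Create a Cloud DNS managed zone (the trailing dot marks a fully qualified name).
create_zone() {
  gcloud dns managed-zones create "${1:-my-zone}" \
    --dns-name="${2:-my.org}." \
    --description="Zone for cert-manager DNS-01 challenges"
}

# Print the name servers Google assigned to the zone; point your NS records at these.
zone_nameservers() {
  gcloud dns managed-zones describe "${1:-my-zone}" --format="value(nameServers)"
}

# Show which name servers public DNS currently returns for the domain.
current_nameservers() {
  dig +short NS "${1:-my.org}"
}
```

Once current_nameservers lists the same ns-cloud servers as zone_nameservers, the delegation has propagated.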
Create a service account
Depending on your GCP setup, the Service Accounts settings might live under IAM or Identity. Either way, go to the Service Accounts settings in the GCP console.
There, create a new service account with the role DNS Administrator. When you’re done, click Add Key -> Create New Key and pick the JSON format. Your browser will download the service account credentials as a JSON file. This file contains sensitive information, so keep track of it.
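The same console steps can be done with gcloud. This is a minimal sketch, assuming you pass your project ID as the first argument. The account name dns01-solver and the function name are hypothetical. roles/dns.admin is the role the console labels DNS Administrator.

```shell
# Hypothetical helper: create the account, grant DNS Administrator, download a JSON key.
create_dns01_service_account() {
  project="${1:?project ID required}"
  account="${2:-dns01-solver}"
  email="$account@$project.iam.gserviceaccount.com"
  gcloud iam service-accounts create "$account" --project "$project" --display-name "$account" &&
  # roles/dns.admin is the "DNS Administrator" role from the console.
  gcloud projects add-iam-policy-binding "$project" \
    --member "serviceAccount:$email" --role roles/dns.admin &&
  # Writes the key to ./<account>.json; treat this file as a secret.
  gcloud iam service-accounts keys create "$account.json" --iam-account "$email"
}
```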
Put the service account into a secret
For cert-manager to use the service account, it needs the contents of the JSON file you just downloaded. To make it accessible, we’ll store it in a secret called cloud-dns-key:
kubectl create secret generic cloud-dns-key \
  --namespace cert-manager \
  --from-file=<service account json file>
Make a note of the JSON file’s name, because you’ll need it for the ClusterIssuer below. Then delete the file from your hard drive.
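kubectl can also store the file under a key name of your choosing with --from-file=<key>=<path>. That way the key no longer depends on the generated file name. Below is a sketch that stores it as key.json, which is an arbitrary name. If you go this way, the key field in the ClusterIssuer below has to be key.json instead of the file name. The helper then runs kubectl describe, which lists the data keys and their sizes without printing the credentials.

```shell
# Hypothetical helper; "key.json" is an arbitrary key name, match it in the ClusterIssuer.
create_cloud_dns_secret() {
  # Store the downloaded file under the fixed key "key.json", then show the secret's keys.
  kubectl create secret generic cloud-dns-key \
    --namespace cert-manager \
    --from-file=key.json="${1:?path to the service account JSON file}" &&
  kubectl describe secret cloud-dns-key --namespace cert-manager
}
```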
Create the ClusterIssuer
The ClusterIssuer is the cluster-wide resource that tells cert-manager how to talk to LetsEncrypt in order to sign the certificates for you.
In the YAML below, change the following values:

| Field | What to put there |
|---|---|
| email | The mail address where you want to receive notices about certificate expiration. |
| privateKeySecretRef | The name of a secret that does not already exist in your cluster. cert-manager stores your ACME account’s private key there. You can check existing secrets with: kubectl get secret --all-namespaces |
| project | The ID of your Google project. Please note that the project ID commonly differs from the project name and often has some random characters appended. You can verify the full ID by opening the project selection drop-down and looking at the ID column. |
| key | The name of the service account JSON file you downloaded before. So if the file was called my-dns-686999ca5083.json, the key should also be my-dns-686999ca5083.json. |
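If you still have the key file, its project_id field holds the exact value the project field expects. Below is a rough sed sketch. It assumes the standard key file layout with a single "project_id": "<id>" pair. A real JSON tool such as jq would be sturdier. The function name is hypothetical.

```shell
# Hypothetical helper: print the project_id value from a service account key file.
# Sketch assumption: exactly one "project_id": "<id>" pair, as in Google's key files.
key_project_id() {
  sed -n 's/.*"project_id": *"\([^"]*\)".*/\1/p' "$1"
}
```

For example, key_project_id my-dns-686999ca5083.json prints the ID. If you have already deleted the file, gcloud projects list shows the same value in its PROJECT_ID column.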
kubectl apply --filename - <<EOF
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-issuer
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: firstname.lastname@example.org
    privateKeySecretRef:
      name: letsencrypt-issuer
    solvers:
    - dns01:
        clouddns:
          project: my-google-project-id
          serviceAccountSecretRef:
            name: cloud-dns-key
            key: my-dns-686999ca5083.json
EOF
Run the command, then verify that it succeeded with:
kubectl describe clusterissuer letsencrypt-issuer
At the bottom you should see something like this:
...
Status:
  Acme:
    Last Registered Email:  email@example.com
    Uri:                    https://acme-v02.api.letsencrypt.org/acme/acct/108071945
  Conditions:
    Last Transition Time:  2021-01-02T09:55:29Z
    Message:               The ACME account was registered with the ACME server
    Reason:                ACMEAccountRegistered
    Status:                True
    Type:                  Ready
Events:
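Instead of reading the describe output, you can have kubectl block until the Ready condition shown there turns True. Here is a minimal sketch. The function name and the two-minute timeout are arbitrary choices.

```shell
# Hypothetical helper: wait for the ClusterIssuer's Ready condition, giving up after two minutes.
wait_for_issuer() {
  kubectl wait --for=condition=Ready "clusterissuer/${1:-letsencrypt-issuer}" --timeout=120s
}
```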
Create the certificate
Now that the issuer is running, it’s time to create the certificate. If you used the example names above, all you need to change here is the dnsNames list, which should contain the domains you want signed. In my case I chose a wildcard certificate.
kubectl apply --filename - <<EOF
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: tls-rancher-ingress
  namespace: cattle-system
spec:
  secretName: tls-rancher-ingress
  issuerRef:
    name: letsencrypt-issuer
    kind: ClusterIssuer
  dnsNames:
  - "my.org"
  - "*.my.org"
EOF
Do not change the name, secretName or the namespace of the certificate. Rancher will look for a secret with the name tls-rancher-ingress in the namespace cattle-system.
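Run the command. If everything works, cert-manager now requests the certificate from LetsEncrypt and creates TXT validation records in Google Cloud DNS for the challenges it receives. You can look for these records in GCP to verify that the issuer has access. cert-manager removes them again once the challenges have been validated.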
You can also validate progress by running kubectl describe certificate -n cattle-system tls-rancher-ingress.
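To see each stage of the issuance, you can list the cert-manager resources involved. You can also query the TXT record LetsEncrypt checks. For both my.org and *.my.org, that record lives at _acme-challenge.my.org, and it only exists while a challenge is pending. Below is a sketch. The function names are hypothetical and my.org is the example domain from the Certificate above.

```shell
# Hypothetical helper: list the Certificate and the request/order/challenge objects behind it.
show_acme_progress() {
  kubectl get certificate,certificaterequest,order,challenge --namespace cattle-system
}

# Hypothetical helper: look up the DNS-01 TXT record (only present while a challenge is pending).
show_challenge_record() {
  dig +short TXT "_acme-challenge.${1:-my.org}"
}

# Hypothetical helper: block until the certificate is issued, giving up after five minutes.
wait_for_certificate() {
  kubectl wait --for=condition=Ready certificate/tls-rancher-ingress --namespace cattle-system --timeout=300s
}
```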
Finally, recycle the ingress-nginx pods so they pick up the new certificate:
kubectl delete pods -l app=ingress-nginx -n ingress-nginx
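To confirm that the ingress now serves the new certificate, you can inspect what the endpoint actually returns with openssl. Below is a sketch. The function name is hypothetical and rancher.my.org is the example hostname from the helm install command. The issuer line should name LetsEncrypt rather than the Kubernetes Ingress Controller Fake Certificate that ingress-nginx falls back to.

```shell
# Hypothetical helper: print issuer, subject and validity dates of the certificate served for a host.
show_served_certificate() {
  host="${1:-rancher.my.org}"
  # -servername sends SNI so ingress-nginx picks the right certificate.
  openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null |
    openssl x509 -noout -issuer -subject -dates
}
```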
Troubleshooting
Rancher has a good troubleshooting guide, but I found it lacking if you don’t understand the fundamentals of the installation procedure.
Below are a few of the things I ran into.
tls-rancher-ingress was not found
In my case this happened because I passed the name of the secret containing my certificate to the helm install command instead of simply the word secret. The solution was to set ingress.tls.source to literally the word secret and then use cert-manager to create a certificate with the secret name tls-rancher-ingress in the namespace cattle-system. I also needed to recycle the ingress-nginx pods (see above).
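Two quick checks help narrow this down. helm get values shows which ingress.tls.source Rancher was actually installed with. kubectl get secret shows whether the tls-rancher-ingress secret exists yet. Below is a sketch with a hypothetical function name.

```shell
# Hypothetical helper: show the values Rancher was installed with and whether the TLS secret exists.
check_rancher_tls() {
  helm get values rancher --namespace cattle-system
  kubectl get secret tls-rancher-ingress --namespace cattle-system
}
```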
No matches for kind “Certificate” in version…
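The error message could also read “no matches for kind “ClusterIssuer” in version”. It means the cluster does not serve that kind under the apiVersion in your manifest. Make sure the cert-manager CRDs are installed (the installCRDs=true flag above takes care of that). Also check that the apiVersion in your ClusterIssuer and Certificate manifests is one your cert-manager version serves. With cert-manager 1.1 (the latest stable release at the time of writing), cert-manager.io/v1alpha2 as used above still works, but cert-manager.io/v1 is the stable API, and newer cert-manager releases have dropped v1alpha2 entirely.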
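To see what you are actually running, compare the chart version Helm installed with the cert-manager API versions the cluster serves. If the second command prints nothing, the CRDs are not installed. Below is a sketch with a hypothetical function name.

```shell
# Hypothetical helper: show the installed cert-manager release and the cert-manager API versions served.
show_cert_manager_versions() {
  helm list --namespace cert-manager
  # Matches both cert-manager.io and acme.cert-manager.io groups.
  kubectl api-versions | grep -F 'cert-manager.io'
}
```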
Troubleshoot the ACME challenge
The people over at cert-manager did a great job with their guide on troubleshooting ACME challenges. Recommended.
Wrong project ID
Don’t forget that your project ID might be domain-com while your domain might be domain.com. It’s easy to mix these two up. *cough* I might have done that myself and scratched my head for a while.
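A quick way to catch this is to print the project the ClusterIssuer was configured with and compare it against your real project IDs. Below is a sketch with hypothetical function names. It assumes the Cloud DNS solver is the first entry in the solvers list, as in the manifest above.

```shell
# Hypothetical helper: print the Cloud DNS project configured in the ClusterIssuer's first solver.
issuer_project() {
  kubectl get clusterissuer "${1:-letsencrypt-issuer}" \
    --output jsonpath='{.spec.acme.solvers[0].dns01.clouddns.project}'
}

# Hypothetical helper: list project IDs next to their display names for comparison.
list_projects() {
  gcloud projects list --format="table(projectId,name)"
}
```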
If you’ve gone too deep down the rabbit hole and can’t find your way back, you might want to delete the namespaces involved and start over, e.g. kubectl delete namespace cattle-system.
The namespace might get stuck in the Terminating state when you do this. To solve that, run kubectl edit namespace cattle-system and find the finalizers entries.
Delete these lines and save the file. Then the namespace should disappear.
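If you’d rather not edit YAML by hand, a JSON merge patch that sets a field to null deletes that field. Below is a sketch with a hypothetical function name. It assumes the entries you want gone sit under metadata.finalizers. Namespace spec.finalizers cannot be cleared this way, because they only change through the namespace’s finalize subresource. Use it with care: finalizers exist so controllers can clean up, and removing them skips that cleanup.

```shell
# Hypothetical helper: drop metadata.finalizers from a namespace stuck in Terminating.
# Skips whatever cleanup those finalizers were waiting for; use only when starting over.
clear_namespace_finalizers() {
  kubectl patch namespace "${1:-cattle-system}" --type=merge \
    --patch '{"metadata":{"finalizers":null}}'
}
```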