Internal Rancher HA + LetsEncrypt + Google Cloud DNS

The previous article covered getting Rancher up and running using their container service. This is great for testing it out, but for production purposes an HA setup is the way to go.

I’m not going to cover the whole process, because they have good guides for that. Instead, I’m going to cover how to get it up and running with LetsEncrypt, without having a publicly available endpoint, using Google Cloud DNS and LetsEncrypt’s DNS-01 challenge. This guide is a mishmash of a bunch of articles out there. I did not find any that covered my specific needs, so here we are!

Again, I’m no expert. Just documenting my learnings in case it helps someone else and also to help the future me if I have to do this again.

As usual, if you see something wrong, something that could use improvement, or simply want to leave a thank-you note, please do leave a comment.

Prerequisites

First you need a functioning Kubernetes cluster. I used RKE and they have an excellent guide here. Trust me, it’s easy and if you think that is hard you should try this one.
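
If you want a starting point for RKE, a minimal cluster.yml for a three-node cluster might look roughly like the sketch below. The addresses, user and SSH key path are placeholders for your own environment:

# cluster.yml - minimal sketch, adjust addresses, user and key path to your nodes
nodes:
  - address: 10.0.0.11
    user: ubuntu
    role: [controlplane, etcd, worker]
    ssh_key_path: ~/.ssh/id_rsa
  - address: 10.0.0.12
    user: ubuntu
    role: [controlplane, etcd, worker]
    ssh_key_path: ~/.ssh/id_rsa
  - address: 10.0.0.13
    user: ubuntu
    role: [controlplane, etcd, worker]
    ssh_key_path: ~/.ssh/id_rsa

Running rke up --config cluster.yml in the same directory then brings up the cluster and writes a kubeconfig you can use for the rest of this guide.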

Then you can start following this guide up until the step Choose your SSL Configuration, where you choose to use your own certificate and install cert-manager using these commands (do not run the cert-manager install commands from the Rancher guide, or else you’ll get old CRDs installed):

kubectl create namespace cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v1.1.0 \
  --set installCRDs=true
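
Before continuing, it is worth verifying that cert-manager came up properly:

kubectl get pods --namespace cert-manager

You should see the cert-manager, cert-manager-cainjector and cert-manager-webhook pods in a Running state.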

Then, when you come to the step where you install Rancher, you’ll use this command:

helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.my.org \
  --set ingress.tls.source=secret

Note that the value of ingress.tls.source is literally the word secret. This threw me off and had me troubleshooting for way longer than I would like to admit.
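
While the chart rolls out you can keep an eye on the deployment. It won’t be fully functional until the tls-rancher-ingress secret exists, which we’ll create further down, but the rollout itself can be watched with:

kubectl -n cattle-system rollout status deploy/rancher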

Depending on whether you’ve configured name resolution on your nodes, you might run into problems with DNS. See the comment from Matson Kepson below for guidance.

Adding Google Cloud DNS to the mix

Since Google Cloud DNS is supported by cert-manager, is cheap, and I know their platform pretty well, I went with this provider. If you use another provider, check out cert-manager's list of supported providers here.

Let Google Cloud DNS manage your domain

First you want to point your domain to Google's name servers by updating its NS records. You can point the whole domain there or delegate just a subdomain. As long as Google Cloud DNS is managing the domain you want to sign certificates for, you're good to go.

This might take an hour or more to propagate, depending on the TTL of your NS records.
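
You can check whether the delegation has gone through by querying the NS records yourself, for example with dig (my.org being the example domain used throughout this guide):

dig NS my.org +short

Once the answer lists the ns-cloud-*.googledomains.com name servers assigned to your Cloud DNS zone, you're good to go.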

Create a service account

Depending on your GCP setup, the Service Account settings might reside under IAM or Identity. Either way, go to the Service Account settings in GCP.

There you want to create a new service account with the role DNS Administrator. When you're done, click Add Key -> Create New Key and then pick the JSON format. Your browser will download the service account credentials as a JSON file to your computer. This file contains sensitive information, so keep track of it.
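
If you prefer the command line, the same thing can be done with gcloud. This is a sketch that assumes the project ID my-google-project-id used later in this guide and a service account name of your choosing:

# Create the service account
gcloud iam service-accounts create cert-manager-dns \
  --display-name "cert-manager DNS01 solver"

# Grant it the DNS Administrator role on the project
gcloud projects add-iam-policy-binding my-google-project-id \
  --member serviceAccount:cert-manager-dns@my-google-project-id.iam.gserviceaccount.com \
  --role roles/dns.admin

# Download a JSON key (the file name will later become the key inside the cloud-dns-key secret)
gcloud iam service-accounts keys create my-dns-key.json \
  --iam-account cert-manager-dns@my-google-project-id.iam.gserviceaccount.com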

Put the service account into a secret

In order for cert-manager to use the service account, it needs the content of the JSON file you just downloaded. To make it accessible we'll create a secret called cloud-dns-key:

kubectl create secret \
  --namespace cert-manager generic cloud-dns-key \
  --from-file=<service account json file>
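
You can double-check that the secret was created, and which key name it got (the key will be the original file name, which the ClusterIssuer needs later), with:

kubectl describe secret cloud-dns-key --namespace cert-manager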

Make a note of the name of the JSON file and then delete it from your hard drive.

Cluster Issuer

This is the internal service that will interface with LetsEncrypt in order to sign the certificates for you.

In the YAML below you need to change these things:

email
  This is the mail address where you want to get information about cert expirations.

privateKeySecretRef
  Give this the name of an unused secret in your cluster. You can check existing secrets with: kubectl get secret --all-namespaces

project
  This is the ID of your Google project. Please note that the project ID commonly contains some random characters. You can verify the full ID by clicking on the project selection drop-down and looking at the ID column.

key
  This is the name of the service account JSON file you downloaded before. So if the name was my-dns-686999ca5083.json, the key should also be my-dns-686999ca5083.json.

kubectl apply --filename - <<EOF
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-issuer
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: myemail@domain.com
    privateKeySecretRef:
      name: letsencrypt-issuer
    solvers:
    - dns01:
        clouddns:
          project: my-google-project-id
          serviceAccountSecretRef:
            name: cloud-dns-key
            key: my-dns-686999ca5083.json
EOF

Run the command above and verify that it succeeded with:

kubectl describe clusterissuer letsencrypt-issuer

At the bottom you should see something like this:

...
Status:
  Acme:
    Last Registered Email:  myemail@domain.com
    Uri:                    https://acme-v02.api.letsencrypt.org/acme/acct/108071945
  Conditions:
    Last Transition Time:  2021-01-02T09:55:29Z
    Message:               The ACME account was registered with the ACME server
    Reason:                ACMEAccountRegistered
    Status:                True
    Type:                  Ready
Events:

Create the certificate

Now that we have a running issuer, it's time to create the certificate. If you used the example names above, all you need to change here is the dnsNames array to contain the domains you wish to sign. In my case I chose a wildcard certificate.

kubectl apply --filename - <<EOF
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: tls-rancher-ingress
  namespace: cattle-system
spec:
  secretName: tls-rancher-ingress
  issuerRef:
    name: letsencrypt-issuer
    kind: ClusterIssuer
  dnsNames:
  - "my.org"
  - "*.my.org"
EOF

Do not change the name, secretName or the namespace of the certificate. Rancher will look for a secret with the name tls-rancher-ingress in the namespace cattle-system.

Run the command. If everything was successful, cert-manager should now create validation records in Cloud DNS based on the challenges from LetsEncrypt. You can look for these records in GCP to verify that the issuer has access.

You can also validate progress by running kubectl describe certificate -n cattle-system tls-rancher-ingress.
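
If you want to watch the DNS side of things, you can also list the record sets in your managed zone while the challenge is in flight and look for the _acme-challenge TXT records (my-zone is a placeholder for the name of your Cloud DNS zone):

gcloud dns record-sets list --zone my-zone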

Finally, recycle the ingress-nginx pods to make them pick up the new certificate:

kubectl delete pods -l app=ingress-nginx -n ingress-nginx

Troubleshooting

Rancher has a good troubleshooting guide, but I found it lacking if one does not understand the fundamentals of the installation procedure.

I’ll list two of the things I ran into below.

tls-rancher-ingress was not found

In my case this was due to me setting ingress.tls.source to the name of the secret containing my certificate instead of simply using secret when running the helm install command. The solution was to set ingress.tls.source to literally the word secret and then let cert-manager create a certificate called tls-rancher-ingress in the namespace cattle-system. I also needed to recycle the pods (see above).
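
If you already ran helm install with the wrong value, you shouldn't need to start from scratch; something like the following upgrade (with your own hostname) corrects the setting in place:

helm upgrade rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.my.org \
  --set ingress.tls.source=secret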

No matches for kind “Certificate” in version…

This error message could also read "no matches for kind ClusterIssuer in version ...". Make sure that you have the latest stable version of cert-manager (1.1 at the time of writing this post) and that you're using apiVersion: cert-manager.io/v1alpha2 in the ClusterIssuer and Certificate manifests.
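
A quick way to see which cert-manager CRDs are actually installed in the cluster:

kubectl get crd | grep cert-manager.io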

Troubleshoot the ACME challenge

The people over at cert-manager did a great job with this guide. Recommended.
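
When a certificate appears stuck, the chain of resources cert-manager creates (CertificateRequest, Order, Challenge) usually points to where it failed. For example:

kubectl get certificaterequests,orders,challenges --namespace cattle-system
kubectl describe challenge --namespace cattle-system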

Wrong project ID

Don’t forget that your project ID might be domain-com while your domain might be domain.com. It’s easy to mix these two up. *cough* I might have done that myself and scratched my head for a while.
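
Listing your projects with gcloud settles it quickly; the PROJECT_ID column is what goes into the ClusterIssuer's project field:

gcloud projects list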

Nuke everything

If you’ve gone too deep into the rabbit hole and can’t find your way back, you might wish to delete the affected namespaces using e.g. kubectl delete namespace cattle-system.

The namespace might get stuck in the Terminating state when you do this. To solve that, run kubectl edit namespace cattle-system and find:

finalizers:
- controller.cattle.io/namespace-auth

Delete these lines and save the file. Then the namespace should disappear.
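
If you'd rather not hand-edit the namespace, a merge patch that blanks the metadata finalizers does the same thing. A sketch of that manual removal, to be used with care:

kubectl patch namespace cattle-system --type merge \
  --patch '{"metadata":{"finalizers":null}}'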


2 thoughts on “Internal Rancher HA + LetsEncrypt + Google Cloud DNS”

  1. Hi,
    this is a good start.
    A few pointers I would like to share here which helped me:

    1. You do not need to deploy to the cattle-system namespace, cert-manager is sufficient.
    2. Create the cert only for the wildcard domain.
    3. MOST IMPORTANT: add this directly to the cert-manager deployment:

    extraArgs:
    - --dns01-recursive-nameservers="8.8.8.8:53"
    - --dns01-recursive-nameservers-only=true

    Taken from here: https://github.com/jetstack/cert-manager/issues/909#issuecomment-504317585

    The rest works like a charm.

    My config:
    rancher+gcp+LE wildcard
    Thanks!

    1. Thank you Matson!

      I’m a bit too busy at the moment to rewrite and test again but I referred to your comment in the guide.
