Installing, Troubleshooting and Running BigIP Ingress Controller

I found that one of the biggest hurdles to overcome if you’re not in a cloud environment is how to get traffic into your cluster. You could use a cloud provider for this, such as Cloudflare, Akamai or Volterra, and expose node ports on the internet. Or you can use the BigIP Controller to automatically configure your local F5 load balancers to ship traffic to the cluster.

As always, if I got something wrong or could explain something better, leave a comment.

The aim of this article

I’m running this on a Rancher v2.5.7 user cluster together with an Istio ingress controller. The aim is to let F5 handle the network layer and Istio the application routing. This means that I will not do any SSL termination or application routing in the F5; instead it will just do network load balancing. I might add other things to this article later on though.

Also, I’m using Argo in my cluster and will include some screenshots to visualize what the deployment looks like, but you can just deploy the helm charts directly if you like. If you’d like to try Argo, I have written a guide for getting started with Argo on Rancher here.

Documentation

Rant

Skip this section if you don’t want your eyes to bleed. Boy, did they make this hard on people. A Google search puts version 1.0 at the top, and clicking it takes you to a page with no hint that you’re not browsing the documentation for the latest version. There’s also no link to the latest version.

Looking a bit further down the Google search page yields v1.5.x, but that’s not the latest version either. I also found a link somewhere with latest in the name, and it takes you to v1.1!

You can find the latest versions on the releases page on GitHub and amend the documentation URL manually.
https://clouddocs.f5.com/products/connectors/k8s-bigip-ctlr/vX.Y/

The latest version on GitHub is v2.3, but entering v2.3 in the URL leads to a dead link.

But are we done? NOPE. Clicking on the Kubernetes and OpenShift installation links takes you to a dead end. Luckily the Helm Chart link worked. Happy days!

Useful documentation links

Skipped the rant above and here for the goodies? Good choice, here you go!

The list below is basically me cherry-picking the pages I found most useful from this documentation page.

Other useful links

Deploying the BIG-IP Container Ingress Service

Prepare your F5s

Start by creating a partition that your controller can manage. Please note that this partition should not be shared by any other deployment, be it manual stuff or another controller.

I called mine “rancher”. Very innovative.
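If you prefer the CLI over clicking around in the GUI, the same partition can be created from tmsh on the BIG-IP (a minimal sketch, assuming the partition name “rancher”):

# Create a dedicated partition for the controller to manage
tmsh create auth partition rancher

# Verify that it exists
tmsh list auth partition rancher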

Then install AS3 on the BIG-IP by downloading the RPM from the AS3 GitHub Releases page and then importing it under Package Management LX.
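If you’d rather script this part, the RPM can also be installed over iControl REST. A rough sketch, assuming the file name f5-appsvcs-3.26.0-5.noarch.rpm and shell access to copy the file over:

# Copy the downloaded AS3 RPM to the BIG-IP (file name is an example, use the release you downloaded)
scp f5-appsvcs-3.26.0-5.noarch.rpm admin@bigip-01.domain.se:/var/config/rest/downloads/

# Ask iControl REST to install the uploaded package
curl -k -u "admin:<password>" -X POST https://bigip-01.domain.se/mgmt/shared/iapp/package-management-tasks \
  -H "Content-Type: application/json" \
  -d '{"operation":"INSTALL","packageFilePath":"/var/config/rest/downloads/f5-appsvcs-3.26.0-5.noarch.rpm"}'

# AS3 should answer once the installation has finished
curl -k -u "admin:<password>" https://bigip-01.domain.se/mgmt/shared/appsvcs/info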

Installing BigIP Controller onto your Kubernetes Cluster

They document two main ways of installing the chart, with or without the CRDs. Funnily enough, running the command without --skip-crds still does not install them for you, so it would seem that you need to do that manually.

It’s super easy though. Just apply the YAML file from the crds subfolder of the GitHub repository using the following command:

kubectl apply -f https://raw.githubusercontent.com/F5Networks/charts/master/src/stable/f5-bigip-ctlr/crds/f5-bigip-ctlr-customresourcedefinitions.yml
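A quick sanity check that the CRDs actually landed (the cis.f5.com group comes from the manifest itself):

kubectl get crds | grep cis.f5.com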

Then continue by creating a secret containing credentials for your F5s. The official example specifies the admin user, but I’d recommend never using the local admin account. The Resource Administrator role is good enough for what the controller needs to do, and terminal access can be disabled.

kubectl create secret generic f5-bigip-ctlr-login -n kube-system \
--from-literal=username=bigip-controller --from-literal=password=<password>
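If the bigip-controller account doesn’t exist on the BIG-IP yet, a tmsh one-liner along these lines should do it (a sketch, matching the Resource Administrator / no terminal access recommendation above):

tmsh create auth user bigip-controller password <password> shell none \
  partition-access add { all-partitions { role resource-admin } }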

Then add the F5 helm repo:

helm repo add f5-stable https://f5networks.github.io/charts/stable
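Refresh the repo index and verify that the chart is available:

helm repo update
helm search repo f5-stable/f5-bigip-ctlr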

Create a new folder where all of the magic yaml juice can be stored:

mkdir ./bigipcontroller
cd ./bigipcontroller

Then create a file called values.yaml and populate it with the following:

bigip_login_secret: f5-bigip-ctlr-login
rbac:
  create: true
serviceAccount:
  create: true
  name: bigip-ctlr-service-account
namespace: kube-system
args:
  bigip_url: bigip-01.domain.se
  bigip_partition: rancher
  log_level: INFO
  pool_member_type: nodeport
  insecure: true
  custom-resource-mode: true
  log-as3-response: true
image:
  user: f5networks
  repo: k8s-bigip-ctlr
  pullPolicy: Always
version: latest

The configuration above is the bare minimum, but there’s a bunch of other options available, like a node selector which chooses which nodes should end up in the pools that are created. More information here.
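If you want to see every option the chart accepts before deciding, helm can print the chart’s default values for you:

helm show values f5-stable/f5-bigip-ctlr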

Now, either you deploy with helm directly, or use the template function to output the YAML files for git storage.

# PICK ONE
# Save the helm chart output to deployment.yaml and then kubectl apply it manually
# helm template -f values.yaml bigip-controller f5-stable/f5-bigip-ctlr > deployment.yaml

# Install directly with helm
# helm install -f values.yaml bigip-controller f5-stable/f5-bigip-ctlr

When you deploy the YAML, either the template output or via Helm directly, your Kubernetes cluster will start to pull a bunch of containers and try to deploy them. Will it work the first time? Probably not. 🙂

Here’s the set of initial containers in Argo being executed after running the installation. Uh, oohhh, there’s a broken heart!

To get something similar from the command line, list the pods in kube-system and look for failed ones; see the troubleshooting section below for how that was solved.
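Something like this will do, assuming the release name bigip-controller from the helm commands above:

kubectl get pods -n kube-system | grep bigip-controller
kubectl rollout status -n kube-system deployment/bigip-controller-f5-bigip-ctlr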

Declaring F5 resources

I might add more examples here later on, but for now I will only cover the network load balancing example.

Network Load Balancing

After the broken heart was mended it’s time to create the ingress configuration on the F5 itself. This is done by applying YAML manifests to the cluster, which the controller then translates into F5 configuration.

apiVersion: "cis.f5.com/v1"
kind: VirtualServer
metadata:
  namespace: istio-system
  name: istio-vs
  labels:
    f5cr: "true"
spec:
  virtualServerAddress: "192.168.1.225"
  virtualServerHTTPSPort: 443
  tlsProfileName: bigip-tlsprofile
  httpTraffic: none
  pools:
  - service: istio-ingressgateway
    servicePort: 443
---
apiVersion: cis.f5.com/v1
kind: TLSProfile
metadata:
  name: bigip-tlsprofile
  namespace: istio-system
  labels:
    f5cr: "true"
spec:
  tls:
    clientSSL: ""
    termination: passthrough
    reference: bigip

The configuration above creates:

  • A virtual server with destination 192.168.1.225:443
  • A pool containing all your cluster nodes (provided you did not specify the aforementioned node selector option)
  • A policy which forwards all traffic to the pool

Worth noting

  • You can skip the TLS Profile above, but if you do the controller will create a port 80 virtual server too
  • The service you specify must match an existing Service with that name in the cluster
  • The VirtualServer must exist in the same namespace as the Service.
  • The servicePort must match the Service port of the service (NOT the nodePort or the targetPort).
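Once the manifests look right, save them to a file and apply them, then check that the controller picked up the custom resources (the kinds come from the CRDs installed earlier; the file name is just an example):

kubectl apply -f istio-vs.yaml
kubectl get virtualserver,tlsprofile -n istio-system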

Here’s the final result of a network load balancing configuration:

Troubleshooting

Authentication issues

Did yours work better? Probably not. OK, so the pod called bigip-controller-f5-bigip-ctlr failed. If you have a support contract you can always try to use it.


Personally I don’t like to wait, so here’s what I did to solve my problem.

kubectl describe -n kube-system pod bigip-controller-f5-bigip-ctlr-6f77d6f478-b6lqd
... stuff removed for brevity ...
Events:
  Type     Reason   Age    From     Message
  ----     ------   ----   ----     -------
  Normal   Pulling  36m    kubelet  Pulling image "f5networks/k8s-bigip-ctlr:latest"
  Warning  BackOff  100s   kubelet  Back-off restarting failed container

Not very helpful, let’s check the logs!

kubectl logs -n kube-system bigip-controller-f5-bigip-ctlr-6f77d6f478-b6lqd
2021/03/26 07:33:31 [INFO] [INIT] Starting: Container Ingress Services - Version: 2.3.0, BuildInfo: azure-65-f3c176bb7132859516810114ec3547b75df7c37a
2021/03/26 07:33:31 [INFO] ConfigWriter started: 0xc0003b3c20
2021/03/26 07:33:31 [INFO] Started config driver sub-process at pid: 17
2021/03/26 07:33:31 [INFO] [CORE] NodePoller (0xc00046d710) registering new listener: 0x1361ea0
2021/03/26 07:33:31 [INFO] Posting GET BIGIP AS3 Version request on https://bigip-01.domain.se/mgmt/shared/appsvcs/info
2021/03/26 07:33:31 [INFO] Starting Custom Resource Manager
2021/03/26 07:33:31 [INFO] Starting VirtualServer Informer
2021/03/26 07:33:31 [INFO] Starting TLSProfile Informer
2021/03/26 07:33:31 [INFO] Starting TransportServer Informer
2021/03/26 07:33:31 [INFO] Starting ExternalDNS Informer
I0326 07:33:31.274821       1 shared_informer.go:197] Waiting for caches to sync for F5 CIS CRD Controller
I0326 07:33:31.375080       1 shared_informer.go:204] Caches are synced for F5 CIS CRD Controller
2021/03/26 07:33:31 [INFO] [CORE] NodePoller started: (0xc00046d710)
2021/03/26 07:33:31 [INFO] [CORE] NodePoller stopped: 0xc00046d710
2021/03/26 07:33:31 [ERROR] Error response from BIGIP with status code 401
2021/03/26 07:33:31 [INFO] [CCCL] ConfigWriter stopped: 0xc0003b3c20

Error response from BIGIP with status code 401 sounds like an authentication issue on the F5!

Start by logging into the device using the same credentials you assigned to the CIS when you created the secret and make sure that the account has been assigned at least the Resource Administrator role. If not, assign the correct permissions and re-launch the container.
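Checking, and if necessary fixing, the role from tmsh could look something like this (a sketch; adjust the account name to whatever you put in the secret):

# Show the account and its partition access
tmsh list auth user bigip-controller

# Set the role across all partitions
tmsh modify auth user bigip-controller partition-access replace-all-with { all-partitions { role resource-admin } }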

Was the role OK? Then we need to dig deeper. SSH into the F5 and run a tcpdump command which captures REST traffic in clear text. Be careful when doing this in a production environment as it will capture credentials from everyone. Be nice.

tcpdump -s 0 -nnni lo tcp port 8100 -vw /shared/tmp/bigipcontroller.pcap
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
61 packets captured
122 packets received by filter
0 packets dropped by kernel

Transfer the pcap to your machine and open it up to verify that the credentials look ok.

In my case they did, so now we’re moving on from troubleshooting the Kubernetes deployment to troubleshooting AS3. Wireshark shows that the controller tried to use Basic Auth to access /mgmt/shared/appsvcs/info. Trying this manually also fails (surprise!).

curl -k -u "bigip-controller:supersecret" https://bigip-01.domain.se/mgmt/shared/appsvcs/info
{"code":401,"message":"Authorization failed: user=https://localhost/mgmt/shared/authz/users/bigip-controller resource=/mgmt/shared/appsvcs/info verb=POST uri:http://localhost:8100/mgmt/shared/appsvcs/info referrer:192.168.1.30 sender:192.168.70.30","referer":"192.168.1.30","restOperationId":6641357,"kind":":resterrorresponse"}

Validate that you can get a token:

curl -k -X POST -u "bigip-controller:supersecret" \
-d '{"username":"bigip-controller", "password":"supersecret", "loginProviderName":"tmos"}' \
https://bigip-01.domain.se/mgmt/shared/authn/login
{"username":"bigip-controller","loginReference":{"link":"https://localhost/mgmt/cm/system/authn/providers/tmos/1f44a60e-11a7-3c51-a49f-82983026b41b/login"},"loginProviderName":"tmos","token":{"token":"Y7UTL4RXFRBIJOOCJTGVZT5KX4","name":"Y7UTL4RXFRBIJOOCJTGVZT5KX4","userName":"bigip-controller","authProviderName":"tmos","user":{"link":"https://localhost/mgmt/shared/authz/users/bigip-controller"},"timeout":1200,"startTime":"2021-03-26T02:35:32.793-0700","address":"192.168.1.30","partition":"[All]","generation":1,"lastUpdateMicros":1616751332792939,"expirationMicros":1616752532793000,"kind":"shared:authz:tokens:authtokenitemstate","selfLink":"https://localhost/mgmt/shared/authz/tokens/Y7UTL4RXFRBIJOOCJTGVZT5KX4"},"generation":0,"lastUpdateMicros":0}

Try to use the token to fetch pools:

curl -k -H "X-F5-Auth-Token:Y7UTL4RXFRBIJOOCJTGVZT5KX4" https://bigip-01.xip.se/mgmt/tm/ltm/pool
... pool data ...

So that worked fine, which would indicate that the account permissions are fine. If you’ve been around for a while you know that Basic Auth and the REST API are generally a no-no, so you should also try the local admin account:

curl -k -u "bigip-controller:supersecret" https://bigip-01.domain.se/mgmt/shared/appsvcs/info
{"version":"3.26.0","release":"5","schemaCurrent":"3.26.0","schemaMinimum":"3.0.0"}

Bingo. Or… halfway there. Since you should not use the local admin account we still have some work to do. I tried to re-install AS3. No luck. Then I upgraded the BIG-IP from 16.0.1 to 16.0.1.1 and that solved it. Lovely.

No Kubernetes nodes in the F5 pool

The pool is empty, darn. How do we figure this out? Let’s start by looking at the logs of the controller.

kubectl logs -n kube-system bigip-controller-f5-bigip-ctlr-557cdc7477-6mgb4
... omitted ...
2021/03/27 12:30:19 [DEBUG] [CORE] NodePoller (0xc0006425a0) ready to poll, last wait: 30s
2021/03/27 12:30:19 [DEBUG] [CORE] NodePoller (0xc0006425a0) notifying listener: {l:0xc000091920 s:0xc000091980}
2021/03/27 12:30:19 [DEBUG] [CORE] NodePoller (0xc0006425a0) listener callback - num items: 3 err: <nil>

Looking at num items: 3 err: <nil> and the code of the NodePoller we can see that it returned 3 nodes and no errors. That’s good, but we still don’t get any nodes in the pool. Continuing to dig we can find the following:

kubectl logs -n kube-system bigip-controller-f5-bigip-ctlr-557cdc7477-6mgb4
... omitted ...
2021/03/27 12:23:42 [DEBUG] Found endpoints for backend istio-system/istio-ingressgateway: []

Endpoints for istio-ingressgateway returns an empty list. The plot thickens! Validate that the service in question exists and that you have specified the correct port in the pools section of the VirtualServer manifest (see above). To do that, start by running kubectl describe on the service you want to load balance:

kubectl describe service -n istio-system istio-ingressgateway
... omitted some info ...
Name:                     istio-ingressgateway
Selector:                 app=istio-ingressgateway,istio=ingressgateway
... omitted some info ...

Port:                     https  443/TCP
TargetPort:               8443/TCP
NodePort:                 https  31390/TCP
Endpoints:                10.42.2.9:8443

In my case above I want to load balance port 443 (note that it is the Port that should be specified in the pool configuration, not the TargetPort). Looks fine to me, moving on!

Validate that you’ve started the controller with the --pool-member-type=nodeport argument in your controller Deployment. I made the mistake of accidentally specifying both nodeport and cluster mode, which made it choose cluster mode since it was the last one specified. Luckily Stanislas Piron came to the rescue and pointed out the issue. Details regarding that are available here.
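An easy way to verify which arguments the controller actually ended up with is to inspect the rendered Deployment (again assuming the release name bigip-controller):

kubectl get deployment -n kube-system bigip-controller-f5-bigip-ctlr -o yaml | grep pool-member-type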
