Adding a Fortigate HA cluster check to LibreNMS

We realized the other day that one of our firewalls was broken. Instead of our monitoring system (LibreNMS) alerting us, we found out the “organic way” (as in “oops, this does not look right”).

This is probably due to how Fortigate clustering works. When you cluster Fortigates, the cluster is exposed as a “virtual instance” which represents both firewalls. So when we monitor an HA cluster we monitor a single endpoint, as opposed to e.g. F5, where the two instances are managed separately. Since only that one endpoint was monitored and one firewall was still functioning, all was well according to LibreNMS. (This might have been avoided if traps were used, but we don’t use them.)

Since LibreNMS is an open source project managed by a friendly crew (not always the case), I decided to try to develop a new check, and this article covers how that was done. Please note that I’m still a newbie concerning LibreNMS, so there might be stuff down there that’s wrong. Feel free to let me know if that’s the case!

So with disclaimers out of the way, let’s dig in!

Discovery

When LibreNMS first adds a device it starts with trying to identify which type of device it is and what capabilities it has. They call this “discovery” and the code for this step is located in these locations:

  • ./includes/discovery/
  • ./includes/definitions/discovery

The definitions directory contains YAML definition files, and these seem to be the preferred way of managing the available discoveries. I did not go this way though, but you can find more information about YAML discoveries here. Advanced sensor discovery, which I used, is documented here.

Since my new monitoring endpoint was a cluster state I created a new file in ./includes/discovery/sensors/state called fortigate.inc.php.

<?php
$index = 0;
$fgHaSystemModeOid = 'fgHaSystemMode.0';
$systemMode = snmp_get($device, $fgHaSystemModeOid, '-Ovq', 'FORTINET-FORTIGATE-MIB');

// Verify that the device is clustered
if ($systemMode == 'activePassive' || $systemMode == 'activeActive') {
    $fgHaStatsEntryOid = 'fgHaStatsEntry';

    // Fetch the cluster members
    $haStatsEntries = snmpwalk_cache_multi_oid($device, $fgHaStatsEntryOid, [], 'FORTINET-FORTIGATE-MIB');

    if (is_array($haStatsEntries)) {
        $stateName = 'clusterState';
        $descr = 'Cluster State';

        $states = [
            ['value' => 0, 'generic' => 2, 'graph' => 0, 'descr' => 'CRITICAL'],
            ['value' => 1, 'generic' => 0, 'graph' => 1, 'descr' => 'OK'],
        ];

        create_state_index($stateName, $states);

        $clusterMemberCount = count($haStatsEntries);

        // If the device is part of a cluster but the member count is 1 the cluster has issues
        $clusterState = $clusterMemberCount == 1 ? 0 : 1;

        discover_sensor(
            $valid['sensor'],
            'state',
            $device,
            $fgHaSystemModeOid,
            $index,
            $stateName,
            $descr,
            1,
            1,
            null,
            null,
            null,
            null,
            $clusterState,
            'snmp',
            null,
            null,
            null,
            'HA'
        );

        create_sensor_to_state_index($device, $stateName, $index);

        // Setup a sensor for the cluster sync state
        $stateName = 'haSyncStatus';
        $descr = 'HA sync status';
        $states = [
            ['value' => 0, 'generic' => 2, 'graph' => 0, 'descr' => 'Out Of Sync'],
            ['value' => 1, 'generic' => 0, 'graph' => 1, 'descr' => 'In Sync'],
            ['value' => 2, 'generic' => 1, 'graph' => 0, 'descr' => 'No Peer'],
        ];

        create_state_index($stateName, $states);

        discover_sensor(
            $valid['sensor'],
            'state',
            $device,
            $fgHaStatsEntryOid,
            $index,
            $stateName,
            $descr,
            1,
            1,
            null,
            null,
            null,
            null,
            1,
            'snmp',
            $index,
            null,
            null,
            'HA'
        );

        create_sensor_to_state_index($device, $stateName, $index);
    }
}

unset(
    $index,
    $fgHaSystemModeOid,
    $systemMode,
    $fgHaStatsEntryOid,
    $haStatsEntries,
    $stateName,
    $descr,
    $states,
    $clusterMemberCount,
    $clusterState,
    $entry
);

Let’s walk through a few of the concepts in the code above.

States

$stateName = 'clusterState';
$states = [
            ['value' => 0, 'generic' => 2, 'graph' => 0, 'descr' => 'CRITICAL'],
            ['value' => 1, 'generic' => 0, 'graph' => 1, 'descr' => 'OK'],
        ];

$states contains the data that LibreNMS uses to map the value it gets via SNMP to the state of the sensor (good, bad etc). Let’s say LibreNMS is polling my Fortigate and receives the value 0. It will then look in the table for the item that has value 0 and determine that generic = 2, graph = 0 and descr = CRITICAL.
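
The lookup described above can be sketched in plain PHP. This is only an illustration of the idea, not how LibreNMS implements it internally (it keeps the translations in the database), and findState is a hypothetical helper name I made up for the example.

```php
<?php
// Same table as in the discovery code above.
$states = [
    ['value' => 0, 'generic' => 2, 'graph' => 0, 'descr' => 'CRITICAL'],
    ['value' => 1, 'generic' => 0, 'graph' => 1, 'descr' => 'OK'],
];

// Hypothetical helper: return the row whose 'value' matches the polled value,
// or null when the value is not in the table.
function findState(array $states, int $polledValue): ?array
{
    foreach ($states as $state) {
        if ($state['value'] === $polledValue) {
            return $state;
        }
    }

    return null;
}

// Pretend the poller just received a 0 from the device.
$state = findState($states, 0);
echo "generic={$state['generic']} graph={$state['graph']} descr={$state['descr']}\n";
```

A polled value that is missing from the table returns null in this sketch. How LibreNMS itself treats such a value is not covered here.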

Generic

The generic value determines the actual state that LibreNMS should interpret it as.

0 = OK
1 = Warning
2 = Critical
3 = Unknown

In our example above the value was 0, and this led us to generic = 2. Thus LibreNMS interprets the state as critical.
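
To make the generic mapping concrete, here is a small sketch that runs the haSyncStatus table from the discovery code through the list above. genericLabel is a hypothetical helper, and treating anything outside 0 to 3 as Unknown is my own assumption, not documented LibreNMS behaviour.

```php
<?php
// Generic values as listed above.
const GENERIC_LABELS = [0 => 'OK', 1 => 'Warning', 2 => 'Critical', 3 => 'Unknown'];

// Hypothetical helper: translate a generic value to its label.
// Falling back to 'Unknown' for unexpected values is an assumption.
function genericLabel(int $generic): string
{
    return GENERIC_LABELS[$generic] ?? 'Unknown';
}

// The haSyncStatus table from the discovery code.
$states = [
    ['value' => 0, 'generic' => 2, 'graph' => 0, 'descr' => 'Out Of Sync'],
    ['value' => 1, 'generic' => 0, 'graph' => 1, 'descr' => 'In Sync'],
    ['value' => 2, 'generic' => 1, 'graph' => 0, 'descr' => 'No Peer'],
];

foreach ($states as $state) {
    printf("%d (%s) => %s\n", $state['value'], $state['descr'], genericLabel($state['generic']));
}
```

So a cluster that has lost its peer (value 2) ends up as a warning rather than a critical, while an out of sync cluster is critical.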

Graph

I did not encounter any documentation for this yet but I do believe that the following holds true.

Graph is the value that should be displayed in the sensor history graph. In our example above graph = 0, which means that LibreNMS will set the Y-axis value to 0 as long as the state value is 0. If the state value changes to e.g. 1, the graph Y-axis value will be set to 1.

Descr

This is the description that is shown in the Sensor state table. In our case the descr = ‘CRITICAL’ so the badge for the alert would thus spell out “CRITICAL”.

Note how the state above (coincidentally also critical, don’t confuse them) gives the badge a red color.

Adding the state to the database

This updates the database with the logic stated above:

create_state_index($stateName, $states);

And this one connects the sensor to the state translation (state index):

create_sensor_to_state_index($device, $stateName, $index);

Creating the sensor

By calling the discover_sensor function we add the endpoint to the list of things that LibreNMS should poll. Please do note that the initial value you can give the sensor here is overwritten when the endpoint is polled by LibreNMS.

discover_sensor(
            $valid['sensor'],
            'state',
            $device,
            $fgHaStatsEntryOid,
            $index,
            $stateName,
            $descr,
            1,
            1,
            null,
            null,
            null,
            null,
            1,
            'snmp',
            $index,
            null,
            null,
            'HA'
        );
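
The long positional argument list is hard to read, so here is a sketch that pairs each position with a parameter name. The names are an assumption based on how I read the discover_sensor signature in the LibreNMS source at the time of writing. Check the function in your own checkout, since it may have changed. The sketch does not call LibreNMS at all, and placeholder strings stand in for $valid['sensor'] and $device.

```php
<?php
// Parameter names are an ASSUMPTION based on the LibreNMS source at the time of writing.
$parameterNames = [
    'valid', 'class', 'device', 'oid', 'index', 'type', 'descr',
    'divisor', 'multiplier',
    'low_limit', 'low_warn_limit', 'warn_limit', 'high_limit',
    'current', 'poller_type',
    'entPhysicalIndex', 'entPhysicalIndex_measured', 'user_func', 'group',
];

// The arguments of the haSyncStatus call, in the same order as above.
// Placeholder strings stand in for the real $valid['sensor'] and $device.
$arguments = [
    '$valid[sensor]', 'state', '$device', 'fgHaStatsEntry', 0, 'haSyncStatus', 'HA sync status',
    1, 1,
    null, null, null, null,
    1, 'snmp',
    0, null, null, 'HA',
];

// Pair every name with the value passed in that position.
$labelled = array_combine($parameterNames, $arguments);

foreach ($labelled as $name => $value) {
    printf("%-26s %s\n", $name, var_export($value, true));
}
```

Read this way, the four nulls in the middle are the limits (which make little sense for a state sensor), and 'HA' is the group the sensor is listed under.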

Poller

If all we wanted to do was to take the raw data and map it to the state table we’d be done now. But in the case of the cluster check it was a bit more tricky as there are no values to determine the cluster member count. This is where a custom graph comes in. Official documentation for custom graphs is available here.

Let’s start with the code again and then go through some components. Please note that for the purpose of this guide I’ve added a bit more comments regarding the cluster state.

Please also note that

<?php
// Only run this on fortigates
if ($device['os'] == 'fortigate') {

    // Only run this if the sensor type is either called clusterState or haSyncStatus
    // This ties together with the sensor name in the discovery step above
    if (in_array($sensor['sensor_type'], ['clusterState', 'haSyncStatus'])) {

        // Fetch the cluster members and their data
        $fgHaStatsEntryOid = 'fgHaStatsEntry';
        $haStatsEntries = snmpwalk_cache_multi_oid($device, $fgHaStatsEntryOid, [], 'FORTINET-FORTIGATE-MIB');

        if ($sensor['sensor_type'] == 'clusterState') {
            // Determine if the cluster contains more than 1 device
            $clusterState = 0;
            if (is_array($haStatsEntries)) {
                $clusterMemberCount = count($haStatsEntries);
                // If the number of members is 1, set the sensor value to 0 = CRITICAL, otherwise 1 = OK.
                $clusterState = $clusterMemberCount == 1 ? 0 : 1;
            }
            $sensor_value = $clusterState;
        } elseif ($sensor['sensor_type'] == 'haSyncStatus') {
            // 0 = Out of sync, 1 = In Sync, 2 = No Peer
            $synchronized = 1;
            $clusterMemberCount = count($haStatsEntries);
            if ($clusterMemberCount == 1) {
                $synchronized = 2;
            } else {
                // A single unsynchronized member marks the whole cluster as out of sync
                foreach ($haStatsEntries as $entry) {
                    if ($entry['fgHaStatsSyncStatus'] == 'unsynchronized') {
                        $synchronized = 0;
                    }
                }
            }
            $sensor_value = $synchronized;
        }

        // Clean up
        unset($fgHaStatsEntryOid, $haStatsEntries, $clusterMemberCount, $synchronized, $clusterState, $entry);
    }
}
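
If you want to play with the decision logic without a Fortigate at hand, it can be pulled out into two small functions and fed made-up data. This is just a sketch: the function names are hypothetical, and the sample arrays are invented and only mimic the shape I get back from the walk (one row per cluster member with a fgHaStatsSyncStatus field).

```php
<?php
// Hypothetical helper: 1 = OK, 0 = CRITICAL, matching the clusterState table.
// Mirrors the poller code: a failed walk or a single member is CRITICAL.
function clusterStateFromEntries($entries): int
{
    if (!is_array($entries)) {
        return 0;
    }

    return count($entries) == 1 ? 0 : 1;
}

// Hypothetical helper: 0 = Out Of Sync, 1 = In Sync, 2 = No Peer.
function syncStatusFromEntries(array $entries): int
{
    if (count($entries) == 1) {
        return 2;
    }

    foreach ($entries as $entry) {
        if ($entry['fgHaStatsSyncStatus'] == 'unsynchronized') {
            return 0;
        }
    }

    return 1;
}

// Made-up sample data, one row per cluster member.
$healthy = [
    1 => ['fgHaStatsSyncStatus' => 'synchronized'],
    2 => ['fgHaStatsSyncStatus' => 'synchronized'],
];
$outOfSync = [
    1 => ['fgHaStatsSyncStatus' => 'synchronized'],
    2 => ['fgHaStatsSyncStatus' => 'unsynchronized'],
];
$alone = [
    1 => ['fgHaStatsSyncStatus' => 'synchronized'],
];

foreach (['healthy' => $healthy, 'outOfSync' => $outOfSync, 'alone' => $alone] as $name => $entries) {
    printf("%-10s clusterState=%d haSyncStatus=%d\n", $name, clusterStateFromEntries($entries), syncStatusFromEntries($entries));
}
```

The only behavioural difference from the poller code is that the loop returns on the first unsynchronized member instead of continuing, which gives the same result.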

Adding tests

This probably took almost as much time to do as the actual code, but mostly because I did not really understand how it works.

There are three components in the unit testing:

  • The raw SNMP data (*.snmprec)
  • The LibreNMS code (including your code)
  • The result of running the raw SNMP data through LibreNMS (the json file)

So in order for the unit tests to pass, the following equation must hold true:

SNMP + LibreNMS == json

Below we’ll go through the following steps in detail.

Collecting data

This step takes the longest since you likely want to take some time to remove things that you don’t want public.

If there’s a lot of stuff to censor you might want to only grab the module that you’re writing code for. Since our example is a state sensor I only grabbed the sensor module (after a very helpful tip from JellyFrog).

./scripts/collect-snmp-data.php -h <HOSTNAME> -v 1500d-sensors -m sensors
OS: fortigate
Module(s): sensors
Variant: 1500d-sensors
...
Updated snmprec data /opt/librenms/tests/snmpsim/fortigate_1500d-sensors.snmprec

Verify this file does not contain any private data before submitting!

Open up this snmprec file in a text editor and clean up any sensitive data.
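
Going through the file by eye gets tedious, so a small script can point out lines worth a second look. The sketch below assumes the usual snmprec layout of OID|TAG|VALUE, where tag 64 is an IpAddress and a tag ending in x means the value is hex encoded. reviewSnmprecLine is a hypothetical helper and the sample lines are made up. It only flags candidates (expect false positives such as version numbers) and is no substitute for reading the file yourself.

```php
<?php
// Hypothetical helper: return the reasons a single snmprec line deserves a manual look.
function reviewSnmprecLine(string $line): array
{
    $parts = explode('|', rtrim($line, "\r\n"), 3);
    if (count($parts) !== 3) {
        return []; // Not an OID|TAG|VALUE line, nothing to check.
    }
    [$oid, $tag, $value] = $parts;

    // Hex encoded values (tag ending in "x") hide their content from a quick visual scan.
    if (substr($tag, -1) === 'x') {
        $tag = substr($tag, 0, -1);
        $decoded = @hex2bin($value);
        $value = $decoded === false ? $value : $decoded;
    }

    $reasons = [];
    if ($tag === '64') {
        $reasons[] = 'IpAddress value';
    }
    if (preg_match('/\b\d{1,3}(\.\d{1,3}){3}\b/', $value)) {
        $reasons[] = 'looks like an IPv4 address';
    }
    if (preg_match('/[\w.+-]+@[\w-]+\.[\w.-]+/', $value)) {
        $reasons[] = 'looks like an e-mail address';
    }

    return $reasons;
}

// Made-up sample lines. For a real file you could use file($path) instead.
$lines = [
    '1.3.6.1.2.1.1.5.0|4|fw-cluster-01',
    '1.3.6.1.2.1.1.4.0|4|admin@example.com',
    '1.3.6.1.2.1.4.20.1.1.192.0.2.10|64|192.0.2.10',
];

foreach ($lines as $number => $line) {
    $reasons = reviewSnmprecLine($line);
    if ($reasons !== []) {
        printf("line %d: %s (%s)\n", $number + 1, $line, implode(', ', $reasons));
    }
}
```

Note that this only inspects the value part. As the last sample line shows, addresses can also be embedded in the OID itself, and hostnames like the one on the first line are not flagged at all.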

Parsing

Now you can parse the SNMP rec file by running

./scripts/save-test-data.php -o fortigate -v 1500d-sensors -m sensors
...
Saved to /opt/librenms/tests/data/fortigate_1500d-sensors.json
Ready for testing!

Add the snmp rec file and the json file to your pull request and pray that the almighty Travis will approve it.

Running the unit tests locally

Update 2021-01-06: Boy this is a painful set of steps to do each time so I’ll document them too.

Alpine

apk add php-xmlwriter
apk add gcc
apk add musl-dev
python3 -m pip install --upgrade pip
pip3 install snmpsim
cd /opt/librenms
./scripts/composer_wrapper.php install --no-dev
./lnms dev:check unit --db --snmpsim

Ubuntu

This one is untested since I wrote this while using a Docker container with Alpine. Please let me know if something is off.

sudo apt install php-xmlwriter build-essential python3 python3-pip
sudo python3 -m pip install --upgrade pip
pip3 install snmpsim
# Only needed if not installing snmpsim globally
export PATH="$PATH:$HOME/.local/bin/"
cd /opt/librenms
./scripts/composer_wrapper.php install
./lnms dev:check unit --db --snmpsim

Database schema

During all my confusion getting this done I dove down the rabbit hole of trying to understand how the state functions work, and started to dig into the database. There were so many columns, and getting an overview from the terminal was hard, so I figured there must be a better way.

After some Google engineering I found an excellent tool called SchemaSpy and generated an overview.

The result is published here.
