Neko
New Contributor II

How to create a HA cluster for a FortiGate 60F - Running into issues when building one

Got a site which has a FortiGate 60F with a configuration which we would like to enhance by adding a secondary FortiGate 60F in an active-passive cluster. Main goal here is to keep the site up and running while performing a firmware update or something of the sort. The site currently runs FortiOS 7.0.13. So I found this: https://docs.fortinet.com/document/fortigate/7.0.13/administration-guide/900885

 

Seems pretty straightforward, but rather than dive in head first on site, I figured... I have two spare FortiGates and a FortiSwitch, so I can just build a test case and see what happens. It's not like we play with an HA cluster every day, so we'd rather make sure we have a reproducible result before trying it in the live environment. So I did just that, and while I needed to accommodate the test environment (by authorizing the switch and adding a couple of VLANs to facilitate the topology), I managed to get the existing (non-HA FortiGate) situation up and running on a desk with a copy of the existing configuration.

 

So following the procedure, made the connections, used ports 2 and 4 as the heartbeat on the existing FortiGate and factory reset the second FortiGate. Once up I removed port 2 and 4 from the 'Internal' group, and dropped into the HA menu. Set the exact same thing just with a higher priority number (trying to force the active member to be the primary and thus retain the configuration). 

 

Once the secondary unit came back it was complaining about not being in sync and we could see the checksums were inconsistent. Leaving it for 30 minutes didn't seem to eventually resolve it tho. 

 

Next workday, we made the primary unit standalone again, and factory reset the second unit and tried again. Toying a bit with the priority setting, but with pretty much the same result. The system is complaining about 27 to 29 tables being out of sync, which doesn't make much sense to me, seeing the factory reset device should have grabbed the whole configuration (less than 1 MB) on joining the cluster. We left it during the night, and apparently it corrected itself eventually (unsure what it does in the sync part, so also unsure why that would take that long), tho that may have been a fluke. 

 

When manually checking the configs we found differences between the two which seemed mostly to be focused on ordering... So where one config had three objects in order 123, the second unit would have the same objects but in order 132, for instance.
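To tell an ordering difference apart from a real content difference, the two backups can be compared per table. The following is a minimal sketch, assuming plain-text backups in the usual `config` / `edit` / `next` / `end` layout; the function names and the `Example-Service-*` entries are hypothetical and only there for illustration.

```python
import re


def entry_order(config_text, section):
    """Return the `edit` names of `config <section>` in the order they appear."""
    names = []
    depth = 0  # 0 = outside the wanted section, 1 = directly inside it
    for raw in config_text.splitlines():
        line = raw.strip()
        if depth == 0:
            if line == f"config {section}":
                depth = 1
            continue
        if line.startswith("config "):
            depth += 1          # nested table inside an entry
        elif line == "end":
            depth -= 1          # closes a nested table or the section itself
        elif depth == 1:
            match = re.fullmatch(r'edit\s+"?([^"]+)"?', line)
            if match:
                names.append(match.group(1))
    return names


def compare_order(primary_text, secondary_text, section):
    """Classify how one table differs between the two backups."""
    primary = entry_order(primary_text, section)
    secondary = entry_order(secondary_text, section)
    if primary == secondary:
        return "identical"
    if sorted(primary) == sorted(secondary):
        return "same entries, different order"
    return "different entries"


# Hypothetical excerpts; only "Microsoft-Intune" comes from the thread.
PRIMARY = """
config firewall internet-service-name
    edit "Microsoft-Intune"
        set internet-service-id 327886
    next
    edit "Example-Service-A"
    next
    edit "Example-Service-B"
    next
end
"""

SECONDARY = """
config firewall internet-service-name
    edit "Example-Service-A"
    next
    edit "Example-Service-B"
    next
    edit "Microsoft-Intune"
        set internet-service-id 327886
    next
end
"""

if __name__ == "__main__":
    print(compare_order(PRIMARY, SECONDARY, "firewall internet-service-name"))
```

This only looks at entry names, not at the settings inside each entry, so "same entries, different order" does not by itself prove the entries are identical.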

 

Then today my colleague redid the whole thing from scratch. Factory reset both FortiGates, redownloaded the configuration from the active onsite unit, uploaded that to the primary unit and made the adjustments needed for the test environment. Then did a factory reset on the second unit again, made the alterations for the heartbeat ports by excluding them from the internal interface group, and joined the cluster (in this case using the same priority on both FortiGates). Which again was then complaining about not being in sync... However, this time round we found that the FortiGate with no config, which was supposed to join, apparently decided it was the master, and thus wiped the entire config on the other unit (which had the active config for the site).

 

Retried again, but this time making the priority of the second unit higher, and still no dice... So we only got an in-sync HA cluster once out of the 4 times we tried. Not good odds to go onsite and just pray it'll work as required there.
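One possible explanation for the blank unit winning is the primary election order. The sketch below models that order as I understand it from FortiOS documentation: connected monitored ports first, then HA uptime before priority when override is disabled (priority before uptime when override is enabled), with the serial number as the final tie-breaker. Treat the order, the 300-second uptime margin and the "higher serial wins" rule as assumptions to verify against the election article linked in the accepted solution; the `Unit` class, the unit names and the serial numbers are hypothetical.

```python
from dataclasses import dataclass

# Assumed default: uptime differences at or below this margin are ignored.
UPTIME_MARGIN_SECONDS = 300


@dataclass
class Unit:
    name: str
    monitored_ports_up: int
    ha_uptime_seconds: int
    priority: int
    serial: str


def elect_primary(a, b, override=False):
    """Pick the primary between two units using the assumed election order."""
    # 1. The unit with more connected monitored ports wins.
    if a.monitored_ports_up != b.monitored_ports_up:
        return a if a.monitored_ports_up > b.monitored_ports_up else b

    def by_uptime():
        diff = a.ha_uptime_seconds - b.ha_uptime_seconds
        if abs(diff) > UPTIME_MARGIN_SECONDS:
            return a if diff > 0 else b
        return None  # too close to call

    def by_priority():
        if a.priority != b.priority:
            return a if a.priority > b.priority else b
        return None

    # 2./3. Override swaps the order of the uptime and priority checks.
    steps = (by_priority, by_uptime) if override else (by_uptime, by_priority)
    for step in steps:
        winner = step()
        if winner is not None:
            return winner

    # 4. Last resort: the higher serial number wins (assumed).
    return a if a.serial > b.serial else b


if __name__ == "__main__":
    configured = Unit("configured", 0, 60, 128, "FGT60FXX00000001")
    blank = Unit("blank", 0, 90, 128, "FGT60FXX00000002")
    # Same priority, uptimes within the margin: the serial number decides.
    print(elect_primary(configured, blank).name)
```

Under these assumptions, equal priorities on two freshly booted units leave the decision to the serial number, and a higher priority on the configured unit only takes precedence over uptime when override is enabled on it.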

 

We're looking for any insight in how to either:

1) Designate which configuration from the two HA units is the one that should be retained

2) Force push the configuration in its entirety to the fellow HA member, thus forcing a full sync between the two.

3) Any experiences with HA clusters on FortiGates that may indicate we're missing something or what we could do better.

 

The provided URL is rather brief on how the process works, and doesn't hint at the issues that arise when actually doing the procedure. The ins and outs of the process seem to be a lot more involved than the overview suggests.

 

As it is, I'm not feeling too confident to go onsite and just add the second FortiGate, then hope and pray all goes well... 

I reject your reality, and substitute my own - Adam Savage

1 Solution
fricci_FTNT
Staff

Hi @Neko ,

 

Bearing in mind that I have not seen your config and I do not have the details of the steps you performed, it seems that you have followed the correct procedure.
When a new unit is configured to join a HA cluster, the config sync might take some time (your config seems to be small anyway). The sync process continuously checks the HA config sync and, if it finds a difference, tries to sync. That is why leaving the units overnight might have helped in one occurrence.

As an alternative to the official procedure described in the document you linked, you can try the below:
1- backup the config from primary unit in production.
2- both test FortiGates should be disconnected from each other (no HA cables connected).
3- restore the config exported into the test primary FortiGate.
4- properly set HA config settings on test primary.
5- make a new copy of the exported config, edit it [in notepad], and properly change HA settings, hostname and mgmt IP for the secondary unit (mgmt IP same as primary unit if you do not use reserved mgmt interfaces).
6- restore the edited config on test secondary unit.
7- connect HA cables (only) between test primary and test secondary units.
8- if you still see some difference, you can force the hash recalculation or the HA re-sync ("execute ha synchronize stop" --> "execute ha synchronize start" --> "diagnose sys ha checksum recalculate").
9- once they are in sync, you can connect the data cables on the test secondary unit.
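Step 5 (editing a copy of the backup for the secondary unit) can be scripted rather than done by hand in a text editor. This is a minimal sketch, assuming a non-VDOM backup where `config system global` and `config system ha` are top-level sections and the primary's backup already contains explicit `set hostname` and `set priority` lines; the function name and all sample values are hypothetical. It only rewrites existing lines, so a setting that is absent from the backup is not added.

```python
def make_secondary_config(primary_text, hostname, priority):
    """Return a copy of the backup with hostname and HA priority replaced."""
    if not 0 <= priority <= 255:
        raise ValueError("HA priority must be between 0 and 255")
    out = []
    section = None  # name of the current top-level section
    depth = 0       # nesting depth of config blocks
    for raw in primary_text.splitlines():
        line = raw.strip()
        if line.startswith("config "):
            if depth == 0:
                section = line[len("config "):]
            depth += 1
        elif line == "end":
            depth -= 1
            if depth == 0:
                section = None
        elif depth == 1:
            indent = raw[: len(raw) - len(raw.lstrip())]
            if section == "system global" and line.startswith("set hostname "):
                raw = f'{indent}set hostname "{hostname}"'
            elif section == "system ha" and line.startswith("set priority "):
                raw = f"{indent}set priority {priority}"
        out.append(raw)
    return "\n".join(out) + "\n"


# Hypothetical excerpt of a primary backup.
PRIMARY_BACKUP = """config system global
    set hostname "FGT-PRIMARY"
end
config system ha
    set group-name "test-cluster"
    set mode a-p
    set priority 200
end
config router static
    edit 1
        set priority 5
    next
end
"""

if __name__ == "__main__":
    print(make_secondary_config(PRIMARY_BACKUP, "FGT-SECONDARY", 100))
```

The section tracking matters because `set priority` also appears in other tables (static routes, for example), and those lines must stay untouched.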

The following articles may further clarify your doubts:
- HA primary election process:
https://community.fortinet.com/t5/FortiGate/Technical-Tip-FortiGate-HA-Primary-unit-selection-proces...
- HA sync troubleshooting:
https://community.fortinet.com/t5/FortiGate/Technical-Tip-Troubleshooting-a-checksum-mismatch-in-a-F...
https://community.fortinet.com/t5/FortiGate/Technical-Tip-Procedure-for-HA-manual-synchronization/ta...


Best regards,

---
If you have found a useful article or a solution, please like and accept it to make it easily accessible to others.

9 Replies
Neko
New Contributor II

My colleague had a bit of trial and error today, and managed to get the two _mostly_ in sync, by only connecting the heartbeat cables... At that point there were still two tables causing issues (system.custom-language, and firewall.internet-service-name). 

 

Might be due to some incompatible character somewhere, unsure where the issue is. Then again, given time it might resolve on its own, so we'll have a look tomorrow after leaving it to its own devices for a night.

 

The output of "diag sys ha checksum cluster" shows that the global entry is different between units, but the root entry is in sync. So overall there is some progress here. After that it's a question of whether attaching the cables would suffice.

 

One thing of note which _may_ cause issues (tho logically I wouldn't expect it to) is that for the primary FortiGate the switch is connected between port A on the FortiGate and port 48 on the switch (FortiLink). For the secondary unit we opted to connect port A on the FortiGate to port 47 on the switch.

 

From my understanding of the HA A-P cluster, the second unit only starts actively using ports and shuffling data if a failover occurs. So having it connected to all ports and port 47 for the uplink shouldn't really matter. Those ports are not actively used while the FortiGate is in secondary mode.

 

Unless it IS using the FortiLink connection to actually pull its data from its partner (which we're assuming is done over the heartbeat cables). Again, insight from anyone would be appreciated.

I reject your reality, and substitute my own - Adam Savage

Neko
New Contributor II

Saw that as a valid way of going about things too, but wasn't exactly sure which fields fall outside the scope of the HA sync (and so need editing) and which don't. You name three, whereas I've seen some documents referencing at least 6 items that need changing prior to restoring to the secondary unit. Hence not something we've tried yet.

 

As to the diagnosing with the ha stop, start and recalculate... Already had a go with that, but the output (in so far there is output) isn't all that clear on what is going on.

 

Based on that I just wanted to see about telling the set: Look... This primary unit has the appropriate config. The secondary has no config, or is confused... Just grab the whole config from the primary, overwriting anything in the secondary, and be done with it. But there isn't a command or something to force that from what I've seen.

 

From your comment I gather that it might do that automatically already, but at a snail's pace... And with a config of around 830 kByte I'd expect that to be near instant.

 

Frankly, what we're trying to achieve here is a way to get the HA sync working conclusively, with minimal effort, at least two or three times in our test setup before feeling confident enough to try it in a live setting, leaving the network up and running as much as possible (downtime is undesirable for that site; in fact, eliminating downtime as far as we can is also why we're looking at an HA cluster to alleviate issues during firmware updates and the like).

 

So far we've spent about a day on the FortiGates and still feel like we're not getting what we want/need, despite the procedure only having 6 (somewhat deceptively simple) steps... And the troubleshooting commands don't really provide much insight in their output as to what the actual problem is that it's struggling with (nor why it's a problem), so we can't go in directly and sort that specifically. Hence my asking for a bit of insight here.

 

I'll confer with my colleague to see if the above process is something we want to try, seeing that would force the config on the secondary unit to be identical to the first. Still makes me wonder why it would be needed to do it that way if the process itself makes it seem like it's not needed at all. 

I reject your reality, and substitute my own - Adam Savage

fricci_FTNT
Staff

Hi @Neko ,

 

I can understand your concerns. I mainly work with the FortiGate 6000/7000 series, which have a slightly more complicated architecture compared to normal FortiGates. In my experience, HA config sync issues can happen sometimes (though not often multiple times in a row), and I am wondering if the specific issue you are experiencing depends on your specific config or platform. Anyway, following the standard procedure (as per your link) should be sufficient to get HA working, and it should not take such a long time.

Please feel free to open a ticket with our support to investigate the HA sync issue more in detail.

Best regards,

Neko
New Contributor II

Okay... Left it for 23 hours, and it seems to have sorted itself, besides two bits of the configuration which still cause it trouble.

 

We're unsure if this is the result of a leftover from previous versions (6.2/6.4 being updated to 7.0 and so forth), but at least I can narrow down the differences in the parts it's having issues with.

 

Maybe someone here might want to suggest something to get this in line?

 

First bit seems to be the system.custom-language.

Where the primary lists the following:

 

config system custom-language
    edit "en"
    next
    edit "fr"
    next
    edit "sp"
    next
    edit "pg"
    next
    edit "x-sjis"
    next
    edit "big5"
    next
    edit "GB2312"
    next
    edit "euc-kr"
    next
end

 

The secondary seems adamant on using:

 

config system custom-language
    edit "en"
        set filename "en"
    next
    edit "fr"
        set filename "fr"
    next
    edit "sp"
        set filename "sp"
    next
    edit "pg"
        set filename "pg"
    next
    edit "x-sjis"
        set filename "x-sjis"
    next
    edit "big5"
        set filename "big5"
    next
    edit "GB2312"
        set filename "GB2312"
    next
    edit "euc-kr"
        set filename "euc-kr"
    next
end

 

Can't seem to unset the filename on the secondary unit, nor set the filename on the first... So that may well require a bit of digging around. It may mean changing the running config beforehand to keep this issue from cropping up altogether.
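A setting-level difference like this `filename` one is easier to spot with a per-entry comparison than by eyeballing two backups. Here is a minimal sketch, assuming each table is a flat `config` / `edit` / `set` / `next` / `end` block without nested tables; the helper names are hypothetical, and the sample only uses the first two entries quoted above.

```python
import shlex


def parse_table(block_text):
    """Parse one flat config table into {entry name: {setting: value}}."""
    entries = {}
    current = None
    for raw in block_text.splitlines():
        parts = shlex.split(raw)  # handles the double-quoted names and values
        if not parts:
            continue
        if parts[0] == "edit" and len(parts) > 1:
            current = entries.setdefault(parts[1], {})
        elif parts[0] == "set" and current is not None and len(parts) > 1:
            current[parts[1]] = " ".join(parts[2:])
        elif parts[0] == "next":
            current = None
    return entries


def diff_tables(primary, secondary):
    """List entries and settings that differ between two parsed tables."""
    report = []
    for name in sorted(set(primary) | set(secondary)):
        p, s = primary.get(name), secondary.get(name)
        if p is None:
            report.append(f"{name}: only on secondary")
        elif s is None:
            report.append(f"{name}: only on primary")
        else:
            for key in sorted(set(p) | set(s)):
                if p.get(key) != s.get(key):
                    report.append(
                        f"{name}.{key}: primary={p.get(key)!r} secondary={s.get(key)!r}"
                    )
    return report


PRIMARY_TABLE = """
config system custom-language
    edit "en"
    next
    edit "fr"
    next
end
"""

SECONDARY_TABLE = """
config system custom-language
    edit "en"
        set filename "en"
    next
    edit "fr"
        set filename "fr"
    next
end
"""

if __name__ == "__main__":
    for line in diff_tables(parse_table(PRIMARY_TABLE), parse_table(SECONDARY_TABLE)):
        print(line)
```

Because the comparison is keyed on entry names, it ignores ordering entirely and reports only entries or settings that exist on one side and not the other.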

 

The second bit is in firewall.internet-service-name. For the primary:

 

config firewall internet-service-name
    edit "Microsoft-Intune"
        set internet-service-id 327886
    next

At that position in the config, the secondary doesn't have that service. A looooooong way further down, however, the primary lists nothing at the spot where the secondary lists:

 

config firewall internet-service-name
    edit "Microsoft-Intune"
        set internet-service-id 327886
    next

 

So the entry is present on both primary and secondary, but the order in which they're listed is different... And I have no clue how to resolve the order, other than dumping the entire config from the second unit, and reloading it with an edited config of the first as per the above instruction.

 

Any insights are welcome...

I reject your reality, and substitute my own - Adam Savage

Neko
New Contributor II

That Microsoft-Intune thing seems to be a factory-default thing or something. Unable to remove it on either node of the HA cluster. That was my idea: to remove it on both ends, and re-add it on the primary.

 

As to the custom-languages... Can't remove the filename on the secondary HA node seeing it is a required setting. So setting it on the primary was my next move... which seems to indicate those custom things don't exist, and thus the filename isn't able to be set at all. 

 

We're starting to strongly suspect that configuration inherited from older FortiOS versions is to blame, or something like it.

 

Monday we're both in the office, and I've pulled the config from the primary unit. 

Modified two things in the config:

 

The priority for the HA (I left everything else as-is, including the password and the like)

The hostname

 

There isn't a management IP, so that shouldn't be an issue.

 

We intend to factory reset the second node, log in to it, and then force-feed it the altered config, reboot and see what happens.

 

If anyone has any insight into what we might need to change further, or if we're on the wrong track with trying this, let us know.

I reject your reality, and substitute my own - Adam Savage

Neko
New Contributor II

Took a bit of time today to test the above. 

 

Grabbed the altered config (with just the priority for the HA and the hostname changed), factory reset the second unit, logged into the GUI, and restored the config. 

 

Second unit rebooted and came back up, and joined the HA cluster fully in sync. 

 

Only weird thing was the priority for the second unit (which I set to 300) was dropped to 128. Possibly because 300 wouldn't be a supported value? Also unit one had a priority of 200, but was also dropped to 128. So alternatively it may well be the similar HA config that causes some conflicts here, and as a result of the automatic process of trying to resolve that the priorities get reset or something. 

 

Anyway, this does provide a means to implement the cluster on-site with (hopefully) a minimum of downtime (we're still going to have downtime seeing the routers are now connected directly to the primary unit there, and those should be connected through a VLAN on the switches to allow both FortiGates to access the routers). 

 

From the looks of things that means a bit of planning on our end when to do this. 

I reject your reality, and substitute my own - Adam Savage

Toshi_Esumi

fg40f-xxx (ha) # set priority ?
priority Enter an integer value from <0> to <255> (default = <128>).

Toshi

Neko
New Contributor II

Thanks... Not that I don't know how to change the priority (GUI also provides some means for that), but more that I wanted to make a note of the change made when importing the config to the second unit. 

 

The config stated a priority of 300, but the import dropped that to 128 on its own. 
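Given the 0 to 255 range shown in the CLI help above, an edited config can be checked for an out-of-range HA priority before it is restored. This is a minimal sketch, assuming `config system ha` is a top-level section of a plain-text backup; the function names and the sample excerpts are hypothetical. Why the restore ended up at 128 is not confirmed anywhere in this thread; the check only flags values the CLI itself would not accept.

```python
import re

PRIORITY_MIN, PRIORITY_MAX = 0, 255  # range taken from the CLI help above


def ha_priority(config_text):
    """Return the explicit HA priority from `config system ha`, or None."""
    depth = 0  # 0 = outside `config system ha`, 1 = directly inside it
    for raw in config_text.splitlines():
        line = raw.strip()
        if depth == 0:
            if line == "config system ha":
                depth = 1
            continue
        if line.startswith("config "):
            depth += 1  # nested table inside the HA section
        elif line == "end":
            depth -= 1
        elif depth == 1:
            match = re.fullmatch(r"set priority (\d+)", line)
            if match:
                return int(match.group(1))
    return None


def check_priority(config_text):
    """Describe whether the HA priority in the backup is usable."""
    value = ha_priority(config_text)
    if value is None:
        return "no explicit HA priority (the default would apply)"
    if PRIORITY_MIN <= value <= PRIORITY_MAX:
        return f"HA priority {value} is within {PRIORITY_MIN}-{PRIORITY_MAX}"
    return f"HA priority {value} is outside {PRIORITY_MIN}-{PRIORITY_MAX}"


# Hypothetical excerpts mirroring the values mentioned in the thread.
EDITED_SECONDARY = """config system ha
    set group-name "test-cluster"
    set mode a-p
    set priority 300
end
"""

PRIMARY_EXCERPT = """config system ha
    set group-name "test-cluster"
    set mode a-p
    set priority 200
end
"""

if __name__ == "__main__":
    print(check_priority(EDITED_SECONDARY))
    print(check_priority(PRIMARY_EXCERPT))
```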

I reject your reality, and substitute my own - Adam Savage
