FortiGate HA override problems

unai_satec · ‎08-08-2018

Hi!

We have two FortiGate 201E units configured as a cluster for high availability. All the interfaces that provide services are port-monitored interfaces, so if any of them goes down, the master of the cluster changes. The anomaly appears when you bring the interface back up on the device with the higher priority: that device becomes the master of the cluster again, whereas from what I've read the secondary firewall should keep its role as master.

Other times when we follow the same process, the secondary remains the master, but that happens only in a few cases. Any idea why?

Override is disabled, in case you think the problem lies there.

Toshi_Esumi · ‎08-08-2018

Where did you read that? At least the HA handbook below:

https://docs.fortinet.com/uploaded/files/3997/fortigate-ha-56.pdf

says the following on p. 46:

"With override enabled, the primary unit with the highest device priority will always become the primary unit. Whenever an event occurs that may affect primary unit selection, the cluster negotiates."

It also says the following on the previous page, in the same HA override section:

"In most cases you should keep override disabled to reduce how often the cluster negotiates. Frequent negotiations may cause frequent traffic interruptions."

For this reason we don't use HA override.

Toshi

unai_satec · ‎08-08-2018

Thanks Toshi,

So it's impossible to keep the current master until a manual action, even when the device with the higher priority comes back up?

I asked because I've read that with override disabled, a device coming back up doesn't affect the cluster hierarchy. I think it's better to keep the current master in this situation, so as not to interrupt the services the firewall is supporting.

unai_satec · ‎08-10-2018

I have found out that the cause is the ha-uptime-diff-margin setting. With override disabled, which is what Fortinet recommends, the devices compare how long they have been up in the cluster, and there are a few situations in which this time is reset to 0 and starts counting again. So I reduced the margin to its minimum, and now the device with the higher priority doesn't interfere with the services until there is a manual intervention.

I hope this helps other people on the forum.

Toshi_Esumi · ‎08-10-2018

If the uptime difference is within the margin (ha-uptime-diff-margin), the last factor in the master election is the serial number. It wouldn't reduce the chances of an election in random situations. The most important thing is that when you intervene or manually change one of the conditions, like trying to restore the down interface, you need to understand exactly how HA will react as a result and pre-set the conditions to keep a desirable operation.

At the moment we need to break HA more often than we'd like when troubleshooting on a slave unit.
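To illustrate the uptime-margin rule discussed above, here is a minimal Python sketch. It is not FortiOS code: the function name `uptime_winner` is hypothetical, the 300-second default simply mirrors the "5 min" figure quoted in this thread, and treating a difference exactly equal to the margin as "within the margin" is an assumption.

```python
# Hypothetical helper, not a FortiOS API. Uptimes and margin are in seconds.
def uptime_winner(uptime_a: int, uptime_b: int, margin: int = 300):
    """Return 'a' or 'b' if uptime decides the election, else None."""
    diff = uptime_a - uptime_b
    if abs(diff) <= margin:
        # Within the margin: uptime is ignored and the next criterion applies.
        return None
    # Outside the margin: the unit with the longer uptime wins.
    return "a" if diff > 0 else "b"
```

With a secondary that has been up 200 seconds and a primary whose uptime was just reset, `uptime_winner(200, 0)` returns `None`, so the election falls through to the next criterion. With `margin=10` the same call returns `'a'`, which matches the behaviour described after reducing the margin.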

Toshi_Esumi · ‎08-09-2018

The main issue is that when you restore the monitored interface on the primary unit, it triggers a master election. The election is not stateful and is decided purely on the current conditions. Then obviously the unit that has the highest priority would be elected if override is enabled.

Alexis_G · ‎08-24-2018

I'm not sure if I understand correctly.

If port monitoring is enabled AND an interface that was down comes up on a subordinate unit AND this unit then has more interfaces up than the current primary, this is by-design behaviour (it's normal).

--------------------------------------------

If all else fails, use the force !

ede_pfau · ‎08-24-2018

The algorithm which decides which unit to promote to master is aimed at 2 goals:

1- use the most stable unit as master

2- try to avoid unnecessary role changes

The criteria for determining which unit is more suitable are

- uptime (higher wins)

- serial number (higher wins)

- number of monitored ports which are up (higher wins)

- HA priority (higher wins)

(not necessarily in this order, see the HA chapter in the Handbook).

Setting one unit to HA override breaks this scheme; that unit will almost always become master. HA override just cannot override the number of monitored ports. The cluster will suffer more failovers than necessary when the primary unit fails (in an HA sense) and comes back up.

I used to like the idea that "FGT1" will always be the master. As management is completely transparent I nowadays don't care anymore which unit has which role. Main thing is, the cluster is working, and there are as few failovers / interruptions as possible.

Ede

"Kernel panic: Aiee, killing interrupt handler!"

Toshi_Esumi · ‎08-24-2018

The order is...

when override is disabled:

1. number of up monitored ports > 2. uptime (more than 5 min difference by default) > 3. priority > 4. serial number

when override is enabled:

1. number of up monitored ports > 2. priority > 3. uptime (more than 5 min difference by default) > 4. serial number

Override simply swaps items 2 and 3.
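The two orderings above can be sketched in Python as a chain of tie-breakers. This is only an illustration of the order described in this thread, not FortiOS's actual implementation. The `Unit` class, the `elect` function and all field names are hypothetical, and "higher serial number wins" is taken from the earlier post rather than verified.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    ports_up: int   # monitored ports currently up
    uptime: int     # seconds since the HA uptime was last reset
    priority: int   # HA device priority
    serial: str     # serial number, compared as a string


def elect(a: Unit, b: Unit, override: bool, margin: int = 300) -> Unit:
    """Pick the primary from two units using the order given in the thread."""
    def by_ports() -> int:
        return a.ports_up - b.ports_up

    def by_uptime() -> int:
        diff = a.uptime - b.uptime
        # Uptime only counts when the difference exceeds the margin.
        return diff if abs(diff) > margin else 0

    def by_priority() -> int:
        return a.priority - b.priority

    def by_serial() -> int:
        return (a.serial > b.serial) - (a.serial < b.serial)

    # Override swaps the second and third criteria.
    middle = [by_priority, by_uptime] if override else [by_uptime, by_priority]
    for criterion in [by_ports, *middle, by_serial]:
        result = criterion()
        if result != 0:
            return a if result > 0 else b
    return a  # identical on every criterion; not expected in practice
```

For example, take a high-priority unit whose uptime was just reset and a low-priority unit that has been up for an hour, both with the same number of monitored ports up. `elect(..., override=False)` keeps the low-priority unit as primary, while `override=True` hands the role back to the high-priority unit.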