Johan_Lysen
New Contributor

Heartbeat device(interface) down

Hi,

I have a cluster that seems to work OK, but I still get these messages:

----------------------------------------------------
Message meets Alert condition
The following critical firewall event was detected: Critical Event.
 date=2011-09-01 time=14:34:00 devname=SE-OSD-FGT-001 device_id=FGT60C3G10013303 log_id=0105037901 type=event subtype=ha pri=critical vd="root" msg="Heartbeat device(interface) down" ha_role=master hbdn_reason=neighbor-info-lost devintfname=dmz

Message meets Alert condition
The following critical firewall event was detected: Critical Event.
 date=2011-09-01 time=14:34:00 devname=SE-OSD-FGT-001 device_id=FGT60C3G10013303 log_id=0105037901 type=event subtype=ha pri=critical vd="root" msg="Heartbeat device(interface) down" ha_role=master hbdn_reason=neighbor-info-lost devintfname=internal4
----------------------------------------------------

FGT60C-4.00-FW-build458-110627

The HA settings look like this on the "primary":

 config system ha
 set group-id 7
 set group-name "FGT-HA"
 set mode a-p
 set hbdev "dmz" 100 "internal4" 50
 set override disable
 set priority 150
 set monitor "internal1" "internal2" "internal3" "wan2"
 end

Any ideas?
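For anyone who wants to count or group these alerts, the key=value layout of the log lines above can be split with a few lines of Python. This is only a minimal sketch: the function name `parse_log_line` is made up for illustration, and it assumes that values are either bare tokens or double-quoted strings, which is all that the two sample lines show.

```python
import re

# A value is either a double-quoted string or a bare token without spaces.
_PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')

def parse_log_line(line: str) -> dict:
    """Split a key=value log line into a dict (hypothetical helper).

    Surrounding quotes and stray padding inside the quotes are stripped.
    """
    return {key: value.strip('"').strip() for key, value in _PAIR.findall(line)}

# Sample line taken from the alert mail quoted in the post.
line = ('date=2011-09-01 time=14:34:00 devname=SE-OSD-FGT-001 '
        'device_id=FGT60C3G10013303 log_id=0105037901 type=event subtype=ha '
        'pri=critical vd="root" msg="Heartbeat device(interface) down" '
        'ha_role=master hbdn_reason=neighbor-info-lost devintfname=dmz')

event = parse_log_line(line)
print(event["devintfname"], event["hbdn_reason"], event["msg"])
```

Grouping the parsed events by `devintfname` and `time` would show whether both heartbeat interfaces always drop in the same second, as they do in the two messages quoted here.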

Johan Lysen Consulting AB Johan Lysen, Johan@Lysen.nu Byvagen 87, 832 46 FROSON Mobile: +46 70 6009221

ede_pfau
SuperUser

Hi,

from what it looks like, the master has lost connectivity on both HA links simultaneously ('dmz' and 'internal4'). Did you observe that the cluster has failed over?

As for the reason I can only guess...
- both physical connections have failed (i.e. were pulled) - quite unlikely
- the master unit failed completely
- FortiOS error

You're running 4.3.1, which is daring IMO. As long as you don't find any other indication I'd bet on FortiOS failure.

Some guesses:
- Can you observe signs that CPU and/or memory usage is exceedingly high?
- Did a signature update happen shortly before the HA failure?
- If the master unit is still alive, is the HA info synched?

If the HA master has been demoted to slave now, you may reboot the unit without affecting the (live) network it is in. Depending on the HA settings it will fail over to master again after rebooting, or stay standby.

IMHO it is only worth opening a support case if the behaviour is repeatable.

Ede

"Kernel panic: Aiee, killing interrupt handler!"
Johan_Lysen
New Contributor

Hi,

There is no failover involved, and diag sys top doesn't show high CPU. We get this issue, say, 1-10 times each day. No ticket created yet...

ede_pfau
SuperUser

OK, so the cluster just detects that HB packets were lost, but the threshold is high enough to prevent a failover. You can now:

- Enlarge the interval the cluster members will wait until they detect a HB packet loss:
 config system ha
 set hb-lost-threshold 6
 set hb-interval 4
 end
  This increases the hb-interval from 200 ms to 400 ms. A total of 6 missed packets leads to a failover, so there's a gap of 2.4 seconds until it triggers.
- Disable session-pickup if the unit is too busy. This will decrease the traffic load on the HA link substantially. Whether you can do without it depends on the type of traffic you expect; if it's mainly HTTP you can disable session pickup.
- Stop monitoring all other interfaces. The loss of the HA heartbeat will take care of a device failure. If you absolutely must monitor a link, choose just one, and traffic on it should not be too heavy.
- Downgrade to 4.2.x if available for the 60C. This is your weakest option IMHO.

I assume that the HA link is made by a simple TP cable and not via a switch. Switches can sometimes stumble over the Ethernet type that the HA traffic protocol uses.
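The 2.4 second figure above is simply the heartbeat interval multiplied by the lost-heartbeat threshold. A small sketch of that arithmetic follows; the function name is hypothetical, and it assumes that hb-interval is counted in units of 100 ms, which is what the 4 = 400 ms statement in this reply implies.

```python
def heartbeat_timeout_ms(hb_interval: int, hb_lost_threshold: int) -> int:
    """Time without heartbeats before the peer is declared lost, in ms.

    Assumption: hb_interval is in units of 100 ms (4 means 400 ms).
    """
    return hb_interval * 100 * hb_lost_threshold

# Values suggested in the reply: hb-interval 4, hb-lost-threshold 6.
print(heartbeat_timeout_ms(4, 6))  # 2400 ms, i.e. 2.4 seconds

# Previous 200 ms interval; a threshold of 6 is assumed here for comparison.
print(heartbeat_timeout_ms(2, 6))  # 1200 ms
```

Raising either value widens the window in the same way, so it is the product that decides how long a busy unit may stay silent before its peer reacts.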

Ede

Johan_Lysen
New Contributor

Hi, and thanks for the fast answers.

I have made the hb-lost-threshold/hb-interval change, and also reduced the number of monitored interfaces to only two, one per switch tier (internal, internet), so that we can detect the loss of the external main internet switch or of the internal main network switch and fail over.

Yes, we have a crossed TP cable on the DMZ port for HA traffic.

No, we don't use session pickup, since the FG60C doesn't have enough main CPU resources for it. When we enable session pickup we get 100% CPU usage when hitting the unit with >~100 Mbps of traffic. When we disable session pickup, that issue is gone.

Johan_Lysen
New Contributor

Hi again,

There is more and more evidence pointing to some issue with logging, and all the other issues are a consequence of that.

- miglogd runs at 25-50% CPU on average and makes all other tasks "high"; even login to the WebGUI can be "down" for 15 minutes sometimes.
- Commands like "show log ?" hang the CLI.
- "ha-device-lost" is probably because there is no CPU left to run hatalk on.
- If I disable all logging and do a fresh restart, everything works pretty nicely for a while (days).

There is a ticket created with Fortinet support, but... no

deltasoft

Hi Johan,

I have exactly the same problem. Any news about feedback from Fortinet support?

2 x FGT60B, 4.0 MR1 patch 10

Thanks a lot
Bye Gianf
Carl_Wallmark
Valued Contributor

Hi Johan,

I would stay away from MR3; it's not stable at all. I have seen memory leaks, log issues, etc. I have heard Patch 2 will be out within weeks.

FCNSA, FCNSP
---
FortiGate 200A/B, 224B, 110C, 100A/D, 80C/CM/Voice, 60B/C/CX/D, 50B, 40C, 30B
FortiAnalyzer 100B, 100C
FortiMail 100,100C
FortiManager VM
FortiAuthenticator VM
FortiToken
FortiAP 220B/221B, 11C

Johan_Lysen
New Contributor

Why is it so hard to release something stable?

Carl_Wallmark
Valued Contributor

We have been asking the same for a long time. A rule of thumb: stay one MR release behind the latest. I'm on 4.2.8, and it's very stable.
