m0j0
New Contributor III

60E HA Configuration - Cluster won't form, both units showing as master

I'm setting up a pair of 60Es in HA, but I'm unable to get the cluster to form: both units think they are master.  The units were factory reset and both upgraded to 5.6.6.  I issued the following on the "primary":

config system dhcp server
delete 1
end
config firewall policy
delete 1
end
config system session-helper
delete 13
end
config system virtual-switch
delete internal
end

config system global
set hostname "ussv4gvlfw1-1"
end
config system ha
set group-name "ussv4gvlfwcluster1"
set mode a-p
set password DktJSnrGZGTVSF7v
set hbdev "internal6" 200 "internal7" 100
set override disable
set priority 110
set monitor "internal1"
end

 

And this on the "secondary":

config system dhcp server
delete 1
end
config firewall policy
delete 1
end
config system session-helper
delete 13
end
config system virtual-switch
delete internal
end
config system global
set hostname "ussv4gvlfw1-2"
end
config system ha
set group-name "ussv4gvlfwcluster1"
set mode a-p
set password DktJSnrGZGTVSF7v
set hbdev "internal6" 200 "internal7" 100
set override disable
set priority 100
set monitor "internal1"
end
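As a side note (assuming the 5.6 CLI), a quick way to watch the cluster negotiate from each console is the pair of commands below; "get system ha status" reports more detail than "diag sys ha status", including the heartbeat devices and, once the cluster forms, both members:

get system ha status
diagnose sys ha status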

 

I've connected the two internal6 interfaces and the two internal7 interfaces to each other, and I have link lights (the CLI shows them all at 1000Mbps full duplex); however, the HA light on both units is orange, and I get the following from a "diag sys ha status":

ussv4gvlfw1-1 # diag sys ha status
HA information
Statistics
        traffic.local = s:0 p:159174 b:23507717
        traffic.total = s:0 p:159174 b:23506842
        activity.fdb = c:0 q:0

Model=60, Mode=2 Group=0 Debug=0
nvcluster=1, ses_pickup=0, delay=0

[Debug_Zone HA information]
HA group member information: is_manage_master=1.
FGT60ETK18052822: Master, serialno_prio=0, usr_priority=110, hostname=ussv4gvlfw1-1

[Kernel HA information]
vcluster 1, state=work, master_ip=169.254.0.1, master_id=0:
FGT60ETK18052822: Master, ha_prio/o_ha_prio=0/0

ussv4gvlfw1-2 # diag sys ha status
HA information
Statistics
        traffic.local = s:0 p:67 b:27520
        traffic.total = s:0 p:67 b:27520
        activity.fdb = c:0 q:0

Model=60, Mode=2 Group=0 Debug=0
nvcluster=1, ses_pickup=0, delay=0

[Debug_Zone HA information]
HA group member information: is_manage_master=1.
FGT60ETK18054479: Master, serialno_prio=0, usr_priority=100, hostname=ussv4gvlfw1-2

[Kernel HA information]
vcluster 1, state=work, master_ip=169.254.0.1, master_id=0:
FGT60ETK18054479: Master, ha_prio/o_ha_prio=0/0

 

I thought the cluster password might have pasted incorrectly on one (or both) of the units, so I did a complete factory reset of the secondary unit, reapplied the same initial config commands, and manually re-entered the cluster password on both units to ensure they were the same.  Still no joy.
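For anyone checking the same thing: the password itself can't be displayed, but (assuming the 5.6 CLI) the rest of the HA settings on each unit can be compared side by side, which at least rules out a typo in the group name, mode, or heartbeat devices:

get system ha
show system ha

"get" prints every HA value including defaults; "show" prints only the non-default ones.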

 

I've set essentially the same configuration on two other new 60E units in a different location and they seem to be working fine.  The only difference is that I didn't upgrade the software on those units, so they're still running 5.6.4.  The problem I have is that I'm working from a different country, so I'm relying on remote hands and remote console connections to make changes.  I'm not sure how to troubleshoot this further.
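One more remote-friendly check worth noting here (a sketch, assuming the 5.6 sniffer syntax): the built-in sniffer can show whether heartbeat frames are actually arriving on a heartbeat port, which separates a cabling problem from an HA configuration problem. On each unit:

diagnose sniffer packet internal6 '' 6 20

Verbosity 6 prints the Ethernet headers and interface names, and 20 caps the capture; FGCP heartbeats are non-IP layer-2 frames (ethertype 0x8890, if I recall correctly), so if nothing appears at all the two ports aren't really seeing each other.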

 

 


3 Replies
Toshi_Esumi
Esteemed Contributor III

It must be something to do with the heartbeat connections. You're sure internal6<->internal6 and internal7<->internal7 are connected (not 6<->7, 7<->6), right? To simplify, disconnect internal7 and use only internal6. Then check "get sys ha status", which shows more info than "diag sys ha status".

If something is wrong with those physical connections, it would show you an error/warning with "hbdev" or something similar at the beginning of the output.

Also, get into both units through the console (you need two PCs, or one PC with two USB serial adapters) and run "diag debug app hatalk -1" on both sides. You would probably see each unit trying to communicate with the other end but getting nothing back.
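For reference, the hatalk debug prints nothing until the debug engine is switched on; roughly (assuming the 5.6 CLI):

diagnose debug application hatalk -1
diagnose debug enable

and, when finished, switch it back off with:

diagnose debug disable
diagnose debug reset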

m0j0
New Contributor III

Thanks for the response.

 

Having only ever connected HA heartbeat interfaces to the corresponding interface on the other device, I've never thought about what happens when they're mismatched, and I've never seen the effect.  If my colleague had indeed connected 6 to 7 and 7 to 6, would that likely be the cause of my issues?

 

The reason I ask is that I jumped on these devices via the remote consoles I have and disabled internal7 on the primary; when I checked the secondary device, internal6 was down.  So it would appear that's what they've done.
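For anyone repeating that test, it amounted to roughly the following (sketched from memory, assuming the 5.6 CLI). On the primary:

config system interface
    edit internal7
        set status down
    next
end

Then on the secondary, "get hardware nic internal6" includes a Link line; seeing internal6 report down when internal7 was disabled on the primary is what pointed at crossed heartbeat cables.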

 

Regards,

Mark

m0j0
New Contributor III

Thanks for that info.  I got remote hands to swap the cables around and up came the cluster.
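For completeness, a quick way to confirm the cluster from either console (assuming the 5.6 CLI) is "get system ha status", which should now list both serial numbers; the subordinate unit's CLI can also be reached from the master with:

execute ha manage 0

where 0 is the peer's index shown by "execute ha manage ?".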

 

Regards,

Mark
