HTTP request timeouts when going through Virtual IP (NAT Reflection, NAT Hairpin)

Author
Eric Lackey
2015/02/16 16:12:16

We have a really strange issue that we've spent a week on without getting anywhere.
 
Here are the specs:
FortiGate 600C running FortiOS 5.2.2 in an Active-Active HA cluster
Connected to Cisco 3560X switches with LACP aggregate interfaces
 
We recently switched from WatchGuard to FortiGate firewalls in our web environment. In our web stack, we use NAT Reflection (or NAT Hairpin) to simplify DNS management: internal servers (CentOS) call out to external VIP addresses that get NAT'd back to servers on the same subnet. I know this isn't a great thing to do, but it has worked for years and we're working to change it soon.
 
As soon as we switched over to the FortiGate, we started getting timeouts on requests that follow that path (App Server>VIP>Web Server). From the App server side, we see it send a SYN but never get a SYN-ACK back. The Web server never receives the packet either. I've run a trace on the FortiGate and I'm pretty sure it does receive the packet, but I can't tell from the trace whether it's responding correctly. The packet seems to get lost somewhere between the App servers and the Web server. The issue is random and does not seem to increase or decrease with load; it might happen once every hundred requests or so. When it happens, the App server eventually retries the request. The retry sometimes hangs again, but the request eventually goes through, sometimes up to 90 seconds later.
 
Here is another thing we've noticed: we've started to see Output Drops on the switch interfaces that connect to the FortiGate. As far as we know, this was not happening before, but we did not monitor it before, so it's hard to know. We've changed cables to rule out bad cables. We've also swapped to the secondary switch and see the same thing. One other thing to note is that this does not affect traffic coming into the Web servers from external. It only affects traffic that takes the NAT hairpin loop.

I suspect it's one of two things:
1) The FortiGate is performing the NAT and then somehow losing the VLAN tag when it puts the packet back out on the network. This might explain why the switch is dropping the packet, if it arrived without a VLAN tag and the switch didn't know what to do with it.
2) It might be some type of MTU issue. Our switches and firewalls are configured with the default MTU of 1500.
 
 
We've tried so many things at this point that we're just about out of ideas. Here is a list of what we've tried. Some of these are based on findings from these forums.
1. Reboot firewalls
2. Shut down secondary firewall so it's running in standalone
3. Run on secondary firewall only
4. Moved policy to top of list
5. Disable vlanforward on aggregate interface and vlan interface
6. Swapped cables between firewalls and switches
7. Enabled send-deny-packet on specific policy
8. Set tcp-mss-sender and tcp-mss-receiver to 1380 on specific policy
9. Set tcp-mss to 1380 on vlan interface and aggregate interface
 
I've tried to include every config I can think of below. I would appreciate help if anyone can think of anything.
 
config system interface
    edit "aggr.webprod.in"
        set vdom "webprod"
        set type aggregate
        set tcp-mss 1380
        set member "port17" "port18"
        set snmp-index 71
    next
end
 
 
config system interface
    edit "vlan.webprod.in"
        set vdom "webprod"
        set ip 172.XXX.XXX.XXX 255.255.0.0
        set allowaccess ping
        set tcp-mss 1380
        set snmp-index 74
        set secondary-IP enable
        set interface "aggr.webprod.in"
        set vlanid 55
            config secondaryip
                edit 1
                    set ip 172.XXX.XXX.XXX 255.255.0.0
                    set allowaccess ping
                next
                edit 2
                    set ip 172.XXX.XXX.XXX 255.255.0.0
                    set allowaccess ping
                next
            end
    next
end
 
config firewall policy
    edit 58
        set srcintf "zone.webint"
        set dstintf "zone.webint"
        set srcaddr "all"
        set dstaddr "vip.http.aaa" "vip.http.bbb" "vip.http.ccc" "vip.https.aaa" "vip.https.bbb" "vip.https.ccc"
        set action accept
        set schedule "always"
        set service "HTTP" "HTTPS" "DNS"
        set logtraffic all
        set match-vip enable
        set tcp-mss-sender 1360
        set tcp-mss-receiver 1360
        set timeout-send-rst enable
        set nat enable
    next
end
 
 
 
 
config firewall vip
    edit "vip.http.aaa"
        set extip 67.XXX.XXX.21-67.XXX.XXX.23
        set extintf "any"
        set portforward enable
        set mappedip "172.XXX.XXX.1-172.XXX.XXX.3"
        set extport 80
        set mappedport 80
    next
    edit "vip.http.bbb"
        set extip 67.XXX.XXX.25-67.XXX.XXX.27
        set extintf "any"
        set portforward enable
        set mappedip "172.XXX.XXX.1-172.XXX.XXX.3"
        set extport 80
        set mappedport 80
    next
    edit "vip.http.ccc"
        set extip 67.XXX.XXX.28-67.XXX.XXX.30
        set extintf "any"
        set portforward enable
        set mappedip "172.XXX.XXX.1-172.XXX.XXX.3"
        set extport 80
        set mappedport 80
    next
    edit "vip.https.aaa"
        set extip 67.XXX.XXX.21-67.XXX.XXX.23
        set extintf "any"
        set portforward enable
        set mappedip "172.XXX.XXX.1-172.XXX.XXX.3"
        set extport 443
        set mappedport 443
    next
    edit "vip.https.bbb"
        set extip 67.XXX.XXX.25-67.XXX.XXX.27
        set extintf "any"
        set portforward enable
        set mappedip "172.XXX.XXX.1-172.XXX.XXX.3"
        set extport 443
        set mappedport 443
    next
    edit "vip.https.ccc"
        set extip 67.XXX.XXX.25-67.XXX.XXX.27
        set extintf "any"
        set portforward enable
        set mappedip "172.XXX.XXX.1-172.XXX.XXX.3"
        set extport 443
        set mappedport 443
    next
end
 
 
#############
Cisco Configuration
#############
 
interface Port-channel10
 description ptn-fw101 webprod-int portchannel
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 50,51,55
 switchport mode trunk
end
 
interface GigabitEthernet0/17
 description fw101 port 17 webprod-int
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 50,51,55
 switchport mode trunk
 logging event bundle-status
 logging event spanning-tree
 spanning-tree portfast trunk
 spanning-tree bpdufilter enable
 channel-group 10 mode active
end
interface GigabitEthernet0/18
 description fw101 port 17 webprod-int
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 50,51,55
 switchport mode trunk
 logging event bundle-status
 logging event spanning-tree
 spanning-tree portfast trunk
 spanning-tree bpdufilter enable
 channel-group 10 mode active
end
#1


    Eric Lackey
    2015/02/16 17:55:17
    Here is a little more detail. We were able to get some additional traces tonight and determined that the firewall receives the packets as soon as the host sends them.
     
     
    ######## This is what a good packet trace looks like
     
    213.037036 vlan.webprod.in in 172.XXX.XXX.4.52661 -> 67.XXX.XXX.22.80: syn 3512656144 
    213.037299 vlan.webprod.in out 67.XXX.XXX.22.80 -> 172.XXX.XXX.4.52661: syn 218656840 ack 3512656145
    213.037300 aggr.webprod.in out 67.XXX.XXX.22.80 -> 172.XXX.XXX.4.52661: syn 218656840 ack 3512656145
    213.037301 port18 out 67.XXX.XXX.22.80 -> 172.XXX.XXX.4.52661: syn 218656840 ack 3512656145
    213.037525 vlan.webprod.in in 172.XXX.XXX.4.52661 -> 67.XXX.XXX.22.80: ack 218656841
    213.037539 vlan.webprod.in in 172.XXX.XXX.4.52661 -> 67.XXX.XXX.22.80: psh 3512656145 ack 218656841
    213.045640 vlan.webprod.in in 172.XXX.XXX.4.52661 -> 67.XXX.XXX.22.80: fin 3512656416 ack 218668089
    213.045882 vlan.webprod.in out 67.XXX.XXX.22.80 -> 172.XXX.XXX.4.52661: fin 218668089 ack 3512656417
    213.045883 aggr.webprod.in out 67.XXX.XXX.22.80 -> 172.XXX.XXX.4.52661: fin 218668089 ack 3512656417
    213.045884 port18 out 67.XXX.XXX.22.80 -> 172.XXX.XXX.4.52661: fin 218668089 ack 3512656417
     
    ######## This is what a bad packet trace looks like
     
    120.071166 vlan.webprod.in in 172.XXX.XXX.4.52661 -> 67.XXX.XXX.22.80: syn 3512656144 
    123.069771 vlan.webprod.in in 172.XXX.XXX.4.52661 -> 67.XXX.XXX.22.80: syn 3512656144
    129.067483 vlan.webprod.in in 172.XXX.XXX.4.52661 -> 67.XXX.XXX.22.80: syn 3512656144
    141.063126 vlan.webprod.in in 172.XXX.XXX.4.52661 -> 67.XXX.XXX.22.80: syn 3512656144
    165.054426 vlan.webprod.in in 172.XXX.XXX.4.52661 -> 67.XXX.XXX.22.80: syn 3512656144
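     
    For anyone trying to reproduce a capture like the two above: output in this shape (an interface name plus an in/out direction on each line) is what the built-in FortiOS sniffer prints at verbosity 4 on the "any" interface. The exact command was not posted, so the line below is only a sketch. The host and port filter are assumptions based on the addresses in the traces, and on a multi-VDOM unit it would need to be run from inside the relevant VDOM.
     
    ```
    diagnose sniffer packet any 'host 172.XXX.XXX.4 and port 80' 4
    ```
     
    A filter on the App server address alone would show only the client-facing side of the hairpin, which matches what the traces above contain. Seeing the translated packet leave toward the Web server would need a broader filter.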
    #2
    vjoshi_FTNT
    2015/02/16 22:42:08
    Hello,

    Since it only fails randomly, the configuration should not be the problem here. However, I would like to know whether you have dual ISPs, and I would also like to see the routing table both when the issue occurs and when it is working, using the command "get router info routing-table database".

    May I know the IP address used in the sniffer filter?

    Also, please get the output of the debug flow commands, which tell you what the FortiGate is doing with a specific request and the reason for dropping it (if it does):

    diag debug reset
    diag debug disable
    diag debug enable
    diag debug flow filter saddr x.x.x.x       --->> Source address from where the connection is initiated (If you do not have too many connections to the server during the test, I recommend using the filter 'daddr with server IP')
    diag debug flow filter dport 80
    diag debug flow show console enable
    diag debug console timestamp enable
    diag debug flow trace start 100

     
    NOTE:
    - Once the commands are run, try to access the server
    - Once you get the output captured, you can disable the debug with the command  #diag debug disable


    Cheers!
    #3
    Eric Lackey
    2015/02/17 13:32:57
    Thanks, I'll try to upload that ASAP.
     
    We discovered one new thing today after sniffing packets. We have 8 application servers that sit behind the Fortigate and they all could be sending many requests up through these VIPs at any given time. We are able to identify the timeout issue easily in Wireshark because when it happens we get a "TCP Port numbers reused" followed by several "TCP Retransmission". What it looks like is that multiple application servers are sending requests around the same time with the same source port.
     
    I could be totally off here, but it seems like the Fortigate is having trouble processing that correctly from a NAT standpoint. This might explain why it only affects Internal>FW>Internal traffic rather than WAN>FW>Internal since traffic from the WAN side would always be coming from a different IP.
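     
    One hedged way to test this theory from the FortiGate side would be to look at the session table while the timeouts are happening. The commands below are standard FortiOS diagnostics, but the filter values are placeholders taken from the traces earlier in the thread, and it is an assumption that the statistics output on 5.2.2 includes a clash counter as it does on other releases.
     
    ```
    diagnose sys session stat
    diagnose sys session filter clear
    diagnose sys session filter dport 80
    diagnose sys session filter dst 67.XXX.XXX.22
    diagnose sys session list
    ```
     
    If the clash value climbs each time a timeout occurs, that would be consistent with two translated sessions colliding on the same source port. It would not prove it, since other conditions may increment the same counter.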
     
     
    #4
    Eric Lackey
    2015/02/17 17:25:44
    We think we finally have this one fixed. We created a Dynamic IP Pool with 100 IP addresses and selected that IP pool on the policy rather than "Use Outgoing Interface Address". We enabled this IP pool only on the Internal>FW>Internal policy and not on the WAN>FW>Internal policy.
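     
    In CLI terms, the change would look roughly like the sketch below. This is not the exact config that was applied: the pool name "pool.hairpin" is hypothetical, the start and end addresses are left as placeholders, the overload type is an assumption (it is the default pool type), and policy 58 is assumed to be the Internal>FW>Internal policy from the first post.
     
    ```
    config firewall ippool
        edit "pool.hairpin"
            set type overload
            set startip <first pool address>
            set endip <last pool address>
        next
    end
    config firewall policy
        edit 58
            set nat enable
            set ippool enable
            set poolname "pool.hairpin"
        next
    end
    ```
     
    With a pool, source NAT can spread the internal hosts across many addresses instead of one interface address, so two hosts that pick the same source port are less likely to end up with the same translated address and port.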
     
    As soon as we made this change, the timeouts stopped. The only thing I can determine is that there is a bug in the FortiGate where it cannot properly handle this scenario when there are several (we have 8) internal hosts using the VIP. The WatchGuard firewalls that we had in place before did not have this problem, and the firewall was the only thing that changed in our setup.
     
    Just to summarize: the issue occurs in a NAT Reflection scenario where multiple internal servers send traffic to a VIP that forwards traffic back to internal servers on the same subnet. Eventually, multiple servers will send a request from the same source port number within a few seconds of each other, and that can cause the second request to time out. When following a trace, we can see that the server sends a SYN packet that appears to make it through the FW and to the other server, but no ACK is ever returned. We then see multiple SYN retransmissions until it finally times out.
     
    #5
    rdy2go
    2016/07/18 11:01:15
    Did you ever hear back from Fortinet on this? This is still an issue on 5.2.7. 
    #6