I'll weigh in here...
SIP presents two problems for firewalls, broadly speaking:
(1) SIP is used to set up a media stream on other high, random ports, and that stream may be initiated inbound. Unless you port forward the whole defined range (ports 10000-20000, e.g.), effectively opening a wide swath of the firewall to traffic, the firewall needs a way to intelligently open "pinholes" for this media stream while the SIP signalling indicates a call is actively using those ports, then close them when the call completes.
(2) SIP carries information about the phones, PBX, etc. at Layer-7 which may need to be modified in transit so that return traffic is routable. If your PBX and phones have private IPs, they will identify themselves with those private IPs in a number of places in the SIP headers. If a remote phone wants to respond to an INVITE, all it has are the private IPs at Layer-7. The firewall may be required to change this information to the public IP of the outgoing interface if no one else is doing it already.
"IF NO ONE ELSE IS DOING IT ALREADY"...
Many phone systems will take care of these functions by using a STUN server or static settings to discover the public IP a phone or PBX is behind and modify the Layer-7 information before it even hits the firewall. Many systems will also send an outbound packet with the source and destination ports the media stream will use when initiated, and in this way, "punch a hole" through the firewall. If either or both of these conditions are met by the phone system, there is no need for the firewall to get involved.
The session helper is turned on by default on the FortiGate, which means it will listen for all UDP port 5060 traffic (this can be changed under 'config system session-helper'). The kernel can parse SIP traffic: it is aware of the structure, and can look for Media Port or Requested Port information. It will also examine the From:, Contact:, Via:, and other fields for the IPs that devices identify themselves with. If the policy allowing the traffic performs NAT, it can replace the original private IP with its own public IP.
Pinholes and header fix-up: those are essentially the two functions of the session helpers, which, again, are on by default.
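To make those two functions concrete, here is a minimal Python sketch of the idea. It is not how the FortiGate implements it; the sample INVITE is trimmed, the addresses (192.168.1.50, 203.0.113.10, 198.51.100.20) are made up for illustration, and the function names are hypothetical. A real helper would also recompute Content-Length when the SDP body changes size, which this skips.

```python
import re

# Hypothetical addresses, for illustration only.
PRIVATE_IP = "192.168.1.50"   # phone behind the firewall
PUBLIC_IP = "203.0.113.10"    # firewall's outgoing interface

# Trimmed sample INVITE with an SDP body (not a complete SIP message).
SAMPLE_INVITE = "\r\n".join([
    "INVITE sip:bob@198.51.100.20 SIP/2.0",
    "Via: SIP/2.0/UDP 192.168.1.50:5060;branch=z9hG4bK776asdhds",
    "From: <sip:alice@192.168.1.50>;tag=1928301774",
    "To: <sip:bob@198.51.100.20>",
    "CSeq: 1 INVITE",
    "Contact: <sip:alice@192.168.1.50:5060>",
    "Content-Type: application/sdp",
    "",
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.168.1.50",
    "s=-",
    "c=IN IP4 192.168.1.50",
    "t=0 0",
    "m=audio 10000 RTP/AVP 0",
])


def media_ports(sip_message):
    """Return the ports advertised on SDP m= lines (the pinhole candidates)."""
    return [int(port) for port in
            re.findall(r"^m=\w+ (\d+) ", sip_message, flags=re.MULTILINE)]


def fix_up(sip_message, private_ip, public_ip):
    """Replace the private IP with the public IP everywhere it appears."""
    # The digit guards stop 192.168.1.5 from matching inside 192.168.1.50.
    pattern = re.compile(r"(?<!\d)" + re.escape(private_ip) + r"(?!\d)")
    return pattern.sub(public_ip, sip_message)


if __name__ == "__main__":
    print("Pinhole candidates:", media_ports(SAMPLE_INVITE))
    print(fix_up(SAMPLE_INVITE, PRIVATE_IP, PUBLIC_IP))
```

If the phone system has already rewritten the addresses itself (via STUN or static settings), a second rewrite like this on the firewall is exactly the kind of unnecessary modification that causes trouble.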
The VoIP profile (also referred to as an Application Layer Gateway, or ALG) performs the functions of the session helper, but also allows much more granular control over security and "tweaking" features. The default profile will work for many situations, but not all.
Because there are so many possible topologies, configurations, and vendors, the session helpers and VoIP profiles may not be required, and may cause more trouble than they are worth: headers modified when not necessary; pinholes opened for the wrong ports.
Start with the defaults. If they don't work, turn the helper completely off, and if *that* doesn't work, run verbose packet sniffs and compare what arrives at the FortiGate with what leaves it. See if the original addresses are private or public. Check to see how the packets are modified on their way out. Verbose sniffs can be run in the context of a TAC case, so that you have help to run them and expert analysis.
This would be the rough syntax (you have to run two separate sessions in order for the output to work in Wireshark):
diag sniffer packet internal "host w.x.y.z" 6 0 a
diag sniffer packet wan1 "host w.x.y.z" 6 0 a
-Don't filter for port 5060 if you want to also see the RTP stream - it will use different ports
-Filter for an unchanging, public IP for w.x.y.z, like the IP of the trunk or VoIP provider. That way, when the traffic is NATted, you will see the packet both before and after translation.
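Once you have both captures, the private-versus-public check can be scripted. Below is a rough Python sketch, assuming you have copied the text of a SIP message out of Wireshark for each side (it does not read the raw sniffer hex dump). The function name and the sample strings are hypothetical, and "private" here means the RFC 1918 ranges only.

```python
import ipaddress
import re

# RFC 1918 private ranges.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Four dot-separated groups of 1-3 digits, not embedded in a longer number.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def private_ips(sip_text):
    """Return the distinct RFC 1918 addresses found in a SIP message, in order."""
    found = []
    for candidate in IPV4_CANDIDATE.findall(sip_text):
        try:
            addr = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # looked like an IP but is not valid (e.g. 999.1.1.1)
        if any(addr in net for net in RFC1918) and candidate not in found:
            found.append(candidate)
    return found


if __name__ == "__main__":
    # Hypothetical excerpts of the same message on each interface.
    internal_side = "Contact: <sip:alice@192.168.1.50:5060>\r\nc=IN IP4 192.168.1.50"
    wan_side = "Contact: <sip:alice@203.0.113.10:5060>\r\nc=IN IP4 203.0.113.10"
    print("internal:", private_ips(internal_side))
    print("wan1:    ", private_ips(wan_side))
```

If private addresses still show up in the message as it leaves wan1, nothing (neither the phone system nor the FortiGate) is fixing up the headers. If they are already public on the internal side, the phone system is doing it and the helper is probably not needed.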
This KB article provides the EXE and Perl script for converting the text-based sniff into a PCAP file for Wireshark:
Paste the quote contents into the address bar, or else search for 'fgt2eth' at kb.fortinet.com.