On a recent deployment I ran into an issue where everything was working correctly except an external user trying to join or create an Audio Video Conference. The customer had an enterprise edition consolidated configuration behind an F5 Load Balancer. Doing our initial sip traces we were able to see a 500 error when the external user would try to join or create a conference.
Start-Line: SIP/2.0 500 The server encountered an unexpected internal error
ms-diagnostics: 3080;reason="Internal Error: AddUser failed";source="front end server fqdn"
I removed most of the trace except the important parts. What you will see in the above trace is the SIP 500 error, and then at the bottom the AddUser is failing on the front end server. This exact symptom with an enterprise pool behind load balancers points to this KB article: http://support.microsoft.com/kb/946091. This fix explains an issue with the load balancer being in DNAT mode instead of SNAT mode. However our F5 was using SNAT for all of the OCS traffic, and the pool setting was correctly set to not be in DNAT mode.
Running more traces another error popped up which was a SIP 403 Forbidden:
SIP/2.0 403 Forbidden
SERVER: RTCC/126.96.36.199 MRAS/2.0
ms-edge-proxy-message-trust: ms-source-type=EdgeProxyGenerated;ms-ep-fqdn=Edge Internal interfacefqdn;ms-source-verified-user=verified
Ms-diagnostics: 9006;source="Edge Internal interfacefqdn";reason="Forbidden";component="Media Relay Authentication Service"
This basically means that the front end server is not able to get media relay authentication from the edge server A/V internal interface.
If this is happening you will also see an error in the event logs:
Log Name: Office Communications Server
Source: OCS Audio-Video Conferencing Server
Date: 9/25/2009 4:12:14 PM
Event ID: 32018
Task Category: (1017)
Computer: FRONT END SERVER FQDN
The Audio-Video Conferencing Server encountered an error when requesting credentials from the A/V Edge Authentication Service.
A/V Authentication Service Service URI sip:EdgeInternalFQDN@swk.pri;gruu;opaque=srvr:MRAS:HqCEupOMck6C3onsDHul1wAA, Reason: The operation has failed. See the exception’s properties as well as the logs for additional information.
Cause: The Audio-Video Conferencing Server cannot communicate with A/V Authentication Service.
Check the A/V Authentication Service is alive and that network connectivity exists.
Connectivity was available through the internal edge VIP as well as each individual edge server’s internal interface. Also, if you ran an A/V Conferencing Validation on each of the front end servers it would succeed on all tests.
I ran through this with PSS and there were two things we discovered. The first potential issue was on the Internal tab setting of the edge server. Per the Microsoft documentation when doing an enterprise deployment the name that should be listed on the “Internal Servers Authorized to Connect to this edge server” setting is the pool FQDN, not each individual front end server. There has been some debate about whether you should add the FQDN of each front end server to this list as well, because we were seeing the front end servers get denied access to the A/V Authentication service we decided to try it anyways.
The other change that was made was in the forest global settings section. On the general tab you specify your internal SIP domains and you check one for the default routing domain. In this case the customer AD domain was different from the SIP domain, both were listed, however the AD Domain was checked as the domain to be used for the default routing. Once we changed that setting to have the SIP Domain as the default routing domain and restarted the services on the front end servers, A/V conferencing started functioning properly.
I am hoping I can remove each setting and try to narrow it down to one ,but either way the internal interface setting has proved to fix some funky issues in deployments, so both of these may want to be set regardless.