Sunday, June 17, 2018

         

                               Calls made after long idle hours are failing 


Recently, a customer reported a call failure issue. Since the scenario and the root cause were unusual, I would like to share that experience here.

Issue Description

Every morning, whenever users try to place calls from their desk phones to the PSTN, the call rings. However, the call gets disconnected as soon as it is answered.

Technical description: In this scenario, for a SIP-based VoIP call flow, the SIP signaling works and the phone rings fine. However, when the called person answers the call, it gets disconnected.
Hence, in this issue, the SIP signaling works fine but the media path always fails.


Lync Platform: Lync 2013 MT platform.

Desk Phone - Yealink

Recent change: Firmware update. At the local site, the firmware was upgraded on the Yealink phones. All the test cases run right after the firmware update were successful.

Phone Firmware versions

Firmware version without the issue:   66.9.0.25

New firmware version (that caused the issue) :   66.9.0.42


Workaround (when you are using the firmware 66.9.0.42)


  • Reboot the phone after several hours of idle time. The issue then does not occur. (OR)
  • Downgrade to another firmware version; in our case it was 66.9.0.25.



Scenario: 

The customer was migrated from an existing PBX to Lync 2013 MT several months ago. The users make calls from Yealink desk phones. The phones had been working fine for the last few weeks. However, users reported that they were never able to make calls in the morning. Once they rebooted the phone, calls worked fine throughout the day. The next morning the same issue returned, and it was again resolved by rebooting the phones.


Troubleshooting:

  • Confirmed that UDP port 3478 (STUN) was allowed in the firewall.
  • Confirmed that the Lync/SfB server was listening on port 3478 for new sessions and that there were no server-related issues.
  • Confirmed that there were no connectivity issues between the phones and the Lync/SfB (Skype for Business) servers.
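The port check above can be roughly reproduced from a workstation with a few lines of Python. This is a minimal sketch, assuming the RFC 5389 STUN header layout; the function names are hypothetical, and a Lync/SfB Edge server may ignore a bare, unauthenticated request, so silence alone does not prove that the port is blocked.

```python
import os
import socket
import struct

STUN_PORT = 3478
BINDING_REQUEST = 0x0001
MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    # Header: type (2 bytes), body length (2), magic cookie (4), transaction ID (12)
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, transaction_id)


def probe_stun(host, port=STUN_PORT, timeout=2.0):
    """Send one Binding Request; return True if any UDP reply arrives in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_binding_request(), (host, port))
            sock.recvfrom(2048)
            return True
        except OSError:
            # Covers timeouts, ICMP port-unreachable errors and name resolution failures
            return False
```

Calling probe_stun with the address of the server (or of any public STUN server) gives a quick yes/no answer; a Wireshark capture is still needed to see what the reply actually contains.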

Issue: After troubleshooting with the customer, it became clear that the issue occurred only on phones running the latest firmware, 66.9.0.42.


Network trace collection and Analysis:

To collect the Wireshark traces, I connected to the Yealink phone using its IP address and captured both the non-working and the working scenario.

The following are snapshots of the network trace collected while calls were failing (after several hours of idle time). First, let us look at the trace from the phone on a morning when calls were failing. In the failure trace we could see several STUN binding requests but no successful response. Moreover, there were several strange error responses, such as:


1. “Allocate Error Response error-code: 401 (Unauthorized)” – the request did not contain a Message-Integrity attribute.

Sometimes the requests failed with other STUN errors, such as:

2. “Allocate Error Response Code: 436” – the username supplied in the request is not known.

So it was evident that the issue was related to the STUN requests and the lack of successful STUN responses.
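Wireshark decodes these error codes for you, but it helps to know where they sit in the packet. The sketch below pulls the ERROR-CODE attribute out of a raw message, assuming the standard STUN/TURN layout (a 20-byte header followed by type-length-value attributes padded to 4 bytes). The helper names are made up for illustration, the sample packet is synthetic rather than taken from the customer trace, and Lync's own TURN variant may differ in some details.

```python
import struct

ALLOCATE_ERROR_RESPONSE = 0x0113  # Allocate method, error-response class
ERROR_CODE_ATTR = 0x0009


def parse_error_code(message):
    """Return (message_type, error_code, reason) from a STUN/TURN message.

    error_code and reason are None when no ERROR-CODE attribute is present.
    """
    msg_type, body_len = struct.unpack("!HH", message[:4])
    body = message[20:20 + body_len]  # attributes follow the 20-byte header
    offset = 0
    while offset + 4 <= len(body):
        attr_type, attr_len = struct.unpack("!HH", body[offset:offset + 4])
        value = body[offset + 4:offset + 4 + attr_len]
        if attr_type == ERROR_CODE_ATTR and attr_len >= 4:
            # Value: 2 reserved bytes, class (hundreds digit), number (0-99), reason text
            error_class = value[2] & 0x07
            number = value[3]
            reason = value[4:].decode("utf-8", errors="replace")
            return msg_type, error_class * 100 + number, reason
        # Attributes are padded to a multiple of 4 bytes
        offset += 4 + ((attr_len + 3) // 4) * 4
    return msg_type, None, None


def build_error_response(code, reason):
    """Build a synthetic Allocate Error Response, for demonstration only."""
    text = reason.encode("utf-8")
    value = struct.pack("!HBB", 0, code // 100, code % 100) + text
    padding = b"\x00" * (-len(value) % 4)
    attr = struct.pack("!HH", ERROR_CODE_ATTR, len(value)) + value + padding
    header = struct.pack("!HHI12s", ALLOCATE_ERROR_RESPONSE, len(attr),
                         0x2112A442, b"\x01" * 12)
    return header + attr
```

Feeding build_error_response(401, "Unauthorized") into parse_error_code returns the message type 0x0113 together with the code 401 and its reason phrase, which is the same information Wireshark shows in the packet details.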

While searching the Internet for the STUN error messages (from the Wireshark trace), I understood that the issue was related to STUN responses or ICE keep-alives.







So, I tried to collect a Wireshark trace for a working scenario: I rebooted the phone and captured the trace while calls were working fine after the reboot.

After the reboot, the phones connected to the same Lync server, yet calls started working. This implies that the Lync server and the path between the phones and the server were fine: after the reboot, the phone sent out new STUN binding requests and immediately received a STUN Allocate Success response for the media flow.





Hence, I contacted Yealink support and provided the Wireshark traces for the working and non-working scenarios.

Yealink support checked the Wireshark traces I provided and confirmed the issue after performing tests at their end. The Yealink product development team then worked on a hot-fix and delivered it in a couple of days, and it fixed our problem.

Root cause of the issue: According to Yealink support, the root cause was that "the phones don’t update the STUN user information in time, new firmware hot-fix would let the phones update the STUN information every 10 minutes."
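To make the idea behind the fix concrete, here is a small illustration of refreshing stored credentials once they are older than a fixed interval. It is not Yealink's implementation: the class and its names are hypothetical, and it refreshes lazily on use rather than on a background timer. Only the 10-minute figure comes from the support statement above.

```python
REFRESH_INTERVAL_SECONDS = 10 * 60  # the 10-minute figure quoted by Yealink support


class StunCredentials:
    """Tracks when relay credentials were last fetched (hypothetical helper)."""

    def __init__(self, fetch, now):
        self._fetch = fetch      # callable that returns fresh credentials
        self.value = fetch()
        self.fetched_at = now

    def get(self, now):
        """Return credentials, fetching new ones once they are older than the interval."""
        if now - self.fetched_at >= REFRESH_INTERVAL_SECONDS:
            self.value = self._fetch()
            self.fetched_at = now
        return self.value
```

With a scheme like this, a phone that has been idle overnight would fetch fresh credentials before the first call of the morning instead of presenting ones the server no longer recognizes.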

A quick word about Yealink support in this case:

I must admit that Yealink support was great in this case, and I say that having worked with several other UC phone vendors. Earlier, when I faced similar firmware issues with other Microsoft UC vendors, dealing with their support was time-consuming and frustrating.

The other premier UC vendor for Microsoft was in total denial for months rather than accepting that there were issues with their firmware. In those instances, not only did I have to wait several months for a fix, but the vendor also took several months even to acknowledge the issue in their product. Yealink support, on the other hand, was very quick to confirm the issue and to provide a hot-fix. So, a big thank you to Yealink support :-)

Lessons learnt:


From this issue we learned that there is one more scenario to test after a firmware update. The takeaway is that you need to test the call flow after long idle hours as well (at least 12 hours after the desk phone was last rebooted).

Monday, June 4, 2018



                          SIP Back-to-Back User Agent Role [Signaling B2BUA Role]

Recently, while I was working on a SIP call flow issue, we had to work with a custom-built application. Unfortunately, there was no documentation or diagrams available to help us understand it. Later, while discussing the application, we learned that it was designed for specific security-related purposes, such as masking the caller's identity (before the call leaves the network) and identifying and tearing down idle sessions.


From the SIP logs that we collected from that application, we realized that the SIP call flow looked different from that of the predominant SIP server roles (SIP Registrar, SIP Proxy or SIP Redirect server) that we were familiar with. Moreover, it was not an SBC connecting to the PSTN either. Instead, it was a SIP B2BUA. Since I have worked mostly on Microsoft UC products, the only B2BUA I was aware of was the Mediation Server role in a Microsoft Lync or SfB infrastructure.
What is a Mediation Server? For readers who are new to the Microsoft UC platform, the Mediation Server is considered the last point of contact in the Lync/SfB environment before it communicates with the telephony world for audio, whether for inbound or outbound VoIP calls from/to the Public Switched Telephone Network (PSTN). Later, while learning about B2BUAs, I understood that there are several categories within B2BUA.

So, I started searching and reading about B2BUAs on the Internet, and I would like to share some of the information I gathered while trying to understand the functions of a SIP B2BUA.


Back to Back User Agent:

What is a SIP B2BUA Role ?

Back to Back User Agent (B2BUA) is the logical combination of a UAS and UAC.

UAS :    User  Agent Server.

UAC :    User Agent Client.

In SIP deployments, there are several kinds of Back to Back User Agents (B2BUA). So, it is very important to understand the different types and what a B2BUA role can or cannot do in a SIP infrastructure.

Note that Back to Back User Agents are further classified into several types. Again, B2BUA is a SIP server role, not a single system. That is, one server can perform all of the B2BUA roles mentioned below; each role does not need to run on a separate server.
The B2BUA is broadly classified into two categories:
  • Signaling Plane B2BUA
  • Signaling + Media Plane B2BUA
The SIP B2BUA role is in itself a vast topic. So, in this post we will discuss the Signaling Plane B2BUA and leave the Signaling + Media Plane B2BUA for a different post.


What is a Signaling Plane B2BUA ?

A Signaling Plane B2BUA, as its name implies, operates ONLY on the SIP messages and SIP headers and NOT on the media.

Again, there are several classifications within the Signaling Plane B2BUA:

    1)  Proxy-B2BUA
    2)  Signaling-only B2BUA
    3)  SDP-modifying Signaling-only B2BUA


     1)  Proxy B2BUA: (REPLACES only the VIA: and Record-Route: SIP Headers)

The Proxy B2BUA maintains sufficient SIP dialog state to generate in-dialog SIP messages on its own when required. Once the Proxy B2BUA has generated an in-dialog request of its own, it can also MODIFY the CSeq: header of later requests in that dialog, because its own request has used up a sequence number.

An example is a B2BUA that sends BYE requests in order to tear down a dead SIP session.

So, what are the SIP headers that a SIP Proxy B2BUA can modify ?

The Proxy B2BUA role can only modify the Via: and Record-Route: SIP header fields.

What SIP headers can a SIP Proxy B2BUA not modify ?

The Proxy B2BUA role does not modify the To:, From: and Contact: SIP headers.
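The header rules for a Proxy B2BUA can be sketched in a few lines. This is only an illustration of the rule described above, with a hypothetical function name and headers represented as simple (name, value) pairs; a real implementation would also handle compact header forms, multiple Via entries and CSeq rewriting.

```python
def forward_as_proxy_b2bua(headers, own_via, own_record_route=None):
    """Return the header list for the outgoing leg of a Proxy B2BUA.

    headers is a list of (name, value) pairs in message order.
    """
    outgoing = []
    for name, value in headers:
        if name.lower() in ("via", "record-route"):
            continue  # replaced below with the B2BUA's own values
        outgoing.append((name, value))  # To, From, Contact, Call-ID etc. pass through
    replaced = [("Via", own_via)]
    if own_record_route:
        replaced.append(("Record-Route", own_record_route))
    return replaced + outgoing
```

The point to notice is that everything that identifies the call and its parties (To, From, Contact, Call-ID) comes out exactly as it went in.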

2)  Signaling-only B2BUA: (can replace all the SIP headers)

A Signaling-only B2BUA is one that operates at the SIP layer, but in ways beyond those of SIP proxies.

That is, the Signaling-only B2BUA can replace the Contact URI, in addition to modifying or removing the Via and Record-Route headers.

Also, in this role no SIP headers are guaranteed to be copied from the request received on the UAS side to the request generated on the UAC side.

So, if you want to create a completely new call leg between two different systems or networks, you need a Signaling-only B2BUA (like the Mediation Servers in the Lync/SfB infrastructure).

Example:

An application server or a PBX that processes REFER requests locally and then generates a new INVITE on behalf of the REFER’s target.

Another example is a  Privacy Service Proxy performing the ‘Header’ Privacy function.

A Signaling-only B2BUA is useful if you want to hide the caller's identity before the call leaves your SIP infrastructure. It can also help with billing, if you want every call transfer (REFER) to be converted into a new INVITE session. In that case, a Signaling-only B2BUA server sits at the perimeter of the network and makes sure that any call transferred from inside the network to an outside network is treated as a new call. The billing server then only has to consider INVITE sessions with unique Call-IDs, and it charges a call transferred outside the system as a new call.
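As a rough illustration of the "new call leg" idea, the sketch below builds the identifying headers for the outgoing INVITE from scratch instead of copying them from the inbound request. The function name and the dictionary representation are hypothetical; the anonymous From value follows the convention from RFC 3323, and the example deliberately leaves out everything else a real INVITE needs (Via, Max-Forwards, SDP and so on).

```python
import uuid


def new_outbound_leg(inbound, b2bua_contact, hide_identity=True):
    """Build headers for a fresh INVITE leg instead of copying the inbound ones.

    inbound is a dict of header name -> value from the received INVITE.
    """
    new_tag = uuid.uuid4().hex[:8]
    if hide_identity:
        from_value = f'"Anonymous" <sip:anonymous@anonymous.invalid>;tag={new_tag}'
    else:
        # Keep the caller URI but issue a new tag, since this is a new dialog
        caller_uri = inbound["From"].split(";tag=")[0]
        from_value = f"{caller_uri};tag={new_tag}"
    return {
        "From": from_value,
        "To": inbound["To"].split(";tag=")[0],
        "Call-ID": uuid.uuid4().hex,  # new Call-ID: downstream sees a new call
        "CSeq": "1 INVITE",           # numbering restarts on the new leg
        "Contact": b2bua_contact,     # points at the B2BUA, not the original caller
    }
```

Because the Call-ID is new, a billing system downstream would count the transferred call as a separate call, which is the behaviour described above.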

3) SDP-modifying Signaling-only B2BUA (can modify the SDP in the SIP message).

An SDP-modifying Signaling-only B2BUA operates in the signaling plane only and NOT in the media path. However, it can MODIFY the SDP, so this type of B2BUA is aware of SDP semantics.

Purpose:

This SDP-modifying B2BUA does NOT make changes to the media path. That is, it does not stay in or INSERT itself into the path of the media (like third-party call control servers).

However, it can make SDP changes that affect what is sent on the media plane. For example:

  • It can change the SDP offer, such as removing unsupported codecs, OR
  • It can MERGE two separate media endpoints into one SDP offer.


Certain application servers, SIP PBXs or SIP PSTN gateways act in this role (SDP-modifying B2BUA) so that they can remove unsupported codecs from the SDP.
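Removing codecs from an SDP offer can be illustrated with a short text filter. This is a simplified sketch with a hypothetical function name: it assumes a single audio media section, works on RTP payload type numbers, and does not guard against removing every codec (which would leave an invalid m= line).

```python
def strip_codecs(sdp, allowed_payload_types):
    """Remove payload types not in allowed_payload_types from audio m= lines."""
    allowed = {str(pt) for pt in allowed_payload_types}
    out = []
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # m=audio <port> <proto> <fmt> <fmt> ...
            parts = line.split()
            kept = [pt for pt in parts[3:] if pt in allowed]
            out.append(" ".join(parts[:3] + kept))
        elif line.startswith(("a=rtpmap:", "a=fmtp:")):
            # Drop attribute lines that describe a removed payload type
            payload_type = line.split(":", 1)[1].split()[0]
            if payload_type in allowed:
                out.append(line)
        else:
            out.append(line)
    return "\r\n".join(out) + "\r\n"
```

For example, passing an offer that lists payload types 0, 8, 18 and 101 with an allowed set of 0, 8 and 101 drops G.729 (payload type 18) from the m= line together with its rtpmap line, while the media address and port stay untouched, so the media path itself is not changed.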