Saturday, February 24, 2018

            

        In SIP, what is GRUU and how does it help in Call Transfer?


                   

Most of us have worked on various SIP call transfer issues, and we already know that the SIP REFER method is used for call transfer. So, we are not going to discuss the REFER method here. Instead, today we are going to discuss how a SIP server identifies the exact SIP client and then routes the call to the target location when a user is signed in on multiple devices. To route a call to an end point, the server uses the information in the Contact header. So, let us look at the contents of a Contact: header in detail.

The Contact: header

In a SIP Contact: header you will find the SIP client's IP address, port number, the transport protocol used, the expires value, the instance URN and the GRUU.

For example, when a client sends a SIP REGISTER request, its Contact: header would be similar to the one shown here.

In the SIP client's REGISTER request:

Contact: <sip:10.2.2.210:49872;transport=tls;ms-opaque=cc851bcfca>;methods="INVITE, MESSAGE, INFO, OPTIONS, BYE, CANCEL, NOTIFY, ACK, REFER, BENOTIFY";proxy=replace;+sip.instance="<urn:uuid:DBCB7786-ACCF-5829-8AF1-6928B6C14315>"

Then, the Contact: header in the response from the SIP Registrar server would look like this:

Contact: <sip:10.2.2.210:49872;transport=tls;ms-opaque=cc851bcfca;ms-received-cid=C020900>;expires=7200;+sip.instance="<urn:uuid:dbcb7786-accf-5829-8af1-6928b6c14315>";gruu="sip:yogesh.s@sipdomain.com;opaque=user:epid:hnfL28-sKViK8WkotsFDFQAA;gruu"
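To make the two headers above concrete, here is a small Python sketch that splits a single Contact: header value into its URI and parameters, then checks that the +sip.instance in the request and the one in the response name the same instance (the response echoes it in lowercase, and UUID comparison ignores hex case). It is a simplified parser under stated assumptions: one contact per header, no escaped quotes, and not the full RFC 3261 grammar. Note also that RFC 5627 names the Contact parameters returned by a registrar pub-gruu and temp-gruu; the gruu parameter in this example appears to be a vendor-specific form.

```python
import uuid

# Contact header values copied from the REGISTER request and the registrar's response above.
REQUEST_CONTACT = ('<sip:10.2.2.210:49872;transport=tls;ms-opaque=cc851bcfca>;'
                   'methods="INVITE, MESSAGE, INFO, OPTIONS, BYE, CANCEL, NOTIFY, ACK, REFER, BENOTIFY";'
                   'proxy=replace;+sip.instance="<urn:uuid:DBCB7786-ACCF-5829-8AF1-6928B6C14315>"')
RESPONSE_CONTACT = ('<sip:10.2.2.210:49872;transport=tls;ms-opaque=cc851bcfca;ms-received-cid=C020900>;'
                    'expires=7200;+sip.instance="<urn:uuid:dbcb7786-accf-5829-8af1-6928b6c14315>";'
                    'gruu="sip:yogesh.s@sipdomain.com;opaque=user:epid:hnfL28-sKViK8WkotsFDFQAA;gruu"')


def split_params(text):
    """Split on ';' characters that are outside double quotes and outside <...>."""
    parts, buf, in_quotes, depth = [], [], False, 0
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "<":
            depth += 1
        elif not in_quotes and ch == ">":
            depth -= 1
        if ch == ";" and not in_quotes and depth == 0:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return parts


def parse_contact(value):
    """Return (uri, params) for a single Contact header value."""
    parts = split_params(value)
    uri = parts[0].strip("<>")
    params = {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        # Flag parameters without '=' become True; quoted values lose their quotes.
        params[name.strip().lower()] = val.strip().strip('"') if sep else True
    return uri, params


def instance_uuid(params):
    """Turn +sip.instance="<urn:uuid:...>" into a uuid.UUID so hex case does not matter."""
    return uuid.UUID(params["+sip.instance"].strip("<>"))


_, req = parse_contact(REQUEST_CONTACT)
uri, resp = parse_contact(RESPONSE_CONTACT)
print(uri)
print(resp["expires"], resp["gruu"])
print(instance_uuid(req) == instance_uuid(resp))
```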

Did you notice that the client sent a URN value and the server returned a GRUU value?

Purpose of GRUU:

Normally, an administrator who is new to VoIP systems will ask: why do we need GRUU? Isn't a SIP address (which is unique) coupled with an IP address (which is also unique) sufficient to route to the correct SIP end point? The answer is no. Let us see the reasons.

Reason #1: With the IP address alone, we cannot route to the correct SIP client if the client is behind a NAT. Beyond the NAT limitation, the identifier should also remain the same even when the SIP client connects to a different IP network, which an IP address cannot do.

Reason #2: What if the IP address of the SIP client (which is signed in with a SIP account) changes after a client restart? In this case, the same user with the same SIP client may get a different IP address (for example, if the DHCP lease has expired).

So, we need a mechanism other than the IP address to uniquely identify a SIP client, such that each SIP client (User Agent) instance remains globally unique within the SIP infrastructure.

Hence, we depend on other mechanisms like URN and GRUU in the Contact: header.

Alright! So, what is the purpose of URN and GRUU, and how do they help in a SIP call flow? To understand this, we first need to understand the challenges in a blind call transfer scenario. So, let us consider a scenario and look at the challenges in it.

SIP URI:

We know that an advantage of a SIP system is that it provides a SIP URI which is unique within a SIP domain. Moreover, it allows a user to log in to multiple devices with the same SIP URI. So, consider that I am logged in to a desktop computer, a laptop computer and two mobile devices (a tablet and a mobile phone) with my SIP account, so the same SIP URI is on all four devices. Now, what if I am already on a call with my colleague using the SIP client on my laptop, and I am waiting for a call to be transferred to me by one of my colleagues? In this case, the call should be transferred to the end point where I am already active, shouldn't it? Only then can I attend the call immediately. So, is it simple for a SIP server to find the exact end point where I am already on a call, or is there a challenge in it? Yes, there is: how can a SIP server identify exactly my laptop client and route to it? So, how can we solve this problem?

SIP URI limitations:

Can a SIP URI solve this problem, since it is unique? No! Although the SIP URI is unique within the SIP domain, it only identifies the SIP user account. In this scenario, I am using the same SIP URI on multiple devices. Hence, we have two challenges here.

Firstly, we need a mechanism to uniquely identify each SIP client instance even though I am using the same SIP URI on multiple devices. That is, the SIP client instances running on the desktop, the laptop and the mobile devices have to be differentiated.

Secondly, we also need a way to identify the exact SIP client instance together with the SIP URI that is used on that device. Why? Because the same SIP client can also be used by different users with different SIP URIs. To address these challenges, we use URN and GRUU.
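The first challenge can be pictured as a registrar's location table. In this minimal Python sketch (all instance IDs and addresses are hypothetical), one AOR has four bindings. A lookup by AOR alone returns all of them, so a call sent to the SIP URI would fork to every device; only a lookup that also uses the instance ID singles out the laptop.

```python
from dataclasses import dataclass

AOR = "sip:yogesh.s@sipdomain.com"


@dataclass(frozen=True)
class Binding:
    aor: str          # the user's SIP URI (Address of Record)
    instance_id: str  # +sip.instance URN sent by the device (hypothetical values below)
    contact: str      # where the device can be reached right now (hypothetical addresses)
    device: str       # label for readability only; not part of SIP


BINDINGS = [
    Binding(AOR, "urn:uuid:00000000-0000-4000-8000-000000000001", "sip:10.2.2.11:5061;transport=tls", "desktop"),
    Binding(AOR, "urn:uuid:00000000-0000-4000-8000-000000000002", "sip:10.2.2.210:49872;transport=tls", "laptop"),
    Binding(AOR, "urn:uuid:00000000-0000-4000-8000-000000000003", "sip:10.9.0.7:5061;transport=tls", "tablet"),
    Binding(AOR, "urn:uuid:00000000-0000-4000-8000-000000000004", "sip:10.9.0.8:5061;transport=tls", "phone"),
]


def resolve_aor(aor):
    """Lookup by AOR: every registered contact matches, so the request forks to all of them."""
    return [b.contact for b in BINDINGS if b.aor == aor]


def resolve_instance(aor, instance_id):
    """Lookup by AOR plus instance ID: only the matching device is returned."""
    return [b.contact for b in BINDINGS if b.aor == aor and b.instance_id == instance_id]


print(len(resolve_aor(AOR)), "devices would ring")
print(resolve_instance(AOR, "urn:uuid:00000000-0000-4000-8000-000000000002"))
```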

URN: 

 As per RFC 5626, each SIP client (User Agent) MUST have an Instance Identifier Uniform Resource Name (URN) that uniquely identifies the device. Furthermore, a URN has the following characteristics:
  • Using a URN provides a persistent and unique name for the User Agent instance.
  • It also provides an easy way to guarantee uniqueness within the AOR when a user is signed in on multiple devices.
  • This URN MUST be persistent across power cycles of the device.
  • The instance ID MUST NOT change as the device moves from one network to another.
  • The SIP client is responsible for creating its instance ID.
  • In a soft client, the instance ID is created during the SIP client installation.
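A minimal Python sketch of how a soft client might satisfy these rules: generate a random UUID-based URN (the urn:uuid form used in the Contact headers above) the first time it runs, store it, and reuse it on every later start regardless of its IP address. The file name and location are hypothetical; a real client would keep the value wherever it stores its settings.

```python
import tempfile
import uuid
from pathlib import Path


def load_or_create_instance_id(path):
    """Return a persistent urn:uuid instance ID, creating and saving it only on first run."""
    path = Path(path)
    if path.exists():
        # Later starts (reboots, new networks, new IP addresses) reuse the saved value.
        return path.read_text(encoding="utf-8").strip()
    instance_id = uuid.uuid4().urn  # random version-4 UUID rendered as 'urn:uuid:...'
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(instance_id, encoding="utf-8")
    return instance_id


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as settings_dir:
        id_file = Path(settings_dir) / "sip-instance-id.txt"  # hypothetical location
        first_run = load_or_create_instance_id(id_file)
        after_restart = load_or_create_instance_id(id_file)   # simulated restart
        print(first_run, first_run == after_restart)
```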
Alright! So, would a persistent URN with all the above characteristics solve our problem? Not really! It leaves us with another challenge: what if the same SIP client is used by a different user with a different SIP URI? So, the SIP server actually needs a mechanism to route to the unique SIP client instance together with the SIP URI.
Hence, we use GRUU (Globally Routable User Agent URI).
Next, let us look at GRUU in detail. What is GRUU?

GRUU:


A SIP URI that routes to a specific UA instance is called a Globally Routable User Agent URI (GRUU). That is, we need a globally routable identifier in order to reach each SIP client instance.

Now, let's see how the GRUU is generated.

During the SIP client registration phase, the SIP client sends a REGISTER request to the SIP Registrar server with the +sip.instance parameter carrying its URN value. The SIP client also includes the Supported: header with the value gruu, indicating that it supports GRUU. The SIP Registrar therefore understands that the SIP client supports GRUU and that it has to create one for that specific client instance.
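As an illustration, this Python sketch assembles the kind of REGISTER request described above, with Supported: gruu and the +sip.instance Contact parameter. It only builds the text. The Call-ID, tag and branch values are hypothetical, TLS transport is assumed to match the example Contact, and a real client would also handle authentication and actually send the request.

```python
def build_register(aor, contact_uri, instance_id, call_id, cseq, branch, tag, expires=3600):
    """Assemble a minimal REGISTER that asks the registrar for a GRUU.

    Supported: gruu advertises GRUU support; +sip.instance carries the device's URN.
    Assumes a TLS contact such as 'sip:10.2.2.210:49872;transport=tls'.
    """
    domain = aor.split("@", 1)[1]
    host_port = contact_uri.split(":", 1)[1].split(";", 1)[0]
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/TLS {host_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{aor}>;tag={tag}",
        f"To: <{aor}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} REGISTER",
        "Supported: gruu",
        f'Contact: <{contact_uri}>;+sip.instance="<{instance_id}>"',
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


msg = build_register(
    aor="sip:yogesh.s@sipdomain.com",
    contact_uri="sip:10.2.2.210:49872;transport=tls",
    instance_id="urn:uuid:dbcb7786-accf-5829-8af1-6928b6c14315",
    call_id="a84b4c76e66710",     # hypothetical
    cseq=1,
    branch="z9hG4bK776asdhds",    # hypothetical; RFC 3261 branches start with z9hG4bK
    tag="1928301774",             # hypothetical
)
print(msg)
```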

As per RFC 5627, "The basic unit of reference is the Address of Record (AOR). However, in SIP systems a single user can have a number of user agents (handsets, soft phones, voicemail accounts, etc.) that are all referenced by the same AOR. There are a number of contexts in which it is desirable to have an identifier that addresses a single user agent rather than the group of user agents indicated by an AOR." And it also says that "Every GRUU is associated with a single AOR and a single instance ID. A SIP registrar MUST be able to determine the instance ID and AOR when presented with a GRUU. In addition, the GRUU, like an AOR, resolves to zero or more contacts. While the AOR resolves to all registered contacts for an AOR, a GRUU resolves only to those contacts whose instance ID matches the one associated with the GRUU."
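The two rules in that quote (an AOR resolves to all registered contacts, a GRUU only to the contacts of its instance) can be sketched as a toy location service in Python. The public GRUU here is built the way RFC 5627 describes public GRUUs, by adding a gr parameter whose value is the instance ID to the AOR; the opaque GRUU in the example earlier uses a different, vendor-chosen encoding. Each instance has a single contact in this simplification, and the instance IDs and addresses are hypothetical.

```python
from collections import defaultdict


class Registrar:
    """Toy location service: AOR -> {instance ID: contact}. Not a real SIP stack."""

    def __init__(self):
        self.bindings = defaultdict(dict)

    def register(self, aor, instance_id, contact):
        """Store the binding and return the public GRUU for this AOR + instance."""
        self.bindings[aor][instance_id] = contact
        return f"{aor};gr={instance_id}"

    def resolve(self, uri):
        """An AOR resolves to all contacts; a GRUU resolves only to its instance's contact."""
        aor, sep, instance_id = uri.partition(";gr=")
        contacts = self.bindings.get(aor, {})
        if not sep:  # no gr parameter: plain AOR
            return list(contacts.values())
        # Unknown or unregistered instances resolve to zero contacts.
        return [contacts[instance_id]] if instance_id in contacts else []


registrar = Registrar()
aor = "sip:yogesh.s@sipdomain.com"
registrar.register(aor, "urn:uuid:00000000-0000-4000-8000-000000000001", "sip:10.2.2.11:5061;transport=tls")
laptop_gruu = registrar.register(aor, "urn:uuid:00000000-0000-4000-8000-000000000002", "sip:10.2.2.210:49872;transport=tls")
print(registrar.resolve(aor))          # both contacts
print(registrar.resolve(laptop_gruu))  # only the laptop
```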


GRUU properties:
  • It routes to a specific UA instance.
  • It can be successfully dereferenced by any user agent on the Internet, not just ones in the same domain or IP network as the UA instance to which the GRUU points.

Once the SIP client receives the GRUU from the SIP Registrar server, it uses the GRUU as the contents of the Contact: header field in the non-REGISTER requests and responses that it emits (for example, an INVITE request and a 200 OK response):

Contact: <sip:yogesh.s@sipdomain.com;opaque=user:epid:hnfL28-sKViK8WkotsFDFQAA;gruu>

Also, there are two types of GRUU, a) the public GRUU and b) the temporary GRUU, used for different purposes. When a SIP client refreshes its registration before it expires, the SIP Registrar returns the same public GRUU but creates a new temporary GRUU. When the contact for the instance expires, either through explicit de-registration or timeout, all of the temporary GRUUs become invalid.

NOTE: The SIP client uses one of its temporary GRUUs for anonymous calls (because a temporary GRUU does not reveal the user's SIP address), and uses its public GRUU for all other calls.
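The public and temporary GRUU lifecycle described above can be sketched in Python as follows. The random-token format of the temporary GRUU is a hypothetical choice for illustration; what matters is that it does not expose the AOR, that every REGISTER (including refreshes) yields a new one while the public GRUU stays the same, and that expiry invalidates all temporary GRUUs.

```python
import secrets


class GruuIssuer:
    """Toy model of public vs. temporary GRUUs for one AOR + instance ID."""

    def __init__(self, aor, instance_id, domain):
        self.public = f"{aor};gr={instance_id}"  # same value on every registration
        self.domain = domain
        self.temporary = set()                  # temporary GRUUs handed out while registered

    def on_register(self):
        """Initial REGISTER or refresh: same public GRUU, brand-new temporary GRUU."""
        temp = f"sip:tgruu.{secrets.token_hex(8)}@{self.domain};gr"  # hypothetical format
        self.temporary.add(temp)
        return self.public, temp

    def on_expiry(self):
        """Explicit de-registration or timeout invalidates every temporary GRUU."""
        self.temporary.clear()

    def temporary_is_valid(self, gruu):
        return gruu in self.temporary


issuer = GruuIssuer("sip:yogesh.s@sipdomain.com",
                    "urn:uuid:00000000-0000-4000-8000-000000000002", "sipdomain.com")
pub1, temp1 = issuer.on_register()
pub2, temp2 = issuer.on_register()  # registration refresh
print(pub1 == pub2, temp1 != temp2, issuer.temporary_is_valid(temp1))
issuer.on_expiry()
print(issuer.temporary_is_valid(temp1), issuer.temporary_is_valid(temp2))
```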

How would a SIP Proxy treat a GRUU?

RFC 5627 describes how a SIP proxy dereferences a GRUU. Since it is self-explanatory, I would like to just quote it here: "A GRUU is simply a URI, a UA dereferences it in exactly the same way as it would any other URI. However, once the request has been routed to the appropriate proxy, the behavior is slightly different. The proxy will map the GRUU to the AOR and determine the set of contacts that the particular UA instance has registered. The GRUU is then mapped to those contacts, and the request is routed towards the UA".

Analogy:

If you find this difficult to follow, here is an analogy from the TCP/IP suite that we are all familiar with. In an IP network, although the MAC address is unique and the IP address is also unique within the network, you need both the MAC address and the IP address to reach the correct host, don't you?

Similarly, in SIP:
  • URN is like the MAC address, because the URN does not change across reboots or when the device moves to another network.
  • GRUU is like the IP address (here I mean a dynamic IP address with a DHCP lease time), since it is provided by the SIP servers during the registration phase. Hence, it may change for some sessions (temporary GRUUs do), but it remains unique within the SIP infrastructure.
The SIP proxy is just like the routers and switches in an IP network, which route packets and frames to the correct hosts. Similarly, the SIP proxy has its own mechanism to determine the exact SIP client instance using the GRUU. A request addressed only to the SIP URI (the AOR) is routed to all the contacts registered with that SIP URI. However, once a call transfer request is made from a particular SIP client instance, the SIP proxy can uniquely identify that SIP client instance and route the call accordingly.

Therefore, by using GRUU (which in turn requires the URN) in the Contact: header, a SIP server can uniquely identify and route to the exact SIP client instance. Thus, when a blind call transfer is initiated, the call gets routed to the exact end point (the one from which the request was made), even when a user is signed in on multiple devices with the same SIP URI.
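One way a client could apply this during a blind transfer, sketched in Python under stated assumptions: the transferor is already in a dialog with the target user, so it has seen that user's Contact header. If that Contact is a GRUU (the RFC 5627 gr parameter, or the vendor-style gruu flag seen in the Contact example earlier), the sketch uses it as the Refer-To target so the transferred call lands on that exact device; otherwise it falls back to the AOR, which forks to every registered device. The function names are hypothetical.

```python
def uri_param_names(uri):
    """Names of the parameters on a SIP URI, e.g. 'sip:a@b;x=1;gr' -> {'x', 'gr'}."""
    return {p.split("=", 1)[0].strip().lower() for p in uri.split(";")[1:]}


def transfer_target(remote_contact, aor):
    """Prefer the GRUU learned from the dialog; fall back to the AOR (which forks)."""
    names = uri_param_names(remote_contact)
    if "gr" in names or "gruu" in names:
        return remote_contact
    return aor


def refer_to_header(target):
    """Angle brackets are needed because the target URI carries parameters."""
    return f"Refer-To: <{target}>"


AOR = "sip:yogesh.s@sipdomain.com"
laptop_contact = "sip:yogesh.s@sipdomain.com;opaque=user:epid:hnfL28-sKViK8WkotsFDFQAA;gruu"
print(refer_to_header(transfer_target(laptop_contact, AOR)))
print(refer_to_header(transfer_target("sip:10.2.2.210:49872;transport=tls", AOR)))
```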

In summary,
  • The SIP client creates a globally unique instance ID at the time of installation (in soft clients).
  • During the SIP client registration phase, the SIP client contacts the SIP Registrar server with a Contact: header that contains details like the IP address, port number, the SIP methods it supports and the transport protocol it uses. Besides that, it provides its URN value in +sip.instance and includes the Supported: header with the value gruu.
  • The SIP Registrar replies with the GRUU value in the Contact: header, so that later a SIP server can check the GRUU value and uniquely identify the SIP URI and its exact SIP client instance.
  • The SIP client then uses this GRUU value (received in the SIP Registrar's response) in its Contact: header for all non-REGISTER messages (for example, INVITE or 200 OK).
  • Thus, while routing a call, a SIP proxy server (or a SIP client) identifies the exact SIP client instance using the GRUU and directs the call to the requested end point.

Thank you for reading!

Reference: RFC 5626, RFC 5627.

Sunday, January 14, 2018


                                   Identifying Active Speakers in a conference


Hello Readers,

Have you ever wondered how a conference server detects the active speakers in a conference and then displays their name, photo or video while they are speaking? Today we are going to discuss the technical details behind highlighting the active speakers in a conference.

I had this question in mind for quite some time and was trying to find an answer to it. So, today let us discuss it. You probably know that in real-time communication we use RTP over the transport protocol UDP (User Datagram Protocol) to carry the media from one end point to the other. But RTP does more than merely carry the traffic over UDP.


RTP (Real-time Transport Protocol):

Apart from carrying the media from one end point to the other, RTP also helps in identifying the active speakers in a conference call. It does that by using the Synchronization Source (SSRC) and Contributing Source (CSRC) identifiers. Let us see these in detail by looking at the RTP header.

In the RTP header (snapshot shown below) we have several fields like the sequence number, timestamp, marker bit, and the synchronization source (SSRC) and contributing source (CSRC) identifiers. For today's topic, let us discuss the synchronization source and contributing source identifiers.
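To show where these fields sit, here is a short Python sketch that decodes the 12-byte fixed RTP header defined in RFC 3550 (section 5.1) and the CSRC list that follows it, whose length comes from the 4-bit CC field. The sample packet is hypothetical, and header extensions and padding are only flagged, not processed.

```python
import struct


def parse_rtp_header(packet):
    """Decode the RTP fixed header (RFC 3550, section 5.1) plus its CSRC list."""
    if len(packet) < 12:
        raise ValueError("an RTP fixed header is 12 bytes long")
    # Network byte order: two single bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unexpected RTP version {version}")
    csrc_count = b0 & 0x0F  # CC: how many 32-bit CSRC identifiers follow
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet too short for its CSRC list")
    return {
        "version": version,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": list(struct.unpack(f"!{csrc_count}I", packet[12:end])),
    }


# Hypothetical packet: V=2, no CSRCs, payload type 0, sequence 1, timestamp 160.
sample = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234ABCD) + bytes(160)
print(parse_rtp_header(sample))
```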





Let us see what actually happens in a conference call. A user uses a SIP address to join the conference call. However, SIP is a signaling protocol: it sets up the session but does not carry the media, so it cannot help in identifying the media or the end points that send the media traffic. Moreover, what if a user uses two video cameras in a session? In that case, you need a mechanism to differentiate the signals from the two devices, so that after the analog signal is sampled and digitized, it can be placed in an RTP packet along with details of the device that captured it. Is there a way in the RTP packet to indicate which device was used?

Yes, the Synchronization Source (SSRC) identifier in the RTP header helps in identifying the actual device that was used to send the media in an RTP session. The SSRC identifier is unique within an RTP session, even if you have multiple audio devices, such as a headset or a laptop's microphone and speaker. This does not mean that the SSRC identifier remains the same across RTP sessions; it may change in the next RTP session.
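To illustrate per-device SSRCs, here is a Python sketch in which each capture device in a video session gets its own random 32-bit SSRC. RFC 3550 calls for SSRCs to be chosen randomly; this sketch only avoids values it already knows about and leaves out the collision detection and resolution that RFC 3550 also specifies. The device names are hypothetical.

```python
import secrets


def new_ssrc(in_use):
    """Pick a random 32-bit SSRC that is not already in use in this RTP session."""
    while True:
        ssrc = secrets.randbits(32)
        if ssrc not in in_use:
            in_use.add(ssrc)
            return ssrc


session_ssrcs = set()
devices = ["camera 1", "camera 2"]  # hypothetical capture devices in one video session
ssrc_by_device = {name: new_ssrc(session_ssrcs) for name in devices}
for name, ssrc in ssrc_by_device.items():
    print(f"{name}: SSRC 0x{ssrc:08X}")
```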

So, having an SSRC identifier for each device helps in distinguishing the exact device and its input from those of other devices. For example, if a user speaks through a headset, the SSRC value in that RTP session identifies the headset. So, with the help of the SSRC identifier, the conference server can identify the active speaker and show them accordingly. This looks simple, doesn't it? Alright, now let us look at a real-world scenario.

Example:

Let us consider a conference with 5 participants. Chances are that the participants join from different networks and countries and have different bandwidth limits. Let us say 2 participants have excellent bandwidth, 1 has average bandwidth and 2 are connected from a low-bandwidth network.
In this case, if you choose a common codec, it would obviously have to be one that the low-bandwidth network can support. But we don't want the users who have excellent bandwidth to get poor video quality as a result. So how do we overcome this situation? Here comes the role of a mixer (the conference server does this).

RTP Mixer:
 An RTP mixer (in the conference server) collects the inputs from all the participants. Then it converts them into new RTP packets and sends them to all the end points. Thus, the users in a poor network location receive the quality that their network can support. Likewise, the participants who have excellent bandwidth can receive the best quality.
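The arithmetic behind this choice can be sketched in Python with a hypothetical three-step encoding ladder and hypothetical downlink bandwidths for the five participants (two excellent, one average, two low). A single common stream is limited by the weakest link, whereas a mixer that encodes per participant can give each one the best step that fits.

```python
# Hypothetical encoding ladder: (name, bitrate in kbit/s), best first.
LADDER = [("high", 2500), ("medium", 1000), ("low", 300)]

# Hypothetical downlink bandwidth per participant, in kbit/s.
BANDWIDTH = {"P1": 5000, "P2": 4000, "P3": 1500, "P4": 400, "P5": 350}


def pick_encoding(bandwidth_kbps):
    """Best ladder step that fits the bandwidth; fall back to the lowest step."""
    for name, bitrate in LADDER:
        if bitrate <= bandwidth_kbps:
            return name
    return LADDER[-1][0]


# One common stream for everybody is limited by the weakest participant.
common = pick_encoding(min(BANDWIDTH.values()))
# A mixer that encodes per participant can serve each one separately.
per_participant = {p: pick_encoding(bw) for p, bw in BANDWIDTH.items()}
print(common)  # low
print(per_participant)
```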

Here comes the tricky part. If we differentiate RTP streams only by the SSRC, then in this case the SSRC would be that of the mixer (the conference server), because the mixer is now the source of the new stream. So having only an SSRC value in the RTP header is not enough to find the active speaker in a conference. Hence, we have another identifier called the Contributing Source (CSRC) identifier, which helps in this situation.

The Contributing Source (CSRC) identifier plays a very significant role when the RTP streams from multiple users are collected and converted into a new RTP packet. While the RTP mixer (the conference server) creates the new RTP packet, it also includes the list of sources that contributed to it at that instant: for example, participant #1, who was talking, and participant #5, who was trying to ask a question at the same time, while the others were silent. So, in this packet the SSRC carries the mixer's (conference server's) value and the CSRC list carries the values of participant #1 and participant #5. Thus, we get to see the active speakers in the conference even when we hear voices or noise from multiple users. That is great, but does the RTP mixer work if a user is behind a NAT or a firewall? No! So, here comes another important component: the RTP translator. Let us look at that scenario now and, as usual, see why we need it and how it helps us.
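Here is a Python sketch of both sides of that exchange: the mixer puts its own SSRC in the header and lists the contributing sources in the CSRC list (at most 15, since CC is a 4-bit field), and a receiver reads that list to find who is speaking. The SSRC values and the SSRC-to-name table are hypothetical; in practice such a mapping could come from RTCP SDES items such as CNAME or NAME.

```python
import struct

MIXER_SSRC = 0x0000C0F0  # hypothetical SSRC chosen by the conference mixer
NAMES = {                # hypothetical SSRC -> display name table
    0x11111111: "Participant #1",
    0x22222222: "Participant #2",
    0x33333333: "Participant #3",
    0x44444444: "Participant #4",
    0x55555555: "Participant #5",
}


def build_mixed_header(seq, timestamp, contributing, payload_type=0):
    """Mixer side: SSRC is the mixer itself, CSRC list names the contributing sources."""
    if len(contributing) > 15:
        raise ValueError("the 4-bit CC field allows at most 15 CSRCs")
    b0 = (2 << 6) | len(contributing)  # V=2, P=0, X=0, CC=number of contributors
    header = struct.pack("!BBHII", b0, payload_type & 0x7F, seq, timestamp, MIXER_SSRC)
    return header + struct.pack(f"!{len(contributing)}I", *contributing)


def active_speakers(packet):
    """Receiver side: return the packet's SSRC and the names behind its CSRC list."""
    cc = packet[0] & 0x0F
    (ssrc,) = struct.unpack("!I", packet[8:12])
    csrcs = struct.unpack(f"!{cc}I", packet[12:12 + 4 * cc])
    return ssrc, [NAMES.get(c, f"unknown 0x{c:08X}") for c in csrcs]


# Participant #1 is talking while Participant #5 asks a question; the others are silent.
packet = build_mixed_header(seq=100, timestamp=16000, contributing=[0x11111111, 0x55555555])
ssrc, speakers = active_speakers(packet)
print(hex(ssrc), speakers)
```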


RTP Translator:

The RTP mixer can help only if the participants can reach it directly. However, if they are behind a NAT or firewall, a participant cannot reach the mixer (conference server). Hence, we have another component called the RTP translator. Think of the translator as a server that sits in the DMZ and acts like a funnel: it funnels the RTP traffic from all the participants to the mixer, receives the new RTP stream from the conference, and funnels it out to the other participants who are on the internet.

A Participant Leaving or Exiting Scenario:

Alright, this sounds like a good option, but what happens when a person leaves the conference session? To address this scenario, we have the RTCP BYE packet. When a person leaves a conference, their end point sends an RTCP BYE packet, so the others get notified that the user is leaving the conference.
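For the curious, here is a Python sketch of the BYE packet layout from RFC 3550 (section 6.6): a 4-byte header with a 5-bit source count and packet type 203, the list of leaving SSRC/CSRC identifiers, and an optional length-prefixed reason padded to a 32-bit boundary. A real end point sends BYE as the last packet of a compound RTCP packet that starts with an SR or RR; this sketch builds and parses only the BYE part, and the SSRC and reason text are hypothetical.

```python
import struct

RTCP_BYE = 203  # RTCP packet type for BYE


def build_bye(ssrcs, reason=""):
    """Build an RTCP BYE packet (RFC 3550, section 6.6) for the given sources."""
    if len(ssrcs) > 31:
        raise ValueError("the 5-bit SC field allows at most 31 sources")
    body = struct.pack(f"!{len(ssrcs)}I", *ssrcs)
    if reason:
        text = reason.encode("utf-8")
        if len(text) > 255:
            raise ValueError("reason must fit in 255 bytes")
        body += bytes([len(text)]) + text
        body += b"\x00" * (-len(body) % 4)  # pad to a 32-bit boundary with null octets
    length_words = (4 + len(body)) // 4 - 1   # packet length in 32-bit words, minus one
    header = struct.pack("!BBH", (2 << 6) | len(ssrcs), RTCP_BYE, length_words)
    return header + body


def parse_bye(packet):
    """Return (ssrcs, reason) from an RTCP BYE packet."""
    b0, packet_type, length_words = struct.unpack("!BBH", packet[:4])
    if b0 >> 6 != 2 or packet_type != RTCP_BYE:
        raise ValueError("not an RTCP BYE packet")
    count = b0 & 0x1F
    list_end = 4 + 4 * count
    ssrcs = list(struct.unpack(f"!{count}I", packet[4:list_end]))
    reason = ""
    if 4 * (length_words + 1) > list_end:  # optional reason is present
        n = packet[list_end]
        reason = packet[list_end + 1:list_end + 1 + n].decode("utf-8")
    return ssrcs, reason


bye = build_bye([0x55555555], "left the conference")
print(bye.hex(), parse_bye(bye))
```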

NOTE: We have not discussed RTCP in detail here yet. Let us discuss it on some other day :-)

Reference: RFC 3550
           
To summarize, using the Synchronization Source and Contributing Source identifiers in the RTP header, we get to know the active speaker details in a conference call. I hope you liked this topic and the discussion. Thank you for reading!