The Session Initiation Protocol (SIP) was standardized by the Internet Engineering Task Force (IETF) and is described in several documents known as RFCs (Requests for Comments). RFC3261 is the core specification and defines SIP version 2.0. SIP is an application-layer protocol used to establish, modify, and terminate sessions such as multimedia calls. These sessions can be conferences, e-learning, telephony over the Internet, and similar applications. SIP is a text-based protocol similar to the Hypertext Transfer Protocol (HTTP), designed to start, maintain, and close interactive communication sessions between users. Today SIP is one of the most widely used protocols for VoIP and is supported by almost every IP phone on the market.
By the end of this chapter you will be able to:
Describe what SIP is
Describe what SIP is for
Describe SIP architecture
Explain the meaning of its main components
Understand and compare the main SIP messages
Describe how the header fields of INVITE and REGISTER requests are processed
The SIP protocol supports five features for establishing and closing multimedia sessions.
User location: Determines the endpoint address used for communication.
User parameters negotiation: Determines the media and parameters to be used.
User availability: Determines if the user is available or not to establish a session.
Call establishment: Establishes the parameters for caller and callee, and informs on call progress (ringing, ringback, congestion) to both parties.
Call management: Session transfer and closing.
The SIP protocol was designed as part of a multimedia architecture containing other protocols such as RSVP, RTP, RTSP, and SDP. However, it does not depend on them to work.
SIP works much like HTTP, and a SIP address looks just like an e-mail address. An interesting feature of SIP proxies is aliasing, which allows a single user to have multiple SIP addresses.
The SIP architecture has user agents and servers. SIP uses a peer-to-peer distributed model with a signaling server. The server handles only the signaling, while the user agent clients and user agent servers handle both signaling and media. This is depicted in the figure below:
In the SIP model, a user agent, usually a SIP phone, will start communicating with its SIP proxy, seen here as the outgoing proxy, to send the call using a message known as INVITE.
The outgoing proxy sees that the call is directed to an outside domain. It queries the DNS server to resolve the IP address of the proxy for the target domain, and then forwards the call to the SIP proxy responsible for DomainB.
The incoming proxy queries its location table for the IP address of agentB. If this address was inserted in the location table by a previous registration, the incoming proxy can locate it and forward the call to agentB.
After receiving the SIP message, agentB has all the information required to establish an RTP session (usually audio) with agentA. A BYE message terminates the session.
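In practice, the DNS step above usually relies on SRV records (RFC 3263 describes the full NAPTR/SRV/A procedure). Querying DNS requires a resolver library, which is not shown here. The sketch below only illustrates how a proxy might order the SRV records it gets back for the target domain, following the priority and weight rules of RFC 2782. The record values for DomainB are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SrvRecord:
    priority: int   # lower values are tried first
    weight: int     # relative share among records with the same priority
    port: int
    target: str

def order_srv(records: list[SrvRecord], rng=random) -> list[SrvRecord]:
    """Order SRV records for trying, following the selection rules of RFC 2782."""
    ordered: list[SrvRecord] = []
    for prio in sorted({r.priority for r in records}):
        # Zero-weight records go first so they keep a small chance of selection.
        group = sorted((r for r in records if r.priority == prio),
                       key=lambda r: r.weight != 0)
        while group:
            threshold = rng.randint(0, sum(r.weight for r in group))  # inclusive range
            running = 0
            for i, r in enumerate(group):
                running += r.weight
                if running >= threshold:
                    ordered.append(group.pop(i))  # pop, then stop iterating
                    break
    return ordered

# Hypothetical SRV answer for _sip._udp.domainB.com
records = [
    SrvRecord(10, 60, 5060, "proxy1.domainB.com"),
    SrvRecord(10, 40, 5060, "proxy2.domainB.com"),
    SrvRecord(20, 0, 5060, "backup.domainB.com"),
]
print([r.target for r in order_srv(records, random.Random(7))])
```

The two priority-10 proxies share the load roughly 60/40 across many calls, and the backup is only tried after both of them.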
VoIP providers usually don't implement a pure SIP trapezoid: they don't allow you to send calls to outside domains, because this affects their revenue stream. Instead, they implement a topology closer to a SIP triangle.
Below, you can see the main components of the SIP architecture. All SIP signaling flows through the SIP proxy server. The media, on the other hand, is transported by the RTP protocol and flows directly from one endpoint to another. Some of the components are briefly explained in the list below.
UAC (user agent client): Client or terminal that starts the SIP signaling.
UAS (user agent server): Server that responds to the SIP signaling coming from a UAC.
Proxy Server: Receives requests from a UA and forwards them to another SIP proxy if the requested terminal is not in its own domain.
Redirect Server: Receives requests and, instead of forwarding them to the callee, replies to the caller with information about the destination.
Location Server: Provides the callee's contact addresses to Proxy and Redirect Servers.
The Proxy, Redirect, and Location servers usually run on the same physical machine and in the same software.
The SIP protocol employs a component called a registrar. It is a server that accepts REGISTER requests and stores the information they carry in the Location server for the domains it manages. SIP also has a discovery capability: when a user starts a session with another user, SIP has to discover a host where that user can be reached. Discovery is done by a Location server, which receives the request and finds where to send it, based on a location database maintained per domain. The Registrar server may accept other types of information besides the client's IP address, such as CPL (Call Processing Language) scripts.
Before a telephone can receive a call, it needs to be registered in the location database, which associates every phone with its IP address. In our example, you will see the SIP user <email@example.com> registered at the IP address 22.214.171.124.
RFC3665 defines best practices for implementing a minimum set of functionality in a SIP IP communications network, including the flows for register transactions shown below.
According to RFC3665, there are five basic flows associated with registering a user agent:
A successful new registration: after sending the REGISTER request, the user agent is challenged for its credentials with a "401 Unauthorized" response. We will see this in detail in the chapter dedicated to authentication.
An update of the contact list: since this is not a new registration, the message already contains the digest, so no "401" response is sent. To change the contact list, the user agent just sends a new REGISTER message with the new contact in the CONTACT header field.
Request for the current contact list: the user agent sends the REGISTER request without a CONTACT header field, indicating that it wishes to query the server for the current contact list. In the 200 OK response, the SIP server returns the current contact list in the CONTACT header field.
Cancellation of a registration: the user agent sends the message with an EXPIRES header field of 0 and a CONTACT header field set to '*' to remove all existing contacts.
Unsuccessful registration: the UAC sends a REGISTER request and receives a "401 Unauthorized" response, exactly as in the successful registration. It then computes a hash of its credentials and tries to authenticate. The server detects an invalid password and again sends a "401 Unauthorized" response. The process is repeated for the number of retries configured in the UAC.
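The challenge and response in the flows above use HTTP Digest authentication (RFC 2617). As a minimal sketch, assuming the simplest form without the qop parameter (real user agents often send qop=auth with a client nonce, which changes the final hash), the computation on both sides looks like this. The user name, realm, password, and nonce are hypothetical.

```python
import hashlib
import hmac

def md5_hex(text: str) -> str:
    """Lowercase hex MD5 of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, method, uri, nonce):
    """RFC 2617 digest response without qop: MD5(HA1:nonce:HA2)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret shared by UA and server
    ha2 = md5_hex(f"{method}:{uri}")                 # binds the hash to this request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")

def server_accepts(stored_password, username, realm, method, uri, nonce, received):
    """Registrar side: recompute the expected response and compare it."""
    expected = digest_response(username, realm, stored_password, method, uri, nonce)
    return hmac.compare_digest(expected, received)

# Hypothetical values: the UA answers the 401 challenge carrying nonce "abc123".
resp = digest_response("userA", "sipA.com", "secret", "REGISTER", "sip:sipA.com", "abc123")
print(server_accepts("secret", "userA", "sipA.com", "REGISTER", "sip:sipA.com", "abc123", resp))
```

A wrong password yields a different hash, so the comparison fails and the server sends another "401 Unauthorized", which is the unsuccessful flow described above.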
In the SIP proxy mode, the entire SIP signaling goes through the SIP proxy. This behavior will help in processes such as billing and it is, by far, the most common choice. The drawback is the overhead caused by the server in the middle of all SIP communications during the session establishment. Remember, RTP packets will always go directly from one endpoint to another, even if the server is working as a SIP proxy.
A SIP server can also operate in redirect mode. In this mode the SIP server is very scalable, because it doesn't keep transaction state. Right after the initial INVITE, it replies to the UAC with a "302 Moved Temporarily" response and drops out of the SIP dialog. In this mode, even a SIP server with very few resources can redirect millions of calls per hour. It is normally used when you need high scalability but don't need to bill the calls.
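A minimal sketch of the reply a redirect server sends, assuming the request headers arrive as a simple dictionary with a single Via (a real server must copy every Via in order). The addresses are hypothetical. The server copies the transaction headers, adds a To tag because the request had none, and puts the new target in Contact; after sending it, the server keeps no state for the call.

```python
import secrets

def build_302(request_headers: dict[str, str], new_contact: str) -> str:
    """Build a '302 Moved Temporarily' reply, as a redirect server would.

    Assumption: one Via only and no repeated headers in request_headers.
    """
    to_value = request_headers["To"]
    if ";tag=" not in to_value:
        # A final response carries a To tag chosen by the responding server.
        to_value += f";tag={secrets.token_hex(4)}"
    lines = [
        "SIP/2.0 302 Moved Temporarily",
        f"Via: {request_headers['Via']}",
        f"From: {request_headers['From']}",
        f"To: {to_value}",
        f"Call-ID: {request_headers['Call-ID']}",
        f"CSeq: {request_headers['CSeq']}",
        f"Contact: <{new_contact}>",  # where the caller should send a new INVITE
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"

request = {  # hypothetical INVITE headers
    "Via": "SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds",
    "From": "<sip:userA@sipA.com>;tag=1928301774",
    "To": "<sip:userB@sipB.com>",
    "Call-ID": "a84b4c76e66710@192.0.2.10",
    "CSeq": "1 INVITE",
}
print(build_302(request, "sip:userB@192.0.2.20:5060"))
```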
The basic messages sent in a SIP environment are:
Most of the time, you will use REGISTER, INVITE, ACK, BYE, and CANCEL. Other methods support additional features: INFO is used for DTMF relay and mid-call signaling information; PUBLISH, NOTIFY, and SUBSCRIBE support presence systems; REFER is used for call transfer; and MESSAGE is used for chat applications. New methods may appear as the protocol standardization process continues.
Responses to these messages are in text format as in the HTTP protocol. Some of the most important are shown below:
This section introduces some basic SIP operations using a simple example. Let's examine this message sequence between two user agents shown below. You can see several other flows associated with the session establishment in RFC3665.
The messages are labeled in sequence. In this example userA uses an IP phone to call another IP phone over the network. To complete the call, two SIP proxies are used.
UserA calls userB using userB's SIP identity, called a SIP URI. A SIP URI is similar to an email address, such as sip:userA@sip.com. A secure SIP URI can be used too, such as sips:userA@sip.com. A call made using SIPS uses a secure transport (TLS, Transport Layer Security) on every hop between the caller and the callee.
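To make the URI structure concrete, here is a simplified parser sketch. It assumes the common form `scheme:user@host:port;params` and deliberately ignores passwords in the user part and `?headers`, so it is an illustration rather than a full RFC 3261 grammar.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SipUri:
    scheme: str              # "sip" or "sips"
    user: Optional[str]
    host: str
    port: Optional[int]
    params: dict = field(default_factory=dict)

    @property
    def secure(self) -> bool:
        return self.scheme == "sips"  # SIPS requires TLS on each hop

def parse_sip_uri(uri: str) -> SipUri:
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    rest = rest.partition("?")[0]            # drop ?headers (not handled here)
    hostport, *param_list = rest.split(";")
    user = None
    if "@" in hostport:
        user, hostport = hostport.rsplit("@", 1)
    port = None
    if hostport.startswith("["):             # IPv6 reference such as [2001:db8::1]:5060
        host, _, after = hostport.partition("]")
        host += "]"
        port = int(after[1:]) if after.startswith(":") else None
    elif ":" in hostport:
        host, port_text = hostport.rsplit(":", 1)
        port = int(port_text)
    else:
        host = hostport
    params = {}
    for p in param_list:
        key, _, value = p.partition("=")
        params[key.lower()] = value
    return SipUri(scheme.lower(), user, host, port, params)

print(parse_sip_uri("sips:userA@sip.com:5061;transport=tls"))
```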
The transaction starts with userA sending an INVITE request addressed to userB. The INVITE request contains a number of header fields. Header fields are named attributes that provide additional information about the message, such as a unique identifier, the destination, and information about the session.
The first line of the message, the request line, contains the method name, the Request-URI, and the SIP version. The following lines contain a list of header fields. This example contains the minimum required set, briefly described below:
VIA: This contains the address at which userA will wait to receive responses to this request. It also contains a parameter called branch that identifies this transaction. Each VIA header field records one SIP hop: its IP address, transport, and transaction-specific parameters. VIA is used exclusively to route responses back, and each proxy adds an additional VIA header field. Following the VIA headers back is much easier than querying the location server or DNS again.
TO: This contains the name (display name) and the SIP URI (that is, sip:userB@sip.com) of the destination originally selected. The TO header field is not used to route the packets.
FROM: This contains the name and SIP URI (that is, sip:userA@sip.com) that indicate the caller ID. This header field has a tag parameter containing a random string added by the IP phone for identification purposes. Tag parameters appear in both the TO and FROM fields and serve as a general mechanism to identify the dialog, which is defined by the Call-ID together with two tags, one from each participant in the dialog. Tags are useful in parallel forking.
CALL-ID: This contains a globally unique identifier for this call, generated by combining a random string with the host name or IP address of the IP phone. The combination of the TO tag, the FROM tag, and the CALL-ID fully defines an end-to-end SIP relationship known as a SIP dialog.
CSEQ: The CSEQ, or command sequence, contains an integer and a method name. The CSEQ number is incremented for each new request inside a SIP dialog, like a traditional sequence number.
CONTACT: This contains a SIP URI that represents a direct route to userA, usually composed of a user name and an FQDN (fully qualified domain name). IP addresses are also permitted, since many hosts do not have registered domain names. While the VIA header field tells other elements where to send a response, the CONTACT header field tells them where to send future requests.
MAX-FORWARDS: This is used to limit the number of allowed hops a request can make in the path to its final destination. It consists of an integer decremented by one on each hop.
CONTENT-LENGTH: This contains the byte count of the message body.
Session details, such as the media type and codec, are not described using SIP. Instead, SIP uses the Session Description Protocol, SDP (RFC2327, since obsoleted by RFC4566). The SDP message is carried in the body of the SIP message, similar to an email attachment.
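The header fields above can be assembled into a request with a few lines of code. This is a minimal sketch, assuming UDP transport and hypothetical addresses; it is not how any particular phone builds its messages. The branch starts with the magic cookie "z9hG4bK" required by RFC 3261, and 70 is the Max-Forwards value RFC 3261 recommends.

```python
import secrets

def build_invite(user: str, domain: str, callee: str, local_ip: str,
                 local_port: int = 5060, sdp: str = "") -> str:
    """Assemble a minimal INVITE with the header fields listed above.

    All addresses passed in are hypothetical; transport is assumed to be UDP.
    """
    caller = f"sip:{user}@{domain}"
    body = sdp.encode("utf-8")
    lines = [
        f"INVITE {callee} SIP/2.0",                                    # request line
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch=z9hG4bK{secrets.token_hex(8)}",
        "Max-Forwards: 70",                                            # hop limit
        f"To: <{callee}>",                                             # no tag yet
        f"From: <{caller}>;tag={secrets.token_hex(4)}",
        f"Call-ID: {secrets.token_hex(8)}@{local_ip}",                 # random part + host
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_ip}:{local_port}>",             # direct route to caller
    ]
    if body:
        lines.append("Content-Type: application/sdp")
    lines.append(f"Content-Length: {len(body)}")                      # bytes, not characters
    return "\r\n".join(lines) + "\r\n\r\n" + sdp

print(build_invite("userA", "sipA.com", "sip:userB@sipB.com", "192.0.2.10"))
```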
The sequence is as follows:
The phone does not know the location of userB or of the server responsible for domain sipB. Thus, it sends the INVITE request to the server responsible for the domain sipA. This address is configured in userA's phone or can be discovered by DHCP. The server sipA.com is also known as the SIP proxy for the domain sipA.com.
In this example, the proxy receives the INVITE request and sends a "100 Trying" response back to userA, signaling that it received the INVITE and is working to forward the request. SIP responses use a three-digit code followed by a descriptive phrase. This response contains the same TO, FROM, CALL-ID, and CSEQ header fields, and the same branch parameter in the VIA header field, as the INVITE request. This allows userA's phone to correlate the response with the INVITE request it sent.
ProxyA consults a DNS server (SRV records) to find which server is responsible for the SIP domain sipB, locates proxyB, and forwards the INVITE request to it. Before sending the request to proxyB, proxyA adds a VIA header field containing its own address, so that responses to the INVITE are routed back through proxyA.
ProxyB receives the INVITE request and responds with a "100 Trying" message back to proxyA indicating that it is processing the request.
ProxyB consults its own location database for userB's address, adds another VIA header field with its own address to the INVITE request, and sends the request to userB's IP address.
UserB's phone receives the INVITE request and starts ringing. The phone signals this condition back by sending a "180 Ringing" response.
This response is routed back through both proxies in the reverse direction. Each proxy uses the VIA header fields to determine where to send the response and removes its own VIA header from the top. As a result, the "180 Ringing" response can return to userA without any DNS or location service lookups and without the need for stateful processing. Thus, each proxy sees all messages resulting from the INVITE request.
When userA's phone receives the "180 Ringing" response, it plays a ringback tone to signal to the user that the call is ringing on the other side. Some phones also show this on the display.
In this example, userB decides to answer the call. When userB answers, the phone sends a "200 OK" response to indicate that the call was taken. The body of the "200 OK" message contains a session description specifying codecs, ports, and everything else pertaining to the session, using the SDP protocol. As a result, there is a two-phase exchange of messages, from A to B (INVITE) and from B to A (200 OK), that negotiates the resources and capabilities used on the call in a simple "offer/answer" model. If userB does not want to receive the call or is busy, the "200 OK" won't be sent; instead, a response signaling the condition (for example, "486 Busy Here") will be sent.
The first line contains the response code and a description (OK). The following lines contain the header fields. The VIA, TO, FROM, CALL-ID, and CSEQ fields are copied from the INVITE request. There are three VIA fields: one added by userA, another by proxyA, and the last one by proxyB. UserB's SIP phone added a tag parameter to the TO header field; together with the FROM tag, it identifies both endpoints of the dialog and will be included in all future requests and responses for this call.
The CONTACT header field contains the URI with which userB can be contacted directly on their own IP phone.
The CONTENT-TYPE and CONTENT-LENGTH header fields describe the SDP body that follows, which contains the media-related parameters used to establish the RTP session.
The "200 OK" response is sent back through both proxies and received by userA, whose phone then stops the ringback tone, indicating that the call was accepted.
Finally, userA sends an ACK message to userB's phone confirming the reception of the "200 OK" response. In this example, the ACK is sent directly from phoneA to phoneB, bypassing both proxies, because the endpoints learned each other's addresses from the CONTACT header fields during the INVITE process. ACK is the only SIP method that receives no response. This completes the INVITE/200 OK/ACK cycle, also known as the SIP three-way handshake.
At this moment the session between both users starts, and they send media packets to each other using a mutually agreed format established through SDP. Usually these packets flow end-to-end. During the session, either party can change the session characteristics by issuing a new INVITE request, called a re-INVITE. If the re-INVITE is not acceptable, a "488 Not Acceptable Here" response is sent, but the existing session does not fail.
At the end of the session, userB hangs up and the phone generates a BYE message. This message is routed directly to userA's phone, bypassing both proxies.
UserA confirms the reception of the BYE message with a "200 OK" message ending the session. No ACK is sent. An ACK is sent only for INVITE requests.
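The VIA handling in the call flow above (each proxy pushing its own VIA onto the INVITE and popping it from the response) can be modeled as a simple stack. This sketch uses hypothetical hop names in place of the host:port values of real VIA headers, and it ignores parameters such as received and rport.

```python
def add_via(vias: list[str], hop: str) -> list[str]:
    """Request path: each element pushes its own address on top of the Via stack."""
    return [hop] + vias

def proxy_response(vias: list[str], me: str) -> tuple[str, list[str]]:
    """Response path: a proxy removes its own (top) Via and sends the response
    to the address found in the new top Via."""
    if not vias or vias[0] != me:
        raise ValueError("top Via is not ours; a proxy discards such a response")
    remaining = vias[1:]
    return remaining[0], remaining

# Hypothetical hop names standing in for real Via addresses.
vias = add_via([], "phoneA")          # userA's phone sends the INVITE
vias = add_via(vias, "proxyA")        # proxyA forwards it to proxyB
vias = add_via(vias, "proxyB")        # proxyB forwards it to userB
# userB copies all three Vias into "180 Ringing" and sends it to the top one, proxyB.
hop1, vias = proxy_response(vias, "proxyB")   # proxyB sends it on to proxyA
hop2, vias = proxy_response(vias, "proxyA")   # proxyA sends it on to phoneA
print(hop1, hop2, vias)               # proxyA phoneA ['phoneA']
```

No DNS or location lookup appears on the response path; everything needed is already in the stack.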
In some cases it can be important for proxies to stay in the signaling path and see all messages exchanged between the endpoints during the whole session. A proxy that wants to stay in the path after the initial INVITE request has to add a RECORD-ROUTE header field to the request. UserB's phone receives this header field and copies it into its response, so both endpoints learn the route and send subsequent requests in the dialog through the proxies. Record routing is used in most scenarios.
The REGISTER request is how proxyB learns the location of userB. When it initializes, and at regular intervals afterwards, userB's phone sends a REGISTER request to a server in domain sipB known as the "SIP REGISTRAR". The REGISTER message associates a URI (userB@sipB.com) with an IP address. This binding is stored in a database in the Location server. Usually the Registrar, Location, and Proxy servers run on the same computer and use the same software; OpenSER is capable of playing all three roles. A URI can be bound to more than one contact at the same time, which is why the registrar keeps a contact list for each URI.
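The registrar behavior described here and in the RFC3665 register flows can be sketched as a small in-memory location table. This is a toy model for illustration only (OpenSER keeps its bindings in its usrloc module, with many more details); it ignores per-contact expires parameters and authentication. The contact addresses are hypothetical.

```python
import time
from typing import Callable

class LocationService:
    """Toy in-memory location database: address-of-record -> {contact: expiry}."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._bindings: dict[str, dict[str, float]] = {}

    def register(self, aor: str, contacts: list[str], expires: int) -> list[str]:
        """Process one REGISTER and return the contact list sent back in the 200 OK."""
        table = self._bindings.setdefault(aor, {})
        if contacts == ["*"]:
            if expires != 0:
                raise ValueError("'Contact: *' is only valid with 'Expires: 0'")
            table.clear()                                     # cancel every registration
        else:
            # An empty contact list is a query: bindings are left unchanged.
            for contact in contacts:
                if expires == 0:
                    table.pop(contact, None)                  # remove one binding
                else:
                    table[contact] = self._clock() + expires  # add or refresh a binding
        return self.lookup(aor)

    def lookup(self, aor: str) -> list[str]:
        """Return unexpired contacts, as a proxy would when routing an INVITE."""
        now = self._clock()
        table = self._bindings.get(aor, {})
        for contact in [c for c, exp in table.items() if exp <= now]:
            del table[contact]                                # purge expired bindings
        return sorted(table)

loc = LocationService()
print(loc.register("sip:userB@sipB.com", ["sip:userB@192.0.2.20:5060"], 3600))
```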
It is important to understand the difference between a transaction and a dialog. A transaction occurs between a user agent client and a user agent server and comprises all messages from the first request to the final response. Responses can be provisional, with codes from 100 to 199 (for example, 180 Ringing), or final, with codes from 200 to 699 (for example, 200 OK or 486 Busy Here). Responses within a transaction follow the stack of VIA headers back to the client, so no DNS or location lookups are needed to route them.
A dialog usually starts with an INVITE transaction and ends with a BYE transaction. A dialog is identified by the combination of the CALL-ID header field, the FROM tag, and the TO tag.
According to RFC3665, there are 11 basic session establishment flows. The list is not meant to be complete, but it covers best practices. The first two, "Successful Session Establishment" and "Session Establishment Through Two Proxies", were already covered in this chapter. Others, such as "Unsuccessful No Answer" and "Unsuccessful Busy", will be seen in the chapter dedicated to call forwarding.
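The response classes and the rule that ends a transaction can be written down directly. A short sketch: the six classes come from RFC 3261, and the client-side matching key (branch of the top VIA plus the CSeq method) follows its transaction-matching rule; the function names are illustrative.

```python
RESPONSE_CLASSES = {
    1: "provisional",     # e.g. 100 Trying, 180 Ringing
    2: "success",         # e.g. 200 OK
    3: "redirection",     # e.g. 302 Moved Temporarily
    4: "client error",    # e.g. 486 Busy Here, 488 Not Acceptable Here
    5: "server error",
    6: "global failure",
}

def classify(code: int) -> str:
    """Map a SIP status code to its response class."""
    if not 100 <= code <= 699:
        raise ValueError(f"invalid SIP status code: {code}")
    return RESPONSE_CLASSES[code // 100]

def is_final(code: int) -> bool:
    """Any 2xx-6xx response ends the client transaction; 1xx responses do not."""
    return classify(code) != "provisional"

def client_transaction_key(top_via_branch: str, cseq_method: str) -> tuple[str, str]:
    """A response is matched to a client transaction by the branch of its
    top Via and the method in its CSeq header."""
    return (top_via_branch, cseq_method)

print(classify(180), is_final(180), classify(486), is_final(486))
```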
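Each endpoint builds the dialog identifier from these three values, putting its own tag first. A simplified sketch, assuming header values in the usual `Name <uri>;tag=...` form and hypothetical Call-ID and tags. Before the "200 OK" arrives, the TO header has no tag yet, so the remote tag is None.

```python
import re
from typing import Optional

def tag_of(header_value: str) -> Optional[str]:
    """Return the tag parameter of a From/To value, or None if absent.

    Only the part after the closing '>' is searched, so URI parameters inside
    <...> are not mistaken for header parameters (a simplification).
    """
    params = header_value.rsplit(">", 1)[-1]
    match = re.search(r";\s*tag=([^;\s]+)", params)
    return match.group(1) if match else None

def dialog_id(headers: dict[str, str], we_are_caller: bool) -> tuple:
    """Return (Call-ID, local tag, remote tag) as seen by one endpoint."""
    call_id = headers["Call-ID"].strip()
    from_tag, to_tag = tag_of(headers["From"]), tag_of(headers["To"])
    return (call_id, from_tag, to_tag) if we_are_caller else (call_id, to_tag, from_tag)

h = {  # hypothetical headers from the 200 OK
    "Call-ID": "a84b4c76e66710@192.0.2.10",
    "From": "UserA <sip:userA@sipA.com>;tag=1928301774",
    "To": "UserB <sip:userB@sipB.com>;tag=a6c85cf",
}
print(dialog_id(h, we_are_caller=True))
```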
The Real-time Transport Protocol (RTP) is responsible for the real-time transport of data such as audio and video. It is standardized in RFC3550 and usually runs over UDP. Before it can be transported, the audio or video has to be encoded and packetized by a codec. Basically, the protocol specifies the timing and content requirements of the media transmission for incoming and outgoing packets using:
Packet forward without retransmission
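The timing and content information lives in the 12-byte fixed RTP header defined in RFC 3550: a sequence number (so a receiver can detect loss and reordering, since lost packets are never retransmitted), a timestamp, the payload type, and a source identifier (SSRC). A minimal sketch that packs and unpacks this header, assuming no padding, header extension, or contributing sources:

```python
import struct

RTP_VERSION = 2

def pack_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                    marker: bool = False) -> bytes:
    """Pack the 12-byte fixed RTP header (RFC 3550) with P=0, X=0, CC=0."""
    byte0 = RTP_VERSION << 6                            # V=2 in the top two bits
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # M bit + 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def unpack_rtp_header(data: bytes) -> dict:
    """Decode the fields of the fixed header from the first 12 bytes."""
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "seq": seq, "timestamp": ts, "ssrc": ssrc}

# PCMU (payload type 0) at 8 kHz with 20 ms packets: the timestamp advances 160 per packet.
print(unpack_rtp_header(pack_rtp_header(0, 1, 160, 0x1234ABCD)))
```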
RTP has a companion protocol called RTCP (RTP Control Protocol) that is used to monitor the RTP packets; it can measure delay and jitter.
The content carried by RTP is usually encoded by a codec. Each codec has a specific use: some compress the audio, while others don't. The G.711 codec, which does not use compression, is very common. At 64 kbps for a single channel it needs a high-speed network, as commonly found in Local Area Networks (LANs). In Wide Area Networks (WANs), however, 64 kbps can be too expensive for a single voice channel. Codecs such as G.729 (8 kbps) and GSM (13 kbps) compress the voice stream and save a lot of bandwidth. Some codecs, such as iLBC from Global IP Sound, can conceal packet loss; iLBC can sustain good voice quality even with 7% packet loss. So choose the codecs your VoIP provider will support wisely.
In some cases RTP is used to carry signaling information such as DTMF. RFC2833 describes a method to transmit DTMF digits as named events in RTP. It is very important to use the same DTMF method on both the user agent clients and the user agent servers.
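The codec bit rate is only part of the story: every packet also carries IP, UDP, and RTP headers. A rough calculation, assuming IPv4 (20 bytes), UDP (8 bytes), RTP (12 bytes), and no link-layer overhead such as Ethernet framing, shows the per-call bandwidth on the IP network:

```python
IP_UDP_RTP_OVERHEAD = 20 + 8 + 12   # bytes per packet: IPv4 + UDP + RTP headers

def voip_bandwidth_kbps(codec_kbps: float, ptime_ms: float = 20) -> float:
    """IP-level bandwidth of one voice stream (one direction), excluding layer 2."""
    payload_bytes = codec_kbps * 1000 / 8 * ptime_ms / 1000  # audio bytes per packet
    packets_per_second = 1000 / ptime_ms
    return (payload_bytes + IP_UDP_RTP_OVERHEAD) * 8 * packets_per_second / 1000

for name, rate in [("G.711", 64), ("GSM", 13), ("G.729", 8)]:
    print(name, voip_bandwidth_kbps(rate), "kbps")
# G.711 80.0 kbps, GSM 29.0 kbps, G.729 24.0 kbps (20 ms packets)
```

With 20 ms packets, headers add 16 kbps per stream, which is why an 8 kbps codec needs about 24 kbps on the wire; longer packetization intervals reduce the overhead at the cost of extra delay.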
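An RFC 2833 DTMF event travels in a small 4-byte RTP payload: the event code, an end-of-event bit, a reserved bit, a 6-bit volume, and a 16-bit duration in timestamp units. A minimal sketch of packing that payload; the default volume of 10 is an arbitrary choice for illustration. The RTP payload type carrying these events (101 in the SDP example that follows) is negotiated dynamically.

```python
import struct

# Event codes for DTMF digits defined in RFC 2833.
DTMF_EVENTS = {**{str(d): d for d in range(10)},
               "*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15}

def pack_dtmf_event(digit: str, duration: int, end: bool = False,
                    volume: int = 10) -> bytes:
    """Pack a 4-byte RFC 2833 telephone-event payload.

    volume is in -dBm0 (0-63); the default of 10 is an arbitrary example value.
    """
    event = DTMF_EVENTS[digit.upper()]
    flags = (0x80 if end else 0) | (volume & 0x3F)  # E bit, R bit = 0, 6-bit volume
    return struct.pack("!BBH", event, flags, duration & 0xFFFF)

# Digit 5 lasting 100 ms at 8 kHz (800 timestamp units), final packet of the event.
print(pack_dtmf_event("5", 800, end=True).hex())  # 058a0320
```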
RTCP can provide feedback on the quality of reception. It provides out-of-band control information for an RTP media flow. Statistics such as jitter, round trip time (RTT), latency, and packet loss can be gathered using RTCP. RTCP is usually used for voice quality reporting.
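The jitter figure that RTCP reports is the interarrival jitter from RFC 3550: for each packet, the receiver compares the change in transit time with the previous packet and smooths the result with a gain of 1/16. A minimal sketch, assuming arrival times are already converted to RTP timestamp units:

```python
class JitterEstimator:
    """Interarrival jitter as defined in RFC 3550, in RTP timestamp units."""

    def __init__(self):
        self.jitter = 0.0
        self._prev_transit = None

    def update(self, rtp_timestamp: int, arrival: int) -> float:
        """Feed one packet; arrival is the receive time in timestamp units."""
        transit = arrival - rtp_timestamp
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)   # change in transit time
            self.jitter += (d - self.jitter) / 16   # exponential smoothing, gain 1/16
        self._prev_transit = transit
        return self.jitter

est = JitterEstimator()
est.update(0, 1000)
print(est.update(160, 1176))  # second packet arrives 16 units late -> 1.0
```

Perfectly paced packets keep the estimate at zero; a late or early packet raises it, and the smoothing makes it decay gradually afterwards.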
The SDP protocol is described in RFC4566. It is used to negotiate session parameters between user agents: media details, transport addresses, and other media-related information are exchanged using SDP. Normally the INVITE message contains the SDP offer, while the "200 OK" contains the answer. Both messages are shown below. You can observe that the GSM codec is offered, but the other phone does not support it, so it answers with the codecs it supports, in this case G.711 u-law (PCMU) and G.729. The a=rtpmap:101 line describes the DTMF relay defined in RFC2833.
INVITE (SDP Offer).
200 OK (SDP Answer).
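The codec selection in this offer/answer exchange can be sketched as an intersection between the offered payload types and what the answering phone supports. This simplified version assumes a single audio stream, maps only a few static payload types from RFC 3551, and uses a hypothetical offer modeled on the description above rather than the exact messages.

```python
STATIC_PT = {"0": "PCMU", "3": "GSM", "8": "PCMA", "18": "G729"}  # subset of RFC 3551

def offered_codecs(sdp: str) -> dict[str, str]:
    """Map payload type -> encoding name for the first audio m= line."""
    pts: list[str] = []
    names: dict[str, str] = {}
    for line in sdp.splitlines():
        if line.startswith("m=audio") and not pts:
            pts = line.split()[3:]                 # m=audio <port> <proto> <formats...>
        elif line.startswith("a=rtpmap:"):
            pt, _, rest = line[len("a=rtpmap:"):].partition(" ")
            names[pt] = rest.split("/")[0]         # "telephone-event/8000" -> name
    return {pt: names.get(pt, STATIC_PT.get(pt, "unknown")) for pt in pts}

def answer_codecs(offer_sdp: str, supported: set[str]) -> list[str]:
    """Payload types the answerer accepts, kept in the offerer's order."""
    return [pt for pt, name in offered_codecs(offer_sdp).items()
            if name.upper() in supported]

offer = ("v=0\r\n"
         "m=audio 49170 RTP/AVP 3 0 18 101\r\n"     # hypothetical port; GSM offered first
         "a=rtpmap:101 telephone-event/8000\r\n")
print(answer_codecs(offer, {"PCMU", "G729", "TELEPHONE-EVENT"}))  # ['0', '18', '101']
```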
Before we start to dig into the SIP proxy, it is important to understand all the components of a VoIP provider solution. A VoIP provider usually consists of several servers and services. The services described here can be installed on a single server or on multiple servers, depending on the dimensioning.
In this book we will cover each one of these components, from left to right, in the chapters ahead. We are going to use this picture in all chapters to help you to know where you are.
The SIP proxy is the central component of our solution. It is responsible for registering users and for keeping the location database (which maps SIP addresses to IP addresses). All SIP routing and signaling is handled by the SIP proxy, which is also responsible for end-user services such as call forwarding, white/blacklists, speed dialing, and others. This component never handles the media (RTP packets); all media packets flow directly between the user agent clients, user agent servers, and PSTN gateways.
One important component is the user administration and provisioning portal. In the portal, users may subscribe to the service and should be able to buy credits, change passwords, and verify their accounts. Administrators, on the other hand, should be able to remove users, change user credits, and grant and remove privileges. Provisioning makes life easier for administrators by automating the installation of user agents such as IP phones, analog telephony adapters, and softphones.
To communicate with the public switched telephone network (PSTN), a PSTN gateway is required. Usually this gateway interfaces to the PSTN using E1 or T1 trunks. The most common products in this arena are gateways from Cisco, AudioCodes, and Quintum. Asterisk is gaining market share in this area because of its low cost per port, sometimes 75% less than the competitors. To evaluate a gateway, check its support for SIP extensions such as RFC3515 (REFER), RFC3891 (Replaces), and RFC3892 (Referred-By). These extensions allow unattended transfers behind the SIP proxy; without them in the gateway, it might be impossible to transfer calls.
The SIP proxy never handles the media. Services such as IVRs, voicemail, conferencing, or anything else related to media should be implemented in a media server. SEMS (SIP Express Media Server), developed by iptel, has some nice features such as conferencing, voicemail, and announcements. Once again, Asterisk can be used as a wildcard to provide these services.
Any SIP provider has to handle NAT traversal for its customers. The media proxy is an RTP bridge that helps users behind symmetric firewalls reach the SIP provider; without it, as many as 35% of users might not be served. Together with the other components, it lets you implement a universal NAT traversal technique. The media proxy can also help you correct the accounting of unfinished SIP dialogs that, for some reason, never received a BYE message.
A RADIUS server is fundamental for call accounting, and a SIP provider should take the utmost care of its accounting records. OpenSER can be configured to send accounting data to a RADIUS server such as Radiator or FreeRADIUS. SIP calls can be accounted to a database as well; however, accounting to a database generates two records that need to be correlated manually.
The RADIUS server has information about call duration, but not about the rates and prices for the call. Applying prices to calls can be very tricky. For our provider, we will use a GPL tool called CDRTool, developed by AG Projects (cdrtool.agprojects.com), to apply rates to calls.
Finally we will need monitoring, troubleshooting, and testing tools to help debug any problems occurring in the SIP server. The first tool is the protocol analyzer and we will see how to use ngrep, ethereal, and tethereal. OpenSER has a module called SIP trace, which we will use too.
The best reference for the SIP protocol is RFC3261. Reading RFCs is a little dry (they are very good when you have insomnia). You can find the RFC at: http://www.ietf.org/rfc/rfc3261.txt.
A good SIP tutorial can be found at Columbia University: http://www.cs.columbia.edu/~coms6181/slides/11/sip_long.pdf. Together with this you can find a lot of information about SIP at http://www.cs.columbia.edu/sip/.
A very good tutorial can be found at the iptel website: http://www.iptel.org/files/sip_tutorial.pdf.
There is a mailing list where you can post questions about SIP called SIP implementors: https://lists.cs.columbia.edu/mailman/listinfo/sip-implementors.
In this chapter you have learned what SIP is and what it does. You got to know the SIP components, such as the SIP proxy, SIP registrar, user agent client, user agent server, and PSTN gateway. You saw the SIP architecture and its main messages and processes. Some places to find further information were listed too.