WebRTC Integrator's Guide

By Altanai
  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. Running WebRTC with and without SIP

About this book

WebRTC enables real-time communication across the Web and with the whole telecom world behind a single button on a web page. WebRTC promises to bring new reforms and innovation for IP telephony. WebRTC comes with numerous integration features, such as new standards for VoIP services, call control applications, profile and phonebook management, and much more.

This book covers all aspects of building a standalone WebRTC communication platform, making a WebRTC SIP-based Communicator, and shows you how to overcome challenges. It also describes the integration of Rich Services such as voicemail, conference calls, and file transfers, as well as call control mechanisms such as screening and routing. The book then takes you through building a WebRTC project and its integration in the access, network, and service layers of IMS. The book ends with the creation of a commercial-quality web application capable of setting and receiving calls, messages, and conference calls, and other numerous services.

Publication date:
October 2014


Chapter 1. Running WebRTC with and without SIP

WebRTC lets us make calls right from a web page without any plugin. This was made possible using media APIs of the browser to fetch user media, WebSocket for transportation, and HTML5 to render the media on the web page. Thus, WebRTC is an evolved form of WebSocket communication. WebSocket is a Transport Layer protocol that carries data. The WebSocket API is an Application Programming Interface (API) that enables web pages to use the WebSocket protocol for (duplex) communication with a remote host.

In this chapter, we will study how WebRTC really works. We will also demonstrate the use of WebRTC media APIs to capture and render input from a user's microphone and camera onto a web page. In the later part of chapter, we will find out how to build a simple standalone WebRTC client using the plain WebSocket protocol as the signaling mechanism.


JavaScript Session Establishment Protocol (JSEP)

The communication model between a client and remote host is based on the JSEP architecture, which differentiates the signaling and media transaction into different layers.

The differentiation is shown in the following figure:

JSEP signaling and media

As an example, let's consider two peers, A and B, where A initiates communication with B. Initially, in the first case, A being the offerer will have to call the createOffer function to begin a session. A also mentions details such as codecs through a setLocalDescription function, which sets up its local config. The remote party, B, reads the offer and stores it using the setRemoteDescription function. The remote party, B, calls the createAnswer function to generate an appropriate answer, applies it using the setLocalDescription function, and sends the answer back to the initiator over the signaling channel. When A gets the answer, it also stores it using the setRemoteDescription function, and the initial setup is complete. This is repeated for multiple offers and answers. The latest on JSEP specifications can be read from the Internet Engineering Task Force (IETF) site at http://datatracker.ietf.org/doc/draft-ietf-rtcweb-jsep/.

Signal and media flows

The differentiation between signal and media flows is an important aspect of the WebRTC call setup.

The signaling mechanism can be any among HTTP/REST, JavaScript Object Notation (JSON) via XMLHttpRequest (XHR), Session Initiation Protocol (SIP) over websockets, XMPP, or any custom or proprietary protocol. The media (audio/video) is defined through the Session Description Protocol (SDP) and flows from peer to peer.

A few instances of end-to-end signaling and media flow variants are shown in the following screenshot:

The preceding figure depicts signaling over the WebRTC API in the JSON format via XHR.

Now, the following figure depicts signaling over the WebRTC API in eXtensible Messaging and Presence Protocol (XMPP):

While it's very popular to use the WebRTC API with SIP support through JavaScript libraries such as JSSIP, SIPML5, PJSIP, and so on, these libraries cater to the SIP/IMS (IP Multimedia Subsystem) world and are not mandatory for setting up enterprise-level WebRTC Infrastructure. In fact, it is a misconception that WebRTC is coupled with SIP in itself; it isn't.


IP Multimedia System (IMS) is part of the Next Generation Network (NGN) model for IP-based communication.


Running WebRTC without SIP

HTML5 websockets can be defined by ws:// followed by the URL in the server field while readying a WebRTC client for registration. This enables bidirectional, duplex communications with server-side processes, that is, server-side push events to the client. It also enables the handshake after sharing media metadata such as ports, codecs, and so on.

It should be noted that WebRTC works in an offer/answer mode and has ways of traversing the Network Address Translation (NAT) and firewalls by means of Interactive Connectivity Establishment (ICE). ICE makes use of the Session Traversal Utilities for NAT (STUN) protocol and its extension, Traversal Using Relay NAT (TURN). This is covered later in the chapter.

Sending media over WebSockets

WebRTC mainly comprises three operations: fetching user media from a camera/microphone, transmitting media over a channel, and sending messages over the channel. Now, let's take a look at the summarized description of every operation type.


The JavaScript getUserMedia function (also known as MediaStream) is used to allow the web page to access users' media devices such as camera and microphone using the browser's native API, without the need of any other third-party plugins such as Adobe Flash and Microsoft Silverlight.


For simple demos of these methods, download the WebRTC read-only by executing the following command:

svn checkout http://webrtc.googlecode.com/svn/trunk/ webrtc-read-only

The following is the code to access the IP camera in the Google Chrome browser and display the local video in a <video/> element:

/*The HTML to define a button to begin the capture and HTML5 video element on web page body */
<video id="vid" autoplay="true"></video>
<button id="btn" onclick="start()">Start</button>

/*The JavaScript block contains the following function call to start the media capture using Chrome browser's getUserMedia function*/
video = document.getElementById("vid");
function start() {
  navigator.webkitGetUserMedia({video:true}, gotStream, function() {});
  btn.disabled = true;

/*The function to add the media stream to a video element on a page*/
  function gotStream(stream) {
  video.src = webkitURL.createObjectURL(stream);


When the browser tries to access media devices such as a camera and mic from users, there is always a browser notification that asks for the user's permission.


Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you

The following screenshot depicts the user notification for granting permission to access the camera in Google Chrome:

The following screenshot depicts the user notification for granting permission to access the camera in Mozilla Firefox:

The following screenshot depicts the user notification for granting permission to access the camera in Opera:


In WebRTC, media traverses in a peer-to-peer fashion and is necessary to exchange information prior to setting up a communication path such as public IP and open ports. It is also necessary to know about the peer's codecs, their settings, bandwidth, and media types.

To make the peer connection, we will need a function to populate the values of the RTCPeerConnection, getUserMedia, attachMediaStream, and reattachMediaStream parameters. Due to the fact that the WebRTC standard is currently under development, the JavaScript API can change from one implementation to another. So, a web developer has to configure the RTCPeerConnection, getUserMedia, attachMediaStream, and reattachMediaStream variables in accordance to the browser on which we are running the HTML content.


It is noted that WebRTC standards are in rapid evolution. The API that was used for the first version of WebRTC was the PeerConnection API, which had distinct methods for media transmission. As of now, the old PeerConnection API has been deprecated and a new enhanced version is under process. The new Media API has replaced the media streams handling in the old PeerConnection API.

The browser APIs of different browsers have different names. The criterion is to determine the browser on which the web page is opened and then call the appropriate function for the WebRTC operation. The identity of the browser can be determined by extracting a friendly name or checking for a match with a specific library name of the different browser. For example, when navigator.webkitGetUserMedia is true, then WebRTCDetectedBrowser = "chrome", and when navigator.mozGetUserMedia is true, then WebRTCDetectedBrowser = "firefox". The following table shows the W3C standard elements in Google Chrome and Mozilla Firefox:

W3C Standard















Such methods also exist for Opera, which is a new addition to the WebRTC suite. Hopefully, Internet Explorer, in the future, would have native support for WebRTC standards. For other browsers such as Safari that don't support WebRTC as yet, there are temporary plugins that help capture and display the media elements, which can be used until these browsers release their own enhanced WebRTC supported versions. Creating WebRTC-compatible clients in Internet Explorer and Safari is discussed in Chapter 9, Native SIP Application and Interaction with WebRTC Clients.

The following code snippet is used to make an RTC peer connection and render videos from one HTML video frame to another on the same web page. The library file, adapter.js, is used, which renders the polyfill functionality to different browsers such as Mozilla Firefox and Google Chrome.

The HTML body content that includes two video elements for the local and remote videos, the text status area, and three buttons to start capturing, sending, and stop receiving the stream are given as follows:

<video id="vid1" autoplay="true" muted="true"></video>
<video id="vid2" autoplay></video>
<button id="btn1" onclick="start()">Start</button>
<button id="btn2" onclick="call()">Call</button>
<button id="btn3" onclick="hangup()">Hang Up</button>
<xtextarea id="ta1"></textarea>
<xtextarea id="ta2"></textarea>

The JavaScript program to transmit media from the video element to another at the click of the Start button, using the WebRTC API is given as follows:

/* setting the value of start, call and hangup to false initially*/
btn1.disabled = false;
btn2.disabled = true;
btn3.disabled = true;

/* declaration of global variables for peerconecection 1 and 2, local streams, sdp constrains */
var pc1,pc2;
var localstream;
var sdpConstraints = {'mandatory': {
                        'OfferToReceiveVideo':true }};

The following code snippet is the definition of the function that will get the user media for the camera and microphone input from the user:

function start() {
  btn1.disabled = true;
  getUserMedia({audio:true, video:true},
  /* get audio and video capture */
  gotStream, function() {});

The following code snippet is the definition of the function that will attach an input stream to the local video section and enable the call button:

function gotStream(stream){
  attachMediaStream(vid1, stream);
  localstream = stream;/* ready to call the peer*/
btn2.disabled = false;

The following code snippet is the function call to stream the video and audio content to the peer using RTCPeerConnection:

function call() {
  btn2.disabled = true;
  btn3.disabled = false;
  videoTracks = localstream.getVideoTracks();
  audioTracks = localstream.getAudioTracks();
  var servers = null;

  pc1 = new RTCPeerConnection(servers);/* peer1 connection to server */
  pc1.onicecandidate = iceCallback1;
  pc2 = new RTCPeerConnection(servers);/* peer2 connection to server */

  pc2.onicecandidate = iceCallback2;
  pc2.onaddstream = gotRemoteStream;

function gotDescription1(desc){/* getting SDP from offer by peer2 */
pc2.createAnswer(gotDescription2, null, sdpConstraints);

function gotDescription2(desc){/* getting SDP from answer by peer1 */

On clicking the Hang Up button, the following function closes both of the peer connections:

function hangup() {
  pc1 = null; /* peer1 connection to server closed */
  pc2 = null; /* peer2 connection to server closed */
  btn3.disabled = true; /* disables the Hang Up button */
  btn2.disabled = false; /*enables the Call button */


function gotRemoteStream(e){
  vid2.src = webkitURL.createObjectURL(e.stream);

function iceCallback1(event){
  if (event.candidate) {
    pc2.addIceCandidate(new RTCIceCandidate(event.candidate));

function iceCallback2(event){
  if (event.candidate) {
    pc1.addIceCandidate(new RTCIceCandidate(event.candidate));

In the preceding example, JSON/XHR (XMLHttpRequest) is the signaling mechanism. Both the peers, that is, the sender and receiver, are present on the same web page; this is represented by the two video elements shown in the following screenshot. They are currently in the noncommunicating state.

As soon as the Start button is hit, the user's microphone and camera begin to capture. The first peer is presented with the browser request to use their camera and microphone. After allowing the browser request, the first peer's media is successfully captured from their system and displayed on the screen. This is demonstrated in the following screenshot:

As soon as the user hits the Call button, the captured media stream is shared in the session with the second peer, who can view it on their own video element. The following screenshot depicts the two peers sharing a video stream:

The session can be discontinued by clicking on the Hang Up button.


The DataChannel function is used to exchange text messages by creating a bidirectional data channel between two peers. The following is the code to demonstrate the working of RTCDataChannel.

The following code snippet is the HTML body of the code for the DataChannel function. It consists of a text area for the two peers to view the messages and three buttons to start the session, send the message, and stop receiving messages.

<div id="left">
<h2>Send data</h2>
<textarea id="dataChannelSend" rows="5" cols="15" disabled="true">
<button id="startButton" onclick="createConnection()">
<button id="sendButton" onclick="sendData()">Send Data</button>
<button id="closeButton" onclick="closeDataChannels()">Stop Send Data
<div id="right">
<h2>Received Data</h2>
<textarea id="dataChannelReceive" rows="5" cols="15" disabled="true">

The style script for the text area is given as follows; to differentiate between the two peers, we place one text area aligned to the right and another to the left:

#left { position: absolute; left: 0; top: 0; width: 50%; }
#right { position: absolute; right: 0; top: 0; width: 50%; }

The JavaScript block that contains the functions to make the session and transmit the data is given as follows:

/*Declaring global parameters for both sides' peerconnection, sender, and receiver channel*/
var pc1, pc2, sendChannel, receiveChannel;

/*Only enable the Start button, keep the send data and stop send data button off*/
startButton.disabled = false;
sendButton.disabled = true;
closeButton.disabled = true;

The following code snippet is the script to create PeerConnection in Google Chrome, that is, webkitRTCPeerConnection that was seen in the previous table. It is noted that a user needs to have Google Chrome Version 25 or higher to test this code. Some old Chrome versions are also required to set the --enable-data-channels flag to the enabled state before using the DataChannel functions.

function createConnection() {
  var servers = null;
  pc1 = new webkitRTCPeerConnection(servers,{
    optional: [{RtpDataChannels: true}]});
    try {
      sendChannel = pc1.createDataChannel("sendDataChannel", {
        reliable: false});
    } catch (e) {
      alert('Failed to create data channel.' + 'You need Chrome M25 or later with --enable-data-channels flag'););

  pc1.onicecandidate = iceCallback1;
  sendChannel.onopen = onSendChannelStateChange;
  sendChannel.onclose = onSendChannelStateChange;
  pc2 = new webkitRTCPeerConnection(servers,{
    optional: [{RtpDataChannels: true}]});
  pc2.onicecandidate = iceCallback2;
  pc2.ondatachannel = receiveChannelCallback;

  startButton.disabled = true; /*since session is up, disable start button */
  closeButton.disabled = false;  /*enable close button */

The following function is used to invoke the sendChannel.send function along with user text to send data across the data channel:

function sendData() {
  var data = document.getElementById("dataChannelSend").value;

The following function calls the sendChannel.close() and receiveChannel.close() functions to terminate the data channel connection:

function closeDataChannels() {
pc1.close();      /* peer1 connection to server closed */
pc2.close();      /* peer2 connection to server closed */

  pc1 = null;
  pc2 = null;
startButton.disabled = false;
sendButton.disabled = true;
closeButton.disabled = true;
document.getElementById("dataChannelSend").value = "";
document.getElementById("dataChannelReceive").value = "";
document.getElementById("dataChannelSend").disabled = true;

Peer connection 1 sets the local description, and peer connection 2 sets the remote description from the SDP exchanged, and the answer is created:

function gotDescription1(desc) {

function gotDescription2(desc) {
  trace('Answer from pc2 \n' + desc.sdp);

The following is the function to get the local ICE call back:

function iceCallback1(event) {
  if (event.candidate) {

The following is the function for the remote ICE call back:

function iceCallback2(event) {
  if (event.candidate) {

The function that receives the control when a message is passed back to the user is as follows:

function receiveChannelCallback(event) {
  receiveChannel = event.channel;
  receiveChannel.onmessage = onReceiveMessageCallback;
  receiveChannel.onopen = onReceiveChannelStateChange;
  receiveChannel.onclose = onReceiveChannelStateChange;

function onReceiveMessageCallback(event) {
  document.getElementById("dataChannelReceive").value = event.data;

function onReceiveChannelStateChange() {
  var readyState = receiveChannel.readyState;

function onSendChannelStateChange() {
  var readyState = sendChannel.readyState;
  if (readyState == "open") {
    document.getElementById("dataChannelSend").disabled = false;
    sendButton.disabled = false;
    closeButton.disabled = false;
  } else {
    document.getElementById("dataChannelSend").disabled = true;
    sendButton.disabled = true;
    closeButton.disabled = true;

The following screenshot shows that Peer 1 is prepared to send text to Peer 2 using the DataChannel API of WebRTC:

Empty text areas before beginning the exchange of text

On clicking on the Start button, as shown in the following screenshot, a session is established between the peers and the server:

Putting in text from one's peers after hitting the Start button

As Peer 1 keys in the message and hits the Send button, the message is passed on to Peer 2. The preceding snapshot is taken before sending the message, and the following picture is taken after sending the message:

Text is exchanged on DataChannel on the click of the Send button

However, right now, you are only sending data from one localhost to another. This is because the system doesn't know any other peer IP or port. This is where socket-based servers such as Node.js come into the picture.

Media traversal in WebRTC clients

Real-time Transport Protocol (RTP) is the way for media to flow between end points. Media could be audio and/or video based.


Media stream uses SRTP and DTLS protocols.

RTP in WebRTC is by default peer-to-peer as enforced by the Interactive Connectivity Establishment (ICE) protocol candidates, which could be either STUN or TURN. ICE is required to establish that firewalls are not blocking any of the UDP or TCP ports. The peer-to-peer link is established with the help of the ICE protocol. Using the STUN and TURN protocols, ICE finds out the network architecture and provides some transport addresses (ICE candidates) on which the peer can be contacted.

An RTCPeerConnection object has an associated ICE, comprising the RTCPeerConnection signaling state, the ICE gathering state, and the ICE connection state. These are initialized when an object is first created. The flow of signals through these nodes is depicted in the following call flow diagram:

ICE, STUN, and TURN are defined as follows:

  • ICE: This is the framework to allow your web browser to connect with peers. ICE uses STUN or TURN to achieve this.

  • STUN: This is the protocol to discover your public address and determine any restrictions in your router that would prevent a direct connection with a peer. It presents the outside world with a public IP to the WebRTC client that can be used by the peers to communicate to it.

  • TURN: This is meant to bypass the Symmetric NAT restriction by opening a connection with a TURN server and relaying all information through this server.


STUN/ICE is built-in and mandatory for WebRTC.


WebRTC through WebSocket signaling servers

Signaling is a crucial activity to establish any kind of network-based communication. It lets the endpoints share the session description and media information before setting up the path to actually exchange media. For a simple WebRTC client, there are JavaScript-based WebSocket servers that can provide such signaling in a permanent, full duplex, real-time manner. Node.js is one such server.


Node.js is an asynchronous, server-side JavaScript server powered by Chrome's V8 JS engine. There are many WebSocket libraries, such as Socket.io and SockJS, that can run over it. Why are they used? They are used because the WebSocket server will do the WebSocket signaling between WebRTC clients and the server without using other protocols such as XMPP or SIP.

Let's see how we can use Node.js signaling server through the following simple steps:

  1. On a Windows machine, install nodejs.exe from the official download site, http://www.nodejs.org.

  2. To check whether Node.js is properly installed and working, check the version using the following command lines

    node -v

    The output in my case is v0.10.26.

  3. Open the command prompt, and type node <name of the JS file> in the window. Consider the following command line as an example:

    node signaler.js

To write and run a simple server-side program, open Notepad, make a sample JS file with a name, say, console, and add some content to the console.log('node.js running fine') file. Run this file using the following Node.js command from the command prompt:

node console.js

The following screenshot shows the output of the preceding command line:

Let's now look at the overview of steps using Node.js to set up the signaling environment for a WebRTC client.

  1. First, we need a JavaScript library to support WebRTC signaling operations. We can use signaller.js for this. Download signaller.js from https://github.com/muaz-khan/WebRTC-Experiment/blob/master/websocket-over-nodejs/signaler.js.

  2. Next, we should run the JavaScript library using the Node.js server. We can do so by executing the following command in the terminal window:

    node signaler.js
  3. Specify the address of the Node.js server machine in the WebRTC client.

    Now, we can make inter-browser WebRTC audio/video calls, where the signaling is handled by the Node.js WebSocket signaling server. The following diagram depicts how Node.js is used as a signaling server:

The preceding diagram denotes signaling across WebRTC clients over the Node.js WebSocket-based server. The media flows from peer to peer.

Making a peer-to-peer audio call using Node.js for signaling

We have seen how a JavaScript program is hosted on a Node.js signaling server. Now, let's study the process of making an audio/video call using this setup. The following code references Muaz Khan WebRTC experiments, which is under the MIT license. The library used is PeerConnection.js. The following are the CSS descriptions for the audio and video content on a page:

audio, video {
  vertical-align: top;
.setup {
  border-bottom-left-radius: 0;
  border-top-left-radius: 0;
  margin-left: -9px;
  margin-top: 8px;
  position: absolute;
.highlight { color: rgb(0, 8, 189); }

Next, we will look at the JavaScript functions that define the behavior of the WebRTC client. This is a modified version of code from one-to-one-peerconnection.html under the WebRTC experiments master from Muaz Khan. For better clarity and easy understanding, I have removed the functions of unique ID, rotate video, and scale video, and have minimal CSS styling.

The following code defines the websocket.onopen and websocket.send operations:

var channel = location.href.replace( /\/|:|#|%|\.|\[|\]/g, '');
var websocket = new WebSocket('ws://' + document.domain + ':12034');
websocket.onopen = function() {
    open: true, channel: channel
websocket.push = websocket.send;
websocket.send = function(data) {
    data: data, channel: channel

The following code is for the creation of a new peer connection and for every user who joins a session:

  var peer = new PeerConnection(websocket);
  peer.onUserFound = function(userid) {

    if (document.getElementById(userid)) return;

    /* adding the name of room to room list */
    var tr = document.createElement('tr');
    var td1 = document.createElement('td');
    var td2 = document.createElement('td');
    td1.innerHTML = userid + ' video call';

    /* creating element button to room list */
    var button = document.createElement('button');
    button.innerHTML = 'Join';
    button.id = userid;
    button.style.float = 'right';

    /* add the user to session on button click */
    button.onclick = function() {
      button = this;
      getUserMedia(function(stream) {
      // get user media
      // add the stream

      button.disabled = true;


The following code adds streaming to the video element of HTML and sets its characteristics:

    peer.onStreamAdded = function(e) {
      if (e.type == 'local')
        document.querySelector('#start-broadcasting').disabled = false;
      var video = e.mediaElement;
      video.setAttribute('width', 400);
      video.setAttribute('height', 400);
      video.setAttribute('controls', true);
      videosContainer.insertBefore(video, videosContainer.firstChild);

The following code is to close the streaming session:

    peer.onStreamEnded = function(e) {
    var video = e.mediaElement;
    if (video) {
      video.style.opacity = 0;
      setTimeout(function() {
      }, 1000);

  document.querySelector('#start-broadcasting').onclick = function() {
    this.disabled = true;
    getUserMedia(function(stream) {
  document.querySelector('#your-name').onchange = function() {
    peer.userid = this.value;

  var videosContainer = document.getElementById('videos-container') || document.body;
  var btnSetupNewRoom = document.getElementById('setup-new-room');
  var roomsList = document.getElementById('rooms-list');

  if (btnSetupNewRoom) btnSetupNewRoom.onclick = setupNewRoomButtonClickHandler;

The following code is to capture the user media:

  function getUserMedia(callback) {
    var hints = {
      audio: true,
      video: {
        optional: [],
        mandatory: {
          minWidth: 200, minHeight:200, maxWidth: 400, maxHeight: 400, minAspectRatio: 1.77
    navigator.getUserMedia(hints, function(stream) {
      var video = document.createElement('video');
      video.src = URL.createObjectURL(stream);
      video.controls = true;
      video.muted = true;
        mediaElement: video,
        userid: 'self',
        stream: stream

The following is the web page's HTML content to add a button to start transmitting the media, a video element to display the media, a text field to add a user's name, and a table to list the existing available sessions:

<input type="text" id="your-name" placeholder="your-name">
<button id="start-broadcasting" class="setup">
Start Transmitting Yourself!</button>

<!-- list of all available conferencing rooms -->
<table id="rooms-list" style="width: 100%;"></table>

<!-- local/remote videos container -->
<div id="videos-container"></div>

The following screenshot depicts a user, Alice, creating a new session named alice. Here, the user Alice creates a session for broadcasting video, which will be added to the room list.

Alice's media is streamed on the session space, as shown in the next screenshot:

A new user, Bob, views the list of ongoing sessions from his remote computer, and clicks on the Join button, as shown in the following screenshot, to join Alice's session:

The following screenshot displays a two-way audio and video session in progress between Bob and Alice:

Bob and Alice are in an audio/video sharing session. Using other WebRTC APIs, we can also add file sharing, text chat, screen sharing capabilities, and so on to this simple demonstration to turn it into a multifeatured communication tool.


Running WebRTC with SIP

This section introduces the approach to use the SIP signaling mechanism with WebRTC. Like any other VoIP protocol, SIP also provides the signaling framework before setting up an actual media path. However, the foundation of open standard and industry-adopted signaling protocol such as SIP is recommended, as it provides the first and most crucial step to a strong, scalable architecture.

Session Initiation Protocol (SIP)

As we already know, SIP is a signaling protocol that is used to establish an RTP between two endpoints.


As per the official document, RFC 3261, SIP is an application-layer control protocol that can establish, modify, and terminate multimedia sessions (conferences) such as Internet telephony calls.

The SIP stack defines the Request and Response methods. These methods are used to gather the information about endpoints that wish to participate in a communication so that the device-specific information such as IP, port, availability, media understanding, and audio-video device compatibility can be sorted out before establishing a flowing media connection.

However, it should be noted that traditional SIP is a bit different from SIP over WebSocket (SIPWS), which is used in case of WebRTC with SIP signaling. It is not by default that every SIP server would understand SIPWS. Only those SIP servers that have WebSocket support, or state that they are WebRTC compliant, will be able to proxy or understand the SIP messages sent from a WebRTC client.

Why do we use SIPWS? This protocol allows the development of Convergent applications, that is, applications that support SIP for communication, HTTP for web components, and WebRTC for media. SIPWS can be transformed into plain SIP signal through a gateway, which can then interact with the IMS network. Also, SIP can be used to integrate application logic such as call screening and call rerouting, with the help of SIP Servlets or other kinds of SIP programming. More of this is given in Chapter 3, WebRTC with SIP and IMS.

SIPWS is explained in detail in the IETF draft, The WebSocket Protocol as a Transport for the Session Initiation Protocol (SIP) draft-ietf-sipcore-sip-websocket-10 and can be found at http://tools.ietf.org/html/draft-ietf-sipcore-sip-websocket-10.

The following figure depicts the use of SIPWS signaling plane with WebRTC media plane:

The following figure provides the call flow of the SIPWS signaling mechanism. Any SIP request is preceded by a one-time WebSocket handshake.

Alice loads a web page using her web browser and retrieves the JavaScript code that implements the SIP WebSocket subprotocol. The JavaScript code (SIP WebSocket Client) establishes a secure WebSocket connection with a SIP proxy/registrar (SIP WebSocket Server) at proxy.example.com.

The following is an example of a WebSocket handshake in which the Client requests the WebSocket SIP subprotocol support from the Server:

Request Method:
Status Code:
101 Switching Protocols

Request Headers:
Provisional headers are shown.
Sec-WebSocket-Extensions:permessage-deflate; client_max_window_bits, x-webkit-deflate-frame
User-Agent:Mozilla/5.0 (X11; Linux i686 (x86_64)) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36
Response Headersview source

The following diagram shows the call between Alice and Bob through the SIP proxy server over WebSocket signaling:

Every SIP endpoint is registered with the SIP Server by a unique callable ID. This is referred to as the SIP URI and is denoted by the sip:<username>@<domainname> format. When a user, Alice, calls another user, Bob, through Bob's SIP URI, then the SIP WebSocket Server at proxy.example.com acts as a SIP proxy node and routes the INVITE call to Bob's contact. Bob answers the call to start a conversation and then terminates it with a BYE request when the communication is over.

JavaScript-based SIP libraries

There are many popular JavaScript libraries that offer easy-to-integrate support for WebRTC communication using SIP signaling:

  • SIPJS: This is an SIP stack in JavaScript to implement SIP-based audio and video user agents in the browser. You can find a running demo at http://theintencity.com/sip-js/phone.html?network_type=WebRTC. The demo application has the option to switch between WebRTC capabilities and Flash for browsers that support and do not support WebRTC.

  • JSSIP: This is an SIP over WebSocket transport API for audio/video calls and instant messaging. It works with all SIPWS-compatible SIP servers such as OverSIP, Kamailio, and Asterisk servers. You can find a running demo at http://tryit.jssip.net/.

  • sipML5: This is an open source JavaScript library with a provision for RTCWeb Breaker (audio and video transcoding when the endpoints do not support the same codecs or the remote server is not RTCWeb compliant). For example, features such as audio/video call, instant messaging, presence, call hold/resume, explicit call transfer, and Dual-tone multi-frequency (DTMF) signaling using SIP INFO are present. You can find a running example at http://sipml5.org/call.htm.

  • QuoffeSIP: This is another WebRTC SIP library to establish real-time communication between browsers. This is developed in CoffeeScript (simple syntax). It features video/audio call capabilities using SIP over the Websockets protocol and also uses the SIP Outbound and GRUU protocols. You can find a running example athttp://talksetup.quobis.com/.

The implementation of the sipML5 and JSSIP libraries to constitute a simple WebRTC browser client that is able to communicate to a similar peer in any WebRTC-supported browser is covered in the next chapter.



In this chapter, we learned that a WebRTC communication process is divided into two parts: signaling, where the session setup and teardown is agreed to, and media transactions, which deals with the actual RTP streams that contain voice/video/data that the user has sent. We saw how to program the three basic APIs of WebRTC media stack, namely, getUserMedia, RTCPeerConnection, and DataChannel. The Running WebRTC without SIP section described signaling done over JSON via XMLHttpRequest using Node.js as the intermediately signaling server to connect the peers and prepare for the media flow. The next section, Running WebRTC with SIP, listed the libraries or WebRTC clients that use SIP over WebSocket to take care of the signaling between WebRTC peers. In the following chapters, we will see how to use WebRTC media APIs over the SIP WebSocket protocol in detail.

About the Author

  • Altanai

    Altanai, born into an Indian army family, is a bubbly, vivacious, intelligent computer geek. She is an avid blogger and writes on Research and Development of evolving technologies in Telecom (http://altanaitelecom.wordpress.com).

    She holds a Bachelor's degree in Information Technology from Anna University, Chennai. She has worked on many Telecom projects worldwide, specifically in the development and deployment of IMS services. She firmly believes in contributing to the Open Source community and is currently working on building a WebRTC-based JS library with books for more applications.

    Her hobbies include photography, martial arts, oil canvas painting, river rafting, horse riding, and trekking, to name a few.

    This is her first book, and it contains useful insight into WebRTC for beginners and integrator in this field. The book has definitions and explanations that will cover many interesting concepts in a clear manner.

    Altanai can be contacted at [email protected]

    Browse publications by this author