This chapter describes the basics of developing WebRTC media web applications. You will learn how to build a simple peer-to-peer video conference with a web chat that will work through NAT and firewalls (in most cases).
The video conference developed in this chapter consists of two applications: the client-side application and the server-side application.
The client code is written in JavaScript and is executed in the customer's web browser. This application uses the WebRTC API, handles all the media features, and provides the web page of the conference.
The server code will be executed on a server (it can even be your work machine). We need the server application to make our conference work well with peers behind NAT (peers who use private IP addresses). The server code is written in Erlang, and you will also get a brief introduction to this language.
As a bonus, you will get basic knowledge of Session Traversal Utilities for NAT (STUN) and Traversal Using Relays around NAT (TURN) servers. We will discuss them in a more detailed way in Chapter 4, Security and Authentication.
WebRTC can't create direct connections between peers without the help of a signaling server. The signaling server is not something standardized that your application can simply pick up and use. Actually, any communication mechanism that allows peers to exchange Session Description Protocol (SDP) data can be used for signaling. SDP is described in the next section.
A connection between peers and a signaling server is usually called a signaling channel. In this chapter, we will use WebSockets to build our signaling server.
Besides SDP data, peers also have to exchange data about the network connection (known as ICE candidates).
SDP is an important part of the WebRTC stack. It is used to negotiate session/media options while establishing a peer connection.
It is a protocol intended to describe multimedia communication sessions for the purposes of session announcement, session invitation, and parameter negotiation. It does not deliver the media data itself, but is used by peers to negotiate media types, formats, and all the associated properties/options, such as resolution, encryption, and codecs. The set of properties and parameters is usually called a session profile.
Peers have to exchange SDP data using the signaling channel before they can establish a direct connection.
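To make the flow concrete, the following is a minimal sketch of the offer/answer exchange. It assumes an already created peer connection pc, the sdpConstraints entity defined later in this chapter, a received offer stored in remoteOffer, and a hypothetical sendToSignalingServer() helper standing in for whatever signaling transport you choose:

// Caller: create an SDP offer and push it over the signaling channel
pc.createOffer(function(offer) {
    pc.setLocalDescription(offer);
    sendToSignalingServer(offer); // hypothetical helper; any transport works
}, null, sdpConstraints);

// Callee: apply the received offer, then reply with an SDP answer
pc.setRemoteDescription(new RTCSessionDescription(remoteOffer));
pc.createAnswer(function(answer) {
    pc.setLocalDescription(answer);
    sendToSignalingServer(answer); // the answer travels back the same way
}, null, sdpConstraints);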
The following is an example of an SDP offer:
v=0
o=alice 2890844526 2890844526 IN IP4 host.atlanta.example.com
s=
c=IN IP4 host.atlanta.example.com
t=0 0
m=audio 49170 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
m=video 51372 RTP/AVP 31 32
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000
Here, we can see that this is a video and audio session, and multiple codecs are offered.
The following is an example of an SDP answer:
v=0
o=bob 2808844564 2808844564 IN IP4 host.biloxi.example.com
s=
c=IN IP4 host.biloxi.example.com
t=0 0
m=audio 49174 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 49170 RTP/AVP 32
a=rtpmap:32 MPV/90000
Here, we can see that only one audio codec and one video codec are accepted in response to the preceding offer.
You can find more example SDP sessions at https://www.rfc-editor.org/rfc/rfc4317.txt.
You can also find more details on SDP in the appropriate RFC at http://tools.ietf.org/html/rfc4566.
Interactive Connectivity Establishment (ICE) is a mechanism that allows peers to establish a connection. In real life, customers usually don't have a direct connection to the Internet; they are connected via network devices/routers, have private IP addresses, use NAT, are behind network firewalls, and so on. Usually, customers' devices don't have public IP addresses. ICE uses the STUN and TURN protocols to help peers establish a connection.
You can find details on ICE in the appropriate RFC at https://tools.ietf.org/html/rfc5245.
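For illustration, this is roughly what a single ICE candidate looks like when carried in SDP: a server reflexive (srflx) candidate discovered via STUN, with made-up addresses, ports, and priority:

a=candidate:842163049 1 udp 1677729535 203.0.113.7 46812 typ srflx raddr 192.168.1.10 rport 46812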
WebRTC has a built-in mechanism to use NAT traversal options such as STUN and TURN servers.
In this chapter, we will use public STUN servers, but in real life, you should install and configure your own STUN or TURN server. We will learn how to install a STUN server at the end of this chapter as a bonus to the developed application. We will get into installing and configuring a TURN server in Chapter 4, Security and Authentication, where we dive into the details.
In most cases, you will use a STUN server; it helps perform a NAT/firewall traversal and establish a direct connection between the peers. In other words, the STUN server is utilized only during the stage of establishing a connection. After the connection has been established, peers will transfer the media data directly between them.

In some cases (unfortunately, they are not so rare), the STUN server won't help you get through a firewall or NAT, and establishing a direct connection between the peers will be impossible, for example, if both peers are behind a symmetric NAT. In this case, the TURN server can help you.
A TURN server works as a retransmitter between the peers: all the media data between them is relayed through the TURN server.

If your application passes a list of several STUN/TURN servers to the WebRTC API, the web browser will try the STUN servers first; if the connection fails, it will fall back to the TURN servers automatically.
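For example, such a list might look like the following sketch; the TURN URL and credentials are placeholders for a server of your own (we will set up a TURN server in Chapter 4, Security and Authentication):

var pc_config = {"iceServers": [
    // STUN is tried first; it is used only while establishing the connection
    {url: 'stun:stun.l.google.com:19302'},
    // TURN is the fallback; it relays the media when no direct path exists
    // (placeholder host and credentials - replace them with your own)
    {url: 'turn:turn.example.com:3478', username: 'webrtcuser', credential: 'secret'}
]};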
WebSocket is a protocol that provides full-duplex communication channels over a single TCP connection. It is a relatively young protocol, but today all major web browsers, including Chrome, Internet Explorer, Opera, Firefox, and Safari, support it. WebSocket replaces long polling as a way to get two-way communication between the browser and the server.
In this chapter, we will use WebSocket as a transport channel to develop a signaling server for our video conference service. Our peers will communicate with the signaling server through it.
Two important benefits of WebSocket are that it supports a secure channel (WSS, the WebSocket equivalent of HTTPS), and that it can be used via a web proxy (nevertheless, some proxies block the WebSocket protocol).
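As a quick illustration of the API we will rely on, opening a WebSocket from the browser takes just a few lines; the URL here is a placeholder for your signaling server:

var ws = new WebSocket("ws://signaling.example.com:30000");
ws.onopen = function() {
    // the channel is open; we can start exchanging signaling messages
    ws.send(JSON.stringify({"type": "GETROOM", "value": ""}));
};
ws.onmessage = function(event) {
    console.log("Message from the server: " + event.data);
};
ws.onclose = function() {
    console.log("Signaling channel closed");
};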
Let's start with the setup:
1. Create a folder for the whole application somewhere on your disk. Let's call it my_rtc_project.
2. Create a directory named my_rtc_project/www. Here, we will put all the client-side code (JavaScript files and HTML pages).
3. The signaling server's code will be placed under its own separate folder, so create a directory for it and name it my_rtc_project/apps/rtcserver/src.
Please note that we will use Git, a free and open source distributed version control system. For Linux boxes, it can be installed using the default package manager. For a Windows system, I recommend that you install and use the implementation available at https://github.com/msysgit/msysgit.
If you're using a Windows box, install msysgit and add the path to its bin folder to your PATH environment variable.
The signaling server is developed in the Erlang language. Erlang is a great choice to develop server-side applications due to the following reasons:
It is comfortable and quick for prototyping
Its processes (actors) are very lightweight and cheap
It supports network operations without the need for any external libraries
The code is compiled to bytecode that runs on the very powerful Erlang virtual machine
The following are some great projects developed using Erlang:
Yaws and Cowboy: These are web servers
Riak and CouchDB: These are distributed databases
Cloudant: This is a database service based on a fork of CouchDB
Ejabberd: This is an XMPP instant messaging service
Zotonic: This is a content management system
RabbitMQ: This is a message bus
Wings 3D: This is a 3D modeler
GitHub: This is a web-based hosting service for software development projects that use the Git versioning system
WhatsApp: This is a famous mobile messenger, sold to Facebook
Call of Duty: This is a computer game that uses Erlang on the server side
Goldman Sachs: This is a company that uses Erlang in its high-frequency trading programs
The following is a very brief history of Erlang:
1982–1985: Ericsson starts experimenting with programming telecom applications, because existing languages aren't suitable for the task.
1985–1986: Ericsson decides it must develop its own language with the desirable features of Lisp, Prolog, and Parlog. The language should have built-in concurrency and error recovery.
1987: First experiments with a new language, Erlang.
1988: Erlang is used for the first time by external users outside the lab.
1989: Ericsson works on the fast implementation of Erlang.
1990: Erlang is presented at ISS'90 and gets new users.
1991: A fast implementation of Erlang is released to users. Erlang is presented at Telecom'91 and gets a compiler and graphic interface.
1992: Erlang gets a lot of new users. Ericsson ports Erlang to new platforms including VxWorks and Macintosh.
1993: Erlang gets distribution. This makes it possible to run homogeneous Erlang systems on heterogeneous hardware. Ericsson starts selling Erlang implementations and Erlang Tools. A separate organization in Ericsson provides support.
Erlang is supported by many platforms. You can download it from the main website, http://www.erlang.org, and install it.
Actually, you can write Erlang programs and compile them without using any additional tools. Nevertheless, it is pretty easy to compile Erlang programs using the Rebar tool. It works like Make or the autotools for C or C++ applications and makes a developer's life easier.
You can download the Rebar tool from GitHub at https://github.com/basho/rebar.
The installation process is pretty simple:
$ git clone git://github.com/rebar/rebar.git
$ cd rebar
$ ./bootstrap
...
==> rebar (compile)
Congratulations!...
Now you have the Rebar executable in the folder where you downloaded the Rebar tool. Put it in a folder that is accessible via the PATH environment variable.
Configure the web server for your application's domain and point it to the my_rtc_project/www folder.
The basic application that we're considering in this chapter works fine without a web server; you can just open the index page in your web browser locally. Nevertheless, the following chapters will touch on more advanced topics that will require a properly configured web server.
For client-side code that runs in the user's web browser, we will use plain JavaScript.
The WebRTC API functions have different names in different web browsers. To make your application work well with all the browsers, you need to detect which web browser your application is running under and use the appropriate API function names. First of all, we need to implement a helper or an adapter to the WebRTC API functions.
Please note that this situation with different function names is temporary, and after WebRTC is standardized, every browser will support the standard WebRTC API function names. Thus, the WebRTC adapter that we're developing here will probably not be necessary in the future.
Create the www/myrtcadapter.js file:
function initWebRTCAdapter() {
Check whether we're running in Firefox:
    if (navigator.mozGetUserMedia) {
        webrtcDetectedBrowser = "firefox";
Redefine the RTCPeerConnection API function, an entity to keep and control a peer connection itself:
        RTCPeerConnection = mozRTCPeerConnection;
To control the session description entity, we will use RTCSessionDescription:
        RTCSessionDescription = mozRTCSessionDescription;
To support the NAT traversal functionality, we need to use the RTCIceCandidate entity:
        RTCIceCandidate = mozRTCIceCandidate;
We want to get access to audio and video, and for that, we need to use the getUserMedia API function:
        getUserMedia = navigator.mozGetUserMedia.bind(navigator);
Besides the WebRTC API functions, different web browsers have different ways to control the HTML entities that we need to use. For example, Chrome and Firefox attach a media stream to a media entity (the video HTML tag) in different ways. Thus, we need to redefine additional functions here.
We define the following two functions to attach and reattach the media stream to a video HTML tag:
        attachMediaStream = function(element, stream) {
            element.mozSrcObject = stream;
            element.play();
        };
        reattachMediaStream = function(to, from) {
            to.mozSrcObject = from.mozSrcObject;
            to.play();
        };
Here, we define two functions to be able to get audio/video tracks from a media stream. Unfortunately, there is no way to do this in Firefox versions older than Version 27; thus, here, we just add stub functions to make our adapter universal:
        if (!MediaStream.prototype.getVideoTracks) {
            MediaStream.prototype.getVideoTracks = function() {
                return [];
            };
        }
        if (!MediaStream.prototype.getAudioTracks) {
            MediaStream.prototype.getAudioTracks = function() {
                return [];
            };
        }
        return true;
Next, we do the same for Chrome:
    } else if (navigator.webkitGetUserMedia) {
        webrtcDetectedBrowser = "chrome";
        RTCPeerConnection = webkitRTCPeerConnection;
        getUserMedia = navigator.webkitGetUserMedia.bind(navigator);
As you can see here, the way we support the "attach media stream" functionality for Chrome differs from the way we did it for Firefox previously:
        attachMediaStream = function(element, stream) {
            element.src = webkitURL.createObjectURL(stream);
        };
        reattachMediaStream = function(to, from) {
            to.src = from.src;
        };
Chrome does support the functionality to get video and audio tracks so, here, we have a different approach compared to the one we used for Firefox previously:
        if (!webkitMediaStream.prototype.getVideoTracks) {
            webkitMediaStream.prototype.getVideoTracks = function() {
                return this.videoTracks;
            };
            webkitMediaStream.prototype.getAudioTracks = function() {
                return this.audioTracks;
            };
        }
        if (!webkitRTCPeerConnection.prototype.getLocalStreams) {
            webkitRTCPeerConnection.prototype.getLocalStreams = function() {
                return this.localStreams;
            };
            webkitRTCPeerConnection.prototype.getRemoteStreams = function() {
                return this.remoteStreams;
            };
        }
        return true;
    } else return false;
};
It is useful to develop a little WebRTC API wrapper library to use in your application.
Create a file and name it www/myrtclib.js.
First of all, we need to define several variables to control the WebRTC entities and use the API. We initialize them to null; by using the adapter that we developed previously, these variables will refer to the appropriate API functions:
var RTCPeerConnection = null;
var getUserMedia = null;
var attachMediaStream = null;
var reattachMediaStream = null;
var webrtcDetectedBrowser = null;
Here, we keep the virtual room number:
var room = null;
The initiator variable keeps the initiator state, which tells us whether we are calling our peer or waiting for a call:
var initiator;
The following two variables keep the references to local and remote media streams:
var localStream;
var remoteStream;
We need the pc variable to control a peer connection:
var pc = null;
As we discussed previously, we need a signaling mechanism to make our connection work. The following variable will store the URL that will point to our signaling server:
var signalingURL;
The following variables keep the HTML video entities, local and remote. They are just the IDs of video HTML tags:
var localVideo;
var remoteVideo;
We want to know whether our signaling channel is ready for operation, and we need a variable to control it:
var channelReady;
var channel;
Here, we define two STUN servers to support the NAT traversal functionality:
var pc_config = {"iceServers": [
    {url: 'stun:23.21.150.121'},
    {url: 'stun:stun.l.google.com:19302'}
]};
We also need to define constraints. Using these, we tell the web browser whether we want to use just audio for our conference, or video, or both:
var sdpConstraints = {'mandatory': {
    'OfferToReceiveAudio': true,
    'OfferToReceiveVideo': true
}};
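For instance, an audio-only conference could be requested with the same structure; this is just a sketch (the code in this chapter always asks for both audio and video):

var audioOnlySdpConstraints = {'mandatory': {
    'OfferToReceiveAudio': true,
    // we don't ask the remote peer for a video track
    'OfferToReceiveVideo': false
}};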
Next, we define several wrapping/helping functions to make our code more universal and reusable.
This is our initialization function. It gets a signaling server's URL and references to local and remote video HTML entities.
Here, we perform the initialization of our API adapter that we developed earlier; after this, we will have universal API function names that we can use under any web browser that supports WebRTC.
After the adapter is initialized, we call the openChannel function that we use to initiate a connection to our signaling server:
function myrtclibinit(sURL, lv, rv) {
    signalingURL = sURL;
    localVideo = lv;
    remoteVideo = rv;
    initWebRTCAdapter();
    openChannel();
};
The openChannel function opens a connection to our signaling server. Here, we use WebSockets as a transport layer, but it is not mandatory. You can create your own implementation using Ajax, for example, or any other suitable technology that you like the most:
function openChannel() {
    channelReady = false;
    channel = new WebSocket(signalingURL);
This callback function will be called if our signaling connection has been established successfully. We can't continue if the signaling channel has not been opened:
    channel.onopen = onChannelOpened;
When our peer sends a message during the process of establishing the peer connection, the onChannelMessage callback function will be called, and we will be able to react to it:
    channel.onmessage = onChannelMessage;
If the signaling channel has been closed for some reason (our peer closed its browser, or the signaling server has been powered down), we will get a notification via the onChannelClosed function and can react to the event: show a message to the user, or try to re-establish the connection:
    channel.onclose = onChannelClosed;
};
We will get here after the signaling channel has been opened successfully and we can continue and start our conference:
function onChannelOpened() {
First of all, we need to indicate that the signaling channel is opened and alive:
    channelReady = true;
Here, we try to understand whether we're calling our peer or waiting for a call from it.
We take the URL of our location and try to find the word room inside it. If there is no such word, then we're going to create a virtual room and act passively, waiting for a call from someone.
If we do find the word room, it means that someone has already created a virtual room and we want to enter it; we're in a calling state and should behave actively, trying to initiate a connection to our peer in the room.
We use the sendMessage function to send messages to our signaling server. If the virtual room has not been created yet, the signaling server will create it and return its room number back to us. If we have a virtual room number, we ask the signaling server to enter us into the room; it will parse our message and send it to our peer to initiate the establishment of a direct connection:
    if (location.search.substring(1, 5) == "room") {
        room = location.search.substring(6);
        sendMessage({"type" : "ENTERROOM", "value" : room * 1});
        initiator = true;
    } else {
        sendMessage({"type" : "GETROOM", "value" : ""});
        initiator = false;
    }
That settles the virtual room; now, we need to ask the browser to give us access to its media resources: video (web camera) and audio (microphone):
    doGetUserMedia();
};
The following function is called when we get a message from our signaling server. Here, we could add some logging or any additional logic, but for now, we just need to process the message and react to it:
function onChannelMessage(message) {
    processSignalingMessage(message.data);
};
The onChannelClosed function will be called when the signaling server becomes unavailable (a dropped connection) or when the remote peer has closed the connection (for example, the remote customer has closed their web browser).
In this function, you can also show an appropriate message to your customer or implement any other additional logic.
In the following function, we just indicate that the channel has been closed, and we don't want to transfer any messages to our signaling server:
function onChannelClosed() {
    channelReady = false;
};
To communicate with the signaling server, we use the sendMessage function. It gets a message as a JSON object, makes a string from it, and just transfers it to the signaling server.
When debugging, it is usually helpful to add some kind of message-logging functionality here:
function sendMessage(message) {
    var msgString = JSON.stringify(message);
    channel.send(msgString);
};
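A minimal sketch of such logging might look like this; a console.log call is enough during development and can be removed in production:

function sendMessage(message) {
    var msgString = JSON.stringify(message);
    console.log("C -> S: " + msgString); // log outgoing signaling traffic
    channel.send(msgString);
};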
We need to parse messages from the signaling server and react to them accordingly:
function processSignalingMessage(message) {
    var msg = JSON.parse(message);
If we get an offer message, it means that someone is calling us and we need to answer the call:
    if (msg.type === 'offer') {
        pc.setRemoteDescription(new RTCSessionDescription(msg));
        doAnswer();
If we get an answer message from the signaling server, it means that we just tried to call someone and it replied with the answer message, confirming that it is ready to establish a direct connection:
    } else if (msg.type === 'answer') {
        pc.setRemoteDescription(new RTCSessionDescription(msg));
When a remote peer sends a list of candidates to communicate with, we get this type of message from the signaling server. After we get this message, we add candidates to the peer connection:
    } else if (msg.type === 'candidate') {
        var candidate = new RTCIceCandidate({sdpMLineIndex: msg.label, candidate: msg.candidate});
        pc.addIceCandidate(candidate);
If we asked the signaling server to create a virtual room, it will send a GETROOM message with the created room's number. We need to store the number to use it later:
    } else if (msg.type === 'GETROOM') {
        room = msg.value;
The OnRoomReceived function is called to implement additional functionality. Here, we can perform some UI-related actions, such as showing the room's URL to the customers so that they can share it with their friends:
        OnRoomReceived(room);
If we get a URL from a friend asking us to enter a virtual room, but the room number is wrong or outdated, we will get the WRONGROOM message from the signaling server. If so, we just move to the index page:
    } else if (msg.type === 'WRONGROOM') {
        window.location.href = "/";
    }
};
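To summarize the signaling protocol, the following are examples of the JSON messages that travel over the channel in this application; the room number and the candidate values are illustrative:

// client -> server: ask the server to create a new virtual room
{"type": "GETROOM", "value": ""}
// server -> client: the generated room number
{"type": "GETROOM", "value": 351093}
// client -> server: enter an existing room (the number is taken from the shared URL)
{"type": "ENTERROOM", "value": 351093}
// server -> client: the room doesn't exist or is already full
{"type": "WRONGROOM"}
// peer -> peer (relayed by the server): an ICE candidate
{"type": "candidate", "label": 0, "id": "audio", "candidate": "candidate:842163049 1 udp 1677729535 203.0.113.7 46812 typ srflx"}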
Here, we're asking the web browser to get us access to the microphone and web camera.
Chrome will show a pop-up window asking the user whether to grant access. So, you will not get access until the user decides. Chrome will ask this every time the user opens your application page. To avoid this and make Chrome remember the choice, you should serve the page over an HTTPS connection, with the SSL/TLS certificate properly configured in the web server you're using. Please note that the certificate needs to be signed either by a public CA (Certificate Authority) or by a private CA whose identity has been configured on the browser/client computer. If the browser doesn't trust the certificate automatically and prompts the user to add an exception, then the choice will not be remembered by Chrome.
Firefox won't remember the choice, but this behavior may change in the future:
function doGetUserMedia() {
    var constraints = {"audio": true, "video": {"mandatory": {}, "optional": []}};
    try {
We ask the WebRTC API to call our callback function, onUserMediaSuccess, if we have got the access rights from the user:
        getUserMedia(constraints, onUserMediaSuccess, null);
If we didn't get the access rights, we'll get an exception. Here, you probably want to add some logging and UI logic to inform your customer that something is wrong and we can't continue:
    } catch (e) {
    }
};
We land here if the web browser has granted us the access rights to the web camera and microphone:
function onUserMediaSuccess(stream) {
We get a video stream from the local web camera and we want to show it on the page, so we attach the stream to the video tag:
    attachMediaStream(localVideo, stream);
Store the stream in a variable because we want to refer to it later:
    localStream = stream;
Now we're ready to create a direct connection to our peer:
    createPeerConnection();
After the peer connection is created, we put our local video stream into it to make the remote peer see us:
    pc.addStream(localStream);
Check whether we're waiting for a call or we're the caller. If we're the initiator, we call the doCall function to initiate the establishment of a direct connection:
    if (initiator) doCall();
};
The following function will try to create a peer connection—a direct connection between peers:
function createPeerConnection() {
To improve the security of the connection, we ask the browser to switch on the DTLS-SRTP option. It enables the exchange of the cryptographic parameters and derives the keying material. The key exchange takes place in the media plane and is multiplexed on the same ports as the media itself.
This option was disabled in Chrome by default, but it has been enabled from Version 31 onwards. Nevertheless, we don't want to check the version of the browser used by our customer, so we can't rely on the browser's default settings:
    var pc_constraints = {"optional": [{"DtlsSrtpKeyAgreement": true}]};
    try {
Create a peer connection using the WebRTC API function call. We pass a predefined list of STUN servers and connection configurations to the function:
        pc = new RTCPeerConnection(pc_config, pc_constraints);
Here, we define a callback function to be called when we have to send ICE candidates to the remote party:
        pc.onicecandidate = onIceCandidate;
When the connection is established, the remote side will add its media stream to the connection. Here, we want to be informed of such an event in order to be able to show the remote video on our web page:
        pc.onaddstream = onRemoteStreamAdded;
If the establishment of the connection fails, we will get an exception. Here, you can add debug console logging and UI improvements to inform the customer that something is wrong:
    } catch (e) {
        pc = null;
        return;
    }
};
When we have ICE candidates from the WebRTC API, we want to send them to the remote peer in order to establish a connection:
function onIceCandidate(event) {
    if (event.candidate)
        sendMessage({type: 'candidate',
            label: event.candidate.sdpMLineIndex,
            id: event.candidate.sdpMid,
            candidate: event.candidate.candidate});
};
We get into this function when a direct connection has been established and the remote peer has added its media stream to the connection. We want to show the remote video so, here, we attach the remote stream to the video tag on the web page:
function onRemoteStreamAdded(event) {
    attachMediaStream(remoteVideo, event.stream);
We also want to store a reference to the remote stream in order to use it later:
    remoteStream = event.stream;
};
The following function is called by us when we're joining a virtual room and initiating a call to the remote peer:
function doCall() {
We don't want to use the data channel yet (it will be introduced in the next chapter). It is enabled in Firefox by default so, here, we ask Firefox to disable it:
    var constraints = {"optional": [], "mandatory": {"MozDontOfferDataChannel": true}};
Check whether we're running under Chrome and, if so, remove the unnecessary options that were preconfigured for Firefox:
    if (webrtcDetectedBrowser === "chrome")
        for (var prop in constraints.mandatory)
            if (prop.indexOf("Moz") != -1) delete constraints.mandatory[prop];
Merge the browser options with the whole constraints entity, and call the createOffer function in order to initiate a peer connection. In case of a success, we will get into the setLocalAndSendMessage function:
    constraints = mergeConstraints(constraints, sdpConstraints);
    pc.createOffer(setLocalAndSendMessage, null, constraints);
};
If we're waiting for a call and have got an offer from a remote peer, we need to answer the call in order to establish a connection and begin the conference.
Here is the function that will be used to answer a call. As is the case with doCall, we will get into the setLocalAndSendMessage function in case of a success:
function doAnswer() {
    pc.createAnswer(setLocalAndSendMessage, null, sdpConstraints);
};
The following callback function is used during the process of establishing a connection by the WebRTC API. We receive a session description entity, and then we need to set up a local description and send an SDP object to the remote peer via the signaling server:
function setLocalAndSendMessage(sessionDescription) {
    pc.setLocalDescription(sessionDescription);
    sendMessage(sessionDescription);
};
The following is a simple helper that merges the constraints:
function mergeConstraints(cons1, cons2) {
    var merged = cons1;
    for (var name in cons2.mandatory)
        merged.mandatory[name] = cons2.mandatory[name];
    merged.optional = merged.optional.concat(cons2.optional);
    return merged;
};
We now have two JavaScript files under our www directory: myrtclib.js and myrtcadapter.js.
Now, it's time to use them and create an index page of the application.
Create an index page file, www/index.html:
<!DOCTYPE html>
<html>
<head>
<title>My WebRTC application</title>
Here, we define a style for the page to place the local and remote video objects next to each other on the same row:
<style type="text/css">
    section {
        width: 90%;
        height: 200px;
        background: red;
        margin: auto;
        padding: 10px;
    }
    div#lVideo {
        width: 45%;
        height: 200px;
        background: black;
        float: left;
    }
    div#rVideo {
        margin-left: 45%;
        height: 200px;
        background: black;
    }
</style>
Include our adapter and wrapper JavaScript code:
<script type="text/javascript" src="myrtclib.js"></script>
<script type="text/javascript" src="myrtcadapter.js"></script>
</head>
We want to perform some additional actions after the page is loaded, but before the start of the conferencing, so we use the onLoad property of the body HTML tag to call the appropriate function:
<body onLoad="onPageLoad();">
The status div will be used to show information to the customer. For example, we will put a URL there with a virtual room number that is to be shared between the peers:
<div id='status'></div>
<section>
We use the autoplay option to start the video streaming automatically after the media stream has been attached.
We mute the local video object in order to avoid the local echo effect:
<div id='lVideo'>
    <video width="100%" height="100%" autoplay="autoplay" id="localVideo" muted="true"></video>
</div>
<div id='rVideo'>
    <video width="100%" height="100%" autoplay="autoplay" id="remoteVideo"></video>
</div>
</section>
The following function will be called by the web browser after the page has been loaded:
<script>
function onPageLoad() {
First of all, we try to make the UI look nicer. Here, we get the width of each video object and set an appropriate height. We assume a width/height ratio of 4/3 and calculate the height for each object accordingly:
    var _divV = document.getElementById("lVideo");
    var _w = _divV.offsetWidth;
    var _h = _w * 3 / 4;
    _divV.setAttribute("style", "height:" + _h + "px");
    _divV.style.height = _h + 'px';
    _divV = document.getElementById("rVideo");
    _divV.setAttribute("style", "height:" + _h + "px");
    _divV.style.height = _h + 'px';
This is the main point where we start our conference. We pass the signaling server's URL and the local/remote video objects' references to the initialization function, and the magic begins.
Please use the appropriate IP address and port values of the machine where your signaling server is running (we will begin building it in the next section):
    myrtclibinit("ws://IP:PORT",
        document.getElementById("localVideo"),
        document.getElementById("remoteVideo"));
};
This is a callback function called from our myrtclib.js script when the signaling server returns a virtual room's number. Here, we construct an appropriate URL for our customer to share with a friend:
function OnRoomReceived(room) {
    var st = document.getElementById("status");
    st.innerHTML = "Now, if somebody wants to join you, they should use this link: <a href=\"" + window.location.href + "?room=" + room + "\">" + window.location.href + "?room=" + room + "</a>";
};
</script>
</body>
</html>
We have prepared the client-side code to be executed inside a web browser. Now it is time to develop the signaling server. As a transport layer for the signaling mechanism, we will use WebSockets; it is well supported by all web browsers that support WebRTC, and this protocol is quite suitable for the signaling role.
The application description file describes our application. It is similar to the manifest file for C# applications or Java applets. Here, we describe what our application is, define its version number, list the other modules it depends on, and so on.
Edit the apps/rtcserver/src/rtcserver.app.src file.
The application ID/name is as follows:
{application, rtcserver, [
The application description is not mandatory, so for now, we can skip it. The version number is set to 1 as we have just started. The applications option gives a list of applications that we depend on. We also define the main module's name and the environment variables (an empty list):
{description, ""},
{vsn, "1"},
{registered, []},
{applications, [
  kernel,
  stdlib,
  cowlib,
  cowboy,
  compiler,
  gproc
]},
{mod, {rtcserver_app, []}},
{env, []}
]}.
This application module is the main module of our signaling server application. Here, we start all the applications we're depending on and set up a web server and WebSocket handler for it.
Edit the apps/rtcserver/src/rtcserver_app.erl file.
The module name should be the same as the file name:
-module(rtcserver_app).
We tell the Erlang VM that this is an application module:
-behaviour(application).
Describe which functions should be accessible from this module; /2, /1, and /0 are the arities of the functions, that is, the number of arguments:
-export([start/2, stop/1, start/0]).
Now we need to start all the helping applications that we're depending on and then start our application itself:
start() ->
  ok = application:start(compiler),
Ranch is an efficient socket acceptor (connection) pool used by Cowboy:
  ok = application:start(ranch),
Crypto is needed for SSL support:
  ok = application:start(crypto),
Cowboy is a lightweight web server that we use to build our signaling server on WebSockets:
  ok = application:start(cowlib),
  ok = application:start(cowboy),
We use gproc as a simple in-memory key/value DB to store the virtual rooms' numbers:
  ok = application:start(gproc),
Start our application:
  ok = application:start(rtcserver).
The following function will be called during the process of starting the application:
start(_StartType, _StartArgs) ->
First of all, we define a dispatcher, an entity used by the Cowboy application. With the dispatcher, we tell Cowboy where it should listen for requests from the clients and how to map requests to handlers:
  Dispatch = cowboy_router:compile([
    {'_', [
Here, we define that requests to our signaling server should be processed by the handler_websocket module (reviewed in the following section):
      {"/", handler_websocket, []}
    ]}
  ]),
Here, we ask Cowboy to start listening and processing clients' requests. Our HTTP listener is named websocket; it should listen on port 30000 and bind to any available network interface(s). The connection timeout value is set to 500 ms, and the max_keepalive option (the maximum number of requests per connection) is set to 50:
  {ok, _} = cowboy:start_http(websocket, 100,
    [{port, 30000}],
    [{env, [{dispatch, Dispatch}]},
     {max_keepalive, 50},
     {timeout, 500}]),
To make our application work, we need to call the start_link function of the application's supervisor:
  rtcserver_sup:start_link().
The following function is called when we want to stop the signaling server:
stop(_State) ->
  ok.
To make our Erlang-based signaling server work properly, we need to implement a supervisor process. This is the standard way in which Erlang applications usually work. This is not something specific to WebRTC applications, so we won't dive into deep details here. The code is very short.
Edit the apps/rtcserver/src/rtcserver_sup.erl file:
-module(rtcserver_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).

-define(CHILD(I, Type), {I, {I, start_link, []}, permanent, 5000, Type, [I]}).

start_link() ->
  supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
  {ok, {{one_for_one, 5, 10}, []}}.
A WebSocket handler module will implement the signaling server's functionality. It will communicate with both the peers, create rooms, and do everything else we expect from the signaling server.
Edit the apps/rtcserver/src/handler_websocket.erl file:
-module(handler_websocket).
-behaviour(cowboy_websocket_handler).
-export([init/3]).
-export([websocket_init/3, websocket_handle/3, websocket_info/3, websocket_terminate/3]).
The following is a record where we can store useful information about the connection and peers:
-record(state, {
  client = undefined :: undefined | binary(),
  state = undefined :: undefined | connected | running,
  room = undefined :: undefined | integer()
}).
We get here when a peer tries to connect to the signaling server. At this stage, we just need to reply with the upgrade tuple to establish the WebSocket connection with the web browser properly:
init(_Any, _Req, _Opt) ->
  {upgrade, protocol, cowboy_websocket}.
The following function is called when the connection is established (a peer has been connected to the signaling server):
websocket_init(_TransportName, Req, _Opt) ->
Get the x-forwarded-for field from the HTTP request header and store it as the peer's IP address:
  {Client, Req1} = cowboy_req:header(<<"x-forwarded-for">>, Req),
  State = #state{client = Client, state = connected},
  {ok, Req1, State, hibernate}.
The following function is called when we get a message from some of our peers. We need to parse the message, decide what to do, and reply if necessary:
websocket_handle({text,Data}, Req, State) ->
Mark our state as running; the new peer is connected, and the peer-to-signaling-server connection has been established:
  StateNew = case State#state.state of
    connected -> State#state{state = running};
    _ -> State
  end,
We use JSON to encode messages that are transferred between the clients and the signaling server, so we need to decode the message:
  JSON = jsonerl:decode(Data),
  {M, Type} = element(1, JSON),
  case M of
    <<"type">> ->
      case Type of
If the type of the message is GETROOM, someone wants to create a virtual room. Here, we create the room and reply with the room's number:
        <<"GETROOM">> ->
We use the generate_room function to create a virtual room:
          Room = generate_room(),
Construct the answer message and encode it to JSON:
          R = iolist_to_binary(jsonerl:encode({{type, <<"GETROOM">>}, {value, Room}})),
Store the room number and the associated process ID in the key/value DB. If someone tries to enter a virtual room, we need some mechanism to understand whether the room exists:
          gproc:reg({p, l, Room}),
Store the room number in the state entity; we will want to reuse this value later:
          S = (StateNew#state{room = Room}),
Send our reply back to the peer and exit:
          {reply, {text, <<R/binary>>}, Req, S, hibernate};
If the message type is ENTERROOM, it means that someone is trying to enter a virtual room that already exists and already has a participant:
        <<"ENTERROOM">> ->
Extract the room number from the message and look up all the participants present in the virtual room:
          {<<"value">>, Room} = element(2, JSON),
          Participants = gproc:lookup_pids({p, l, Room}),
          case length(Participants) of
If there is just one participant, register the new peer's process ID in this room and store the room number in the state entity:
            1 ->
              gproc:reg({p, l, Room}),
              S = (StateNew#state{room = Room}),
              {ok, Req, S, hibernate};
Otherwise, reply with the WRONGROOM message back to the peer:
            _ ->
              R = iolist_to_binary(jsonerl:encode({{type, <<"WRONGROOM">>}})),
              {reply, {text, <<R/binary>>}, Req, StateNew, hibernate}
          end;
If we get a message of some other type, we just relay it to the connected peer:
        _ ->
          reply2peer(Data, StateNew#state.room),
          {ok, Req, StateNew, hibernate}
      end;
    _ ->
      reply2peer(Data, State#state.room),
      {ok, Req, StateNew, hibernate}
  end;
If we get a message of an unknown sort, we just ignore it:
websocket_handle(_Any, Req, State) ->
  {ok, Req, State, hibernate}.
The following function is called when we receive a message from another process; in this case, we send the message to the connected peer. We will use this code to implement the web chat and data transfer functionality in later chapters:
websocket_info(_Info, Req, State) ->
  {reply, {text, _Info}, Req, State, hibernate}.
The following code is called when the connection is terminated (the remote peer closed the web browser, for example):
websocket_terminate(_Reason, _Req, _State) ->
  ok.
Send a message (R) to every peer connected to the room, except the one we received the message from:
reply2peer(R, Room) ->
  [P ! <<R/binary>> || P <- gproc:lookup_pids({p, l, Room}) -- [self()]].
Generate the virtual room number using a random number generator:
generate_room() ->
  random:seed(now()),
  random:uniform(999999).
We need to tell the Rebar tool which applications our server is dependent on and where we can download them.
Edit the apps/rtcserver/rebar.config file:
{erl_opts, [warnings_as_errors]}.
{deps, [
  {'gproc', ".*", {git, "git://github.com/esl/gproc.git", {tag, "0.2.16"}}},
  {'jsonerl', ".*", {git, "git://github.com/fycth/jsonerl.git", "master"}},
  {'cowboy', ".*", {git, "https://github.com/extend/cowboy.git", "0.9.0"}}
]}.
Create another rebar.config file under your project's folder:
{sub_dirs, [
  "apps/rtcserver"
]}.
This configuration file tells the Rebar tool that it needs to look into apps/rtcserver and process the content.
Now, go to the project's directory and execute the following command in the console:
rebar get-deps
It will download all the necessary dependencies into the deps directory.
We want to compile our code, so we execute the following command:
rebar compile
It will compile our application and its dependencies. After this completes, start the rtcserver application using the following command:
erl -pa deps/*/ebin apps/*/ebin -sasl errlog_type error -s rtcserver_app
Using this command, you will get into the Erlang VM console and start the signaling server (the rtcserver application). From now on, it will listen on TCP port 30000 (or another one, if you changed it in the code).
You can check whether the server is listening for requests using the netstat command. For Linux, you can use the following command:
netstat -na | grep 30000
If the server is running, you should see it listening on the port, bound to the 0.0.0.0 address.
For Windows, you can use the following construction:
netstat -na | findstr 30000
We started the signaling server, and now it is time to test our application. Point your web browser to the domain you prepared for your application. It should open the index page with your web camera's view on it. Above the camera's view, you should see the URL that the second participant can use to join the conference. Open this URL on another machine; the connection should be established automatically, and both sides should be able to see each other's videos.
To stop the signaling server and quit the VM console, you can use the q(). command.
As you already know, it is important to have access to a STUN/TURN server to work with peers located behind NAT or a firewall. In this chapter, while developing our application, we used public STUN servers (among them, public Google servers accessible from other networks).
Nevertheless, if you plan to build your own service, you should install your own STUN/TURN server. This way, your application will not depend on servers you can't control. Today, we have public STUN servers from Google; tomorrow, they could be switched off. So, the right way is to have your own STUN/TURN server.
In this section, you will be introduced to installing a STUN server, as it is the simpler case. The installation and configuration of a TURN server is more complex; it will be covered in Chapter 4, Security and Authentication, during the development of another application.
There are several implementations of STUN servers that you can find on the Internet. You can take this one: http://www.stunprotocol.org.
The server is cross-platform and can be used under Windows, Mac OS X, or Linux.
To start the STUN server, you should use the following command line:
stunserver --mode full --primaryinterface x1.x1.x1.x1 --altinterface x2.x2.x2.x2
Please pay attention to the fact that you need two IP addresses on your machine to run the STUN server. This is mandatory for the STUN protocol to work correctly. The machine can have only one physical network interface, but it should also have a network alias with an IP address different from the one used on the main network interface.
In this chapter, we developed a video conference service using WebRTC. During the development process, we learned what STUN and TURN servers are and how they can help us achieve our goals. We got an introduction to the main WebRTC API functions. Now you know what we mean by keywords such as ICE and SDP and why they are very useful.
You also had a chance to get acquainted with Erlang and WebSockets, if you were not already acquainted with them.
In the next chapter, we will learn what the Data API is and will develop a peer-to-peer file-sharing application. Most of the keywords and code will not be new to you, so it will be easier to get into the topic.