Chapter 1. Getting Started with WebRTC

The Internet is no stranger to audio and video. Everyday web applications, such as Netflix and Pandora, stream audio and video content to millions of people. On the other hand, the Web is a stranger to real-time communication. Websites, such as Facebook, are only just starting to enable video-based communication in a browser, and they typically use a plugin that users have to install. This is where Web Real-Time Communication (WebRTC) comes into play.

In this chapter, we are going to cover the basics of WebRTC:

The current status of the audio and video space
The role that WebRTC plays in changing this space
The major features of WebRTC and how they can be used

Audio and video communication today

Communicating with audio and video is a fairly common task with a long history of technologies and tools. For a good example of audio communication, just look at a cell phone carrier. Phone companies have built vast networks to bring audio communication to millions of people across the globe. These networks are a great example of widespread audio communication at its finest.

Video communication is becoming just as prevalent as audio communication. With technologies such as Apple's FaceTime, Google Hangouts, and Skype video calling, speaking to someone over a video stream is a simple task for an everyday user. A wide range of techniques has been developed in these applications to ensure that the video quality gives the user an excellent experience. Engineers have solved problems such as losing data packets, recovering from disconnections, and reacting to changes in a user's network.

The aim of WebRTC is to bring all of this technology into the browser. Many of today's solutions require users to install plugins or applications on their PCs and mobile devices. They also require developers to pay for licensing, creating a huge barrier and deterring new companies from joining this space. With WebRTC, the focus is on enabling this technology for every browser user without the need for plugins or hefty technology license fees for developers. The idea is to be able to simply open up a website and connect with another user right then and there.

Enabling audio and video on the Web

The biggest accomplishment of WebRTC is bringing high-quality audio and video to the open Web without the need for third-party software or plugins. Currently, there are no high-quality, well-built, freely available solutions that enable real-time communication in the browser. The success of the Internet is largely due to the high availability and open use of technologies, such as HTML, HTTP, and TCP/IP. To move the Internet forward, we want to continue building on top of these technologies. This is where WebRTC comes into play.

To build a real-time communication application from scratch, we would need to bring in a wealth of libraries and frameworks to deal with the many issues faced when developing these types of applications. These typically include software to handle dropped connections, data loss, and NAT traversal. The great thing about WebRTC is that all of this comes built into the browser API. Google has open sourced much of the technology involved in accomplishing this communication in a high-quality and complete manner.

Note

Most of the information about WebRTC, including the source code of its implementation, is freely available at http://www.webrtc.org/.

With WebRTC, the heavy lifting is all done for you. The API brings a host of technologies into the browser to make implementation details easy. This includes camera and microphone capture, video and audio encoding and decoding, transportation layers, and session management.

Camera and microphone capture

The first step to using any communication platform is to gain access to the camera and microphone on the device that the user is using. This means detecting the types of devices available, getting permission from the user to access them, and obtaining a stream of data from the device itself. This is where we will begin implementing our first application.
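The three steps above can be sketched with the browser's media devices interface. This is a minimal sketch, not the application built later in the book: captureMedia is a hypothetical helper name, and it takes the media devices object as a parameter (in a browser this would be navigator.mediaDevices) so that the logic can also be exercised with a stand-in object outside a browser.

```javascript
// Hypothetical helper: lists capture devices, then asks for a stream.
// `mediaDevices` is assumed to behave like the browser's navigator.mediaDevices.
async function captureMedia(mediaDevices, constraints = { audio: true, video: true }) {
  // Step 1: detect which kinds of input devices are available.
  const devices = await mediaDevices.enumerateDevices();
  const kinds = devices.map((device) => device.kind);

  // Steps 2 and 3: getUserMedia prompts the user for permission and,
  // if permission is granted, resolves with a stream of media data.
  const stream = await mediaDevices.getUserMedia(constraints);

  return {
    hasCamera: kinds.includes('videoinput'),
    hasMicrophone: kinds.includes('audioinput'),
    stream,
  };
}
```

In a browser, this might be called as captureMedia(navigator.mediaDevices), with the resulting stream assigned to the srcObject property of a video element. If the user refuses permission, the returned promise is rejected, so a real application would also need to handle that case.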

Encoding and decoding audio and video

Unfortunately, even with the improvements made in network speed, a raw, uncompressed stream of audio and video data is too much to send over the Internet. This is where encoding and decoding come in. Encoding is the process of breaking down video frames or audio waves into smaller chunks and compressing them. The smaller size makes it faster to send them across a network, where they are decompressed (decoded) on the other side. The algorithm behind this technique is typically called a codec.
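To see why compression is needed, it helps to estimate the size of an uncompressed video stream. The function below is an illustrative calculation only; the frame size, color depth, and frame rate are assumed example values, not figures taken from WebRTC itself.

```javascript
// Bits per second for raw (uncompressed) video.
function rawVideoBitrate(width, height, bitsPerPixel, framesPerSecond) {
  return width * height * bitsPerPixel * framesPerSecond;
}

// Assumed example: a 640x480 picture, 24 bits per pixel, 30 frames per second.
const bitsPerSecond = rawVideoBitrate(640, 480, 24, 30);
console.log(bitsPerSecond / 1e6); // 221.184 (megabits per second)
```

Roughly 221 megabits per second for a fairly small picture is far more than a typical connection can sustain, and this is the gap that a codec closes.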

If you have ever had trouble playing a video file on your computer, then you have some insight into the complex world of video and audio codecs. There are several different ways to encode audio and video streams, each with its own benefits. To add to this, there are many different companies that have different business goals behind creating and maintaining a codec. This means not all of the codecs are free for everyone to use.

There are many codecs in use inside WebRTC. These include H.264, Opus, iSAC, and VP8. When two browsers speak to each other, they pick the best codec that both of them support. The browser vendors also meet regularly to decide which codecs should be supported in order for the technology to work. You can read more about the support for various codecs at http://www.webrtc.org/faq.
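The selection step can be pictured as finding the first codec in one browser's preference list that the other browser also supports. The sketch below is a deliberate simplification with a hypothetical function name; real browsers negotiate codecs by exchanging session descriptions, not by calling a function like this one.

```javascript
// Hypothetical illustration: pick the first locally preferred codec
// that the remote side also lists. Returns null when nothing matches.
function pickCommonCodec(localPreferences, remoteSupported) {
  const remote = new Set(remoteSupported.map((name) => name.toLowerCase()));
  const match = localPreferences.find((name) => remote.has(name.toLowerCase()));
  return match === undefined ? null : match;
}

console.log(pickCommonCodec(['VP8', 'H.264'], ['h.264', 'vp8'])); // VP8
console.log(pickCommonCodec(['iSAC'], ['Opus']));                 // null
```

The order of the first list stands in for the idea of a preferred codec: when both sides support several options, the earliest local preference wins.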

You could easily write several books on the subject of codecs. In fact, there are many books already written on the subject. Fortunately for us, WebRTC does most of the encoding in the browser layer. We will not worry about it over the course of this book but, once you start venturing past basic video and audio communication, you will more than likely run into issues with codec support.

Transportation layer

The transportation layer is the topic of several other books as well. This layer deals with packet loss, ordering of packets, and connecting to other users. The API makes it easy to deal with the fluctuations of a user's network and facilitates reacting to changes in connectivity.

The way WebRTC handles packet transport is very similar to how the browser handles other transport mechanisms, such as AJAX or WebSockets. The browser provides an easy-to-access API with events that tell you when there are issues with the connection. Behind that API, the code that handles even a simple WebRTC call could run to thousands or tens of thousands of lines, covering use cases that range from mobile devices to desktops and more.
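As a rough sketch of what those events look like, the helper below listens for ICE connection state changes on a peer connection. The names watchConnection and onStateChange are hypothetical; the iceconnectionstatechange event and the iceConnectionState property are part of the standard RTCPeerConnection interface. The connection object is passed in, so the sketch makes no assumption about how it was created.

```javascript
// Hypothetical helper: report every ICE connection state change.
// `peerConnection` is assumed to behave like an RTCPeerConnection.
function watchConnection(peerConnection, onStateChange) {
  const handler = () => {
    const state = peerConnection.iceConnectionState;
    // 'disconnected' and 'failed' are the states that signal trouble.
    const hasProblem = state === 'disconnected' || state === 'failed';
    onStateChange(state, hasProblem);
  };
  peerConnection.addEventListener('iceconnectionstatechange', handler);
  // Return a function that stops listening.
  return () => peerConnection.removeEventListener('iceconnectionstatechange', handler);
}
```

An application could use the second argument of the callback to show a warning or to try reconnecting when the network drops.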

Session management

Session management is the final piece of the WebRTC puzzle. It is simpler than managing network connectivity or dealing with codecs, but it is still an important piece. It deals with opening multiple connections in a browser, managing open connections, and organizing what goes to which person. This is most commonly called signaling and is covered in more detail in Chapter 4, Creating a Signaling Server.
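The routing part of this job, deciding which message goes to which person, can be sketched without any networking at all. The SignalingHub class below is a hypothetical in-memory stand-in, not the server built later in the book: each user registers a delivery callback, and messages are forwarded by user ID.

```javascript
// Hypothetical in-memory relay; a real signaling server would carry
// these messages over a network connection instead of direct calls.
class SignalingHub {
  constructor() {
    this.users = new Map(); // user ID -> delivery callback
  }

  join(userId, deliver) {
    this.users.set(userId, deliver);
  }

  leave(userId) {
    this.users.delete(userId);
  }

  // Forward a message to one user; returns false if they are not connected.
  send(fromId, toId, message) {
    const deliver = this.users.get(toId);
    if (!deliver) return false;
    deliver({ from: fromId, message });
    return true;
  }
}

const hub = new SignalingHub();
hub.join('alice', (envelope) => console.log('alice received', envelope));
hub.join('bob', (envelope) => console.log('bob received', envelope));
hub.send('alice', 'bob', { type: 'offer' }); // delivered to bob only
```

Keeping the sender's ID in the delivered envelope lets the receiver know whom to reply to, which matters once more than two users are connected.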

Included in this array of new features is also support for data transfer. Since a high-quality data connection is needed between two clients for audio and video, it also makes sense to use this connection to transfer arbitrary data. This is exposed to the JavaScript layer through the RTCDataChannel API. We will cover this in more detail at a later point.
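A minimal sketch of one side of such a data connection might look like the following. The createDataChannel method, the send method, the readyState property, and the message event belong to the standard API; the helper name openTextChannel, the channel label, and the callback are assumptions made for illustration. The peer connection is passed in, and the sketch assumes that the connection to the other client is set up elsewhere.

```javascript
// Hypothetical helper built on RTCPeerConnection.createDataChannel.
function openTextChannel(peerConnection, onText, label = 'chat') {
  const channel = peerConnection.createDataChannel(label);

  // Incoming messages arrive as events with the payload in `data`.
  channel.addEventListener('message', (event) => onText(event.data));

  return {
    channel,
    // Sending before the channel is open would throw, so check first.
    sendText(text) {
      if (channel.readyState !== 'open') return false;
      channel.send(text);
      return true;
    },
  };
}
```

Returning false instead of throwing is a design choice for this sketch; an application might prefer to queue messages until the channel opens.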

WebRTC today has many of the building blocks needed to build an extremely high-quality real-time communication experience. Google, Mozilla, Opera, and many others have invested a wealth of time and effort through some of their best video and audio engineers to bring this experience to the Web. WebRTC even has roots in the same technology used to bring Voice over Internet Protocol (VoIP) communication to users. It will change the future of how engineers think about building real-time communication applications.

Creating web standards

The great thing about the Web is that it moves so fast. Standards are created or changed every day, and the Web is always improving. Browsers have further improved on this concept by allowing updates to be downloaded and installed without the user ever knowing. This makes the web developer's job an easier one, but it does mean that you have to keep up with what is going on in the world of the Web, and this includes WebRTC.

The way these changes are implemented across browsers is through standards bodies. These are groups of individuals who work through a common organization to democratize the changing of browser APIs. The two organizations that control the standards for WebRTC are the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF).

Unlike many other standards organizations, the W3C makes much of its information freely available to the public. This means that anyone can go online and view information about the implementation details of an API. The specification for WebRTC is located at http://www.w3.org/TR/webrtc/. This is one way to refer to and learn more about how WebRTC works.

Getting involved in these organizations is one way to not only keep up-to-date on the latest technologies, but also to help shape the future of the Web. This kind of community participation is part of what makes the browser one of the fastest-growing development stacks out there. If you would like to learn more, visit http://www.w3.org/participate/ to find different ways to participate in the discussions.

Browser support

Although the goal of WebRTC is to be ubiquitous for every user, this does not mean that every browser supports all the same features at the same time. Different browsers may choose to be ahead of the curve in certain areas, which makes some things work in one browser and not another. The current support for WebRTC in the browser space is described in the following sections.

Note

There are multiple websites that can tell you if your browser supports a specific technology. For example, http://caniuse.com/rtcpeerconnection tells you which browsers support WebRTC.
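Support can also be checked from inside a page by testing whether the relevant objects exist. The sketch below is one possible check, with detectWebRTC as a hypothetical name; it takes the global object as a parameter (in a browser this would be window) and looks only for the standard, unprefixed names.

```javascript
// Hypothetical feature check. `root` is assumed to be a browser's
// global object (window); older prefixed names are not considered.
function detectWebRTC(root) {
  const mediaDevices = root.navigator && root.navigator.mediaDevices;
  return {
    peerConnection: typeof root.RTCPeerConnection === 'function',
    mediaCapture: Boolean(mediaDevices) && typeof mediaDevices.getUserMedia === 'function',
  };
}
```

Calling detectWebRTC(window) in a page returns two flags, so an application can show a helpful message instead of failing when either feature is missing.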

Compatibility with Chrome, Firefox, and Opera

Chances are that the browser you currently use supports WebRTC. Chrome, Firefox, and Opera all support WebRTC out of the box. This should work on all mainstream operating systems, such as Windows, Mac, and Linux. Browser vendors, such as Google and Mozilla, have also been working together to fix interoperability issues so that their browsers can all communicate with each other easily.

Compatibility with Android OS

The same is true for Chrome and Firefox on Android. WebRTC-based applications should work out of the box and be able to interoperate with other browsers after Android version 4.0 (Ice Cream Sandwich). This is because the desktop and mobile versions of both Chrome and Firefox share much of their code.

Compatibility with Apple

Apple has made little effort to enable WebRTC in either Safari or iOS. There are rumors of support but no official date on when support will come about. The one workaround that others have used for hybrid native/web-based iOS applications is to embed the WebRTC code directly into their applications and load a WebRTC application into a WebView.

Compatibility with Internet Explorer

Microsoft has not announced any plans to enable WebRTC in Internet Explorer. They have proposed an alternative solution to enable audio and video communication in the browser. This alternative was turned down in favor of WebRTC. Since then, Microsoft has been a silent partner in the development of the technology.

Note

Throughout the course of this book, it is recommended that you use Chrome for all the examples. Always keep a lookout for updates on browser support, however, as this is a constantly changing space!

Using WebRTC in your browser

Now that you know which browser to use, we will jump right in and try out WebRTC! Navigate your browser to the demo application available at https://apprtc.appspot.com/. If you use Chrome, Firefox, or Opera, you should see a drop-down notification asking for permission to use your camera and microphone.

Click on Allow to start streaming your audio and video input to the web page. You might have to configure your microphone or web camera settings to get them to work. Once you allow browser access to your camera and microphone, you should see a video feed of yourself from your camera.

The page should generate a custom ID for your current session. You should see this reflected in the URL of the page, such as https://apprtc.appspot.com/r/359323927. Simply copy and paste this URL into another browser window, either on your own computer or another one, and load the web page. Now, if everything works correctly, you should see two video feeds—one from your first client and another from the second. It should start to make sense why WebRTC is a powerful solution. This is how easy WebRTC makes real-time communication in the browser.

Applications enabled by WebRTC

Under the hood, WebRTC enables a basic peer-to-peer connection between two browsers. This is the heart of everything that happens with WebRTC. It is the first truly peer-to-peer connection inside a browser. This also means that anything you can do with peer connections can be easily extended to WebRTC. Many applications today use peer-to-peer capabilities, such as file sharing, text chat, multiplayer gaming, and even currencies. There are already hundreds of great examples of these types of applications working right inside the browser.
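To make the idea of a peer connection concrete, the sketch below walks through the offer/answer handshake between two connection objects that live on the same page, which avoids the need for a signaling server. The method names are those of the standard RTCPeerConnection interface; connectLocally is a hypothetical helper, and in a real application the descriptions and candidates would travel between two browsers through a signaling channel.

```javascript
// Hypothetical helper: connect two RTCPeerConnection-like objects
// that live in the same page by handing messages across directly.
async function connectLocally(caller, callee) {
  // Each side passes the network candidates it discovers to the other.
  caller.onicecandidate = (event) => {
    if (event.candidate) callee.addIceCandidate(event.candidate).catch(console.error);
  };
  callee.onicecandidate = (event) => {
    if (event.candidate) caller.addIceCandidate(event.candidate).catch(console.error);
  };

  // The caller describes the session it wants (the offer)...
  const offer = await caller.createOffer();
  await caller.setLocalDescription(offer);
  await callee.setRemoteDescription(offer);

  // ...and the callee replies with what it accepts (the answer).
  const answer = await callee.createAnswer();
  await callee.setLocalDescription(answer);
  await caller.setRemoteDescription(answer);
}
```

In a browser, the two arguments would typically be created with new RTCPeerConnection(), and a data channel or media track would be added to the caller before the handshake so that there is something to negotiate.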

Most of these applications have one thing in common—they need a low-latency, high-performance connection between two users. WebRTC makes use of low-level protocols to deliver high-speed performance that could not be achieved otherwise. This speeds up data flow across the network, enabling large amounts of data to be transferred in a short amount of time.

WebRTC also sets up a secure connection between two users, giving them a higher level of privacy. Traffic traveling across a peer connection is not only encrypted, but also takes a direct route to the other user rather than passing through an application server. This means that packets sent in different connections might take entirely different routes over the Internet. This gives users of WebRTC applications a degree of privacy that is otherwise hard to guarantee when connecting to an application server.

This is just a subset of the types of applications enabled by WebRTC. Since WebRTC is built on the foundations of JavaScript and the Web, it can benefit many existing applications today. After reading this book, you should have the knowledge you need to create innovative WebRTC applications on your own!

Self-test questions

Q1. The goal of WebRTC is to provide easy access to real-time communication with no plugins or licensing fees. True or false?

Q2. Which of the following is not a feature that the browser provides through WebRTC?

Camera and microphone capture
Video and audio stream processing
Accessing a contact list
Session management

Q3. Participating in the W3C and IETF is only for big corporations with lots of money. True or false?

Q4. Which of the following is not a type of application that could benefit from using WebRTC?

File sharing
Video communication
Multiplayer gaming
None of the above

Summary

In this chapter, we gave you a glimpse of the features and technology behind WebRTC. You should have a firm grasp of what WebRTC aims to achieve and how this affects web applications today. You should also have an idea of what types of applications can be built with WebRTC and you should have tried out WebRTC already in your browser.

There was a wealth of information in this chapter, so do not worry if you did not take it all in. We will go back and cover many of the topics presented here in detail over the course of the book. Feel free to explore some of the resources already on the Web today to get an even better understanding of what WebRTC is all about.

Next, we will start exploring camera and microphone capture using the getUserMedia API.

Then, we will start building our WebRTC application to handle a full one-on-one video and audio call directly in the browser.

Later on, we will start exploring how to extend this to multiple users, add data transfer through text-based chat and file sharing, and learn about the best security practices that are in place when using WebRTC.