Appium Essentials

By Manoj Hans

About this book

Nowadays, mobile automation is growing at a fast pace; this is where Appium comes in. It is a tool that performs automated mobile testing for Android and iOS.

Appium Essentials is a practical guide that will help you to perform mobile automation testing and gain a good understanding of mobile automation concepts. This book will teach you how to use Appium to drive native, web-based, and hybrid apps.

You will then explore Appium usage with different mobile applications, get acquainted with using an emulator to automate mobile apps, and learn about mobile gestures such as tap, zoom, and swipe. Finally, you will test your apps on a physical device and experience the look and feel of an end user!

Publication date:
April 2015


Chapter 1. Appium – Important Conceptual Background

In this chapter, we will learn about the Appium architecture, JavaScript Object Notation (JSON) wire protocol, and Appium sessions as well as gain an understanding of the desired capabilities before starting Appium. This chapter will also touch upon the topics of the Appium server and its client library.

In short, we will cover the following topics:

  • Appium's architecture

  • The Selenium JSON wire protocol

  • Appium sessions

  • Desired capabilities

  • The Appium server and its client library


Appium architecture

Appium is an HTTP server written in Node.js that creates and handles WebDriver sessions. The Appium web server follows the same approach as the Selenium WebDriver server: it receives HTTP requests carrying JSON from the client libraries and then handles those requests in different ways, depending on the platform being automated.

Let's discuss how Appium works in iOS and Android.

Appium on iOS

On an iOS device, Appium uses Apple's UIAutomation API to interact with the UI elements. UIAutomation is a JavaScript library provided by Apple to write test scripts; Appium utilizes these same libraries to automate iOS apps.

Let's take a look at the architecture, which is shown in the following diagram:

In the preceding diagram, when we execute the test scripts, each command is sent as JSON in an HTTP request to the Appium server. The Appium server forwards the command to Instruments, which looks for the bootstrap.js file that the Appium server has pushed to the iOS device. The command is then executed by bootstrap.js within the Instruments environment. After the command has executed, a message with the log details of the executed command is sent back to the Appium server.

A similar kind of architecture follows in the case of Android app automation. Let's discuss the Appium architecture for Android.

Appium on Android

On an Android device, Appium uses the UIAutomator framework to automate the apps. UIAutomator is a framework that is developed by the Android developers to test the Android user interface.

Let's take a look at the architecture, which is shown in the following diagram:

In the preceding diagram, we have UIAutomator/Selendroid in place of Apple Instruments and bootstrap.jar in place of the bootstrap.js file.

Appium uses UIAutomator for Android API levels greater than or equal to 17; for earlier API levels, it uses the Selendroid framework. When we execute the test scripts, Appium sends the command to UIAutomator or Selendroid on the basis of the Android API level.

Here, bootstrap.jar plays the role of a TCP server, which we can use to send the test command in order to perform the action on the Android device using UIAutomator/Selendroid.
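
The choice between the two engines can be sketched as a small helper. This is only an illustration of the rule stated above, not code from Appium itself; the class and method names are hypothetical, and the returned strings are the engine names used later for the automation engine capability.

```java
/** Hypothetical helper: picks the automation engine from the Android API level. */
final class AutomationEngine {
    // Lowest API level that this chapter says is automated through UIAutomator.
    static final int MIN_UIAUTOMATOR_API_LEVEL = 17;

    static String forApiLevel(int apiLevel) {
        if (apiLevel < 1) {
            throw new IllegalArgumentException("API level must be positive: " + apiLevel);
        }
        // At or above the threshold the default engine applies; below it, Selendroid.
        return apiLevel >= MIN_UIAUTOMATOR_API_LEVEL ? "Appium" : "Selendroid";
    }

    public static void main(String[] args) {
        System.out.println(forApiLevel(19)); // prints Appium
        System.out.println(forApiLevel(16)); // prints Selendroid
    }
}
```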


The Selenium JSON wire protocol

The JSON wire protocol (JSONWP) is a transport mechanism created by WebDriver developers. This wire protocol is a specific set of predefined, standardized endpoints exposed via a RESTful API. The purpose of WebDriver and JSONWP is the automated testing of websites in a browser through drivers such as the Firefox driver, IE driver, and Chrome driver.

Appium implements the Mobile JSONWP, the extension to the Selenium JSONWP, and it controls the different mobile device behaviors, such as installing/uninstalling apps over the session.

Let's have a look at some of the endpoints from the API which are used to interact with mobile applications:

  • /session/:sessionId

  • /session/:sessionId/element

  • /session/:sessionId/elements

  • /session/:sessionId/element/:id/click

  • /session/:sessionId/source

  • /session/:sessionId/url

  • /session/:sessionId/timeouts/implicit_wait
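
In these endpoints, segments that start with a colon (:sessionId, :id) are placeholders that a client fills in before sending the request. A minimal sketch of that substitution follows; the Endpoints class and its resolve method are hypothetical names used for illustration and are not part of any Appium client library.

```java
import java.util.Map;

final class Endpoints {
    /** Replaces each ":name" path segment with the matching value from params. */
    static String resolve(String template, Map<String, String> params) {
        StringBuilder path = new StringBuilder();
        for (String segment : template.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the empty piece before the leading slash
            }
            path.append('/');
            if (segment.startsWith(":")) {
                String name = segment.substring(1);
                String value = params.get(name);
                if (value == null) {
                    throw new IllegalArgumentException("Missing value for :" + name);
                }
                path.append(value);
            } else {
                path.append(segment);
            }
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // The session and element identifiers here are made-up sample values.
        Map<String, String> params = Map.of("sessionId", "abc123", "id", "7");
        System.out.println(resolve("/session/:sessionId/element/:id/click", params));
        // prints /session/abc123/element/7/click
    }
}
```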

Appium provides client libraries similar to WebDriver that act as an interface to the REST API. These libraries have functions similar to the following method:

driver.getPageSource();

This method will issue an HTTP request to the JSONWP, and it gets the response from the applicable API endpoint. In this case, the API endpoint that handles the getPageSource method is as follows:

/session/:sessionId/source

The driver executes the command, which arrives in JSON format from the Appium server, and returns the page source as a string. For non-HTML platforms (native mobile apps), the Appium library responds with an XML document representing the UI hierarchy. The specific structure of the document may vary from platform to platform.
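
To make the mapping from a client method to an HTTP request concrete, the sketch below builds (but does not send) the GET request that corresponds to a page source call, using the JDK's java.net.http API. The server address is an assumption: Appium 1.x listens on port 4723 under /wd/hub by default, and your setup may differ. The class name and the session identifier are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class PageSourceRequest {
    // Assumed server address (Appium 1.x default); adjust for your environment.
    static final String SERVER = "http://127.0.0.1:4723/wd/hub";

    /** Builds the GET request for the /session/:sessionId/source endpoint. */
    static HttpRequest build(String sessionId) {
        URI uri = URI.create(SERVER + "/session/" + sessionId + "/source");
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("abc123"); // made-up session identifier
        System.out.println(request.method() + " " + request.uri());
        // prints GET http://127.0.0.1:4723/wd/hub/session/abc123/source
    }
}
```

A real client library wraps this step, sends the request, and unpacks the JSON response for you.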


Appium session

A session is the medium through which commands are sent to a specific test application; a command is always performed in the context of a session. As we saw in the previous section, the client passes the session identifier as the sessionId parameter with every command. To begin, the client library asks the server to create a session. The server then responds with a sessionId, which the client uses to send further commands to the application(s) being tested.
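
The session life cycle can be illustrated with a toy in-memory model. This is not how the Appium server is implemented; SessionRegistry and its methods are hypothetical and only show that commands are accepted while their session identifier is active.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Toy in-memory model of session handling; not Appium's implementation. */
final class SessionRegistry {
    private final Set<String> activeSessions = new HashSet<>();

    /** Creates a session and returns its identifier. */
    String createSession() {
        String sessionId = UUID.randomUUID().toString();
        activeSessions.add(sessionId);
        return sessionId;
    }

    /** Runs a command only if it names an active session. */
    String execute(String sessionId, String command) {
        if (!activeSessions.contains(sessionId)) {
            throw new IllegalStateException("No such session: " + sessionId);
        }
        return command + " executed in session " + sessionId;
    }

    /** Ends the session; later commands with this identifier are rejected. */
    void deleteSession(String sessionId) {
        activeSessions.remove(sessionId);
    }

    public static void main(String[] args) {
        SessionRegistry server = new SessionRegistry();
        String sessionId = server.createSession();
        System.out.println(server.execute(sessionId, "getPageSource"));
        server.deleteSession(sessionId);
    }
}
```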


Desired capabilities

Desired capabilities is a JSON object (a set of keys and values) sent by the client to the server. It describes the capabilities for the automation session in which we are interested.
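
As a rough illustration of "a set of keys and values", the sketch below serializes a few capabilities into a JSON object using only the JDK. The CapabilitiesJson class is hypothetical, and the hand-written serializer handles only simple string values; a real client library does this work for you.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CapabilitiesJson {
    /** Serializes string-valued capabilities into a flat JSON object. */
    static String toJson(Map<String, String> caps) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> entry : caps.entrySet()) {
            json.add(quote(entry.getKey()) + ":" + quote(entry.getValue()));
        }
        return json.toString();
    }

    // Escapes backslashes and double quotes, which is enough for these simple values.
    private static String quote(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, String> caps = new LinkedHashMap<>();
        caps.put("platformName", "Android");
        caps.put("platformVersion", "4.4.4");
        caps.put("deviceName", "Nexus 5");
        System.out.println(toJson(caps));
        // prints {"platformName":"Android","platformVersion":"4.4.4","deviceName":"Nexus 5"}
    }
}
```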

Let's discuss the capabilities one by one; first, we will see the Appium server's capabilities:

To work with the desired capabilities in Java, we need to import the org.openqa.selenium.remote.DesiredCapabilities class.




automationName: This capability defines the automation engine. If you want to work with an Android API level lower than 17, set the value to Selendroid; otherwise, the capability takes its default value, Appium. Let's see how we can implement it practically:

DesiredCapabilities caps = new DesiredCapabilities(); // creating an object
caps.setCapability("automationName", "Selendroid"); // to set capability value

We can also set the capabilities using Appium's client library. For this, users need to import the io.appium.java_client.remote.MobileCapabilityType class:

caps.setCapability(MobileCapabilityType.AUTOMATION_NAME, "Selendroid");

There's no need to use this capability in the case of iOS.


platformName: This capability sets the mobile OS platform. It takes the value iOS, Android, or FirefoxOS:

caps.setCapability("platformName", "Android");

In case of the Appium client library, you can use this:

caps.setCapability(MobileCapabilityType.PLATFORM_NAME, "Android");


platformVersion: To set the mobile OS version, for example, 7.1 or 4.4.4, use the following command:

caps.setCapability("platformVersion", "4.4.4");

Alternatively, you can use the following command as well:

caps.setCapability(MobileCapabilityType.PLATFORM_VERSION, "4.4.4");


deviceName: This capability defines the type of mobile device or emulator to use, for example, iPhone Simulator, iPad Simulator, iPhone Retina 4-inch, Android Emulator, Moto X, Nexus 5, and so on:

caps.setCapability("deviceName", "Nexus 5");

You can use the following command as well:

caps.setCapability(MobileCapabilityType.DEVICE_NAME,"Nexus 5");


app: This capability takes the absolute local path or remote HTTP URL of the .ipa, .apk, or .zip file. Appium will install the app binary on the appropriate device first. Note that in the case of Android, if you specify the appPackage and appActivity capabilities (both will be discussed later in this section), then the app capability is not required:

caps.setCapability("app", "/apps/demo/demo.apk");

Alternatively, you can use the following command:

caps.setCapability(MobileCapabilityType.APP, "/apps/demo/demo.apk");


browserName: If you want to automate mobile web applications, then you have to use this capability to define the browser.

For Safari on iOS, you can use this:

caps.setCapability("browserName", "Safari");

Also, you can use the following command:

caps.setCapability(MobileCapabilityType.BROWSER_NAME, "Safari");

For Chrome on Android, you can use this:

caps.setCapability("browserName", "Chrome");

Alternatively, you can use the following command:

caps.setCapability(MobileCapabilityType.BROWSER_NAME, "Chrome");


newCommandTimeout: This is the number of seconds Appium waits for a new command from the client before assuming that the client quit and ending the session. The default value is 60. To set this time, you can use the following command:

caps.setCapability("newCommandTimeout", "30");

Alternatively, you can use the following command:

caps.setCapability(MobileCapabilityType.NEW_COMMAND_TIMEOUT, "30");


autoLaunch: This capability controls whether Appium installs and launches the app automatically. The default value is true. You can set the capability with the following command:

caps.setCapability("autoLaunch", "false");


language: This is used to set the language on the simulator/emulator, for example, fr, es, and so on. The following command will work only on the simulator/emulator:

caps.setCapability("language", "fr");


locale: This is used to set the locale for the simulator/emulator, for example, fr_CA, tr_TR, and so on:

caps.setCapability("locale", "fr_CA");


udid: A unique device identifier (udid) identifies an iOS physical device. It is a 40-character value (for example, 1be204387fc072g1be204387fc072g4387fc072g). This capability is used when you are automating apps on an iOS physical device. We can easily get the device udid from iTunes by clicking on Serial Number:

caps.setCapability("udid", "1be204387fc072g1be204387fc072g4387fc072g");


orientation: This starts the session in a certain orientation, for example, LANDSCAPE or PORTRAIT. It works on a simulator/emulator only:

caps.setCapability("orientation", "PORTRAIT");


autoWebview: If you are automating hybrid apps and want to move directly into the Webview context, then you can set it by using this capability; the default value is false:

caps.setCapability("autoWebview", "true");


noReset: This capability tells Appium not to reset the app's state before the session starts; the default value is false:

caps.setCapability("noReset", "true");


fullReset: In iOS, this will delete the entire simulator folder. In Android, it resets the app's state by uninstalling the app instead of clearing the app data; it will also remove the app after the session is complete. The default value is false. The following is the command for fullReset:

caps.setCapability("fullReset", "true");
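
A session needs something to automate: an app binary, a browser, or (on Android) an app package plus activity. The check below is a minimal sketch of that rule over a plain map of capabilities; the TargetCheck class is hypothetical and Appium performs its own validation on the server side.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TargetCheck {
    /**
     * Returns true when the capabilities name something to automate:
     * an app binary, a browser, or an Android package plus activity.
     */
    static boolean hasTarget(Map<String, Object> caps) {
        boolean hasApp = caps.containsKey("app");
        boolean hasBrowser = caps.containsKey("browserName");
        boolean hasPackageAndActivity =
                caps.containsKey("appPackage") && caps.containsKey("appActivity");
        return hasApp || hasBrowser || hasPackageAndActivity;
    }

    public static void main(String[] args) {
        Map<String, Object> caps = new LinkedHashMap<>();
        caps.put("platformName", "Android");
        caps.put("deviceName", "Nexus 5");
        System.out.println(hasTarget(caps)); // prints false: nothing to automate yet
        caps.put("browserName", "Chrome");
        System.out.println(hasTarget(caps)); // prints true
    }
}
```

With Selenium on the classpath, a map like this can be handed to the DesiredCapabilities constructor that accepts a Map, instead of calling setCapability for each entry.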

Android capabilities

Now, let's discuss the Android capabilities, as shown in the following table:




appPackage: This capability is for the Java package of the Android app that you want to run:

caps.setCapability("appPackage", "");

Alternatively, you can use this command:

caps.setCapability(MobileCapabilityType.APP_PACKAGE, "");


appActivity: By using this capability, you can specify the Android activity that you want to launch from your package, for example, MainActivity, .Settings, and so on:

caps.setCapability("appActivity", "");

You can also use the following command:

caps.setCapability(MobileCapabilityType.APP_ACTIVITY, "");


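appWaitActivity: The Android activity that you want to wait for can be defined using this capability:

caps.setCapability("appWaitActivity", "");

Alternatively, you can also use this command:

caps.setCapability(MobileCapabilityType.APP_WAIT_ACTIVITY, "");


appWaitPackage: The Java package of the Android app that you want to wait for can be defined using the following capability:

caps.setCapability("appWaitPackage", "");


deviceReadyTimeout: You can set the timeout (in seconds) while waiting for the device to be ready, as follows; the default value is 5 seconds:

caps.setCapability("deviceReadyTimeout", "5");

Alternatively, you can also use this command:

caps.setCapability(MobileCapabilityType.DEVICE_READY_TIMEOUT, "5");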


enablePerformanceLogging: You can enable the Chrome driver's performance logging by the use of this capability. It will enable logging only for Chrome and web view; the default value is false:

caps.setCapability("enablePerformanceLogging", "true");


androidDeviceReadyTimeout: To set the timeout in seconds for a device to become ready after booting, you can use the following capability:

caps.setCapability("androidDeviceReadyTimeout", "30");


androidDeviceSocket: This capability is used to set the DevTools socket name. It is only needed when an app is a Chromium-embedding browser. The socket is opened by the browser and the ChromeDriver connects to it as a DevTools client, for example, chrome_devtools_remote:

caps.setCapability("androidDeviceSocket", "chrome_devtools_remote");


avd: Using this capability, you can specify the name of the avd that you want to launch, for example, AVD_NEXUS_5:

caps.setCapability("avd", "AVD_NEXUS_5");


avdLaunchTimeout: This capability defines how long to wait (in milliseconds) for an avd to launch and connect to the Android Debug Bridge (ADB); the default value is 120000:

caps.setCapability("avdLaunchTimeout", "120000");


avdReadyTimeout: You can specify the wait time (in milliseconds) for an avd to finish its boot animations using the following capability; the default wait timeout is 120000:

caps.setCapability("avdReadyTimeout", "120000");


avdArgs: To pass additional emulator arguments when launching an avd, use the following capability, for example, -netfast:

caps.setCapability("avdArgs", "-netfast");


chromedriverExecutable: You can give the absolute local path to the WebDriver executable (if the Chromium embedder provides its own WebDriver, it should be used instead of the original ChromeDriver bundled with Appium) using the following capability:

caps.setCapability("chromedriverExecutable", "/abs/path/to/webdriver");


autoWebviewTimeout: The following capability sets the time (in milliseconds) to wait for the Webview context to become active; the default value is 2000:

caps.setCapability("autoWebviewTimeout", "2000");


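intentAction: The intent action is used to start an activity, as shown in the following code, for example, android.intent.action.MAIN, android.intent.action.VIEW, and so on. The default value is android.intent.action.MAIN:

caps.setCapability("intentAction", "android.intent.action.MAIN");


intentCategory: This provides the intent category that will be used to start the activity, for example, android.intent.category.LAUNCHER or android.intent.category.APP_CONTACTS. The default is android.intent.category.LAUNCHER:

caps.setCapability("intentCategory", "android.intent.category.LAUNCHER");


intentFlags: These are the flags used to start an activity, for example, 0x10200000, which is also the default:

caps.setCapability("intentFlags", "0x10200000");


unicodeKeyboard: You can enable Unicode input by using the following code; the default value is false:

caps.setCapability("unicodeKeyboard", "true");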


resetKeyboard: You can reset the keyboard to its original state by using this capability. The default value is false:

caps.setCapability("resetKeyboard", "true");
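
The defaults quoted in this section can be collected in one place. The sketch below fills in those defaults for any Android capability the client left out; it only illustrates the documented values and is not how Appium applies them. AndroidDefaults is a hypothetical class, and the keys are the Appium capability names for the entries described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AndroidDefaults {
    /** Returns a copy of caps with the defaults quoted in this section filled in. */
    static Map<String, Object> withDefaults(Map<String, Object> caps) {
        Map<String, Object> merged = new LinkedHashMap<>(caps);
        merged.putIfAbsent("avdLaunchTimeout", 120000);  // milliseconds
        merged.putIfAbsent("avdReadyTimeout", 120000);   // milliseconds
        merged.putIfAbsent("autoWebviewTimeout", 2000);  // milliseconds
        merged.putIfAbsent("intentAction", "android.intent.action.MAIN");
        merged.putIfAbsent("intentCategory", "android.intent.category.LAUNCHER");
        merged.putIfAbsent("intentFlags", "0x10200000");
        merged.putIfAbsent("unicodeKeyboard", false);
        merged.putIfAbsent("resetKeyboard", false);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> caps = new LinkedHashMap<>();
        caps.put("avd", "AVD_NEXUS_5");
        caps.put("avdLaunchTimeout", 60000); // an explicit value is kept as is
        Map<String, Object> merged = withDefaults(caps);
        System.out.println(merged.get("avdLaunchTimeout")); // prints 60000
        System.out.println(merged.get("avdReadyTimeout"));  // prints 120000
    }
}
```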


iOS capabilities

Let's discuss the iOS capabilities, as shown in the following table:




calendarFormat: This is used to set the calendar format for the iOS simulator, for example, Gregorian. It applies only to a simulator:

caps.setCapability("calendarFormat", "Gregorian");


bundleId: The bundle ID is used to start an app on a real device or to use other apps that require the bundleId during the test startup, for example, io.appium.TestApp:

caps.setCapability("bundleId", "io.appium.TestApp");


launchTimeout: This is used to specify the amount of time (in milliseconds) to wait for Instruments before assuming that it hung and the session failed. This can be done using the following command:

caps.setCapability("launchTimeout", "20000");


locationServicesEnabled: This capability is used to enable location services. You can apply it only on a simulator; you can give the Boolean value, as follows:

caps.setCapability("locationServicesEnabled", "true");


locationServicesAuthorized: If you want to use this capability, you must provide the bundleId by using the bundleId capability. You can use this capability on a simulator. After setting this, the location services alert doesn't pop up. By default, the current simulator setting is kept:

caps.setCapability("locationServicesAuthorized", "true");


autoAcceptAlerts: Using this capability, you can accept privacy permission alerts automatically, such as location, contacts, photos, and so on, if they arise; the default value is false:

caps.setCapability("autoAcceptAlerts", "true");


nativeInstrumentsLib: You can use the native Instruments library by setting up this capability:

caps.setCapability("nativeInstrumentsLib", "true");


nativeWebTap: This can be used to enable real web taps in Safari, which are non-JavaScript based. The default value is false. Be warned that a tap might not land accurately on an element; this depends on the viewport's size/ratio:

caps.setCapability("nativeWebTap", "true");


safariAllowPopups: You can use this capability on a simulator only. It allows JavaScript to open new windows in Safari. The default is the current simulator setting. To do this, you can use the following command:

caps.setCapability("safariAllowPopups", "true");


safariIgnoreFraudWarning: This capability can be used only on a simulator. It prohibits Safari from displaying a fraudulent website warning. The default value is the current simulator setting, as follows:

caps.setCapability("safariIgnoreFraudWarning", "true");


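safariOpenLinksInBackground: This capability enables Safari to open links in new windows; the default keeps the current simulator settings:

caps.setCapability("safariOpenLinksInBackground", "true");


keepKeyChains: Whether to keep keychains (Library/Keychains) when an Appium session is started/finished can be defined using this capability. You can apply it on a simulator, as follows:

caps.setCapability("keepKeyChains", "true");


processArguments: This capability allows you to pass arguments to the application under test (AUT) when it is launched using Instruments, for example, -myflag:

caps.setCapability("processArguments", "-myflag");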


interKeyDelay: You can delay the keystrokes sent to an element when typing by using this capability. It takes the value in milliseconds:

caps.setCapability("interKeyDelay", "100");
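
Two of the iOS rules mentioned in this chapter lend themselves to a quick client-side check before a session is requested: the location services authorization capability needs a bundleId, and a udid is 40 characters long. The sketch below is illustrative only; IosCapabilityCheck is a hypothetical class, the keys are the Appium capability names, and Appium does its own validation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IosCapabilityCheck {
    /** Returns human-readable problems; an empty list means both checks passed. */
    static List<String> problems(Map<String, Object> caps) {
        List<String> problems = new ArrayList<>();
        if (caps.containsKey("locationServicesAuthorized") && !caps.containsKey("bundleId")) {
            problems.add("locationServicesAuthorized requires bundleId");
        }
        Object udid = caps.get("udid");
        if (udid != null && udid.toString().length() != 40) {
            problems.add("udid should be 40 characters long");
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Object> caps = new LinkedHashMap<>();
        caps.put("platformName", "iOS");
        caps.put("locationServicesAuthorized", "true");
        System.out.println(problems(caps)); // prints [locationServicesAuthorized requires bundleId]
        caps.put("bundleId", "io.appium.TestApp");
        System.out.println(problems(caps)); // prints []
    }
}
```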


We have seen all the desired capabilities that are used in Appium. Now, we will talk in brief about the Appium server and its client library.


The Appium server and its client libraries

The Appium server is used to interact with different platforms such as iOS and Android. It creates a session to interact with mobile apps on the supported platforms. It is an HTTP server written in Node.js and uses the same concept as the Selenium Server, which identifies the HTTP requests from the client libraries and sends these requests to the appropriate platform. To start the Appium server, users need to download the source or install it directly from npm. Appium also provides the GUI version of the server. You can download it from the official Appium site. In the next chapter, we will discuss the GUI version in more detail.

One of the biggest advantages of Appium is that, because it is simply a REST API at its core, the code you use to interact with it can be written in a number of languages such as Java, C#, Ruby, Python, and others. Appium extends the WebDriver client libraries and adds extra commands to them for working with mobile devices. It provides client libraries that support Appium extensions to the WebDriver protocol. Because of these extensions to the protocol, it is important to use Appium-specific client libraries to write automation tests or procedures, instead of generic WebDriver client libraries.

Appium added some interesting functionality for working closely with mobile devices, such as multitouch gestures and screen orientation. We will see the practical implementation of these functionalities later.



Summary

We should now have an understanding of the Appium architecture, the JSON wire protocol, and the desired capabilities and their uses. We also learned about the Appium server and its language-specific client libraries in this chapter.

Specifically, we dove into the JSONWP and Appium sessions, through which further commands are sent in order to interact with the application. We also saw how to set up automation sessions using the desired capabilities. In the last section, we picked up some information about the Appium server and its language-specific client libraries.

In the next chapter, we will take a look at what we require to get started with Appium.

About the Author

  • Manoj Hans

    Manoj Hans is a senior QA engineer who has rich experience in software testing. Apart from testing, he has worked in other areas of IT such as web hosting, development, and software configuration.

    He was interviewed for the September 2013 edition of Software Developer's JOURNAL magazine for Selenium training in India. Manoj is passionate about automation testing and loves to automate things.

