Adding Real-time Functionality Using Socket.io

Packt
22 Sep 2014
18 min read
This article by Amos Q. Haviv, the author of MEAN Web Development, describes how Socket.io enables Node.js developers to support real-time communication using WebSockets in modern browsers and legacy fallback protocols in older browsers. (For more resources related to this topic, see here.)

Introducing WebSockets

Modern web applications such as Facebook, Twitter, or Gmail are incorporating real-time capabilities, which enable the application to continuously present the user with recently updated information. Unlike traditional applications, in real-time applications the common roles of browser and server can be reversed, since the server needs to update the browser with new data regardless of the browser request state. This means that unlike the common HTTP behavior, the server won't wait for the browser's requests. Instead, it will send new data to the browser whenever this data becomes available.

This reverse approach is often called Comet, a term coined by a web developer named Alex Russell back in 2006 (the term was a word play on the term AJAX; both Comet and Ajax are common household cleaners in the US).

In the past, there were several ways to implement Comet functionality using the HTTP protocol. The first and easiest way is XHR polling. In XHR polling, the browser makes periodic requests to the server. The server returns an empty response unless it has new data to send back. Upon a new event, the server will return the new event data to the next polling request. While this works quite well for most browsers, this method has two problems. The most obvious one is that it generates a large number of requests that hit the server for no particular reason, since many requests return empty. The second problem is that the update time depends on the request period. This means that new data will only get pushed to the browser on the next request, causing delays in updating the client state.

To solve these issues, a better approach was introduced: XHR long polling. In XHR long polling, the browser makes an XHR request to the server, but a response is not sent back unless the server has new data. Upon an event, the server responds with the event data and the browser makes a new long polling request. This cycle enables better management of requests, since there is only a single request per session. Furthermore, the server can update the browser immediately with new information, without having to wait for the browser's next request.

Because of its stability and usability, XHR long polling has become the standard approach for real-time applications and was implemented in various ways, including Forever iFrame, multipart XHR, JSONP long polling using script tags (for cross-domain, real-time support), and the common long-living XHR. However, all these approaches were actually hacks that used the HTTP and XHR protocols in a way they were not meant to be used.

With the rapid development of modern browsers and the increased adoption of the new HTML5 specifications, a new protocol emerged for implementing real-time communication: the full-duplex WebSockets protocol. In browsers that support WebSockets, the initial connection between the server and browser is made over HTTP and is called an HTTP handshake. Once the initial connection is made, the browser and server open a single ongoing communication channel over a TCP socket. Once the socket connection is established, it enables bidirectional communication between the browser and server.
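To make the contrast with the request/response model concrete, the following is a minimal sketch (not taken from the book) of the browser's native WebSocket API, which abstraction libraries such as Socket.io build upon; the endpoint URL is an assumed placeholder used only for illustration:

// Minimal sketch of the browser's native WebSocket API (illustrative only).
// The ws://localhost:3000 endpoint is an assumption, not from the book.
var ws = new WebSocket('ws://localhost:3000');

ws.onopen = function() {
  // The HTTP handshake has completed and the TCP channel is now open
  ws.send('hello from the browser');
};

ws.onmessage = function(event) {
  // The server can push data at any time, without waiting for a browser request
  console.log('received:', event.data);
};

ws.onclose = function() {
  console.log('connection closed');
};

Socket.io exposes a similar event-driven interface while falling back to the XHR-based techniques described above when WebSockets are unavailable.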
An established WebSocket connection enables both parties to send and receive messages over a single communication channel. This also helps to lower server load, decrease message latency, and unify PUSH communication over a standalone connection.

However, WebSockets still suffer from two major problems. First and foremost is browser compatibility. The WebSockets specification is fairly new, so older browsers don't support it, and though most modern browsers now implement the protocol, a large group of users are still using these older browsers. The second problem is HTTP proxies, firewalls, and hosting providers. Since WebSockets use a different communication protocol than HTTP, a lot of these intermediaries don't support it yet and block any socket communication. As it has always been with the Web, developers are left with a fragmentation problem, which can only be solved using an abstraction library that optimizes usability by switching between protocols according to the available resources. Fortunately, a popular library called Socket.io was already developed for this purpose, and it is freely available to the Node.js developer community.

Introducing Socket.io

Created in 2010 by JavaScript developer Guillermo Rauch, Socket.io aimed to abstract Node.js real-time application development. Since then, it has evolved dramatically through nine major releases, before being split in its latest version into two different modules: Engine.io and Socket.io.

Previous versions of Socket.io were criticized for being unstable, since they first tried to establish the most advanced connection mechanisms and then fall back to more primitive protocols. This caused serious issues with using Socket.io in production environments and posed a threat to the adoption of Socket.io as a real-time library. To solve this, the Socket.io team redesigned it and wrapped the core functionality in a base module called Engine.io.

The idea behind Engine.io was to create a more stable real-time module, which first opens long-polling XHR communication and then tries to upgrade the connection to a WebSockets channel. The new version of Socket.io uses the Engine.io module and provides the developer with various features such as events, rooms, and automatic connection recovery, which you would otherwise have to implement yourself. In this article's examples, we will use the new Socket.io 1.0, which is the first version to use the Engine.io module. Older versions of Socket.io, prior to version 1.0, do not use the Engine.io module and are therefore much less stable in production environments.

When you include the Socket.io module, it provides you with two objects: a socket server object that is responsible for the server functionality and a socket client object that handles the browser's functionality. We'll begin by examining the server object.

The Socket.io server object

The Socket.io server object is where it all begins. You start by requiring the Socket.io module, and then use it to create a new Socket.io server instance that will interact with socket clients. The server object supports both a standalone implementation and the ability to use it in conjunction with the Express framework. The server instance then exposes a set of methods that allow you to manage the Socket.io server operations. Once the server object is initialized, it will also be responsible for serving the socket client JavaScript file for the browser.
A simple implementation of the standalone Socket.io server looks as follows:

var io = require('socket.io')();
io.on('connection', function(socket){ /* ... */ });
io.listen(3000);

This will start a Socket.io server on port 3000 and serve the socket client file at the URL http://localhost:3000/socket.io/socket.io.js.

Implementing the Socket.io server in conjunction with an Express application is a bit different:

var app = require('express')();
var server = require('http').Server(app);
var io = require('socket.io')(server);
io.on('connection', function(socket){ /* ... */ });
server.listen(3000);

This time, you first use the http module of Node.js to create a server and wrap the Express application. The server object is then passed to the Socket.io module and serves both the Express application and the Socket.io server. Once the server is running, it will be available for socket clients to connect. A client trying to establish a connection with the Socket.io server will start by initiating the handshaking process.

Socket.io handshaking

When a client wants to connect to the Socket.io server, it will first send a handshake HTTP request. The server will then analyze the request to gather the necessary information for ongoing communication. It will then look for configuration middleware that is registered with the server and execute it before firing the connection event. When the client is successfully connected to the server, the connection event listener is executed, exposing a new socket instance.

Once the handshaking process is over, the client is connected to the server and all communication with it is handled through the socket instance object. For example, handling a client's disconnection event is done as follows:

var app = require('express')();
var server = require('http').Server(app);
var io = require('socket.io')(server);
io.on('connection', function(socket){
  socket.on('disconnect', function() {
    console.log('user has disconnected');
  });
});
server.listen(3000);

Notice how the socket.on() method adds an event handler to the disconnection event. Although the disconnection event is a predefined event, this approach works the same for custom events as well, as you will see in the following sections.

While the handshake mechanism is fully automatic, Socket.io does provide you with a way to intercept the handshake process using a configuration middleware.

The Socket.io configuration middleware

Although the Socket.io configuration middleware existed in previous versions, in the new version it is even simpler and allows you to manipulate socket communication before the handshake actually occurs. To create a configuration middleware, you will need to use the server's use() method, which is very similar to the Express application's use() method:

var app = require('express')();
var server = require('http').Server(app);
var io = require('socket.io')(server);
io.use(function(socket, next) {
  /* ... */
  next(null, true);
});
io.on('connection', function(socket){
  socket.on('disconnect', function() {
    console.log('user has disconnected');
  });
});
server.listen(3000);

As you can see, the io.use() method callback accepts two arguments: the socket object and a next callback. The socket object is the same socket object that will be used for the connection and it holds some connection properties. One important property is the socket.request property, which represents the handshake HTTP request. In the following sections, you will use the handshake request to incorporate the Passport session with the Socket.io connection.
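As a simple illustration of what such a middleware can do with the handshake request (a sketch under assumptions, not the book's Passport integration), the following example rejects connections whose handshake request carries no session cookie; the 'connect.sid' cookie name is the express-session default and is used here only as an assumption:

var app = require('express')();
var server = require('http').Server(app);
var io = require('socket.io')(server);

io.use(function(socket, next) {
  // socket.request is the handshake HTTP request
  var cookieHeader = socket.request.headers.cookie;

  // 'connect.sid' is assumed to be the session cookie name (express-session default)
  if (cookieHeader && cookieHeader.indexOf('connect.sid') !== -1) {
    // Allow the handshake to proceed
    next(null, true);
  } else {
    // Passing an error aborts the handshake, so no connection event is fired
    next(new Error('No session cookie found'), false);
  }
});

io.on('connection', function(socket) {
  /* ... */
});

server.listen(3000);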
The next argument is a callback method that accepts two arguments: an error object and a Boolean value. The next callback tells Socket.io whether or not to proceed with the handshake process, so if you pass an error object or a false value to the next method, Socket.io will not initiate the socket connection. Now that you have a basic understanding of how handshaking works, it is time to discuss the Socket.io client object.

The Socket.io client object

The Socket.io client object is responsible for implementing the browser's socket communication with the Socket.io server. You start by including the Socket.io client JavaScript file, which is served by the Socket.io server. The Socket.io JavaScript file exposes an io() method that connects to the Socket.io server and creates the client socket object. A simple implementation of the socket client is as follows:

<script src="/socket.io/socket.io.js"></script>
<script>
var socket = io();
socket.on('connect', function() {
  /* ... */
});
</script>

Notice the default URL for the Socket.io client object. Although this can be altered, you can usually leave it like this and just include the file from the default Socket.io path. Another thing you should notice is that the io() method will automatically try to connect to the default base path when executed with no arguments; however, you can also pass a different server URL as an argument.

As you can see, the socket client is much easier to implement, so we can move on to discuss how Socket.io handles real-time communication using events.

Socket.io events

To handle the communication between the client and the server, Socket.io uses a structure that mimics the WebSockets protocol and fires event messages across the server and client objects. There are two types of events: system events, which indicate the socket connection status, and custom events, which you'll use to implement your business logic.

The system events on the socket server are as follows:

io.on('connection', ...): This is emitted when a new socket is connected
socket.on('message', ...): This is emitted when a message is sent using the socket.send() method
socket.on('disconnect', ...): This is emitted when the socket is disconnected

The system events on the client are as follows:

socket.io.on('open', ...): This is emitted when the socket client opens a connection with the server
socket.io.on('connect', ...): This is emitted when the socket client is connected to the server
socket.io.on('connect_timeout', ...): This is emitted when the socket client connection with the server times out
socket.io.on('connect_error', ...): This is emitted when the socket client fails to connect with the server
socket.io.on('reconnect_attempt', ...): This is emitted when the socket client tries to reconnect with the server
socket.io.on('reconnect', ...): This is emitted when the socket client is reconnected to the server
socket.io.on('reconnect_error', ...): This is emitted when a reconnection attempt by the socket client fails
socket.io.on('reconnect_failed', ...): This is emitted when the socket client cannot reconnect with the server after all reconnection attempts fail
socket.io.on('close', ...): This is emitted when the socket client closes the connection with the server

Handling events

While system events help us with connection management, the real magic of Socket.io lies in using custom events. In order to do so, Socket.io exposes two methods, both on the client and server objects.
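Before looking at those two methods, here is a small sketch (not taken from the book) that simply wires up a few of the client system events listed above for logging purposes:

<script src="/socket.io/socket.io.js"></script>
<script>
var socket = io();

// Fired once the socket client is connected to the server
socket.on('connect', function() {
  console.log('connected to the server');
});

// Connection lifecycle events are exposed on the socket.io (manager) object
socket.io.on('reconnect_attempt', function() {
  console.log('trying to reconnect...');
});

socket.io.on('reconnect', function() {
  console.log('reconnected to the server');
});

socket.io.on('close', function() {
  console.log('connection to the server was closed');
});
</script>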
The first method is the on() method, which binds event handlers to events, and the second is the emit() method, which is used to fire events between the server and client objects.

An implementation of the on() method on the socket server is very simple:

var app = require('express')();
var server = require('http').Server(app);
var io = require('socket.io')(server);
io.on('connection', function(socket){
  socket.on('customEvent', function(customEventData) {
    /* ... */
  });
});
server.listen(3000);

In the preceding code, you bound an event listener to the customEvent event. The event handler is called when the socket client object emits the customEvent event. Notice how the event handler accepts the customEventData argument that is passed to it from the socket client object.

An implementation of the on() method on the socket client is also straightforward:

<script src="/socket.io/socket.io.js"></script>
<script>
var socket = io();
socket.on('customEvent', function(customEventData) {
  /* ... */
});
</script>

This time the event handler is called when the socket server emits the customEvent event that sends customEventData to the socket client event handler. Once you set your event handlers, you can use the emit() method to send events from the socket server to the socket client and vice versa.

Emitting events

On the socket server, the emit() method is used to send events to a single socket client or a group of connected socket clients. The emit() method can be called from the connected socket object, which will send the event to a single socket client, as follows:

io.on('connection', function(socket){
  socket.emit('customEvent', customEventData);
});

The emit() method can also be called from the io object, which will send the event to all connected socket clients, as follows:

io.on('connection', function(socket){
  io.emit('customEvent', customEventData);
});

Another option is to send the event to all connected socket clients except the sender using the broadcast property, as shown in the following lines of code:

io.on('connection', function(socket){
  socket.broadcast.emit('customEvent', customEventData);
});

On the socket client, things are much simpler. Since the socket client is only connected to the socket server, the emit() method will only send the event to the socket server:

var socket = io();
socket.emit('customEvent', customEventData);

Although these methods allow you to switch between personal and global events, they still lack the ability to send events to a group of connected socket clients. Socket.io offers two options to group sockets together: namespaces and rooms.

Socket.io namespaces

In order to easily control socket management, Socket.io allows developers to split socket connections according to their purpose using namespaces. So instead of creating different socket servers for different connections, you can just use the same server to create different connection endpoints. This means that socket communication can be divided into groups, which will then be handled separately.

Socket.io server namespaces

To create a socket server namespace, you will need to use the socket server's of() method, which returns a socket namespace. Once you obtain the socket namespace, you can use it the same way you use the socket server object:

var app = require('express')();
var server = require('http').Server(app);
var io = require('socket.io')(server);
io.of('/someNamespace').on('connection', function(socket){
  socket.on('customEvent', function(customEventData) {
    /* ... */
  });
});
io.of('/someOtherNamespace').on('connection', function(socket){
  socket.on('customEvent', function(customEventData) {
    /* ... */
  });
});
server.listen(3000);

In fact, when you use the io object, Socket.io actually uses a default empty namespace as follows:

io.on('connection', function(socket){
  /* ... */
});

The preceding lines of code are equivalent to this:

io.of('').on('connection', function(socket){
  /* ... */
});

Socket.io client namespaces

On the socket client, the implementation is a little different:

<script src="/socket.io/socket.io.js"></script>
<script>
var someSocket = io('/someNamespace');
someSocket.on('customEvent', function(customEventData) {
  /* ... */
});
var someOtherSocket = io('/someOtherNamespace');
someOtherSocket.on('customEvent', function(customEventData) {
  /* ... */
});
</script>

As you can see, you can use multiple namespaces in the same application without much effort. However, once sockets are connected to different namespaces, you will not be able to send an event to all these namespaces at once. This means that namespaces are not very good for more dynamic grouping logic. For this purpose, Socket.io offers a different feature called rooms.

Socket.io rooms

Socket.io rooms allow you to partition connected sockets into different groups in a dynamic way. Connected sockets can join and leave rooms, and Socket.io provides you with a clean interface to manage rooms and emit events to the subset of sockets in a room. The rooms functionality is handled solely on the socket server but can easily be exposed to the socket client.

Joining and leaving rooms

Joining a room is handled using the socket's join() method, while leaving a room is handled using the leave() method. So, a simple subscription mechanism can be implemented as follows:

io.on('connection', function(socket) {
  socket.on('join', function(roomData) {
    socket.join(roomData.roomName);
  });
  socket.on('leave', function(roomData) {
    socket.leave(roomData.roomName);
  });
});

Notice that the join() and leave() methods both take the room name as the first argument.

Emitting events to rooms

To emit events to all the sockets in a room, you will need to use the in() method. So, emitting an event to all socket clients who joined a room is quite simple and can be achieved with the help of the following code snippet:

io.on('connection', function(socket){
  io.in('someRoom').emit('customEvent', customEventData);
});

Another option is to send the event to all connected socket clients in a room except the sender by using the broadcast property and the to() method:

io.on('connection', function(socket){
  socket.broadcast.to('someRoom').emit('customEvent', customEventData);
});

This pretty much covers the simple yet powerful rooms functionality of Socket.io. In the next section, you will learn how to implement Socket.io in your MEAN application, and more importantly, how to use the Passport session to identify users in the Socket.io session. While we covered most of Socket.io's features, you can learn more about Socket.io by visiting the official project page at https://socket.io.

Summary

In this article, you learned how the Socket.io module works. You went over the key features of Socket.io and learned how the server and client communicate. You configured your Socket.io server and learned how to integrate it with your Express application. You also used the Socket.io handshake configuration to integrate the Passport session.
In the end, you built a fully functional chat example and learned how to wrap the Socket.io client with an AngularJS service. Resources for Article: Further resources on this subject: Creating a RESTful API [article] Angular Zen [article] Digging into the Architecture [article]


Visualization as a Tool to Understand Data

Packt
22 Sep 2014
23 min read
In this article by Nazmus Saquib, the author of Mathematica Data Visualization, we will look at a few simple examples that demonstrate the importance of data visualization. We will then discuss the types of datasets that we will encounter over the course of this book, and learn about the Mathematica interface to get ourselves warmed up for coding. (For more resources related to this topic, see here.) In the last few decades, the quick growth in the volume of information we produce and the capacity of digital information storage have opened a new door for data analytics. We have moved on from the age of terabytes to that of petabytes and exabytes. Traditional data analysis is now augmented with the term big data analysis, and computer scientists are pushing the bounds for analyzing this huge sea of data using statistical, computational, and algorithmic techniques. Along with the size, the types and categories of data have also evolved. Along with the typical and popular data domain in Computer Science (text, image, and video), graphs and various categorical data that arise from Internet interactions have become increasingly interesting to analyze. With the advances in computational methods and computing speed, scientists nowadays produce an enormous amount of numerical simulation data that has opened up new challenges in the field of Computer Science. Simulation data tends to be structured and clean, whereas data collected or scraped from websites can be quite unstructured and hard to make sense of. For example, let's say we want to analyze some blog entries in order to find out which blogger gets more follows and referrals from other bloggers. This is not as straightforward as getting some friends' information from social networking sites. Blog entries consist of text and HTML tags; thus, a combination of text analytics and tag parsing, coupled with a careful observation of the results would give us our desired outcome. Regardless of whether the data is simulated or empirical, the key word here is observation. In order to make intelligent observations, data scientists tend to follow a certain pipeline. The data needs to be acquired and cleaned to make sure that it is ready to be analyzed using existing tools. Analysis may take the route of visualization, statistics, and algorithms, or a combination of any of the three. Inference and refining the analysis methods based on the inference is an iterative process that needs to be carried out several times until we think that a set of hypotheses is formed, or a clear question is asked for further analysis, or a question is answered with enough evidence. Visualization is a very effective and perceptive method to make sense of our data. While statistics and algorithmic techniques provide good insights about data, an effective visualization makes it easy for anyone with little training to gain beautiful insights about their datasets. The power of visualization resides not only in the ease of interpretation, but it also reveals visual trends and patterns in data, which are often hard to find using statistical or algorithmic techniques. It can be used during any step of the data analysis pipeline—validation, verification, analysis, and inference—to aid the data scientist. How have you visualized your data recently? If you still have not, it is okay, as this book will teach you exactly that. 
However, if you had the opportunity to play with any kind of data already, I want you to take a moment and think about the techniques you used to visualize your data so far. Make a list of them. Done? Do you have 2D and 3D plots, histograms, bar charts, and pie charts in the list? If yes, excellent! We will learn how to style your plots and make them more interactive using Mathematica. Do you have chord diagrams, graph layouts, word cloud, parallel coordinates, isosurfaces, and maps somewhere in that list? If yes, then you are already familiar with some modern visualization techniques, but if you have not had the chance to use Mathematica as a data visualization language before, we will explore how visualization prototypes can be built seamlessly in this software using very little code. The aim of this book is to teach a Mathematica beginner the data-analysis and visualization powerhouse built into Mathematica, and at the same time, familiarize the reader with some of the modern visualization techniques that can be easily built with Mathematica. We will learn how to load, clean, and dissect different types of data, visualize the data using Mathematica's built-in tools, and then use the Mathematica graphics language and interactivity functions to build prototypes of a modern visualization. The importance of visualization Visualization has a broad definition, and so does data. The cave paintings drawn by our ancestors can be argued as visualizations as they convey historical data through a visual medium. Map visualizations were commonly used in wars since ancient times to discuss the past, present, and future states of a war, and to come up with new strategies. Astronomers in the 17th century were believed to have built the first visualization of their statistical data. In the 18th century, William Playfair invented many of the popular graphs we use today (line, bar, circle, and pie charts). Therefore, it appears as if many, since ancient times, have recognized the importance of visualization in giving some meaning to data. To demonstrate the importance of visualization in a simple mathematical setting, consider fitting a line to a given set of points. Without looking at the data points, it would be unwise to try to fit them with a model that seemingly lowers the error bound. It should also be noted that sometimes, the data needs to be changed or transformed to the correct form that allows us to use a particular tool. Visualizing the data points ensures that we do not fall into any trap. The following screenshot shows the visualization of a polynomial as a circle: Figure1.1 Fitting a polynomial In figure 1.1, the points are distributed around a circle. Imagine we are given these points in a Cartesian space (orthogonal x and y coordinates), and we are asked to fit a simple linear model. There is not much benefit if we try to fit these points to any polynomial in a Cartesian space; what we really need to do is change the parameter space to polar coordinates. A 1-degree polynomial in polar coordinate space (essentially a circle) would nicely fit these points when they are converted to polar coordinates, as shown in figure 1.1. Visualizing the data points in more complicated but similar situations can save us a lot of trouble. The following is a screenshot of Anscombe's quartet: Figure1.2 Anscombe's quartet, generated using Mathematica Downloading the color images of this book We also provide you a PDF file that has color images of the screenshots/diagrams used in this book. 
The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/2999OT_coloredimages.PDF.

Anscombe's quartet (figure 1.2), named after the statistician Francis Anscombe, is a classic example of how simple data visualization like plotting can save us from making wrong statistical inferences. The quartet consists of four datasets that have nearly identical statistical properties (such as mean, variance, and correlation), and gives rise to the same linear model when a regression routine is run on these datasets. However, the second dataset does not really constitute a linear relationship; a spline would fit the points better. The third dataset (at the bottom-left corner of figure 1.2) actually has a different regression line, but the outlier exerts enough influence to force the same regression line on the data. The fourth dataset is not even a linear relationship, but the outlier enforces the same regression line again.

These two examples demonstrate the importance of "seeing" our data before we blindly run algorithms and statistics. Fortunately, for visualization scientists like us, the world of data types is quite vast. Every now and then, this gives us the opportunity to create new visual tools other than the traditional graphs, plots, and histograms. These visual signatures and tools serve the same purpose as the preceding graph plotting examples (to spy on and investigate data to infer valuable insights), but on different types of datasets other than just point clouds.

Another important use of visualization is to enable the data scientist to interactively explore the data. Two features make today's visualization tools very attractive: the ability to view data from different perspectives (viewing angles) and at different resolutions. These features help the investigator understand both the micro- and macro-level behavior of their dataset.

Types of datasets

There are many different types of datasets that a visualization scientist encounters in their work. This book's aim is to prepare an enthusiastic beginner to delve into the world of data visualization. Certainly, we will not comprehensively cover each and every visualization technique out there. Our aim is to learn to use Mathematica as a tool to create interactive visualizations. To achieve that, we will focus on a general classification of datasets that will determine which Mathematica functions and programming constructs we should learn in order to visualize the broad class of data covered in this book.

Tables

The table is one of the most common data structures in Computer Science. You might have already encountered this in a computer science, database, or even statistics course, but for the sake of completeness, we will describe the ways in which one could use this structure to represent different kinds of data. Consider the following empty table template as an example, with one column per attribute and one row per item:

          Attribute 1   Attribute 2   ...
Item 1
Item 2
Item 3

When storing datasets in tables, each row in the table represents an instance of the dataset, and each column represents an attribute of that data point. For example, a set of two-dimensional Cartesian vectors can be represented as a table with two attributes, where each row represents a vector, and the attributes are the x and y coordinates relative to an origin. For three-dimensional vectors or more, we could just increase the number of attributes accordingly.
Tables can be used to store more advanced forms of scientific, time series, and graph data. We will cover some of these datasets over the course of this book, so it is a good idea for us to get introduced to them now. Here, we explain the general concepts. Scalar fields There are many kinds of scientific dataset out there. In order to aid their investigations, scientists have created their own data formats and mathematical tools to analyze the data. Engineers have also developed their own visualization language in order to convey ideas in their community. In this book, we will cover a few typical datasets that are widely used by scientists and engineers. We will eventually learn how to create molecular visualizations and biomedical dataset exploration tools when we feel comfortable manipulating these datasets. In practice, multidimensional data (just like vectors in the previous example) is usually augmented with one or more characteristic variable values. As an example, let's think about how a physicist or an engineer would keep track of the temperature of a room. In order to tackle the problem, they would begin by measuring the geometry and the shape of the room, and put temperature sensors at certain places to measure the temperature. They will note the exact positions of those sensors relative to the room's coordinate system, and then, they will be all set to start measuring the temperature. Thus, the temperature of a room can be represented, in a discrete sense, by using a set of points that represent the temperature sensor locations and the actual temperature at those points. We immediately notice that the data is multidimensional in nature (the location of a sensor can be considered as a vector), and each data point has a scalar value associated with it (temperature). Such a discrete representation of multidimensional data is quite widely used in the scientific community. It is called a scalar field. The following screenshot shows the representation of a scalar field in 2D and 3D: Figure1.3 In practice, scalar fields are discrete and ordered Figure 1.3 depicts how one would represent an ordered scalar field in 2D or 3D. Each point in the 2D field has a well-defined x and y location, and a single temperature value gets associated with it. To represent a 3D scalar field, we can think of it as a set of 2D scalar field slices placed at a regular interval along the third dimension. Each point in the 3D field is a point that has {x, y, z} values, along with a temperature value. A scalar field can be represented using a table. We will denote each {x, y} point (for 2D) or {x, y, z} point values (for 3D) as a row, but this time, an additional attribute for the scalar value will be created in the table. Thus, a row will have the attributes {x, y, z, T}, where T is the temperature associated with the point defined by the x, y, and z coordinates. This is the most common representation of scalar fields. A widely used visualization technique to analyze scalar fields is to find out the isocontours or isosurfaces of interest. However, for now, let's take a look at the kind of application areas such analysis will enable one to pursue. Instead of temperature, one could think of associating regularly spaced points with any relevant scalar value to form problem-specific scalar fields. In an electrochemical simulation, it is important to keep track of the charge density in the simulation space. Thus, the chemist would create a scalar field with charge values at specific points. 
For an aerospace engineer, it is quite important to understand how air pressure varies across airplane wings; they would keep track of the pressure by forming a scalar field of pressure values. Scalar field visualization is very important in many other significant areas, ranging from biomedical analysis to particle physics. In this book, we will cover how to visualize this type of data using Mathematica.

Time series

Another widely used data type is the time series. A time series is a sequence of data points that are measured, usually over a uniform interval of time. Time series arise in many fields, but in today's world, they are mostly known for their applications in Economics and Finance. Other than these, they are frequently used in statistics, weather prediction, signal processing, astronomy, and so on. It is not the purpose of this book to describe the theory and mathematics of time series data. However, we will cover some of Mathematica's excellent capabilities for visualizing time series, and in the course of this book, we will construct our own visualization tool to view time series data.

Time series can be easily represented using tables. Each row of the time series table will represent one point in the series, with one attribute denoting the timestamp (the time at which the data point was recorded) and the other attribute storing the actual data value. If the starting time and the time interval are known, then we can get rid of the time attribute and simply store the data value in each row. The actual timestamp of each value can be calculated using the initial time and time interval.

Images and videos can be represented as tables too, with pixel-intensity values occupying each entry of the table. As we focus on visualization and not image processing, we will skip those types of data.

Graphs

Nowadays, graphs arise in all contexts of computer science and social science. This particular data structure provides a way to convert real-world problems into a set of entities and relationships. Once we have a graph, we can use a plethora of graph algorithms to find beautiful insights about the dataset. Technically, a graph can be stored as a table. However, Mathematica has its own graph data structure, so we will stick to its norm. Sometimes, visualizing the graph structure reveals quite a lot of hidden information. Graph visualization itself is a challenging problem, and is an active research area in computer science. A proper visualization layout, along with proper color maps and size distribution, can produce very useful outputs.

Text

The most common form of data that we encounter everywhere is text. Mathematica does not provide any specific visualization package for state-of-the-art text visualization methods.

Cartographic data

As mentioned before, map visualization is one of the most ancient forms of visualization known to us. Nowadays, with the advent of GPS, smartphones, and publicly available country-based data repositories, maps provide an excellent way to contrast and compare different countries, cities, or even communities. Cartographic data comes in various forms. A common form of a single data item is one that includes latitude, longitude, location name, and an attribute (usually numerical) that records a relevant quantity. However, instead of a latitude and longitude coordinate, we may be given a set of polygons that describe the geographical shape of the place. The attributable quantity may not be numerical, but rather something qualitative, like text.
Thus, there is really no standard form that one can expect when dealing with cartographic data. Fortunately, Mathematica provides us with excellent data-mining and dissecting capabilities to build custom formats out of the data available to us. . Mathematica as a tool for visualization At this point, you might be wondering why Mathematica is suited for visualizing all the kinds of datasets that we have mentioned in the preceding examples. There are many excellent tools and packages out there to visualize data. Mathematica is quite different from other languages and packages because of the unique set of capabilities it presents to its user. Mathematica has its own graphics language, with which graphics primitives can be interactively rendered inside the worksheet. This makes Mathematica's capability similar to many widely used visualization languages. Mathematica provides a plethora of functions to combine these primitives and make them interactive. Speaking of interactivity, Mathematica provides a suite of functions to interactively display any of its process. Not only visualization, but any function or code evaluation can be interactively visualized. This is particularly helpful when managing and visualizing big datasets. Mathematica provides many packages and functions to visualize the kinds of datasets we have mentioned so far. We will learn to use the built-in functions to visualize structured and unstructured data. These functions include point, line, and surface plots; histograms; standard statistical charts; and so on. Other than these, we will learn to use the advanced functions that will let us build our own visualization tools. Another interesting feature is the built-in datasets that this software provides to its users. This feature provides a nice playground for the user to experiment with different datasets and visualization functions. From our discussion so far, we have learned that visualization tools are used to analyze very large datasets. While Mathematica is not really suited for dealing with petabytes or exabytes of data (and many other popularly used visualization tools are not suited for that either), often, one needs to build quick prototypes of such visualization tools using smaller sample datasets. Mathematica is very well suited to prototype such tools because of its efficient and fast data-handling capabilities, along with its loads of convenient functions and user-friendly interface. It also supports GPU and other high-performance computing platforms. Although it is not within the scope of this book, a user who knows how to harness the computing power of Mathematica can couple that knowledge with visualization techniques to build custom big data visualization solutions. Another feature that Mathematica presents to a data scientist is the ability to keep the workflow within one worksheet. In practice, many data scientists tend to do their data analysis with one package, visualize their data with another, and export and present their findings using something else. Mathematica provides a complete suite of a core language, mathematical and statistical functions, a visualization platform, and versatile data import and export features inside a single worksheet. This helps the user focus on the data instead of irrelevant details. By now, I hope you are convinced that Mathematica is worth learning for your data-visualization needs. 
If you still do not believe me, I hope I will be able to convince you again at the end of the book, when we will be done developing several visualization prototypes, each requiring only few lines of code! Getting started with Mathematica We will need to know a few basic Mathematica notebook essentials. Assuming you already have Mathematica installed on your computer, let's open a new notebook by navigating to File|New|Notebook, and do the following experiments. Creating and selecting cells In Mathematica, a chunk of code or any number of mathematical expressions can be written within a cell. Each cell in the notebook can be evaluated to see the output immediately below it. To start a new cell, simply start typing at the position of the blinking cursor. Each cell can be selected by clicking on the respective rightmost bracket. To select multiple cells, press Ctrl + right-mouse button in Windows or Linux (or cmd + right-mouse button on a Mac) on each of the cells. The following screenshot shows several cells selected together, along with the output from each cell: Figure1.4 Selecting and evaluating cells in Mathematica We can place a new cell in between any set of cells in order to change the sequence of instruction execution. Use the mouse to place the cursor in between two cells, and start typing your commands to create a new cell. We can also cut, copy, and paste cells by selecting them and applying the usual shortcuts (for example, Ctrl + C, Ctrl + X, and Ctrl + V in Windows/Linux, or cmd + C, cmd + X, and cmd + V in Mac) or using the Edit menu bar. In order to delete cell(s), select the cell(s) and press the Delete key. Evaluating a cell A cell can be evaluated by pressing Shift + Enter. Multiple cells can be selected and evaluated in the same way. To evaluate the full notebook, press Ctrl + A (to select all the cells) and then press Shift + Enter. In this case, the cells will be evaluated one after the other in the sequence in which they appear in the notebook. To see examples of notebooks filled with commands, code, and mathematical expressions, you can open the notebooks supplied with this article, which are the polar coordinates fitting and Anscombe's quartet examples, and select each cell (or all of them) and evaluate them. If we evaluate a cell that uses variables declared in a previous cell, and the previous cell was not already evaluated, then we may get errors. It is possible that Mathematica will treat the unevaluated variables as a symbolic expression; in that case, no error will be displayed, but the results will not be numeric anymore. Suppressing output from a cell If we don't wish to see the intermediate output as we load data or assign values to variables, we can add a semicolon (;) at the end of each line that we want to leave out from the output. Cell formatting Mathematica input cells treat everything inside them as mathematical and/or symbolic expressions. By default, every new cell you create by typing at the horizontal cursor will be an input expression cell. However, you can convert the cell to other formats for convenient typesetting. In order to change the format of cell(s), select the cell(s) and navigate to Format|Style from the menu bar, and choose a text format style from the available options. You can add mathematical symbols to your text by selecting Palettes|Basic Math Assistant. Note that evaluating a text cell will have no effect/output. Commenting We can write any comment in a text cell as it will be ignored during the evaluation of our code. 
However, if we would like to write a comment inside an input cell, we use the (* operator to open a comment and the *) operator to close it, as shown in the following code snippet: (* This is a comment *) The shortcut Ctrl + / (cmd + / in Mac) is used to comment/uncomment a chunk of code too. This operation is also available in the menu bar. Downloading the example code You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. Aborting evaluation We can abort the currently running evaluation of a cell by navigating to Evaluation|Abort Evaluation in the menu bar, or simply by pressing Alt + . (period). This is useful when you want to end a time-consuming process that you suddenly realize will not give you the correct results at the end of the evaluation, or end a process that might use up the available memory and shut down the Mathematica kernel. Further reading The history of visualization deserves a separate book, as it is really fascinating how the field has matured over the centuries, and it is still growing very strongly. Michael Friendly, from York University, published a historical development paper that is freely available online, titled Milestones in History of Data Visualization: A Case Study in Statistical Historiography. This is an entertaining compilation of the history of visualization methods. The book The Visual Display of Quantitative Information by Edward R. Tufte published by Graphics Press USA, is an excellent resource and a must-read for every data visualization practitioner. This is a classic book on the theory and practice of data graphics and visualization. Since we will not have the space to discuss the theory of visualization, the interested reader can consider reading this book for deeper insights. Summary In this article, we discussed the importance of data visualization in different contexts. We also introduced the types of dataset that will be visualized over the course of this book. The flexibility and power of Mathematica as a visualization package was discussed, and we will see the demonstration of these properties throughout the book with beautiful visualizations. Finally, we have taken the first step to writing code in Mathematica. Resources for Article: Further resources on this subject: Driving Visual Analyses with Automobile Data (Python) [article] Importing Dynamic Data [article] Interacting with Data for Dashboards [article]


Building, Publishing, and Supporting Your Force.com Application

Packt
22 Sep 2014
39 min read
In this article by Andrew Fawcett, the author of Force.com Enterprise Architecture, we will use the declarative aspects of the platform to quickly build an initial version of an application, which will give you an opportunity to get some hands-on experience with some of the packaging and installation features that are needed to release applications to subscribers. We will also take a look at the facilities available to publish your application through Salesforce AppExchange (equivalent to the Apple App Store) and finally provide end user support. (For more resources related to this topic, see here.)

We will then use this application as a basis for incrementally releasing new versions of the application to build our understanding of Enterprise Application Development. The following topics outline what we will achieve in this article:

Required organizations
Introducing the sample application
Package types and benefits
Creating your first managed package
Package dependencies and uploading
Introduction to AppExchange and creating listings
Installing and testing your package
Becoming a Salesforce partner and its benefits
Licensing
Supporting your application
Customer metrics
Trialforce and Test Drive

Required organizations

Several Salesforce organizations are required to develop, package, and test your application. You can sign up for these organizations at https://developer.salesforce.com/, though in due course, as your relationship with Salesforce becomes more formal, you will have the option of accessing their Partner Portal website to create organizations of different types and capabilities. We will discuss more on this later.

It's a good idea to have some kind of naming convention to keep track of the different organizations and logins. Use the following as a guide and create these organizations via https://developer.salesforce.com/. As stated earlier, these organizations will be used only for the purposes of learning and exploring:

myapp@packaging.my.com (Packaging): Though we will perform initial work in this org, it will eventually be reserved solely for assembling and uploading a release.
myapp@testing.my.com (Testing): In this org, we will install the application and test upgrades. You may want to create several of these in practice, via the Partner Portal website described later in this article.
myapp@dev.my.com (Developing): Later, we will shift development of the application into this org, leaving the packaging org to focus only on packaging.

You will have to substitute myapp and my.com (perhaps by reusing your company domain name to avoid naming conflicts) with your own values. Take note of the example username andyapp@packaging.andyinthecloud.com.

The following are other organization types that you will eventually need in order to manage the publication and licensing of your application:

Production / CRM Org: Your organization may already be using this org for managing contacts, leads, opportunities, cases, and other CRM objects. Make sure that you have the complete authority to make changes, if any, to this org since this is where you run your business. If you do not have such an org, you can request one via the Partner Program website described later in this article, by requesting (via a case) a CRM ISV org. Even if you choose to not fully adopt Salesforce for this part of your business, such an org is still required when it comes to utilizing the licensing aspects of the platform.
AppExchange Publishing Org (APO): This org is used to manage your use of AppExchange. We will discuss this a little later in this article. This org is actually the same Salesforce org you designate as your production org, where you conduct your sales and support activities from.
License Management Org (LMO): Within this organization, you can track who installs your application (as leads), the licenses you grant to them, and for how long. It is recommended that this is the same org as the APO described earlier.
Trialforce Management Org (TMO): Trialforce is a way to provide orgs with your preconfigured application data for prospective customers to try out your application before buying. It will be discussed later in this article.
Trialforce Source Org (TSO)

Typically, the LMO and APO can be the same as your primary Salesforce production org, which allows you to track all your leads and future opportunities in the same place. This leads to the rule of APO = LMO = production org, though neither of them should be your actual developer or test orgs. You can work with Salesforce support and your Salesforce account manager to plan and assign these orgs.

Introducing the sample application

For this article, we will use the world of Formula1 motor car racing as the basis for a packaged application that we will build together. Formula1 is for me the motor sport that is equivalent to Enterprise applications software, due to its scale and complexity. It is also a sport that I follow, both of which helped me when building the examples that we will use. We will refer to this application as FormulaForce, though please keep in mind Salesforce's branding policies when naming your own application, as they prevent the use of the word "Force" in company or product titles. This application will focus on the data collection aspects of the races, drivers, and their many statistics, utilizing platform features to structure, visualize, and process this data in both historic and current contexts.

For this article, we will create some initial Custom Objects with the fields detailed in the following list. Do not worry about creating any custom tabs just yet. You can use your preferred approach for creating these initial objects. Ensure that you are logged in to your packaging org.

Season__c: Name (text)
Race__c: Name (text), Season__c (Master-Detail to Season__c)
Driver__c: Name
Contestant__c: Name (Auto Number, CONTESTANT-{00000000}), Race__c (Master-Detail to Race__c), Driver__c (Lookup to Driver__c)

The following screenshot shows the preceding objects within the Schema Builder tool, available under the Setup menu:

Package types and benefits

A package is a container that holds your application components such as Custom Objects, Apex code, Apex triggers, Visualforce pages, and so on. This makes up your application. While there are other ways to move components between Salesforce orgs, a package provides a container that you can use for your entire application or to deliver optional features by leveraging so-called extension packages.

There are two types of packages, managed and unmanaged. Unmanaged packages result in the transfer of components from one org to another; however, the result is as if those components had been originally created in the destination org, meaning that they can be readily modified or even deleted by the administrator of that org. They are also not upgradable and are not particularly ideal from a support perspective.
Moreover, the Apex code that you write is also visible for all to see, so your Intellectual Property is at risk. Unmanaged packages can be used for sharing template components that are intended to be changed by the subscriber. If you are not using GitHub and the GitHub Salesforce Deployment Tool (https://github.com/afawcett/githubsfdeploy), they can also provide a means to share open source libraries to developers. Features and benefits of managed packages Managed packages have the following features that are ideal for distributing your application. The org where your application package is installed is referred to as a subscriber org, since users of this org are subscribing to the services your application provides: Intellectual Property (IP) protection: Users in the subscriber org cannot see your Apex source code, although they can see your Visualforce pages code and static resources. While the Apex code is hidden, JavaScript code is not, so you may want to consider using a minify process to partially obscure such code. The naming scope: Your component names are unique to your package throughout the utilization of a namespace. This means that even if you have object X in your application, and the subscriber has an object of the same name, they remain distinct. You will define a namespace later in this article. The governor scope: Code in your application executes within its own governor limit scope (such as DML and SOQL governors that are subject to passing Salesforce Security Review) and is not affected by other applications or code within the subscriber org. Note that some governors such as the CPU time governor are shared by the whole execution context (discussed in a later article) regardless of the namespace. Upgrades and versioning: Once the subscribers have started using your application, creating data, making configurations, and so on, you will want to provide upgrades and patches with new versions of your application. There are other benefits to managed packages, but these are only accessible after becoming a Salesforce Partner and completing the security review process; these benefits are described later in this article. Salesforce provides ISVForce Guide (otherwise known as the Packaging Guide) in which these topics are discussed in depth; bookmark it now! The following is the URL for ISVForce Guide: http://login.salesforce.com/help/pdfs/en/salesforce_packaging_guide.pdf. Creating your first managed package Packages are created in your packaging org. There can be only one managed package being developed in your packaging org (though additional unmanaged packages are supported, it is not recommended to mix your packaging org with them). You can also install other dependent managed packages and reference their components from your application. The steps to be performed are discussed in the following sections: Setting your package namespace Creating the package and assigning it to the namespace Adding components to the package Setting your package namespace An important decision when creating a managed package is the namespace; this is a prefix applied to all your components (Custom Objects, Visualforce pages, and so on) and is used by developers in subscriber orgs to uniquely qualify between your packaged components and others, even those from other packages. The namespace prefix is an important part of the branding of the application since it is implicitly attached to any Apex code or other components that you include in your package. 
It can be up to 15 characters, though I personally recommend that you keep it shorter than this, as it becomes hard to remember and leads to frustrating typos if you make it too complicated. I would also avoid underscore characters. It is a good idea to have a naming convention if you are likely to create more managed packages in the future (in different packaging orgs). The following is the format of an example naming convention:

[company acronym - 1 to 4 characters][package prefix - 1 to 4 characters]

For example, the ACME Corporation's Road Runner application might be named acmerr. When the namespace has not been set, the Packages page (accessed under the Setup menu, under the Create submenu) indicates that only unmanaged packages can be created. Click on the Edit button to begin a small wizard to enter your desired namespace. This can only be done once, and the namespace must be globally unique (meaning it cannot be set in any other org), much like a website domain name. The following screenshot shows the Packages page.

Once you have set the namespace, the preceding page should look like the following screenshot, with the only difference being the namespace prefix that you have used. You are now ready to create a managed package and assign it to the namespace.

Creating the package and assigning it to the namespace

Click on the New button on the Packages page and give your package a name (it can be changed later). Make sure to tick the Managed checkbox as well. Click on Save and return to the Packages page, which should now look like the following screenshot.

Adding components to the package

In the Packages page, click on the link to your package in order to view its details. From this page, you can manage the contents of your package and upload it. Click on the Add button to add the Custom Objects created earlier in this article. Note that you do not need to add any custom fields; these are added automatically. The following screenshot shows broadly what your Package Details page should look like at this stage.

When you review the components added to the package, you will see that some components can be removed while others cannot. This is because the platform implicitly adds some components for you as they are dependencies. As we progress, adding different component types, you will see this list grow automatically in some cases, while in others we must add components explicitly.

Extension packages

As the name suggests, extension packages extend or add to the functionality delivered by the existing packages they are based on, though they cannot change the base package contents. They can extend one or more base packages, and you can even have several layers of extension packages, though you may want to keep an eye on how extensively you use this feature, as inter-package dependencies can become quite complex to manage, especially during development. Extension packages are created in pretty much the same way as the process you've just completed (including requiring their own packaging org), except that the packaging org must also have the dependent packages installed in it. As code and Visualforce pages contained within extension packages make references to Custom Objects, fields, Apex code, and Visualforce pages present in base packages, the platform tracks these dependencies and the version of the base package present at the time the reference was made.
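As an illustration of how this version tracking surfaces in metadata, an Apex class in an extension package that references a base package carries a packageVersions entry in its accompanying metadata file, roughly as shown below. The basepkg namespace, the version numbers, and the API version are invented values for this sketch.

<?xml version="1.0" encoding="UTF-8"?>
<ApexClass xmlns="http://soap.sforce.com/2006/04/metadata">
    <apiVersion>30.0</apiVersion>
    <!-- Version of the base package this class was saved against -->
    <packageVersions>
        <majorNumber>1</majorNumber>
        <minorNumber>0</minorNumber>
        <namespace>basepkg</namespace>
    </packageVersions>
    <status>Active</status>
</ApexClass>

These are the same settings that the Versions tab manages for you through the user interface.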
When an extension package is installed, this dependency information ensures that the subscriber org must have the correct version (minimum) of the base packages installed before permitting the installation to complete. You can also manage the dependencies between extension packages and base packages yourself through the Versions tab or XML metadata for applicable components. Package dependencies and uploading Packages can have dependencies on platform features and/or other packages. You can review and manage these dependencies through the usage of the Package detail page and the use of dynamic coding conventions as described here. While some features of Salesforce are common, customers can purchase different editions and features according to their needs. Developer Edition organizations have access to most of these features for free. This means that as you develop your application, it is important to understand when and when not to use those features. By default, when referencing a certain Standard Object, field, or component type, you will generate a prerequisite dependency on your package, which your customers will need to have before they can complete the installation. Some Salesforce features, for example Multi-Currency or Chatter, have either a configuration or, in some cases, a cost impact to your users (different org editions). Carefully consider which features your package is dependent on. Most of the feature dependencies, though not all, are visible via the View Dependencies button on the Package details page (this information is also available on the Upload page, allowing you to make a final check). It is a good practice to add this check into your packaging procedures to ensure that no unwanted dependencies have crept in. Clicking on this button, for the package that we have been building in this article so far, confirms that there are no dependencies. Uploading the release and beta packages Once you have checked your dependencies, click on the Upload button. You will be prompted to give a name and version to your package. The version will be managed for you in subsequent releases. Packages are uploaded in one of two modes (beta or release). We will perform a release upload by selecting the Managed - Released option from the Release Type field, so make sure you are happy with the objects created in the earlier section of this article, as they cannot easily be changed after this point. Once you are happy with the information on the screen, click on the Upload button once again to begin the packaging process. Once the upload process completes, you will see a confirmation page as follows: Packages can be uploaded in one of two states as described here: Release packages can be installed into subscriber production orgs and also provide an upgrade path from previous releases. The downside is that you cannot delete the previously released components and change certain things such as a field's type. Changes to the components that are marked global, such as Apex Code and Visualforce components, are also restricted. While Salesforce is gradually enhancing the platform to provide the ability to modify certain released aspects, you need to be certain that your application release is stable before selecting this option. Beta packages cannot be installed into subscriber production orgs; you can install only into Developer Edition (such as your testing org), sandbox, or Partner Portal created orgs. 
Also, Beta packages cannot be upgraded once installed; hence, this is the reason why Salesforce does not permit their installation into production orgs. The key benefit is in the ability to continue to change new components of the release, to address bugs and features relating to user feedback. The ability to delete previously-published components (uploaded within a release package) is in pilot. It can be enabled through raising a support case with Salesforce Support. Once you have understood the full implications, they will enable it. We have simply added some Custom Objects. So, the upload should complete reasonably quickly. Note that what you're actually uploading to is AppExchange, which will be covered in the following sections. If you want to protect your package, you can provide a password (this can be changed afterwards). The user performing the installation will be prompted for it during the installation process. Optional package dependencies It is possible to make some Salesforce features and/or base package component references (Custom Objects and fields) an optional aspect of your application. There are two approaches to this, depending on the type of the feature. Dynamic Apex and Visualforce For example, the Multi-Currency feature adds a CurrencyIsoCode field to the standard and Custom Objects. If you explicitly reference this field, for example in your Apex or Visualforce pages, you will incur a hard dependency on your package. If you want to avoid this and make it a configuration option (for example) in your application, you can utilize dynamic Apex and Visualforce. Extension packages If you wish to package component types that are only available in subscriber orgs of certain editions, you can choose to include these in extension packages. For example, you may wish to support Professional Edition, which does not support record types. In this case, create an Enterprise Edition extension package for your application's functionality, which leverages the functionality from this edition. Note that you will need multiple testing organizations for each combination of features that you utilize in this way, to effectively test the configuration options or installation options that your application requires. Introduction to AppExchange and listings Salesforce provides a website referred to as AppExchange, which lets prospective customers find, try out, and install applications built using Force.com. Applications listed here can also receive ratings and feedback. You can also list your mobile applications on this site as well. In this section, I will be using an AppExchange package that I already own. The package has already gone through the process to help illustrate the steps that are involved. For this reason, you do not need to perform these steps; they can be revisited at a later phase in your development once you're happy to start promoting your application. Once your package is known to AppExchange, each time you click on the Upload button on your released package (as described previously), you effectively create a private listing. Private listings are not visible to the public until you decide to make them so. It gives you the chance to prepare any relevant marketing details and pricing information while final testing is completed. Note that you can still distribute your package to other Salesforce users or even early beta or pilot customers without having to make your listing public. 
In order to start building a listing, you need to log in to AppExchange using the login details you designated to your AppExchange Publishing Org (APO). Go to www.appexchange.com and click on Login in the banner at the top-right corner. This will present you with the usual Salesforce login screen. Once logged in, you should see something like this: Select the Publishing Console option from the menu, then click on the Create New Listing button and complete the steps shown in the wizard to associate the packaging org with AppExchange; once completed, you should see it listed. It's really important that you consistently log in to AppExchange using your APO user credentials. Salesforce will let you log in with other users. To make it easy to confirm, consider changing the user's display name to something like MyCompany Packaging. Though it is not a requirement to complete the listing steps, unless you want to try out the process yourself a little further to see the type of information required, you can delete any private listings that you created after you complete this app. Installing and testing your package When you uploaded your package earlier in this article, you will receive an e-mail with a link to install the package. If not, review the Versions tab on the Package detail page in your packaging org. Ensure that you're logged out and click on the link. When prompted, log in to your testing org. The installation process will start. A reduced screenshot of the initial installation page is shown in the following screenshot; click on the Continue button and follow the default installation prompts to complete the installation: Package installation covers the following aspects (once the user has entered the package password if one was set): Package overview: The platform provides an overview of the components that will be added or updated (if this is an upgrade) to the user. Note that due to the namespace assigned to your package, these will not overwrite existing components in the subscriber org created by the subscriber. Connected App and Remote Access: If the package contains components that represent connections to the services outside of the Salesforce services, the user is prompted to approve these. Approve Package API Access: If the package contains components that make use of the client API (such as JavaScript code), the user is prompted to confirm and/or configure this. Such components will generally not be called much; features such as JavaScript Remoting are preferred, and they leverage the Apex runtime security configured post install. Security configuration: In this step, you can determine the initial visibility of the components being installed (objects, pages, and so on). Selecting admin only or the ability to select Profiles to be updated. This option predates the introduction of permission sets, which permit post installation configuration. If you package profiles in your application, the user will need to remember to map these to the existing profiles in the subscriber org as per step 2. This is a one-time option, as the profiles in the package are not actually installed, only merged. I recommend that you utilize permission sets to provide security configurations for your application. These are installed and are much more granular in nature. When the installation is complete, navigate to the Installed Packages menu option under the Setup menu. 
Here, you can see confirmation of some of your package details such as namespace and version, as well as any licensing details, which will be discussed later in this article. It is also possible to provide a Configure link for your package, which will be displayed next to the package when installed and listed on the Installed Packages page in the subscriber org. Here, you can provide a Visualforce page to access configuration options and processes, for example. If you have enabled seat-based licensing, there will also be a Manage Licenses link to determine which users in the subscriber org have access to your package components such as tabs, objects, and Visualforce pages. Licensing, in general, is discussed in more detail later in this article.

Automating package installation

It is possible to automate some of the processes using the Salesforce Metadata API and associated tools, such as the Salesforce Migration Toolkit (available from the Tools menu under Setup), which can be run from the popular Apache Ant scripting environment. This can be useful if you want to automate the deployment of your packages to customers or test orgs. Options that require a user response, such as the security configuration, are not covered by automation; however, password-protected managed packages are supported. You can find more details on this by looking up the InstalledPackage component in the online help for the Salesforce Metadata API at https://www.salesforce.com/us/developer/docs/api_meta/. As an aid to performing this from Ant, a custom Ant task can be found in the sample code related to this article (see /lib/ant-salesforce.xml). The following /build.xml Ant script uninstalls and then reinstalls the package; note that the installation will also upgrade a package if the package is already installed:

<project name="FormulaForce" basedir=".">

    <!-- Downloaded from Salesforce Tools page under Setup -->
    <typedef
        uri="antlib:com.salesforce"
        resource="com/salesforce/antlib.xml"
        classpath="${basedir}/lib/ant-salesforce.jar"/>

    <!-- Import macros to install/uninstall packages -->
    <import file="${basedir}/lib/ant-salesforce.xml"/>

    <target name="package.installdemo">

        <uninstallPackage
            namespace="yournamespace"
            username="${sf.username}"
            password="${sf.password}"/>

        <installPackage
            namespace="yournamespace"
            version="1.0"
            username="${sf.username}"
            password="${sf.password}"/>

    </target>

</project>

You can try the preceding example with your testing org by replacing the namespace attribute values with the namespace you entered earlier in this article. Enter the following command, all on one line, from the folder that contains the build.xml file:

ant package.installdemo -Dsf.username=testorgusername -Dsf.password=testorgpasswordtestorgtoken

You can also use the Salesforce Metadata API to list packages installed in an org, for example, if you wanted to determine whether a dependent package needs to be installed or upgraded before sending an installation request. Finally, you can also uninstall packages if you wish.

Becoming a Salesforce partner and benefits

The Salesforce Partner Program has many advantages. The first place to visit is http://www.salesforce.com/partners/overview. You will want to focus on the areas of the site relating to being an Independent Software Vendor (ISV) partner. From here, you can click on Join. It is free to join, though you will want to read through the various agreements carefully, of course.
Once you wish to start listing a package and charging users for it, you will need to arrange billing details for Salesforce to take the various fees involved. Pay careful attention to the Standard Objects used in your package, as this will determine the license type required by your users and the overall cost to them in addition to your charges. Obviously, Salesforce would prefer your application to use as many features of the CRM application as possible, which may also be beneficial to you as a feature of your application, since it's an appealing immediate integration not found on other platforms, such as the ability to instantly integrate with accounts and contacts. If you're planning on using Standard Objects and are in doubt about the costs (as they do vary depending on the type), you can request a conversation with Salesforce to discuss this; this is something to keep in mind in the early stages. Once you have completed the signup process, you will gain access to the Partner Portal (your user will end with @partnerforce.com). You must log in to the specific site as opposed to the standard Salesforce login; currently, the URL is https://www.salesforce.com/partners/login. Starting from July 2014, the http://partners.salesforce.com URL provides access to the Partner Community. Logging in to this service using your production org user credentials is recommended. The following screenshot shows what the current Partner Portal home page looks like. Here you can see some of its key features: This is your primary place to communicate with Salesforce and also to access additional materials and announcements relevant to ISVs, so do keep checking often. You can raise cases and provide additional logins to other users in your organization, such as other developers who may wish to report issues or ask questions. There is also the facility to create test or developer orgs; here, you can choose the appropriate edition (Professional, Group, Enterprise, and others) you want to test against. You can also create Partner Developer Edition orgs from this option as well. These carry additional licenses and limits over the public's so-called Single Developer Editions orgs and are thus recommended for use only once you start using the Partner Portal. Note, however, that these orgs do expire, subject to either continued activity over 6 months or renewing the security review process (described in the following section) each year. Once you click on the create a test org button, there is a link on the page displayed that navigates to a table that describes the benefits, processes, and the expiry rules. Security review and benefits The following features require that a completed package release goes through a Salesforce-driven process known as the Security review, which is initiated via your listing when logged into AppExchange. Unless you plan to give your package away for free, there is a charge involved in putting your package through this process. However, the review is optional. There is nothing stopping you from distributing your package installation URL directly. However, you will not be able to benefit from the ability to list your new application on AppExchange for others to see and review. More importantly, you will also not have access to the following features to help you deploy, license, and support your application. 
The following is a list of the benefits you get once your package has passed the security review: Bypass subscriber org setup limits: Limits such as the number of tabs and Custom Objects are bypassed. This means that if the subscriber org has reached its maximum number of Custom Objects, your package will still install. This feature is sometimes referred to as Aloha. Without this, your package installation may fail. You can determine whether Aloha has been enabled via the Subscriber Overview page that comes with the LMA application, which is discussed in the next section. Licensing: You are able to utilize the Salesforce-provided License Management Application in your LMO (License Management Org as described previously). Subscriber support: With this feature, the users in the subscriber org can enable, for a specific period, a means for you to log in to their org (without exchanging passwords), reproduce issues, and enable much more detailed debug information such as Apex stack traces. In this mode, you can also see custom settings that you have declared as protected in your package, which are useful for enabling additional debug or advanced features. Push upgrade: Using this feature, you can automatically apply upgrades to your subscribers without their manual intervention, either directly by you or on a scheduled basis. You may use this for applying either smaller bug fixes that don't affect the Custom Objects or APIs or deploy full upgrades. The latter requires careful coordination and planning with your subscribers to ensure that changes and new features are adopted properly. Salesforce asks you to perform an automated security scan of your software via a web page (http://security.force.com/security/tools/forcecom/scanner). This service can be quite slow depending on how many scans are in the queue. Another option is to obtain the Eclipse plugin from the actual vendor CheckMarx at http://www.checkmarx.com, which runs the same scan but allows you to control it locally. Finally, for the ultimate confidence as you develop your application, Salesforce can provide a license to integrate it into your Continuous Integration (CI) build system. Keep in mind that if you make any callouts to external services, Salesforce will also most likely ask you and/or the service provider to run a BURP scanner, to check for security flaws. Make sure you plan a reasonable amount of time (at least 2–3 weeks, in my experience) to go through the security review process; it is a must to initially list your package, though if it becomes an issue, you have the option of issuing your package install URL directly to initial customers and early adopters. Licensing Once you have completed the security review, you are able to request through raising support cases via the Partner Portal to have access to the LMA. Once this is provided by Salesforce, use the installation URL to install it like any other package into your LMO. If you have requested a CRM for ISV's org (through a case raised within the Partner Portal), you may find the LMA already installed. The following screenshot shows the main tabs of the License Management Application once installed: In this section, I will use a package that I already own and have already taken through the process to help illustrate the steps that are involved. For this reason, you do not need to perform these steps. After completing the installation, return to AppExchange and log in. Then, locate your listing in Publisher Console under Uploaded Packages. 
Next to your package, there will be a Manage Licenses link. The first time after clicking on this link, you will be asked to connect your package to your LMO org. Once this is done, you will be able to define the license requirements for your package. The following example shows the license for a free package, with an immediately active license for all users in the subscriber org: In most cases, for packages that you intend to charge for, you would select a free trial rather than setting the license default to active immediately. For paid packages, select a license length, unless perhaps it's a one-off charge, and then select the license that does not expire. Finally, if you're providing a trial license, you need to consider carefully the default number of seats (users); users may need to be able to assign themselves different roles in your application to get the full experience. While licensing is expressed at a package level currently, it is very likely that more granular licensing around the modules or features in your package will be provided by Salesforce in the future. This will likely be driven by the Permission Sets feature. As such, keep in mind a functional orientation to your Permission Set design. The Manage Licenses link is shown on the Installed Packages page next to your package if you configure a number of seats against the license. The administrator in the subscriber org can use this page to assign applicable users to your package. The following screenshot shows how your installed package looks to the administrator when the package has licensing enabled: Note that you do not need to keep reapplying the license requirements for each version you upload; the last details you defined will be carried forward to new versions of your package until you change them. Either way, these details can also be completely overridden on the License page of the LMA application as well. You may want to apply a site-wide (org-wide) active license to extensions or add-on packages. This allows you to at least track who has installed such packages even though you don't intend to manage any licenses around them, since you are addressing licensing on the main package. The Licenses tab and managing customer licenses The Licenses tab provides a list of individual license records that are automatically generated when the users install your package into their orgs. Salesforce captures this action and creates the relevant details, including Lead information, and also contains contact details of the organization and person who performed the install, as shown in the following screenshot: From each of these records, you can modify the current license details to extend the expiry period or disable the application completely. If you do this, the package will remain installed with all of its data. However, none of the users will be able to access the objects, Apex code, or pages, not even the administrator. You can also re-enable the license at any time. The following screenshot shows the License Edit section: The Subscribers tab The Subscribers tab lists all your customers or subscribers (it shows their Organization Name from the company profile) that have your packages installed (only those linked via AppExchange). This includes their organization ID, edition (Developer, Enterprise, or others), and also the type of instance (sandbox or production). The Subscriber Overview page When you click on Organization Name from the list in this tab, you are taken to the Subscriber Overview page. 
This page is sometimes known as the Partner Black Tab. This page is packed with useful information such as the contact details (also seen via the Leads tab) and the login access that may have been granted (we will discuss this in more detail in the next section), as well as which of your packages they have installed, its current licensed status, and when it was installed. The following is a screenshot of the Subscriber Overview page: How licensing is enforced in the subscriber org Licensing is enforced in one of two ways, depending on the execution context in which your packaged Custom Objects, fields, and Apex code are being accessed from. The first context is where a user is interacting directly with your objects, fields, tabs, and pages via the user interface or via the Salesforce APIs (Partner and Enterprise). If the user or the organization is not licensed for your package, these will simply be hidden from view, and in the case of the API, return an error. Note that administrators can still see packaged components under the Setup menu. The second context is the type of access made from Apex code, such as an Apex trigger or controller, written by the customers themselves or from within another package. This indirect way of accessing your package components is permitted if the license is site (org) wide or there is at least one user in the organization that is allocated a seat. This condition means that even if the current user has not been assigned a seat (via the Manage Licenses link), they are still accessing your application's objects and code, although indirectly, for example, via a customer-specific utility page or Apex trigger, which automates the creation of some records or defaulting of fields in your package. Your application's Apex triggers (for example, the ones you might add to Standard Objects) will always execute even if the user does not have a seat license, as long as there is just one user seat license assigned in the subscriber org to your package. However, if that license expires, the Apex trigger will no longer be executed by the platform, until the license expiry is extended. Providing support Once your package has completed the security review, additional functionality for supporting your customers is enabled. Specifically, this includes the ability to log in securely (without exchanging passwords) to their environments and debug your application. When logged in this way, you can see everything the user sees in addition to extended Debug Logs that contain the same level of details as they would in a developer org. First, your customer enables access via the Grant Account Login page. This time however, your organization (note that this is the Company Name as defined in the packaging org under Company Profile) will be listed as one of those available in addition to Salesforce Support. The following screenshot shows the Grant Account Login page: Next, you log in to your LMO and navigate to the Subscribers tab as described. Open Subscriber Overview for the customer, and you should now see the link to Login as that user. From this point on, you can follow the steps given to you by your customer and utilize the standard Debug Log and Developer Console tools to capture the debug information you need. The following screenshot shows a user who has been granted login access via your package to their org: This mode of access also permits you to see protected custom settings if you have included any of those in your package. 
If you have not encountered these before, it's well worth researching them as they provide an ideal way to enable and disable debug, diagnostic, or advanced configurations that you don't want your customers to normally see. Customer metrics Salesforce has started to expose information relating to the usage of your package components in the subscriber orgs since the Spring '14 release of the platform. This enables you to report what Custom Objects and Visualforce pages your customers are using and more importantly those they are not. This information is provided by Salesforce and cannot be opted out by the customer. At the time of writing, this facility is in pilot and needs to be enabled by Salesforce Support. Once enabled, the MetricsDataFile object is available in your production org and will receive a data file periodically that contains the metrics records. The Usage Metrics Visualization application can be found by searching on AppExchange and can help with visualizing this information. Trialforce and Test Drive Large enterprise applications often require some consultation with customers to tune and customize to their needs after the initial package installation. If you wish to provide trial versions of your application, Salesforce provides a means to take snapshots of the results of this installation and setup process, including sample data. You can then allow prospects that visit your AppExchange listing or your website to sign up to receive a personalized instance of a Salesforce org based on the snapshot you made. The potential customers can then use this to fully explore the application for a limited duration until they sign up to be a paid customer from the trial version. Such orgs will eventually expire when the Salesforce trial period ends for the org created (typically 14 days). Thus, you should keep this in mind when setting the default expiry on your package licensing. The standard approach is to offer a web form for the prospect to complete in order to obtain the trial. Review the Providing a Free Trial on your Website and Providing a Free Trial on AppExchange sections of the ISVForce Guide for more on this. You can also consider utilizing the Signup Request API, which gives you more control over how the process is started and the ability to monitor it, such that you can create the lead records yourself. You can find out more about this in the Creating Signups using the API section in the ISVForce Guide. Alternatively, if the prospect wishes to try your package in their sandbox environment for example, you can permit them to install the package directly either from AppExchange or from your website. In this case, ensure that you have defined a default expiry on your package license as described earlier. In this scenario, you or the prospect will have to perform the setup steps after installation. Finally, there is a third option called Test Drive, which does not create a new org for the prospect on request, but does require you to set up an org with your application, preconfigure it, and then link it to your listing via AppExchange. Instead of the users completing a signup page, they click on the Test Drive button on your AppExchange listing. This logs them into your test drive org as a read-only user. Because this is a shared org, the user experience and features you can offer to users is limited to those that mainly read information. I recommend that you consider Trialforce over this option unless there is some really compelling reason to use it. 
When defining your listing in AppExchange, the Leads tab can be used to configure the creation of lead records for trials, test drives, and other activities on your listing. Enabling this will result in a form being presented to the user before accessing these features on your listing. If you provide access to trials through signup forms on your website for example, lead information will not be captured. Summary This article has given you a practical overview of the initial package creation process through installing it into another Salesforce organization. While some of the features discussed cannot be fully exercised until you're close to your first release phase, you can now head to development with a good understanding of how early decisions such as references to Standard Objects are critical to your licensing and cost decisions. It is also important to keep in mind that while tools such as Trialforce help automate the setup, this does not apply to installing and configuring your customer environments. Thus, when making choices regarding configurations and defaults in your design, keep in mind the costs to the customer during the implementation cycle. Make sure you plan for the security review process in your release cycle (the free online version has a limited bandwidth) and ideally integrate it into your CI build system (a paid facility) as early as possible, since the tool not only monitors security flaws but also helps report breaches in best practices such as lack of test asserts and SOQL or DML statements in loops. As you revisit the tools covered in this article, be sure to reference the excellent ISVForce Guide at http://www.salesforce.com/us/developer/docs/packagingGuide/index.htm for the latest detailed steps and instructions on how to access, configure, and use these features. Resources for Article: Further resources on this subject: Salesforce CRM Functions [Article] Force.com: Data Management [Article] Configuration in Salesforce CRM [Article]

Improving Code Quality

Packt
22 Sep 2014
18 min read
In this article by Alexandru Vlăduţu, author of Mastering Web Application Development with Express, we are going to see how to test Express applications and how to improve the code quality of our code by leveraging existing NPM modules. (For more resources related to this topic, see here.) Creating and testing an Express file-sharing application Now, it's time to see how to develop and test an Express application with what we have learned previously. We will create a file-sharing application that allows users to upload files and password-protect them if they choose to. After uploading the files to the server, we will create a unique ID for that file, store the metadata along with the content (as a separate JSON file), and redirect the user to the file's information page. When trying to access a password-protected file, an HTTP basic authentication pop up will appear, and the user will have to only enter the password (no username in this case). The package.json file, so far, will contain the following code: { "name": "file-uploading-service", "version": "0.0.1", "private": true, "scripts": { "start": "node ./bin/www" }, "dependencies": { "express": "~4.2.0", "static-favicon": "~1.0.0", "morgan": "~1.0.0", "cookie-parser": "~1.0.1", "body-parser": "~1.0.0", "debug": "~0.7.4", "ejs": "~0.8.5", "connect-multiparty": "~1.0.5", "cuid": "~1.2.4", "bcrypt": "~0.7.8", "basic-auth-connect": "~1.0.0", "errto": "~0.2.1", "custom-err": "0.0.2", "lodash": "~2.4.1", "csurf": "~1.2.2", "cookie-session": "~1.0.2", "secure-filters": "~1.0.5", "supertest": "~0.13.0", "async": "~0.9.0" }, "devDependencies": { } } When bootstrapping an Express application using the CLI, a /bin/www file will be automatically created for you. The following is the version we have adopted to extract the name of the application from the package.json file. This way, in case we decide to change it we won't have to alter our debugging code because it will automatically adapt to the new name, as shown in the following code: #!/usr/bin/env node var pkg = require('../package.json'); var debug = require('debug')(pkg.name + ':main'); var app = require('../app'); app.set('port', process.env.PORT || 3000); var server = app.listen(app.get('port'), function() { debug('Express server listening on port ' + server.address().port); }); The application configurations will be stored inside config.json: { "filesDir": "files", "maxSize": 5 } The properties listed in the preceding code refer to the files folder (where the files will be updated), which is relative to the root and the maximum allowed file size. The main file of the application is named app.js and lives in the root. We need the connect-multiparty module to support file uploads, and the csurf and cookie-session modules for CSRF protection. The rest of the dependencies are standard and we have used them before. 
The full code for the app.js file is as follows: var express = require('express'); var path = require('path'); var favicon = require('static-favicon'); var logger = require('morgan'); var cookieParser = require('cookie-parser'); var session = require('cookie-session'); var bodyParser = require('body-parser'); var multiparty = require('connect-multiparty'); var Err = require('custom-err'); var csrf = require('csurf'); var ejs = require('secure-filters').configure(require('ejs')); var csrfHelper = require('./lib/middleware/csrf-helper'); var homeRouter = require('./routes/index'); var filesRouter = require('./routes/files'); var config = require('./config.json'); var app = express(); var ENV = app.get('env'); // view engine setup app.engine('html', ejs.renderFile); app.set('views', path.join(__dirname, 'views')); app.set('view engine', 'html'); app.use(favicon()); app.use(bodyParser.json()); app.use(bodyParser.urlencoded()); // Limit uploads to X Mb app.use(multiparty({ maxFilesSize: 1024 * 1024 * config.maxSize })); app.use(cookieParser()); app.use(session({ keys: ['rQo2#0s!qkE', 'Q.ZpeR49@9!szAe'] })); app.use(csrf()); // add CSRF helper app.use(csrfHelper); app.use('/', homeRouter); app.use('/files', filesRouter); app.use(express.static(path.join(__dirname, 'public'))); /// catch 404 and forward to error handler app.use(function(req, res, next) { next(Err('Not Found', { status: 404 })); }); /// error handlers // development error handler // will print stacktrace if (ENV === 'development') { app.use(function(err, req, res, next) { res.status(err.status || 500); res.render('error', { message: err.message, error: err }); }); } // production error handler // no stacktraces leaked to user app.use(function(err, req, res, next) { res.status(err.status || 500); res.render('error', { message: err.message, error: {} }); }); module.exports = app; Instead of directly binding the application to a port, we are exporting it, which makes our lives easier when testing with supertest. We won't need to care about things such as the default port availability or specifying a different port environment variable when testing. To avoid having to create the whole input when including the CSRF token, we have created a helper for that inside lib/middleware/csrf-helper.js: module.exports = function(req, res, next) { res.locals.csrf = function() { return "<input type='hidden' name='_csrf' value='" + req.csrfToken() + "' />"; } next(); }; For the password–protection functionality, we will use the bcrypt module and create a separate file inside lib/hash.js for the hash generation and password–compare functionality: var bcrypt = require('bcrypt'); var errTo = require('errto'); var Hash = {}; Hash.generate = function(password, cb) { bcrypt.genSalt(10, errTo(cb, function(salt) { bcrypt.hash(password, salt, errTo(cb, function(hash) { cb(null, hash); })); })); }; Hash.compare = function(password, hash, cb) { bcrypt.compare(password, hash, cb); }; module.exports = Hash; The biggest file of our application will be the file model, because that's where most of the functionality will reside. We will use the cuid() module to create unique IDs for files, and the native fs module to interact with the filesystem. 
The following code snippet contains the most important methods for models/file.js: function File(options, id) { this.id = id || cuid(); this.meta = _.pick(options, ['name', 'type', 'size', 'hash', 'uploadedAt']); this.meta.uploadedAt = this.meta.uploadedAt || new Date(); }; File.prototype.save = function(path, password, cb) { var _this = this; this.move(path, errTo(cb, function() { if (!password) { return _this.saveMeta(cb); } hash.generate(password, errTo(cb, function(hashedPassword) { _this.meta.hash = hashedPassword; _this.saveMeta(cb); })); })); }; File.prototype.move = function(path, cb) { fs.rename(path, this.path, cb); }; For the full source code of the file, browse the code bundle. Next, we will create the routes for the file (routes/files.js), which will export an Express router. As mentioned before, the authentication mechanism for password-protected files will be the basic HTTP one, so we will need the basic-auth-connect module. At the beginning of the file, we will include the dependencies and create the router: var express = require('express'); var basicAuth = require('basic-auth-connect'); var errTo = require('errto'); var pkg = require('../package.json'); var File = require('../models/file'); var debug = require('debug')(pkg.name + ':filesRoute'); var router = express.Router(); We will have to create two routes that will include the id parameter in the URL, one for displaying the file information and another one for downloading the file. In both of these cases, we will need to check if the file exists and require user authentication in case it's password-protected. This is an ideal use case for the router.param() function because these actions will be performed each time there is an id parameter in the URL. The code is as follows: router.param('id', function(req, res, next, id) { File.find(id, errTo(next, function(file) { debug('file', file); // populate req.file, will need it later req.file = file; if (file.isPasswordProtected()) { // Password – protected file, check for password using HTTP basic auth basicAuth(function(user, pwd, fn) { if (!pwd) { return fn(); } // ignore user file.authenticate(pwd, errTo(next, function(match) { if (match) { return fn(null, file.id); } fn(); })); })(req, res, next); } else { // Not password – protected, proceed normally next(); } })); }); The rest of the routes are fairly straightforward, using response.download() to send the file to the client, or using response.redirect() after uploading the file: router.get('/', function(req, res, next) { res.render('files/new', { title: 'Upload file' }); }); router.get('/:id.html', function(req, res, next) { res.render('files/show', { id: req.params.id, meta: req.file.meta, isPasswordProtected: req.file.isPasswordProtected(), hash: hash, title: 'Download file ' + req.file.meta.name }); }); router.get('/download/:id', function(req, res, next) { res.download(req.file.path, req.file.meta.name); }); router.post('/', function(req, res, next) { var tempFile = req.files.file; if (!tempFile.size) { return res.redirect('/files'); } var file = new File(tempFile); file.save(tempFile.path, req.body.password, errTo(next, function() { res.redirect('/files/' + file.id + '.html'); })); }); module.exports = router; The view for uploading a file contains a multipart form with a CSRF token inside (views/files/new.html): <%- include ../layout/header.html %> <form action="/files" method="POST" enctype="multipart/form-data"> <div class="form-group"> <label>Choose file:</label> <input type="file" name="file" /> </div> <div 
class="form-group"> <label>Password protect (leave blank otherwise):</label> <input type="password" name="password" /> </div> <div class="form-group"> <%- csrf() %> <input type="submit" /> </div> </form> <%- include ../layout/footer.html %> To display the file's details, we will create another view (views/files/show.html). Besides showing the basic file information, we will display a special message in case the file is password-protected, so that the client is notified that a password should also be shared along with the link: <%- include ../layout/header.html %> <p> <table> <tr> <th>Name</th> <td><%= meta.name %></td> </tr> <th>Type</th> <td><%= meta.type %></td> </tr> <th>Size</th> <td><%= meta.size %> bytes</td> </tr> <th>Uploaded at</th> <td><%= meta.uploadedAt %></td> </tr> </table> </p> <p> <a href="/files/download/<%- id %>">Download file</a> | <a href="/files">Upload new file</a> </p> <p> To share this file with your friends use the <a href="/files/<%- id %>">current link</a>. <% if (isPasswordProtected) { %> <br /> Don't forget to tell them the file password as well! <% } %> </p> <%- include ../layout/footer.html %> Running the application To run the application, we need to install the dependencies and run the start script: $ npm i $ npm start The default port for the application is 3000, so if we visit http://localhost:3000/files, we should see the following page: After uploading the file, we should be redirected to the file's page, where its details will be displayed: Unit tests Unit testing allows us to test individual parts of our code in isolation and verify their correctness. By making our tests focused on these small components, we decrease the complexity of the setup, and most likely, our tests should execute faster. Using the following command, we'll install a few modules to help us in our quest: $ npm i mocha should sinon––save-dev We are going to write unit tests for our file model, but there's nothing stopping us from doing the same thing for our routes or other files from /lib. The dependencies will be listed at the top of the file (test/unit/file-model.js): var should = require('should'); var path = require('path'); var config = require('../../config.json'); var sinon = require('sinon'); We will also need to require the native fs module and the hash module, because these modules will be stubbed later on. 
Apart from these, we will create an empty callback function and reuse it, as shown in the following code: // will be stubbing methods on these modules later on var fs = require('fs'); var hash = require('../../lib/hash'); var noop = function() {}; The tests for the instance methods will be created first: describe('models', function() { describe('File', function() { var File = require('../../models/file'); it('should have default properties', function() { var file = new File(); file.id.should.be.a.String; file.meta.uploadedAt.should.be.a.Date; }); it('should return the path based on the root and the file id', function() { var file = new File({}, '1'); file.path.should.eql(File.dir + '/1'); }); it('should move a file', function() { var stub = sinon.stub(fs, 'rename'); var file = new File({}, '1'); file.move('/from/path', noop); stub.calledOnce.should.be.true; stub.calledWith('/from/path', File.dir + '/1', noop).should.be.true; stub.restore(); }); it('should save the metadata', function() { var stub = sinon.stub(fs, 'writeFile'); var file = new File({}, '1'); file.meta = { a: 1, b: 2 }; file.saveMeta(noop); stub.calledOnce.should.be.true; stub.calledWith(File.dir + '/1.json', JSON.stringify(file.meta), noop).should.be.true; stub.restore(); }); it('should check if file is password protected', function() { var file = new File({}, '1'); file.meta.hash = 'y'; file.isPasswordProtected().should.be.true; file.meta.hash = null; file.isPasswordProtected().should.be.false; }); it('should allow access if matched file password', function() { var stub = sinon.stub(hash, 'compare'); var file = new File({}, '1'); file.meta.hash = 'hashedPwd'; file.authenticate('password', noop); stub.calledOnce.should.be.true; stub.calledWith('password', 'hashedPwd', noop).should.be.true; stub.restore(); }); We are stubbing the functionalities of the fs and hash modules because we want to test our code in isolation. Once we are done with the tests, we restore the original functionality of the methods. Now that we're done testing the instance methods, we will go on to test the static ones (assigned directly onto the File object): describe('.dir', function() { it('should return the root of the files folder', function() { path.resolve(__dirname + '/../../' + config.filesDir).should.eql(File.dir); }); }); describe('.exists', function() { var stub; beforeEach(function() { stub = sinon.stub(fs, 'exists'); }); afterEach(function() { stub.restore(); }); it('should callback with an error when the file does not exist', function(done) { File.exists('unknown', function(err) { err.should.be.an.instanceOf(Error).and.have.property('status', 404); done(); }); // call the function passed as argument[1] with the parameter `false` stub.callArgWith(1, false); }); it('should callback with no arguments when the file exists', function(done) { File.exists('existing-file', function(err) { (typeof err === 'undefined').should.be.true; done(); }); // call the function passed as argument[1] with the parameter `true` stub.callArgWith(1, true); }); }); }); }); To stub asynchronous functions and execute their callback, we use the stub.callArgWith() function provided by sinon, which executes the callback provided by the argument with the index <<number>> of the stub with the subsequent arguments. For more information, check out the official documentation at http://sinonjs.org/docs/#stubs. When running tests, Node developers expect the npm test command to be the command that triggers the test suite, so we need to add that script to our package.json file. 
However, since we are going to have different tests to be run, it would be even better to add a unit-tests script and make npm test run that for now. The scripts property should look like the following code: "scripts": { "start": "node ./bin/www", "unit-tests": "mocha --reporter=spec test/unit", "test": "npm run unit-tests" }, Now, if we run the tests, we should see the following output in the terminal: Functional tests So far, we have tested each method to check whether it works fine on its own, but now, it's time to check whether our application works according to the specifications when wiring all the things together. Besides the existing modules, we will need to install and use the following ones: supertest: This is used to test the routes in an expressive manner cheerio: This is used to extract the CSRF token out of the form and pass it along when uploading the file rimraf: This is used to clean up our files folder once we're done with the testing We will create a new file called test/functional/files-routes.js for the functional tests. As usual, we will list our dependencies first: var fs = require('fs'); var request = require('supertest'); var should = require('should'); var async = require('async'); var cheerio = require('cheerio'); var rimraf = require('rimraf'); var app = require('../../app'); There will be a couple of scenarios to test when uploading a file, such as: Checking whether a file that is uploaded without a password can be publicly accessible Checking that a password-protected file can only be accessed with the correct password We will create a function called uploadFile that we can reuse across different tests. This function will use the same supertest agent when making requests so it can persist the cookies, and will also take care of extracting and sending the CSRF token back to the server when making the post request. In case a password argument is provided, it will send that along with the file. The function will assert that the status code for the upload page is 200 and that the user is redirected to the file page after the upload. 
The full code of the function is listed as follows: function uploadFile(agent, password, done) { agent .get('/files') .expect(200) .end(function(err, res) { (err == null).should.be.true; var $ = cheerio.load(res.text); var csrfToken = $('form input[name=_csrf]').val(); csrfToken.should.not.be.empty; var req = agent .post('/files') .field('_csrf', csrfToken) .attach('file', __filename); if (password) { req = req.field('password', password); } req .expect(302) .expect('Location', /files/(.*).html/) .end(function(err, res) { (err == null).should.be.true; var fileUid = res.headers['location'].match(/files/(.*).html/)[1]; done(null, fileUid); }); }); } Note that we will use rimraf in an after function to clean up the files folder, but it would be best to have a separate path for uploading files while testing (other than the one used for development and production): describe('Files-Routes', function(done) { after(function() { var filesDir = __dirname + '/../../files'; rimraf.sync(filesDir); fs.mkdirSync(filesDir); When testing the file uploads, we want to make sure that without providing the correct password, access will not be granted to the file pages: describe("Uploading a file", function() { it("should upload a file without password protecting it", function(done) { var agent = request.agent(app); uploadFile(agent, null, done); }); it("should upload a file and password protect it", function(done) { var agent = request.agent(app); var pwd = 'sample-password'; uploadFile(agent, pwd, function(err, filename) { async.parallel([ function getWithoutPwd(next) { agent .get('/files/' + filename + '.html') .expect(401) .end(function(err, res) { (err == null).should.be.true; next(); }); }, function getWithPwd(next) { agent .get('/files/' + filename + '.html') .set('Authorization', 'Basic ' + new Buffer(':' + pwd).toString('base64')) .expect(200) .end(function(err, res) { (err == null).should.be.true; next(); }); } ], function(err) { (err == null).should.be.true; done(); }); }); }); }); }); It's time to do the same thing we did for the unit tests: make a script so we can run them with npm by using npm run functional-tests. At the same time, we should update the npm test script to include both our unit tests and our functional tests: "scripts": { "start": "node ./bin/www", "unit-tests": "mocha --reporter=spec test/unit", "functional-tests": "mocha --reporter=spec --timeout=10000 --slow=2000 test/functional", "test": "npm run unit-tests && npm run functional-tests" } If we run the tests, we should see the following output: Running tests before committing in Git It's a good practice to run the test suite before committing to git and only allowing the commit to pass if the tests have been executed successfully. The same applies for other version control systems. To achieve this, we should add the .git/hooks/pre-commit file, which should take care of running the tests and exiting with an error in case they failed. Luckily, this is a repetitive task (which can be applied to all Node applications), so there is an NPM module that creates this hook file for us. All we need to do is install the pre-commit module (https://www.npmjs.org/package/pre-commit) as a development dependency using the following command: $ npm i pre-commit ––save-dev This should automatically create the pre-commit hook file so that all the tests are run before committing (using the npm test command). The pre-commit module also supports running custom scripts specified in the package.json file. 
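For example, assuming the pre-commit key documented by the module, the relevant part of package.json could look like the following sketch, which runs both test suites before each commit. Treat the key name and values as an assumption to verify against the module's documentation rather than part of this article's project:

{
  "scripts": {
    "unit-tests": "mocha --reporter=spec test/unit",
    "functional-tests": "mocha --reporter=spec --timeout=10000 --slow=2000 test/functional"
  },
  "pre-commit": [
    "unit-tests",
    "functional-tests"
  ]
}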
For more details on how to achieve that, read the module documentation at https://www.npmjs.org/package/pre-commit. Summary In this article, we have learned about writing tests for Express applications and in the process, explored a variety of helpful modules. Resources for Article: Further resources on this subject: Web Services Testing and soapUI [article] ExtGWT Rich Internet Application: Crafting UI Real Estate [article] Rendering web pages to PDF using Railo Open Source [article]
Handling SELinux-aware Applications

Packt
19 Sep 2014
5 min read
This article is written by Sven Vermeulen, the author of SELinux Cookbook. In this article, we will cover how to control D-Bus message flows. (For more resources related to this topic, see here.) Controlling D-Bus message flows D-Bus implementation on Linux is an example of an SELinux-aware application, acting as a user space object manager. Applications can register themselves on a bus and can send messages between applications through D-Bus. These messages can be controlled through the SELinux policy as well. Getting ready Before looking at the SELinux access controls related to message flows, it is important to focus on a D-Bus service and see how its authentication is done (and how messages are relayed in D-Bus) as this is reflected in the SELinux integration. Go to /etc/dbus-1/system.d/ (which hosts the configuration files for D-Bus services) and take a look at a configuration file. For instance, the service configuration file for dnsmasq looks like the following: <!DOCTYPE busconfig PUBLIC "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN" "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd"> <busconfig> <policy user="root">    <allow own="uk.org.thekelleys.dnsmasq"/>    <allow send_destination="uk.org.thekelleys.dnsmasq"/> </policy> <policy context="default">    <deny own="uk.org.thekelleys.dnsmasq"/>    <deny send_destination="uk.org.thekelleys.dnsmasq"/> </policy> </busconfig> This configuration tells D-Bus that only the root Linux user is allowed to have a service own the uk.org.thekelleys.dnsmasq service and send messages to this service. Others (as managed through the default policy) are denied these operations. On a system with SELinux enabled, having root as the finest granularity doesn't cut it. So, let's look at how the SELinux policy can offer a fine-grained access control in D-Bus. How to do it… To control D-Bus message flows with SELinux, perform the following steps: Identify the domain of the application that will (or does) own the D-Bus service we are interested in. For the dnsmasq application, this would be dnsmasq_t: ~# ps -eZ | grep dnsmasq | awk '{print $1}' system_u:system_r:dnsmasq_t:s0-s0:c0.c1023 Identify the domain of the application that wants to send messages to the service. For instance, this could be the sysadm_t user domain. Allow the two domains to interact with each other through D-Bus messages as follows: gen_require(` class dbus send_msg; ') allow sysadm_t dnsmasq_t:dbus send_msg; allow dnsmasq_t sysadm_t:dbus send_msg; How it works… When an application connects to D-Bus, the SELinux label of its connection is used as the label to check when sending messages. As there is no transition for such connections, the label of the connection is the context of the process itself (the domain); hence, the selection of dnsmasq_t in the example. When D-Bus receives a request to send a message to a service, D-Bus will check the SELinux policy for the send_msg permission. It does so by passing on the information about the session (source and target context and the permission that is requested) to the SELinux subsystem, which computes whether access should be allowed or not. The access control itself, however, is not enforced by SELinux (it only gives feedback), but by D-Bus itself as governing the message flows is solely D-Bus' responsibility. This is also why, when developing D-Bus-related policies, both the class and permission need to be explicitly mentioned in the policy module. 
Without this, the development environment might error out, claiming that dbus is not a valid class. D-Bus checks the context of the client that is sending a message as well as the context of the connection of the service (which are both domain labels) and see if there is a send_msg permission allowed. As most communication is two-fold (sending a message and then receiving a reply), the permission is checked in both directions. After all, sending a reply is just sending a message (policy-wise) in the reverse direction. It is possible to verify this behavior with dbus-send if the rule is on a user domain. For instance, to look at the objects provided by the service, the D-Bus introspection can be invoked against the service: ~# dbus-send --system --dest=uk.org.thekelleys.dnsmasq --print-reply /uk/org/thekelleys/dnsmasq org.freedesktop.DBus.Introspectable.Introspect When SELinux does not have the proper send_msg allow rules in place, the following error will be logged by D-Bus in its service logs (but no AVC denial will show up as it isn't the SELinux subsystem that denies the access): Error org.freedesktop.DBus.Error.AccessDenied: An SELinux policy prevents this sender from sending this message to this recipient. 0 matched rules; type="method_call", sender=":1.17" (uid=0 pid=6738 comm="") interface="org.freedesktop.DBus.Introspectable" member="Introspect" error name="(unset)" requested_reply="0" destination="uk.org.thekelleys.dnsmasq" (uid=0 pid=6635 comm="") When the policy does allow the send_msg permission, the introspection returns an XML output showing the provided methods and interfaces for this service. There's more... The current D-Bus implementation is a pure user space implementation. Because more applications become dependent on D-Bus, work is being done to create a kernel-based D-Bus implementation called kdbus. The exact implementation details of this project are not finished yet, so it is unknown whether the SELinux access controls that are currently applicable to D-Bus will still be valid on kdbus. Summary In this article, we learned how to control D-Bus message flows. It also covers what happens when the policy has or doesn't have the send_msg permission in place. Resources for Article: Further resources on this subject: An Introduction to the Terminal [Article] Wireless and Mobile Hacks [Article] Baking Bits with Yocto Project [Article]
Jump Right In

Packt
19 Sep 2014
21 min read
In this article by Victor Quinn, J.D., the author of the book Getting Started with tmux, we'll go on a little tour, simulate an everyday use of tmux, and point out some key concepts along the way. tmux is short for Terminal Multiplexer. (For more resources related to this topic, see here.) Running tmux For now, let's jump right in and start playing with it. Open up your favorite terminal application and let's get started. Just run the following command: $ tmux You'll probably see a screen flash, and it'll seem like not much else has happened; it looks like you're right where you were previously, with a command prompt. The word tmux is gone, but not much else appears to have changed. However, you should notice that now there is a bar along the bottom of your terminal window. This can be seen in the following screenshot of the terminal window: Congratulations! You're now running tmux. That bar along the bottom is provided by tmux. We call this bar the status line. The status line gives you information about the session and window you are currently viewing, which other windows are available in this session, and more. Some of what's on that line may look like gibberish now, but we'll learn more about what things mean as we progress through this book. We'll also learn how to customize the status bar to ensure it always shows the most useful items for your workflow. These customizations include things that are a part of tmux (such as the time, date, server you are connected to, and so on) or things that are in third-party libraries (such as the battery level of your laptop, current weather, or number of unread mail messages). Sessions By running tmux with no arguments, you create a brand new session. In tmux, the base unit is called a session. A session can have one or more windows. A window can be broken into one or more panes. We'll have a sneak preview on this topic, what we have here on the current screen is a single pane taking up the whole window in a single session. Imagine that it could be split into two or more different terminals, all running different programs, and each visible split of the terminal is a pane. What is a session in tmux? It may be useful to think of a tmux session as a login on your computer. You can log on to your computer, which initiates a new session. After you log on by entering your username and password, you arrive at an empty desktop. This is similar to a fresh tmux session. You can run one or more programs in this session, where each program has its own window or windows and each window has its own state. In most operating systems, there is a way for you to log out, log back in, and arrive back at the same session, with the windows just as you left them. Often, some of the programs that you had opened will continue to run in the background when you log out, even though their windows are no longer visible. A session in tmux works in much the same way. So, it may be useful to think of tmux as a mini operating system that manages running programs, windows, and more, all within a session. You can have multiple sessions running at the same time. This is convenient if you want to have a session for each task you might be working on. You might have one for an application you are developing by yourself and another that you could use for pair programming. Alternatively, you might have one to develop an application and one to develop another. This way everything can be neat and clean and separate. Naming the session Each session has a name that you can set or change. 
Notice the [0] at the very left of the status bar? This is the name of the session in brackets. Here, since you just started tmux without any arguments, it was given the name 0. However, this is not a very useful name, so let's change it. In the prompt, just run the following command: $ tmux rename-session tutorial This tells tmux that you want to rename the current session and tutorial is the name you'd like it to have. Of course, you can name it anything you'd like. You should see that your status bar has now been updated, so now instead of [0] on the left-hand side, it should now say [tutorial]. Here's a screenshot of my screen: Of course, it's nice that the status bar now has a pretty name we defined rather than 0, but it provides many more utilities than this, as we'll see in a bit! It's worth noting that here we were giving a session a name, but this same command can also be used to rename an existing session. The window string The status bar has a string that represents each window to inform us about the things that are currently running. The following steps will help us to explore this a bit more: Let's fire up a text editor to pretend we're doing some coding: $ nano test Now type some stuff in there to simulate working very hard on some code: First notice how the text blob in our status bar just to the right of our session name ([tutorial]) has changed. It used to be 0:~* and now it's 0:nano*. Depending on the version of tmux and your chosen shell, yours may be slightly different (for example, 0:bash*). Let's decode this string a bit. This little string encodes a lot of information, some of which is provided in the following bullet points: The zero in the front represents the number of the window. As we'll shortly see, each window is given a number that we can use to identify and switch to it. The colon separates the window number from the name of the program running in that window. The symbols ~ or nano in the previous screenshot are loosely names of the running program. We say "loosely" because you'll notice that ~ is not the name of a program, but was the directory we were visiting. tmux is pretty slick about this; it knows some state of the program you're using and changes the default name of the window accordingly. Note that the name given is the default; it's possible to explicitly set one for the window, as we'll see later. The symbol * indicates that this is the currently viewed window. We only have one at the moment, so it's not too exciting; however, once we get more than one, it'll be very helpful. Creating another window OK! Now that we know a bit about a part of the status line, let's create a second window so we can run a terminal command. Just press Ctrl + b, then c, and you will be presented with a new window! A few things to note are as follows: Now there is a new window with the label 1:~*. It is given the number 1 because the last one was 0. The next will be 2, then 3, 4, and so on. The asterisk that denoted the currently active window has been moved to 1 since it is now the active one. The nano application is still running in window 0. The asterisk on window 0 has been replaced by a hyphen (-). The - symbol denotes the previously opened window. This is very helpful when you have a bunch of windows. Let's run a command here just to illustrate how it works. Run the following commands: $ echo "test" > test $ cat test The output of these commands can be seen in the following screenshot: This is just some stuff so we can help identify this window. 
Imagine in the real world though you are moving a file, performing operations with Git, viewing log files, running top, or anything else. Let's jump back to window 0 so we can see nano still running. Simply press Ctrl + b and l to switch back to the previously opened window (the one with the hyphen; l stands for the last). As shown in the following screenshot, you'll see that nano is alive, and well, it looks exactly as we left it: The prefix key There is a special key in tmux called the prefix key that is used to perform most of the keyboard shortcuts. We have even used it already quite a bit! In this section, we will learn more about it and run through some examples of its usage. You will notice that in the preceding exercise, we pressed Ctrl + b before creating a window, then Ctrl + b again before switching back, and Ctrl + b before a number to jump to that window. When using tmux, we'll be pressing this key a lot. It's even got a name! We call it the prefix key. Its default binding in tmux is Ctrl + b, but you can change that if you prefer something else or if it conflicts with a key in a program you often use within tmux. You can send the Ctrl + b key combination through to the program by pressing Ctrl + b twice in a row; however, if it's a keyboard command you use often, you'll most likely want to change it. This key is used before almost every command we'll use in tmux, so we'll be seeing it a lot. From here on, if we need to reference the prefix key, we'll do it like <Prefix>. This way if you rebind it, the text will still make sense. If you don't rebound it or see <Prefix>, just type Ctrl + b. Let's create another window for another task. Just run <Prefix>, c again. Now we've got three windows: 0, 1, and 2. We've got one running nano and two running shells, as shown in the following screenshot: Some more things to note are as follows: Now we have window 2, which is active. See the asterisk? Window 0 now has a hyphen because it was the last window we viewed. This is a clear, blank shell because the one we typed stuff into is over in Window 1. Let's switch back to window 1 to see our test commands above still active. The last time we switched windows, we used <Prefix>, l to jump to the last window, but that will not work to get us to window 1 at this point because the hyphen is on window 0. So, going to the last selected window will not get us to 1. Thankfully, it is very easy to switch to a window directly by its number. Just press <Prefix>, then the window number to jump to that window. So <Prefix>, 1 will jump to window 1 even though it wasn't the last one we opened, as shown in the following screenshot: Sure enough, now window 1 is active and everything is present, just as we left it. Now we typed some silly commands here, but it could just as well have been an active running process here, such as unit tests, code linting, or top. Any such process would run in the background in tmux without an issue. This is one of the most powerful features of tmux. In the traditional world, to have a long-running process in a terminal window and get some stuff done in a terminal, you would need two different terminal windows open; if you accidentally close one, the work done in that window will be gone. tmux allows you to keep just one terminal window open, and this window can have a multitude of different windows within it, closing all the different running processes. 
Closing this terminal window won't terminate the running processes; tmux will continue humming along in the background with all of the programs running behind the scenes. Help on key bindings Now a keen observer may notice that the trick of entering the window number will only work for the first 10 windows. This is because once you get into double digits, tmux won't be able to tell when you're done entering the number. If this trick of using the prefix key plus the number only works for the first 10 windows (windows 0 to 9), how will we select a window beyond 10? Thankfully, tmux gives us many powerful ways to move between windows. One of my favorites is the choose window interface. However, oh gee! This is embarrassing. Your author seems to have entirely forgotten the key combination to access the choose window interface. Don't fear though; tmux has a nice built-in way to access all of the key bindings. So let's use it! Press <Prefix>, ? to see your screen change to show a list with bind-key to the left, the key binding in the middle, and the command it runs to the right. You can use your arrow keys to scroll up and down, but there are a lot of entries there! Thankfully, there is a quicker way to get to the item you want without scrolling forever. Press Ctrl + s and you'll see a prompt appear that says Search Down:, where you can type a string and it will search the help document for that string. Emacs or vi mode tmux tries hard to play nicely with developer defaults, so it actually includes two different modes for many key combinations tailored for the two most popular terminal editors: Emacs and vi. These are referred to in tmux parlance as status-keys and mode-keys that can be either Emacs or vi. The tmux default mode is Emacs for all the key combinations, but it can be changed to vi via configuration. It may also be set to vi automatically based on the global $EDITOR setting in your shell. If you are used to Emacs, Ctrl + s should feel very natural since it's the command Emacs uses to search. So, if you try Ctrl + s and it has no effect, your keys are probably in the vi mode. We'll try to provide guidance when there is a mode-specific key like this by including the vi mode's counterpart in parentheses after the default key. For example, in this case, the command would look like Ctrl + s (/) since the default is Ctrl + s and / is the command in the vi mode. Type in choose-window and hit Enter to search down and find the choose-window key binding. Oh look! There it is; it's w: However, what exactly does that mean? Well, all that means is that we can type our prefix key (<Prefix>), followed by the key in that help document to run the mentioned command. First, let's get out of these help docs. To get out of these or any screens like them, generated by tmux, simply press q for quit and you should be back in the shell prompt for window 2. If you ever forget any key bindings, this should be your first step. A nice feature of this key binding help page is that it is dynamically updated as you change your key bindings. Later, when we get to Configuration, you may want to change bindings or bind new shortcuts. They'll all show up in this interface with the configuration you provide them with. Can't do that with manpages! Now, to open the choose window interface, simply type <Prefix>, w since w was the key shown in the help bound to choose-window and voilà: Notice how it nicely lays out all of the currently open windows in a task-manager-like interface. It's interactive too. 
You can use the arrow keys to move up and down to highlight whichever window you like and then just hit Enter to open it. Let's open the window with nano running. Move up to highlight window 0 and hit Enter. You may notice a few more convenient and intuitive ways to switch between the currently active windows when browsing through the key bindings help. For example, <Prefix>, p will switch to the previous window and <Prefix>, n will switch to the next window. Whether refreshing your recollection on a key binding you've already learnt or seeking to discover a new one, the key bindings help is an excellent resource. Searching for text Now we only have three windows so it's pretty easy to remember what's where, but what if we had 30 or 300? With tmux, that's totally possible. (Though, this is not terribly likely or useful! What would you do with 300 active windows?) One other convenient way to switch between windows is to use the find-window feature. This will prompt us for some text, and it will search all the active windows and open the window that has the text in it. If you've been following along, you should have the window with nano currently open (window 0). Remember we had a shell in window 1 where we had typed some silly commands? Let's try to switch to that one using the find-window feature. Type <Prefix>, f and you'll see a find-window prompt as shown in the following screenshot: Here, type in cat test and hit Enter . You'll see you've switched to window 1 because it had the cat test command in it. However, what if you search for some text that is ambiguous? For example, if you've followed along, you will see the word test appear multiple times on both windows 0 and 1. So, if you try find-window with just the word test, it couldn't magically switch right away because it wouldn't know which window you mean. Thankfully, tmux is smart enough to handle this. It will give you a prompt, similar to the choose-window interface shown earlier, but with only the windows that match the query (in our case, windows 0 and 1; window 2 did not have the word test in it). It also includes the first line in each window (for context) that had the text. Pick window 0 to open it. Detaching and attaching Now press <Prefix>, d . Uh oh! Looks like tmux is gone! The familiar status bar is no more available. The <Prefix> key set does nothing anymore. You may think we the authors have led you astray, causing you to lose your work. What will you do without that detailed document you were writing in nano? Fear not explorer, we are simply demonstrating another very powerful feature of tmux. <Prefix>, d will simply detach the currently active session, but it will keep running happily in the background! Yes, although it looks like it's gone, our session is alive and well. How can we get back to it? First, let's view the active sessions. In your terminal, run the following command: $ tmux list-sessions You should see a nice list that has your session name, number of windows, and date of creation and dimensions. If you had more than one session, you'd see them here too. To re attach the detached session to your session, simply run the following command: $ tmux attach-session –t tutorial This tells tmux to attach a session and the session to attach it to as the target (hence -t). In this case, we want to attach the session named tutorial. Sure enough, you should be back in your tmux session, with the now familiar status bar along the bottom and your nano masterpiece back in view. 
Note that this is the most verbose version of this command. You can actually omit the target if there is only one running session, as is in our scenario. This shortens the command to tmux attach-session. It can be further shortened because attach-session has a shorter alias, attach. So, we could accomplish the same thing with just tmux attach. Throughout this text, we will generally use the more verbose version, as they tend to be more descriptive, and leave shorter analogues as exercises for the reader. Explaining tmux commands Now you may notice that attach-session sounds like a pretty long command. It's the same as list-sessions, and there are many others in the lexicon of tmux commands that seem rather verbose. Tab completion There is less complexity to the long commands than it may seem because most of them can be tab-completed. Try going to your command prompt and typing the following: $ tmux list-se Next, hit the Tab key. You should see it fill out to this: $ tmux list-sessions So thankfully, due to tab completion, there is little need to remember these long commands. Note that tab completion will only work in certain shells with certain configurations, so if the tab completion trick doesn't work, you may want to search the Web and find a way to enable tab completion for tmux. Aliases Most of the commands have an alias, which is a shorter form of each command that can be used. For example, the alias of list-sessions is ls. The alias of new-session is new. You can see them all readily by running the tmux command list-commands (alias lscm), as used in the following code snippet: $ tmux list-commands This will show you a list of all the tmux commands along with their aliases in parenthesis after the full name. Throughout this text, we will always use the full form for clarity, but you could just as easily use the alias (or just tab complete of course). One thing you'll most likely notice is that only the last few lines are visible in your terminal. If you go for your mouse and try to scroll up, that won't work either! How can you view the text that is placed above? We will need to move into something called the Copy mode. Renaming windows Let's say you want to give a more descriptive name to a window. If you had three different windows, each with the nano editor open, seeing nano for each window wouldn't be all that helpful. Thankfully, it's very easy to rename a window. Just switch to the window you'd like to rename. Then <Prefix>, ,will prompt you for a new name. Let's rename the nano window to masterpiece. See how the status line has been updated and now shows window 0 with the masterpiece title as shown in the following screenshot. Thankfully, tmux is not smart enough to check the contents of your window; otherwise, we're not sure whether the masterpiece title would make it through. Killing windows As the last stop on our virtual tour, let's kill a window we no longer need. Switch to window 1 with our find-window trick by entering <Prefix>, f, cat test, Enter or of course we could use the less exciting <Prefix>, l command to move to the last opened window. Now let's say goodbye to this window. Press <Prefix>, & to kill it. You will receive a prompt to which you have to confirm that you want to kill it. This is a destructive process, unlike detaching, so be sure anything you care about has been saved. Once you confirm it, window 1 will be gone. Poor window 1! 
You will see that now there are only window 0 and window 2 left: You will also see that now <Prefix>, f, cat test, Enter no longer loads window 1 but rather says No windows matching: cat test. So, window 1 is really no longer with us. Whenever we create a new window, it will take the lowest available index, which in this case will be 1. So window 1 can rise again, but this time as a new and different window with little memory of its past. We can also renumber windows as we'll see later, so if window 1 being missing is offensive to your sense of aesthetics, fear not, it can be remedied! Summary In this article, we got to jump right in and get a whirlwind tour of some of the coolest features in tmux. Here is a quick summary of the features we covered in this article: Starting tmux Naming and renaming sessions The window string and what each chunk means Creating new windows The prefix key Multiple ways to switch back and forth between windows Accessing the help documents for available key bindings Detaching and attaching sessions Renaming and killing windows Resources for Article:  Further resources on this subject: Getting Started with GnuCash [article] Creating a Budget for your Business with Gnucash [article] Apache CloudStack Architecture [article]
Event-driven BPEL Process

Packt
19 Sep 2014
5 min read
In this article, by Matjaz B. Juric and Denis Weerasiri, authors of the book WS-BPEL 2.0 Beginner's Guide, we will study about the event-driven BPEL process. We will learn to develop a book shelving BPEL process. (For more resources related to this topic, see here.) Developing an event-driven BPEL process Firstly, we will develop an event-driven BPEL process. This is a BPEL process triggered by a business event. We will develop a process for book shelving. As we have already mentioned, such a process can be executed on various occasions, such as when a book arrives to the bookstore for the first time, after a customer has looked at the book, or even during an inventory. In contrast to a BPEL process, which exposes an operation that needs to be invoked explicitly, our book shelving process will react on a business event. We will call it a BookshelfEvent. We can see that in order to develop an event-driven BPEL process, we will need to firstly declare a business event, the BookshelfEvent. Following this, we will need to develop the event-driven book shelving BPEL process. Declaring a business event We will declare the BookshelfEvent business event, which will signal that a book is ready to be book shelved. Each business event contains a data payload, which is defined by the corresponding XML schema type. In our case, we will use the BookData type, the same one that we used in the book warehousing process. Time for action – declaring a business event To declare the BookshelfEvent business event, we will go to the composite view. We will proceed as follows: Right-click on the project in the Application window and select New and then Event Definition: A Create Event Definition dialog box will open. We will specify the EDL filename. This is the file where all the events are defined (similar to WSDL, where the web service operations are defined). We will use the BookEDL for the EDL filename. For the Namespace field, we will use http://packtpub.com/events/edl/BookEDL, as shown in the following screenshot: Next, we need to define the business events. We will use the green plus sign to declare the BookshelfEvent business event. After clicking on the green plus sign, the Create Event dialog box will open. We need to specify the event name, which is BookshelfEvent. We also have to specify the XML Type, which will be used for the event data payload. We will use the BookData from the Book Warehousing BPEL process schema, as shown in the following screenshot: After clicking on the OK button, we should see the following: What just happened? We have successfully declared the BookshelfEvent business event. This has generated the BookEDL.edl file with the source code as shown in the following screenshot: Developing a book shelving BPEL process After declaring the business event, we are ready to develop the event-driven book shelving BPEL process. The process will be triggered by our BookshelfEvent business event. This means that the process will not have a classic WSDL with the operation declaration. Rather it will be triggered by the BookshelfEvent business event. Time for action – developing an event-driven book shelving BPEL process To develop the event-driven book shelving BPEL process, we will go to the composite view. We will carry out the following steps: Drag-and-drop the BPEL Process service component from the right-hand side toolbar to the composite components area. The Create BPEL Process dialog box will open. 
We will select the BPEL 2.0 Specification, type BookShelvingBPEL for the Name of the process, and specify the namespace as http://packtpub.com/Bookstore/BookShelvingBPEL. Then, we will select Subscribe to Events from the drop-down list for the Template: Next, we will need to specify the event to which our BPEL process will be subscribed. We will select the green plus sign and the Event Chooser dialog window will open. Here, we will simply select the BookshelfEvent business event: After clicking on the OK button, we should see the following screenshot: For event-driven BPEL processes, three consistency strategies for delivering events exist. The one and only one option delivers the events in the global transaction. Guaranteed delivers events asynchronously without a global transaction. Immediate delivers events in the same global transaction and the same thread as the publisher, and the publish call does not return until all immediate subscribers have completed processing. After clicking on the OK button, we can find the new BookShelvingBPEL process on the composite diagram. Please note that the arrow icon denotes that the process is triggered by an event: Double-clicking on the BookShelvingBPEL process opens the BPEL editor, where we can see that the BPEL process has a slightly different <receive> activity, which denotes that the process will be triggered by an event. Also, notice that an event-driven process does not return anything to the client, as event-driven processes are one-way and asynchronous: What just happened? We have successfully created the BookShelvingBPEL process. Looking at the source code we can see that the overall structure is the same as with any other BPEL process. The difference is in the initial <receive> activity, which is triggered by the BookshelfEvent business event, as shown in the following screenshot: Have a go hero – implementing the BookShelvingBPEL process Implementing the event-driven BookShelvingBPEL process does not differ from implementing any other BPEL process. Therefore, it's your turn now. You should implement the BookShelvingBPEL process to do something meaningful. It could, for example, call a service which will query a database table. Or, it could include a human task. Summary In this article, we learned how to develop event-driven BPEL processes and how to invoke events from BPEL processes. We also learned to implement the BookShelvingBPEL process. Resources for Article: Further resources on this subject: Securing a BPEL process [Article] Scopes in Advanced BPEL [Article] Advanced Activities in BPEL [Article]
Driving Visual Analyses with Automobile Data (Python)

Packt
19 Sep 2014
19 min read
This article written by Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort, and Abhijit Dasgupta, authors of the book Practical Data Science Cookbook, will cover the following topics: Getting started with IPython Exploring IPython Notebook Preparing to analyze automobile fuel efficiencies Exploring and describing the fuel efficiency data with Python (For more resources related to this topic, see here.) The dataset, available at http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip, contains fuel efficiency performance metrics over time for all makes and models of automobiles in the United States of America. This dataset also contains numerous other features and attributes of the automobile models other than fuel economy, providing an opportunity to summarize and group the data so that we can identify interesting trends and relationships. We will perform the entire analysis using Python. However, we will ask the same questions and follow the same sequence of steps as before, again following the data science pipeline. With study, this will allow you to see the similarities and differences between the two languages for a mostly identical analysis. In this article, we will take a very different approach using Python as a scripting language in an interactive fashion that is more similar to R. We will introduce the reader to the unofficial interactive environment of Python, IPython, and the IPython notebook, showing how to produce readable and well-documented analysis scripts. Further, we will leverage the data analysis capabilities of the relatively new but powerful pandas library and the invaluable data frame data type that it offers. pandas often allows us to complete complex tasks with fewer lines of code. The drawback to this approach is that while you don't have to reinvent the wheel for common data manipulation tasks, you do have to learn the API of a completely different package, which is pandas. The goal of this article is not to guide you through an analysis project that you have already completed but to show you how that project can be completed in another language. More importantly, we want to get you, the reader, to become more introspective with your own code and analysis. Think not only about how something is done but why something is done that way in that particular language. How does the language shape the analysis? Getting started with IPython IPython is the interactive computing shell for Python that will change the way you think about interactive shells. It brings to the table a host of very useful functionalities that will most likely become part of your default toolbox, including magic functions, tab completion, easy access to command-line tools, and much more. We will only scratch the surface here and strongly recommend that you keep exploring what can be done with IPython. Getting ready If you have completed the installation, you should be ready to tackle the following recipes. Note that IPython 2.0, which is a major release, was launched in 2014. How to do it… The following steps will get you up and running with the IPython environment: Open up a terminal window on your computer and type ipython. You should be immediately presented with the following text: Python 2.7.5 (default, Mar 9 2014, 22:15:05)Type "copyright", "credits" or "license" for more information. IPython 2.1.0 -- An enhanced Interactive Python.?         -> Introduction and overview of IPython's features.%quickref -> Quick reference.help     -> Python's own help system.object?   
-> Details about 'object', use 'object??' for extra details.
In [1]:

Note that your version might be slightly different than what is shown in the preceding command-line output. Just to show you how great IPython is, type in ls, and you should be greeted with the directory listing! Yes, you have access to common Unix commands straight from your Python prompt inside the Python interpreter. Now, let's try changing directories. Type cd at the prompt, hit space, and now hit Tab. You should be presented with a list of directories available from within the current directory. Start typing the first few letters of the target directory, and then, hit Tab again. If there is only one option that matches, hitting the Tab key will automatically insert that name. Otherwise, the list of possibilities will show only those names that match the letters that you have already typed. Each letter that is entered acts as a filter when you press Tab. Now, type ?, and you will get a quick introduction to and overview of IPython's features. Let's take a look at the magic functions. These are special functions that IPython understands and will always start with the % symbol. The %paste function is one such example and is amazing for copying and pasting Python code into IPython without losing proper indentation. We will try the %timeit magic function that intelligently benchmarks Python code. Enter the following commands:

n = 100000
%timeit range(n)
%timeit xrange(n)

We should get an output like this:

1000 loops, best of 3: 1.22 ms per loop
1000000 loops, best of 3: 258 ns per loop

This shows you how much faster xrange is than range (1.22 milliseconds versus 258 nanoseconds!) and helps show you the utility of generators in Python. You can also easily run system commands by prefacing the command with an exclamation mark. Try the following command:

!ping www.google.com

You should see the following output:

PING google.com (74.125.22.101): 56 data bytes
64 bytes from 74.125.22.101: icmp_seq=0 ttl=38 time=40.733 ms
64 bytes from 74.125.22.101: icmp_seq=1 ttl=38 time=40.183 ms
64 bytes from 74.125.22.101: icmp_seq=2 ttl=38 time=37.635 ms

Finally, IPython provides an excellent command history. Simply press the up arrow key to access the previously entered command. Continue to press the up arrow key to walk backwards through the command list of your session and the down arrow key to come forward. Also, the magic %history command allows you to jump to a particular command number in the session. Type the following command to see the first command that you entered:

%history 1

Now, type exit to drop out of IPython and back to your system command prompt. How it works… There isn't much to explain here and we have just scratched the surface of what IPython can do. Hopefully, we have gotten you interested in diving deeper, especially with the wealth of new features offered by IPython 2.0, including dynamic and user-controllable data visualizations. See also IPython at http://ipython.org/ The IPython Cookbook at https://github.com/ipython/ipython/wiki?path=Cookbook IPython: A System for Interactive Scientific Computing at http://fperez.org/papers/ipython07_pe-gr_cise.pdf Learning IPython for Interactive Computing and Data Visualization, Cyrille Rossant, Packt Publishing, available at http://www.packtpub.com/learning-ipython-for-interactive-computing-and-data-visualization/book The future of IPython at http://www.infoworld.com/print/236429 Exploring IPython Notebook IPython Notebook is the perfect complement to IPython. As per the IPython website:
As per the IPython website: "The IPython Notebook is a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document." While this is a bit of a mouthful, it is actually a pretty accurate description. In practice, IPython Notebook allows you to intersperse your code with comments and images and anything else that might be useful. You can use IPython Notebooks for everything from presentations (a great replacement for PowerPoint) to an electronic laboratory notebook or a textbook. Getting ready If you have completed the installation, you should be ready to tackle the following recipes. How to do it… These steps will get you started with exploring the incredibly powerful IPython Notebook environment. We urge you to go beyond this simple set of steps to understand the true power of the tool. Type ipython notebook --pylab=inline in the command prompt. The --pylab=inline option should allow your plots to appear inline in your notebook. You should see some text quickly scroll by in the terminal window, and then, the following screen should load in the default browser (for me, this is Chrome). Note that the URL should be http://127.0.0.1:8888/, indicating that the browser is connected to a server running on the local machine at port 8888. You should not see any notebooks listed in the browser (note that IPython Notebook files have a .ipynb extension) as IPython Notebook searches the directory you launched it from for notebook files. Let's create a notebook now. Click on the New Notebook button in the upper right-hand side of the page. A new browser tab or window should open up, showing you something similar to the following screenshot: From the top down, you can see the text-based menu followed by the toolbar for issuing common commands, and then, your very first cell, which should resemble the command prompt in IPython. Place the mouse cursor in the first cell and type 5+5. Next, either navigate to Cell | Run or press Shift + Enter as a keyboard shortcut to cause the contents of the cell to be interpreted. You should now see something similar to the following screenshot. Basically, we just executed a simple Python statement within the first cell of our first IPython Notebook. Click on the second cell, and then, navigate to Cell | Cell Type | Markdown. Now, you can easily write markdown in the cell for documentation purposes. Close the two browser windows or tabs (the notebook and the notebook browser). Go back to the terminal in which you typed ipython notebook, hit Ctrl + C, then hit Y, and press Enter. This will shut down the IPython Notebook server. How it works… For those of you coming from either more traditional statistical software packages, such as Stata, SPSS, or SAS, or more traditional mathematical software packages, such as MATLAB, Mathematica, or Maple, you are probably used to the very graphical and feature-rich interactive environments provided by the respective companies. From this background, IPython Notebook might seem a bit foreign but hopefully much more user friendly and less intimidating than the traditional Python prompt. Further, IPython Notebook offers an interesting combination of interactivity and sequential workflow that is particularly well suited for data analysis, especially during the prototyping phases. R has a library called Knitr (http://yihui.name/knitr/) that offers the report-generating capabilities of IPython Notebook. 
When you type in ipython notebook, you are launching a server running on your local machine, and IPython Notebook itself is really a web application that uses a server-client architecture. The IPython Notebook server, as per ipython.org, uses a two-process kernel architecture with ZeroMQ (http://zeromq.org/) and Tornado. ZeroMQ is an intelligent socket library for high-performance messaging, helping IPython manage distributed compute clusters among other tasks. Tornado is a Python web framework and asynchronous networking module that serves IPython Notebook's HTTP requests. The project is open source and you can contribute to the source code if you are so inclined. IPython Notebook also allows you to export your notebooks, which are actually just text files filled with JSON, to a large number of alternative formats using the command-line tool called nbconvert (http://ipython.org/ipython-doc/rel-1.0.0/interactive/nbconvert.html). Available export formats include HTML, LaTex, reveal.js HTML slideshows, Markdown, simple Python scripts, and reStructuredText for the Sphinx documentation. Finally, there is IPython Notebook Viewer (nbviewer), which is a free web service where you can both post and go through static, HTML versions of notebook files hosted on remote servers (these servers are currently donated by Rackspace). Thus, if you create an amazing .ipynb file that you want to share, you can upload it to http://nbviewer.ipython.org/ and let the world see your efforts. There's more… We will try not to sing too loudly the praises of Markdown, but if you are unfamiliar with the tool, we strongly suggest that you try it out. Markdown is actually two different things: a syntax for formatting plain text in a way that can be easily converted to a structured document and a software tool that converts said text into HTML and other languages. Basically, Markdown enables the author to use any desired simple text editor (VI, VIM, Emacs, Sublime editor, TextWrangler, Crimson Editor, or Notepad) that can capture plain text yet still describe relatively complex structures such as different levels of headers, ordered and unordered lists, and block quotes as well as some formatting such as bold and italics. Markdown basically offers a very human-readable version of HTML that is similar to JSON and offers a very human-readable data format. See also IPython Notebook at http://ipython.org/notebook.html The IPython Notebook documentation at http://ipython.org/ipython-doc/stable/interactive/notebook.html An interesting IPython Notebook collection at https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks The IPython Notebook development retrospective at http://blog.fperez.org/2012/01/ipython-notebook-historical.html Setting up a remote IPython Notebook server at http://nbviewer.ipython.org/github/Unidata/tds-python-workshop/blob/master/ipython-notebook-server.ipynb The Markdown home page at https://daringfireball.net/projects/markdown/basics Preparing to analyze automobile fuel efficiencies In this recipe, we are going to start our Python-based analysis of the automobile fuel efficiencies data. Getting ready If you completed the first installation successfully, you should be ready to get started. How to do it… The following steps will see you through setting up your working directory and IPython for the analysis for this article: Create a project directory called fuel_efficiency_python. 
Download the automobile fuel efficiency dataset from http://fueleconomy.gov/feg/epadata/vehicles.csv.zip and store it in the preceding directory. Extract the vehicles.csv file from the zip file into the same directory. Open a terminal window and change the current directory (cd) to the fuel_efficiency_python directory. At the terminal, type the following command:

ipython notebook

Once the new page has loaded in your web browser, click on New Notebook. Click on the current name of the notebook, which is untitled0, and enter a new name for this analysis (mine is fuel_efficiency_python). Let's use the top-most cell for import statements. Type in the following commands:

import pandas as pd
import numpy as np
from ggplot import *
%matplotlib inline

Then, hit Shift + Enter to execute the cell. This imports both the pandas and numpy libraries, assigning them local names to save a few characters while typing commands. It also imports the ggplot library. Please note that using the from ggplot import * command line is not a best practice in Python and pours the ggplot package contents into our default namespace. However, we are doing this so that our ggplot syntax most closely resembles the R ggplot2 syntax, which is strongly not Pythonic. Finally, we use a magic command to tell IPython Notebook that we want matplotlib graphs to render in the notebook. In the next cell, let's import the data and look at the first few records:

vehicles = pd.read_csv("vehicles.csv")
vehicles.head

Then, press Shift + Enter. The following text should be shown: However, notice that a red warning message appears as follows:

/Library/Python/2.7/site-packages/pandas/io/parsers.py:1070: DtypeWarning: Columns (22,23,70,71,72,73) have mixed types. Specify dtype option on import or set low_memory=False.
  data = self._reader.read(nrows)

This tells us that columns 22, 23, 70, 71, 72, and 73 contain mixed data types. Let's find the corresponding names using the following commands:

column_names = vehicles.columns.values
column_names[[22, 23, 70, 71, 72, 73]]

array([cylinders, displ, fuelType2, rangeA, evMotor, mfrCode], dtype=object)

Mixed data types sound like they could be problematic, so make a mental note of these column names. Remember, data cleaning and wrangling often consume 90 percent of project time.
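As a quick aside, and only as a sketch, the warning itself points at two ways of dealing with these mixed types once we know which columns are affected. Treating the last four flagged columns as plain text is an assumption made here for illustration, not something this recipe establishes:

# Re-read the file, letting pandas scan it fully before choosing column types
vehicles = pd.read_csv("vehicles.csv", low_memory=False)

# Inspect what pandas inferred for the columns that were flagged as mixed
flagged = ["cylinders", "displ", "fuelType2", "rangeA", "evMotor", "mfrCode"]
print(vehicles[flagged].dtypes)

# Alternatively, force the text-like columns to strings up front
vehicles = pd.read_csv(
    "vehicles.csv",
    dtype={"fuelType2": str, "rangeA": str, "evMotor": str, "mfrCode": str},
    low_memory=False,
)

Either variant keeps the DtypeWarning from reappearing on later imports; for the rest of this analysis, simply remembering the column names is enough.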
Thinking in data frames is critical for many data analyses yet also very different from thinking in array or matrix operations (say, if you are coming from MATLAB or C as your primary development languages). With the data frame, each column represents a different variable or characteristic and can be a different data type, such as floats, integers, or strings. Each row of the data frame is a separate observation or instance with its own set of values. For example, if each row represents a person, the columns could be age (an integer) and gender (a category or string). Often, we will want to select the set of observations (rows) that match a particular characteristic (say, all males) and examine this subgroup. The data frame is conceptually very similar to a table in a relational database. See also Data structures in pandas at http://pandas.pydata.org/pandas-docs/stable/dsintro.html Data frames in R at http://www.r-tutor.com/r-introduction/data-frame Exploring and describing the fuel efficiency data with Python Now that we have imported the automobile fuel efficiency dataset into IPython and witnessed the power of pandas, the next step is to replicate the preliminary analysis performed in R, getting your feet wet with some basic pandas functionality. Getting ready We will continue to grow and develop the IPython Notebook that we started in the previous recipe. If you've completed the previous recipe, you should have everything you need to continue. How to do it… First, let's find out how many observations (rows) are in our data using the following command: len(vehicles) 34287 If you switch back and forth between R and Python, remember that in R, the function is length and in Python, it is len. Next, let's find out how many variables (columns) are in our data using the following command: len(vehicles.columns) 74 Let's get a list of the names of the columns using the following command: print(vehicles.columns) Index([u'barrels08', u'barrelsA08', u'charge120', u'charge240', u'city08', u'city08U', u'cityA08', u'cityA08U', u'cityCD', u'cityE', u'cityUF', u'co2', u'co2A', u'co2TailpipeAGpm', u'co2TailpipeGpm', u'comb08', u'comb08U', u'combA08', u'combA08U', u'combE', u'combinedCD', u'combinedUF', u'cylinders', u'displ', u'drive', u'engId', u'eng_dscr', u'feScore', u'fuelCost08', u'fuelCostA08', u'fuelType', u'fuelType1', u'ghgScore', u'ghgScoreA', u'highway08', u'highway08U', u'highwayA08', u'highwayA08U', u'highwayCD', u'highwayE', u'highwayUF', u'hlv', u'hpv', u'id', u'lv2', u'lv4', u'make', u'model', u'mpgData', u'phevBlended', u'pv2', u'pv4', u'range', u'rangeCity', u'rangeCityA', u'rangeHwy', u'rangeHwyA', u'trany', u'UCity', u'UCityA', u'UHighway', u'UHighwayA', u'VClass', u'year', u'youSaveSpend', u'guzzler', u'trans_dscr', u'tCharger', u'sCharger', u'atvType', u'fuelType2', u'rangeA', u'evMotor', u'mfrCode'], dtype=object) The u letter in front of each string indicates that the strings are represented in Unicode (http://docs.python.org/2/howto/unicode.html) Let's find out how many unique years of data are included in this dataset and what the first and last years are using the following command: len(pd.unique(vehicles.year)) 31 min(vehicles.year) 1984 max(vehicles["year"]) 2014 Note that again, we have used two different syntaxes to reference individual columns within the vehicles data frame. Next, let's find out what types of fuel are used as the automobiles' primary fuel types. In R, we have the table function that will return a count of the occurrences of a variable's various values. 
In pandas, we use the following:

pd.value_counts(vehicles.fuelType1)

Regular Gasoline     24587
Premium Gasoline      8521
Diesel                1025
Natural Gas             57
Electricity             56
Midgrade Gasoline       41
dtype: int64

Now if we want to explore what types of transmissions these automobiles have, we immediately try the following command:

pd.value_counts(vehicles.trany)

However, this results in a bit of unexpected and lengthy output: What we really want to know is the number of cars with automatic and manual transmissions. We notice that the trany variable always starts with the letter A when it represents an automatic transmission and M for manual transmission. Thus, we create a new variable, trany2, that contains the first character of the trany variable, which is a string:

vehicles["trany2"] = vehicles.trany.str[0]
pd.value_counts(vehicles.trany2)

The preceding command yields the answer that we wanted, showing roughly twice as many automatics as manuals:

A   22451
M   11825
dtype: int64

How it works… In this recipe, we looked at some basic functionality in Python and pandas. We have used two different syntaxes (vehicles['trany'] and vehicles.trany) to access variables within the data frame. We have also used some of the core pandas functions to explore the data, such as the incredibly useful unique and value_counts functions. There's more... In terms of the data science pipeline, we have touched on two stages in a single recipe: data cleaning and data exploration. Often, when working with smaller datasets where the time to complete a particular action is quite short and can be completed on our laptop, we will very quickly go through multiple stages of the pipeline and then loop back, depending on the results. In general, the data science pipeline is a highly iterative process. The faster we can accomplish steps, the more iterations we can fit into a fixed time, and often, we can create a better final analysis. See also The pandas API overview at http://pandas.pydata.org/pandas-docs/stable/api.html Summary This article took you through the process of analyzing and visualizing automobile data to identify trends and patterns in fuel efficiency over time using the powerful programming language, Python. Resources for Article: Further resources on this subject: Importing Dynamic Data [article] MongoDB data modeling [article] Report Data Filtering [article]

Creating a RESTful API

Packt
19 Sep 2014
24 min read
In this article by Jason Krol, the author of Web Development with MongoDB and NodeJS, we will review the following topics: (For more resources related to this topic, see here.)

Introducing RESTful APIs
Installing a few basic tools
Creating a basic API server and sample JSON data
Responding to GET requests
Updating data with POST and PUT
Removing data with DELETE
Consuming external APIs from Node

What is an API? An Application Programming Interface (API) is a set of tools that a computer system makes available so that unrelated systems or software can interact with each other. Typically, a developer uses an API when writing software that will interact with a closed, external software system. The external software system provides an API as a standard set of tools that all developers can use. Many popular social networking sites provide developers with access to APIs to build tools to support those sites. The most obvious examples are Facebook and Twitter. Both have robust APIs that provide developers with the ability to build plugins and work with data directly, without being granted full access, as a general security precaution. As you will see in this article, providing your own API is not only fairly simple, but it also empowers you to provide your users with access to your data. You also have the added peace of mind of knowing that you are in complete control over what level of access you grant, what sets of data you make read-only, as well as what data can be inserted and updated.

What is a RESTful API? Representational State Transfer (REST) is a fancy way of saying CRUD over HTTP. What this means is that when you use a REST API, you have a uniform means to create, read, update, and delete data using simple HTTP URLs and a standard set of HTTP verbs. The most basic form of a REST API will accept one of the HTTP verbs at a URL and return some kind of data as a response. Typically, a REST API GET request will always return some kind of data, such as JSON, XML, HTML, or plain text. A POST or PUT request to a RESTful API URL will accept data to create or update. The URL for a RESTful API is known as an endpoint, and while working with these endpoints, it is typically said that you are consuming them. The standard HTTP verbs used while interfacing with REST APIs include:

GET: This retrieves data
POST: This submits data for a new record
PUT: This submits data to update an existing record
PATCH: This submits data to update only specific parts of an existing record
DELETE: This deletes a specific record

Typically, RESTful API endpoints are defined in a way that mimics the data models and have semantic URLs that are somewhat representative of the data models. What this means is that to request a list of models, for example, you would access an API endpoint of /models. Likewise, to retrieve a specific model by its ID, you would include that in the endpoint URL via /models/:Id.
Some sample RESTful API endpoint URLs are as follows: GET http://myapi.com/v1/accounts: This returns a list of accounts GET http://myapi.com/v1/accounts/1: This returns a single account by Id: 1 POST http://myapi.com/v1/accounts: This creates a new account (data submitted as a part of the request) PUT http://myapi.com/v1/accounts/1: This updates an existing account by Id: 1 (data submitted as part of the request) GET http://myapi.com/v1/accounts/1/orders: This returns a list of orders for account Id: 1 GET http://myapi.com/v1/accounts/1/orders/21345: This returns the details for a single order by Order Id: 21345 for account Id: 1 It's not a requirement that the URL endpoints match this pattern; it's just common convention. Introducing Postman REST Client Before we get started, there are a few tools that will make life much easier when you're working directly with APIs. The first of these tools is called Postman REST Client, and it's a Google Chrome application that can run right in your browser or as a standalone-packaged application. Using this tool, you can easily make any kind of request to any endpoint you want. The tool provides many useful and powerful features that are very easy to use and, best of all, free! Installation instructions Postman REST Client can be installed in two different ways, but both require Google Chrome to be installed and running on your system. The easiest way to install the application is by visiting the Chrome Web Store at https://chrome.google.com/webstore/category/apps. Perform a search for Postman REST Client and multiple results will be returned. There is the regular Postman REST Client that runs as an application built into your browser, and then separate Postman REST Client (packaged app) that runs as a standalone application on your system in its own dedicated window. Go ahead and install your preference. If you install the application as the standalone packaged app, an icon to launch it will be added to your dock or taskbar. If you installed it as a regular browser app, you can launch it by opening a new tab in Google Chrome and going to Apps and finding the Postman REST Client icon. After you've installed and launched the app, you should be presented with an output similar to the following screenshot: A quick tour of Postman REST Client Using Postman REST Client, we're able to submit REST API calls to any endpoint we want as well as modify the type of request. Then, we can have complete access to the data that's returned from the API as well as any errors that might have occurred. To test an API call, enter the URL to your favorite website in the Enter request URL here field and leave the dropdown next to it as GET. This will mimic a standard GET request that your browser performs anytime you visit a website. Click on the blue Send button. The request is made and the response is displayed at the bottom half of the screen. In the following screenshot, I sent a simple GET request to http://kroltech.com and the HTML is returned as follows: If we change this URL to that of the RSS feed URL for my website, you can see the XML returned: The XML view has a few more features as it exposes the sidebar to the right that gives you a handy outline to glimpse the tree structure of the XML data. Not only that, you can now see a history of the requests we've made so far along the left sidebar. This is great when we're doing more advanced POST or PUT requests and don't want to repeat the data setup for each request while testing an endpoint. 
Here is a sample API endpoint I submitted a GET request to that returns the JSON data in its response: A really nice thing about making API calls to endpoints that return JSON using Postman Client is that it parses and displays the JSON in a very nicely formatted way, and each node in the data is expandable and collapsible. The app is very intuitive so make sure you spend some time playing around and experimenting with different types of calls to different URLs. Using the JSONView Chrome extension There is one other tool I want to let you know about (while extremely minor) that is actually a really big deal. The JSONView Chrome extension is a very small plugin that will instantly convert any JSON you view directly via the browser into a more usable JSON tree (exactly like Postman Client). Here is an example of pointing to a URL that returns JSON from Chrome before JSONView is installed: And here is that same URL after JSONView has been installed: You should install the JSONView Google Chrome extension the same way you installed Postman REST Client—access the Chrome Web Store and perform a search for JSONView. Now that you have the tools to be able to easily work with and test API endpoints, let's take a look at writing your own and handling the different request types. Creating a Basic API server Let's create a super basic Node.js server using Express that we'll use to create our own API. Then, we can send tests to the API using Postman REST Client to see how it all works. In a new project workspace, first install the npm modules that we're going to need in order to get our server up and running: $ npm init $ npm install --save express body-parser underscore Now that the package.json file for this project has been initialized and the modules installed, let's create a basic server file to bootstrap up an Express server. Create a file named server.js and insert the following block of code: var express = require('express'),    bodyParser = require('body-parser'),    _ = require('underscore'), json = require('./movies.json'),    app = express();   app.set('port', process.env.PORT || 3500);   app.use(bodyParser.urlencoded()); app.use(bodyParser.json());   var router = new express.Router(); // TO DO: Setup endpoints ... app.use('/', router);   var server = app.listen(app.get('port'), function() {    console.log('Server up: http://localhost:' + app.get('port')); }); Most of this should look familiar to you. In the server.js file, we are requiring the express, body-parser, and underscore modules. We're also requiring a file named movies.json, which we'll create next. After our modules are required, we set up the standard configuration for an Express server with the minimum amount of configuration needed to support an API server. Notice that we didn't set up Handlebars as a view-rendering engine because we aren't going to be rendering any HTML with this server, just pure JSON responses. 
Creating sample JSON data Let's create the sample movies.json file that will act as our temporary data store (even though the API we build for the purposes of demonstration won't actually persist data beyond the app's life cycle): [{    "Id": "1",    "Title": "Aliens",    "Director": "James Cameron",    "Year": "1986",    "Rating": "8.5" }, {    "Id": "2",    "Title": "Big Trouble in Little China",    "Director": "John Carpenter",    "Year": "1986",    "Rating": "7.3" }, {    "Id": "3",    "Title": "Killer Klowns from Outer Space",    "Director": "Stephen Chiodo",    "Year": "1988",    "Rating": "6.0" }, {    "Id": "4",    "Title": "Heat",    "Director": "Michael Mann",    "Year": "1995",    "Rating": "8.3" }, {    "Id": "5",    "Title": "The Raid: Redemption",    "Director": "Gareth Evans",    "Year": "2011",    "Rating": "7.6" }] This is just a really simple JSON list of a few of my favorite movies. Feel free to populate it with whatever you like. Boot up the server to make sure you aren't getting any errors (note we haven't set up any routes yet, so it won't actually do anything if you tried to load it via a browser): $ node server.js Server up: http://localhost:3500 Responding to GET requests Adding a simple GET request support is fairly simple, and you've seen this before already in the app we built. Here is some sample code that responds to a GET request and returns a simple JavaScript object as JSON. Insert the following code in the routes section where we have the // TO DO: Setup endpoints ... waiting comment: router.get('/test', function(req, res) {    var data = {        name: 'Jason Krol',        website: 'http://kroltech.com'    };      res.json(data); }); Let's tweak the function a little bit and change it so that it responds to a GET request against the root URL (that is /) route and returns the JSON data from our movies file. Add this new route after the /test route added previously: router.get('/', function(req, res) {    res.json(json); }); The res (response) object in Express has a few different methods to send data back to the browser. Each of these ultimately falls back on the base send method, which includes header information, statusCodes, and so on. res.json and res.jsonp will automatically format JavaScript objects into JSON and then send using res.send. res.render will render a template view as a string and then send it using res.send as well. With that code in place, if we launch the server.js file, the server will be listening for a GET request to the / URL route and will respond with the JSON data of our movies collection. Let's first test it out using the Postman REST Client tool: GET requests are nice because we could have just as easily pulled that same URL via our browser and received the same result: However, we're going to use Postman for the remainder of our endpoint testing as it's a little more difficult to send POST and PUT requests using a browser. Receiving data – POST and PUT requests When we want to allow our users using our API to insert or update data, we need to accept a request from a different HTTP verb. When inserting new data, the POST verb is the preferred method to accept data and know it's for an insert. Let's take a look at code that accepts a POST request and data along with the request, and inserts a record into our collection and returns the updated JSON. 
Insert the following block of code after the route you added previously for GET: router.post('/', function(req, res) {    // insert the new item into the collection (validate first)    if(req.body.Id && req.body.Title && req.body.Director && req.body.Year && req.body.Rating) {        json.push(req.body);        res.json(json);    } else {        res.json(500, { error: 'There was an error!' });    } }); You can see the first thing we do in the POST function is check to make sure the required fields were submitted along with the actual request. Assuming our data checks out and all the required fields are accounted for (in our case every field), we insert the entire req.body object into the array as is using the array's push function. If any of the required fields aren't submitted with the request, we return a 500 error message instead. Let's submit a POST request this time to the same endpoint using the Postman REST Client. (Don't forget to make sure your API server is running with node server.js.): First, we submitted a POST request with no data, so you can clearly see the 500 error response that was returned. Next, we provided the actual data using the x-www-form-urlencoded option in Postman and provided each of the name/value pairs with some new custom data. You can see from the results that the STATUS was 200, which is a success and the updated JSON data was returned as a result. Reloading the main GET endpoint in a browser yields our original movies collection with the new one added. PUT requests will work in almost exactly the same way except traditionally, the Id property of the data is handled a little differently. In our example, we are going to require the Id attribute as a part of the URL and not accept it as a parameter in the data that's submitted (since it's usually not common for an update function to change the actual Id of the object it's updating). Insert the following code for the PUT route after the existing POST route you added earlier: router.put('/:id', function(req, res) {    // update the item in the collection    if(req.params.id && req.body.Title && req.body.Director && req.body.Year && req.body.Rating) {        _.each(json, function(elem, index) {             // find and update:            if (elem.Id === req.params.id) {                elem.Title = req.body.Title;                elem.Director = req.body.Director;                elem.Year = req.body.Year;                elem.Rating = req.body.Rating;            }        });          res.json(json);    } else {        res.json(500, { error: 'There was an error!' });    } }); This code again validates that the required fields are included with the data that was submitted along with the request. Then, it performs an _.each loop (using the underscore module) to look through the collection of movies and find the one whose Id parameter matches that of the Id included in the URL parameter. Assuming there's a match, the individual fields for that matched object are updated with the new values that were sent with the request. Once the loop is complete, the updated JSON data is sent back as the response. Similarly, in the POST request, if any of the required fields are missing, a simple 500 error message is returned. The following screenshot demonstrates a successful PUT request updating an existing record. 
The response from Postman after including the value 1 in the URL as the Id parameter, which provides the individual fields to update as x-www-form-urlencoded values, and finally sending as PUT shows that the original item in our movies collection is now the original Alien (not Aliens, its sequel as we originally had). Removing data – DELETE The final stop on our whirlwind tour of the different REST API HTTP verbs is DELETE. It should be no surprise that sending a DELETE request should do exactly what it sounds like. Let's add another route that accepts DELETE requests and will delete an item from our movies collection. Here is the code that takes care of DELETE requests that should be placed after the existing block of code from the previous PUT: router.delete('/:id', function(req, res) {    var indexToDel = -1;    _.each(json, function(elem, index) {        if (elem.Id === req.params.id) {            indexToDel = index;        }    });    if (~indexToDel) {        json.splice(indexToDel, 1);    }    res.json(json); }); This code will loop through the collection of movies and find a matching item by comparing the values of Id. If a match is found, the array index for the matched item is held until the loop is finished. Using the array.splice function, we can remove an array item at a specific index. Once the data has been updated by removing the requested item, the JSON data is returned. Notice in the following screenshot that the updated JSON that's returned is in fact no longer displaying the original second item we deleted. Note that ~ in there! That's a little bit of JavaScript black magic! The tilde (~) in JavaScript will bit flip a value. In other words, take a value and return the negative of that value incremented by one, that is ~n === -(n+1). Typically, the tilde is used with functions that return -1 as a false response. By using ~ on -1, you are converting it to a 0. If you were to perform a Boolean check on -1 in JavaScript, it would return true. You will see ~ is used primarily with the indexOf function and jQuery's $.inArray()—both return -1 as a false response. All of the endpoints defined in this article are extremely rudimentary, and most of these should never ever see the light of day in a production environment! Whenever you have an API that accepts anything other than GET requests, you need to be sure to enforce extremely strict validation and authentication rules. After all, you are basically giving your users direct access to your data. Consuming external APIs from Node.js There will undoubtedly be a time when you want to consume an API directly from within your Node.js code. Perhaps, your own API endpoint needs to first fetch data from some other unrelated third-party API before sending a response. Whatever the reason, the act of sending a request to an external API endpoint and receiving a response can be done fairly easily using a popular and well-known npm module called Request. Request was written by Mikeal Rogers and is currently the third most popular and (most relied upon) npm module after async and underscore. Request is basically a super simple HTTP client, so everything you've been doing with Postman REST Client so far is basically what Request can do, only the resulting data is available to you in your node code as well as the response status codes and/or errors, if any. Consuming an API endpoint using Request Let's do a neat trick and actually consume our own endpoint as if it was some third-party external API. 
First, we need to ensure we have Request installed and can include it in our app: $ npm install --save request Next, edit server.js and make sure you include Request as a required module at the start of the file: var express = require('express'),    bodyParser = require('body-parser'),    _ = require('underscore'),    json = require('./movies.json'),    app = express(),    request = require('request'); Now let's add a new endpoint after our existing routes, which will be an endpoint accessible in our server via a GET request to /external-api. This endpoint, however, will actually consume another endpoint on another server, but for the purposes of this example, that other server is actually the same server we're currently running! The Request module accepts an options object with a number of different parameters and settings, but for this particular example, we only care about a few. We're going to pass an object that has a setting for the method (GET, POST, PUT, and so on) and the URL of the endpoint we want to consume. After the request is made and a response is received, we want an inline callback function to execute. Place the following block of code after your existing list of routes in server.js: router.get('/external-api', function(req, res) {    request({            method: 'GET',            uri: 'http://localhost:' + (process.env.PORT || 3500),        }, function(error, response, body) {             if (error) { throw error; }              var movies = [];            _.each(JSON.parse(body), function(elem, index) {                movies.push({                    Title: elem.Title,                    Rating: elem.Rating                });            });            res.json(_.sortBy(movies, 'Rating').reverse());        }); }); The callback function accepts three parameters: error, response, and body. The response object is like any other response that Express handles and has all of the various parameters as such. The third parameter, body, is what we're really interested in. That will contain the actual result of the request to the endpoint that we called. In this case, it is the JSON data from our main GET route we defined earlier that returns our own list of movies. It's important to note that the data returned from the request is returned as a string. We need to use JSON.parse to convert that string to actual usable JSON data. Using the data that came back from the request, we transform it a little bit. That is, we take that data and manipulate it a bit to suit our needs. In this example, we took the master list of movies and just returned a new collection that consists of only the title and rating of each movie and then sorts the results by the top scores. Load this new endpoint by pointing your browser to http://localhost:3500/external-api, and you can see the new transformed JSON output to the screen. Let's take a look at another example that's a little more real world. Let's say that we want to display a list of similar movies for each one in our collection, but we want to look up that data somewhere such as www.imdb.com. Here is the sample code that will send a GET request to IMDB's JSON API, specifically for the word aliens, and returns a list of related movies by the title and year. 
Go ahead and place this block of code after the previous route for external-api:

router.get('/imdb', function(req, res) {
    request({
            method: 'GET',
            uri: 'http://sg.media-imdb.com/suggests/a/aliens.json',
        }, function(err, response, body) {
            var data = body.substring(body.indexOf('(') + 1);
            data = JSON.parse(data.substring(0, data.length - 1));
            var related = [];
            _.each(data.d, function(movie, index) {
                related.push({
                    Title: movie.l,
                    Year: movie.y,
                    Poster: movie.i ? movie.i[0] : ''
                });
            });

            res.json(related);
        });
});

If we take a look at this new endpoint in a browser, we can see that the JSON data returned from our /imdb endpoint is itself retrieved from some other API endpoint. Note that the JSON endpoint I'm using for IMDB isn't actually from their API, but rather what they use on their homepage when you type in the main search box. This would not really be the most appropriate way to use their data, but it's more of a hack to show this example. In reality, to use their API (like most other APIs), you would need to register and get an API key that you would use so that they can properly track how much data you are requesting on a daily or an hourly basis. Most APIs will require you to use a private key for this same reason.

Summary In this article, we took a brief look at how APIs work in general and the RESTful API approach to semantic URL paths and arguments, and we created a bare-bones API. We used Postman REST Client to interact with the API by consuming endpoints and testing the different types of request methods (GET, POST, PUT, and so on). You also learned how to consume an external API endpoint by using the third-party node module Request.

Resources for Article: Further resources on this subject: RESTful Services JAX-RS 2.0 [Article] REST – Where It Begins [Article] RESTful Web Services – Server-Sent Events (SSE) [Article]
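
If you would rather script a quick check of the API from this article than click through Postman, the same Request module can drive the endpoints. The following is only a hedged sketch: it assumes the server built above is running locally on port 3500, and the smoke-test.js filename and the movie values are made up for illustration.

// smoke-test.js -- an illustrative sketch, not part of the book's code
var request = require('request');

var base = 'http://localhost:3500';

// create a new movie with POST (sent as x-www-form-urlencoded, which
// matches the bodyParser.urlencoded() middleware configured earlier)
request({
    method: 'POST',
    uri: base + '/',
    form: {
        Id: '6',
        Title: 'Predator',
        Director: 'John McTiernan',
        Year: '1987',
        Rating: '7.8'
    }
}, function(error, response, body) {
    if (error) { throw error; }
    console.log('Movies after POST: ' + JSON.parse(body).length);

    // then remove the same record again with DELETE
    request({ method: 'DELETE', uri: base + '/6' }, function(err, res, delBody) {
        if (err) { throw err; }
        console.log('Movies after DELETE: ' + JSON.parse(delBody).length);
    });
});

Against a freshly started server, running node smoke-test.js should print 6 and then 5, mirroring the collection changes we observed in Postman.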

Mobility

Packt
19 Sep 2014
11 min read
In this article by Martyn Coupland, author of the book, Microsoft System Center Configuration Manager Advanced Deployments, we will explore some of these options and look at how they can help you manage mobility in your workforce. We'll cover the following topics: Deploying company resource profiles Managing roaming devices Integrating the Microsoft Exchange connector Using Windows Intune (For more resources related to this topic, see here.) Deploying company resource profiles One of the improvements that shipped with the R2 release of Configuration Manager 2012 was the ability to deploy company resources such as Wi-Fi profiles, certificates, and VPN profiles. This functionality really opened up the management story for organizations that already have a big take up of bring your own device or have mobility in their long-term strategy. You do not need Windows Intune to deploy company resource profiles. The company resource profiles are really useful in extending some of the services that you provide to domain-based clients using Group Policy. Some examples of this include deploying VPN and Wi-Fi profiles to domain clients using Group Policy preferences. As you cannot deploy a group policy to non-domain-joined devices, it becomes really useful to manage and deploy these via Configuration Manager. Another great use case for company resource profiles is deploying certificates. Configuration Manager includes the functionality to allow managed clients to have certificates enrolled to them. This can include those resources that rarely or never contact the domain. This scenario is becoming more common, so it is important that we have the capability to deploy these settings to users without relying on the domain. Managing Wi-Fi profiles with Configuration Manager The deployment of Wi-Fi profiles in Configuration Manager is very similar to that of a manual setup. The wizard provides you with the same options that you would expect to see should you configure the network manually within Windows. You can also configure a number of security settings, such as certificates for clients and server authentication. You can configure the following device types with Wi-Fi profiles: Windows 8.1 32-bit Windows 8.1 64-bit Windows RT 8.1 Windows Phone 8.1 iOS 5, iOS 6, and iOS 7 iOS 5, iOS 6, and iOS 7 Android devices that run Version 4 Configuring a Wi-Fi network profile in Configuration Manager is a simple process that is wizard-driven. First, in the Assets and Compliance workspace, expand Compliance Settings and Company Resource Access, and then click on Wi-Fi Profiles. Right-click on the node and select Create Wi-Fi Profile, or select this option from the home tab on the ribbon. On the general page of the wizard, provide a name for the profile. If required, you can add a description here as well. If you have exported the settings from a Windows 8.1 device, you can import them here as well. Click on Next in the Wi-Fi Profile page that you need to provide information about the network you want to connect to. Network Name is what is displayed on the users' device and so should be friendly for them. You also need to enter the SSID of the network. Make sure this is entered correctly as clients will use this to attempt to connect to the network. You can also specify other settings here, like you can in Windows, such as specifying whether we should connect if the network is not broadcasting or while the network is in range. Click on Next to continue to the security configuration page. 
Depending on the security, encryption, and Extensible Authentication Protocol (EAP) settings that you select, some items on this page of the wizard might not be available. As shown in the previous screenshot, the settings you configure here replicate those that you can configure in Windows when manually connecting to the network. On the Advanced Settings page of Create Wi-Fi Profile Wizard, specify any additional settings for the Wi-Fi profile. This can be the authentication mode, single sign-on options, and Federal Information Processing Standards (FIPS) compliance. If you require any proxy settings, you can also configure these on the next page as well as providing information on which platforms should process this profile. When the profile has been created, you can then right-click on the profile to deploy it to a collection. Managing certificates with Configuration Manager Deploying a certificate profile in Configuration Manager is actually a little quicker than creating a Wi-Fi profile. However, before you move on to deploying a certificate, you need some prerequisites in your environment. First, you need to deploy the Network Device Enrollment Service (NDES), which is part of the Certificate Services functionality in Windows Server. You can find guidance on deploying NDES in the Active Directory TechNet library at http://bit.ly/1kjpgxD. You must then install and configure at least one certificate registration point in the Configuration Manager hierarchy, and you can install this site system role in the central administration site or in a primary site. In the preceding screenshot, you can see the configuration screen in the wizard to deploy the certificate enrollment point in Configuration Manager. For the URL, enter the address in the https://<FQDN>/certsrv/mscep/mscep.dll format. For the root certificate, you should browse for the certificate file of your certificate authority. If you are using certificates in Configuration Manager, this will be the same certificate that you imported in the Client Communication tab in Site Settings. When this is configured on the server that runs the NDES, log on as a domain administrator and copy the files listed from the <ConfigMgrInstallationMedia>SMSSETUPPOLICYMODULEX64 folder on the Configuration Manager installation media to a folder on your server: PolicyModule.msi PolicyModuleSetup.exe On the Certificate Registration Point page, specify the URL of the certificate registration point and the virtual application name. The default virtual application name is CMCertificateRegistration. For example, if the site system server has an FQDN of scep1.contoso.com and you used the default virtual application name, specify https://scep1.contoso.com/CMCertificateRegistration. Creating certificate profiles Click on Certificate Profiles in the Assets and Compliance workspace under the Compliance Settings folder. On the General page, provide the name and description of the profile, and then provide information about the type of certificate that you want to deploy. Select the trusted CA certificate profile type if you want to deploy a trusted root certification authority (CA) or intermediate CA certificate, for example, you might want to deploy you own internal CA certificate to your own workgroup devices managed by Configuration Manager. Select the SCEP certificate profile type if you want to request a certificate for a user or device using the Simple Certificate Enrollment Protocol and the Network Device Enrollment Service role service. 
You will be provided with different settings depending on the option that you specify. If you select SCEP, then you will be asked about the number of retries and storage information about TPM. You can find specific information about each of the settings on the TechNet library at http://bit.ly/1n5CtZF. Configuring a trusted CA certificate is much simpler; provide the certificate settings and the destination store, as shown in the following screenshot: When you have finished configuring information on your certificate profile, select the supported platforms for the profile and continue through the wizard to create the profile. When it has been created, you can right-click on the profile to deploy it to a collection. Managing VPN profiles with Configuration Manager At a high level, the process to create VPN profiles is the same as creating Wi-Fi profiles; no prerequisites are required such as deploying certificates. Click on VPN Profiles in the Assets and Compliance workspace under the Compliance Settings folder. Create a new VPN profile, and on the initial screen, provide simple information about the profile. The following table provides an overview of which profiles are supported on which device: Connection type iOS Windows 8.1 Windows RT Windows RT 8.1 Windows Phone 8.1 Cisco AnyConnect Yes No No No No Juniper Pulse Yes Yes No Yes Yes F5 Edge Client Yes Yes No Yes Yes Dell SonicWALL Mobile Connect Yes Yes No Yes Yes Check Point Mobile VPN Yes Yes No Yes Yes Microsoft SSL (SSTP) No Yes Yes Yes No Microsoft Automatic No Yes Yes Yes No IKEv2 No Yes Yes Yes Yes PPTP Yes Yes Yes Yes No L2TP Yes Yes Yes Yes No Specific options will be required, depending on which technology you choose from the drop-down list. Ensure that the settings are specified, and move on to the profile information in the authentication method. If you require proxy settings with your VPN profile, then specify these settings on the Proxy Settings page of the wizard. See the following screenshot for an example of this screen: Continue through the wizard and select the supported profiles for the profile. When the profile is created, you can right-click on the profile and select Deploy. Managing Internet-based devices We have already looked at deploying certain company resources to those clients to whom we have very little connectivity on a regular basis. We can use Configuration Manager to manage these devices just like those domain-based clients over the Internet. This scenario works really well when the clients do not use VPN or DirectAccess, or maybe when we do not deploy a remote access solution for our remote users. This is where we can use Configuration Manager to manage clients using Internet-based client management (IBCM). How Internet-based client management works We have the ability to manage Internet-based clients in Configuration Manager by deploying certain site system roles in DMZ. By doing this, we make the management point, distribution point, and software update point Internet-facing and configure clients to connect to this while on the Internet. With these measures in place, we now have the ability to manage clients that are on the Internet, extending our management capabilities. Functionality in Internet-based client management In general, functionality will not be supported for Internet-based client management when we have to rely on network functionality that is not appropriate on a public network or relies some kind of communication with Active Directory. 
The following is not supported for Internet-based clients: Client push and software-update-based client deployment Automatic site assignment Network access protection Wake-On-LAN Operating system deployment Remote control Out-of-band management Software distribution for users is only supported when the Internet-based management point can authenticate the user in Active Directory using the Windows authentication. Requirements for Internet-based client management In terms of requirements, the list is fairly short but depending on your current setup, this might take a while to set up. The first seems fairly obvious, but any site system server or client must have Internet connectivity. This might mean some firewall configuration, depending on your configuration. A public key infrastructure (PKI) is also required. It must be able to deploy and manage certificates to clients that are on the Internet and site systems that are Internet-based. This does not mean deploying certificates over the public Internet. The following information can help you plan and deploy Internet-based client management in your environment: Planning for Internet-based client management (http://bit.ly/1p1qtsU) Planning for certificates (http://bit.ly/1kj9PFr) PKI certificate requirements (http://bit.ly/1hssMFM) Using Internet-based client management As the administrator, you have no additional concerns and requirements in terms of how you manage your clients when they are based on the Internet and are reporting to an Internet-facing management point. When you are administering clients that are Internet-based, you will see them report to the Internet-facing management point. This is the only thing you will see. You will see that the preceding features we listed are not working. The icon for the client in the list of devices does not change; this is one of the reasons the functionality is powerful, as it gives you many of the management capabilities you already perform on your premise devices. Lots of people will implement DirectAccess to get around the need to set up additional Configuration Manager Infrastructure and provisioning certificates. DirectAccess with the Manage Out functionality is a viable alternative. Summary In this article, we explored a number of ways in which you can manage the growing popularity of bring your own device and also look at how we can manage mobility in your user estate. We explored the deployment of profiles that contain settings for Wi-Fi profiles, VPN profiles for Windows, and other devices as well as deploying certificates via Configuration Manager. Resources for Article: Further resources on this subject: Wireless and Mobile Hacks [Article] Introduction to Mobile Forensics [Article] Getting Started with Microsoft Dynamics CRM 2013 Marketing [Article]

Redis in Autosuggest

Packt
18 Sep 2014
8 min read
In this article by Arun Chinnachamy, the author of Redis Applied Design Patterns, we are going to see how to use Redis to build a basic autocomplete or autosuggest server. Also, we will see how to build a faceting engine using Redis. To build such a system, we will use sorted sets and operations involving ranges and intersections. To summarize, we will focus on the following topics in this article: (For more resources related to this topic, see here.) Autocompletion for words Multiword autosuggestion using a sorted set Faceted search using sets and operations such as union and intersection Autosuggest systems These days autosuggest is seen in virtually all e-commerce stores in addition to a host of others. Almost all websites are utilizing this functionality in one way or another from a basic website search to programming IDEs. The ease of use afforded by autosuggest has led every major website from Google and Amazon to Wikipedia to use this feature to make it easier for users to navigate to where they want to go. The primary metric for any autosuggest system is how fast we can respond with suggestions to a user's query. Usability research studies have found that the response time should be under a second to ensure that a user's attention and flow of thought are preserved. Redis is ideally suited for this task as it is one of the fastest data stores in the market right now. Let's see how to design such a structure and use Redis to build an autosuggest engine. We can tweak Redis to suit individual use case scenarios, ranging from the simple to the complex. For instance, if we want only to autocomplete a word, we can enable this functionality by using a sorted set. Let's see how to perform single word completion and then we will move on to more complex scenarios, such as phrase completion. Word completion in Redis In this section, we want to provide a simple word completion feature through Redis. We will use a sorted set for this exercise. The reason behind using a sorted set is that it always guarantees O(log(N)) operations. While it is commonly known that in a sorted set, elements are arranged based on the score, what is not widely acknowledged is that elements with the same scores are arranged lexicographically. This is going to form the basis for our word completion feature. Let's look at a scenario in which we have the words to autocomplete: jack, smith, scott, jacob, and jackeline. In order to complete a word, we need to use n-gram. Every word needs to be written as a contiguous sequence. n-gram is a contiguous sequence of n items from a given sequence of text or speech. To find out more, check http://en.wikipedia.org/wiki/N-gram. For example, n-gram of jack is as follows: j ja jac jack$ In order to signify the completed word, we can use a delimiter such as * or $. To add the word into a sorted set, we will be using ZADD in the following way: > zadd autocomplete 0 j > zadd autocomplete 0 ja > zadd autocomplete 0 jac > zadd autocomplete 0 jack$ Likewise, we need to add all the words we want to index for autocompletion. Once we are done, our sorted set will look as follows: > zrange autocomplete 0 -1 1) "j" 2) "ja" 3) "jac" 4) "jack$" 5) "jacke" 6) "jackel" 7) "jackeli" 8) "jackelin" 9) "jackeline$" 10) "jaco" 11) "jacob$" 12) "s" 13) "sc" 14) "sco" 15) "scot" 16) "scott$" 17) "sm" 18) "smi" 19) "smit" 20) "smith$" Now, we will use ZRANK and ZRANGE operations over the sorted set to achieve our desired functionality. 
To autocomplete for ja, we have to execute the following commands: > zrank autocomplete jac 2 zrange autocomplete 3 50 1) "jack$" 2) "jacke" 3) "jackel" 4) "jackeli" 5) "jackelin" 6) "jackeline$" 7) "jaco" 8) "jacob$" 9) "s" 10) "sc" 11) "sco" 12) "scot" 13) "scott$" 14) "sm" 15) "smi" 16) "smit" 17) "smith$" Another example on completing smi is as follows: zrank autocomplete smi 17 zrange autocomplete 18 50 1) "smit" 2) "smith$" Now, in our program, we have to do the following tasks: Iterate through the results set. Check if the word starts with the query and only use the words with $ as the last character. Though it looks like a lot of operations are performed, both ZRANGE and ZRANK are O(log(N)) operations. Therefore, there should be virtually no problem in handling a huge list of words. When it comes to memory usage, we will have n+1 elements for every word, where n is the number of characters in the word. For M words, we will have M(avg(n) + 1) records where avg(n) is the average characters in a word. The more the collision of characters in our universe, the less the memory usage. In order to conserve memory, we can use the EXPIRE command to expire unused long tail autocomplete terms. Multiword phrase completion In the previous section, we have seen how to use the autocomplete for a single word. However, in most real world scenarios, we will have to deal with multiword phrases. This is much more difficult to achieve as there are a few inherent challenges involved: Suggesting a phrase for all matching words. For instance, the same manufacturer has a lot of models available. We have to ensure that we list all models if a user decides to search for a manufacturer by name. Order the results based on overall popularity and relevance of the match instead of ordering lexicographically. The following screenshot shows the typical autosuggest box, which you find in popular e-commerce portals. This feature improves the user experience and also reduces the spell errors: For this case, we will use a sorted set along with hashes. We will use a sorted set to store the n-gram of the indexed data followed by getting the complete title from hashes. Instead of storing the n-grams into the same sorted set, we will store them in different sorted sets. Let's look at the following scenario in which we have model names of mobile phones along with their popularity: For this set, we will create multiple sorted sets. Let's take Apple iPhone 5S: ZADD a 9 apple_iphone_5s ZADD ap 9 apple_iphone_5s ZADD app 9 apple_iphone_5s ZADD apple 9 apple_iphone_5s ZADD i 9 apple_iphone_5s ZADD ip 9 apple_iphone_5s ZADD iph 9 apple_iphone_5s ZADD ipho 9 apple_iphone_5s ZADD iphon 9 apple_iphone_5s ZADD iphone 9 apple_iphone_5s ZADD 5 9 apple_iphone_5s ZADD 5s 9 apple_iphone_5s HSET titles apple_iphone_5s "Apple iPhone 5S" In the preceding scenario, we have added every n-gram value as a sorted set and created a hash that holds the original title. Likewise, we have to add all the titles into our index. Searching in the index Now that we have indexed the titles, we are ready to perform a search. Consider a situation where a user is querying with the term apple. We want to show the user the five best suggestions based on the popularity of the product. Here's how we can achieve this: > zrevrange apple 0 4 withscores 1) "apple_iphone_5s" 2) 9.0 3) "apple_iphone_5c" 4) 6.0 As the elements inside the sorted set are ordered by the element score, we get the matches ordered by the popularity which we inserted. 
To get the original title, type the following command:

> hmget titles apple_iphone_5s
1) "Apple iPhone 5S"

In the preceding scenario, the query was a single word. Now, imagine that the user types multiple words, such as Samsung nex, and we have to suggest the autocomplete as Samsung Galaxy Nexus. To achieve this, we will use ZINTERSTORE as follows:

> zinterstore samsung_nex 2 samsung nex aggregate max

ZINTERSTORE destination numkeys key [key ...] [WEIGHTS weight [weight ...]] [AGGREGATE SUM|MIN|MAX]

This computes the intersection of the sorted sets given by the specified keys and stores the result in a destination. It is mandatory to provide the number of input keys before passing the input keys and other (optional) arguments. For more information about ZINTERSTORE, visit http://redis.io/commands/ZINTERSTORE. The previous command, zinterstore samsung_nex 2 samsung nex aggregate max, will compute the intersection of the two sorted sets, samsung and nex, and store it in another sorted set, samsung_nex. To see the result, type the following commands:

> zrevrange samsung_nex 0 4 withscores
1) samsung_galaxy_nexus
2) 7

> hmget titles samsung_galaxy_nexus
1) Samsung Galaxy Nexus

If you want to cache the result for multiword queries and remove it automatically, use the EXPIRE command to set an expiry on the temporary keys.

Summary In this article, we have seen how to perform autosuggest and faceted searches using Redis. We have also understood how sorted sets and sets work. We have also seen how Redis can be used as a backend for a simple faceting and autosuggest system and make that system ultrafast.

Further resources on this subject: Using Redis in a hostile environment (Advanced) [Article] Building Applications with Spring Data Redis [Article] Implementing persistence in Redis (Intermediate) [Article]
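
To show how the single-word lookup described earlier might be wired into application code, here is a small, hedged sketch using the redis-py client. The autocomplete key name and the '$' terminator match the examples above, but the function itself, the 50-entry scan window, and the five-suggestion limit are illustrative assumptions:

import redis

r = redis.StrictRedis(host='localhost', port=6379)

def autocomplete_word(prefix, limit=5):
    """Return up to `limit` completed words for `prefix` from the
    single-word index stored in the 'autocomplete' sorted set."""
    start = r.zrank('autocomplete', prefix)
    if start is None:
        return []          # prefix is not an indexed n-gram
    results = []
    for entry in r.zrange('autocomplete', start, start + 50):
        entry = entry.decode('utf-8')
        if not entry.startswith(prefix):
            break                       # past the range of matching n-grams
        if entry.endswith('$'):         # '$' marks a completed word
            results.append(entry.rstrip('$'))
        if len(results) >= limit:
            break
    return results

# Example: autocomplete_word('ja') -> ['jack', 'jackeline', 'jacob']

The same pattern applies to the multiword index: run ZINTERSTORE into a temporary key, call zrevrange on it, and look the winning members up in the titles hash.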

Waiting for AJAX, as always…

Packt
18 Sep 2014
6 min read
In this article by Dima Kovalenko, author of the book Selenium Design Patterns and Best Practices, we will learn how test automation has progressed over time. Test automation was simpler in the good old days, before asynchronous page loading became mainstream. Previously, a test would click on a button, causing the whole page to reload; after the new page loaded, we could check whether any errors were displayed. The act of waiting for the page to load guaranteed that all of the items on the page were already there, and if an expected element was missing, our test could fail with confidence. Now, an element might be missing for several seconds and magically show up after an unspecified delay. The only thing for a test to do is become smarter! (For more resources related to this topic, see here.)

Filling out credit card information is a common test for any online store. Let's take a look at a typical credit card form: our form has some default values for the user to fill out, and a quick JavaScript check that the required information was entered, adding Done next to each filled-out input field. Once all of the fields have been filled out and appear correct, JavaScript makes the Purchase button clickable. Clicking on the button triggers an AJAX request for the purchase, followed by a successful purchase message. Very simple and straightforward; anyone who has made an online purchase has seen some variation of this form. Writing a quick test to fill out the form and make sure the purchase is complete should be a breeze!

Testing AJAX with the sleep method Let's take a look at a simple test written to test this form. Our tests are written in Ruby for this demonstration for ease of readability. However, this technique will work in Java or any other programming language you may choose to use. To follow along with this article, please make sure you have Ruby and the selenium-webdriver gem installed. Installers for both can be found at https://www.ruby-lang.org/en/installation/ and http://rubygems.org/gems/selenium-webdriver. Our test file starts with the dependencies and test fixtures: if this code looks like a foreign language to you, don't worry; we will walk through it until it all makes sense. The first three lines of the test file specify all of the dependencies, such as the selenium-webdriver gem. On line five, we declare our test class as TestAjax, which inherits its behavior from the Test::Unit framework we required on line two. The setup and teardown methods take care of the Selenium instance for us. In the setup, we create a new instance of the Firefox browser and navigate to a page that contains the mentioned form; the teardown method closes the browser after the test is complete.

Now let's look at the test itself: lines 17 to 21 fill out the purchase form with some test data, followed by an assertion that the Purchase complete! text appears in the DIV with the ID of success. Let's run this test to see if it passes. The output of our test run shows a failure: our test fails because it was expecting to see Purchase complete!, but no text was found, because the AJAX request took a much longer time than expected.
While the request runs, the form displays an AJAX request in progress indicator. Since this AJAX request can take anywhere from 15 to 30 seconds to complete, the most logical next step is to add a pause between the click on the Purchase button and the test assertion. However, this obvious solution is really bad for two reasons. First, if the majority of AJAX requests take 15 seconds to run, then our test wastes another 15 seconds waiting instead of continuing. Second, if our test environment is under heavy load, the AJAX request can take as long as 45 seconds to complete, so our test will fail. The better choice is to make our tests smart enough to wait for the AJAX request to complete, instead of using a sleep method.

Using smart AJAX waits To solve the shortcomings of the sleep method, we will create a new method called wait_for_ajax. In this method, we use the Wait class built into WebDriver. The until method in the Wait class allows us to pause the test execution for an arbitrary reason; in this case, to sleep for 1 second, on line 29, and to execute a JavaScript command in the browser with the help of the execute_script method. This method allows us to run a JavaScript snippet in the current browser window on the current page, which gives us access to all of the variables and methods that JavaScript has. The snippet of JavaScript that we are sending to the browser is a query against the jQuery framework. The active counter in jQuery holds the number of currently active AJAX requests; zero means that the page is fully loaded and there are no background HTTP requests happening. On line 30, we ask execute_script to return the current count of active AJAX requests happening on the page, and if the returned value equals 0, we break out of the Wait loop. Once the loop is broken, our tests can continue on their way. Note that the upper limit of the wait_for_ajax method is set to 60 seconds on line 28. This value can be increased or decreased, depending on how slow the test environment is. Let's replace the sleep method call with our newly created method and run our tests one more time to see a passing result.

Now that we have stabilized our test against slow and unpredictable AJAX requests, we need to add a method that will wait for JavaScript animations to finish. These animations can break our tests just as much as the AJAX requests. Also, our tests are incredibly vulnerable to third-party slowness, such as when the Facebook Like button takes a long time to load.

Summary This article introduced a simple method that intelligently waits for all of the AJAX requests to complete, increasing the overall stability of our test and test suite. Furthermore, we have removed a wasteful pause that added unnecessary delay to our test execution. In conclusion, we have improved the test stability while at the same time making our tests run faster!

Resources for Article: Further resources on this subject: Quick Start into Selenium Tests [article] Behavior-driven Development with Selenium WebDriver [article] Exploring Advanced Interactions of WebDriver [article]
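
The wait_for_ajax implementation itself only appears in the book's screenshots, which are not reproduced here. Based on the description above (the Wait class, an upper limit of 60 seconds, a 1-second sleep, and an execute_script call that checks jQuery's active request counter), a minimal sketch could look like the following; the @selenium variable name is an assumption carried over from the setup method:

# A hedged reconstruction of the helper described above, not the book's exact code.
def wait_for_ajax(timeout = 60)
  wait = Selenium::WebDriver::Wait.new(timeout: timeout)  # upper limit of 60 seconds
  wait.until do
    sleep 1   # pause briefly between checks
    # jQuery.active is 0 once no AJAX requests are in flight
    @selenium.execute_script('return jQuery.active') == 0
  end
end

With a helper like this in place, the test simply calls wait_for_ajax after clicking on the Purchase button instead of sleeping for a fixed number of seconds.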

Caches

Packt
18 Sep 2014
18 min read
In this article by Federico Razzoli, author of the book Mastering MariaDB, we will see how, in order to avoid accessing disks, MariaDB and its storage engines have several caches that a DBA should know about. (For more resources related to this topic, see here.)

InnoDB caches Since InnoDB is the recommended engine for most use cases, configuring it is very important. The InnoDB buffer pool is a cache that should speed up most read and write operations. Thus, every DBA should know how it works. The doublewrite buffer is an important mechanism that guarantees that a row is never half-written to a file. For write-heavy workloads, we may want to disable it to obtain more speed.

InnoDB pages Tables, data, and indexes are organized in pages, both in the caches and in the files. A page is a package of data that contains one or two rows and usually some empty space. The ratio between the used space and the total size of the pages is called the fill factor. By changing the page size, the fill factor inevitably changes. InnoDB tries to keep the pages 15/16 full. If a page's fill factor is lower than 1/2, InnoDB merges it with another page. If the rows are written sequentially, the fill factor should be about 15/16. If the rows are written randomly, the fill factor is between 1/2 and 15/16. A low fill factor represents wasted memory. With a very high fill factor, when pages are updated and their content grows, they often need to be reorganized, which negatively affects performance. Columns with a variable-length type (TEXT, BLOB, VARCHAR, or VARBIT) are written into separate data structures called overflow pages. Such columns are called off-page columns. They are better handled by the DYNAMIC row format, which can be used for most tables when backward compatibility is not a concern. A page never changes its size, and the size is the same for all pages. The page size, however, is configurable: it can be 4 KB, 8 KB, or 16 KB. The default size is 16 KB, which is appropriate for many workloads and optimizes full table scans. However, smaller sizes can improve the performance of some OLTP workloads involving many small insertions, because of lower memory allocation, or of storage devices with smaller blocks (old SSD devices). Another reason to change the page size is that it can greatly affect InnoDB compression. The page size can be changed by setting the innodb_page_size variable in the configuration file and restarting the server.

The InnoDB buffer pool On servers that mainly use InnoDB tables (the most common case), the buffer pool is the most important cache to consider. Ideally, it should contain all the InnoDB data and indexes so that MariaDB can execute queries without accessing the disks. Changes to data are written into the buffer pool first. They are flushed to the disks later to reduce the number of I/O operations. Of course, if the data does not fit in the server's memory, only a subset of it can be in the buffer pool. In this case, that subset should be the so-called working set: the most frequently accessed data. The default size of the buffer pool is 128 MB and should always be changed. On production servers, this value is too low. On a developer's computer, there is usually no need to dedicate so much memory to InnoDB; the minimum size, 5 MB, is usually more than enough when developing a simple application.

Old and new pages We can think of the buffer pool as a list of data pages that are sorted with a variation of the classic Least Recently Used (LRU) algorithm.
The list is split into two sublists: the new list contains the most recently used pages, and the old list contains the less used pages. The first page in each sublist is called the head, and the head of the old list is called the midpoint. When a page that is not in the buffer pool is accessed, it is inserted at the midpoint; the other pages in the old list shift by one position, and the last one is evicted. When a page from the old list is accessed, it is moved to the head of the new list. When a page in the new list is accessed, it goes to the head of that list.

The following variables affect this algorithm:

innodb_old_blocks_pct: This variable defines the percentage of the buffer pool reserved for the old list. The allowed range is 5 to 95, and the default is 37 (3/8).

innodb_old_blocks_time: If this value is not 0, it represents the minimum age (in milliseconds) that old pages must reach before they can be moved into the new list. If an old page that has not reached this age is accessed, it goes to the head of the old list instead.

innodb_max_dirty_pages_pct: This variable defines the maximum percentage of dirty pages, that is, pages that have been modified in memory but not yet flushed. This mechanism is discussed in the Dirty pages section later in this article. The value is not a hard limit, but InnoDB tries not to exceed it. The allowed range is 0 to 100, and the default is 75. Increasing this value can reduce the rate of writes, but the shutdown will take longer (because dirty pages need to be written to disk before the server can be stopped cleanly).

innodb_flush_neighbors: If set to 1, when a dirty page is flushed from memory to disk, the contiguous dirty pages are flushed as well. If set to 2, all dirty pages from the same extent (a 1 MB portion of the tablespace) are flushed. If set to 0, no neighbors are flushed: dirty pages are written individually when their number exceeds innodb_max_dirty_pages_pct or when they are evicted from the buffer pool. The default is 1. This optimization is only useful for spinning disks. Write-intensive workloads may need an aggressive flushing strategy; however, if the same pages are written too often, performance degrades.

Buffer pool instances

On MariaDB versions older than 5.5, InnoDB creates only one buffer pool instance. Concurrent threads are blocked by a mutex, and this may become a bottleneck, particularly if the concurrency level is high and the buffer pool is very big. Splitting the buffer pool into multiple instances can solve the problem. Multiple instances are an advantage only if the buffer pool size is at least 2 GB; each instance should be about 1 GB in size. InnoDB will ignore the setting and maintain only one instance if the buffer pool size is less than 1 GB. Furthermore, this feature is more useful on 64-bit systems.

The following variables control the instances and their size (a sample configuration follows this list):

innodb_buffer_pool_size: This variable defines the total size of the buffer pool (not the size of a single instance). Note that the real size will be about 10 percent bigger than this value. A percentage of this memory is dedicated to the change buffer.

innodb_buffer_pool_instances: This variable defines the number of instances. If the value is -1, InnoDB will automatically decide the number of instances. The maximum value is 64. The default value is 8 on Unix and depends on the innodb_buffer_pool_size variable on Windows.
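As an illustration, a dedicated MariaDB server with plenty of RAM might carve up the buffer pool as follows in its configuration file. The figures are only an example chosen to show the relationship between the two variables, not a recommendation for any particular workload:

[mysqld]
# 10 GB buffer pool split into 10 instances of roughly 1 GB each
innodb_buffer_pool_size      = 10G
innodb_buffer_pool_instances = 10
# Keep the usual old/new split, and protect the new list from one-off scans
innodb_old_blocks_pct  = 37
innodb_old_blocks_time = 1000

A server restart is needed for the buffer pool size and the number of instances to take effect, because these variables cannot be changed at runtime in this MariaDB version.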
Dirty pages

When a user executes a statement that modifies data in the buffer pool, InnoDB initially modifies the data in memory only. The pages that have been modified only in the buffer pool are called dirty pages. Pages that have not been modified, or whose changes have already been written to disk, are called clean pages.

Note that changes to data are also written to the redo log. If a crash occurs before those changes are applied to the data files, InnoDB is usually able to recover the data, including the last modifications, by reading the redo log and the doublewrite buffer. The doublewrite buffer is discussed later, in the Explaining the doublewrite buffer section.

At some point, the data needs to be flushed to the InnoDB data files (the .ibd files). In MariaDB 10.0, this is done by a dedicated thread called the page cleaner. In older versions, it was done by the master thread, which executes several InnoDB maintenance operations. The flushing does not only concern the buffer pool, but also the InnoDB redo and undo logs.

The list of dirty pages is frequently updated as transactions write data at the physical level. It has its own mutex, which does not lock the whole buffer pool. The maximum percentage of dirty pages is determined by innodb_max_dirty_pages_pct. When this limit is reached, dirty pages are flushed. The innodb_flush_neighbor_pages value determines how InnoDB selects the pages to flush: if it is set to none, only the selected pages are written; if it is set to area, the neighboring dirty pages are written as well; if it is set to cont, all contiguous blocks of dirty pages are flushed.

On shutdown, a complete page flushing is only done if innodb_fast_shutdown is 0. Normally, this method should be preferred, because it leaves data in a consistent state. However, if many changes have been requested but not yet written to disk, this process can be very slow. It is possible to speed up the shutdown by specifying a higher value for innodb_fast_shutdown; in that case, a crash recovery will be performed on the next restart.

The read ahead optimization

The read ahead feature is designed to reduce the number of read operations from the disks. It tries to guess which data will be needed in the near future and reads it with one operation. Two algorithms are available to choose the pages to read in advance: the linear read ahead and the random read ahead.

The linear read ahead is used by default. It counts the pages in the buffer pool that are read sequentially; if their number is greater than or equal to innodb_read_ahead_threshold, InnoDB reads all the data from the same extent (a portion of data whose size is always 1 MB). The innodb_read_ahead_threshold value must be a number from 0 to 64. The value 0 disables the linear read ahead but does not enable the random read ahead. The default value is 56.

The random read ahead is only used if the innodb_random_read_ahead server variable is set to ON; by default, it is OFF. This algorithm checks whether at least 13 pages from the same extent are in the buffer pool, no matter whether they were read sequentially. If so, the full extent is read. The 13-page threshold is not configurable.

If innodb_read_ahead_threshold is set to 0 and innodb_random_read_ahead is set to OFF, the read ahead optimization is completely turned off.
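Both variables are dynamic, so the read ahead behavior can be adjusted at runtime while observing its effect on the counters described in the next section. A minimal sketch (the values are arbitrary examples, not recommendations):

SET GLOBAL innodb_read_ahead_threshold = 32;  -- trigger the linear read ahead earlier
SET GLOBAL innodb_random_read_ahead = ON;     -- also enable the random algorithm
-- To switch the optimization off completely:
-- SET GLOBAL innodb_read_ahead_threshold = 0;
-- SET GLOBAL innodb_random_read_ahead = OFF;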
Diagnosing the buffer pool performance

MariaDB provides some tools to monitor the activities of the buffer pool and the InnoDB main thread. By inspecting these activities, a DBA can tune the relevant server variables to improve performance. In this section, we will discuss the SHOW ENGINE INNODB STATUS SQL statement and the INNODB_BUFFER_POOL_STATS table in the information_schema database. While the latter provides more information about the buffer pool, the SHOW ENGINE INNODB STATUS output is easier to read.

The INNODB_BUFFER_POOL_STATS table contains the following columns:

POOL_ID: Each buffer pool instance has a different ID.
POOL_SIZE: Size (in pages) of the instance.
FREE_BUFFERS: Number of free pages.
DATABASE_PAGES: Total number of data pages.
OLD_DATABASE_PAGES: Pages in the old list.
MODIFIED_DATABASE_PAGES: Dirty pages.
PENDING_DECOMPRESS: Number of pages that need to be decompressed.
PENDING_READS: Pending read operations.
PENDING_FLUSH_LRU: Pages in the old or new lists that need to be flushed.
PENDING_FLUSH_LIST: Pages in the flush list that need to be flushed.
PAGES_MADE_YOUNG: Number of pages moved into the new list.
PAGES_NOT_MADE_YOUNG: Old pages that did not become young.
PAGES_MADE_YOUNG_RATE: Pages made young per second. This value is reset each time it is shown.
PAGES_MADE_NOT_YOUNG_RATE: Pages read but not made young (because they did not reach the minimum age) per second. This value is reset each time it is shown.
NUMBER_PAGES_READ: Number of pages read from disk.
NUMBER_PAGES_CREATED: Number of pages created in the buffer pool.
NUMBER_PAGES_WRITTEN: Number of pages written to disk.
PAGES_READ_RATE: Pages read from disk per second.
PAGES_CREATE_RATE: Pages created in the buffer pool per second.
PAGES_WRITTEN_RATE: Pages written to disk per second.
NUMBER_PAGES_GET: Number of logical read requests for pages.
HIT_RATE: Rate of page hits.
YOUNG_MAKE_PER_THOUSAND_GETS: Pages made young per thousand page gets.
NOT_YOUNG_MAKE_PER_THOUSAND_GETS: Pages that remain in the old list per thousand page gets.
NUMBER_PAGES_READ_AHEAD: Number of pages read with a read ahead operation.
NUMBER_READ_AHEAD_EVICTED: Number of pages read with a read ahead operation that were never used and were then evicted.
READ_AHEAD_RATE: Similar to NUMBER_PAGES_READ_AHEAD, but as a per-second rate.
READ_AHEAD_EVICTED_RATE: Similar to NUMBER_READ_AHEAD_EVICTED, but as a per-second rate.
LRU_IO_TOTAL: Total number of pages read or written to disk.
LRU_IO_CURRENT: Pages read or written to disk within the last second.
UNCOMPRESS_TOTAL: Pages that have been uncompressed.
UNCOMPRESS_CURRENT: Pages that have been uncompressed within the last second.

The per-second values are reset after they are shown. The PAGES_MADE_YOUNG_RATE and PAGES_MADE_NOT_YOUNG_RATE values show us, respectively, how often old pages become new and how many old pages are never accessed within a reasonable amount of time. If the former value is too high, the old list is probably not big enough, and vice versa.

Comparing READ_AHEAD_RATE and READ_AHEAD_EVICTED_RATE is useful to tune the read ahead feature. The READ_AHEAD_EVICTED_RATE value should be low, because it indicates how many pages read by the read ahead operations were not useful. If that ratio is good but READ_AHEAD_RATE is low, the read ahead should probably be used more often: if the linear read ahead is in use, we can try to increase or decrease innodb_read_ahead_threshold, or switch between the linear and random algorithms.

The columns whose names end with _RATE best describe the current server activity. They should be examined several times a day, and over a whole week or month, perhaps with the help of one or more monitoring tools.
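For a quick spot check without an external tool, the table can also be queried directly. A minimal example that pulls the most interesting counters for every buffer pool instance:

SELECT POOL_ID, POOL_SIZE, FREE_BUFFERS, MODIFIED_DATABASE_PAGES,
       HIT_RATE, PAGES_MADE_YOUNG_RATE, PAGES_MADE_NOT_YOUNG_RATE,
       READ_AHEAD_RATE, READ_AHEAD_EVICTED_RATE
FROM information_schema.INNODB_BUFFER_POOL_STATS;

Remember that the _RATE columns are reset every time they are read, so two consecutive executions will not return the same values.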
Good, free software monitoring tools include Cacti and Nagios. The Percona Monitoring Tools package includes MariaDB (and MySQL) plugins that provide an interface to these tools.

Dumping and loading the buffer pool

In some cases, we may want to save the current contents of the buffer pool and reload them later. The most common case is when the server is stopped. Normally, on startup, the buffer pool is empty and InnoDB needs to fill it with useful data. This process is called warm-up, and until it is complete, InnoDB performance is lower than usual.

Two variables help avoid the warm-up phase: innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup. If their value is ON, InnoDB automatically saves the buffer pool into a file at shutdown and restores it at startup. Their default value is OFF. Turning them ON can be very useful, but remember the caveats:

The startup and shutdown times might be longer. In some cases, we might prefer MariaDB to start more quickly, even if it is slower during warm-up.
We need enough disk space to store the buffer pool dump file.

The user may also want to dump the buffer pool at any moment and restore it later without restarting the server. This is advisable when the buffer pool contents are optimal and some statements are about to heavily change them. A common example is when a big InnoDB table is fully scanned, for instance during a logical backup: a full table scan fills the old list with data that is not accessed frequently. A good way to work around the problem is to dump the buffer pool before the table scan and reload it afterwards (a sketch of this procedure follows).

This operation is performed by setting two special variables: innodb_buffer_pool_dump_now and innodb_buffer_pool_load_now. Reading these variables always returns OFF. Setting the first one to ON forces InnoDB to immediately dump the buffer pool into a file; setting the second one to ON forces InnoDB to load the buffer pool from that file. In both cases, the progress of the dump or load operation is reported by the Innodb_buffer_pool_dump_status and Innodb_buffer_pool_load_status status variables. If loading the buffer pool takes too long, it can be stopped by setting innodb_buffer_pool_load_abort to ON.

The name and path of the dump file are specified in the innodb_buffer_pool_filename server variable. Of course, we should make sure that the chosen directory can contain the file, but the file is much smaller than the memory used by the buffer pool.
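As a sketch of the manual procedure described above, wrapped around a logical backup or another large scan (only the statement order matters here; the backup itself is whatever tool you normally use):

-- Before the big scan: snapshot the current buffer pool contents
SET GLOBAL innodb_buffer_pool_dump_now = ON;
SHOW STATUS LIKE 'Innodb_buffer_pool_dump_status';

-- ... run the logical backup / full table scan here ...

-- Afterwards: reload the snapshot to undo the pollution of the buffer pool
SET GLOBAL innodb_buffer_pool_load_now = ON;
SHOW STATUS LIKE 'Innodb_buffer_pool_load_status';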
InnoDB change buffer

The change buffer is a cache that is part of the buffer pool. It contains dirty pages related to secondary indexes (not primary keys) that are not stored in the main part of the buffer pool. If the modified data is read later, it is merged into the buffer pool. In older versions, this buffer was called the insert buffer; it has been renamed because it can now handle deletions too. The change buffer speeds up the following write operations:

Insertions: When new rows are written.
Deletions: When existing rows are marked for deletion but not yet physically erased, for performance reasons.
Purges: The physical elimination of previously marked rows and obsolete index values. This is periodically done by a dedicated thread.

In some cases, we may want to disable the change buffer. For example, we may have a working set that only fits in memory if the change buffer is discarded; even after disabling it, we will still have all the frequently accessed secondary indexes in the buffer pool. Also, DML statements may be rare for our database, or we may have just a few secondary indexes: in these cases, the change buffer does not help.

The change buffer can be configured using the following variables:

innodb_change_buffer_max_size: The maximum size of the change buffer, expressed as a percentage of the buffer pool. The allowed range is 0 to 50, and the default value is 25.
innodb_change_buffering: This determines which types of operations are cached by the change buffer. The allowed values are none (to disable the buffer), all, inserts, deletes, purges, and changes (to cache inserts and deletes, but not purges). The default value is all.

Explaining the doublewrite buffer

When InnoDB writes a page to disk, at least two events can interrupt the operation after it has started: a hardware failure or an OS failure. In the case of an OS failure, a half-written page should not be possible if the pages are not bigger than the blocks written by the system. The InnoDB redo and undo logs are not sufficient to recover a half-written page, because they only contain page IDs, not the page data (storing only the IDs keeps the logs small and fast).

To avoid half-written pages, InnoDB uses the doublewrite buffer. This mechanism involves writing every page twice; a page is only considered valid after the second write is complete. When the server restarts, if a recovery occurs, half-written pages are discarded. The doublewrite buffer has a small impact on performance, because the writes are sequential and are flushed to disk together.

It is still possible to disable the doublewrite buffer by setting the innodb_doublewrite variable to OFF in the configuration file or by starting the server with the --skip-innodb-doublewrite parameter. This can be done if data correctness is not important. If performance is very important and we use a fast storage device, we may notice the overhead caused by the additional disk writes; but if data correctness is important to us, we do not want to simply disable the mechanism. MariaDB provides an alternative called atomic writes. These writes behave like a transaction: they completely succeed or completely fail, so half-written data is not possible. However, MariaDB does not implement this mechanism directly, so it can only be used on FusionIO storage devices using the DirectFS filesystem. FusionIO flash memories are very fast flash memories that can be used as block storage or DRAM memory. To enable this alternative mechanism, we can set innodb_use_atomic_writes to ON; this automatically disables the doublewrite buffer.

Summary

In this article, we discussed the main MariaDB buffers. The most important ones are the caches used by the storage engines. We dedicated much space to the InnoDB buffer pool, because it is more complex and, usually, InnoDB is the most used storage engine.

Resources for Article:

Further resources on this subject:
Building a Web Application with PHP and MariaDB – Introduction to caching [article]
Installing MariaDB on Windows and Mac OS X [article]
Using SHOW EXPLAIN with running queries [article]
Index, Item Sharding, and Projection in DynamoDB

Packt
17 Sep 2014
13 min read
Understanding the secondary index and projections should go hand in hand, because a secondary index cannot be used efficiently without specifying a projection. In this article by Uchit Vyas and Prabhakaran Kuppusamy, authors of DynamoDB Applied Design Patterns, we will take a look at local and global secondary indexes, and at projection and its usage with indexes. (For more resources related to this topic, see here.)

The use of projection in DynamoDB is pretty much similar to that of traditional databases. However, here are a few things to watch out for:

Whenever a DynamoDB table is created, it is mandatory to create a primary key, which can be of a simple type (hash type) or of a complex type (hash and range key).
For the specified primary key, an index will be created (we call this index the primary index).
Along with this primary index, the user is allowed to create up to five secondary indexes per table.
There are two kinds of secondary index: the local secondary index (in which the hash key of the index must be the same as that of the table) and the global secondary index (in which the hash key can be any field).
In both of these secondary index types, the range key can be any field that the user needs to create an index for.

Secondary indexes

A quick question: why does a query that includes the primary key field (especially in the where condition) return results much faster than one that does not? This is because, in most databases, an index is created automatically for the primary key field. This is the case with DynamoDB too, and this index is called the primary index of the table. No customization is possible using the primary index, so it is seldom discussed.

In order to make retrieval faster, the frequently retrieved attributes need to be part of an index. However, a DynamoDB table can have only one primary index, and that index can have a maximum of two attributes (hash and range key). So, for faster retrieval, the user should be given the privilege to create user-defined indexes. An index created by the user is called a secondary index.

Similar to the table key schema, the secondary index also has a key schema. Based on the key schema attributes, the secondary index can be either a local or a global secondary index. Whenever a secondary index exists, the items in the index are rearranged on every item insertion into the table, provided the item contains both the index's hash and range key attributes.

Projection

Once we have an understanding of the secondary index, we are all set to learn about projection. While creating the secondary index, it is mandatory to specify the hash and range attributes based on which the index is created. Apart from these two attributes, if the query needs one or more other attributes (assuming that none of these attributes is projected into the index), then DynamoDB will scan the entire table. This consumes a lot of throughput capacity and has comparatively higher latency.
The following is the table (with some data) that is used to store book information. Here are a few more details about the table:

The BookTitle attribute is the hash key of the table and of the local secondary index
The Edition attribute is the range key of the table
The PubDate attribute is the range key of the index (let's call this index Idx_PubDate)

Local secondary index

While creating the secondary index, the hash and range keys of the table and index will be inserted into the index; optionally, the user can specify what other attributes need to be added. There are three kinds of projection possible in DynamoDB:

KEYS_ONLY: Using this, the index consists of the hash and range key values of the table and index
INCLUDE: Using this, the index consists of the attributes in KEYS_ONLY plus other non-key attributes that we specify
ALL: Using this, the index consists of all of the attributes from the source table

The following code shows the creation of a local secondary index named Idx_PubDate with BookTitle as the hash key (which is a must in the case of a local secondary index), PubDate as the range key, and the KEYS_ONLY projection:

private static LocalSecondaryIndex getLocalSecondaryIndex() {
  ArrayList<KeySchemaElement> indexKeySchema = new ArrayList<KeySchemaElement>();
  indexKeySchema.add(new KeySchemaElement()
    .withAttributeName("BookTitle")
    .withKeyType(KeyType.HASH));
  indexKeySchema.add(new KeySchemaElement()
    .withAttributeName("PubDate")
    .withKeyType(KeyType.RANGE));
  LocalSecondaryIndex lsi = new LocalSecondaryIndex()
    .withIndexName("Idx_PubDate")
    .withKeySchema(indexKeySchema)
    .withProjection(new Projection()
      .withProjectionType("KEYS_ONLY"));
  return lsi;
}

The KEYS_ONLY projection type creates the smallest possible index, and ALL creates the biggest possible one. We will discuss the trade-offs between these index types a little later. Going back to our example, let us assume that we are using the KEYS_ONLY projection, so none of the attributes (other than the previous three key attributes) is projected into the index. The index will then look as follows.

You may notice that the row order of the index is almost the same as the table order (except for the second and third rows). Here, you can observe one point: the table records are grouped primarily on the hash key, and the records that share the same hash key are then ordered on the range key of the index. In the index, even though the table's range key is one of the index attributes, it plays no role in the ordering (only the index's hash and range keys take part in the ordering).

There is a downside to this approach. If the user writes a query using this index to fetch BookTitle and Publisher with PubDate as 28-Dec-2008, then what happens? Will DynamoDB complain that the Publisher attribute is not projected into the index? The answer is no. Even though Publisher is not projected into the index, we can still retrieve it using the secondary index. However, retrieving a nonprojected attribute will scan the entire table. So, if we are sure that certain attributes need to be fetched frequently, we must project them into the index; otherwise, the query will consume a large number of capacity units and retrieval will be much slower as well.
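As an illustration of such a query, here is a hedged sketch using the AWS SDK for Java low-level client. The table name, the client setup, and the sample BookTitle value are placeholders invented for this example; only the index name and the attribute names come from the table described above. Because Publisher is not projected into Idx_PubDate, DynamoDB has to fetch it from the base table:

import java.util.HashMap;
import java.util.Map;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.model.*;

public class QueryLsiExample {
  public static void main(String[] args) {
    // Uses the default credential provider chain
    AmazonDynamoDBClient client = new AmazonDynamoDBClient();

    Map<String, Condition> keyConditions = new HashMap<String, Condition>();
    // The LSI shares the table's hash key, so BookTitle must be specified
    keyConditions.put("BookTitle", new Condition()
      .withComparisonOperator(ComparisonOperator.EQ)
      .withAttributeValueList(new AttributeValue().withS("SCJP"))); // placeholder title
    keyConditions.put("PubDate", new Condition()
      .withComparisonOperator(ComparisonOperator.EQ)
      .withAttributeValueList(new AttributeValue().withS("28-Dec-2008")));

    QueryRequest request = new QueryRequest()
      .withTableName("Book")            // placeholder table name
      .withIndexName("Idx_PubDate")
      .withKeyConditions(keyConditions)
      // Publisher is not projected, so DynamoDB reads it from the table
      .withAttributesToGet("BookTitle", "Publisher");

    QueryResult result = client.query(request);
    System.out.println(result.getItems());
  }
}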
One more question: if the user writes a query using the local secondary index to fetch BookTitle and Publisher with PubDate as 28-Dec-2008, then what happens? Will DynamoDB complain that the PubDate attribute is not part of the primary key and hence queries are not allowed on nonprimary key attributes? The answer is no. As a rule of thumb, we can write queries on the secondary index attributes: it is possible to include nonprimary key attributes as part of the query, but these attributes must at least be key attributes of the index.

The following code shows how to add non-key attributes to the secondary index's projection:

private static Projection getProjectionWithNonKeyAttr() {
  Projection projection = new Projection()
    .withProjectionType(ProjectionType.INCLUDE);
  ArrayList<String> nonKeyAttributes = new ArrayList<String>();
  nonKeyAttributes.add("Language");
  nonKeyAttributes.add("Author2");
  projection.setNonKeyAttributes(nonKeyAttributes);
  return projection;
}

There is a slight limitation with the local secondary index. If we write a query on an attribute that is a non-key for both the table and the index, then internally DynamoDB might need to scan the entire table, which is inefficient. For example, consider a situation in which we need to retrieve the number of editions of the books in each and every language. Since both of these attributes are non-key, even if we create a local secondary index with either of them (Edition and Language), the query will still result in a scan operation on the entire table.

Global secondary index

A question arises here: is there any way to create a secondary index in which both index keys are different from the table's primary keys? The answer is the global secondary index. The following code shows how to create the global secondary index for this scenario:

private static GlobalSecondaryIndex getGlobalSecondaryIndex() {
  GlobalSecondaryIndex gsi = new GlobalSecondaryIndex()
    .withIndexName("Idx_Pub_Edtn")
    .withProvisionedThroughput(new ProvisionedThroughput()
      .withReadCapacityUnits((long) 1)
      .withWriteCapacityUnits((long) 1))
    .withProjection(new Projection().withProjectionType("KEYS_ONLY"));

  ArrayList<KeySchemaElement> indexKeySchema1 = new ArrayList<KeySchemaElement>();
  indexKeySchema1.add(new KeySchemaElement()
    .withAttributeName("Language")
    .withKeyType(KeyType.HASH));
  indexKeySchema1.add(new KeySchemaElement()
    .withAttributeName("Edition")
    .withKeyType(KeyType.RANGE));

  gsi.setKeySchema(indexKeySchema1);
  return gsi;
}

While deciding which attributes to project into a global secondary index, there are trade-offs to consider between provisioned throughput and storage costs. A few of these are listed as follows (a sketch showing how the two indexes defined above are attached at table-creation time follows this list):

If our application doesn't need to query the table very often but performs frequent writes or updates against the data in the table, we should consider projecting KEYS_ONLY. The global secondary index will be of minimum size, but it will still be available when required for query activity. The smaller the index, the cheaper it is to store, and our write costs will be cheaper too.
If we need to access only a few attributes with the lowest possible latency, we should project only those (fewer) attributes into the global secondary index.
If we need to access almost all of the non-key attributes of the DynamoDB table on a frequent basis, we can project these attributes (even the entire table) into the global secondary index. This gives us maximum flexibility, with the trade-off that our storage cost increases, or even doubles if we project the entire table's attributes into the index. The additional storage cost of the global secondary index might equal the cost of performing frequent table scans.
If our application will frequently retrieve some non-key attributes, we should consider projecting those non-key attributes into the global secondary index.
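To show how the two helper methods above would actually be used, here is a hedged sketch of the table-creation call with the AWS SDK for Java. The table name, the attribute types, and the provisioned throughput values are assumptions made for this example; note that every attribute used in a key schema (of the table or of an index) must also appear in the attribute definitions:

CreateTableRequest createTable = new CreateTableRequest()
  .withTableName("Book")                                    // placeholder table name
  .withAttributeDefinitions(
    new AttributeDefinition("BookTitle", ScalarAttributeType.S),
    new AttributeDefinition("Edition", ScalarAttributeType.N),
    new AttributeDefinition("PubDate", ScalarAttributeType.S),
    new AttributeDefinition("Language", ScalarAttributeType.S))
  .withKeySchema(
    new KeySchemaElement("BookTitle", KeyType.HASH),
    new KeySchemaElement("Edition", KeyType.RANGE))
  .withProvisionedThroughput(new ProvisionedThroughput(1L, 1L))
  .withLocalSecondaryIndexes(getLocalSecondaryIndex())       // Idx_PubDate
  .withGlobalSecondaryIndexes(getGlobalSecondaryIndex());    // Idx_Pub_Edtn

client.createTable(createTable);

Here, client is the same AmazonDynamoDBClient instance used in the earlier query sketch. Keep in mind that local secondary indexes can only be defined at table-creation time.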
Item sharding

Sharding, also called horizontal partitioning, is a technique in which rows are distributed among the database servers so that queries execute faster. With sharding, a hash operation is performed on the table rows (mostly on one of the columns) and, based on the output of the hash, the rows are grouped and sent to the appropriate database server. Take a look at the following diagram.

As shown in the previous diagram, if all the table data (only four rows and one column are shown for illustration purposes) is stored on a single database server, read and write operations become slower, and the server holding the frequently accessed table data works harder than the server storing table data that is not accessed frequently. The following diagram shows the advantage of sharding over a multitable, multiserver database environment.

In the previous diagram, two tables (Tbl_Places and Tbl_Sports) are shown on the left-hand side with four sample rows of data (Austria.. means that only the first column of the first item is illustrated, and all other fields are represented by ..). We are going to perform a hash operation on the first column only. In DynamoDB, this hashing is performed automatically. Once the hashing is done, rows with similar hashes are saved automatically on different servers (if necessary) to satisfy the specified provisioned throughput capacity.

Have you ever wondered about the importance of the hash type key while creating a table (which is mandatory)? Of course, we all know the importance of the range key and what it does: it simply sorts items based on the range key value. So far, we might have been thinking that the range key is more important than the hash key. If you think that way, you may be correct, provided we neither need our table to be provisioned faster nor need to create any partitions for our table. As long as the table data is small, the importance of the hash key is realized only while writing a query. However, once the table grows, in order to satisfy the same provisioned throughput, DynamoDB needs to partition the table data based on this hash key (as shown in the previous diagram). This partitioning of table items based on the hash key attribute is called sharding: the partitions are created by splitting items, not attributes. This is the reason why a query that includes the hash key (of the table or index) retrieves items much faster.

Since the number of partitions is managed automatically by DynamoDB, we cannot just hope for things to work fine; we also need to keep certain things in mind. For example, the hash key attribute should have many distinct values. To simplify, it is not advisable to put binary values (such as Yes or No, or Present, Past, or Future, and so on) into the hash key attribute, because this restricts the number of partitions. If our hash key attribute has either Yes or No values in all the items, then DynamoDB can create a maximum of only two partitions; therefore, the specified provisioned throughput cannot be achieved. Just consider that we have created a table called Tbl_Sports with a provisioned throughput capacity of 10, and then we put 10 items into the table.
Assuming that only a single partition is created, we are able to retrieve 10 items per second. After some time, we put 10 more items into the table. DynamoDB will create another partition (by hashing on the hash key), thereby satisfying the provisioned throughput capacity. There is a formula taken from the AWS site:

Total provisioned throughput / number of partitions = throughput per partition

or, equivalently:

Number of partitions = total provisioned throughput / throughput per partition

In order to satisfy the throughput capacity, the other parameters are managed automatically by DynamoDB.

Summary

In this article, we saw what the local and global secondary indexes are. We also walked through projection and its usage with indexes.

Resources for Article:

Further resources on this subject:
Comparative Study of NoSQL Products [Article]
Ruby with MongoDB for Web Development [Article]
Amazon DynamoDB - Modelling relationships, Error handling [Article]

What is REST?

Packt
17 Sep 2014
12 min read
This article by Bhakti Mehta, the author of RESTful Java Patterns and Best Practices, starts with the basic concepts of REST, how to design RESTful services, and best practices around designing REST resources. It also covers the architectural aspects of REST. (For more resources related to this topic, see here.)

Where REST has come from

The confluence of social networking, cloud computing, and the era of mobile applications creates a generation of emerging technologies that allow different networked devices to communicate with each other over the Internet. In the past, there were traditional and proprietary approaches for building solutions encompassing different devices and components communicating with each other over an unreliable network or through the Internet. Some of these approaches, such as RPC, CORBA, and SOAP-based web services, which evolved as different implementations of Service Oriented Architecture (SOA), required tighter coupling between components along with greater complexity in integration.

As the technology landscape evolves, today's applications are built on the notion of producing and consuming APIs instead of using web frameworks that invoke services and produce web pages. This requirement enforces the need for easier exchange of information between distributed services, along with predictable, robust, well-defined interfaces. API-based architecture enables agile development, easier adoption and prevalence, scale, and integration with applications within and outside the enterprise.

HTTP 1.1 is defined in RFC 2616 and is ubiquitously used as the standard protocol for distributed, collaborative, hypermedia information systems. Representational State Transfer (REST) is inspired by HTTP and can be used wherever HTTP is used. The widespread adoption of REST and JSON opens up the possibility of applications incorporating and leveraging functionality from other applications as needed. The popularity of REST is mainly because it enables building lightweight, simple, cost-effective modular interfaces, which can be consumed by a variety of clients.

This article covers the following topics:

Introduction to REST
Safety and Idempotence
HTTP verbs and REST
Best practices when designing RESTful services
REST architectural components

Introduction to REST

REST is an architectural style that conforms to web standards such as using HTTP verbs and URIs. It is bound by the following principles:

All resources are identified by URIs.
All resources can have multiple representations.
All resources can be accessed/modified/created/deleted by standard HTTP methods.
There is no state on the server.

REST is extensible due to the use of URIs for identifying resources. For example, a URI to represent a collection of book resources could look like this:

http://foo.api.com/v1/library/books

A URI to represent a single book identified by its ISBN could be as follows:

http://foo.api.com/v1/library/books/isbn/12345678

A URI to represent a coffee order resource could be as follows:

http://bar.api.com/v1/coffees/orders/1234

A user in a system can be represented like this:

http://some.api.com/v1/user

A URI to represent all the book orders for a user could be:

http://bar.api.com/v1/user/5034/book/orders

All the preceding samples show a clear, readable pattern that can be interpreted by the client. All these resources could have multiple representations. The resource examples shown here can be represented by JSON or XML and can be manipulated by the HTTP methods GET, PUT, POST, and DELETE.
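For instance, a client that prefers JSON can ask for it explicitly with an Accept header. The request below uses one of the URIs above; the response body is only an illustration of what such a representation might look like, not output from a real service:

curl -H "Accept: application/json" http://foo.api.com/v1/library/books/isbn/12345678

{
  "isbn": "12345678",
  "title": "RESTful Java Patterns and Best Practices",
  "links": [
    { "rel": "self", "href": "/v1/library/books/isbn/12345678" }
  ]
}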
The following table summarizes the HTTP methods and the actions they take on a resource, using a simple example of a collection of books in a library:

GET /library/books: Gets the list of books
GET /library/books/isbn/12345678: Gets the book identified by ISBN "12345678"
POST /library/books: Creates a new book order
DELETE /library/books/isbn/12345678: Deletes the book identified by ISBN "12345678"
PUT /library/books/isbn/12345678: Updates the specific book identified by ISBN "12345678"
PATCH /library/books/isbn/12345678: Can be used to do a partial update of the book identified by ISBN "12345678"

REST and statelessness

REST is bound by the principle of statelessness. Each request from the client to the server must carry all the details needed to understand the request. This helps improve the visibility, reliability, and scalability of requests:

Visibility is improved, as the system monitoring the requests does not have to look beyond a single request to get its details.
Reliability is improved, as there is no checkpointing/resuming to be done in case of partial failures.
Scalability is improved, as the number of requests that can be processed increases because the server is not responsible for storing any state.

Roy Fielding's dissertation on the REST architectural style provides details on the statelessness of REST; see http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm

With this initial introduction to the basics of REST, we shall cover the different maturity levels and where REST falls in them in the following section.

Richardson Maturity Model

The Richardson Maturity Model, developed by Leonard Richardson, talks about the basics of REST in terms of resources, verbs, and hypermedia controls. The starting point for the maturity model is the use of HTTP as the transport layer.

Level 0 – Remote Procedure Invocation

This level contains SOAP or XML-RPC sending data as POX (Plain Old XML). Only POST methods are used. This is the most primitive way of building SOA applications, with a single POST method and XML used to communicate between services.

Level 1 – REST resources

This level uses POST methods and, instead of using a function and passing arguments, uses REST URIs. So it still uses only one HTTP method. It is better than Level 0 in that it breaks a complex functionality into multiple resources while using one method.

Level 2 – more HTTP verbs

This level uses other HTTP verbs such as GET, HEAD, DELETE, and PUT along with POST. Level 2 is the real use case of REST, which advocates using different verbs based on the HTTP request method, while the system can have multiple resources.

Level 3 – HATEOAS

Hypermedia as the Engine of Application State (HATEOAS) is the most mature level of Richardson's model. The responses to client requests contain hypermedia controls, which help the client decide what action it can take next. Level 3 encourages easy discoverability and makes responses self-explanatory.

Safety and Idempotence

This section discusses in detail what safe and idempotent methods are.

Safe methods

Safe methods are methods that do not change state on the server. GET and HEAD are safe methods; for example, GET /v1/coffees/orders/1234 is a safe request, and safe methods can be cached. The PUT method is not safe, as it will create or modify a resource on the server. The POST method is not safe for the same reason. The DELETE method is not safe, as it deletes a resource on the server.
Idempotent methods

An idempotent method is a method that produces the same result no matter how many times it is called:

The GET method is idempotent, as multiple calls to the same GET resource will always return the same response.
The PUT method is idempotent, as calling it multiple times will update the same resource and not change the outcome.
POST is not idempotent; calling POST multiple times can produce different results and will create new resources.
DELETE is idempotent, because once the resource is deleted it is gone, and calling the method multiple times will not change the outcome.

HTTP verbs and REST

HTTP verbs inform the server what to do with the data sent as part of the URL.

GET

GET is the simplest HTTP verb, which enables access to a resource. Whenever the client clicks a URL in the browser, it sends a GET request to the address specified by the URL. GET is safe and idempotent. GET requests are cached, and query parameters can be used in GET requests. For example, a simple GET request is as follows:

curl http://api.foo.com/v1/user/12345

POST

POST is used to create a resource. POST requests are neither idempotent nor safe: multiple invocations of a POST request can create multiple resources. POST requests should invalidate a cache entry if one exists, and query parameters with POST requests are not encouraged. For example, a POST request to create a user can be:

curl -X POST -d '{"name":"John Doe","username":"jdoe","phone":"412-344-5644"}' http://api.foo.com/v1/user

PUT

PUT is used to update a resource. PUT is idempotent but not safe: multiple invocations of a PUT request should produce the same result by updating the resource. PUT requests should invalidate the cache entry if one exists. For example, a PUT request to update a user can be:

curl -X PUT -d '{"phone":"413-344-5644"}' http://api.foo.com/v1/user

DELETE

DELETE is used to delete a resource. DELETE is idempotent but not safe. DELETE is idempotent because, based on RFC 2616, "the side effects of N > 0 requests is the same as for a single request". This means that once the resource is deleted, calling DELETE multiple times will get the same response. For example, a request to delete a user is as follows:

curl -X DELETE http://foo.api.com/v1/user/1234

HEAD

HEAD is similar to a GET request. The difference is that only the HTTP headers are returned and no content. HEAD is idempotent and safe. For example, a HEAD request with curl is as follows:

curl -I http://foo.api.com/v1/user

It can be useful to send a HEAD request to see whether the resource has changed before trying to get a large representation with a GET request.

PUT vs POST

According to the RFC, the difference between PUT and POST is in the Request URI. The URI identified by POST defines the entity that will handle the POST request, while the URI in a PUT request includes the entity itself. So POST /v1/coffees/orders means to create a new resource and return an identifier describing it; in contrast, PUT /v1/coffees/orders/1234 means to update the resource identified by "1234" if it exists, or else create a new order and use the URI orders/1234 to identify it.

Best practices when designing resources

This section highlights some of the best practices when designing RESTful resources:

The API developer should use nouns to identify and navigate through resources, and HTTP verbs for the actions on them. For example, the URI /user/1234/books is better than /user/1234/getBook.
Use associations in the URIs to identify sub-resources. For example, to get the authors of book 5678 for user 1234, use the URI /user/1234/books/5678/authors.
For specific variations, use query parameters. For example, to get all the books with 10 reviews, use /user/1234/books?reviews_counts=10.
Allow partial responses as part of query parameters if possible. For example, to get only the name and age of a user, the client can specify a ?fields query parameter with the list of fields that the server should send in the response, using the URI /users/1234?fields=name,age.
Have a default output format for the response in case the client does not specify which format it is interested in. Most API developers choose to send JSON as the default response MIME type.
Use camelCase or underscores for attribute names.
Support a standard API for counts, for example users/1234/books/count, in the case of collections, so the client can get an idea of how many objects to expect in the response. This also helps the client with pagination queries.
Support a pretty printing option, for example users/1234?pretty_print. It is also good practice not to cache queries that carry the pretty print query parameter.
Avoid chattiness by being as verbose as possible in the response. If the server does not provide enough details in the response, the client needs to make more calls to get the additional details, which wastes network resources and counts against the client's rate limits.

REST architecture components

This section covers the various components that must be considered when building RESTful APIs. As seen in the preceding screenshot, REST services can be consumed from a variety of clients and applications running on different platforms and devices, such as mobile devices and web browsers. These requests are sent through a proxy server. The HTTP requests are sent to the resources, and based on the CRUD operation, the right HTTP method is selected. On the response side, there can be pagination to ensure the server sends a subset of the results, and the server can do asynchronous processing, thus improving responsiveness and scale. There can also be links in the response, which relates to HATEOAS.

Here is a summary of the various REST architectural components (a small example combining a few of them follows this list):

HTTP requests use the REST API with HTTP verbs for the uniform interface constraint.
Content negotiation allows selecting a representation for a response when multiple representations are available.
Logging helps provide traceability to analyze and debug issues.
Exception handling allows sending application-specific exceptions with HTTP codes.
Authentication and authorization with OAuth 2.0 give other applications access control to take actions without the user having to send their credentials.
Validation provides support for sending back detailed messages with error codes to the client, as well as validations of the inputs received in the request.
Rate limiting ensures the server is not burdened with too many requests from a single client.
Caching helps improve application responsiveness.
Asynchronous processing enables the server to send back responses to the client asynchronously.
Microservices comprise breaking up a monolithic service into fine-grained services.
HATEOAS improves usability, understandability, and navigability by returning a list of links in the response.
Pagination allows clients to specify the items in a dataset that they are interested in.
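As a small illustration of several of these components working together (content negotiation, partial responses, and pagination), consider a request like the following. The host, field names, and paging parameters are made up for this example and are not from any specific API:

curl -H "Accept: application/json" \
     "http://api.foo.com/v1/users/1234/books?fields=name,author&offset=0&limit=10"

The Accept header drives content negotiation, the fields query parameter asks the server for a partial response, and offset/limit let the client page through the collection.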
The REST architectural components shown in the image can be chained one after the other, as shown earlier. For example, there can be a filter chain consisting of filters for authentication, rate limiting, caching, and logging. This chain takes care of authenticating the user and checking whether the client's requests are within rate limits; a caching filter can then check whether the request can be served from the cache. This can be followed by a logging filter, which logs the details of the request. For more details, check RESTful Java Patterns and Best Practices.