
Classifier Construction

Packt
01 Jun 2016
8 min read
In this article by Pratik Joshi, author of the book Python Machine Learning Cookbook, we will build a simple classifier using supervised learning and then go on to build a logistic regression classifier.

Building a simple classifier

In the field of machine learning, classification refers to the process of using the characteristics of data to separate it into a certain number of classes. A supervised learning classifier builds a model using labeled training data and then uses this model to classify unknown data. Let's take a look at how to build a simple classifier. (For more resources related to this topic, see here.)

How to do it…

Before we begin, make sure that you have imported the numpy and matplotlib.pyplot packages. After this, let's create some sample data:

X = np.array([[3,1], [2,5], [1,8], [6,4], [5,2], [3,5], [4,7], [4,-1]])

Let's assign some labels to these points:

y = [0, 1, 1, 0, 0, 1, 1, 0]

As we have only two classes, the list y contains 0s and 1s. In general, if you have N classes, then the values in y will range from 0 to N-1. Let's separate the data into classes based on the labels:

class_0 = np.array([X[i] for i in range(len(X)) if y[i]==0])
class_1 = np.array([X[i] for i in range(len(X)) if y[i]==1])

In the preceding two lines, we just use the mapping between X and y to create two lists. To get an idea about our data, let's plot it, as follows:

plt.figure()
plt.scatter(class_0[:,0], class_0[:,1], color='black', marker='s')
plt.scatter(class_1[:,0], class_1[:,1], color='black', marker='x')

This is a scatterplot where we use squares and crosses to plot the points. In this context, the marker parameter specifies the shape that you want to use. We use squares to denote points in class_0 and crosses to denote points in class_1. If you run this code, you will see the following figure. If you were asked to inspect the datapoints visually and draw a separating line, what would you do? You would simply draw a line between them.
Let's go ahead and do this:

line_x = range(10)
line_y = line_x

We just created a line with the mathematical equation y = x. Let's plot it, as follows:

plt.figure()
plt.scatter(class_0[:,0], class_0[:,1], color='black', marker='s')
plt.scatter(class_1[:,0], class_1[:,1], color='black', marker='x')
plt.plot(line_x, line_y, color='black', linewidth=3)
plt.show()

If you run this code, you should see the following figure.

There's more…

We built a really simple classifier using the following rule: the input point (a, b) belongs to class_0 if a is greater than or equal to b; otherwise, it belongs to class_1. If you inspect the points one by one, you will see that this is true. This is it! You just built a linear classifier that can classify unknown data. It's a linear classifier because the separating line is a straight line. If it's a curve, then it becomes a nonlinear classifier. This formulation worked fine because there were a limited number of points and we could visually inspect them. What if there are thousands of points? How do we generalize this process? Let's discuss this in the next section.

Building a logistic regression classifier

Despite the word regression being present in the name, logistic regression is actually used for classification purposes. Given a set of datapoints, our goal is to build a model that can draw linear boundaries between our classes. It extracts these boundaries by solving a set of equations derived from the training data. Let's see how to do that in Python. We will use the logistic_regression.py file that is already provided to you as a reference. Assuming that you have imported the necessary packages, let's create some sample data along with training labels:

X = np.array([[4, 7], [3.5, 8], [3.1, 6.2], [0.5, 1], [1, 2], [1.2, 1.9], [6, 2], [5.7, 1.5], [5.4, 2.2]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

Here, we assume that we have three classes.
Let's initialize the logistic regression classifier:

classifier = linear_model.LogisticRegression(solver='liblinear', C=100)

There are a number of input parameters that can be specified for the preceding function, but a couple of important ones are solver and C. The solver parameter specifies the type of solver that the algorithm will use to solve the system of equations. The C parameter controls the regularization strength; a lower value indicates higher regularization strength. Let's train the classifier:

classifier.fit(X, y)

Let's draw the datapoints and boundaries:

plot_classifier(classifier, X, y)

We need to define this function:

def plot_classifier(classifier, X, y):
    # define ranges to plot the figure
    x_min, x_max = min(X[:, 0]) - 1.0, max(X[:, 0]) + 1.0
    y_min, y_max = min(X[:, 1]) - 1.0, max(X[:, 1]) + 1.0

The preceding values indicate the range of values that we want to use in our figure. These values usually range from the minimum value to the maximum value present in our data. We add some buffer, such as the 1.0 in the preceding lines, for clarity. In order to plot the boundaries, we need to evaluate the function across a grid of points. Let's go ahead and define the grid:

    # denotes the step size that will be used in the mesh grid
    step_size = 0.01

    # define the mesh grid
    x_values, y_values = np.meshgrid(np.arange(x_min, x_max, step_size), np.arange(y_min, y_max, step_size))

The x_values and y_values variables contain the grid of points where the function will be evaluated.
Let's compute the output of the classifier for all these points:

    # compute the classifier output
    mesh_output = classifier.predict(np.c_[x_values.ravel(), y_values.ravel()])

    # reshape the array
    mesh_output = mesh_output.reshape(x_values.shape)

Let's plot the boundaries using colored regions:

    # plot the output using a colored plot
    plt.figure()

    # choose a color scheme
    plt.pcolormesh(x_values, y_values, mesh_output, cmap=plt.cm.Set1)

This is basically a 3D plotter that takes the 2D points and the associated values to draw different regions using a color scheme. You can find all the color scheme options at http://matplotlib.org/examples/color/colormaps_reference.html. Let's overlay the training points on the plot:

    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='black', linewidth=2, cmap=plt.cm.Paired)

    # specify the boundaries of the figure
    plt.xlim(x_values.min(), x_values.max())
    plt.ylim(y_values.min(), y_values.max())

    # specify the ticks on the X and Y axes
    plt.xticks((np.arange(int(min(X[:, 0])-1), int(max(X[:, 0])+1), 1.0)))
    plt.yticks((np.arange(int(min(X[:, 1])-1), int(max(X[:, 1])+1), 1.0)))

    plt.show()

Here, plt.scatter plots the points on the 2D graph. X[:, 0] specifies that we should take all the values along axis 0 (the X axis in our case), and X[:, 1] specifies axis 1 (the Y axis). The c=y parameter indicates the color sequence: we use the target labels to map to colors via cmap, so that points get different colors based on their labels. The limits of the display figure are set using plt.xlim and plt.ylim. To mark the axes with values, we use plt.xticks and plt.yticks, which make it easier to see where the points are located. In the preceding code, we want the ticks to lie between the minimum and maximum values with a buffer of one unit, and we want these ticks to be integers.
So, we use the int() function to round off the values. If you run this code, you should see the following output. Let's see how the C parameter affects our model. The C parameter indicates the penalty for misclassification. If we set it to 1.0, we get the first of the following figures; if we set it to 10000, we get the second. As we increase C, there is a higher penalty for misclassification, so the boundaries fit the training data more tightly.

Summary

We successfully employed supervised learning to build a simple classifier. We subsequently went on to construct a logistic regression classifier and saw the different results of tweaking C, the regularization strength parameter.

Resources for Article:

Further resources on this subject:
Python Scripting Essentials [article]
Web scraping with Python (Part 2) [article]
Web Server Development [article]

Webhooks in Slack

Packt
01 Jun 2016
11 min read
In this article by Paul Asjes, the author of the book Building Slack Bots, we'll have a look at webhooks in Slack. (For more resources related to this topic, see here.)

Slack is a great way of communicating in your work environment: it's easy to use, intuitive, and highly extensible. Did you know that you can make Slack do even more for you and your team by developing your own bots? This article will teach you how to implement incoming and outgoing webhooks for Slack, supercharging your Slack team to even greater levels of productivity. The programming language we'll use here is JavaScript; however, webhooks can be programmed in any language capable of making HTTP requests.

Webhooks

First, let's talk basics: a webhook is a way of altering or augmenting a web application through HTTP methods. Webhooks allow us to post messages to and from Slack using regular HTTP requests with a JSON payload. What makes a webhook a bot is its ability to post messages to Slack as if it were a bot user. These webhooks can be divided into incoming and outgoing webhooks, each with their own purposes and uses.

Incoming webhooks

An example of an incoming webhook is a service that relays information from an external source to a Slack channel without being explicitly requested, such as the GitHub Slack integration:

The GitHub integration posts messages about repositories we are interested in

In the preceding screenshot, we see how a message was sent to Slack after a new branch was made on a repository this team was watching. This data wasn't explicitly requested by a team member but was automatically sent to the channel as a result of the incoming webhook. Other popular examples include the Jenkins integration, where infrastructure changes can be monitored in Slack (for example, if a server watched by Jenkins goes down, a warning message can be posted immediately to a relevant Slack channel).
Let's start by setting up an incoming webhook that sends a simple "Hello world" message. First, navigate to the Custom Integration Slack team page (https://my.slack.com/apps/build/custom-integration), as shown in the following screenshot:

The various flavors of custom integrations

Select Incoming WebHooks from the list and then select which channel you'd like your webhook app to post messages to:

Webhook apps will post to a channel of your choosing

Once you've clicked on the Add Incoming WebHooks integration button, you will be presented with an options page that allows you to customize your integration a little further:

Names, descriptions, and icons can be set from this menu

Set a customized icon for your integration (for this example, the wave emoji was used) and copy down the webhook URL, which has the following format:

https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX

This generated URL is unique to your team, meaning that any JSON payloads sent via this URL will only appear in your team's Slack channels. Now, let's throw together a quick test of our incoming webhook in Node. Start a new Node project (remember: you can use npm init to create your package.json file) and install the superagent AJAX library by running the following command in your terminal:

npm install superagent --save

Create a file named index.js and paste the following JavaScript code into it:

const WEBHOOK_URL = [YOUR_WEBHOOK_URL];
const request = require('superagent');

request
  .post(WEBHOOK_URL)
  .send({
    text: 'Hello! I am an incoming Webhook bot!'
  })
  .end((err, res) => {
    console.log(res);
  });

Remember to replace [YOUR_WEBHOOK_URL] with your newly generated URL, and then run the program by executing the following command:

nodemon index.js

Two things should happen now: firstly, a long response should be logged in your terminal, and secondly, you should see a message like the following in the Slack client:

The incoming webhook equivalent of "hello world"

The res object we logged in our terminal is the response to the AJAX request. Taking the form of a large JavaScript object, it displays information about the HTTP POST request we made to our webhook URL. Looking at the message received in the Slack client, notice how the name and icon are the same ones we set in our integration setup on the team admin site. Remember that the default icon, name, and channel are used if none are provided, so let's see what happens when we change that. Replace your request AJAX call in index.js with the following:

request
  .post(WEBHOOK_URL)
  .send({
    username: "Incoming bot",
    channel: "#general",
    icon_emoji: ":+1:",
    text: 'Hello! I am different from the previous bot!'
  })
  .end((err, res) => {
    console.log(res);
  });

Save the file, and nodemon will automatically restart the program. Switch over to the Slack client, and you should see a message like the following pop up in your #general channel:

New name, icon, and message

In place of icon_emoji, you could also use icon_url to link to a specific image of your choosing. If you wish your message to be sent to only one user, you can supply a username as the value of the channel property:

channel: "@paul"

This will cause the message to be sent as a Slackbot direct message. The message's icon and username will match either what you configured in the setup or what you set in the body of the POST request. Finally, let's look at sending links in our integration. Replace the text property with the following and save index.js:

text: 'Hello! Here is a fun link: <http://www.github.com|Github is great!>'

Slack will automatically parse any links it finds, whether in the http://www.example.com or www.example.com format. By enclosing the URL in angled brackets and using the | character, we can specify what we would like the URL to be shown as:

Formatted links are easier to read than long URLs

For more information on message formatting, visit https://api.slack.com/docs/formatting. Note that as this is a custom webhook integration, we can change the name, icon, and channel of the integration. If we were to package the integration as a Slack app (an app installable by other teams), then it would not be possible to override the default channel, username, and icon. Incoming webhooks are triggered by external sources; an example would be when a new user signs up to your service or a product is sold. The goal of the incoming webhook is to provide information to your team that is easy to reach and comprehend. The opposite would be if you wanted users to get data out of Slack, which can be done via outgoing webhooks.

Outgoing webhooks

Outgoing webhooks differ from the incoming variety in that they send data out of Slack to a service of your choosing, which in turn can respond with a message to the Slack channel. To set up an outgoing webhook, visit the custom integration page of your Slack team's admin page again (https://my.slack.com/apps/build/custom-integration), and this time select the Outgoing WebHooks option. On the next screen, be sure to select a channel, name, and icon. Notice how there is a target URL field to be filled in; we will fill this out shortly. When an outgoing webhook is triggered in Slack, an HTTP POST request is made to the URL (or URLs, as you can specify multiple ones) you provide. So first, we need to build a server that can accept our webhook.
In index.js, paste the following code:

'use strict';

const http = require('http');

// create a simple server with node's built-in http module
http.createServer((req, res) => {
  res.writeHead(200, {'Content-Type': 'text/plain'});

  // get the data embedded in the POST request
  req.on('data', (chunk) => {
    // chunk is a buffer, so first convert it to
    // a string and split it to make it more legible as an array
    console.log('Body:', chunk.toString().split('&'));
  });

  // create a response
  let response = JSON.stringify({
    text: 'Outgoing webhook received!'
  });

  // send the response to Slack as a message
  res.end(response);
}).listen(8080, '0.0.0.0');

console.log('Server running at http://0.0.0.0:8080/');

Notice how we require the http module despite not installing it with NPM. This is because the http module is a core Node dependency and is automatically included with your installation of Node. In this block of code, we start a simple server on port 8080 and listen for incoming requests. In this example, we set our server to run at 0.0.0.0 rather than localhost. This is important, as Slack is sending a request to our server, so the server needs to be accessible from the Internet. Setting the IP of your server to 0.0.0.0 tells Node to use your computer's network-assigned IP address. Therefore, if you set the IP of your server to 0.0.0.0, Slack can reach your server by hitting your IP on port 8080 (for example, http://123.456.78.90:8080). If you are having trouble with Slack reaching your server, it is most likely because you are behind a router or firewall. To circumvent this issue, you can use a service such as ngrok (https://ngrok.com/). Alternatively, look at the port forwarding settings for your router or firewall. Let's update our outgoing webhook settings accordingly:

The outgoing webhook settings, with a destination URL

Save your settings and run your Node app; test whether the outgoing webhook works by typing a message into the channel you specified in the webhook's settings.
You should then see something like this in Slack:

We built a spam bot

Well, the good news is that our server is receiving requests and returning a message to send to Slack each time. The issue here is that we skipped over the Trigger Word(s) field on the webhook settings page. Without a trigger word, any message sent to the specified channel will trigger the outgoing webhook. This causes our webhook to be triggered by the very message the outgoing webhook sent in the first place, creating an infinite loop. To fix this, we could do one of two things:

Refrain from returning a message to the channel when listening to all the channel's messages.
Specify one or more trigger words to ensure we don't spam the channel.

Returning a message is optional yet encouraged to ensure a better user experience. Even a confirmation message such as "Message received!" is better than no message, as it confirms to the user that their message was received and is being processed. Let's therefore go with the second option and add a trigger word:

Trigger words keep our webhooks organized

Let's try that again, this time sending a message with the trigger word at the beginning of the message. Restart your Node app and send a new message:

Our outgoing webhook app now functions a lot like our bots from earlier

Great, now switch over to your terminal and see what that message logged:

Body: [ 'token=KJcfN8xakBegb5RReelRKJng',
  'team_id=T000001',
  'team_domain=buildingbots',
  'service_id=34210109492',
  'channel_id=C0J4E5SG6',
  'channel_name=bot-test',
  'timestamp=1460684994.000598',
  'user_id=U0HKKH1TR',
  'user_name=paul',
  'text=webhook+hi+bot%21',
  'trigger_word=webhook' ]

This array contains the body of the HTTP POST request sent by Slack; in it, we have some useful data, such as the user's name, the message sent, and the team ID. We can use this data to customize the response or to perform some validation to make sure the user is authorized to use this webhook.
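As a quick illustration of working with that body, the key=value strings can be folded into a plain object for lookups (sketched here in TypeScript rather than the article's plain JavaScript; the helper name is my own, and the '+'-to-space step follows standard application/x-www-form-urlencoded rules):

```typescript
// Sketch: fold the URL-encoded body chunks Slack sends into an object.
function parseSlackBody(raw: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const pair of raw.split('&')) {
    const [key, value] = pair.split('=');
    // '+' encodes a space; the rest is regular percent-encoding
    result[decodeURIComponent(key)] =
      decodeURIComponent((value || '').replace(/\+/g, ' '));
  }
  return result;
}

const body = parseSlackBody(
  'user_name=paul&text=webhook+hi+bot%21&trigger_word=webhook');
console.log(body.text); // webhook hi bot!
```

With the body parsed this way, checking `user_name` or the `token` field before responding becomes a simple property comparison.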
In our response, we simply sent back a "Message received" string; however, as with incoming webhooks, we can set our own username and icon. The channel, however, cannot be different from the channel specified in the webhook's settings. The same restrictions apply when the webhook is not a custom integration: if the webhook is installed as a Slack app for another team, it can only post messages with the username and icon specified on the setup screen. An important thing to note is that webhooks, either incoming or outgoing, can only be set up in public channels. This is predominantly to discourage abuse and uphold privacy, as we've seen that it's simple to set up a webhook that can record all the activity in a channel.

Summary

In this article, you learned what webhooks are and how you can use them to get data in and out of Slack. You learned how to send messages as a bot user and how to interact with your users in the native Slack client.

Resources for Article:

Further resources on this subject:
Keystone – OpenStack Identity Service [article]
A Sample LEMP Stack [article]
Implementing Stacks using JavaScript [article]

Understanding Patterns and Architectures in TypeScript

Packt
01 Jun 2016
19 min read
In this article by Vilic Vane, author of the book TypeScript Design Patterns, we'll study architecture and patterns that are closely related to the language or its common applications. Many topics in this article are related to asynchronous programming. We'll start with a web architecture for Node.js that's based on Promise. This is a larger topic with interesting ideas involved, including abstractions of response and permission, as well as error handling tips. Then, we'll talk about how to organize modules with ES module syntax. Due to the limited length of this article, some of the related code is aggressively simplified, and nothing more than the idea itself can be applied practically. (For more resources related to this topic, see here.)

Promise-based web architecture

The most exciting thing about Promise may be the benefits it brings to error handling. In a Promise-based architecture, throwing an error can be safe and pleasant. You don't have to explicitly handle errors when chaining asynchronous operations, and this makes it harder for mistakes to occur. With the growing usage of ES2015-compatible runtimes, Promise is already there out of the box. We actually have plenty of polyfills for Promise (including my ThenFail, written in TypeScript), as the people who write JavaScript are roughly the same group of people who enjoy building wheels. Promises work great with other Promises:

A Promises/A+ compatible implementation should work with other Promises/A+ compatible implementations.
Promises do their best in a Promise-based architecture.

If you are new to Promise, you may balk at trying Promise in a callback-based project. You may intend to use helpers provided by Promise libraries, such as Promise.all, but it turns out that you have better alternatives, such as the async library.
So, the reason that makes you decide to switch should not be these helpers (as there are a lot of them for callbacks). It should be that there's an easier way to handle errors, or that you want to take advantage of the ES async and await features, which are based on Promise.

Promisifying existing modules or libraries

Though Promises do their best in a Promise-based architecture, it is still possible to begin using Promise in a smaller scope by promisifying existing modules or libraries. Taking Node.js style callbacks as an example, this is how we use them:

import * as FS from 'fs';

FS.readFile('some-file.txt', 'utf-8', (error, text) => {
    if (error) {
        console.error(error);
        return;
    }
    console.log('Content:', text);
});

You may expect a promisified version of readFile to look like the following:

FS
    .readFile('some-file.txt', 'utf-8')
    .then(text => {
        console.log('Content:', text);
    })
    .catch(reason => {
        console.error(reason);
    });

Implementing the promisified version of readFile can be as easy as the following:

function readFile(path: string, options: any): Promise<string> {
    return new Promise((resolve, reject) => {
        FS.readFile(path, options, (error, result) => {
            if (error) {
                reject(error);
            } else {
                resolve(result);
            }
        });
    });
}

I am using any here for the options parameter to reduce the size of the demo code, but I would suggest that you do not use any whenever possible in practice. There are libraries that are able to promisify methods automatically. Unfortunately, you may need to write declaration files yourself for the promisified methods if no declaration file of the promisified version is available.

Views and controllers in Express

Many of us may have already been working with frameworks such as Express.
This is how we render a view or send back JSON data in Express:

import * as Path from 'path';
import * as express from 'express';

let app = express();

app.set('engine', 'hbs');
app.set('views', Path.join(__dirname, '../views'));

app.get('/page', (req, res) => {
    res.render('page', {
        title: 'Hello, Express!',
        content: '...'
    });
});

app.get('/data', (req, res) => {
    res.json({
        version: '0.0.0',
        items: []
    });
});

app.listen(1337);

We will usually separate the controller from the routing, as follows:

import { Request, Response } from 'express';

export function page(req: Request, res: Response): void {
    res.render('page', {
        title: 'Hello, Express!',
        content: '...'
    });
}

Thus, we may have a better idea of the existing routes, and we may manage controllers more easily. Furthermore, automated routing can be introduced so that we don't always need to update the routing manually:

import * as glob from 'glob';

let controllersDir = Path.join(__dirname, 'controllers');

let controllerPaths = glob.sync('**/*.js', {
    cwd: controllersDir
});

for (let path of controllerPaths) {
    let controller = require(Path.join(controllersDir, path));
    let urlPath = path.replace(/\\/g, '/').replace(/\.js$/, '');

    for (let actionName of Object.keys(controller)) {
        app.get(
            `/${urlPath}/${actionName}`,
            controller[actionName]
        );
    }
}

The preceding implementation is certainly too simple to cover daily usage. However, it conveys the rough idea of how automated routing can work: via conventions based on file structures.
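To make the path convention in that loop concrete, here is the file-path-to-URL mapping in isolation (the helper function name is mine, not from the article):

```typescript
// The router above maps controller file paths to URL prefixes by
// normalizing path separators and dropping the '.js' extension.
function toUrlPath(filePath: string): string {
  return filePath.replace(/\\/g, '/').replace(/\.js$/, '');
}

// A controller found at 'admin\users.js' (Windows separators) maps to
// the URL prefix 'admin/users'; 'posts/index.js' maps to 'posts/index'.
console.log(toUrlPath('admin\\users.js'));  // admin/users
console.log(toUrlPath('posts/index.js'));   // posts/index
```

Each exported action name is then appended to this prefix, so `export function list` in `admin/users.js` would be served at `/admin/users/list`.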
Now, if we are working with asynchronous code that is written with Promises, an action in the controller could look like the following:

export function foo(req: Request, res: Response): void {
    Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            res.render('foo', {
                post,
                comments
            });
        });
}

We use destructuring of an array within a parameter. Promise.all returns a Promise of an array with elements corresponding to the values of the resolvables that are passed in. (A resolvable means a normal value or a Promise-like object that may resolve to a normal value.) However, this is not enough; we need to handle errors properly, or in some cases the preceding code may fail in silence (which is terrible). In Express, when an error occurs, you should call next (the third argument passed into the callback) with the error object, as follows:

import { Request, Response, NextFunction } from 'express';

export function foo(
    req: Request,
    res: Response,
    next: NextFunction
): void {
    Promise
        // ...
        .catch(reason => next(reason));
}

Now, we are fine with the correctness of this approach, but this is simply not how Promises work. Explicit error handling with callbacks can be eliminated in the scope of controllers, and the easiest way to do this is to return the Promise chain and hand it over to the code that was previously performing the routing logic. So, the controller could be written like the following:

export function foo(req: Request, res: Response) {
    return Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            res.render('foo', {
                post,
                comments
            });
        });
}

Or, can we make this even better?
Abstraction of response

We've already been returning a Promise to tell whether an error occurs. So, for a server error, the Promise actually indicates the result, or in other words, the response of the request. However, why are we still calling res.render() to render the view? The returned Promise object could be an abstraction of the response itself. Think about the following controller again:

export class Response {}

export class PageResponse extends Response {
    constructor(view: string, data: any) { }
}

export function foo(req: Request) {
    return Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            return new PageResponse('foo', {
                post,
                comments
            });
        });
}

The response object that is returned could vary for different response outputs. For example, it could be a PageResponse as in the preceding example, a JSONResponse, a StreamResponse, or even a simple Redirection. As PageResponse or JSONResponse is applied in most cases, and the view of a PageResponse can usually be implied from the controller path and action name, it is useful to have these two responses automatically generated from a plain data object with the proper view to render, as follows:

export function foo(req: Request) {
    return Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            return {
                post,
                comments
            };
        });
}

This is how a Promise-based controller should respond. With this idea in mind, let's update the routing code with an abstraction of responses. Previously, we were passing controller actions directly as Express request handlers.
Now, we need to do some wrapping up of the actions by resolving the return value and applying operations based on the resolved result, as follows:

If it fulfills and the result is an instance of Response, apply it to the res object that is passed in by Express.
If it fulfills and the result is a plain object, construct a PageResponse, or a JSONResponse if no view is found, and apply it to the res object.
If it rejects, call the next function with the rejection reason.

As seen previously, our code was like the following:

app.get(`/${urlPath}/${actionName}`, controller[actionName]);

Now, it gets a few more lines, as follows:

let action = controller[actionName];

app.get(`/${urlPath}/${actionName}`, (req, res, next) => {
    Promise
        .resolve(action(req))
        .then(result => {
            if (result instanceof Response) {
                result.applyTo(res);
            } else if (existsView(actionName)) {
                new PageResponse(actionName, result).applyTo(res);
            } else {
                new JSONResponse(result).applyTo(res);
            }
        })
        .catch(reason => next(reason));
});

However, so far we can only handle GET requests, as we hardcoded app.get() in our router implementation. The poor view-matching logic can hardly be used in practice either. We need to make these actions configurable, and ES decorators can do a good job here:

export default class Controller {
    @get({
        view: 'custom-view-path'
    })
    foo(req: Request) {
        return {
            title: 'Action foo',
            content: 'Content of action foo'
        };
    }
}

I'll leave the implementation to you, and feel free to make it awesome.

Abstraction of permission

Permission plays an important role in a project, especially in systems that have different user groups, for example, a forum. The abstraction of permission should be extendable to satisfy changing requirements, and it should be easy to use as well.
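Circling back for a moment to the get() decorator that was left as an exercise above: one minimal approach is to have the decorator record route metadata in a registry that the router later reads. Everything below (the RouteOptions shape, the routes array, the class name) is my own sketch, not the book's implementation:

```typescript
// Options a route decorator might accept; 'view' mirrors the example above.
interface RouteOptions { view?: string; }

// Registry the router would consult instead of Object.keys(controller).
const routes: { actionName: string; options: RouteOptions }[] = [];

function get(options: RouteOptions = {}) {
  // The returned function matches TypeScript's method-decorator
  // signature, so with experimentalDecorators enabled it can be
  // written as @get({...}) above a method.
  return (target: any, actionName: string) => {
    routes.push({ actionName, options });
  };
}

class DemoController {
  foo(req: any) {
    return { title: 'Action foo', content: 'Content of action foo' };
  }
}

// Applying the decorator manually, equivalent to @get({...}) on foo()
// when the compiler flag is enabled:
get({ view: 'custom-view-path' })(DemoController.prototype, 'foo');

console.log(routes[0].actionName); // foo
```

The router loop can then iterate over `routes`, choosing the HTTP verb and the view per action instead of hardcoding `app.get()`.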
Here, we are going to talk about the abstraction of permission at the level of controller actions. Consider the eligibility to perform one or more actions a privilege. The permission of a user may consist of several privileges, and usually most users at the same level would have the same set of privileges. So, we may have a larger concept, namely groups. The abstraction could either work based on both groups and privileges, or based on privileges only (groups then become just aliases for sets of privileges):

An abstraction that validates based on both privileges and groups at the same time is easier to build. You do not need to create a large list of which actions can be performed by a certain group of users, as granular privileges are only required when necessary.
An abstraction that validates based on privileges only gives finer control and more flexibility in describing the permission. For example, you can easily remove a small set of privileges from the permission of a user.

However, both approaches have similar upper-level abstractions, and they differ mostly in implementation. The general structure of the permission abstraction that we've talked about is shown in the following diagram:

The participants include the following:

Privilege: This describes a detailed privilege corresponding to specific actions
Group: This defines a set of privileges
Permission: This describes what a user is capable of doing, and consists of the groups that the user belongs to and the privileges that the user has
Permission descriptor: This describes how the permission of a user works and consists of the possible groups and privileges

Expected errors

A great concern that was wiped away after using Promises is that we no longer need to worry about whether throwing an error in a callback would crash the application most of the time. The error will flow through the Promise chain and, if not caught, will be handled by our router. Errors can be roughly divided into expected errors and unexpected errors.
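The permission participants listed above can be sketched as follows. The class and method names are illustrative assumptions based on the description, not the book's exact code; this version validates based on privileges only, with groups flattened into their privilege sets.

```typescript
class Privilege {
    constructor(public name: string) { }
}

class Group {
    constructor(public name: string, public privileges: Privilege[]) { }
}

class Permission {
    constructor(public groups: Group[], public privileges: Privilege[]) { }

    // Groups are flattened into privileges before checking, so a user's
    // effective privileges come from both direct grants and group membership.
    can(required: Privilege): boolean {
        const all = this.privileges.concat(
            ...this.groups.map(group => group.privileges)
        );
        return all.some(privilege => privilege.name === required.name);
    }
}

// Usage: a user in the "members" group with one extra direct privilege.
const readPost = new Privilege('read-post');
const writePost = new Privilege('write-post');
const members = new Group('members', [readPost]);

const permission = new Permission([members], [writePost]);
console.log(permission.can(readPost), permission.can(writePost));
```

A permission descriptor would sit one level above this, enumerating the possible groups and privileges so that permissions can be constructed and validated consistently.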
Expected errors are usually caused by incorrect input or foreseeable exceptions, while unexpected errors are usually caused by bugs or by other libraries that the project relies on. For expected errors, we usually want to give users a friendly response with a readable error message and code, so that they can help themselves by searching for the error, or report it to us with useful context. For unexpected errors, we would also want a reasonable response (usually a message describing an unknown error), a detailed server-side log (including the real error name, message, stack information, and so on), and even alerts to let the team know as soon as possible.

Defining and throwing expected errors

The router will need to handle different types of errors, and an easy way to achieve this is to subclass a universal ExpectedError class and throw its instances, as follows:

import ExtendableError from 'extendable-error';

class ExpectedError extends ExtendableError {
    constructor(
        message: string,
        public code: number
    ) {
        super(message);
    }
}

The extendable-error is a package of mine that handles the stack trace and the message property. You can directly extend the Error class as well. Thus, when receiving an expected error, we can safely output the error name and message as part of the response. If it is not an instance of ExpectedError, we can display a predefined unknown-error message instead.

Transforming errors

Some errors, such as those caused by unstable networks or remote services, are expected. We may want to catch these errors and throw them out again as expected errors. However, doing this everywhere by hand can be rather tedious. A centralized error-transforming process can be applied to reduce the effort required to manage these errors. The transforming process includes two parts: filtering (or matching) and transforming. These are the approaches to filtering errors:

Filter by error class: Many third-party libraries throw errors of certain classes.
Taking Sequelize (a popular Node.js ORM) as an example, it has DatabaseError, ConnectionError, ValidationError, and so on. By checking whether errors are instances of a certain error class, we can easily pick the target errors out of the pile.

Filter by string or regular expression: Sometimes a library might throw errors that are instances of the Error class itself rather than of its subclasses, which makes these errors hard to distinguish from others. In this situation, we can filter these errors by their message, with keywords or regular expressions.

Filter by scope: It's possible that instances of the same error class with the same error message should result in different responses. One reason may be that the operation throwing a certain error is at a lower level but is used by upper structures within different scopes. Thus, a scope mark can be added to these errors to make them easier to filter.

There could be more ways to filter errors, and they can usually cooperate as well. By properly applying these filters and transforming errors, we can reduce noise, analyze what's going on within a system, and locate problems faster when they occur.

Modularizing project

Before ES2015, there were actually a lot of module solutions for JavaScript. The most famous two of them might be AMD and CommonJS. AMD is designed for asynchronous module loading, which is mostly applied in browsers, while CommonJS performs module loading synchronously, which is the way the Node.js module system works. To make it work asynchronously, writing an AMD module takes more characters. Due to the popularity of tools such as browserify and webpack, CommonJS became popular even for browser projects. Proper granularity of internal modules can help a project keep a healthy structure.
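The three filtering approaches described above can be sketched as a small centralized transforming pipeline. The Transformer shape and helper names here are assumptions for illustration, not the book's exact API.

```typescript
type ErrorFilter = (error: Error) => boolean;

// Filter by class, by message pattern, or by a scope mark on the error.
const byClass = (ctor: new (...args: any[]) => Error): ErrorFilter =>
    error => error instanceof ctor;
const byPattern = (pattern: RegExp): ErrorFilter =>
    error => pattern.test(error.message);
const byScope = (scope: string): ErrorFilter =>
    error => (error as any).scope === scope;

interface Transformer {
    filter: ErrorFilter;
    transform(error: Error): Error;
}

// Run an error through the first matching transformer, if any.
function transformError(error: Error, transformers: Transformer[]): Error {
    const matched = transformers.find(t => t.filter(error));
    return matched ? matched.transform(error) : error;
}

// Minimal ExpectedError for the sketch, extending Error directly.
class ExpectedError extends Error {
    constructor(message: string, public code: number) {
        super(message);
        Object.setPrototypeOf(this, ExpectedError.prototype);
    }
}

// A network failure is turned into a friendly expected error.
const transformers: Transformer[] = [{
    filter: byPattern(/ECONNREFUSED/),
    transform: () => new ExpectedError('Remote service unavailable', 2001)
}];

const result = transformError(new Error('connect ECONNREFUSED'), transformers);
console.log(result.message);
```

The byClass and byScope helpers plug into the same pipeline, so the three filtering styles can cooperate within a single list of transformers.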
Consider a project structure like the following:

project
├─controllers
├─core
│  │ index.ts
│  │
│  ├─product
│  │   index.ts
│  │   order.ts
│  │   shipping.ts
│  │
│  └─user
│      index.ts
│      account.ts
│      statistics.ts
│
├─helpers
├─models
├─utils
└─views

Let's assume that we are writing a controller file that's going to import a module defined by the core/product/order.ts file. Previously, using CommonJS-style require, we would write the following:

const Order = require('../core/product/order');

Now, with the new ES import syntax, this would be like the following:

import * as Order from '../core/product/order';

Wait, isn't this essentially the same? Sort of. However, you may have noticed several index.ts files that I've put into folders. Now, in the core/product/index.ts file, we could have the following:

import * as Order from './order';
import * as Shipping from './shipping';

export { Order, Shipping }

Or, we could also have the following:

export * from './order';
export * from './shipping';

What's the difference? The idea behind these two approaches of re-exporting modules can vary. The first style works better when we treat Order and Shipping as namespaces, under which the identifier names may not be easy to distinguish from one another. With this style, the files are the natural boundaries of these namespaces. The second style weakens the namespace property of the two files, and uses them as tools to organize objects and classes under the same larger category. A good thing about using these files as namespaces is that multiple levels of re-exporting are fine, while weakening namespaces makes it harder to understand different identifier names as the number of re-exporting levels grows.

Summary

In this article, we discussed some interesting ideas and an architecture formed by these ideas. Most of these topics focused on limited examples and did their own jobs. However, we also discussed ideas about putting a whole system together.
Resources for Article: Further resources on this subject: Introducing Object Oriented Programming with TypeScript [article] Writing SOLID JavaScript code with TypeScript [article] Optimizing JavaScript for iOS Hybrid Apps [article]
Packt
01 Jun 2016
7 min read

Linking Data to Shapes

In this article by David J Parker, the author of the book Mastering Data Visualization with Microsoft Visio Professional 2016, we discuss the data-linking feature that Microsoft introduced in the Professional edition of Visio 2007. This feature is better than the database add-on that had been around since Visio 4, because it has greater importing capabilities and is part of the core product with its own API. It provides the Visio user with a simple method of surfacing data from a variety of data sources, and it gives the power user (or developer) the ability to create productivity enhancements in code. (For more resources related to this topic, see here.) Once data is imported into Visio, the rows of data can be linked to shapes and then displayed visually, or be used to automatically create hyperlinks. Moreover, if the data is edited outside of Visio, the data in the Visio shapes can be refreshed so that the shapes reflect the updated data. This can be done in the Visio client, but some data sources can also refresh the data in Visio documents that are displayed in SharePoint web pages. In this way, Visio documents truly become operational intelligence dashboards. Some VBA knowledge will be useful, and the sample data sources are introduced in each section. In this chapter, we shall cover the following topics:

The new Quick Import feature
Importing data from a variety of sources
How to link shapes to rows of data
Using code for more linking possibilities

A very quick introduction to importing and linking data

Visio Professional 2016 added more buttons to the Data ribbon tab, and some new Data Graphics, but the functionality has basically been the same since Visio Professional 2007. The new additions, as seen in the following screenshot, can make this particular ribbon tab quite wide on the screen.
Thank goodness that wide screens have become the norm. The process to create data-refreshable shapes in Visio is simply as follows:

Import data as recordsets.
Link rows of data to shapes.
Make the shapes display the data.
Use any hyperlinks that have been created automatically.

The Quick Import tool introduced in Visio Professional 2016 attempts to merge the first three steps into one, but it rarely gets it perfect, and it is only for simple Excel data sources. Therefore, it is necessary to learn how to use the Custom Import feature properly.

Knowing when to use the Quick Import tool

The Data | External Data | Quick Import button is new in Visio Professional 2016. It is not part of the Visio API, so it cannot be called in code. This is not a great problem because it is only a wrapper for some of the actions that can be done in code anyway. This feature can only use an Excel workbook, but fortunately Visio installs a sample OrgData.xls file in the Visio Content\<LCID> folder. The LCID (locale identifier) for US English is 1033, as shown in the following screenshot:

The screenshot shows a Visio Professional 2016 32-bit installation on a Windows 10 64-bit laptop. Therefore, the Office16 applications are installed under the Program Files (x86)\Microsoft Office\root folder. It would be Program Files\Microsoft Office\root if the 64-bit version of Office was installed. It is not possible to install a different bit version of Visio than the rest of the Office applications. There is no root folder in previous versions of Office, but the rest of the path is the same. The full path on this laptop is C:\Program Files (x86)\Microsoft Office\root\Office16\Visio Content\1033\ORGDATA.XLS, but it is best to copy this file to a folder where it can be edited. It is surprising that the Excel workbook is in the old binary format, but it is a simple process to open it and save it in the new Open Packaging Convention file format with an xlsx extension.
Importing to shapes without existing Shape Data rows The following example contains three Person shapes from the Work Flow Objects stencil, and each one contains a person's name, spelled exactly the same as in the key column of the Excel worksheet. The match is not case sensitive, and it does not matter whether there are leading or trailing spaces in the text. When the Quick Import button is pressed, a dialog opens to show the progress of the stages that the wizard is going through, as shown in the following screenshot: If the workbook contains more than one table of data, the user is prompted to select the range of cells within the workbook. When the process is complete, each of the Person shapes contains all of the data from the row in the External Data recordset where the text matches the Name column, as shown in the following screenshot: The linked rows in the External Data window also display a chain icon, and the right-click menu has many actions, such as selecting the linked shapes for a row. Conversely, each shape now contains a right-mouse menu action to select the linked row in the External Data recordset. The Quick Import feature also adds some default data graphics to each shape, which will be ignored in this chapter because they are explored in detail in Chapter 4, Using the Built-in Data Graphics. Note that the recordset in the External Data window is named Sheet1$A1:H52. This is not perfect, but the user can rename it through the right-mouse menu actions of the tab and the Properties dialog, as seen in the following screenshot: The user can also choose what to do if a data link is added to a shape that already has one. A shape can be linked to a single row in multiple recordsets, and a single row can be linked to multiple shapes in a document, or even on the same page. However, a shape cannot be linked to more than one row in the same recordset.
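The matching rule described above (case-insensitive, ignoring leading and trailing spaces) can be sketched as a small helper. This is an illustrative function mimicking the described behavior, not Visio's actual implementation.

```typescript
// Mimic Quick Import's matching rule between shape text and the key
// column value: case-insensitive, ignoring leading/trailing spaces.
function matchesKey(shapeText: string, keyValue: string): boolean {
    return shapeText.trim().toLowerCase() === keyValue.trim().toLowerCase();
}

console.log(matchesKey('  alice Smith ', 'Alice Smith'));
```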
Importing to shapes with existing Shape Data rows The Person shape from the Resources stencil has been used in the following example and, as earlier, each shape has the name text. However, in this case, there are some existing Shape Data rows. When the Quick Import feature is run, the data is linked to each shape where the text matches the Name column value. This time, the feature has unfortunately created a problem, because the Phone Number, E-mail Alias, and Manager Shape Data rows have remained empty, while the superfluous Telephone, E-mail, and Reports_To rows have been added. The solution is to edit the column headers in the worksheet to match the existing Shape Data row labels, as shown in the following screenshot: Then, when Quick Import is used again, the column headers will match the Shape Data row names, and the data will be automatically cached into the correct places, as shown in the following screenshot: Using the Custom Import feature The user has more control using the Custom Import button on the Data | External Data ribbon tab. This button was called Link Data to Shapes in previous versions of Visio. In either case, the action opens the Data Selector dialog, as shown in the following screenshot: Each of these data sources will be explained in this chapter, along with the two data sources that are not available in the UI (namely XML files and SQL Server stored procedures). Summary This article has gone through the many different sources for importing data into Visio and has shown how each can be used. Resources for Article: Further resources on this subject: Overview of Process Management in Microsoft Visio 2013 [article] Data Visualization [article] Data visualization [article]
Packt
31 May 2016
19 min read

Holistic View on Spark

In this article by Alex Liu, author of the book Apache Spark Machine Learning Blueprints, we look at a new stage of utilizing Apache Spark-based systems to turn data into insights. (For more resources related to this topic, see here.) According to research done by Gartner and others, many organizations have lost a huge amount of value due to the lack of a holistic view of their business. In this article, we will review the machine learning methods and the process of obtaining a holistic view of a business. Then, we will discuss how Apache Spark fits in to make the related computing easy and fast, and at the same time, with one real-life example, illustrate this process of developing holistic views from data using Apache Spark, step by step, as follows:

Spark for a holistic view
Methods for a holistic view
Feature preparation
Model estimation
Model evaluation
Results explanation
Deployment

Spark for a holistic view

Spark is very suitable for machine learning projects such as ours to obtain a holistic view of a business, as it enables us to process huge amounts of data fast and to code complicated computation easily. In this section, we will first describe a real business case and then describe how to prepare the Spark computing for our project. The use case The company IFS sells and distributes thousands of IT products and has a lot of data on marketing, training, team management, promotion, and products. The company wants to understand how various kinds of actions, such as those in marketing and training, affect a sales team's success. In other words, IFS is interested in finding out how much impact marketing, training, and promotions have each generated separately. In the past, IFS has done a lot of analytical work, but all of it was completed by individual departments on siloed data sets.
That is, they have analytical results about how marketing affects sales from using marketing data alone, and how training affects sales from analyzing training data alone. When the decision makers collected all the results together and prepared to make use of them, they found that some of the results contradicted each other. For example, when they added all the effects together, the total impact was beyond what they could intuitively imagine. This is a typical problem that every organization faces. A siloed approach with siloed data will produce an incomplete view, often a biased view, or even conflicting views. To solve this problem, analytical teams need to take a holistic view of all the company data, gather all the data into one place, and then utilize new machine learning approaches to gain a holistic view of the business. To do so, companies also need to take care of the following:

The completeness of causes
Advanced analytics to account for the complexity of relationships
Computing the complexity related to subgroups and a big number of products or services

For this example, we have eight datasets, which include one dataset for marketing with 48 features, one dataset for training with 56 features, and one dataset for team administration with 73 features, with the following table as a complete summary:

Category     Number of Features
Team         73
Marketing    48
Training     56
Staffing     103
Product      77
Promotion    43
Total        400

In this company, researchers understood that pooling all the data sets together and building a complete model was the solution, but they were not able to achieve it for several reasons. Besides organizational issues inside the corporation, the tech capability to store all the data, to process all the data quickly with the right methods, and to present all the results in the right ways at reasonable speed were other challenges.
At the same time, the company has more than 100 products on offer, for which data was pooled together to study the impacts of company interventions. That is, the calculated impacts are average impacts, but the variations among products are too large to ignore. If we need to assess impacts for each product, parallel computing is preferred and needs to be implemented at good speed. Without utilizing a good computing platform such as Apache Spark, meeting the requirements just described is a big challenge for this company. In the sections that follow, we will use modern machine learning on top of Apache Spark to attack this business use case and help the company gain a holistic view of its business. In order to help readers learn machine learning on Spark effectively, the discussions in the following sections are all based on work on this real business use case, but we have left some details out to protect the company's privacy and also to keep everything brief. Distributed computing For our project, parallel computing is needed, for which we should set up clusters and worker nodes. Then, we can use the driver program and cluster manager to manage the computing done on each worker node. As an example, let's assume that we choose to work within the Databricks environment: The users can go to the main menu, as shown in the preceding screenshot, and click Clusters. A window will open for users to name the cluster, select a version of Spark, and then specify the number of workers. Once the clusters are created, we can go to the main menu, and click the down arrow on the right-hand side of Tables. We then choose Create Tables to import our datasets, which were cleaned and prepared. For the data source, the options include S3, DBFS, JDBC, and File (for local files). Our data has been separated into two subsets, one to train and one to test each product, as we need to train a few models per product.
In Apache Spark, we need to direct workers to complete computation on each node. We will use the scheduler to get the notebook computation completed on Databricks, and collect the results back, which will be discussed in the Model estimation section. Fast and easy computing One of the most important advantages of utilizing Apache Spark is that it makes coding easy, and several approaches are available for this. For this project, we will focus our effort on the notebook approach, and specifically, we will use R notebooks to develop and organize the code. At the same time, with an effort to illustrate the Spark technology more thoroughly, we will also use MLlib directly to code some of our needed algorithms, as MLlib has been seamlessly integrated with Spark. In the Databricks environment, setting up notebooks takes the following steps: As shown in the preceding screenshot, users can go to the Databricks main menu, click the down arrow on the right-hand side of Workspace, and choose Create -> Notebook to create a new notebook. A table will pop up for users to name the notebook and also select a language (R, Python, Scala, or SQL). In order to make our work repeatable and also easy to understand, we will adopt a workflow approach that is consistent with the RM4Es framework. We will also adopt Spark's ML Pipeline tools to represent our workflows whenever possible. Specifically, for the training data set, we need to estimate models, evaluate models, and then maybe re-estimate the models again before we can finalize them. So, we need to use Spark's Transformer, Estimator, and Evaluator to organize an ML pipeline for this project. In practice, we can also organize these workflows within the R notebook environment. For more information about pipeline programming, please go to http://spark.apache.org/docs/latest/ml-guide.html#example-pipeline
and http://spark.apache.org/docs/latest/ml-guide.html. Once our computing platform is set up and our framework is clear, everything becomes clear too. In the following sections, we will move forward step by step. We will use our RM4Es framework and related processes to identify equations or methods, and then prepare features first. The second step is to complete the model estimation, the third is to evaluate the models, and the fourth is to explain our results. Finally, we will deploy the models. Methods for a holistic view In this section, we need to select our analytical methods or models (equations), which amounts to mapping our business use case to machine learning methods. For our use case of assessing the impacts of various factors on sales team success, there are many suitable models for us to use. As an exercise, we will select regression models, structural equation models, and decision trees, mainly for their ease of interpretation as well as their implementation on Spark. Once we finalize our decision on the analytical methods or models, we will need to prepare the dependent variable and also prepare to code, which we will discuss one by one. Regression modeling To get ready for regression modeling on Spark, there are three issues that you have to take care of, as follows:

Linear regression or logistic regression: Regression is the most mature and also most widely used model to represent the impacts of various factors on one dependent variable. Whether to use linear regression or logistic regression depends on whether the relationship is linear or not. We are not sure about this, so we will adopt both and then compare their results to decide which one to deploy.

Preparing the dependent variable: In order to use logistic regression, we need to recode the target variable or dependent variable (the sales team success variable, currently a rating from 0 to 100) to be 0 versus 1 by splitting it at the median value.
Preparing coding: In MLlib, we can use the following code for regression modeling, as we will use Spark MLlib's Linear Regression with Stochastic Gradient Descent (LinearRegressionWithSGD):

val numIterations = 90
val model = LinearRegressionWithSGD.train(TrainingData, numIterations)

For logistic regression, we use the following code:

val model = new LogisticRegressionWithSGD()
  .setNumClasses(2)
  .run(training)

For more about using MLlib for regression modeling, please go to: http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-least-squares-lasso-and-ridge-regression In R, we can use the lm function for linear regression, and the glm function for logistic regression with family=binomial(). SEM approach To get ready for structural equation modeling (SEM) on Spark, there are also three issues that we need to take care of, as follows:

SEM introduction and specification: SEM may be considered an extension of regression modeling, as it consists of several linear equations that are similar to regression equations. However, this method estimates all the equations at the same time, taking their internal relations into account, so it is less biased than regression modeling. SEM consists of both structural modeling and latent variable modeling, but we will only use structural modeling.

Preparing the dependent variable: We can just use the sales team success scale (a rating of 0 to 100) as our target variable here.

Preparing coding: We will adopt the R notebook within the Databricks environment, for which we should use the R package sem. There are also other SEM packages, such as lavaan, that are available, but for this project, we will use the sem package for its ease of learning. To load the sem package into the R notebook, we will use install.packages("sem", repos="http://R-Forge.R-project.org"). Then, we need to run library(sem).
After that, we need to use the specifyModel() function to specify models in our R notebook, for which the following lines are needed:

mod.no1 <- specifyModel()
s1 <- x1, gam31
s1 <- x2, gam32

Decision trees To get ready for decision tree modeling on Spark, there are also three issues that we need to take care of, as follows:

Decision tree selection: A decision tree models the classification of cases, which for our use case means classifying elements into success or not success. It is also one of the most mature and widely used methods. For this exercise, we will only use a simple linear decision tree, and we will not venture into more complicated trees, such as random forests.

Preparing the dependent variable: To use the decision tree model here, we will separate the sales team rating into the two categories of SUCCESS or NOT, as we did for logistic regression.

Preparing coding: For MLlib, we can use the following code:

val numClasses = 2
val categoricalFeaturesInfo = Map[Int, Int]()
val impurity = "gini"
val maxDepth = 6
val maxBins = 32
val model = DecisionTree.trainClassifier(trainingData, numClasses,
  categoricalFeaturesInfo, impurity, maxDepth, maxBins)

For more information about using MLlib for decision trees, please go to: http://spark.apache.org/docs/latest/mllib-decision-tree.html As for the R notebook on Spark, we need to use the R package rpart, and then use the rpart function for all the calculations. For rpart, we need to specify the classifier and also all the features to be used. Model estimation Once the feature sets are finalized, what follows is to estimate the parameters of the selected models. We can use either MLlib or R here to do this, and we need to arrange distributed computing. To simplify this, we can utilize the Databricks Job feature.
Specifically, within the Databricks environment, we can go to Jobs, and then create jobs, as shown in the following screenshot: Then, users can select which notebooks to run, specify clusters, and schedule jobs. Once scheduled, users can also monitor the running notebooks, and then collect the results back. In Section II, we prepared some code for each of the three selected models. Now, we need to modify it with the final set of features selected in the last section to create our final notebooks. In other words, we have one dependent variable prepared, and 17 features selected from our PCA and feature selection work. So, we need to insert all of them into the code developed in Section II to finalize our notebooks. Then, we will use the Spark Job feature to get these notebooks implemented in a distributed way. MLlib implementation First, we need to prepare our data with the s1 dependent variable for linear regression, and the s2 dependent variable for logistic regression and the decision tree. Then, we need to add the selected 17 features to them to form datasets that are ready for use. For linear regression, we will use the following code:

val numIterations = 90
val model = LinearRegressionWithSGD.train(TrainingData, numIterations)

For logistic regression, we will use the following code:

val model = new LogisticRegressionWithSGD()
  .setNumClasses(2)

For the decision tree, we will use the following code:

val model = DecisionTree.trainClassifier(trainingData, numClasses,
  categoricalFeaturesInfo, impurity, maxDepth, maxBins)

R notebooks implementation For better comparison, it is a good idea to write linear regression and SEM into the same R notebook, and also to write logistic regression and the decision tree into the same R notebook.
Then, the main task left here is to schedule the estimation on each worker, and then collect the results, using the previously mentioned Job feature in the Databricks environment as follows: The code for linear regression and SEM is as follows:

lm.est1 <- lm(s1 ~ T1+T2+M1+M2+M3+Tr1+Tr2+Tr3+S1+S2+P1+P2+P3+P4+Pr1+Pr2+Pr3)
mod.no1 <- specifyModel()
s1 <- x1, gam31
s1 <- x2, gam32

The code for logistic regression and the decision tree is as follows:

logit.est1 <- glm(s2 ~ T1+T2+M1+M2+M3+Tr1+Tr2+Tr3+S1+S2+P1+P2+P3+P4+Pr1+Pr2+Pr3, family=binomial())
dt.est1 <- rpart(s2 ~ T1+T2+M1+M2+M3+Tr1+Tr2+Tr3+S1+S2+P1+P2+P3+P4+Pr1+Pr2+Pr3, method="class")

After we get all the models estimated for each product, for simplicity, we will focus on one product to complete our discussion of model evaluation and model deployment. Model evaluation In the previous section, we completed our model estimation task. Now, it is time for us to evaluate the estimated models to see whether they meet our model quality criteria, so that we can either move to our next stage of results explanation or go back to previous stages to refine our models. To perform our model evaluation, in this section, we will focus our effort on utilizing RMSE (root mean square error) and ROC (receiver operating characteristic) curves to assess whether our models are a good fit. To calculate RMSEs and ROC curves, we need to use our test data rather than the training data that was used to estimate our models. Quick evaluations Many packages have already included some algorithms for users to assess models quickly. For example, both MLlib and R have algorithms to return a confusion matrix for logistic regression models, and they even get false positive numbers calculated.
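To make explicit what such a confusion matrix summarizes, it can be computed by hand from actual and predicted 0/1 labels. This is a language-neutral illustration, not MLlib's or R's implementation.

```typescript
// Build a 2x2 confusion matrix from actual and predicted 0/1 labels.
interface ConfusionMatrix {
    truePositives: number;
    falseNegatives: number;
    falsePositives: number;
    trueNegatives: number;
}

function confusionMatrix(actual: number[], predicted: number[]): ConfusionMatrix {
    const m: ConfusionMatrix = {
        truePositives: 0, falseNegatives: 0,
        falsePositives: 0, trueNegatives: 0
    };
    actual.forEach((a, i) => {
        const p = predicted[i];
        if (a === 1 && p === 1) m.truePositives++;
        else if (a === 1 && p === 0) m.falseNegatives++;
        else if (a === 0 && p === 1) m.falsePositives++;
        else m.trueNegatives++;
    });
    return m;
}

// Usage with five labeled test cases:
const m = confusionMatrix([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]);
console.log(m);
```

Row percentages of this matrix (true positives over actual successes, and so on) give exactly the kind of table shown for the company product below.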
Specifically, MLlib has functions such as confusionMatrix and numFalseNegatives() for us to use, and even some algorithms to calculate MSE quickly, as follows: MSE = valuesAndPreds.map(lambda (v, p): (v - p)**2).mean() print("Mean Squared Error = " + str(MSE)) Also, R has a confusion.matrix function for us to use. In R, there are even many tools to produce quick graphical plots that can be used to gain a quick evaluation of models. For example, we can plot predicted versus actual values, and also residuals against predicted values. Intuitively, comparing predicted versus actual values is the easiest method to understand, and it gives us a quick model evaluation. The following table is a calculated confusion matrix for one of the company's products, which shows a reasonable fit of our model:

                   Predicted as Success   Predicted as NOT
Actual Success     83%                    17%
Actual Not         9%                     91%

RMSE In MLlib, we can use the following code to calculate RMSE: val valuesAndPreds = test.map { point => val prediction = new_model.predict(point.features) val r = (point.label, prediction) r } val residuals = valuesAndPreds.map { case (v, p) => math.pow((v - p), 2) } val MSE = residuals.mean(); val RMSE = math.pow(MSE, 0.5) Besides this, MLlib also has some functions in the RegressionMetrics and RankingMetrics classes that we can use for RMSE calculation. In R, we can compute RMSE as follows: RMSE <- sqrt(mean((y-y_pred)^2)) Before this, we need to obtain the predicted values with the following commands: > # build a model > RMSElinreg <- lm(s1 ~ . ,data= data1) > #score the model > score <- predict(RMSElinreg, data2) After we have obtained RMSE values for all the estimated models, we will compare them to evaluate the linear regression model against the logistic regression model and the decision tree model. In our case, the linear regression models turned out to perform the best in most cases. Then, we also compare RMSE values across products, and send some product models back for refinement.
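As a language-neutral illustration of the same formula, RMSE can be sketched in a few lines of plain Python (the numbers here are made up for demonstration, not taken from the chapter's data):

```python
import math

def rmse(actuals, predictions):
    """Root-mean-square error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actuals, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

y = [3.0, 5.0, 2.5, 7.0]       # actual values (illustrative)
y_pred = [2.5, 5.0, 3.0, 8.0]  # predicted values (illustrative)
print(rmse(y, y_pred))
```

Lower values indicate a better fit; as stressed above, the comparison only makes sense when computed on held-out test data.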
For another example of obtaining RMSE, please go to http://www.cakesolutions.net/teamblogs/spark-mllib-linear-regression-example-and-vocabulary. ROC curves As an example, we calculate ROC curves to assess our logistic models. In MLlib, we can use the metrics.areaUnderROC() function to calculate ROC once we apply our estimated model to our test data and get labels for the testing cases. For more on using MLlib to obtain ROC, please go to: http://web.cs.ucla.edu/~mtgarip/linear.html In R, using the pROC package, we can perform the following to calculate and plot ROC curves: mylogit <- glm(s2 ~ ., family = "binomial") summary(mylogit) prob=predict(mylogit,type=c("response")) testdata1$prob=prob library(pROC) g <- roc(s2 ~ prob, data = testdata1) plot(g) As discussed, once the ROC curves are calculated, we can use them to compare our logistic models against the decision tree models, or compare models across products. In our case, the logistic models perform better than the decision tree models. Results explanation Once we pass our model evaluation and decide to select the estimated model as our final model, we need to interpret the results to the company's executives and technicians. Next, we discuss some commonly used ways of interpreting our results, one using tables and another using graphs, with our focus on impact assessments. Some users may prefer to interpret our results in terms of ROI, for which cost and benefit data is needed. Once we have the cost and benefit data, our results here can easily be expanded to cover ROI issues. Also, some optimization may need to be applied for real decision making. Impact assessments As discussed in Section 1, the main purpose of this project is to gain a holistic view of the sales team's success. For example, the company wishes to understand the impact of marketing on sales success in comparison to training and other factors.
As we have our linear regression model estimated, one easy way of comparing impacts is to summarize the variance explained by each feature group, as shown in the following table. Tables for Impact Assessment:

Feature Group    % of variance explained
Team             8.5
Marketing        7.6
Training         5.7
Staffing         12.9
Product          8.9
Promotion        14.6
Total            58.2

The following figure is another example of using graphs to display the results that were discussed. Summary In this article, we went through a step-by-step process from data to a holistic view of businesses. That is, we processed a large amount of data on Spark and then built a model to produce a holistic view of the sales team's success for the company IFS. Specifically, we first selected models per business needs after we prepared Spark computing and loaded in the preprocessed data. Secondly, we estimated model coefficients. Thirdly, we evaluated the estimated models. Finally, we interpreted the analytical results. This process is similar to the process of working with small data, but in dealing with big data, we need parallel computing, for which Apache Spark is utilized. During this process, Apache Spark makes things easy and fast. After this article, readers will have gained a full understanding of how Apache Spark can be utilized to obtain a holistic view of businesses. At the same time, readers should have become familiar with the RM4Es modeling process of handling large amounts of data and developing predictive models, and they should especially have become capable of producing their own holistic view of businesses. Resources for Article: Further resources on this subject: Getting Started with Apache Hadoop and Apache Spark [article] Getting Started with Apache Spark DataFrames [article] Sabermetrics with Apache Spark [article]

Splunk's Input Methods and Data Feeds
Packt
30 May 2016
13 min read
This article, written by Ashish Kumar Yadav, is an excerpt from the book Advanced Splunk. The book helps you get to grips with Splunk, a great data science tool. The big data world is an ever-expanding field, and it is easy to get lost in the enormous amount of machine data at your disposal; Advanced Splunk provides the resources and the trail to get you to the other end of your machine data. While the book emphasizes Splunk, it also discusses its close association with the Python language and with tools such as R and Tableau, which are needed for better analytics and visualization. (For more resources related to this topic, see here.) Splunk supports numerous ways to ingest data on its server. Any human-readable data generated by machines from various sources can be uploaded using data input methods such as files, directories, TCP/UDP, and scripts, and can be indexed on the Splunk Enterprise server so that analytics and insights can be derived from it. Data sources Uploading data to Splunk is one of the most important parts of analytics and visualization. If data is not properly parsed, timestamped, or broken into events, it can be difficult to analyze it and get proper insights. Splunk can be used to analyze and visualize data from various domains, such as IT security, networking, mobile devices, telecom infrastructure, media and entertainment devices, storage devices, and many more. Machine-generated data from different sources can be of different formats and types, and hence, it is very important to parse data in the best format to get the required insights from it. Splunk supports machine-generated data of various types and structures, and the following screenshot shows the common types of data that come with inbuilt support in Splunk Enterprise.
The most important point about these sources is that if the data source is from the following list, then the preconfigured settings and configurations already stored in Splunk Enterprise are applied. This helps in getting the data parsed in the most suitable format of events and timestamps, enabling faster searching, analytics, and better visualization. The following screenshot lists the common data sources supported by Splunk Enterprise: Structured data Machine-generated data is generally structured, and in some cases, it can be semi-structured. Some of the types of structured data are Extensible Markup Language (XML), JavaScript Object Notation (JSON), comma-separated values (CSV), tab-separated values (TSV), and pipe-separated values (PSV). Any format of structured data can be uploaded on Splunk. However, if the data is in any of the preceding formats, then predefined settings and configurations can be applied directly by choosing the respective source type while uploading the data, or by configuring it in the inputs.conf file. The preconfigured settings for any of the preceding structured data formats are very generic. It often happens that machine logs are customized structured logs; in that case, additional settings will be required to parse the data. For example, there are various types of XML. We have listed two types here. In the first type, there is the <note> tag at the start and </note> at the end, and in between, there are parameters and their values. In the second type, there are two levels of hierarchy. The XML has the <library> tag along with the <book> tag. Between the <book> and </book> tags, we have parameters and their values.
The first type is as follows: <note> <to>Jack</to> <from>Micheal</from> <heading>Test XML Format</heading> <body>This is one of the formats of XML!</body> </note> The second type is shown in the following code snippet: <Library> <book category="Technical"> <title lang="en">Splunk Basic</title> <author>Jack Thomas</author> <year>2007</year> <price>520.00</price> </book> <book category="Story"> <title lang="en">Jungle Book</title> <author>Rudyard Kiplin</author> <year>1984</year> <price>50.50</price> </book> </Library> Similarly, there can be many types of customized XML scripts generated by machines. To parse different types of structured data, Splunk Enterprise comes with inbuilt settings and configurations defined for the source the data comes from. Let's say, for example, that the logs received from a web server are also structured logs, and they can be in a JSON, CSV, or simple text format. So, depending on the specific source, Splunk tries to make the user's job easier by providing the best settings and configurations for many common sources of data. Some of the most common sources of data are web servers, databases, operating systems, network security devices, and various other applications and services. Web and cloud services The most commonly used web servers are Apache and Microsoft IIS. All Linux-based web services are hosted on Apache servers, and all Windows-based web services on IIS. The logs generated from Linux web servers are simple plain text files, whereas the log files of Microsoft IIS can be in a W3C-extended log file format, or they can be stored in a database in the ODBC log file format as well. Cloud services such as Amazon AWS, S3, and Microsoft Azure can be directly connected and configured to forward data to Splunk Enterprise. The Splunk app store has many technology add-ons that can be used to create data inputs to send data from cloud services to Splunk Enterprise.
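To make the idea of parsing such structured data concrete, here is a small Python sketch, using only the standard library, that extracts fields from the second XML example above. This is purely illustrative; Splunk performs this kind of field extraction internally once an appropriate source type is chosen:

```python
import xml.etree.ElementTree as ET

# The second XML example from above, embedded as a string.
XML_DOC = """
<Library>
  <book category="Technical">
    <title lang="en">Splunk Basic</title>
    <author>Jack Thomas</author>
    <year>2007</year>
    <price>520.00</price>
  </book>
  <book category="Story">
    <title lang="en">Jungle Book</title>
    <author>Rudyard Kiplin</author>
    <year>1984</year>
    <price>50.50</price>
  </book>
</Library>
"""

def extract_books(xml_text):
    """Return a list of dicts, one per <book>, flattening child tags into fields."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        fields = {"category": book.get("category")}
        for child in book:
            fields[child.tag] = child.text
        books.append(fields)
    return books

for b in extract_books(XML_DOC):
    print(b["title"], "-", b["author"])
```

A custom XML layout like this is exactly the case where the generic preconfigured source types are not enough and extra parsing settings are needed.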
So, when uploading log files from web services such as Apache, Splunk provides a preconfigured source type that parses the data in the best format for it to be available for visualization. Suppose that the user wants to upload Apache error logs to the Splunk server; the user then chooses apache_error from the Web category of Source type, as shown in the following screenshot: On choosing this option, the following set of configurations is applied to the data to be uploaded: The event break is configured on the regular expression pattern ^[, so the events in the log files will be broken into single events at every occurrence of [ at the start of a line (^). The timestamp is to be identified in the [%A %B %d %T %Y] format, where: %A is the day of the week; for example, Monday %B is the month; for example, January %d is the day of the month; for example, 1 %T is the time, which has to be in the %H:%M:%S format %Y is the year; for example, 2016 Various other settings are applied as well, such as maxDist, which specifies how much the logs are allowed to vary from the pattern defined in the source type, along with settings such as category, description, and others. Any new settings required as per our needs can be added using the New Settings option available in the section below Settings. After making the changes, the settings can either be saved as a new source type, or the existing source type can be updated with the new settings. IT operations and network security Splunk Enterprise has many applications on the Splunk app store that specifically target IT operations and network security. Splunk is a widely accepted tool for intrusion detection, network and information security, fraud and theft detection, user behaviour analytics, and compliance. A Splunk Enterprise application provides inbuilt support for the Cisco Adaptive Security Appliance (ASA) firewall, Cisco SYSLOG, Call Detail Records (CDR) logs, and one of the most popular intrusion detection applications, Snort.
The Splunk app store has many technology add-ons to get data from various security devices such as firewalls, routers, DMZ devices, and others. The app store also has Splunk applications that show graphical insights and analytics over the data uploaded from various IT and security devices. Databases The Splunk Enterprise application has inbuilt support for databases such as MySQL, Oracle Syslog, and IBM DB2. Apart from this, there are technology add-ons on the Splunk app store to fetch data from the Oracle and MySQL databases. These technology add-ons can be used to fetch, parse, and upload data from the respective database to the Splunk Enterprise server. Various types of data can be available from one source; let's take MySQL as an example. There can be error log data, query logging data, MySQL server health and status log data, or MySQL data stored in the form of databases and tables. This shows that a huge variety of data can be generated from the same source, and hence, Splunk provides support for all types of data generated from a source. Inbuilt configurations for MySQL error logs, MySQL slow queries, and MySQL database logs have already been defined for easier input configuration of data generated from these sources. Application and operating system data The Splunk input source type has inbuilt configurations available for Linux dmesg, syslog, security logs, and various other logs available from the Linux operating system. Apart from the Linux OS, Splunk also provides configuration settings for data input of logs from Windows and iOS systems. It also provides default settings for Log4j-based logging for Java, PHP, and .NET enterprise applications. Splunk also supports data from lots of other applications, such as Ruby on Rails, Catalina, WebSphere, and others.
Splunk Enterprise provides predefined configurations for various applications, databases, OSes, and cloud and virtual environments to enrich the respective data with better parsing and event breaking, thus deriving better insights from the available data. Application sources whose settings are not available in Splunk Enterprise may alternatively have apps or add-ons on the app store. Data input methods Splunk Enterprise supports data input through numerous methods. Data can be sent to Splunk via files and directories, TCP, UDP, scripts, or using universal forwarders. Files and directories Splunk Enterprise provides an easy interface for data uploaded via files and directories. Files can be uploaded manually from the Splunk web interface, or a file can be monitored for changes in content so that new data is uploaded to Splunk whenever it is written to the file. Splunk can also be configured to upload multiple files, either by uploading all the files in one shot, or by monitoring a directory for new files, so that the data gets indexed on Splunk whenever it arrives in the directory. Any data from any source that is in a human-readable format, that is, where no proprietary tools are needed to read the data, can be uploaded on Splunk. Splunk Enterprise even supports uploading compressed files (such as .zip and .tar.gz) that contain multiple log files in a compressed format. Network sources Splunk supports both TCP and UDP to get data on Splunk from network sources. It can monitor any network port for incoming data and then index it on Splunk. Generally, in the case of data from network sources, it is recommended that you use a universal forwarder to send data to Splunk, as the universal forwarder buffers the data in case of any issues on the Splunk server, avoiding data loss. Windows data Splunk Enterprise provides direct configuration to access data from a Windows system.
It supports both local as well as remote collection of various types and sources of data from a Windows system. Splunk has predefined input methods and settings to parse event logs, performance monitoring reports, registry information, and hosts, networks, and print monitoring of a local as well as a remote Windows system. So, data of different formats from different sources can be sent to Splunk using various input methods, as per the requirements and suitability of the data and the source. New data inputs can also be created using Splunk apps or technology add-ons available on the Splunk app store. Adding data to Splunk—new interfaces Splunk Enterprise introduced new interfaces to accept data that are compatible with resource-constrained and lightweight devices for the Internet of Things. Splunk Enterprise version 6.3 supports HTTP Event Collector and REST and JSON APIs for data collection on Splunk. HTTP Event Collector is a very useful interface that can be used to send data from your existing application to the Splunk Enterprise server without using any forwarder. HTTP APIs are available in .NET, Java, Python, and almost all other programming languages, so forwarding data from an existing application based on a specific programming language becomes a cakewalk. Let's take an example: say you are the developer of an Android application, and you want to know which features users actually use, which are the pain areas, and which screens cause problems. You also want to know the usage pattern of your application. So, in the code of your Android application, you can use the REST APIs to forward the logging data to the Splunk Enterprise server. The only important point to note here is that the data needs to be sent in a JSON payload envelope. The advantage of using HTTP Event Collector is that, without using any third-party tools or any configuration, the data can be sent to Splunk, and we can easily derive insights, analytics, and visualizations from it.
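As a sketch of what such a client-side call might look like, the following Python snippet builds the JSON payload envelope that HTTP Event Collector expects and posts it with the standard library. The URL and token here are placeholders, not real values, and the field names inside the event are hypothetical:

```python
import json
import urllib.request

def build_hec_payload(event, source="android-app", sourcetype="_json"):
    """Wrap an event dict in the JSON envelope expected by HTTP Event Collector."""
    return {"event": event, "source": source, "sourcetype": sourcetype}

def send_to_splunk(payload, hec_url, token):
    """POST the payload to a Splunk HEC endpoint (placeholder URL and token)."""
    req = urllib.request.Request(
        hec_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": "Splunk " + token,
                 "Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)  # returns the HTTP response object

# Build (but do not send) a payload for a hypothetical Android crash event.
payload = build_hec_payload({"screen": "checkout", "action": "crash"})
print(json.dumps(payload, sort_keys=True))
```

In a real deployment, hec_url would point at the collector endpoint of your Splunk server and token would be the one generated when enabling HTTP EC from the Splunk Web console.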
HTTP Event Collector and configuration HTTP Event Collector can be configured from the Splunk Web console, and the event data sent over HTTP can then be indexed in Splunk using the REST API. HTTP Event Collector HTTP Event Collector (EC) provides an API with an endpoint that can be used to send log data from applications into Splunk Enterprise. Splunk HTTP Event Collector supports both HTTP and HTTPS for secure connections. The following features of HTTP Event Collector make adding data to Splunk Enterprise easier: It is very lightweight in terms of memory and resource usage, and thus can be used on resource-constrained and lightweight devices as well. Events can be sent directly from anywhere, such as web servers, mobile devices, and IoT devices, without any need for the configuration or installation of forwarders. It is a token-based JSON API that doesn't require you to save user credentials in the code or in the application settings; authentication is handled by the tokens used in the API. It is easy to configure EC from the Splunk Web console: enable HTTP EC and define the token. After this, you are ready to accept data on Splunk Enterprise. It supports both HTTP and HTTPS, and hence it is very secure. It supports GZIP compression and batch processing. HTTP EC is highly scalable, as it can be used in a distributed environment as well as with a load balancer, to crunch and index millions of events per second. Summary In this article, we walked through various data input methods along with the various data sources supported by Splunk. We also looked at HTTP Event Collector, a new feature added in Splunk 6.3 for data collection via REST, to encourage the usage of Splunk for IoT. The data sources and input methods for Splunk are unlike those of any generic tool, and HTTP Event Collector is an added advantage compared to other data analytics tools.
Resources for Article: Further resources on this subject: The Splunk Interface [article] The Splunk Web Framework [article] Introducing Splunk [article]
Running Your Spark Job Executors In Docker Containers
Bernardo Gomez
27 May 2016
12 min read
The following post showcases a Dockerized Apache Spark application running in a Mesos cluster. In our example, the Spark Driver as well as the Spark Executors will be running in a Docker image based on Ubuntu with the addition of the SciPy Python packages. If you are already familiar with the reasons for using Docker as well as Apache Mesos, feel free to skip the next section and jump right to the post, but if not, please carry on. Rationale Today, it's pretty common to find engineers and data scientists who need to run big data workloads in a shared infrastructure. In addition, the infrastructure can potentially be used not only for such workloads, but also for other important services required for business operations. A very common way to solve such problems is to virtualize the infrastructure and statically partition it in such a way that each development or business group in the company has its own resources to deploy and run their applications on. Hopefully, the maintainers of such infrastructure and services have a DevOps mentality and have automated, and continuously work on automating, the configuration and software provisioning tasks on such infrastructure. The problem is, as Benjamin Hindman, backed by studies done at the University of California, Berkeley, points out, static partitioning can be highly inefficient in the utilization of such infrastructure. This has prompted the development of resource schedulers that abstract CPU, memory, storage, and other computer resources away from machines, whether physical or virtual, to enable the execution of applications across the infrastructure and achieve a higher utilization factor, among other things. The concept of sharing infrastructure resources is not new for applications that entail the analysis of large datasets, in most cases through algorithms that favor parallelization of workloads. Today, the most common frameworks used to develop such applications are Hadoop MapReduce and Apache Spark.
In the case of Apache Spark, it can be deployed in clusters managed by resource schedulers such as Hadoop YARN or Apache Mesos. Now, since different applications run inside a shared infrastructure, it's common to find applications with differing requirements across the software packages, and the versions of those packages, that they depend on to function. As an operations engineer or infrastructure manager, you can restrict your users to a predefined set of software libraries, along with their versions, that the infrastructure supports. Hopefully, if you follow that path, you also establish a procedure to upgrade such software libraries and add new ones. This tends to require an investment in time and might be frustrating to engineers and data scientists who are constantly installing new packages and libraries to facilitate their work. When you decide to upgrade, you might also have to refactor some applications that have been running for a long time but have heavy dependencies on previous versions of the packages that are part of the upgrade. All in all, it's not simple. Linux containers, and especially Docker, offer an abstraction such that software can be packaged into lightweight images that can be executed as containers. The containers are executed with some level of isolation, and such isolation is mainly provided by cgroups. Each image can define the type of operating system that it requires along with the software packages. This provides a fantastic mechanism to pass the burden of maintaining the software packages and libraries from infrastructure management and operations to the owners of the applications. With this, the infrastructure and operations teams can run multiple isolated applications that can potentially have conflicting software libraries within the same infrastructure. Apache Spark can leverage this as long as it's deployed in an Apache Mesos cluster that supports Docker.
In the next sections, we will review how we can run Apache Spark applications within Docker containers. Tutorial For this post, we will use a CentOS 7.2 minimal image running on VirtualBox. However, in this tutorial, we will not include the instructions to obtain such a CentOS image, make it available in VirtualBox, or configure its network interfaces. Additionally, we will be using a single node to keep this exercise as simple as possible. We can later explore deploying a similar setup on a set of nodes in the cloud; but for the sake of simplicity and time, our single node will be running the following services: A Mesos master A Mesos slave A Zookeeper instance A Docker daemon Step 1: The Mesos Cluster To install Apache Mesos in your cluster, I suggest you follow the Mesosphere getting started guidelines. Since we are using CentOS 7.2, we first installed the Mesosphere YUM repository as follows: # Add the repository sudo rpm -Uvh http://repos.mesosphere.com/el/7/noarch/RPMS/mesosphere-el-repo-7-1.noarch.rpm We then install the Apache Mesos and Apache Zookeeper packages: sudo yum -y install mesos mesosphere-zookeeper Once the packages are installed, we need to configure Zookeeper as well as the Mesos master and slave. Zookeeper For Zookeeper, we need to create a Zookeeper node identity. We do this by setting the numerical identifier inside the /var/lib/zookeeper/myid file: echo "1" > /var/lib/zookeeper/myid Since by default Zookeeper binds to all interfaces and exposes its services through port 2181, we do not need to change the /etc/zookeeper/conf/zoo.cfg file. Refer to the Mesosphere getting started guidelines if you have a Zookeeper ensemble, that is, more than one node running Zookeeper.
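For reference, a zoo.cfg for a three-node ensemble would carry one server entry per node, along the lines of the following sketch (the zk1-zk3 hostnames are placeholders, not part of this tutorial's single-node setup):

```
# /etc/zookeeper/conf/zoo.cfg — hypothetical three-node ensemble
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

Each node's /var/lib/zookeeper/myid must then match the N in its own server.N entry.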
After that, we can start the Zookeeper service: sudo service zookeeper restart Mesos master and slave Before we start describing the Mesos configuration, we must note that the location of the Mesos configuration files that we will talk about now is specific to Mesosphere's Mesos package. If you don't have a strong reason to build your own Mesos packages, I suggest you use the ones that Mesosphere kindly provides. We need to tell the Mesos master and slave the connection string they can use to reach Zookeeper, including their namespace. By default, Zookeeper will bind to all interfaces; you might want to change this behavior. In our case, we will make sure that the IP address that we want to use to connect to Zookeeper can be resolved within the containers. The node's public interface IP is 192.168.99.100, so we do the following: echo "zk://192.168.99.100:2181/mesos" > /etc/mesos/zk Now, since in our setup we have several network interfaces associated with the node that will be running the Mesos master, we will pick an interface that will be reachable from within the Docker containers that will eventually be running the Spark Driver and Spark Executors. Knowing that the IP address that we want to bind to is 192.168.99.100, we do the following: echo "192.168.99.100" > /etc/mesos-master/ip We do a similar thing for the Mesos slave. Again, consider that in our example the Mesos slave is running on the same node as the Mesos master, and we will bind it to the same network interface: echo "192.168.99.100" > /etc/mesos-slave/ip echo "192.168.99.100" > /etc/mesos-slave/hostname The ip file defines the IP address that the Mesos slave will bind to, and the hostname file defines the hostname that the slave will use to report its availability; it is therefore the value that the Mesos frameworks, in our case Apache Spark, will use to connect to it.
Let's start the services: systemctl start mesos-master systemctl start mesos-slave By default, the Mesos master will bind to port 5050 and the Mesos slave to port 5051. Let's confirm this, assuming that you have installed the net-tools package: netstat -pleno | grep -E "5050|5051" tcp 0 0 192.168.99.100:5050 0.0.0.0:* LISTEN 0 127336 22205/mesos-master off (0.00/0/0) tcp 0 0 192.168.99.100:5051 0.0.0.0:* LISTEN 0 127453 22242/mesos-slave off (0.00/0/0) Let's run a test: MASTER=$(mesos-resolve $(cat /etc/mesos/zk)) LIBPROCESS_IP=192.168.99.100 mesos-execute --master=$MASTER --name="cluster-test" --command="echo 'Hello World' && sleep 5 && echo 'Good Bye'" Step 2: Installing Docker We followed the Docker documentation on installing Docker on CentOS. I suggest that you do the same. In a nutshell, we executed the following: sudo yum update sudo tee /etc/yum.repos.d/docker.repo <<-'EOF' [dockerrepo] name=Docker Repository baseurl=https://yum.dockerproject.org/repo/main/centos/$releasever/ enabled=1 gpgcheck=1 gpgkey=https://yum.dockerproject.org/gpg EOF sudo yum install docker-engine sudo service docker start If the preceding code succeeded, you should be able to run docker ps as well as docker search ipython/scipystack successfully. Step 3: Creating a Spark image Let's create the Dockerfile that will be used by the Spark Driver and Spark Executors. For our example, we will consider that the Docker image should provide the SciPy stack along with additional Python libraries. So, in a nutshell, the Docker image must have the following features: The version of libmesos should be compatible with the version of the Mesos master and slave; for example, /usr/lib/libmesos-0.26.0.so It should have a valid JDK It should have the SciPy stack as well as the Python packages that we want It should have a version of Spark; we will choose 1.6.0 The Dockerfile below will provide the requirements that we mention above.
Note that installing Mesos through the Mesosphere RPMs will install OpenJDK, in this case version 1.7. Dockerfile: # Version 0.1 FROM ipython/scipystack MAINTAINER Bernardo Gomez Palacio "bernardo.gomezpalacio@gmail.com" ENV REFRESHED_AT 2015-03-19 ENV DEBIAN_FRONTEND noninteractive RUN apt-get update RUN apt-get dist-upgrade -y # Setup RUN sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF RUN export OS_DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]') && export OS_CODENAME=$(lsb_release -cs) && echo "deb http://repos.mesosphere.io/${OS_DISTRO} ${OS_CODENAME} main" | tee /etc/apt/sources.list.d/mesosphere.list && apt-get -y update RUN apt-get -y install mesos RUN apt-get install -y python libnss3 curl RUN curl http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0-bin-hadoop2.6.tgz | tar -xzC /opt && mv /opt/spark* /opt/spark RUN apt-get clean # Fix the pyspark six error. RUN pip2 install -U six RUN pip2 install msgpack-python RUN pip2 install avro COPY spark-conf/* /opt/spark/conf/ COPY scripts /scripts ENV SPARK_HOME /opt/spark ENTRYPOINT ["/scripts/run.sh"] Let's explain some very important files that will be available in the Docker image according to the Dockerfile mentioned earlier: The spark-conf/spark-env.sh, as mentioned in the Spark docs, will be used to set the location of the Mesos libmesos.so: export MESOS_NATIVE_JAVA_LIBRARY=${MESOS_NATIVE_JAVA_LIBRARY:-/usr/lib/libmesos.so} export SPARK_LOCAL_IP=${SPARK_LOCAL_IP:-"127.0.0.1"} export SPARK_PUBLIC_DNS=${SPARK_PUBLIC_DNS:-"127.0.0.1"} The spark-conf/spark-defaults.conf serves as the definition of the default configuration for our Spark jobs within the container; the contents are as follows: spark.master SPARKMASTER spark.mesos.mesosExecutor.cores MESOSEXECUTORCORE spark.mesos.executor.docker.image SPARKIMAGE spark.mesos.executor.home /opt/spark spark.driver.host CURRENTIP spark.executor.extraClassPath /opt/spark/custom/lib/* spark.driver.extraClassPath /opt/spark/custom/lib/* Here, SPARKMASTER, MESOSEXECUTORCORE, SPARKIMAGE, and CURRENTIP are placeholder tokens that the entry point script replaces at container startup. Note that the use of environment
variables such as SPARK_MASTER and SPARK_IMAGE is critical, since this allows us to customize how the Spark application interacts with the Mesos Docker integration. We also have Docker's entry point script. The script, shown below, will populate the spark-defaults.conf file. Now, let's define the Dockerfile entry point so that it lets us define some basic options that will get passed to the Spark command, for example, spark-shell, spark-submit, or pyspark:

#!/bin/bash
SPARK_MASTER=${SPARK_MASTER:-local}
MESOS_EXECUTOR_CORE=${MESOS_EXECUTOR_CORE:-0.1}
SPARK_IMAGE=${SPARK_IMAGE:-sparkmesos:latest}
CURRENT_IP=$(hostname -i)

sed -i 's;SPARK_MASTER;'$SPARK_MASTER';g' /opt/spark/conf/spark-defaults.conf
sed -i 's;MESOS_EXECUTOR_CORE;'$MESOS_EXECUTOR_CORE';g' /opt/spark/conf/spark-defaults.conf
sed -i 's;SPARK_IMAGE;'$SPARK_IMAGE';g' /opt/spark/conf/spark-defaults.conf
sed -i 's;CURRENT_IP;'$CURRENT_IP';g' /opt/spark/conf/spark-defaults.conf

export SPARK_LOCAL_IP=${SPARK_LOCAL_IP:-${CURRENT_IP:-"127.0.0.1"}}
export SPARK_PUBLIC_DNS=${SPARK_PUBLIC_DNS:-${CURRENT_IP:-"127.0.0.1"}}

if [ $ADDITIONAL_VOLUMES ]; then
  echo "spark.mesos.executor.docker.volumes: $ADDITIONAL_VOLUMES" >> /opt/spark/conf/spark-defaults.conf
fi

exec "$@"

Let's build the image so that we can start using it:

docker build -t sparkmesos . && docker tag -f sparkmesos:latest sparkmesos:latest

Step 4: Running a Spark application with Docker

Now that the image is built, we just need to run it.
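The entry-point script performs plain token substitution: read an environment variable, fall back to a default, and splice the value into spark-defaults.conf. A minimal Python sketch of the same idea (the function name and the defaults table are ours, mirroring the script's behavior):

```python
import os

# Defaults mirror the entry-point script's fallbacks.
DEFAULTS = {
    "SPARK_MASTER": "local",
    "MESOS_EXECUTOR_CORE": "0.1",
    "SPARK_IMAGE": "sparkmesos:latest",
}

def render_defaults(template, env=None):
    """Replace each placeholder token with its env value, or the default."""
    env = os.environ if env is None else env
    for token, default in DEFAULTS.items():
        template = template.replace(token, env.get(token, default))
    return template

template = ("spark.master SPARK_MASTER\n"
            "spark.mesos.executor.docker.image SPARK_IMAGE\n")
print(render_defaults(template,
                      env={"SPARK_MASTER": "mesos://zk://192.168.99.100:2181/mesos"}))
```

This is only a sketch of the substitution logic; the real container keeps it in shell so that no extra tooling is needed inside the image.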
We will call the PySpark application:

docker run -it --rm -e SPARK_MASTER="mesos://zk://192.168.99.100:2181/mesos" -e SPARK_IMAGE="sparkmesos:latest" -e PYSPARK_DRIVER_PYTHON=ipython2 sparkmesos:latest /opt/spark/bin/pyspark

To make sure that SciPy is working, let's write the following to the PySpark shell:

from scipy import special, optimize
import numpy as np

f = lambda x: -special.jv(3, x)
sol = optimize.minimize(f, 1.0)
x = np.linspace(0, 10, 5000)
x

Now, let's try to calculate Pi as an example:

docker run -it --rm -e SPARK_MASTER="mesos://zk://192.168.99.100:2181/mesos" -e SPARK_IMAGE="sparkmesos:latest" -e PYSPARK_DRIVER_PYTHON=ipython2 sparkmesos:latest /opt/spark/bin/spark-submit --driver-memory 500M --executor-memory 500M /opt/spark/examples/src/main/python/pi.py 10

Conclusion and further notes

Although we were able to run a Spark application within a Docker container leveraging Apache Mesos, there is more work to do. We need to explore containerized Spark applications that spread across multiple nodes, along with providing a mechanism that enables network port mapping.

References

Apache Mesos, The Apache Software Foundation, 2015. Web. 27 Jan. 2016.
Apache Spark, The Apache Software Foundation, 2015. Web. 27 Jan. 2016.
Benjamin Hindman, "Apache Mesos NYC Meetup", August 20, 2013. Web. 27 Jan. 2016.
Docker, Docker Inc., 2015. Web. 27 Jan. 2016.
Hindman, Konwinski, Zaharia, Ghodsi, D. Joseph, Katz, Shenker, Stoica. "Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center". Web. 27 Jan. 2016.
Mesosphere Inc., 2015. Web. 27 Jan. 2016.
SciPy, SciPy developers, 2015. Web. 28 Jan. 2016.
VirtualBox, Oracle Inc., 2015. Web. 28 Jan. 2016.
Wang Qiang, "Docker Spark Mesos". Web. 28 Jan. 2016.

About the Author

Bernardo Gomez Palacio is a consulting member of technical staff, Big Data Cloud Services at Oracle Cloud. He is an electronic systems engineer who has worked for more than 12 years developing software and more than 6 years on DevOps.
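The bundled pi.py example estimates Pi by Monte Carlo sampling: throw random points at the unit square and count how many land inside the quarter circle. Stripped of Spark's parallelism, the underlying computation looks roughly like this (a local sketch, not the distributed pi.py itself):

```python
import random

def estimate_pi(samples, seed=42):
    """Monte Carlo estimate of Pi: 4 * (fraction of points inside the unit quarter circle)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples

print(estimate_pi(100_000))  # roughly 3.14, within a few hundredths
```

Spark's version distributes exactly this loop across executors and sums the per-partition counts, which is why the job parallelizes so cleanly.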
Currently, he develops infrastructure to aid the creation and deployment of big data applications. He is a supporter of open source software and has a particular interest in Apache Mesos, Apache Spark, distributed file systems, and Docker containerization and networking. His opinions are his own and do not reflect the opinions of his employer.
Wrappers

Packt
27 May 2016
13 min read
In this article by Erik Westra, author of the book Modular Programming with Python, we learn the concepts of wrappers. A wrapper is essentially a group of functions that call other functions to do the work. Wrappers are used to simplify an interface, to make a confusing or badly designed API easier to use, to convert data formats into something more convenient, and to implement cross-language compatibility. Wrappers are also sometimes used to add testing and error-checking code to an existing API. Let's take a look at a real-world application of a wrapper module. Imagine that you work for a large bank and have been asked to write a program to analyze fund transfers to help identify possible fraud. Your program receives information, in real time, about every inter-bank funds transfer that takes place. For each transfer, you are given:

The amount of the transfer
The ID of the branch in which the transfer took place
The identification code for the bank the funds are being sent to

Your task is to analyze the transfers over time to identify unusual patterns of activity. To do this, you need to calculate, for each of the last eight days, the total value of all transfers for each branch and destination bank. You can then compare the current day's totals against the average for the previous seven days, and flag any daily totals that are more than 50% above the average. You start by deciding how to represent the total transfers for a day. Because you need to keep track of this for each branch and destination bank, it makes sense to store these totals in a two-dimensional array. In Python, this type of two-dimensional array is represented as a list of lists:

totals = [[0, 307512, 1612, 0, 43902, 5602918],
          [79400, 3416710, 75, 23508, 60912, 5806],
          ...
         ]

You can then keep a separate list of the branch ID for each row and another list holding the destination bank code for each column:

branch_ids = [125000249, 125000252, 125000371, ...]
bank_codes = ["AMERUS33", "CERYUS33", "EQTYUS44", ...]

Using these lists, you can calculate the totals for a given day by processing the transfers that took place on that particular day:

totals = []
for branch in branch_ids:
    branch_totals = []
    for bank in bank_codes:
        branch_totals.append(0)
    totals.append(branch_totals)

for transfer in transfers_for_day:
    branch_index = branch_ids.index(transfer['branch'])
    bank_index = bank_codes.index(transfer['dest_bank'])
    totals[branch_index][bank_index] += transfer['amount']

So far so good. Once you have these totals for each day, you can then calculate the average and compare it against the current day's totals to identify the entries that are higher than 150% of the average. Let's imagine that you've written this program and managed to get it working. When you start using it, though, you immediately discover a problem: your bank has over 5,000 branches, and there are more than 15,000 banks worldwide that your bank can transfer funds to—that's a total of 75 million combinations that you need to keep totals for, and as a result, your program is taking far too long to calculate the totals. To make your program faster, you need to find a better way of handling large arrays of numbers. Fortunately, there's a library designed to do just this: NumPy. NumPy is an excellent array-handling library. You can create huge arrays and perform sophisticated operations on an array with a single function call. Unfortunately, NumPy is also a dense and impenetrable library. It was designed and written for people with a deep understanding of mathematics. While there are many tutorials available and you can generally figure out how to use it, the code that uses NumPy is often hard to comprehend.
For example, calculating the average across multiple matrices would involve the following:

daily_totals = []
for totals in totals_to_average:
    daily_totals.append(totals)
average = numpy.mean(numpy.array(daily_totals), axis=0)

Figuring out what that last line does would require a trip to the NumPy documentation. Because of the complexity of the code that uses NumPy, this is a perfect example of a situation where a wrapper module can be used: the wrapper module can provide an easier-to-use interface to NumPy, so your code can use it without being cluttered with complex and confusing function calls. To work through this example, we'll start by installing the NumPy library. NumPy (http://www.numpy.org) runs on Mac OS X, Windows, and Linux machines. How you install it depends on which operating system you are using:

For Mac OS X, you can download an installer from http://www.kyngchaos.com/software/python.
For MS Windows, you can download a Python "wheel" file for NumPy from http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy. Choose the pre-built version of NumPy that matches your operating system and the desired version of Python. To use the wheel file, use the pip install command, for example, pip install numpy-1.10.4+mkl-cp34-none-win32.whl. For more information about installing Python wheels, refer to https://pip.pypa.io/en/latest/user_guide/#installing-from-wheels.
If your computer runs Linux, you can use your Linux package manager to install NumPy. Alternatively, you can download and build NumPy in source code form.

To ensure that NumPy is working, fire up your Python interpreter and enter the following:

import numpy
a = numpy.array([[1, 2], [3, 4]])
print(a)

All going well, you should see a 2 x 2 matrix displayed:

[[1 2]
 [3 4]]

Now that we have NumPy installed, let's start working on our wrapper module.
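Before moving on, it may help to see what that numpy.mean(numpy.array(daily_totals), axis=0) call actually computes. Here is the same element-wise average written out in plain Python for two tiny 2 x 2 "daily totals" matrices (for illustration only; NumPy does this far faster on real data):

```python
def average_matrices(matrices):
    """Element-wise mean of equally-sized 2-D lists: what numpy.mean(..., axis=0) does."""
    count = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[r][c] for m in matrices) / count for c in range(cols)]
            for r in range(rows)]

day1 = [[100, 200], [300, 400]]
day2 = [[300, 400], [500, 600]]
print(average_matrices([day1, day2]))  # [[200.0, 300.0], [400.0, 500.0]]
```

Each output cell is the mean of the corresponding cells across all input matrices, which is exactly the "average daily totals" matrix the fraud detector needs.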
Create a new Python source file, named numpy_wrapper.py, and enter the following into this file:

import numpy

That's all for now; we'll add functions to this wrapper module as we need them. Next, create another Python source file, named detect_unusual_transfers.py, and enter the following into this file:

import random
import numpy_wrapper as npw

BANK_CODES = ["AMERUS33", "CERYUS33", "EQTYUS44",
              "LOYDUS33", "SYNEUS44", "WFBIUS6S"]

BRANCH_IDS = ["125000249", "125000252", "125000371",
              "125000402", "125000596", "125001067"]

As you can see, we are hardwiring the bank and branch codes for our example; in a real program, these values would be loaded from somewhere, such as a file or a database. Since we don't have any available data, we will use the random module to create some. We are also changing the name of the numpy_wrapper module to make it easier to access from our code. Let's now create some funds transfer data to process, using the random module:

days = [1, 2, 3, 4, 5, 6, 7, 8]

transfers = []
for i in range(10000):
    day = random.choice(days)
    bank_code = random.choice(BANK_CODES)
    branch_id = random.choice(BRANCH_IDS)
    amount = random.randint(1000, 1000000)
    transfers.append((day, bank_code, branch_id, amount))

Here, we randomly select a day, a bank code, a branch ID, and an amount, storing these values in the transfers list. Our next task is to collate this information into a series of arrays. This allows us to calculate the total value of the transfers for each day, grouped by the branch ID and destination bank. To do this, we'll create a NumPy array for each day, where the rows in each array represent branches and the columns represent destination banks. We'll then go through the list of transfers, processing them one by one.
The following illustration summarizes how we process each transfer in turn: first, we select the array for the day on which the transfer occurred, and then we select the appropriate row and column based on the destination bank and the branch ID. Finally, we add the amount of the transfer to that item within the day's array. Let's implement this logic. Our first task is to create a series of NumPy arrays, one for each day. Here, we immediately hit a snag: NumPy has many different options for creating arrays; in this case, we want to create an array that holds integer values and has its contents initialized to zero. If we used NumPy directly, our code would look like the following:

array = numpy.zeros((num_rows, num_cols), dtype=numpy.int32)

This is not exactly easy to understand, so we're going to move this logic into our NumPy wrapper module. Edit the numpy_wrapper.py file, and add the following to the end of this module:

def new(num_rows, num_cols):
    return numpy.zeros((num_rows, num_cols), dtype=numpy.int32)

Now, we can create a new array by calling our wrapper function (npw.new()) and not have to worry about the details of how NumPy works at all. We have simplified the interface to this particular aspect of NumPy. Let's now use our wrapper function to create the eight arrays that we will need, one for each day. Add the following to the end of the detect_unusual_transfers.py file:

transfers_by_day = {}
for day in days:
    transfers_by_day[day] = npw.new(num_rows=len(BRANCH_IDS),
                                    num_cols=len(BANK_CODES))

Since the rows represent branches and the columns represent destination banks, num_rows comes from BRANCH_IDS and num_cols from BANK_CODES. Now that we have our NumPy arrays, we can use them as if they were nested Python lists. For example:

array[row][col] = array[row][col] + amount

We just need to choose the appropriate array, and calculate the row and column numbers to use.
Here is the necessary code, which you should add to the end of your detect_unusual_transfers.py script:

for day, bank_code, branch_id, amount in transfers:
    array = transfers_by_day[day]
    row = BRANCH_IDS.index(branch_id)
    col = BANK_CODES.index(bank_code)
    array[row][col] = array[row][col] + amount

Now that we've collated the transfers into eight NumPy arrays, we want to use all this data to detect any unusual activity. For each combination of branch ID and destination bank code, we will need to do the following:

Calculate the average of the first seven days' activity.
Multiply the calculated average by 1.5.
If the activity on the eighth day is greater than the average multiplied by 1.5, then we consider this activity to be unusual.

Of course, we need to do this for every row and column in our arrays, which would be very slow; this is why we're using NumPy. So, we need to calculate the average for multiple arrays of numbers, then multiply the array of averages by 1.5, and finally, compare the values within the multiplied array against the array for the eighth day of data. Fortunately, these are all things that NumPy can do for us. We'll start by collecting together the seven arrays we need to average, as well as the array for the eighth day. To do this, add the following to the end of your program:

latest_day = max(days)

transfers_to_average = []
for day in days:
    if day != latest_day:
        transfers_to_average.append(transfers_by_day[day])
current = transfers_by_day[latest_day]

To calculate the average of a list of arrays, NumPy requires us to use the following function call:

average = numpy.mean(numpy.array(arrays_to_average), axis=0)

Since this is confusing, we will move this function into our wrapper. Add the following code to the end of the numpy_wrapper.py module:

def average(arrays_to_average):
    return numpy.mean(numpy.array(arrays_to_average), axis=0)

This lets us calculate the average of the seven days' activity using a single call to our wrapper function.
To do this, add the following to the end of your detect_unusual_transfers.py script:

average = npw.average(transfers_to_average)

As you can see, using the wrapper makes our code much easier to understand. Our next task is to multiply the array of calculated averages by 1.5, and compare the result against the current day's totals. Fortunately, NumPy makes this easy:

unusual_transfers = current > average * 1.5

Because this code is so clear, there's no advantage in creating a wrapper function for it. The resulting array, unusual_transfers, will be the same size as our current and average arrays, where each entry in the array is either True or False. We're almost done; our final task is to identify the array entries with a value of True, and tell the user about the unusual activity. While we could scan through every row and column to find the True entries, using NumPy is much faster. The following NumPy code will give us a list containing the row and column numbers for the True entries in the array:

indices = numpy.transpose(array.nonzero())

True to form, though, this code is hard to understand, so it's a perfect candidate for another wrapper function. Go back to your numpy_wrapper.py module, and add the following to the end of the file:

def get_indices(array):
    return numpy.transpose(array.nonzero())

This function returns a list (actually an array) of (row,col) values for all the True entries in the array. Back in our detect_unusual_transfers.py file, we can use this function to quickly identify the unusual activity:

for row, col in npw.get_indices(unusual_transfers):
    branch_id = BRANCH_IDS[row]
    bank_code = BANK_CODES[col]
    average_amt = int(average[row][col])
    current_amt = current[row][col]

    print("Branch {} transferred ${:,d}".format(branch_id, current_amt) +
          " to bank {}, average = ${:,d}".format(bank_code, average_amt))

As you can see, we use the BRANCH_IDS and BANK_CODES lists to convert from the row and column number back to the relevant branch ID and bank code.
We also retrieve the average and current amounts for the suspicious activity. Finally, we print out this information to warn the user about the unusual activity. If you run your program, you should see an output that looks something like this:

Branch 125000371 transferred $24,729,847 to bank WFBIUS6S, average = $14,954,617
Branch 125000402 transferred $26,818,710 to bank CERYUS33, average = $16,338,043
Branch 125001067 transferred $27,081,511 to bank EQTYUS44, average = $17,763,644

Because we are using random numbers for our financial data, the output will be random too. Try running the program a few times; you may not get any output at all if none of the randomly-generated values are suspicious. Of course, we are not really interested in detecting suspicious financial activity—this example is just an excuse for working with NumPy. What is far more interesting is the wrapper module that we created, hiding the complexity of the NumPy interface so that the rest of our program can concentrate on the job to be done. If we were to continue developing our unusual activity detector, we would no doubt add more functionality to our numpy_wrapper.py module as we found more NumPy functions that we wanted to wrap.

Summary

This is just one example of a wrapper module. As we mentioned earlier, simplifying a complex and confusing API is just one use for a wrapper module; they can also be used to convert data from one format to another, add testing and error-checking code to an existing API, and call functions that are written in a different language. Note that, by definition, a wrapper is always thin—while there might be code in a wrapper (for example, to convert a parameter from an object into a dictionary), the wrapper function always ends up calling another function to do the actual work.
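As a closing illustration, the detection rule itself (flag any entry more than 50% above the seven-day average) is simple enough to write in plain Python. This sketch is independent of NumPy and of the wrapper module, and the function name is ours:

```python
def find_unusual(current, average, threshold=1.5):
    """Return (row, col) pairs where current exceeds threshold * average."""
    flagged = []
    for r, row in enumerate(current):
        for c, value in enumerate(row):
            if value > threshold * average[r][c]:
                flagged.append((r, c))
    return flagged

# Two branches (rows) by two destination banks (columns).
average = [[10_000, 20_000], [30_000, 40_000]]
current = [[11_000, 45_000], [29_000, 70_000]]
print(find_unusual(current, average))  # [(0, 1), (1, 1)]
```

This nested loop is exactly what `current > average * 1.5` followed by `get_indices()` computes; NumPy just does it in compiled code across millions of cells at once.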
Security Considerations in Multitenant Environment

Packt
24 May 2016
8 min read
In this article by Zoran Pavlović and Maja Veselica, authors of the book, Oracle Database 12c Security Cookbook, we will be introduced to common privileges and learn how to grant privileges and roles commonly. We'll also study the effects of plugging and unplugging operations on users, roles, and privileges. (For more resources related to this topic, see here.) Granting privileges and roles commonly Common privilege is a privilege that can be exercised across all containers in a container database. Depending only on the way it is granted, a privilege becomes common or local. When you grant privilege commonly (across all containers) it becomes common privilege. Only common users or roles can have common privileges. Only common role can be granted commonly. Getting ready For this recipe, you will need to connect to the root container as an existing common user who is able to grant a specific privilege or existing role (in our case – create session, select any table, c##role1, c##role2) to another existing common user (c##john). If you want to try out examples in the How it works section given ahead, you should open pdb1 and pdb2. You will use: Common users c##maja and c##zoran with dba role granted commonly Common user c##john Common roles c##role1 and c##role2 How to do it... You should connect to the root container as a common user who can grant these privileges and roles (for example, c##maja or system user). 
SQL> connect c##maja@cdb1

Grant a privilege (for example, create session) to a common user (for example, c##john) commonly:

c##maja@CDB1> grant create session to c##john container=all;

Grant a privilege (for example, select any table) to a common role (for example, c##role1) commonly:

c##maja@CDB1> grant select any table to c##role1 container=all;

Grant a common role (for example, c##role1) to a common role (for example, c##role2) commonly:

c##maja@CDB1> grant c##role1 to c##role2 container=all;

Grant a common role (for example, c##role2) to a common user (for example, c##john) commonly:

c##maja@CDB1> grant c##role2 to c##john container=all;

How it works...

Figure 16

You can grant privileges or common roles commonly only to a common user. You need to connect to the root container as a common user who is able to grant a specific privilege or role. In step 2, the create session system privilege is granted to common user c##john commonly, by adding a container=all clause to the grant statement. This means that user c##john can connect (create session) to the root or any pluggable database in this container database (including all pluggable databases that will be plugged in in the future). N.B. the container=all clause is NOT optional, even though you are connected to the root. Unlike during the creation of common users and roles (if you omit container=all, the user or role will be created in all containers—commonly), if you omit this clause during a privilege or role grant, the privilege or role will be granted locally and it can be exercised only in the root container.

SQL> connect c##john/oracle@cdb1
c##john@CDB1> connect c##john/oracle@pdb1
c##john@PDB1> connect c##john/oracle@pdb2
c##john@PDB2>

In step 3, the select any table system privilege is granted to common role c##role1 commonly. This means that role c##role1 contains the select any table privilege in all containers (root and pluggable databases).
c##zoran@CDB1> select * from role_sys_privs where role='C##ROLE1';

ROLE      PRIVILEGE        ADM COM
--------- ---------------- --- ---
C##ROLE1  SELECT ANY TABLE NO  YES

c##zoran@CDB1> connect c##zoran/oracle@pdb1
c##zoran@PDB1> select * from role_sys_privs where role='C##ROLE1';

ROLE      PRIVILEGE        ADM COM
--------- ---------------- --- ---
C##ROLE1  SELECT ANY TABLE NO  YES

c##zoran@PDB1> connect c##zoran/oracle@pdb2
c##zoran@PDB2> select * from role_sys_privs where role='C##ROLE1';

ROLE      PRIVILEGE        ADM COM
--------- ---------------- --- ---
C##ROLE1  SELECT ANY TABLE NO  YES

In step 4, common role c##role1 is granted to another common role c##role2 commonly. This means that role c##role2 has the granted role c##role1 in all containers.

c##zoran@CDB1> select * from role_role_privs where role='C##ROLE2';

ROLE      GRANTED_ROLE ADM COM
--------- ------------ --- ---
C##ROLE2  C##ROLE1     NO  YES

c##zoran@CDB1> connect c##zoran/oracle@pdb1
c##zoran@PDB1> select * from role_role_privs where role='C##ROLE2';

ROLE      GRANTED_ROLE ADM COM
--------- ------------ --- ---
C##ROLE2  C##ROLE1     NO  YES

c##zoran@PDB1> connect c##zoran/oracle@pdb2
c##zoran@PDB2> select * from role_role_privs where role='C##ROLE2';

ROLE      GRANTED_ROLE ADM COM
--------- ------------ --- ---
C##ROLE2  C##ROLE1     NO  YES

In step 5, common role c##role2 is granted to common user c##john commonly. This means that user c##john has c##role2 in all containers. Consequently, user c##john can use the select any table privilege in all containers in this container database.
c##john@CDB1> select count(*) from c##zoran.t1;

  COUNT(*)
----------
         4

c##john@CDB1> connect c##john/oracle@pdb1
c##john@PDB1> select count(*) from hr.employees;

  COUNT(*)
----------
       107

c##john@PDB1> connect c##john/oracle@pdb2
c##john@PDB2> select count(*) from sh.sales;

  COUNT(*)
----------
    918843

Effects of plugging/unplugging operations on users, roles, and privileges

The purpose of this recipe is to show what is going to happen to users, roles, and privileges when you unplug a pluggable database from one container database (cdb1) and plug it into some other container database (cdb2).

Getting ready

To complete this recipe, you will need:

Two container databases (cdb1 and cdb2)
One pluggable database (pdb1) in container database cdb1
Local user mike in pluggable database pdb1 with the local create session privilege
Common user c##john with the create session common privilege and the create synonym local privilege on pluggable database pdb1

How to do it...

Connect to the root container of cdb1 as user sys:

SQL> connect sys@cdb1 as sysdba

Unplug pdb1 by creating an XML metadata file:

SQL> alter pluggable database pdb1 unplug into '/u02/oradata/pdb1.xml';

Drop pdb1 and keep the datafiles:

SQL> drop pluggable database pdb1 keep datafiles;

Connect to the root container of cdb2 as user sys:

SQL> connect sys@cdb2 as sysdba

Create (plug) pdb1 into cdb2 by using the previously created metadata file:

SQL> create pluggable database pdb1 using '/u02/oradata/pdb1.xml' nocopy;

How it works...

By completing the previous steps, you unplugged pdb1 from cdb1 and plugged it into cdb2. After this operation, all local users and roles (in pdb1) are migrated with the pdb1 database. If you try to connect to pdb1 as a local user:

SQL> connect mike@pdb1

it will succeed. All local privileges are migrated, even if they are granted to common users/roles.
However, if you try to connect to pdb1 as the previously created common user c##john, you'll get an error:

SQL> connect c##john@pdb1
ERROR:
ORA-28000: the account is locked

Warning: You are no longer connected to ORACLE.

This happened because, after migration, common users are migrated into a pluggable database as locked accounts. You can continue to use the objects in these users' schemas, or you can create these users in the root container of the new CDB. To do this, we first need to close pdb1:

sys@CDB2> alter pluggable database pdb1 close;
Pluggable database altered.

sys@CDB2> create user c##john identified by oracle container=all;
User created.

sys@CDB2> alter pluggable database pdb1 open;
Pluggable database altered.

If we try to connect to pdb1 as user c##john, we will get an error:

SQL> conn c##john/oracle@pdb1
ERROR:
ORA-01045: user C##JOHN lacks CREATE SESSION privilege; logon denied

Warning: You are no longer connected to ORACLE.

Even though c##john had the create session common privilege in cdb1, he cannot connect to the migrated PDB. This is because common privileges are not migrated! So we need to give the create session privilege (either common or local) to user c##john:

sys@CDB2> grant create session to c##john container=all;
Grant succeeded.

Let's now try the create synonym local privilege in the migrated pdb1:

c##john@PDB1> create synonym emp for hr.employees;
Synonym created.

This proves that local privileges are always migrated.

Summary

In this article, we learned about common privileges and the methods to grant common privileges and roles to users. We also studied what happens to users, roles, and privileges when you unplug a pluggable database from one container database and plug it into some other container database.

Resources for Article:

Further resources on this subject:
Oracle 12c SQL and PL/SQL New Features
Oracle GoldenGate 12c — An Overview
Backup and Recovery for Oracle SOA Suite 12C
Mobile Forensics

Packt
24 May 2016
15 min read
In this article by Soufiane Tahiri, the author of Mastering Mobile Forensics, we will look at the basics of smartphone forensics. Smartphone forensics is a relatively new and quickly emerging field of interest within the digital forensic community and law enforcement, as today's mobile devices are getting smarter, cheaper, and more easily available for common daily use. (For more resources related to this topic, see here.) To investigate the growing number of digital crimes and complaints, researchers have put a lot of effort into designing the most affordable investigative model; in this article, we will emphasize the importance of paying real attention to the growing market of smartphones and the efforts made in this field from a digital forensic point of view, in order to design the most comprehensive investigation process.

Smartphone forensics models

Given the pace at which mobile technology grows and the variety of complexities that are produced by today's mobile data, forensic examiners face serious adaptation problems, so developing and adopting standards makes sense. The reliability of evidence depends directly on the adopted investigative processes; bypassing a step, deliberately or accidentally, may (and will certainly) lead to incomplete evidence and increase the risk of rejection in the court of law. Today, there is no standard or unified model that is adapted to acquiring evidence from smartphones. The dramatic development of smart devices suggests that any forensic examiner will have to apply as many independent models as necessary in order to collect and preserve data. Similar to any forensic investigation, several approaches and techniques can be used to acquire, examine, and analyze data from a mobile device.
This section provides a proposed process in which guidelines from different standards and models (SWGDE Best Practices for Mobile Phone Forensics, NIST Guidelines on Mobile Device Forensics, and Developing Process for Mobile Device Forensics by Det. Cynthia A. Murphy) were summarized. The following flowchart schematizes the overall process: Evidence Intake: This triggers the examination process. This step should be documented. Identification: In this, the examiner needs to identify the device's capabilities and specifications. The examiner should document everything that takes place during the whole process of identification. Preparation: In this, the examiner should prepare tools and methods to use and must document them. Securing and preserving evidences: In this, the examiner should protect the evidences and secure the scene, as well as isolate the device from all networks. The examiner needs to be vigilant when documenting the scene. Processing: At this stage, the examiner starts performing the actual (and technical) data acquisition, analysis, and documents the steps, and tools used and all his findings. Verification and validation: The examiner should be sure of the integrity of his findings and he must validate acquired data and evidences in this step. This step should be documented as well. Reporting: The examiner produces a final report in which he documents process and finding. Presentation: This stage is meant to exhibit and present the findings. Archiving: At the end of the forensic process, the examiner should preserve data, report, tools, and all his finding in common formats for an eventual use. Low-level techniques Digital forensic examiners can neither always nor exclusively rely on commercially available tools, handling low-level techniques is a must. 
This section will also cover the techniques of extracting strings from different object (for example, smartphone images) Any digital examiner should be familiar with concepts and techniques, such as: File carving: This is defined as the process of extracting a collection of data from a larger data set. It is applied to a digital investigation case. File carving is the process of extracting "data" from unallocated filesystem space using file type inner structure and not filesystem structure, meaning that the extraction process is principally based on file types headers and trailers. Extracting metadata: In an ambiguous way metadata is data that describes data or information about information. In general, metadata is hidden and extra information is generated and embedded automatically in a digital file. The definition of metadata differs depending on the context in which it's used and the community that refers to it; metadata can be considered as machine understandable information or record that describes digital records. In fact, metadata can be subdivided into three important types: Descriptive (including elements, such as author, title, abstract, keywords, and so on), Structural (describing how an object is constituted and how the elements are arranged) and Administrative (including elements, such as date and time of creation, data type, and other technical details) String dump and analysis: Most of the digital investigations rely on textual evidences, this is obviously due to the fact that most of the stored digital data is linguistic; for instance, logged conversation, a lot of important text based evidence can be gathered while dumping strings from images (smartphone memory dumps) and can include emails, instant messaging, address books, browsing history, and so on. 
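The file-carving idea described above (matching on file-type headers and trailers rather than on filesystem structure) can be sketched in a few lines of Python. The sketch below scans a raw byte buffer for JPEG start and end signatures; it is a deliberately naive illustration, not a production carver:

```python
# JPEG files begin with the SOI marker (FF D8 FF) and end with EOI (FF D9).
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_TRAILER = b"\xff\xd9"

def carve_jpegs(blob):
    """Return byte slices that look like JPEG files, based on header/trailer signatures."""
    carved = []
    pos = 0
    while True:
        start = blob.find(JPEG_HEADER, pos)
        if start == -1:
            break
        end = blob.find(JPEG_TRAILER, start + len(JPEG_HEADER))
        if end == -1:
            break  # header with no trailer: stop (a real carver would handle this)
        carved.append(blob[start:end + len(JPEG_TRAILER)])
        pos = end + len(JPEG_TRAILER)
    return carved

# A fake "unallocated space" dump: two JPEG-like blobs surrounded by junk.
dump = (b"junk" + JPEG_HEADER + b"image-one" + JPEG_TRAILER +
        b"slack space" + JPEG_HEADER + b"image-two" + JPEG_TRAILER)
print(len(carve_jpegs(dump)))  # 2
```

Real carvers such as those bundled with forensic suites also validate internal structure and handle fragmented files; the point here is only the header/trailer matching principle.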
Most of the currently available digital forensic tools rely on matching and indexing algorithms to search textual evidence at the physical level, so they search every byte to locate specific text strings.

Encryption versus encoding versus hashing: The important thing to keep in mind is that encoding, encrypting, and hashing are terms that do not mean the same thing at all:
Encoding: This is meant for data usability; it can be reversed using the same algorithm and requires no key.
Encrypting: This is meant for confidentiality; it is reversible and, depending on the algorithm, relies on key(s) to encrypt and decrypt.
Hashing: This is meant for data integrity; it cannot (theoretically) be reversed and depends on no keys.

Decompiling and disassembling: These are types of reverse engineering processes that do the opposite of what a compiler and an assembler do.
Decompiler: This translates a compiled binary's low-level code, designed to be computer readable, into human-readable high-level code. The accuracy of decompilers depends on many factors, such as the amount of metadata present in the code being decompiled and the complexity of the code (not in terms of algorithms, but in terms of the sophistication of the high-level code used).
Disassembler: The output of a disassembler is at some level dependent on the processor. It maps processor instructions into mnemonics; in contrast to a decompiler's output, this is far more complicated to understand and edit.

iDevices forensics
Similar to all Apple operating systems, iOS is derived from Mac OS X; thus, iOS uses Hierarchical File System Plus (HFS+) as its primary file system. HFS+ replaced the first-developed filesystem, HFS, and is considered an enhanced version of HFS, but the two are still architecturally very similar.
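Before digging into HFS+, the encoding versus encryption versus hashing distinction above can be pinned down with a few lines of Python. The XOR step is only a toy stand-in to illustrate key dependence, not a real cipher:

```python
import base64
import hashlib

data = b"forensic evidence"

# Encoding: reversible with the same algorithm, no key involved
encoded = base64.b64encode(data)
assert base64.b64decode(encoded) == data

# "Encryption" (toy XOR stand-in): reversible only with the right key
key = 0x5A
ciphertext = bytes(b ^ key for b in data)
assert bytes(b ^ key for b in ciphertext) == data    # right key: data recovered
assert bytes(b ^ 0x13 for b in ciphertext) != data   # wrong key: garbage

# Hashing: fixed-size digest, not reversible; any change alters it completely
digest = hashlib.sha256(data).hexdigest()
assert hashlib.sha256(data).hexdigest() == digest    # deterministic
assert hashlib.sha256(b"forensic evidencE").hexdigest() != digest
print(digest[:16])
```

The three properties matter in practice: an examiner can always reverse encoding, can reverse encryption only with the key, and can only attack hashes by guessing inputs.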
The main improvements seen in HFS+ are:
A decrease in disk space usage on large volumes (efficient use of disk space)
Internationally friendly file names (through the use of Unicode instead of MacRoman)
Allowing future systems to use and extend file/folder metadata

HFS+ divides the total space on a volume (a file that contains data and the structure to access this data) into allocation blocks and uses 32-bit fields to identify them. This allows up to 2^32 blocks on a given volume, which simply means that a volume can hold more files. All HFS+ volumes respect a well-defined structure, and each volume contains a volume header, a catalog file, an extents overflow file, an attributes file, an allocation file, and a startup file.

In addition, all Apple iDevices have combined built-in hardware/software advanced security, which can be categorized according to Apple's official iOS Security Guide as:
System security: Integrated software and hardware platform
Encryption and data protection: Mechanisms implemented to protect data from unauthorized use
Application security: Application sandboxing
Network security: Secure data transmission
Apple Pay: Implementation of secure payments
Internet services: Apple's network of messaging, synchronizing, and backup
Device controls: Remotely wiping the device if it is lost or stolen
Privacy controls: Capabilities to control access to geolocation and user data

When dealing with seizure, it's important to turn on Airplane mode and, if the device is unlocked, set auto-lock to never and check whether a passcode was set (Settings | Passcode). If you are dealing with a passcode, try to keep the phone charged if you cannot acquire its content immediately; if no passcode was set, turn off the device.
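To put the 32-bit allocation block addressing mentioned earlier into numbers, here is a quick back-of-the-envelope calculation (the 4 KiB block size is just an illustrative assumption; actual allocation block sizes vary by volume):

```python
blocks = 2 ** 32                 # maximum allocation blocks addressable with 32-bit fields
block_size = 4096                # assumed allocation block size in bytes (4 KiB)
max_volume = blocks * block_size
print(max_volume)                        # 17592186044416 bytes
print(max_volume // (1024 ** 4), "TiB")  # 16 TiB
```

Larger allocation blocks push the ceiling higher, which is why the move from HFS's 16-bit to HFS+'s 32-bit block identifiers made such a difference on large volumes.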
There are four different acquisition methods when talking about iDevices:
Normal or direct: The ideal case, where you can deal directly with a powered-on device.
Logical acquisition: Acquisition done using an iTunes backup or a forensic tool that uses the AFC protocol; it is in general not complete, since emails, the geolocation database, the apps cache folder, and executables are missed.
Advanced logical acquisition: A technique introduced by Jonathan Zdziarski (http://www.zdziarski.com/blog/), but no longer possible due to the introduction of iOS 8.
Physical acquisition: Generates a forensic bit-by-bit image of both the system and data partitions.

Before selecting one method (or not, because the method to choose depends on some parameters), the examiner should answer three important questions: What is the device model? What is the iOS version installed? Is the device passcode protected, and if so, is it a simple or a complex passcode?

Android forensics
Android is an open source, Linux-based operating system. It was first developed by Android Inc. in 2003; it was then acquired by Google in 2005 and unveiled in 2007. Like most operating systems, Android consists of a stack of software components, roughly divided into four main layers and five main sections (as shown in the image from https://upload.wikimedia.org/wikipedia/commons/a/af/Android-System-Architecture.svg), and each layer provides different services to the layer above.

Understanding every smartphone OS's security model is a big deal in a forensic context. All vendors and smartphone manufacturers care about securing their users' data, and in most cases the implemented security model can cause a real headache for the forensic examiner; Android is no exception to the rule.
Android, as you know, is an open source OS built on the Linux kernel, providing an environment that can run multiple applications simultaneously. Each application is digitally signed and isolated in its very own sandbox, and each application sandbox defines the application's privileges. Above the kernel, all activities have constrained access to the system. The Android OS implements many security components and has many considerations for its various layers; the following figure summarizes the Android security architecture on ARM with TrustZone support:

Without any doubt, lock screens represent the very first starting point in every mobile forensic examination. As with all smartphone OSes, Android offers a way to control access to a given device by requiring user authentication. The problem with recent implementations of the lock screen in modern operating systems in general, and in Android in particular (since it is the point of interest of this section), is that beyond controlling access to the system user interface and applications, lock screens have now been extended with more "fancy" features (showing widgets, switching users on multi-user devices, and so on) and more forensically challenging features, such as unlocking the system keystore to derive the key-encryption key (used, among other things, for the disk encryption key) as well as the credential storage encryption key.

The problem with bypassing lock screens (also called keyguards) is that the applicable techniques are very version- and device-dependent; thus there is neither a generalized method nor a technique that works every time. The Android keyguard is basically an Android application whose window lives on a high window layer, with the ability to intercept navigation buttons in order to produce the lock effect. Each unlock method (PIN, password, pattern, and face unlock) is a view component implementation hosted by the KeyguardHostView view container class.
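Since the techniques are so version-dependent, it helps to see why older pattern locks fall to offline attacks. On many older Android versions, the pattern was stored in /data/system/gesture.key as an unsalted SHA-1 hash of the sequence of touched dot indices, so an examiner who can pull that file (for example, from a rooted device over ADB) can brute-force it offline. The following Python sketch demonstrates the idea on a synthetic gesture.key value; it is a simplified illustration, not a field-ready tool:

```python
import hashlib
from itertools import permutations

def gesture_hash(pattern):
    """Unsalted SHA-1 of the raw dot-index bytes, as stored in gesture.key
    on many older Android versions (simplified; real patterns use 4-9 dots)."""
    return hashlib.sha1(bytes(pattern)).digest()

def crack(target_hash, max_len=5):
    """Brute-force short patterns over the 3x3 grid of dots (indices 0-8)."""
    for length in range(3, max_len + 1):
        for candidate in permutations(range(9), length):
            if gesture_hash(candidate) == target_hash:
                return list(candidate)
    return None

# Synthetic gesture.key content for the pattern 0 -> 1 -> 2 (top-row swipe)
stolen = gesture_hash([0, 1, 2])
print(crack(stolen))  # -> [0, 1, 2]
```

The search space of valid patterns is tiny by cryptographic standards, which is why the lack of a salt and the fast hash made this storage scheme weak.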
All of the methods/modes used to secure an Android device are activated by setting the currently selected mode in the SecurityMode enumeration of the KeyguardSecurityModel class. The following is the KeyguardSecurityModel.SecurityMode implementation, as seen in the Android Open Source Project:

    enum SecurityMode {
        Invalid, // NULL state
        None, // No security enabled
        Pattern, // Unlock by drawing a pattern.
        Password, // Unlock by entering an alphanumeric password
        PIN, // Strictly numeric password
        Biometric, // Unlock with a biometric key (e.g. finger print or face unlock)
        Account, // Unlock by entering an account's login and password.
        SimPin, // Unlock by entering a sim pin.
        SimPuk // Unlock by entering a sim puk
    }

Before starting our bypass and lock-cracking techniques, note that dealing with system files or "system-protected" files assumes that the device you are handling meets some requirements:
Using Android Debug Bridge (ADB)
The device must be rooted
USB debugging should be enabled on the device
Booting into a custom recovery mode
JTAG/chip-off to acquire a physical bit-by-bit copy

Windows Phone forensics
Based on the Windows NT kernel, Windows Phone 8.x uses the Core System to boot, manage hardware, authenticate, and communicate on networks. The Core System is a minimal Windows system that contains low-level security features and is supplemented by a set of Windows Phone specific binaries from Mobile Core to handle phone-specific tasks, which makes it the only architecturally distinct entity (from desktop Windows) in Windows Phone. Windows and Windows Phone are completely aligned at the Windows Core System level and run exactly the same code there. The shared core actually consists of the Windows Core System and Mobile Core, where the APIs are the same but the code behind them is tuned to mobile needs.
Similar to most mobile operating systems, Windows Phone has a fairly layered architecture. The kernel and OS layers are mainly provided and supported by Microsoft, but some layers are provided by Microsoft's partners, depending on hardware properties, in the form of a board support package (BSP), which usually consists of a set of drivers and support libraries created by the CPU supplier that handle low-level hardware interaction and the boot process. Then come the original equipment manufacturers (OEMs) and independent hardware vendors (IHVs), which write the drivers required to support the phone's hardware and specific components. The following is a high-level diagram describing the Windows Phone architecture, organized by layer and ownership:

There are three main partitions on a Windows Phone that are forensically interesting: the MainOS, Data, and Removable User Data partitions (the latter is not visible on the preceding screenshot, since the Lumia 920 does not support SD cards). As their respective names suggest, the MainOS partition contains all Windows Phone operating system components, and the Data partition stores all user data, third-party applications, and all application states. The Removable User Data partition is considered by Windows Phone as a separate volume and refers to all data stored on the SD card (on devices that support SD cards). Each of the previously named partitions respects a folder layout, and their root folders can be mapped with predefined Access Control Lists (ACLs). Each ACL takes the form of a list of access control entries (ACEs), and each ACE identifies the user account to which it applies (the trustee) and specifies the access rights allowed, denied, or audited for that trustee.

Windows Phone 8.1 is extremely challenging and different; specific forensic tools and techniques should be used in order to gather evidence. One of the interesting techniques is side loading, where an agent is deployed to extract contacts and appointments from a WP8.1 device.
To extract phonebook and appointment entries, we will use WP Logical, a contacts and appointments acquisition tool designed to run under Windows Phone 8.1. Once deployed and executed, it will create a folder named WPLogical_MDY__HMMSS_PM/AM under the public folder PhonePictures, where M=month, D=day, Y=year, H=hour, MM=minutes, and SS=seconds of the extraction date. Inside the created folder you can find appointments__MDY__HMMSS_PM/AM.html and contacts_MDY__HMMSS_PM/AM.html.

WP Logical will extract the following information (if found) regarding each appointment, starting from 01/01/CurrentYear at 00:00:00 to 31/12/CurrentYear at 00:00:00:
Subject
Location
Organizer
Invitees
Start time (UTC)
Original start time
Duration (in hours)
Sensitivity
Replay time
Is organized by user?
Is canceled?
More details

And the following information about each found contact:
Display name
First name
Middle name
Last name
Phones (types: personal, office, home, and numbers)
Important dates
Emails (types: personal, office, home, and numbers)
Websites
Job info
Addresses
Notes
Thumbnail

WP Logical also allows the extraction of some device-related information, such as the phone time zone, the device's friendly name, the Stock Keeping Unit (SKU), and so on. Windows Phone 8.1 is relatively strict regarding application deployment; WP Logical can be deployed in two ways:
Upload the compiled agent to the Windows Store and get it signed by Microsoft; after that, it will be available in the store for download.
Deploy the agent directly to a developer-unlocked device using the Windows Phone Application Deployment utility.

Summary
In this article, we looked at forensics for iOS and Android devices. We also looked at some low-level forensic techniques.

Resources for Article:
Further resources on this subject:
Mobile Forensics and Its Challenges [article]
Introduction to Mobile Forensics [article]
Forensics Recovery [article]

Bang Bang – Let's Make It Explode

Packt
23 May 2016
13 min read
In this article by Justin Plowman, author of the book 3D Game Design with Unreal Engine 4 and Blender, we pick up where we left off. We started with a basic level that, when it comes right down to it, is simply two rooms connected by a hallway with a simple crate. From humble beginnings our game has grown, as have our skills. Our simple cargo ship leads the player to a larger space station level. This level includes scripted events to move the story along and a game asset that looks great and animates. However, we are not done. How do we end our journey? We blow things up, that's how! In this article we will cover the following topics:

Using class blueprints to bring it all together
Creating an explosion using sound effects
Adding particle effects

(For more resources related to this topic, see here.)

Creating a class blueprint to tie it all together
We begin with the first step to any type of digital destruction: creation. We have created a disturbing piece of ancient technology. The Artifact stands as a long forgotten terror weapon of another age, somehow brought forth by an unknown power. But we know the truth. That unknown power is us, and we are about to import all that we need to implement the Artifact on the deck of our space station. Players beware! Take a look at the end result.

To get started we will need to import the Artifact body, the Tentacle, and all of the texture maps from Substance Painter. Let's start with exporting the main body of the Artifact. In Blender, open our file with the complete Artifact. The FBX file format will allow us to export both the completed 3D model and the animations we created, all in a single file. Select the Artifact only. Since it is now bound to the skeleton we created, the bones and the geometry should all be one piece. Now press Alt+S to reset the scale of our game asset. Doing this will make sure that we won't have any weird scaling problems when we import the Artifact into Unreal. Head to the File menu and select Export.
Choose FBX as our file format. On the first tab of the export menu, select the check box for Selected Objects. This will make sure that we get just the Artifact and not the Tentacle. On the Geometries tab, change the Smoothing option to Faces. Name your file and click Export!

Alright, we now have the latest version of the Artifact exported out as an FBX. Time to bring up Unreal. Open the game engine and load our space station level. It's been a while since we've taken a look at it, and there is no doubt in my mind that you've probably thought of improvements and new sections you would love to add. Don't forget them! Just set them aside for now. Once we get our game assets in there and make them explode, you will have plenty of time to add on.

Time to import into Unreal! Before we begin importing our pieces, let's create a folder to hold our custom assets. Click on the Content folder in the Content Browser and then right-click. At the top of the menu that appears, select New Folder and name it CustomAssets. It's very important not to use spaces or special characters (besides the underscore). Select our new folder and click Import. Select the Artifact FBX file. At the top of the Import menu, make sure Import as Skeletal and Import Mesh are selected. Now click the small arrow at the bottom of the section to open the advanced options. Lastly, turn on the check box to tell Unreal to use T0 As Reference Pose. A Reference Pose is the starting point for any animations associated with a skeletal mesh. Next, take a look at the Animation section of the menu. Turn on Import Animations to tell Unreal to bring in our open animation for the Artifact. Once all that is done, it's time to click Import! Unreal will create a Skeletal Mesh, an Animation, a Physics Asset, and a Skeleton asset for the Artifact.
Together, these pieces make up a fully functioning skeletal mesh that can be used within our game. Take a moment and repeat the process for the Tentacle, again being careful to export only selected objects from Blender.

Next, we need to import all of our texture maps from Substance Painter. Locate the export folder for Substance Painter. By default it is located in the Documents folder, under Substance Painter, and finally under Export. Head back into Unreal and then bring up the Export folder from your computer's task bar. Click and drag each of the texture maps we need into the Content Browser. Unreal will import them automatically. Time to set them all up as a usable material!

Right-click in the Content Browser and select Material from the Create Basic Asset section of the menu. Name the material Artifact_MAT. This will open a Material Editor window. Creating Materials and Shaders for video games is an art form all its own. Here I will talk about creating Materials in basic terms, but I would encourage you to check out the Unreal documentation, open up some of the existing materials in the Starter Content folder, and begin exploring this highly versatile tool.

So we need to add our textures to our new Material. An easy way to add texture maps to any Material is to click and drag them from the Content Browser into the Material Editor. This will create a node called a Texture Sample, which can plug into the different sockets on the main Material node. Now to plug in each map. Drag a wire from each of the white connections on the right side of each Texture Sample to its appropriate slot on the main Material node. The Metallic and Roughness texture sample will be plugged into two slots on the main node.

Let's preview the result. Back in the Content Browser, select the Artifact. Then, in the Preview Window of the Material Editor, select the small button on the far right that reads Set the Preview Mesh based on the current Content Browser selection.
The material has come out just a bit too shiny. The large amount of shine given off by the material is called the specular highlight, and it is controlled by the Specular connection on the main Material node. If we check the documentation, we can see that this part of the node accepts a value between 0 and 1. How might we set this? Well, the Material Editor has a Constant node that allows us to input a number and then plug it in wherever we may need it. This will work perfectly! Search for a Constant in the search box of the Palette, located on the right side of the Material Editor. Drag it into the area with your other nodes and head over to the Details panel. In the Value field, try different values between 0 and 1 and preview the result. I ended up using 0.1. Save your changes.

Time to try it out on our Artifact! Double-click on the Skeletal Mesh to open the Skeletal Mesh editor window. On the left-hand side, look for the LOD0 section of the menu. This section has an option to add a material (I have highlighted it in the image above). Head back to the Content Browser and select our Artifact_MAT material. Now select the small arrow in the LOD0 box to apply the selection to the Artifact. How does it look? Too shiny? Not shiny enough? Feel free to adjust the Constant node in the material until you are able to get the result you want. When you are happy, repeat the process for the Tentacle. You will import it as a static mesh (since it doesn't have any animations) and create a material for it from the texture maps you created in Substance Painter.

Now we will use a Class Blueprint for final assembly. Class Blueprints are a form of standalone Blueprint that allow us to combine art assets with programming in an easy and, most importantly, reusable package. For example, the player is a class blueprint, as it combines the player character skeletal mesh with blueprint code to help the player move around.
So how and when might we use class blueprints versus just putting the code in the level blueprint? The level blueprint is great for anything that is specific to just that level. Such things would include volcanoes on a lava-based level or spaceships in the background of our space station level. Class blueprints work great for building objects that are self-contained and repeatable, such as doors, enemies, or power-ups. These types of items would be used frequently and would have a place in several levels of a game.

Let's create a class blueprint for the Artifact. Click on the Blueprints button and select the New Empty Blueprints Tab. This will open the Pick Parent Class menu. Since we are creating a prop and not something that the player needs to control directly, select the Actor Parent Class. The next screen will ask us to name our new class and for a location to save it. I chose to save it in my CustomAssets folder and named it Artifact_Blueprint.

Welcome to the Class Blueprint Editor. Similar to other editor windows within Unreal, the Class Blueprint Editor has both a Details panel and a Palette. However, there is a panel that is new to us. The Components panel contains a list of the art that makes up a class blueprint. These components are the various pieces that make up the whole object. For our Artifact, this would include the main piece itself, any number of tentacles, and a collision box. Other components that can be added include particle emitters, audio, and even lights.

Let's add the Artifact. In the Components section, click the Add Component button and select Skeletal Mesh from the drop-down list. You can find it in the Common section. This adds a blank Skeletal Mesh to the Viewport and the Components list. With it selected, check out the Details panel. In the Mesh section is an area to assign the skeletal mesh you wish to use. Back in the Content Browser, select the Artifact.
Lastly, back in the Details panel of the Blueprint Editor, click the small arrow next to the Skeletal Mesh option to assign the Artifact. It should now appear in the viewport.

Back to the Components list. Let's add a Collision Box. Click Add Component and select Collision Box from the Collision section of the menu. Click it and, in the Details panel, increase the Box Extents to a size that would allow the player to enter within its bounds. I used 180 for x, y, and z. Repeat the last few steps and add the Tentacles to the Artifact using the Add Component menu. We will use the Static Mesh option. The design calls for 3, but add more if you like.

Time to give this class blueprint a bit of programming. We want the player to be able to walk up to the Artifact and press the E key to open it. Previously, we used a Gate to control the flow of information through the blueprint. However, Gates don't function the same within class blueprints, so we require a slightly different approach. The first step in the process is to use the Enable Input and Disable Input nodes to allow the player to use input keys when they are within our box collision. Using the search box located within our Palette, grab an Enable Input and a Disable Input.

Now we need to add our trigger events. Click on the Box variable within the Variables section of the My Blueprint panel. This changes the Details panel to display a list of all the Events that can be created for this component. Click the + button next to the OnComponentBeginOverlap and the OnComponentEndOverlap events. Connect the OnComponentBeginOverlap event to the Enable Input node and the OnComponentEndOverlap event to the Disable Input node.

Next, create an event for the player pressing the E key by searching for it and dragging it in from the Palette. To that, we will add a Do Once node. This node works similarly to a Gate in that it restricts the flow of information through the network, but it does allow the action to happen once before closing.
This will make it so the player can press E to open the Artifact, but the animation will only play once. Without it, a player can press E as many times as they want, playing the animation over and over again. It's fun for a while, since it makes it look like a mouth trying to eat you, but it's not our original intention (I might have spent some time pressing it repeatedly and laughing hysterically). Do Once can be easily found in the Palette.

Lastly, we will need a Play Animation node. There are two versions, so be sure to grab this node from the Skeletal Mesh section of your search so that its target is Skeletal Mesh Component. Connect the input E event to the Do Once node and the Do Once node to the Play Animation.

One last thing to complete this sequence. We need to set the target and the animation to play on the Play Animation node. The target will be our Skeletal Mesh component. Click on the Artifact component in the Components list, drag it into the Blueprint window, and plug it into the Target on our Play Animation. Lastly, click the drop-down under the New Anim to Play option on the Play Animation node and select our animation of the Artifact opening. We're done!

Let's save all of our files and test this out. Drag the Artifact into our space station and position it in the Importer/Export Broker's shop. Build the level and then drop in and test. Did it open? Does it need more tentacles? Debug and refine it until it is exactly what you want.

Summary
This article provided an overview of using class blueprints, creating an explosion, and adding particle effects.

Resources for Article:
Further resources on this subject:
Dynamic Graphics [article]
Lighting basics [article]
VR Build and Run [article]

Visualizations Using CCC

Packt
20 May 2016
28 min read
In this article by Miguel Gaspar, the author of the book Learning Pentaho CTools, you will learn about the Charts Component Library in detail. The Charts Components Library (CCC) is not really a Pentaho plugin, but rather a chart library that Webdetails created some years ago and that Pentaho started to use in the Analyzer visualizations. It allows a great level of customization by changing the properties that are applied to the charts, and it integrates perfectly with CDF, CDE, and CDA.

(For more resources related to this topic, see here.)

The dashboards that Webdetails creates make use of the CCC charts, usually with a great level of customization. Customizing them is a way to make them fancy and really good-looking and, even more importantly, a way to create the visualization that best fits the customer's/end user's needs. We really should be focused on having the best visualizations for the end user, and CCC is one of the best ways to achieve this; but to do this you need to have a very deep knowledge of the library and know how to get amazing results. I think I could write an entire book just about CCC, and in this article I will only be able to cover a small part of what I like, but I will try to focus on the basics and give you some tips and tricks that could make a difference. I'll be happy if I can give you some directions that you can follow, and then you can keep searching and learning about CCC.

An important part of CCC is understanding some properties, such as the series in rows or the crosstab mode, because that is where people usually struggle at the start. When you can't find a property to change some styling/functionality/behavior of the charts, you might find a way to extend the options by using something called extension points, so we will also cover them. I also find the interaction within the dashboard to be an important feature, so we will look at how to use it, and you will see that it's very simple.
In this article, you will learn how to:

Understand the properties needed to adapt the chart to your data source results
Use the properties of a CCC chart
Create a CCC chart by using the JavaScript library
Make use of internationalization of CCC charts
See how to handle clicks on charts
Scale the base axis
Customize the tooltips

Some background on CCC
CCC is built on top of Protovis, a JavaScript library that allows you to produce visualizations based on simple marks such as bars, dots, and lines, among others, which are created through dynamic properties based on the data to be represented. You can get more information on this at: http://mbostock.github.io/protovis/. If you want to extend the charts with some elements that are not available, you can, but it would be useful to have an idea about how Protovis works. CCC has a great website, available at http://www.webdetails.pt/ctools/ccc/, where you can see some samples including the source code. On the page, you can edit the code, change some properties, and click the apply button. If the code is valid, you will see your chart update. As well as that, it provides documentation for almost all of the properties and options that CCC makes available.

Making use of the CCC library in a CDF dashboard
As CCC is a chart library, you can use it as you would on any other webpage, like the samples on the CCC webpages. But CDF also provides components that you can use to place a CCC chart on a dashboard, fully integrated with the life cycle of the dashboard.
To use a CCC chart on a CDF dashboard, the HTML that is invoked from the XCDF file would look like the following (as we have already covered how to build a CDF dashboard, I will not focus on that, and will mainly focus on the JavaScript code):

<div class="row">
  <div class="col-xs-12">
    <div id="chart"></div>
  </div>
</div>
<script language="javascript" type="text/javascript">
  require(['cdf/Dashboard.Bootstrap', 'cdf/components/CccBarChartComponent'],
    function(Dashboard, CccBarChartComponent) {
      var dashboard = new Dashboard();
      var chart = new CccBarChartComponent({
        type: "cccBarChart",
        name: "cccChart",
        executeAtStart: true,
        htmlObject: "chart",
        chartDefinition: {
          height: 200,
          path: "/public/…/queries.cda",
          dataAccessId: "totalSalesQuery",
          crosstabMode: true,
          seriesInRows: false,
          timeSeries: false,
          plotFrameVisible: false,
          compatVersion: 2
        }
      });
      dashboard.addComponent(chart);
      dashboard.init();
  });
</script>

The most important thing here is the use of the CCC chart component; in this example it's a bar chart, as we can see both from the object that we are instantiating, CccBarChartComponent, and from the type, cccBarChart. The previous dashboard will execute the query specified as the dataAccessId of the CDA file set in the path property, and render the chart on the dashboard. We are also saying that its data comes from the query in crosstab mode, but that the base axis should not be a timeSeries. There are series in the columns, but don't worry about this, as we'll be covering it later. The existing CCC components that you are able to use out of the box inside CDF dashboards are listed in the following table. Don't forget that CCC has plenty of charts, so the sample images that you will see in the following table are just one example of the type of charts you can achieve.
The following list shows each CCC component, its chart type, and, where available, a link to a sample chart:

- CccAreaChartComponent (cccAreaChart)
- CccBarChartComponent (cccBarChart): http://www.webdetails.pt/ctools/ccc/#type=bar
- CccBoxplotChartComponent (cccBoxplotChart): http://www.webdetails.pt/ctools/ccc/#type=boxplot
- CccBulletChartComponent (cccBulletChart): http://www.webdetails.pt/ctools/ccc/#type=bullet
- CccDotChartComponent (cccDotChart): http://www.webdetails.pt/ctools/ccc/#type=dot
- CccHeatGridChartComponent (cccHeatGridChart): http://www.webdetails.pt/ctools/ccc/#type=heatgrid
- CccLineChartComponent (cccLineChart): http://www.webdetails.pt/ctools/ccc/#type=line
- CccMetricDotChartComponent (cccMetricDotChart): http://www.webdetails.pt/ctools/ccc/#type=metricdot
- CccMetricLineChartComponent (cccMetricLineChart)
- CccNormalizedBarChartComponent (cccNormalizedBarChart)
- CccParCoordChartComponent (cccParCoordChart)
- CccPieChartComponent (cccPieChart): http://www.webdetails.pt/ctools/ccc/#type=pie
- CccStackedAreaChartComponent (cccStackedAreaChart): http://www.webdetails.pt/ctools/ccc/#type=stackedarea
- CccStackedDotChartComponent (cccStackedDotChart)
- CccStackedLineChartComponent (cccStackedLineChart): http://www.webdetails.pt/ctools/ccc/#type=stackedline
- CccSunburstChartComponent (cccSunburstChart): http://www.webdetails.pt/ctools/ccc/#type=sunburst
- CccTreemapAreaChartComponent (cccTreemapAreaChart): http://www.webdetails.pt/ctools/ccc/#type=treemap
- CccWaterfallAreaChartComponent (cccWaterfallAreaChart): http://www.webdetails.pt/ctools/ccc/#type=waterfall

In the sample code, you will find a property called compatMode with a value of 2. This makes CCC work as a revamped version that delivers more options, a lot of improvements, and is easier to use.

Mandatory and desirable properties

Besides properties such as name, datasource, and htmlObject, some other chart properties are mandatory. The height is really important: if you don't set the height of the chart, the chart will not fit in the dashboard.
The height should be specified in pixels. If you don't set the width of the component then, to be more precise, the chart will grab the width of the element where it's being rendered: the HTML element whose name is specified in the htmlObject property. The seriesInRows, crosstabMode, and timeSeries properties are optional, but depending on the kind of chart you are generating, you might want to specify them. The use of these properties becomes clear once we can also see the output of the queries we are executing, so we need to dig deeper into the properties that map data to visual elements.

Mapping data

We need to be aware of the way that data mapping is done in the chart. You can understand how it works if you imagine the data input as a table. CCC can receive the data in two different structures: relational and crosstab. If CCC receives data as a crosstab, it will translate it to a relational structure, as you can see in the following examples.

Crosstab

The following table is an example of the crosstab data structure:

             Column Data 1     Column Data 2
Row Data 1   Measure Data 1.1  Measure Data 1.2
Row Data 2   Measure Data 2.1  Measure Data 2.2

Creating crosstab queries

To create a crosstab query, you can usually use grouping in SQL, or just use MDX, which allows us to easily specify a set for the columns and a set for the rows. Just by looking at the previous and following examples, you should be able to see that in the crosstab structure (the previous), the column and row headers are part of the result set, while in the relational format (the following), the column headers are not part of the result set, but are part of the metadata returned by the query.
The relational format is as follows:

Column          Row          Value
Column Data 1   Row Data 1   Measure Data 1.1
Column Data 2   Row Data 1   Measure Data 1.2
Column Data 1   Row Data 2   Measure Data 2.1
Column Data 2   Row Data 2   Measure Data 2.2

The preceding two data structures represent the options available when setting the crosstabMode and seriesInRows properties.

The crosstabMode property

To better understand these concepts, we will use a real example. The crosstabMode property is easier to understand when comparing two tables that represent the results of two queries.

Non-crosstab (relational):

Markets  Sales
APAC     1281705
EMEA     50028224
Japan    503957
NA       3852061

Crosstab:

Markets  2003   2004   2005
APAC     3529   5938   3411
EMEA     16711  23630  9237
Japan    2851   1692   380
NA       13348  18157  6447

In the first table, you can find the sales value for each of the territories. The values presented depend on only one variable: the territory. We can say that we get all the information just by looking at the rows, where there is a direct connection between a market and its sales value. In the second table, you will find a value for each territory/year pair, meaning that the values presented in the matrix depend on two variables: the territory in the rows and the year in the columns. Here we need both the rows and the columns to know what each value represents. Relevant information can be found in both the rows and the columns, so this is a crosstab. Crosstabs display the joint distribution of two or more variables, and are usually represented in the form of a contingency table in a matrix. When the result of a query depends on only one variable, you should set the crosstabMode property to false.
When it depends on two or more variables, you should set the crosstabMode property to true; otherwise, CCC will just use the first two columns, as in the non-crosstab example.

The seriesInRows property

Now let's use the same example where we have a crosstab. The previous image shows two charts: the one on the left is a crosstab with the series in the rows, and the one on the right is also a crosstab, but the series are not in the rows (the series are in the columns). When crosstabMode is set to true, the measure column titles can be translated as either series or categories, and that's determined by the seriesInRows property. If this property is set to true, the chart will read the series from the rows; otherwise, it will read the series from the columns. If crosstabMode is set to false, the community chart component expects each row to correspond to exactly one data point, with two or three columns returned. When three columns are returned, they can be category, series, and data, or series, category, and data, and that's again determined by the seriesInRows property: when set to true, CCC expects the structure to be category, series, and data; when set to false, it expects series, category, and data. A simple table should give you a quicker reference, so here goes:

crosstabMode  seriesInRows  Description
true          true          The column titles act as category values, while the series values are read from the first column of each row.
true          false         The column titles act as series values, while the category values are read from the first column of each row.
false         true          The three columns are interpreted as category, series, and data.
false         false         The three columns are interpreted as series, category, and data.
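One way to internalize the non-crosstab half of this table is a tiny runnable sketch. This is illustrative only; interpretRow is a hypothetical helper, not a CCC function, and it simply maps one three-column row according to the seriesInRows flag:

```javascript
// Illustrative only: interpret one three-column non-crosstab row as a
// (series, category, value) data point, depending on seriesInRows.
function interpretRow(row, seriesInRows) {
  return seriesInRows
    ? { category: row[0], series: row[1], value: row[2] }  // category, series, data
    : { series: row[0], category: row[1], value: row[2] }; // series, category, data
}

var point = interpretRow(['EMEA', '2004', 23630], false);
console.log(point.series);   // EMEA
console.log(point.category); // 2004
```

Flipping the flag on the same row would swap which column is read as the series and which as the category.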
The timeSeries and timeSeriesFormat properties

The timeSeries property defines whether the data to be represented by the chart is discrete or continuous. If we want to present values over time, the timeSeries property should be set to true. When we set the chart to be a time series, we also need to set another property that tells CCC how to interpret the dates coming from the query. Check out the following image for timeSeries and timeSeriesFormat: The result of one of the queries has the year and the abbreviated month name separated by a hyphen, like 2015-Nov. For the chart to understand this as a date, we need to specify the format by setting the timeSeriesFormat property, which in our example would be %Y-%b, where %Y is the four-digit year and %b is the abbreviated month name. The format should be specified using the Protovis format, which follows the same conventions as strftime in the C programming language, aside from some unsupported options. To find out what options are available, take a look at the documentation at: https://mbostock.github.io/protovis/jsdoc/symbols/pv.Format.date.html.

Making use of CCC in CDE

There are a lot of properties that use a default value; you can find out about them by looking at the documentation or by inspecting the code that CDE generates when you use the chart components. By looking at the console log of your browser, you should also be able to see which properties are being used by default and/or whether you are using a property that does not fit your needs. The use of CCC charts in CDE is simpler, just because you may not need to write code. I say may because, to achieve quicker results, you may still apply some code, which makes it easier to share properties among different charts or chart types.
To use a CCC chart, you just need to select the property that you want to change and set its value, using the dropdown or by typing the value. The previous image shows a group of properties with their respective values on the right side. One of the best ways to get used to the CCC properties is the CCC page available as part of the Webdetails site: http://www.webdetails.pt/ctools/ccc. There you will find samples and the properties being used for each of the charts. You can use the dropdown to select different kinds of charts from all those available in CCC. You also have the ability to change the properties and update the chart to check the result immediately. What I usually do, as it's easier and faster, is change the properties there, check the results, and then apply the necessary values for each of the properties to the CCC charts inside my dashboards. In the samples, you will also find documentation about the properties, which are separated by sections of the chart, followed by the extension points. On the site, when you click on a property/option, you will be redirected to another page where you will find its documentation and how to use it.

Changing properties in the preExecution or postFetch

We are able to change the properties of the charts, as with any other component. Inside the preExecution, this refers to the component itself, so we have access to the chart's main object, which we can manipulate to add, remove, and change options.
For instance, you can apply the following code:

function() {
  var cdProps = {
    dotsVisible: true,
    plotFrame_strokeStyle: '#bbbbbb',
    colors: ['#005CA7', '#FFC20F', '#333333', '#68AC2D']
  };
  $.extend(true, this.chartDefinition, cdProps);
}

What we are doing is creating an object with all the properties we want to add or change on the chart, and then extending chartDefinition (where the properties/options live) with it, using the jQuery extend function.

Use the CCC website and make your life easier

This way of applying options makes it easier to set the properties: just change or add the properties you need, test, and when you're happy with the result, copy them into the object that will extend/overwrite the chart options. Just keep in mind that the properties you change directly in the editor will be overwritten by the ones defined in the preExecution, if they match each other, of course. Why is this important? Because not all the properties that you can apply to CCC are exposed in CDE, so you can use the preExecution to set those properties.

Handling the click event

One important thing about the charts is that they allow interaction. CCC provides a way to handle some events on a chart, and click is one of those events. To make it work, we need to change two properties: clickable, which needs to be set to true, and clickAction, where we write a function with the code to be executed when a click happens. The function receives one argument, usually referred to as a scene. The scene is an object that carries a lot of information about the context where the event happened. From this object you have access to vars, another object where we can find the series and the categories where the click happened.
We can use the function to get the series/categories being clicked and perform a fireChange that can trigger updates on other components:

function(scene) {
  var series = "Series:" + scene.atoms.series.label;
  var category = "Category:" + scene.vars.category.label;
  var value = "Value:" + scene.vars.value.label;
  Logger.log(category + "&" + value);
  Logger.log(series);
}

The previous code example shows a function that handles the click action for a CCC chart. When the click happens, the code is executed: the clicked series is taken from scene.atoms.series.label, the clicked category from scene.vars.category.label, and the value at the intersection of that series/category from scene.vars.value.value. This is valid for a crosstab; you will not find the series when it's a non-crosstab. You can think of a scene as describing one instance of visual representation. It is generally local to each panel or section of the chart, and it's represented by a group of variables organized hierarchically. Depending on the scene, it may contain one or many datums. And you must be asking: what on earth is a datum? A datum represents a row, so it contains values for multiple columns. We can also see from the example that we refer to atoms, which hold at least a value, a label, and a key for a column. To get a better understanding, set a breakpoint anywhere in the code of the previous function and explore the scene object.
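Short of setting a live breakpoint, you can get a feel for the structure by reproducing the fields the handler reads in a mock object. The mock below is an assumption carrying only those fields; a real scene is far richer:

```javascript
// Mock of the scene object a clickAction receives for a crosstab chart.
// Only the fields read by the handler below are mocked here.
var scene = {
  atoms: { series: { label: 'EMEA' } },
  vars: {
    category: { label: '2004' },
    value: { label: '23630', value: 23630 }
  }
};

// The clickAction body from the example, extracted as a plain function
// so it can be exercised outside a dashboard.
function describeClick(scene) {
  return 'Series:' + scene.atoms.series.label +
         ' Category:' + scene.vars.category.label +
         ' Value:' + scene.vars.value.label;
}

console.log(describeClick(scene)); // Series:EMEA Category:2004 Value:23630
```

The same property paths are what you would see when inspecting a real scene at a breakpoint.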
In the previous example, you would be able to access the category, series label, and value, as you can see in the following table:

          Crosstab                                               Non-crosstab
Value     scene.vars.value.label or scene.getValue()             scene.vars.value.label or scene.getValue()
Category  scene.vars.category.label or scene.getCategoryLabel()  scene.vars.category.label or scene.getCategoryLabel()
Series    scene.atoms.series.label or scene.getSeriesLabel()     (not available)

For instance, if you add the previous function code to a chart that is a crosstab where the categories are the years and the series are the territories, clicking on the chart produces output like:

[info] WD: Category:2004 & Value:23630
[info] WD: Series:EMEA

This means that you clicked on the year 2004 for EMEA: EMEA sales for the year 2004 were 23,630. If you replace the Logger functions with fireChange as follows, you will be able to use the label/value of the clicked category to render other components and show some details about them:

this.dashboard.fireChange("parameter", scene.vars.category.label);

Internationalization of CCC charts

We already saw that the values coming from the database should not need to be translated. There are some ways in Pentaho to do this, but we may still need to set the title of a chart, and the title should also be internationalized. Another case is when you have dates where the month is represented by numbers on the base axis, but you want to display the month's abbreviated name. These names can also be translated to different languages, which is not hard.
For the title, subtitle, and legend, the way to do it is to use the instructions on setting properties in preExecution. First, you need to define the properties files for the internationalization and set the properties/translations:

var cd = this.chartDefinition;
cd.title = this.dashboard.i18nSupport.prop('BOTTOMCHART.TITLE');

To change the title of the chart based on the language defined, we need to use a function; we can't use the title property in the chart editor because it only accepts a string, so it doesn't let us use a JavaScript instruction to get the text. If you set the previous example code in the preExecution of the chart, you will be able to. It may also make sense to internationalize not only the titles but also, for instance, the month names. If you are getting data like 2004-02, this may correspond to a time series format of %Y-%m. If that's the case and you want to display the abbreviated month name, you can use the baseAxisTickFormatter and the dateFormat function from the dashboard utilities, also known as Utils. The code to write inside the preExecution would look like:

var cd = this.chartDefinition;
cd.baseAxisTickFormatter = function(label) {
  return Utils.dateFormat(moment(label, 'YYYY-MM'), 'MMM/YYYY');
};

The preceding code uses the baseAxisTickFormatter, which allows you to write a function that receives an argument, identified in the code as label, because it holds the label for each of the base axis ticks. We are using the dateFormat method and moment to format the date and return the abbreviated month name followed by the year. You can get information about the language currently defined by running moment.locale(), and you can change the language if you need to.

Format a base axis label based on the scale

When you are working with a time series chart, you may want to set a different format for the base axis labels.
Let's suppose you have a chart that listens to a time selector. If you select a year's worth of data to be displayed on the chart, you are certainly not interested in seeing minutes on the date labels; however, if you want to display the last hour, the ticks on the base axis need to be presented in minutes. There is an extension point we can use to get a conditional format based on the scale of the base axis: baseAxisScale_tickFormatter. It can be used as in the following code:

baseAxisScale_tickFormatter: function(value, dateTickPrecision) {
  switch(dateTickPrecision) {
    case pvc.time.intervals.y:
      return format_date_year_tick(value);
    case pvc.time.intervals.m:
      return format_date_month_tick(value);
    case pvc.time.intervals.H:
      return format_date_hour_tick(value);
    default:
      return format_date_default_tick(value);
  }
}

It accepts a function with two arguments, the value to be formatted and the tick precision, and should return the formatted label to be presented for each tick on the base axis. The previous code shows how the function is used: a switch that applies a different format, calling a separate function, depending on the base axis scale. The functions called in the code are not predefined; we need to write the functions or code that does the formatting ourselves. For example, such a function could use the Utils dateFormat function to return the formatted value to the chart.
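The pvc.time.intervals constants are only defined inside CCC, but you can prototype the branching logic with the numeric spans from the interval table that follows. Everything below (the function name and the chosen output formats) is an assumption for illustration, not CCC code:

```javascript
// Runnable sketch of interval-based tick formatting. Numeric spans in
// milliseconds stand in for pvc.time.intervals here.
var YEAR = 31536e6, MONTH = 2592e6, HOUR = 36e5;

function formatTickFor(date, tickPrecision) {
  if (tickPrecision >= YEAR)  { return String(date.getFullYear()); }
  if (tickPrecision >= MONTH) { return (date.getMonth() + 1) + '/' + date.getFullYear(); }
  if (tickPrecision >= HOUR)  { return date.getHours() + ':00'; }
  return date.toISOString(); // fallback, standing in for the default case
}

console.log(formatTickFor(new Date(2015, 10, 1), YEAR));  // 2015
console.log(formatTickFor(new Date(2015, 10, 1), MONTH)); // 11/2015
```

Inside a real chart you would keep the switch on pvc.time.intervals; this sketch only shows how coarser precisions should drop detail from the label.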
The following table shows the intervals that can be used when checking which time interval is being displayed on the chart:

Interval  Description   Number representing the interval
y         Year          31536e6
m         Month         2592e6
d30       30 days       2592e6
d7        7 days        6048e5
d         Day           864e5
H         Hour          36e5
m         Minute        6e4
s         Second        1e3
ms        Milliseconds  1

Customizing tooltips

CCC provides the ability to change the default tooltip format using the tooltipFormat property. We can change it to look like the following image, on the right side; you can compare it to the one on the left, which is the default. The default tooltip format may change depending on the chart type, but also on some options you apply to the chart, mainly crosstabMode and seriesInRows. The property accepts a function that receives one argument, the scene, which has a structure similar to the one already covered for the click event. You should return the HTML to be shown on the dashboard when the chart is hovered over. In the previous image, the chart on the left shows the default tooltip, and the one on the right a customized tooltip. That's because the following code was applied:

tooltipFormat: function(scene){
  var year = scene.atoms.series.label;
  var territory = scene.atoms.category.value;
  var sales = Utils.numberFormat(scene.vars.value.value, "#.00A");
  var html = '<html>' +
    '<div>Sales for ' + year + ' at ' + territory + ': ' + sales + '</div>' +
    '</html>';
  return html;
}

The code is pretty self-explanatory. First we set some variables, such as the year, territory, and sales value, which we need to present inside the tooltip. As in the click event, we get the labels/values from the scene, and the exact paths may depend on the properties we set for the chart. For the sales, we also abbreviate the value, using two decimal places. Last, we build the HTML to be displayed when we hover over the chart.
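Like the clickAction earlier, a tooltipFormat function can be exercised outside a chart by feeding it a mock scene. Again, only the fields the function reads are mocked, and Utils.numberFormat is replaced by a trivial stand-in, so this is an illustration of the flow rather than dashboard code:

```javascript
// Trivial stand-in for Utils.numberFormat -- real masks like "#.00A"
// abbreviate values; here we only illustrate the data flow.
function formatNumber(value) { return String(value); }

// Same shape as the tooltipFormat body above, minus the <html> wrapper.
function tooltipHtml(scene) {
  var year = scene.atoms.series.label;
  var territory = scene.atoms.category.value;
  var sales = formatNumber(scene.vars.value.value);
  return '<div>Sales for ' + year + ' at ' + territory + ': ' + sales + '</div>';
}

var mockScene = {
  atoms: { series: { label: '2004' }, category: { value: 'EMEA' } },
  vars: { value: { value: 23630 } }
};

console.log(tooltipHtml(mockScene));
// <div>Sales for 2004 at EMEA: 23630</div>
```

Testing the HTML builder in isolation like this makes it easy to iterate on the markup before wiring it into the chart.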
You can also change the base axis tooltip

As with the tooltip shown when hovering over the values represented in the chart, we can also customize the baseAxisTooltip; just don't forget that baseAxisTooltipVisible must be set to true (its default value). Getting the values to show is pretty similar. It can get more complex, though not much more, when we also want, for instance, to display the total value of sales for one year or for one territory and, based on that, present the percentage relative to the total. We should use the property as explained earlier. The previous image is one example of how we can customize a tooltip. In this case, we show the value, but also the percentage that the hovered territory represents (across all years) and the percentage for the hovered year (across all territories):

tooltipFormat: function(scene){
  var year = scene.getSeriesLabel();
  var territory = scene.getCategoryLabel();
  var value = scene.getValue();
  var sales = Utils.numberFormat(value, "#.00A");
  var totals = {};
  _.each(scene.chart().data._datums, function(element) {
    var v = element.atoms.value.value;
    totals[element.atoms.category.label] =
      (totals[element.atoms.category.label] || 0) + v;
    totals[element.atoms.series.label] =
      (totals[element.atoms.series.label] || 0) + v;
  });
  var categoryPerc = Utils.numberFormat(value/totals[territory], "0.0%");
  var seriesPerc = Utils.numberFormat(value/totals[year], "0.0%");
  var html = '<html>' +
    '<div class="value">' + sales + '</div>' +
    '<div class="dValue">Sales for ' + territory + ' in ' + year + '</div>' +
    '<div class="bar">' +
      '<div class="pPerc">' + categoryPerc + ' of ' + territory + '</div>' +
      '<div class="partialBar" style="width:' + categoryPerc + '"></div>' +
    '</div>' +
    '<div class="bar">' +
      '<div class="pPerc">' + seriesPerc + ' of ' + year + '</div>' +
      '<div class="partialBar" style="width:' + seriesPerc + '"></div>' +
    '</div>' +
    '</html>';
  return html;
}

The first
lines of the code are pretty similar, except that we are using scene.getSeriesLabel() in place of scene.atoms.series.label. They do the same thing; they are just different ways to get the values/labels. Then the totals are calculated by iterating over all the elements of scene.chart().data._datums, which returns the logical/relational table: a combination of territory, year, and value. The last part just builds the HTML with all the values and labels we already got from the scene. There are multiple ways to get the values you need; to customize the tooltip, you just need to explore the hierarchical structure of the scene and get used to it. The image you saw also presents a different style, and that is done using CSS: you can add CSS to your dashboard and change not just the format of the tooltip, but also its style.

Styling tooltips

When we want to style a tooltip, we may want to use the browser's developer tools to check the classes, names, and CSS properties already applied, but that's hard because the popup does not stay still. We can change the tooltipDelayOut property and increase its default value from 80 to 1000 or more, depending on the time you need. When you want to apply styles to the tooltips of a particular chart, you can do so by setting a CSS class on the tooltip. For that, use the tooltipClassName property and set the class name to be added and later used in the CSS.

Summary

In this article, we provided a quick overview of how to use CCC in CDF and CDE dashboards and showed you what kinds of charts are available. We covered some of the base options, as well as some advanced options that you might use to get a more customized visualization.

Packt
20 May 2016
19 min read

Virtualizing Hosts and Applications

In this article by Jay LaCroix, the author of the book Mastering Ubuntu Server, you will learn about virtualizing hosts and applications. There have been a great number of advancements in the IT space in the last several decades, and a few technologies have come along that have truly revolutionized the technology industry. Few would argue that the Internet itself is by far the most revolutionary technology to come around, but another technology that has created a paradigm shift in IT is virtualization. It changed the way we maintain our data centers, allowing us to segregate workloads into many smaller machines run from a single server or hypervisor. Since Ubuntu features the latest advancements of the Linux kernel, virtualization is built right in. After installing just a few packages, we can create virtual machines on our Ubuntu Server installation without the need for a pricey license agreement or support contract. In this article, Jay will walk you through creating, running, and managing Docker containers. (For more resources related to this topic, see here.)

Creating, running, and managing Docker containers

Docker is a technology that seemed to come from nowhere and took the IT world by storm just a few years ago. The concept of containerization is not new, but Docker took this concept and made it very popular. The idea behind a container is that you can segregate an application you'd like to run from the rest of your system, keeping it sandboxed from the host operating system, while still being able to use the host's CPU and memory resources. Unlike a virtual machine, a container doesn't have a virtual CPU and memory of its own, as it shares resources with the host. This means that you will likely be able to run more containers on a server than virtual machines, since resource utilization is lower. In addition, you can store a container on a server and allow others within your organization to download a copy of it and run it locally.
This is very useful for developers who are developing a new solution and would like others to test or run it. Since a Docker container contains everything the application needs to run, it's very unlikely that a systematic difference between one machine and another will cause the application to behave differently. The Docker server, also known as Hub, can be used remotely or locally. Normally, you'd pull down a container from the central Docker Hub instance, which makes various containers available, usually based on a Linux distribution or operating system. When you download one locally, you'll be able to install packages within the container or make changes to its files, just as if it were a virtual machine. When you finish setting up your application within the container, you can upload it back to Docker Hub for others to benefit from, or to your own local Hub instance for your local staff members to use. In some cases, some developers even opt to make their software available to others in the form of containers rather than creating distribution-specific packages. Perhaps they find it easier to develop a container that can be used on every distribution than to build separate packages for individual distributions. Let's go ahead and get started. To set up your server to run or manage Docker containers, simply install the docker.io package:

# apt-get install docker.io

Yes, that's all there is to it. Installing Docker has definitely been the easiest thing we've done during this entire article. Ubuntu includes Docker in its default repositories, so it's only a matter of installing this one package. You'll now have a new service running on your machine, simply titled docker. You can inspect it with the systemctl command, as you would any other:

# systemctl status docker

Now that Docker is installed and running, let's take it for a test drive. Having Docker installed gives us the docker command, which has various subcommands to perform different functions.
Let's try out docker search:

# docker search ubuntu

What we're doing with this command is searching Docker Hub for available containers based on Ubuntu. You could search for containers based on other distributions, such as Fedora or CentOS, if you wanted. The command will return a list of Docker images available that meet your search criteria. The search command was run as root. This is required, unless you make your own user account a member of the docker group. I recommend you do that, and then log out and log in again; that way, you won't need to use root anymore. From this point on, I won't suggest using root for the remaining Docker examples. It's up to you whether you want to set up your user account with the docker group or continue to run docker commands as root. To pull down a Docker image for our use, we can use the docker pull command, along with one of the image names we saw in the output of our search command:

docker pull ubuntu

With this command, we're pulling down the latest Ubuntu container image available on Docker Hub. The image will now be stored locally, and we'll be able to create new containers from it. To create a new container from our downloaded image, this command will do the trick:

docker run -it ubuntu:latest /bin/bash

Once you run this command, you'll notice that your shell prompt immediately changes. You're now within a shell prompt from your container. From here, you can run commands you would normally run within a real Ubuntu machine, such as installing new packages, changing configuration files, and so on. Go ahead and play around with the container, and then we'll continue with a bit more theory on how it actually works. There are some potentially confusing aspects of Docker we should get out of the way first before we continue with additional examples. The most likely thing to confuse newcomers to Docker is how containers are created and destroyed.
When you execute the docker run command against an image you've downloaded, you're actually creating a container. Each time you use the docker run command, you're not resuming the last container, but creating a new one. To see this in action, run a container with the docker run command provided earlier, and then type exit. Run it again, and then type exit again. You'll notice that the prompt is different each time you run the command. After the root@ portion of the Bash prompt within the container is a portion of a container ID. It'll be different each time you execute the docker run command, since you're creating a new container with a new ID each time. To see the number of containers on your server, execute the docker info command. The first line of the output will tell you how many containers you have on your system, which should be the number of times you've run the docker run command. To see a list of all of these containers, execute:

docker ps -a

The output will give you the container ID of each container, the image it was created from, the command being run, when the container was created, its status, and any ports you may have forwarded. The output will also display a randomly generated name for each container, and these names are usually quite wacky. As I was going through the process of creating containers while writing this section, the names for my containers were tender_cori, serene_mcnulty, and high_goldwasser. This is just one of the many quirks of Docker, and some of these can be quite hilarious. The important output of the docker ps -a command is the container ID, the command, and the status. The ID allows you to reference a specific container, and the command lets you know what command was run. In our example, we executed /bin/bash when we started our containers. Using the ID, we can resume a container: simply execute the docker start command with the container ID right after.
Your command will end up looking similar to the following:

docker start 353c6fe0be4d

The output will simply return the ID of the container and then drop you back to your shell prompt. Not the shell prompt of your container, but that of your server. You might be wondering at this point, then, how you get back to the shell prompt for the container. We can use docker attach for that:

docker attach 353c6fe0be4d

You should now be within a shell prompt inside your container. If you remember from earlier, when you type exit to disconnect from your container, the container stops. If you'd like to exit the container without stopping it, press CTRL + P and then CTRL + Q on your keyboard. You'll return to your main shell prompt, but the container will still be running. You can see this for yourself by checking the status of your containers with the docker ps -a command. However, while these keyboard shortcuts work to get you out of the container, it's important to understand what a container is and what it isn't. A container is not a service running in the background, at least not inherently. A container is a collection of namespaces, such as a namespace for its filesystem or users. When you disconnect without a process running within the container, there's no reason for it to run, since its namespace is empty. Thus, it stops. If you'd like to run a container in a way that is similar to a service (it keeps running in the background), you would want to run the container in detached mode. Basically, this is a way of telling your container, "run this process, and don't stop running it until I tell you to." Here's an example of creating a container and running it in detached mode:

docker run -dit ubuntu /bin/bash

Normally, we use the -it options to create a container. This is what we used a few pages back. The -i option triggers interactive mode, while the -t option gives us a pseudo-TTY. At the end of the command, we tell the container to run the Bash shell.
The -d option runs the container in the background. It may seem relatively useless to have another Bash shell running in the background that isn't actually performing a task. But these are just simple examples to help you get the hang of Docker. A more common use case may be to run a specific application. In fact, you can even run a website from a Docker container by installing and configuring Apache within the container, including a virtual host. The question then becomes this: how do you access the container's instance of Apache within a web browser? The answer is port redirection, which Docker also supports. Let's give this a try. First, let's create a new container in detached mode. Let's also redirect port 80 within the container to port 8080 on the host:

docker run -dit -p 8080:80 ubuntu /bin/bash

The command will output a container ID. This ID will be much longer than you're accustomed to seeing, because when we run docker ps -a, it only shows shortened container IDs. You don't need to use the entire container ID when you attach; you can simply use part of it, so long as it's long enough to be different from other IDs, like this:

docker attach dfb3e

Here, I've attached to a container with an ID that begins with dfb3e. I'm now attached to a Bash shell within the container. Let's install Apache. We've done this before, but to keep it simple, just install the apache2 package within your container; we don't need to worry about configuring the default sample web page or making it look nice. We just want to verify that it works. Apache should now be installed within the container. In my tests, the apache2 daemon wasn't automatically started as it would've been on a real server instance.
Since the latest container available on Docker Hub for Ubuntu hasn't yet been upgraded to 16.04 at the time of writing this (it's currently 14.04), the systemctl command won't work, so we'll need to use the legacy start command for Apache: # /etc/init.d/apache2 start We can similarly check the status, to make sure it's running: # /etc/init.d/apache2 status Apache should be running within the container. Now, press CTRL + P and then CTRL + Q to exit the container, but allow it to keep running in the background. You should be able to visit the sample Apache web page for the container by navigating to localhost:8080 in your web browser. You should see the default "It works!" page that comes with Apache. Congratulations, you're officially running an application within a container! Before we continue, think for a moment of all the use cases you can use Docker for. It may seem like a very simple concept (and it is), but it allows you to do some very powerful things. I'll give you a personal example. At a previous job, I worked with some embedded Linux software engineers, who each had their preferred Linux distribution to run on their workstation computers. Some preferred Ubuntu, others preferred Debian, and a few even ran Gentoo. For developers, this poses a problem—the build tools are different in each distribution, because they all ship different versions of all development packages. The application they developed was only known to compile in Debian, and newer versions of the GNU Compiler Collection (GCC) compiler posed a problem for the application. My solution was to provide each developer a Docker container based on Debian, with all the build tools baked in that they needed to perform their job. At this point, it no longer mattered which distribution they ran on their workstations. The container was the same no matter what they were running. I'm sure there are some clever use cases you can come up with. 
Anyway, back to our Apache container: it's now running happily in the background, responding to HTTP requests over port 8080 on the host. But, what should we do with it at this point? One thing we can do is create our own image from it. Before we do, we should configure Apache to automatically start when the container is started. We'll do this a bit differently inside the container than we would on an actual Ubuntu server. Attach to the container, and open the /etc/bash.bashrc file in a text editor within the container. Add the following to the very end of the file: /etc/init.d/apache2 start Save the file, and exit your editor. Exit the container with the CTRL + P and CTRL + Q key combinations. We can now create a new image of the container with the docker commit command: docker commit <Container ID> ubuntu:apache-server This command will return to us the ID of our new image. To view all the Docker images available on our machine, we can run the docker images command to have Docker return a list. You should see the original Ubuntu image we downloaded, along with the one we just created. We'll first see a column for the repository the image came from. In our case, it's Ubuntu. Next, we can see the tag. Our original Ubuntu image (the one we used docker pull command to download) has a tag of latest. We didn't specify that when we first downloaded it, it just defaulted to latest. In addition, we see an image ID for both, as well as the size. To create a new container from our new image, we just need to use docker run but specify the tag and name of our new image. Note that we may already have a container listening on port 8080, so this command may fail if that container hasn't been stopped: docker run -dit -p 8080:80 ubuntu:apache-server /bin/bash Speaking of stopping a container, I should probably show you how to do that as well. As you could probably guess, the command is docker stop followed by a container ID. 
This will send the SIGTERM signal to the container, followed by SIGKILL if it doesn't stop on its own after a delay:

docker stop <Container ID>

To remove a container, issue the docker rm command followed by a container ID. Normally, this will not remove a running container, but it will if you add the -f option. You can remove more than one docker container at a time by adding additional container IDs to the command, with a space separating each. Keep in mind that you'll lose any unsaved changes within your container if you haven't committed the container to an image yet:

docker rm <Container ID>

The docker rm command will not remove images. If you want to remove a docker image, use the docker rmi command followed by an image ID. You can run the docker images command to view images stored on your server, so you can easily fetch the ID of the image you want to remove. You can also use the repository and tag name, such as ubuntu:apache-server, instead of the image ID. If the image is in use, you can force its removal with the -f option:

docker rmi <Image ID>

Before we conclude our look into Docker, there's another related concept you'll definitely want to check out: Dockerfiles. A Dockerfile is a neat way of automating the building of docker images, by creating a text file with a set of instructions for their creation. The easiest way to set up a Dockerfile is to create a directory, preferably with a descriptive name for the image you'd like to create (you can name it whatever you wish, though) and inside it create a file named Dockerfile.
Following is a sample—copy this text into your Dockerfile and we'll look at how it works:

FROM ubuntu
MAINTAINER Jay <jay@somewhere.net>

# Update the container's packages
RUN apt-get update; apt-get dist-upgrade -y

# Install apache2 and vim
RUN apt-get install -y apache2 vim

# Make Apache automatically start up
RUN echo "/etc/init.d/apache2 start" >> /etc/bash.bashrc

Let's go through this Dockerfile line by line to get a better understanding of what it's doing:

FROM ubuntu

We need an image to base our new image on, so we're using Ubuntu as a base. This will cause Docker to download the ubuntu:latest image from Docker Hub if we don't already have it downloaded:

MAINTAINER Jay <jay@somewhere.net>

Here, we're setting the maintainer of the image. Basically, we're declaring its author:

# Update the container's packages

Lines beginning with a hash symbol (#) are ignored, so we are able to create comments within the Dockerfile. This is recommended to give others a good idea of what your Dockerfile does:

RUN apt-get update; apt-get dist-upgrade -y

With the RUN command, we're telling Docker to run a specific command while the image is being created. In this case, we're updating the image's repository index and performing a full package update to ensure the resulting image is as fresh as can be. The -y option is provided to suppress any requests for confirmation while the command runs:

RUN apt-get install -y apache2 vim

Next, we're installing both apache2 and vim. The vim package isn't required, but I personally like to make sure all of my servers and containers have it installed. I mainly included it here to show you that you can install multiple packages in one line:

RUN echo "/etc/init.d/apache2 start" >> /etc/bash.bashrc

Earlier, we copied the startup command for the apache2 daemon into the /etc/bash.bashrc file. We're including that here so that we won't have to do this ourselves when containers are created from the image.
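The /etc/bash.bashrc trick used in that last RUN instruction works because every interactive (non-login) Bash shell sources that file on startup. You can try the mechanism outside of Docker, with a throwaway rc file standing in for /etc/bash.bashrc and a plain echo standing in for the Apache init script:

```shell
# Stand-in for /etc/bash.bashrc: a temporary rc file.
rcfile=$(mktemp)

# Append a command to it, just like the RUN echo line in the Dockerfile
# appends the apache2 start command.
echo 'echo "apache2 would start here"' >> "$rcfile"

# Every interactive shell that sources the file runs the command on startup.
bash --rcfile "$rcfile" -i -c ':'    # prints: apache2 would start here

rm -f "$rcfile"
```

This is why starting a shell in a container built from the image is enough to bring Apache up: the shell sources the file, and the file runs the start command.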
To build the image, we can use the docker build command, which can be executed from within the directory that contains the Dockerfile. What follows is an example of using the docker build command to create an image tagged packt:apache-server; the trailing dot tells docker build to use the current directory as its build context:

docker build -t packt:apache-server .

Once you run this command, you'll see Docker create the image for you, running each of the commands you asked it to. The image will be set up just the way you like. Basically, we just automated the entire creation of the Apache container we used as an example in this section. Once this is complete, we can create a container from our new image:

docker run -dit -p 8080:80 packt:apache-server /bin/bash

Almost immediately after running the container, the sample Apache site will be available on the host. With a Dockerfile, you'll be able to automate the creation of your Docker images. There's much more you can do with Dockerfiles though; feel free to peruse Docker's official documentation to learn more.

Summary

In this article, we took a look at virtualization as well as containerization. We began by walking through the installation of KVM as well as all the configuration required to get our virtualization server up and running. We also took a look at Docker, which is a great way of virtualizing individual applications rather than entire servers. We installed Docker on our server, and we walked through managing containers by pulling down an image from Docker Hub, customizing our own images, and creating Dockerfiles to automate the deployment of Docker images. We also went over many of the popular Docker commands to manage our containers.

Resources for Article:

Further resources on this subject:
Configuring and Administering the Ubuntu Server [article]
Network Based Ubuntu Installations [article]
Making the most of Ubuntu through Windows Proxies [article]
Gradle with the Java Plugin

Packt
19 May 2016
18 min read
In this article by Hubert Klein Ikkink, author of the book Gradle Effective Implementations Guide, Second Edition, we will discuss how the Java plugin provides a lot of useful tasks and properties that we can use for building a Java application or library. If we follow the convention-over-configuration support of the plugin, we don't have to write a lot of code in our Gradle build file to use it. If we want to, we can still add extra configuration options to override the default conventions defined by the plugin. (For more resources related to this topic, see here.) Let's start with a new build file and use the Java plugin. We only have to apply the plugin for our build:

apply plugin: 'java'

That's it! Just by adding this simple line, we now have a lot of tasks that we can use to work with in our Java project. To see the tasks that have been added by the plugin, we run the tasks command on the command line and look at the output:

$ gradle tasks
:tasks
------------------------------------------------------------
All tasks runnable from root project
------------------------------------------------------------
Build tasks
-----------
assemble - Assembles the outputs of this project.
build - Assembles and tests this project.
buildDependents - Assembles and tests this project and all projects that depend on it.
buildNeeded - Assembles and tests this project and all projects it depends on.
classes - Assembles main classes.
clean - Deletes the build directory.
jar - Assembles a jar archive containing the main classes.
testClasses - Assembles test classes.
Build Setup tasks
-----------------
init - Initializes a new Gradle build. [incubating]
wrapper - Generates Gradle wrapper files. [incubating]
Documentation tasks
-------------------
javadoc - Generates Javadoc API documentation for the main source code.
Help tasks
----------
components - Displays the components produced by root project 'getting_started'.
[incubating] dependencies - Displays all dependencies declared in root project 'getting_started'. dependencyInsight - Displays the insight into a specific dependency in root project 'getting_started'. help - Displays a help message. model - Displays the configuration model of root project 'getting_started'. [incubating] projects - Displays the sub-projects of root project 'getting_started'. properties - Displays the properties of root project 'getting_started'. tasks - Displays the tasks runnable from root project 'getting_started'. Verification tasks ------------------ check - Runs all checks. test - Runs the unit tests. Rules ----- Pattern: clean<TaskName>: Cleans the output files of a task. Pattern: build<ConfigurationName>: Assembles the artifacts of a configuration. Pattern: upload<ConfigurationName>: Assembles and uploads the artifacts belonging to a configuration. To see all tasks and more detail, run gradle tasks --all To see more detail about a task, run gradle help --task <task> BUILD SUCCESSFUL Total time: 0.849 secs If we look at the list of tasks, we can see the number of tasks that are now available to us, which we didn't have before; all this is done just by adding a simple line to our build file. We have several task groups with their own individual tasks, which can be used. We have tasks related to building source code and packaging in the Build tasks section. The javadoc task is used to generate Javadoc documentation, and is in the Documentation tasks section. The tasks for running tests and checking code quality are in the Verification tasks section. Finally, we have several rule-based tasks to build, upload, and clean artifacts or tasks in our Java project. The tasks added by the Java plugin are the visible part of the newly added functionality to our project. However, the plugin also adds the so-called convention object to our project. A convention object has several properties and methods, which are used by the tasks of the plugin. 
These properties and methods are added to our project and can be accessed like normal project properties and methods. So, with the convention object, we can not only look at the properties used by the tasks in the plugin, but we can also change the value of the properties to reconfigure certain tasks.

Using the Java plugin

To work with the Java plugin, we are first going to create a very simple Java source file. We can then use the plugin's tasks to build the source file. You can make this application as complex as you want, but in order to stay on topic, we will make this as simple as possible. By applying the Java plugin, we must now follow some conventions for our project directory structure. To build the source code, our Java source files must be in the src/main/java directory, relative to the project directory. If we have non-Java source files that need to be included in the JAR file, we must place them in the src/main/resources directory. Our test source files need to be in the src/test/java directory and any non-Java source files required for testing can be placed in src/test/resources. These conventions can be changed if we want or need it, but it is a good idea to stick with them so that we don't have to write any extra code in our build file, which could lead to errors. Our sample Java project that we will write is a Java class that uses an external property file to receive a welcome message. The source file with the name Sample.java is located in the src/main/java directory, as follows:

// File: src/main/java/gradle/sample/Sample.java
package gradle.sample;

import java.util.ResourceBundle;

/**
 * Read welcome message from external properties file
 * <code>messages.properties</code>.
 */
public class Sample {

    public Sample() {
    }

    /**
     * Get <code>messages.properties</code> file
     * and read the value for <em>welcome</em> key.
     *
     * @return Value for <em>welcome</em> key
     * from <code>messages.properties</code>
     */
    public String getWelcomeMessage() {
        final ResourceBundle resourceBundle =
            ResourceBundle.getBundle("messages");
        final String message =
            resourceBundle.getString("welcome");
        return message;
    }
}

In the code, we use ResourceBundle.getBundle() to read our welcome message. The welcome message itself is defined in a properties file with the name messages.properties, which will go in the src/main/resources directory:

# File: src/main/resources/gradle/sample/messages.properties
welcome = Welcome to Gradle!

To compile the Java source file and process the properties file, we run the classes task.
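ResourceBundle.getBundle() looks the bundle up on the runtime classpath, which is why the classes task has to process the resources into the build output before the message can be read. The parsing behavior itself can be sketched in isolation with PropertyResourceBundle, fed directly from a string (the BundleDemo class and the inline string are illustrative, not part of the book's project):

```java
import java.io.StringReader;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class BundleDemo {
    public static void main(String[] args) throws Exception {
        // The same key/value pair that messages.properties contains;
        // getBundle("messages") would load it from the classpath, while
        // PropertyResourceBundle reads it straight from a Reader.
        ResourceBundle bundle = new PropertyResourceBundle(
                new StringReader("welcome = Welcome to Gradle!"));
        System.out.println(bundle.getString("welcome"));  // prints: Welcome to Gradle!
    }
}
```

Note that the properties parser trims the whitespace around the = separator, so the value starts at "Welcome".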
Note that the classes task has been added by the Java plugin. This is the so-called life cycle task in Gradle. The classes task is actually dependent on two other tasks—compileJava and processResources. We can see this task dependency when we run the tasks command with the --all command-line option: $ gradle tasks --all ... classes - Assembles main classes. compileJava - Compiles main Java source. processResources - Processes main resources. ... Let's run the classes task from the command line: $ gradle classes :compileJava :processResources :classes BUILD SUCCESSFUL Total time: 1.08 secs Here, we can see that compileJava and processResources tasks are executed because the classes task depends on these tasks. The compiled class file and properties file are now in the build/classes/main and build/resources/main directories. The build directory is the default directory that Gradle uses to build output files. If we execute the classes task again, we will notice that the tasks support the incremental build feature of Gradle. As we haven't changed the Java source file or the properties file, and the output is still present, all the tasks can be skipped as they are up-to-date: $ gradle classes :compileJava UP-TO-DATE :processResources UP-TO-DATE :classes UP-TO-DATE BUILD SUCCESSFUL Total time: 0.595 secs To package our class file and properties file, we invoke the jar task. This task is also added by the Java plugin and depends on the classes task. This means that if we run the jar task, the classes task is also executed. Let's try and run the jar task, as follows: $ gradle jar :compileJava UP-TO-DATE :processResources UP-TO-DATE :classes UP-TO-DATE :jar BUILD SUCCESSFUL Total time: 0.585 secs The default name of the resulting JAR file is the name of our project. So if our project is called sample, then the JAR file is called sample.jar. We can find the file in the build/libs directory. 
If we look at the contents of the JAR file, we see our compiled class file and the messages.properties file. Also, a manifest file is added automatically by the jar task: $ jar tvf build/libs/sample.jar 0 Wed Oct 21 15:29:36 CEST 2015 META-INF/ 25 Wed Oct 21 15:29:36 CEST 2015 META-INF/MANIFEST.MF 0 Wed Oct 21 15:26:58 CEST 2015 gradle/ 0 Wed Oct 21 15:26:58 CEST 2015 gradle/sample/ 685 Wed Oct 21 15:26:58 CEST 2015 gradle/sample/Sample.class 90 Wed Oct 21 15:26:58 CEST 2015 gradle/sample/messages.properties We can also execute the assemble task to create the JAR file. The assemble task, another life cycle task, is dependent on the jar task and can be extended by other plugins. We could also add dependencies on other tasks that create packages for a project other than the JAR file, such as a WAR file or ZIP archive file: $ gradle assemble :compileJava UP-TO-DATE :processResources UP-TO-DATE :classes UP-TO-DATE :jar UP-TO-DATE :assemble UP-TO-DATE BUILD SUCCESSFUL Total time: 0.607 secs To start again and clean all the generated output from the previous tasks, we can use the clean task. This task deletes the project build directory and all the generated files in this directory. So, if we execute the clean task from the command line, Gradle will delete the build directory: $ gradle clean :clean BUILD SUCCESSFUL Total time: 0.583 secs Note that the Java plugin also added some rule-based tasks. One of them was clean<TaskName>. We can use this task to remove the output files of a specific task. The clean task deletes the complete build directory; but with clean<TaskName>, we only delete the files and directories created by the named task. For example, to clean the generated Java class files of the compileJava task, we execute the cleanCompileJava task. As this is a rule-based task, Gradle will determine that everything after clean must be a valid task in our project. 
The files and directories created by this task are then determined by Gradle and deleted:

$ gradle cleanCompileJava
:cleanCompileJava UP-TO-DATE
BUILD SUCCESSFUL
Total time: 0.578 secs

Working with source sets

The Java plugin also adds a new concept to our project—source sets. A source set is a collection of source files that are compiled and executed together. The files can be Java source files or resource files. Source sets can be used to group files together with a certain meaning in our project, without having to create a separate project. For example, we can separate the location of source files that describe the API of our Java project in a source set, and run tasks that only apply to the files in this source set. Without any configuration, we already have the main and test source sets, which are added by the Java plugin. For each source set, the plugin also adds the following three tasks: compile<SourceSet>Java, process<SourceSet>Resources, and <SourceSet>Classes. When the source set is named main, we don't have to provide the source set name when we execute a task. For example, compileJava applies to the main source set, but compileTestJava applies to the test source set. Each source set also has some properties to access the directories and files that make up the source set. The following list shows the properties that we can access in a source set:

java (org.gradle.api.file.SourceDirectorySet): These are the Java source files for this project. Only files with the .java extension are in this collection.
allJava (SourceDirectorySet): By default, this is the same as the java property, so it contains all the Java source files. Other plugins can add extra source files to this collection.
resources (SourceDirectorySet): These are all the resource files for this source set. This contains all the files in the resources source directory, excluding any files with the .java extension.
allSource (SourceDirectorySet): By default, this is the combination of the resources and java properties. This includes all the source files of this source set, both resource and Java source files.
output (SourceSetOutput): These are the output files for the source files in the source set. This contains the compiled classes and processed resources.
java.srcDirs (Set<File>): These are the directories with Java source files.
resources.srcDirs (Set<File>): These are the directories with the resource files for this source set.
output.classesDir (File): This is the output directory with the compiled class files for the Java source files in this source set.
output.resourcesDir (File): This is the output directory with the processed resource files from the resources in this source set.
name (String): This is the read-only value with the name of the source set.

We can access these properties via the sourceSets property of our project. In the following example, we will create a new task to display values for several properties:

apply plugin: 'java'

task sourceSetJavaProperties << {
    sourceSets {
        main {
            println "java.srcDirs = ${java.srcDirs}"
            println "resources.srcDirs = ${resources.srcDirs}"
            println "java.files = ${java.files.name}"
            println "allJava.files = ${allJava.files.name}"
            println "resources.files = ${resources.files.name}"
            println "allSource.files = ${allSource.files.name}"
            println "output.classesDir = ${output.classesDir}"
            println "output.resourcesDir = ${output.resourcesDir}"
            println "output.files = ${output.files}"
        }
    }
}

When we run the sourceSetJavaProperties task, we get the following output:

$ gradle sourceSetJavaProperties
:sourceSetJavaProperties
java.srcDirs = [/gradle-book/Chapter4/Code_Files/sourcesets/src/main/java]
resources.srcDirs = [/gradle-book/Chapter4/Code_Files/sourcesets/src/main/resources]
java.files = [Sample.java]
allJava.files = [Sample.java]
resources.files = [messages.properties]
allSource.files = [messages.properties, Sample.java]
output.classesDir =
/gradle-book/Chapter4/Code_Files/sourcesets/build/classes/main
output.resourcesDir = /gradle-book/Chapter4/Code_Files/sourcesets/build/resources/main
output.files = [/gradle-book/Chapter4/Code_Files/sourcesets/build/classes/main, /gradle-book/Chapter4/Code_Files/sourcesets/build/resources/main]
BUILD SUCCESSFUL
Total time: 0.594 secs

Creating a new source set

We can create our own source set in a project. A source set contains all the source files that are related to each other. In our example, we will add a new source set to include a Java interface. Our Sample class will then implement the interface; however, as we use a separate source set, we can use this later to create a separate JAR file with only the compiled interface class. We will name the source set api as the interface is actually the API of our example project, which we can share with other projects. To define this source set, we only have to put the name in the sourceSets property of the project, as follows:

apply plugin: 'java'

sourceSets {
    api
}

Gradle will create three new tasks based on this source set—apiClasses, compileApiJava, and processApiResources. We can see these tasks after we execute the tasks command:

$ gradle tasks --all
...
Build tasks
-----------
apiClasses - Assembles api classes.
compileApiJava - Compiles api Java source.
processApiResources - Processes api resources.

We have created our Java interface in the src/api/java directory, which is the source directory for the Java source files for the api source set. The following code allows us to see the Java interface:

// File: src/api/java/gradle/sample/ReadWelcomeMessage.java
package gradle.sample;

/**
 * Read welcome message from source and return value.
 */
public interface ReadWelcomeMessage {
    /**
     * @return Welcome message
     */
    String getWelcomeMessage();
}

To compile the source file, we can execute the compileApiJava or apiClasses task:

$ gradle apiClasses
:compileApiJava
:processApiResources UP-TO-DATE
:apiClasses
BUILD SUCCESSFUL
Total time: 0.595 secs

The source file is compiled in the build/classes/api directory. We will now change the source code of our Sample class and implement the ReadWelcomeMessage interface, as shown in the following code:

// File: src/main/java/gradle/sample/Sample.java
package gradle.sample;

import java.util.ResourceBundle;

/**
 * Read welcome message from external properties file
 * <code>messages.properties</code>.
 */
public class Sample implements ReadWelcomeMessage {

    public Sample() {
    }

    /**
     * Get <code>messages.properties</code> file
     * and read the value for <em>welcome</em> key.
     *
     * @return Value for <em>welcome</em> key
     * from <code>messages.properties</code>
     */
    public String getWelcomeMessage() {
        final ResourceBundle resourceBundle =
            ResourceBundle.getBundle("messages");
        final String message =
            resourceBundle.getString("welcome");
        return message;
    }
}

Next, we run the classes task to recompile our changed Java source file:

$ gradle classes
:compileJava
/gradle-book/Chapter4/src/main/java/gradle/sample/Sample.java:10: error: cannot find symbol
public class Sample implements ReadWelcomeMessage {
                               ^
  symbol: class ReadWelcomeMessage
1 error
:compileJava FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileJava'.
> Compilation failed; see the compiler error output for details.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED
Total time: 0.608 secs

We get a compilation error! The Java compiler cannot find the ReadWelcomeMessage interface. However, we just ran the apiClasses task and compiled the interface without errors.
To fix this, we must define a dependency between the classes and apiClasses tasks: the classes task depends on the apiClasses task, because the interface must be compiled first, and then the class that implements it. Next, we must add the output directory with the compiled interface class file to the compileClasspath property of the main source set. Once we have done this, we know for sure that the Java compiler picks up the compiled interface when it compiles the Sample class. To do this, we change the build file and add the task dependency between the two tasks as well as the main source set configuration, as follows:
apply plugin: 'java'
sourceSets {
    api
    main {
        compileClasspath += files(api.output.classesDir)
    }
}
classes.dependsOn apiClasses
Now we can run the classes task again, without errors:
$ gradle classes
:compileApiJava
:processApiResources UP-TO-DATE
:apiClasses
:compileJava
:processResources
:classes
BUILD SUCCESSFUL
Total time: 0.648 secs
Custom configuration
If we use Gradle for an existing project, we might have a directory structure that differs from the default structure defined by Gradle, or we may want a different structure for some other reason. We can account for this by configuring the source sets and using different values for the source directories. Consider a project with the following source directory structure:
.
├── resources
│   ├── java
│   └── test
├── src
│   └── java
├── test
│   ├── integration
│   │   └── java
│   └── unit
│       └── java
└── tree.txt
We will need to reconfigure the main and test source sets, but we must also add a new integration-test source set.
The following code reflects the directory structure for the source sets:
apply plugin: 'java'
sourceSets {
    main {
        java {
            srcDir 'src/java'
        }
        resources {
            srcDir 'resources/java'
        }
    }
    test {
        java {
            srcDir 'test/unit/java'
        }
        resources {
            srcDir 'resources/test'
        }
    }
    'integration-test' {
        java {
            srcDir 'test/integration/java'
        }
        resources {
            srcDir 'resources/test'
        }
    }
}
Notice how we must put the name of the integration-test source set in quotes; this is because we use a hyphen in the name. Gradle then converts the name of the source set into integrationTest (without the hyphen and with a capital T). To compile, for example, the source files of the integration-test source set, we use the compileIntegrationTestJava task.
Summary
In this article, we discussed the support for a Java project in Gradle. With the single line needed to apply the Java plugin, we get masses of functionality that we can use for our Java code.
Resources for Article:
Further resources on this subject:
Working with Gradle [article]
Speeding up Gradle builds for Android [article]
Developing a JavaFX Application for iOS [article]
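Two loose ends from this chapter's source-set examples are worth sketching before we move on: running the integration-test source set with its own test task, and packaging the api source set into the separate JAR file mentioned earlier. The following build-script fragment is a minimal sketch, not code from the chapter; it assumes the same source-set names used above and the older Groovy DSL properties (output.classesDir, testClassesDir) that match the Gradle version used in this book:
apply plugin: 'java'
sourceSets {
    api
    'integration-test' {
        // Integration tests compile and run against the main classes.
        compileClasspath += sourceSets.main.output
        runtimeClasspath += sourceSets.main.output
    }
}
// Run the integration tests separately from the unit tests.
task integrationTest(type: Test) {
    testClassesDir = sourceSets.'integration-test'.output.classesDir
    classpath = sourceSets.'integration-test'.runtimeClasspath
}
// Package only the compiled api classes into their own JAR.
task apiJar(type: Jar) {
    appendix = 'api'
    from sourceSets.api.output
}
With this in place, gradle integrationTest runs only the sources under test/integration/java, and gradle apiJar creates a JAR that contains just the compiled interface class.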
First Person Shooter Part 1 – Creating Exterior Environments

Packt
19 May 2016
13 min read
In this article by John P. Doran, the author of the book Unity 5.x Game Development Blueprints, we will be creating a first-person shooter; however, instead of shooting a gun to damage our enemies, we will be shooting a picture in a survival horror environment, similar to the Fatal Frame series of games and the recent indie title DreadOut. To get started on our project, we're first going to look at creating our level or, in this case, our environments, starting with the exterior.
In the game industry, there are two main roles in level creation: an environment artist and a level designer. An environment artist is a person who builds the assets that go into the environment. He/she uses tools such as 3ds Max or Maya to create the model, and then uses other tools such as Photoshop to create textures and normal maps. The level designer is responsible for taking the assets that the environment artist created and assembling them into an environment for players to enjoy. He/she designs the gameplay elements, creates the scripted events, and tests the gameplay. Typically, a level designer will create environments through a combination of scripting and using a tool that may or may not be in development as the game is being made. In our case, that tool is Unity.
One important thing to note is that most companies have their own definition for different roles. In some companies, a level designer may need to create assets, and an environment artist may need to create a level layout. There are also some places that hire someone just to do lighting, or just to place meshes (called a mesher), because they're so good at it.
(For more resources related to this topic, see here.)
Project overview
In this article, we take on the role of an environment artist who has been tasked to create an outdoor environment. We will use assets that I've placed in the example code as well as assets already provided to us by Unity for mesh placement.
In addition, you will also learn some beginner-level design.
Your objectives
This project will be split into a number of tasks. It will be a simple step-by-step process from beginning to end. Here is the outline of our tasks:
Creating the exterior environment—terrain
Beautifying the environment—adding water, trees, and grass
Building the atmosphere
Designing the level layout and background
Project setup
At this point, I assume that you have a fresh installation of Unity and have started it. You can perform the following steps:
With Unity started, navigate to File | New Project. Select a project location of your choice somewhere on your hard drive and ensure that you have Set up defaults for set to 3D. Then, put in a Project name (I used First Person Shooter). Once completed, click on Create project.
Here, if you see the Welcome to Unity popup, feel free to close it as we won't be using it.
Level design 101 – planning
Now, just because we are going to be diving straight into Unity, I feel that it's important to talk a little more about how level design is done in the game industry. Although you may think a level designer will just jump into the editor and start playing, the truth is that you normally need to do a great deal of planning before you even open up your tool.
In general, a level begins with an idea. This can come from anything; maybe you saw a really cool building, or a photo on the Internet gave you a certain feeling; maybe you want to teach the player a new mechanic. Turning this idea into a level is what a level designer does. Taking all of these ideas, the level designer will create a level design document, which will outline exactly what you're trying to achieve with the entire level from start to end.
A level design document will describe everything inside the level, listing all of the possible encounters, puzzles, and so on that the player will need to complete, as well as any side quests that the player will be able to achieve. To prepare for this, you should include as many references as you can, with maps, images, and movies similar to what you're trying to achieve. If you're working with a team, making this document available on a website or wiki will be a great asset, so that you know exactly what is being done in the level, what the team can use in their levels, and how difficult their encounters can be.
In general, you'll also want a top-down layout of your level done either on a computer or with graph paper, with a line showing the player's general route through the level, with encounters and missions planned out. Of course, you don't want to be too tied down to your design document, and it will change as you playtest and work on the level, but the documentation process will help solidify your ideas and give you a firm basis to work from.
For those of you interested in seeing some level design documents, feel free to check out Adam Reynolds (Level Designer on Homefront and Call of Duty: World at War) at http://wiki.modsrepository.com/index.php?title=Level_Design:_Level_Design_Document_Example.
If you want to learn more about level design, I'm a big fan of Beginning Game Level Design, John Feil (previously my teacher) and Marc Scattergood, Cengage Learning PTR. For more of an introduction to game design from scratch, check out Level Up!: The Guide to Great Video Game Design, Scott Rogers, Wiley, and The Art of Game Design, Jesse Schell, CRC Press.
For some online resources, Scott has a neat GDC talk named Everything I Learned About Level Design I Learned from Disneyland, which can be found at http://mrbossdesign.blogspot.com/2009/03/everything-i-learned-about-game-design.html, and World of Level Design (http://worldofleveldesign.com/) is a good source for learning about level design, though it does not talk about Unity specifically.
Introduction to terrain
Terrain is basically used for non-manmade ground; things such as hills, deserts, and mountains. Unity's way of dealing with terrain is different from that of most engines, in that there are two ways to make terrains: using a height map, or sculpting from scratch.
Height maps
Height maps are a common way for game engines to support terrains. Rather than creating tools to build a terrain within the level, they use a piece of graphics software to create an image, and then that image is translated into a terrain, using the grayscale colors provided to set the different height levels, hence the name height map. The darker an area is, the lower its height, so in this instance, black represents the terrain's lowest areas, whereas white represents the highest. The Terrain's Terrain Height property sets how high white actually is compared with black.
In order to apply a height map to a terrain object, inside an object's Terrain component, click on the Settings button and scroll down to Import Raw….
For more information on Unity's height tools, check out http://docs.unity3d.com/Manual/terrain-Height.html. If you want to learn more about creating your own height maps using Photoshop, although this tutorial is for UDK, the Photoshop part is the same: http://worldofleveldesign.com/categories/udk/udk-landscape-heightmaps-photoshop-clouds-filter.php. Others also use software such as Terragen to create height maps. More information on that is at http://planetside.co.uk/products/terragen3.
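To make the grayscale-to-height mapping concrete, for an 8-bit height map (pixel values from 0 to 255), the relationship described above can be summarized as follows (this formula is a summary of the behavior, not taken from Unity's documentation):

height(x, z) = Terrain Height × pixel(x, z) / 255

So a black pixel (value 0) produces the lowest point of the terrain, a white pixel (value 255) produces the highest, and the Terrain Height setting scales the whole range in between.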
Exterior environment – terrain
When creating exterior environments, we cannot use straight floors for the most part, unless you're creating a highly urbanized area. Our game takes place in a haunted house in the middle of nowhere, so we're going to create a natural landscape. In Unity, the best tool to use to create a natural landscape is the Terrain tool. Unity's Terrain system lets us add landscapes, complete with bushes, trees, and fading materials, to our game.
To show how easy it is to use the terrain tool, let's get started.
The first thing that we're going to do is create the terrain we'll be placing for the world. Let's first create a Terrain by selecting GameObject | 3D Object | Terrain. At this point, you should see the terrain on the screen. If for some reason you have problems seeing the terrain object, go to the Hierarchy tab and double-click on the Terrain object to focus your camera on it and move in as needed.
Right now, it's just a flat plane, but we'll be doing a lot to it to make it shine. If you look to the right with the Terrain object selected, you'll see the Terrain editing tools, which do the following (from left to right):
Raise/Lower Height—This will allow us to raise or lower the height of our terrain in a certain radius to create hills, rivers, and more.
Paint Height—If you already know exactly the height that a part of your terrain needs to be, this tool will allow you to paint a spot to that exact height.
Smooth Height—This averages out the area that it is in, attempts to smooth out areas, and reduces the appearance of abrupt changes.
Paint Texture—This allows us to add textures to the surface of our terrain. One of the nice features of this is the ability to lay multiple textures on top of each other.
Place Trees—This allows us to paint objects in our environment that will appear on the surface. Unity attempts to optimize these objects by billboarding distant trees, so we can have dense forests without a horrible frame rate.
By billboarding, I mean that the object is simplified and its orientation usually changes constantly as the object and camera move, so it always faces the camera.
Paint Details—In addition to trees, you can also have small things such as rocks or grass covering the surface of your environment, using 2D images to represent individual clumps, with bits of randomization to make them appear more natural.
Terrain Settings—Settings that will affect the overall properties of the particular Terrain; options such as the size of the terrain and wind can be found here.
By default, the entire Terrain is set to be at the bottom, but we want to have ground above us and below us so we can add in things like lakes. With the Terrain object selected, click on the second button from the left on the Terrain component (Paint Height mode). From there, set the Height value under Settings to 100 and then press the Flatten button. At this point, you should see the plane move up, so now everything is above it by default.
Next, we are going to add some interesting shapes to our world with some hills by "painting" on the surface. With the Terrain object selected, click on the first button on the left of our Terrain component (the Raise/Lower Terrain mode). Once this is completed, you should see a number of different brushes and shapes that you can select from.
Our use of terrain is to create hills in the background of our scene, so it does not seem like the world is completely flat. Under the Settings, change the Brush Size and Opacity of your brush to 100, and left-click around the edges of the world to create some hills. You can increase the height of the current hills if you click on top of a previous hill.
When creating hills, it's a good idea to look at them from multiple angles while you're building them, so you can make sure that none are too high or too low. In general, you want to have taller hills as you go further back, or else you cannot see the smaller ones since they're blocked.
In the Scene view, to move your camera around, you can use the toolbar at the top-right corner, or hold down the right mouse button and drag in the direction you want the camera to move, pressing the W, A, S, and D keys to pan. In addition, you can hold down the middle mouse button and drag to move the camera around. The mouse wheel can be scrolled to zoom in and out from where the camera is.
Even though you should plan out the level ahead of time on something like a piece of graph paper to plan out encounters, you will want to avoid building the level entirely from a bird's eye view, as the player will most likely never see the game from that perspective. Referencing the map from the same perspective as your character will help ensure that the map looks great. To see many different angles at one time, you can use a layout with multiple views of the scene, such as the 4 Split.
Once we have our land done, we now want to create some holes in the ground, which we will fill with water later. This will provide a natural barrier to our world that players will know they cannot pass, so we will create a moat by first changing the Brush Size value to 50, then holding down the Shift key and left-clicking around the middle of our terrain. In this case, it's okay to use the Top view; remember that this will eventually be water to fill in lakes, rivers, and so on, as shown in the following screenshot:
To make this easier to see, you can click on the sun-looking light icon from the Scene tab to disable lighting for the time being.
At this point, we have done what is referred to in the industry as "grayboxing": making the level in the engine in the simplest way possible, but without artwork (also known as "whiteboxing" or "orangeboxing", depending on the company you're working for).
At this point in a traditional studio, you'd spend time playtesting the level and iterating on it before an artist (or you) takes the time to make it look great. However, for our purposes, we want to create a finished project as soon as possible. When making your own games, be sure to play your level, and have others play your level, before you polish it.
For more information on grayboxing, check out http://www.worldofleveldesign.com/categories/level_design_tutorials/art_of_blocking_in_your_map.php. For an example with images of a graybox to the final level, PC Gamer has a nice article available at http://www.pcgamer.com/2014/03/18/building-crown-part-two-layout-design-textures-and-the-hammer-editor/.
Summary
With this, we now have a great-looking exterior level for our game! In addition, we covered a lot of features that exist in Unity for you to be able to use in your own future projects.
Resources for Article:
Further resources on this subject:
Learning NGUI for Unity [article]
Components in Unity [article]
Saying Hello to Unity and Android [article]