How-To Tutorials

article-image-editor-tool-prefabs-and-main-menu

22 Sep 2015

19 min read

Editor Tool, Prefabs, and Main Menu

22 Sep 2015

In this article by Edward Kyle Langley, author of the book Learning Unity iOS Game Development, we will learn that the player has the ability to send input to the device, and we will handle this by manipulating the player character GameObject. We also set up some game logic so that the player character can interact with positive and negative world objects, such as Coins and Obstacles. To further develop the sense of a complete game, we need to create the pieces of the game world that represent a floor that the player will run on. (For more resources related to this topic, see here.) To create these pieces, we will create a Unity EditorWindow class that will help us create grids that will represent the ground the player runs on and the dirt below it. Traditionally, you would have to place each sprite one at a time. With this editor tool, we will be able to crate bigger boxes in a grid based on our settings. After we have our editor tool running, we will begin to create the prefabs that will hold multiple GameObjects and their components in a single file. Finally, we will write the code needed to move the floor and ground pieces below the player character, simulating the character as running forward. To summarize, in this article, we will cover the following topics: Writing a Unity C# class that extends EditorWindow, which allows you to input settings and sprite files that will give you a box grid and simplify the level pieces creation Creating the game-related prefabs so that you have grouped files in an easy-to-use file Building the main menu user interface with Unity's UI tools, including buttons for achievements, leaderboards, and store purchases Use the prefabs we made in the C# script. This will move the level pieces of prefabs under the player character, simulating movement. We will also go through the steps to get the final aspects of the iOS integration function and set up the main menu UI so that the player can navigate between playing the game, view at leaderboards /achievements, and have the option to purchase "remove iAds" for the cost of ten thousand coins or 99 cents. Making the Sprite Tile Editor Tool The Unity engine is incredibly flexible for all the aspects of game development, including creating custom editor tools to help fast track the more tedious aspects of development. In our case, it will be beneficial to have a tool that creates a root GameObject that will then create children GameObjects in a grid. This will be spaced out by the size of the sprite component they have attached. For example, if you were to place say 24 GameObjects one at a time, it could take some time to make sure that all are snapped correctly together. With our tool, we will be able to select the X value and the Y value for the grid, the sprite that represents the ground, and the sprite that represents the dirt below the ground. Perform the following steps: To begin with, navigate to the Assets folder. Right-click on this folder and select Create and then New Folder. Name this folder Level. Right-click on the new Level folder and select Import New Asset. Right-click on the Script folder, select Create and then C# Script. Name the script SpriteTiler. The SpriteTiler C# class Double-click on the SpriteTiler C# file to open it. Change the file so that it looks similar to the following code: using UnityEngine; using UnityEditor; using System.Collections; public class SpriteTiler : EditorWindow { } The big changes from the normally generated code file is the addition to using UnityEditor, changing the inherited class to EditorWindow, and removing the Start() and Update() functions. Global variables We now want to add the global variables for this class. Add the following code in the class block: // Grid settings to make tiled by public float GridXSlider = 1; public float GridYSlider = 1; // Sprites for both the ground and dirt public Sprite TileGroundSprite; public Sprite TileDirtSprite; // Name of the GameObject that holds our tiled Objects public string TileSpriteRootGameObjectName = "Tiled Object"; The GridXSlider and GridYSlider class will be used to generate our grid, X being left to right and Y being top down. For example, if you had X set to five and Y set to three, the grid would generate columns of five elements and rows of three elements or five sprites long and three sprites down. The TileGroundSprite and TileDirtSprite sprite files will make up the ground and dirt levels. TileSpriteRootGameObjectName is the GameObject name that will hold the GameObjects children that have the sprite components. This is editable by you so that you can choose the name of the GameObject that gets created to avoid having the default new GameObject for each one made. The MenuItem creation Next, we need to create the MenuItem function. This will represent the Editor selection drop-down list so that we can use our tool. Add the following function to the SpriteTiler class under the global variables: // Menu option to bring up Sprite Tiler window [MenuItem("RushRunner/Sprite Tile")] public static void OpenSpriteTileWindow() { EditorWindow.GetWindow< SpriteTiler > ( true, "Sprite Tiler" ); } As this class extends EditorWindow, and the preceding function is declared as MenuItem, it will create a dropdown in the Editor named RushRunner. This will hold a selection called Sprite Tile: You can name the dropdown and selection anything you like by changing the string that is passed into MenuItem, such as MyEditorTool or Editor Tool Name. If you save the SpiteTiler.cs file and go back to Unity and allow the engine to compile, you will be able to click on the SpriteTile button under RushRunner. This will create a editor window named Sprite Tiler. The OnGUI function Next, we need to add the function that will be used to draw all the windows GUI elements or the fields that we will use to get the settings to make the grid. Under our OpenSpriteTileWindow function, add the following code: // Called to render GUI frames and elements void OnGUI() { } OnGUI is the function that will draw our GUI elements to the window. This allows you to manipulate these GUI elements so that we have values to use when we create the GameObject grid and its GameObjects children with sprite components. The GUILayout and OnGUI setup To begin with the OnGUI function, we want to add the GUI elements to the window. In the OnGUI function, add the following code: // Setting for GameObject name that holds our tiled Objects GUILayout.Label("Tile Level Object Name", EditorStyles .boldLabel); TileSpriteRootGameObjectName = GUILayout.TextField( TileSpriteRootGameObjectName, 25 ); // Slider for X grid value (left to right) GUILayout.Label("X: " + GridXSlider, EditorStyles. boldLabel); GridXSlider = GUILayout.HorizontalScrollbar( GridXSlider, 1.0f, 0.0f, 30.0f ); GridXSlider = (int)GridXSlider; // Slider for Y grid value(up to down) GUILayout.Label("Y: " + GridYSlider, EditorStyles. boldLabel); GridYSlider = GUILayout.HorizontalScrollbar(GridYSlider, 1.0f, 0.0f, 30.0f); GridYSlider = (int)GridYSlider; // File chose to be our Ground Sprite GUILayout.Label("Sprite Ground File", EditorStyles. boldLabel); TileGroundSprite = EditorGUILayout.ObjectField (TileGroundSprite, typeof(Sprite), true) as Sprite; // File chose to be our Dirt Sprite GUILayout.Label("Sprite Dirt File", EditorStyles. boldLabel); TileDirtSprite = EditorGUILayout.ObjectField (TileDirtSprite, typeof(Sprite), true) as Sprite; GUILayout.Label is a function that creates a text label in the window we are using. Its first use is to let the user know that the next setting is for Tile Level Object Name: the name of the root GameObject that will hold children GameObjects with Sprite components. By default, this is set to Tiled Object, although we allow the user to change it. In order to allow the user to change it, we need to give them a TextField parameter to input a new string. We do this by telling that TileSpriteRootGameObjectName is equal to the GUILayout.TextField setting. As this is used in OnGUI, anything the user inputs will change the value of TileSpriteRootGameObjectName. We will use this later when the user wants to create the GameObject. We then need to create two HorizontalSlider GUI elements so that we can get values from them that represent the X and Y values of the grid. Similar to TextField, we can start each of the HorizontalSlider elements with GUILayout.Label. This describes what the slider is for. We will then assign the GridXSlider and GridYSlider values to what the HorizontalSlider element is set to, which is one by default. As the user adjusts the sliders, the GridXSlider and GridYSlider values will change so that when the user clicks on a button to create the GameObject, we will have a reference to the values that they want to use for the grid. After HorizontalSliders, we want to have ObjectFields so that the user can search for and assign sprite files that will represent the ground and dirt of the grid. EditorGUILayout.ObjectField takes a reference to the object you want to assign when the user selects one, the type of object that ObjectField wants, and if ObjectField takes SceneObjects. As we want this ObjectField to be for sprites, we will set the type of object to typeof( Sprite ) and then cast the result that is assigned to TileGroundSprite or TileDirtSprite to the sprite by using as Sprite. The OnGUI create tiled button In order to know when the user wants to create the root GameObject and its grid of children GameObjects, we will need a button. Add the following code under the last GUI Elements: // If butt "Create Tiled" is clicked if (GUILayout.Button("Create Tiled")) { // If the Grid settings are both zero, // send notification to user if (GridXSlider == 0 && GridYSlider == 0) { ShowNotification(new GUIContent("Must have either X or Y grid set to a value greater than 0")); return; } // if Dirt and Ground Sprite exist if (TileDirtSprite != null && TileGroundSprite !=null) { // If the Sprites sizes dont match, // send notifcation to user if (TileDirtSprite.bounds.size.x != TileGroundSprite. bounds.size.x || TileDirtSprite.bounds.size.y != TileGroundSprite.bounds.size.y) { ShowNotification(new GUIContent("Both Sprites must be of matching size.")); return; } // Create GameObject and tiled // Objects with user settings CreateSpriteTiledGameObject(GridXSlider, GridYSlider, TileGroundSprite, TileDirtSprite, TileSpriteRoot GameObjectName); } else { // If either Dirt or Ground Sprite dont exist, // send notifcation to user ShowNotification( new GUIContent( "Must have Dirt and Ground Sprite selected." ) ); return; } } The first condition we have set is the GUILayout.Button( "Create Tiled" ) function. The Button function will return true as soon as it is clicked on, but it will still render to the window if false. This means that although the button is not active, it'll still be seen by the user. As some settings will create a scenario that is not ideal for the concept of our SpriteTiler, we first want to make sure that the settings are in line with what we have designed the tool to perform. We will first check whether GridXSlider and GridYSlider are set to zero. If both of these values are set to zero, the grid won't create anything, and as the concept of the tool is to create a grid of children sprites, we will tell the user that they must have a selection above zero for either GridXSlider or GridYSlider. We then check whether TileDirtSprite and TileGroundSprite have a value. If either of these values are null, the settings are not complete. This results in you telling the user that Dirt and Ground sprites need a selection. If the user has set Dirt and Ground sprites to something, but their sizing is not the same, such as one being 32 x 32 and the other being 64 x 64, we will tell the user that both the sprites need to be of the same size. If we didn't check for this, the grid wouldn't align correctly, creating negative results and making the tool not function as we want it to. If the user settings are in order, we will call the CreateSpriteTiledGameObject function and pass GridXSlider, GridYSlixer, TileGroundSprite, TileDirtSprite, and TileSpriteRootGameObjectName. The CreateSpriteTiledGameObject function This function is designed to take the user settings and create the grid from them. Add the following function under the OnGUI function: // Create GameObject and tiled childen based on user settings public static void CreateSpriteTiledGameObject(float GridXSlider, float GridYSlider, Sprite SpriteGroundFile, Sprite SpriteDirtFile, string RootObjectName) { // Store size of Sprite float spriteX = SpriteGroundFile.bounds.size.x; float spriteY = SpriteGroundFile.bounds.size.y; // Create the root GameObject which will hold children that tile GameObject rootObject = new GameObject( ); // Set position in world to 0,0,0 rootObject.transform.position = new Vector3( 0.0f, 0.0f, 0.0f ); // Name it based on user settings rootObject.name = RootObjectName; // Create starting values for while loop int currentObjectCount = 0; int currentColumn = 0; int currentRow = 0; Vector3 currentLocation = new Vector3( 0.0f, 0.0f, 0.0f ); // Continue loop until all rows // and columns have been filled while (currentRow < GridYSlider) { // Create a child GameObject, set its parent to root, // name it, and offset its location based on current location GameObject gridObject = new GameObject( ); gridObject.transform.SetParent( rootObject.transform ); gridObject.name = RootObjectName + "_" + currentObjectCount; gridObject.transform.position = currentLocation; // Give child gridObject a SpriteRenderer and set sprite on CurrentRow SpriteRenderer gridRenderer = gridObject.AddComponent <SpriteRenderer>( ); gridRenderer.sprite = ( currentRow == 0 ) ? SpriteGroundFile : SpriteDirtFile; // Give the gridObject a BoxCollider gridObject.AddComponent<BoxCollider2D>(); // Offset currentLocation for next gridObject to use currentLocation.x += spriteX; // Increment current column by one currentColumn++; // If the current collumn is greater than the X slider if (currentColumn >= GridXSlider) { // Reset column, incrmement row, reset x location // and offset y location downwards currentColumn = 0; currentRow++; currentLocation.x = 0; currentLocation.y -= spriteY; } // Add to currentObjectCount for naming of // gridObject children. currentObjectCount++; } } To start with, we must first have the X and Y sizes of the sprite we want to create so that we can offset the location of the children GameObjects that were created. As we originally checked to make sure that both sprites are of the same size, it doesn't matter which sprite object we get the size from. In our case, we will use SpriteGroundFile. We will then move the rootObject position to 0X, 0Y, and 0Z so that it is in the center of our scene. This can be set to anything you like, although when rootObject and its children get created, it is easier to find it at the center of the scene world. After it has been moved, we can set its name to the setting that the user had entered or Tiled Object (the default one). Once we have rootObject set up, we can create its children GameObjects. To start this cycle, we will need a few variables to reference and change: currentObjectCount: This specifies the total number of children that will be created. This increments for each one created. currentColumn: This denotes the current column we are on in the row. currentRow: This specifies the current row we are on. currentLocation: This denotes the current location that the children GameObject will use and sets its position too. This is changed after each new child is created based on the X or Y setting of the sprite size. Now that we have our rootObject and the variables we need to create the children, we can use while loop. A while loop is a loop that will continue until its condition fails. In our case, we will check whether currentRow is less than the GridYSlider value. As soon as currentRow is equal to or greater than GridYSlider, the loop will stop because the condition failed. The reason we will look at currentRow is that for each column created, we can reset its value to zero and increment currentRow by one. This means that each row will hold as many columns as were set by the GridXSlider value, and we know that the grid is complete when currentRow is equal or greater than GridYSlider. For example, if we had a grid setting of 3X and 3Y, the first row will hold three columns. When the first row is done, the row changes to two and adds three more columns. In the last row, it completes three more columns and then the while condition fails because the row value is equal to GridYSlider. In each loop of the while loop, we start by creating gridObject. We set this grid object parent to that of rootObject, set its name to RootObjectName, and concatenate an underscore, followed by currentObjectCount and then set the gridObject position to the currentLocation value, which will change based on the size of the sprite and the column/row. We will then add a SpriteRenderer component to gridObject and assign a sprite to it. We will change the sprite based on whether currentRow is equal to zero or not. If it is, in the first row, we will set the sprite to SpriteGroundFile. If currentRow is not equal to zero, we will set the sprite to SpriteDirtFile. The ternary operator is a sort of shorthand for if → else. If the condition is true, we will set the value to what is behind the question mark. If the condition is false, we will set the value based on what's behind the colon. The question mark represents if, whereas the colon represents else. The ternary operator is as follows: Value = ( condtion == true ) ? ifTrue : elseNotTrue; Once we have the sprite assigned to the SpriteRenderer component of gridObject, we can assign a BoxCollider2D component, which will make itself the same size as the sprite. If we were to add the BoxCollider2D component to SpriteRenderer, it would be the default size of 1, 1, 1, which would be too big. We will then offset currentLocation by the spriteX size, so the next gridObject will offset the size of the spriteX size. The currentColumn value is incremented by one, and we then check whether currentColumn is greater than or equal to the GridXSlider value. If it is, we know that we need to start the next row. To do this, we reset currentColumn to zero, increment currentRow by one, set the currentLocation.x value to zero, and offset currentLocation.y by negative spriteY size. This not only results in an offset location down, but also resets the X value to zero, making it possible for the columns to be created again; just down the size of spriteY. Finally, we increment currentObjectCount by one. Building the main menu UI The main menu UI will be its own Canvas GameObject. We will then handle the main menu and the game UI via the GameInfo class. We will also use the GameInfo class to manage button presses and the iOS integration. In Hierarchy, right-click and select UI and then click on Canvas. Name this new Canvas GameObject MenuUI. Let's start by adding five buttons to achievements, playing, leaderboards, remove iAds, and restore purchase. Right-click on the new MenuUI GameObject, navigate to UI, and left-click on Button. Do this four more times, so there are a total of five buttons that are children of the MenuUI GameObject. Name the buttons and text children as follows: PlayButton, PlayText LeaderboardButton, LeaderboardText AchievementButton, AchievementText RemoveAdsButton, RemoveAdsText RestorePurchaseButton, RestorePurchaseText Adding button images Next, we need to import the art that will be used for the main menu UI. In the Assets | UI folder, right-click and select Import New Asset. Select all the new images in the Assets | UI folder and change their settings as follows: Filter Mode: Trilinear Max Size: 256 Format: Truecolor PlayButton Select PlayButton in Hierarchy and search for Inspector. Change its settings as follows: Anchor: Bottom Center Pos X: 0 Pos Y: 115 Pos Z: 0 Width: 128 Height: 128 Source Image: MenuButton Now, select PlayButtonText. In the Inspector window, change its settings as follows: Text: Play Font: Arial Font Style: Bold Font Size: 36 Alignment: Center LeaderboardButton Select LeaderboardButton in the Hierarchy tab and search for Inspector. Change its settings as follows: Anchor: Bottom Center Pos X: 135 Pos Y: 115 Pos Z: 0 Width: 128 Height: 128 Source Image: MenuButton Select LeaderboardText. In the Inspector window, change its settings to: Text: Leaderboards Font: Arial Font Style: Bold Font Size: 17 Alignment: Center AchievementButton Select AchievementButton. In Hierarchy, search for Inspector. Change its settings as follows: Anchor: Bottom Center Pos X: -135 Pos Y: 115 Pos Z: 0 Width: 128 Height: 128 Source Image: MenuButton Now, select AchievementText and then in Inspector, change its settings to: Text: Achievements Font: Arial Font Style: Bold Font Size: 17 Alignment: Center RemoveAdsButton Select RemoveAdsButton in the Hierarchy tab and navigate to Inspector. Change its settings as follows: Anchor: Bottom Center Pos X: -64 Pos Y: 55 Pos Z: 0 Width: 96 Height: 42 Source Image: RestartButton Now, select RemoveAdsText and then in the Inspector window, change its settings as shown here: Text: Remove iAds Font: Arial Font Style: Bold Font Size: 12 Alignment: Center RestorePurchaseButton Let's select RestorePurchaseButton in the Hierarchy tab and search for Inspector. Change its settings as follows: Anchor: Bottom Center Pos X: 64 Pos Y: 55 Pos Z: 0 Width: 96 Height: 42 Source Image: RestartButton Now, select RestorePurchaseText and then in the Inspector window, change its settings as follows: Text: Restore Purchase Font: Arial Font Style: Bold Font Size: 14 Alignment: Center You should now have a button layout that looks similar to the following image: Summary In this article, we discussed how to create a Unity editor tool and a grid of GameObjects. These were laid out by the size of the sprites you chose and were flexible enough to use with your own settings. We also created prefabs for all of our bigger GameObjects, which could hold all of their components in a neat package. We also covered the basics of how to create a game for iOS and utilize its GameCenter features. Feel free to explore these features and add to them. Adding more store purchases, achievements, and leaderboards is simply repeating the steps that we have already done. Resources for Article: Further resources on this subject: Components in Unity[article] Saying Hello to Unity and Android [article] Unity Networking – The Pong Game [article]

0
0
2328

Packt

22 Sep 2015

13 min read

Find Friends on Facebook

Packt

22 Sep 2015

13 min read

In this article by the authors, Vikram Garg and Sharan Kumar Ravindran, of the book, Mastering Social Media Mining with R, we learn about data mining using Facebook as our resource. (For more resources related to this topic, see here.) We will see how to use the R package Rfacebook, which provides access to the Facebook Graph API from R. It includes a series of functions that allow us to extract various data about our network such as friends, likes, comments, followers, newsfeeds, and much more. We will discuss how to visualize our Facebook network and we will see some methodologies to make use of the available data to implement business cases. Rfacebook package installation and authentication The Rfacebook package is authored and maintained by Pablo Barbera and Michael Piccirilli. It provides an interface to the Facebook API. It needs Version 2.12.0 or later of R and it is dependent on a few other packages, such as httr, rjson, and httpuv. Before starting, make sure those packages are installed. It is preferred to have Version 0.6 of the httr package installed. Installation We will now install the Rfacebook packages. We can download and install the latest package from GitHub using the following code and load the package using the library function. On the other hand, we will also install the Rfacebook package from the CRAN network. One prerequisite for installing the package using the function install_github is to have the package devtools loaded into the R environment. The code is as follows: library(devtools) install_github("Rfacebook", "pablobarbera", subdir="Rfacebook") library(Rfacebook) After installing the Rfacebook package for connecting to the API, make an authentication request. This can be done via two different methods. The first method is by using the access token generated for the app, which is short-lived (valid for two hours); on the other hand, we can create a long-lasting token using the OAuth function. Let's first create a temporary token. Go to https://developers.facebook.com/tools/explorer, click on Get Token, and select the required user data permissions. The Facebook Graph API explorer will open with an access token. This access token will be valid for two hours. The status of the access token as well as the scope can be checked by clicking on the Debug button. Once the tokens expire, we can regenerate a new token. Now, we can access the data from R using the following code. The access token generated using the link should be copied and passed to the token variable. The use of username in the function getUsers is deprecated in the latest Graph API; hence, we are passing the ID of a user. You can get your ID from the same link that was used for token generation. This function can be used to pull the details of any user, provided the generated token has the access. Usually, access is limited to a few users with a public setting or those who use your app. It is also based on the items selected in the user data permission check page during token generation. In the following code, paste your token inside the double quotes, so that it can be reused across the functions without explicitly mentioning the actual token. token<- "XXXXXXXXX" A closer look at how the package works The getUsers function using the token will hit the Facebook Graph API. Facebook will be able to uniquely identify the users as well as the permissions to access information. If all the check conditions are satisfied, we will be able to get the required data. Copy the token from the mentioned URL and paste it within the double quotes. Remember that the token generated will be active only for two hours. Use the getUsers function to get the details of the user. Earlier, the getUsers function used to work based on the Facebook friend's name as well as ID; in API Version 2.0, we cannot access the data using the name. Consider the following code for example: token<- "XXXXXXXXX" me<- getUsers("778278022196130", token, private_info = TRUE) Then, the details of the user, such as name and hometown, can be retrieved using the following code: me$name The output is also mentioned for your reference: [1] "Sharan Kumar R" For the following code: me$hometown The output is as follows: [1] "Chennai, Tamil Nadu" Now, let's see how to create a long-lasting token. Open your Facebook app page by going to https://developers.facebook.com/apps/ and choosing your app. On theDashboard tab, you will be able to see the App ID and Secret Code values. Use those in the following code. require("Rfacebook") fb_oauth<- fbOAuth(app_id="11",app_secret="XX",extended_permissions = TRUE) On executing the preceding statements, you will find the following message in your console: Copy and paste into Site URL on Facebook App Settings: http://localhost:1410/ When done, press any key to continue... Copy the URL displayed and open your Facebook app; on the Settings tab, click on the Add Platform button and paste the copied URL in the Site URL text box. Make sure to save the changes. Then, return to the R console and press any key to continue, you will be prompted to enter your Facebook username and password. On completing that, you will return to the R console. If you find the following message, it means your long-lived token is ready to use. When you get the completion status, you might not be able to access any of the information. It is advisable to use the OAuth function a few minutes after creation of the Facebook application. Authentication complete. Authentication successful. After successfully authenticating, we can save it and load on demand using the following code: save(fb_oauth, file="fb_oauth") load("fb_oauth") When it is required to automate a few things or to use Rfacebook extensively, it will be very difficult as the tokens should be generated quite often. Hence, it is advisable to create a long-lasting token to authenticate the user, and then save it. Whenever required, we can just load it from a local file. Note that Facebook authentication might take several minutes. Hence, if your authentication fails on the retry, please wait for some time before pressing any key and check whether you have installed the httr package Version 0.6. If you continue to experience any issues in generating the token, then it's not a problem. We are good to go with the temporary token. Exercise Create an app in Facebook and authenticate by any one of the methods discussed. A basic analysis of your network In this section, we will discuss how to extract Facebook network of friends and some more information about the people in our network. After completing the app creation and authentication steps, let's move forward and learn to pull some basic network data from Facebook. First, let's find out which friends we have access to, using the following command in R. Let's use the temporary token for accessing the data: token<- "XXXXXXXXX" friends<- getFriends(token, simplify = TRUE) head(friends) # To see few of your friends The preceding function will return all our Facebook friends whose data is accessible. Version 1 of the API would allow us to download all the friends' data by default. But in the new version, we have limited access. Since we have set simplify as TRUE, we will pull only the username and their Facebook ID. By setting the same parameter to FALSE, we will be able to access additional data such as gender, location, hometown, profile picture, relationship status, and full name. We can use the function getUsers to get additional information about a particular user. The following information is available by default: gender, location, and language. We can, however, get some additional information such as relationship status, birthday, and the current location by setting the parameter private_info to TRUE: friends_data<- getUsers(friends$id, token, private_info = TRUE) table(friends_data$gender) The output is as follows: female male 5 21 We can also find out the language, location, and relationship status. The commands to generate the details as well as the respective outputs are given here for your reference: #Language table(substr(friends_data$locale, 1, 2)) The output is as follows: en 26 The code to find the location is as follows: # Location (Country) table(substr(friends_data$locale, 4, 5)) The output is as follows: GB US 1 25 Here's the code to find the relationship status: # Relationship Status table(friends_data$relationship_status) Here's the output: Engaged Married Single 1 1 3 Now, let's see what things were liked by us in Facebook. We can use the function getLikes to get the like data. In order to know about your likes data, specify user as me. The same function can be used to extract information about our friends, in which case we should pass the user's Facebook ID. This function will provide us with a list of Facebook pages liked by the user, their ID, name, and the website associated with the page. We can even restrict the number of results retrieved by setting a value to the parameter n. The same function will be used to get the likes of people in our network; instead of the keyword me, we should give the Facebook ID of those users. Remember we can only access data of people with accessibility from our app. The code is as follows: likes<- getLikes(user="me", token=token) head(likes) After exploring the use of functions to pull data, let's see how to use the Facebook Query Language using the function getFQL, which can be used to pass the queries. The following query will get you the list of friends in your network: friends<- getFQL("SELECT uid2 FROM friend WHERE uid1=me()", token=token) In order to get the complete details of your friends, the following query can be used. The query will return the username, Facebook ID, and the link to their profile picture. Note that we might not be able to access the complete network of friends' data, since access to data of all your friends are deprecated with Version 2.0. The code is as follows: # Details about friends Friends_details<- getFQL("SELECT uid, name, pic_square FROM user WHERE uid = me() OR uid IN (SELECT uid2 FROM friend WHERE uid1 = me())", token=token) In order to know more about the Facebook Query Language, check out the following link. This method of extracting the information might be preferred by people familiar with query language. It can also help extract data satisfying only specific conditions (https://developers.facebook.com/docs/technical-guides/fql). Exercise Download your Facebook network and do an exploration analysis on the languages your friends speak, places where they live, the total number of pages they have liked, and their marital status. Try all these with the Facebook Query Language as well. Network analysis and visualization So far, we used a few functions to get the details about our Facebook profile as well as friends' data. Let's see how to get to know more about our network. Before learning to get the network data, let's understand what a network is as well as a few important concepts about the network. Anything connected to a few other things could be a network. Everything in real life is connected to each other, for example, people, machines, events, and so on. It would make a lot of sense if we analyzed them as a network. Let's consider a network of people; here, people will be the nodes in the network and the relationship between them would be the edges (lines connecting them). Social network analysis The technique to study/analyze the network is called social network analysis. We will see how to create a simple plot of friends in our network in this section. To understand the nodes (people/places/etc) in a network in social network analysis, we need to evaluate the position of the nodes. We can evaluate the nodes using centrality. Centrality can be measured using different methods like degree, betweenness, and closeness. Let's first get our Facebook network and then get to know the centrality measures in detail. We use the function getNetwork to download our Facebook network. We need to mention how we would like to format the data. When the parameter format is set to adj.matrix, it will produce the data in matrix format where the people in the network would become the row names and column names of the matrix and if they are connected to each other, then the corresponding cell in the matrix will hold a value. The command is as follows: network<- getNetwork(token, format="adj.matrix") We now have our Facebook network downloaded. Let's visualize our network before getting to understand the centrality concept one by one with our own network. To visualize the network, we need to use the package called igraph in R. Since we downloaded our network in the adjacency matrix format, we will use the same function in igraph. We use the layout function to determine the placement of vertices in the network for drawing the graph and then we use the plot function to draw the network. In order to explore various other functionalities in these parameters, you can execute the ?<function_name> function in RStudio and the help window will have the description of the function. Let’s use the following code to load the package igraph into R. require(igraph) We will now build the graph using the function graph.adjacency; this function helps in creating a network graph using the adjacency matrix. In order to build a force-directed graph, we will use the function layout.drl. The force-directed graph will help in making the graph more readable. The commands are as follows: social_graph<- graph.adjacency(network) layout<- layout.drl(social_graph, options=list(simmer.attraction=0)) At last, we will use the plot function with various built in parameters to make the graph more readable. For example, we can name the nodes in our network, we can set the size of the nodes as well as the edges in the network, and we can color the graph and the components of the graph. Use the following code to see what the network looks like. The output that was plotted can be saved locally using the function dev.copy, and the size of the image as well as the type can be passed as a parameter to the function: plot(social_graph, vertex.size=10, vertex.color="green", vertex.label=NA, vertex.label.cex=0.5, edge.arrow.size=0, edge.curved=TRUE, layout=layout.fruchterman.reingold) dev.copy(png,filename= "C:/Users/Sharan/Desktop/3973-03-community.png", width=600, height=600); dev.off (); With the preceding plot function, my network will look like the following one. In the following network, the node labels (name of the people) have been disabled. They can be enabled by removing the vertex.label parameter. Summary In this article, we discussed how to use the various functions implemented in the Rfacebook package, analyze the network. This article covers the important techniques that helps in performing vital network analysis and also enlightens us about the wide range of business problems that could be addressed with the Facebook data. It gives us a glimpse of the great potential for implementation of various analyses. Resources for Article: Further resources on this subject: Using App Directory in HootSuite[article] Supervised learning[article] Warming Up [article]

0
0
2572

article-image-prototyping-levels-prototype

Packt

22 Sep 2015

13 min read

Prototyping Levels with Prototype

Packt

22 Sep 2015

13 min read

Level design 101 – planning Now, just because we are going to be diving straight into Unity, I feel it's important to talk a little more about how level design is done in the game industry. While you may think a level designer will just jump into the editor and start playing, the truth is you would normally need to do a ton of planning ahead of time before you even open up your tool. Generally, a level begins with an idea. This can come from anything; maybe you saw a really cool building or a photo on the Internet gave you a certain feeling; maybe you want to teach the player a new mechanic. Turning this idea into a level is what a level designer does. Taking all of these ideas, the level designer will create a level design document, which will outline exactly what you're trying to achieve with the entire level from start to end. In this article by John Doran, author of Building FPS Games with Unity, a level design document will describe everything inside the level; listing all of the possible encounters, puzzles, so on and so forth, which the player will need to complete as well as any side quests that the player will be able to achieve. To prepare for this, you should include as many references as you can with maps, images, and movies similar to what you're trying to achieve. If you're working with a team, making this document available on a website or wiki will be a great asset so that you know exactly what is being done in the level, what the team can use in their levels, and how difficult their encounters can be. Generally, you'll also want a top-down layout of your level done either on a computer or with a graph paper, with a line showing a player's general route for the level with the encounters and missions planned out. (For more resources related to this topic, see here.) Of course, you don't want to be too tied down to your design document. It will change as you playtest and work on the level, but the documentation process will help solidify your ideas and give you a firm basis to work from. For those of you interested in seeing some level design documents, feel free to check out Adam Reynolds' Level Designer on Homefront and Call of Duty: World at War at http://wiki.modsrepository.com/index.php?title=Level_Design:_Level_Design_Document_Example. If you want to learn more about level design, I'm a big fan of Beginning Game Level Design by John Feil (previously, my teacher) and Marc Scattergood, Cengage Learning PTR. For more of an introduction to all of game design from scratch, check out Level Up!: The Guide to Great Video Game Design by Scott Rogers and Wiley and The Art of Game Design by Jesse Schel. For some online resources, Scott has a neat GDC talk called Everything I Learned About Level Design I Learned from Disneyland, which can be found at http://mrbossdesign.blogspot.com/2009/03/everything-i-learned-about-game-design.html, and World of Level Design (http://worldofleveldesign.com/) is a good source to learn about level design, though it does not talk about Unity specifically. In addition to a level design document, you can also create a game design document (GDD) that goes beyond the scope of just the level and includes story, characters, objectives, dialogue, concept art, level layouts, and notes about the game's content. However, it is something to do on your own. Creating architecture overview As a level designer, one of the most time-consuming parts of your job will be creating environments. There are many different ways out there to create levels. By default, Unity gives us some default meshes such as a Box, Sphere, and Cylinder. While it's technically possible to build a level in this way, it could get really tedious very quickly. Next, I'm going to quickly go through the most popular options to build levels for the games made in Unity before we jump into building a level of our own. 3D modelling software A lot of times, opening up a 3D modeling software package and building an architecture that way is what professional game studios will often do. This gives you maximum freedom to create your environment and allows you to do exactly what it is you'd like to do; but it requires you to be proficient in that tool, be it Maya, 3ds Max, Blender (which can be downloaded for free at blender.org), or some other tool. Then, you just need to export your models and import them into Unity. Unity supports a lot of different formats for 3D models (most commonly used are .obj and .fbx), but there are a lot of issues to consider. For some best practices when it comes to creating art assets, please visit http://blogs.unity3d.com/2011/09/02/art-assets-best-practice-guide/. Constructing geometry with brushes Constructive Solid Geometry (CSG), commonly referred to as brushes, is a tool artists/designers use to quickly block out pieces of a level from scratch. Using brushes inside the in-game level editor has been a common approach for artists/designers to create levels. Unreal Engine 4, Hammer, Radiant, and other professional game engines make use of this building structure, making it quite easy for people to create and iterate through levels quickly through a process called white-boxing, as it's very easy to make changes to the simple shapes. However; just like learning a modeling software tool, there can be a higher barrier for entry in creating complex geometry using a 3D application, but using CSG brushes will provide a quick solution to create shapes with ease. Unity does not support building things like this by default, but there are several tools in the Unity Asset Store, which allow you to do something like this. For example, sixbyseven studio has an extension called ProBuilder that can add this functionality to Unity, making it very easy to build out levels. The only possible downside is the fact that it does cost money, though it is worth every penny. However, sixbyseven has kindly released a free version of their tools called Prototype, which we installed earlier. It contains everything we will need for this chapter, but it does not allow us to add custom textures and some of the more advanced tools. We will be using ProBuilder later on in the book to polish the entire product. You can find out more information about ProBuilder at http://www.protoolsforunity3d.com/probuilder/. Modular tilesets Another way to generate architecture is through the use of "tiles" that are created by an artist. Similar to using Lego pieces, we can use these tiles to snap together walls and other objects to create a building. With creative uses of the tiles, you can create a large amount of content with just a minimal amount of assets. This is probably the easiest way to create a level at the expense of not being able to create unique looking buildings, since you only have a few pieces to work with. Titles such as Skyrim use this to a great extent to create their large world environments. Mix and match Of course, it's also possible to use a mixture of the preceding tools in order to use the advantages of certain ways of doing things. For example, you could use brushes to block out an area and then use a group of tiles called a tileset to replace the boxes with the highly detailed models, which is what a lot of AAA studios do. In addition, we could initially place brushes to test our gameplay and then add in props to break up the repetitiveness of the levels, which is what we are going to be doing. Creating geometry The first thing we are going to do is to learn how we can create geometry as described in the following steps: From the top menu, go to File | New Scene. This will give us a fresh start to build our project. Next, because we already have Prototype installed, let's create a cube by hitting Ctrl + K. Right now, our Cube (with a name of pb-Cube-1562 or something similar) is placed on a Position of 2, -7, -2. However, for simplicity's sake, I'm going to place it in the middle of the world. We can do this by typing in 0,0,0 by left-clicking in the X position field, typing 0, and then pressing Tab. Notice the cursor is now automatically at the Y part. Type in 0, press Tab again, and then, from the Z slot, press 0 again. Alternatively you can right-click on the Transform component and select Reset Position. Next, we have to center the camera back onto our Cube object. We can do this by going over to the Hierarchy tab and double-clicking on the Cube object (or selecting it and then pressing F). Now, to actually modify this cube, we are going to open up Prototype. We can do this by first selecting our Cube object, going to the Pb_Object component, and then clicking on the green Open Prototype button. Alternatively, you can also go to Tools | Prototype | Prototype Window. This is going to bring up a window much like the one I have displayed here. This new Prototype tab can be detached from the main Unity window or, if you drag from the tab over into Unity, it can be "hooked" into place elsewhere, like the following screenshot shows by my dragging and dropping it to the right of the Hierarchy tab. Next, select the Scene tab in the middle of the screen and press the G key to toggle us into the Object/Geometry mode. Alternatively, you can also click on the Element button in the Scene tab. Unlike the default Object/Top level mode, this will allow us to modify the cube directly to build upon it. For more information on the different modes, check out the Modes & Elements section from http://www.protoolsforunity3d.com/docs/probuilder/#buildingAndEditingGeometry. You'll notice the top of the Prototype tab has three buttons. These stand for what selection type you are currently wanting to use. The default is Vertex or the Point mode, which will allow us to select individual parts to modify. The next is Edge and the last is Face. Face is a good standard to use at this stage, because we only want to extend things out. Select the Face mode by either clicking on the button or pressing the H key twice until it says Editing Faces on the screen. Afterwards, select the box's right side. For a list of keyword shortcuts included with Prototype/ProBuilder, check out http://www.protoolsforunity3d.com/docs/probuilder/#keyboardShortcuts. Now, pull on the red handle to extend our brush outward. Easy enough. Note that, by default, while pulling things out, it is being done in 1 increment. This is nice when we are polishing our levels and trying to make things exactly where we want them, but right now, we are just prototyping. So, getting it out as quickly as possible is paramount to test if it's enjoyable. To help with this, we can use a feature of Unity called Unit Snapping. Undo the previous change we made by pressing Ctrl+Z. Then, move the camera over to the other side and select our longer face. Drag it 9 units out by holding down the Control key (Command on Mac). ProCore3D also has another tool out called ProGrids, which has some advanced unit snapping functionality, but we are not going to be using it. For more information on it, check out http://www.protoolsforunity3d.com/progrids/ If you'd like to change the distance traveled while using unit snapping, set it using the Edit | Snap Settings… menu. Next, drag both the sides out until they are 9 x 9 wide. To make things easier to see, select the Directional Light object in our scene via the Hierarchy tab and reduce the Light component's Intensity to . 5. So, at this point, we have a nice looking floor. However, to create our room, we are first going to need to create our ceiling. Select the floor we have created and press Ctrl + D to duplicate the brush. Once completed, change back into the Object/Top Level editing mode and move the brush so that its Position is at 0, 4, 0. Alternatively, you can click on the duplicated object and, from the Inspector tab, change the Position's Y value to 4. Go back into the sub-selection mode by hitting H to go back to the Faces mode. Then, hold down Ctrl and select all of the edges of our floor. Click on the Extrude button from the Prototype panel. This creates a new part on each of the four edges, which is by default .5 wide (change by clicking on the + button on the edge). This adds additional edges and/or faces to our object. Next, we are going to extrude again; but, rather than doing it from the menu, let's do it manually by selecting the tops of our newly created edges and holding down the Shift button and dragging it up along the Y (green) axis. We then hold down Ctrl after starting the extrusion to have it snap appropriately to fit around our ceiling. Note that the box may not look like this as soon as you let go, as Prototype needs time to compute lighting and materials, which it will mention from the bottom right part of Unity. Next, select Main Camera in the Hierarchy, hit W to switch to the Translate mode, and F to center the selection. Then, move our camera into the room. You'll notice it's completely dark due to the ceiling, but we can add light to the world to fix that! Let's add a point light by going to GameObject | Light | Point Light and position it in the center of the room towards the ceiling (In my case, it was at 4.5, 2.5. 3.5). Then, up the Range to 25 so that it hits the entire room. Finally, add a player to see how he interacts. First, delete the Main Camera object from Hierarchy, as we won't need it. Then, go into the Project tab and open up the AssetsUFPSBaseContentPrefabsPlayers folder. Drag and drop the AdvancedPlayer prefab, moving it so that it doesn't collide with the walls, floors, or ceiling, a little higher than the ground as shown in the following screenshot: Next, save our level (Chapter 3_1_CreatingGeometry) and hit the Play button. It may be a good idea for you to save your levels in such a way that you are able to go back and see what was covered in each section for each chapter, thus making things easier to find in the future. Again, remember that we can pull a weapon out by pressing the 1-5 keys. With this, we now have a simple room that we can interact with! Summary In this article, we take on the role of a level designer, who has been asked to create a level prototype to prove that our gameplay is solid. We will use the free Prototype tool to help in this endeavor. In addition, we will also learn some beginning level designs. Resources for Article: Further resources on this subject: Unity Networking – The Pong Game [article] Unity 3.x Scripting-Character Controller versus Rigidbody [article] Animations in Cocos2d-x [article]

0
0
10546

article-image-implementing-decision-trees

Packt

22 Sep 2015

4 min read

Implementing Decision Trees

Packt

22 Sep 2015

4 min read

0
0
37536

article-image-getting-started-apache-spark-dataframes

Packt

22 Sep 2015

5 min read

Getting Started with Apache Spark DataFrames

Packt

22 Sep 2015

5 min read

In this article article about Arun Manivannan’s book Scala Data Analysis Cookbook, we will cover the following recipes: Getting Apache Spark ML – a framework for large-scale machine learning Creating a data frame from CSV (For more resources related to this topic, see here.) Getting started with Apache Spark Breeze is the building block of Spark MLLib, the machine learning library for Apache Spark. In this recipe, we'll see how to bring Spark into our project (using SBT) and look at how it works internally. The code for this recipe could be found at https://github.com/arunma/ScalaDataAnalysisCookbook/blob/master/chapter1-spark-csv/build.sbt. How to do it... Pulling Spark ML into our project is just a matter of adding a few dependencies on our build.sbt file: spark-core, spark-sql, and spark-mllib: Under a brand new folder (which will be our project root), we create a new file called build.sbt. Next, let's add to the project dependencies the Spark libraries: organization := "com.packt" name := "chapter1-spark-csv" scalaVersion := "2.10.4" val sparkVersion="1.3.0" libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % sparkVersion, "org.apache.spark" %% "spark-sql" % sparkVersion, "org.apache.spark" %% "spark-mllib" % sparkVersion ) resolvers ++= Seq( "Apache HBase" at "https://repository.apache.org/content/repositories/releases", "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/" ) How it works... Spark has four major higher level tools built on top of the Spark Core: Spark Streaming, Spark ML Lib (Machine Learning), Spark SQL (An SQL interface for accessing data), and GraphX (for graph processing). The Spark Core is the heart of Spark, providing higher level abstractions in various languages for data representation, serialization, scheduling, metrics, and so on. For this recipe, we skipped streaming and GraphX and added the remaining three libraries. There’s more… Apache Spark is a cluster computing platform that claims to run about 100 times faster than Hadoop (that's a mouthful). In our terms, we could consider that as a means to run our complex logic over a massive amount of data at a blazingly high speed. The other good thing about Spark is that the programs we write are much smaller than the typical Map Reduce classes that we write for Hadoop. So, not only do our programs run faster, but it also takes lesser time to write them in the first place. Creating a data frame from CSV In this recipe, we'll look at how to create a new data frame from a Delimiter Separated Values (DSV) file. The code for this recipe could be found athttps://github.com/arunma/ScalaDataAnalysisCookbook/tree/master/chapter1-spark-csv in the DataFrameCSV class. How to do it... CSV support isn't first-class in Spark but is available through an external library from databricks. So, let's go ahead and add that up in build.sbt: After adding the spark-csv dependency, our complete build.sbt looks as follows: organization := "com.packt" name := "chapter1-spark-csv" scalaVersion := "2.10.4" val sparkVersion="1.3.0" libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % sparkVersion, "org.apache.spark" %% "spark-sql" % sparkVersion, "org.apache.spark" %% "spark-mllib" % sparkVersion, "com.databricks" %% "spark-csv" % "1.0.3" ) resolvers ++= Seq( "Apache HBase" at"https://repository.apache.org/content/repositories/releases", "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/" ) fork := true Before we create the actual data frame, there are three steps that we ought to do: create the Spark configuration, create the Spark context, and create the SQL context. SparkConf holds all of the information for running this Spark cluster. For this recipe, we are running locally, and we intend to use only two cores in the machine—local[2]: val conf = new SparkConf().setAppName("csvDataFrame").setMaster("local[2]") For this recipe, we'll be running Spark on standalone mode. Now let's load our pipe-separated file: org.apache.spark.sql.DataFrame val students=sqlContext.csvFile(filePath="StudentData.csv", useHeader=true, delimiter='|') How it works... The csvFile function of sqlContext accepts the full filePath of the file to be loaded. If the CSV has a header, then the useHeader flag will read the first row as column names. The delimiter flag, as expected, defaults to a comma, but you can override the character as needed. Instead of using the csvFile function, you can also use the load function available in the SQL context. The load function accepts the format of the file (in our case, it is CSV) and options as a map. We can specify the same parameters that we specified earlier using Map, like this: val options=Map("header"->"true", "path"->"ModifiedStudent.csv") val newStudents=sqlContext.load("com.databricks.spark.csv",options) Summary In this article, you learned in detail Apache Spark ML, a framework for large-scale machine learning. Then we saw the creation of a data frame from CSV with the help of example code. Resources for Article: Further resources on this subject: Integrating Scala, Groovy, and Flex Development with Apache Maven[article] Ridge Regression[article] Reactive Data Streams [article]

0
0
9371

article-image-putting-function-functional-programming

Packt

22 Sep 2015

27 min read

Putting the Function in Functional Programming

Packt

22 Sep 2015

27 min read

0
0
23805

article-image-using-google-maps-apis-knockoutjs

Packt

22 Sep 2015

7 min read

Using Google Maps APIs with Knockout.js

Packt

22 Sep 2015

7 min read

This article by Adnan Jaswal, the author of the book, KnockoutJS by Example, will render a map of the application and allow the users to place markers on it. The users will also be able to get directions between two addresses, both as description and route on the map. (For more resources related to this topic, see here.) Placing marker on the map This feature is about placing markers on the map for the selected addresses. To implement this feature, we will: Update the address model to hold the marker Create a method to place a marker on the map Create a method to remove an existing marker Register subscribers to trigger the removal of the existing markers when an address changes Update the module to add a marker to the map Let's get started by updating the address model. Open the MapsApplication module and locate the AddressModel variable. Add an observable to this model to hold the marker like this: /* generic model for address */ var AddressModel = function() { this.marker = ko.observable(); this.location = ko.observable(); this.streetNumber = ko.observable(); this.streetName = ko.observable(); this.city = ko.observable(); this.state = ko.observable(); this.postCode = ko.observable(); this.country = ko.observable(); }; Next, we create a method that will create and place the marker on the map. This method should take location and address model as parameters. The method will also store the marker in the address model. Use the google.maps.Marker class to create and place the marker. Our implementation of this method looks similar to this: /* method to place a marker on the map */ var placeMarker = function (location, value) { // create and place marker on the map var marker = new google.maps.Marker({ position: location, map: map }); //store the newly created marker in the address model value().marker(marker); }; Now, create a method that checks for an existing marker in the address model and removes it from the map. Name this method removeMarker. It should look similar to this: /* method to remove old marker from the map */ var removeMarker = function(address) { if(address != null) { address.marker().setMap(null); } }; The next step is to register subscribers that will trigger when an address changes. We will use these subscribers to trigger the removal of the existing markers. We will use the beforeChange event of the subscribers so that we have access to the existing markers in the model. Add subscribers to the fromAddress and toAddress observables to trigger on the beforeChange event. Remove the existing markers on the trigger. To achieve this, I created a method called registerSubscribers. This method is called from the init method of the module. The method registers the two subscribers that triggers calls to removeMarker. Our implementation looks similar to this: /* method to register subscriber */ var registerSubscribers = function () { //fire before from address is changed mapsModel.fromAddress.subscribe(function(oldValue) { removeMarker(oldValue); }, null, "beforeChange"); //fire before to address is changed mapsModel.toAddress.subscribe(function(oldValue) { removeMarker(oldValue); }, null, "beforeChange"); }; We are now ready to bring the methods we created together and place a marker on the map. Create a map called updateAddress. This method should take two parameters: the place object and the value binding. The method should call populateAddress to extract and populate the address model, and placeMarker to place a new marker on the map. Our implementation looks similar to this: /* method to update the address model */ var updateAddress = function(place, value) { populateAddress(place, value); placeMarker(place.geometry.location, value); }; Call the updateAddress method from the event listener in the addressAutoComplete custom binding: google.maps.event.addListener(autocomplete, 'place_changed', function() { var place = autocomplete.getPlace(); console.log(place); updateAddress(place, value); }); Open the application in your browser. Select from and to addresses. You should now see markers appear for the two selected addresses. In our browser, the application looks similar to the following screenshot: Displaying a route between the markers The last feature of the application is to draw a route between the two address markers. To implement this feature, we will: Create and initialize the direction service Request routing information from the direction service and draw the route Update the view to add a button to get directions Let's get started by creating and initializing the direction service. We will use the google.maps.DirectionsService class to get the routing information and the google.maps.DirectionsRenderer to draw the route on the map. Create two attributes in the MapsApplication module: one for directions service and the other for directions renderer: /* the directions service */ var directionsService; /* the directions renderer */ var directionsRenderer; Next, create a method to create and initialize the preceding attributes: /* initialise the direction service and display */ var initDirectionService = function () { directionsService = new google.maps.DirectionsService(); directionsRenderer = new google.maps.DirectionsRenderer({suppressMarkers: true}); directionsRenderer.setMap(map); }; Call this method from the mapPanel custom binding handler after the map has been created and cantered. The updated mapPanel custom binding should look similar to this: /* custom binding handler for maps panel */ ko.bindingHandlers.mapPanel = { init: function(element, valueAccessor){ map = new google.maps.Map(element, { zoom: 10 }); centerMap(localLocation); initDirectionService(); } }; The next step is to create a method that will build and fire a request to the direction service to fetch the direction information. The direction information will then be used by the direction renderer to draw the route on the map. Our implementation of this method looks similar to this: /* method to get directions and display route */ var getDirections = function () { //create request for directions var routeRequest = { origin: mapsModel.fromAddress().location(), destination: mapsModel.toAddress().location(), travelMode: google.maps.TravelMode.DRIVING }; //fire request to route based on request directionsService.route(routeRequest, function(response, status) { if (status == google.maps.DirectionsStatus.OK) { directionsRenderer.setDirections(response); } else { console.log("No directions returned ..."); } }); }; We create a routing request in the first part of the method. The request object consists of origin, destination, and travelMode. The origin and destination values are set to the locations for from and to addresses. The travelMode is set to google.maps.TravelMode.DRIVING, which, as the name suggests, specifies that we require driving route. Add the getDirections method to the return statement of the module as we will bind it to a button in the view. One last step before we can work on the view is to clear the route on the map when the user selects a new address. This can be achieved by adding an instruction to clear the route information in the subscribers we registerd earlier. Update the subscribers in the registerSubscribers method to clear the routes on the map: /* method to register subscriber */ var registerSubscribers = function () { //fire before from address is changed mapsModel.fromAddress.subscribe(function(oldValue) { removeMarker(oldValue); directionsRenderer.set('directions', null); }, null, "beforeChange"); //fire before to address is changed mapsModel.toAddress.subscribe(function(oldValue) { removeMarker(oldValue); directionsRenderer.set('directions', null); }, null, "beforeChange"); }; The last step is to update the view. Open the view and add a button under the address input components. Add click binding to the button and bind it to the getDirections method of the module. Add enable binding to make the button clickable only after the user has selected the two addresses. The button should look similar to this: <button type="button" class="btn btn-default" data-bind="enable: MapsApplication.mapsModel.fromAddress && MapsApplication.mapsModel.toAddress, click: MapsApplication.getDirections"> Get Directions </button> Open the application in your browser and select the From address and To address option. The address details and markers should appear for the two selected addresses. Click on the Get Directions button. You should see the route drawn on the map between the two markers. In our browser, the application looks similar to the following screenshot: Summary In this article, we walked through placing markers on the map and displaying the route between the markers. Resources for Article: Further resources on this subject: KnockoutJS Templates[article] Components [article] Web Application Testing [article]

0
0
5071

How-To Tutorials

article-image-introduction-penetration-testing-and-kali-linux

Packt

22 Sep 2015

4 min read

Introduction to Penetration Testing and Kali Linux

Packt

22 Sep 2015

4 min read

In this article by Juned A Ansari, author of the book, Web Penetration Testing with Kali Linux, Second Edition, the author wants us to learn about the following topics: Introduction to penetration testing An Overview of Kali Linux Using Tor for penetration testing (For more resources related to this topic, see here.) Introduction to penetration testing Penetration testing or Ethical hacking is a proactive way of testing your web applications by simulating an attack that's similar to a real attack that could occur on any given day. We will use the tools provided in Kali Linux to accomplish this. Kali Linux is the rebranded version of Backtrack and is now based on Debian-derived Linux distribution. It comes preinstalled with a large list of popular hacking tools that are ready to use with all the prerequisites installed. We will dwell deep into the tools that would help Pentest web applications, and also attack websites in a lab vulnerable to major flaws found in real world web applications. An Overview of Kali Linux Kali Linux is security-focused Linux distribution based on Debian. It's a rebranded version of the famous Linux distribution known as Backtrack, which came with a huge repository of open source hacking tools for network, wireless, and web application penetration testing. Although Kali Linux contains most of the tools from Backtrack, the main aim of Kali Linux is to make it portable so that it can be installed on devices based on the ARM architectures, such as tablets and Chromebook, which makes the tools available at your disposal with much ease. Using open source hacking tools comes with a major drawback. They contain a whole lot of dependencies when installed on Linux, and they need to be installed in a predefined sequence; authors of some tools have not released accurate documentation, which makes our life difficult. Kali Linux simplifies this process; it contains many tools preinstalled with all the dependencies and are in ready-to-use condition so that you can pay more attention for the actual attack and not on installing the tool. Updates for tools installed in Kali Linux are more frequently released, which helps you to keep the tools up to date. A noncommercial toolkit that has all the major hacking tools preinstalled to test real-world networks and applications is a dream of every ethical hacker and the authors of Kali Linux make every effort to make our life easy, which enables us to spend more time on finding the actual flaws rather than building a toolkit. Using Tor for penetration testing The main aim of a penetration test is to hack into a web application in a way that a real-world malicious hacker would do it. Tor provides an interesting option to emulate the steps that a black hat hacker uses to protect his identity and location. Although an ethical hacker trying to improve the security of a web application should be not be concerned about hiding his location, Tor will give an additional option of testing the edge security systems such as network firewalls, web application firewalls, and IPS devices. Black hat hackers try every method to protect their location and true identity; they do not use a permanent IP address and constantly change it to fool cybercrime investigators. You will find port scanning request from a different range of IP addresses, and the actual exploitation having the source IP address that you edge security systems are logging for the first time. With the necessary written approval from the client, you can use Tor to emulate an attacker by connecting to the web application from an unknown IP address that the system does not usually see connections from. Using Tor makes it more difficult to trace back the intrusion attempt to the actual attacker. Tor uses a virtual circuit of interconnected network relays to bounce encrypted data packets. The encryption is multilayered and the final network relay releasing the data to the public Internet cannot identify the source of the communication as the entire packet was encrypted and only a part of it is decrypted at each node. The destination computer sees the final exit point of the data packet as the source of the communication, thus protecting the real identify and location of the user. The following figure shows the working of Tor: Summary This article served as an introduction to penetration testing of web application and Kali Linux. At the end, we looked at how to use Tor for penetration testing. Resources for Article: Further resources on this subject: An Introduction to WEP[article] WLAN Encryption Flaws[article] What is Kali Linux [article]

0
0
24387

How-To Tutorials

article-image-internet-connected-smart-water-meter

Packt

22 Sep 2015

13 min read

Internet Connected Smart Water Meter

Packt

22 Sep 2015

13 min read

0
0
17336

How-To Tutorials

Packt

22 Sep 2015

18 min read

Cassandra Design Patterns

Packt

22 Sep 2015

18 min read

In this article by Rajanarayanan Thottuvaikkatumana, author of the book Cassandra Design Patterns, Second Edition, the author has discussed how Apache Cassandra is one of the most popular NoSQL data stores. He states this based on the research paper Dynamo: Amazon’s Highly Available Key-Value Store and the research paper Bigtable: A Distributed Storage System for Structured Data. Cassandra is implemented with best features from both of these research papers. In general, NoSQL data stores can be classified into the following groups: Key-value data store Column family data store Document data store Graph data store Cassandra belongs to the column family data store group. Cassandra’s peer-to-peer architecture avoids single point failures in the cluster of Cassandra nodes and gives the ability to distribute the nodes across racks or data centres. This makes Cassandra a linearly scalable data store. In other words, the more processing you need, the more Cassandra nodes you can add to your cluster. Cassandra’s multi data centre support makes it a perfect choice to replicate the data stores across data centres for disaster recovery, high availability, separating transaction processing, analytical environments, and for building resiliency into the data store infrastructure. Design patterns in Cassandra The term “design patterns” is a highly misinterpreted term in the software development community. In an extremely general sense, it is a set of solutions for some known problems in quite a specific context. It is used in this book to describe a pattern of using certain features of Cassandra to solve some real-world problems. This book is a collection of such design patterns with real-world examples. Coexistence patterns Cassandra is one of the highly successful NoSQL data stores, which is greatly similar to the traditional RDBMS. Cassandra column families (also known as Cassandra tables), in a logical perspective, have a similarity with RDBMS-based tables in the view of the users, even though the underlying structure of these tables are totally different. Because of this, Cassandra is best fit to be deployed along with the traditional RDBMS to solve some of the problems that RDBMS is not able to handle. The caveat here is that because of the similarity of RDBMS tables and Cassandra column families in the view of the end users, many users and data modelers try to use Cassandra in the exact the same way as the RDBMS schema is being modeled, used, and getting into serious deployment issues. How do you prevent such pitfalls? The key here is to understand the differences in a theoretical perspective as well as in a practical perspective, and follow best practices prescribed by the creators of Cassandra. Where do you start with Cassandra? The best place to look at is the new application development requirements and take it from there. Look at the cases where there is a need to normalize the RDBMS tables and keep all the data items together, which would have got distributed if you were to design the same solution in RDBMS. Instead of thinking from the pure data model perspective, start thinking in terms of the application's perspective. How the data is generated by the application, what are the read requirements, what are the write requirements, what is the response time expected out of some of the use cases, and so on. Depending on these aspects, design the data model. In the big data world, the application becomes the first class citizen and the data model leaves the driving seat in the application design. Design the data model to serve the needs of the applications. In any organization, new reporting requirements come all the time. The major challenge in to generate reports is the underlying data store. In the RDBMS world, reporting is always a challenge. You may have to join multiple tables to generate even simple reports. Even though the RDBMS objects such as views, stored procedures, and indexes maybe used to get the desired data for the reports, when the report is being generated, the query plan is going to be very complex most of the time. The consumption of processing power is another need to consider when generating such reports on the fly. Because of these complexities, many times, for reporting requirements, it is common to keep separate tables containing data exported from the transactional tables. This is a great opportunity to start with NoSQL stores like Cassandra as a reporting data store. Data aggregation and summarization are common requirements in any organization. This helps to control the data growth by storing only the summary statistics and moving the transactional data into archives. Many times, this aggregated and summarized data is used for statistical analysis. Making the summary accurate and easily accessible is a big challenge. Most of the time, data aggregation and reporting goes hand in hand. The aggregated data is heavily used in reports. The aggregation process speeds up the queries to a great extent. This is another place where you can start with NoSQL stores like Cassandra. The coexistence of RDBMS and NoSQL data stores like Cassandra is very much possible, feasible, and sensible; and this is the only way to get started with the NoSQL movement, unless you embark on a totally new product development from scratch. In summary, this section of the book discusses about some design patterns related to de-normalization, reporting, and aggregation of data using Cassandra as the preferred NoSQL data store. RDBMS migration patterns A big bang approach to any kind of technology migration is not advisable. A series of deliberations have to happen before the eventual and complete change over. Migration from RDBMS to Cassandra is not different at all. Any new technology replacing an old one must coexist harmoniously, at least for a short period of time. This gives a lot of confidence on the new technology to the stakeholders. Many technology pundits give various approaches on the RDBMS to NoSQL migration strategies. Many such guidelines are specific to the particular NoSQL data stores giving attention to specific areas, and most of the time, this will end up on the process rather than the technology. The migration from RDBMS to Cassandra is not an easy task. Mainly because the RDBMS-based systems are really time tested and trust worthy in most of the organizations. So, migrating from such a robust RDBMS-based system to Cassandra is not going to be easy for anyone. One of the best approaches to achieve this goal is to exploit some of the new or unique features in Cassandra, which many of the traditional RDBMS don't have. This also prevents the usage of Cassandra just like any other RDBMS. Cassandra is unique. Cassandra is not an RDBMS. The approach of banking on the unique features is not only applicable to the RDBMS to Cassandra migration, but also to any migration from one paradigm to another. Some of the design patterns that are discussed in this section of the book revolve around very simple and important features of Cassandra, but have profound application potential when designing the next generation NoSQL data stores using Cassandra. A wise usage of these unique features in Cassandra will give a head start on the eventual and complete migration from RDBMS. The modeling of collection objects in RDBMS is a real pain, because multiple tables are to be defined and a join is required to access data. Many RDBMS offer this by providing capability to define user-defined data types, but there is absolutely no standardization at all in this space. Collection objects are very commonly seen in the real-world applications. A list of actions, tuple of related values, set of objects, dictionaries, and things like that come quite often in applications. Cassandra has elegant ways to model this because they are data types in column families. Counting is a very commonly required process in many business processes and applications. In RDBMS, this has to be modeled as integers or long numbers, but many times, applications make big mistakes in using them in wrong ways. Cassandra has a counter data type in the column family that alleviates this problem. Getting rid of unwanted records from an RDBMS table is not an automatic process. When some application events occur, they have to be removed by application programs or through some other means. But in many situations, many data items will have a preallocated time to live. They should go away without the intervention of any external events. Cassandra has a way to assign time-to-live (TTL) attribute to data items. By making use of TTL, data items get removed without any other external event's intervention. All the design patterns covered in this section of the book revolve around some of the new features of Cassandra that will make the migration from RDBMS to Cassandra an easy task. Cache migration pattern Database access whether it is from RDBMS or other highly distributed NoSQL data stores is always an input/output (I/O) intensive operation. It makes perfect sense to cache the frequently used, but reasonably static data for fast access for the applications consuming this data. In such situations, the in-memory cache is preferred to the repeated database access for each request. Using cache is not always a pleasant experience. Getting into really weird problems such as data loss, data getting out of sync with its source and other data integrity problems are very common. It is very common to see wrong components coming into the enterprise solution stack all the time for various reasons. Overlooking on some of the features and adopting the technology without much background work is a very common pitfall. Many a times, the use of cache comes into the solution stack to reduce the latency of the responses. Once the initial results are favorable, more and more data will get tossed into the cache. Slowly, this will become a practice to see that more and more data is getting into cache. Now is the time when problems start popping up one by one. Pure in-memory cache solutions are favored by everybody, by the virtue of its ability to serve the data quickly until you start loosing data. This is because of the faults in the system, along with application and node crashes. Cache serves data much faster than being served from other data stores. But if the caching solution in use is giving data integrity problems, it is better to migrate to NoSQL data stores like Cassandra. Is Cassandra faster than the in-memory caching solutions? The obvious answer is no. But it is not as bad as many think. Cassandra can be configured to serve fast reads, and bonus comes in the form of high data integrity with strong replication capabilities. Cache is good as long as it serves its purpose without any data loss or any other data integrity issues. Emphasizing on the use case of the key/value type cache and various methods of cache to NoSQL migration are discussed in this section of the book. Cassandra cannot be used as a replacement for cache in terms of the speed of data access. But when it comes to data integrity, Cassandra shines all the time with its tuneable consistency feature. With a continual tuning and manipulating data with clean and well-written application code, data access can be improved to a great level, and it will be much better than many other data stores. The design pattern covered in this section of the book gives some guidance on migrating from caching solutions to Cassandra, if this is a must. CAP patterns When it comes to large-scale Internet applications or web services, popularly known as the Internet of Things (IoT) applications, the number of components are huge and the way they are distributed is beyond imagination. There will be hundreds of application servers, hundreds of data store nodes, and many other components in the whole ecosystem. In such a scenario, for doing an atomic transaction by getting an agreement from all the components involved is, for all practical purposes, impossible. Consistency, availability, and partition tolerance are three important guarantees, popularly known as CAP guarantees that any distributed computing systems should offer even though all is not possible simultaneously. In the IoT applications, the distribution of the application nodes is unavoidable. This means that the possibility of network partition is pretty much there. So, it is mandatory to give the P guarantee. Now, the question is whether to forfeit the C guarantee or the A guarantee. At this stage, the situation is not as grave as portrayed in the CAP Theorem conjectured by Eric Brewer. For all the use cases in a given IoT application, there is no need of having 100% of C guarantee and 100% of A guarantee. So, depending on the need of the level of A guarantee, the C guarantee can be tuned. In other words, it is called tunable consistency. Depending on the way data is ingested into Cassandra, and the way it is consumed from Cassandra, tuning is possible to give best results for the appropriate read and write requirements of the applications. In some applications, the speed at which the data is written will be very high. In other words, the velocity of the data ingestion into Cassandra is very high. This falls into the write-heavy applications. In some applications, the need to read data quickly will be an important requirement. This is mainly needed in the applications where there is a lot of data processing required. Data analytics applications, batch processing applications, and so on fall under this category. These fall into the read-heavy applications. Now, there is a third category of applications where there is an equal importance for fast writes as well as fast reads. These are the kind of applications where there is a constant inflow of data, and at the same time, there is a need to read the data by clients for various purposes. This falls into the read-write balanced applications. The consistency level requirements for all the previous three types of applications are totally different. There is no one way to tune so that it is optimal for all the three types of applications. All the three applications' consistency levels are to be tuned differently from use case to use case. In this section of the book, various design patterns related to applications with the needs of fast writes, fast reads, and moderate write and read are discussed. All these design patterns revolve around using the tuneable consistency parameters of Cassandra. Whether it is for write or read and if the consistency levels are set high, the availability levels will be low and vice versa. So, by making use of the consistency level knob, the Cassandra data store can be used for various types of writing and reading use cases. Temporal patterns In any applications, the usage of data that varies over the period of time is called as temporal data, which is very important. Temporal data is needed wherever there is a need to maintain chronology. There are so many applications in which there is a huge need for storage, retrieval, and processing of data that is tied to time. The biggest challenge in dealing with temporal data stored in a data store is that they are hugely used for analytical purposes and retrieving the data, based on various sort orders in terms of time. So, the data stores that are used to capture the temporal data should be capable of storing the data strictly adhering to the chronology. There are so many usage patterns that are seen in the real world that fall into showing temporal behavior. For the classification purpose in this book, they are bucketed into three. The first one is the general time series category. The second one is the log category, such as in an audit log, a transaction log, and so on. The third one is the conversation category, such as in the conversation messages of a chat application. There is relevance in this classification, because these are commonly used across in many of the applications. In many of the applications, these are really cross cutting concerns; and designers underestimate this aspect; and finally, many of the applications will have different data stores capturing this temporal data. There is a need to have a common strategy dealing with temporal data that fall in these three commonly seen categories in an enterprise wide solution architecture. In other words, there should be a uniform way of capturing temporal data; there should be a uniform way of processing temporal data; and there should be a commonly used set of tools and libraries to manage the temporal data. Out of the three design patterns that are discussed in this section of the book, the first Time Series pattern is a general design pattern that covers the most general behavior of any kind of temporal data. The next two design patterns namely Log pattern and Conversation pattern are two special cases of the first design pattern. This section of the book covers the general nature of temporal data, some specific instances of such data items in the real-world applications, and why Cassandra is the best fit as a NoSQL data store to persist the temporal data. Temporal data comes quite often in many use cases of lots of applications. Data modeling of temporal data is very important in the Cassandra perspective for optimal storage and quick access of the data. Some common design patterns to model temporal data have been covered in this section of the book. By focusing on some very few aspects, such as the partition key, primary key, clustering column and the number of records that gets stored in a wide row of Cassandra, very effective and high performing temporal data models can be built. Analytical patterns The 3Vs of big data namely Volume, Variety, and Velocity pose another big challenge, which is the analysis of the data stored in NoSQL data stores, such as Cassandra. What are the analytics use cases? How can the distributed data be processed? What are the data transformations that are typically seen in the applications? These are the topics covered in this section of the book. Unlike other sections of this book, the focus is shifted from Cassandra to other technologies like Apache Hadoop, Hadoop MapReduce, and Apache Spark to introduce the big data analytics tool space. The design patterns such as Map/Reduce Pattern and Transformation Pattern are very commonly seen in the data analytics world. Cassandra with Apache Spark has good compatibility, and is a very ideal tool set in the data analysis use cases. This section of the book covers some data analysis aspects and mainly discusses about data processing. Data transformation is one of the major activity in data processing. Out of the many data processing patterns, Map/Reduce Pattern deserves a special mention, because it is being used in so many batch processing and analysis use cases, dealing with big data. Spark has been chosen as the tool of choice to explain the data processing activities. This section explains how a Map/Reduce kind of data processing task can be done using Cassandra. Spark has also been discussed, which is very powerful to perform online data analysis. This section of the book also covers some of the commonly seen data transformations that are used in the data processing applications. Summary Many Cassandra design patterns have been covered in this book. If the design patterns are not being used in any real-world applications, it has only theoretical value. To give a practical approach to the applicability of these design patterns, an end-to-end application is taken as a case point and described as the last chapter of the book, which is used as a vehicle to explain the applicability of the Cassandra design patterns discussed in the earlier sections of the book. Users love Cassandra because of its SQL-like interface CQL. Also, its features are very closely related to the RDBMS even though the paradigm is totally new. Application developers love Cassandra because of the plethora of drivers available in the market so that they can write applications in their preferred programming language. Architects love Cassandra because they can store structured, semi-structured, and unstructured data in it. Database administers love Cassandra because it comes with almost no maintenance overhead. Service managers love Cassandra because of the wonderful monitoring tools available in the market. CIOs love Cassandra because it gives value for their money. And Cassandra works! An application based on Cassandra will be perfect only if its features are used in the right way, and this book is an attempt to guide the Cassandra community in this direction. Resources for Article: Further resources on this subject: Cassandra Architecture [article] Getting Up and Running with Cassandra [article] Getting Started with Apache Cassandra [article]

0
0
12821

article-image-embedded-linux-and-its-elements

Packt

22 Sep 2015

6 min read

Embedded Linux and Its Elements

Packt

22 Sep 2015

6 min read

In this article by Chris Simmonds, author of the book, Mastering Embedded Linux Programming, we'll cover the introduction of embedded Linux and its elements. (For more resources related to this topic, see here.) Why is embedded Linux popular? Linux first became a viable choice for embedded devices around 1999. That was when Axis (www.axis.com) released their first Linux-powered network camera and Tivo (www.tivo.com) their first DVR (Digital video recorder). Since 1999, Linux has become ever more popular, to the point that today it is the operating system of choice for many classes of product. As of this writing, in 2015, there are about 2 billion devices running Linux. That includes a large number of smart phones running Android, set top boxes and smart TVs and WiFi routers. Not to mention a very diverse range of devices such as vehicle diagnostics, weighing scales, industrial devices and medical monitoring units that ship in smaller volumes. So, why does your TV run Linux? At first glance, the function of a TV is simple: it has to display a stream of video on a screen. Why is a complex Unix-based operating system like Linux necessary? The simple answer is Moore's Law: Gordon Moore, co-founder of Intel stated in 1965 that the density of components on a chip will double every 2 years. That applies to the devices that we design and use in our everyday lives just as much as it does to desktops, laptops and servers. A typical SoC (System on Chip) at the heart of current devices contains many function block and has a technical reference manual that stretches to thousands of pages. Your TV is not simply displaying a video stream as the old analog sets used to. The stream is digital, possibly encrypted, and it needs processing to create an image. Your TV is (or soon will be) connected to the Internet. It can receive content from smart phones, tablets and home media servers. It can be (or soon will) used to play games. And so on and so on. You need a full operating system to manage all that hardware. Here are some points that drive the adoption of Linux: Linux has the functionality required. It has a good scheduler, a good network stack, support for many kinds of storage media, good support for multimedia devices, and so on. It ticks all the boxes. Linux has been ported to a wide range of processor architectures, including those important for embedded use: ARM, MIPS, x86 and PowerPC. Linux is open source. So you have the freedom to get the source code and modify it to meet your needs. You, or someone in the community, can create a board support package for your particular SoC, board or device. You can add protocols, features, technologies that may be missing from the mainline source code. Or, you can remove features that you don't need in order to reduce memory and storage requirements. Linux is flexible. Linux has an active community. In the case of the Linux kernel, very active. There is a new release of the kernel every 10 to 12 weeks, and each release contains code from around 1000 developers. An active community means that Linux is up to date and supports current hardware, protocols and standards. Open source licenses guarantee that you have access to the source code. There is no vendor tie-in. There is no vendor, no license fees, no restrictive NDAs, EULAs, and so on. Open source software is free in both senses: it gives you the freedom to adapt it for our own use and there is nothing to pay. For these reasons, Linux is an ideal choice for complex devices. But there are a few caveats I should mention here. Complexity makes it harder to understand. Coupled with the fast moving development process and the decentralized structures of open source, you have to put some effort into learning how to use it and to keep on re-learning as it changes. I hope that this article will help in the process. Elements of embedded Linux Every project begins by obtaining, customizing and deploying these four elements: Toolchain, Bootloader, Kernel, and Root filesystem. Toolchain The toolchain is the first element of embedded Linux and the starting point of your project. It should be constant throughout the project, in other words, once you have chosen your toolchain it is important to stick with it. Changing compilers and development libraries in an inconsistent way during a project will lead to subtle bugs. Obtaining a toolchain can be as simple as downloading and installing a package. But, the toolchain itself is a complex thing. Linux toolchains are almost always based on components from the GNU project (http://www.gnu.org). It is becoming possible to create toolchains based on LLVM/Clang (http://llvm.org). Bootloader The bootloader is the second element of Embedded Linux. It is the part that starts the system up and loads the operating system kernel. When considering which bootloader to focus on, there is one that stands out: U-Boot. In an embedded Linux system the bootloader has two main jobs: to start the system running and to load a kernel. In fact the first job is in somewhat subsidiary to the second in that it is only necessary to get as much of the system working as is necessary to load the kernel. Kernel The kernel is the third element of Embedded Linux. It is the component that is responsible for managing resources and interfacing with hardware, and so affects almost every aspect of your final software build. Usually it is tailored to your particular hardware configuration. The kernel has three main jobs to do: to manage resources, to interface to hardware, and to provide an API that offers a useful level of abstraction to user space programs, as summarized in the following diagram: Root filesystem The root filesystem is the fourth and final element of embedded Linux. The first objective is to create a minimal root filesystem that can give us a shell prompt. Then using that as a base we will add scripts to start other programs up, and to configure a network interface and user permissions. Knowing how to build the root filesystem from scratch is a useful skill. Summary In this article we briefly saw the introduction for embedded Linux and its elements. Resources for Article: Further resources on this subject: Virtualization[article] An Introduction to WEP [article] Raspberry Pi LED Blueprints [article]

0
0
3613

article-image-enhancing-your-blog-advanced-features

Packt

22 Sep 2015

8 min read

Enhancing Your Blog with Advanced Features

Packt

22 Sep 2015

8 min read

In this article by Antonio Melé, the author of the Django by Example book shows how to use the Django forms, and ModelForms. You will let your users share posts by e-mail, and you will be able to extend your blog application with a comment system. You will also learn how to integrate third-party applications into your project, and build complex QuerySets to get useful information from your models. In this article, you will learn how to add tagging functionality using a third-party application. (For more resources related to this topic, see here.) Adding tagging functionality After implementing our comment system, we are going to create a system for adding tags to our posts. We are going to do this by integrating in our project a third-party Django tagging application. django-taggit is a reusable application that primarily offers you a Tag model, and a manager for easily adding tags to any model. You can take a look at its source code at https://github.com/alex/django-taggit. First, you need install django-taggit via pip by running the pip install django-taggit command. Then, open the settings.py file of the project, and add taggit to your INSTALLED_APPS setting as the following: INSTALLED_APPS = ( # ... 'mysite.blog', 'taggit', ) Then, open the models.py file of your blog application, and add to the Post model the TaggableManager manager, provided by django-taggit as the following: from taggit.managers import TaggableManager # ... class Post(models.Model): # ... tags = TaggableManager() You just added tags for this model. The tags manager will allow you to add, retrieve, and remove tags from the Post objects. Run the python manage.py makemigrations blog command to create a migration for your model changes. You will get the following output: Migrations for 'blog': 0003_post_tags.py: Add field tags to post Now, run the python manage.py migrate command to create the required database tables for django-taggit models and synchronize your model changes. You will see an output indicating that the migrations have been applied: Operations to perform: Apply all migrations: taggit, admin, blog, contenttypes, sessions, auth Running migrations: Applying taggit.0001_initial... OK Applying blog.0003_post_tags... OK Your database is now ready to use django-taggit models. Open the terminal with the python manage.py shell command, and learn how to use the tags manager. First, we retrieve one of our posts (the one with the ID as 1): >>> from mysite.blog.models import Post >>> post = Post.objects.get(id=1) Then, add some tags to it and retrieve its tags back to check that they were successfully added: >>> post.tags.add('music', 'jazz', 'django') >>> post.tags.all() [<Tag: jazz>, <Tag: django>, <Tag: music>] Finally, remove a tag and check the list of tags again: >>> post.tags.remove('django') >>> post.tags.all() [<Tag: jazz>, <Tag: music>] This was easy, right? Run the python manage.py runserver command to start the development server again, and open http://127.0.0.1:8000/admin/taggit/tag/ in your browser. You will see the admin page with the list of the Tag objects of the taggit application: Navigate to http://127.0.0.1:8000/admin/blog/post/ and click on a post to edit it. You will see that the posts now include a new Tags field as the following one where you can easily edit tags: Now, we are going to edit our blog posts to display the tags. Open the blog/post/list.html template and add the following HTML code below the post title: <p class="tags">Tags: {{ post.tags.all|join:", " }}</p> The join template filter works as the Python string join method to concatenate elements with the given string. Open http://127.0.0.1:8000/blog/ in your browser. You will see the list of tags under each post title: Now, we are going to edit our post_list view to let users see all posts tagged with a tag. Open the views.py file of your blog application, import the Tag model form django-taggit, and change the post_list view to optionally filter posts by tag as the following: from taggit.models import Tag def post_list(request, tag_slug=None): post_list = Post.published.all() if tag_slug: tag = get_object_or_404(Tag, slug=tag_slug) post_list = post_list.filter(tags__in=[tag]) # ... The view now takes an optional tag_slug parameter that has a None default value. This parameter will come in the URL. Inside the view, we build the initial QuerySet, retrieving all the published posts. If there is a given tag slug, we get the Tag object with the given slug using the get_object_or_404 shortcut. Then, we filter the list of posts by the ones which tags are contained in a given list composed only by the tag we are interested in. Remember that QuerySets are lazy. The QuerySet for retrieving posts will only be evaluated when we loop over the post list to render the template. Now, change the render function at the bottom of the view to pass all the local variables to the template using locals(). The view will finally look as the following: def post_list(request, tag_slug=None): post_list = Post.published.all() if tag_slug: tag = get_object_or_404(Tag, slug=tag_slug) post_list = post_list.filter(tags__in=[tag]) paginator = Paginator(post_list, 3) # 3 posts in each page page = request.GET.get('page') try: posts = paginator.page(page) except PageNotAnInteger: # If page is not an integer deliver the first page posts = paginator.page(1) except EmptyPage: # If page is out of range deliver last page of results posts = paginator.page(paginator.num_pages) return render(request, 'blog/post/list.html', locals()) Now, open the urls.py file of your blog application, and make sure you are using the following URL pattern for the post_list view: url(r'^$', post_list, name='post_list'), Now, add another URL pattern as the following one for listing posts by tag: url(r'^tag/(?P<tag_slug>[-w]+)/$', post_list, name='post_list_by_tag'), As you can see, both the patterns point to the same view, but we are naming them differently. The first pattern will call the post_list view without any optional parameters, whereas the second pattern will call the view with the tag_slug parameter. Let’s change our post list template to display posts tagged with a specific tag, and also link the tags to the list of posts filtered by this tag. Open blog/post/list.html and add the following lines before the for loop of posts: {% if tag %} <h2>Posts tagged with "{{ tag.name }}"</h2> {% endif %} If the user is accessing the blog, he will the list of all posts. If he is filtering by posts tagged with a specific tag, he will see this information. Now, change the way the tags are displayed into the following: <p class="tags"> Tags: {% for tag in post.tags.all %} <a href="{% url "blog:post_list_by_tag" tag.slug %}">{{ tag.name }}</a> {% if not forloop.last %}, {% endif %} {% endfor %} </p> Notice that now we are looping through all the tags of a post, and displaying a custom link to the URL for listing posts tagged with this tag. We build the link with {% url "blog:post_list_by_tag" tag.slug %} using the name that we gave to the URL, and the tag slug as parameter. We separate the tags by commas. The complete code of your template will look like the following: {% extends "blog/base.html" %} {% block title %}My Blog{% endblock %} {% block content %} <h1>My Blog</h1> {% if tag %} <h2>Posts tagged with "{{ tag.name }}"</h2> {% endif %} {% for post in posts %} <h2><a href="{{ post.get_absolute_url }}">{{ post.title }}</a></h2> <p class="tags"> Tags: {% for tag in post.tags.all %} <a href="{% url "blog:post_list_by_tag" tag.slug %}">{{ tag.name }}</a> {% if not forloop.last %}, {% endif %} {% endfor %} </p> <p class="date">Published {{ post.publish }} by {{ post.author }}</p> {{ post.body|truncatewords:30|linebreaks }} {% endfor %} {% include "pagination.html" with page=posts %} {% endblock %} Open http://127.0.0.1:8000/blog/ in your browser, and click on any tag link. You will see the list of posts filtered by this tag as the following: Summary In this article, you added tagging to your blog posts by integrating a reusable application. The book Django By Example, hands-on-guide will also show you how to integrate other popular technologies with Django in a fun and practical way. Resources for Article: Further resources on this subject: Code Style in Django[article] So, what is Django? [article] Share and Share Alike [article]

0
0
4402

How-To Tutorials

Packt

22 Sep 2015

16 min read

R ─ Classification and Regression Trees

Packt

22 Sep 2015

16 min read

"The classifiers most likely to be the best are the random forest (RF) versions, the best of which (implemented in R and accessed via caret), achieves 94.1 percent of the maximum accuracy overcoming 90 percent in the 84.3 percent of the data sets." – Fernández-Delgado et al (2014) "You can't see the forest for the trees!" – An old saying (For more resources related to this topic, see here.) In this article by Cory Lesmeister, the author of Mastering Machine Learning with R, the first item of discussion is the basic decision tree, which is both simple to build and understand. However, the single decision tree method does not perform as well as the other methods such as support vector machines or neural networks. Therefore, we will discuss the creation of multiple, sometimes hundreds of, different trees with their individual results combined, leading to a single overall prediction. The first quote written above is from Fernández-Delgado et al in the Journal of Machine Learning Research and is meant to set the stage that the techniques in this article are quite powerful, particularly when used for the classification problems. Certainly, they are not always the best solution, but they do provide a good starting point. Regression trees For an understanding of the tree-based methods, it is probably easier to start with a quantitative outcome and then move on to how it works on a classification problem. The essence of a tree is that the features are partitioned, starting with the first split that improves the residual sum of squares the most. These binary splits continue until the termination of the tree. Each subsequent split/partition is not done on the entire dataset but only on the portion of the prior split that it falls under. This top-down process is referred as recursive partitioning. It is also a process that is greedy, a term you may stumble on in reading about the machine learning methods. Greedy means that in each split in the process, the algorithm looks for the greatest reduction in the residual sum of squares without a regard to how well it will perform in the later partitions. The result is that you may end up with a full tree of unnecessary branches, leading to a low bias but high variance. To control this effect, you need to appropriately prune the tree to an optimal size after building a full tree. The following figure provides a visual of the technique in action. The data is hypothetical with 30 observations, a response ranging from 1 to 10, and two predictor features, both ranging in value from 0 to 10 named X1 and X2. The tree has three splits that lead to four terminal nodes. Each split is basically an if or then statement or uses an R syntax, ifelse(). In the first split, if X1 < 3.5, then the response is split into 4 observations with an average value of 2.4 and the remaining 26 observations. This left branch of 4 observations is a terminal node as any further splits would not substantially improve the residual sum of squares. The predicted value for the 4 observations in that partition of the tree becomes the average. The next split is at X2 < 4 and finally X1 < 7.5. An advantage of this method is that it can handle the highly nonlinear relationships; but can you see a couple of potential problems? The first issue is that an observation is given the average of the terminal node that it falls under. This can hurt the overall predictive performance (high bias). Conversely, if you keep partitioning the data further and further to achieve a low bias, high variance can become an issue. As with the other methods, you can use cross-validation to select the appropriate tree size. Regression Tree with 3 splits and 4 terminal nodes and the corresponding node average and number of observations. Classification trees Classification trees operate under the same principal as regression trees except that the splits are not determined by the residual sum of squares but an error rate. The error rate used is not what you would expect, where the calculation is simply misclassified observations divided by the total observations. As it turns out, when it comes to tree splitting, a misclassification rate by itself may lead to a situation where you can gain information with a further split but not improve the misclassification rate. Let's look at an example. Suppose we have a node—let's call it N0 where you have 7 observations labeled No and 3 observations labeled Yes, and we say that the misclassified rate is 30 percent. With this in mind, let's calculate a common alternative error measure called Gini index. The formula for a single node Gini index is as follows: Gini = 1 – (probability of Class 1)2 – (probability of Class 2)2. For N0, the Gini is 1 - (.7)2 - (.3)2, which is equal to 0.42, versus the misclassification rate of 30 percent. Taking this example further, we will now create an N1 node with 3 of Class 1 and none of Class 2 along with N2, which has 4 observations from Class 1 and 3 from Class 2. Now, the overall misclassification rate for this branch of the tree is still 30 percent, but look at the following to see how the overall Gini index has improved: Gini(N1) = 1 – (3/3)2 – (0/3)2 = 0. Gini(N2) = 1 – (4/7)2 – (3/7)2 = 0.49. The new Gini index = (proportion of N1 x Gini(N1)) + (proportion of N2 x Gini(N2)) which is equal to (.3 x 0) + (.7 x 0.49) or 0.343. By doing a split on a surrogate error rate, we actually improved our model impurity by reducing it from 0.42 to 0.343, whereas the misclassification rate did not change. This is the methodology used by the rpart() package. Random forest To greatly improve our model's predictive ability, we can produce numerous trees and combine the results. The random forest technique does this by applying two different tricks in the model development. The first is the use of bootstrap aggregation or bagging as it is called. In bagging, an individual tree is built on a sample of dataset, roughly two-thirds of the total observations. It is important to note that the remaining one-third is referred to as Out of Bag(OOB). This is repeated for dozens or hundreds of times and the results are averaged. Each of these trees is grown and not pruned based on any error measure and this means that the variance of each of these individual trees is high. However, by averaging the results, you can reduce the variance without increasing the bias. The next thing that the random forest brings to the table is that concurrently with the random sample of the data, it also takes a random sample of the input features at each split. In the randomForest package, we will use the default random number of the sampled predictors, which is the square root of the total predictors for classification problems and total predictors divided by 3 for regression. The number of predictors that the algorithm randomly chooses at each split can be changed via the model tuning process. By doing this random sampling of the features at each split and incorporating it into the methodology, you mitigate the effect of a highly correlated predictor in becoming the main driver in all of your bootstrapped trees and preventing you from reducing the variance that you hoped to achieve with bagging. The subsequent averaging of the trees that are less correlated to each other than if you only performed bagging, is more generalizable and more robust to outliers. Gradient boosting The boosting methods can become extremely complicated for you to learn and understand, but you should keep in mind about what is fundamentally happening behind the curtain. The main idea is to build an initial model of some kind (linear, spline, tree, and so on.) called the base-learner, examine the residuals, and fit a model based on these residuals around the so-called loss function. A loss function is merely the function that measures the discrepancy between the model and desired prediction, for example, a squared error for the regression or the logistic function for the classification. The process continues until it reaches some specified stopping criterion. This is like the student who takes a practice exam and gets 30 out of 100 questions wrong and as a result, studies only those 30 questions that were missed. The next practice exam they get 10 out of these 30 wrong and so only focus on these 10 questions and so on. If you would like to explore the theory behind this further, a great resource for you is available in Frontiers in Neurorobotics, Gradient boosting machines, a tutorial, Natekin A., Knoll A., (2013), at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3885826/. As previously mentioned, boosting can be applied to many different base learners, but here we will only focus on the specifics of tree-based learning. Each tree iteration is small and we will determine how small it is with one of the tuning parameters referred to as interaction depth. In fact, it may be as small as one split, which is referred to as a stump. Trees are sequentially fit to the residuals according to the loss function up to the number of trees that we specified (our stopping criterion). There is another tuning parameter that we will need to identify and that is shrinkage. You can think of shrinkage as the rate at which your model is learning generally and specifically, as the contribution of each tree or stump to the model. This learning rate acts as a regularization parameter. The other thing about our boosting algorithm is that it is stochastic, meaning that it adds randomness by taking a random sample of our data at each tree. Introducing some randomness to a boosted model usually improves the accuracy and speed and reduces overfitting (Friedman 2002). As you may have guessed, tuning these parameters can be quite a challenge. These parameters can interact with each other and if you just tinker with one without considering the other, your model may actually perform worse. The caret package will help us in this endeavor. Business case The overall business objective in this situation is to see if we can improve the predictive ability for some of the cases. For regression, we will visit the prostate cancer data. For classification purposes, we will utilize both the breast cancer biopsy data and Pima Indian Diabetes data. Both random forests and boosting will be applied to all the three datasets. The simple tree method will be used only on the breast and prostate cancer sets. Regression tree We will jump right into the prostate data set, but first let's load the necessary R package, as follows: > library(rpart) #classification and regression trees > library(partykit) #treeplots > library(MASS) #breast and pima indian data > library(ElemStatLearn) #prostate data > library(randomForest) #random forests > library(gbm) #gradient boosting > library(caret) #tune hyper-parameter First, we will do regression on the prostate data. This involves calling the dataset, coding the gleason score as an indicator variable using the ifelse() function, and creating a test and training set. The training set will be pros.train and the test set will be pros.test, as follows: > data(prostate) > prostate$gleason = ifelse(prostate$gleason == 6, 0, 1) > pros.train = subset(prostate, train==TRUE)[,1:9] > pros.test = subset(prostate, train==FALSE)[,1:9] To build a regression tree on the training data, we will use the following rpart() function from R's party package. The syntax is quite similar to what we used in the other modeling techniques: > tree.pros <- rpart(lpsa~., data=pros.train) We can call this object using the print() function and cptable and then examine the error per split to determine the optimal number of splits in the tree, as follows: > print(tree.pros$cptable) CP nsplit rel error xerror xstd 1 0.35852251 0 1.0000000 1.0364016 0.1822698 2 0.12295687 1 0.6414775 0.8395071 0.1214181 3 0.11639953 2 0.5185206 0.7255295 0.1015424 4 0.05350873 3 0.4021211 0.7608289 0.1109777 5 0.01032838 4 0.3486124 0.6911426 0.1061507 6 0.01000000 5 0.3382840 0.7102030 0.1093327 This is a very important table to analyze. The first column labeled CP is the cost complexity parameter, which states that the second column, nsplit, is the number of splits in the tree. The rel error column stands for relative errors and is the residual sum of squares for that number of splits divided by the residual sum of squares for no splits (RSS(k)/RSS(0). Both xerror and xstd are based on a ten-fold cross-validation with xerror being the average error and xstd being the standard deviation of the cross-validation process. We can see that four splits produced slightly less errors using cross-validation while five splits produced the lowest error on the full dataset. You can examine this using the plotcp() function, as follows: > plotcp(tree.pros) The following is the output of the preceding command: The plot shows us the relative error by the tree size with the corresponding error bars. The horizontal line on the plot is the upper limit of the lowest standard error. Selecting the tree size 5, which is four splits, we can build a new tree object where xerror is minimized by pruning our tree accordingly—first creating an object for cp associated with the pruned tree from the table. Then the prune() function handles the rest as follows: > cp = min(tree.pros$cptable[5,]) > prune.tree.pros <- prune(tree.pros, cp = cp) With this done, you can plot and compare the full and pruned trees. The tree plots produced by the partykit package are much better than those produced by the party package. You can simply use the as.party() function as a wrapper in the plot() function: > plot(as.party(tree.pros)) The output of the preceding command is as follows: > plot(as.party(prune.tree.pros)) The following is the output of the preceding command: Note that the splits are exactly the same in the two trees with the exception of the last split, which includes the age variable for the full tree. Interestingly, both the first and second splits in the tree are related to the log of cancer volume lcavol. These plots are quite informative as they show the splits, nodes, observations per node, and box plots of the outcome that we are trying to predict. Let's see how well the pruned tree performs on the test data. What we will do is create an object of the predicted values using the predict() function by incorporating the test data. Then, we will calculate the errors as the predicted values minus the actual values and finally the mean of the squared errors, as follows: > party.pros.test <- predict(prune.tree.pros, newdata=pros.test) > rpart.resid = party.pros.test - pros.test$lpsa #calculate residuals > mean(rpart.resid^2) #caluclate MSE [1] 0.5267748 One can look at the tree plots that we produced and easily explain what are the primary drivers behind the response. As mentioned in the introduction, the trees are easy to interpret and explain, which, in many cases, may be more important than the accuracy. Classification tree For the classification problem, we will prepare the breast cancer data. After loading the data, you will delete the patient ID, rename the features, eliminate the few missing values, and then create the train/test datasets, as follows: > data(biopsy) > biopsy <- biopsy[,-1] #delete ID > names(biopsy) = c("thick", "u.size", "u.shape", "adhsn", "s.size", "nucl", "chrom", "n.nuc", "mit", "class") #change the feature names > biopsy.v2 = na.omit(biopsy) #delete the observations with missing values > set.seed(123) #random number generator > ind = sample(2, nrow(biopsy.v2), replace=TRUE, prob=c(0.7, 0.3)) > biop.train = biopsy.v2[ind==1,] #the training data set > biop.test = biopsy.v2[ind==2,] #the test data set With the data set up appropriately, we will use the same syntax style for a classification problem as we did previously for a regression problem, but before creating a classification tree, we will need to ensure that the outcome is a factor, which can be done using the str() function. as follows: > str(biop.test[,10]) Factor w/ 2 levels "benign","malignant": 1 1 1 1 1 2 1 2 1 1 … First, we will create the tree: > set.seed(123) > tree.biop <- rpart(class~., data=biop.train) Then, examine the table for the optimal number of splits: > print(tree.biop$cptable) CP nsplit rel error xerror xstd 1 0.79651163 0 1.0000000 1.0000000 0.06086254 2 0.07558140 1 0.2034884 0.2674419 0.03746996 3 0.01162791 2 0.1279070 0.1453488 0.02829278 4 0.01000000 3 0.1162791 0.1744186 0.03082013 The cross-validation error is at a minimum with only two splits (row 3). We can now prune the tree, plot the full and pruned tree, and see how it performs on the test set, as follows: > cp = min(tree.biop$cptable[3,]) > prune.tree.biop <- prune(tree.biop, cp = cp) > plot(as.party(tree.biop)) > plot(as.party(prune.tree.biop)) An examination of the tree plots shows that the uniformity of the cell size is the first split, then bare nuclei. The full tree had an additional split at the cell thickness. We can predict the test observations using type="class" in the predict() function: > rparty.test <- predict(prune.tree.biop, newdata=biop.test, type="class") > table(rparty.test, biop.test$class) rparty.test benign malignant benign 136 3 malignant 6 64 > (136+64)/209 [1] 0.9569378 The basic tree with just two splits gets us almost 96 percent accuracy. This still falls short but should encourage us to believe that we can improve on it with the upcoming methods, starting with random forests. Summary In this article we learned both the power and limitations of tree-based learning methods for both classification and regression problems. To improve on predictive ability, we have the tools of the random forest and gradient boosted trees at our disposal. Resources for Article: Further resources on this subject: Big Data Analysis (R and Hadoop) [article] Using R for Statistics, Research, and Graphics [article] First steps with R [article]

0
0
7948

article-image-stata-data-analytics-software

Packt

22 Sep 2015

11 min read

Stata as Data Analytics Software

Packt

22 Sep 2015

11 min read

In this article by Prasad Kothari, the author of the book Data Analysis with STATA, the overall goal is to cover the STATA related topics such as data management, graphs and visualization and programming in STATA. The article will give a detailed description of STATA starting with an introduction to STATA and Data analytics and then talks about STATA programming and data management. After which it takes you through Data visualization and all the important statistical tests in STATA. Then the article will cover the Linear and the Logistics regression in STATA and in the end it will take you through few analyses like Survey analysis, Time Series analysis and Survival analysis in STATA. It also teaches different types of statistical modelling techniques and how to implement these techniques in STATA. (For more resources related to this topic, see here.) These days, many people use Stata for econometric and medical research purposes, among other things. There are many people who use different packages, such as Statistical Package for the Social Sciences (SPSS) and EViews, Micro, RATS/CATS (used by time series experts), and R for Matlab/Guass/Fortan (used for hardcore analysis). One should know the usage of Stata and then apply it in their relative fields. Stata is a command-driven language; there are over 500 different commands and menu options, and each has a particular syntax required to invoke any of the various options. Learning these commands is a time-consuming process, but it is not hard. At the end of each class, your do-file will contain all the commands that we have covered, but there is no way we will cover all of these commands in this short introductory course. Stata is a combined statistical analytical tool that is intended for use by research scholars and analytics practitioners. Stata has many strengths, but we are going to talk about the most important one: managing, adjusting, and arranging large sets of data. Stata has many versions, and with every version, it keeps on improving; for example, in Stata versions 11 to 14, there are changes and progress in the computing speed, capabilities and functionalities, as well as flexible graphic capabilities. Over a period of time, Stata keeps on changing and updating the model as per users' suggestions. In short, the regression method is based on a nonstandard feature, which means that you can easily get help from the Web if another person has written a program that can be integrated with their software for the purpose of analysis. The following topics will be covered in this articler: Introducing Data analytics Introducing the Stata interface and basic techniques Introducing data analytics We analyze data everyday for various reasons. To predict an event or forecast the key indicators, such as the revenue for given organization, is fast becoming a major requirement in the industry. There are various types of techniques and tools that can be leveraged to analyze the data. Here are the techniques that will be covered in this article using Stata as a tool: Stata Programming and Data management: Before predicting anything, we need to manage and massage the data in order to make it good enough to be something through which insights can be derived. The programming aspect helps in creating new variables to treat data in such a way that finding patterns in historical data or predicting the outcome of given event becomes much easier. Data visualization: After the data preparation, we need to visualize the data for the the following: To view what patterns in the data look like To check whether there are any outliers in the data To understand the data better To draw preliminary insights from the data Important statistical tests in Stata: After data visualization, based on observations, you can try to come up with various hypotheses about the data. We need to test these hypotheses on the datasets to check whether they are statistically significant and whether we can depend on and apply these hypotheses in future situations as well. Linear regression in Stata: Once done with the hypothesis testing, there is always a business need to predict one of the variables, such as what the revenue of the financial organization will be given the specific conditions, and so on. These predictions about continuous variables, such as the revenue, the default amount on the credit card, and the number of items sold in a given store, come through linear regression. Linear regression is the most basic and widely used prediction methodology. We will go into details of linear regression in a later chapter. Logistic regression in Stata: When you need to predict the outcome of a particular event along with the probability, logistic regression is the best and most acknowledged method by far. Predicting which team will win the match in football or cricket or predicting whether a customer will default on a loan payment can be decided through the probabilities given by logistic regression. Survey analysis in Stata: Understanding the customer sentiment and consumer experience is one of the biggest requirements of the retail industry. The research industry also needs data about people's opinion in order to derive the effect of a certain event or the sentiments of the affected people. All of these can be achieved by conducting and analyzing survey datasets. Survey analysis can have various subtechniques, such as factor analysis, principle component analysis, panel data analysis, and so on. Time series analysis in Stata: When you try to forecast a time-dependent variable with reasonable cyclic behavior of seasonality, time series analysis comes handy. There are many techniques of time series analysis, but we will talk about a couple of them: Autoregressive Integrated Moving Average (ARIMA) and Box Jenkins. Forecasting the amount of rainfall depending on the amount of rainfall in the past 5 years is a classic time series analysis problem. Survival analysis in Stata: These days, lots of customers attrite from telecom plans, healthcare plans, and so on and join the competitors. When you need to develop a churn model or attrition model to check who will attrite, survival analysis is the best model. The Stata interface Let's discuss the location and layout of Stata. It is very easy to locate Stata on a computer or laptop; after installing the software, go to the start menu, go to the search menu, and type Stata. You can find out the path where the file is saved. This depends on which version has been installed. Another way to find Stata on computer is through the quick launch button as well as through start programs. The preceding diagram represents the Stata layout. The four types of processors in Stata are multiprocessor (two or four), special edition processor (flavors), intercooled, and small processor. The multiprocessor is one of the most efficient processors. Though all processor versions function in a similar fashion, only variables' repressors frequency increases with each new version. At present, Stata version 11 is in demand and is being used on various computers. It is a type of software that runs on commands. In the new versions of Stata, new ways, such as menus that can search Stata, have come in the market; however, typing a command is the most simple and quick way to learn Stata. The more you leverage the functionality of typing the command, the better your learning is. Through the typing technique method, programming becomes easy and simple for analytics. Sometimes, it is difficult to find the exact syntax in commands; therefore, it is advisable that the menu command be used. Later on, you just copy the same command for further use. There are three ways to enter the commands, as follows: Use the do-file program. This is a type of program in which one has to inform the computer (through a command) that it needs to use the do-file type. Type the command manually through typing. Enter the command interactively; just click on the menu screen. Though all the three types discussed in the preceding bullets are used, the do-file type is the most frequently used one. The reason is that for a bigger file, it is faster as compared to manual typing. Secondly, it can store the data and keep it in the same format in which it was stored. Suppose you make a mistake and want to rectify it; what would you do? In this case, do-file is useful; one can correct it and run the program once again. Generally, an interactive command is used to find out the problem and later on, do-file is used to solve it. The following is an example of an interactive command: Data-storing techniques in Stata Stata is a multipurpose program, which can serve not only its own data, but also other data in a simple format, for example, ASCII. Regardless of the data type format (Excel/statistical package), it gets automatically exported to the ASCII file. This means that all the data can now easily be imported to Stata. The data entered in Stata is in different types of variables, such as vectors with individual observations in every row; it also holds strings and numeric strings. Every row has a detailed observation of the individual, country, firm, or whatever information is entered in Stata. As the data is stored in variables, it makes Stata the most efficient way to store information. Sometimes, it is better to save the data in a different storage form, such as the following: Matrices Macros Matrices should be used carefully as they consume more memory as compared to variables, so there might be a possibility of low space memory before work is started. Another form is macros; these are similar to variables in other programming languages and are named containers, which means they contain information of any type. There are two flavors of macros: local/temporary and global. Global macros are flexible and easy to manage; once they are defined in a computer or laptop, they can be easily opened through all commands. On the other hand, local macros are temporary objects that are formed for a particular environment and cannot be use in another area. For example, if you use a local macro for do-file, that code will only exist in that particular environment. Directories and folders in Stata Stata has a tree-style structure to organize directories as well as folders similar to other operating systems, such as Windows, Linux, Unix, and Mac OS. This makes things easy and can be retrieved later on dates that are convenient. For example, the data folder is used to save entire datasets, subfolders for every single dataset, and so on. In Stata, the following commands can be leveraged: Dos Linux Unix For example, if you need to change the directory, you can use the CD command for example: CD C:Stataforlder You can also generate a new directory along with the current directory you have been using. For example: mkdir "newstata". You can leverage the dir command to get the details of the directory. If you need the current directory name along with the directory, you can utilize the pwd or cd command. The use of paths in Stata depends on the type of data; usually, there are two paths: absolute and relative. The absolute path contains the full address, denoting the folder. In the command you have seen in the earlier example, we leveraged the CD command using the path that is absolute. On the contrary, the relative path provides us with the location of the file. The following example of mkdir has used the relative path: mkdir "EStata|Stata1" The use of the relative path will be beneficial, especially when working on different devices, such as a PC at home or a library or server. To separate folders, Windows and Dos use a backslash (), whereas Linux and Unix use a slash (/). Sometimes, these connotations might be troublesome when working on the server where Stata is installed. As a general rule, it is advisable that you use slashes in the relative path as Stata can easily understand slash as a separator. The following is an example of this: mkdir "/Stata1/Data" – this is how you create the new folder for your STATA work. Summary In this Article we discussed lots of basic commands, which can be leveraged while performing Stata programming. Read Data Analysis with Stata to gain detailed knowledge of the different data management techniques and programming in detail. As you learn more about Stata, you will understand the various commands and functions and their business applications. Resources for Article: Further resources on this subject: Big Data Analysis (R and Hadoop) [article] Financial Management with Microsoft Dynamics AX 2012 R3 [article] Taming Big Data using HDInsight [article]

0
0
4208

Packt

21 Sep 2015

25 min read

Introducing JAX-RS API

Packt

21 Sep 2015

25 min read

0
0
9512

How-To Tutorials

Editor Tool, Prefabs, and Main Menu

Find Friends on Facebook

Prototyping Levels with Prototype

Implementing Decision Trees

Getting Started with Apache Spark DataFrames

Putting the Function in Functional Programming

Using Google Maps APIs with Knockout.js

Introduction to Penetration Testing and Kali Linux

Internet Connected Smart Water Meter

Cassandra Design Patterns

Trending Topics

Embedded Linux and Its Elements

Enhancing Your Blog with Advanced Features

R ─ Classification and Regression Trees

Stata as Data Analytics Software

Introducing JAX-RS API

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access