
How-To Tutorials


Optimizing Games for Android

Packt
16 Aug 2016
13 min read
In this article by Avisekhar Roy, the author of The Android Game Developer's Handbook, we will focus on the need for optimization and the types of optimization that apply to games on the Android OS. We will also look at some common game development mistakes. (For more resources related to this topic, see here.)

The rendering pipeline in Android

Let's now have a look at the types of rendering pipelines in Android.

The 2D rendering pipeline

In the case of the 2D Android drawing system through Canvas, all the assets are first drawn on the canvas, and the canvas is then rendered on screen. The graphics engine maps every asset onto the finite Canvas at its given position. Developers often use many small assets separately, which causes a mapping instruction to be executed for each asset. It is always recommended that you use sprite sheets to merge as many small assets as possible; a single draw call can then be used to draw every object on the Canvas.

Now, the question is how to create the sprite sheet and what the other consequences are. Previously, Android could not support images or sprites larger than 1024x1024 pixels. Since Android 2.3, a developer can use a sprite of size 4096x4096. However, such a large sprite stays resident in memory for as long as any of the small assets on it is in scope, and many low-end Android devices cannot load images of that size at runtime. It is a best practice for developers to limit themselves to 2048x2048 pixels. This reduces the peak memory footprint as well as a significant number of draw calls to the Canvas.

The 3D rendering pipeline

Android uses OpenGL to render assets on the screen, so the rendering pipeline for Android 3D is basically the OpenGL pipeline. Let's have a look at the OpenGL rendering system, and then at each step of the rendering flow in detail:

The vertex shader processes individual vertices with vertex data.
The control shader is responsible for controlling vertex data and patches for the tessellation.
The polygon arrangement system arranges the polygon with each pair of intersecting lines created by vertices. Thus, it creates the edges without repeating vertices.
Tessellation is the process of tiling the polygons in a shape without overlap or any gap.
The geometry shader is responsible for optimizing the primitive shape. Thus, triangles are generated.
After the polygons and shapes are constructed, the model is clipped for optimization.
Vertex post-processing is used to filter out unnecessary data.
The mesh is then rasterized.
The fragment shader is used to process the fragments generated from rasterization.
All the pixels are mapped after fragmentation and processed with the processed data.
The mesh is added to the frame buffer for final rendering.

Optimizing 2D assets

No digital game can be made without 2D art assets; there must be 2D assets in some form inside the game. So, as far as game component optimization is concerned, every 2D asset should also be optimized. Optimizing 2D assets means three main things.

Size optimization

Each asset frame should contain only the pixels that are actually used in the game. Unnecessary pixels increase the asset size and memory use during runtime.

Data optimization

Not all images require full data information for every pixel. Depending on the image format, a significant amount of data might be stored in each pixel. For example, fullscreen opaque images should never contain transparency data.
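As a small illustration of the point above, here is a minimal sketch (not from the book) of decoding a fully opaque, full-screen image on Android without an alpha channel; the class name and the resource id passed to it are hypothetical placeholders, and the snippet assumes the standard Android SDK:

import android.content.res.Resources;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;

public final class OpaqueBitmapLoader {

    private OpaqueBitmapLoader() {
    }

    // Decodes a fully opaque image using a 16-bit pixel format with no transparency data.
    public static Bitmap loadOpaque(Resources res, int resourceId) {
        BitmapFactory.Options options = new BitmapFactory.Options();
        options.inPreferredConfig = Bitmap.Config.RGB_565; // 16 bits per pixel, no alpha channel
        return BitmapFactory.decodeResource(res, resourceId, options);
    }
}

A call such as OpaqueBitmapLoader.loadOpaque(getResources(), R.drawable.menu_background) (with a hypothetical drawable) keeps the decoded bitmap at roughly half the memory cost of the default 32-bit ARGB_8888 configuration.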
Similarly, depending on the color set, images must be formatted in 8-bit, 16-bit, or 24-bit format. Image optimization tools can be used to perform such optimizations.

Process optimization

The more the data is compressed during optimization, the more time it takes to decompress it and load it into memory. So, image optimization has a direct effect on processing speed. From another point of view, creating an image atlas or sprite sheet is another way to reduce the processing time of images.

Optimizing 3D assets

A 3D art asset has two parts to be optimized. The 2D texture part is optimized in the same way as any other 2D asset; the only thing the developer needs to check is that, after optimization, the shader still has the same effect on the texture. The rest of the 3D asset optimization depends entirely on the number of vertices and the model's polygons.

Limiting polygon count

It is obvious that a large number of polygons in a mesh can create more detail. However, Android is a mobile OS, and it always has hardware limitations. The developer should count the number of polygons used in each mesh and the total number of polygons rendered on the screen in a single draw cycle. There is always a limit, depending on the hardware configuration, so limiting the polygon and vertex count per mesh is always an advantage in order to achieve a certain frame rate or performance level.

Model optimization

Models are often created with more than one mesh, but using separate meshes in the final model always results in heavier processing. Multiple overlaps can occur if multiple meshes are used, which increases vertex processing; reducing the mesh count is therefore a major effort for the game artist. Rigging is another essential part of finalizing the model. A good rigger defines the skeleton with the minimum possible number of joints for minimum processing.

Common game development mistakes

It is not always possible to look into every performance aspect at every development stage. It is a very common practice to use assets and write code in a temporary way and then keep them in the final game. This affects the overall performance and the future maintenance procedure. Here are a few of the most common mistakes made during game development.

Use of non-optimized images

An artist creates art assets, and the developer directly integrates them into the game for the debug build. However, most of the time those assets are never optimized, even for the release candidate. This is why there may be plenty of high-bit-depth images where the asset contains limited information, and alpha information may be found in opaque images.

Use of full utility third-party libraries

The modern-day development style does not require every development module to be written from scratch. Most developers use predefined third-party libraries for common utility mechanisms. These packages usually come with most of the possible methods, of which very few are actually used in the game, and developers often use the packages without any filtering. In such cases, a lot of unused data occupies memory during runtime. Many times, a third-party library comes without any editing facility, so the developer should choose such packages very carefully, depending on the specific requirements.

Use of unmanaged networking connections

In modern Android games, use of Internet connectivity is very common, and many games use server-based gameplay.
In such cases, the entire game runs on the server, with frequent data transfers between the server and the client device. Each data transfer takes time, and the connectivity drains battery charge significantly. Badly managed networking states often freeze the application. Real-time multiplayer games in particular handle a significant amount of data, so a request and response queue should be created and managed properly; however, developers often skip this part to save development time. Another aspect of unmanaged connections is unnecessary packet data being transferred between the server and the client, which adds an extra parsing step each time data is transferred.

Using substandard programming

We have already discussed programming styles and standards. The modular programming approach may add a few extra processes, but maintaining the code over the long term demands modular programming; otherwise, developers end up repeating code, and this increases processing overhead. Memory management also demands a good programming style. In some cases, the developer allocates memory but forgets to free it, which causes a lot of memory leakage; at times, the application crashes due to insufficient memory. Substandard programming includes the following mistakes:

Declaring the same variables multiple times
Creating many static instances
Writing non-modular code
Improper singleton class creation
Loading objects at runtime

A minimal sketch of a safer singleton appears a little further down, after the 2D/3D comparison.

Taking the shortcut

This is the funniest fact among ill-practiced development styles. Taking a shortcut during development is very common among game developers. Making games is mostly about logical development, and there may be multiple ways of solving a logical problem. Very often, the developer chooses the most convenient way to solve such problems. For example, developers often use the bubble sort method for most sorting requirements, despite knowing that it is a very inefficient sorting algorithm. Using such shortcuts many times in a game may cause a visible processing delay, which directly affects the frame rate.

2D/3D performance comparison

Android game development in 2D and 3D is different. It is a fact that 3D game processing is heavier than 2D game processing; however, game scale is always the deciding factor.

Different look and feel

The 3D look and feel is very different from 2D. Using a particle system to provide visual effects is very common in 3D games; in 2D games, sprite animation and other transformations are used to show such effects. Another difference between the 2D and 3D look and feel is dynamic light and shadow. Dynamic light is always a factor in greater visual quality. Nowadays, most 3D games use dynamic lighting, which has a significant effect on game performance. In 2D games, light management is done through the assets themselves, so there is no extra processing for light and shadow.

In 2D games, the game screen is rendered on a Canvas. There is only one fixed point of view, so the concept of a camera is limited to a fixed camera. In 3D games, it is a different case: multiple types of cameras can be implemented, and multiple cameras can be used together for a better game feel. Rendering objects through multiple cameras causes more processing overhead and hence decreases the frame rate of the game.

There is also a significant performance difference between 2D physics and 3D physics: a 3D physics engine is far more processing-heavy than a 2D physics engine.
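Going back to the substandard programming list above, here is the promised minimal sketch of a safer, lazily created singleton in Java; it is illustrative only, is not taken from the book, and the class name GameAudioManager is hypothetical:

public final class GameAudioManager {

    private static volatile GameAudioManager instance;

    private GameAudioManager() {
        // Load sound banks or other shared state once, not on every access.
    }

    public static GameAudioManager getInstance() {
        if (instance == null) {                          // first check, without locking
            synchronized (GameAudioManager.class) {
                if (instance == null) {                  // second check, inside the lock
                    instance = new GameAudioManager();
                }
            }
        }
        return instance;
    }
}

The volatile field plus the double check keeps instance creation thread-safe without paying the synchronization cost on every call, which avoids both the "many static instances" and the "improper singleton class creation" mistakes listed earlier.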
3D processing is way heavier than 2D processing

It is common practice in the gaming industry to accept a lower FPS for 3D games than for 2D games. On Android, the commonly accepted standard for 2D games is around 60 FPS, whereas a 3D game is acceptable even if it runs as low as 40 FPS. The logical reason is that 3D games are much heavier than 2D games in terms of processing. The main reasons are as follows:

Vertex processing: In 3D games, each vertex is processed on the OpenGL layer during rendering, so increasing the number of vertices leads to heavier processing.
Mesh rendering: A mesh consists of multiple vertices and many polygons. Processing a mesh increases the rendering overhead as well.
3D collision system: A 3D dynamic collision detection system requires each vertex of the collider to be evaluated for collision. This calculation is usually done by the GPU.
3D physics implementation: 3D transformation calculations depend entirely on matrix manipulation, which is always heavy.
Multiple camera use: Using multiple cameras and dynamically setting up the rendering pipeline takes more memory and clock cycles.

Device configuration

Android supports a wide range of device configurations. Running the same game on different configurations does not produce the same result; performance depends on the following factors.

Processor

Many different processors are used in Android devices, varying in the number of cores and the speed of each core. Speed decides the number of instructions that can be executed in a single cycle. There was a time when Android devices had a single-core CPU running at less than 500 MHz; now we have multi-core CPUs with speeds of more than 2 GHz per core.

RAM

The amount of available RAM is another factor that decides performance. Heavy games require a greater amount of RAM at runtime. If RAM is limited, frequent loading and unloading affects performance.

GPU

The GPU decides the rendering speed; it acts as the processing unit for graphical objects. A more powerful GPU can process more rendering instructions, resulting in better performance.

Display quality

Display quality is actually inversely proportional to performance. A better display has to be backed by a better GPU, CPU, and RAM, because better displays always come with a bigger resolution, higher dpi, and more color support. We can see various devices with different display qualities, and Android itself divides assets by this feature:

LDPI: Low dpi display for Android (~120 dpi)
MDPI: Medium dpi display for Android (~160 dpi)
HDPI: High dpi display for Android (~240 dpi)
XHDPI: Extra high dpi display for Android (~320 dpi)
XXHDPI: Extra extra high dpi display for Android (~480 dpi)
XXXHDPI: Extra extra extra high dpi display for Android (~640 dpi)

It can easily be predicted that this list will gain more options in the near future, with the advancement of hardware technology. A short code sketch for reading the device's density bucket at runtime appears at the end of this section.

Battery capacity

This is an odd factor in application performance. A more powerful CPU, GPU, and RAM demand more power. If the battery is incapable of delivering that power, the processing units cannot run at their peak capacity.
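As promised after the density list above, here is a minimal sketch of reading the device's density bucket at runtime; it is not from the book, the class name is hypothetical, and it assumes a reasonably recent Android SDK where all of these DisplayMetrics constants exist:

import android.content.Context;
import android.util.DisplayMetrics;

public final class DensityBucket {

    private DensityBucket() {
    }

    // Maps the reported screen density to the asset buckets listed above.
    public static String of(Context context) {
        int dpi = context.getResources().getDisplayMetrics().densityDpi;
        if (dpi <= DisplayMetrics.DENSITY_LOW) return "ldpi";       // ~120 dpi
        if (dpi <= DisplayMetrics.DENSITY_MEDIUM) return "mdpi";    // ~160 dpi
        if (dpi <= DisplayMetrics.DENSITY_HIGH) return "hdpi";      // ~240 dpi
        if (dpi <= DisplayMetrics.DENSITY_XHIGH) return "xhdpi";    // ~320 dpi
        if (dpi <= DisplayMetrics.DENSITY_XXHIGH) return "xxhdpi";  // ~480 dpi
        return "xxxhdpi";                                           // ~640 dpi
    }
}

In practice, the Android resource system picks density-specific drawables automatically, so a helper like this is mainly useful for logging or for loading game textures that live outside the res folder.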
To summarize these factors, we can make a few relational statements about performance:

CPU power is directly proportional to performance
GPU power is directly proportional to performance
RAM is directly proportional to performance
Display quality is inversely proportional to performance
Battery capacity is directly proportional to performance

Summary

There are many technical differences between 2D and 3D games in terms of rendering, processing, and assets. The developer should always use an optimized approach to creating assets and writing code. One more way of gaining performance, for both 2D and 3D games, is to tune the game for the different hardware systems it is ported to. We have seen a revolutionary upgrade in hardware platforms over the last decade, and the nature of games has changed accordingly. However, there is still plenty of scope for 2D games, with a large set of possibilities. Gaining performance is more of a logical task than a technical one. There are a few tools available to do the job, but it is the developer's decision to choose them. So, selecting the right tool for the right purpose is necessary, and there should be a different approach for making 2D and 3D games.

Resources for Article:

Further resources on this subject:
Drawing and Drawables in Android Canvas [article]
Hacking Android Apps Using the Xposed Framework [article]
Getting started with Android Development [article]


Server-Side Rendering

Packt
16 Aug 2016
5 min read
In this article by Kamil Przeorski, the author of the book Mastering Full Stack React Web Development, we introduce the Universal JavaScript (isomorphic JavaScript) features that we are going to implement. To be more exact, we will develop our app so that its pages are rendered both on the server side and on the client side. This is different from Angular 1 or Backbone single-page apps, which are mainly rendered on the client side. Our approach is more complicated in technological terms, as you need to deploy your full-stack skills when working on server-side rendering, but having this experience will make you a more desirable programmer, so you can advance your career to the next level and charge more for your skills on the market. (For more resources related to this topic, see here.)

When server-side rendering is worth implementing

Server-side rendering is a very useful feature for startups and companies built around text content (such as news portals), because it helps them get indexed better by different search engines. It is an essential feature for any news or content-heavy website, because it helps grow organic traffic. In this article, we will also run our app with server-side rendering. The second segment of companies for which server-side rendering may be very useful is entertainment, where users have less patience and may close the browser if a web page is loading slowly. In general, all B2C (consumer-facing) apps should use server-side rendering to improve the experience for the masses of people visiting their websites.

Our focus for this article will include the following:

Rearranging the whole server-side code to prepare for server-side rendering
Starting to use react-dom/server and its renderToString method

Are you ready? Our first step is to mock the database's response on the backend (we will create a real DB query after the whole server-side rendering works correctly on the mocked data).

Mocking the database response

First of all, we will mock our database response on the backend in order to prepare to go into server-side rendering directly:

$ [[you are in the server directory of your project]]
$ touch fetchServerSide.js

The fetchServerSide.js file will consist of all the functions that fetch data from our database in order to make the server side work. As mentioned earlier, we will mock it for the meanwhile with the following code in fetchServerSide.js:

export default () => {
  return {
    'article': {
      '0': {
        'articleTitle': 'SERVER-SIDE Lorem ipsum - article one',
        'articleContent': 'SERVER-SIDE Here goes the content of the article'
      },
      '1': {
        'articleTitle': 'SERVER-SIDE Lorem ipsum - article two',
        'articleContent': 'SERVER-SIDE Sky is the limit, the content goes here.'
      }
    }
  }
}

The goal of making this mocked object is that we will be able to see whether our server-side rendering works correctly after implementation: as you have probably already spotted, we have added SERVER-SIDE at the beginning of each title and content, so it will help us confirm that our app is getting the data from server-side rendering. Later, this function will be replaced with a query to MongoDB.

The next thing that will help us implement server-side rendering is to make a handleServerSideRender function that will be triggered each time a request hits the server. In order to make handleServerSideRender trigger every time the frontend calls our backend, we need to use Express middleware via app.use.
So far we have been using some external libraries as middleware:

app.use(cors())
app.use(bodyParser.json({extended: false}))

Now, we will write our own small middleware function that behaves in a similar way to cors or bodyParser (the external libraries that are also middleware). Before doing so, let's import the dependencies that are required for React's server-side rendering (server/server.js):

import React from 'react';
import {createStore} from 'redux';
import {Provider} from 'react-redux';
import {renderToStaticMarkup} from 'react-dom/server';
import ReactRouter from 'react-router';
import {RoutingContext, match} from 'react-router';
import * as hist from 'history';
import rootReducer from '../src/reducers';
import reactRoutes from '../src/routes';
import fetchServerSide from './fetchServerSide';

After adding all those imports, the server/server.js file will look as follows:

import http from 'http';
import express from 'express';
import cors from 'cors';
import bodyParser from 'body-parser';
import falcor from 'falcor';
import falcorExpress from 'falcor-express';
import falcorRouter from 'falcor-router';
import routes from './routes.js';
import React from 'react';
import { createStore } from 'redux';
import { Provider } from 'react-redux';
import { renderToStaticMarkup } from 'react-dom/server';
import ReactRouter from 'react-router';
import { RoutingContext, match } from 'react-router';
import * as hist from 'history';
import rootReducer from '../src/reducers';
import reactRoutes from '../src/routes';
import fetchServerSide from './fetchServerSide';

It is important to import history in the given way, as in the example import * as hist from 'history'. The RoutingContext and match imports are the way of using React-Router on the server side. The renderToStaticMarkup function is going to generate HTML markup for us on the server side.

After we have added those new imports, then under falcor's middleware setup:

// this already exists in your codebase
app.use('/model.json', falcorExpress.dataSourceRoute((req, res) => {
  return new falcorRouter(routes); // this already exists in your codebase
}));

under the model.json file's code, please add the following:

let handleServerSideRender = (req, res) => {
  return;
};

let renderFullHtml = (html, initialState) => {
  return;
};

app.use(handleServerSideRender);

The app.use(handleServerSideRender) is fired each time the server receives a request from a client application. We have prepared the empty functions that we will use:

handleServerSideRender: It will use renderToString in order to create valid server-side HTML markup
renderFullHtml: This helper function will embed our new React HTML markup into a whole HTML document, as you will see in a moment

Summary

We have done the basic server-side rendering setup in this article.

Resources for Article:

Further resources on this subject:
Basic Website using Node.js and MySQL database [article]
How to integrate social media with your WordPress website [article]
Laravel 5.0 Essentials [article]


Creating Your First Plug-in

Packt
16 Aug 2016
31 min read
Eclipse – an IDE for everything and nothing in particular. Eclipse is a highly modular application consisting of hundreds of plug-ins, and it can be extended by installing additional plug-ins. Plug-ins are developed and debugged with the Plug-in Development Environment (PDE). This article by Dr Alex Blewitt, author of the book Eclipse Plug-in Development: Beginner's Guide - Second Edition, covers how to:

Set up an Eclipse environment for doing plug-in development
Create a plug-in with the new plug-in wizard
Launch a new Eclipse instance with the plug-in enabled
Debug the Eclipse plug-in

(For more resources related to this topic, see here.)

Getting started

Developing plug-ins requires an Eclipse development environment. This article has been developed and tested on Eclipse Mars 4.5 and Eclipse Neon 4.6, which was released in June 2016; use the most recent version available. Eclipse plug-ins are generally written in Java, although it's possible to use other JVM-based languages (such as Groovy or Scala). There are several different packages of Eclipse available from the downloads page, each of which contains a different combination of plug-ins. This has been tested with:

Eclipse SDK from http://download.eclipse.org/eclipse/downloads/
Eclipse IDE for Eclipse Committers from http://www.eclipse.org/downloads/

These contain the necessary Plug-in Development Environment (PDE) feature as well as source code, help documentation, and other useful features. It is also possible to install the Eclipse PDE feature in an existing Eclipse instance. To do this, go to the Help menu and select Install New Software, followed by choosing the General Purpose Tools category from the selected update site. The Eclipse PDE feature contains everything needed to create a new plug-in.

Time for action – setting up the Eclipse environment

Eclipse is a Java-based application, so it needs Java installed. Eclipse is distributed as a compressed archive and doesn't require an explicit installation step:

To obtain Java, go to http://java.com and follow the instructions to download and install Java. Note that Java comes in two flavors: a 32-bit installation and a 64-bit installation. If the running OS is 32-bit, install the 32-bit JDK; if the running OS is 64-bit, install the 64-bit JDK. Running java -version should give output like this:

java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)

Go to http://www.eclipse.org/downloads/ and select the Eclipse IDE for Eclipse Committers distribution. Download the one that matches the installed JDK. Running java -version should report either of these:

If it's a 32-bit JDK: Java HotSpot(TM) Client VM
If it's a 64-bit JDK: Java HotSpot(TM) 64-Bit Server VM

On Linux, Eclipse requires GTK+ 2 or 3 to be installed. Most Linux distributions have a window manager based on GNOME, which provides GTK+ 2 or 3.

To install Eclipse, download and extract the contents to a suitable location. Eclipse is shipped as an archive and needs no administrator privileges to install. Do not run it from a networked drive, as this will cause performance problems. Note that Eclipse needs to write to the folder where it is extracted, so it's normal that the contents are writable afterwards. Generally, installing in /Applications or C:\Program Files as an administrator account is not recommended.

Run Eclipse by double-clicking on the Eclipse icon, or by running eclipse.exe (Windows), eclipse (Linux), or Eclipse.app (macOS).
On startup, the splash screen will be shown. Choose a workspace, which is the location in which projects are to be stored, and click on OK. Close the welcome screen by clicking on the cross in the tab next to the welcome text. The welcome screen can be reopened by navigating to Help | Welcome.

What just happened?

Eclipse needs Java to run, and so the first step involved in installing Eclipse is ensuring that an up-to-date Java installation is available. By default, Eclipse will find a copy of Java installed on the path or from one of the standard locations. It is also possible to specify a different Java by using the -vm command-line argument.

If the splash screen doesn't show, then the Eclipse version may be incompatible with the JDK (for example, a 64-bit JDK with a 32-bit Eclipse, or vice versa). Common error messages shown at the launcher may include Unable to find companion launcher or a cryptic message about being unable to find an SWT library. On Windows, there is an additional eclipsec.exe launcher that allows log messages to be displayed on the console. This is sometimes useful if Eclipse fails to load and no other message is displayed. Other operating systems can use the eclipse command, and both support the -consolelog argument, which can display more diagnostic information about problems with launching Eclipse.

The Eclipse workspace is a directory used for two purposes: as the default project location, and to hold the .metadata directory containing Eclipse settings, preferences, and other runtime information. The Eclipse runtime log is stored in the .metadata/.log file. The workspace chooser dialog has an option to set a default workspace. It can be changed within Eclipse by navigating to File | Switch Workspace. It can also be overridden by specifying a different workspace location with the -data command-line argument.

Finally, the welcome screen is useful for first-time users, but it is worth closing (rather than minimizing) once Eclipse has started.

Creating your first plug-in

In this task, Eclipse's plug-in wizard will be used to create a plug-in.

Time for action – creating a plug-in

In PDE, every plug-in has its own individual project. A plug-in project is typically created with the new project wizard, although it is also possible to upgrade an existing Java project to a plug-in project by adding the PDE nature and the required files, by navigating to Configure | Convert to plug-in project.

To create a Hello World plug-in, navigate to File | New | Project…. The project types shown may be different from this list but should include Plug-in Project with Eclipse IDE for Eclipse Committers or Eclipse SDK. If nothing is shown when you navigate to File | New, then navigate to Window | Open Perspective | Other | Plug-in Development first; the entries should then be seen under the New menu.

Choose Plug-in Project and click on Next. Fill in the dialog as follows:

Project name should be com.packtpub.e4.hello.ui.
Ensure that Use default location is selected.
Ensure that Create a Java project is selected.
The Eclipse version should be targeted to 3.5 or greater.

Click on Next again, and fill in the plug-in properties:

ID is set to com.packtpub.e4.hello.ui.
Version is set to 1.0.0.qualifier.
Name is set to Hello.
Vendor is set to PacktPub.
For Execution Environment, use the default (for example, JavaSE-1.8).
Ensure that Generate an Activator is selected.
Set Activator to com.packtpub.e4.hello.ui.Activator.
Ensure that This plug-in will make contributions to the UI is selected.
Rich client application should be No.

Click on Next and a set of templates will be provided:

Ensure that Create a plug-in using one of the templates is selected.
Choose the Hello, World Command template.

Click on Next to customize the sample, including:

Java Package Name, which defaults to the project's name followed by .handlers
Handler Class Name, which is the code that gets invoked for the action
Message Box Text, which is the message to be displayed

Finally, click on Finish and the project will be generated. If an Open Associated Perspective? dialog asks, click on Yes to show the Plug-in Development perspective.

What just happened?

Creating a plug-in project is the first step towards creating a plug-in for Eclipse. The new plug-in project wizard was used with one of the sample templates to create a project.

Plug-ins are typically named in reverse domain name format, so these examples will be prefixed with com.packtpub.e4. This helps to distinguish between many plug-ins; the stock Eclipse IDE for Eclipse Committers comes with more than 450 individual plug-ins, and the Eclipse-developed ones start with org.eclipse. Conventionally, plug-ins that create additions to (or require) the use of the UI have .ui. in their name. This helps to distinguish them from those that don't, which can often be used headlessly. Of the more than 450 plug-ins that make up the Eclipse IDE for Eclipse Committers, approximately 120 are UI-related and the rest are headless.

The project contains a number of files that are automatically generated based on the content filled in the wizard. The key files in an Eclipse plug-in are:

META-INF/MANIFEST.MF: The MANIFEST.MF file, also known as the OSGi manifest, describes the plug-in's name, version, and dependencies. Double-clicking on it will open a custom editor, which shows the information entered in the wizards; or it can be opened in a standard text editor. The manifest follows standard Java conventions; line continuations are represented by a newline followed by a single space character, and the file must end with a newline.

plugin.xml: The plugin.xml file declares what extensions the plug-in provides to the Eclipse runtime. Not all plug-ins need a plugin.xml file; headless (non-UI) plug-ins often don't need one. Extension points will be covered in more detail later, but the sample project creates extensions for the commands, handlers, bindings, and menus extension points. Text labels for the commands/actions/menus are represented declaratively in the plugin.xml file, rather than programmatically; this allows Eclipse to show the menu before needing to load or execute any code. This is one of the reasons Eclipse starts so quickly: by not needing to load or execute classes, it can scale by showing what's needed at the time, and then load the class on demand when the user invokes the action. Java Swing's Action class, by contrast, provides labels and tooltips programmatically, which can result in slower initialization of Swing-based user interfaces.

build.properties: The build.properties file is used by PDE at development time and at build time. Generally it can be ignored, but if resources are added that need to be made available to the plug-in (such as images, properties files, HTML content, and more), then an entry must be added here, as otherwise they won't be found. Generally, the easiest way to do this is by going to the Build tab of the build.properties file, which gives a tree-like view of the project's contents.
This file is an archaic hangover from the days of Ant builds, and is generally useless when using more up-to-date builds such as Maven Tycho.

Pop quiz – Eclipse workspaces and plug-ins

Q1. What is an Eclipse workspace?
Q2. What is the naming convention for Eclipse plug-in projects?
Q3. What are the names of the three key files in an Eclipse plug-in?

Running plug-ins

To test an Eclipse plug-in, Eclipse is used to run or debug a new Eclipse instance with the plug-in installed.

Time for action – launching Eclipse from within Eclipse

Eclipse can launch a new Eclipse application by clicking on the run icon, or via the Run menu:

Select the plug-in project in the workspace.
Click on the run icon to launch the project. The first time this happens, a dialog will be shown; subsequent launches will remember the chosen type.
Choose the Eclipse Application type and click on OK. A new Eclipse instance will be launched.
Close the Welcome page in the launched application, if shown.
Click on the hello world icon in the menu bar, or navigate to Sample Menu | Sample Command from the menu, and the dialog box created via the wizard will be shown.
Quit the target Eclipse instance by closing the window, or via the usual keyboard shortcuts or menus (Cmd + Q on macOS or Alt + F4 on Windows).

What just happened?

Upon clicking on run in the toolbar (or via Run | Run As | Eclipse Application), a launch configuration is created, which includes any plug-ins open in the workspace. A second copy of Eclipse, with its own temporary workspace, enables the plug-in to be tested to verify that it works as expected.

The run operation is intelligent, in that it launches an application based on what is selected in the workspace. If a plug-in is selected, it will offer the opportunity to run as an Eclipse Application; if a Java project with a class with a main method is selected, it will run it as a standard Java Application; and if it has tests, then it will offer to run the test launcher instead. However, the run operation can also be counter-intuitive: if it is clicked a second time, in a different project context, then something other than what was expected might be launched.

A list of the available launch configurations can be seen by going to the Run menu, or by going to the dropdown to the right of the run icon. The Run | Run Configurations menu shows all the available types, including any previously run. By default, the runtime workspace is kept between runs. The launch configuration for an Eclipse application has options that can be customized; the Workspace Data section in the Main tab shows where the runtime workspace is stored, and an option is shown that allows the workspace to be cleared (with or without confirmation) between runs. Launch configurations can be deleted by clicking on the red delete icon at the top left, and new launch configurations can be created by clicking on the new icon. Each launch configuration has a type:

Eclipse Application
Java Applet
Java Application
JUnit
JUnit Plug-in Test
OSGi Framework

The launch configuration can be thought of as a pre-canned script that can launch different types of programs. Additional tabs are used to customize the launch, such as the environment variables, system properties, or command-line arguments. The type of the launch configuration specifies what parameters are required and how the launch is executed.

When a program is launched with the run icon, changes to the project's source code while it is running have no effect.
However, as we'll see in the next section, if launched with the debug icon, changes can take effect.

If the target Eclipse is hanging or otherwise unresponsive, the Console view in the host Eclipse instance (shown by navigating to Window | View | Show View | Other | General | Console) can be used to stop the target Eclipse instance.

Pop quiz: launching Eclipse

Q1. What are the two ways of terminating a launched Eclipse instance?
Q2. What are launch configurations?
Q3. How are launch configurations created and deleted?

Have a go hero – modifying the plug-in

Now that the Eclipse plug-in is running, try the following:

Change the message of the label and title of the dialog box to something else
Invoke the action by using the keyboard shortcut (defined in plugin.xml)
Change the tooltip of the action to a different message
Switch the action icon to a different graphic (if a different filename is used, remember to update it in plugin.xml and build.properties)

Debugging a plug-in

Since it's rare that everything works the first time, it's often necessary to develop iteratively, adding progressively more functionality each time. Secondly, it's sometimes necessary to find out what's going on under the covers when trying to fix a bug, particularly if it's a hard-to-track-down exception such as a NullPointerException. Fortunately, Eclipse comes with excellent debugging support, which can be used to debug both standalone Java applications and Eclipse plug-ins.

Time for action – debugging a plug-in

Debugging an Eclipse plug-in is much the same as running an Eclipse plug-in, except that breakpoints can be used, the state of the program can be updated, variables can be inspected, and minor changes to the code can be made. Rather than debugging plug-ins individually, the entire Eclipse launch configuration is started in debug mode; that way, all the plug-ins can be debugged at the same time. Although run mode is slightly faster, the added flexibility of being able to make changes makes debug mode much more attractive as a default.

Start the target Eclipse instance by navigating to Debug | Debug As | Eclipse Application, or by clicking on debug in the toolbar.
Click on the hello world icon in the target Eclipse to display the dialog, as before, and click on OK to dismiss it.
In the host Eclipse, open the SampleHandler class and go to the first line of the execute method.
Add a breakpoint by double-clicking in the vertical ruler (the grey/blue bar on the left of the editor), or by pressing Ctrl + Shift + B (or Cmd + Shift + B on macOS). A blue dot representing the breakpoint will appear in the ruler.
Click on the hello world icon in the target Eclipse to display the dialog, and the debugger will pause the thread at the breakpoint in the host Eclipse.
The debugger perspective will open whenever a breakpoint is triggered and the program will be paused. While it is paused, the target Eclipse is unresponsive; any clicks on the target Eclipse application will be ignored, and it will show a busy cursor.
In the top right, the variables that are active in the current line of code are shown. In this case, it's just the implicit variables (via this), any local variables (none yet), as well as the parameter (in this case, event).
Click on Step Over or press F6, and window will be added to the list of available variables.
When ready to continue, click on resume or press F8 to keep running.

What just happened?

The built-in Eclipse debugger was used to launch Eclipse in debug mode.
By triggering an action that led to a breakpoint, the debugger was revealed, allowing the local variables to be inspected. When in the debugger, there are several ways to step through the code:

Step Over: This steps over the method line by line
Step Into: This follows method calls recursively as execution unfolds
Step Return: This jumps to the end of a method
Drop to Frame: This returns to a stack frame in the thread to re-run an operation

There is also a Run | Step into Selection menu item; it does not have a toolbar icon. It can be invoked with Ctrl + F5 (Alt + F5 on macOS) and is used to step into a specific expression.

Time for action – updating code in the debugger

When an Eclipse instance is launched in run mode, changes made to the source code aren't reflected in the running instance. However, debug mode allows changes made to the source to be reflected in the running target Eclipse instance:

Launch the target Eclipse in debug mode by clicking on the debug icon.
Click on the hello world icon in the target Eclipse to display the dialog, as before, and click on OK to dismiss it. It may be necessary to remove the breakpoint or resume execution in the host Eclipse instance to allow execution to continue.
In the host Eclipse, open the SampleHandler class and go to the execute method.
Change the title of the dialog to Hello again, Eclipse world and save the file. Provided the Build Automatically option in the Project menu is enabled, the change will be automatically recompiled.
Click on the hello world icon in the target Eclipse instance again. The new message should be shown.

What just happened?

By default, Eclipse ships with the Build Automatically option in the Project menu enabled. Whenever changes are made to Java files, they are recompiled along with their dependencies if necessary.

When a Java program is launched in run mode, it will load classes on demand and then keep using that definition until the JVM shuts down. Even if the classes are changed, the JVM won't notice that they have been updated, and so no differences will be seen in the running application. However, when a Java program is launched in debug mode, whenever changes to classes are made, Eclipse will update the running JVM with the new code if possible. The limits to what can be replaced are controlled by the JVM through the Java Virtual Machine Tools Interface (JVMTI). Generally, updating an existing method and adding a new method or field will work, but changes to interfaces and superclasses may not. The HotSpot JVM cannot replace classes if methods are added or interfaces are updated. Some JVMs have additional capabilities that can substitute more code on demand; other JVMs, such as IBM's, can deal with a wider range of replacements.

Note that there are some types of changes that won't be picked up, for example, new extensions added to the plugin.xml file. In order to see these changes, it is possible to start and stop the plug-in through the command-line OSGi console, or to restart the target Eclipse from inside or outside of the host Eclipse.

Debugging with step filters

When debugging using Step Into, the code will frequently go into Java internals, such as the implementation of the Java collections classes or other internal JVM classes. These don't usually add value, so fortunately Eclipse has a way of ignoring uninteresting classes.
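Before moving on to step filters, for reference, here is roughly what the wizard-generated execute method in SampleHandler looks like after the title change from the previous Time for action. This is a hedged reconstruction based on the wizard defaults used earlier in this article, not the book's exact listing:

package com.packtpub.e4.hello.ui.handlers;

import org.eclipse.core.commands.AbstractHandler;
import org.eclipse.core.commands.ExecutionEvent;
import org.eclipse.core.commands.ExecutionException;
import org.eclipse.jface.dialogs.MessageDialog;
import org.eclipse.ui.IWorkbenchWindow;
import org.eclipse.ui.handlers.HandlerUtil;

public class SampleHandler extends AbstractHandler {

    public Object execute(ExecutionEvent event) throws ExecutionException {
        IWorkbenchWindow window = HandlerUtil.getActiveWorkbenchWindowChecked(event);
        MessageDialog.openInformation(
                window.getShell(),
                "Hello again, Eclipse world",   // the updated dialog title
                "Hello, Eclipse world");        // the message box text entered in the wizard
        return null;
    }
}

Saving a change like this while the target Eclipse is running in debug mode is exactly what triggers the hot code replacement described above.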
Time for action – setting up step filtering

Step filters allow uninteresting packages and classes to be ignored during step debugging:

Run the target Eclipse instance in debug mode.
Ensure that a breakpoint is set at the start of the execute method of the SampleHandler class.
Click on the hello world icon, and the debugger should open at the first line, as before.
Click on Step Into five or six times. At each point, the code will jump to the next method in the expression, first through various methods in HandlerUtil and then into ExecutionEvent. Click on resume to continue.
Open Preferences and then navigate to Java | Debug | Step Filtering. Select the Use Step Filters option. Click on Add Package and enter org.eclipse.ui, followed by a click on OK.
Click on the hello world icon again. Click on Step Into as before. This time, the debugger goes straight to the getApplicationContext method in the ExecutionEvent class. Click on resume to continue.
To make debugging more efficient by skipping accessors, go back to the Step Filtering preferences page and select Filter Simple Getters.
Click on the hello world icon again. Click on Step Into as before. Instead of going into the getApplicationContext method, execution will drop through to the getVariable method of the ExpressionContext class instead.

What just happened?

Step filters allow uninteresting packages to be skipped, at least from the point of view of debugging. Typically, JVM internal classes (such as those beginning with sun or sunw) are not helpful when debugging and can easily be ignored. This also avoids debugging through the ClassLoader as it loads classes on demand. It typically makes sense to enable all the default packages in the Step Filters dialog, as it's pretty rare to need to debug any of the JVM libraries (internal or public interfaces). This means that when stepping through code, if a common method such as toString is called, debugging won't step through the internal implementation. It also makes sense to filter out simple setters and getters (those that just set a variable or just return a variable). If the method is more complex (like the getVariable method previously), then it will still stop in the debugger. Constructors and static initializers can also be filtered specifically.

Using different breakpoint types

Although it's possible to place a breakpoint anywhere in a method, a special breakpoint type exists that can fire on method entry, exit, or both. Breakpoints can also be customized to only fire in certain situations or when certain conditions are met.

Time for action – breaking at method entry and exit

Method breakpoints allow the user to see when a method is entered or exited:

Open the SampleHandler class, and go to the execute method.
Double-click in the vertical ruler at the method signature, or select Toggle Method Breakpoint from the method in one of the Outline, Package Explorer, or Members views. The breakpoint should be shown on the line:

public Object execute(...) throws ExecutionException {

Open the breakpoint properties by right-clicking on the breakpoint or via the Breakpoints view, which is shown in the Debug perspective. Set the breakpoint to trigger at method entry and method exit.
Click on the hello world icon again.
When the debugger stops at method entry, click on resume.
When the debugger stops at method exit, click on resume.

What just happened?
The breakpoint triggers at the time the method enters and subsequently when the method's return is reached. Note that the exit is only triggered if the method returns normally; if an uncaught exception is raised, it is not treated as a normal method exit, and so the breakpoint won't fire.

Other than the breakpoint type, there's no significant difference between creating a breakpoint on method entry and creating one on the first statement of the method. Both give the ability to inspect the parameters and do further debugging before any statements in the method itself are called. The method exit breakpoint will only trigger once the return statement is about to leave the method. Thus, any expression in the method's return value will have been evaluated prior to the exit breakpoint firing. Compare and contrast this with the line breakpoint, which will wait to evaluate the argument of the return statement. Note that Eclipse's Step Return has the same effect; this will run until the method's return statement is about to be executed. However, to find when a method returns, using a method exit breakpoint is far faster than stopping at a specific line and then doing Step Return.

Using conditional breakpoints

Breakpoints are useful since they can be invoked on every occasion when a line of code is triggered. However, they sometimes need to break for specific actions only, such as when a particular option is set, or when a value has been incorrectly initialized. Fortunately, this can be done with conditional breakpoints.

Time for action – setting a conditional breakpoint

Normally breakpoints fire on each invocation. It is possible to configure breakpoints such that they fire when certain conditions are met; these are known as conditional breakpoints:

Go to the execute method of the SampleHandler class. Clear any existing breakpoints, by double-clicking on them or using Remove All Breakpoints from the Breakpoints view.
Add a breakpoint to the first line of the execute method body.
Right-click on the breakpoint, and select the Breakpoint Properties menu (it can also be shown by Ctrl + double-clicking, or Cmd + double-clicking on macOS, on the breakpoint icon itself).
Set Hit Count to 3, and click on OK.
Click on the hello world icon three times. On the third click, the debugger will open up at that line of code.
Open the breakpoint properties, deselect Hit Count, and select the Enabled and Conditional options. Put the following line into the conditional trigger field:

((org.eclipse.swt.widgets.Event)event.trigger).stateMask==65536

Click on the hello world icon, and the breakpoint will not fire.
Hold down Alt and click on the hello world icon, and the debugger will open (65536 is the value of SWT.MOD3, which is the Alt key).

What just happened?

When a breakpoint is created, it is enabled by default. A breakpoint can be temporarily disabled, which has the effect of removing it from the flow of execution. Disabled breakpoints can be easily re-enabled on a per-breakpoint basis, or from the Breakpoints view. Quite often it's useful to have a set of breakpoints defined in the code base, but not necessarily have them all enabled at once. It is also possible to temporarily disable all breakpoints using the Skip All Breakpoints setting, which can be changed from the corresponding item in the Run menu (when the Debug perspective is shown) or the corresponding icon in the Breakpoints view. When this is enabled, no breakpoints will be fired.

Conditional breakpoints must return a value.
If the breakpoint is set to break when the condition is true, the condition must be a Boolean expression. If the breakpoint is set to stop whenever the value changes, then it can be any Java expression. Multiple statements can be used, provided that there is a return keyword with a value expression.

Using exception breakpoints

Sometimes when debugging a program, an exception occurs. Typically this isn't known about until it happens, when an exception message is printed or displayed to the user via some kind of dialog box.

Time for action – catching exceptions

Although it's easy to put a breakpoint in the catch block, this is merely the location where the failure was ultimately caught, not where it was caused. The place where it was caught can often be in a completely different plug-in from where it was raised and, depending on the amount of information encoded within the exception (particularly if it has been transliterated into a different exception type), it may hide the original source of the problem. Fortunately, Eclipse can handle such cases with a Java Exception Breakpoint:

Introduce a bug into the execute method of the SampleHandler class by adding the following just before the MessageDialog.openInformation() call:

window = null;

Click on the hello world icon. Nothing will appear to happen in the target Eclipse, but in the Console view of the host Eclipse instance, the error message should be seen:

Caused by: java.lang.NullPointerException
  at com.packtpub.e4.hello.ui.handlers.SampleHandler.execute
  at org.eclipse.ui.internal.handlers.HandlerProxy.execute
  at org.eclipse.ui.internal.handlers.E4HandlerProxy.execute

Create a Java Exception Breakpoint in the Breakpoints view of the Debug perspective. The Add Java Exception Breakpoint dialog will be shown.
Enter NullPointerException in the search dialog, and click on OK.
Click on the hello world icon, and the debugger will stop at the line where the exception is thrown, instead of where it is caught.

What just happened?

The Java Exception Breakpoint stops when an exception is thrown, not when it is caught. The dialog asks for a single exception class to catch, and by default, the wizard is pre-filled with any class whose name includes *Exception*. However, any name (or filter) can be typed into the search box, including abbreviations such as FNFE for FileNotFoundException. Wildcard patterns can also be used, which allows searching for Nu*Ex or *Unknown*.

By default, the exception breakpoint corresponds to instances of that specific class. This is useful (and quick) for exceptions such as NullPointerException, but not so useful for ones with an extensive class hierarchy, such as IOException. In that case, there is a checkbox visible in the breakpoint properties and at the bottom of the Breakpoints view that allows the capture of all Subclasses of this exception, not just of the specific class.

There are also two other checkboxes that say whether the debugger should stop when the exception is Caught or Uncaught. Both of these are selected by default; if both are deselected, then the breakpoint effectively becomes disabled. Caught means that the exception is thrown in a corresponding try/catch block, and Uncaught means that the exception is thrown without a try/catch block (it bubbles up to the method's caller).

Time for action – inspecting and watching variables

Finally, it's worth seeing what the Variables view can do:

Create a breakpoint at the start of the execute method.
Click on the hello world icon again.
Highlight the openInformation call and navigate to Run | Step Into Selection.
Select the title variable in the Variables view.
Modify the value where it says Hello in the bottom half of the Variables view and change it to Goodbye. Save the value with Ctrl + S (or Cmd + S on macOS).
Click on resume, and the newly updated title can be seen in the dialog.
Click on the hello world icon again. With the debugger stopped in the execute method, highlight the event variable in the Variables view.
Right-click on the value and choose Inspect (or press Ctrl + Shift + I, or Cmd + Shift + I on macOS), and the value is opened in the Expressions view.
Click on Add new expression at the bottom of the Expressions view. Add new java.util.Date() and the right-hand side will show the current time.
Right-click on the new java.util.Date() expression and choose Re-evaluate Watch Expression. The right-hand-side pane shows the new value.
Step through the code line by line, and notice that the watch expression is re-evaluated after each step.
Disable the watch expression by right-clicking on it and choosing Disable. Step through the code line by line, and the watch expression will not be updated.

What just happened?

The Eclipse debugger has many powerful features, and the ability to inspect (and change) the state of the program is one of the more important ones. Watch expressions, when combined with conditional breakpoints, can be used to find out when data becomes corrupted or to show the state of a particular object's value. Expressions can also be evaluated based on objects in the Variables view, and code completion is available to select methods, with the result being shown with Display.

Pop quiz: debugging

Q1. How can an Eclipse plug-in be launched in debug mode?
Q2. How can certain packages be avoided when debugging?
Q3. What are the different types of breakpoints that can be set?
Q4. How can a loop that only exhibits a bug after 256 iterations be debugged?
Q5. How can a breakpoint be set on a method when its argument is null?
Q6. What does inspecting an object do?
Q7. How can the value of an expression be calculated?
Q8. How can multiple statements be executed in breakpoint conditions?

Have a go hero – working with breakpoints

Using a conditional breakpoint to stop at a certain method is fine if the data is simple, but sometimes there needs to be more than one expression. Although it is possible to use multiple statements in the breakpoint condition definition, the code is not very reusable. To implement additional reusable functionality, the breakpoint can be delegated to a breakpoint utility class:

Create a Utility class in the com.packtpub.e4.hello.ui.handlers package with a static breakpoint method that returns true if the breakpoint should stop, and false otherwise:

public class Utility {
  public static boolean breakpoint() {
    System.out.println("Breakpoint");
    return false;
  }
}

Create a conditional breakpoint in the execute method that calls Utility.breakpoint().
Click on the hello world icon again, and the message will be printed to the host Eclipse's Console view. The breakpoint will not stop.
Modify the breakpoint method to return true instead of false. Run the action again. The debugger will stop.
Modify the breakpoint method to take the message as an argument, along with a Boolean value that is returned to say whether the breakpoint should stop.
7. Set up a conditional breakpoint with the expression:
   Utility.breakpoint(
     ((org.eclipse.swt.widgets.Event) event.trigger).stateMask != 0,
     "Breakpoint")
8. Modify the breakpoint method to take a variable Object array, and use that in conjunction with the message in a String.format() call for the resulting message:
   Utility.breakpoint(
     ((org.eclipse.swt.widgets.Event) event.trigger).stateMask != 0,
     "Breakpoint %s %h",
     event,
     java.time.Instant.now())

Summary

In this article, we covered how to get started with Eclipse plug-in development, from downloading the right Eclipse package to working with a wizard-generated plug-in. Specifically, we learned these things:

  • The Eclipse SDK and the Eclipse IDE for Eclipse Committers have the necessary plug-in development environment to get you started
  • The plug-in creation wizard can be used to create a plug-in project, optionally using one of the example templates
  • Testing an Eclipse plug-in launches a second copy of Eclipse with the plug-in installed and available for use
  • Launching Eclipse in debug mode allows you to update code and stop execution at breakpoints defined via the editor

Now that we've learned how to get started with Eclipse plug-ins, we're ready to look at creating plug-ins that contribute to the IDE, starting with SWT and Views.

Resources for Article:

Further resources on this subject:

  • Apache Maven and m2eclipse [article]
  • Installing and Setting up JavaFX for NetBeans and Eclipse IDE [article]
  • JBoss AS plug-in and the Eclipse Web Tools Platform [article]
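For reference, here is one possible sketch of the finished Utility class from the Have a go hero exercise above. This is only an illustration, not the book's official solution; the exact varargs handling and the printed format are assumptions.

public class Utility {
    // Logs the formatted message and returns the supplied flag, so the whole
    // call can be used directly as a breakpoint's conditional expression.
    // When "stop" is false, the message is still printed but the debugger
    // does not suspend.
    public static boolean breakpoint(boolean stop, String format, Object... args) {
        System.out.println("Breakpoint: " + String.format(format, args));
        return stop;
    }
}

Because the method returns the Boolean it is given, the expression in step 7 evaluates to true only when the stateMask is non-zero, while still logging every time the breakpoint line is reached.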
Preprocessing the Data

Packt
16 Aug 2016
5 min read
In this article, by Sampath Kumar Kanthala, the author of the book Practical Data Analysis, discusses how to obtain, clean, normalize, and transform raw data into a standard format like CSV or JSON using OpenRefine. In this article we will cover:

  • Data scrubbing
  • Statistical methods
  • Text parsing
  • Data transformation

(For more resources related to this topic, see here.)

Data scrubbing

Scrubbing data, also called data cleansing, is the process of correcting or removing data in a dataset that is incorrect, inaccurate, incomplete, improperly formatted, or duplicated. The result of the data analysis process depends not only on the algorithms, but also on the quality of the data. That's why the next step after obtaining the data is data scrubbing. In order to avoid dirty data, our dataset should possess the following characteristics:

  • Correctness
  • Completeness
  • Accuracy
  • Consistency
  • Uniformity

Dirty data can be detected by applying simple statistical data validation, by parsing the text, or by removing duplicate values. Missing or sparse data can lead you to highly misleading results.

Statistical methods

With this method we need some context about the problem (knowledge domain) to find values that are unexpected and thus erroneous. Even when the data type matches, values that fall out of range can be resolved by setting them to an average or mean value. Statistical validations can be used to handle missing values, which can be replaced by one or more probable values using interpolation, or by reducing the dataset using decimation.

  • Mean: The value calculated by summing up all values and then dividing by the number of values.
  • Median: The value where 50% of the values in a range fall below it and 50% fall above it.
  • Range constraints: Numbers or dates should fall within a certain range; that is, they have minimum and/or maximum possible values.
  • Clustering: Usually, when we obtain data directly from the user, some values include ambiguity or refer to the same value with a typo. For example, "Buchanan Deluxe 750ml 12x01" and "Buchanan Deluxe 750ml 12x01." differ only by a ".", and "Microsoft" or "MS" instead of "Microsoft Corporation" refer to the same company even though all the values are valid. In those cases, grouping can help us to get accurate data and eliminate duplicates, enabling faster identification of unique values.

Text parsing

We perform parsing to validate whether a string of data is well formatted and to avoid syntax errors. Text fields such as dates, e-mail addresses, phone numbers, and IP addresses usually have to be validated with regular expression patterns (regex is a common abbreviation for "regular expression"). In Python we will use the re module to implement regular expressions, which lets us perform text searches and pattern validations. First, we need to import the re module:

import re

In the following examples, we will implement three of the most common validations (e-mail, IP address, and date format).

E-mail validation:

myString = 'From: readers@packt.com (readers email)'
result = re.search(r'([\w.-]+)@([\w.-]+)', myString)
if result:
    print(result.group(0))
    print(result.group(1))
    print(result.group(2))

Output:

>>> readers@packt.com
>>> readers
>>> packt.com

The function search() scans through a string, looking for any location where the regex matches. The function group() returns the string matched by the regex. The pattern \w matches any alphanumeric character and is equivalent to the class [a-zA-Z0-9_].
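Building on the e-mail example above, the pattern can be wrapped in a small helper to validate a whole list of strings at once. This is just a sketch to show the idea; the sample addresses are made up.

import re

EMAIL_PATTERN = re.compile(r'([\w.-]+)@([\w.-]+)')

def valid_emails(candidates):
    # Keep only the strings that contain something that looks like an e-mail address
    return [text for text in candidates if EMAIL_PATTERN.search(text)]

print(valid_emails(['readers@packt.com', 'not-an-address', 'editor@packt.com']))
# >>> ['readers@packt.com', 'editor@packt.com']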
IP address validation:

isIP = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
myString = " Your IP is: 192.168.1.254 "
result = re.findall(isIP, myString)
print(result)

Output:

>>> ['192.168.1.254']

The function findall() finds all the substrings where the regex matches and returns them as a list. The pattern \d matches any decimal digit and is equivalent to the class [0-9].

Date format:

myString = "01/04/2001"
isDate = re.match('[0-1][0-9]/[0-3][0-9]/[1-2][0-9]{3}', myString)
if isDate:
    print("valid")
else:
    print("invalid")

Output:

>>> valid

The function match() checks whether the regex matches at the beginning of the string. The pattern uses character classes such as [0-9] to parse the date format.

For more information about regular expressions, see: http://docs.python.org/3.4/howto/regex.html#regex-howto

Data transformation

Data transformation is usually related to databases and data warehouses, where values are extracted from a source format, transformed, and loaded into a destination format. Extract, Transform, and Load (ETL) obtains data from data sources, performs some transformation function depending on our data model, and loads the resulting data into the destination.

  • Data extraction allows us to obtain data from multiple data sources, such as relational databases, data streams, text files (JSON, CSV, XML), and NoSQL databases.
  • Data transformation allows us to cleanse, convert, aggregate, merge, replace, validate, format, and split data.
  • Data loading allows us to load data into a destination format, such as relational databases, text files (JSON, CSV, XML), and NoSQL databases.

In statistics, data transformation refers to the application of a mathematical function to the dataset or time series points. A small end-to-end sketch of the extract-transform-load flow appears after the resource links below.

Summary

In this article, we explored the common data sources and implemented a web scraping example. Next, we introduced the basic concepts of data scrubbing, like statistical methods and text parsing.

Resources for Article:

Further resources on this subject:

  • MicroStrategy 10 [article]
  • Expanding Your Data Mining Toolbox [article]
  • Machine Learning Using Spark MLlib [article]
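As promised above, here is a minimal sketch of an extract-transform-load flow using only the Python standard library. The file names, the 'company' column, and the cleaning rule are invented for illustration.

import csv
import json

def etl(source_csv, destination_json):
    # Extract: read rows from a CSV source
    with open(source_csv, newline='') as src:
        rows = list(csv.DictReader(src))

    # Transform: strip whitespace and normalize an (assumed) 'company' column
    for row in rows:
        company = row.get('company', '').strip()
        if company in ('MS', 'Microsoft'):
            company = 'Microsoft Corporation'
        row['company'] = company

    # Load: write the cleaned rows to a JSON destination
    with open(destination_json, 'w') as dst:
        json.dump(rows, dst, indent=2)

etl('raw_data.csv', 'clean_data.json')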
Introducing vSphere vMotion

Packt
16 Aug 2016
5 min read
In this article by Abhilash G B and Rebecca Fitzhugh, authors of the book Learning VMware vSphere, we talk about vSphere vMotion, a VMware technology used to migrate a running virtual machine from one host to another without altering its power state. The beauty of the whole process is that it is transparent to the applications running inside the virtual machine. In this section we will understand the inner workings of vMotion and learn how to configure it. There are different types of vMotion, such as:

  • Compute vMotion
  • Storage vMotion
  • Unified vMotion
  • Enhanced vMotion (X-vMotion)
  • Cross vSwitch vMotion
  • Cross vCenter vMotion
  • Long Distance vMotion

(For more resources related to this topic, see here.)

Compute vMotion is the default vMotion method and is employed by other features such as DRS, FT, and Maintenance Mode. When you initiate a vMotion, it starts an iterative copy of all memory pages. After the first pass, any memory pages dirtied in the meantime are copied again in another pass, and this is repeated until the number of pages left to copy is small enough to be transferred in one go and the state of the VM can be switched over to the destination host. During the switchover, the virtual machine's device state is transferred and the virtual machine is resumed at the destination host. You can initiate up to 8 simultaneous vMotion operations on a single host.

Storage vMotion is used to migrate the files backing a virtual machine (virtual disks, configuration files, logs) from one datastore to another while the virtual machine is still running. When you initiate a Storage vMotion, it starts a sequential copy of the source disk in 64 MB chunks. While a region is being copied, all writes issued to that region are deferred until the region has been copied. An already copied source region is monitored for further writes; if there is a write I/O, it will be mirrored to the destination disk as well. This mirroring of writes to the destination virtual disk continues until the sequential copy of the entire source virtual disk is complete. Once the sequential copy is complete, all subsequent reads and writes are issued to the destination virtual disk. Keep in mind, though, that while the sequential copy is still in progress, all reads are issued to the source virtual disk. Storage vMotion is used by Storage DRS. You can initiate up to 2 simultaneous Storage vMotion operations on a single host.

Unified vMotion is used to migrate both the running state of a virtual machine and the files backing it from one host and datastore to another. Unified vMotion uses a combination of both Compute and Storage vMotion to achieve the migration. First, the configuration files and the virtual disks are migrated, and only then does the migration of the live state of the virtual machine begin. You can initiate up to 2 simultaneous Unified vMotion operations on a single host.

Enhanced vMotion (X-vMotion) is used to migrate virtual machines between hosts that do not share storage. Both the virtual machine's running state and the files backing it are transferred over the network to the destination. The migration procedure is the same as for compute and storage vMotion; in fact, Enhanced vMotion uses Unified vMotion to achieve the migration. Since the memory and disk states are transferred over the vMotion network, ESXi hosts maintain a transmit buffer at the source and a receive buffer at the destination.
The transmit buffer collects and places data onto the network, while the receive buffer collects data received via the network and flushes it to storage. You can initiate up to 2 simultaneous X-vMotion operations on a single host.

Cross vSwitch vMotion allows you to choose a destination port group for the virtual machine. It is important to note that unless the destination port group supports the same L2 network, the virtual machine will not be able to communicate over the network. Cross vSwitch vMotion allows changing from a Standard vSwitch to a VDS, but not from a VDS to a Standard vSwitch. vSwitch to vSwitch and VDS to VDS are supported.

Cross vCenter vMotion allows migrating virtual machines beyond the vCenter's boundary. This is a new enhancement in vSphere 6.0. However, for this to be possible, both vCenter Servers should be in the same SSO domain and should be in Enhanced Linked Mode. The infrastructure requirements for Cross vCenter vMotion are detailed in VMware Knowledge Base article 2106952 at the following link: http://kb.vmware.com/kb/2106952.

Long Distance vMotion allows migrating virtual machines over distances with a latency not exceeding 150 milliseconds. Prior to vSphere 6.0, the maximum supported network latency for vMotion was 10 milliseconds.

Using the provisioning interface

You can configure a provisioning interface to send all non-active data of the virtual machine being migrated. Prior to vSphere 6.0, vMotion used the vmkernel interface that has the default gateway configured on it (which in most cases is the management interface, vmk0) to transfer non-performance-impacting vMotion data. Non-performance-impacting vMotion data includes the virtual machine's home directory, older deltas in the snapshot chain, base disks, and so on. Only the live data will hit the vMotion interface.

The provisioning interface is nothing but a vmkernel interface with Provisioning traffic enabled on it. The procedure to do this is very similar to how you would configure a vmkernel interface for Management or vMotion traffic: you will have to edit the settings of the intended vmk interface and set Provisioning traffic as the enabled service.

It is important to keep in mind that the provisioning interface is not just meant for vMotion data; if enabled, it will also be used for cold migrations, cloning operations, and virtual machine snapshots. The provisioning interface can be configured to use a gateway other than the vmkernel's default gateway.

Further resources on this subject:

  • Cloning and Snapshots in VMware Workstation [article]
  • Essentials of VMware vSphere [article]
  • Upgrading VMware Virtual Infrastructure Setups [article]
Setting up MongoDB

Packt
12 Aug 2016
10 min read
In this article by Samer Buna, the author of the book Learning GraphQL and Relay, we look at how an API is nothing without access to a database. Let's set up a local MongoDB instance, add some data in there, and make sure we can access that data through our GraphQL schema.

(For more resources related to this topic, see here.)

MongoDB can be locally installed on multiple platforms. Check the documentation site for instructions for your platform (https://docs.mongodb.com/manual/installation/). For Mac, the easiest way is probably Homebrew:

~ $ brew install mongodb

Create a db folder inside a data folder. The default location is /data/db:

~ $ sudo mkdir -p /data/db

Change the owner of the /data folder to be the currently logged-in user:

~ $ sudo chown -R $USER /data

Start the MongoDB server:

~ $ mongod

If everything worked correctly, we should be able to open a new terminal and test the mongo CLI:

~/graphql-project $ mongo
MongoDB shell version: 3.2.7
connecting to: test
> db.getName()
test
>

We're using MongoDB version 3.2.7 here. Make sure that you have this version or a newer one. Let's go ahead and create a new collection to hold some test data. Let's name that collection users:

> db.createCollection("users")
{ "ok" : 1 }

Now we can use the users collection to add documents that represent users. We can use the MongoDB insertOne() function for that:

> db.users.insertOne({
    firstName: "John",
    lastName: "Doe",
    email: "john@example.com"
  })

We should see an output like:

{
  "acknowledged" : true,
  "insertedId" : ObjectId("56e729d36d87ae04333aa4e1")
}

Let's go ahead and add another user:

> db.users.insertOne({
    firstName: "Jane",
    lastName: "Doe",
    email: "jane@example.com"
  })

We can now verify that we have two user documents in the users collection using:

> db.users.count()
2

MongoDB has a built-in unique object ID, which you can see in the output for insertOne(). Now that we have a running MongoDB and some test data in there, it's time to see how we can read this data using a GraphQL API.

To communicate with MongoDB from a Node.js application, we need to install a driver. There are many options that we can choose from, but GraphQL requires a driver that supports promises. We will use the official MongoDB Node.js driver, which supports promises. Instructions on how to install and run the driver can be found at: https://docs.mongodb.com/ecosystem/drivers/node-js/.

To install the official MongoDB Node.js driver under our graphql-project app, we do:

~/graphql-project $ npm install --save mongodb
└─┬ mongodb@2.2.4

We can now use this mongodb npm package to connect to our local MongoDB server from within our Node application. In index.js:

const mongodb = require('mongodb');
const assert = require('assert');

const MONGO_URL = 'mongodb://localhost:27017/test';

mongodb.MongoClient.connect(MONGO_URL, (err, db) => {
  assert.equal(null, err);
  console.log('Connected to MongoDB server');

  // The readline interface code
});

The MONGO_URL variable value should not be hardcoded in code like this. Instead, we can use a node process environment variable to set it to a certain value before executing the code. On a production machine, we would be able to use the same code and set the process environment variable to a different value.
Use the export command to set the environment variable value:

export MONGO_URL=mongodb://localhost:27017/test

Then in the Node code, we can read the exported value by using:

process.env.MONGO_URL

If we now execute the node index.js command, we should see the Connected to MongoDB server line right before we ask for the Client Request. At this point, the Node.js process will not exit after our interaction with it. We'll need to force exit the process with Ctrl + C to restart it.

Let's start our database API with a simple field that can answer this question: How many total users do we have in the database? The query could be something like:

{ usersCount }

To be able to use a MongoDB driver call inside our schema main.js file, we need access to the db object that the MongoClient.connect() function exposed for us in its callback. We can use the db object to count the user documents by simply running the promise:

db.collection('users').count()
  .then(usersCount => console.log(usersCount));

Since we only have access to the db object in index.js within the connect() function's callback, we need to pass a reference to that db object to our graphql() function. We can do that using the fourth argument of the graphql() function, which accepts a contextValue object of globals; the GraphQL engine will pass this context object to all the resolver functions as their third argument. Modify the graphql function call within the readline interface in index.js to be:

graphql.graphql(mySchema, inputQuery, {}, { db }).then(result => {
  console.log('Server Answer :', result.data);
  db.close(() => rli.close());
});

The third argument to the graphql() function is called the rootValue, which gets passed as the first argument to the resolver function on the top-level type. We are not using that feature here.

We passed the connected database object db as part of the global context object. This will enable us to use db within any resolver function. Note also how we're now closing the rli interface within the callback for the operation that closes the db. We should not leave any open db connections behind.

Here's how we can now use the resolver's third argument to resolve our usersCount top-level field with the db count() operation:

fields: {
  // "hello" and "diceRoll"...
  usersCount: {
    type: GraphQLInt,
    resolve: (_, args, { db }) =>
      db.collection('users').count()
  }
}

A couple of things to notice about this code:

  • We destructured the db object from the third argument of the resolve() function so that we can use it directly (instead of context.db).
  • We returned the promise itself from the resolve() function. The GraphQL executor has native support for promises. Any resolve() function that returns a promise will be handled by the executor itself: it will either resolve the promise and then resolve the query field with the promise-resolved value, or, if the promise is rejected, return an error to the user.

We can test our query now:

~/graphql-project $ node index.js
Connected to MongoDB server
Client Request: { usersCount }
Server Answer : { usersCount: 2 }

*** #GitTag: chapter1-setting-up-mongodb ***

Setting up an HTTP interface

Let's now see how we can use the graphql() function under another interface, an HTTP one. We want our users to be able to send us a GraphQL request via HTTP.
For example, to ask for the same usersCount field, we want the users to do something like: /graphql?query={usersCount} We can use the Express.js node framework to handle and parse HTTP requests, and within an Express.js route, we can use the graphql() function. For example (don't add these lines yet): const app = express(); app.use('/graphql', (req, res) => { // use graphql.graphql() to respond with JSON objects }); However, instead of manually handling the req/res objects, there is a GraphQL Express.js middleware that we can use, express-graphql. This middleware wraps the graphql() function and prepares it to be used by Express.js directly. Let's go ahead and bring in both the Express.js library and this middleware: ~/graphql-project $ npm install --save express express-graphql ├─┬ express@4.14.0 └─┬ express-graphql@0.5.3 In index.js, we can now import both express and the express-graphql middleware: const graphqlHTTP = require('express-graphql'); const express = require('express'); const app = express(); With these imports, the middleware main function will now be available as graphqlHTTP(). We can now use it in an Express route handler. Inside the MongoClient.connect() callback, we can do: app.use('/graphql', graphqlHTTP({ schema: mySchema, context: { db } })); app.listen(3000, () => console.log('Running Express.js on port 3000') ); Note that at this point we can remove the readline interface code as we are no longer using it. Our GraphQL interface from now on will be an HTTP endpoint. The app.use line defines a route at /graphql and delegates the handling of that route to the express-graphql middleware that we imported. We pass two objects to the middleware, the mySchema object, and the context object. We're not passing any input query here because this code just prepares the HTTP endpoint, and we will be able to read the input query directly from a URL field. The app.listen() function is the call we need to start our Express.js app. Its first argument is the port to use, and its second argument is a callback we can use after Express.js has started. We can now test our HTTP-mounted GraphQL executor with: ~/graphql-project $ node index.js Connected to MongoDB server Running Express.js on port 3000 In a browser window go to: http://localhost:3000/graphql?query={usersCount} *** #GitTag: chapter1-setting-up-an-http-interface *** The GraphiQL editor The graphqlHTTP() middleware function accepts another property on its parameter object graphiql, let's set it to true: app.use('/graphql', graphqlHTTP({ schema: mySchema, context: { db }, graphiql: true })); When we restart the server now and navigate to http://localhost:3000/graphql, we'll get an instance of the GraphiQL editor running locally on our GraphQL schema: GraphiQL is an interactive playground where we can explore our GraphQL queries and mutations before we officially use them. GraphiQL is written in React and GraphQL, and it runs completely within the browser. GraphiQL has many powerful editor features such as syntax highlighting, code folding, and error highlighting and reporting. Thanks to GraphQL introspective nature, GraphiQL also has intelligent type-ahead of fields, arguments, and types. Put the cursor in the left editor area, and type a selection set: { } Place the cursor inside that selection set and press Ctrl + space. 
You should see a list of all fields that our GraphQL schema support, which are the three fields that we have defined so far (hello, diceRoll, and usersCount): If Ctrl +space does not work, try Cmd + space, Alt + space, or Shift + space. The __schema and __type fields can be used to introspectively query the GraphQL schema about what fields and types it supports. When we start typing, this list starts to get filtered accordingly. The list respects the context of the cursor, if we place the cursor inside the arguments of diceRoll(), we'll get the only argument we defined for diceRoll, the count argument. Go ahead and read all the root fields that our schema support, and see how the data gets reported on the right side with the formatted JSON object: *** #GitTag: chapter1-the-graphiql-editor *** Summary In this article, we learned how to set up a local MongoDB instance, add some data in there, so that we can access that data through our GraphQL schema. Resources for Article: Further resources on this subject: Apache Solr and Big Data – integration with MongoDB [article] Getting Started with Java Driver for MongoDB [article] Documents and Collections in Data Modeling with MongoDB [article]
Laravel 5.0 Essentials

Packt
12 Aug 2016
9 min read
In this article by Alfred Nutile from the book, Laravel 5.x Cookbook, we will learn the following topics: Setting up Travis to Auto Deploy when all is Passing Working with Your .env File Testing Your App on Production with Behat (For more resources related to this topic, see here.) Setting up Travis to Auto Deploy when all is Passing Level 0 of any work should be getting a deployment workflow setup. What that means in this case is that a push to GitHub will trigger our Continuous Integration (CI). And then from the CI, if the tests are passing, we trigger the deployment. In this example I am not going to hit the URL Forge gives you but I am going to send an Artifact to S3 and then have call CodeDeploy to deploy this Artifact. Getting ready… You really need to see the section before this, otherwise continue knowing this will make no sense. How to do it… Install the travis command line tool in Homestead as noted in their docs https://github.com/travis-ci/travis.rb#installation. Make sure to use Ruby 2.x: sudo apt-get install ruby2.0-dev sudo gem install travis -v 1.8.2 --no-rdoc --no-ri Then in the recipe folder I run the command > travis setup codedeploy I answer all the questions keeping in mind:     The KEY and SECRET are the ones we made of the I AM User in the Section before this     The S3 KEY is the filename not the KEY we used for a above. So in my case I just use the name again of the file latest.zip since it sits inside the recipe-artifact bucket. Finally I open the .travis.yml file, which the above modifies and I update the before-deploy area so the zip command ignores my .env file otherwise it would overwrite the file on the server. How it works… Well if you did the CodeDeploy section before this one you will know this is not as easy as it looks. After all the previous work we are able to, with the one command travis setup codedeploy punch in securely all the needed info to get this passing build to deploy. So after phpunit reports things are passing we are ready. With that said we had to have a lot of things in place, S3 bucket to put the artifact, permission with the KEY and SECRET to access the Artifact and CodeDeploy, and a CodeDeploy Group and Application to deploy to. All of this covered in the previous section. After that it is just the magic of Travis and CodeDeploy working together to make this look so easy. See also… Travis Docs: https://docs.travis-ci.com/user/deployment/codedeploy https://github.com/travis-ci/travis.rb https://github.com/travis-ci/travis.rb#installation Working with Your .env File The workflow around this can be tricky. Going from Local, to TravisCI, to CodeDeploy and then to AWS without storing your info in .env on GitHub can be a challenge. What I will show here are some tools and techniques to do this well. Getting ready…. A base install is fine I will use the existing install to show some tricks around this. How to do it… Minimize using Conventions as much as possible     config/queue.php I can do this to have one or more Queues     config/filesystems.php Use the Config file as much as possible. For example this is in my .env If I add config/marvel.php and then make it look like this My .env can be trimmed down by KEY=VALUES later on I can call to those:    Config::get('marvel.MARVEL_API_VERSION')    Config::get('marvel.MARVEL_API_BASE_URL') Now to easily send to Staging or Production using the EnvDeployer library >composer require alfred-nutile-inc/env-deployer:dev-master Follow the readme.md for that library. 
Then as it says in the docs setup your config file so that it matches the destination IP/URL and username and path for those services. I end up with this config file config/envdeployer.php Now the trick to this library is you start to enter KEY=VALUES into your .env file stacked on top of each other. For example, my database settings might look like this. so now I can type: >php artisan envdeployer:push production Then this will push over SSH your .env to production and swap out the related @production values for each KEY they are placed above. How it works… The first mindset to follow is conventions before you put a new KEY=VALUE into the .env file set back and figure out defaults and conventions around what you already must have in this file. For example must haves, APP_ENV, and then I always have APP_NAME so those two together do a lot to make databases, queues, buckets and so on. all around those existing KEYs. It really does add up, whether you are working alone or on a team focus on these conventions and then using the config/some.php file workflow to setup defaults. Then libraries like the one I use above that push this info around with ease. Kind of like Heroku you can command line these settings up to the servers as needed. See also… Laravel Validator for the .env file: https://packagist.org/packages/mathiasgrimm/laravel-env-validator Laravel 5 Fundamentals: Environments and Configuration: https://laracasts.com/series/laravel-5-fundamentals/episodes/6 Testing Your App on Production with Behat So your app is now on Production! Start clicking away at hundreds of little and big features so you can make sure everything went okay or better yet run Behat! Behat on production? Sounds crazy but I will cover some tips on how to do this including how to setup some remote conditions and clean up when you are done. Getting ready… Any app will do. In my case I am going to hit production with some tests I made earlier. How to do it… Tag a Behat test @smoke or just a Scenario that you know it is safe to run on Production for example features/home/search.feature. Update behat.yml adding a profile call production. Then run > vendor/bin/behat -shome_ui --tags=@smoke --profile=production I run an Artisan command to run all these Then you will see it hit the production url and only the Scenarios you feel are safe for Behat. Another method is to login as a demo user. And after logging in as that user you can see data that is related to that user only so you can test authenticated level of data and interactions. For example database/seeds/UserTableSeeder.php add the demo user to the run method Then update your .env. Now push that .env setting up to Production.  >php artisan envdeploy:push production Then we update our behat.yml file to run this test even on Production features/auth/login.feature. Now we need to commit our work and push to GitHub so TravisCI can deploy and changes: Since this is a seed and not a migration I need to rerun seeds on production. Since this is a new site, and no one has used it this is fine BUT of course this would have been a migration if I had to do this later in the applications life. Now let's run this test, from our vagrant box > vendor/bin/behat -slogin_ui --profile=production But it fails because I am setting up the start of this test for my local database not the remote database features/bootstrap/LoginPageUIContext.php. So I can basically begin to create a way to setup the state of the world on the remote server. 
> php artisan make:controller SetupBehatController And update that controller to do the setup. And make the route app/Http/routes.php Then update the behat test features/bootstrap/LoginPageUIContext.php And we should do some cleanup! First add a new method to features/bootstrap/LoginPageUIContext.php. Then add that tag to the Scenarios this is related to features/auth/login.feature Then add the controller like before and route app/Http/Controllers/CleanupBehatController.php Then Push and we are ready test this user with fresh state and then clean up when they are done! In this case I could test editing the Profile from one state to another. How it works… Not to hard! Now we have a workflow that can save us a ton of clicking around Production after every deployment. To begin with I add the tag @smoke to tests I considered safe for production. What does safe mean? Basically read only tests that I know will not effect that site's data. Using the @smoke tag I have a consistent way to make Suites or Scenarios as safe to run on Production. But then I take it a step further and create a way to test authenticated related state. Like make a Favorite or updating a Profile! By using some simple routes and a user I can begin to tests many other things on my long list of features I need to consider after every deploy. All of this happens with the configurability of Behat and how it allows me to manage different Profiles and Suites in the behat.yml file! Lastly I tie into the fact that Behat has hooks. I this case I tie in to the @AfterScenario by adding that to my Annotation. And I add another hooks @profile so it only runs if the Scenario has that Tag. That is it, thanks to Behat, Hooks and how easy it is to make Routes in Laravel I can easily take care of a large percentage of what otherwise would be a tedious process after every deployment! See also… Behat Docus on Hooks—http://docs.behat.org/en/v3.0/guides/3.hooks.html Saucelabs—on behat.yml setting later and you can test your site on numerous devices: https://saucelabs.com/. Summary This article gives a summary of Setting up Travis, working with .env files and Behat.  Resources for Article: Further resources on this subject: CRUD Applications using Laravel 4 [article] Laravel Tech Page [article] Eloquent… without Laravel! [article]
Open Source Project Royalty Scheme

Packt
11 Aug 2016
3 min read
Open Source Project Royalty Scheme Packt believes in Open Source and helping to sustain and support its unique projects and communities. Therefore, when we sell a book written on an Open Source project, we pay a royalty directly to that project. As a result of purchasing one of our Open Source books, Packt will have given some of the money received to the Open Source project. In the long term, we see ourselves and yourselves, as customers and readers of our books, as part of the Open Source ecosystem, providing sustainable revenue for the projects we publish on. Our aim at Packt is to establish publishing royalties as an essential part of the service and support business model that sustains Open Source.   Some of the things people have said about the Open Source Project Royalty Scheme: “Moodle is grateful for the royalty donations that Packt have volunteered to send us as part of their Open Source Project Royalty Scheme. The money donated helps us fund a developer for a few months a year and thus contributes directly towards Moodle core development, support and improvements in the future.”  - Martin Dougiamas, founder of acclaimed Open-Source e-learning software, Moodle. “Most of the money that we've used, donated from Packt has gone towards running jQuery conferences for the community and bringing together the jQuery team to do development work together. The financial contributions have been very valuable and in that regard, have resulted in a team that's able to operate much more efficiently and effectively.”  - John Resig, the founder of the popular JavaScript library, jQuery "The Drupal project and its community have grown sharply over the last couple of years. The support that Packt has shown, through its book royalties and awards, has contributed to that success and helped the project handle its growth. The Drupal Association uses the money that Packt donates on a number of things including, server infrastructure and the organization of events." -Dries Buytaert, founder of renowned Content Management System, Drupal   To read up on the projects that are supported by the Packt Open Source Project Royalty Scheme, click the appropriate categories below: All Open Source Projects Content Management System (CMS) Customer Relationship Management (CRM) e-Commerce e-Learning Networking and Telephony Web Development Web Graphics and Video   Are you part of an Open Source project that Packt has published a book on? Packt believes in Open Source and your project may be able to receive support through the Open Source Project Royalty Scheme. Simply contact Packt: royalty@packtpub.com.
Application Logging

Packt
11 Aug 2016
8 min read
In this article by Travis Marlette, author of Splunk Best Practices, will cover the following topics: (For more resources related to this topic, see here.) Log messengers Logging formats Within the working world of technology, there are hundreds of thousands of different applications, all (usually) logging in different formats. As Splunk experts, our job is make all those logs speak human, which is often the impossible task. With third-party applications that provide support, sometimes log formatting is out of our control. Take for instance, Cisco or Juniper, or any other leading application manufacturer. We won't be discussing these kinds of logs in this article, but instead the logs that we do have some control over. The logs I am referencing belong to proprietary in-house (also known as "home grown") applications that are often part of the middle-ware, and usually they control some of the most mission-critical services an organization can provide. Proprietary applications can be written in any language, however logging is usually left up to the developers for troubleshooting and up until now the process of manually scraping log files to troubleshoot quality assurance issues and system outages has been very specific. I mean that usually, the developer(s) are the only people that truly understand what those log messages mean. That being said, oftentimes developers write their logs in a way that they can understand them, because ultimately it will be them doing the troubleshooting / code fixing when something breaks severely. As an IT community, we haven't really started taking a look at the way we log things, but instead we have tried to limit the confusion to developers, and then have them help other SME's that provide operational support to understand what is actually happening. This method is successful, however it is slow, and the true value of any SME is reducing any system’s MTTR, and increasing uptime. With any system, the more transactions processed means the larger the scale of a system, which means that, after about 20 machines, troubleshooting begins to get more complex and time consuming with a manual process. This is where something like Splunk can be extremely valuable, however Splunk is only as good as the information that is coming into it. I will say this phrase for the people who haven't heard it yet; "garbage in… garbage out". There are some ways to turn proprietary logging into a powerful tool, and I have personally seen the value of these kinds of logs, after formatting them for Splunk, they turn into a huge asset in an organization’s software life cycle. I'm not here to tell you this is easy, but I am here to give you some good practices about how to format proprietary logs. To do that I'll start by helping you appreciate a very silent and critical piece of the application stack. To developers, a logging mechanism is a very important part of the stack, and the log itself is mission critical. What we haven't spent much time thinking about before log analyzers, is how to make log events/messages/exceptions more machine friendly so that we can socialize the information in a system like Splunk, and start to bridge the knowledge gap between development and operations. The nicer we format the logs, the faster Splunk can reveal the information about our systems, saving everyone time and from headaches. Loggers Here I'm giving some very high level information on loggers. 
My intention is not to recommend logging tools, but simply to raise awareness of their existence for those that are not in development, and allow for independent research into what they do. With the right developer, and the right Splunker, the logger turns into something immensely valuable to an organization. There is an array of different loggers in the IT universe, and I'm only going to touch on a couple of them here. Keep in mind that I'm only referencing these due to the ease of development I've seen from personal experience, and experiences do vary. I'm only going to touch on three loggers and then move on to formatting, as there are tons of logging mechanisms and the preference truly depends on the developer. Anatomy of a log I'm going to be taking some very broad strokes with the following explanations in order to familiarize you, the Splunker, with the logger. If you would like to learn more information, please either seek out a developer to help you understand the logic better or acquire some education how to develop and log in independent study. There are some pretty basic components to logging that we need to understand to learn which type of data we are looking at. I'll start with the four most common ones: Log events: This is the entirety of the message we see within a log, often starting with a timestamp. The event itself contains all other aspects of application behavior such as fields, exceptions, messages, and so on… think of this as the "container" if you will, for information. Messages: These are often made by the developer of the application and provide some human insight into what's actually happening within an application. The most common messages we see are things like unauthorized login attempt <user> or Connection Timed out to <ip address>. Message Fields: These are the pieces of information that give us the who, where, and when types of information for the application’s actions. They are handed to the logger by the application itself as it either attempts or completes an activity. For instance, in the log event below, the highlighted pieces are what would be fields, and often those that people look for when troubleshooting: "2/19/2011 6:17:46 AM Using 'xplog70.dll' version '2009.100.1600' to execute extended store procedure 'xp_common_1' operation failed to connect to 'DB_XCUTE_STOR'" Exceptions: These are the uncommon, but very important pieces of the log. They are usually only written when something went wrong, and offer developer insight into the root cause at the application layer. They are usually only printed when an error occurs, and used for debugging. These exceptions can print a huge amount of information into the log depending on the developer and the framework. The format itself is not easy and in some cases not even possible for a developer to manage. Log4* This is an open source logger that is often used in middleware applications. Pantheios This is a logger popularly used for Linux, and popular for its performance and multi-threaded handling of logging. Commonly, Pantheios is used for C/C++ applications, but it works with a multitude of frameworks. Logging – logging facility for Python This is a logger specifically for Python, and since Python is becoming more and more popular, this is a very common package used to log Python scripts and applications. Each one of these loggers has their own way of logging, and the value is determined by the application developer. 
If there is no standardized logging, then one can imagine the confusion this can bring to troubleshooting. Example of a structured log This is an example of a Java exception in a structured log format: When Java prints this exception, it will come in this format and a developer doesn't control what that format is. They can control some aspects about what is included within an exception, though the arrangement of the characters and how it's written is done by the Java framework itself. I mention this last part in order to help operational people understand where the control of a developer sometimes ends. My own personal experience has taught me that attempting to change a format that is handled within the framework itself is an attempt at futility. Pick your battles right? As a Splunker, you can save yourself a headache on this kind of thing. Summary While I say that, I will add an addendum by saying that Splunk, mixed with a Splunk expert and the right development resources, can also make the data I just mentioned extremely valuable. It will likely not happen as fast as they make it out to be at a presentation, and it will take more resources than you may have thought, however at the end of your Splunk journey, you will be happy. This article was to help you understand the importance of logs formatting, and how logs are written. We often don't think about our logs proactively, and I encourage you to do so. Resources for Article: Further resources on this subject: Logging and Monitoring [Article] Logging and Reports [Article] Using Events, Interceptors, and Logging Services [Article]
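Before leaving the topic of log formatting, here is a minimal Python sketch using the standard logging module mentioned above, showing one way to make events machine-friendly for a log analyzer. The key=value layout and the field names are illustrative assumptions, not a requirement of Splunk or of this article's author.

import logging

# Key=value pairs in a flat, predictable layout are easy for log indexers
# to parse into searchable fields.
logging.basicConfig(
    format='%(asctime)s level=%(levelname)s component=%(name)s %(message)s',
    level=logging.INFO,
)

log = logging.getLogger('payments')
log.info('action=charge status=failure user_id=42 reason="card_declined"')

The same information could be logged as free-form prose, but keeping the field names stable across the codebase is what lets an operations team search and chart on them without reading the source.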
Migrating from Version 3

Packt
11 Aug 2016
11 min read
In this article by Matt Lambert, the author of the book Learning Bootstrap 4, has covered how to migrate your Bootstrap 3 project to Version 4 of Bootstrap is a major update. Almost the entire framework has been rewritten to improve code quality, add new components, simplify complex components, and make the tool easier to use overall. We've seen the introduction of new components like Cards and the removal of a number of basic components that weren't heavily used. In some cases, Cards present a better way of assembling a layout than a number of the removed components. Let's jump into this article by showing some specific class and behavioral changes to Bootstrap in version 4. (For more resources related to this topic, see here.) Browser support Before we jump into the component details, let's review the new browser support. If you are currently running on version 3 and support some older browsers, you may need to adjust your support level when migrating to Bootstrap 4. For desktop browsers, Internet Explorer version 8 support has been dropped. The new minimum Internet Explorer version that is supported is version 9. Switching to mobile, iOS version 6 support has been dropped. The minimum iOS supported is now version 7. The Bootstrap team has also added support for Android v5.0 Lollipop's browser and WebView. Earlier versions of the Android Browser and WebView are not officially supported by Bootstrap. Big changes in version 4 Let's continue by going over the biggest changes to the Bootstrap framework in version 4. Switching to Sass Perhaps the biggest change in Bootstrap 4 is the switch from Less to Sass. This will also likely be the biggest migration job you will need to take care of. The good news is you can use the sample code we've created in the book as a starting place. Luckily, the syntax for the two CSS pre-processors is not that different. If you haven't used Sass before, there isn't a huge learning curve that you need to worry about. Let's cover some of the key things you'll need to know when updating your stylesheets for Sass. Updating your variables The main difference in variables is the symbol used to denote one. In Less we use an @ symbol for our variables, while in Sass you use the $ symbol. Here are a couple of examples for you: /* LESS */ @red: #c00; @black: #000; @white: #fff; /* SASS */ $red: #c00; $black: #000; $white: #fff; As you can see, that is pretty easy to do. A simple find and replace should do most of the work for you. However, if you are using @import in your stylesheets, make sure there remains an @ symbol. Updating @import statements Another small change in Sass is how you import different stylesheets using the @import keyword. First, let's take a look at how you do this in Less: @import "components/_buttons.less"; Now let's compare how we do this using Sass: @import "components/_buttons.scss"; As you can see, it's almost identical. You just need to make sure you name all your files with the .scss extension. Then update your file names in the @import to use .scss and not .less. Updating mixins One of the biggest differences between Less and Sass are mixins. Here we'll need to do a little more heavy lifting when we update the code to work for Sass. First, let's take a look at how we would create a border-radius, or round corners, mixin in Less: .border-radius (@radius: 2px) { -moz-border-radius: @radius; -ms-border-radius: @radius; border-radius: @radius; } In Less, all elements that use the border-radius mixin will have a border radius of 2px. 
That is added to a component like this:

button {
  .border-radius;
}

Now let's compare how you would do the same thing using Sass. Check out the mixin code:

@mixin border-radius($radius) {
  -webkit-border-radius: $radius;
  -moz-border-radius: $radius;
  -ms-border-radius: $radius;
  border-radius: $radius;
}

There are a few differences here that you need to note:

  • You need to use the @mixin keyword to initialize any mixin
  • We don't actually define a global value to use with the mixin

To use the mixin with a component, you would code it like this:

button {
  @include border-radius(2px);
}

This is also different from Less in a few ways:

  • First, you need to insert the @include keyword to call the mixin
  • Next, you use the mixin name you defined earlier, in this case, border-radius
  • Finally, you need to set the value for the border-radius for each element, in this case, 2px

Personally, I prefer the Less method, as you can set the value once and then forget about it. However, since Bootstrap has moved to Sass, we have to learn and use the new syntax. That concludes the main differences that you will likely encounter. There are other differences, and if you would like to research them more, I would check out this page: http://sass-lang.com/guide.

Additional global changes

The change to Sass is one of the bigger global differences in version 4 of Bootstrap. Let's take a look at a few others you should be aware of.

Using REM units

In Bootstrap 4, px has been replaced with rem as the primary unit of measure. If you are unfamiliar with rem, it stands for root em. Rem is a relative unit of measure, whereas pixels are fixed. Rem looks at the value for font-size on the root element in your stylesheet and then uses your value declaration, in rems, to determine the computed pixel value. Let's use an example to make this easier to understand:

html {
  font-size: 24px;
}

p {
  font-size: 2rem;
}

In this case, the computed font-size for the <p> tag would be 48px. This is different from the em unit, because ems are affected by wrapping elements that may have a different size, whereas rem takes a simpler approach and just calculates everything from the root HTML element. It removes the size cascading that can occur when using ems and nested, complicated elements. This may sound confusing, but rem units are actually easier to use: just remember your root font-size and use that when figuring out your rem values.

What this means for migration is that you will need to go through your stylesheet and change any px or em values to use rems. You'll need to recalculate everything to make sure it fits the new format if you want to maintain the same look and feel for your project.

Other font updates

The trend for a long while has been to make text on a screen larger and easier to read for all users. In the past, we used tons of small typefaces that might have looked cool but were hard to read for anyone visually challenged. To that end, the base font-size for Bootstrap has been changed from 14px to 16px. This is also the standard size for most browsers and makes the readability of text better. Again, from a migration standpoint, you'll need to review your components to ensure they still look correct with the increased font size. You may need to make some changes if you have components that were based on the 14px default font-size in Bootstrap 3.

New grid size

With the increased use of mobile devices, Bootstrap 4 includes a new smaller grid tier for small screen devices.
The new grid tier is called extra small and is configured for devices under 480px in width. For the migration story this shouldn't have a big effect. What it does do is give you a new breakpoint if you want to further optimize your project for smaller screens. That concludes the main global changes to Bootstrap that you should be aware of when migrating your projects. Next, let's take a look at components.

Migrating components

With the release of Bootstrap 4, a few components have been dropped and a couple of new ones have been added. The most significant change is around the new Cards component. Let's start by breaking down this new option.

Migrating to the Cards component

With the release of the Cards component, the Panels, Thumbnails, and Wells components have been removed from Bootstrap 4. Cards combines the best of these elements into one and even adds some new functionality that is really useful. If you are migrating from a Bootstrap 3 project, you'll need to update any Panels, Thumbnails, or Wells to use the Cards component instead. Since the markup is a bit different, I would recommend just removing the old components altogether and then recoding them as Cards using the same content.

Using icon fonts

The Glyphicons icon font has been removed from Bootstrap 4. I'm guessing this is due to licensing reasons, as the library was not fully open source. If you don't want to update your icon code, simply download the library from the Glyphicons website at: http://glyphicons.com/

The other option would be to change the icon library to a different one, like Font Awesome. If you go down this route, you'll need to update all of your <i> tags to use the proper CSS class to render the icon. There is a quick reference tool that will help you do this called GlyphSearch. This tool supports a number of icon libraries and I use it all the time. Check it out at: http://glyphsearch.com/.

Those are the key components you need to be aware of. Next let's go over what's different in JavaScript.

Migrating JavaScript

The JavaScript components have been totally rewritten in Bootstrap 4. Everything is now coded in ES6 and compiled with Babel, which makes it easier and faster to use. On the component side, the biggest difference is the Tooltips component. The Tooltip is now dependent on an external library called Tether, which you can download from: http://github.hubspot.com/tether/. If you are using Tooltips, make sure you include this library in your template. The actual markup for calling a Tooltip looks to be the same, but you must include the new library when migrating from version 3 to 4.

Miscellaneous migration changes

Aside from what I've gone over already, there are a number of other changes you need to be aware of when migrating to Bootstrap 4. Let's go through them all below.

Migrating typography

The .page-header class has been dropped from version 4. Instead, you should look at using the new display CSS classes on your headers if you want to give them a heading look and feel.

Migrating images

If you've ever used responsive images in the past, the class name has changed. Previously, the class name was .img-responsive, but it is now named .img-fluid. You'll need to update that class anywhere it is used.

Migrating tables

For the table component, a few class names have changed and there are some new classes you can use. If you would like to create a responsive table, you can now simply add the class .table-responsive to the <table> tag. Previously, you had to wrap the <table> tag in an element with that class.
If migrating, you'll need to update your HTML markup to the new format. The .table-condensed class has been renamed to .table-sm. You'll need to update that class anywhere it is used. There are a couple of new table styles you can add called .table-inverse or .table-reflow. Migrating forms Forms are always a complicated component to code. In Bootstrap 4, some of the class names have changed to be more consistent. Here's a list of the differences you need to know about: control-label is now .form-control-label input-lg and .input-sm are now .form-control-lg and .form-control-sm The .form-group class has been dropped and you should instead use .form-control You likely have these classes throughout most of your forms. You'll need to update them anywhere they are used. Migrating buttons There are some minor CSS class name changes that you need to be aware of: btn-default is now .btn-secondary The .btn-xs class has been dropped from Bootstrap 4 Again, you'll need to update these classes when migrating to the new version of Bootstrap. There are some other minor changes when migrating on components that aren't as commonly used. I'm confident my explanation will cover the majority of use cases when using Bootstrap 4. However, if you would like to see the full list of changes, please visit: http://v4-alpha.getbootstrap.com/migration/. Summary That brings this article to a close! Hope you are able to migrate your Bootstrap 3 project to Bootstrap 4. Resources for Article: Further resources on this subject: Responsive Visualizations Using D3.js and Bootstrap [article] Advanced Bootstrap Development Tools [article] Deep Customization of Bootstrap [article]

AWS for Mobile Developers - Getting Started with Mobile Hub

Raka Mahesa
11 Aug 2016
5 min read
Amazon Web Services, also known as AWS, is a go-to destination for developers to host their server-side apps. AWS isn't exactly beginner-friendly, however. So if you're a mobile developer who doesn't have much knowledge in the backend field, AWS, with its countless weirdly-named services, may look like a complex beast. Well, Amazon has decided to rectify that. In late 2015, they rolled out the AWS Mobile Hub, a new dashboard for managing Amazon services on a mobile platform. The most important part about it is that it's easy to use. AWS Mobile Hub features the following services:
User authentication via Amazon Cognito
Data, file, and resource storage using Amazon S3 and CloudFront
Backend code with AWS Lambda
Push notifications using Amazon SNS
Analytics using Amazon Mobile Analytics
After seeing this list of services, you may still think it's complicated and that you only need one or two services from that list. The good news is that the hub allows you to cherry-pick the services you want instead of forcing you to use all of them. So, if your mobile app only needs to access some files on the Internet, then you can choose to only use the resource storage functionality and skip the other features. So, is AWS Mobile Hub basically a more focused version of the AWS dashboard? Well, it's more than just that. The hub is able to configure the Amazon services that you're going to use, so they're automatically tailored to your app. The hub will then generate Android and iOS code for connecting to the services you just set up, so that you can quickly copy it to your app and use the services right away. Do note that most of what the hub does automatically can also be achieved by integrating the AWS SDK and configuring each service manually. Fortunately, you can easily add Amazon services that aren't included in the hub. So, if you also want to use the Amazon DynamoDB service in your app, all you have to do is call the relevant DynamoDB functions from the AWS SDK and you're good to go. All right, enough talking. Let's give the AWS Mobile Hub a whirl! We're going to use the hub for the Android platform, so make sure you satisfy the following requirements:
Android Studio v1.2 or above
Android 4.4 (API 19) SDK
Android SDK Build-tools 23.0.1
Let's start by opening the AWS Mobile Hub to create a new mobile project (you will be asked to log in to your Amazon account if you haven't done so). After entering the project name, you are presented with the hub dashboard, where you can choose the services you want to add to your app. Let's start by adding the User Sign-in functionality. There are a couple of steps that must be completed to configure the authentication service. First you need to figure out whether your app can be used without logging in or not (for example, users can use Imgur without logging in, but they have to log in to use Facebook). If a user doesn't need to be logged in, choose "Sign-in is optional"; otherwise, choose that sign-in is required. The next step is to add the actual authentication method. You can use your own authentication method, but that requires setting up a server, so let's go with third-party authentication instead. When choosing a third-party authentication provider, create a corresponding app on that provider's website and then copy the required information to the Mobile Hub dashboard. When that's done, save the changes you made and return to the service picker.
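Before we add the remaining services, it's worth seeing roughly what the authentication wiring looks like in code. The snippet below is not the code Mobile Hub generates for you; it is a minimal hand-written sketch using the AWS SDK for Android, and the class name, identity pool ID, and region are placeholder assumptions for illustration only:

import android.content.Context;

import com.amazonaws.auth.CognitoCachingCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3Client;

public class AwsClients {
    // Placeholder values -- Mobile Hub generates the real pool ID and region for your project.
    private static final String IDENTITY_POOL_ID =
            "us-east-1:00000000-0000-0000-0000-000000000000";
    private static final Regions REGION = Regions.US_EAST_1;

    // Builds a credentials provider backed by Amazon Cognito.
    public static CognitoCachingCredentialsProvider credentials(Context context) {
        return new CognitoCachingCredentialsProvider(
                context.getApplicationContext(), IDENTITY_POOL_ID, REGION);
    }

    // The same provider can then back other service clients, such as S3 for file storage.
    public static AmazonS3Client s3Client(Context context) {
        return new AmazonS3Client(credentials(context));
    }
}

The generated sample project hides these details behind helper classes, but conceptually every service client it creates is fed by a Cognito credentials provider along these lines.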
Except for the Cloud Logic service, the configurations for the other services are quite straightforward, so let's add User Data Storage and App Content Delivery services. If you want to integrate Cloud Logic, you will be directed to the Amazon Lambda dashboard, where you will need to write a function with JavaScript that will be run on the server. So let's leave it to another time for now. All right, you should be all set up now, so let's proceed with building the base Android app. Click on the build button on the menu to the left and then choose Android. The Hub will then configure all the services you chose earlier and provide you with an Android project that has been integrated with all of the necessary SDK, including the SDK for the 3rd party authentication. It's pretty nice, isn't it? Download the Android project and unzip it, and make note of the "MySampleApp" folder inside it. Fire up Android Studio and import (File > New > Import Project...) that folder. Wait for the project to finish syncing, and once it's done, try running it in your Android device to see if AWS was integrated successfully or not. And that's it. All of the code needed to connect to the Amazon services you have set up earlier can be found in the MySampleApp project. Now you can simply copy that to your actual project or use the project as a base to build the app you want. Check out the build section of the dashboard for a more detailed explanation of the generated codes.  About the author Raka Mahesa is a game developer at Chocoarts (http://chocoarts.com/) who is interested in digital technology in general. Outside of work hours, he likes to work on his own projects, with Corridoom VR (https://play.google.com/store/apps/details?id=com.rakamahesa.corridoom) being his latest released game. Raka also regularly tweets as @legacy99.

Go Programming Control Flow

Packt
10 Aug 2016
13 min read
In this article, Vladimir Vivien, author of the book Learning Go Programming, explains some of the basic control flow of the Go programming language. Go borrows much of its control flow syntax from its C-family of languages. It supports all of the expected control structures, including if-else, switch, for-loop, and even goto. Conspicuously absent, though, are while or do-while statements. The following topics examine Go's control flow elements, some of which you may already be familiar with and others that bring a new set of functionalities not found in other languages:
The if statement
The switch statement
The type switch
(For more resources related to this topic, see here.)
The If Statement
The if statement, in Go, borrows its basic structural form from other C-like languages. The statement conditionally executes a code block when the Boolean expression that follows the if keyword evaluates to true, as illustrated in the following abbreviated program that displays information about world currencies. import "fmt" type Currency struct { Name string Country string Number int } var CAD = Currency{ Name: "Canadian Dollar", Country: "Canada", Number: 124} var FJD = Currency{ Name: "Fiji Dollar", Country: "Fiji", Number: 242} var JMD = Currency{ Name: "Jamaican Dollar", Country: "Jamaica", Number: 388} var USD = Currency{ Name: "US Dollar", Country: "USA", Number: 840} func main() { num0 := 242 if num0 > 100 || num0 < 900 { fmt.Println("Currency: ", num0) printCurr(num0) } else { fmt.Println("Currency unknown") } if num1 := 388; num1 > 100 || num1 < 900 { fmt.Println("Currency:", num1) printCurr(num1) } } func printCurr(number int) { if CAD.Number == number { fmt.Printf("Found: %+vn", CAD) } else if FJD.Number == number { fmt.Printf("Found: %+vn", FJD) } else if JMD.Number == number { fmt.Printf("Found: %+vn", JMD) } else if USD.Number == number { fmt.Printf("Found: %+vn", USD) } else { fmt.Println("No currency found with number", number) } } The if statement in Go looks similar to other languages. However, it sheds a few syntactic rules while enforcing new ones. The parentheses around the test expression are not necessary. While the following if statement will compile, it is not idiomatic: if (num0 > 100 || num0 < 900) { fmt.Println("Currency: ", num0) printCurr(num0) } Use instead: if num0 > 100 || num0 < 900 { fmt.Println("Currency: ", num0) printCurr(num0) } The curly braces for the code block are always required. The following snippet will not compile: if num0 > 100 || num0 < 900 printCurr(num0) However, this will compile: if num0 > 100 || num0 < 900 {printCurr(num0)} It is idiomatic, however, to write the if statement on multiple lines (no matter how simple the statement block may be). This encourages good style and clarity. The following snippet will compile with no issues: if num0 > 100 || num0 < 900 {printCurr(num0)} However, the preferred idiomatic layout for the statement is to use multiple lines as follows: if num0 > 100 || num0 < 900 { printCurr(num0) } The if statement may include an optional else block, which is executed when the expression in the if block evaluates to false. The code in the else block must be wrapped in curly braces using multiple lines, as shown in the following. if num0 > 100 || num0 < 900 { fmt.Println("Currency: ", num0) printCurr(num0) } else { fmt.Println("Currency unknown") } The else keyword may be immediately followed by another if statement, forming an if-else-if chain, as used in the function printCurr() from the source code listed earlier.
if CAD.Number == number { fmt.Printf("Found: %+vn", CAD) } else if FJD.Number == number { fmt.Printf("Found: %+vn", FJD) The if-else-if statement chain can grow as long as needed and may be terminated by an optional else statement to express all other untested conditions. Again, this is done in the printCurr() function which tests four conditions using the if-else-if blocks. Lastly, it includes an else statement block to catch any other untested conditions: func printCurr(number int) { if CAD.Number == number { fmt.Printf("Found: %+vn", CAD) } else if FJD.Number == number { fmt.Printf("Found: %+vn", FJD) } else if JMD.Number == number { fmt.Printf("Found: %+vn", JMD) } else if USD.Number == number { fmt.Printf("Found: %+vn", USD) } else { fmt.Println("No currency found with number", number) } } In Go, however, the idiomatic and cleaner way to write such a deep if-else-if code block is to use an expressionless switch statement. This is covered later in the section on SwitchStatement. If Statement Initialization The if statement supports a composite syntax where the tested expression is preceded by an initialization statement. At runtime, the initialization is executed before the test expression is evaluated as illustrated in this code snippet (from the program listed earlier). if num1 := 388; num1 > 100 || num1 < 900 { fmt.Println("Currency:", num1) printCurr(num1) } The initialization statement follows normal variable declaration and initialization rules. The scope of the initialized variables is bound to the if statement block beyond which they become unreachable. This is a commonly used idiom in Go and is supported in other flow control constructs covered in this article. Switch Statements Go also supports a switch statement similarly to that found in other languages such as C or Java. The switch statement in Go achieves multi-way branching by evaluating values or expressions from case clauses as shown in the following abbreviated source code: import "fmt" type Curr struct { Currency string Name string Country string Number int } var currencies = []Curr{ Curr{"DZD", "Algerian Dinar", "Algeria", 12}, Curr{"AUD", "Australian Dollar", "Australia", 36}, Curr{"EUR", "Euro", "Belgium", 978}, Curr{"CLP", "Chilean Peso", "Chile", 152}, Curr{"EUR", "Euro", "Greece", 978}, Curr{"HTG", "Gourde", "Haiti", 332}, ... } func isDollar(curr Curr) bool { var bool result switch curr { default: result = false case Curr{"AUD", "Australian Dollar", "Australia", 36}: result = true case Curr{"HKD", "Hong Kong Dollar", "Hong Koong", 344}: result = true case Curr{"USD", "US Dollar", "United States", 840}: result = true } return result } func isDollar2(curr Curr) bool { dollars := []Curr{currencies[2], currencies[6], currencies[9]} switch curr { default: return false case dollars[0]: fallthrough case dollars[1]: fallthrough case dollars[2]: return true } return false } func isEuro(curr Curr) bool { switch curr { case currencies[2], currencies[4], currencies[10]: return true default: return false } } func main() { curr := Curr{"EUR", "Euro", "Italy", 978} if isDollar(curr) { fmt.Printf("%+v is Dollar currencyn", curr) } else if isEuro(curr) { fmt.Printf("%+v is Euro currencyn", curr) } else { fmt.Println("Currency is not Dollar or Euro") } dol := Curr{"HKD", "Hong Kong Dollar", "Hong Koong", 344} if isDollar2(dol) { fmt.Println("Dollar currency found:", dol) } } The switch statement in Go has some interesting properties and rules that make it easy to use and reason about. 
Semantically, Go's switch statement can be used in two contexts: an expression switch statement and a type switch statement.
The break statement can be used to escape out of a switch code block early.
The switch statement can include a default case when no other case expression evaluates to a match. There can only be one default case, and it may be placed anywhere within the switch block.
Using Expression Switches
Expression switches are flexible and can be used in many contexts where the control flow of a program needs to follow multiple paths. An expression switch supports many attributes, as outlined in the following bullets. Expression switches can test values of any types. For instance, the following code snippet (from the previous program listing) tests values of struct type Curr. func isDollar(curr Curr) bool { var result bool switch curr { default: result = false case Curr{"AUD", "Australian Dollar", "Australia", 36}: result = true case Curr{"HKD", "Hong Kong Dollar", "Hong Kong", 344}: result = true case Curr{"USD", "US Dollar", "United States", 840}: result = true } return result } The expressions in case clauses are evaluated from left to right, top to bottom, until a value (or expression) is found that is equal to that of the switch expression. Upon encountering the first case that matches the switch expression, the program will execute the statements for the case block and then immediately exit the switch block. Unlike other languages, the Go case statement does not need to use a break to avoid falling through to the next case. For instance, calling isDollar(Curr{"HKD", "Hong Kong Dollar", "Hong Kong", 344}) will match the second case statement in the function above. The code will set result to true and exit the switch code block immediately. Case clauses can have multiple values (or expressions) separated by commas, with a logical OR operator implied between them. For instance, in the following snippet, the switch expression curr is tested against the values currencies[2], currencies[4], or currencies[10] using one case clause until a match is found. func isEuro(curr Curr) bool { switch curr { case currencies[2], currencies[4], currencies[10]: return true default: return false } } The switch statement is the cleaner and preferred idiomatic approach to writing complex conditional statements in Go. This is evident when the snippet above is compared to the following, which does the same comparison using if statements. func isEuro(curr Curr) bool { if curr == currencies[2] || curr == currencies[4] || curr == currencies[10] { return true } else { return false } } Fallthrough Cases There is no automatic fallthrough in Go's case clause as there is in the C or Java switch statements. Recall that a switch block will exit after executing its first matching case. The code must explicitly place the fallthrough keyword, as the last statement in a case block, to force the execution flow to fall through to the successive case block. The following code snippet shows a switch statement with a fallthrough in each case block. func isDollar2(curr Curr) bool { switch curr { case Curr{"AUD", "Australian Dollar", "Australia", 36}: fallthrough case Curr{"HKD", "Hong Kong Dollar", "Hong Kong", 344}: fallthrough case Curr{"USD", "US Dollar", "United States", 840}: return true default: return false } } When a case is matched, the fallthrough statements cascade down to the first statement of the successive case block. So if curr = Curr{"AUD", "Australian Dollar", "Australia", 36}, the first case will be matched.
Then the flow cascades down to the first statement of the second case block which is also a fallthrough statement. This causes the first statement, return true, of the third case block to execute. This is functionally equivalent to following snippet. switch curr { case Curr{"AUD", "Australian Dollar", "Australia", 36}, Curr{"HKD", "Hong Kong Dollar", "Hong Koong", 344}, Curr{"USD", "US Dollar", "United States", 840}: return true default: return false } Expressionless Switches Go supports a form of the switch statement that does not specify an expression. In this format, each case expression must evaluate to a Boolean value true. The following abbreviated source code illustrates the uses of an expressionless switch statement as listed in function find(). The function loops through the slice of Curr values to search for a match based on field values in the struct passed in: import ( "fmt" "strings" ) type Curr struct { Currency string Name string Country string Number int } var currencies = []Curr{ Curr{"DZD", "Algerian Dinar", "Algeria", 12}, Curr{"AUD", "Australian Dollar", "Australia", 36}, Curr{"EUR", "Euro", "Belgium", 978}, Curr{"CLP", "Chilean Peso", "Chile", 152}, ... } func find(name string) { for i := 0; i < 10; i++ { c := currencies[i] switch { case strings.Contains(c.Currency, name), strings.Contains(c.Name, name), strings.Contains(c.Country, name): fmt.Println("Found", c) } } } Notice in the previous example, the switch statement in function find() does not include an expression. Each case expression is separated by a comma and must be evaluated to a Boolean value with an implied OR operator between each case. The previous switch statement is equivalent to the following use of if statement to achieve the same logic. func find(name string) { for i := 0; i < 10; i++ { c := currencies[i] if strings.Contains(c.Currency, name) || strings.Contains(c.Name, name) || strings.Contains(c.Country, name){ fmt.Println("Found", c) } } } Switch Initializer The switch keyword may be immediately followed by a simple initialization statement where variables, local to the switch code block, may be declared and initialized. This convenient syntax uses a semicolon between the initializer statement and the switch expression to declare variables which may appear anywhere in the switch code block. The following code sample shows how this is done by initializing two variables name and curr as part of the switch declaration. func assertEuro(c Curr) bool { switch name, curr := "Euro", "EUR"; { case c.Name == name: return true case c.Currency == curr: return true } return false } The previous code snippet uses an expressionless switch statement with an initializer. Notice the trailing semicolon to indicate the separation between the initialization statement and the expression area for the switch. In the example, however, the switch expression is empty. Type Switches Given Go's strong type support, it should be of little surprise that the language supports the ability to query type information. The type switch is a statement that uses the Go interface type to compare underlying type information of values (or expressions). A full discussion on interface types and type assertion is beyond the scope of this section. For now all you need to know is that Go offers the type interface{}, or empty interface, as a super type that is implemented by all other types in the type system. 
When a value is assigned type interface{}, it can be queried using the type switch as, shown in function findAny() in following code snippet, to query information about its underlying type. func find(name string) { for i := 0; i < 10; i++ { c := currencies[i] switch { case strings.Contains(c.Currency, name), strings.Contains(c.Name, name), strings.Contains(c.Country, name): fmt.Println("Found", c) } } } func findNumber(num int) { for _, curr := range currencies { if curr.Number == num { fmt.Println("Found", curr) } } } func findAny(val interface{}) { switch i := val.(type) { case int: findNumber(i) case string: find(i) default: fmt.Printf("Unable to search with type %Tn", val) } } func main() { findAny("Peso") findAny(404) findAny(978) findAny(false) } The function findAny() takes an interface{} as its parameter. The type switch is used to determine the underlying type and value of the variable val using the type assertion expression: switch i := val.(type) Notice the use of the keyword type in the type assertion expression. Each case clause will be tested against the type information queried from val.(type). Variable i will be assigned the actual value of the underlying type and is used to invoke a function with the respective value. The default block is invoked to guard against any unexpected type assigned to the parameter val parameter. Function findAny may then be invoked with values of diverse types, as shown in the following code snippet. findAny("Peso") findAny(404) findAny(978) findAny(false) Summary This article gave a walkthrough of the mechanism of control flow in Go including if, switch statements. While Go’s flow control constructs appear simple and easy to use, they are powerful and implement all branching primitives expected for a modern language. Resources for Article: Further resources on this subject: Game Development Using C++ [Article] Boost.Asio C++ Network Programming [Article] Introducing the Boost C++ Libraries [Article]

The Third Dimension

Packt
10 Aug 2016
13 min read
In this article by Sebastián Di Giuseppe, author of the book Building a 3D Game with LibGDX, we describe how to work in three dimensions, for which we require new camera techniques. The third dimension adds a new axis, so instead of having just the x and y grid we get a slightly different workflow, and lastly new render methods are required to draw our game. We'll learn the very basics of this workflow in this article so that you have a sense of what's coming, such as moving, scaling, materials, environment, and some others, and we are going to move systematically between them one step at a time. (For more resources related to this topic, see here.) The following topics will be covered in this article:
Camera techniques
Workflow
LibGDX's 3D rendering API
Math
Camera techniques
The goal of this article is to successfully learn about working with 3D, as stated. In order to achieve this, we will start at the basics, making a simple first person camera. We will make use of the functions and math that LibGDX provides. Since you have probably used LibGDX more than once, you should be familiar with the concepts of the camera in 2D. The way 3D works is more or less the same, except there is a z axis now for the depth. However, instead of an OrthographicCamera class, a PerspectiveCamera class is used to set up the 3D environment. Creating a 3D camera is just as easy as creating a 2D camera. The constructor of a PerspectiveCamera class requires three arguments: the field of vision, the camera width, and the camera height. The camera width and height are known from 2D cameras; the field of vision is new. Initialization of a PerspectiveCamera class looks like this: float FoV = 67; PerspectiveCamera camera = new PerspectiveCamera(FoV, Gdx.graphics.getWidth(), Gdx.graphics.getHeight()); The first argument, the field of vision, describes the angle the first person camera can see. For first person shooters, values up to 100 are used. Higher than 100 confuses the player, and with a lower field of vision the player is bound to see less.
Displaying a texture
We will start by doing something exciting, drawing a cube on the screen!
Drawing a cube
First things first! Let's create a camera. Earlier, we showed the difference between the 2D camera and the 3D camera, so let's put this to use. Start by creating a new class in your main package (ours is com.deeep.spaceglad) and name it as you like. The following imports are used in our test: import com.badlogic.gdx.ApplicationAdapter; import com.badlogic.gdx.Gdx; import com.badlogic.gdx.graphics.Color; import com.badlogic.gdx.graphics.GL20; import com.badlogic.gdx.graphics.PerspectiveCamera; import com.badlogic.gdx.graphics.VertexAttributes; import com.badlogic.gdx.graphics.g3d.*; import com.badlogic.gdx.graphics.g3d.attributes.ColorAttribute; import com.badlogic.gdx.graphics.g3d.environment.DirectionalLight; import com.badlogic.gdx.graphics.g3d.utils.ModelBuilder; Create a class member called cam of type PerspectiveCamera: public PerspectiveCamera cam; Now this camera needs to be initialized and configured. This will be done in the create method as shown below. public void create() { cam = new PerspectiveCamera(67, Gdx.graphics.getWidth(), Gdx.graphics.getHeight()); cam.position.set(10f, 10f, 10f); cam.lookAt(0,0,0); cam.near = 1f; cam.far = 300f; cam.update(); } In the above code snippet we are setting the position of the camera and looking towards a point set at 0, 0, 0. Next up is getting a cube ready to draw.
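Before we do, a quick aside on camera handling. While we will build our own movement code later in this article, LibGDX also ships a ready-made CameraInputController (in the g3d utils package) that lets you orbit, pan, and zoom a PerspectiveCamera with the mouse, which is handy for inspecting a scene while debugging. The following is a minimal sketch of how it could be wired into the class we are building; the field name camController is my own choice, and the controller's default behavior may differ slightly between LibGDX versions:

import com.badlogic.gdx.Gdx;
import com.badlogic.gdx.graphics.g3d.utils.CameraInputController;

public CameraInputController camController;

public void create() {
    // ... after cam has been configured as shown above ...
    camController = new CameraInputController(cam);
    Gdx.input.setInputProcessor(camController);
}

public void render() {
    // Let the controller process any pending mouse/touch input each frame.
    camController.update();
    // ... drawing code goes here ...
}

Now, on to the cube.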
In 2D it was possible to draw textures, but textures are flat. In 3D, models are used. Later on we will import those models. But we will start with generated models. LibGDX offers a convenient class to build simple models such as: spheres, cubes, cylinders, and many more to choose from. Let's add two more class members, a Model and a ModelInstance. The Model class contains all the information on what to draw, and the resources that go along with it. The ModelInstance class has information on the whereabouts of the model such as the location rotation and scale of the model. public Model model; public ModelInstance instance; Add those class members. We use the overridden create function to initialize our new class members. public void create() { … ModelBuilder modelBuilder = new ModelBuilder();Material mat = new Material(ColorAttribute.createDiffuse(Color.BLUE));model = modelBuilder.createBox(5, 5, 5, mat, VertexAttributes.Usage.Position | VertexAttributes.Usage.Normal);instance = new ModelInstance(model); } We use a ModelBuilder class to create a box. The box will need a material, a color. A material is an object that holds different attributes. You could add as many as you would like. The attributes passed on to the material changes the way models are perceived and shown on the screen. We could, for example, add FloatAttribute.createShininess(8f) after the ColorAttribute class, that will make the box to shine with lights around. There are more complex configurations possible but we will leave that out of the scope for now. With the ModelBuilder class, we create a box of (5, 5, 5). Then we pass the material in the constructor, and the fifth argument are attributes for the specific box we are creating. We use a bitwise operator to combine a position attribute and a normal attribute. We tell the model that it has a position, because every cube needs a position, and the normal is to make sure the lighting works and the cube is drawn as we want it to be drawn. These attributes are passed down to openGL on which LibGDX is build. Now we are almost ready for drawing our first cube. Two things are missing, first of all: A batch to draw to. When designing 2D games in LibGDX a SpriteBatch class is used. However since we are not using sprites anymore, but rather models, we will use a ModelBatch class. Which is the equivalent for models. And lastly, we will have to create an environment and add lights to it. For that we will need two more class members: public ModelBatchmodelBatch; public Environment environment; And they are to be initialized, just like the other class members: public void create() { .... modelBatch = new ModelBatch(); environment = new Environment(); environment.set(new ColorAttribute(ColorAttribute.AmbientLight, 0.4f, 0.4f, 0.4f, 1f)); environment.add(new DirectionalLight().set(0.8f, 0.8f, 0.8f, - 1f, -0.8f, -0.2f)); } Here we add two lights, an ambient light, which lights up everything that is being drawn (a general light source for all the environment), and a directional light, which has a direction (most similar to a "sun" type of source). In general, for lights, you can experiment directions, colors, and different types. Another type of light would be PointLight and it can be compared to a flashlight. Both lights start with 3 arguments, for the color, which won't make a difference yet as we don't have any textures. The directional lights constructor is followed by a direction. This direction can be seen as a vector. 
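As a brief illustration of the PointLight type mentioned above, here is a hedged sketch of how one could be added to the same environment inside create(). The color, position, and intensity values below are arbitrary picks of mine, so treat them as a starting point for experimentation rather than recommended settings:

import com.badlogic.gdx.graphics.g3d.environment.PointLight;

// Somewhere in create(), after environment has been initialized:
PointLight lamp = new PointLight().set(
        0.8f, 0.8f, 0.6f,   // color (r, g, b)
        5f, 5f, 5f,         // position in world space, near the cube in this example
        40f);               // intensity
environment.add(lamp);

Unlike the DirectionalLight, a PointLight fades with distance from its position, so its effect depends heavily on where the lit model sits relative to it.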
Now we are all set to draw our environment and the model in it @Override public void render() { Gdx.gl.glViewport(0, 0, Gdx.graphics.getWidth(), Gdx.graphics.getHeight()); Gdx.gl.glClear(GL20.GL_COLOR_BUFFER_BIT | GL20.GL_DEPTH_BUFFER_BIT); modelBatch.begin(cam); modelBatch.render(instance, environment); modelBatch.end(); } It directly renders our cube. The ModelBatch catch behaves just like a SpriteBatch, as can be seen if we run it, it has to be started (begin), then ask for it to render and give them the parameters (models and environment in our case), and then make it stop. We should not forget to release any resources that our game allocated. The model we created allocates memory that should be disposed of. @Override public void dispose() { model.dispose(); } Now we can look at our beautiful cube! It's only very static and empty. We will add some movement to it in our next subsection! Translation Translating rotating and scaling are a bit different to that of a 2D game. It's slightly more mathematical. The easier part are vectors, instead of a vector2D, we can now use a vector3D, which is essentially the same, just that, it adds another dimension. Let's look at some basic operations of 3D models. We will use the cube that we previously created. With translation we are able to move the model along all three the axis. Let's create a function that moves our cube along the x axis. We add a member variable to our class to store the position in for now. A Vector3 class. Vector3 position = new Vector3(); private void movement() { instance.transform.getTranslation(position); position.x += Gdx.graphics.getDeltaTime(); instance.transform.setTranslation(position); } The above code snippet retrieves the translation, adds the delta time to the x attribute of the translation. Then we set the translation of the ModelInstance. The 3D library returns the translation a little bit different than normally. We pass a vector, and that vector gets adjusted to the current state of the object. We have to call this function every time the game updates. So therefore we put it in our render loop before we start drawing. @Override public void render() { movement(); ... } It might seem like the cube is moving diagonally, but that's because of the angle of our camera. In fact it's' moving towards one face of the cube. That was easy! It's only slightly annoying that it moves out of bounds after a short while. Therefor we will change the movement function to contain some user input handling. private void movement() { instance.transform.getTranslation(position); if(Gdx.input.isKeyPressed(Input.Keys.W)){ position.x+=Gdx.graphics.getDeltaTime(); } if(Gdx.input.isKeyPressed(Input.Keys.D)){ position.z+=Gdx.graphics.getDeltaTime(); } if(Gdx.input.isKeyPressed(Input.Keys.A)){ position.z-=Gdx.graphics.getDeltaTime(); } if(Gdx.input.isKeyPressed(Input.Keys.S)){ position.x-=Gdx.graphics.getDeltaTime(); } instance.transform.setTranslation(position); } The rewritten movement function retrieves our position, updates it based on the keys that are pressed, and sets the translation of our model instance. Rotation Rotation is slightly different from 2D. Since there are multiple axes on which we can rotate, namely the x, y, and z axis. We will now create a function to showcase the rotation of the model. 
First off let us create a function in which  we can rotate an object on all axis private void rotate() { if (Gdx.input.isKeyPressed(Input.Keys.NUM_1)) instance.transform.rotate(Vector3.X, Gdx.graphics.getDeltaTime() * 100); if (Gdx.input.isKeyPressed(Input.Keys.NUM_2)) instance.transform.rotate(Vector3.Y, Gdx.graphics.getDeltaTime() * 100); if (Gdx.input.isKeyPressed(Input.Keys.NUM_3)) instance.transform.rotate(Vector3.Z, Gdx.graphics.getDeltaTime() * 100); } And let's not forget to call this function from the render loop, after we call the movement function @Override public void render() { ... rotate(); } If we press the number keys 1, 2 or 3, we can rotate our model. The first argument of the rotate function is the axis to rotate on. The second argument is the amount to rotate. These functions are to add a rotation. We can also set the value of an axis, instead of add a rotation, with the following function: instance.transform.setToRotation(Vector3.Z, Gdx.graphics.getDeltaTime() * 100); However say, we want to set all three axis rotations at the same time, we can't simply call setToRotation function three times in a row for each axis, as they eliminate any other rotation done before that. Luckily LibGDX has us covered with a function that is able to take all three axis. float rotation; private void rotate() { rotation = (rotation + Gdx.graphics.getDeltaTime() * 100) % 360; instance.transform.setFromEulerAngles(0, 0, rotation); } The above function will continuously rotate our cube. We face one last problem. We can't seem to move the cube! The setFromEulerAngles function clears all the translation and rotation properties. Lucky for us the setFromEulerAngles returns a Matrix4 type, so we can chain and call another function from it. A function which translates the matrix for example. For that we use the trn(x,y,z) function. Short for translate. Now we can update our rotation function, although it also translates. instance.transform.setFromEulerAngles(0, 0, rotation).trn(position.x, position.y, position.z); Now we can set our cube to a rotation, and translate it! These are the most basic operations which we will use a lot throughout the book. As you can see this function does both the rotation and the translation. So we can remove the last line in our movement function instance.transform.setTranslation(position); Our latest rotate function looks like the following: private void rotate() { rotation = (rotation + Gdx.graphics.getDeltaTime() * 100) % 360; instance.transform.setFromEulerAngles(0, 0, rotation).trn(position.x, position.y, position.z); } The setFromEulerAngles function will be extracted to a function of its own, as it serves multiple purposes now and is not solely bound to our rotate function. private void updateTransformation(){ instance.transform.setFromEulerAngles(0, 0, rotation).trn(position.x, position.y, position.z).scale(scale,scale,scale); } This function should be called after we've calculated our rotation and translation public void render() { rotate(); movement(); updateTransformation(); ... } Scaling We've almost had all of the transformations we can apply to models. The last one being described in this book is the scaling of a model. LibGDX luckily contains all the required functions and methods for this. Let's extend our previous example and make our box growing and shrinking over time. We first create a function that increments and subtracts from a scale variable. 
boolean increment; float scale = 1; void scale() { if (increment) { scale = (scale + Gdx.graphics.getDeltaTime()/5); if (scale >= 1.5f) { increment = false; } } else { scale = (scale - Gdx.graphics.getDeltaTime()/5); if (scale <= 0.5f) increment = true; } } Now, to apply this scaling, we can adjust our updateTransformation function to include the scaling. private void updateTransformation(){ instance.transform.setFromEulerAngles(0, 0, rotation).trn(position.x, position.y, position.z).scale(scale,scale,scale); } Our render method should now include the scaling function as well: public void render() { rotate(); movement(); scale(); updateTransformation(); ... } And there you go, we can now successfully move, rotate, and scale our cube!
Summary
In this article we learned about the workflow of the LibGDX 3D API. We are now able to apply multiple kinds of transformations to a model, and we understand the differences between 2D and 3D. We also learned how to apply materials to models, which change the appearance of the model and let us create cool effects. Note that there's plenty more to learn about 3D, and a lot of practice to go with it in order to fully understand it. There are also subjects not covered here, such as how to create your own materials, and how to make and use shaders. There's plenty of room for learning and experimenting. In the next article we will start applying the theory learned in this article and work towards an actual game! We will also go more in depth on the environment and lights, as well as collision detection. So there's plenty to look forward to.
Resources for Article:
Further resources on this subject:
3D Websites [Article]
Your 3D World [Article]
Using 3D Objects [Article]

Consistency Conflicts

Packt
10 Aug 2016
11 min read
In this article by Robert Strickland, author of the book Cassandra 3.x High Availability - Second Edition, we will discuss how for any given call, it is possible to achieve either strong consistency or eventual consistency. In the former case, we can know for certain that the copy of the data that Cassandra returns will be the latest. In the case of eventual consistency, the data returned may or may not be the latest, or there may be no data returned at all if the node is unaware of newly inserted data. Under eventual consistency, it is also possible to see deleted data if the node you're reading from has not yet received the delete request. (For more resources related to this topic, see here.) Depending on the read_repair_chance setting and the consistency level chosen for the read operation, Cassandra might block the client and resolve the conflict immediately, or this might occur asynchronously. If data in conflict is never requested, the system will resolve the conflict the next time nodetool repair is run. How does Cassandra know there is a conflict? Every column has three parts: key, value, and timestamp. Cassandra follows last-write-wins semantics, which means that the column with the latest timestamp always takes precedence. Now, let's discuss one of the most important knobs a developer can turn to determine the consistency characteristics of their reads and writes. Consistency levels On every read and write operation, the caller must specify a consistency level, which lets Cassandra know what level of consistency to guarantee for that one call. The following table details the various consistency levels and their effects on both read and write operations: Consistency level Reads Writes ANY This is not supported for reads. Data must be written to at least one node, but permits writes via hinted handoff. Effectively allows a write to any node, even if all nodes containing the replica are down. A subsequent read might be impossible if all replica nodes are down. ONE The replica from the closest node will be returned. Data must be written to at least one replica node (both commit log and memtable). Unlike ANY, hinted handoff writes are not sufficient. TWO The replicas from the two closest nodes will be returned. The same as ONE, except two replicas must be written. THREE The replicas from the three closest nodes will be returned. The same as ONE, except three replicas must be written. QUORUM Replicas from a quorum of nodes will be compared, and the replica with the latest timestamp will be returned. Data must be written to a quorum of replica nodes (both commit log and memtable) in the entire cluster, including all data centers. SERIAL Permits reading uncommitted data as long as it represents the current state. Any uncommitted transactions will be committed as part of the read. Similar to QUORUM, except that writes are conditional based on the support for lightweight transactions. LOCAL_ONE Similar to ONE, except that the read will be returned by the closest replica in the local data center. Similar to ONE, except that the write must be acknowledged by at least one node in the local data center. LOCAL_QUORUM Similar to QUORUM, except that only replicas in the local data center are compared. Similar to QUORUM, except the quorum must only be met using the local data center. LOCAL_SERIAL Similar to SERIAL, except only local replicas are used. Similar to SERIAL, except only writes to local replicas must be acknowledged. 
EACH_QUORUM The opposite of LOCAL_QUORUM; requires each data center to produce a quorum of replicas, then returns the replica with the latest timestamp. The opposite of LOCAL_QUORUM; requires a quorum of replicas to be written in each data center. ALL Replicas from all nodes in the entire cluster (including all data centers) will be compared, and the replica with the latest timestamp will be returned. Data must be written to all replica nodes (both commit log and memtable) in the entire cluster, including all data centers. As you can see, there are numerous combinations of read and write consistency levels, all with different ultimate consistency guarantees. To illustrate this point, let's assume that you would like to guarantee absolute consistency for all read operations. On the surface, it might seem as if you would have to read with a consistency level of ALL, thus sacrificing availability in the case of node failure. But there are alternatives depending on your use case. There are actually two additional ways to achieve strong read consistency: Write with consistency level of ALL: This has the advantage of allowing the read operation to be performed using ONE, which lowers the latency for that operation. On the other hand, it means the write operation will result in UnavailableException if one of the replica nodes goes offline. Read and write with QUORUM or LOCAL_QUORUM: Since QUORUM and LOCAL_QUORUM both require a majority of nodes, using this level for both the write and the read will result in a full consistency guarantee (in the same data center when using LOCAL_QUORUM), while still maintaining availability during a node failure. You should carefully consider each use case to determine what guarantees you actually require. For example, there might be cases where a lost write is acceptable, or occasions where a read need not be absolutely current. At times, it might be sufficient to write with a level of QUORUM, then read with ONE to achieve maximum read performance, knowing you might occasionally and temporarily return stale data. Cassandra gives you this flexibility, but it's up to you to determine how to best employ it for your specific data requirements. A good rule of thumb to attain strong consistency is that the read consistency level plus write consistency level should be greater than the replication factor. If you are unsure about which consistency levels to use for your specific use case, it's typically safe to start with LOCAL_QUORUM (or QUORUM for a single data center) reads and writes. This configuration offers strong consistency guarantees and good performance while allowing for the inevitable replica failure. It is important to understand that even if you choose levels that provide less stringent consistency guarantees, Cassandra will still perform anti-entropy operations asynchronously in an attempt to keep replicas up to date. Repairing data Cassandra employs a multifaceted anti-entropy mechanism that keeps replicas in sync. Data repair operations generally fall into three categories: Synchronous read repair: When a read operation requires comparing multiple replicas, Cassandra will initially request a checksum from the other nodes. If the checksum doesn't match, the full replica is sent and compared with the local version. The replica with the latest timestamp will be returned and the old replica will be updated. This means that in normal operations, old data is repaired when it is requested. 
Asynchronous read repair: Each table in Cassandra has a setting called read_repair_chance (as well as its related setting, dclocal_read_repair_chance), which determines how the system treats replicas that are not compared during a read. The default setting of 0.1 means that 10 percent of the time, Cassandra will also repair the remaining replicas during read operations. Manually running repair: A full repair (using nodetool repair) should be run regularly to clean up any data that has been missed as part of the previous two operations. At a minimum, it should be run once every gc_grace_seconds, which is set in the table schema and defaults to 10 days. One might ask what the consequence would be of failing to run a repair operation within the window specified by gc_grace_seconds. The answer relates to Cassandra's mechanism to handle deletes. As you might be aware, all modifications (or mutations) are immutable, so a delete is really just a marker telling the system not to return that record to any clients. This marker is called a tombstone. Cassandra performs garbage collection on data marked by a tombstone each time a compaction occurs. If you don't run the repair, you risk deleted data reappearing unexpectedly. In general, deletes should be avoided when possible as the unfettered buildup of tombstones can cause significant issues. In the course of normal operations, Cassandra will repair old replicas when their records are requested. Thus, it can be said that read repair operations are lazy, such that they only occur when required. With all these options for replication and consistency, it can seem daunting to choose the right combination for a given use case. Let's take a closer look at this balance to help bring some additional clarity to the topic. Balancing the replication factor with consistency There are many considerations when choosing a replication factor, including availability, performance, and consistency. Since our topic is high availability, let's presume your desire is to maintain data availability in the case of node failure. It's important to understand exactly what your failure tolerance is, and this will likely be different depending on the nature of the data. The definition of failure is probably going to vary among use cases as well, as one case might consider data loss a failure, whereas another accepts data loss as long as all queries return. Achieving the desired availability, consistency, and performance targets requires coordinating your replication factor with your application's consistency level configurations. 
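Before we do, it is worth seeing where consistency levels are actually specified in application code, since they are a per-request setting rather than a schema property. The following is a minimal sketch using the DataStax Java driver (3.x-era API); the contact point, keyspace, table, and values are hypothetical, and this is an illustration of the idea rather than code from the book:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class ConsistencyExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("store"); // hypothetical keyspace

        // Write at LOCAL_QUORUM...
        SimpleStatement write = new SimpleStatement(
                "INSERT INTO users (id, name) VALUES (42, 'alice')");
        write.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        session.execute(write);

        // ...and read at LOCAL_QUORUM, so that read CL + write CL exceeds a replication
        // factor of 3, giving strong consistency while tolerating one replica failure.
        SimpleStatement read = new SimpleStatement(
                "SELECT name FROM users WHERE id = 42");
        read.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        ResultSet rs = session.execute(read);
        System.out.println(rs.one().getString("name"));

        cluster.close();
    }
}

With that mechanical detail in mind, let's look at the balancing act itself.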
In order to assist you in your efforts to achieve this balance, let's consider a single data center cluster of 10 nodes and examine the impact of various configuration combinations (where RF corresponds to the replication factor): RF Write CL Read CL Consistency Availability Use cases 1 ONE QUORUM ALL ONE QUORUM ALL Consistent Doesn't tolerate any replica loss Data can be lost and availability is not critical, such as analysis clusters 2 ONE ONE Eventual Tolerates loss of one replica Maximum read performance and low write latencies are required, and sometimes returning stale data is acceptable 2 QUORUM ALL ONE Consistent Tolerates loss of one replica on reads, but none on writes Read-heavy workloads where some downtime for data ingest is acceptable (improves read latencies) 2 ONE QUORUM ALL Consistent Tolerates loss of one replica on writes, but none on reads Write-heavy workloads where read consistency is more important than availability 3 ONE ONE Eventual Tolerates loss of two replicas Maximum read and write performance are required, and sometimes returning stale data is acceptable 3 QUORUM ONE Eventual Tolerates loss of one replica on write and two on reads Read throughput and availability are paramount, while write performance is less important, and sometimes returning stale data is acceptable 3 ONE QUORUM Eventual Tolerates loss of two replicas on write and one on reads Low write latencies and availability are paramount, while read performance is less important, and sometimes returning stale data is acceptable 3 QUORUM QUORUM Consistent Tolerates loss of one replica Consistency is paramount, while striking a balance between availability and read/write performance 3 ALL ONE Consistent Tolerates loss of two replicas on reads, but none on writes Additional fault tolerance and consistency on reads is paramount at the expense of write performance and availability 3 ONE ALL Consistent Tolerates loss of two replicas on writes, but none on reads Low write latencies and availability are paramount, but read consistency must be guaranteed at the expense of performance and availability 3 ANY ONE Eventual Tolerates loss of all replicas on write and two on read Maximum write and read performance and availability are paramount, and often returning stale data is acceptable (note that hinted writes are less reliable than the guarantees offered at CL ONE) 3 ANY QUORUM Eventual Tolerates loss of all replicas on write and one on read Maximum write performance and availability are paramount, and sometimes returning stale data is acceptable 3 ANY ALL Consistent Tolerates loss of all replicas on writes, but none on reads Write throughput and availability are paramount, and clients must all see the same data, even though they might not see all writes immediately There are also two additional consistency levels, SERIAL and LOCAL_SERIAL, which can be used to read the latest value, even if it is part of an uncommitted transaction. Otherwise, they follow the semantics of QUORUM and LOCAL_QUORUM, respectively. As you can see, there are numerous possibilities to consider when choosing these values, especially in a scenario involving multiple data centers. This discussion will give you greater confidence as you design your applications to achieve the desired balance. Summary In this article, we introduced the foundational concept of consistency. 
In our discussion, we outlined the importance of the relationship between replication factor and consistency level, and their impact on performance, data consistency, and availability. Resources for Article: Further resources on this subject: Cassandra Design Patterns [Article] Cassandra Architecture [Article] About Cassandra [Article]

Expanding Your Data Mining Toolbox

Packt
09 Aug 2016
15 min read
In this article by Megan Squire, author of Mastering Data Mining with Python, when faced with sensory information, human beings naturally want to find patterns to explain, differentiate, categorize, and predict. This process of looking for patterns all around us is a fundamental human activity, and the human brain is quite good at it. With this skill, our ancient ancestors became better at hunting, gathering, cooking, and organizing. It is no wonder that pattern recognition and pattern prediction were some of the first tasks humans set out to computerize, and this desire continues in earnest today. Depending on the goals of a given project, finding patterns in data using computers nowadays involve database systems, artificial intelligence, statistics, information retrieval, computer vision, and any number of other various subfields of computer science, information systems, mathematics, or business, just to name a few. No matter what we call this activity – knowledge discovery in databases, data mining, data science – its primary mission is always to find interesting patterns. (For more resources related to this topic, see here.) Despite this humble-sounding mission, data mining has existed for long enough and has built up enough variation in how it is implemented that it has now become a large and complicated field to master. We can think of a cooking school, where every beginner chef is first taught how to boil water and how to use a knife before moving to more advanced skills, such as making puff pastry or deboning a raw chicken. In data mining, we also have common techniques that even the newest data miners will learn: how to build a classifier and how to find clusters in data. The aim is to teach you some of the techniques you may not have seen yet in earlier data mining projects. In this article, we will cover the following topics: What is data mining? We will situate data mining in the growing field of other similar concepts, and we will learn a bit about the history of how this discipline has grown and changed. How do we do data mining? Here, we compare several processes or methodologies commonly used in data mining projects. What are the techniques used in data mining? In this article, we will summarize each of the data analysis techniques that are typically included in a definition of data mining. How do we set up a data mining work environment? Finally, we will walk through setting up a Python-based development environment. What is data mining? We explained earlier that the goal of data mining is to find patterns in data, but this oversimplification falls apart quickly under scrutiny. After all, could we not also say that finding patterns is the goal of classical statistics, or business analytics, or machine learning, or even the newer practices of data science or big data? What is the difference between data mining and all of these other fields, anyway? And while we are at it, why is it called data mining if what we are really doing is mining for patterns? Don't we already have the data? It was apparent from the beginning that the term data mining is indeed fraught with many problems. The term was originally used as something of a pejorative by statisticians who cautioned against going on fishing expeditions, where a data analyst is casting about for patterns in data without forming proper hypotheses first. 
Nonetheless, the term rose to prominence in the 1990s, as the popular press caught wind of exciting research that was marrying the mature field of database management systems with the best algorithms from machine learning and artificial intelligence. The inclusion of the word mining inspires visions of a modern-day Gold Rush, in which the persistent and intrepid miner will discover (and perhaps profit from) previously hidden gems. The idea that data itself could be a rare and precious commodity was immediately appealing to the business and technology press, despite efforts by early pioneers to promote more the holistic term knowledge discovery in databases (KDD). The term data mining persisted, however, and ultimately some definitions of the field attempted to re-imagine the term data mining to refer to just one of the steps in a longer, more comprehensive knowledge discovery process. Today, data mining and KDD are considered very similar, closely related terms. What about other related terms, such as machine learning, predictive analytics, big data, and data science? Are these the same as data mining or KDD? Let's draw some comparisons between each of these terms: Machine learning is a very specific subfield of computer science that focuses on developing algorithms that can learn from data in order to make predictions. Many data mining solutions will use techniques from machine learning, but not all data mining is trying to make predictions or learn from data. Sometimes we just want to find a pattern in the data. In fact, in this article we will be exploring a few data mining solutions that do use machine learning techniques, and many more that do not. Predictive analytics, sometimes just called analytics, is a general term for computational solutions that attempt to make predictions from data in a variety of domains. We can think of the terms business analytics, media analytics, and so on. Some, but not all, predictive analytics solutions will use machine learning techniques to perform their predictions. But again, in data mining, we are not always interested in prediction. Big data is a term that refers to the problems and solutions of dealing with very large sets of data, irrespective of whether we are searching for patterns in that data, or simply storing it. In terms of comparing big data to data mining, many data mining problems are made more interesting when the data sets are large, so solutions discovered for dealing with big data might come in handy to solve a data mining problem. Nonetheless, these two terms are merely complementary, not interchangeable. Data science is the closest of these terms to being interchangeable with the KDD process, of which data mining is one step. Because data science is an extremely popular buzzword at this time, its meaning will continue to evolve and change as the field continues to mature. To show the relative search interest for these various terms over time, we can look at Google Trends. This tool shows how frequently people are searching for various keywords over time. In the following figure, the newcomer term data science is currently the hot buzzword, with data mining pulling into second place, followed by machine learning, data science, and predictive analytics. (I tried to include the search term knowledge discovery in databases as well, but the results were so close to zero that the line was invisible.) The y-axis shows the popularity of that particular search term as a 0-100 indexed value. 
How do we do data mining?
Since data mining is traditionally seen as one of the steps in the overall KDD process, and increasingly as part of the data science process, in this article we get acquainted with the steps involved. There are several popular methodologies for doing the work of data mining. Here we highlight four methodologies: two that are taken from textbook introductions to the theory of data mining, one taken from a very practical process used in industry, and one designed for teaching beginners.
The Fayyad et al. KDD process
One early version of the knowledge discovery and data mining process was defined by Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth in a 1996 article (The KDD Process for Extracting Useful Knowledge from Volumes of Data). This article was important at the time for refining the rapidly changing KDD methodology into a concrete set of steps. The following steps lead from raw data at the beginning to knowledge at the end:
Data selection: The input to this step is raw data, and the output of this selection step is a smaller subset of the data, called the target data.
Data pre-processing: The target data is cleaned, oddities and outliers are removed, and missing data is accounted for. The output of this step is pre-processed data, or cleaned data.
Data transformation: The cleaned data is organized into a format appropriate for the mining step, and the number of features or variables is reduced if need be. The output of this step is transformed data.
Data mining: The transformed data is mined for patterns using one or more data mining algorithms appropriate to the problem at hand. The output of this step is the discovered patterns.
Data interpretation/evaluation: The discovered patterns are evaluated for their ability to solve the problem at hand. The output of this step is knowledge.
Since this process leads from raw data to knowledge, it is appropriate that these authors were the ones who were really committed to the term knowledge discovery in databases rather than simply data mining.
The Han et al. KDD process
Another version of the knowledge discovery process is described in the popular data mining textbook Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, and Jian Pei as the following steps, which also lead from raw data to knowledge at the end:
Data cleaning: The input to this step is raw data, and the output is cleaned data.
Data integration: In this step, the cleaned data is integrated (if it came from multiple sources). The output of this step is integrated data.
Data selection: The data set is reduced to only the data needed for the problem at hand. The output of this step is a smaller data set.
Data transformation: The smaller data set is consolidated into a form that will work with the upcoming data mining step. This is called transformed data.
Data mining: The transformed data is processed by intelligent algorithms that are designed to discover patterns in that data. The output of this step is one or more patterns.
Pattern evaluation: The discovered patterns are evaluated for their interestingness and their ability to solve the problem at hand. The output of this step is an interestingness measure applied to each pattern, representing knowledge.
Knowledge representation: In this step, the knowledge is communicated to users through various means, including visualization.
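Both step sequences map naturally onto a modern pandas and scikit-learn workflow. The sketch below is not taken from the article; the file name and column names are hypothetical placeholders, and k-means is just one possible choice for the mining step.

# A minimal sketch of the Fayyad/Han steps, under the assumptions stated above.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Selection / integration: load the raw data and keep only the columns we need.
raw = pd.read_csv("customers.csv")              # hypothetical file
target = raw[["age", "income", "visits"]]       # hypothetical columns (the target data)

# Cleaning / pre-processing: handle missing data and trim extreme outliers.
clean = target.dropna()
clean = clean[clean["income"] < clean["income"].quantile(0.99)]

# Transformation: put every feature on a comparable scale.
X = StandardScaler().fit_transform(clean)

# Data mining: search for clusters (one possible algorithm among many).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Evaluation / interestingness: one simple numeric check on the discovered patterns.
print("silhouette score:", silhouette_score(X, labels))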
In both the Fayyad and Han methodologies, it is expected that the process will iterate multiple times over these steps if such iteration is needed. For example, if during the transformation step the person doing the analysis realizes that another data cleaning or pre-processing step is needed, both of these methodologies specify that the analyst should double back and complete a second iteration of the incomplete earlier step.
The CRISP-DM process
A third popular version of the KDD process, used in many business and applied domains, is called CRISP-DM, which stands for CRoss-Industry Standard Process for Data Mining. It consists of the following steps:
Business Understanding: In this step, the analyst spends time understanding the reasons for the data mining project from a business perspective.
Data Understanding: In this step, the analyst becomes familiar with the data and its potential promises and shortcomings, and begins to generate hypotheses. The analyst is tasked with reassessing the business understanding (step 1) if needed.
Data Preparation: This step includes all the data selection, integration, transformation, and pre-processing steps that are enumerated as separate steps in the other models. The CRISP-DM model has no expectation of what order these tasks will be done in.
Modeling: This is the step in which the algorithms are applied to the data to discover the patterns. This step is the closest to the actual data mining steps in the other KDD models. The analyst is tasked with reassessing the data preparation step (step 3) if the modeling and mining step requires it.
Evaluation: The model and discovered patterns are evaluated for their value in answering the business problem at hand. The analyst is tasked with revisiting the business understanding (step 1) if necessary.
Deployment: The discovered knowledge and models are presented and put into production to solve the original problem at hand.
One of the strengths of this methodology is that iteration is built in: between specific steps, it is expected that the analyst will check that the current step is still in agreement with certain previous steps. Another strength of this method is that the analyst is explicitly reminded to keep the business problem front and center in the project, even down in the evaluation steps.
The Six Steps process
When I teach the introductory data science course at my university, I use a hybrid methodology of my own creation. This methodology is called the Six Steps, and I designed it to be especially friendly for teaching. My Six Steps methodology removes some of the ambiguity that inexperienced students may have with open-ended tasks from CRISP-DM, such as Business Understanding, or a corporate-focused task such as Deployment. In addition, the Six Steps method keeps the focus on developing students' critical thinking skills by requiring them to answer Why are we doing this? and What does it mean? at the beginning and end of the process. My Six Steps method looks like this:
Problem statement: In this step, the students identify what the problem is that they are trying to solve. Ideally, they motivate the case for why they are doing all this work.
Data collection and storage: In this step, students locate data and plan their storage for the data needed for this problem. They also provide information about where the data that is helping them answer their motivating question came from, as well as what format it is in and what all the fields mean.
Data cleaning: In this phase, students carefully select only the data they really need, and pre-process the data into the format required for the mining step.
Data mining: In this step, students formalize their chosen data mining methodology. They describe what algorithms they used and why. The output of this step is a model and discovered patterns.
Representation and visualization: In this step, the students show the results of their work visually. The outputs of this step can be tables, drawings, graphs, charts, network diagrams, maps, and so on.
Problem resolution: This is an important step for beginner data miners. This step explicitly encourages the student to evaluate whether the patterns they showed in step 5 are really an answer to the question or problem they posed in step 1. Students are asked to state the limitations of their model or results, and to identify parts of the motivating question that they could not answer with this method.
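As an illustration only (none of this code appears in the article), the Six Steps can be annotated directly onto a small, self-contained script; the synthetic data and the choice of k-means are hypothetical placeholders.

# A toy, self-contained walk through the Six Steps on synthetic data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# 1. Problem statement: do these (synthetic) observations fall into natural groups?
# 2. Data collection and storage: here we simply generate toy data in memory.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(center, 0.5, size=(100, 2)) for center in (0, 3, 6)])

# 3. Data cleaning: drop any rows with missing values (none here, shown for completeness).
data = data[~np.isnan(data).any(axis=1)]

# 4. Data mining: cluster the points with k-means.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)

# 5. Representation and visualization: plot the discovered clusters.
plt.scatter(data[:, 0], data[:, 1], c=labels, s=15)
plt.title("Six Steps toy example: discovered clusters")
plt.show()

# 6. Problem resolution: decide whether three groups really answer the question,
#    and note the limitations (synthetic data, an arbitrary choice of k).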
Which data mining methodology is the best?
A 2014 survey of the subscribers of Gregory Piatetsky-Shapiro's very popular data mining email newsletter, KDNuggets, included the question What main methodology are you using for your analytics, data mining, or data science projects?
43% of the poll respondents indicated that they were using the CRISP-DM methodology
27% of the respondents were using their own methodology or a hybrid
7% were using the traditional KDD methodology
These results are generally similar to the 2007 results from the same newsletter asking the same question.
My best advice is that it does not matter too much which methodology you use for a data mining project, as long as you just pick one. If you do not have any methodology at all, then you run the risk of forgetting important steps. Choose one of the methods that seems like it might work for your project and your needs, and then just do your best to follow the steps.
We will vary our data mining methodology depending on which technique we are looking at in a given article. For example, even though the focus of the article as a whole is on the data mining step, we still need to motivate the project with a healthy dose of Business Understanding (CRISP-DM) or Problem Statement (Six Steps) so that we understand why we are doing the tasks and what the results mean. In addition, in order to learn a particular data mining method, we may also have to do some pre-processing, whether we call that data cleaning, integration, or transformation. But in general, we will try to keep these tasks to a minimum so that our focus on data mining remains clear. Finally, even though data visualization is typically very important for representing the results of your data mining process to your audience, we will also keep these tasks to a minimum so that we can remain focused on the primary job at hand: data mining.
Summary
In this article, we learned what it would take to expand our data mining toolbox to the master level. First we took a long view of the field as a whole, starting with the history of data mining as a piece of the knowledge discovery in databases (KDD) process. We also compared the field of data mining to other similar terms, such as data science, machine learning, and big data.
Next, we outlined the common tools and techniques that most experts consider to be most important to the KDD process, paying special attention to the techniques that are used most frequently in the mining and analysis steps. To really master data mining, it is important that we work on problems that are different from simple textbook examples. For this reason, we will be working on more exotic data mining techniques, such as generating summaries and finding outliers, and focusing on more unusual data types, such as text and networks.