
How-To Tutorials


Understanding Core Data concepts

Packt
23 Mar 2015
10 min read
In this article by Gibson Tang and Maxim Vasilkov, authors of the book Objective-C Memory Management Essentials, you will learn what Core Data is and why you should use it.

Core Data allows you to store your data in a variety of storage types. If you want to use a store other than the default, the following store types are available:

NSSQLiteStoreType: This is the option you will most commonly use, as it simply stores your database in a SQLite database.
NSXMLStoreType: This stores your data in an XML file. It is slower, but the file is human readable, which can help you debug errors related to the storage of data. Do note that this storage type is only available on Mac OS X.
NSBinaryStoreType: This occupies the least amount of space and offers the fastest speed, as it stores all data in a binary file. However, the entire binary store needs to fit into memory in order to work properly.
NSInMemoryStoreType: This stores all data in memory and provides the fastest access. The size of the database cannot exceed the available free memory and, because the data lives only in memory, it is ephemeral and is never persisted to disk.

Next, there are two concepts that you need to know:

Entity
Attributes

These terms may be new to you, but if you have worked with databases, you will know them as tables and columns. To put it in an easy-to-understand picture, think of Core Data entities as your database tables and Core Data attributes as your database columns. Core Data handles data persistence using entities and attributes, which are abstract data types, and actually saves the data into plists, SQLite databases, or even XML files (the last applicable only to Mac OS X).

Going back a bit in time, Core Data is a descendant of Apple's Enterprise Objects Framework (EOF), which was introduced by NeXT, Inc. in 1994. EOF is an object-relational mapper (ORM), but Core Data itself is not an ORM. Core Data is a framework for managing the object graph, and one of its powerful capabilities is that it lets you work with extremely large datasets and object instances that would not normally fit into memory, by paging objects in and out of memory when necessary. Core Data maps Objective-C data types to related data types; for example, string, date, and integer values are represented by NSString, NSDate, and NSNumber respectively. So, as you can see, Core Data is not a radically new concept that you need to learn; it is grounded in the simple database concepts that we all know.

Since entities and attributes are abstract data types, you cannot access them directly, as they do not exist in physical terms. To access them, you need to use the Core Data classes and methods provided by Apple. The full list of Core Data classes is actually pretty long, and you won't be using all of them regularly.
So, here is a list of the more commonly used classes:

CLASS NAME                      EXAMPLE USE CASE
NSManagedObject                 Accessing attributes and rows of data
NSManagedObjectContext          Fetching and saving data
NSManagedObjectModel            Storage (the model/schema)
NSFetchRequest                  Requesting data
NSPersistentStoreCoordinator    Persisting data
NSPredicate                     Data query

Now, let's go in-depth into the description of each of these classes:

NSManagedObject: This is a record that you will use and perform operations on; all entities extend this class.
NSManagedObjectContext: This can be thought of as an intelligent scratchpad. When you fetch objects from the persistent store, temporary copies are brought into the context, and any modifications made in this scratchpad are not saved until you save those changes back to the persistent store.
NSManagedObjectModel: Think of this as a collection of entities, or a database schema, if you will.
NSFetchRequest: This is an operation that describes the search criteria you will use to retrieve data from the persistent store, akin to the common SQL query that most developers are familiar with.
NSPersistentStoreCoordinator: This is the glue that associates your managed object context with the persistent store. Without it, your modifications will not be saved to the persistent store.
NSPredicate: This is used to define logical conditions for a search or for in-memory filtering. Basically, NSPredicate specifies how data is to be fetched or filtered, and you can use it together with NSFetchRequest, as NSFetchRequest has a predicate property.

Putting it into practice

Now that we have covered the basics of Core Data, let's proceed with some code examples. We will use Core Data to store customer details in a Customer entity, and the information we want to store is:

name
email
phone_number
address
age

Do note that all attribute names must be in lowercase and have no spaces in them. We will use Core Data to store the customer details mentioned earlier, as well as retrieve, update, and delete the customer records using the Core Data framework and methods.

First, we select File | New | File and then select iOS | Core Data. Then, we create a new entity called Customer by clicking on the Add Entity button at the bottom left of the screen. Next, we add the attributes for our Customer entity and give them the appropriate Type, which can be String for attributes such as name or address and Integer 16 for age. Lastly, we need to add CoreData.framework to the project.

With this, we have created a Core Data model consisting of a Customer entity and some attributes. Do note that all Core Data model files have the .xcdatamodeld extension; here, we save our model as Model.xcdatamodeld.

Next, we will create a sample application that uses Core Data in the following ways:

Saving a record
Searching for a record
Deleting a record
Loading records

Now, I won't cover the usage of UIKit and storyboards, but will instead focus on the core code needed to give you an example of how Core Data works.
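Before we dive into the sample app, here is a minimal sketch (not taken from the book's sample project) of how the classes listed above fit together once the Core Data stack is in place. It assumes a valid NSManagedObjectContext named context, such as the one built in AppDelegate later in this article, and the Customer entity described earlier:

// Insert a new Customer record into the context and persist it.
NSError *error = nil;
NSManagedObject *newCustomer =
    [NSEntityDescription insertNewObjectForEntityForName:@"Customer"
                                   inManagedObjectContext:context];
[newCustomer setValue:@"Alice" forKey:@"name"];
[newCustomer setValue:@"alice@example.com" forKey:@"email"];
[newCustomer setValue:@25 forKey:@"age"];
if (![context save:&error]) {
    NSLog(@"Save failed: %@", error);
}

// Fetch Customer records older than 21, sorted by name, using
// NSFetchRequest with an NSPredicate.
NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Customer"];
request.predicate = [NSPredicate predicateWithFormat:@"age > %d", 21];
request.sortDescriptors = @[[NSSortDescriptor sortDescriptorWithKey:@"name"
                                                          ascending:YES]];
NSArray *results = [context executeFetchRequest:request error:&error];
if (results == nil) {
    NSLog(@"Fetch failed: %@", error);
} else {
    for (NSManagedObject *customer in results) {
        NSLog(@"%@ <%@>", [customer valueForKey:@"name"],
                          [customer valueForKey:@"email"]);
    }
}

The names and values here are purely illustrative; the point is simply that NSManagedObject rows are inserted and read through the context, while NSFetchRequest and NSPredicate describe what you want back.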
So, to start things off, the application has a main screen, a screen to insert a record, a screen to list all records from our persistent store, and output shown when a record is deleted from the persistent store (the original article illustrates each of these with a screenshot).

Getting into the code

Let's get started with our code examples. First, we declare some Core Data objects in our AppDelegate class, inside the AppDelegate.h file:

@property (readonly, strong, nonatomic) NSManagedObjectContext *managedObjectContext;
@property (readonly, strong, nonatomic) NSManagedObjectModel *managedObjectModel;
@property (readonly, strong, nonatomic) NSPersistentStoreCoordinator *persistentStoreCoordinator;

Next, we implement the accessor for each of these objects in AppDelegate.m. The following method creates an instance of NSManagedObjectContext, or returns the existing instance if one has already been created. This is important, as you want only one instance of the context to be present in order to avoid conflicting access to it:

- (NSManagedObjectContext *)managedObjectContext
{
    if (_managedObjectContext != nil) {
        return _managedObjectContext;
    }
    NSPersistentStoreCoordinator *coordinator = [self persistentStoreCoordinator];
    if (coordinator != nil) {
        _managedObjectContext = [[NSManagedObjectContext alloc] init];
        [_managedObjectContext setPersistentStoreCoordinator:coordinator];
    }
    if (_managedObjectContext == nil)
        NSLog(@"_managedObjectContext is nil");
    return _managedObjectContext;
}

This method creates the NSManagedObjectModel instance and returns it, or returns the existing NSManagedObjectModel if it already exists:

// Returns the managed object model for the application.
- (NSManagedObjectModel *)managedObjectModel
{
    if (_managedObjectModel != nil) {
        return _managedObjectModel; // return the model since it already exists
    }
    // else create the model and return it
    // CustomerModel is the filename of your *.xcdatamodeld file
    NSURL *modelURL = [[NSBundle mainBundle] URLForResource:@"CustomerModel"
                                              withExtension:@"momd"];
    _managedObjectModel = [[NSManagedObjectModel alloc] initWithContentsOfURL:modelURL];
    if (_managedObjectModel == nil)
        NSLog(@"_managedObjectModel is nil");
    return _managedObjectModel;
}

The next method creates an instance of the NSPersistentStoreCoordinator class if it does not exist, and returns the existing instance if it does. We also add some logging via NSLog to tell us if the NSPersistentStoreCoordinator instance is nil, and we use the NSSQLiteStoreType constant to signify to the system that we intend to store the data in a SQLite database:
// Returns the persistent store coordinator for the application.
- (NSPersistentStoreCoordinator *)persistentStoreCoordinator
{
    if (_persistentStoreCoordinator != nil) {
        return _persistentStoreCoordinator; // return the coordinator since it already exists
    }
    NSURL *storeURL = [[self applicationDocumentsDirectory]
                          URLByAppendingPathComponent:@"CustomerModel.sqlite"];
    NSError *error = nil;
    _persistentStoreCoordinator = [[NSPersistentStoreCoordinator alloc]
                                      initWithManagedObjectModel:[self managedObjectModel]];
    if (_persistentStoreCoordinator == nil)
        NSLog(@"_persistentStoreCoordinator is nil");
    if (![_persistentStoreCoordinator addPersistentStoreWithType:NSSQLiteStoreType
                                                    configuration:nil
                                                              URL:storeURL
                                                          options:nil
                                                            error:&error]) {
        NSLog(@"Error %@, %@", error, [error userInfo]);
        abort();
    }
    return _persistentStoreCoordinator;
}

The following method returns the URL of the location where your data will be stored on the device:

#pragma mark - Application's Documents directory
// Returns the URL to the application's Documents directory.
- (NSURL *)applicationDocumentsDirectory
{
    return [[[NSFileManager defaultManager] URLsForDirectory:NSDocumentDirectory
                                                   inDomains:NSUserDomainMask] lastObject];
}

As you can see, what we have done is check whether objects such as _managedObjectModel are nil; if an object is not nil, we return it, else we create it and then return it. This is exactly the concept of lazy loading. We apply the same methodology to managedObjectContext and persistentStoreCoordinator, so that only one instance of managedObjectModel, managedObjectContext, and persistentStoreCoordinator is created and present at any given time. This helps us avoid having multiple copies of these objects, which would increase the chance of a memory leak. Note that memory management is still a real issue in the post-ARC world, so what we have done here simply follows best practices that help us avoid memory leaks.

Next, let's move on to storing data in our persistent store. The insert screen has fields such as name, age, address, email, and phone_number, which correspond to the appropriate attributes in our Customer entity.

Summary

In this article, you learned about Core Data and why you should use it.


An Overview of Automation and Advent of Chef

Packt
23 Mar 2015
14 min read
In this article by Rishabh Sharma and Mitesh Soni, authors of the book Learning Chef, before moving to the details of the different Chef components and other practical matters, it is recommended that you understand the foundation of automation and some of the existing automation tools. This article will give you a conceptual understanding of automation and a comparative view of Chef against existing automation tools.

In this article, we will cover the following topics:

An overview of automation
The need for automation
A brief introduction to Chef
The salient features of Chef

Automation

Automation is the process of automating operations that control, regulate, and administrate machines, disparate systems, or software with little or no human intervention. In simple English, automation means automatic processing with little or no human involvement. An automated system is expected to perform a function more reliably, efficiently, and accurately than a human operator, and at a lower cost. As a result, automation is becoming more and more widespread across various service industries as well as in the IT and software industry.

Automation basically helps a business in the following ways:

It reduces the complexity of processes and sequential steps
It reduces the possibility of human error in repeatable tasks
It consistently and predictably improves the performance of a system
It lets customers focus on their business rather than on managing the complexities of their systems, which increases productivity and the scope for innovation
It improves robustness and the agility of application deployment in different environments, and reduces the time to market an application

Automation has already helped to solve various engineering problems, such as information gathering and the automatic preparation of bills and reports; with its help, we get high-quality products while saving cost. IT operations are very much dependent on automation: a high degree of automation in IT operations results in a reduced need for manual work, improved quality of service, and higher productivity.

Why automation is needed

Automation has served industries such as agriculture and food and drink for many years, and its usage there is well known; here, we will concentrate on automation related to the information technology (IT) service and software industry. The escalation of innovation in information technology has created tremendous opportunities for growth in large organizations as well as small- and medium-sized businesses. IT automation is the process of automated integration and management of multifaceted compute resources, middleware, enterprise applications, and services based on workflow. Large organizations with large profits can afford costly IT resources, manpower, and sophisticated management tools, while for small- and medium-scale organizations this is often not feasible. In addition, huge investments are at stake in all of these resources, and most of the time this resource management is a manual process, which is prone to errors. Hence, automation in the IT industry can prove to be a boon, given how many repeatable and error-prone tasks it involves.
Let's drill down into the reasons for needing automation in more detail:

Agile methodology: An agile approach to developing an application results in frequent deployments. Multiple deployments in a short interval involve a lot of manual effort and repeatable activities.
Continuous delivery: A large number of application releases within a short span of time, driven by the agile approach of business units or organizations, requires speedy and frequent deployment to a production environment. The delivery process involves development and operations teams that have different responsibilities for the proper delivery of the outcome.
Ineffective transition between development and production environments: In a traditional environment, the transition of the latest application build from development to production can last weeks. The steps taken to do this are manual, so they are likely to create problems. The complete process is extremely inefficient and becomes exhausting, with a lot of manual effort involved.
Inefficient communication and collaboration between teams: The priorities of development and IT operations teams differ across organizations. A development team is focused on the latest releases and considers new feature development, fixing existing bugs, or developing innovative concepts, while an operations team cares about the stability of the production environment. Often, the first deployment to a production-like environment takes place only once the development team has completed its part. The operations team manages the deployment environment for the application independently, and there is hardly any interaction between the two teams. More often than not, ineffective or virtually nonexistent collaboration and communication between the teams causes many problems during the transition of the application package from the deployment environment to the production environment, because of the different roles and responsibilities of the respective teams.
Cloud computing: The emergence of cloud computing in the last decade has changed the perspective of business stakeholders. Organizations are attempting to develop and deploy cloud-based applications to keep pace with current market and technology trends. Cloud computing helps to manage a complex IT infrastructure that includes physical, consolidated, virtualized, and cloud resources, and it helps to manage the constant pressure to reduce costs. Infrastructure as code is an innovative concept that models the infrastructure as code, pooling resources in an abstract manner with seamless operations to provision and deprovision infrastructure in the flexible environment of the cloud. Hence, we can consider the infrastructure to be redeployable using configuration management tools. Such agility in resources provides the best platform for developing innovative applications with an agile methodology, rather than the slow and linear waterfall Software Development Life Cycle (SDLC) model.

Automation brings the following benefits to the IT industry by addressing the preceding concerns:

Agility: It provides promptness and agility to your IT infrastructure. Productivity and flexibility are significant advantages of automation, which help us to compete in the current agile economic conditions.
Scalability: Using automation, we can manage the complications of the infrastructure and leverage the scalability of resources in order to fulfill our customers' demands. It helps to transform infrastructure into simple code, which means that building, rebuilding, configuring, and scaling the infrastructure is possible in just a few minutes, according to the needs of customers in a real-world environment.
Efficiency and consistency: Automation can handle repeated tasks very easily, so that you can concentrate on innovating for your business. It increases the agility and efficiency of managing a deployment environment and the application deployment itself.
Effective management of resources: It helps to maintain a consistent model of the infrastructure. It provides a code-based design framework, which leads to a flexible and manageable way of understanding all the fundamentals of a complex network.
Deployment accuracy: Application development and delivery is a multifaceted, cumbersome, repetitive, and time-bound endeavor. With automation, the testability of a deployment environment, the discipline of accurately scripting the changes to be made to an environment, and the repeatability of those changes can all be achieved very quickly.

We have touched on DevOps-related aspects in the previous section, where we discussed the need for automation and its benefits. Let's understand it in a more precise manner. Recently, the DevOps culture has become very popular. DevOps-based application development can handle quick changes, frequent releases, bug fixes, and continuous delivery-related issues across the entire SDLC process. In simple English, we can say that DevOps is a blend of the tasks undertaken by the development and operations teams to make application delivery faster and more effective. Development activities (coding, testing, continuous integration of applications, and version releases) and various IT operations (change, incident, and problem management, escalation, and monitoring) can work together in a highly collaborative environment. This means there must be strong collaboration, integration, and communication between software developers and the IT operations team.

The original article includes a figure showing an applied view of DevOps: how development and operations teams collaborate with the help of different types of tools. For operations such as configuration management and deployment, both Chef and Puppet are used, and cloud management tools such as Dell Cloud Manager (formerly known as Enstratius), RightScale, and Scalr can be used to manage cloud resources for development and operations activities.

DevOps is not a technology or a product; it is a combination of culture, people, process, and technology. Everyone involved in the software development process, including managers, works together and collaboratively on all aspects of a project. DevOps represents an important opportunity for organizations to stay ahead of their competition by building better applications and services, opening the door to increased revenue and improved customer experiences. DevOps is the solution to the problems that arise from the interdependence of IT operations and software development.
There are various benefits of DevOps:

It targets application delivery, new feature development, bug fixing, testing, and the maintenance of new releases
It provides stable operating environments similar to the actual deployment environment, resulting in fewer errors and unknown scenarios
It supports an effective application release management process by providing better control over distributed development efforts, and by regulating development and deployment environments
It provides continuous delivery of applications and hence faster solutions to problems
It provides faster development and deployment cycles, which help us respond to customer feedback in a timely manner and enhance customer experience and loyalty
It improves efficiency, security, reliability, and the predictability of outcomes

The original article includes a figure summarizing the necessities on which DevOps is built; to serve most of them, we need a configuration management tool such as Chef.

In order to support a DevOps-based application development and delivery approach, infrastructure automation is mandatory, considering the extreme need for agility. The entire infrastructure and platform layer should be configurable in the form of code or scripts. These scripts manage the installation of operating systems, install and configure servers on different instances or virtual machines, and install and configure the required software and services on particular machines. Hence, it is an opportune time for organizations that need to deliver innovative business value in the form of a working outcome: deployment-ready applications.

With an automation script, the same configuration can be applied to a single server or to thousands of identical servers simultaneously. Thereby, error-prone manual tasks can be handled more efficiently without any intervention, and horizontal scalability can be managed efficiently and easily. In the past few years, several open source and commercial tools have emerged for infrastructure automation, of which Bcfg2, Cobbler, CFEngine, Puppet, and Chef are the most popular. These automation tools can be used to manage all types of infrastructure environments, such as physical or virtual machines, or clouds. Our objective is to understand Chef in detail, so we will look at an overview of the Chef tool in the next section.

Introduction to Chef

Chef is an open source configuration management tool developed by the Opscode community in 2008; its first edition was launched in January 2009. Opscode is run by individuals from the data center teams of Amazon and Microsoft. Chef supports a variety of operating systems: it typically runs on Linux, but supports Windows 7 and Windows Server too. Chef is written in Ruby and Erlang.

The Chef server, workstation, and nodes are the three major components of Chef. The Chef server stores the data needed to configure and manage nodes effectively. A Chef workstation works as a local Chef repository, and Knife is installed on the workstation. Knife is used to upload cookbooks to a Chef server. A cookbook is a collection of recipes, and recipes execute the actions that are meant to be automated. A node communicates with the Chef server, gets the configuration data relevant to it, and executes it to install packages or to perform any other configuration management operations.
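To make the cookbook and recipe terminology concrete, here is a minimal, hypothetical recipe, not taken from the book: the cookbook name ("webserver") and file paths are made up for illustration, while package, template, and service are standard resources in Chef's recipe DSL:

# recipes/default.rb in a hypothetical "webserver" cookbook
# Install the Apache package using the platform's package manager.
package 'apache2' do
  action :install
end

# Render a site configuration from a template shipped inside the cookbook.
template '/etc/apache2/sites-available/000-default.conf' do
  source 'default-site.conf.erb'   # templates/default/default-site.conf.erb
  mode   '0644'
  notifies :reload, 'service[apache2]'
end

# Make sure the service is enabled at boot and running now.
service 'apache2' do
  action [:enable, :start]
end

A workstation would upload this cookbook to the Chef server (for example, with knife cookbook upload webserver), and any node with the recipe in its run list would converge to this state on its next chef-client run.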
Most of the outages that impact the core services of business organizations are caused by human errors during configuration changes and release management. Chef helps software developers and engineers to manage server and application configurations, and provisions hardware or virtual resources by writing code rather than running commands manually. Hence, it is possible to apply coding best practices and design patterns to automating infrastructure. Chef was developed to handle the most critical infrastructure challenges of the current scenario; it makes the deployment of servers and applications to any physical, virtual, or cloud instance easy. Chef transforms infrastructure into code. Considering virtual machines in a cloud environment, we can easily see the possibility of keeping versions of the infrastructure and its configurations, and of recreating infrastructure repeatedly and proficiently. Additionally, Chef also supports system administration, network management, and the continuous delivery of applications.

The salient features of Chef

Based on a comparative analysis with Chef's competitors, the following are the salient features of Chef, which make it an outstanding and highly popular choice among developers in the current IT infrastructure automation scenario:

Chef has different flavors of automated solutions for current IT operations: Open Source Chef, Hosted Chef, and Private Chef. Chef enables highly scalable, secure, and fault-tolerant automation of your infrastructure, and every flavor offers a specific solution for different kinds of infrastructure needs. For example, the Open Source Chef server is freely available to all but supports limited features, while the Hosted Chef server is managed by Opscode as a service, with subscription fees for standard and premium support. The Private Chef server provides an on-premise automated solution with subscription pricing and licensing plans.
Chef gives us flexibility: according to the industry use case at hand, we can choose among the Open Source, Hosted, and Private Chef servers as per our requirements.
Chef can integrate with third-party tools such as Test Kitchen, Vagrant, and Foodcritic. These integrations help developers to test Chef scripts and perform a proof of concept (POC) before deploying actual automation. These tools are very useful for learning and testing Chef scripting.
Chef has a very strong community. The website https://www.chef.io/ can help you get started with Chef and publish things. Opscode has hosted numerous webinars, publishes training material, and makes it very easy for developers to contribute new patches and releases.
Chef can quickly handle all types of traditional dependencies and manual processes across the entire network. Chef has a strong dependency management approach, which means that only the sequence of order matters: all dependencies will be met as long as they are specified in the proper order.
Chef is well suited to cloud instances, and it is the first choice of developers working on cloud infrastructure automation. The demand for Chef automation is therefore growing exponentially, and within a short span of time, Chef has acquired a good market reputation for reliability.

These key features of Chef automation make it one of the most popular choices among developers in the current industry scenario (the original article summarizes them in a figure).
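As a rough sketch of how these pieces are exercised in practice, the day-to-day workflow from a workstation usually looks something like the following shell session. The cookbook name ("webserver" from the earlier sketch), node name, IP address, and SSH user are placeholders, not values from the book:

# Upload the cookbook from the workstation's local chef-repo to the Chef server.
knife cookbook upload webserver

# Bootstrap a brand-new machine: installs chef-client and registers it as a node.
knife bootstrap 203.0.113.10 --ssh-user ubuntu --sudo --node-name web01

# Add the recipe to the node's run list; it is applied on the next chef-client run.
knife node run_list add web01 'recipe[webserver]'

# On the node itself, trigger a run immediately instead of waiting for the schedule.
sudo chef-client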
Summary

Here, we gained a fundamental understanding of automation and the various ways in which it helps the IT industry. DevOps is popular nowadays because it brings a highly collaborative environment to the entire software development process. We got an overview of several traditional and advanced automation tools that have been used over the past 15 years, a clear idea of why Chef is needed in the current IT automation scenario and why it is often preferred, and a comparison of some of the more advanced automation tools. We also discussed the salient features of Chef.


GPS-enabled Time-lapse Recorder

Packt
23 Mar 2015
17 min read
In this article by Dan Nixon, the author of the book Raspberry Pi Blueprints, we will look at recording time-lapse captures using the Raspberry Pi camera module.

One possible use of the Raspberry Pi camera module is recording time-lapse captures, which take a still image at a set interval over a long period of time. This can then be used to create an accelerated video of a long-term event (for example, a building being constructed). A variation on this is to have the camera mounted on a moving vehicle and use the time lapse to record a journey; with the addition of GPS data, this can provide an interesting record of a reasonably long trip.

In this article, we will use the Raspberry Pi camera module board to create a location-aware time-lapse recorder that stores the GPS position with each image in the EXIF metadata. To do this, we will use a GPS module that connects to the Pi over the serial connection on the GPIO port, and a custom Python program that listens for new GPS data during the time lapse. For this project, we will use the Raspbian distribution.

What you will need

This is a list of things you will need to complete this project. All of these are available at most electronic component stores and online retailers:

The Raspberry Pi
A relatively large SD card (at least 8 GB is recommended)
The Pi camera board
A GPS module (http://www.adafruit.com/product/746)
0.1 inch female-to-female pin jumper wires
A USB power bank (optional; used to power the Pi when no other power is available)

Setting up the hardware

The first thing we will do is set up the two pieces of hardware and verify that they are working correctly before moving on to the software.

The camera board

The first (and most important) piece of hardware we need is the camera board, so start by connecting it to the Pi.

Connecting the camera module to the Pi

The camera is connected to the Pi via a 15-pin flat flex ribbon cable, which can physically connect to either of two connectors on the Pi. However, it should be connected to the one nearest the Ethernet jack; the other connector is for a display. To connect the cable, first lift the top retention clip on the connector. Insert the flat flex cable with the silver contacts facing the HDMI port and the rigid blue plastic part of the ribbon connector facing the Ethernet port on the Pi. Finally, press down the cable retention clip to secure the cable in the connector. If this is done correctly, the cable should be perpendicular to the printed circuit board (PCB) and should remain seated in the connector if you try to pull it out with a little force.

Next, we will move on to setting up the camera driver, libraries, and software within Raspbian.

Setting up the Raspberry Pi camera

First, we need to enable support for the camera in the operating system itself. This is done with the raspi-config utility from a terminal (either locally or over SSH). Enter the following command:

sudo raspi-config

This will load the configuration utility. Scroll down to the Enable Camera option using the arrow keys and select it using Enter. Next, highlight Enable and select it using Enter. Once this is done, you will be taken back to the main raspi-config menu. Exit raspi-config, and reboot the Pi to continue.
Next, we will look for any updates to the Pi kernel, as using an out-of-date kernel can sometimes cause issues with low-level hardware such as the camera module and GPIO. We also need a library that allows control of the camera from Python. Both installations can be done with the following two commands:

sudo rpi-update
sudo apt-get install python-picamera

Once this is complete, reboot the Pi using the following command:

sudo reboot

Next, we will test the camera using the python-picamera library we just installed. To do this, create a simple test script using nano:

nano camera_test.py

The following code will capture a still image after opening the preview for 5 seconds. Having the preview open before a capture is a good idea, as this gives the camera time to adjust its capture parameters to the environment:

import sys
import time
import picamera

with picamera.PiCamera() as cam:
    cam.resolution = (1280, 1024)
    cam.start_preview()
    time.sleep(5)
    cam.capture(sys.argv[1])
    cam.stop_preview()

Save the script using Ctrl + X and enter Y to confirm. Now, test it using the following command:

python camera_test.py image.jpg

This will capture a single still image and save it to image.jpg. It is worth downloading the image over SFTP to verify that the camera is working properly.

The GPS module

Before connecting the GPS module to the Pi, there are a couple of important modifications that need to be made to the way the Pi boots. By default, Raspbian uses the on-board serial port on the GPIO header as a serial terminal for the Pi (this allows you to connect to the Pi and run commands in a similar way to SSH). However, this is of little use to us here and can interfere with communication between the GPS module and the Pi if the serial terminal is left enabled. It can be disabled by modifying a couple of configuration files.

First, start with:

sudo nano /boot/cmdline.txt

Here, you will need to remove any references to ttyAMA0 (the name of the on-board serial port). In my case, there was a single entry of console=ttyAMA0,115200, which had to be removed.

Next, we need to stop the Pi from using the serial port for a TTY session. To do this, edit this file:

sudo nano /etc/inittab

Here, look for the following line and comment it out:

T0:23:respawn:/sbin/getty -L ttyAMA0 115200 vt100

After both files have been changed, power down the Pi using the following command:

sudo shutdown -h now

Next, we need to connect the GPS module to the Pi GPIO port. One important thing to note when you do this is that the GPS module must be able to run on 3.3 V, or at least be able to use a 3.3 V logic level (such as the Adafruit module I am using here). As with any device that connects to the Pi GPIO header, using a 5 V logic device can cause irreparable damage to the Pi. Next, connect the GPS module to the Pi. If you are using the Adafruit module, all the pins are labeled on the PCB itself.
For other modules, you may need to check the data sheet to find which pins to connect. After the GPS module is connected and the Pi is powered up, we will install, configure, and test the driver and libraries needed to access the data sent to the Pi from the GPS module.

Start by installing the required packages. Here, gpsd is the daemon that manages data from GPS devices connected to a system, gpsd-clients contains a client that we will use to test the GPS module, and python-gps contains the Python client for gpsd, which is used in the time-lapse capture application (we will sketch what reading positions with this client looks like a little later):

sudo apt-get install gpsd gpsd-clients python-gps

Once they are installed, we need to configure gpsd to work the way we want. To do this, use the following command:

sudo dpkg-reconfigure gpsd

This opens a configuration page similar to raspi-config:

First, you will be asked whether you want gpsd to start on boot. Select Yes here.
Next, it will ask whether we are using USB GPS receivers. Since we are not using one, select No here.
Next, it will ask for the device (that is, the serial port) the GPS receiver is connected to. Since we are using the on-board serial port on the Pi GPIO header, enter /dev/ttyAMA0 here.
Next, it will ask for any custom parameters to pass to gpsd when it is executed. Here, we will enter -n -G. The -n option tells gpsd to poll the GPS module even before a client has requested any data (not doing so has been known to cause problems with some applications), and -G tells gpsd to accept connections from devices other than the Pi itself (this is not strictly required, but it is a good debugging tool).
Finally, you will be asked for the location of the control socket. The default value should be kept here, so just select Ok.

When you start gpsd with the -G option, you can then use cgps to view the GPS data from any device by using the following command, where [IP] is the IP address of the Pi:

cgps [IP]

After the configuration is done, reboot the Pi and use the following command to test the configuration:

cgps -s

If everything works, this will display the current GPS data. If the status indication reads NO FIX, you may need to move the GPS module to an area with a clear view of the sky for testing. If cgps times out and exits, gpsd has failed to communicate with your GPS module; go back and double-check the configuration and wiring.

Setting up the capture software

Now, we need to get the capture software installed on the Pi. First, copy the recorder folder onto the Pi using FileZilla and SFTP. We need to install some packages and Python libraries that are used by the capture application.
To do this, first install the Python setuptools, which I have used to package the capture application:

sudo apt-get install python-setuptools git

Next, run the following commands to download and install the pexif library, which is used to save the GPS position at which each image was taken into the image's EXIF data:

git clone https://github.com/bennoleslie/pexif.git pexif
cd pexif
sudo python setup.py install

Once this is done, SSH into the Pi, change directory to the recorder folder, and run the following command:

sudo python setup.py install

Now that the application is installed, we can take a look at the list of options it accepts using:

gpstimelapse -h

A few of the options can be ignored; --log-file, --log-level, and --verbose were mainly added for debugging while I was writing the application. The --gps option does not need to be set, as it defaults to connecting to the local gpsd instance, which will always be correct when the application runs on the Pi. The --width and --height options simply set the resolution of the captured image; without them, the capture software defaults to capturing 1248 x 1024 images.

The --interval option specifies how long, in seconds, to wait before capturing another time-lapse frame. It is recommended that you set this value to at least 10 seconds, both to avoid filling the SD card too quickly (especially if the time lapse will run over a long period of time) and to ensure that any video created from the frames is of a reasonable length (that is, not too long).

The --distance option allows you to specify a minimum distance, in kilometers, that must have been travelled since the last image before another image is captured. This is useful when whatever is holding the Pi may stop in the same position for periods of time (for example, with the camera on a car dashboard, it prevents capturing several identical frames while the car waits in traffic). This option can also be used to capture a set of images based on distance travelled alone, disregarding the amount of time that has passed: simply set the --interval option to 1 (a value of 1 is used because data is only taken from the GPS module every second, so checking the distance travelled more often than that would be a waste of time).

The folder structure used to store the frames, while slightly complex at first sight, is a good method that allows you to start multiple captures without ever having to SSH into the Pi. Using the --folder option, you set the folder under which all captures are saved. In this folder, the application looks for folders with numerical names and creates a new folder whose name is one higher than the highest number it finds; this is where it saves the images for the current capture. The filename for each image is given by the --filename option, which must contain %d to indicate the frame number (for example, image_%d.jpg). For example, if I pass --folder captures --filename image_%d.jpg to the program, the first frame will be saved as ./captures/0/image_0.jpg, and the second as ./captures/0/image_1.jpg.
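For a rough idea of how the recorder gets its positions, here is a simplified sketch of polling gpsd with the python-gps client installed earlier. This is illustrative only and is not the application's actual code:

import gps

# Connect to the local gpsd instance and ask it to stream reports.
session = gps.gps(mode=gps.WATCH_ENABLE)

while True:
    report = session.next()
    # TPV (time-position-velocity) reports carry the actual position fix.
    if report['class'] == 'TPV':
        lat = getattr(report, 'lat', None)
        lon = getattr(report, 'lon', None)
        if lat is not None and lon is not None:
            print("Position: %f, %f" % (lat, lon))

The real application does essentially this in the background and attaches the most recent fix to each frame's EXIF data via pexif.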
Here are some examples of how the application can be used:

gpstimelapse --folder captures --filename i_%d.jpg --interval 30: This captures a frame every 30 seconds.
gpstimelapse --folder captures --filename i_%d.jpg --interval 30 --distance 0.05: This captures a frame every 30 seconds, provided that 50 meters have been travelled.
gpstimelapse --folder captures --filename i_%d.jpg --interval 1 --distance 0.05: This captures a frame for every 50 meters travelled.

Now that you are able to run the time-lapse recorder application, you are ready to configure it to start as soon as the Pi boots, removing the need for an active network connection and a way to interface with the Pi just to start the capture. To do this, we will add a command to the /etc/rc.local file, which can be edited using the following command:

sudo nano /etc/rc.local

The line you add will depend on exactly how you want the recorder to behave. In this case, I have set it to record an image at the default resolution every minute. As before, ensure that the command is placed just before the line containing exit 0.

Now, you can reboot the Pi and test out the recorder. A good indication that the capture is working is the red LED on the camera board lighting up constantly; this shows that the camera preview is open, which should always be the case with this application. Also note that the capture will not begin until the GPS module has a fix. On the Adafruit module, this is indicated by a quick blink every 15 seconds on the fix LED (no fix is indicated by a steady blink once per second).

One issue you may have with this project is the amount of power required to run the camera and GPS module on top of the Pi. To power everything while on the move, I recommend using one of the USB power banks that have a 2 A output (such power banks are readily available on Amazon).

Using the captures

Now that we have a set of recorded time-lapse frames, each with a GPS position attached, there are a number of things that can be done with this data. Here, we will take a quick look at a couple of uses for the captured frames.

Creating a time-lapse video

The first, and probably the most obvious, thing that can be done with the images is to create a time-lapse video, in which each time-lapse image is shown as a single frame of the video, and the length (or speed) of the video is controlled by changing the number of frames per second.

One of the simplest ways to do this is with either the ffmpeg or avconv utility (which one depends on your version of Linux; the parameters to each are identical in our case). This utility is available on most Linux distributions, including Raspbian, and precompiled executables are also available for Mac and Windows. Here I will only discuss using it on Linux but, rest assured, any instructions given here will also work on the Pi itself.

To create a time lapse from a set of images, you can use the following command:

avconv -framerate FPS -i FILENAME -c:v libx264 -r 30 -pix_fmt yuv420p OUTPUT

Here, FPS is the number of time-lapse frames you want to display every second, FILENAME is the filename format with %d marking the frame number, and OUTPUT is the output filename.

Exporting GPS data as CSV

We can also extract the GPS data from each of the captured time-lapse images and save it as a comma-separated value (CSV) file.
This will allow us to import the data into third-party applications, such as Google Maps and Google Earth. To do this, we can use the frames_to_gps_points.py Python script, which takes the file format of the time-lapse frames and a name for the output file. For example, to create a CSV file called gps_points.csv for images in the frame_%d.jpg format, you can use the following command:

python frames_to_gps_points.py -f frame_%d.jpg -o gps_points.csv

The output is a CSV file in the following format:

[frame number],[latitude],[longitude],[image filename]

The script also has an option to restrict the maximum number of output points: passing the --max-points N parameter ensures that no more than N points end up in the CSV file. This can be useful when importing data into applications that limit the number of points that can be imported.

Summary

In this article, we had a look at how to use the serial interface on the GPIO port to interface with some external hardware. Knowing how to do this will allow you to interface the Pi with a much wider range of hardware in future projects. We also took a look at the camera board and how it can be used from within Python; the camera is a very versatile device with a very wide range of uses in portable projects and ubiquitous computing. You are encouraged to take a deeper look at the source code of the time-lapse recorder application. This will get you on your way to understanding the structure of moderately complex Python programs and the way they can be packaged and distributed.


Making Games with Pixi.js

Alvin Ourrad
23 Mar 2015
6 min read
In this post I will introduce you to pixi.js, a super-fast rendering engine that is also a swiss-army-knife tool with a friendly API.

What?

Pixi.js is a rendering engine that allows you to use the power of WebGL and canvas to render your content on the screen in a completely seamless way. In fact, pixi.js features both a WebGL and a canvas renderer, and can fall back to the latter for lower-end devices. You can then harness the power of WebGL and hardware-accelerated graphics on devices that are powerful enough to use it. If one of your users is on an older device, the engine falls back to the canvas renderer automatically; there is no difference for the person browsing your website, so you don't have to worry about those users any more.

WebGL for 2D?

If you have heard of or browsed a web product showcased as using WebGL, you probably remember a 3D game, a 3D earth visualization, or something similar. WebGL was originally highlighted and brought to the public for its ability to render 3D graphics in the browser, because it was the only way fast enough to do it. But the underlying technology is neither 3D-only nor 2D-only; you make it do what you want. The idea behind pixi.js was to bring this speed and quality of rendering to 2D graphics and games, and of course to the general public. You might argue that you do not need this level of accuracy and fine-grained control for 2D, and that the WebGL API might be a bit of an overhead for a 2D application, but with browsers becoming more powerful, users' expectations are getting higher and higher, and this technology, with its speed, allows you to compete with applications that used to be Flash-only.

Tour/Overview

Pixi.js was created by a former Flash developer, so its syntax is very similar to ActionScript 3. Here is a little tour of the core components that you need to create when using pixi.

The renderer

I already gave you a description of its features and capabilities, so the only thing to bear in mind is that there are two ways of creating a renderer. You can specify the renderer that you want, or let the engine decide according to the current device.

// When you let the engine decide:
var renderer = PIXI.autoDetectRenderer(800, 600);

// When you specifically want one or the other renderer:
var renderer = new PIXI.WebGLRenderer(800, 600);
// and for canvas you'd write:
// var renderer = new PIXI.CanvasRenderer(800, 600);

The stage

Pixi mimics the Flash API in how it deals with object positioning: an object's coordinates are always relative to its parent container. Flash and pixi allow you to create special objects called containers. They are not images or graphics; they are abstract ways to group objects together. Say you have a landscape made of various things such as trees, rocks, and so on. If you add them to a container and move this container, you move all of these objects together (a short code sketch of this appears below).

Don't run away just yet; this is where the stage comes in. The Stage is the root container that everything is added to. The stage isn't meant to move, so when a sprite is added directly to the stage, you can be sure its position will be the same as its position on screen (well, within your canvas).

// here is how you create a stage
var stage = new PIXI.Stage();

Let's make a thing

Ok, enough of the scene-graph theory, it's time to make something.
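First, the promised container sketch. This is illustrative only: the image paths are made up, it assumes the stage created above, and it uses PIXI.DisplayObjectContainer, the container class in the Pixi versions of this era:

// Group several sprites so they can be moved as one unit.
var landscape = new PIXI.DisplayObjectContainer();

var tree = new PIXI.Sprite.fromImage("assets/tree.png");
var rock = new PIXI.Sprite.fromImage("assets/rock.png");
rock.position.x = 120;           // relative to the container, not the screen

landscape.addChild(tree);
landscape.addChild(rock);
stage.addChild(landscape);

// Moving the container moves every child with it;
// each child's coordinates stay relative to the container.
landscape.position.x += 50;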
As I wrote before, pixi is a rendering engine, so you need to tell the renderer to render its stage, otherwise nothing will happen. So this is the bare-bones template you'll use for anything pixi:

// create a new instance of a pixi stage
var stage = new PIXI.Stage(0x0212223);

// create a renderer instance
var renderer = PIXI.autoDetectRenderer(window.innerWidth, window.innerHeight);

// add the renderer view element to the DOM
document.body.appendChild(renderer.view);

// create a new Sprite using the texture
var bunny = new PIXI.Sprite.fromImage("assets/bunny.png");
bunny.position.set(200, 230);
stage.addChild(bunny);

animate();

function animate() {
    // render the stage
    renderer.render(stage);
    requestAnimFrame(animate);
}

First, you create a renderer and a stage, just as I showed you before. Then you create the most important pixi object, a Sprite, which is basically an image rendered on your screen:

var sprite = new PIXI.Sprite.fromImage("assets/image.png");

Sprites are the core of your game, and the thing you will use the most in pixi and in any major game framework. However, pixi being not really a game framework but a level lower, you need to manually add your sprites to the stage. So whenever something is not visible, make sure to double-check that you have added it to the stage like this:

stage.addChild(sprite);

Then, you can create a function that creates a bunch of sprites:

function createParticles() {
    for (var i = 0; i < 40; i++) {
        // create a new Sprite using the texture
        var bunny = new PIXI.Sprite.fromImage("assets/bunny.png");
        bunny.xSpeed = (Math.random() * 20) - 10;
        bunny.ySpeed = (Math.random() * 20) - 10;
        bunny.tint = Math.random() * 0xffffff;
        bunny.rotation = Math.random() * 6;
        stage.addChild(bunny);
    }
}

And then, you can leverage the update loop to move these sprites around randomly:

if (count > 10) {
    createParticles();
    count = 0;
}
if (stage.children.length > 20000) {
    stage.children.shift();
}
for (var i = 0; i < stage.children.length; i++) {
    var sprite = stage.children[i];
    sprite.position.x += sprite.xSpeed;
    sprite.position.y += sprite.ySpeed;
    if (sprite.position.x > renderer.width) {
        sprite.position.x = 0;
    }
    if (sprite.position.y > renderer.height) {
        sprite.position.y = 0;
    }
}

That's it for this blog post. Feel free to have a play with pixi and browse the dedicated website.

About the author

Alvin is a web developer fond of the web and the power of open standards. A lover of open source, he likes experimenting with interactivity in the browser. He currently works as an HTML5 game developer.


Introducing Interactive Plotting

Packt
20 Mar 2015
29 min read
This article is written by Benjamin V. Root, the author of Interactive Applications using Matplotlib.

The goal of any interactive application is to provide as much information as possible while minimizing complexity. If it can't provide the information the users need, then it is useless to them. However, if the application is too complex, then the information's signal gets lost in the noise of the complexity. A graphical presentation often strikes the right balance.

The Matplotlib library can help you present your data as graphs in your application. Anybody can make a simple interactive application without knowing anything about draw buffers, event loops, or even what a GUI toolkit is. And yet, the Matplotlib library will cede as much control as desired to allow even the most savvy GUI developer to create a masterful application from scratch. Like much of the Python language, Matplotlib's philosophy is to give the developer full control, but without being stupidly unhelpful and tedious.

Installing Matplotlib

There are many ways to install Matplotlib on your system. While the library used to have a reputation for being difficult to install on non-Linux systems, it has come a long way since then, along with the rest of the Python ecosystem. Refer to the following command:

$ pip install matplotlib

Most likely, the preceding command will work just fine from the command line. Python Wheels (the next-generation Python package format that has replaced "eggs") for Matplotlib are now available from PyPI for Windows and Mac OS X systems. This method also works for Linux users; however, it might be more favorable to install it via the system's built-in package manager.

While the core Matplotlib library can be installed with few dependencies, it is part of a much larger scientific computing ecosystem known as SciPy. Displaying your data is often the easiest part of your application; processing it is much more difficult, and the SciPy ecosystem most likely has the packages you need to do that. For basic numerical processing and N-dimensional data arrays, there is NumPy. For more advanced but general data processing tools, there is the SciPy package (the name was so catchy, it ended up being used to refer to many different things in the community). For more domain-specific needs, there are "Sci-Kits" such as scikit-learn for artificial intelligence, scikit-image for image processing, and statsmodels for statistical modeling. Another very useful library for data processing is pandas.

This was just a short summary of the packages available in the SciPy ecosystem. Manually managing all of their installations, updates, and dependencies would be difficult for many who simply want to use the tools. Luckily, there are several distributions of the SciPy Stack available that can keep the menagerie under control. The following Python distributions include the SciPy Stack along with many other popular Python packages, or make the packages easily available through package management software:

Anaconda from Continuum Analytics
Canopy from Enthought
SciPy Superpack
Python(x, y) (Windows only)
WinPython (Windows only)
Pyzo (Python 3 only)
Algorete Loopy from Dartmouth College

Show() your work

With Matplotlib installed, you are now ready to make your first simple plot. Matplotlib has multiple layers. Pylab is the topmost layer, often used for quick one-off plotting from within a live Python session.
Start up your favorite Python interpreter and type the following:

>>> from pylab import *
>>> plot([1, 2, 3, 2, 1])

Nothing happened! This is because Matplotlib, by default, will not display anything until you explicitly tell it to do so. The Matplotlib library is often used for automated image generation from within Python scripts, with no need for any interactivity. Also, most users would not be done with their plotting yet and would find it distracting to have a plot come up automatically. When you are ready to see your plot, use the following command:

>>> show()

Interactive navigation

A figure window should now appear, and the Python interpreter is not available for any additional commands. By default, showing a figure will block the execution of your scripts and interpreter. However, this does not mean that the figure is not interactive. As you mouse over the plot, you will see the plot coordinates in the lower right-hand corner. The figure window will also have a toolbar with the following tools, from left to right:

Home, Back, and Forward: These work like the corresponding buttons in a web browser and help you navigate through the previous views of your plot. The Home button takes you back to the first view shown when the figure was opened, Back takes you to the previous view, and Forward undoes the Back actions, returning you to more recent views.

Pan (and zoom): This button has two modes: pan and zoom. Press and hold the left mouse button to pan the figure. If you press x or y while panning, the motion will be constrained to just the x or y axis, respectively. Press the right mouse button to zoom; the plot will be zoomed in or out in proportion to the right/left and up/down movements. Use the x, y, or Ctrl key to constrain the zoom to the x axis, the y axis, or to preserve the aspect ratio, respectively.

Zoom-to-rectangle: Press the left mouse button, drag the cursor to a new location, and release. The axes view limits will be zoomed to the rectangle you just drew. Zooming out with the right mouse button instead places the current view into the region defined by the rectangle you drew.

Subplot configuration: This button brings up a tool to modify plot spacing.

Save: This button brings up a dialog that allows you to save the current figure.

The figure window is also responsive to the keyboard. The default keymap is fairly extensive (and will be covered fully later), but some of the basic hot keys are the Home key for resetting the plot view, the left and right arrow keys for the back and forward actions, p for pan/zoom mode, o for zoom-to-rectangle mode, and Ctrl + s to trigger a file save. When you are done viewing your figure, close the window as you would close any other application window, or use Ctrl + w.

Interactive plotting

When we ran the previous example, no plots appeared until show() was called. Furthermore, no new commands could be entered into the Python interpreter until all the figures were closed. As you will soon learn, once a figure is closed, the plot it contains is lost, which means that you would have to repeat all the commands again in order to show() it again, perhaps with some modification or an additional plot. Matplotlib ships with its interactive plotting mode off by default. There are a couple of ways to turn the interactive plotting mode on. The main way is by calling the ion() function (for Interactive ON). Interactive plotting mode can be turned on at any time and turned off with ioff().
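As a minimal sketch of how such an interactive session might look (this is an illustration rather than the book's own example; exact redraw behavior depends on your backend and REPL environment):

from pylab import *

ion()                    # interactive mode on: figures appear without show()
plot([1, 2, 3, 2, 1])    # a figure window pops up immediately
title("still editable")  # the open figure updates as commands are issued
ioff()                   # interactive mode back off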
Once this mode is turned on, the next plotting command will automatically trigger an implicit show() command. Furthermore, you can continue typing commands into the Python interpreter. You can modify the current figure, create new figures, and close existing ones at any time, all from the current Python session.

Scripted plotting

Python is known for more than just its interactive interpreters; it is also a fully fledged programming language that allows its users to easily create programs. Having a script to display plots from daily reports can greatly improve your productivity. Alternatively, you perhaps need a tool that can produce some simple plots of the data from whatever mystery data file you have come across on the network share. Here is a simple example of how to use Matplotlib's pyplot API and the argparse Python standard library tool to create a simple CSV plotting script called plotfile.py.

Code: chp1/plotfile.py

#!/usr/bin/env python
from argparse import ArgumentParser
import matplotlib.pyplot as plt

if __name__ == '__main__':
    parser = ArgumentParser(description="Plot a CSV file")
    parser.add_argument("datafile", help="The CSV File")
    # Require at least one column name
    parser.add_argument("columns", nargs='+',
                        help="Names of columns to plot")
    parser.add_argument("--save", help="Save the plot as...")
    parser.add_argument("--no-show", action="store_true",
                        help="Don't show the plot")
    args = parser.parse_args()

    plt.plotfile(args.datafile, args.columns)
    if args.save:
        plt.savefig(args.save)
    if not args.no_show:
        plt.show()

Note the two optional command-line arguments: --save and --no-show. With the --save option, the user can have the plot automatically saved (the graphics format is determined automatically from the filename extension). Also, the user can choose not to display the plot, which, when coupled with the --save option, might be desirable if the user is trying to plot several CSV files. When calling this script to show a plot, the execution of the script will stop at the call to plt.show(). If the interactive plotting mode were on, then the execution of the script would continue past show(), terminating the script and thus automatically closing out any figures before the user has had a chance to view them. This is why the interactive plotting mode is turned off by default in Matplotlib. Also note that the call to plt.savefig() comes before the call to plt.show(). As mentioned before, when the figure window is closed, the plot is lost. You cannot save a plot after it has been closed.

Getting help

We have covered how to install Matplotlib and gone over how to make very simple plots from a Python session or a Python script. Most likely, this went very smoothly for you. You may be very curious and want to learn more about the many kinds of plots this library has to offer, or maybe you want to learn how to make new kinds of plots. Help comes in many forms. The Matplotlib website (http://matplotlib.org) is the primary online resource for Matplotlib. It contains examples, FAQs, API documentation, and, most importantly, the gallery.

Gallery

Many users of Matplotlib are often faced with the question, "I want to make a plot that has this data along with that data in the same figure, but it needs to look like this other plot I have seen." Text-based searches on graphing concepts are difficult, especially if you are unfamiliar with the terminology.
The gallery showcases the variety of ways in which one can make plots, all using the Matplotlib library. Browse through the gallery, click on any figure that has pieces of what you want in your plot, and see the code that generated it. Soon enough, you will be like a chef, mixing and matching components to produce that perfect graph. Mailing lists and forums When you are just simply stuck and cannot figure out how to get something to work or just need some hints on how to get started, you will find much of the community at the Matplotlib-users mailing list. This mailing list is an excellent resource of information with many friendly members who just love to help out newcomers. Be persistent! While many questions do get answered fairly quickly, some will fall through the cracks. Try rephrasing your question or with a plot showing your attempts so far. The people at Matplotlib-users love plots, so an image that shows what is wrong often gets the quickest response. A newer community resource is StackOverflow, which has many very knowledgeable users who are able to answer difficult questions. From front to backend So far, we have shown you bits and pieces of two of Matplotlib's topmost abstraction layers: pylab and pyplot. The layer below them is the object-oriented layer (the OO layer). To develop any type of application, you will want to use this layer. Mixing the pylab/pyplot layers with the OO layer will lead to very confusing behaviors when dealing with multiple plots and figures. Below the OO layer is the backend interface. Everything above this interface level in Matplotlib is completely platform-agnostic. It will work the same regardless of whether it is in an interactive GUI or comes from a driver script running on a headless server. The backend interface abstracts away all those considerations so that you can focus on what is most important: writing code to visualize your data. There are several backend implementations that are shipped with Matplotlib. These backends are responsible for taking the figures represented by the OO layer and interpreting it for whichever "display device" they implement. The backends are chosen automatically but can be explicitly set, if needed. Interactive versus non-interactive There are two main classes of backends: ones that provide interactive figures and ones that don't. Interactive backends are ones that support a particular GUI, such as Tcl/Tkinter, GTK, Qt, Cocoa/Mac OS X, wxWidgets, and Cairo. With the exception of the Cocoa/Mac OS X backend, all interactive backends can be used on Windows, Linux, and Mac OS X. Therefore, when you make an interactive Matplotlib application that you wish to distribute to users of any of those platforms, unless you are embedding Matplotlib, you will not have to concern yourself with writing a single line of code for any of these toolkits—it has already been done for you! Non-interactive backends are used to produce image files. There are backends to produce Postscript/EPS, Adobe PDF, and Scalable Vector Graphics (SVG) as well as rasterized image files such as PNG, BMP, and JPEGs. Anti-grain geometry The open secret behind the high quality of Matplotlib's rasterized images is its use of the Anti-Grain Geometry (AGG) library (http://agg.sourceforge.net/antigrain.com/index.html). The quality of the graphics generated from AGG is far superior than most other toolkits available. Therefore, not only is AGG used to produce rasterized image files, but it is also utilized in most of the interactive backends as well. 
Matplotlib maintains and ships with its own fork of the library in order to ensure you have consistent, high-quality image products across all platforms and toolkits. What you see on your screen in your interactive figure window will be the same as the PNG file that is produced when you call savefig().

Selecting your backend

When you install Matplotlib, a default backend is chosen for you based upon your OS and the available GUI toolkits. For example, on Mac OS X systems, your installation of the library will most likely set the default interactive backend to MacOSX, or CocoaAgg for older Macs. Meanwhile, Windows users will most likely have a default of TkAgg or Qt5Agg. In most situations, the choice of interactive backends will not matter. However, in certain situations, it may be necessary to force a particular backend to be used. For example, on a headless server without an active graphics session, you would most likely need to force the use of the non-interactive Agg backend:

import matplotlib
matplotlib.use("Agg")

When done prior to any plotting commands, this will avoid loading any GUI toolkits, thereby bypassing problems that occur when a GUI fails on a headless server. Any call to show() effectively becomes a no-op (and the execution of the script is not blocked). Another purpose of setting your backend is for scenarios when you want to embed your plot in a native GUI application. In that case, you will need to explicitly state which GUI toolkit you are using. Finally, some users simply like the look and feel of some GUI toolkits better than others. They may wish to change the default backend via the backend parameter in the matplotlibrc configuration file. Most likely, your rc file can be found in the .matplotlib directory or the .config/matplotlib directory under your home folder. If you can't find it, then use the following set of commands:

>>> import matplotlib
>>> matplotlib.matplotlib_fname()
u'/home/ben/.config/matplotlib/matplotlibrc'

Here is an example of the relevant section in my matplotlibrc file:

#### CONFIGURATION BEGINS HERE

# the default backend; one of GTK GTKAgg GTKCairo GTK3Agg
# GTK3Cairo CocoaAgg MacOSX QtAgg Qt4Agg TkAgg WX WXAgg Agg Cairo
# PS PDF SVG
# You can also deploy your own backend outside of matplotlib by
# referring to the module name (which must be in the PYTHONPATH)
# as 'module://my_backend'
#backend     : GTKAgg
#backend     : QT4Agg
backend      : TkAgg

# If you are using the Qt4Agg backend, you can choose here
# to use the PyQt4 bindings or the newer PySide bindings to
# the underlying Qt4 toolkit.
#backend.qt4 : PyQt4       # PyQt4 | PySide

This is the global configuration file that is used if one isn't found in the current working directory when Matplotlib is imported. The settings contained in this configuration file serve as default values for many parts of Matplotlib. In particular, we see that the choice of backend can be easily set without having to use a single line of code.

The Matplotlib figure-artist hierarchy

Everything that can be drawn in Matplotlib is called an artist. Any artist can have child artists that are also drawable. This forms the basis of a hierarchy of artist objects that Matplotlib sends to a backend for rendering. At the root of this artist tree is the figure. In the examples so far, we have not explicitly created any figures. The pylab and pyplot interfaces will create the figures for us. However, when creating advanced interactive applications, it is highly recommended that you explicitly create your figures.
You will especially want to do this if you have multiple figures being displayed at the same time. This is the entry into the OO layer of Matplotlib:

fig = plt.figure()

Canvassing the figure

The figure is, quite literally, your canvas. Its primary component is the FigureCanvas instance upon which all drawing occurs. Unless you are embedding your Matplotlib figures into a GUI application, it is very unlikely that you will need to interact with this object directly. Instead, as plotting commands are issued, artist objects are added to the canvas automatically. While any artist can be added directly to the figure, usually only Axes objects are added. A figure can have many axes objects, typically called subplots. Much like the figure object, our examples so far have not explicitly created any axes objects to use. This is because the pylab and pyplot interfaces will also automatically create and manage axes objects for a figure if needed. For the same reason as for figures, you will want to explicitly create these objects when building your interactive applications. If an axes or a figure is not provided, then the pyplot layer will have to make assumptions about which axes or figure you mean to apply a plotting command to. While this might be fine for simple situations, these assumptions get hairy very quickly in non-trivial applications. Luckily, it is easy to create both your figure and its axes using a single command:

fig, axes = plt.subplots(2, 1) # 2x1 grid of subplots

These objects are highly advanced, complex units that most developers will utilize for their plotting needs. Once placed on the figure canvas, the axes object will provide the ticks, axis labels, axes title(s), and the plotting area. An axes is an artist that manages all of its scale and coordinate transformations (for example, log scaling and polar coordinates), automated tick labeling, and automated axis limits. In addition to these responsibilities, an axes object provides a wide assortment of plotting functions. A sampling of plotting functions is as follows:

Function      Description
bar           Make a bar plot
barbs         Plot a two-dimensional field of barbs
boxplot       Make a box and whisker plot
cohere        Plot the coherence between x and y
contour       Plot contours
errorbar      Plot an errorbar graph
hexbin        Make a hexagonal binning plot
hist          Plot a histogram
imshow        Display an image on the axes
pcolor        Create a pseudocolor plot of a two-dimensional array
pcolormesh    Plot a quadrilateral mesh
pie           Plot a pie chart
plot          Plot lines and/or markers
quiver        Plot a two-dimensional field of arrows
sankey        Create a Sankey flow diagram
scatter       Make a scatter plot of x versus y
stem          Create a stem plot
streamplot    Draw streamlines of a vector flow

The application we will build in this article is a storm track editing application. Given a series of radar images, the user can circle each storm cell they see in the radar image and link those storm cells across time. The application will need the ability to save and load track data and provide the user with mechanisms to edit the data. Along the way, we will learn about Matplotlib's structure, its artists, the callback system, doing animations, and finally, embedding this application within a larger GUI application. So, to begin, we first need to be able to view a radar image. There are many ways to load data into a Python program, but one particular favorite among meteorologists is the Network Common Data Form (NetCDF) file.
The SciPy package has built-in support for NetCDF version 3, so we will be using an hour's worth of radar reflectivity data prepared in this format from a NEXRAD site near Oklahoma City, OK on the evening of May 10, 2010, which produced numerous tornadoes and severe storms. The NetCDF binary file is particularly nice to work with because it can hold multiple data variables in a single file, with each variable having an arbitrary number of dimensions. Furthermore, metadata can be attached to each variable and to the dataset itself, allowing you to self-document data files. This particular data file has three variables, namely Reflectivity, lat, and lon, to record the radar reflectivity values and the latitude and longitude coordinates of each pixel in the reflectivity data. The reflectivity data is three-dimensional, with the first dimension as time and the other two dimensions as latitude and longitude. The following code example shows how easy it is to load this data and display the first image frame using SciPy and Matplotlib.

Code: chp1/simple_radar_viewer.py

import matplotlib.pyplot as plt
from scipy.io import netcdf_file

ncf = netcdf_file('KTLX_20100510_22Z.nc')
data = ncf.variables['Reflectivity']
lats = ncf.variables['lat']
lons = ncf.variables['lon']
i = 0

cmap = plt.get_cmap('gist_ncar')
cmap.set_under('lightgrey')

fig, ax = plt.subplots(1, 1)
im = ax.imshow(data[i], origin='lower',
               extent=(lons[0], lons[-1], lats[0], lats[-1]),
               vmin=0.1, vmax=80, cmap='gist_ncar')
cb = fig.colorbar(im)

cb.set_label('Reflectivity (dBZ)')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
plt.show()

Running this script should result in a figure window that will display the first frame of our storms. The plot has a colorbar and the axes ticks label the latitudes and longitudes of our data. What is probably most important in this example is the imshow() call. Being an image, traditionally, the origin of the image data is shown in the upper-left corner, and Matplotlib follows this tradition by default. However, this particular dataset was saved with its origin in the lower-left corner, so we need to state this with the origin parameter. The extent parameter is a tuple describing the data extent of the image. By default, it is assumed to be at (0, 0) and (N – 1, M – 1) for an MxN shaped image. The vmin and vmax parameters are a good way to ensure consistency of your colormap regardless of your input data. If these two parameters are not supplied, then imshow() will use the minimum and maximum of the input data to determine the colormap. This would be undesirable as we move towards displaying arbitrary frames of radar data. Finally, one can explicitly specify the colormap to use for the image. The gist_ncar colormap is very similar to the official NEXRAD colormap for radar data, so we will use it here.

Note that the gist_ncar colormap, along with some other colormaps packaged with Matplotlib such as the default jet colormap, is actually terrible for visualization. See the Choosing Colormaps page of the Matplotlib website for an explanation of why, and for guidance on how to choose a better colormap.

The menagerie of artists

Whenever a plotting function is called, the input data and parameters are processed to produce new artists to represent the data. These artists are either primitives or collections thereof. They are called primitives because they represent basic drawing components such as lines, images, polygons, and text.
It is with these primitives that your data can be represented as bar charts, line plots, errorbars, or any other kinds of plots. Primitives There are four drawing primitives in Matplotlib: Line2D, AxesImage, Patch, and Text. It is through these primitive artists that all other artist objects are derived from, and they comprise everything that can be drawn in a figure. A Line2D object uses a list of coordinates to draw line segments in between. Typically, the individual line segments are straight, and curves can be approximated with many vertices; however, curves can be specified to draw arcs, circles, or any other Bezier-approximated curves. An AxesImage class will take two-dimensional data and coordinates and display an image of that data with a colormap applied to it. There are actually other kinds of basic image artists available besides AxesImage, but they are typically for very special uses. AxesImage objects can be very tricky to deal with, so it is often best to use the imshow() plotting method to create and return these objects. A Patch object is an arbitrary two-dimensional object that has a single color for its "face." A polygon object is a specific instance of the slightly more general patch. These objects have a "path" (much like a Line2D object) that specifies segments that would enclose a face with a single color. The path is known as an "edge," and can have its own color as well. Besides the Polygons that one sees for bar plots and pie charts, Patch objects are also used to create arrows, legend boxes, and the markers used in scatter plots and elsewhere. Finally, the Text object takes a Python string, a point coordinate, and various font parameters to form the text that annotates plots. Matplotlib primarily uses TrueType fonts. It will search for fonts available on your system as well as ship with a few FreeType2 fonts, and it uses Bitstream Vera by default. Additionally, a Text object can defer to LaTeX to render its text, if desired. While specific artist classes will have their own set of properties that make sense for the particular art object they represent, there are several common properties that can be set. The following table is a listing of some of these properties. Property Meaning alpha 0 represents transparent and 1 represents opaque color Color name or other color specification visible boolean to flag whether to draw the artist or not zorder value of the draw order in the layering engine Let's extend the radar image example by loading up already saved polygons of storm cells in the tutorial.py file. Code: chp1/simple_storm_cell_viewer.py import matplotlib.pyplot as plt from scipy.io import netcdf_file from matplotlib.patches import Polygon from tutorial import polygon_loader   ncf = netcdf_file('KTLX_20100510_22Z.nc') data = ncf.variables['Reflectivity'] lats = ncf.variables['lat'] lons = ncf.variables['lon'] i = 0   cmap = plt.get_cmap('gist_ncar') cmap.set_under('lightgrey')   fig, ax = plt.subplots(1, 1) im = ax.imshow(data[i], origin='lower',                extent=(lons[0], lons[-1], lats[0], lats[-1]),                vmin=0.1, vmax=80, cmap='gist_ncar') cb = fig.colorbar(im)   polygons = polygon_loader('polygons.shp') for poly in polygons[i]:    p = Polygon(poly, lw=3, fc='k', ec='w', alpha=0.45)    ax.add_artist(p) cb.set_label("Reflectivity (dBZ)") ax.set_xlabel("Longitude") ax.set_ylabel("Latitude") plt.show() The polygon data returned from polygon_loader() is a dictionary of lists keyed by a frame index. 
The list contains Nx2 numpy arrays of vertex coordinates in longitude and latitude. The vertices form the outline of a storm cell. The Polygon constructor, like all other artist objects, takes many optional keyword arguments. First, lw is short for linewidth, (referring to the outline of the polygon), which we specify to be three points wide. Next is fc, which is short for facecolor, and is set to black ('k'). This is the color of the filled-in region of the polygon. Then edgecolor (ec) is set to white ('w') to help the polygons stand out against a dark background. Finally, we set the alpha argument to be slightly less than half to make the polygon fairly transparent so that one can still see the reflectivity data beneath the polygons. Note a particular difference between how we plotted the image using imshow() and how we plotted the polygons using polygon artists. For polygons, we called a constructor and then explicitly called ax.add_artist() to add each polygon instance as a child of the axes. Meanwhile, imshow() is a plotting function that will do all of the hard work in validating the inputs, building the AxesImage instance, making all necessary modifications to the axes instance (such as setting the limits and aspect ratio), and most importantly, adding the artist object to the axes. Finally, all plotting functions in Matplotlib return artists or a list of artist objects that it creates. In most cases, you will not need to save this return value in a variable because there is nothing else to do with them. In this case, we only needed the returned AxesImage so that we could pass it to the fig.colorbar() method. This is so that it would know what to base the colorbar upon. The plotting functions in Matplotlib exist to provide convenience and simplicity to what can often be very tricky to get right by yourself. They are not magic! They use the same OO interface that is accessible to application developers. Therefore, anyone can write their own plotting functions to make complicated plots easier to perform. Collections Any artist that has child artists (such as a figure or an axes) is called a container. A special kind of container in Matplotlib is called a Collection. A collection usually contains a list of primitives of the same kind that should all be treated similarly. For example, a CircleCollection would have a list of Circle objects, all with the same color, size, and edge width. Individual values for artists in the collection can also be set. A collection makes management of many artists easier. This becomes especially important when considering the number of artist objects that may be needed for scatter plots, bar charts, or any other kind of plot or diagram. Some collections are not just simply a list of primitives, but are artists in their own right. These special kinds of collections take advantage of various optimizations that can be assumed when rendering similar or identical things. RegularPolyCollection, for example, just needs to know the points of a single polygon relative to its center (such as a star or box) and then just needs a list of all the center coordinates, avoiding the need to store all the vertices of every polygon in its collection in memory. In the following example, we will display storm tracks as LineCollection. Note that instead of using ax.add_artist() (which would work), we will use ax.add_collection() instead. 
This has the added benefit of performing special handling on the object to determine its bounding box so that the axes object can incorporate the limits of this collection with any other plotted objects to automatically set its own limits which we trigger with the ax.autoscale(True) call. Code: chp1/linecoll_track_viewer.py import matplotlib.pyplot as plt from matplotlib.collections import LineCollection from tutorial import track_loader   tracks = track_loader('polygons.shp') # Filter out non-tracks (unassociated polygons given trackID of -9) tracks = {tid: t for tid, t in tracks.items() if tid != -9}   fig, ax = plt.subplots(1, 1) lc = LineCollection(tracks.values(), color='b') ax.add_collection(lc) ax.autoscale(True) ax.set_xlabel("Longitude") ax.set_ylabel("Latitude") plt.show() Much easier than the radar images, Matplotlib took care of all the limit setting automatically. Such features are extremely useful for writing generic applications that do not wish to concern themselves with such details. Summary In this article, we introduced you to the foundational concepts of Matplotlib. Using show(), you showed your first plot with only three lines of Python. With this plot up on your screen, you learned some of the basic interactive features built into Matplotlib, such as panning, zooming, and the myriad of key bindings that are available. Then we discussed the difference between interactive and non-interactive plotting modes and the difference between scripted and interactive plotting. You now know where to go online for more information, examples, and forum discussions of Matplotlib when it comes time for you to work on your next Matplotlib project. Next, we discussed the architectural concepts of Matplotlib: backends, figures, axes, and artists. Then we started our construction project, an interactive storm cell tracking application. We saw how to plot a radar image using a pre-existing plotting function, as well as how to display polygons and lines as artists and collections. While creating these objects, we had a glimpse of how to customize the properties of these objects for our display needs, learning some of the property and styling names. We also learned some of the steps one needs to consider when creating their own plotting functions, such as autoscaling. Resources for Article: Further resources on this subject: The plot function [article] First Steps [article] Machine Learning in IPython with scikit-learn [article]

Finding People and Things

Packt
19 Mar 2015
18 min read
In this article by Richard M Reese, author of the book Natural Language Processing with Java, we will see how to use NLP APIs. Using NLP APIs We will demonstrate the NER process using OpenNLP, Stanford API, and LingPipe. Each of these provide alternate techniques that can often do a good job of identifying entities in the text. The following declaration will serve as the sample text to demonstrate the APIs: String sentences[] = {"Joe was the last person to see Fred. ", "He saw him in Boston at McKenzie's pub at 3:00 where he " + " paid $2.45 for an ale. ", "Joe wanted to go to Vermont for the day to visit a cousin who " + "works at IBM, but Sally and he had to look for Fred"}; Using OpenNLP for NER We will demonstrate the use of the TokenNameFinderModel class to perform NLP using the OpenNLP API. Additionally, we will demonstrate how to determine the probability that the entity identified is correct. The general approach is to convert the text into a series of tokenized sentences, create an instance of the TokenNameFinderModel class using an appropriate model, and then use the find method to identify the entities in the text. The following example demonstrates the use of the TokenNameFinderModel class. We will use a simple sentence initially and then use multiple sentences. The sentence is defined here: String sentence = "He was the last person to see Fred."; We will use the models found in the en-token.bin and en-ner-person.bin files for the tokenizer and name finder models, respectively. The InputStream object for these files is opened using a try-with-resources block, as shown here: try (InputStream tokenStream = new FileInputStream(        new File(getModelDir(), "en-token.bin"));        InputStream modelStream = new FileInputStream(            new File(getModelDir(), "en-ner-person.bin"));) {    ...   } catch (Exception ex) {    // Handle exceptions } Within the try block, the TokenizerModel and Tokenizer objects are created:    TokenizerModel tokenModel = new TokenizerModel(tokenStream);    Tokenizer tokenizer = new TokenizerME(tokenModel); Next, an instance of the NameFinderME class is created using the person model: TokenNameFinderModel entityModel =    new TokenNameFinderModel(modelStream); NameFinderME nameFinder = new NameFinderME(entityModel); We can now use the tokenize method to tokenize the text and the find method to identify the person in the text. The find method will use the tokenized String array as input and return an array of Span objects, as shown: String tokens[] = tokenizer.tokenize(sentence); Span nameSpans[] = nameFinder.find(tokens); The Span class holds positional information about the entities found. The actual string entities are still in the tokens array: The following for statement displays the person found in the sentence. Its positional information and the person are displayed on separate lines: for (int i = 0; i < nameSpans.length; i++) {    System.out.println("Span: " + nameSpans[i].toString());    System.out.println("Entity: "        + tokens[nameSpans[i].getStart()]); } The output is as follows: Span: [7..9) person Entity: Fred We will often work with multiple sentences. To demonstrate this, we will use the previously defined sentences string array. The previous for statement is replaced with the following sequence. 
The tokenize method is invoked against each sentence, and then the entity information is displayed as earlier:

for (String sentence : sentences) {
    String tokens[] = tokenizer.tokenize(sentence);
    Span nameSpans[] = nameFinder.find(tokens);
    for (int i = 0; i < nameSpans.length; i++) {
        System.out.println("Span: " + nameSpans[i].toString());
        System.out.println("Entity: "
            + tokens[nameSpans[i].getStart()]);
    }
    System.out.println();
}

The output is as follows. There is an extra blank line between the two people detected because the second sentence did not contain a person:

Span: [0..1) person
Entity: Joe
Span: [7..9) person
Entity: Fred

Span: [0..1) person
Entity: Joe
Span: [19..20) person
Entity: Sally
Span: [26..27) person
Entity: Fred

Determining the accuracy of the entity

When the TokenNameFinderModel identifies entities in text, it computes a probability for that entity. We can access this information using the probs method as shown in the following line of code. This method returns an array of doubles, which corresponds to the elements of the nameSpans array:

double[] spanProbs = nameFinder.probs(nameSpans);

Add this statement to the previous example immediately after the use of the find method. Then add the next statement at the end of the nested for statement:

System.out.println("Probability: " + spanProbs[i]);

When the example is executed, you will get the following output. The probability fields reflect the confidence level of the entity assignment. For the first entity, the model is 80.529 percent confident that "Joe" is a person:

Span: [0..1) person
Entity: Joe
Probability: 0.8052914774025202
Span: [7..9) person
Entity: Fred
Probability: 0.9042160889302772

Span: [0..1) person
Entity: Joe
Probability: 0.9620970782763985
Span: [19..20) person
Entity: Sally
Probability: 0.964568603518126
Span: [26..27) person
Entity: Fred
Probability: 0.990383039618594

Using other entity types

OpenNLP provides several other pretrained models, as listed in the following table. These models can be downloaded from http://opennlp.sourceforge.net/models-1.5/. The prefix, en, specifies English as the language, and ner indicates that the model is for NER.

English finder models             Filename
Location name finder model        en-ner-location.bin
Money name finder model           en-ner-money.bin
Organization name finder model    en-ner-organization.bin
Percentage name finder model      en-ner-percentage.bin
Person name finder model          en-ner-person.bin
Time name finder model            en-ner-time.bin

If we modify the statement to use a different model file, we can see how they work against the sample sentences:

InputStream modelStream = new FileInputStream(
    new File(getModelDir(), "en-ner-time.bin"));) {

When the en-ner-money.bin model is used, the index into the tokens array in the earlier code sequence has to be increased by one. Otherwise, all that is returned is the dollar sign. The various outputs are shown in the following table.

Model                       Output
en-ner-location.bin         Span: [4..5) location
                            Entity: Boston
                            Probability: 0.8656908776583051
                            Span: [5..6) location
                            Entity: Vermont
                            Probability: 0.9732488014011262
en-ner-money.bin            Span: [14..16) money
                            Entity: 2.45
                            Probability: 0.7200919701507937
en-ner-organization.bin     Span: [16..17) organization
                            Entity: IBM
                            Probability: 0.9256970736336729
en-ner-time.bin             The model was not able to detect time in this text sequence

The model failed to find the time entities in the sample text. This illustrates that the model did not have enough confidence that it found any time entities in the text.
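As the money example suggests, a Span can cover more than one token (the dollar sign and the amount are separate tokens), so it is often more robust to print every token a span covers rather than only the token at getStart(). The following helper is a sketch of that idea rather than part of the original example; it assumes the same tokens, nameSpans, and spanProbs arrays produced by the earlier code:

// Hypothetical helper, not from the original example: prints every token
// covered by each span so that multi-token entities come out whole.
static void displaySpans(String[] tokens, Span[] nameSpans, double[] spanProbs) {
    for (int i = 0; i < nameSpans.length; i++) {
        StringBuilder entity = new StringBuilder();
        // getEnd() is exclusive, so this walks every token inside the span
        for (int j = nameSpans[i].getStart(); j < nameSpans[i].getEnd(); j++) {
            if (entity.length() > 0) {
                entity.append(' ');
            }
            entity.append(tokens[j]);
        }
        System.out.println("Span: " + nameSpans[i]
            + " Entity: [" + entity + "]"
            + " Probability: " + spanProbs[i]);
    }
}

With the en-ner-money.bin model, this would print the entity as "$ 2.45" without any manual index arithmetic.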
Processing multiple entity types We can also handle multiple entity types at the same time. This involves creating instances of the NameFinderME class based on each model within a loop and applying the model against each sentence, keeping track of the entities as they are found. We will illustrate this process with the following example. It requires rewriting the previous try block to create the InputStream instance within the block, as shown here: try {    InputStream tokenStream = new FileInputStream(        new File(getModelDir(), "en-token.bin"));    TokenizerModel tokenModel = new TokenizerModel(tokenStream);    Tokenizer tokenizer = new TokenizerME(tokenModel);    ... } catch (Exception ex) {    // Handle exceptions } Within the try block, we will define a string array to hold the names of the model files. As shown here, we will use models for people, locations, and organizations: String modelNames[] = {"en-ner-person.bin",    "en-ner-location.bin", "en-ner-organization.bin"}; An ArrayList instance is created to hold the entities as they are discovered: ArrayList<String> list = new ArrayList(); A for-each statement is used to load one model at a time and then to create an instance of the NameFinderME class: for(String name : modelNames) {    TokenNameFinderModel entityModel = new TokenNameFinderModel(        new FileInputStream(new File(getModelDir(), name)));    NameFinderME nameFinder = new NameFinderME(entityModel);    ... } Previously, we did not try to identify which sentences the entities were found in. This is not hard to do but we need to use a simple for statement instead of a for-each statement to keep track of the sentence indexes. This is shown in the following example, where the previous example has been modified to use the integer variable index to keep the sentences. Otherwise, the code works the same way as earlier: for (int index = 0; index < sentences.length; index++) {    String tokens[] = tokenizer.tokenize(sentences[index]);    Span nameSpans[] = nameFinder.find(tokens);    for(Span span : nameSpans) {        list.add("Sentence: " + index            + " Span: " + span.toString() + " Entity: "            + tokens[span.getStart()]);    } } The entities discovered are then displayed: for(String element : list) {    System.out.println(element); } The output is as follows: Sentence: 0 Span: [0..1) person Entity: Joe Sentence: 0 Span: [7..9) person Entity: Fred Sentence: 2 Span: [0..1) person Entity: Joe Sentence: 2 Span: [19..20) person Entity: Sally Sentence: 2 Span: [26..27) person Entity: Fred Sentence: 1 Span: [4..5) location Entity: Boston Sentence: 2 Span: [5..6) location Entity: Vermont Sentence: 2 Span: [16..17) organization Entity: IBM Using the Stanford API for NER We will demonstrate the CRFClassifier class as used to perform NER. This class implements what is known as a linear chain Conditional Random Field (CRF) sequence model. To demonstrate the use of the CRFClassifier class, we will start with a declaration of the classifier file string, as shown here: String model = getModelDir() +    "\english.conll.4class.distsim.crf.ser.gz"; The classifier is then created using the model: CRFClassifier<CoreLabel> classifier =    CRFClassifier.getClassifierNoExceptions(model); The classify method takes a single string representing the text to be processed. To use the sentences text, we need to convert it to a simple string: String sentence = ""; for (String element : sentences) {    sentence += element; } The classify method is then applied to the text. 
List<List<CoreLabel>> entityList = classifier.classify(sentence); A List instance of List instances of CoreLabel objects is returned. The object returned is a list that contains another list. The contained list is a List instance of CoreLabel objects. The CoreLabel class represents a word with additional information attached to it. The "internal" list contains a list of these words. In the outer for-each statement in the following code sequence, the reference variable, internalList, represents one sentence of the text. In the inner for-each statement, each word in that inner list is displayed. The word method returns the word and the get method returns the type of the word. The words and their types are then displayed: for (List<CoreLabel> internalList: entityList) {    for (CoreLabel coreLabel : internalList) {        String word = coreLabel.word();        String category = coreLabel.get(            CoreAnnotations.AnswerAnnotation.class);        System.out.println(word + ":" + category);    } } Part of the output follows. It has been truncated because every word is displayed. The O represents the "Other" category: Joe:PERSON was:O the:O last:O person:O to:O see:O Fred:PERSON .:O He:O ... look:O for:O Fred:PERSON To filter out the words that are not relevant, replace the println statement with the following statements. This will eliminate the other categories: if (!"O".equals(category)) {    System.out.println(word + ":" + category); } The output is simpler now: Joe:PERSON Fred:PERSON Boston:LOCATION McKenzie:PERSON Joe:PERSON Vermont:LOCATION IBM:ORGANIZATION Sally:PERSON Fred:PERSON Using LingPipe for NER We will demonstrate how name entity models and the ExactDictionaryChunker class are used to perform NER analysis. Using LingPipe's name entity models LingPipe has a few named entity models that we can use with chunking. These files consist of a serialized object that can be read from a file and then applied to text. These objects implement the Chunker interface. The chunking process results in a series of Chunking objects that identify the entities of interest. A list of the NER models is found in the following table. These models can be downloaded from http://alias-i.com/lingpipe/web/models.html: Genre Corpus File English News MUC-6 ne-en-news-muc6.AbstractCharLmRescoringChunker English Genes GeneTag ne-en-bio-genetag.HmmChunker English Genomics GENIA ne-en-bio-genia.TokenShapeChunker We will use the model found in the ne-en-news-muc6.AbstractCharLmRescoringChunker file to demonstrate how this class is used. We start with a try-catch block to deal with exceptions as shown in the following example. The file is opened and used with the AbstractExternalizable class' static readObject method to create an instance of a Chunker class. This method will read in the serialized model: try {    File modelFile = new File(getModelDir(),        "ne-en-news-muc6.AbstractCharLmRescoringChunker");      Chunker chunker = (Chunker)        AbstractExternalizable.readObject(modelFile);    ... } catch (IOException | ClassNotFoundException ex) {    // Handle exception } The Chunker and Chunking interfaces provide methods that work with a set of chunks of text. Its chunk method returns an object that implements the Chunking instance. 
The following sequence displays the chunks found in each sentence of the text, as shown here: for (int i = 0; i < sentences.length; ++i) {    Chunking chunking = chunker.chunk(sentences[i]);    System.out.println("Chunking=" + chunking); } The output of this sequence is as follows: Chunking=Joe was the last person to see Fred. : [0-3:PERSON@-Infinity, 31-35:ORGANIZATION@-Infinity] Chunking=He saw him in Boston at McKenzie's pub at 3:00 where he paid $2.45 for an ale. : [14-20:LOCATION@-Infinity, 24-32:PERSON@-Infinity] Chunking=Joe wanted to go to Vermont for the day to visit a cousin who works at IBM, but Sally and he had to look for Fred : [0-3:PERSON@-Infinity, 20-27:ORGANIZATION@-Infinity, 71-74:ORGANIZATION@-Infinity, 109-113:ORGANIZATION@-Infinity] Instead, we can use methods of the Chunk class to extract specific pieces of information as illustrated here. We will replace the previous for statement with the following for-each statement. This calls a displayChunkSet method: for (String sentence : sentences) {    displayChunkSet(chunker, sentence); } The output that follows shows the result. However, it does not always match the entity type correctly. Type: PERSON Entity: [Joe] Score: -Infinity Type: ORGANIZATION Entity: [Fred] Score: -Infinity Type: LOCATION Entity: [Boston] Score: -Infinity Type: PERSON Entity: [McKenzie] Score: -Infinity Type: PERSON Entity: [Joe] Score: -Infinity Type: ORGANIZATION Entity: [Vermont] Score: -Infinity Type: ORGANIZATION Entity: [IBM] Score: -Infinity Type: ORGANIZATION Entity: [Fred] Score: -Infinity Using the ExactDictionaryChunker class The ExactDictionaryChunker class provides an easy way to create a dictionary of entities and their types, which can be used to find them later in text. It uses a MapDictionary object to store entries and then the ExactDictionaryChunker class is used to extract chunks based on the dictionary. The AbstractDictionary interface supports basic operations for entities, categories, and scores. The score is used in the matching process. The MapDictionary and TrieDictionary classes implement the AbstractDictionary interface. The TrieDictionary class stores information using a character trie structure. This approach uses less memory when it is a concern. We will use the MapDictionary class for our example. To illustrate this approach, we start with a declaration of the MapDictionary class: private MapDictionary<String> dictionary; The dictionary will contain the entities that we are interested in finding. We need to initialize the model as performed in the following initializeDictionary method. The DictionaryEntry constructor used here accepts three arguments: String: The name of the entity String: The category of the entity Double: Represent a score for the entity The score is used when determining matches. A few entities are declared and added to the dictionary. 
private static void initializeDictionary() {    dictionary = new MapDictionary<String>();    dictionary.addEntry(        new DictionaryEntry<String>("Joe","PERSON",1.0));    dictionary.addEntry(        new DictionaryEntry<String>("Fred","PERSON",1.0));    dictionary.addEntry(        new DictionaryEntry<String>("Boston","PLACE",1.0));    dictionary.addEntry(        new DictionaryEntry<String>("pub","PLACE",1.0));    dictionary.addEntry(        new DictionaryEntry<String>("Vermont","PLACE",1.0));    dictionary.addEntry(        new DictionaryEntry<String>("IBM","ORGANIZATION",1.0));    dictionary.addEntry(        new DictionaryEntry<String>("Sally","PERSON",1.0)); } An ExactDictionaryChunker instance will use this dictionary. The arguments of the ExactDictionaryChunker class are detailed here: Dictionary<String>: It is a dictionary containing the entities TokenizerFactory: It is a tokenizer used by the chunker boolean: If it is true, the chunker should return all matches boolean: If it is true, matches are case sensitive Matches can be overlapping. For example, in the phrase "The First National Bank", the entity "bank" could be used by itself or in conjunction with the rest of the phrase. The third parameter determines if all of the matches are returned. In the following sequence, the dictionary is initialized. We then create an instance of the ExactDictionaryChunker class using the Indo-European tokenizer, where we return all matches and ignore the case of the tokens: initializeDictionary(); ExactDictionaryChunker dictionaryChunker    = new ExactDictionaryChunker(dictionary,        IndoEuropeanTokenizerFactory.INSTANCE, true, false); The dictionaryChunker object is used with each sentence, as shown in the following code sequence. We will use the displayChunkSet method: for (String sentence : sentences) {    System.out.println("nTEXT=" + sentence);    displayChunkSet(dictionaryChunker, sentence); } On execution, we get the following output: TEXT=Joe was the last person to see Fred. Type: PERSON Entity: [Joe] Score: 1.0 Type: PERSON Entity: [Fred] Score: 1.0   TEXT=He saw him in Boston at McKenzie's pub at 3:00 where he paid $2.45 for an ale. Type: PLACE Entity: [Boston] Score: 1.0 Type: PLACE Entity: [pub] Score: 1.0   TEXT=Joe wanted to go to Vermont for the day to visit a cousin who works at IBM, but Sally and he had to look for Fred Type: PERSON Entity: [Joe] Score: 1.0 Type: PLACE Entity: [Vermont] Score: 1.0 Type: ORGANIZATION Entity: [IBM] Score: 1.0 Type: PERSON Entity: [Sally] Score: 1.0 Type: PERSON Entity: [Fred] Score: 1.0 This does a pretty good job but it requires a lot of effort to create the dictionary for a large vocabulary. Training a model We will use OpenNLP to demonstrate how a model is trained. The training file used must: Contain marks to demarcate the entities Have one sentence per line We will use the following model file named en-ner-person.train: <START:person> Joe <END> was the last person to see <START:person> Fred <END>. He saw him in Boston at McKenzie's pub at 3:00 where he paid $2.45 for an ale. <START:person> Joe <END> wanted to go to Vermont for the day to visit a cousin who works at IBM, but <START:person> Sally <END> and he had to look for <START:person> Fred <END>. Several methods of this example are capable of throwing exceptions. 
These statements will be placed in a try-with-resource block as shown here, where the model's output stream is created: try (OutputStream modelOutputStream = new BufferedOutputStream(        new FileOutputStream(new File("modelFile")));) {    ... } catch (IOException ex) {    // Handle exception } Within the block, we create an OutputStream<String> object using the PlainTextByLineStream class. This class' constructor takes a FileInputStream instance and returns each line as a String object. The en-ner-person.train file is used as the input file, as shown here. The UTF-8 string refers to the encoding sequence used: ObjectStream<String> lineStream = new PlainTextByLineStream(    new FileInputStream("en-ner-person.train"), "UTF-8"); The lineStream object contains streams that are annotated with tags delineating the entities in the text. These need to be converted to the NameSample objects so that the model can be trained. This conversion is performed by the NameSampleDataStream class as shown here. A NameSample object holds the names of the entities found in the text: ObjectStream<NameSample> sampleStream =    new NameSampleDataStream(lineStream); The train method can now be executed as follows: TokenNameFinderModel model = NameFinderME.train(    "en", "person", sampleStream,    Collections.<String, Object>emptyMap(), 100, 5); The arguments of the method are as detailed in the following table: Parameter Meaning "en" Language Code "person" Entity type sampleStream Sample data null Resources 100 The number of iterations 5 The cutoff The model is then serialized to an output file: model.serialize(modelOutputStream); The output of this sequence is as follows. It has been shortened to conserve space. Basic information about the model creation is detailed: Indexing events using cutoff of 5   Computing event counts... done. 53 events Indexing... done. Sorting and merging events... done. Reduced 53 events to 46. Done indexing. Incorporating indexed data for training... Number of Event Tokens: 46      Number of Outcomes: 2    Number of Predicates: 34 ...done. Computing model parameters ... Performing 100 iterations. 1: ... loglikelihood=-36.73680056967707 0.05660377358490566 2: ... loglikelihood=-17.499660626361216 0.9433962264150944 3: ... loglikelihood=-13.216835449617108 0.9433962264150944 4: ... loglikelihood=-11.461783667999262 0.9433962264150944 5: ... loglikelihood=-10.380239416084963 0.9433962264150944 6: ... loglikelihood=-9.570622475692486 0.9433962264150944 7: ... loglikelihood=-8.919945779143012 0.9433962264150944 ... 99: ... loglikelihood=-3.513810438211968 0.9622641509433962 100: ... loglikelihood=-3.507213816708068 0.9622641509433962 Evaluating a model The model can be evaluated using the TokenNameFinderEvaluator class. The evaluation process uses marked up sample text to perform the evaluation. For this simple example, a file called en-ner-person.eval was created that contained the following text: <START:person> Bill <END> went to the farm to see <START:person> Sally <END>. Unable to find <START:person> Sally <END> he went to town. There he saw <START:person> Fred <END> who had seen <START:person> Sally <END> at the book store with <START:person> Mary <END>. The following code is used to perform the evaluation. The previous model is used as the argument of the TokenNameFinderEvaluator constructor. A NameSampleDataStream instance is created based on the evaluation file. 
The TokenNameFinderEvaluator class' evaluate method performs the evaluation:

TokenNameFinderEvaluator evaluator =
    new TokenNameFinderEvaluator(new NameFinderME(model));

lineStream = new PlainTextByLineStream(
    new FileInputStream("en-ner-person.eval"), "UTF-8");
sampleStream = new NameSampleDataStream(lineStream);
evaluator.evaluate(sampleStream);

To determine how well the model worked with the evaluation data, the getFMeasure method is executed. The results are then displayed:

FMeasure result = evaluator.getFMeasure();
System.out.println(result.toString());

The following output displays the precision, recall, and F-measure. The precision indicates that 50 percent of the entities found exactly match the evaluation data. The recall is the percentage of entities defined in the corpus that were found in the same location. The F-measure is the harmonic mean of precision and recall and is defined as:

F1 = 2 * Precision * Recall / (Recall + Precision)

Plugging in the values below gives F1 = 2 * 0.5 * 0.25 / (0.25 + 0.5) = 0.25 / 0.75 ≈ 0.333, which matches the reported figure:

Precision: 0.5
Recall: 0.25
F-Measure: 0.3333333333333333

The data and evaluation sets should be much larger to create a better model. The intent here was to demonstrate the basic approach used to train and evaluate an NER model.

Summary

We investigated several techniques for performing NER. Regular expressions are one approach that is supported by both core Java classes and NLP APIs. This technique is useful for many applications, and there are a large number of regular expression libraries available. Dictionary-based approaches are also possible and work well for some applications. However, they require considerable effort to populate at times. We used LingPipe's MapDictionary class to illustrate this approach.

Resources for Article:

Further resources on this subject:

Tuning Solr JVM and Container [article]
Model-View-ViewModel [article]
AngularJS Performance [article]

The Observer Pattern

Packt
19 Mar 2015
13 min read
In this article, written by Leonardo Borges, the author of Clojure Reactive Programming, we will:

Explore Rx's main abstraction: Observables
Learn about the duality between iterators and Observables
Create and manipulate Observable sequences

(For more resources related to this topic, see here.)

The Observer pattern revisited

Let's take a look at an example:

(def numbers (atom []))

(defn adder [key ref old-state new-state]
  (print "Current sum is " (reduce + new-state)))

(add-watch numbers :adder adder)

In the preceding example, our Observable subject is the var numbers. The Observer is the adder watch. When the Observable changes, it pushes its changes to the Observer synchronously. Now, contrast this with working with sequences:

(->> [1 2 3 4 5 6]
     (map inc)
     (filter even?)
     (reduce +))

This time around, the vector is the subject being observed and the functions processing it can be thought of as the Observers. However, this works in a pull-based model. The vector doesn't push any elements down the sequence. Instead, map and friends ask the sequence for more elements. This is a synchronous operation. Rx makes sequences, and more, behave like Observables so that you can still map, filter, and compose them just as you would compose functions over normal sequences.

Observer – an Iterator's dual

Clojure's sequence operators such as map, filter, reduce, and so on support Java Iterables. As the name implies, an Iterable is an object that can be iterated over. At a low level, this is supported by retrieving an Iterator reference from such an object. Java's Iterator interface looks like the following:

public interface Iterator<E> {
    boolean hasNext();
    E next();
    void remove();
}

When passed an object that implements this interface, Clojure's sequence operators pull data from it by using the next method, while using the hasNext method to know when to stop. The remove method is required to remove its last element from the underlying collection. This in-place mutation is clearly unsafe in a multithreaded environment. Whenever Clojure implements this interface for the purposes of interoperability, the remove method simply throws UnsupportedOperationException. An Observable, on the other hand, has Observers subscribed to it. Observers have the following interface:

public interface Observer<T> {
    void onCompleted();
    void onError(Throwable e);
    void onNext(T t);
}

As we can see, an Observer implementing this interface will have its onNext method called with the next value available from whatever Observable it's subscribed to. Hence, it is a push-based notification model. This duality becomes clearer if we look at both the interfaces side by side:

Iterator<E> {                  Observer<T> {
    boolean hasNext();             void onCompleted();
    E next();                      void onError(Throwable e);
    void remove();                 void onNext(T t);
}                              }

Observables provide the ability to have producers push items asynchronously to consumers. A few examples will help solidify our understanding.

Creating Observables

This article is all about Reactive Extensions, so let's go ahead and create a project called rx-playground that we will be using in our exploratory tour.
We will use RxClojure (see https://github.com/ReactiveX/RxClojure), a library that provides Clojure bindings for RxJava (see https://github.com/ReactiveX/RxJava):

$ lein new rx-playground

Open the project file and add a dependency on RxJava's Clojure bindings:

(defproject rx-playground "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [io.reactivex/rxclojure "1.0.0"]])

Now, fire up a REPL in the project's root directory so that we can start creating some Observables:

$ lein repl

The first thing we need to do is import RxClojure, so let's get this out of the way by typing the following in the REPL:

(require '[rx.lang.clojure.core :as rx])
(import '(rx Observable))

The simplest way to create a new Observable is by calling the return function (RxClojure's binding for RxJava's just):

(def obs (rx/return 10))

Now, we can subscribe to it:

(rx/subscribe obs
              (fn [value]
                (prn (str "Got value: " value))))

This will print the string "Got value: 10" to the REPL. The subscribe function of an Observable allows us to register handlers for three main things that happen throughout its life cycle: new values, errors, or a notification that the Observable is done emitting values. This corresponds to the onNext, onError, and onCompleted methods of the Observer interface, respectively. In the preceding example, we are simply subscribing to onNext, which is why we get notified about the Observable's only value, 10. A single-value Observable isn't terribly interesting though. Let's create and interact with one that emits multiple values:

(-> (rx/seq->o [1 2 3 4 5 6 7 8 9 10])
    (rx/subscribe prn))

This will print the numbers from 1 to 10, inclusive, to the REPL. seq->o is a way to create Observables from Clojure sequences. It just so happens that the preceding snippet can be rewritten using Rx's own range operator:

(-> (rx/range 1 10)
    (rx/subscribe prn))

Of course, this doesn't yet present any advantages over working with raw values or sequences in Clojure. But what if we need an Observable that emits an undefined number of integers at a given interval? This becomes challenging to represent as a sequence in Clojure, but Rx makes it trivial:

(import '(java.util.concurrent TimeUnit))
(rx/subscribe (Observable/interval 100 TimeUnit/MILLISECONDS)
              prn)

RxClojure doesn't yet provide bindings to all of RxJava's API. The interval method is one such example. We're required to use interoperability and call the method directly on the Observable class from RxJava. Observable/interval takes as arguments a number and a time unit. In this case, we are telling it to emit an integer—starting from zero—every 100 milliseconds. If we type this in a REPL-connected editor, however, two things will happen:

We will not see any output (depending on your REPL; this is true for Emacs)
We will have a rogue thread emitting numbers indefinitely

Both issues arise from the fact that Observable/interval is the first factory method we have used that doesn't emit values synchronously. Instead, it returns an Observable that defers the work to a separate thread. The first issue is simple enough to fix. Functions such as prn will print to whatever the dynamic var *out* is bound to. When working in certain REPL environments such as Emacs', this is bound to the REPL stream, which is why we can generally see everything we print.
However, since Rx is deferring the work to a separate thread, *out* isn't bound to the REPL stream anymore so we don't see the output. In order to fix this, we need to capture the current value of *out* and bind it in our subscription. This will be incredibly useful as we experiment with Rx in the REPL. As such, let's create a helper function for it: (def repl-out *out*) (defn prn-to-repl [& args] (binding [*out* repl-out]    (apply prn args))) The first thing we do is create a var repl-out that contains the current REPL stream. Next, we create a function prn-to-repl that works just like prn, except it uses the binding macro to create a new binding for *out* that is valid within that scope. This still leaves us with the rogue thread problem. Now is the appropriate time to mention that the subscribe method from an Observable returns a subscription object. By holding onto a reference to it, we can call its unsubscribe method to indicate that we are no longer interested in the values produced by that Observable. Putting it all together, our interval example can be rewritten like the following: (def subscription (rx/subscribe (Observable/interval 100 TimeUnit/MILLISECONDS)                                prn-to-repl))   (Thread/sleep 1000)   (rx/unsubscribe subscription) We create a new interval Observable and immediately subscribe to it, just as we did before. This time, however, we assign the resulting subscription to a local var. Note that it now uses our helper function prn-to-repl, so we will start seeing values being printed to the REPL straight away. Next, we sleep the current—the REPL—thread for a second. This is enough time for the Observable to produce numbers from 0 to 9. That's roughly when the REPL thread wakes up and unsubscribes from that Observable, causing it to stop emitting values. Custom Observables Rx provides many more factory methods to create Observables (see https://github.com/ReactiveX/RxJava/wiki/Creating-Observables), but it is beyond the scope of this article to cover them all. Nevertheless, sometimes, none of the built-in factories is what you want. For such cases, Rx provides the create method. We can use it to create a custom Observable from scratch. As an example, we'll create our own version of the just Observable we used earlier in this article: (defn just-obs [v] (rx/Observable*    (fn [Observer]      (rx/on-next Observer v)      (rx/on-completed Observer))))   (rx/subscribe (just-obs 20) prn) First, we create a function, just-obs, which implements our Observable by calling the Observable* function. When creating an Observable this way, the function passed to Observable* will get called with an Observer as soon as one subscribes to us. When this happens, we are free to do whatever computation—and even I/O—we need in order to produce values and push them to the Observer. We should remember to call the Observer's onCompleted method whenever we're done producing values. The preceding snippet will print 20 to the REPL. While creating custom Observables is fairly straightforward, we should make sure we exhaust the built-in factory functions first, only then resorting to creating our own. Manipulating Observables Now that we know how to create Observables, we should look at what kinds of interesting things we can do with them. In this section, we will see what it means to treat Observables as sequences. We'll start with something simple. 
Let's print the sum of the first five positive even integers from an Observable of all integers: (rx/subscribe (->> (Observable/interval 1 TimeUnit/MICROSECONDS)                    (rx/filter even?)                    (rx/take 5)                    (rx/reduce +))                    prn-to-repl) This is starting to look awfully familiar to us. We create an interval that will emit all positive integers starting at zero every 1 microsecond. Then, we filter all even numbers in this Observable. Obviously, this is too big a list to handle, so we simply take the first five elements from it. Finally, we reduce the value using +. The result is 20. To drive home the point that programming with Observables really is just like operating on sequences, we will look at one more example where we will combine two different Observable sequences. One contains the names of musicians I'm a fan of and the other the names of their respective bands: (defn musicians [] (rx/seq->o ["James Hetfield" "Dave Mustaine" "Kerry King"]))   (defn bands     [] (rx/seq->o ["Metallica" "Megadeth" "Slayer"])) We would like to print to the REPL a string of the format Musician name – from: band name. An added requirement is that the band names should be printed in uppercase for impact. We'll start by creating another Observable that contains the uppercased band names: (defn uppercased-obs [] (rx/map (fn [s] (.toUpperCase s)) (bands))) While not strictly necessary, this makes a reusable piece of code that can be handy in several places of the program, thus avoiding duplication. Subscribers interested in the original band names can keep subscribing to the bands Observable. With the two Observables in hand, we can proceed to combine them: (-> (rx/map vector           (musicians)            (uppercased-obs))    (rx/subscribe (fn [[musician band]]                    (prn-to-repl (str musician " - from: " band))))) Once more, this example should feel familiar. The solution we were after was a way to zip the two Observables together. RxClojure provides zip behavior through map, much like Clojure's core map function does. We call it with three arguments: the two Observables to zip and a function that will be called with both elements, one from each Observable, and should return an appropriate representation. In this case, we simply turn them into a vector. Next, in our subscriber, we simply destructure the vector in order to access the musician and band names. We can finally print the final result to the REPL: "James Hetfield - from: METALLICA" "Dave Mustaine - from: MEGADETH" "Kerry King - from: SLAYER" Summary In this article, we took a deep dive into RxJava, a port form Microsoft's Reactive Extensions from .NET. We learned about its main abstraction, the Observable, and how it relates to iterables. We also learned how to create, manipulate, and combine Observables in several ways. The examples shown here were contrived to keep things simple. Resources for Article: Further resources on this subject: A/B Testing – Statistical Experiments for the Web [article] Working with Incanter Datasets [article] Using cross-validation [article]

article-image-drupal-8-configuration-management
Packt
18 Mar 2015
14 min read
Save for later

Drupal 8 Configuration Management

In this article, by Stefan Borchert and Anja Schirwinski, the authors of the book Drupal 8 Configuration Management, we will learn the inner workings of the Configuration Management system in Drupal 8. You will learn about config and schema files and read about the difference between simple configuration and configuration entities. (For more resources related to this topic, see here.)

The config directory

During installation, Drupal adds a directory within sites/default/files called config_HASH, where HASH is a long random string of letters and numbers, as shown in the following screenshot: This sequence is a random hash generated during the installation of your Drupal site. It is used to add some protection to your configuration files, in addition to the default restriction enforced by the .htaccess file within the subdirectories of the config directory, which prevents unauthorized users from seeing the content of the directories. As a result, it would be really hard for someone to guess the folder's name. Within the config directory, you will see two additional directories that are empty by default (leaving the .htaccess and README.txt files aside). One of the directories is called active. If you change the configuration system to use file storage instead of the database for the active Drupal site configuration, this directory will contain the active configuration. If you did not customize the storage mechanism of the active configuration (we will learn later how to do this), Drupal 8 uses the database to store the active configuration. The other directory is called staging. This directory is empty by default, but can host the configuration you want to be imported into your Drupal site from another installation. You will learn how to use this later on in this article.

A simple configuration example

First, we want to become familiar with configuration itself. If you look into the database of your Drupal installation and open up the config table, you will find the entire active configuration of your site, as shown in the following screenshot: Depending on your site's configuration, table names may be prefixed with a custom string, so you'll have to look for a table name that ends with config. Don't worry about the strange-looking text in the data column; this is the serialized content of the corresponding configuration. It expands to single configuration values—that is, system.site.name, which holds the value of the name of your site. Changing the site's name in the user interface on admin/config/system/site-information will immediately update the record in the database; thus, put simply, the records in the table are the current state of your site's configuration, as shown in the following screenshot: But where does the initial configuration of your site come from? Drupal itself and the modules you install must use some kind of default configuration that gets added to the active storage during installation.

Config and schema files – what are they and what are they used for?

In order to provide a default configuration during the installation process, Drupal (modules and profiles) comes with a bunch of files that hold the configuration needed to run your site. To make parsing of these files simple and to enhance the readability of these configuration files, the configuration is stored using the YAML format. YAML (http://yaml.org/) is a data-orientated serialization standard that aims for simplicity. With YAML, it is easy to map common data types such as lists, arrays, or scalar values.
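As a quick, hypothetical illustration of the format (this snippet is not a file shipped with Drupal; the keys are invented purely to show YAML's basic building blocks), scalars, lists, and nested mappings look like this:

# example.settings.yml – hypothetical file, for illustration only
site_name: 'My example site'   # scalar string
items_per_page: 10             # scalar integer
caching_enabled: true          # scalar Boolean
allowed_extensions:            # list (sequence) of strings
  - jpg
  - png
  - gif
notification:                  # nested mapping (key-value pairs)
  email: 'admin@example.com'
  send_daily: false

Every configuration file you will encounter in Drupal 8 is assembled from these few constructs, which is what keeps the files both machine-parsable and human-readable.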
Config files Directly beneath the root directory of each module and profile defining or overriding configuration (either core or contrib), you will find a directory named config. Within this directory, there may be two more directories (although both are optional): install and schema. Check the image module inside core/modules and take a look at its config directory, as shown in the following screenshot: The install directory shown in the following screenshot contains all configuration values that the specific module defines or overrides and that are stored in files with the extension .yml (one of the default extensions for files in the YAML format): During installation, the values stored in these files are copied to the active configuration of your site. In the case of default configuration storage, the values are added to the config table; in file-based configuration storage mechanisms, on the other hand, the files are copied to the appropriate directories. Looking at the filenames, you will see that they follow a simple convention: <module name>.<type of configuration>[.<machine name of configuration object>].yml (setting aside <module name>.settings.yml for now). The explanation is as follows: <module name>: This is the name of the module that defines the settings included in the file. For instance, the image.style.large.yml file contains settings defined by the image module. <type of configuration>: This can be seen as a type of group for configuration objects. The image module, for example, defines several image styles. These styles are a set of different configuration objects, so the group is defined as style. Hence, all configuration files that contain image styles defined by the image module itself are named image.style.<something>.yml. The same structure applies to blocks (<block.block.*.yml>), filter formats (<filter.format.*.yml>), menus (<system.menu.*.yml>), content types (<node.type.*.yml>), and so on. <machine name of configuration object>: The last part of the filename is the unique machine-readable name of the configuration object itself. In our examples from the image module, you see three different items: large, medium, and thumbnail. These are exactly the three image styles you will find on admin/config/media/image-styles after installing a fresh copy of Drupal 8. The image styles are shown in the following screenshot: Schema files The primary reason schema files were introduced into Drupal 8 is multilingual support. A tool was needed to identify all translatable strings within the shipped configuration. The secondary reason is to provide actual translation forms for configuration based on your data and to expose translatable configuration pieces to external tools. Each module can have as many configuration the .yml files as needed. All of these are explained in one or more schema files that are shipped with the module. As a simple example of how schema files work, let's look at the system's maintenance settings in the system.maintenance.yml file at core/modules/system/config/install. The file's contents are as follows: message: '@site is currently under maintenance. We should be back shortly. Thank you for your patience.' langcode: en The system module's schema files live in core/modules/system/config/schema. These define the basic types but, for our example, the most important aspect is that they define the schema for the maintenance settings. 
The corresponding schema section from the system.schema.yml file is as follows: system.maintenance: type: mapping label: 'Maintenance mode' mapping:    message:      type: text      label: 'Message to display when in maintenance mode'    langcode:      type: string      label: 'Default language' The first line corresponds to the filename for the .yml file, and the nested lines underneath the first line describe the file's contents. Mapping is a basic type for key-value pairs (always the top-level type in .yml). The system.maintenance.yml file is labeled as label: 'Maintenance mode'. Then, the actual elements in the mapping are listed under the mapping key. As shown in the code, the file has two items, so the message and langcode keys are described. These are a text and a string value, respectively. Both values are given a label as well in order to identify them in configuration forms. Learning the difference between active and staging By now, you know that Drupal works with the two directories active and staging. But what is the intention behind those directories? And how do we use them? The configuration used by your site is called the active configuration since it's the configuration that is affecting the site's behavior right now. The current (active) configuration is stored in the database and direct changes to your site's configuration go into the specific tables. The reason Drupal 8 stores the active configuration in the database is that it enhances performance and security. Source: https://www.drupal.org/node/2241059. However, sometimes you might not want to store the active configuration in the database and might need to use a different storage mechanism. For example, using the filesystem as configuration storage will enable you to track changes in the site's configuration using a versioning system such as Git or SVN. Changing the active configuration storage If you do want to switch your active configuration storage to files, here's how: Note that changing the configuration storage is only possible before installing Drupal. After installing it, there is no way to switch to another configuration storage! To use a different configuration storage mechanism, you have to make some modifications to your settings.php file. First, you'll need to find the section named Active configuration settings. Now you will have to uncomment the line that starts with $settings['bootstrap_config_storage'] to enable file-based configuration storage. Additionally, you need to copy the existing default.services.yml (next to your settings.php file) to a file named services.yml and enable the new configuration storage: services: # Override configuration storage. config.storage:    class: DrupalCoreConfigCachedStorage    arguments: ['@config.storage.active', '@cache.config'] config.storage.active:    # Use file storage for active configuration.    alias: config.storage.file This tells Drupal to override the default service used for configuration storage and use config.storage.file as the active configuration storage mechanism instead of the default database storage. After installing the site with these settings, we will take another look at the config directory in sites/default/files (assuming you didn't change to the location of the active and staging directory): As you can see, the active directory contains the entire site's configuration. The files in this directory get copied here during the website's installation process. 
Whenever you make a change to your website, the change is reflected in these files. Exporting a configuration always exports a snapshot of the active configuration, regardless of the storage method. The staging directory contains the changes you want to add to your site. Drupal compares the staging directory to the active directory and checks for changes between them. When you upload your compressed export file, it actually gets placed inside the staging directory. This means you can save yourself the trouble of using the interface to export and import the compressed file if you're comfortable enough with copy-and-pasting files to another directory. Just make sure you copy all of the files to the staging directory even if only one of the files was changed. Any missing files are interpreted as deleted configuration, and will mess up your site. In order to get the contents of staging into active, we simply have to use the synchronize option at admin/config/development/configuration again. This page will show us what was changed and allows us to import the changes. On importing, your active configuration will get overridden with the configuration in your staging directory. Note that the files inside the staging directory will not be removed after the synchronization is finished. The next time you want to copy-and-paste from your active directory, make sure you empty staging first. Note that you cannot override files directly in the active directory. The changes have to be made inside staging and then synchronized. Changing the storage location of the active and staging directories In case you do not want Drupal to store your configuration in sites/default/files, you can set the path according to your wishes. Actually, this is recommended for security reasons, as these directories should never be accessible over the Web or by unauthorized users on your server. Additionally, it makes your life easier if you work with version control. By default, the whole files directory is usually ignored in version-controlled environments because Drupal writes to it, and having the active and staging directory located within sites/default/files would result in them being ignored too. So how do we change the location of the configuration directories? Before installing Drupal, you will need to create and modify the settings.php file that Drupal uses to load its basic configuration data from (that is, the database connection settings). If you haven't done it yet, copy the default.settings.php file and rename the copy to settings.php. Afterwards, open the new file with the editor of your choice and search for the following line: $config_directories = array(); Change the preceding line to the following (or simply insert your addition at the bottom of the file). $config_directories = array( CONFIG_ACTIVE_DIRECTORY => './../config/active', // folder outside the webroot CONFIG_STAGING_DIRECTORY => './../config/staging', // folder outside the webroot ); The directory names can be chosen freely, but it is recommended that you at least use similar names to the default ones so that you or other developers don't get confused when looking at them later. Remember to put these directories outside your webroot, or at least protect the directories using an .htaccess file (if using Apache as the server). Directly after adding the paths to your settings.php file, make sure you remove write permissions from the file as it would be a security risk if someone could change it. 
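For example, on a Unix-like server you could drop the write permission with a single command (the path shown assumes the default location of the file; adjust it to your own setup):

$ chmod 444 sites/default/settings.php

This leaves the file readable by Drupal but prevents accidental or malicious changes until you deliberately restore write access.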
Drupal will now use your custom location for its configuration files on installation. You can also change the location of the configuration directories after installing Drupal. Open up your settings.php file and find the two lines near the end of the file that start with $config_directories. Change their paths to something like this:

$config_directories['active'] = './../config/active';
$config_directories['staging'] = './../config/staging';

These paths place the directories above your Drupal root. Now that you know about active and staging, let's learn more about the different types of configuration you can create on your own.

Simple configuration versus configuration entities

As soon as you want to start storing your own configuration, you need to understand the differences between simple configuration and configuration entities. Here's a short definition of the two types of configuration used in Drupal.

Simple configuration

This configuration type is easier to implement and therefore ideal for basic configuration settings that result in Boolean values, integers, or simple strings of text being stored, as well as global variables that are used throughout your site. A good example would be the value of an on/off toggle for a specific feature in your module, or our previously used example of the site name configured by the system module:

name: 'Configuration Management in Drupal 8'

Simple configuration also includes any settings that your module requires in order to operate correctly. For example, JavaScript aggregation has to be either on or off. If it doesn't exist, the system module won't be able to determine the appropriate course of action.

Configuration entities

Configuration entities are much more complicated to implement but far more flexible. They are used to store information about objects that users can create and destroy without breaking the code. A good example of configuration entities is an image style provided by the image module. Take a look at the image.style.thumbnail.yml file:

uuid: fe1fba86-862c-49c2-bf00-c5e1f78a0f6c
langcode: en
status: true
dependencies: { }
name: thumbnail
label: 'Thumbnail (100×100)'
effects:
  1cfec298-8620-4749-b100-ccb6c4500779:
    uuid: 1cfec298-8620-4749-b100-ccb6c4500779
    id: image_scale
    weight: 0
    data:
      width: 100
      height: 100
      upscale: false
third_party_settings: { }

This defines a specific style for images, so the system is able to create derivatives of images that a user uploads to the site. Configuration entities also come with a complete set of create, read, update, and delete (CRUD) hooks that are fired just like any other entity in Drupal, making them an ideal candidate for configuration that might need to be manipulated or responded to by other modules. As an example, the Views module uses configuration entities that allow for a scenario where, at runtime, hooks are fired that allow any other module to provide configuration (in this case, custom views) to the Views module.

Summary

In this article, you learned about how to store configuration and briefly got to know the two different types of configuration.

Resources for Article: Further resources on this subject: Tabula Rasa: Nurturing your Site for Tablets [article] Components - Reusing Rules, Conditions, and Actions [article] Introduction to Drupal Web Services [article]

article-image-building-mobile-games-craftyjs-and-phonegap-part-1
Robi Sen
18 Mar 2015
7 min read
Save for later

Building Mobile Games with Crafty.js and PhoneGap: Part 1

In this post, we will build a mobile game using HTML5, CSS, and JavaScript. To make things easier, we are going to make use of the Crafty.js JavaScript game engine, which is both free and open source. In this first part of a two-part series, we will look at making a simple turn-based RPG-like game based on Pascal Rettig's Crafty Workshop presentation. You will learn how to add sprites to a game, control them, and work with mouse/touch events.

Setting up

To get started, first create a new PhoneGap project wherever you want to do your development. For this article, let's call the project simplerpg.

Figure 1: Creating the simplerpg project in PhoneGap.

Navigate to the www directory in your PhoneGap project and then add a new directory called lib. This is where we are going to put several JavaScript libraries we will use for the project. Now, download the JQuery library to the lib directory. For this project, we will use JQuery 2.1. Once you have downloaded JQuery, you need to download the Crafty.js library and add it to your lib directory as well. For later parts of this series, you will want to be using a web server such as Apache or IIS to make development easier. For the first part of the post, you can just drag-and-drop the HTML files into your browser to test, but later, you will need to use a web server to avoid Same Origin Policy errors. This article assumes you are using Chrome to develop in. While IE or FireFox will work just fine, Chrome and its debugging environment are used in this article. Finally, the source code for this article can be found here on GitHub. In the lessons directory, you will see a series of index files with a listing number matching each code listing in this article.

Crafty

PhoneGap allows you to take almost any HTML5 application and turn it into a mobile app with little to no extra work. Perhaps the most complex of all mobile apps are video games. Video games often have complex routines, graphics, and controls. As such, developing a video game from the ground up is very difficult. So much so that even major video game companies rarely do so. What they usually do, and what we will do here, is make use of libraries and game engines that take care of many of the complex tasks of managing objects, animation, collision detection, and more. For our project, we will be making use of the open source JavaScript game engine Crafty. Before you get started with the code, it's recommended to quickly review the Crafty website and the Crafty API.

Bootstrapping Crafty and creating an entity

Crafty is very simple to start working with. All you need to do is load the Crafty.js library and initialize Crafty. Let's try that. Create an index.html file in your www root directory, if one does not exist; if you already have one, go ahead and overwrite it. Then, cut and paste listing 1 into it.

Listing 1: Creating an entity

<!DOCTYPE html>
<html>
<head></head>
<body>
<div id="game"></div>
<script type="text/javascript" src="lib/crafty.js"></script>
<script>
// Height and Width
var WIDTH = 500, HEIGHT = 320;
// Initialize Crafty
Crafty.init(WIDTH, HEIGHT);
var player = Crafty.e();
player.addComponent("2D, Canvas, Color");
player.color("red").attr({w:50, h:50});
</script>
</body>
</html>

As you can see in listing 1, we are creating an HTML5 document and loading the Crafty.js library. Then, we initialize Crafty and pass it a width and height. Next, we create a Crafty entity called player.
Crafty, like many other game engines, follows a design pattern called Entity-Component-System or (ECS). Entities are objects that you can attach things like behaviors and data to. For ourplayerentity, we are going to add several components including 2D, Canvas, and Color. Components can be data, metadata, or behaviors. Finally, we will add a specific color and position to our entity. If you now save your file and drag-and-drop it into the browser, you should see something like figure 2. Figure 2: A simple entity in Crafty.  Moving a box Now,let’s do something a bit more complex in Crafty. Let’s move the red box based on where we move our mouse, or if you have a touch-enabled device, where we touch the screen. To do this, open your index.html file and edit it so it looks like listing 2. Listing 2: Moving the box <!DOCTYPE html> <html> <head></head> <body> <div id="game"></div> <script type="text/javascript" src="lib/crafty.js"></script> <script> var WIDTH = 500, HEIGHT = 320; Crafty.init(WIDTH, HEIGHT); // Background Crafty.background("black"); //add mousetracking so block follows your mouse Crafty.e("mouseTracking, 2D, Mouse, Touch, Canvas") .attr({ w:500, h:320, x:0, y:0 }) .bind("MouseMove", function(e) { console.log("MouseDown:"+ Crafty.mousePos.x +", "+ Crafty.mousePos.y); // when you touch on the canvas redraw the player player.x = Crafty.mousePos.x; player.y = Crafty.mousePos.y; }); // Create the player entity var player = Crafty.e(); player.addComponent("2D, DOM"); //set where your player starts player.attr({ x : 10, y : 10, w : 50, h : 50 }); player.addComponent("Color").color("red"); </script> </body> </html> As you can see, there is a lot more going on in this listing. The first difference is that we are using Crafty.background to set the background to black, but we are also creating a new entity called mouseTracking that is the same size as the whole canvas. We assign several components to the entity so that it can inherit their methods and properties. We then use .bind to bind the mouse’s movements to our entity. Then, we tell Crafty to reposition our player entity to wherever the mouse’s x and y position is. So, if you save this code and run it, you will find that the red box will go wherever your mouse moves or wherever you touch or drag as in figure 3.    Figure 3: Controlling the movement of a box in Crafty.  Summary In this post, you learned about working with Crafty.js. Specifically, you learned how to work with the Crafty API and create entities. In Part 2, you will work with sprites, create components, and control entities via mouse/touch.  About the author Robi Sen, CSO at Department 13, is an experienced inventor, serial entrepreneur, and futurist whose dynamic twenty-plus-year career in technology, engineering, and research has led him to work on cutting edge projects for DARPA, TSWG, SOCOM, RRTO, NASA, DOE, and the DOD. Robi also has extensive experience in the commercial space, including the co-creation of several successful start-up companies. He has worked with companies such as UnderArmour, Sony, CISCO, IBM, and many others to help build new products and services. Robi specializes in bringing his unique vision and thought process to difficult and complex problems, allowing companies and organizations to find innovative solutions that they can rapidly operationalize or go to market with.

article-image-entity-framework-code-first-accessing-database-views-and-stored-procedures
Packt
18 Mar 2015
15 min read
Save for later

Entity Framework Code-First: Accessing Database Views and Stored Procedures

Packt
18 Mar 2015
15 min read
In this article by Sergey Barskiy, author of the book Code-First Development using Entity Framework, you will learn how to integrate Entity Framework with additional database objects, specifically views and stored procedures. We will see how to take advantage of existing stored procedures and functions to retrieve and change the data. You will learn how to persist changed entities from our context using stored procedures. We will gain an understanding of the advantages of asynchronous processing and see how Entity Framework supports this concept via its built-in API. Finally, you will learn why concurrency is important for a multi-user application and what options are available in Entity Framework to implement optimistic concurrency. In this article, we will cover how to: Get data from a view Get data from a stored procedure or table-valued function Map create, update, and delete operations on a table to a set of stored procedures Use the asynchronous API to get and save the data Implement multi-user concurrency handling Working with views Views in an RDBMS fulfill an important role. They allow developers to combine data from multiple tables into a structure that looks like a table, but do not provide persistence. Thus, we have an abstraction on top of raw table data. One can use this approach to provide different security rights, for example. We can also simplify queries we have to write, especially if we access the data defined by views quite frequently in multiple places in our code. Entity Framework Code-First does not fully support views as of now. As a result, we have to use a workaround. One approach would be to write code as if a view was really a table, that is, let Entity Framework define this table, then drop the table, and create a replacement view. We will still end up with strongly typed data with full query support. Let's start with the same database structure we used before, including person and person type. Our view will combine a few columns from the Person table and Person type name, as shown in the following code snippet: public class PersonViewInfo { public int PersonId { get; set; } public string TypeName { get; set; } public string FirstName { get; set; } public string LastName { get; set; } } Here is the same class in VB.NET: Public Class PersonViewInfo Public Property PersonId() As Integer Public Property TypeName() As String Public Property FirstName() As String Public Property LastName() As String End Class Now, we need to create a configuration class for two reasons. We need to specify a primary key column because we do not follow the naming convention that Entity Framework assumes for primary keys. Then, we need to specify the table name, which will be our view name, as shown in the following code: public class PersonViewInfoMap : EntityTypeConfiguration<PersonViewInfo> { public PersonViewInfoMap() {    HasKey(p => p.PersonId);    ToTable("PersonView"); } } Here is the same class in VB.NET: Public Class PersonViewInfoMap Inherits EntityTypeConfiguration(Of PersonViewInfo) Public Sub New()    HasKey(Function(p) p.PersonId)    ToTable("PersonView") End Sub End Class Finally, we need to add a property to our context that exposes this data, as shown here: public DbSet<PersonViewInfo> PersonView { get; set; } The same property in VB.NET looks quite familiar to us, as shown in the following code: Property PersonView() As DbSet(Of PersonViewInfo) Now, we need to work with our initializer to drop the table and create a view in its place. 
We are using one of the initializers we created before. When we cover migrations, we will see that the same approach works there as well, with virtually identical code. Here is the code we added to the Seed method of our initializer, as shown in the following code: public class Initializer : DropCreateDatabaseIfModelChanges<Context> { protected override void Seed(Context context) {    context.Database.ExecuteSqlCommand("DROP TABLE PersonView");    context.Database.ExecuteSqlCommand(      @"CREATE VIEW [dbo].[PersonView]      AS      SELECT        dbo.People.PersonId,        dbo.People.FirstName,        dbo.People.LastName,        dbo.PersonTypes.TypeName      FROM        dbo.People      INNER JOIN dbo.PersonTypes        ON dbo.People.PersonTypeId = dbo.PersonTypes.PersonTypeId      "); } } In the preceding code, we first drop the table using the ExecuteSqlCommand method of the Database object. This method is useful because it allows the developer to execute arbitrary SQL code against the backend. We call this method twice, the first time to drop the tables and the second time to create our view. The same initializer code in VB.NET looks as follows: Public Class Initializer Inherits DropCreateDatabaseIfModelChanges(Of Context) Protected Overrides Sub Seed(ByVal context As Context)    context.Database.ExecuteSqlCommand("DROP TABLE PersonView")    context.Database.ExecuteSqlCommand( <![CDATA[      CREATE VIEW [dbo].[PersonView]      AS      SELECT        dbo.People.PersonId,        dbo.People.FirstName,        dbo.People.LastName,        dbo.PersonTypes.TypeName      FROM        dbo.People      INNER JOIN dbo.PersonTypes        ON dbo.People.PersonTypeId = dbo.PersonTypes.PersonTypeId]]>.Value()) End Sub End Class Since VB.NET does not support multiline strings such as C#, we are using XML literals instead, getting a value of a single node. This just makes SQL code more readable. We are now ready to query our data. This is shown in the following code snippet: using (var context = new Context()) { var people = context.PersonView    .Where(p => p.PersonId > 0)    .OrderBy(p => p.LastName)    .ToList(); foreach (var personViewInfo in people) {    Console.WriteLine(personViewInfo.LastName); } As we can see, there is literally no difference in accessing our view or any other table. Here is the same code in VB.NET: Using context = New Context() Dim people = context.PersonView _      .Where(Function(p) p.PersonId > 0) _      .OrderBy(Function(p) p.LastName) _      .ToList() For Each personViewInfo In people    Console.WriteLine(personViewInfo.LastName) Next End Using Although the view looks like a table, if we try to change and update an entity defined by this view, we will get an exception. If we do not want to play around with tables in such a way, we can still use the initializer to define our view, but query the data using a different method of the Database object, SqlQuery. This method has the same parameters as ExecuteSqlCommand, but is expected to return a result set, in our case, a collection of PersonViewInfo objects, as shown in the following code: using (var context = new Context()) { var sql = @"SELECT * FROM PERSONVIEW WHERE PERSONID > {0} "; var peopleViaCommand = context.Database.SqlQuery<PersonViewInfo>(    sql,    0); foreach (var personViewInfo in peopleViaCommand) {    Console.WriteLine(personViewInfo.LastName); } } The SqlQuery method takes generic type parameters, which define what data will be materialized when a raw SQL command is executed. 
The text of the command itself is simply parameterized SQL. We need to use parameters to ensure that our dynamic code is not subject to SQL injection. SQL injection is a process in which a malicious user can execute arbitrary SQL code by providing specific input values. Entity Framework is not subject to such attacks on its own. Here is the same code in VB.NET: Using context = New Context() Dim sql = "SELECT * FROM PERSONVIEW WHERE PERSONID > {0} " Dim peopleViaCommand = context.Database.SqlQuery(Of PersonViewInfo)(sql, 0)    For Each personViewInfo In peopleViaCommand    Console.WriteLine(personViewInfo.LastName) Next End Using We not only saw how to use views in Entity Framework, but saw two extremely useful methods of the Database object, which allows us to execute arbitrary SQL statements and optionally materialize the results of such queries. The generic type parameter does not have to be a class. You can use the native .NET type, such as a string or an integer. It is not always necessary to use views. Entity Framework allows us to easily combine multiple tables in a single query. Working with stored procedures The process of working with stored procedures in Entity Framework is similar to the process of working with views. We will use the same two methods we just saw on the Database object—SqlQuery and ExecuteSqlCommand. In order to read a number of rows from a stored procedure, we simply need a class that we will use to materialize all the rows of retrieved data into a collection of instances of this class. For example, to read the data from the stored procedure, consider this query: CREATE PROCEDURE [dbo].[SelectCompanies] @dateAdded as DateTime AS BEGIN SELECT CompanyId, CompanyName FROM Companies WHERE DateAdded > @dateAdded END We just need a class that matches the results of our stored procedure, as shown in the following code: public class CompanyInfo { public int CompanyId { get; set; } public string CompanyName { get; set; } } The same class looks as follows in VB.NET: Public Class CompanyInfo Property CompanyId() As Integer Property CompanyName() As String End Class We are now able to read the data using the SqlQuery method, as shown in the following code: sql = @"SelectCompanies {0}"; var companies = context.Database.SqlQuery<CompanyInfo>( sql, DateTime.Today.AddYears(-10)); foreach (var companyInfo in companies) { We specified which class we used to read the results of the query call. We also provided a formatted placeholder when we created our SQL statement for a parameter that the stored procedure takes. We provided a value for that parameter when we called SqlQuery. If one has to provide multiple parameters, one just needs to provide an array of values to SqlQuery and provide formatted placeholders, separated by commas as part of our SQL statement. We could have used a table values function instead of a stored procedure as well. Here is how the code looks in VB.NET: sql = "SelectCompanies {0}" Dim companies = context.Database.SqlQuery(Of CompanyInfo)( sql, DateTime.Today.AddYears(-10)) For Each companyInfo As CompanyInfo In companies Another use case is when our stored procedure does not return any values, but instead simply issues a command against one or more tables in the database. It does not matter as much what a procedure does, just that it does not need to return a value. 
For example, here is a stored procedure that updates multiple rows in a table in our database: CREATE PROCEDURE dbo.UpdateCompanies @dateAdded as DateTime, @activeFlag as Bit AS BEGIN UPDATE Companies Set DateAdded = @dateAdded, IsActive = @activeFlag END In order to call this procedure, we will use ExecuteSqlCommand. This method returns a single value—the number of rows affected by the stored procedure or any other SQL statement. You do not need to capture this value if you are not interested in it, as shown in this code snippet: var sql = @"UpdateCompanies {0}, {1}"; var rowsAffected = context.Database.ExecuteSqlCommand(    sql, DateTime.Now, true); We see that we needed to provide two parameters. We needed to provide them in the exact same order the stored procedure expected them. They are passed into ExecuteSqlCommand as the parameter array, except we did not need to create an array explicitly. Here is how the code looks in VB.NET: Dim sql = "UpdateCompanies {0}, {1}" Dim rowsAffected = context.Database.ExecuteSqlCommand( _    sql, DateTime.Now, True) Entity Framework eliminates the need for stored procedures to a large extent. However, there may still be reasons to use them. Some of the reasons include security standards, legacy database, or efficiency. For example, you may need to update thousands of rows in a single operation and retrieve them through Entity Framework; updating each row at a time and then saving those instances is not efficient. You could also update data inside any stored procedure, even if you call it with the SqlQuery method. Developers can also execute any arbitrary SQL statements, following the exact same technique as stored procedures. Just provide your SQL statement, instead of the stored procedure name to the SqlQuery or ExecuteSqlCommand method. Create, update, and delete entities with stored procedures So far, we have always used the built-in functionality that comes with Entity Framework that generates SQL statements to insert, update, or delete the entities. There are use cases when we would want to use stored procedures to achieve the same result. Developers may have requirements to use stored procedures for security reasons. You may be dealing with an existing database that has these procedures already built in. Entity Framework Code-First now has full support for such updates. We can configure the support for stored procedures using the familiar EntityTypeConfiguration class. We can do so simply by calling the MapToStoredProcedures method. Entity Framework will create stored procedures for us automatically if we let it manage database structures. We can override a stored procedure name or parameter names, if we want to, using appropriate overloads of the MapToStoredProcedures method. Let's use the Company table in our example: public class CompanyMap : EntityTypeConfiguration<Company> { public CompanyMap() {    MapToStoredProcedures(); } } If we just run the code to create or update the database, we will see new procedures created for us, named Company_Insert for an insert operation and similar names for other operations. 
Here is how the same class looks in VB.NET: Public Class CompanyMap Inherits EntityTypeConfiguration(Of Company) Public Sub New()    MapToStoredProcedures() End Sub End Class Here is how we can customize our procedure names if necessary: public class CompanyMap : EntityTypeConfiguration<Company> { public CompanyMap() {    MapToStoredProcedures(config =>      {        config.Delete(          procConfig =>          {            procConfig.HasName("CompanyDelete");            procConfig.Parameter(company => company.CompanyId, "companyId");          });        config.Insert(procConfig => procConfig.HasName("CompanyInsert"));        config.Update(procConfig => procConfig.HasName("CompanyUpdate"));      }); } } In this code, we performed the following: Changed the stored procedure name that deletes a company to CompanyDelete Changed the parameter name that this procedure accepts to companyId and specified that the value comes from the CompanyId property Changed the stored procedure name that performs insert operations on CompanyInsert Changed the stored procedure name that performs updates to CompanyUpdate Here is how the code looks in VB.NET: Public Class CompanyMap Inherits EntityTypeConfiguration(Of Company) Public Sub New()    MapToStoredProcedures( _      Sub(config)        config.Delete(          Sub(procConfig)            procConfig.HasName("CompanyDelete")            procConfig.Parameter(Function(company) company.CompanyId, "companyId")          End Sub        )        config.Insert(Function(procConfig) procConfig.HasName("CompanyInsert"))        config.Update(Function(procConfig) procConfig.HasName("CompanyUpdate"))      End Sub    ) End Sub End Class Of course, if you do not need to customize the names, your code will be much simpler. Summary Entity Framework provides a lot of value to the developers, allowing them to use C# or VB.NET code to manipulate database data. However, sometimes we have to drop a level lower, accessing data a bit more directly through views, dynamic SQL statements and/or stored procedures. We can use the ExecuteSqlCommand method to execute any arbitrary SQL code, including raw SQL or stored procedure. We can use the SqlQuery method to retrieve data from a view, stored procedure, or any other SQL statement, and Entity Framework takes care of materializing the data for us, based on the result type we provide. It is important to follow best practices when providing parameters to those two methods to avoid SQL injection vulnerability. Entity Framework also supports environments where there are requirements to perform all updates to entities via stored procedures. The framework will even write them for us, and we would only need to write one line of code per entity for this type of support, assuming we are happy with naming conventions and coding standards for such procedures. Resources for Article: Further resources on this subject: Developing with Entity Metadata Wrappers [article] Entity Framework DB First – Inheritance Relationships between Entities [article] The .NET Framework Primer [article]

Signing an application in Android using Maven

Packt
18 Mar 2015
10 min read
In this article written by Patroklos Papapetrou and Jonathan LALOU, authors of the book Android Application Development with Maven, we'll learn different modes of digital signing and also using Maven to digitally sign the applications. The topics that we will explore in this article are: (For more resources related to this topic, see here.) Signing an application Android requires that all packages, in order to be valid for installation in devices, need to be digitally signed with a certificate. This certificate is used by the Android ecosystem to validate the author of the application. Thankfully, the certificate is not required to be issued by a certificate authority. It would be a total nightmare for every Android developer and it would increase the cost of developing applications. However, if you want to sign the certificate by a trusted authority like the majority of the certificates used in web servers, you are free to do it. Android supports two modes of signing: debug and release. Debug mode is used by default during the development of the application, and the release mode when we are ready to release and publish it. In debug mode, when building and packaging an application the Android SDK automatically generates a certificate and signs the package. So don't worry; even though we haven't told Maven to do anything about signing, Android knows what to do and behind the scenes signs the package with the autogenerated key. When it comes to distributing an application, debug mode is not enough; so, we need to prepare our own self-signed certificate and instruct Maven to use it instead of the default one. Before we dive to Maven configuration, let us quickly remind you how to issue your own certificate. Open a command window, and type the following command: keytool -genkey -v -keystore my-android-release-key.keystore -alias my-android-key -keyalg RSA -keysize 2048 -validity 10000 If the keytool command line utility is not in your path, then it's a good idea to add it. It's located under the %JAVA_HOME%/bin directory. Alternatively, you can execute the command inside this directory. Let us explain some parameters of this command. We use the keytool command line utility to create a new keystore file under the name my-android-release-key.keystore inside the current directory. The -alias parameter is used to define an alias name for this key, and it will be used later on in Maven configuration. We also specify the algorithm, RSA, and the key size, 2048; finally we set the validity period in days. The generated key will be valid for 10,000 days—long enough for many many new versions of our application! After running the command, you will be prompted to answer a series of questions. First, type twice a password for the keystore file. It's a good idea to note it down because we will use it again in our Maven configuration. Type the word :secret in both prompts. Then, we need to provide some identification data, like name, surname, organization details, and location. Finally, we need to set a password for the key. If we want to keep the same password with the keystore file, we can just hit RETURN. If everything goes well, we will see the final message that informs us that the key is being stored in the keystore file with the name we just defined. After this, our key is ready to be used to sign our Android application. 
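If you want to double-check the keystore before wiring it into Maven, the same keytool utility can list its contents. This is an optional sanity check; the keystore and alias names below are simply the ones we chose earlier:

keytool -list -v -keystore my-android-release-key.keystore -alias my-android-key

After you enter the keystore password, keytool prints the certificate owner, validity period, and fingerprints, confirming that the key was stored correctly.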
The key used in debug mode can be found in this file: ~/.android.debug.keystore and contains the following information: Keystore name: "debug.keystore" Keystore password: "android" Key alias: "androiddebugkey" Key password: "android" CN: "CN=Android Debug,O=Android,C=US" Now, it's time to let Maven use the key we just generated. Before we add the necessary configuration to our pom.xml file, we need to add a Maven profile to the global Maven settings. The profiles defined in the user settings.xml file can be used by all Maven projects in the same machine. This file is usually located under this folder: %M2_HOME%/conf/settings.xml. One fundamental advantage of defining global profiles in user's Maven settings is that this configuration is not shared in the pom.xml file to all developers that work on the application. The settings.xml file should never be kept under the Source Control Management (SCM) tool. Users can safely enter personal or critical information like passwords and keys, which is exactly the case of our example. Now, edit the settings.xml file and add the following lines inside the <profiles> attribute: <profile> <id>release</id> <properties>    <sign.keystore>/path/to/my/keystore/my-android-release-    key.keystore</sign.keystore>    <sign.alias>my-android-key</sign.alias>    <sign.storepass>secret</sign.storepass>    <sign.keypass>secret</sign.keypass> </properties> </profile> Keep in mind that the keystore name, the alias name, the keystore password, and the key password should be the ones we used when we created the keystore file. Clearly, storing passwords in plain text, even in a file that is normally protected from other users, is not a very good practice. A quick way to make it slightly less easy to read the password is to use XML entities to write the value. Some sites on the internet like this one http://coderstoolbox.net/string/#!encoding=xml&action=encode&charset=none provide such encryptions. It will be resolved as plain text when the file is loaded; so Maven won't even notice it. In this case, this would become: <sign.storepass>&#115;&#101;&#99;&#114;&#101;&#116;</sign.storepass> We have prepared our global profile and the corresponding properties, and so we can now edit the pom.xml file of the parent project and do the proper configuration. Adding common configuration in the parent file for all Maven submodules is a good practice in our case because at some point, we would like to release both free and paid versions, and it's preferable to avoid duplicating the same configuration in two files. We want to create a new profile and add all the necessary settings there, because the release process is not something that runs every day during the development phase. It should run only at a final stage, when we are ready to deploy our application. Our first priority is to tell Maven to disable debug mode. Then, we need to specify a new Maven plugin name: maven-jarsigner-plugin, which is responsible for driving the verification and signing process for custom/private certificates. 
You can find the complete release profile as follows: <profiles> <profile>    <id>release</id>    <build>      <plugins>        <plugin>          <groupId>com.jayway.maven.plugins.android.generation2          </groupId>          <artifactId>android-maven-plugin</artifactId>          <extensions>true</extensions>          <configuration>            <sdk>              <platform>19</platform>            </sdk>            <sign>              <debug>false</debug>            </sign>          </configuration>        </plugin>        <plugin>          <groupId>org.apache.maven.plugins</groupId>          <artifactId>maven-jarsigner-plugin</artifactId>          <executions>            <execution>              <id>signing</id>              <phase>package</phase>              <goals>                <goal>sign</goal>                <goal>verify</goal>              </goals>              <inherited>true</inherited>              <configuration>                <removeExistingSignatures>true                </removeExistingSignatures>                <archiveDirectory />                <includes>                  <include>${project.build.directory}/                  ${project.artifactId}.apk</include>                </includes>                <keystore>${sign.keystore}</keystore>                <alias>${sign.alias}</alias>                <storepass>${sign.storepass}</storepass>                <keypass>${sign.keypass}</keypass>                <verbose>true</verbose>              </configuration>            </execution>          </executions>        </plugin>      </plugins>    </build> </profile> </profiles> We instruct the JAR signer plugin to be triggered during the package phase and run the goals of verification and signing. Furthermore, we tell the plugin to remove any existing signatures from the package and use the variable values we have defined in our global profile, $sign.alias, $sign.keystore, $sign.storepass and $sign.keypass. The "verbose" setting is used here to verify that the private key is used instead of the debug key. Before we run our new profile, for comparison purposes, let's package our application without using the signing capability. Open a terminal window, and type the following Maven command: mvn clean package When the command finishes, navigate to the paid version target directory, /PaidVersion/target, and take a look at its contents. You will notice that there are two packaging files: a PaidVersion.jar (size 14KB) and PaidVersion.apk (size 46KB). Since we haven't discussed yet about releasing an application, we can just run the following command in a terminal window and see how the private key is used for signing the package: mvn clean package -Prelease You must have probably noticed that we use only one profile name, and that is the beauty of Maven. Profiles with the same ID are merged together, and so it's easier to understand and maintain the build scripts. If you want to double-check that the package is signed with your private certificate, you can monitor the Maven output, and at some point you will see something similar to the following image: This output verifies that the classes have been properly signed through the execution of the Maven JAR signer plugin. To better understand how signing and optimization affects the packages generation, we can navigate again to the /PaidVersion/target directory and take a look at the files created. You will be surprised to see that the same packages exist again but they have different sizes. 
The PaidVersion.jar file has a size of 18KB, which is greater than the file generated without signing. However, the PaidVersion.apk is smaller (size 44KB) than the first version. These differences happen because the .jar file is signed with the new certificate, so its size gets slightly bigger. But what about the .apk file? Shouldn't it also be bigger, since every file in it is signed with the certificate? The answer can easily be found if we open both .apk files and compare them. They are compressed files, so any well-known tool that opens compressed files can do this. If you take a closer look at the contents of the .apk files, you will notice that the contents of the .apk file that was generated using the private certificate are slightly larger, except for the resources.arsc file. This file, in the case of custom signing, is compressed, whereas in debug signing mode it is in raw format. This explains why the signed version of the .apk file is smaller than the original one. There's also one last thing that verifies the correct completion of signing. Keep the compressed .apk files open and navigate to the META-INF directory. This directory contains a couple of different files. The package signed with our personal certificate contains key files named after the alias we used when we created the certificate, while the package signed in debug mode contains the default certificate used by Android. Summary We know that Android developers struggle when it comes to properly packaging and releasing an application to the public. We have analyzed in detail the Maven configuration steps necessary for correct and complete packaging. After reading this article, you should have a basic knowledge of digitally signing Android packages with and without the help of Maven. Resources for Article: Further resources on this subject: Best Practices [article] Installing Apache Karaf [article] Apache Maven and m2eclipse [article]

Azure Storage

Packt
17 Mar 2015
7 min read
In this article by John Chapman and Aman Dhally, authors of the book, Automating Microsoft Azure with PowerShell, you will see that Microsoft Azure offers a variety of different services to store and retrieve data in the cloud. This includes File and Blob storage. Within Azure, each of these types of data is contained within an Azure storage account. While Azure SQL databases are also storage mechanisms, they are not part of an Azure storage account. (For more resources related to this topic, see here.) Azure File storage versus Azure Blob storage In a Microsoft Azure storage account, both the Azure File storage service and the Azure Blob storage service can be used to store files. Deciding which service to use depends on the purpose of the content and who will use the content. To break down the differences and similarities between these two services, we will cover the features, structure, and common uses for each service. Azure File storage Azure File storage provides shared storage using the Server Message Block (SMB) protocol. This allows clients, such as Windows Explorer, to connect and browse the File storage (such as a typical network file share). In a Windows file share, clients can add directory structures and files to the share. Similar to file shares, Azure File storage is typically used within an organization and not with users outside the organization. Azure File shares can only be mounted in Windows Explorer as a drive within virtual machines running in Azure. They cannot be mounted from computers outside of Azure. A few common uses of Azure File storage include: Sharing files between on-premise computers and Azure virtual machines Storing application configuration and diagnostic files in shared location Sharing documents and other files with users in the same organization but in different geographical locations Azure Blob storage A blob refers to a binary large object, which might not be an actual file. The Azure Blob storage service is used to store large amounts of unstructured data. This data can be accessed via HTTP or HTTPS, making it particularly useful to share large amounts of data publicly. Within an Azure storage account, blobs are stored within containers. Each container can be public or private, but it does not offer any directory structure as the File storage service does. A few common uses of Azure Blob storage include: Serving images, style sheets (CSS), and static web files for a website, much like a content delivery network Streaming media Backups and disaster recovery Sharing files to external users Getting the Azure storage account keys Managing services provided by Microsoft Azure storage accounts require two pieces of information: the storage account name and an access key. While we can obtain this information from the Microsoft Azure web portal, we will do so with PowerShell. Azure storage accounts have a primary and a secondary access key. If one of the access key is compromised, it can be regenerated without affecting the other. To obtain the Azure storage account keys, we will use the following steps: Open Microsoft Azure PowerShell from the Start menu and connect it to an Azure subscription. 
Use the Get-AzureStorageKey cmdlet with the name of the storage account to retrieve the storage account key information and assign it to a variable: PS C:> $accountKey = Get-AzureStorageKey -StorageAccountName psautomation Use the Format-List cmdlet (PS C:> $accountKey | Format-List –Property Primary,Secondary) to display the Primary and Secondary access key properties. Note that we are using the PowerShell pipeline to use the Format-List cmdlet on the $accountKey variable: Assign one of the keys (Primary or Secondary) to a variable for us to use: PS C:> $key = $accountKey.Primary Using Azure File storage As mentioned in the Azure File storage versus Azure Blob storage section, Azure File services act much like typical network files shares. To demonstrate Azure File services, we will first create a file share. After this, we will create a directory, upload a file, and list the files in a directory. To complete Azure File storage tasks, we will use the following steps: In the PowerShell session from the Getting the Azure storage account keys section in which we obtained an access key, use the New-AzureStorageContext cmdlet to connect to the Azure storage account and assign it to a variable. Note that the first parameter is the name of the storage account, whereas the second parameter is the access key: PS C:> $context = New-AzureStorageContext psautomation $key Create a new file share using the New-AzureStorageShare cmdlet and assign it to a variable: PS C:> $share = New-AzureStorageShare psautomationshare –Context $context Create a new directory in the file share using the New-AzureStorageDirectory cmdlet: PS C:> New-AzureStorageDirectory –Share $share –Path TextFiles Before uploading a file to the newly created directory, we need to ensure that we have a file to upload. To create a sample file, we can use the Set-Content cmdlet to create a new text file: PS C:> Set-Content C:FilesMyFile.txt –Value "Hello" Upload a file to the newly created directory using the Set-AzureStorageFileContent cmdlet: PS C:> Set-AzureStorageFileContent –Share $share –Source C:FilesMyFile.txt –Path TextFiles Use the Get-AzureStorageFile cmdlet (PS C:> Get-AzureStorageFile –Share $share –Path TextFiles) to list the files in the directory (similar to executing the dir or ls commands), as shown in the following screenshot: Using Azure Blog storage As mentioned in the Azure File storage versus Azure Blob storage section, Azure Blob storage can be used to store any unstructured data, including file content. Blobs are stored within containers, whereas permissions are set at the container level. The permission levels that can be assigned to a container are shown in the following table: Permission level Access provided Container This provides anonymous read access to the container and all blobs in the container. In addition, it allows anonymous users to list the blobs in the container. Blob This provides anonymous read access to blobs within the container. Anonymous users cannot list all of the blobs in the container. Off This does not provide anonymous access. It is only accessible with the Azure storage account keys. To illustrate Azure Blob storage, we will use the following steps to create a public container, upload a file, and access the file from a web browser: In the PowerShell session from the Getting Azure storage account keys section in which we obtained an access key, use the New-AzureStorageContext cmdlet to connect to the Azure storage account and assign it to a variable. 
Note that the first parameter is the name of the storage account, whereas the second parameter is the access key: PS C:> $context = New-AzureStorageContext psautomation $key Use the New-AzureStorageContainer cmdlet to create a new public container. Note that the name must contain only numbers and lowercase letters. No special characters, spaces, or uppercase letters are permitted: PS C:> New-AzureStorageContainer –Name textfiles –Context $context –Permission Container Before uploading a file to the newly created directory, we need to ensure that we have a file to upload. To create a sample file, we can use the Set-Content cmdlet to create a new text file: PS C:> Set-Content C:FilesMyFile.txt –Value "Hello" Upload a file using the Set-AzureStorageBlobContent cmdlet: PS C:> Set-AzureStorageBlobContent –File C:FilesMyFile.txt –Blob "MyFile.txt" –Container textfiles –Context $context Navigate to the newly uploaded blob in Internet Explorer. The URL for the blob is formatted as https://<StorageAccountName>.blob.core.windows.net/<ContainerName>/<BlobName>. In our example, the URL is https://psautomation.blob.core.windows.net/textfiles/MyFile.txt, as shown in the following screenshot: Summary In this article, you learned about Microsoft Azure storage accounts and how to interact with the storage account services with PowerShell. This included the Azure File storage and Azure Blob storage. Resources for Article: Further resources on this subject: Using Azure BizTalk Features [Article] Windows Azure Mobile Services - Implementing Push Notifications using [Article] How to use PowerShell Web Access to manage Windows Server [Article]

Code Sharing Between iOS and Android

Packt
17 Mar 2015
24 min read
In this article by Jonathan Peppers, author of the book Xamarin Cross-platform Application Development, we will see how Xamarin's tools promise to share a good portion of your code between iOS and Android while taking advantage of the native APIs on each platform where possible. Doing so is more an exercise in software engineering than in programming skill or knowledge of each platform. To architect a Xamarin application to enable code sharing, it is essential to separate your application into distinct layers. We'll cover the basics of this in this article as well as specific options to consider in certain situations. In this article, we will cover: The MVVM design pattern for code sharing Project and solution organization strategies Portable Class Libraries (PCLs) Preprocessor statements for platform-specific code Dependency injection (DI) simplified Inversion of Control (IoC) (For more resources related to this topic, see here.) Learning the MVVM design pattern The Model-View-ViewModel (MVVM) design pattern was originally invented for Windows Presentation Foundation (WPF) applications using XAML for separating the UI from business logic and taking full advantage of data binding. Applications architected in this way have a distinct ViewModel layer that has no dependencies on its user interface. This architecture in itself is optimized for unit testing as well as cross-platform development. Since an application's ViewModel classes have no dependencies on the UI layer, you can easily swap an iOS user interface for an Android one and write tests against the ViewModel layer. The MVVM design pattern is also very similar to the MVC design pattern. The MVVM design pattern includes the following: Model: The Model layer is the backend business logic that drives the application and any business objects to go along with it. This can be anything from making web requests to a server to using a backend database. View: This layer is the actual user interface seen on the screen. In the case of cross-platform development, it includes any platform-specific code for driving the user interface of the application. On iOS, this includes controllers used throughout an application, and on Android, an application's activities. ViewModel: This layer acts as the glue in MVVM applications. The ViewModel layer coordinates operations between the View and Model layers. A ViewModel layer will contain properties that the View will get or set, and functions for each operation that can be made by the user on each View. The ViewModel layer will also invoke operations on the Model layer if needed. The following figure shows you the MVVM design pattern: It is important to note that the interaction between the View and ViewModel layers is traditionally achieved by data binding in WPF. However, iOS and Android do not have built-in data binding mechanisms, so our general approach throughout the article will be to manually call the ViewModel layer from the View layer. There are a few frameworks out there that provide data binding functionality, such as MVVMCross and Xamarin.Forms. Implementing MVVM in an example To understand this pattern better, let's implement a common scenario. Let's say we have a search box on the screen and a search button. When the user enters some text and clicks on the button, a list of products and prices will be displayed to the user. In our example, we use the async and await keywords that are available in C# 5 to simplify asynchronous programming.
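For readers who have not used async and await before, here is a minimal, self-contained C# sketch of the pattern that the repository and ViewModel classes below rely on; the class name, method names, and delay are purely illustrative and not part of the book's sample:

using System;
using System.Threading.Tasks;

public class AsyncSketch
{
    // An async method returns a Task and can await other asynchronous work
    public async Task<string> LoadGreetingAsync()
    {
        await Task.Delay(1000); // simulates a slow call without blocking the caller
        return "Hello";
    }

    // Callers await the returned task; execution resumes here when it completes
    public async Task RunAsync()
    {
        string greeting = await LoadGreetingAsync();
        Console.WriteLine(greeting);
    }
}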
To implement this feature, we will start with a simple model class (also called a business object) as follows: public class Product{   public int Id { get; set; } //Just a numeric identifier   public string Name { get; set; } //Name of the product   public float Price { get; set; } //Price of the product} Next, we will implement our Model layer to retrieve products based on the searched term. This is where the business logic is performed, expressing how the search needs to actually work. This is seen in the following lines of code: // An example class, in the real world would talk to a web// server or database.public class ProductRepository{// a sample list of products to simulate a databaseprivate Product[] products = new[]{   new Product { Id = 1, Name = “Shoes”, Price = 19.99f },   new Product { Id = 2, Name = “Shirt”, Price = 15.99f },   new Product { Id = 3, Name = “Hat”, Price = 9.99f },};public async Task<Product[]> SearchProducts(   string searchTerm){   // Wait 2 seconds to simulate web request   await Task.Delay(2000);    // Use Linq-to-objects to search, ignoring case   searchTerm = searchTerm.ToLower();   return products.Where(p =>      p.Name.ToLower().Contains(searchTerm))   .ToArray();}} It is important to note here that the Product and ProductRepository classes are both considered as a part of the Model layer of a cross-platform application. Some might consider ProductRepository as a service that is generally a self-contained class to retrieve data. It is a good idea to separate this functionality into two classes. The Product class's job is to hold information about a product, while the ProductRepository class is in charge of retrieving products. This is the basis for the single responsibility principle, which states that each class should only have one job or concern. Next, we will implement a ViewModel class as follows: public class ProductViewModel{private readonly ProductRepository repository =    new ProductRepository(); public string SearchTerm{   get;   set;}public Product[] Products{   get;   private set;}public async Task Search(){   if (string.IsNullOrEmpty(SearchTerm))     Products = null;   else     Products = await repository.SearchProducts(SearchTerm);}} From here, your platform-specific code starts. Each platform will handle managing an instance of a ViewModel class, setting the SearchTerm property, and calling Search when the button is clicked. When the task completes, the user interface layer will update a list displayed on the screen. If you are familiar with the MVVM design pattern used with WPF, you might notice that we are not implementing INotifyPropertyChanged for data binding. Since iOS and Android don't have the concept of data binding, we omitted this functionality. If you plan on having a WPF or Windows 8 version of your mobile application or are using a framework that provides data binding, you should implement support for it where needed. Comparing project organization strategies You might be asking yourself at this point, how do I set up my solution in Xamarin Studio to handle shared code and also have platform-specific projects? Xamarin.iOS applications can only reference Xamarin.iOS class libraries, so setting up a solution can be problematic. There are several strategies for setting up a cross-platform solution, each with its own advantages and disadvantages. 
Options for cross-platform solutions are as follows: File Linking: For this option, you will start with either a plain .NET 4.0 or .NET 4.5 class library that contains all the shared code. You would then have a new project for each platform you want your app to run on. Each platform-specific project will have a subdirectory with all of the files linked in from the first class library. To set this up, add the existing files to the project and select the Add a link to the file option. Any unit tests can run against the original class library. The advantages and disadvantages of file linking are as follows: Advantages: This approach is very flexible. You can choose to link or not link certain files and can also use preprocessor directives such as #if IPHONE. You can also reference different libraries on Android versus iOS. Disadvantages: You have to manage a file's existence in three projects: core library, iOS, and Android. This can be a hassle if it is a large application or if many people are working on it. This option is also a bit outdated since the arrival of shared projects. Cloned Project Files: This is very similar to file linking. The main difference being that you have a class library for each platform in addition to the main project. By placing the iOS and Android projects in the same directory as the main project, the files can be added without linking. You can easily add files by right-clicking on the solution and navigating to Display Options | Show All Files. Unit tests can run against the original class library or the platform-specific versions: Advantages: This approach is just as flexible as file linking, but you don't have to manually link any files. You can still use preprocessor directives and reference different libraries on each platform. Disadvantages: You still have to manage a file's existence in three projects. There is additionally some manual file arranging required to set this up. You also end up with an extra project to manage on each platform. This option is also a bit outdated since the arrival of shared projects. Shared Projects: Starting with Visual Studio 2013 Update 2, Microsoft created the concept of shared projects to enable code sharing between Windows 8 and Windows Phone apps. Xamarin has also implemented shared projects in Xamarin Studio as another option to enable code sharing. Shared projects are virtually the same as file linking, since adding a reference to a shared project effectively adds its files to your project: Advantages: This approach is the same as file linking, but a lot cleaner since your shared code is in a single project. Xamarin Studio also provides a dropdown to toggle between each referencing project, so that you can see the effect of preprocessor statements in your code. Disadvantages: Since all the files in a shared project get added to each platform's main project, it can get ugly to include platform-specific code in a shared project. Preprocessor statements can quickly get out of hand if you have a large team or have team members that do not have a lot of experience. A shared project also doesn't compile to a DLL, so there is no way to share this kind of project without the source code. Portable Class Libraries: This is the most optimal option; you begin the solution by making a Portable Class Library (PCL) project for all your shared code. This is a special project type that allows multiple platforms to reference the same project, allowing you to use the smallest subset of C# and the .NET framework available in each platform. 
Each platform-specific project will reference this library directly as well as any unit test projects: Advantages: All your shared code is in one project, and all platforms use the same library. Since preprocessor statements aren't possible, PCL libraries generally have cleaner code. Platform-specific code is generally abstracted away by interfaces or abstract classes. Disadvantages: You are limited to a subset of .NET depending on how many platforms you are targeting. Platform-specific code requires use of dependency injection, which can be a more advanced topic for developers not familiar with it. Setting up a cross-platform solution To understand each option completely and what different situations call for, let's define a solution structure for each cross-platform solution. Let's use the product search example and set up a solution for each approach. To set up file linking, perform the following steps: Open Xamarin Studio and start a new solution. Select a new Library project under the general C# section. Name the project ProductSearch.Core, and name the solution ProductSearch. Right-click on the newly created project and select Options. Navigate to Build | General, and set the Target Framework option to .NET Framework 4.5. Add the Product, ProductRepository, and ProductViewModel classes to the project. You will need to add using System.Threading.Tasks; and using System.Linq; where needed. Navigate to Build | Build All from the menu at the top to be sure that everything builds properly. Now, let's create a new iOS project by right-clicking on the solution and navigating to Add | Add New Project. Then, navigate to iOS | iPhone | Single View Application and name the project ProductSearch.iOS. Create a new Android project by right-clicking on the solution and navigating to Add | Add New Project. Create a new project by navigating to Android | Android Application and name it ProductSearch.Droid. Add a new folder named Core to both the iOS and Android projects. Right-click on the new folder for the iOS project and navigate to Add | Add Files from Folder. Select the root directory for the ProductSearch.Core project. Check the three C# files in the root of the project. An Add File to Folder dialog will appear. Select Add a link to the file and make sure that the Use the same action for all selected files checkbox is selected. Repeat this process for the Android project. Navigate to Build | Build All from the menu at the top to double-check everything. You have successfully set up a cross-platform solution with file linking. When all is done, you will have a solution tree that looks something like what you can see in the following screenshot: You should consider using this technique when you have to reference different libraries on each platform. You might consider using this option if you are using MonoGame, or other frameworks that require you to reference a different library on iOS versus Android. Setting up a solution with the cloned project files approach is similar to file linking, except that you will have to create an additional class library for each platform. To do this, create an Android library project and an iOS library project in the same ProductSearch.Core directory. You will have to create the projects and move them to the proper folder manually, then re-add them to the solution. Right-click on the solution and navigate to Display Options | Show All Files to add the required C# files to these two projects. 
Your main iOS and Android projects can reference these projects directly. Your project will look like what is shown in the following screenshot, with ProductSearch.iOS referencing ProductSearch.Core.iOS and ProductSearch.Droid referencing ProductSearch.Core.Droid: Working with Portable Class Libraries A Portable Class Library (PCL) is a C# library project that can be supported on multiple platforms, including iOS, Android, Windows, Windows Store apps, Windows Phone, Silverlight, and Xbox 360. PCLs have been an effort by Microsoft to simplify development across different versions of the .NET framework. Xamarin has also added support for iOS and Android for PCLs. Many popular cross-platform frameworks and open source libraries are starting to develop PCL versions such as Json.NET and MVVMCross. Using PCLs in Xamarin Let's create our first portable class library: Open Xamarin Studio and start a new solution. Select a new Portable Library project under the general C# section. Name the project ProductSearch.Core and name the solution ProductSearch. Add the Product, ProductRepository, and ProductViewModel classes to the project. You will need to add using System.Threading.Tasks; and using System.Linq; where needed. Navigate to Build | Build All from the menu at the top to be sure that everything builds properly. Now, let's create a new iOS project by right-clicking on the solution and navigating to Add | Add New Project. Create a new project by navigating to iOS | iPhone | Single View Application and name it ProductSearch.iOS. Create a new Android project by right-clicking on the solution and navigating to Add | Add New Project. Then, navigate to Android | Android Application and name the project ProductSearch.Droid. Simply add a reference to the portable class library from the iOS and Android projects. Navigate to Build | Build All from the top menu and you have successfully set up a simple solution with a portable library. Each solution type has its distinct advantages and disadvantages. PCLs are generally better, but there are certain cases where they can't be used. For example, if you were using a library such as MonoGame, which is a different library for each platform, you would be much better off using a shared project or file linking. Similar issues would arise if you needed to use a preprocessor statement such as #if IPHONE or a native library such as the Facebook SDK on iOS or Android. Setting up a shared project is almost the same as setting up a portable class library. In step 2, just select Shared Project under the general C# section and complete the remaining steps as stated. Using preprocessor statements When using shared projects, file linking, or cloned project files, one of your most powerful tools is the use of preprocessor statements. If you are unfamiliar with them, C# has the ability to define preprocessor variables such as #define IPHONE , allowing you to use #if IPHONE or #if !IPHONE. The following is a simple example of using this technique: #if IPHONEConsole.WriteLine(“I am running on iOS”);#elif ANDROIDConsole.WriteLine(“I am running on Android”);#elseConsole.WriteLine(“I am running on ???”);#endif In Xamarin Studio, you can define preprocessor variables in your project's options by navigating to Build | Compiler | Define Symbols, delimited with semicolons. These will be applied to the entire project. Be warned that you must set up these variables for each configuration setting in your solution (Debug and Release); this can be an easy step to miss. 
You can also define these variables at the top of any C# file by declaring #define IPHONE, but they will only be applied within the C# file. Let's go over another example, assuming that we want to implement a class to open URLs on each platform: public static class Utility{public static void OpenUrl(string url){   //Open the url in the native browser}} The preceding example is a perfect candidate for using preprocessor statements, since it is very specific to each platform and is a fairly simple function. To implement the method on iOS and Android, we will need to take advantage of some native APIs. Refactor the class to look as follows: #if IPHONE//iOS using statementsusing MonoTouch.Foundation;using MonoTouch.UIKit;#elif ANDROID//Android using statementsusing Android.App;using Android.Content;using Android.Net;#else//Standard .Net using statementusing System.Diagnostics;#endif public static class Utility{#if ANDROID   public static void OpenUrl(Activity activity, string url)#else   public static void OpenUrl(string url)#endif{   //Open the url in the native browser   #if IPHONE     UIApplication.SharedApplication.OpenUrl(       NSUrl.FromString(url));   #elif ANDROID     var intent = new Intent(Intent.ActionView,       Uri.Parse(url));     activity.StartActivity(intent);   #else     Process.Start(url);   #endif}} The preceding class supports three different types of projects: Android, iOS, and a standard Mono or .NET framework class library. In the case of iOS, we can perform the functionality with static classes available in Apple's APIs. Android is a little more problematic and requires an Activity object to launch a browser natively. We get around this by modifying the input parameters on Android. Lastly, we have a plain .NET version that uses Process.Start() to launch a URL. It is important to note that using the third option would not work on iOS or Android natively, which necessitates our use of preprocessor statements. Using preprocessor statements is not normally the cleanest or the best solution for cross-platform development. They are generally best used in a tight spot or for very simple functions. Code can easily get out of hand and can become very difficult to read with many #if statements, so it is always better to use it in moderation. Using inheritance or interfaces is generally a better solution when a class is mostly platform specific. Simplifying dependency injection Dependency injection at first seems like a complex topic, but for the most part it is a simple concept. It is a design pattern aimed at making your code within your applications more flexible so that you can swap out certain functionality when needed. The idea builds around setting up dependencies between classes in an application so that each class only interacts with an interface or base/abstract class. This gives you the freedom to override different methods on each platform when you need to fill in native functionality. The concept originated from the SOLID object-oriented design principles, which is a set of rules you might want to research if you are interested in software architecture. There is a good article about SOLID on Wikipedia, (http://en.wikipedia.org/wiki/SOLID_%28object-oriented_design%29) if you would like to learn more. The D in SOLID, which we are interested in, stands for dependencies. Specifically, the principle declares that a program should depend on abstractions, not concretions (concrete types). 
To build upon this concept, let's walk you through the following example: Let's assume that we need to store a setting in an application that determines whether the sound is on or off. Now let's declare a simple interface for the setting: interface ISettings { bool IsSoundOn { get; set; } }. On iOS, we'd want to implement this interface using the NSUserDefaults class. Likewise, on Android, we will implement this using SharedPreferences. Finally, any class that needs to interact with this setting will only reference ISettings so that the implementation can be replaced on each platform. For reference, the full implementation of this example will look like the following snippet: public interface ISettings{bool IsSoundOn{   get;   set;}}//On iOSusing MonoTouch.UIKit;using MonoTouch.Foundation; public class AppleSettings : ISettings{public bool IsSoundOn{   get   {     return NSUserDefaults.StandardUserDefaults     BoolForKey(“IsSoundOn”);   }   set   {     var defaults = NSUserDefaults.StandardUserDefaults;     defaults.SetBool(value, “IsSoundOn”);     defaults.Synchronize();   }}}//On Androidusing Android.Content; public class DroidSettings : ISettings{private readonly ISharedPreferences preferences; public DroidSettings(Context context){   preferences = context.GetSharedPreferences(     context.PackageName, FileCreationMode.Private);}public bool IsSoundOn{   get   {     return preferences.GetBoolean(“IsSoundOn”, true”);   }   set   {     using (var editor = preferences.Edit())     {       editor.PutBoolean(“IsSoundOn”, value);       editor.Commit();     }   }}} Now you will potentially have a ViewModel class that will only reference ISettings when following the MVVM pattern. It can be seen in the following snippet: public class SettingsViewModel{  private readonly ISettings settings;  public SettingsViewModel(ISettings settings)  {    this.settings = settings;  }  public bool IsSoundOn  {    get;    set;  }  public void Save()  {    settings.IsSoundOn = IsSoundOn;  }} Using a ViewModel layer for such a simple example is not necessarily needed, but you can see it would be useful if you needed to perform other tasks such as input validation. A complete application might have a lot more settings and might need to present the user with a loading indicator. Abstracting out your setting's implementation has other benefits that add flexibility to your application. Let's say you suddenly need to replace NSUserDefaults on iOS with the iCloud instead; you can easily do so by implementing a new ISettings class and the remainder of your code will remain unchanged. This will also help you target new platforms such as Windows Phone, where you might choose to implement ISettings in a platform-specific way. Implementing Inversion of Control You might be asking yourself at this point in time, how do I switch out different classes such as the ISettings example? Inversion of Control (IoC) is a design pattern meant to complement dependency injection and solve this problem. The basic principle is that many of the objects created throughout your application are managed and created by a single class. Instead of using the standard C# constructors for your ViewModel or Model classes, a service locator or factory class will manage them throughout the application. 
There are many different implementations and styles of IoC, so let's implement a simple service locator class as follows: public static class ServiceContainer{  static readonly Dictionary<Type, Lazy<object>> services =    new Dictionary<Type, Lazy<object>>();  public static void Register<T>(Func<T> function)  {    services[typeof(T)] = new Lazy<object>(() => function());  }  public static T Resolve<T>()  {    return (T)Resolve(typeof(T));  }  public static object Resolve(Type type)  {    Lazy<object> service;    if (services.TryGetValue(type, out service)    {      return service.Value;    }    throw new Exception(“Service not found!”);  }} This class is inspired by the simplicity of XNA/MonoGame's GameServiceContainer class and follows the service locator pattern. The main differences are the heavy use of generics and the fact that it is a static class. To use our ServiceContainer class, we will declare the version of ISettings or other interfaces that we want to use throughout our application by calling Register, as seen in the following lines of code: //iOS version of ISettingsServiceContainer.Register<ISettings>(() => new AppleSettings());//Android version of ISettingsServiceContainer.Register<ISettings>(() => new DroidSettings());//You can even register ViewModelsServiceContainer.Register<SettingsViewMode>(() =>   new SettingsViewModel()); On iOS, you can place this registration code in either your static void Main() method or in the FinishedLaunching method of your AppDelegate class. These methods are always called before the application is started. On Android, it is a little more complicated. You cannot put this code in the OnCreate method of your activity that acts as the main launcher. In some situations, the Android OS can close your application but restart it later in another activity. This situation is likely to cause an exception somewhere. The guaranteed safe place to put this is in a custom Android Application class which has an OnCreate method that is called prior to any activities being created in your application. The following lines of code show you the use of the Application class: [Application]public class Application : Android.App.Application{  //This constructor is required  public Application(IntPtr javaReference, JniHandleOwnership     transfer): base(javaReference, transfer)  {  }  public override void OnCreate()  {    base.OnCreate();    //IoC Registration here  }} To pull a service out of the ServiceContainer class, we can rewrite the constructor of the SettingsViewModel class so that it is similar to the following lines of code: public SettingsViewModel(){  this.settings = ServiceContainer.Resolve<ISettings>();} Likewise, you will use the generic Resolve method to pull out any ViewModel classes you would need to call from within controllers on iOS or activities on Android. This is a great, simple way to manage dependencies within your application. There are, of course, some great open source libraries out there that implement IoC for C# applications. You might consider switching to one of them if you need more advanced features for service location or just want to graduate to a more complicated IoC container. 
There are, of course, some great open source libraries out there that implement IoC for C# applications. You might consider switching to one of them if you need more advanced features for service location, or if you just want to graduate to a more full-featured IoC container. Here are a few libraries that have been used with Xamarin projects:

TinyIoC: https://github.com/grumpydev/TinyIoC
Ninject: http://www.ninject.org/
MvvmCross: https://github.com/slodge/MvvmCross (includes a full MVVM framework as well as IoC)
Simple Injector: http://simpleinjector.codeplex.com
OpenNETCF.IoC: http://ioc.codeplex.com

Summary

In this article, we learned about the MVVM design pattern and how it can be used to better architect cross-platform applications. We compared several project organization strategies for managing a Xamarin Studio solution that contains both iOS and Android projects, and we went over portable class libraries as the preferred option for sharing code, with preprocessor statements as a quick and dirty way to implement platform-specific code. After completing this article, you should be comfortable with several techniques for sharing code between iOS and Android applications using Xamarin Studio. Using the MVVM design pattern will help you separate your shared code from code that is platform specific. We also covered several options for setting up cross-platform Xamarin solutions, and you should now have a firm understanding of using dependency injection and Inversion of Control to give your shared code access to the native APIs on each platform.

Text Mining with R: Part 1

Robi Sen
16 Mar 2015
7 min read
R is rapidly becoming the platform of choice for programmers, scientists, and others who need to perform statistical analysis and data mining. In part this is because R is incredibly easy to learn, and with just a few commands you can perform data mining and analysis functions that would be very hard in more general-purpose languages like Ruby, .NET, Java, or C++. To demonstrate R's ease, flexibility, and power, we will look at how to use R to examine a collection of tweets from the 2014 Super Bowl: we will clean up the data, turn it into a document matrix so we can analyze it, and then create a "word cloud" so we can visualize our analysis and look for interesting words.

Getting Started

To get started, you need to download both R and RStudio. R can be found here and RStudio can be found here. R and RStudio are available for most major operating systems, and you should follow the up-to-date installation guides on their respective websites. For this example we are going to be using a data set from Techtunk, which is rather large. For this article I have taken a small excerpt of Techtunk's SuperBowl 2014 data, over 8 million tweets, and cleaned it up; you can download the full set from the original data source here. Finally, you will need to install the R text mining package (tm) and word cloud package (wordcloud). You can install both with the standard install.packages() command, or just use RStudio's package manager.

Preparing our Data

As already stated, you can find the complete SuperBowl 2014 dataset at the original source. That being said, it is very large and split across many pipe-delimited files that carry the .csv file extension but are not actually comma-separated, which can be awkward to work with. This is a common problem when working with large data sets. Luckily, the data is already broken up into fragments: when working with large datasets, you usually do not want to start developing against the whole set; rather, you want a small, workable sample that lets you quickly develop your scripts without being so large and unwieldy that it delays development. Indeed, you will find that the large files provided by Techtunk can take tens of minutes to process as is. In cases like this, it is good to look at the data, figure out what you want from it, take a sample set, massage it as needed, and then work from there until your code behaves exactly how you want. In our case, I took a subset of 4,600 tweets from one of the pipe-delimited files, converted it to comma-separated value (.csv) format, and saved it as a sample file to work from. You can do the same thing with whatever subset you like, although you should consider keeping it under 5,000 records, or you can simply use the file created for this post here.
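If you would rather carve out your own sample than use the prepared file, the step looks roughly like the sketch below. The input file name is a placeholder for whichever Techtunk fragment you downloaded, and the 4,600-row cutoff simply matches the sample used in this article; adjust both as needed.

# Install the two packages used in this article, if you have not already
install.packages(c("tm", "wordcloud"))

# Placeholder path: point this at one of the raw pipe-delimited Techtunk files
raw_loc <- "yourfilelocation/raw_superbowl_fragment.csv"

# The raw files use "|" as the separator despite the .csv extension;
# set header = FALSE if your fragment has no heading row
raw_tweets <- read.delim(raw_loc, sep = "|", header = TRUE,
                         stringsAsFactors = FALSE, quote = "")

# Keep only the first few thousand rows so the scripts stay quick to iterate on
sample_tweets <- head(raw_tweets, 4600)

# Write out a genuine comma-separated file to use for the rest of the article
write.csv(sample_tweets, "yourfilelocation/largerset11.csv", row.names = FALSE)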
Visualizing our Data

For this post, all we want to do is get a general sense of the more common words being tweeted during the Super Bowl. A common way to visualize this is with a word cloud, which shows the frequency of a term by drawing it larger, relative to the other words, the more often it is mentioned in the body of tweets being analyzed. To do this we first need to do a few things with our data. We need to read in our file and turn our collection of tweets into a corpus. In general, a corpus is a large body of text documents; in R's tm package it is an object that holds our tweets in memory.

So, to load our tweets as a corpus into R, you can do as shown here:

# change this file location to suit your machine
file_loc <- "yourfilelocation/largerset11.csv"
# change TRUE to FALSE if you have no column headings in the CSV
myfile <- read.csv(file_loc, header = TRUE, stringsAsFactors = FALSE)

require(tm)
mycorpus <- Corpus(DataframeSource(myfile[c("username", "tweet")]))

You can now simply print your corpus to get a sense of it:

> print(mycorpus)
<<VCorpus (documents: 4688, metadata (corpus/indexed): 0/0)>>

In this case, VCorpus means the corpus is a volatile object stored in memory. If you want, you can make the corpus permanent using PCorpus. You might do this if you were analyzing actual documents, such as PDFs or even databases, in which case R keeps pointers to the documents instead of holding full document structures in memory.

Another method you can use to look at your corpus is inspect(), which provides a variety of ways to examine the documents it contains. For example, using:

inspect(mycorpus[1:2])

might give you a result like:

> inspect(mycorpus[1:2])
<<VCorpus (documents: 2, metadata (corpus/indexed): 0/0)>>

[[1]]
<<PlainTextDocument (metadata: 7)>>
sstanley84 wow rt thestanchion news fleury just made fraudulent investment karlsson httptco5oi6iwashg

[[2]]
<<PlainTextDocument (metadata: 7)>>
judemgreen 2 hour wait train superbowl time traffic problemsnice job chris christie

As such, inspect() is very useful for quickly getting a sense of the data in your corpus without having to print the whole thing.

Now that we have our corpus in memory, let's clean it up a little before we do our analysis. Usually you want to remove words that are not relevant to your analysis, such as "stopwords": words such as and, like, but, if, and the, which you don't care about. To do this with the tm package you use transforms. A transform applies a function to every document in a corpus and takes the form tm_map(your corpus, some function). For example, we can use tm_map like this:

mycorpus <- tm_map(mycorpus, removePunctuation)

which removes all the punctuation marks from our tweets. We can apply some other transforms to clean up our data by converting all the text to lower case, removing stopwords, stripping extra whitespace, and the like:

mycorpus <- tm_map(mycorpus, removePunctuation)
mycorpus <- tm_map(mycorpus, content_transformer(tolower))
mycorpus <- tm_map(mycorpus, stripWhitespace)
mycorpus <- tm_map(mycorpus, removeWords, c(stopwords("english"), "news"))

Note the last line. Here we are using the stopwords() method but also adding our own word, news, to the list. You could append your own list of stopwords in this manner.
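The introduction promised a document matrix and a word cloud, and Part 2 digs into interpreting one, so what follows is only a minimal sketch of how that final step could look with the tm and wordcloud packages, picking up from the cleaned mycorpus above. The frequency cutoffs are arbitrary values chosen just to keep the plot readable, not settings from the original article.

# Sketch: build a term-document matrix from the cleaned corpus
# and draw a simple word cloud from the term frequencies
library(wordcloud)

tdm <- TermDocumentMatrix(mycorpus)

# Sum the counts for each term across all tweets, most frequent first
term_freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# min.freq and max.words are arbitrary cutoffs to keep the cloud legible
wordcloud(words = names(term_freq), freq = term_freq,
          min.freq = 10, max.words = 100, random.order = FALSE)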
Summary

In this post we have looked at the basics of text mining in R: selecting data, preparing it, cleaning it, and then performing various operations on it so it can be visualized. In the next post, Part 2, we will look at a simple use case showing how we can derive real meaning and value from a visualization, by seeing how a simple word cloud can help you understand the impact of an advertisement.

About the author

Robi Sen, CSO at Department 13, is an experienced inventor, serial entrepreneur, and futurist whose dynamic twenty-plus-year career in technology, engineering, and research has led him to work on cutting-edge projects for DARPA, TSWG, SOCOM, RRTO, NASA, DOE, and the DOD. Robi also has extensive experience in the commercial space, including the co-creation of several successful start-up companies. He has worked with companies such as UnderArmour, Sony, CISCO, IBM, and many others to help build out new products and services. Robi specializes in bringing his unique vision and thought process to difficult and complex problems, allowing companies and organizations to find innovative solutions that they can rapidly operationalize or go to market with.