Getting Started with Force.com

Packt
15 Apr 2016
17 min read
In this article, Siddhesh Kabe, the author of the book Salesforce Platform App Builder Certification Handbook, introduces you to the Force.com platform. We will look at the life cycle of an application built using Force.com, define the multi-tenant architecture and understand how it impacts an organization's data stored in the cloud, and finally build our first application on Force.com. We will cover the following topics in this article:

- The multi-tenant architecture of Force.com
- Understanding the Force.com platform
- Application development on the Force.com platform
- Discussing the maintenance and release schedule of Salesforce.com
- Types of Force.com applications
- Discussing the scenarios when to use point-and-click customization and when to use code
- Talking about Salesforce.com identity
- Developer resources

So, let's get started and step into the cloud.

The cloud computing model of Force.com

Force.com is a cloud computing platform used to build enterprise applications. The end user does not have to worry about networks, hardware, software licenses, or anything else, and the data saved in the cloud is completely secure. The following features of Force.com make it a 100 percent cloud-based system:

- The multi-tenant architecture: The multi-tenant architecture is a way of serving multiple clients on a single software instance. Each client gets their own full version of the software configuration and data and cannot use another instance's resources; the software is virtually partitioned into different instances. The basic structure of the multi-tenant architecture is shown in the following image. Just as tenants in a single building share the electricity and water supply, tenants in a multi-tenant system share common resources and databases. In a multi-tenant system such as Salesforce.com, different organizations use the same shared database system, separated by a secure virtual partition. Special programs keep the data separated and make sure that no single organization monopolizes the resources.
- Automatic upgrades: In a cloud computing system, all new updates are automatically released to subscribers. Any developments or customizations made on the previous version are automatically carried over to the latest version without any manual modification to the code. As a result, all instances of Salesforce stay up to date and on the same version.
- Subscription model: Force.com is distributed under a subscription model. The user can purchase a few licenses and build the system; after the system is up and successful, further user licenses can be purchased from Salesforce. This model ensures that there are no large start-up fees and that we pay as we go, which keeps future costs fixed and predictable. The subscription model can be visualized like the electricity distribution system: we pay for whatever electricity we use, not for the complete generator and infrastructure.
- Scalability: The multi-tenant kernel is already tested and running for many users simultaneously. If the organization is growing, there is always room for scaling the application to new users without worrying about load balancing and data limitations. Force.com provides data storage per user, which means that the data storage increases with the number of users added to the organization.
- Upgrades and maintenance: Force.com releases three updated versions every year. The new releases consist of feature updates to Salesforce.com and the Force.com platform, with selected top ideas from IdeaExchange. IdeaExchange is the community of Salesforce users where users submit ideas and the community votes for them; the most popular ideas are considered by Salesforce for the next release. All the instances hosted on the servers are upgraded at no additional cost. The Salesforce maintenance outage during a major release is only 5 minutes, and sandboxes are upgraded early so compatibility with the new release can be tested. The new releases are backward compatible with previous releases, so old code keeps working with new versions. The upgrades are taken care of by Force.com, and the end user always runs the latest version of the application.

Understanding the new model of the Salesforce1 platform

In the earlier edition of the book, we discussed the Force.com platform in detail. In the last couple of years, Salesforce has introduced the new Salesforce1 platform. It encompasses all the existing features of the Force.com platform but also includes powerful new tools for mobile development. The Salesforce1 platform is built mobile first, and all the existing features of cloud development are automatically available on mobile. From Winter '16, Salesforce has also introduced the Lightning Experience. The Lightning Experience is another extension to the existing platform; it provides a brand new set of design and development libraries that let developers build applications that work on mobile as well as the web. Let's take a detailed look at the services that form the platform offered by Force.com. The following section provides an overview of the Force.com platform.

Force.com platform

Force.com is the world's first cloud application development platform, where end users can build, share, and run an application directly on the cloud. While most cloud computing systems provide the ability to deploy code from a local machine, Force.com lets us write the code directly in the cloud. The Force.com platform runs in a hosted multi-tenant environment, which gives end users the freedom to build their custom application without hardware purchases, database maintenance, or software licenses. Salesforce.com provides the following main products:

- Sales Force Automation (Sales Cloud)
- Service and Support Center (Service Cloud)
- The ExactTarget Marketing Cloud
- Collaboration Cloud (Chatter)

The following screenshot shows the Force.com platform. An application built on Force.com is automatically hosted on the cloud platform. It can be used separately (without the standard Sales, Service, and Marketing clouds) or in parallel with the existing Salesforce application. Users can access the application using a browser from any mobile, computer, or tablet, and on any operating system, such as Windows, UNIX, or Mac, giving them complete freedom of location. For a complete list of supported browsers, visit https://help.salesforce.com/apex/HTViewHelpDoc?id=getstart_browser_overview.htm.

Model-View-Controller architecture

The most efficient way to build an enterprise application is to clearly separate the model (the data), the controller (the code), and the view (the UI). By separating the three, we can make sure that each area is handled by an expert, and the business logic is separated from the backend database and the frontend user interface.
It is also easy to upgrade one part of the system without disturbing the entire structure. The following diagram illustrates the Model-View-Controller architecture of Force.com. We will look at each layer of the MVC architecture in detail in a subsequent article.

Key technology behind the Force.com platform

Force.com is a hosted multi-tenant service used to build custom cloud computing applications. It is a 100 percent cloud platform where we pay no extra cost for hardware and networking. Any application built on Force.com is directly hosted on the cloud and can be accessed using a simple browser from a computer or a mobile device. The Force.com platform runs on some basic key technologies.

The multi-tenant kernel

The base of the platform is a multi-tenant kernel where all users share a common code base and physical infrastructure. The multiple tenants hosted on a shared server share resources under governor limits that prevent a single instance from monopolizing the resources. Custom code and data are separated by software virtualization, and users cannot access each other's code. The multi-tenant kernel ensures that all instances are updated to the latest version of the software simultaneously; the updates are applied automatically, without any patches or software downloads. The multi-tenant architecture is already live for over a million users, which helps developers easily scale applications from one to a million users with little or no modification at all. The following image illustrates the multi-tenant architecture. Traditional software systems are hosted on a single-tenant system, usually a client-server enterprise application. With the multi-tenant architecture, the end user does not have to worry about the hardware layer or about software upgrades and patches. A software system deployed over the Internet can be accessed using a browser from any location, even from a wide range of mobile devices. The multi-tenant architecture also allows applications to be low cost, quick to deploy, and open to innovation. Other examples of software using the multi-tenant architecture are webmail systems, such as Gmail and Yahoo Mail, online storage systems, such as Dropbox, and note-taking applications, such as Evernote and Springpad.

Force.com metadata

Force.com is entirely metadata-driven. The metadata is defined in XML and can be extracted and imported. We will look into metadata in detail later in this article.

Force.com Webservice API

The data and the metadata stored on the Force.com servers can be accessed programmatically through the Webservice API. This enables developers to extend the functionality to virtually any language, operating system, and platform. The web services are based on open web standards, such as SOAP XML and JSON REST, and are directly compatible with other technologies, such as .NET, Java, SAP, and Oracle. We can easily integrate a Force.com application with existing business applications without rewriting the entire code base.

Apex and Visualforce

Apex is the world's first on-demand language, introduced by Salesforce. It is an object-oriented language very similar to C# or Java, specially designed to process bulk data for business applications. Apex is used to write the controller in the MVC architecture. Salesforce Object Query Language (SOQL) gives developers an easy, declarative, human-readable query language that can fetch and process large amounts of data. For those who have used other relational database systems, such as Oracle or SQL Server, it is similar to SQL but does not support advanced capabilities such as joins. Apex and SOQL together give developers powerful tools to manage the data and processes of their application, leaving the rest of the overhead to the Force.com platform.
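As a quick illustration of how SOQL and the Webservice API described above fit together, the following sketch runs a SOQL query through the Force.com REST API from Python. It is not taken from the book; the instance URL, API version, and access token are placeholders you would obtain from your own org's OAuth setup, and the requests library is assumed to be installed.

```python
import requests

# Placeholders: substitute your own org's instance URL and a valid OAuth access token.
INSTANCE_URL = 'https://yourInstance.salesforce.com'
ACCESS_TOKEN = 'REPLACE_WITH_ACCESS_TOKEN'

# Run a simple SOQL query through the REST query endpoint (API version assumed).
response = requests.get(
    INSTANCE_URL + '/services/data/v36.0/query',
    params={'q': 'SELECT Id, Name FROM Account LIMIT 10'},
    headers={'Authorization': 'Bearer ' + ACCESS_TOKEN},
)
response.raise_for_status()

# The result is JSON containing a 'records' list; print each account name.
for record in response.json()['records']:
    print(record['Name'])
```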
The following screenshot shows the page editor for Visualforce. It is easy to use and splits the page into two parts: the bottom half is for development and the top half shows the output. Visualforce is an easy-to-use yet powerful framework used to create rich user interfaces, extending the standard tabs and forms to any kind of interface imaginable. Visualforce ultimately renders into HTML, so we can use any HTML code alongside the Visualforce markup to create a powerful and rich UI for business applications. Apart from the UI, Visualforce provides very easy and direct access to server-side data and metadata from Apex. The powerful combination of a rich UI with access to the Salesforce metadata makes Visualforce the ultimate solution for building powerful business applications on Salesforce. As the Salesforce.com Certified Force.com Developer Certification does not include Apex and Visualforce, we won't go into detail about them in this book.

The developer console

The developer console is an Integrated Development Environment (IDE) with tools that help you write code, run tests, and debug the system. The developer console provides an editor for writing code as well as a UI to monitor and debug unit test classes, as shown in the following screenshot.

AppExchange

AppExchange is the directory of applications built on the Force.com platform. Developers can choose to submit their applications to AppExchange. These applications extend the functionality of Force.com beyond CRM, with many ready-made business applications available to download and use. AppExchange is available at http://appexchange.salesforce.com.

Force.com sites

Using Force.com sites or Site.com, we can build public-facing websites that use the existing Salesforce data and browser technologies, such as HTML, JavaScript, CSS, AngularJS, Bootstrap, and so on. The sites can have an external login for sensitive data or a no-login public portal that can be linked to the corporate website as well. Site.com helps in creating websites using drag-and-drop controls; a user with little or no HTML knowledge can build websites using the Site.com editor.

Force.com development

Like any other traditional software development process, the Force.com platform offers tools used to define the data, business processes, logic, and rich UI of a business application. Many of these tools are built-in, point-and-click tools simplified for users without any development skills, so a user with no programming knowledge can build applications suitable to their business on Force.com. The point-and-click tools are easy to use, but they have limits on flexibility and control. To extend the platform beyond these limitations, we use Apex and Visualforce.
Let's now compare the tools used for traditional software development and Force.com:

| | Java | .NET | Force.com |
| --- | --- | --- | --- |
| Building the database | Oracle, MS Access, SQL, or any third-party database setup | Oracle, MS Access, SQL, or any third-party database setup | Salesforce metadata (now Database.com) |
| Connecting to the database | JDBC | ADO.NET | Salesforce metadata API |
| Development IDE | NetBeans, Eclipse, and so on | Visual Studio | Online Page Editor and App Setup, Force.com IDE, MavensMate, and Aside.io |
| Controlled environment for development and testing | Local servers, remote test servers | Local servers, remote test servers | Force.com real-time sandboxes |

Force.com metadata

Everything on Force.com, such as data models, objects, forms, tabs, and workflows, is defined by metadata. The definitions, or metadata, are expressed in XML and can be extracted and imported. Metadata-driven development also helps users with no prior development experience build business applications without any need to code: we can define objects, tabs, and forms in the UI using point-and-click, and all the changes made to the metadata in App Setup are tracked. Alternatively, developers can customize every part of Salesforce using the XML files that control the organization's metadata. The files are downloaded using the Eclipse IDE or the Force.com IDE. To customize metadata in the Salesforce UI, go to Setup | Build. As the Force.com Developer Certification is about using point-and-click, we will go into the setup details in the coming article.

Metadata API

The metadata API provides easy access to the organization's data, business logic, and user interface. We can modify the metadata in a controlled test organization called a sandbox, and finally deploy the tested changes to the live production environment. The production environment is the live environment that is used by the users and contains live data. The production instance does not allow developers to code in it directly; this ensures that only debugged and tested code reaches the live organization.

Online page editor and the Eclipse Force.com IDE

Force.com provides a built-in online editor that helps edit Visualforce pages in real time. The online editor can be enabled by checking the Development Mode checkbox on the user profile, as shown in the following screenshot. The online page editor splits the screen into two parts, with live code in the bottom half and the final page output in the top half. Force.com also provides an inline editor for editing Apex code in the browser itself. The Force.com IDE is an IDE built on top of Eclipse. It provides an easy environment to write code, along with offline saving. It also comes with a schema browser and a query generator, which helps in generating simple queries (SELECT statements) by selecting fields and objects. The code is automatically synced with the organization.

Sandboxes

Force.com provides a real-time environment to develop, test, and train people in the organization. It is a safe and isolated environment where any changes made will not affect the production data or application. Sandboxes are used to experiment with new features without disturbing the live production organization, and the separation of test and development instances ensures that only tested and verified code reaches the production organization. There are four types of sandboxes:

- Developer sandbox: This environment is used for coding and testing by a single developer. Just like the configuration-only sandbox, it copies the entire customization of the production organization, excluding the data. The added feature of a developer sandbox is that it also allows Apex and Visualforce coding.
- Developer pro sandbox: Developer pro sandboxes are similar to developer sandboxes but with larger storage. This sandbox is mostly used to handle more development and quality assurance tasks; with a larger sandbox, we can store more data and run more demanding tasks.
- Partial copy sandbox: This is used as a testing environment. It copies the full metadata of the production environment and a subset of production data that can be selected using a template.
- Full copy sandbox: This copies the entire production organization and all its data, records, documents, and attachments. It is usually used to develop and test a new application until it is ready to be shared with the users. A full copy sandbox has the same record IDs as production only when it has been freshly created.

Force.com application types

There are some common types of applications that are required to automate an enterprise process. They are as follows:

- Content-centric applications: These applications enable organizations to share and version content across different levels. They consist of file-sharing systems, versioning systems, and content management systems.
- Transaction-centric applications: These applications focus on transactions; they include banking systems, online payment systems, and so on.
- Process-centric applications: These applications focus on automating the business processes of an organization, such as a bug tracking system, a procurement process, or an approval process. Force.com is well suited to building these kinds of applications.
- Data-centric applications: These applications are built around a powerful database; many organizations use spreadsheets for them today. Examples include CRM, HRM, and so on. Force.com is well suited to building these kinds of applications.

Developing on the Force.com platform

There are two ways of developing on Force.com: one is to use point-and-click without a single line of code, called declarative development; the other is to develop an application using code, called programmatic development. Let's take a look at the two types of development in detail.

Declarative development

Declarative development is done by point-and-click using a browser. We use ready-to-use components and modify their configuration to build applications. We can add new objects, define their standard views, and create input forms with simple point-and-click and no coding knowledge. The declarative framework allows rapid development and deployment of applications, and it also follows the MVC architecture. The MVC components in declarative development on Force.com are listed in the following table:

| Model | View | Controller |
| --- | --- | --- |
| Objects, fields, relationships | Applications, tabs, page layouts, record types | Workflow rules, validation rules, assignment rules |

Summary

In this article, we became familiar with the Force.com platform. We have seen the life cycle of an application built using Force.com, and we saw the multi-tenant architecture and how it differs from a web hosting server. We have a fresh new developer account, and in a further article, we will use it to build an application on Force.com.
Resources for Article:

Further resources on this subject:
- Custom Coding with Apex [article]
- Auto updating child records in Process Builder [article]
- Configuration in Salesforce CRM [article]


Web Server Development

Packt
15 Apr 2016
24 min read
In this article, Holger Brunn, Alexandre Fayolle, and Daniel Eufémio Gago Reis, the authors of the book Odoo Development Cookbook, discuss how to work with the web server in Odoo. We'll cover the following topics:

- Make a path accessible from the network
- Restrict access to web accessible paths
- Consume parameters passed to your handlers
- Modify an existing handler
- Using the RPC API

Introduction

We'll introduce the basics of the web server part of Odoo in this article. Note that this article covers the fundamental pieces. All of Odoo's web request handling is driven by the Python library werkzeug (http://werkzeug.pocoo.org). While the complexity of werkzeug is mostly hidden by Odoo's convenient wrappers, it is an interesting read to see how things work under the hood.

Make a path accessible from the network

In this recipe, we'll see how to make a URL of the form http://yourserver/path1/path2 accessible to users. This can either be a web page or a path returning arbitrary data to be consumed by other programs. In the latter case, you would usually use the JSON format to consume parameters and to offer your data.

Getting ready

We'll make use of a ready-made library.book model. We want to allow any user to query the full list of books. Furthermore, we want to provide the same information to programs via a JSON request.

How to do it…

We'll need to add controllers, which by convention go into a folder called controllers.

1. Add a controllers/main.py file with the HTML version of our page:

    from openerp import http
    from openerp.http import request

    class Main(http.Controller):
        @http.route('/my_module/books', type='http', auth='none')
        def books(self):
            records = request.env['library.book'].sudo().search([])
            result = '<html><body><table><tr><td>'
            result += '</td></tr><tr><td>'.join(records.mapped('name'))
            result += '</td></tr></table></body></html>'
            return result

2. Add a function to serve the same information in the JSON format:

        @http.route('/my_module/books/json', type='json', auth='none')
        def books_json(self):
            records = request.env['library.book'].sudo().search([])
            return records.read(['name'])

3. Add the file controllers/__init__.py:

    from . import main

4. Add controllers to your addon's __init__.py:

    from . import controllers

After restarting your server, you can visit /my_module/books in your browser and be presented with a flat list of book names. To test the JSON-RPC part, you'll have to craft a JSON request. A simple way to do that is to use the following command line and receive the output on the command line:

    curl -i -X POST -H "Content-Type: application/json" -d "{}" localhost:8069/my_module/books/json

If you get 404 errors at this point, you probably have more than one database available on your instance. In this case, it's impossible for Odoo to determine which database is meant to serve the request. Use the --db-filter='^yourdatabasename$' parameter to force using exactly the database you installed the module in. Now the path should be accessible.

How it works…

The two crucial parts here are that our controller is derived from openerp.http.Controller and that the methods we use to serve content are decorated with openerp.http.route. Inheriting from openerp.http.Controller registers the controller with Odoo's routing system in a similar way as models are registered by inheriting from openerp.models.Model; Controller, too, has a metaclass that takes care of this.
In general, paths handled by your addon should start with your addon's name to avoid name clashes. Of course, if you extend some addon's functionality, you'll use that addon's name.

openerp.http.route

The route decorator allows us to tell Odoo that a method is to be web accessible in the first place, and the first parameter determines on which path it is accessible. Instead of a string, you can also pass a list of strings in case you use the same function to serve multiple paths. The type argument defaults to http and determines what type of request is to be served. While strictly speaking JSON is HTTP, declaring the second function as type='json' makes life a lot easier, because Odoo then handles type conversions itself. Don't worry about the auth parameter for now; it will be addressed in the recipe Restrict access to web accessible paths.

Return values

Odoo's treatment of the functions' return values is determined by the type argument of the route decorator. For type='http', we usually want to deliver some HTML, so the first function simply returns a string containing it. An alternative is to use request.make_response(), which gives you control over the headers to send in the response. So, to indicate when our page was updated the last time, we might change the last line in books() to the following:

    return request.make_response(
        result, [
            ('Last-modified', email.utils.formatdate(
                (
                    fields.Datetime.from_string(
                        request.env['library.book'].sudo()
                        .search([], order='write_date desc', limit=1)
                        .write_date) -
                    datetime.datetime(1970, 1, 1)
                ).total_seconds(),
                usegmt=True)),
        ])

This code sends a Last-modified header along with the HTML we generated, telling the browser when the list was modified for the last time. We extract this information from the write_date field of the library.book model. In order for the preceding snippet to work, you'll have to add some imports at the top of the file:

    import email
    import datetime
    from openerp import fields

You can also create a Response object of werkzeug manually and return that, but there's little gain for the effort. Generating HTML manually is nice for demonstration purposes, but you should never do this in production code. Always use templates as appropriate and return them by calling request.render(). This will give you localization for free and makes your code better by separating business logic from the presentation layer. Also, templates provide you with functions to escape data before outputting HTML. The preceding code is vulnerable to cross-site scripting attacks if a user manages to slip a script tag into a book name, for example. For a JSON request, simply return the data structure you want to hand over to the client; Odoo takes care of serialization. For this to work, you should restrict yourself to data types that are JSON serializable, which are roughly dictionaries, lists, strings, floats, and integers.

openerp.http.request

The request object is a static object referring to the currently handled request, which contains everything you need to take useful action. Most important is the property request.env, which contains an Environment object that is just the same as self.env in models. This environment is bound to the current user, which is none in the preceding example because we used auth='none'. The lack of a user is also why we have to sudo() all our calls to model methods in the example code. If you're used to web development, you'll expect session handling, which is perfectly correct.
Use request.session for an OpenERPSession object (which is quite a thin wrapper around the Session object of werkzeug) and request.session.sid to access the session ID. To store session values, just treat request.session as a dictionary:

    request.session['hello'] = 'world'
    request.session.get('hello')

Note that storing data in the session is no different from using global variables. Use it only if you must; that is usually the case for multi-request actions, such as a checkout in the website_sale module. And also in this case, handle all functionality concerning sessions in your controllers, never in your modules.

There's more…

The route decorator can take some extra parameters to customize its behavior further. By default, all HTTP methods are allowed, and Odoo intermingles the parameters passed. Using the parameter methods, you can pass a list of methods to accept, which usually would be either ['GET'] or ['POST']. To allow cross-origin requests (browsers block AJAX and some other types of requests to domains other than where the script was loaded from, for security and privacy reasons), set the cors parameter to * to allow requests from all origins, or to some URI to restrict requests to ones originating from that URI. If this parameter is unset, which is the default, the Access-Control-Allow-Origin header is not set, leaving you with the browser's standard behavior. In our example, we might want to set it on /my_module/books/json in order to allow scripts pulled from other websites to access the list of books. By default, Odoo protects certain types of requests from an attack known as cross-site request forgery by passing a token along on every request. If you want to turn that off, set the parameter csrf to False, but note that this is a bad idea in general.

See also

If you host multiple Odoo databases on the same instance, and each database has different web accessible paths on possibly multiple domain names per database, the standard regular expressions in the --db-filter parameter might not be enough to force the right database for every domain. In that case, use the community module dbfilter_from_header from https://github.com/OCA/server-tools in order to configure the database filters at proxy level. To see how using templates makes modularity possible, see the recipe Modify an existing handler later in the article.
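Before moving on, here is a small sketch (not from the book) pulling the optional route parameters just discussed together: a route restricted to GET requests, opened up for cross-origin callers, with csrf shown explicitly for completeness. The path and class name are made up for illustration.

```python
from openerp import http
from openerp.http import request


class MainCors(http.Controller):
    # Hypothetical route: the book list as plain text, GET only, readable from
    # scripts hosted on other origins thanks to cors='*'.
    @http.route('/my_module/books/text', type='http', auth='none',
                methods=['GET'], cors='*', csrf=False)
    def books_text(self):
        records = request.env['library.book'].sudo().search([])
        return request.make_response(
            u'\n'.join(records.mapped('name')),
            [('Content-Type', 'text/plain')])
```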
Restrict access to web accessible paths

We'll explore the three authentication mechanisms Odoo provides for routes in this recipe. We'll define routes with different authentication mechanisms in order to show their differences.

Getting ready

As we extend code from the previous recipe, we'll also depend on the library.book model, so you should get its code in place in order to proceed.

How to do it…

Define the following handlers in controllers/main.py:

1. Add a path that shows all books:

    @http.route('/my_module/all-books', type='http', auth='none')
    def all_books(self):
        records = request.env['library.book'].sudo().search([])
        result = '<html><body><table><tr><td>'
        result += '</td></tr><tr><td>'.join(records.mapped('name'))
        result += '</td></tr></table></body></html>'
        return result

2. Add a path that shows all books and indicates which were written by the current user, if any:

    @http.route('/my_module/all-books/mark-mine', type='http', auth='public')
    def all_books_mark_mine(self):
        records = request.env['library.book'].sudo().search([])
        result = '<html><body><table>'
        for record in records:
            result += '<tr>'
            if record.author_ids & request.env.user.partner_id:
                result += '<th>'
            else:
                result += '<td>'
            result += record.name
            if record.author_ids & request.env.user.partner_id:
                result += '</th>'
            else:
                result += '</td>'
            result += '</tr>'
        result += '</table></body></html>'
        return result

3. Add a path that shows the current user's books:

    @http.route('/my_module/all-books/mine', type='http', auth='user')
    def all_books_mine(self):
        records = request.env['library.book'].search([
            ('author_ids', 'in', request.env.user.partner_id.ids),
        ])
        result = '<html><body><table><tr><td>'
        result += '</td></tr><tr><td>'.join(records.mapped('name'))
        result += '</td></tr></table></body></html>'
        return result

With this code, the paths /my_module/all-books and /my_module/all-books/mark-mine look the same for unauthenticated users, while a logged-in user sees her books in a bold font on the latter path. The path /my_module/all-books/mine is not accessible at all for unauthenticated users; if you try to access it without being authenticated, you'll be redirected to the login screen.

How it works…

The difference between the authentication methods is basically what you can expect from the content of request.env.user. For auth='none', the user record is always empty, even if an authenticated user is accessing the path. Use this if you want to serve content that has no dependency on users, or if you want to provide database-agnostic functionality in a server-wide module. The value auth='public' sets the user record to a special user with the XML ID base.public_user for unauthenticated users, and to the user's own record for authenticated ones. This is the right choice if you want to offer functionality to both unauthenticated and authenticated users, while the authenticated ones get some extras, as demonstrated in the preceding code. Use auth='user' to be sure that only authenticated users have access to what you've got to offer. With this method, you can be sure request.env.user points to an existing user.

There's more…

The magic for the authentication methods happens in the ir.http model from the base addon. For whatever value you pass to the auth parameter in your route, Odoo searches for a function called _auth_method_<yourvalue> on this model, so you can easily customize this by inheriting this model and declaring a method that takes care of your authentication method of choice.
As an example, we provide an authentication method base_group_user, which enforces a currently logged-in user who is a member of the group with the XML ID base.group_user:

    from openerp import exceptions, http, models
    from openerp.http import request

    class IrHttp(models.Model):
        _inherit = 'ir.http'

        def _auth_method_base_group_user(self):
            self._auth_method_user()
            if not request.env.user.has_group('base.group_user'):
                raise exceptions.AccessDenied()

Now you can say auth='base_group_user' in your decorator and be sure that users running this route's handler are members of this group. With a little trickery, you can extend this to auth='groups(xmlid1,…)'; the implementation of this is left as an exercise to the reader, but it is included in the example code.

Consume parameters passed to your handlers

It's nice to be able to show content, but it's better to show content as a result of some user input. This recipe will demonstrate the different ways to receive this input and react to it. As in the recipes before, we'll make use of the library.book model.

How to do it…

First, we'll add a route that expects a traditional parameter with a book's ID to show some details about it. Then, we'll do the same, but we'll incorporate our parameter into the path itself:

1. Add a path that expects a book's ID as a parameter:

    @http.route('/my_module/book_details', type='http', auth='none')
    def book_details(self, book_id):
        record = request.env['library.book'].sudo().browse(int(book_id))
        return u'<html><body><h1>%s</h1>Authors: %s' % (
            record.name,
            u', '.join(record.author_ids.mapped('name')) or 'none',
        )

2. Add a path where we can pass the book's ID in the path itself:

    @http.route("/my_module/book_details/<model('library.book'):book>",
                type='http', auth='none')
    def book_details_in_path(self, book):
        return self.book_details(book.id)

If you point your browser to /my_module/book_details?book_id=1, you should see a detail page for the book with ID 1. If it doesn't exist, you'll receive an error page. The second handler allows you to go to /my_module/book_details/1 and view the same page.

How it works…

By default, Odoo (actually werkzeug) intermingles GET and POST parameters and passes them as keyword arguments to your handler. So, by simply declaring your function as expecting a parameter called book_id, you introduce this parameter as either a GET parameter (in the URL) or a POST parameter (usually passed by forms that have your handler as their action). Given that we didn't add a default value for this parameter, the runtime will raise an error if you try to access this path without setting the parameter. The second example makes use of the fact that, in a werkzeug environment, most paths are virtual anyway. So we can simply define our path as containing some input. In this case, we say we expect the ID of a library.book as the last component of the path. The name after the colon is the name of a keyword argument, and our function will be called with this parameter passed as a keyword argument. Here, Odoo takes care of looking up this ID and delivering a browse record, which of course only works if the user accessing this path has appropriate permissions. Given that book is a browse record, we can simply recycle the first example's function by passing book.id as the parameter book_id to give out the same content.

There's more…

Defining parameters within the path is a functionality delivered by werkzeug, which is called converters.
The model converter is added by Odoo, which also defines the converter models, which accepts a comma-separated list of IDs and passes a record set containing those IDs to your handler. The beauty of converters is that the runtime coerces the parameters to the expected type, while you're on your own with normal keyword parameters: these are delivered as strings, and you have to take care of the necessary type conversions yourself, as seen in the first example. Built-in werkzeug converters include int, float, and string, but also more intricate ones, such as path, any, or uuid. You can look up their semantics at http://werkzeug.pocoo.org/docs/0.11/routing/#builtin-converters.

See also

Odoo's custom converters are defined in ir_http.py in the base module and registered in the _get_converters method of ir.http. As an exercise, you can create your own converter that allows you to visit the /my_module/book_details/Odoo+cookbook page to receive the details of this book (if you added it to your library before).
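One possible sketch of that exercise follows; it is not the book's solution. It registers a naive converter that only normalizes the last path component into a book name and leaves the lookup to the handler, skipping the lazy-evaluation tricks Odoo's own ModelConverter uses. The converter name, class names, and the registration signature (as described in the See also above) are assumptions.

```python
from werkzeug.routing import BaseConverter

from openerp import http, models
from openerp.http import request


class BookNameConverter(BaseConverter):
    # Accept anything without a slash and turn '+' back into spaces, so that
    # /my_module/book_details/Odoo+cookbook hands 'Odoo cookbook' to the handler.
    regex = '[^/]+'

    def to_python(self, value):
        return value.replace('+', ' ')

    def to_url(self, value):
        return value.replace(' ', '+')


class IrHttp(models.Model):
    _inherit = 'ir.http'

    def _get_converters(self):
        # Add our converter under the name 'book_name' (name is ours).
        converters = super(IrHttp, self)._get_converters()
        converters['book_name'] = BookNameConverter
        return converters


class BookByName(http.Controller):
    @http.route("/my_module/book_details/<book_name:name>",
                type='http', auth='none')
    def book_details_by_name(self, name):
        record = request.env['library.book'].sudo().search(
            [('name', '=', name)], limit=1)
        return u'<html><body><h1>%s</h1></body></html>' % (
            record.name or u'No such book')
```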
This object contains a reference to the template to be used and the values accessible to the template, but is only evaluated at the very end of the request. In general, there are three ways to change an existing handler: If it uses a QWeb template, the simplest way of changing it is to override the template. This is the right choice for layout changes and small logic changes. QWeb templates get a context passed, which is available in the response as the field qcontext. This usually is a dictionary where you can add or remove values to suit your needs. In the preceding example, we filter the list of apps to only contain apps which have a website set. If the handler receives parameters, you could also preprocess those in order to have the overridden handler behave the way you want. There's more… As seen in the preceding section, inheritance with controllers works slightly differently than model inheritance: You actually need a reference to the base class and use Python inheritance on it. Don't forget to decorate your new handler with the @http.route decorator; Odoo uses it as a marker for which methods are exposed to the network layer. If you omit the decorator, you actually make the handler's path inaccessible. The @http.route decorator itself behaves similarly to field declarations: every value you don't set will be derived from the decorator of the function you're overriding, so we don't have to repeat values we don't want to change. After receiving a response object from the function you override, you can do a lot more than just changing the QWeb context: You can add or remove HTTP headers by manipulating response.headers. If you want to render an entirely different template, you can set response.template. To detect if a response is based on QWeb in the first place, query response.is_qweb. The resulting HTML code is available by calling response.render(). Using the RPC API One of Odoo's strengths is its interoperability, which is helped by the fact that basically any functionality is available via JSON-RPC 2.0 and XMLRPC. In this recipe, we'll explore how to use both of them from client code. This interface also enables you to integrate Odoo with any other application. Making functionality available via any of the two protocols on the server side is explained in the There's more section of this recipe. We'll query a list of installed modules from the Odoo instance, so that we could show a list as the one displayed in the previous recipe in our own application or website. 
How to do it… The following code is not meant to run within Odoo, but as simple scripts: First, we query the list of installed modules via XMLRPC: #!/usr/bin/env python2 import xmlrpclib db = 'odoo9' user = 'admin' password = 'admin' uid = xmlrpclib.ServerProxy( 'http://localhost:8069/xmlrpc/2/common') .authenticate(db, user, password, {}) odoo = xmlrpclib.ServerProxy( 'http://localhost:8069/xmlrpc/2/object') installed_modules = odoo.execute_kw( db, uid, password, 'ir.module.module', 'search_read', [[('state', '=', 'installed')], ['name']], {'context': {'lang': 'fr_FR'}}) for module in installed_modules: print module['name'] Then we do the same with JSONRPC: import json import urllib2 db = 'odoo9' user = 'admin' password = 'admin' request = urllib2.Request( 'http://localhost:8069/web/session/authenticate', json.dumps({ 'jsonrpc': '2.0', 'params': { 'db': db, 'login': user, 'password': password, }, }), {'Content-type': 'application/json'}) result = urllib2.urlopen(request).read() result = json.loads(result) session_id = result['result']['session_id'] request = urllib2.Request( 'http://localhost:8069/web/dataset/call_kw', json.dumps({ 'jsonrpc': '2.0', 'params': { 'model': 'ir.module.module', 'method': 'search_read', 'args': [ [('state', '=', 'installed')], ['name'], ], 'kwargs': {'context': {'lang': 'fr_FR'}}, }, }), { 'X-Openerp-Session-Id': session_id, 'Content-type': 'application/json', }) result = urllib2.urlopen(request).read() result = json.loads(result) for module in result['result']: print module['name'] Both code snippets will print a list of installed modules, and because they pass a context that sets the language to French, the list will be in French if there are no translations available. How it works… Both snippets call the function search_read, which is very convenient because you can specify a search domain on the model you call, pass a list of fields you want to be returned, and receive the result in one request. In older versions of Odoo, you had to call search first to receive a list of IDs and then call read to actually read the data. search_read returns a list of dictionaries, with the keys being the names of the fields requested and the values the record's data. The ID field will always be transmitted, no matter if you requested it or not. Now, we need to look at the specifics of the two protocols. XMLRPC The XMLRPC API expects a user ID and a password for every call, which is why we need to fetch this ID via the method authenticate on the path /xmlrpc/2/common. If you already know the user's ID, you can skip this step. As soon as you know the user's ID, you can call any model's method by calling execute_kw on the path /xmlrpc/2/object. This method expects the database you want to execute the function on, the user's ID and password for authentication, then the model you want to call your function on, and then the function's name. The next two mandatory parameters are a list of positional arguments to your function, and a dictionary of keyword arguments. JSONRPC Don't be distracted by the size of the code example, that's because Python doesn't have built in support for JSONRPC. As soon as you've wrapped the urllib calls in some helper functions, the example will be as concise as the XMLRPC one. As JSONRPC is stateful, the first thing we have to do is to request a session at /web/session/authenticate. This function takes the database, the user's name, and their password. 
The crucial part here is that we record the session ID Odoo created, which we pass in the X-Openerp-Session-Id header to /web/dataset/call_kw. The function then behaves the same as execute_kw from the XML-RPC API: we need to pass a model name and a function to call on it, then positional and keyword arguments.

There's more…

Both protocols allow you to call basically any function of your models. In case you don't want a function to be available via either interface, prepend its name with an underscore; Odoo won't expose such functions as RPC calls. Furthermore, you need to take care that your parameters, as well as the return values, are serializable for the protocol. To be sure, restrict yourself to scalar values, dictionaries, and lists. As you can do roughly the same with both protocols, it's up to you which one to use. This decision should be driven mainly by what your platform supports best. In a web context, you're generally better off with JSON, because Odoo allows JSON handlers to pass a CORS header conveniently (see the Make a path accessible from the network recipe for details). This is rather difficult with XMLRPC.
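To illustrate the helper-function remark from the JSONRPC section, here is a hedged refactoring sketch of the earlier script; the json_rpc name is ours and error handling is left out.

```python
import json
import urllib2


def json_rpc(url, params, session_id=None):
    # Wrap the JSON-RPC 2.0 boilerplate from the example above in one function.
    headers = {'Content-type': 'application/json'}
    if session_id:
        headers['X-Openerp-Session-Id'] = session_id
    payload = json.dumps({'jsonrpc': '2.0', 'params': params})
    return json.loads(
        urllib2.urlopen(urllib2.Request(url, payload, headers)).read())['result']


# Authenticate once, then reuse the returned session ID for further calls.
auth = json_rpc('http://localhost:8069/web/session/authenticate',
                {'db': 'odoo9', 'login': 'admin', 'password': 'admin'})
modules = json_rpc('http://localhost:8069/web/dataset/call_kw',
                   {'model': 'ir.module.module', 'method': 'search_read',
                    'args': [[('state', '=', 'installed')], ['name']],
                    'kwargs': {'context': {'lang': 'fr_FR'}}},
                   session_id=auth['session_id'])
for module in modules:
    print module['name']
```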
Summary

In this article, we looked at the web server architecture and then covered routes and controllers and their authentication, how handlers consume parameters, and how to use the RPC API, namely JSON-RPC and XML-RPC.

Resources for Article:

Further resources on this subject:
- Advanced React [article]
- Remote Authentication [article]
- ASP.Net Site Performance: Improving JavaScript Loading [article]


Building Our First Poky Image for the Raspberry Pi

Packt
14 Apr 2016
12 min read
In this article, Pierre-Jean Texier, the author of the book Yocto for Raspberry Pi, covers the basic concepts of the Poky workflow. Using the Linux command line, we will work through the different steps: downloading, configuring, and preparing the Poky Raspberry Pi environment, and generating an image that can be used by the target.

Installing the required packages for the host system

The steps necessary for the configuration of the host system depend on the Linux distribution used. It is advisable to use one of the Linux distributions maintained and supported by Poky, to avoid wasting time and energy setting up the host system. Currently, the Yocto Project is supported on the following distributions:

- Ubuntu 12.04 (LTS)
- Ubuntu 13.10
- Ubuntu 14.04 (LTS)
- Fedora release 19 (Schrödinger's Cat)
- Fedora release 21
- CentOS release 6.4
- CentOS release 7.0
- Debian GNU/Linux 7.0 (Wheezy)
- Debian GNU/Linux 7.1 (Wheezy)
- Debian GNU/Linux 7.2 (Wheezy)
- Debian GNU/Linux 7.3 (Wheezy)
- Debian GNU/Linux 7.4 (Wheezy)
- Debian GNU/Linux 7.5 (Wheezy)
- Debian GNU/Linux 7.6 (Wheezy)
- openSUSE 12.2
- openSUSE 12.3
- openSUSE 13.1

Even if your distribution is not listed here, it does not mean that Poky will not work, but the outcome cannot be guaranteed. If you want more information about Linux distributions, you can visit this link: http://www.yoctoproject.org/docs/current/ref-manual/ref-manual.html

Poky on Ubuntu

The following list shows the required packages by function, given a supported Ubuntu or Debian Linux distribution. The dependencies for a compatible environment include:

- Download tools: wget and git-core
- Decompression tools: unzip and tar
- Compilation tools: gcc-multilib, build-essential, and chrpath
- String-manipulation tools: sed and gawk
- Document-management tools: texinfo, xsltproc, docbook-utils, fop, dblatex, and xmlto
- Patch-management tools: patch and diffstat

Here is the command to type on a headless system:

    $ sudo apt-get install gawk wget git-core diffstat unzip texinfo gcc-multilib build-essential chrpath

Poky on Fedora

If you want to use Fedora, you just have to type this command:

    $ sudo yum install gawk make wget tar bzip2 gzip python unzip perl patch diffutils diffstat git cpp gcc gcc-c++ glibc-devel texinfo chrpath ccache perl-Data-Dumper perl-Text-ParseWords perl-Thread-Queue socat

Downloading the Poky metadata

After installing all the necessary packages, it is time to download the Poky sources. This is done through the git tool, as follows (on the master branch):

    $ git clone git://git.yoctoproject.org/poky

Another method is to download the tar.bz2 file directly from this repository: https://www.yoctoproject.org/downloads

To avoid hazardous and problematic manipulations, it is strongly recommended to create and switch to a specific local branch. Use these commands:

    $ cd poky
    $ git checkout daisy -b work_branch

Downloading the Raspberry Pi BSP metadata

At this stage, we only have the base of the reference system (Poky), and we have no support for the Broadcom BCM SoC.
Basically, the BSP proposed by Poky only offers the following targets:

    $ ls meta/conf/machine/*.conf
    beaglebone.conf  edgerouter.conf  genericx86-64.conf  genericx86.conf  mpc8315e-rdb.conf

This is in addition to those provided by OE-Core:

    $ ls meta/conf/machine/*.conf
    qemuarm64.conf  qemuarm.conf  qemumips64.conf  qemumips.conf  qemuppc.conf  qemux86-64.conf  qemux86.conf

In order to generate a compatible system for our target, download the specific layer (the BSP layer) for the Raspberry Pi:

    $ git clone git://git.yoctoproject.org/meta-raspberrypi

If you want to learn more about Git, you can visit the official website: http://git-scm.com/

Now we can verify whether we have the configuration metadata for our platform (the raspberrypi.conf file):

    $ ls meta-raspberrypi/conf/machine/*.conf
    raspberrypi.conf

This screenshot shows the meta-raspberrypi folder. The examples and code presented in this article use Yocto Project version 1.7 and Poky version 12.0; for reference, the codename is Dizzy. Now that we have our environment freshly downloaded, we can proceed with its initialization and the configuration of our image through various configuration files.

The oe-init-build-env script

As can be seen in the screenshot, the Poky directory contains a script named oe-init-build-env. This is a script for the configuration and initialization of the build environment. It is not intended to be executed but must be "sourced". Its work, among other things, is to initialize a certain number of environment variables and place you in the build directory passed as its argument. The script must be run as shown here:

    $ source oe-init-build-env [build-directory]

Here, build-directory is an optional parameter for the name of the directory where the environment is set up (for example, we can use several build directories in a single Poky source tree); if it is not given, it defaults to build. The build-directory folder is the place where we perform the builds. In order to standardize the steps, we will use the following command throughout to initialize our environment:

    $ source oe-init-build-env rpi-build
    ### Shell environment set up for builds. ###
    You can now run 'bitbake <target>'

    Common targets are:
        core-image-minimal
        core-image-sato
        meta-toolchain
        adt-installer
        meta-ide-support

    You can also run generated qemu images with a command like 'runqemu qemux86'

When we initialize a build environment, the script creates a directory called conf inside rpi-build. This folder contains two important files:

- local.conf: This contains parameters to configure BitBake behavior.
- bblayers.conf: This lists the different layers that BitBake takes into account in its implementation. The list is assigned to the BBLAYERS variable.

Editing the local.conf file

The local.conf file under rpi-build/conf/ can configure every aspect of the build process. It is through this file that we can choose the target machine (the MACHINE variable), the distribution (the DISTRO variable), the type of package (the PACKAGE_CLASSES variable), and the host configuration (PARALLEL_MAKE, for example). The minimal set of variables we have to change from the default is the following:

    BB_NUMBER_THREADS ?= "${@oe.utils.cpu_count()}"
    PARALLEL_MAKE ?= "-j ${@oe.utils.cpu_count()}"
    MACHINE ?= "raspberrypi"

The BB_NUMBER_THREADS variable determines the number of tasks that BitBake will perform in parallel (tasks under Yocto; we're not necessarily talking only about compilation).
By default, in build/conf/local.conf, this variable is initialized with ${@oe.utils.cpu_count()}, corresponding to the number of cores detected on the host system (/proc/cpuinfo). The PARALLEL_MAKE variable corresponds to the -j option of make, which specifies the number of processes that GNU Make can run in parallel on a compilation task. Again, it is the number of cores present that defines the default value used. The MACHINE variable is where we set the target machine we wish to build for, the Raspberry Pi (defined in a .conf file; in our case, raspberrypi.conf).

Editing the bblayers.conf file

Now, we still have to add the layer specific to our target. This will have the effect of making the recipes from this layer available to our build. Therefore, we should edit the build/conf/bblayers.conf file:

    # LAYER_CONF_VERSION is increased each time build/conf/bblayers.conf
    # changes incompatibly
    LCONF_VERSION = "6"

    BBPATH = "${TOPDIR}"
    BBFILES ?= ""

    BBLAYERS ?= " \
      /home/packt/RASPBERRYPI/poky/meta \
      /home/packt/RASPBERRYPI/poky/meta-yocto \
      /home/packt/RASPBERRYPI/poky/meta-yocto-bsp \
      "
    BBLAYERS_NON_REMOVABLE ?= " \
      /home/packt/RASPBERRYPI/poky/meta \
      /home/packt/RASPBERRYPI/poky/meta-yocto \
      "

Add the meta-raspberrypi line so that BBLAYERS becomes:

    BBLAYERS ?= " \
      /home/packt/RASPBERRYPI/poky/meta \
      /home/packt/RASPBERRYPI/poky/meta-yocto \
      /home/packt/RASPBERRYPI/poky/meta-yocto-bsp \
      /home/packt/RASPBERRYPI/poky/meta-raspberrypi \
      "

Naturally, you have to adapt the absolute path (/home/packt/RASPBERRYPI here) depending on your own installation.

Building the Poky image

At this stage, we have to look at the available images and check whether they are compatible with our platform (.bb files).

Choosing the image

Poky provides several predesigned image recipes that we can use to build our own binary image. We can check the list of available images by running the following command from the poky directory:

    $ ls meta*/recipes*/images/*.bb

All the recipes provide images which are, in essence, sets of unpacked and configured packages, generating a filesystem that we can use on actual hardware (for further information about the different images, you can visit http://www.yoctoproject.org/docs/latest/mega-manual/mega-manual.html#ref-images). Here is a small representation of the available images. To these we can add the images provided by meta-raspberrypi:

    $ ls meta-raspberrypi/recipes-core/images/*.bb
    rpi-basic-image.bb  rpi-hwup-image.bb  rpi-test-image.bb

Here is an explanation of these images:

- rpi-hwup-image.bb: This is an image based on core-image-minimal.
- rpi-basic-image.bb: This is an image based on rpi-hwup-image.bb, with some added features (a splash screen).
- rpi-test-image.bb: This is an image based on rpi-basic-image.bb, which includes some packages present in meta-raspberrypi.

We will use one of these three recipes for the rest of this article. Note that these files (.bb) describe recipes, like all the others. They are organized logically, and here we have the ones for creating an image for the Raspberry Pi.
Running BitBake At this point, what remains for us is to start the build engine Bitbake, which will parse all the recipes that contain the image you pass as a parameter (as an initial example, we can take rpi-basic-image): $ bitbake rpi-basic-image Loading cache: 100% |########################################################################################################################################################################| ETA:  00:00:00 Loaded 1352 entries from dependency cache. NOTE: Resolving any missing task queue dependencies Build Configuration: BB_VERSION        = "1.25.0" BUILD_SYS         = "x86_64-linux" NATIVELSBSTRING   = "Ubuntu-14.04" TARGET_SYS        = "arm-poky-linux-gnueabi" MACHINE           = "raspberrypi" DISTRO            = "poky" DISTRO_VERSION    = "1.7" TUNE_FEATURES     = "arm armv6 vfp" TARGET_FPU        = "vfp" meta              meta-yocto        meta-yocto-bsp    = "master:08d3f44d784e06f461b7d83ae9262566f1cf09e4" meta-raspberrypi  = "master:6c6f44136f7e1c97bc45be118a48bd9b1fef1072" NOTE: Preparing RunQueue NOTE: Executing SetScene Tasks NOTE: Executing RunQueue Tasks Once launched, BitBake begins by browsing all the (.bb and .bbclass)files that the environment provides access to and stores the information in a cache. Because the parser of BitBake is parallelized, the first execution will always be longer because it has to build the cache (only about a few seconds longer). However, subsequent executions will be almost instantaneous, because BitBake will load the cache. As we can see from the previous command, before executing the task list, BitBake displays a trace that details the versions used (target, version, OS, and so on). Finally, BitBake starts the execution of tasks and shows us the progress. Depending on your setup, you can go drink some coffee or even eat some pizza. Usually after this, , if all goes well, you will be pleased to see that the tmp/subdirectory's directory construction (rpi-build) is generally populated. The build directory (rpi-build) contains about20 GB after the creation of the image. After a few hours of baking, we can rejoice with the result and the creation of the system image for our target: $ ls rpi-build/tmp/deploy/images/raspberrypi/*sdimg rpi-basic-image-raspberrypi.rpi-sdimg This is this file that we will use to create our bootable SD card. Creating a bootable SD card Now that our environment is complete, you can create a bootable SD card with the following command (remember to change /dev/sdX to the proper device name and be careful not to kill your hard disk by selecting the wrong device name): $ sudo dd if=rpi-basic-image-raspberrypi.rpi-sdimg of=/dev/sdX bs=1M Once the copying is complete, you can check whether the operation was successful using the following command (look at mmcblk0): $ lsblk NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT mmcblk0     179:0    0   3,7G  0 disk ├─mmcblk0p1 179:1    0    20M  0 part /media/packt/raspberrypi └─mmcblk0p2 179:2    0   108M  0 part /media/packt/f075d6df-d8b8-4e85-a2e4-36f3d4035c3c You can also look at the left-hand side of your interface: Booting the image on the Raspberry Pi This is surely the most anticipated moment of this article—the moment where we boot our Raspberry Pi with a fresh Poky image. 
You just have to insert your SD card in a slot, connect the HDMI cable to your monitor, and connect the power supply (it is also recommended to use a mouse and a keyboard to shut down the device, unless you plan on just pulling the plug and possibly corrupting the boot partition). After connecting the power supply, you should see the Raspberry Pi splash screen: The login for the Yocto/Poky distribution is root. Summary In this article, we learned the steps needed to set up Poky and get our first image built. We ran that image on the Raspberry Pi, which gave us a good overview of the available capabilities. Resources for Article: Further resources on this subject: Programming on Raspbian [article] Working with a Webcam and Pi Camera [article] Creating a Supercomputer [article]

Probabilistic Graphical Models in R

Packt
14 Apr 2016
18 min read
In this article by David Bellot, author of the book, Learning Probabilistic Graphical Models in R, explains that among all the predictions that were made about the 21st century, we may not have expected that we would collect such a formidable amount of data about everything, everyday, and everywhere in the world. The past years have seen an incredible explosion of data collection about our world and lives, and technology is the main driver of what we can certainly call a revolution. We live in the age of information. However, collecting data is nothing if we don't exploit it and if we don't try to extract knowledge out of it. At the beginning of the 20th century, with the birth of statistics, the world was all about collecting data and making statistics. Back then, the only reliable tools were pencils and papers and, of course, the eyes and ears of the observers. Scientific observation was still in its infancy despite the prodigious development of the 19th century. (For more resources related to this topic, see here.) More than a hundred years later, we have computers, electronic sensors, massive data storage, and we are able to store huge amounts of data continuously, not only about our physical world but also about our lives, mainly through the use of social networks, Internet, and mobile phones. Moreover, the density of our storage technology increased so much that we can, nowadays, store months if not years of data into a very small volume that can fit in the palm of our hand. Among all the tools and theories that have been developed to analyze, understand, and manipulate probability and statistics became one of the most used. In this field, we are interested in a special, versatile, and powerful class of models called the probabilistic graphical models (PGM, for short). Probabilistic graphical model is a tool to represent beliefs and uncertain knowledge about facts and events using probabilities. It is also one of the most advanced machine learning techniques nowadays and has many industrial success stories. They can deal with our imperfect knowledge about the world because our knowledge is always limited. We can't observe everything, and we can't represent the entire universe in a computer. We are intrinsically limited as human beings and so are our computers. With Probabilistic Graphical Models, we can build simple learning algorithms or complex expert systems. With new data, we can improve these models and refine them as much as we can, and we can also infer new information or make predictions about unseen situations and events. Probabilistic Graphical Models, seen from the point of view of mathematics, are a way to represent a probability distribution over several variables, which is called a joint probability distribution. In a PGM, such knowledge between variables can be represented with a graph, that is, nodes connected by edges with a specific meaning associated to it. Let's consider an example from the medical world: how to diagnose a cold. This is an example and by no means a medical advice. It is oversimplified for the sake of simplicity. We define several random variables such as the following: Se: This means season of the year N: This means that the nose is blocked H: This means the patient has a headache S: This means that the patient regularly sneezes C: This means that the patient coughs Cold: This means the patient has a cold. Because each of the symptoms can exist at different degrees, it is natural to represent the variable as random variables. 
For example, if the patient's nose is a bit blocked, we will assign a probability of, say, 60% to this variable, that is P(N=blocked)=0.6 and P(N=not blocked)=0.4. In this example, the probability distribution P(Se,N,H,S,C,Cold) will require 4 * 25 = 128 values in total (4 values for season and 2 values for each of the other random variables). It's quite a lot, and honestly, it's quite difficult to determine things such as the probability that the nose is not blocked, the patient has a headache, the patient sneeze, and so on. However, we can say that a headache is not directly related to cough of a blocked nose, expect when the patient has a cold. Indeed, the patient can have a headache for many other reasons. Moreover, we can say that the Season has quite a direct effect on Sneezing, blocked nose, or Cough but less or no direct effect on Headache. In a Probabilistic Graphical Model, we will represent these dependency relationships with a graph, as follow, where each random variable is a node in the graph, and each relationship is an arrow between 2 nodes: In the graph that follows, there is a direct relation between each node and each variable of the Probabilistic Graphical Model and also a direct relation between arrows and the way we can simplify the joint probability distribution in order to make it tractable. Using a graph as a model to simplify a complex (and sometimes complicated) distribution presents numerous benefits: As we observed in the previous example, and in general when we model a problem, the random variables interacts directly with only a small subsets of other random variables. Therefore, this promotes more compact and tractable models The knowledge and dependencies represented in a graph are easy to understand and communicate The graph induces a compact representation of the joint probability distribution and it is easy to make computations with Algorithms to draw inferences and learn can use the graph theory and the associated algorithms to improve and facilitate all the inference and learning algorithms. Compared to the raw joint probability distribution, using a PGM will speed up computations by several order of magnitude. The junction tree algorithm The Junction Tree Algorithm is one of the main algorithms to do inference on PGM. Its name arises from the fact that before doing the numerical computations, we will transform the graph of the PGM into a tree with a set of properties that allow for efficient computations of posterior probabilities. One of the main aspects is that this algorithm will not only compute the posterior distribution of the variables in the query, but also the posterior distribution of all other variables that are not observed. Therefore, for the same computational price, one can have any posterior distribution. Implementing a junction tree algorithm is a complex task, but fortunately, several R packages contain a full implementation, for example, gRain. Let's say we have several variables A, B, C, D, E,and F. We will consider for the sake of simplicity that each variable is binary so that we won't have too many values to deal with. 
We will assume the following factorization: This is represented by the following graph: We first start by loading the gRain package into R: library(gRain) Then, we create our set of random variables from A to F: val=c(“true”,”false”) F = cptable(~F, values=c(10,90),levels=val) C = cptable(~C|F, values=c(10,90,20,80),levels=val) E = cptable(~E|F, values=c(50,50,30,70),levels=val) A = cptable(~A|C, values=c(50,50,70,30),levels=val) D = cptable(~D|E, values=c(60,40,70,30),levels=val) B = cptable(~B|A:D, values=c(60,40,70,30,20,80,10,90),levels=val) The cptable function creates a conditional probability table, which is a factor for discrete variables. The probabilities associated to each variable are purely subjective and only serve the purpose of the example. The next step is to compute the junction tree. In most packages, computing the junction tree is done by calling one function because the algorithm just does everything at once: plist = compileCPT(list(F,E,C,A,D,B)) plist Also, we check whether the list of variable is correctly compiled into a probabilistic graphical model and we obtain from the previous code: CPTspec with probabilities:  P( F )  P( E | F )  P( C | F )  P( A | C )  P( D | E )  P( B | A D ) This is indeed the factorization of our distribution, as stated earlier. If we want to check further, we can look at the conditional probability table of a few variables: print(plist$F) print(plist$B) F  true false 0.1   0.9 , , D = true        A B       true false   true   0.6   0.7   false  0.4   0.3 , , D = false          A B       true false   true   0.2   0.1   false  0.8   0.9 The second output is a bit more complex, but if you look carefully, you will see that you have two distributions, P(B|A,D=true) and P(B|A,D=false) which is more readable presentation of P(B|A,D). We finally create the graph and run the junction tree algorithm by calling this: jtree = grain(plist) Again, when we check the result, we obtain: jtree Independence network: Compiled: FALSE Propagated: FALSE   Nodes: chr [1:6] "F" "E" "C" "A" "D" "B" We only need to compute the junction tree once. Then, all queries can be computed with the same junction tree. Of course, if you change the graph, then you need to recompute the junction tree. Let's perform a few queries: querygrain(jtree, nodes=c("F"), type="marginal") $F F  true false 0.1   0.9 Of course, if you ask for the marginal distribution of F, you will obtain the initial conditional probability table because F has no parents.  querygrain(jtree, nodes=c("C"), type="marginal") $C C  true false 0.19  0.81 This is more interesting because it computes the marginal of C while we only stated the conditional distribution of C given F. We didn't need to have such a complex algorithm as the junction tree algorithm to compute such a small marginal. We saw the variable elimination algorithm earlier and that would be enough too. But if you ask for the marginal of B, then the variable elimination will not work because of the loop in the graph. 
However, the junction tree will give the following: querygrain(jtree, nodes=c("B"), type="marginal") $B B     true    false 0.478564 0.521436   And, can ask more complex distribution, such as the joint distribution of B and A: querygrain(jtree, nodes=c("A","B"), type="joint")        B A           true    false   true  0.309272 0.352728   false 0.169292 0.168708 In fact, any combination can be given like A,B,C: querygrain(jtree, nodes=c("A","B","C"), type="joint") , , B = true          A C           true    false   true  0.044420 0.047630   false 0.264852 0.121662   , , B = false          A C           true    false   true  0.050580 0.047370   false 0.302148 0.121338 Now, we want to observe a variable and compute the posterior distribution. Let's say F=true and we want to propagate this information down to the rest of the network: jtree2 = setEvidence(jtree, evidence=list(F="true")) We can ask for any joint or marginal now: querygrain(jtree, nodes=c("A"), type="marginal") $A A  true false 0.662 0.338 querygrain(jtree2, nodes=c("A"), type="marginal") $A A  true false  0.68  0.32 Here, we see that knowing that F=true changed the marginal distribution on A from its previous marginal (the second query is again with jtree2, the tree with an evidence). And, we can query any other variable: querygrain(jtree, nodes=c("B"), type="marginal") $B B     true    false 0.478564 0.521436   querygrain(jtree2, nodes=c("B"), type="marginal") $B B   true  false 0.4696 0.5304 Learning Building a Probabilistic Graphical Model, generally, requires three steps: defining the random variables, which are the nodes of the graph as well; defining the structure of the graph; and finally defining the numerical parameters of each local distribution. So far, the last step has been done manually and we gave numerical values to each local probability distribution by hand. In many cases, we have access to a wealth of data and we can find the numerical values of those parameters with a method called parameters learning. In other fields, it is also called parameters fitting or model calibration. Learning parameters can be done with several approaches and there is no ultimate solution to the problem because it depends on the goal where the model's user wants to reach. Nevertheless, it is common to use the notion of Maximum Likelihood of a model and also Maximum A Posteriori. As you are now used to the notion of prior and posterior of a distribution, you can already guess what a maximum a posteriori can do. Many algorithms are used, among which we can cite the Expectation Maximization algorithm (EM), which computes the maximum likelihood of a model even when data is missing or variables are not observed at all. It is a very important algorithm, especially for mixture models. A graphical model of a linear model PGM can be used to represent standard statistical models and then extend them. One famous example is the linear regression mode. We can visualize the structure of a linear mode and better understand the relationships between the variable. The linear model captures the relationships between observable variables xand a target variable y. This relation is modeled by a set of parameters, θ. But remember the distribution of y for each data point indexed by i: Here, Xiis a row vector for which the first element is always one to capture the intercept of the linear model. 
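The two formulas referred to in this passage were rendered as images in the original. Under the standard Gaussian linear model, which is what the text describes, they can be reconstructed as follows (the notation is an assumption, not the book's own):

y_i \mid X_i, \theta \;\sim\; \mathcal{N}\left(X_i \beta,\ \sigma^2\right), \qquad \theta = (\beta, \sigma^2)

\mathcal{L}(\theta) \;=\; p(y \mid X, \theta) \;=\; \prod_{i=1}^{N} \mathcal{N}\left(y_i \mid X_i \beta,\ \sigma^2\right)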
The parameter θ in the following graph is itself composed of the intercept, the coefficient β for each component of X, and the variance σ2 of in the distribution of yi. The PGM for an observation of a linear model can be represented as follows: So, this decomposition leads us to a second version of the graphical model in which we explicitly separate the components of θ: In a PGM, when a rectangle is drawn around a set of nodes with a number or variables in a corner (N for example), it means that the same graph is repeated many times. The likelihood function of a linear model is    , and it can be represented as a PGM. And, the vector β can also be decomposed it into its univariate components too: In this last iterations of the graphical model, we see that the parameters β could have a prior probability on it instead of being fixed. In fact, the parameter  can also be considered as a random variable. For the time being, we will keep it fixed. Latent Dirichlet Allocation The last model we want to show in this article is called the Latent Dirichlet Allocation. It is a generative model that can be represented as a graphical model. It's based on the same idea as the mixture model with one notable exception. In this model, we assume that the data points might be generated by a combination of clusters and not just one cluster at a time, as it was the case before. The LDA model is primarily used in text analysis and classification. Let's consider that a text document is composed of words making sentences and paragraphs. To simplify the problem we can say that each sentence or paragraph is about one specific topic, such as science, animals, sports, and s on. Topics can also be more specific, such as cat topic or European soccer topic. Therefore, there are words that are more likely to come from specific topics. For example, the work cat is likely to come from the topic cat topic. The word stadium is likely to come from the topic European soccer. However, the word ball should come with a higher probability from the topic European soccer, but it is not unlikely to come from the topic cat, because cats like to play with balls too. So, it seems the word ball might belong to two topics at the same time with a different degree of certainty. Other words such as table will certainly belong equally to both topics and presumably to others. They are very generic; expect, of course, if we introduce another topics such as furniture. A document is a collection of words, so a document can have complex relationships with a set of topics. But in the end, it is more likely to see words coming from the same topic or the same topics within a paragraph and to some extent to the document. In general, we model a document with a bag of words model, that is, we consider a document to be a randomly generated set of words, using a specific distribution over the words. If this distribution is uniform over all the words, then the document will be purely random without a specific meaning. However, if this distribution has a specific form, with more probability mass to related words, then the collection of words generated by this model will have a meaning. Of course, generating documents is not really the application we have in mind for such a model. What we are interested in is the analysis of documents, their classification, and automatic understanding. Let's say is  a categorical variable (in other words, a histogram), representing the probability of appearance of all words from a dictionary. 
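The LDA equations that the rest of this section walks through also appeared as figures in the original. For reference, in the standard LDA formulation they read as follows (notation assumed: alpha and beta are the topic and word-distribution parameters, theta the topic mixture, z_n the topic of word w_n, N the number of words in a document, and M the number of documents in the collection D):

p(\theta, z, w \mid \alpha, \beta) \;=\; p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)

p(w \mid \alpha, \beta) \;=\; \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta

p(D \mid \alpha, \beta) \;=\; \prod_{d=1}^{M} p(w_d \mid \alpha, \beta)

p(\theta, z \mid w, \alpha, \beta) \;=\; \frac{p(\theta, z, w \mid \alpha, \beta)}{p(w \mid \alpha, \beta)}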
Usually, in this kind of model, we restrict ourselves to long words only and remove the small words, like and, to, but, the, a, and so onThese words are usually called stop words. Let w_jbe the jth words in a document. The following three graphs show the progression from representing a document (left-most graph) to representing a collection of documents (the third graph): Let  be a distribution over topics, then in the second graph from the left, we extend this model by choosing the kind of topic that will be selected at any time and then generate a word out of it. Therefore, the variable zi now becomes the variable zij, that is, the topic iis selected for the word j. We can go even further and decide that we want to model a collection of documents, which seems natural if we consider that we have a big data set. Assuming that documents are i.i.d, the next step (the third graph) is a PGM that represents the generative model for M documents. And, because the distribution on  is categorical, we want to be Bayesian about it, mainly because it will help to model not to overfit and because we consider the selection of topics for a document to be a random process. Moreover, we want to apply the same treatment to the word variable by having a Dirichlet prior. This prior is used to avoid non-observed words that have zero probability. It smooths the distribution of words per topic. A uniform Dirichlet prior will induce a uniform prior distribution on all the words. And therefore, the final graph on the right represents the complete model. This is quite a complex graphical model but techniques have been developed to fit the parameters and use this model. If we follow this graphical model carefully, we have a process that generates documents based on a certain set of topics: α chooses the set of topics for a documents From θ, we generate a topic zij From this topic, we generate a word wj In this model, only the words are observable. All the other variables will have to be determined without observation, exactly like in the other mixture models. So, documents are represented as random mixtures over latent topics, in which each topic is represented as a distribution over words. The distribution of a topic mixture based on this graphical mode can be written as follows: You can see in this formula that for each word, we select a topic, hence the product from 1 to N. Integrating over θ and summing over z, the marginal distribution of a document is as follows: The final distribution can be obtained by taking the product of marginal distributions of single documents, so as to get the distribution over a collection of documents (assuming that documents are independently and identically distributed). Here, D is the collection of documents: The main problem to be solved now is how to compute the posterior distribution over θ and z, given a document. By applying the Bayes formula, we know the following: Unfortunately, this is intractable because of the normalization factor at the denominator. The original paper on LDA, therefore, refers to a technique called Variational inference, which aims at transforming a complex Bayesian inference problem into a simpler approximation which can be solved as an (convex) optimization problem. This technique is the third approach to Bayesian inference and has been used on many other problems. Summary The probabilistic graphical model framework offers a powerful and versatile framework to develop and extend many probabilistic models using an elegant graph-based formalism. 
It has many applications in biology, genomics, medicine, finance, robotics, computer vision, automation, engineering, law, and games. Many packages in R exist to deal with all sorts of models and data, among which gRain and RStan are very popular. Resources for Article: Further resources on this subject: Extending ElasticSearch with Scripting [article] Exception Handling in MySQL for Python [article] Breaking the Bank [article]

Step Detector and Step Counters Sensors

Packt
14 Apr 2016
13 min read
In this article by Varun Nagpal, author of the book, Android Sensor Programming By Example, we will focus on learning about the use of step detector and step counter sensors. These sensors are very similar to each other and are used to count the steps. Both the sensors are based on a common hardware sensor, which internally uses accelerometer, but Android still treats them as logically separate sensors. Both of these sensors are highly battery optimized and consume very low power. Now, lets look at each individual sensor in detail. (For more resources related to this topic, see here.) In this article by Varun Nagpal, author of the book, Android Sensor Programming By Example, we will focus on learning about the use of step detector and step counter sensors. These sensors are very similar to each other and are used to count the steps. Both the sensors are based on a common hardware sensor, which internally uses accelerometer, but Android still treats them as logically separate sensors. Both of these sensors are highly battery optimized and consume very low power. Now, lets look at each individual sensor in detail. The step counter sensor The step counter sensor is used to get the total number of steps taken by the user since the last reboot (power on) of the phone. When the phone is restarted, the value of the step counter sensor is reset to zero. In the onSensorChanged() method, the number of steps is give by event.value[0]; although it's a float value, the fractional part is always zero. The event timestamp represents the time at which the last step was taken. This sensor is especially useful for those applications that don't want to run in the background and maintain the history of steps themselves. This sensor works in batches and in continuous mode. If we specify 0 or no latency in the SensorManager.registerListener() method, then it works in a continuous mode; otherwise, if we specify any latency, then it groups the events in batches and reports them at the specified latency. For prolonged usage of this sensor, it's recommended to use the batch mode, as it saves power. Step counter uses the on-change reporting mode, which means it reports the event as soon as there is change in the value. The step detector sensor The step detector sensor triggers an event each time a step is taken by the user. The value reported in the onSensorChanged() method is always one, the fractional part being always zero, and the event timestamp is the time when the user's foot hit the ground. The step detector sensor has very low latency in reporting the steps, which is generally within 1 to 2 seconds. The Step detector sensor has lower accuracy and produces more false positive, as compared to the step counter sensor. The step counter sensor is more accurate, but has more latency in reporting the steps, as it uses this extra time after each step to remove any false positive values. The step detector sensor is recommended for those applications that want to track the steps in real time and want to maintain their own history of each and every step with their timestamp. Time for action – using the step counter sensor in activity Now, you will learn how to use the step counter sensor with a simple example. The good thing about the step counter is that, unlike other sensors, your app doesn't need to tell the sensor when to start counting the steps and when to stop counting them. It automatically starts counting as soon as the phone is powered on. 
For using it, we just have to register the listener with the sensor manager and then unregister it after using it. In the following example, we will show the total number of steps taken by the user since the last reboot (power on) of the phone in the Android activity. We created a PedometerActivity and implemented it with the SensorEventListener interface, so that it can receive the sensor events. We initiated the SensorManager and Sensor object of the step counter and also checked the sensor availability in the OnCreate() method of the activity. We registered the listener in the onResume() method and unregistered it in the onPause() method as a standard practice. We used a TextView to display the total number of steps taken and update its latest value in the onSensorChanged() method. public class PedometerActivity extends Activity implements SensorEventListener{ private SensorManager mSensorManager; private Sensor mSensor; private boolean isSensorPresent = false; private TextView mStepsSinceReboot; @Override protected void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); setContentView(R.layout.activity_pedometer); mStepsSinceReboot = (TextView)findViewById(R.id.stepssincereboot); mSensorManager = (SensorManager) this.getSystemService(Context.SENSOR_SERVICE); if(mSensorManager.getDefaultSensor(Sensor.TYPE_STEP_COUNTER) != null) { mSensor = mSensorManager.getDefaultSensor(Sensor.TYPE_STEP_COUNTER); isSensorPresent = true; } else { isSensorPresent = false; } } @Override protected void onResume() { super.onResume(); if(isSensorPresent) { mSensorManager.registerListener(this, mSensor, SensorManager.SENSOR_DELAY_NORMAL); } } @Override protected void onPause() { super.onPause(); if(isSensorPresent) { mSensorManager.unregisterListener(this); } } @Override public void onSensorChanged(SensorEvent event) { mStepsSinceReboot.setText(String.valueOf(event.values[0])); } Time for action – maintaining step history with step detector sensor The Step counter sensor works well when we have to deal with the total number of steps taken by the user since the last reboot (power on) of the phone. It doesn't solve the purpose when we have to maintain history of each and every step taken by the user. The Step counter sensor may combine some steps and process them together, and it will only update with an aggregated count instead of reporting individual step detail. For such cases, the step detector sensor is the right choice. In our next example, we will use the step detector sensor to store the details of each step taken by the user, and we will show the total number of steps for each day, since the application was installed. Our next example will consist of three major components of Android, namely service, SQLite database, and activity. Android service will be used to listen to all the individual step details using the step counter sensor when the app is in the background. All the individual step details will be stored in the SQLite database and finally the activity will be used to display the list of total number of steps along with dates. Let's look at the each component in detail. The first component of our example is PedometerListActivity. We created a ListView in the activity to display the step count along with dates. Inside the onCreate() method of PedometerListActivity, we initiated the ListView and ListAdaptor required to populate the list. 
Another important task that we do in the onCreate() method is starting the service (StepsService.class), which will listen to all the individual steps' events. We also make a call to the getDataForList() method, which is responsible for fetching the data for ListView. public class PedometerListActivity extends Activity{ private ListView mSensorListView; private ListAdapter mListAdapter; @Override protected void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); setContentView(R.layout.activity_main); mSensorListView = (ListView)findViewById(R.id.steps_list); getDataForList(); mListAdapter = new ListAdapter(); mSensorListView.setAdapter(mListAdapter); Intent mStepsIntent = new Intent(getApplicationContext(), StepsService.class); startService(mStepsIntent); } In our example, the DateStepsModel class is used as a POJO (Plain Old Java Object) class, which is a handy way of grouping logical data together, to store the total number of steps and date. We also use the StepsDBHelper class to read and write the steps data in the database (discussed further in the next section). Inside the getDataForList() method, we initiated the object of the StepsDBHelper class and call the readStepsEntries() method of the StepsDBHelper class, which returns ArrayList of the DateStepsModel objects containing the total number of steps along with dates after reading from database. The ListAdapter class is used for populating the values for ListView, which internally uses ArrayList of DateStepsModel as the data source. The individual list item is the string, which is the concatenation of date and the total number of steps. class DateStepsModel { public String mDate; public int mStepCount; } private StepsDBHelper mStepsDBHelper; private ArrayList<DateStepsModel> mStepCountList; public void getDataForList() { mStepsDBHelper = new StepsDBHelper(this); mStepCountList = mStepsDBHelper.readStepsEntries(); } private class ListAdapter extends BaseAdapter{ private TextView mDateStepCountText; @Override public int getCount() { return mStepCountList.size(); } @Override public Object getItem(int position) { return mStepCountList.get(position); } @Override public long getItemId(int position) { return position; } @Override public View getView(int position, View convertView, ViewGroup parent) { if(convertView==null){ convertView = getLayoutInflater().inflate(R.layout.list_rows, parent, false); } mDateStepCountText = (TextView)convertView.findViewById(R.id.sensor_name); mDateStepCountText.setText(mStepCountList.get(position).mDate + " - Total Steps: " + String.valueOf(mStepCountList.get(position).mStepCount)); return convertView; } } The second component of our example is StepsService, which runs in the background and listens to the step detector sensor until the app is uninstalled. We implemented this service with the SensorEventListener interface so that it can receive the sensor events. We also initiated theobjects of StepsDBHelper, SensorManager, and the step detector sensor inside the OnCreate() method of the service. We only register the listener when the step detector sensor is available on the device. A point to note here is that we never unregistered the listener because we expect our app to log the step information indefinitely until the app is uninstalled. Both step detector and step counter sensors are very low on battery consumptions and are highly optimized at the hardware level, so if the app really requires, it can use them for longer durations without affecting the battery consumption much. 
We get a step detector sensor callback in the onSensorChanged() method whenever the operating system detects a step, and from CC: specify, we call the createStepsEntry() method of the StepsDBHelperclass to store the step information in the database. public class StepsService extends Service implements SensorEventListener{ private SensorManager mSensorManager; private Sensor mStepDetectorSensor; private StepsDBHelper mStepsDBHelper; @Override public void onCreate() { super.onCreate(); mSensorManager = (SensorManager) this.getSystemService(Context.SENSOR_SERVICE); if(mSensorManager.getDefaultSensor(Sensor.TYPE_STEP_DETECTOR) != null) { mStepDetectorSensor = mSensorManager.getDefaultSensor(Sensor.TYPE_STEP_DETECTOR); mSensorManager.registerListener(this, mStepDetectorSensor, SensorManager.SENSOR_DELAY_NORMAL); mStepsDBHelper = new StepsDBHelper(this); } } @Override public int onStartCommand(Intent intent, int flags, int startId) { return Service.START_STICKY; } @Override public void onSensorChanged(SensorEvent event) { mStepsDBHelper.createStepsEntry(); } The last component of our example is the SQLite database. We created a StepsDBHelper class and extended it from the SQLiteOpenHelper abstract utility class provided by the Android framework to easily manage database operations. In the class, we created a database called StepsDatabase, which is automatically created on the first object creation of the StepsDBHelper class by the OnCreate() method. This database has one table StepsSummary, which consists of only three columns (id, stepscount, and creationdate). The first column, id, is the unique integer identifier for each row of the table and is incremented automatically on creation of every new row. The second column, stepscount, is used to store the total number of steps taken for each date. The third column, creationdate, is used to store the date in the mm/dd/yyyy string format. Inside the createStepsEntry() method, we first check whether there is an existing step count with the current date, and we if find one, then we read the existing step count of the current date and update the step count by incrementing it by 1. If there is no step count with the current date found, then we assume that it is the first step of the current date and we create a new entry in the table with the current date and step count value as 1. The createStepsEntry() method is called from onSensorChanged() of the StepsService class whenever a new step is detected by the step detector sensor. 
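The full Java implementation of StepsDBHelper follows below. If it helps to see the create-or-increment logic in isolation first, here is a minimal sketch of the same idea using Python's built-in sqlite3 module; it is only an illustration (the table and column names mirror the article), not code used by the app.

import sqlite3
from datetime import date

def create_steps_entry(db_path="StepsDatabase.db"):
    # Insert or increment today's step count, mirroring createStepsEntry()
    today = date.today().strftime("%m/%d/%Y")
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS StepsSummary ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "creationdate TEXT, stepscount INTEGER)")
    row = cur.execute("SELECT stepscount FROM StepsSummary WHERE creationdate = ?",
                      (today,)).fetchone()
    if row is None:
        # First step of the day: create a new row with a count of 1
        cur.execute("INSERT INTO StepsSummary (creationdate, stepscount) VALUES (?, 1)",
                    (today,))
    else:
        # A row already exists for today: increment its count
        cur.execute("UPDATE StepsSummary SET stepscount = ? WHERE creationdate = ?",
                    (row[0] + 1, today))
    conn.commit()
    conn.close()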
public class StepsDBHelper extends SQLiteOpenHelper { private static final int DATABASE_VERSION = 1; private static final String DATABASE_NAME = "StepsDatabase"; private static final String TABLE_STEPS_SUMMARY = "StepsSummary"; private static final String ID = "id"; private static final String STEPS_COUNT = "stepscount"; private static final String CREATION_DATE = "creationdate";//Date format is mm/dd/yyyy private static final String CREATE_TABLE_STEPS_SUMMARY = "CREATE TABLE " + TABLE_STEPS_SUMMARY + "(" + ID + " INTEGER PRIMARY KEY AUTOINCREMENT," + CREATION_DATE + " TEXT,"+ STEPS_COUNT + " INTEGER"+")"; StepsDBHelper(Context context) { super(context, DATABASE_NAME, null, DATABASE_VERSION); } @Override public void onCreate(SQLiteDatabase db) { db.execSQL(CREATE_TABLE_STEPS_SUMMARY); } public boolean createStepsEntry() { boolean isDateAlreadyPresent = false; boolean createSuccessful = false; int currentDateStepCounts = 0; Calendar mCalendar = Calendar.getInstance(); String todayDate = String.valueOf(mCalendar.get(Calendar.MONTH))+"/" + String.valueOf(mCalendar.get(Calendar.DAY_OF_MONTH))+"/"+String.valueOf(mCalendar.get(Calendar.YEAR)); String selectQuery = "SELECT " + STEPS_COUNT + " FROM " + TABLE_STEPS_SUMMARY + " WHERE " + CREATION_DATE +" = '"+ todayDate+"'"; try { SQLiteDatabase db = this.getReadableDatabase(); Cursor c = db.rawQuery(selectQuery, null); if (c.moveToFirst()) { do { isDateAlreadyPresent = true; currentDateStepCounts = c.getInt((c.getColumnIndex(STEPS_COUNT))); } while (c.moveToNext()); } db.close(); } catch (Exception e) { e.printStackTrace(); } try { SQLiteDatabase db = this.getWritableDatabase(); ContentValues values = new ContentValues(); values.put(CREATION_DATE, todayDate); if(isDateAlreadyPresent) { values.put(STEPS_COUNT, ++currentDateStepCounts); int row = db.update(TABLE_STEPS_SUMMARY, values, CREATION_DATE +" = '"+ todayDate+"'", null); if(row == 1) { createSuccessful = true; } db.close(); } else { values.put(STEPS_COUNT, 1); long row = db.insert(TABLE_STEPS_SUMMARY, null, values); if(row!=-1) { createSuccessful = true; } db.close(); } } catch (Exception e) { e.printStackTrace(); } return createSuccessful; } The readStepsEntries() method is called from PedometerListActivity to display the total number of steps along with the date in the ListView. The readStepsEntries() method reads all the step counts along with their dates from the table and fills the ArrayList of DateStepsModelwhich is used as a data source for populating the ListView in PedometerListActivity. public ArrayList<DateStepsModel> readStepsEntries() { ArrayList<DateStepsModel> mStepCountList = new ArrayList<DateStepsModel>(); String selectQuery = "SELECT * FROM " + TABLE_STEPS_SUMMARY; try { SQLiteDatabase db = this.getReadableDatabase(); Cursor c = db.rawQuery(selectQuery, null); if (c.moveToFirst()) { do { DateStepsModel mDateStepsModel = new DateStepsModel(); mDateStepsModel.mDate = c.getString((c.getColumnIndex(CREATION_DATE))); mDateStepsModel.mStepCount = c.getInt((c.getColumnIndex(STEPS_COUNT))); mStepCountList.add(mDateStepsModel); } while (c.moveToNext()); } db.close(); } catch (Exception e) { e.printStackTrace(); } return mStepCountList; } What just happened? We created a small pedometer utility app that maintains the step history along with dates using the steps detector sensor. We used PedometerListActivityto display the list of the total number of steps along with their dates. StepsServiceis used to listen to all the steps detected by the step detector sensor in the background. 
And finally, the StepsDBHelper class is used to create and update the total step count for each date and to read the total step counts along with dates from the database. Resources for Article: Further resources on this subject: Introducing the Android UI [article] Building your first Android Wear Application [article] Mobile Phone Forensics – A First Step into Android Forensics [article]

Remote Authentication

Packt
14 Apr 2016
9 min read
When setting up a Linux system, security is supposed to be an important part of all the stages. A good knowledge of the fundamentals of Linux is essential to implement a good security policy on the machine. In this article by Tajinder Pal Singh Kalsi, author of the book, Practical Linux Security Cookbook, we will discuss the following topics: Remote server / Host access using SSH SSH root login disable or enable Key based Login into SSH for restricting remote access (For more resources related to this topic, see here.) Remote server / host access using SSH SSH or Secure Shell is a protocol which is used to log onto remote systems securely and is the most used method for accessing remote Linux systems. Getting ready To see how to use SSH, we need two Ubuntu systems. One will be used as server and the other as client. How to do it… To use SSH we can use freely available software called—OpenSSH. Once the software is installed it can be used by the command ssh, on the Linux system. We will see how to use this tool in detail. If the software to use SSH is not already installed we have to install it on both the server and the client system. The command to install the tool on the server system is: sudo apt-get install openssh-server The output obtained will be as follows: Next we need to install the client version of the software: sudo apt-get install openssh-client The output obtained will be as follows: For latest versions ssh service starts running as soon as the software is installed. If it is not running by default, we can start the service by using the command: sudo service ssh start The output obtained will be as follows: Now if we want to login from the client system to the server system, the command will be as follows: ssh remote_ip_address Here remote_ip_address refers to the IP address of the server system. Also this command assumes that the username on the client machine is the same as that on the server machine: ssh remote_ip_address If we want to login for different user, the command will be as follows: ssh username@remote_ip_address The output obtained will be as follows: Next we need to configure SSH to use it as per our requirements. The main configuration file for sshd in Ubuntu is located at /etc/ssh/sshd_config. Before making any changes to the original version of this file, create a backup using the command: sudo cp /etc/ssh/sshd_config{,.bak} The configuration file defines the default settings for SSH on the server system. When we open the file in any editor, we can see that the default port declaration on which the sshd server listens for the incoming connections is 22. We can change this to any non-standard port to secure the server from random port scans, hence making it more secure. Suppose we change the port to 888, then next time the client wants to connect to the SSH server, the command will be as follows: ssh -p port_numberremote_ip_address The output obtained will be as follows: As we can see when we run the command without specifying the port number, the connection is refused. Next when we mention the correct port number, the connection is established. How it works… SSH is used to connect a client program to a SSH server. On one system we install the openssh-server package to make it the SSH server and on the other system we install the openssh-client package to use it as client. Now keeping the SSH service running on the server system, we try to connect to it through the client. 
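If you ever need to script this client-side connection, for instance to confirm that the non-standard port works, the third-party paramiko library offers one way to do it from Python. This is a sketch, not part of the recipe itself; the host address, port, and credentials are placeholders taken from the examples above.

# Requires: pip install paramiko
import paramiko

client = paramiko.SSHClient()
# Accept the server's host key automatically (fine for a lab, not for production)
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
# Connect to the SSH server on the non-standard port configured in sshd_config
client.connect("192.168.1.101", port=888, username="tajinder", password="your_password")
stdin, stdout, stderr = client.exec_command("hostname")
print(stdout.read().decode().strip())
client.close()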
We use the configuration file of SSH to change the settings like default port for connecting. SSH root login disable or enable The Linux systems have root account by default which is enabled by default. If unauthorized users get ssh root access on the system, it is not a good idea because this will give an attacker access to the complete system. We can disable or enable the root login for ssh as per requirement to prevent the chances of an attacker getting access to the system. Getting Ready We need 2 Linux systems to be used as server and client. On the server system, install the package openssh-server, as shown in the preceding recipe. How to do it… First we will see how to disable SSH Root login and then we will also see how to enable it again Firstly open the main configuration file of ssh—/etc/ssh/sshd_config, in any editor. sudo nano /etc/ssh/sshd_config Now look for the line that reads as follows: PermitRootLogin yes Change the value yes to no. Then save and close the file: PermitRootLogin no The output obtained will be as follows: Once done, restart the SSH daemon service using the command as shown here: Now let's try to login as root. We should get an error – Permission Denied as the root login has been disabled: Now whenever we want to login as root, first we will have to login as normal user. And after that we can use the su command and switch to root user. So, the user accounts which are not listed in /etc/sudoers file will not be able to switch to root user and the system will be more secure: Now if we want to again enable SSH Root login, we just need to edit /etc/ssh/sshd_config file again and change the option no to yes again: PermitRootLogin yes The output obtained will be as follows: Then restart the service again by using the command: Now if we try to login as root again, it will work: How it works… When we try to connect to a remote system using SSH, the remote system checks its configuration file at /etc/ssh/sshd_config and according to the details mentioned in this file it decides whether the connection should be allowed or refused. When we change the value of PermitRootLogin according the working also changes. There's more… Suppose we have many user accounts on the systems, then we need to edit the /etc/ssh/sshd_config file in such a way that remote access is allowed only for few mentioned users. sudo nano /etc/ssh/sshd_config Add the line: AllowUsers tajinder user1 Now restart the ssh service: sudo service ssh restart Now when we try to login with user1, the login is successful. However, when we try to login with user2 which is not added in /etc/ssh/sshd_config file, the login fails and we get the error Permission denied, as shown here: Key based login into SSH for restricting remote access Even though SSH login is protected by using passwords for the user account, we can make it more secure by using Key based authentication into SSH. Getting ready To see how key based authentication works, we would need two Linux system (in our example both our Ubuntu systems). One should have the OpenSSH server package installed on it. How to do it... To use key-based authentication, we need to create a pair of keys—a private key and a public key. On the client or local system, we will execute the following command  to generate the SSH keys pair: ssh-keygen-trsa The output obtained will be as follows: While creating the key, we can accept the defaults values or change them as per our wish. It will also ask for a passphrase, which you can choose anything or else leave it blank. 
The key pair will be created in the ~/.ssh/ directory. Change to this directory and then use the ls -l command to see the details of the key files: We can see that the id_rsa file can be read and written only by the owner. This permission ensures that the file is kept secure. Now we need to copy the public key file to the remote SSH server. To do so, we run the command: ssh-copy-id 192.168.1.101 The output obtained will be as follows: An SSH session will be started and will prompt for the password of the user account. Once the correct password has been entered, the key will be copied to the remote server. Once the public key has been successfully copied to the remote server, try to log in to the server again using the ssh 192.168.1.101 command: We can see that we are no longer prompted for the user account's password. Since we configured a passphrase for the SSH key, we are asked for that instead; otherwise, we would have been logged into the system without being asked for anything. How it works... When we create the SSH key pair and move the public key to the remote system, it works as an authentication method for connecting to the remote system. If the public key present on the remote system matches the public key generated by the local system, and the local system has the private key to complete the key pair, the login succeeds. Otherwise, if any key file is missing, login is not allowed. Summary Linux security is a massive subject and everything cannot be covered in just one article. Still, Practical Linux Security Cookbook will give you a lot of recipes for securing your machine. It can be referred to as a practical guide for administrators and will help them configure a more secure machine. Resources for Article: Further resources on this subject: Wireless Attacks in Kali Linux [article] Creating a VM using VirtualBox - Ubuntu Linux [article] Building tiny Web-applications in Ruby using Sinatra [article]

Detecting fraud on e-commerce orders with Benford's law

Packt
14 Apr 2016
7 min read
In this article by Andrea Cirillo, author of the book RStudio for R Statistical Computing Cookbook, has explained how to detect fraud on e-commerce orders. Benford's law is a popular empirical law that states that the first digits of a population of data will follow a specific logarithmic distribution. This law was observed by Frank Benford around 1938 and since then has gained increasing popularity as a way to detect anomalous alteration of population of data. Basically, testing a population against Benford's law means verifying that the given population respects this law. If deviations are discovered, the law performs further analysis for items related to those deviations. In this recipe, we will test a population of e-commerce orders against the law, focusing on items deviating from the expected distribution. (For more resources related to this topic, see here.) Getting ready This recipe will use functions from the well-documented benford.analysis package by Carlos Cinelli. We therefore need to install and load this package: install.packages("benford.analysis") library(benford.analysis) In our example, we will use a data frame that stores e-commerce orders, provided within the book as an .Rdata file. In order to make it available within your environment, we need to load this file by running the following command (assuming the file is within your current working directory): load("ecommerce_orders_list.Rdata") How to do it... Perform Benford test on the order amounts: benford_test <- benford(ecommerce_orders_list$order_amount,1) Plot test analysis: plot(benford_test) This will result in the following plot: Highlights supectes digits: suspectsTable(benford_test) This will produce a table showing for each digit absolute differences between expected and observed frequencies. The first digits will therefore be more anomalous ones: > suspectsTable(benford_test)    digits absolute.diff 1:      5     4860.8974 2:      9     3764.0664 3:      1     2876.4653 4:      2     2870.4985 5:      3     2856.0362 6:      4     2706.3959 7:      7     1567.3235 8:      6     1300.7127 9:      8      200.4623 Define a function to extrapolate the first digit from each amount: left = function (string,char){   substr(string,1,char)} Extrapolate the first digit from each amount: ecommerce_orders_list$first_digit <- left(ecommerce_orders_list$order_amount,1) Filter amounts starting with the suspected digit: suspects_orders <- subset(ecommerce_orders_list,first_digit == 5) How it works Step 1 performs the Benford test on the order amounts. In this step, we applied the benford() function to the amounts. Applying this function means evaluating the distribution of the first digits of amounts against the expected Benford distribution. 
The function will result in the production of the following objects: Object Description Info This object covers the following general information: data.name: This shows the name of the data used n: This shows the number of observations used n.second.order: This shows the number of observations used for second-order analysis number.of.digits: This shows the number of first digits analyzed Data This is a data frame with the following subobjects: lines.used: This shows  the original lines of the dataset data.used: This shows the data used data.mantissa: This shows the log data's mantissa data.digits: This shows the first digits of the data s.o.data This is a data frame with the following subobjects: data.second.order: This shows the differences of the ordered data  data.second.order.digits: This shows the first digits of the second-order analysis Bfd This is a data frame with the following subobjects: digits: This highlights the groups of digits analyzed data.dist: This highlights the distribution of the first digits of the data data.second.order.dist: This highlights the distribution of the first digits of the second-order analysis benford.dist: This shows the theoretical Benford distribution data.second.order.dist.freq: This shows the frequency distribution of the first digits of the second-order analysis data.dist.freq: This shows the frequency distribution of the first digits of the data benford.dist.freq: This shows the theoretical Benford frequency distribution benford.so.dist.freq: This shows the theoretical Benford frequency distribution of the second order analysis. data.summation: This shows the summation of the data values grouped by first digits abs.excess.summation: This shows the absolute excess summation of the data values grouped by first digits difference: This highlights the difference between the data and Benford frequencies squared.diff: This shows the chi-squared difference between the data and Benford frequencies absolute.diff: This highlights the absolute difference between the data and Benford frequencies Mantissa This is a data frame with the following subobjects: mean.mantissa: This shows the mean of the mantissa var.mantissa: This shows the variance of the mantissa ek.mantissa: This shows the excess kurtosis of the mantissa sk.mantissa: This highlights the skewness of the mantissa MAD This object depicts the mean absolute deviation. distortion.factor This object talks about the distortion factor. Stats This object lists of htest class statistics as follows: chisq: This lists the Pearson's Chi-squared test. mantissa.arc.test: This lists the Mantissa Arc test Step 2 plots test results. Running plot on the object resulting from the benford() function will result in a plot showing the following (from upper-left corner to bottom-right corner): First digit distribution Results of second-order test Summation distribution for each digit Results of chi-squared test Summation differences If you look carefully at these plots, you will understand which digits show up a distribution significantly different from the one expected from the Benford law. Nevertheless, in order to have a sounder base for our consideration, we need to look at the suspects table, showing absolute differences between expected and observed frequencies. This is what we will do in the next step. Step 3 highlights suspects digits. Using suspectsTable() we can easily discover which digits presents the greater deviation from the expected distribution. 
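As a quick aside before moving on to the suspects table: the theoretical frequencies that the package compares the data against follow Benford's logarithmic law, where the probability of a leading digit d is log10(1 + 1/d). A short Python sketch, purely illustrative, reproduces them:

# Expected Benford frequency of each leading digit d = 1..9
import math

for d in range(1, 10):
    print("{}: {:.3f}".format(d, math.log10(1 + 1.0 / d)))
# Prints 1: 0.301, 2: 0.176, ..., 9: 0.046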
Looking at the so-called suspects table, we can see that the digit 5 shows the greatest deviation and therefore tops the table. In the next step, we will focus our attention on the orders whose amounts have this digit as the first digit.
Step 4 defines a function to extract the first digit from each amount. This function leverages the base R substr() function and extracts the first digit from the number passed to it as an argument.
Step 5 adds a new column to the investigated dataset, in which the extracted first digit is stored.
Step 6 filters amounts starting with the suspected digit. After applying the left() function to our sequence of amounts, we can now filter the dataset, retaining only the rows whose amounts have 5 as the first digit. We will then be able to perform analytical testing procedures on those items.
Summary
In this article, you learned how to apply the R language to an e-commerce fraud detection system.
Resources for Article:
Further resources on this subject:
Recommending Movies at Scale (Python) [article]
Visualization of Big Data [article]
Big Data Analysis (R and Hadoop) [article]

Using the Registry and xlsxwriter modules

Packt
14 Apr 2016
12 min read
In this article by Chapin Bryce and Preston Miller, the authors of Learning Python for Forensics, we will learn about the features offered by the Registry and xlsxwriter modules. (For more resources related to this topic, see here.)
Working with the Registry module
The Registry module, developed by Willi Ballenthin, can be used to obtain keys and values from registry hives. Python provides a built-in registry module called _winreg; however, this module only works on Windows machines. The _winreg module interacts with the registry on the system running the module. It does not support opening external registry hives. The Registry module allows us to interact with the supplied registry hives and can be run on non-Windows machines.
The Registry module can be downloaded from https://github.com/williballenthin/python-registry. Click on the releases section to see a list of all the stable versions and download the latest version. For this article, we use version 1.1.0. Once the archived file is downloaded and extracted, we can run the included setup.py file to install the module. In a command prompt, execute the following code in the module's top-level directory as shown:
python setup.py install
This should install the Registry module successfully on your machine. We can confirm this by opening the Python interactive prompt and typing import Registry. We will receive an error if the module is not installed successfully. With the Registry module installed, let's begin to learn how we can leverage this module for our needs.
First, we need to import the Registry class from the Registry module. Then, we use the Registry function to open the registry object that we want to query. Next, we use the open() method to navigate to our key of interest. In this case, we are interested in the RecentDocs registry key. This key contains recent active files separated by extension, as shown:
>>> from Registry import Registry
>>> reg = Registry.Registry('NTUSER.DAT')
>>> recent_docs = reg.open('SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs')
If we print the recent_docs variable, we can see that it contains 11 values with five subkeys, which may contain additional values and subkeys. Additionally, we can use the timestamp() method to see the last written time of the registry key.
>>> print recent_docs
Registry Key CMI-CreateHive{B01E557D-7818-4BA7-9885-E6592398B44E}\Software\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs with 11 values and 5 subkeys
>>> print recent_docs.timestamp() # Last Written Time
2012-04-23 09:34:12.099998
We can iterate over the values in the recent_docs key using the values() function in a for loop. For each value, we can access the name(), value(), raw_data(), value_type(), and value_type_str() methods. The value() and raw_data() methods represent the data in different ways. We will use the raw_data() function when we want to work with the underlying binary data and use the value() function to gather an interpreted result. The value_type() and value_type_str() functions display a number or string that identifies the type of data, such as REG_BINARY, REG_DWORD, REG_SZ, and so on.
>>> for i, value in enumerate(recent_docs.values()):
...     print '{}) {}: {}'.format(i, value.name(), value.value())
...
0) MRUListEx: ????
1) 0: myDocument.docx
2) 4: oldArchive.zip
3) 2: Salaries.xlsx
...
Another useful feature of the Registry module is the means provided for querying for a certain subkey or value. This is provided by the subkey(), value(), or find_key() functions.
A RegistryKeyNotFoundException is generated when a subkey is not present while using the subkey() function:
>>> if recent_docs.subkey('.docx'):
...     print 'Found docx subkey.'
...
Found docx subkey.
>>> if recent_docs.subkey('.1234abcd'):
...     print 'Found 1234abcd subkey.'
...
Registry.Registry.RegistryKeyNotFoundException: ...
The find_key() function takes a path and can find a subkey through multiple levels. The subkey() and value() functions only search child elements. We can use these functions to confirm that a key or value exists before trying to navigate to it. If a particular key or value cannot be found, a custom exception from the Registry module is raised. Be sure to add error handling to catch this error and also to alert the user that the key was not discovered.
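As a rough illustration (this is our sketch, not code from the book), such a check might look like the following, reusing the key path from the earlier example and assuming the exception is exposed as Registry.RegistryKeyNotFoundException, as the traceback above suggests:
from Registry import Registry

reg = Registry.Registry('NTUSER.DAT')
key_path = 'SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs'
try:
    recent_docs = reg.open(key_path)
except Registry.RegistryKeyNotFoundException:
    # Alert the user that the key was not discovered
    print 'Key not found: {}'.format(key_path)
else:
    print 'Found key with {} values'.format(len(recent_docs.values()))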
With the Registry module, finding keys and their values becomes straightforward. However, when the values are not strings and are instead binary data, we have to rely on another module to make sense of the mess. For all binary needs, the struct module is an excellent candidate.
Creating Spreadsheets with the xlsxwriter Module
xlsxwriter is a useful third-party module that writes Excel output. There is a plethora of Excel-supporting modules for Python, but we chose this module because it is highly robust and well-documented. As the name suggests, this module can only be used to write Excel spreadsheets. The xlsxwriter module supports cell and conditional formatting, charts, tables, filters, and macros, among others.
Adding data to a spreadsheet
Let's quickly create a script called simplexlsx.v1.py for this example. On lines 1 and 2, we import the xlsxwriter and datetime modules. The data we are going to be plotting, including the header row, is stored as nested lists in the school_data variable. Each list is a row of information that we want to store in the output Excel sheet, with the first element containing the column names.
001 import xlsxwriter
002 from datetime import datetime
003
004 school_data = [['Department', 'Students', 'Cumulative GPA', 'Final Date'],
005     ['Computer Science', 235, 3.44, datetime(2015, 07, 23, 18, 00, 00)],
006     ['Chemistry', 201, 3.26, datetime(2015, 07, 25, 9, 30, 00)],
007     ['Forensics', 99, 3.8, datetime(2015, 07, 23, 9, 30, 00)],
008     ['Astronomy', 115, 3.21, datetime(2015, 07, 19, 15, 30, 00)]]
The writeXLSX() function, defined on line 11, is responsible for writing our data into a spreadsheet. First, we must create our Excel spreadsheet using the Workbook() function, supplying the desired name of the file. On line 13, we create a worksheet using the add_worksheet() function. This function can take the desired title of the worksheet or use the default name 'Sheet N', where N is the specific sheet number.
011 def writeXLSX(data):
012     workbook = xlsxwriter.Workbook('MyWorkbook.xlsx')
013     main_sheet = workbook.add_worksheet('MySheet')
The date_format variable stores a custom number format that we will use to display our datetime objects in the desired format. On line 17, we begin to enumerate through our data to write it. The conditional on line 18 is used to handle the header row, which is the first list encountered. We use the write() function and supply a numerical row and column. Alternatively, we can also use the Excel notation, i.e. A1.
015     date_format = workbook.add_format({'num_format': 'mm/dd/yy hh:mm:ss AM/PM'})
016
017     for i, entry in enumerate(data):
018         if i == 0:
019             main_sheet.write(i, 0, entry[0])
020             main_sheet.write(i, 1, entry[1])
021             main_sheet.write(i, 2, entry[2])
022             main_sheet.write(i, 3, entry[3])
The write() method will try to write the appropriate type for an object when it can detect the type. However, we can use different write methods to specify the correct format. These specialized writers preserve the data type in Excel so that we can use the appropriate data-type-specific Excel functions for the object. Since we know the data types within the entry list, we can manually specify when to use the general write() function or the specific write_number() function.
023         else:
024             main_sheet.write(i, 0, entry[0])
025             main_sheet.write_number(i, 1, entry[1])
026             main_sheet.write_number(i, 2, entry[2])
For the fourth entry in the list, the datetime object, we supply the write_datetime() function with our date_format defined on line 15. After our data is written to the workbook, we use the close() function to close and save our data. On line 32, we call the writeXLSX() function, passing it the school_data list we built earlier.
027             main_sheet.write_datetime(i, 3, entry[3], date_format)
028
029     workbook.close()
030
031
032 writeXLSX(school_data)
A table of write functions and the objects they preserve is presented below:
- write_string: str
- write_number: int, float, long
- write_datetime: datetime objects
- write_boolean: bool
- write_url: str
When the script is invoked at the command line, a spreadsheet called MyWorkbook.xlsx is created. When we convert this to a table, we can sort it according to any of our values. Had we failed to preserve the data types, values such as our dates might be identified as non-number types and prevent us from sorting them appropriately.
Building a table
Being able to write data to an Excel file and preserve the object type is a step up over CSV, but we can do better. Often, the first thing an examiner will do with an Excel spreadsheet is convert the data into a table and begin the frenzy of sorting and filtering. We can convert our data range to a table. In fact, writing a table with xlsxwriter is arguably easier than writing each row individually. The following code will be saved into the file simplexlsx.v2.py. For this iteration, we have removed the initial list in the school_data variable that contained the header information. Our new writeXLSX() function writes the header separately.
004 school_data = [['Computer Science', 235, 3.44, datetime(2015, 07, 23, 18, 00, 00)],
005     ['Chemistry', 201, 3.26, datetime(2015, 07, 25, 9, 30, 00)],
006     ['Forensics', 99, 3.8, datetime(2015, 07, 23, 9, 30, 00)],
007     ['Astronomy', 115, 3.21, datetime(2015, 07, 19, 15, 30, 00)]]
Lines 10 through 14 are identical to the previous iteration of the function. Representing our table on the spreadsheet is accomplished on line 16.
010 def writeXLSX(data):
011     workbook = xlsxwriter.Workbook('MyWorkbook.xlsx')
012     main_sheet = workbook.add_worksheet('MySheet')
013
014     date_format = workbook.add_format({'num_format': 'mm/dd/yy hh:mm:ss AM/PM'})
The add_table() function takes multiple arguments. First, we pass a string representing the top-left and bottom-right cells of the table in Excel notation. We use the length variable, defined on line 15, to calculate the necessary length of our table.
The second argument is a little more confusing; this is a dictionary with two keys, named data and columns. The data key has a value of our data variable, which is perhaps poorly named in this case. The columns key defines each column header and, optionally, its format, as seen on line 19:
015     length = str(len(data) + 1)
016     main_sheet.add_table(('A1:D' + length), {'data': data,
017         'columns': [{'header': 'Department'}, {'header': 'Students'},
018         {'header': 'Cumulative GPA'},
019         {'header': 'Final Date', 'format': date_format}]})
020     workbook.close()
In fewer lines than the previous example, we've managed to create a more useful output built as a table. Now our spreadsheet has our specified data already converted into a table and ready to be sorted. There are more possible keys and values that can be supplied during the construction of a table. Please consult the documentation at http://xlsxwriter.readthedocs.org for more details on advanced usage. This process is simple when we are working with nested lists representing each row of a worksheet. Data structures not in the specified format require a combination of both methods demonstrated in our previous iterations to achieve the same effect. For example, we can define a table to span across a certain number of rows and columns and then use the write() function for those cells. However, to prevent unnecessary headaches, we recommend keeping data in nested lists.
Creating charts with Python
Lastly, let's create a chart with xlsxwriter. The module supports a variety of different chart types, including line, scatter, bar, column, pie, and area. We use charts to summarize the data in meaningful ways. This is particularly useful when working with large data sets, allowing examiners to gain a high-level understanding of the data before getting into the weeds. Let's modify the previous iteration yet again to display a chart. We will save this modified file as simplexlsx.v3.py. On line 21, we are going to create a variable called department_grades. This variable will be our chart object, created by the add_chart() method. For this method, we pass in a dictionary specifying keys and values. In this case, we specify the type of the chart to be a column chart.
021     department_grades = workbook.add_chart({'type':'column'})
On line 22, we use the set_title() function and again pass it a dictionary of parameters. We set the name key equal to our desired title. At this point, we need to tell the chart what data to plot. We do this with the add_series() function. The categories key maps to the Excel notation specifying the horizontal axis data. The vertical axis is represented by the values key. With the data to plot specified, we use the insert_chart() function to plot the data in the spreadsheet. We give this function a string of the cell at which to place the top-left corner of the chart, and then the chart object itself.
022     department_grades.set_title({'name':'Department and Grade distribution'})
023     department_grades.add_series({'categories':'=MySheet!$A$2:$A$5', 'values':'=MySheet!$C$2:$C$5'})
024     main_sheet.insert_chart('A8', department_grades)
025     workbook.close()
Running this version of the script will convert our data into a table and generate a column chart comparing departments by their grades. We can clearly see that, unsurprisingly, the Forensic Science department has the highest GPA earners in the school's program. This information is easy enough to eyeball for such a small data set.
However, when working with data sets that are orders of magnitude larger, creating summarizing graphics can be particularly useful for understanding the big picture. Be aware that there is a great deal of additional functionality in the xlsxwriter module that we will not use in our script. This is an extremely powerful module and we recommend it for any operation that requires writing Excel spreadsheets.
Summary
In this article, we began by introducing the Registry module and how it is used to obtain keys and values from registry hives. Next, we dealt with various aspects of spreadsheets, such as cells, tables, and charts, using the xlsxwriter module.
Resources for Article:
Further resources on this subject:
Test all the things with Python [article]
An Introduction to Python Lists and Dictionaries [article]
Python Data Science Up and Running [article]

Understanding Proxmox VE and Advanced Installation

Packt
13 Apr 2016
12 min read
In this article by Wasim Ahmed, the author of the book Mastering Proxmox - Second Edition, we will see that virtualization, as we know it today, is a decades-old technology that was first implemented in the mainframes of the 1960s. Virtualization was a way to logically divide a mainframe's resources for different application processing. With the rise in energy costs, running under-utilized server hardware is no longer a luxury. Virtualization enables us to do more with less, thus saving energy and money while creating a virtual green data center without geographical boundaries. (For more resources related to this topic, see here.)
A hypervisor is a piece of software, hardware, or firmware that creates and manages virtual machines. It is the underlying platform or foundation that allows a virtual world to be built upon. In a way, it is the very building block of all virtualization. A bare metal hypervisor acts as a bridge between the physical hardware and the virtual machines by creating an abstraction layer. Because of this unique feature, an entire virtual machine can be moved over a vast distance over the Internet and still function exactly the same. A virtual machine does not see the hardware directly; instead, it sees the layer of the hypervisor, which is the same no matter what hardware the hypervisor has been installed on.
The Proxmox Virtual Environment (VE) is a cluster-based hypervisor and one of the best kept secrets in the virtualization world. The reason is simple. It allows you to build an enterprise business-class virtual infrastructure at a small business-class price tag without sacrificing stability, performance, and ease of use. Whether it is a massive data center serving millions of people, a small educational institution, or a home serving important family members, Proxmox can be configured to suit any situation. If you have picked up this article, no doubt you are familiar with virtualization and perhaps well versed in other hypervisors, such as VMware, Xen, Hyper-V, and so on. In this article and upcoming articles, we will see the mighty power of Proxmox from the inside out. We will examine scenarios and create a complex virtual environment. We will tackle some heavy day-to-day issues and show resolutions, which might just save the day in a production environment. So, strap yourself in and let's dive into the virtual world with the mighty hypervisor, Proxmox VE.
Understanding Proxmox features
Before we dive in, it is necessary to understand why one should choose Proxmox over the other mainstream hypervisors. Proxmox is not perfect but stands out among other contenders with some hard-to-beat features. The following are some of the features that make Proxmox a real game changer.
It is free!
Yes, Proxmox is free! To be more accurate, Proxmox has several subscription levels, among which the community edition is completely free. One can simply download the Proxmox ISO at no cost and raise a fully functional cluster without missing a single feature and without paying anything. The main difference between the paid and community subscription levels is that the paid subscriptions receive updates that go through additional testing and refinement. If you are running a production cluster with a real workload, it is highly recommended that you purchase support and licensing from Proxmox or a Proxmox reseller.
Built-in firewall
Proxmox VE comes with a robust firewall ready to be configured out of the box.
This firewall can be configured to protect the entire Proxmox cluster down to a single virtual machine. The per-VM firewall option gives you the ability to configure each VM individually by creating individualized firewall rules, a prominent feature in a multi-tenant virtual environment.
Open vSwitch
Licensed under the Apache 2.0 license, Open vSwitch is a virtual switch designed to work in a multi-server virtual environment. All hypervisors need a bridge between VMs and the outside network. Open vSwitch enhances the features of the standard Linux bridge in an ever-changing virtual environment. Proxmox fully supports Open vSwitch, which allows you to create an intricate virtual environment while reducing virtual network management overhead. For details on Open vSwitch, refer to http://openvswitch.org/.
The graphical user interface
Proxmox comes with a fully functional graphical user interface, or GUI, out of the box. The GUI allows an administrator to manage and configure almost all the aspects of a Proxmox cluster. The GUI has been designed keeping simplicity in mind, with functions and features separated into menus for easier navigation. The following screenshot shows an example of the Proxmox GUI dashboard:
KVM virtual machines
KVM, or Kernel-based Virtual Machine, is a kernel module that is added to Linux for full virtualization to create isolated, fully independent virtual machines. KVM VMs are not dependent on the host operating system in any way, but they do require the virtualization feature in the BIOS to be enabled. KVM allows a wide variety of operating systems for virtual machines, such as Linux and Windows. Proxmox provides a very stable environment for KVM-based VMs.
Linux containers or LXC
Introduced recently in Proxmox VE 4.0, Linux containers allow multiple Linux instances on the same Linux host. All the containers are dependent on the host Linux operating system, and only Linux flavors can be virtualized as containers. There are no containers for the Windows operating system. LXC replaces the OpenVZ containers that were the primary container virtualization method in previous Proxmox versions. If you are not familiar with LXC and for details on LXC, refer to https://linuxcontainers.org/.
Storage plugins
Out of the box, Proxmox VE supports a variety of storage systems to store virtual disk images, ISO templates, backups, and so on. All plug-ins are quite stable and work great with Proxmox. Being able to choose different storage systems gives an administrator the flexibility to leverage the existing storage in the network. As of Proxmox VE 4.0, the following storage plug-ins are supported:
- Local directory mount points
- iSCSI
- LVM group
- NFS share
- GlusterFS
- Ceph RBD
- ZFS
Vibrant culture
Proxmox has a growing community of users who are always helping others to learn Proxmox and troubleshoot various issues. With so many active users around the world and through the active participation of the Proxmox developers, the community has now become a culture of its own. Feature requests are continuously being worked on, and the existing features are being strengthened on a regular basis. With so many users supporting Proxmox, it is surely here to stay.
The basic installation of Proxmox
The installation of a Proxmox node is very straightforward. Simply accept the default options, select localization, and enter the network information to install Proxmox VE.
We can summarize the installation process in the following steps:
1. Download the ISO from the official Proxmox site (http://proxmox.com/en/downloads) and prepare a disc with the image.
2. Boot the node with the disc and hit Enter to start the installation from the installation GUI. We can also install Proxmox from a USB drive.
3. Progress through the prompts to select options or type in information.
4. After the installation is complete, access the Proxmox GUI dashboard using the IP address, as follows: https://<proxmox_node_ip>:8006
In some cases, it may be necessary to open the firewall port to allow access to the GUI over port 8006.
The advanced installation option
Although the basic installation works in all scenarios, there may be times when the advanced installation option is necessary. Only the advanced installation option provides you the ability to customize the main OS drive. A common practice for the operating system drive is to use a mirrored RAID array using a controller interface. This provides drive redundancy if one of the drives fails. The same level of redundancy can also be achieved using a software-based RAID array, such as ZFS. Proxmox now offers options to select ZFS-based arrays for the operating system drive right at the beginning of the installation. If you are not familiar with ZFS, refer to https://en.wikipedia.org/wiki/ZFS for details.
It is a common question to ask why one should choose ZFS software RAID over tried and tested hardware-based RAID. The simple answer is flexibility. A hardware RAID is locked to, or fully dependent on, the hardware RAID controller interface that created the array, whereas a ZFS software-based array is not dependent on any hardware, and the array can easily be ported to different hardware nodes. Should a RAID controller failure occur, the entire array created from that controller is lost unless an identical controller interface is available for replacement. A ZFS array is only lost when all the drives, or more than the maximum tolerable number of drives, are lost in the array. Besides ZFS, we can also select other filesystem types, such as ext3, ext4, or xfs, from the same advanced option. We can also set custom disk or partition sizes through the advanced option. The following screenshot shows the installation interface with the Target Hard disk selection page:
Click on Options, as shown in the preceding screenshot, to open the advanced options for the hard disk. The following screenshot shows the option window after clicking on the Options button:
In the preceding screenshot, we selected ZFS RAID1 for mirroring and the two drives, Harddisk 0 and Harddisk 1, respectively, to install Proxmox. If we pick one of the filesystems such as ext3, ext4, or xfs instead of ZFS, the Hard disk Option dialog box will look like the following screenshot, with a different set of options:
Selecting a filesystem gives us the following advanced options:
- hdsize: This is the total drive size to be used by the Proxmox installation.
- swapsize: This defines the swap partition size.
- maxroot: This defines the maximum size to be used by the root partition.
- minfree: This defines the minimum free space that should remain after the Proxmox installation.
- maxvz: This defines the maximum size for the data partition. This is usually /var/lib/vz.
Debugging the Proxmox installation
Debugging features are part of any good operating system. Proxmox has debugging features that will help you during a failed installation.
Some common reasons for a failed installation are unsupported hardware, conflicts between devices, ISO image errors, and so on. Debugging mode logs and displays installation activities in real time. When the standard installation fails, we can start the Proxmox installation in debug mode from the main installation interface, as shown in the following screenshot:
The debug installation mode will drop us at the following prompt. To start the installation, we need to press Ctrl + D. When there is an error during the installation, we can simply press Ctrl + C to get back to this console to continue with our investigation:
From the console, we can check the installation log using the following command:
# cat /tmp/install.log
From the main installation menu, we can also press e to enter edit mode to change the loader information, as shown in the following screenshot:
At times, it may be necessary to edit the loader information when normal booting does not function. This is a common case when Proxmox is unable to show the video output due to UEFI or an unsupported resolution. In such cases, the booting process may hang. One way to continue with booting is to add the nomodeset argument by editing the loader. The loader will look as follows after editing:
linux/boot/linux26 ro ramdisk_size=16777216 rw quiet nomodeset
Customizing the Proxmox splash screen
When building a custom Proxmox solution, it may be necessary to change the default blue splash screen to something more appealing in order to identify the company or department the server belongs to. In this section, we will see how easily we can integrate any image as the splash screen background. The splash screen image must be in the .tga format and must have one of the fixed standard sizes, such as 640 x 480, 800 x 600, or 1024 x 768. If you do not have any image software that supports the .tga format, you can easily convert a jpg, gif, or png image to the .tga format using a free online image converter (http://image.online-convert.com/convert-to-tga). Once the desired image is ready in the .tga format, the following steps will integrate the image as the Proxmox splash screen (a consolidated shell sketch of these steps appears at the end of this section):
1. Copy the .tga image to the /boot/grub directory on the Proxmox node.
2. Edit the grub file in /etc/default/grub to add the following line, and save it:
GRUB_BACKGROUND=/boot/grub/<image_name>.tga
3. Run the following command to update the grub configuration:
# update-grub
4. Reboot.
The following screenshot shows an example of how the splash screen may look after we add a custom image to it:
Picture courtesy of www.techcitynews.com
We can also change the font color to make it properly visible, depending on the custom image used. To change the font color, edit the debian theme file in /etc/grub.d/05_debian_theme, and find the following line of code:
set_background_image "${GRUB_BACKGROUND}" || set_default_theme
Edit the line to add the font colors, as shown in the following format. In our example, we have changed the font color to black and the highlighted font color to light blue:
set_background_image "${GRUB_BACKGROUND}" "black/black" "light-blue/black" || set_default_theme
After making the necessary changes, update grub and reboot to see the changes.
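The following is a minimal shell sketch of the splash screen steps above, assuming the custom image is available on the node as /root/myimage.tga (the image name and path are examples, not values from the article):
#!/bin/bash
# Sketch: set a custom GRUB splash screen on a Proxmox node (run as root).
set -e

IMAGE=/root/myimage.tga                      # example path; use your own .tga file
cp "$IMAGE" /boot/grub/myimage.tga           # step 1: copy the image into /boot/grub

# step 2: point GRUB at the background image
echo 'GRUB_BACKGROUND=/boot/grub/myimage.tga' >> /etc/default/grub

update-grub                                  # step 3: regenerate the GRUB configuration
reboot                                       # step 4: reboot to see the new splash screen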
Summary
In this article, we looked at why Proxmox is a better option as a hypervisor, what advanced installation options are available during an installation, and why we might choose software RAID for the operating system drive. We also looked at the cost of Proxmox, the storage options, and the network flexibility offered by Open vSwitch. We learned about the debugging features and the customization options for the Proxmox splash screen. In the next article, we will take a closer look at the Proxmox GUI and see how easy it is to centrally manage a Proxmox cluster from a web browser.
Resources for Article:
Further resources on this subject:
Proxmox VE Fundamentals [article]
Basic Concepts of Proxmox Virtual Environment [article]

Nginx "expires" directive – Emitting Caching Headers

Packt
13 Apr 2016
7 min read
In this article, Alex Kapranoff, the author of the book Nginx Troubleshooting, explains how all browsers (and even many non-browser HTTP clients) support client-side caching. It is a part of the HTTP standard, albeit one of its most complex parts to understand. Web servers obviously do not control client-side caching to the full extent, but they may issue recommendations about what to cache and how, in the form of special HTTP response headers. This is a topic thoroughly discussed in many great articles and guides, so we will mention it only briefly, with a lean towards problems you may face and how to troubleshoot them. (For more resources related to this topic, see here.)
In spite of the fact that browsers have supported caching on their side for at least 20 years, configuring cache headers has always been a little confusing, mostly due to the fact that there are two sets of headers designed for the same purpose but having different scopes and totally different formats. There is the Expires: header, which was designed as a quick and dirty solution, and also the (relatively) new, almost omnipotent Cache-Control: header, which tries to support all the different ways an HTTP cache could work.
This is an example of a modern HTTP request-response pair containing the caching headers. First are the request headers sent from the browser (here Firefox 41, but it does not matter):
User-Agent:"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:41.0) Gecko/20100101 Firefox/41.0"
Accept:"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
Accept-Encoding:"gzip, deflate"
Connection:"keep-alive"
Cache-Control:"max-age=0"
Then come the response headers:
Cache-Control:"max-age=1800"
Content-Encoding:"gzip"
Content-Type:"text/html; charset=UTF-8"
Date:"Sun, 10 Oct 2015 13:42:34 GMT"
Expires:"Sun, 10 Oct 2015 14:12:34 GMT"
X-Cache:"EXPIRED"
We highlighted the parts that are relevant. Note that some directives may be sent by both sides of the conversation. First, the browser sent the Cache-Control: max-age=0 header because the user pressed the F5 key. This is an indication that the user wants to receive a response that is fresh. Normally, the request will not contain this header and will allow any intermediate cache to respond with a stale but still nonexpired response. In this case, the server we talked to responded with a gzipped HTML page encoded in UTF-8 and indicated that the response is okay to use for half an hour. It used both mechanisms available, the modern Cache-Control:max-age=1800 header and the very old Expires:Sun, 10 Oct 2015 14:12:34 GMT header. The X-Cache: "EXPIRED" header is not a standard HTTP header but was also probably (there is no way to know for sure from the outside) emitted by Nginx. It may be an indication that there are, indeed, intermediate caching proxies between the client and the server, and one of them added this header for debugging purposes. The header may also show that the backend software uses some internal caching. Another possible source of this header is a debugging technique used to find problems in the Nginx cache configuration. The idea is to use the cache hit or miss status, which is available in one of the handy internal Nginx variables, as the value for an extra header and then to be able to monitor the status from the client side. This is the code that will add such a header:
add_header X-Cache $upstream_cache_status;
Nginx has a special directive that transparently sets up both of the standard cache control headers, and it is named expires.
This is a piece of the nginx.conf file using the expires directive:
location ~* \.(?:css|js)$ {
    expires 1y;
    add_header Cache-Control "public";
}
First, the pattern uses the so-called noncapturing parentheses, a feature that first appeared in Perl regular expressions. The effect of this regexp is the same as that of a simpler \.(css|js)$ pattern, but the regular expression engine is specifically instructed not to create a variable containing the actual string from inside the parentheses. This is a simple optimization. Then, the expires directive declares that the content of the css and js files will expire after a year of storage. The actual headers as received by the client will look like this:
Server: nginx/1.9.8 (Ubuntu)
Date: Fri, 11 Mar 2016 22:01:04 GMT
Content-Type: text/css
Last-Modified: Thu, 10 Mar 2016 05:45:39 GMT
Expires: Sat, 11 Mar 2017 22:01:04 GMT
Cache-Control: max-age=31536000
The last two lines contain the same information in wildly different forms. The Expires: header is exactly one year after the date in the Date: header, whereas Cache-Control: specifies the age in seconds so that the client does the date arithmetic itself.
The last directive in the provided configuration extract explicitly adds another Cache-Control: header with a value of public. What this means is that the content of the HTTP resource is not access-controlled and therefore may be cached not only for one particular user but also anywhere else. A simple and effective strategy that was used in offices to minimize consumed bandwidth is to have an office-wide caching proxy server. When one user requested a resource from a website on the Internet and that resource had a Cache-Control: public designation, the company cache server would store it to serve to other users on the office network. This may not be as popular today due to cheap bandwidth, but because history has a tendency to repeat itself, you need to know how and why Cache-Control: public works.
The Nginx expires directive is surprisingly expressive. It may take a number of different values (a configuration sketch illustrating several of them follows this table):
- off: This value turns off the Nginx cache headers logic. Nothing will be added, and more importantly, existing headers received from upstreams will not be modified.
- epoch: This is an artificial value used to purge a stored resource from all caches by setting the Expires header to "1 January, 1970 00:00:01 GMT".
- max: This is the opposite of the "epoch" value. The Expires header will be equal to "31 December 2037 23:59:59 GMT", and the Cache-Control max-age is set to 10 years. This basically means that the HTTP responses are guaranteed to never change, so clients are free to never request the same thing twice and may use their own stored values.
- A specific time: An actual specific time value means an expiry deadline from the time of the respective request, for example, expires 10w;. A negative value for this directive will emit a special header, Cache-Control: no-cache.
- "modified" plus a specific time: If you add the keyword modified before the time value, then the expiration moment will be computed relative to the modification time of the file that is served.
- "@" plus a specific time: A time with an @ prefix specifies an absolute time-of-day expiry. This should be less than 24 hours, for example, expires @17h;.
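As a quick illustration, and not an excerpt from the book, the different forms could appear in a configuration like the following sketch (the location paths and patterns are arbitrary examples):
# Sketch: several forms of the expires directive side by side
location /never-cache/ {
    expires epoch;               # Expires set to 1 January 1970, purges cached copies
}
location ~* \.(?:png|jpg)$ {
    expires max;                 # effectively "never expires"
}
location /reports/ {
    expires modified +1h;        # one hour after the file's modification time
}
location /daily/ {
    expires @17h;                # today at 17:00, an absolute time-of-day expiry
}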
Many web applications choose to emit the caching headers themselves, and this is a good thing. They have more information about which resources change often and which never change. Tampering with the headers that you receive from the upstream may or may not be a thing you want to do. Sometimes, adding headers to a response while proxying it may produce a conflicting set of headers and therefore create unpredictable behavior. The static files that you serve with Nginx yourself should have the expires directive in place. However, the general advice about upstreams is to always examine the caching headers you get and refrain from overoptimizing by setting up a more aggressive caching policy.
Resources for Article:
Further resources on this subject:
Nginx service [article]
Fine-tune the NGINX Configuration [article]
Nginx Web Services: Configuration and Implementation [article]

Cluster Computing Using Scala

Packt
13 Apr 2016
18 min read
In this article by Vytautas Jančauskas, the author of the book Scientific Computing with Scala, we learn how to write software to be run on distributed computing clusters, using the MPJ Express library. (For more resources related to this topic, see here.)
Very often, when dealing with intense data processing tasks and simulations of physical phenomena, there comes a time when no matter how many CPU cores and how much memory your workstation has, it is not enough. At times like these, you will want to turn to supercomputing clusters for help. These distributed computing environments consist of many nodes (each node being a separate computer) connected into a computer network using specialized high-bandwidth and low-latency connections (or, if you are on a budget, standard Ethernet hardware is often enough). These computers usually utilize a network filesystem allowing each node to see the same files. They communicate using messaging libraries, such as MPI. Your program will run on separate computers and utilize the message passing framework to exchange data via the computer network.
Using MPJ Express for distributed computing
MPJ Express is a message passing library for distributed computing. It works in programming languages that run on the Java Virtual Machine (JVM), so we can use it from Scala. It is similar in functionality and programming interface to MPI. If you know MPI, you will be able to use MPJ Express pretty much the same way. The differences specific to Scala are explained in this section. We will start with how to install it. For further reference, visit the MPJ Express website given here: http://mpj-express.org/
Setting up and running MPJ Express
The steps to set up and run MPJ Express are as follows:
First, download MPJ Express from the following link. The version at the time of this writing is 0.44: http://mpj-express.org/download.php
Unpack the archive and refer to the included README file for installation instructions. Currently, you have to set MPJ_HOME to the folder you unpacked the archive to and add the bin folder in that archive to your path. For example, if you are a Linux user using bash as your shell, you can add the following two lines to your .bashrc file (the file is in your home directory at /home/yourusername/.bashrc):
export MPJ_HOME=/home/yourusername/mpj
export PATH=$MPJ_HOME/bin:$PATH
Here, mpj is the folder you extracted the archive you downloaded from the MPJ Express website to. If you are using a different system, you will have to do the equivalent of the above for your system to use MPJ Express.
We will want to use MPJ Express with the Scala Build Tool (SBT), which we used previously to build and run all of our programs. Create the following directory structure:
scalacluster/
  lib/
  project/
    plugins.sbt
  build.sbt
I have chosen to name the project folder scalacluster here, but you can call it whatever you want. The .jar files in the lib folder will now be accessible to your program. Copy the contents of the lib folder from the mpj directory to this folder. Finally, create empty build.sbt and plugins.sbt files. Let's now write and run a simple "Hello, World!" program to test our setup:
import mpi._

object MPJTest {
  def main(args: Array[String]) {
    MPI.Init(args)
    val me: Int = MPI.COMM_WORLD.Rank()
    val size: Int = MPI.COMM_WORLD.Size()
    println("Hello, World, I'm <" + me + ">")
    MPI.Finalize()
  }
}
This should be familiar to everyone who has ever used MPI. First, we import everything from the mpj package.
Then, we initialize MPJ Express by calling MPI.Init; the arguments to MPJ Express are passed from the command-line arguments you enter when running the program. The MPI.COMM_WORLD.Rank() function returns the MPJ process's rank. A rank is a unique identifier used to distinguish processes from one another. Ranks are used when you want different processes to do different things. A common pattern is to use the process with rank 0 as the master process and the processes with other ranks as workers. Then, you can use the process's rank to decide what action to take in the program. We also determine how many MPJ processes were launched by checking MPI.COMM_WORLD.Size. Our program will simply print each process's rank for now.
We will want to run it. If you don't have a distributed computing cluster readily available, don't worry. You can test your programs locally on your desktop or laptop. The same program will work without changes on clusters as well. To run programs written using MPJ Express, you have to use the mpjrun.sh script. This script will be available to you if you have added the bin folder of the MPJ Express archive to your PATH as described in the section on installing MPJ Express. The mpjrun.sh script will set up the environment for your MPJ Express processes and start said processes.
The mpjrun.sh script takes a .jar file, so we need to create one. Unfortunately for us, this cannot easily be done using the sbt package command in the directory containing our program. This worked previously, because we used the Scala runtime to execute our programs. MPJ Express uses Java. The problem is that the .jar package created with sbt package does not include Scala's standard library. We need what is called a fat .jar, that is, one that contains all the dependencies within itself. One way of generating it is to use a plugin for SBT called sbt-assembly. The website for this plugin is given here: https://github.com/sbt/sbt-assembly
There is a simple way of adding the plugin for use in our project. Remember that project/plugins.sbt file we created? All you need to do is add the following line to it (the line may be different for different versions of the plugin; consult the website):
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")
Now, add the following to the build.sbt file you created:
lazy val root = (project in file(".")).
  settings(
    name := "mpjtest",
    version := "1.0",
    scalaVersion := "2.11.7"
  )
Then, execute the sbt assembly command from the shell to build the .jar file. The file will be put under the following directory if you are using the preceding build.sbt file. That is, if the folder you put the program and build.sbt in is /home/you/cluster:
/home/you/cluster/target/scala-2.11/mpjtest-assembly-1.0.jar
Now, you can run the mpjtest-assembly-1.0.jar file as follows:
$ mpjrun.sh -np 4 -jar target/scala-2.11/mpjtest-assembly-1.0.jar
MPJ Express (0.44) is started in the multicore configuration
Hello, World, I'm <0>
Hello, World, I'm <2>
Hello, World, I'm <3>
Hello, World, I'm <1>
The argument -np specifies how many processes to run. Since we specified -np 4, four processes will be started by the script. The order of the "Hello, World" messages can differ on your system, since the precise order of execution of different processes is undetermined. If you got output similar to the one shown here, then congratulations, you have done the majority of the work needed to write and deploy applications using MPJ Express.
Using Send and Recv
MPJ Express processes can communicate using Send and Recv. These methods constitute arguably the simplest and easiest to understand mode of operation, which is also probably the most error prone. We will look at these two first. The following are the signatures for the Send and Recv methods:
public void Send(java.lang.Object buf, int offset, int count, Datatype datatype, int dest, int tag) throws MPIException
public Status Recv(java.lang.Object buf, int offset, int count, Datatype datatype, int source, int tag) throws MPIException
Both of these calls are blocking. This means that after calling Send, your process will block (will not execute the instructions following it) until a corresponding Recv is called by another process. Likewise, Recv will block the process until a corresponding Send happens. By corresponding, we mean that the dest and source arguments of the calls have the values corresponding to the receiver's and sender's ranks, respectively. These two calls are enough to implement many complicated communication patterns. However, they are prone to various problems, such as deadlocks. Also, they are quite difficult to debug, since you have to make sure that each Send has the correct corresponding Recv and vice versa. The parameters for Send and Recv are basically the same. The meanings of those parameters are summarized in the following table:
- buf (java.lang.Object): It has to be a one-dimensional Java array. When using it from Scala, use a Scala array, which is a one-to-one mapping to a Java array.
- offset (int): The start of the data you want to pass, measured from the start of the array.
- count (int): The number of items of the array you want to pass.
- datatype (Datatype): The type of data in the array. Can be one of the following: MPI.BYTE, MPI.CHAR, MPI.SHORT, MPI.BOOLEAN, MPI.INT, MPI.LONG, MPI.FLOAT, MPI.DOUBLE, MPI.OBJECT, MPI.LB, MPI.UB, and MPI.PACKED.
- dest/source (int): Either the destination to send the message to or the source to get the message from. You use the rank of the process to identify sources and destinations.
- tag (int): Used to tag the message. Can be used to introduce different message types. Can be ignored for most common applications.
Let's look at a simple program using these calls for communication. We will implement a simple master/worker communication pattern:
import mpi._
import scala.util.Random

object MPJTest {
  def main(args: Array[String]) {
    MPI.Init(args)
    val me: Int = MPI.COMM_WORLD.Rank()
    val size: Int = MPI.COMM_WORLD.Size()
    if (me == 0) {
Here, we use an if statement to identify who we are based on our rank. Since each process gets a unique rank, this allows us to determine what action should be taken. In our case, we assigned the role of the master to the process with rank 0 and the role of a worker to processes with other ranks:
      for (i <- 1 until size) {
        val buf = Array(Random.nextInt(100))
        MPI.COMM_WORLD.Send(buf, 0, 1, MPI.INT, i, 0)
        println("MASTER: Dear <" + i + "> please do work on " + buf(0))
      }
We iterate over the workers, whose ranks run from 1 up to (but not including) the number of processes you passed to the mpjrun.sh script. Let's say that number is four. This gives us one master process and three worker processes. So, each process with a rank from 1 to 3 will get a randomly generated number. We have to put that number in an array even though it is a single number. This is because both the Send and Recv methods expect an array as their first argument. We then use the Send method to send the data.
We specified the array as argument buf, an offset of 0, a size of 1, type MPI.INT, the destination as the for loop index, and the tag as 0. This means that each of our three worker processes will receive a (most probably) different number:
      for (i <- 1 until size) {
        val buf = Array(0)
        MPI.COMM_WORLD.Recv(buf, 0, 1, MPI.INT, i, 0)
        println("MASTER: Dear <" + i + "> thanks for the reply, which was " + buf(0))
      }
Finally, we collect the results from the workers. For this, we iterate over the worker ranks and use the Recv method on each one of them. We print the result we got from the worker, and this concludes the master's part. We now move on to the workers:
    } else {
      val buf = Array(0)
      MPI.COMM_WORLD.Recv(buf, 0, 1, MPI.INT, 0, 0)
      println("<" + me + ">: " + "Understood, doing work on " + buf(0))
      buf(0) = buf(0) * buf(0)
      MPI.COMM_WORLD.Send(buf, 0, 1, MPI.INT, 0, 0)
      println("<" + me + ">: " + "Reporting back")
    }
The workers' code is identical for all of them. They receive a message from the master, calculate the square of it, and send it back:
    MPI.Finalize()
  }
}
After you run the program, the results should be akin to the following, which I got when running this program on my system:
MASTER: Dear <1> please do work on 71
MASTER: Dear <2> please do work on 12
MASTER: Dear <3> please do work on 55
<1>: Understood, doing work on 71
<1>: Reported back
MASTER: Dear <1> thanks for the reply, which was 5041
<3>: Understood, doing work on 55
<2>: Understood, doing work on 12
<2>: Reported back
MASTER: Dear <2> thanks for the reply, which was 144
MASTER: Dear <3> thanks for the reply, which was 3025
<3>: Reported back
Sending Scala objects in MPJ Express messages
Sometimes, the types provided by MPJ Express for use in the Send and Recv methods are not enough. You may want to send your MPJ Express processes a Scala object. A very realistic example of this would be to send an instance of a Scala case class. These can be used to construct more complicated data types consisting of several different basic types. A simple example is a two-dimensional vector consisting of x and y coordinates. This can be sent as a simple array, but more complicated classes can't. For example, you may want to use a case class such as the one shown here. It has two attributes of type String and one attribute of type Int. So what do we do with a data type like this? The simplest answer to that problem is to serialize it. Serializing converts an object to a stream of characters or a string that can be sent over the network (or stored to a file, among other things) and later deserialized to get the original object back:
scala> case class Person(name: String, surname: String, age: Int)
defined class Person
scala> val a = Person("Name", "Surname", 25)
a: Person = Person(Name,Surname,25)
A simple way of serializing is to use a format such as XML or JSON. This can be done automatically using a pickling library. Pickling is a term that comes from the Python programming language. It is the automatic conversion of an arbitrary object into a string representation that can later be de-converted to get the original object back. The reconstructed object will behave the same way as it did before conversion. This allows one to store arbitrary objects to files, for example. There is a pickling library available for Scala as well. You can of course do serialization in several different ways (for example, using the powerful support for XML available in Scala).
We will use the pickling library that is available from the following website for this example: https://github.com/scala/pickling
You can install it by adding the following line to your build.sbt file:
libraryDependencies += "org.scala-lang.modules" %% "scala-pickling" % "0.10.1"
After doing that, use the following import statements to enable easy pickling in your projects:
scala> import scala.pickling.Defaults._
import scala.pickling.Defaults._
scala> import scala.pickling.json._
import scala.pickling.json._
Here, you can see how you can then easily use this library to pickle and unpickle arbitrary objects without the use of annoying boilerplate code:
scala> val pklA = a.pickle
pklA: pickling.json.pickleFormat.PickleType = JSONPickle({
  "$type": "Person",
  "name": "Name",
  "surname": "Surname",
  "age": 25
})
scala> val unpklA = pklA.unpickle[Person]
unpklA: Person = Person(Name,Surname,25)
Let's see how this would work in an application using MPJ Express for message passing. A program using pickling to send a case class instance in a message is given here:
import mpi._
import scala.pickling.Defaults._
import scala.pickling.json._

case class ArbitraryObject(a: Array[Double], b: Array[Int], c: String)
Here, we have chosen to define a fairly complex case class, consisting of two arrays of different types and a string:
object MPJTest {
  def main(args: Array[String]) {
    MPI.Init(args)
    val me: Int = MPI.COMM_WORLD.Rank()
    val size: Int = MPI.COMM_WORLD.Size()
    if (me == 0) {
      val obj = ArbitraryObject(Array(1.0, 2.0, 3.0), Array(1, 2, 3), "Hello")
      val pkl = obj.pickle.value.toCharArray
      MPI.COMM_WORLD.Send(pkl, 0, pkl.size, MPI.CHAR, 1, 0)
In the preceding bit of code, we create an instance of our case class. We then pickle it to JSON and get the string representation of said JSON with the value method. However, to send it in an MPJ message, we need to convert it to a one-dimensional array of one of the supported types. Since it is a string, we convert it to a char array. This is done using the toCharArray method:
    } else if (me == 1) {
      val buf = new Array[Char](1000)
      MPI.COMM_WORLD.Recv(buf, 0, 1000, MPI.CHAR, 0, 0)
      val msg = buf.mkString
      val obj = msg.unpickle[ArbitraryObject]
On the receiving end, we get the raw char array, convert it back to a string using the mkString method, and then unpickle it using unpickle[T]. This will return an instance of the case class that we can use as any other instance of a case class. It is, in its functionality, the same object that was sent to us:
      println(msg)
      println(obj.c)
    }
    MPI.Finalize()
  }
}
The following is the result of running the preceding program. It prints out the JSON representation of our object, and also shows that we can access the attributes of said object by printing the c attribute:
MPJ Express (0.44) is started in the multicore configuration
{
  "$type": "ArbitraryObject",
  "a": [ 1.0, 2.0, 3.0 ],
  "b": [ 1, 2, 3 ],
  "c": "Hello"
}
Hello
You can use this method to send arbitrary objects in an MPJ Express message. However, this is just one of many ways of doing this. As mentioned previously, an example of another way is to use the XML representation. XML support is strong in Scala, and you can use it to serialize objects as well. This will usually require you to add some boilerplate code to your program to serialize to XML. The method discussed earlier has the advantage of requiring no boilerplate code (a small sketch of what such hand-rolled XML code might look like follows).
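To give a flavor of that boilerplate, here is a small sketch of hand-rolled XML serialization for the earlier Person case class; it is only an illustration of ours, not code from the book, and it assumes the scala-xml module is available to your project:
import scala.xml._

case class Person(name: String, surname: String, age: Int)

// Hand-written converters: this is the boilerplate that the pickling library saves us from
def toXml(p: Person): Elem =
  <person><name>{p.name}</name><surname>{p.surname}</surname><age>{p.age}</age></person>

def fromXml(node: Node): Person =
  Person((node \ "name").text, (node \ "surname").text, (node \ "age").text.toInt)

// The XML string could be sent as a char array, just like the JSON pickle above
val msg = toXml(Person("Name", "Surname", 25)).toString
val back = fromXml(XML.loadString(msg))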
Non-blocking communication
So far, we have examined only blocking (or synchronous) communication between two processes. This means that the process is blocked (its execution is halted) until the Send or Recv method has completed successfully. This is simple to understand and enough for most cases. The problem with synchronous communication is that you have to be very careful, otherwise deadlocks may occur. Deadlocks are situations in which processes wait for each other to release a resource first; a Mexican standoff and the dining philosophers problem are famous examples of deadlock. The point is that if you are unlucky, you may end up with a program that is seemingly stuck and you don't know why. Using non-blocking communication allows you to avoid these problems most of the time. If you think you may be at risk of deadlocks, you will probably want to use it. The signatures for the primary methods used in asynchronous communication are given here:
Request Isend(java.lang.Object buf, int offset, int count, Datatype datatype, int dest, int tag)
Isend works similarly to its Send counterpart. The main differences are that it does not block (the program continues execution after the call rather than waiting for a corresponding receive), and that it returns a Request object. This object is used to check the status of your send request, block until it is complete if required, and so on:
Request Irecv(java.lang.Object buf, int offset, int count, Datatype datatype, int src, int tag)
Irecv is again the same as Recv, only non-blocking, and returns a Request object used to handle your receive request. The operation of these methods can be seen in action in the following example:
import mpi._

object MPJTest {
  def main(args: Array[String]) {
    MPI.Init(args)
    val me: Int = MPI.COMM_WORLD.Rank()
    val size: Int = MPI.COMM_WORLD.Size()
    if (me == 0) {
      val requests = for (i <- 0 until 10) yield {
        val buf = Array(i * i)
        MPI.COMM_WORLD.Isend(buf, 0, 1, MPI.INT, 1, 0)
      }
    } else if (me == 1) {
      for (i <- 0 until 10) {
        Thread.sleep(1000)
        val buf = Array[Int](0)
        val request = MPI.COMM_WORLD.Irecv(buf, 0, 1, MPI.INT, 0, 0)
        request.Wait()
        println("RECEIVED: " + buf(0))
      }
    }
    MPI.Finalize()
  }
}
This is a very simplistic example used simply to demonstrate the basics of using the asynchronous message passing methods. First, the process with rank 0 will send 10 messages to the process with rank 1 using Isend. Since Isend does not block, the loop will finish quickly, and the messages it sent will be buffered until they are retrieved using Irecv. The second process (the one with rank 1) will wait for one second before retrieving each message. This is to demonstrate the asynchronous nature of these methods. The messages sit in the buffer waiting to be retrieved. Therefore, Irecv can be used at your leisure, when convenient. The Wait() method of the Request object it returns has to be used to retrieve the results. The Wait() method blocks until the message is successfully received from the buffer.
Summary
Extremely computationally intensive programs are usually parallelized and run on supercomputing clusters. These clusters consist of multiple networked computers. Communication between these computers is usually done using messaging libraries such as MPI. These allow you to pass data between processes running on different machines in an efficient manner. In this article, you have learned how to use MPJ Express, an MPI-like library for the JVM. We saw how to carry out process-to-process communication as well as collective communication. The most important MPJ Express primitives were covered, and example programs using them were given.
Resources for Article: Further resources on this subject: Differences in style between Java and Scala code[article] Getting Started with JavaFX[article] Integrating Scala, Groovy, and Flex Development with Apache Maven[article]

Creating Graphs and Charts

Packt
12 Apr 2016
17 min read
In this article by Bhushan Purushottam Joshi author of the book Canvas Cookbook, highlights data representation in the form of graphs and charts with the following topics: Drawing the axes Drawing a simple equation Drawing a sinusoidal wave Drawing a line graph Drawing a bar graph Drawing a pie chart (For more resources related to this topic, see here.) Drawing the axes In school days, we all might have used a graph paper and drawn a vertical line called y axis and a horizontal line called as x axis. Here, in the first recipe of ours, we do only the drawing of axes. Also, we mark the points at equal intervals. The output looks like this: How to do it… The HTML code is as follows: <html> <head> <title>Axes</title> <script src="graphaxes.js"></script> </head> <body onload=init()> <canvas width="600" height="600" id="MyCanvasArea" style="border:2px solid blue;" tabindex="0"> Canvas tag is not supported by your browser </canvas> <br> <form id="myform"> Select your starting value <select name="startvalue" onclick="init()"> <option value=-10>-10</option> <option value=-9>-9</option> <option value=-8>-8</option> <option value=-7>-7</option> <option value=-6>-6</option> <option value=-5>-5</option> <option value=-4>-4</option> <option value=-3>-3</option> <option value=-2>-2</option> </select> </form> </body> </html> The JavaScript code is as follows: varxMin=-10;varyMin=-10;varxMax=10;varyMax=10; //draw the x-axis varcan;varctx;varxaxisx;varxaxisy;varyaxisx;varyaxisy; varinterval;var length; functioninit(){ can=document.getElementById('MyCanvasArea'); ctx=can.getContext('2d'); ctx.clearRect(0,0,can.width,can.height); varsel=document.forms['myform'].elements['startvalue']; xMin=sel.value; yMin=xMin; xMax=-xMin; yMax=-xMin; drawXAxis(); drawYAxis(); } functiondrawXAxis(){ //x axis drawing and marking on the same xaxisx=10; xaxisy=can.height/2; ctx.beginPath(); ctx.lineWidth=2; ctx.strokeStyle="black"; ctx.moveTo(xaxisx,xaxisy); xaxisx=can.width-10; ctx.lineTo(xaxisx,xaxisy); ctx.stroke(); ctx.closePath(); length=xaxisx-10; noofxfragments=xMax-xMin; interval=length/noofxfragments; //mark the x-axis xaxisx=10; ctx.beginPath(); ctx.font="bold 10pt Arial"; for(vari=xMin;i<=xMax;i++) { ctx.lineWidth=0.15; ctx.strokeStyle="grey"; ctx.fillText(i,xaxisx-5,xaxisy-10); ctx.moveTo(xaxisx,xaxisy-(can.width/2)); ctx.lineTo(xaxisx,(xaxisy+(can.width/2))); ctx.stroke(); xaxisx=Math.round(xaxisx+interval); } ctx.closePath(); } functiondrawYAxis(){ yaxisx=can.width/2; yaxisy=can.height-10; ctx.beginPath(); ctx.lineWidth=2; ctx.strokeStyle="black"; ctx.moveTo(yaxisx,yaxisy); yaxisy=10 ctx.lineTo(yaxisx,yaxisy); ctx.stroke(); ctx.closePath(); yaxisy=can.height-10; length=yaxisy-10; noofxfragments=yMax-yMin; interval=length/noofxfragments; //mark the y-axis ctx.beginPath(); ctx.font="bold 10pt Arial"; for(vari=yMin;i<=yMax;i++) { ctx.lineWidth=0.15; ctx.strokeStyle="grey"; ctx.fillText(i,yaxisx-20,yaxisy+5); ctx.moveTo(yaxisx-(can.height/2),yaxisy); ctx.lineTo((yaxisx+(can.height/2)),yaxisy); ctx.stroke(); yaxisy=Math.round(yaxisy-interval); } ctx.closePath(); } How it works... There are two functions in the JavaScript code viz. drawXAxis and drawYAxis. A canvas is not calibrated the way a graph paper is. A simple calculation is used to do the same. In both the functions, there are two parts. One part draws the axis and the second marks the axis on regular intervals. These are delimited by ctx.beginPath() and ctx.closePath(). In the first part, the canvas width and height are used to draw the axis. 
In the second part, we do some calculation. The length of the axis is divided by the number of markers to get the interval. If the starting point is -3, then we have -3, -2, -1, 0, 1, 2, and 3 on the axis, which makes 7 marks and 6 parts. The interval is used to generate x and y coordinate value for the starting point and plot the markers. There is more... Try to replace the following: ctx.moveTo(xaxisx,xaxisy-(can.width/2)); (in drawXAxis()) ctx.lineTo(xaxisx,(xaxisy+(can.width/2)));(in drawXAxis()) ctx.moveTo(yaxisx-(can.height/2),yaxisy);(in drawYAxis()) ctx.lineTo((yaxisx+(can.height/2)),yaxisy);(in drawYAxis()) WITH ctx.moveTo(xaxisx,xaxisy-5); ctx.lineTo(xaxisx,(xaxisy+5)); ctx.moveTo(yaxisx-5,yaxisy); ctx.lineTo((yaxisx+5),yaxisy); Also, instead of grey color for markers, you can use red. Drawing a simple equation This recipe is a simple line drawing on a graph using an equation. The output looks like this: How to do it… The HTML code is as follows: <html> <head> <title>Equation</title> <script src="graphaxes.js"></script> <script src="plotequation.js"></script> </head> <body onload=init()> <canvas width="600" height="600" id="MyCanvasArea" style="border:2px solid blue;" tabindex="0"> Canvas tag is not supported by your browser </canvas> <br> <form id="myform"> Select your starting value <select name="startvalue" onclick="init()"> <option value=-10>-10</option> <option value=-9>-9</option> <option value=-8>-8</option> <option value=-7>-7</option> <option value=-6>-6</option> <option value=-5>-5</option> <option value=-4>-4</option> <option value=-3>-3</option> <option value=-2>-2</option> </select> <br> Enter the coeficient(c) for the equation y=cx <input type="text" size=5 name="coef"> <input type="button" value="Click to plot" onclick="plotEquation()"> <input type="button" value="Reset" onclick="init()"> </form> </body> </html> The JavaScript code is as follows: functionplotEquation(){ varcoef=document.forms['myform'].elements['coef']; var s=document.forms['myform'].elements['startvalue']; var c=coef.value; var x=parseInt(s.value); varxPos; varyPos; while(x<=xMax) { y=c*x; xZero=can.width/2; yZero=can.height/2; if(x!=0) xPos=xZero+x*interval; else xPos=xZero-x*interval; if(y!=0) yPos=yZero-y*interval; else yPos=yZero+y*interval; ctx.beginPath(); ctx.fillStyle="blue"; ctx.arc(xPos,yPos,5,Math.PI/180,360*Math.PI/180,false); ctx.fill(); ctx.closePath(); if(x<xMax) { ctx.beginPath(); ctx.lineWidth=3; ctx.strokeStyle="green"; ctx.moveTo(xPos,yPos); nextX=x+1; nextY=c*nextX; if(nextX!=0) nextXPos=xZero+nextX*interval; else nextXPos=xZero-nextX*interval; if(nextY!=0) nextYPos=yZero-nextY*interval; else nextYPos=yZero+nextY*interval; ctx.lineTo(nextXPos,nextYPos); ctx.stroke(); ctx.closePath(); } x=x+1; } } How it works... We use one more script in this recipe. There are two scripts referred by the HTML file. One is the previous recipe named graphaxes.js, and the other one is the current one named plotequation.js. JavaScript allows you to use the variables created in one file into the other, and this is done in this new recipe. You already know how the axes are drawn. This recipe is to plot an equation y=cx, where c is the coefficient entered by the user. We take the minimum of the x value from the drop-down list and calculate the values for y in a loop. We plot the current and next coordinate and draw a line between the two. This happens till we reach the maximum value of x. Remember that the maximum and minimum value of x and y is same. There is more... 
Try the following: Input positive as well as negative value for coefficient. Drawing a sinusoidal wave This recipe also uses the previous recipe of axes drawing. The output looks like this: How to do it… The HTML code is as follows: <html> <head> <title>Equation</title> <script src="graphaxes.js"></script> <script src="plotSineEquation.js"></script> </head> <body onload=init()> <canvas width="600" height="600" id="MyCanvasArea" style="border:2px solid blue;" tabindex="0"> Canvas tag is not supported by your browser </canvas> <br> <form id="myform"> Select your starting value <select name="startvalue" onclick="init()"> <option value=-10>-10</option> <option value=-9>-9</option> <option value=-8>-8</option> <option value=-7>-7</option> <option value=-6>-6</option> <option value=-5>-5</option> <option value=-4>-4</option> <option value=-3>-3</option> <option value=-2>-2</option> </select> <br> <input type="button" value="Click to plot a sine wave" onclick="plotEquation()"> <input type="button" value="Reset" onclick="init()"> </form> </body> </html> The JavaScript code is as follows: functionplotEquation() { var s=document.forms['myform'].elements['startvalue']; var x=parseInt(s.value); //ctx.fillText(x,100,100); varxPos; varyPos; varnoofintervals=Math.round((2*Math.abs(x)+1)/2); xPos=10; yPos=can.height/2; xEnd=xPos+(2*interval); yEnd=yPos; xCtrl1=xPos+Math.ceil(interval/2); yCtrl1=yPos-200; xCtrl2=xEnd-Math.ceil(interval/2); yCtrl2=yPos+200; drawBezierCurve(ctx,xPos,yPos,xCtrl1,yCtrl1,xCtrl2,yCtrl2,xEnd,yEnd,"red",2); for(vari=1;i<noofintervals;i++) { xPos=xEnd; xEnd=xPos+(2*interval); xCtrl1=xPos+Math.floor(interval/2)+15; xCtrl2=xEnd-Math.floor(interval/2)-15; drawBezierCurve(ctx,xPos,yPos,xCtrl1,yCtrl1,xCtrl2,yCtrl2,xEnd,yEnd,"red",2); } } function drawBezierCurve(ctx,xstart,ystart,xctrl1,yctrl1,xctrl2,yctrl2,xend,yend,color,width) { ctx.strokeStyle=color; ctx.lineWidth=width; ctx.beginPath(); ctx.moveTo(xstart,ystart); ctx.bezierCurveTo(xctrl1,yctrl1,xctrl2,yctrl2,xend,yend); ctx.stroke(); } How it works... We use the Bezier curve to draw the sine wave along the x axis. A bit of calculation using the interval between two points, which encompasses a phase, is done to achieve this. The number of intervals is calculated in the following statement: varnoofintervals=Math.round((2*Math.abs(x)+1)/2); where x is the value in the drop-down list. One phase is initially drawn before the for loop begins. The subsequent phases are drawn in the for loop. The start and end x coordinate changes in every iteration. The ending coordinate for the first sine wave is the first coordinate for the subsequent sine wave. Drawing a line graph Graphs are always informative. 
The basic graphical representation can be a line graph, which is demonstrated here: How to do it… The HTML code is as follows: <html> <head> <title>A simple Line chart</title> <script src="linechart.js"></script> </head> <body onload=init()> <h1>Your WhatsApp Usage</h1> <canvas width="600" height="500" id="MyCanvasArea" style="border:2px solid blue;" tabindex="0"> Canvas tag is not supported by your browser </canvas> </body> </html> The JavaScript code is as follows: functioninit() { vargCanvas = document.getElementById('MyCanvasArea'); // Ensure that the element is available within the DOM varctx = gCanvas.getContext('2d'); // Bar chart data var data = new Array(7); data[0] = "1,130"; data[1] = "2,140"; data[2] = "3,150"; data[3] = "4,140"; data[4] = "5,180"; data[5] = "6,240"; data[6] = "7,340"; // Draw the bar chart drawLineGraph(ctx, data, 70, 100, (gCanvas.height - 40), 50); } functiondrawLineGraph(ctx, data, startX, barWidth, chartHeight, markDataIncrementsIn) { // Draw the x axis ctx.lineWidth = "3.0"; var max=0; varstartY = chartHeight; drawLine(ctx, startX, startY, startX, 1); drawLine(ctx, startX, startY, 490, startY); for(vari=0,m=0;i<data.length;i++,m+=60) { ctx.lineWidth=0.3; drawLine(ctx,startX,startY-m,490,startY-m) ctx.font="bold 12pt Arial"; ctx.fillText(m,startX-30,startY-m); } for(vari=0,m=0;i<data.length;i++,m+=61) { ctx.lineWidth=0.3; drawLine(ctx, startX+m, startY, startX+m, 1); var values=data[i].split(","); var day; switch(values[0]) { case "1": day="MO"; break; case "2": day="TU"; break; case "3": day="WE"; break; case "4": day="TH"; break; case "5": day="FR"; break; case "6": day="SA"; break; case "7": day="SU"; break; } ctx.fillText(day,startX+m-10, startY+20); } //plot the points and draw lines between them varstartAngle = 0 * (Math.PI/180); varendAngle = 360 * (Math.PI/180); varnewValues; for(vari=0,m=0;i<data.length;i++,m+=60) { ctx.beginPath(); var values=data[i].split(","); varxPos=startX+parseInt(values[0])+m; varyPos=chartHeight-parseInt(values[1]); ctx.arc(xPos, yPos, 5, startAngle,endAngle, false); ctx.fillStyle="red"; ctx.fill(); ctx.fillStyle="blue"; ctx.fillText(values[1],xPos, yPos); ctx.stroke(); ctx.closePath(); if(i>0){ ctx.strokeStyle="green"; ctx.lineWidth=1.5; ctx.moveTo(oldxPos,oldyPos); ctx.lineTo(xPos,yPos); ctx.stroke(); } oldxPos=xPos; oldyPos=yPos; } } functiondrawLine(ctx, startx, starty, endx, endy) { ctx.beginPath(); ctx.moveTo(startx, starty); ctx.lineTo(endx, endy); ctx.closePath(); ctx.stroke(); } How it works... All the graphs in the subsequent recipes also work on an array named data. The array element has two parts: one indicates the day and the second indicates the usage in minutes. A split function down the code splits the element into two independent elements. The coordinates are calculated using a parameter named m, which is used in calculating the value of the x coordinate. The value in minutes and the chart height is used to calculate the position of y coordinate. Inside the loop, there are two coordinates, which are used to draw a line. One in the moveTo() method and the other in the lineTo() method. However, the coordinates oldxPos and oldyPos are not calculated in the first iteration, for the simple reason that we cannot draw a line with a single coordinate. Next iteration onwards, we have two coordinates and then the line is drawn between the prior and current coordinates. There is more... Use your own data Drawing a bar graph Another typical representation, which is widely used, is the bar graph. 
Here is an output of this recipe: How to do it… The HTML code is as follows: <html> <head> <title>A simple Bar chart</title> <script src="bargraph.js"></script> </head> <body onload=init()> <h1>Your WhatsApp Usage</h1> <canvas width="600" height="500" id="MyCanvasArea" style="border:2px solid blue;" tabindex="0"> Canvas tag is not supported by your browser </canvas> </body> </html> The JavaScript code is as follows: functioninit(){ vargCanvas = document.getElementById('MyCanvasArea'); // Ensure that the element is available within the DOM varctx = gCanvas.getContext('2d'); // Bar chart data var data = new Array(7); data[0] = "MON,130"; data[1] = "TUE,140"; data[2] = "WED,150"; data[3] = "THU,140"; data[4] = "FRI,170"; data[5] = "SAT,250"; data[6] = "SUN,340"; // Draw the bar chart drawBarChart(ctx, data, 70, 100, (gCanvas.height - 40), 50); } functiondrawBarChart(ctx, data, startX, barWidth, chartHeight, markDataIncrementsIn) { // Draw the x and y axes ctx.lineWidth = "3.0"; varstartY = chartHeight; //drawLine(ctx, startX, startY, startX, 30); drawBarGraph(ctx, startX, startY, startX, 30,data,chartHeight); drawLine(ctx, startX, startY, 570, startY); } functiondrawLine(ctx, startx, starty, endx, endy) { ctx.beginPath(); ctx.moveTo(startx, starty); ctx.lineTo(endx, endy); ctx.closePath(); ctx.stroke(); } functiondrawBarGraph(ctx, startx, starty, endx, endy,data,chartHeight) { ctx.beginPath(); ctx.moveTo(startx, starty); ctx.lineTo(endx, endy); ctx.closePath(); ctx.stroke(); var max=0; //code to label x-axis for(i=0;i<data.length;i++) { varxValues=data[i].split(","); varxName=xValues[0]; ctx.textAlign="left"; ctx.fillStyle="#b90000"; ctx.font="bold 15px Arial"; ctx.fillText(xName,startx+i*50+i*20,chartHeight+15,200); var height=parseInt(xValues[1]); if(parseInt(height)>parseInt(max)) max=height; varcolor='#'+Math.floor(Math.random()*16777215).toString(16); drawBar(ctx,startx+i*50+i*20,(chartHeight-height),height,50,color); ctx.fillText(Math.round(height/60)+" hrs",startx+i*50+i*20,(chartHeight-height-20),200); } //title the x-axis ctx.beginPath(); ctx.fillStyle="black"; ctx.font="bolder 20pt Arial"; ctx.fillText("<------------Weekdays------------>",startx+150,chartHeight+35,200); ctx.closePath(); //y-axis labelling varylabels=Math.ceil(max/60); varyvalue=0; ctx.font="bold 15pt Arial"; for(i=0;i<=ylabels;i++) { ctx.textAlign="right"; ctx.fillText(yvalue,startx-5,(chartHeight-yvalue),50); yvalue+=60; } //title the y-axis ctx.beginPath(); ctx.font = 'bolder 20pt Arial'; ctx.save(); ctx.translate(20,70); ctx.rotate(-0.5*Math.PI); varrText = 'Rotated Text'; ctx.fillText("<--------Time in minutes--------->" , 0, 0); ctx.closePath(); ctx.restore(); } functiondrawBar(ctx,xPos,yPos,height,width,color){ ctx.beginPath(); ctx.fillStyle=color; ctx.rect(xPos,yPos,width,height); ctx.closePath(); ctx.stroke(); ctx.fill(); } How it works... The processing is similar to that of a line graph, except that here there are rectangles drawn, which represent bars. Also, the number 1, 2, 3… are represented as day of the week (for example, 1 means Monday). This line in the code: varcolor='#'+Math.floor(Math.random()*16777215).toString(16); is used to generate random colors for the bars. The number 16777215 is a decimal value for #FFFFF. Note that the value of the control variable i is not directly used for drawing the bar. Rather i is manipulated to get the correct coordinates on the canvas and then the bar is drawn using the drawBar() function. 
drawBar(ctx,startx+i*50+i*20,(chartHeight-height),height,50,color); There is more... Use your own data and change the colors. Drawing a pie chart A share can be easily represented in form of a pie chart. This recipe demonstrates a pie chart: How to do it… The HTML code is as follows: <html> <head> <title>A simple Pie chart</title> <script src="piechart.js"></script> </head> <body onload=init()> <h1>Your WhatsApp Usage</h1> <canvas width="600" height="500" id="MyCanvasArea" style="border:2px solid blue;" tabindex="0"> Canvas tag is not supported by your browser </canvas> </body> </html> The JavaScript code is as follows: functioninit() { var can = document.getElementById('MyCanvasArea'); varctx = can.getContext('2d'); var data = [130,140,150,140,170,250,340]; varcolors = ["crimson", "blue", "yellow", "navy", "aqua", "purple","red"]; var names=["MON","TUE","WED","THU","FRI","SAT","SUN"]; varcenterX=can.width/2; varcenterY=can.height/2; //varcenter = [can.width/2,can.height / 2]; var radius = (Math.min(can.width,can.height) / 2)-50; varstartAngle=0, total=0; for(vari in data) { total += data[i]; } varincrFactor=-(centerX-centerX/2); var angle=0; for (vari = 0; i<data.length; i++){ ctx.fillStyle = colors[i]; ctx.beginPath(); ctx.moveTo(centerX,centerY); ctx.arc(centerX,centerY,radius,startAngle,startAngle+(Math.PI*2*(data[i]/total)),false); ctx.lineTo(centerX,centerY); ctx.rect(centerX+incrFactor,20,20,10); ctx.fill(); ctx.fillStyle="black"; ctx.font="bold 10pt Arial"; ctx.fillText(names[i],centerX+incrFactor,15); ctx.save(); ctx.translate(centerX,centerY); ctx.rotate(startAngle); var dx=Math.floor(can.width*0.5)-100; vardy=Math.floor(can.height*0.20); ctx.fillText(names[i],dx,dy); ctx.restore(); startAngle += Math.PI*2*(data[i]/total); incrFactor+=50; } } How it works... Again the data here is the same, but instead of bars, we use arcs here. The trick is done by changing the end angle as per the data available. Translation and rotation helps in naming the weekdays for the pie chart. There is more... Use your own data and change the colors to get acquainted. Summary Managers make decisions based on the data representations. The data is usually represented in a report form and in the form of graph or charts. The latter representation plays a major role in providing a quick review of the data. In this article, we represent dummy data in the form of graphs and chart. Resources for Article: Further resources on this subject: HTML5 Canvas[article] HTML5: Developing Rich Media Applications using Canvas[article] Building the Untangle Game with Canvas and the Drawing API[article]

Advanced React

Packt
12 Apr 2016
7 min read
In this article by Sven A. Robbestad, author of ReactJS Blueprints, we will cover the following topics: Understanding Webpack Adding Redux to your ReactJS app Understanding Redux reducers, actions, and the store (For more resources related to this topic, see here.) Introduction Understanding the tools you use and the libraries you include in your web app is important to make an efficient web application. In this article, we'll look at some of the difficult parts of modern web development with ReactJS, including Webpack and Redux. Webpack is an important tool for modern web developers. It is a module bundler and works by bundling all modules and files within the context of your base folder. Any file within this context is considered a module and attemptes will be made to bundled it. The only exceptions are files placed in designated vendor folders by default, that are node_modules and web_modules files. Files in these folders are explicitly required in your code to be bundled. Redux is an implementation of the Flux pattern. Flux describes how data should flow through your app. Since the birth of the pattern, there's been an explosion in the number of libraries that attempt to execute on the idea. It's safe to say that while many have enjoyed moderate success, none has been as successful as Redux. Configuring Webpack You can configure Webpack to do almost anything you want, including replacing the current code loaded in your browser with the updated code, while preserving the state of the app. Webpack is configured by writing a special configuration file, usually called webpack.config.js. In this file, you specify the entry and output parameters, plugins, module loaders, and various other configuration parameters. A very basic config file looks like this: var webpack = require('webpack'); module.exports = { entry: [ './entry' ], output: { path: './', filename: 'bundle.js' } }; It's executed by issuing this command from the command line: webpack --config webpack.config.js You can even drop the config parameter, as Webpack will automatically look for the presence of webpack.config.js if not specified. In order to convert the source files before bundling, you use module loaders. Adding this section to the Webpack config file will ensure that the babel-loader module converts JavaScript 2015 code to ECMAScript 5: module: { loaders: [{ test: /.js?$/', loader: 'babel-loader', exclude: /node_modules/, query: { presets: ['es2015','react'] } }] } The first option (required), test, is a regex match that tells Webpack which files these loader operates on. The regex tells Webpack to look for files with a period followed by the letters js and then any optional letters (?) before the end ($). This makes sure that the loader reads both plain JavaScript files and JSX files. The second option (required), loader, is the name of the package that we'll use to convert the code. The third option (optional), exclude, is another regex variable used to explicitly ignore a set of folders or files. The final option (optional), query, contains special configuration options for Babel. The recommended way to do it is actually by setting them in a special file called .babelrc. This file will be picked up automatically by Babel when transpiling files. Adding Redux to your ReactJS app When ReactJS was first introduced to the public in late 2013/early 2014, you would often hear it mentioned together with functional programming. 
However, there's no inherent requirement to write functional code when writing ReactJS code, and JavaScript itself, being a multi-paradigm language, is neither strictly functional nor strictly imperative. Redux chose the functional approach, and it's quickly gaining traction as the superior Flux implementation. There are a number of benefits to choosing a functional approach, which are as follows:

No side effects allowed, that is, the operation is stateless
Always returns the same output for a given input
Ideal for creating recursive operations
Ideal for parallel execution
Easy to establish the single source of truth
Easy to debug
Easy to persist the store state for a faster development cycle
Easy to create functionality such as undo and redo
Easy to inject the store state for server rendering

The concept of stateless operations is possibly the number one benefit, as it makes it very easy to reason about the state of your application. This is, however, not the idiomatic Reflux approach, because Reflux is actually designed to create many stores and have the children listen to changes separately. Application state is the single most difficult part of any application, and every implementation of Flux has attempted to solve this problem. Redux solves it by not actually doing Flux at all; instead, it is an amalgamation of the ideas of Flux and the functional programming language Elm. There are three parts to Redux: actions, reducers, and the global store.

The store

In Redux, there is only one global store. It is an object that holds the state of your entire application. You create a store by passing your root reducing function (or reducer, for short) to a method called createStore. Rather than creating more stores, you use a concept called reducer composition to split data handling logic. You will then need to use a function called combineReducers to create a single root reducer. The createStore function is derived from Redux and is usually called once in the root of your app (or your store file). It is then passed on to your app and then propagated to the app's children. The only way to change the state of the store is to dispatch an action on it. This is not the same as a Flux dispatcher because Redux doesn't have one. You can also subscribe to changes from the store in order to update your components when the store changes state.

Actions

An action is an object that represents an intention to change the state. It must have a type field that indicates what kind of action is being performed. Actions can be defined as constants and imported from other modules. Apart from this requirement, the structure of the object is entirely up to you. A basic action object can look like this:

{
  type: 'UPDATE',
  payload: {
    value: "some value"
  }
}

The payload property is optional and can be an object, as we saw earlier, or any other valid JavaScript type, such as a function or a primitive.

Reducers

A reducer is a function that accepts an accumulation and a value and returns a new accumulation. In other words, it returns the next state based on the previous state and an action. It must be a pure function, free of side effects, and it does not mutate the existing state. For smaller apps, it's okay to start with a single reducer, and as your app grows, you split off smaller reducers that manage specific parts of your state tree. This is what's called reducer composition and is the fundamental pattern of building apps with Redux.
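To make these pieces concrete, here is a minimal sketch that wires a single reducer into a store with combineReducers and createStore, subscribes to changes, and dispatches the UPDATE action shown earlier. The reducer name, initial state, and logged output are illustrative choices of mine, not part of the chapter's example app:

import { createStore, combineReducers } from 'redux';

// A reducer: (previousState, action) -> nextState, with no mutation.
function value(state = '', action) {
  switch (action.type) {
    case 'UPDATE':
      return action.payload.value; // return the next state, never mutate
    default:
      return state;                // unknown actions leave state unchanged
  }
}

// Reducer composition: combine smaller reducers into a single root reducer.
const rootReducer = combineReducers({ value });

// One global store, created once in the root of the app.
const store = createStore(rootReducer);

// Components can subscribe to be notified whenever the state changes.
store.subscribe(() => console.log('state is now', store.getState()));

// The only way to change state is to dispatch an action on the store.
store.dispatch({ type: 'UPDATE', payload: { value: 'some value' } });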
Because reducers are just functions, you can control the order in which they are called, pass additional data, or even make reusable reducers for common tasks such as pagination. It's okay to have multiple reducers. In fact, it's encouraged.

Summary

In this article, you learned about Webpack and how to configure it. You also learned about adding Redux to your ReactJS app. Apart from this, you learned about Redux's reducers, actions, and the store.

Resources for Article: Further resources on this subject: Getting Started with React [article] Reactive Programming and the Flux Architecture [article] Create Your First React Element [article]

Market Basket Analysis

Packt
12 Apr 2016
17 min read
In this article by Boštjan Kaluža, author of the book Machine Learning in Java, we will discuss affinity analysis which is the heart of Market Basket Analysis (MBA). It can discover co-occurrence relationships among activities performed by specific users or groups. In retail, affinity analysis can help you understand the purchasing behavior of customers. These insights can drive revenue through smart cross-selling and upselling strategies and can assist you in developing loyalty programs, sales promotions, and discount plans. In this article, we will look into the following topics: Market basket analysis Association rule learning Other applications in various domains First, we will revise the core association rule learning concepts and algorithms, such as support, lift, Apriori algorithm, and FP-growth algorithm. Next, we will use Weka to perform our first affinity analysis on supermarket dataset and study how to interpret the resulting rules. We will conclude the article by analyzing how association rule learning can be applied in other domains, such as IT Operations Analytics, medicine, and others. (For more resources related to this topic, see here.) Market basket analysis Since the introduction of electronic point of sale, retailers have been collecting an incredible amount of data. To leverage this data in order to produce business value, they first developed a way to consolidate and aggregate the data to understand the basics of the business. What are they selling? How many units are moving? What is the sales amount? Recently, the focus shifted to the lowest level of granularity—the market basket transaction. At this level of detail, the retailers have direct visibility into the market basket of each customer who shopped at their store, understanding not only the quantity of the purchased items in that particular basket, but also how these items were bought in conjunction with each other. This can be used to drive decisions about how to differentiate store assortment and merchandise, as well as effectively combine offers of multiple products, within and across categories, to drive higher sales and profits. These decisions can be implemented across an entire retail chain, by channel, at the local store level, and even for the specific customer with the so-called personalized marketing, where a unique product offering is made for each customer. 
MBA covers a wide variety of analysis:

Item affinity: This defines the likelihood of two (or more) items being purchased together
Identification of driver items: This enables the identification of the items that drive people to the store and always need to be in stock
Trip classification: This analyzes the content of the basket and classifies the shopping trip into a category: weekly grocery trip, special occasion, and so on
Store-to-store comparison: Understanding the number of baskets allows any metric to be divided by the total number of baskets, effectively creating a convenient and easy way to compare stores with different characteristics (units sold per customer, revenue per transaction, number of items per basket, and so on)
Revenue optimization: This helps in determining the magic price points for a store, increasing the size and value of the market basket
Marketing: This helps in identifying more profitable advertising and promotions, targeting offers more precisely in order to improve ROI, generating better loyalty card promotions with longitudinal analysis, and attracting more traffic to the store
Operations optimization: This helps in matching the inventory to the requirement by customizing the store and assortment to trade area demographics, and optimizing store layout

Predictive models help retailers direct the right offer to the right customer segments/profiles, understand what is valid for which customer, predict the probability score of customers responding to an offer, and understand the customer value gained from the offer acceptance.

Affinity analysis

Affinity analysis is used to determine the likelihood that a set of items will be bought together. In retail, there are natural product affinities; for example, it is very typical for people who buy hamburger patties to buy hamburger rolls, along with ketchup, mustard, tomatoes, and other items that make up the burger experience. While some product affinities might seem trivial, others are not very obvious. A classic example is toothpaste and tuna: it seems that people who eat tuna are more prone to brush their teeth right after finishing their meal. So, why is it important for retailers to get a good grasp of product affinities? This information is critical to appropriately plan promotions, as reducing the price of some items may cause a spike in related high-affinity items without the need to further promote those related items. In the following section, we'll look into the algorithms for association rule learning: Apriori and FP-growth.

Association rule learning

Association rule learning has been a popular approach for discovering interesting relations between items in large databases. It is most commonly applied in retail for discovering regularities between products. Association rule learning approaches find patterns as interesting strong rules in the database using different measures of interestingness. For example, the following rule would indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat:

{onions, potatoes} -> {burger}

Another classic story, probably told in every machine learning class, is the beer and diapers story. An analysis of supermarket shoppers' behavior showed that customers, presumably young men, who buy diapers tend also to buy beer.
It immediately became a popular example of how an unexpected association rule might be found from everyday data; however, there are varying opinions as to how much of the story is true. Daniel Powers says (DSS News, 2002):

In 1992, Thomas Blischok, manager of a retail consulting group at Teradata, and his staff prepared an analysis of 1.2 million market baskets from about 25 Osco Drug stores. Database queries were developed to identify affinities. The analysis "did discover that between 5:00 and 7:00 p.m. that consumers bought beer and diapers". Osco managers did NOT exploit the beer and diapers relationship by moving the products closer together on the shelves.

In addition to the preceding example from MBA, association rules are today employed in many application areas, including web usage mining, intrusion detection, continuous production, and bioinformatics. We'll take a closer look at these areas later in this article.

Basic concepts

Before we dive into the algorithms, let's first review the basic concepts.

Database of transactions

First, there is no class value, as this is not required for learning association rules. Next, the dataset is presented as a transactional table, where each supermarket item corresponds to a binary attribute. Hence, the feature vector could be extremely large. Consider the following example. Suppose we have five receipts as shown in the following image. Each receipt corresponds to a purchasing transaction:

To write these receipts in the form of a transactional database, we first identify all the possible items that appear in the receipts. These items are onions, potatoes, burger, beer, and diapers. Each purchase, that is, transaction, is presented in a row, and there is 1 if an item was purchased within the transaction and 0 otherwise, as shown in the following table:

Transaction ID | Onions | Potatoes | Burger | Beer | Diapers
1              | 0      | 1        | 1      | 0    | 0
2              | 1      | 1        | 1      | 1    | 0
3              | 0      | 0        | 0      | 1    | 1
4              | 1      | 0        | 1      | 1    | 0

This example is really small. In practical applications, the dataset often contains thousands or millions of transactions, which allows learning algorithms to discover statistically significant patterns.

Itemset and rule

An itemset is simply a set of items, for example, {onions, potatoes, burger}. A rule consists of two itemsets, X and Y, in the format X -> Y. This indicates a pattern that when the X itemset is observed, Y is also observed. To select interesting rules, various measures of significance can be used.

Support

Support, for an itemset, is defined as the proportion of transactions that contain the itemset. The {potatoes, burger} itemset in the previous table has support supp({potatoes, burger}) = 2/4 = 0.5, as it occurs in 50% of transactions (2 out of 4 transactions). Intuitively, it indicates the share of transactions that support the pattern.

Confidence

Confidence of a rule indicates its accuracy. It is defined as Conf(X -> Y) = supp(X U Y) / supp(X). For example, the {onions, burger} -> {beer} rule has confidence 0.5/0.5 = 1.0 in the previous table, which means that 100% of the times when onions and burger are bought together, beer is bought as well.

Apriori algorithm

The Apriori algorithm is a classic algorithm used for frequent pattern mining and association rule learning over transactional databases. By identifying the frequent individual items in a database and extending them to larger itemsets, Apriori can determine the association rules, which highlight general trends about a database.
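Before looking at how Apriori searches for frequent itemsets, here is a minimal, self-contained Java sketch that recomputes the support and confidence figures derived from the transaction table above. The class and method names are my own illustrative choices, not part of the chapter's Weka-based code:

import java.util.*;

public class SupportConfidenceDemo {

    // The four transactions from the table, written as item sets.
    static List<Set<String>> transactions = Arrays.asList(
        new HashSet<>(Arrays.asList("potatoes", "burger")),
        new HashSet<>(Arrays.asList("onions", "potatoes", "burger", "beer")),
        new HashSet<>(Arrays.asList("beer", "diapers")),
        new HashSet<>(Arrays.asList("onions", "burger", "beer"))
    );

    // supp(itemset) = fraction of transactions that contain every item of the itemset
    static double support(Set<String> itemset) {
        long hits = transactions.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) hits / transactions.size();
    }

    // conf(X -> Y) = supp(X U Y) / supp(X)
    static double confidence(Set<String> x, Set<String> y) {
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return support(union) / support(x);
    }

    public static void main(String[] args) {
        Set<String> potatoesBurger = new HashSet<>(Arrays.asList("potatoes", "burger"));
        Set<String> onionsBurger   = new HashSet<>(Arrays.asList("onions", "burger"));
        Set<String> beer           = new HashSet<>(Collections.singletonList("beer"));

        System.out.println(support(potatoesBurger));        // prints 0.5
        System.out.println(confidence(onionsBurger, beer)); // prints 1.0
    }
}

Running it reproduces supp({potatoes, burger}) = 0.5 and Conf({onions, burger} -> {beer}) = 1.0, matching the hand calculation.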
Apriori algorithm constructs a set of itemsets, for example, itemset1= {Item A, Item B}, and calculates support, which counts the number of occurrences in the database. Apriori then uses a bottom up approach, where frequent itemsets are extended, one item at a time, and it works by eliminating the largest sets as candidates by first looking at the smaller sets and recognizing that a large set cannot be frequent unless all its subsets are. The algorithm terminates when no further successful extensions are found. Although, Apriori algorithm is an important milestone in machine learning, it suffers from a number of inefficiencies and tradeoffs. In the following section, we'll look into a more recent FP-growth technique. FP-growth algorithm FP-growth, where frequent pattern (FP), represents the transaction database as a prefix tree. First, the algorithm counts the occurrence of items in the dataset. In the second pass, it builds a prefix tree, an ordered tree data structure commonly used to store a string. An example of prefix tree based on the previous example is shown in the following diagram: If many transactions share most frequent items, prefix tree provides high compression close to the tree root. Large itemsets are grown directly, instead of generating candidate items and testing them against the entire database. Growth starts at the bottom of the tree, by finding all the itemsets matching minimal support and confidence. Once the recursive process has completed, all large itemsets with minimum coverage have been found and association rule creation begins. FP-growth algorithms have several advantages. First, it constructs an FP-tree, which encodes the original dataset in a substantially compact presentation. Second, it efficiently builds frequent itemsets, leveraging the FP-tree structure and divide-and-conquer strategy. The supermarket dataset The supermarket dataset, located in datasets/chap5/supermarket.arff, describes the shopping habits of supermarket customers. Most of the attributes stand for a particular item group, for example, diary foods, beef, potatoes; or department, for example, department 79, department 81, and so on. The value is t if the customer had bought an item and missing otherwise. There is one instance per customer. The dataset contains no class attribute, as this is not required to learn association rules. A sample of data is shown in the following table: Discover patterns To discover shopping patterns, we will use the two algorithms that we have looked into before, Apriori and FP-growth. Apriori We will use the Apriori algorithm as implemented in Weka. 
It iteratively reduces the minimum support until it finds the required number of rules with the given minimum confidence: import java.io.BufferedReader; import java.io.FileReader; import weka.core.Instances; import weka.associations.Apriori; First, we will load the supermarket dataset: Instances data = new Instances( new BufferedReader( new FileReader("datasets/chap5/supermarket.arff"))); Next, we will initialize an Apriori instance and call the buildAssociations(Instances) function to start frequent pattern mining, as follows: Apriori model = new Apriori(); model.buildAssociations(data); Finally, we can output the discovered itemsets and rules, as shown in the following code: System.out.println(model); The output is as follows: Apriori ======= Minimum support: 0.15 (694 instances) Minimum metric <confidence>: 0.9 Number of cycles performed: 17 Generated sets of large itemsets: Size of set of large itemsets L(1): 44 Size of set of large itemsets L(2): 380 Size of set of large itemsets L(3): 910 Size of set of large itemsets L(4): 633 Size of set of large itemsets L(5): 105 Size of set of large itemsets L(6): 1 Best rules found: 1. biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723 <conf:(0.92)> lift:(1.27) lev:(0.03) [155] conv:(3.35) 2. baking needs=t biscuits=t fruit=t total=high 760 ==> bread and cake=t 696 <conf:(0.92)> lift:(1.27) lev:(0.03) [149] conv:(3.28) 3. baking needs=t frozen foods=t fruit=t total=high 770 ==> bread and cake=t 705 <conf:(0.92)> lift:(1.27) lev:(0.03) [150] conv:(3.27) ... The algorithm outputs ten best rules according to confidence. Let's look the first rule and interpret the output, as follows: biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723 <conf:(0.92)> lift:(1.27) lev:(0.03) [155] conv:(3.35) It says that when biscuits, frozen foods, and fruits are bought together and the total purchase price is high, it is also very likely that bread and cake are purchased as well. The {biscuits, frozen foods, fruit, total high} itemset appears in 778 transactions, while the {bread, cake} itemset appears in 723 transactions. The confidence of this rule is 0.92, meaning that the rule holds true in 92% of transactions where the {biscuits, frozen foods, fruit, total high} itemset is present. The output also reports additional measures such as lift, leverage, and conviction, which estimate the accuracy against our initial assumptions, for example, the 3.35 conviction value indicates that the rule would be incorrect 3.35 times as often if the association was purely a random chance. Lift measures the number of times X and Y occur together than expected if they where statistically independent (lift=1). The 2.16 lift in the X -> Y rule means that the probability of X is 2.16 times greater than the probability of Y. FP-growth Now, let's try to get the same results with more efficient FP-growth algorithm. FP-growth is also implemented in the weka.associations package: import weka.associations.FPGrowth; The FP-growth is initialized similarly as we did earlier: FPGrowth fpgModel = new FPGrowth(); fpgModel.buildAssociations(data); System.out.println(fpgModel); The output reveals that FP-growth discovered 16 rules: FPGrowth found 16 rules (displaying top 10) 1. [fruit=t, frozen foods=t, biscuits=t, total=high]: 788 ==> [bread and cake=t]: 723 <conf:(0.92)> lift:(1.27) lev:(0.03) conv:(3.35) 2. [fruit=t, baking needs=t, biscuits=t, total=high]: 760 ==> [bread and cake=t]: 696 <conf:(0.92)> lift:(1.27) lev:(0.03) conv:(3.28) ... 
We can observe that FP-growth found the same set of rules as Apriori; however, the time required to process larger datasets can be significantly shorter. Other applications in various areas We looked into affinity analysis to demystify shopping behavior patterns in supermarkets. Although, the roots of association rule learning are in analyzing point-of-sale transactions, they can be applied outside the retail industry to find relationships among other types of baskets. The notion of a basket can easily be extended to services and products, for example, to analyze items purchased using a credit card, such as rental cars and hotel rooms, and to analyze information on value-added services purchased by telecom customers (call waiting, call forwarding, DSL, speed call, and so on), which can help the operators determine the ways to improve their bundling of service packages. Additionally, we will look into the following examples of potential cross-industry applications: Medical diagnosis Protein sequences Census data Customer relationship management IT Operations Analytics Medical diagnosis Applying association rules in medical diagnosis can be used to assist physicians while curing patients. The general problem of the induction of reliable diagnostic rules is hard as, theoretically, no induction process can guarantee the correctness of induced hypotheses by itself. Practically, diagnosis is not an easy process as it involves unreliable diagnosis tests and the presence of noise in training examples. Nevertheless, association rules can be used to identify likely symptoms appearing together. A transaction, in this case, corresponds to a medical case, while symptoms correspond to items. When a patient is treated, a list of symptoms is recorded as one transaction. Protein sequences A lot of research has gone into understanding the composition and nature of proteins; yet many things remain to be understood satisfactorily. It is now generally believed that amino-acid sequences of proteins are not random. With association rules, it is possible to identify associations between different amino acids that are present in a protein. A protein is a sequences made up of 20 types of amino acids. Each protein has a unique three-dimensional structure, which depends on amino-acid sequence; slight change in the sequence may change the functioning of protein. To apply association rules, a protein corresponds to a transaction, while amino acids, their two grams and structure correspond to the items. Such association rules are desirable for enhancing our understanding of protein composition and hold the potential to give clues regarding the global interactions amongst some particular sets of amino acids occurring in the proteins. Knowledge of these association rules or constraints is highly desirable for synthesis of artificial proteins. Census data Censuses make a huge variety of general statistical information about the society available to both researchers and general public. The information related to population and economic census can be forecasted in planning public services (education, health, transport, and funds) as well as in public business(for setting up new factories, shopping malls, or banks and even marketing particular products). To discover frequent patterns, each statistical area (for example, municipality, city, and neighborhood) corresponds to a transaction, and the collected indicators correspond to the items. 
Customer relationship management Association rules can reinforce the knowledge management process and allow the marketing personnel to know their customers well in order to provide better quality services. For example, association rules can be applied to detect a change of customer behavior at different time snapshots from customer profiles and sales data. The basic idea is to discover changes from two datasets and generate rules from each dataset to carry out rule matching. IT Operations Analytics Based on records of a large number of transactions, association rule learning is well-suited to be applied to the data that is routinely collected in day-to-day IT operations, enabling IT Operations Analytics tools to detect frequent patterns and identify critical changes. IT specialists need to see the big picture and understand, for example, how a problem on a database could impact an application server. For a specific day, IT operations may take in a variety of alerts, presenting them in a transactional database. Using an association rule learning algorithm, IT Operations Analytics tools can correlate and detect the frequent patterns of alerts appearing together. This can lead to a better understanding about how a component impacts another. With identified alert patterns, it is possible to apply predictive analytics. For example, a particular database server hosts a web application and suddenly an alert about a database is triggered. By looking into frequent patterns identified by an association rule learning algorithm, this means that the IT staff needs to take action before the web application is impacted. Association rule learning can also discover alert events originating from the same IT event. For example, every time a new user is added, six changes in the Windows operating systems are detected. Next, in the Application Portfolio Management (APM), IT may face multiple alerts, showing that the transactional time in a database as high. If all these issues originate from the same source (such as getting hundreds of alerts about changes that are all due to a Windows update), this frequent pattern mining can help to quickly cut through a number of alerts, allowing the IT operators to focus on truly critical changes. Summary In this article, you learned how to leverage association rules learning on transactional datasets to gain insight about frequent patterns We performed an affinity analysis in Weka and learned that the hard work lies in the analysis of results—careful attention is required when interpreting rules, as association (that is, correlation) is not the same as causation. Resources for Article: Further resources on this subject: Debugging Java Programs using JDB [article] Functional Testing with JMeter [article] Implementing AJAX Grid using jQuery data grid plugin jqGrid [article]

Setting Up and Cleaning Up

Packt
11 Apr 2016
34 min read
This article, by Mani Tadayon, author of the book, RSpec Essentials, discusses support code to set tests up and clean up after them. Initialization, configuration, cleanup, and other support code related to RSpec specs are important in real-world RSpec usage. We will learn how to cleanly organize support code in real-world applications by learning about the following topics: Configuring RSpec with spec_helper.rb Initialization and configuration of resources Preventing tests from accessing the Internet with WebMock Maintaining clean test state Custom helper code Loading support code on demand with tags (For more resources related to this topic, see here.) Configuring RSpec with spec_helper.rb The RSpec specs that we've seen so far have functioned as standalone units. Specs in the real world, however, almost never work without supporting code to prepare the test environment before tests are run and ensure it is cleaned up afterwards. In fact, the first line of nearly every real-world RSpec spec file loads a file that takes care of initialization, configuration, and cleanup: require 'spec_helper' By convention, the entry point for all support code for specs is in a file called spec_helper.rb. Another convention is that specs are located in a folder called spec in the root folder of the project. The spec_helper.rb file is located in the root of this spec folder. Now that we know where it goes, what do we actually put in spec_helper.rb? Let's start with an example: # spec/spec_helper.rb require 'rspec'   RSpec.configure do |config|   config.order            = 'random'   config.profile_examples = 3    end To see what these two options do, let's create a couple of dummy spec files that include our spec_helper.rb. Here's the first spec file: # spec/first_spec.rb require 'spec_helper'   describe 'first spec' do   it 'sleeps for 1 second' do     sleep 1   end     it 'sleeps for 2 seconds' do     sleep 2   end      it 'sleeps for 3 seconds' do     sleep 3   end  end And here's our second spec file: # spec/second_spec.rb require 'spec_helper'   describe 'second spec' do   it 'sleeps for 4 second' do     sleep 4   end     it 'sleeps for 5 seconds' do     sleep 5   end      it 'sleeps for 6 seconds' do     sleep 6   end  end Now let's run our two spec files and see what happens: We note that we used --format documentation when running RSpec so that we see the order in which the tests were run (the default format just outputs a green dot for each passing test). From the output, we can see that the tests were run in a random order. We can also see the three slowest specs. Although this was a toy example, I would recommend using both of these configuration options for RSpec. Running examples in a random order is very important, as it is the only reliable way of detecting bad tests which sometimes pass and sometimes fail based on the order the in which overall test suite is run. Also, keeping tests running fast is very important for maintaining a productive development flow, and seeing which tests are slow on every test run is the most effective way of encouraging developers to make the slow tests fast, or remove them from the test run. We'll return to both test order and test speed later. For now, let us just note that RSpec configuration is very important to keeping our specs reliable and fast. Initialization and configuration of resources Real-world applications rely on resources, such as databases, and external services, such as HTTP APIs. 
These must be initialized and configured for the application to work properly. When writing tests, dealing with these resources and services can be a challenge because of two opposing fundamental interests. First, we would like the test environment to match as closely as possible the production environment so that tests that interact with resources and services are realistic. For example, we may use a powerful database system in production that runs on many servers to provide the best performance. Should we spend money and effort to create and maintain a second production-grade database environment just for testing purposes? Second, we would like the test environment to be simple and relatively easy to understand, so that we understand what we are actually testing. We would also like to keep our code modular so that components can be tested in isolation, or in simpler environments that are easier to create, maintain, and understand. If we think of the example of the system that relies on a database cluster in production, we may ask ourselves whether we are better off using a single-server setup for our test database. We could even go so far as to use an entirely different database for our tests, such as the file-based SQLite. As always, there are no easy answers to such trade-offs. The important thing is to understand the costs and benefits, and adjust where we are on the continuum between production faithfulness and test simplicity as our system evolves, along with the goals it serves. For example, for a small hobbyist application or a project with a limited budget, we may choose to completely favor test simplicity. As the same code grows to become a successful fan site or a big-budget project, we may have a much lower tolerance for failure, and have both the motivation and resources to shift towards production faithfulness for our test environment. Some rules of thumb to keep in mind: Unit tests are better places for test simplicity Integration tests are better places for production faithfulness Try to cleverly increase production faithfulness in unit tests Try to cleverly increase test simplicity in integration tests In between unit and integration tests, be clear what is and isn't faithful to the production environment A case study of test simplicity with an external service Let's put these ideas into practice. I haven't changed the application code, except to rename the module OldWeatherQuery. The test code is also slightly changed to require a spec_helper file and to use a subject block to define an alias for the module name, which makes it easier to rename the code without having to change many lines of test code. So let's look at our three files now. First, here's the application code: # old_weather_query.rb   require 'net/http' require 'json' require 'timeout'   module OldWeatherQuery   extend self     class NetworkError < StandardError   end     def forecast(place, use_cache=true)     add_to_history(place)       if use_cache       cache[place] ||= begin         @api_request_count += 1         JSON.parse( http(place) )       end     else       JSON.parse( http(place) )     end   rescue JSON::ParserError     raise NetworkError.new("Bad response")   end     def api_request_count     @api_request_count ||= 0   end     def history     (@history || []).dup   end     def clear!     
@history           = []     @cache             = {}     @api_request_count = 0   end     private     def add_to_history(s)     @history ||= []     @history << s   end     def cache     @cache ||= {}   end     BASE_URI = 'http://api.openweathermap.org/data/2.5/weather?q='   def http(place)     uri = URI(BASE_URI + place)       Net::HTTP.get(uri)   rescue Timeout::Error     raise NetworkError.new("Request timed out")   rescue URI::InvalidURIError     raise NetworkError.new("Bad place name: #{place}")   rescue SocketError     raise NetworkError.new("Could not reach #{uri.to_s}")   end end Next is the spec file: # spec/old_weather_query_spec.rb   require_relative 'spec_helper' require_relative '../old_weather_query'   describe OldWeatherQuery do   subject(:weather_query) { described_class }     describe 'caching' do     let(:json_response) do       '{"weather" : { "description" : "Sky is Clear"}}'     end       around(:example) do |example|       actual = weather_query.send(:cache)       expect(actual).to eq({})         example.run         weather_query.clear!     end       it "stores results in local cache" do       weather_query.forecast('Malibu,US')         actual = weather_query.send(:cache)       expect(actual.keys).to eq(['Malibu,US'])       expect(actual['Malibu,US']).to be_a(Hash)     end       it "uses cached result in subsequent queries" do       weather_query.forecast('Malibu,US')       weather_query.forecast('Malibu,US')       weather_query.forecast('Malibu,US')     end   end     describe 'query history' do     before do       expect(weather_query.history).to eq([])       allow(weather_query).to receive(:http).and_return("{}")     end     after do       weather_query.clear!     end       it "stores every place requested" do       places = %w(         Malibu,US         Beijing,CN         Delhi,IN         Malibu,US         Malibu,US         Beijing,CN       )         places.each {|s| weather_query.forecast(s) }         expect(weather_query.history).to eq(places)     end       it "does not allow history to be modified" do       expect {         weather_query.history = ['Malibu,CN']       }.to raise_error         weather_query.history << 'Malibu,CN'       expect(weather_query.history).to eq([])     end   end     describe 'number of API requests' do     before do       expect(weather_query.api_request_count).to eq(0)       allow(weather_query).to receive(:http).and_return("{}")     end       after do       weather_query.clear!     end       it "stores every place requested" do       places = %w(         Malibu,US         Beijing,CN         Delhi,IN         Malibu,US         Malibu,US         Beijing,CN       )         places.each {|s| weather_query.forecast(s) }         expect(weather_query.api_request_count).to eq(3)     end       it "does not allow count to be modified" do       expect {         weather_query.api_request_count = 100       }.to raise_error         expect {         weather_query.api_request_count += 10       }.to raise_error         expect(weather_query.api_request_count).to eq(0)     end   end end And last but not least, our spec_helper file, which has also changed only slightly: we only configure RSpec to show one slow spec (to keep test results uncluttered) and use color in the output to distinguish passes and failures more easily: # spec/spec_helper.rb   require 'rspec'   RSpec.configure do |config|   config.order            = 'random'   config.profile_examples = 1   config.color            = true end When we run these specs, something unexpected happens. 
Most of the time the specs pass, but sometimes they fail. If we keep running the specs with the same command, we'll see the tests pass and fail apparently at random. These are flaky tests, and we have exposed them because of the random order configuration we chose: when the tests run in a certain order, they fail. The problem could simply be in our tests. For example, we could have forgotten to clear state before or after a test. However, there could also be a problem with our code. In any case, we need to get to the bottom of the situation.

We first notice that at the end of the failing test run, RSpec tells us "Randomized with seed 318". We can use this information to run the tests in the order that caused the failure and start to debug and diagnose the problem. We do this by passing the --seed parameter with the value 318, as follows:

$ rspec spec/old_weather_query_spec.rb --seed 318

The problem has to do with the way that we increment @api_request_count without ensuring it has been initialized. Looking at our code, we notice that the only places we initialize @api_request_count are OldWeatherQuery.api_request_count and OldWeatherQuery.clear!. If we don't call either of these methods first, then OldWeatherQuery.forecast, the main method in this module, will always fail. Our tests sometimes pass because our setup code calls one of these methods first when tests are run in a certain order, but that is not at all how our code would likely be used in production. So basically, our code is completely broken, but our specs pass (sometimes). Based on this, we can create a simple spec that will always fail:

describe 'api_request is not initialized' do
  it "does not raise an error" do
    weather_query.forecast('Malibu,US')
  end
end

At least now our tests fail deterministically. But this is not the end of our troubles with these specs. If we run our tests many times with the seed value of 318, we will start seeing a second failing test case that is even more random than the first. This is an OldWeatherQuery::NetworkError, and it indicates that our tests are actually making HTTP requests to the Internet!

Let's do an experiment to confirm this. We'll turn off our Wi-Fi access, unplug our Ethernet cables, and run our specs. When we run our tests without any Internet access, we will see three errors in total. One of them is the error with the uninitialized @api_request_count instance variable, and two of them are instances of OldWeatherQuery::NetworkError, which confirms that we are indeed making real HTTP requests in our code.

What's so bad about making requests to the Internet? After all, the test failures are indeed very random, and we had to purposely shut off our Internet access to replicate the errors. Flaky tests are actually the least of our problems. First, we could be performing destructive actions that affect real systems, accounts, and people! Imagine if we were testing an e-commerce application that charged customer credit cards by using a third-party payment API via HTTP. If our tests actually hit our payment provider's API endpoint, we would get a lot of declined transactions (assuming we are not storing and using real credit card numbers), which could lead to our account being suspended due to suspicions of fraud, putting our e-commerce application out of service.
Also, if we were running a continuous integration (CI) server such as Jenkins, which did not have access to the public Internet, we would get failures in our CI builds due to failing tests that attempted to access the Internet.

There are a few approaches to solving this problem. In our tests, we attempted to mock our HTTP requests, but obviously failed to do so effectively. A second approach is to allow actual HTTP requests but to configure a special server for testing purposes. Let's focus on figuring out why our HTTP mocks were not successful.

In a small set of tests like in this example, it is not hard to hunt down the places where we are sending actual HTTP requests. In larger code bases with a lot of test support code, it may be harder. Also, it would be nice to prevent access to the Internet altogether so we notice these issues as soon as we run the offending tests. Fortunately, Ruby has many excellent tools for testing, and there is one that addresses our needs exactly: WebMock (https://github.com/bblimke/webmock). We simply install the gem and add a couple of lines to our spec_helper file to disable all network connections in our tests:

# spec/spec_helper.rb

require 'rspec'

# require the webmock gem
require 'webmock/rspec'

RSpec.configure do |config|
  # this is done by default, but let's make it clear
  WebMock.disable_net_connect!

  config.order            = 'random'
  config.profile_examples = 1
  config.color            = true
end

When we run our tests again, we'll see one or more instances of WebMock::NetConnectNotAllowedError, along with a backtrace leading us to the point in our tests where the HTTP request was made.

If we examine our test code, we'll notice that we mock the OldWeatherQuery.http method in a few places. However, we forgot to set up the mock in the first describe block for caching, where we defined a json_response object but never mocked the OldWeatherQuery.http method to return json_response. We can solve the problem by mocking OldWeatherQuery.http throughout the entire test file. We'll also take this opportunity to clean up the initialization of @api_request_count in our code. Here's what we have now:

# new_weather_query.rb

require 'net/http'
require 'json'
require 'timeout'

module NewWeatherQuery
  extend self

  class NetworkError < StandardError
  end

  def forecast(place, use_cache=true)
    add_to_history(place)

    if use_cache
      cache[place] ||= begin
        increment_api_request_count
        JSON.parse( http(place) )
      end
    else
      JSON.parse( http(place) )
    end
  rescue JSON::ParserError => e
    raise NetworkError.new("Bad response: #{e.inspect}")
  end

  def increment_api_request_count
    @api_request_count ||= 0
    @api_request_count += 1
  end

  def api_request_count
    @api_request_count ||= 0
  end

  def history
    (@history || []).dup
  end

  def clear!
    @history           = []
    @cache             = {}
    @api_request_count = 0
  end

  private

  def add_to_history(s)
    @history ||= []
    @history << s
  end

  def cache
    @cache ||= {}
  end

  BASE_URI = 'http://api.openweathermap.org/data/2.5/weather?q='

  def http(place)
    uri = URI(BASE_URI + place)

    Net::HTTP.get(uri)
  rescue Timeout::Error
    raise NetworkError.new("Request timed out")
  rescue URI::InvalidURIError
    raise NetworkError.new("Bad place name: #{place}")
  rescue SocketError
    raise NetworkError.new("Could not reach #{uri.to_s}")
  end
end

And here is the spec file to go with it:

# spec/new_weather_query_spec.rb

require_relative 'spec_helper'
require_relative '../new_weather_query'

describe NewWeatherQuery do
  subject(:weather_query) { described_class }

  after { weather_query.clear! }

  let(:json_response) { '{}' }

  before do
    allow(weather_query).to receive(:http).and_return(json_response)
  end

  describe 'api_request is initialized' do
    it "does not raise an error" do
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'caching' do
    let(:json_response) do
      '{"weather" : { "description" : "Sky is Clear"}}'
    end

    around(:example) do |example|
      actual = weather_query.send(:cache)
      expect(actual).to eq({})

      example.run
    end

    it "stores results in local cache" do
      weather_query.forecast('Malibu,US')

      actual = weather_query.send(:cache)
      expect(actual.keys).to eq(['Malibu,US'])
      expect(actual['Malibu,US']).to be_a(Hash)
    end

    it "uses cached result in subsequent queries" do
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'query history' do
    before do
      expect(weather_query.history).to eq([])
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.history).to eq(places)
    end

    it "does not allow history to be modified" do
      expect {
        weather_query.history = ['Malibu,CN']
      }.to raise_error

      weather_query.history << 'Malibu,CN'
      expect(weather_query.history).to eq([])
    end
  end

  describe 'number of API requests' do
    before do
      expect(weather_query.api_request_count).to eq(0)
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.api_request_count).to eq(3)
    end

    it "does not allow count to be modified" do
      expect {
        weather_query.api_request_count = 100
      }.to raise_error

      expect {
        weather_query.api_request_count += 10
      }.to raise_error

      expect(weather_query.api_request_count).to eq(0)
    end
  end
end

Now we've fixed a major bug in our code that slipped through our specs and used to pass randomly. We've made it so that our tests always pass, regardless of the order in which they are run, and without needing to access the Internet. Our test code and application code have also become clearer as we've reduced duplication in a few places.
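As a side note, mocking our own private http method is only one option. Since spec_helper already loads webmock/rspec, we could instead stub at the HTTP boundary with WebMock's stub_request. The following spec is not part of the original example; the file name and the canned response body are made up purely for illustration:

# spec/new_weather_query_http_spec.rb (illustrative only)

require_relative 'spec_helper'
require_relative '../new_weather_query'

describe NewWeatherQuery, "stubbed at the HTTP boundary" do
  # clean up module state so other spec files are not affected
  after { NewWeatherQuery.clear! }

  it "parses the canned response without touching the network" do
    # intercept any GET to the OpenWeatherMap API and return fixed JSON
    stub_request(:get, /api\.openweathermap\.org/).
      to_return(status: 200, body: '{"weather" : { "description" : "Sky is Clear"}}')

    forecast = NewWeatherQuery.forecast('Malibu,US')

    expect(forecast['weather']['description']).to eq('Sky is Clear')
  end
end

Stubbing at this level exercises the real http method, including its URI building and error handling, at the cost of coupling the spec to the request format; mocking the http method directly, as we do above, keeps the specs simpler.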
A case study of production faithfulness with a test resource instance

We're not done with our WeatherQuery example just yet. Let's take a look at how we would add a simple database to store our cached values.

There are some serious limitations to the way we are caching with instance variables, which persist only within the scope of a single Ruby process. As soon as we stop or restart our app, the entire cache is lost. In a production app, we would likely have many processes running the same code in order to serve traffic effectively. With our current approach, each process would have a separate cache, which would be very inefficient. We could easily save many HTTP requests if we were able to share the cache between processes and across restarts.

Economizing on these requests is not simply a matter of improved response time. We also need to consider that we cannot make unlimited requests to external services. For commercial services, we would pay for the number of requests we make. For free services, we are likely to get throttled if we exceed some threshold. Therefore, an effective caching scheme that reduces the number of HTTP requests we make to external services is of vital importance to a real-world app.

Finally, our cache is very simplistic and has no expiration mechanism short of clearing all entries. For a cache to be effective, we need to be able to store entries for individual locations for some period of time within which we don't expect the weather forecast to change much. This will keep the cache small and up to date.

We'll use Redis (http://redis.io) as our database since it is very fast, simple, and easy to set up. You can find instructions on the Redis website on how to install it, which is an easy process on any platform. Once you have Redis installed, you simply need to start the server locally, which you can do with the redis-server command. We'll also need to install the Redis Ruby client as a gem (https://github.com/redis/redis-rb). Let's start with a separate configuration file to set up our Redis client for our tests:

# spec/config/redis.rb

require 'rspec'
require 'redis'

ENV['WQ_REDIS_URL'] ||= 'redis://localhost:6379/15'

RSpec.configure do |config|
  if ! ENV['WQ_REDIS_URL'].is_a?(String)
    raise "WQ_REDIS_URL environment variable not set"
  end

  ::REDIS_CLIENT = Redis.new( :url => ENV['WQ_REDIS_URL'] )

  # flush the test database, but only after examples tagged with :redis
  config.after(:example, :redis) do
    ::REDIS_CLIENT.flushdb
  end
end

Note that we place this file in a new config folder under our main spec folder. The idea is to configure each resource separately in its own file to keep everything isolated and easy to understand. This will make maintenance easy and prevent problems with configuration management down the road.

We don't do much in this file, but we do establish some important conventions. There is a single environment variable, which takes care of the Redis connection URL. By using an environment variable, we make it easy to change the configuration and also allow flexibility in how it is stored. Our code doesn't care whether the Redis connection URL is stored in a simple .env file with key-value pairs or loaded from a configuration database.
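For instance, one simple way to provide such key-value pairs, not used in this article's code but sketched here purely as an illustration, is the dotenv gem, which copies entries from a .env file into ENV without overriding values that are already set:

# .env (hypothetical example, at the project root)
# WQ_REDIS_URL=redis://localhost:6379/15

# spec/spec_helper.rb (illustrative addition)
require 'dotenv'
Dotenv.load   # reads .env and fills in ENV entries that aren't already set

Because Dotenv.load never overrides variables that already exist, a value exported on the command line still wins, which fits nicely with the manual override technique shown next.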
We can also easily override this value manually, simply by setting it when we run RSpec, like so:

$ WQ_REDIS_URL=redis://1.2.3.4:4321/0 rspec spec

Note that we also set a sensible default value, which is to run on the default Redis port of 6379 on our local machine, on database number 15, which is less likely to be used for local development. This prevents our tests from relying on our development database, or from polluting or destroying it.

It is also worth mentioning that we prefix our environment variable with WQ (short for weather query). Small details like this are very important for keeping our code easy to understand and for preventing dangerous clashes. We could imagine the kinds of confusion and clashes that could be caused if we relied on REDIS_URL and we had multiple apps running on the same server, all relying on Redis. It would be very easy to break many applications if we changed the value of REDIS_URL for a single app to point to a different instance of Redis.

We set a global constant, ::REDIS_CLIENT, to point to a Redis client. We will use this in our code to connect to Redis. Note that in real-world code, we would likely have a global namespace for the entire app, and we would define globals such as REDIS_CLIENT under that namespace rather than in the global Ruby namespace.

Finally, we configure RSpec to call the flushdb command after every example tagged with :redis to empty the database and keep state clean across tests. In our code, all tests interact with Redis, so this tag seems pointless. However, it is very likely that we would add code that had nothing to do with Redis, and using tags helps us constrain the scope of our configuration hooks to only where they are needed. This also prevents confusion about multiple hooks running for the same example. In general, we want to avoid global hooks and instead trigger configuration hooks explicitly where they are needed.

So what does our spec look like now? Actually, it is almost exactly the same. Only a few lines have changed to work with the new Redis cache. See if you can spot them!

# spec/redis_weather_query_spec.rb

require_relative 'spec_helper'
require_relative '../redis_weather_query'

describe RedisWeatherQuery, redis: true do
  subject(:weather_query) { described_class }

  after { weather_query.clear! }

  let(:json_response) { '{}' }

  before do
    allow(weather_query).to receive(:http).and_return(json_response)
  end

  describe 'api_request is initialized' do
    it "does not raise an error" do
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'caching' do
    let(:json_response) do
      '{"weather" : { "description" : "Sky is Clear"}}'
    end

    around(:example) do |example|
      actual = weather_query.send(:cache).all
      expect(actual).to eq({})

      example.run
    end

    it "stores results in local cache" do
      weather_query.forecast('Malibu,US')

      actual = weather_query.send(:cache).all
      expect(actual.keys).to eq(['Malibu,US'])
      expect(actual['Malibu,US']).to be_a(Hash)
    end

    it "uses cached result in subsequent queries" do
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'query history' do
    before do
      expect(weather_query.history).to eq([])
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.history).to eq(places)
    end

    it "does not allow history to be modified" do
      expect {
        weather_query.history = ['Malibu,CN']
      }.to raise_error

      weather_query.history << 'Malibu,CN'
      expect(weather_query.history).to eq([])
    end
  end

  describe 'number of API requests' do
    before do
      expect(weather_query.api_request_count).to eq(0)
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.api_request_count).to eq(3)
    end

    it "does not allow count to be modified" do
      expect {
        weather_query.api_request_count = 100
      }.to raise_error

      expect {
        weather_query.api_request_count += 10
      }.to raise_error

      expect(weather_query.api_request_count).to eq(0)
    end
  end
end

So what about the actual WeatherQuery code? It changes very little as well:

# redis_weather_query.rb

require 'net/http'
require 'json'
require 'timeout'

# require the new cache module
require_relative 'redis_weather_cache'

module RedisWeatherQuery
  extend self

  class NetworkError < StandardError
  end

  # ... same as before ...

  def clear!
    @history           = []
    @api_request_count = 0

    # no more clearing of cache here
  end

  private

  # ... same as before ...

  # the new cache module has a Hash-like interface
  def cache
    RedisWeatherCache
  end

  # ... same as before ...

end

We can see that we've preserved pretty much the same code and specs as before. Almost all of the new functionality is accomplished in a new module that caches with Redis. Here is what it looks like:

# redis_weather_cache.rb

require 'redis'
require 'json'

module RedisWeatherCache
  extend self

  CACHE_KEY             = 'weather_query:cache'
  EXPIRY_ZSET_KEY       = 'weather_query:expiry_tracker'
  EXPIRE_FORECAST_AFTER = 300 # 5 minutes

  def redis_client
    if ! defined?(::REDIS_CLIENT)
      raise("No REDIS_CLIENT defined!")
    end

    ::REDIS_CLIENT
  end

  def []=(location, forecast)
    redis_client.hset(CACHE_KEY, location, JSON.generate(forecast))
    redis_client.zadd(EXPIRY_ZSET_KEY, Time.now.to_i, location)
  end

  def [](location)
    remove_expired_entries

    raw_value = redis_client.hget(CACHE_KEY, location)

    if raw_value
      JSON.parse(raw_value)
    else
      nil
    end
  end

  def all
    redis_client.hgetall(CACHE_KEY).inject({}) do |memo, (location, forecast_json)|
      memo[location] = JSON.parse(forecast_json)
      memo
    end
  end

  def clear!
    redis_client.del(CACHE_KEY)
  end

  def remove_expired_entries
    # expired locations have a score, i.e. creation timestamp, less than a certain threshold
    expired_locations = redis_client.zrangebyscore(EXPIRY_ZSET_KEY, 0, Time.now.to_i - EXPIRE_FORECAST_AFTER)

    if ! expired_locations.empty?
      # remove the cache entries
      redis_client.hdel(CACHE_KEY, expired_locations)

      # also clear the expiry entries
      redis_client.zrem(EXPIRY_ZSET_KEY, expired_locations)
    end
  end
end

We'll avoid a detailed explanation of this code. We simply note that we accomplish all of the design goals we discussed at the beginning of the section: a persistent cache with expiration of individual values. We've accomplished this using some simple Redis functionality along with a ZSET, or sorted set, which is a bit more complex, and which we needed because Redis does not provide automatic expiry for individual entries within a Hash, only for whole keys. We can see that by using method names such as RedisWeatherCache.[] and RedisWeatherCache.[]=, we've maintained a Hash-like interface, which made it easy to use this cache instead of the simple in-memory Ruby Hash we had in our previous iteration. Our tests all pass and are still pretty simple, thanks to the modularity of this new cache code, the modular configuration file, and the previous fixes we made to our specs to remove Internet and run-order dependencies.

Summary

In this article, we delved into setting up and cleaning up state for real-world specs that interact with external services and local resources by extending our WeatherQuery example to address a big bug, isolate our specs from the Internet, and cleanly configure a Redis database to serve as a better cache.