
How-To Tutorials


Exploring Windows PowerShell 5.0

Packt
12 Oct 2015
16 min read
In this article by Chendrayan Venkatesan, the author of the book Windows PowerShell for .NET Developers, we will cover the following topics:

- Basics of Desired State Configuration (DSC)
- Parsing structured objects using PowerShell
- Exploring package management
- Exploring PowerShell Get-Module
- Exploring other enhanced features

(For more resources related to this topic, see here.)

Windows PowerShell 5.0 has many significant benefits; to learn more about its features, refer to the following link: http://go.microsoft.com/fwlink/?LinkID=512808

A few highlights of Windows PowerShell 5.0 are as follows:

- Improved usability
- Backward compatibility
- The Class and Enum keywords are introduced
- Parsing structured objects is made easy using the ConvertFrom-String command
- Some new modules are introduced in Windows PowerShell 5.0, such as Archive, Package Management (formerly known as OneGet), and so on
- ISE supports transcriptions
- Using the PowerShellGet module cmdlets, we can find, install, and publish modules
- Debugging at the runspace level can be done using the Microsoft.PowerShell.Utility module

Basics of Desired State Configuration

Desired State Configuration, also known as DSC, is a new management platform in Windows PowerShell. Using DSC, we can deploy and manage configuration data for software servicing and manage the environment. DSC can be used to streamline datacenters; it was introduced along with Windows Management Framework 4.0 and heavily extended in Windows Management Framework 5.0. A few highlights of DSC in the April 2015 Preview are as follows:

- New cmdlets are introduced in WMF 5.0
- A few DSC commands are updated, and remarkable changes are made to the configuration management platform in PowerShell 5.0
- DSC resources can be built using classes, so there is no need for a MOF file

It's not mandatory to know PowerShell to learn DSC, but it's a great added advantage. Similar to a function, we can also use the Configuration keyword, but there is a huge difference: in DSC everything is declarative, which is a cool thing about Desired State Configuration. Before beginning this exercise, I created a DSCDemo lab machine in the Azure cloud with Windows Server 2012, which is available out of the box, so the default PowerShell version is 4.0.

For now, let's create and define a simple configuration that creates a file on the local host. Yeah! A simple New-Item command can do that, but it's an imperative cmdlet, and we would need to write a program to tell the computer to create the file if it does not exist. The structure of a DSC configuration is as follows:

Configuration Name
{
    Node ComputerName
    {
        ResourceName <String>
        {
        }
    }
}

To create a simple text file with contents, we use the following code:

Configuration FileDemo
{
    Node $env:COMPUTERNAME
    {
        File FileDemo
        {
            Ensure          = 'Present'
            DestinationPath = 'C:\Temp\Demo.txt'
            Contents        = 'PowerShell DSC Rocks!'
            Force           = $true
        }
    }
}

Look at the following screenshot. The steps represented in the figure are as follows:

1. Using the Configuration keyword, we define a configuration with the name FileDemo—it's a friendly name.
2. Inside the Configuration block, we create a Node block and also a file on the local host. File is the resource name.
3. FileDemo is a friendly name of the resource, and it's also a string.
4. Properties of the File resource.
5. Running this creates the MOF file—we call it in a way similar to calling a function. But wait, the actual file is not created yet; we have just created a MOF file.
Look at the MOF file structure in the following image: We can manually edit the MOF and use it on another machine that has PS 4.0 installed on it. It's not mandatory to use PowerShell for generating MOF, if you are comfortable with PowerShell, you can directly write the MOF file. To explore the available DSC resources you can execute the following command: Get-DscResource The output is illustrated in the following image: Following are the steps represented in the preceding figure: Shows you how the resources are implemented. Binary, Composite, PowerShell, and so on. In the preceding example, we created a DSC Configuration that's FileDemo and that is listed as Composite. Name of the resource. Module name the resource belongs to. Properties of the resource. To know the Syntax of a particular DSC resource we can try the following code: Get-DscResource -Name Service -Syntax The output is illustrated in the following figure ,which shows the resource syntax in detail: Now, let's see how DSC works and its three different phases: The authoring phase. The staging phase. The "Make it so" phase. The authoring phase In this phase we will create a DSC Configuration using PowerShell and this outputs a MOF file. We saw a FileDemo example to create a configuration is considered to be an authoring phase. The staging phase In this phase the declarative MOF will be staged and it's as per node. DSC has a push and pull model, where push is simply pushing the configuration to target nodes. The custom providers need to be manually placed in target machines whereas in pull mode, we need to build an IIS Server that will have MOF for target nodes and this is well defined by the OData interface. In pull mode, the custom providers are downloaded to target system. The "Make it so" phase This is the phase for enacting the configuration, that is applying the configuration on the target nodes. Before we summarize the basics of DSC, let's see a few more DSC Commands. We can do this by executing the following command: Get-Command -Noun DSC* The output is as follows: We are using a PowerShell 4.0 stable release and not 5.0, so the version will not be available. Local Configuration Manager (LCM) is the engine for DSC and it runs on all nodes. LCM is responsible to call the configuration resources that are included in a DSC configuration script. Try executing Get-DscLocalConfigurationManager cmdlet to explore its properties. To Apply the LCM settings on target nodes we can use Set-DscLocalConfigurationManager cmdlet. Use case of classes in WMF 5.0 Using classes in PowerShell makes IT professionals, system administrators, and system engineers to start learning development in WMF. It's time for us to switch back to Windows PowerShell 5.0 because the Class keyword is supported from version 5.0 onwards. Why do we need to write class in PowerShell? Is there any special need? May be we will answer this in this section but this is one reason why I prefer to say that, PowerShell is far more than a scripting language. When the Class keyword was introduced, it mainly focused on creating DSC resources. But using class we can create objects like in any other object oriented programming language. The documentation that reads New-Object is not supported. But it's revised now. Indeed it supports the New-Object. The class we create in Windows PowerShell is a .NET framework type. How to create a PowerShell Class? It's easy, just use the Class keyword! The following steps will help you to create a PowerShell class. 
Create a class named ClassName {}—this is an empty class. Define properties in the class as Class ClassName {$Prop1 , $prop2} Instantiate the class as $var = [ClassName]::New() Now check the output of $var: Class ClassName { $Prop1 $Prop2 } $var = [ClassName]::new() $var Let's now have a look at how to create a class and its advantages. Let us define the properties in class: Class Catalog { #Properties $Model = 'Fujitsu' $Manufacturer = 'Life Book S Series' } $var = New-Object Catalog $var The following image shows the output of class, its members, and setting the property value: Now, by changing the property value, we get the following output: Now let's create a method with overloads. In the following example we have created a method name SetInformation that accepts two arguments $mdl and $mfgr and these are of string type. Using $var.SetInformation command with no parenthesis will show the overload definitions of the method. The code is as follows: Class Catalog { #Properties $Model = 'Fujitsu' $Manufacturer = 'Life Book S Series' SetInformation([String]$mdl,[String]$mfgr) { $this.Manufacturer = $mfgr $this.Model = $mdl } } $var = New-Object -TypeName Catalog $var.SetInformation #Output OverloadDefinitions ------------------- void SetInformation(string mdl, string mfgr) Let's set the model and manufacturer using set information, as follows: Class Catalog { #Properties $Model = 'Fujitsu' $Manufacturer = 'Life Book S Series' SetInformation([String]$mdl,[String]$mfgr) { $this.Manufacturer = $mfgr $this.Model = $mdl } } $var = New-Object -TypeName Catalog $var.SetInformation('Surface' , 'Microsoft') $var The output is illustrated in following image: Inside the PowerShell class we can use PowerShell cmdlets as well. The following code is just to give a demo of using PowerShell cmdlet. Class allows us to validate the parameters as well. Let's have a look at the following example: Class Order { [ValidateSet("Red" , "Blue" , "Green")] $color [ValidateSet("Audi")] $Manufacturer Book($Manufacturer , $color) { $this.color = $color $this.Manufacturer = $Manufacturer } } The parameter $Color and $Manufacturer has ValidateSet property and has a set of values. Now let's use New-Object and set the property with an argument which doesn't belong to this set, shown as follows: $var = New-Object Order $var.color = 'Orange' Now, we get the following error: Exception setting "color": "The argument "Orange" does not belong to the set "Red,Blue,Green" specified by the ValidateSet attribute. Supply an argument that is in the set and then try the command again." Let's set the argument values correctly to get the result using Book method, as follows: $var = New-Object Order $var.Book('Audi' , 'Red') $var The output is illustrated in the following figure: Constructors A constructor is a special type of method that creates new objects. It has the same name as the class and the return type is void. Multiple constructors are supported, but each one takes different numbers and types of parameters. In the following code, let's see the steps to create a simple constructor in PowerShell that simply creates a user in the active directory. Class ADUser { $identity $Name ADUser($Idenity , $Name) { New-ADUser -SamAccountName $Idenity -Name $Name $this.identity = $Idenity $this.Name = $Name } } $var = [ADUser]::new('Dummy' , 'Test Case User') $var We can also hide the properties in PowerShell class, for example let's create two properties and hide one. 
In theory, it just hides the property but we can use the property as follows: Class Hide { [String]$Name Hidden $ID } $var = [Hide]::new() $var The preceding code is illustrated in the following figure: Additionally, we can carry out operations, such as Get and Set, as shown in the following code: Class Hide { [String]$Name Hidden $ID } $var = [Hide]::new() $var.Id = '23' $var.Id This returns output as 23. To explore more about class use help about_Classes -Detailed. Parsing structured objects using PowerShell In Windows PowerShell 5.0 a new cmdlet ConvertFrom-String has been introduced and it's available in Microsoft.PowerShell.Utility. Using this command, we can parse the structured objects from any given string content. To see information, use help command with ConvertFrom-String -Detailed command. The help has an incorrect parameter as PropertyName. Copy paste will not work, so use help ConvertFrom-String –Parameter * and read the parameter—it's actually PropertyNames. Now, let's see an example of using ConvertFrom-String. Let us examine a scenario where a team has a custom code which generates log files for daily health check-up reports of their environment. Unfortunately, the tool delivered by the vendor is an EXE file and no source code is available. The log file format is as follows: "Error 4356 Lync" , "Warning 6781 SharePoint" , "Information 5436 Exchange", "Error 3432 Lync" , "Warning 4356 SharePoint" , "Information 5432 Exchange" There are many ways to manipulate this record but let's see how PowerShell cmdlet ConvertFrom-String helps us. Using the following code, we will simply extract the Type, EventID, and Server: "Error 4356 Lync" , "Warning 6781 SharePoint" , "Information 5436 Exchange", "Error 3432 Lync" , "Warning 4356 SharePoint" , "Information 5432 Exchange" | ConvertFrom-String -PropertyNames Type , EventID, Server Following figure shows the output of the code we just saw: Okay, what's interesting in this? It's cool because now your output is a PSCustom object and you can manipulate it as required. "Error 4356 Lync" , "Warning 6781 SharePoint" , "Information 5436 Exchange", "Error 3432 SharePoint" , "Warning 4356 SharePoint" , "Information 5432 Exchange" | ConvertFrom-String -PropertyNames Type , EventID, Server | ? {$_.Type -eq 'Error'} An output in Lync and SharePoint has some error logs that needs to be taken care of on priority. Since, requirement varies you can use this cmdlet as required. ConvertFrom-String has a delimiter parameter, which helps us to manipulate the strings as well. In the following example let's use the –Delimiter parameter that removes white space and returns properties, as follows: "Chen V" | ConvertFrom-String -Delimiter "s" -PropertyNames "FirstName" , "SurName" This results FirstName and SurName – FirstName as Chen and SurName as V In the preceding example, we walked you through using template file to manipulate the string as we need. To do this we need to use the parameter –Template Content. Use help ConvertFrom-String –Parameter Template Content Before we begin we need to create a template file. To do this let's ping a web site. 
Ping www.microsoft.com and the output returned is, as shown: Pinging e10088.dspb.akamaiedge.net [2.21.47.138] with 32 bytes of data: Reply from 2.21.47.138: bytes=32 time=37ms TTL=51 Reply from 2.21.47.138: bytes=32 time=35ms TTL=51 Reply from 2.21.47.138: bytes=32 time=35ms TTL=51 Reply from 2.21.47.138: bytes=32 time=36ms TTL=51 Ping statistics for 2.21.47.138: Packets: Sent = 4, Received = 4, Lost = 0 (0% loss), Approximate round trip times in milli-seconds: Minimum = 35ms, Maximum = 37ms, Average = 35ms Now, we have the information in some structure. Let's extract IP and bytes; to do this I replaced the IP and Bytes as {IP*:2.21.47.138} Pinging e10088.dspb.akamaiedge.net [2.21.47.138] with 32 bytes of data: Reply from {IP*:2.21.47.138}: bytes={[int32]Bytes:32} time=37ms TTL=51 Reply from {IP*:2.21.47.138}: bytes={[int32]Bytes:32} time=35ms TTL=51 Reply from {IP*:2.21.47.138}: bytes={[int32]Bytes:32} time=36ms TTL=51 Reply from {IP*:2.21.47.138}: bytes={[int32]Bytes:32} time=35ms TTL=51 Ping statistics for 2.21.47.138: Packets: Sent = 4, Received = 4, Lost = 0 (0% loss), Approximate round trip times in milli-seconds: Minimum = 35ms, Maximum = 37ms, Average = 35ms ConvertFrom-String has a debug parameter using which we can debug our template file. In the following example let's see the debugging output: ping www.microsoft.com | ConvertFrom-String -TemplateFile C:TempTemplate.txt -Debug As we mentioned earlier PowerShell 5.0 is a Preview release and has few bugs. Let's ignore those for now and focus on the features, which works fine and can be utilized in environment. Exploring package management In this topic, we will walk you through the features of package management, which is another great feature of Windows Management Framework 5.0. This was introduced in Windows 10 and was formerly known as OneGet. Using package management we can automate software discovery, installation of software, and inventorying. Do not think about Software Inventory Logging (SIL) for now. As we know, in Windows Software Installation, technology has its own way of doing installations, for example MSI type, MSU type, and so on. This is a real challenge for IT professionals and developers, to think about the unique automation of software installation or deployment. Now, we can do it using package management module. To begin with, let's see the package management Module using the following code: Get-Module -Name PackageManagement The output is illustrated as follows: Yeah, well we got an output that is a binary module. Okay, how to know the available cmdlets and their usage? PowerShell has the simplest way to do things, as shown in the following code: Get-Module -Name PackageManagement The available cmdlets are shown in the following image: Package Providers are the providers connected to package management (OneGet) and package sources are registered for providers. To view the list of providers and sources we use the following cmdlets: Now, let's have a look at the available packages—in the following example I am selecting the first 20 packages, for easy viewing: Okay, we have 20 packages so using Install-Package cmdlet, let us now install WindowsAzurePowerShell on our Windows 2012 Server. We need to ensure that the source are available prior to any installation. To do this just execute the cmdlet Get-PackageSource. If the chocolatey source didn't come up in the output, simply execute the following code—do not change any values. This code will install chocolatey package manager on your machine. 
Once the installation is done we need to restart the PowerShell: Invoke-Expression ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1')) Find-Package -Name WindowsAzurePowerShell | Install-Package -Verbose The command we just saw shows the confirmation dialog for chocolatey, which is the package source, as shown in the following figure: Click on Yes and install the package. Following are the steps represented in the figure that we just saw: Installs the prerequisites. Creates a temporary folder. Installation successful. Windows Server 2012 has .NET 4.5 in the box by default, so the verbose turned up as False for .NET 4.5, which says PowerShell not installed but WindowsAzurePowerShell is installed successfully. If you are trying to install the same package and the same version that is available on your system – the cmdlet will skip the installation. Find-Package -Name PowerShell Here | Install-Package -Verbose VERBOSE: Skipping installed package PowerShellHere 0.0.3 Explore all the package management cmdlets and automate your software deployments. Exploring PowerShell Get-Module PowerShell Get-Module is a module available in Windows PowerShell 5.0 preview. Following are few more modules: Search through modules in the gallery with Find-Module Save modules to your system from the gallery with Save-Module Install modules from the gallery with Install-Module Update your modules to the latest version with Update-Module Add your own custom repository with Register-PSRepository The following screenshot shows the additional cmdlets that are available: This will allow us to find a module from PowerShell gallery and install it in our environment. PS gallery is a repository of modules. Using Find-Module cmdlet we get a list of module available in the PS gallery. Pipe and install the required module, alternatively we can save the module and examine it before installation, to do this use Save-Module cmdlet. The following screenshot illustrates the installation and deletion of the xJEA module: We can also publish module in the PS gallery, which will be available over the internet to others. This is not a great module. All it does is get user-information from an active directory for the same account name—creates a function and saves it as PSM1 in module folder. In order to publish the module in PS gallery, we need to ensure that the module has manifest. Following are the steps to publish your module: Create a PSM1 file. Create a PSD1 file that is a manifest module (also known as data file). Get your NuGet API key from the PS gallery link shared above. Publish your module using the Publish-PSModule cmdlet. Following figure shows modules that are currently published: Following figure shows the commands to publish modules: Summary In this article, we saw that Windows PowerShell 5.0 preview has got a lot more significant features, such as enhancement in PowerShell DSC, cmdlets improvements and new cmdlets, ISE support transcriptions, support class, and using class. We can create Custom DSC resources with easy string manipulations, A new Network Switch module is introduced using which we can automate and manage Microsoft signed network switches. Resources for Article: Further resources on this subject: Windows Phone 8 Applications[article] The .NET Framework Primer[article] Unleashing Your Development Skills with PowerShell [article]


Securing Your Data

Packt
12 Oct 2015
6 min read
In this article by Tyson Cadenhead, author of Socket.IO Cookbook, we will explore several topics related to security in Socket.IO applications. These topics run the gamut from authentication and validation to how to use the wss:// protocol for secure WebSockets. As the WebSocket protocol opens innumerable opportunities to communicate more directly between the client and the server, people often wonder whether Socket.IO is actually as secure as something such as the HTTP protocol. The answer to this question is that it depends entirely on how you implement it. WebSockets can easily be locked down to prevent malicious or accidental security holes, but as with any API interface, your security is only as tight as your weakest link. In this article, we will cover the following topics:

- Locking down the HTTP referrer
- Using secure WebSockets

(For more resources related to this topic, see here.)

Locking down the HTTP referrer

Socket.IO is really good at getting around cross-domain issues. You can easily include the Socket.IO script from a different domain on your page, and it will just work as you may expect it to. There are some instances where you may not want your Socket.IO events to be available on every other domain. Not to worry! We can easily whitelist only the HTTP referrers that we want, so that some domains will be allowed to connect and other domains won't.

How To Do It…

To lock down the HTTP referrer and only allow events to whitelisted domains, follow these steps:

1. Create two different servers that can connect to our Socket.IO instance. We will let one server listen on port 5000 and the second server listen on port 5001:

var express = require('express'),
    app = express(),
    http = require('http'),
    socketIO = require('socket.io'),
    server,
    server2,
    io;

app.get('/', function (req, res) {
    res.sendFile(__dirname + '/index.html');
});

server = http.Server(app);
server.listen(5000);

server2 = http.Server(app);
server2.listen(5001);

io = socketIO(server);

2. When the connection is established, check the referrer in the headers. If it is a referrer that we want to give access to, we can let our connection perform its tasks and build up events as normal. If a blacklisted referrer, such as the one on port 5001 that we created, attempts a connection, we can politely decline and perhaps throw an error message back to the client, as shown in the following code:

io.on('connection', function (socket) {
    switch (socket.request.headers.referer) {
        case 'http://localhost:5000/':
            socket.emit('permission.message', "Okay, you're cool.");
            break;
        default:
            return socket.emit('permission.message', 'Who invited you to this party?');
            break;
    }
});

3. On the client side, we can listen to the response from the server and react as appropriate using the following code:

socket.on('permission.message', function (data) {
    document.querySelector('h1').innerHTML = data;
});

How It Works…

The referrer is always available in the socket.request.headers object of every socket, so we will be able to inspect it there to check whether it was a trusted source. In our case, we will use a switch statement to whitelist our domain on port 5000, but we could really use any mechanism at our disposal to perform the task. For example, if we need to dynamically whitelist domains, we can store a list of them in our database and search for it when the connection is established.

Using secure WebSockets

WebSocket communications can take place over either the ws:// protocol or the wss:// protocol.
They can be thought of as similar to the HTTP and HTTPS protocols, in the sense that one is secure and one isn't. Secure WebSockets are encrypted by the transport layer, so they are safer to use when you handle sensitive data. In this recipe, you will learn how to force our Socket.IO communications to happen over the wss:// protocol for an extra layer of encryption.

Getting Ready…

In this recipe, we will need to create a self-signed certificate so that we can serve our app locally over the HTTPS protocol. For this, we will need an npm package called Pem. This allows you to create a self-signed certificate that you can provide to your server. Of course, in a real production environment, we would want a true SSL certificate instead of a self-signed one. To install Pem, simply call npm install pem --save.

As our certificate is self-signed, you will probably see something similar to the following screenshot when you navigate to your secure server. Just take a chance by clicking on the Proceed to localhost link. You'll see your application load using the HTTPS protocol.

How To Do It…

To use the secure wss:// protocol, follow these steps:

1. First, create a secure server using the built-in node HTTPS package. We can create a self-signed certificate with the pem package so that we can serve our application over HTTPS instead of HTTP, as shown in the following code:

var https = require('https'),
    pem = require('pem'),
    express = require('express'),
    app = express(),
    socketIO = require('socket.io');

// Create a self-signed certificate with pem
pem.createCertificate({ days: 1, selfSigned: true }, function (err, keys) {

    app.get('/', function (req, res) {
        res.sendFile(__dirname + '/index.html');
    });

    // Create an https server with the certificate and key from pem
    var server = https.createServer({
        key: keys.serviceKey,
        cert: keys.certificate
    }, app).listen(5000);

    var io = socketIO(server);

    io.on('connection', function (socket) {
        var protocol = 'ws://';

        // Check the handshake to determine if it was secure or not
        if (socket.handshake.secure) {
            protocol = 'wss://';
        }

        socket.emit('hello.client', {
            message: 'This is a message from the server. It was sent using the ' + protocol + ' protocol'
        });
    });

});

2. In your client-side JavaScript, specify secure: true when you initialize your WebSocket as follows:

var socket = io('//localhost:5000', { secure: true });

socket.on('hello.client', function (data) {
    console.log(data);
});

3. Now, start your server and navigate to https://localhost:5000. Proceed to this page. You should see a message in your browser developer tools that shows: This is a message from the server. It was sent using the wss:// protocol.

How It Works…

The protocol of our WebSocket is actually set automatically based on the protocol of the page that it sits on. This means that a page that is served over the HTTP protocol will send the WebSocket communications over ws:// by default, and a page that is served by HTTPS will default to using the wss:// protocol. However, by setting the secure option to true, we told the WebSocket to always serve through wss:// no matter what.

Summary

In this article, we gave you an overview of the topics related to security in Socket.IO applications.

Resources for Article:

Further resources on this subject:

- Using Socket.IO and Express together
- Adding Real-time Functionality Using Socket.io
- Welcome to JavaScript in the full stack


Distributing your HTML5 Game

Marco Stagni
12 Oct 2015
11 min read
So, you've just completed your awesome project, and your game is ready to be played: it features tons of awesome characters and effects, a beautiful plot, an incredible soundtrack, and it sparkles at night. The only problem you're facing is: how should I distribute my game? How do I reach my players? Fortunately, this is an awesome world, and there are plenty of opportunities for you to reach millions of players. But before you are even able to reach the world, you need to package your game. For "packaging" your game, I will cover both the integration inside of a web page, and the actual packaging inside of a standalone application. In the first case, the only thing you need to do is be able to host your HTML page wherever you want, making it accessible for everybody in the world. Then, you will spread the world through every channel you know, and hopefully your website will be reached by a lot of interested people. If you used an HTML5 Game Engine (something like this ), then you should have a lot of tools to easily embed your game inside of an HTML page. Even if you're using a much more evolved game engine, such as Unity3D, you will always be able to incorporate the game into your website (of course, the only thing you need is to make sure that your chosen game engine should support this feature). In the Unity3D case, the user is prompted to download (or give access to, if already downloaded) the Unity3D Web Plugin, which allows the user to play the game with no problems at all. If you don't own a website, or you don't want to use your personal web space to distribute your product, there are a lot of available alternatives out there. Using an external service So, as I said, there are a lot of services that give you the ability to publish and distribute your game, both in standalone application, and as a "normal" HTML page. I will introduce and explain the most famous ones, since they're the most important platforms, and they're full of potential players. Itch.io Itch.io is a world famous platform and marketplace for indie games. It has a complete web interface, and unlike Steam, itch.io doesn't provide a standalone client, and users have to download game files directly from the platform. It accepts any kind of games, like mobile, desktop and web games. Itch.io can be considered as a game marketplace, since every developer is able to set a minimum price for his/her product: every player has the ability to choose the amount they thinks that game deserves. This can be seen as downside of this platform, but you have to consider that you can set a minimum price, and giving this ability to players cash spread the knowledge of your game more easily. Every developer can freely upload his/her content with no restrictions (except usual restrictions, like the ones concerning copyright infringment), with no entrance fee and with a fast profile creation procedure. Itch.io takes a 10% fee on developers' revenue (which is an absolutely amazing ratio) and allows developers to self manage their products, even allowing pre-orders if your game is still in development. Another awesome feature is that you don't need an account to play hosted games. This makes Itch.io a very interesting platform for those interested in selling a game. Another alternative, yet very similar to Itch.io, is gamejolt.com. GameJolt has basically the same features as Itch.io, although players are forced to create an account in order to play. An interesting feature of Game Jolt is that shares advertisements revenue with developers. 
Gog.com Gog.com is a more restrictive platform, since it only accepts a few games each month, but if you're one of the lucky guys whose work has been accepted, you will get a lot of promotion from the platform itself: your new release will be publicly shown on social media channels, and a great number of players will surely reach your game. This platform tries to help developers build their own community with on-site forums: every game has its own space inside the platform, where players and developers can share thoughts and discuss topics such as new releases or bugs to be fixed. Another interesting feature is that Gog.com helps you reach the end of your development path: they give you the possibility of an advance in royalties, to help your game reach its final stage. Finally, we can talk about the most famous and popular game distribution service of all time, Steam. Steam Steam is famous. Almost very player in the world knows it, and even if you don't download anything from the platform, its name is always recognizable. Steam provides a huge platform where developers and teams from all over the world can upload their works (both games and softwares), to get an income. Steam splits the revenue with a 70/30 model, which is considered a pretty fair ratio. The actual process of game uploading is called Steam GreenLight. Its main feature is being the most famous and biggest game marketplace of the world, and it provides its own client, available for Mac, Windows and Linux. The only difference from itch.io, gog.com and gamejolt.com is that only standalone applications are submittable. Building a standalone application Since this is still a beautiful world with a lot of possibilities, we can now build our own standalone game in a very easy way. At the end of the process I'm about to explain, you will obtain a single program that will contain your game, and you will be able to submit it to Steam or your favorite marketplace. The technology we're about to use is called "node-webkit", and you can find a lot of useful resources almost everywhere on the web about it. What is node-webkit? Node-webkit is like a browser that only loads your HTML content. You can think about it as a very small and super simplified version of Google Chrome, with the same amazing features. What makes node-webkit an incredible piece of software is that it's based on Chromium and Webkit, and seamlessly fully integrates Node.js, so you can access Node.js while interacting with your browser (this means that almost every node.js module you already use is available for you in node-webkit). What is more important for you, is that node-webkit is cross-platform: this means you can build your game for Windows, OS X and Linux very easily. Creating your first node-webkit application So, you now know that node-webkit is a powerful tool, and you will certainly be able to deploy your game on Steam GreenLight. The question is, how do you build a game? I will now show you what is a simple node-webkit application structure, and how to build your game for each platform. Project structure Node-webkit requires a very simple structure. The only requirements are a "package.json" file and your "index.html". So, if your project structure contains a index.html file, with all of its static assets (scripts, stylesheets, images), you have to provide a "package.json" near the index page. 
A very basic structure for this file should be something like this:    { "name":"My Awesome Game", "description":"My very first game, built with amazing software", "author":"Walter White", "main":"index.html", "version":"1.0.0", "nodejs":true, "window":{ "title":"Awesomeness 3D", "width":1280, "height":720, "toolbar":false, "position":"center" } } It's very easy to understand what this package.json file says: it's telling node-webkit how to render our window, which file is the main one, and provides a few fields describing our application (its name, its author, its description and so on). The node-webkit wiki provides a very detailed description of the right way to write your package.json file, here. When you're still in debugging mode, setting the "toolbar" option to "true" is a great deal: you will gain access to the Chrome Developers Tool. So, now you have your completed game. To start the packaging process, you have to zip all your files inside a .nw package, like this:    zip -r -X app.nw . -x *.git* This will recursively zip your game inside a package called "app.nw", removing all the git related files (I assume you're using git, if you're not using it you can simplify the command and run zip -r -X app.nw .). We now need to download the node-webkit executable, in order to create our own executable. Every platform needs its own path to be completed, and each one will be explained separately. OS X Since the OS X version of node-webkit is an .app package, the operations you need to follow are very simple. First of all, right-click the .app file, and select Show Package Contents. What you have to do right now, is to copy and paste your "app.nw" package inside Contents/Resources . If you now try to launch the .app file, your game will start immediately. You're now free to customize the application you created: you can change the Info.plist file to match the properties of your game. Be careful of what you change, or you might be forced to start from the beginning. You can user your favorite text editor. You can also update the application icon by replacing the nw.icns file inside Contents/Resources. Your game is now ready to run on Mac OS X. Linux Linux is also very easy to setup. First of all, after you download the Linux version of node-webkit, you have to unzip all of the files and copy your "node-webkit" package (your "app.nw") inside the root of the folder you just unzipped. Now, simply prompt using this command: cat nw app.nw > app chmod +x app And launch it using: ./app To distribute your linux package, zip the executable along with a few required files: nw.pak libffmpegsumo.so credits.html The last one is legally required due to multiple third-party open source libraries used by node-webkit. Windows The last one, Windows. First of all, download the Windows version of node-webkit and unzip it. Copy your app.nw file inside the folder, which contains the nw.exe file. Using the Windows command line tool (CMD), prompt this command: copy /b nw.exe+app.nw app.exe You can now launch your created .exe file to see your game running. To distribute your game in the easiest way, you can zip your executable with a few required files: nw.pak ffmpegsumo.dll icudtl.dat libEGL.dll libGLESv2.dll Your Windows package is ready to be distributed. You can use a powerful tool available to create Windows Installers, which is WixEdit. Conclusions Now you have your own executable, ready to be downloaded and played by millions of players. 
Your game is available for almost every platform, both online and offline (I won't cover the details of how to prepare your game for mobile; since it's a pretty detailed topic, it will probably need another post). It's always been completely up to you to decide where your game will live, but now you have a ton of choices: you can upload your HTML content to an online platform or to your own website, or pack it in a standalone application and distribute it via any channel you know to whoever you desire. Your other option is to upload it to a marketplace like Steam. This is pretty much everything you need to know in order to get started. The purpose of this post was to give you a set of information to help you decide which distribution route is the most suitable for you: there are still a lot of alternatives, and the ones described here are probably the most famous. I hope your games will reach and engage every player in this world, and I wish you all the success you deserve.

About the Author

Marco Stagni is an Italian frontend and mobile developer with a Bachelor's Degree in Computer Engineering. He's completely in love with JavaScript, and he's trying to push his knowledge of the language in every possible direction. After a few years as a frontend and Android developer, working with both Italian startups and web agencies, he's now deepening his knowledge of game programming. His efforts are currently aimed at the completion of his biggest project: a JavaScript game engine built on top of THREE.js and Physijs (the project is still in alpha, but already downloadable via http://npmjs.org/package/wage). You can also follow him on Twitter @marcoponds or on GitHub at http://github.com/marco-ponds. Marco is a big NBA fan.


Creating a graph application with Python, Neo4j, Gephi & Linkurious.js

Greg Roberts
12 Oct 2015
13 min read
I love Python, and to celebrate Packt Python week, I’ve spent some time developing an app using some of my favorite tools. The app is a graph visualization of Python and related topics, as well as showing where all our content fits in. The topics are all StackOverflow tags, related by their co-occurrence in questions on the site. The app is available to view at http://gregroberts.github.io/ and in this blog, I’m going to discuss some of the techniques I used to construct the underlying dataset, and how I turned it into an online application. Graphs, not charts Graphs are an incredibly powerful tool for analyzing and visualizing complex data. In recent years, many different graph database engines have been developed to make use of this novel manner of representing data. These databases offer many benefits over traditional, relational databases because of how the data is stored and accessed. Here at Packt, I use a Neo4j graph to store and analyze data about our business. Using the Cypher query language, it’s easy to express complicated relations between different nodes succinctly. It’s not just the technical aspect of graphs which make them appealing to work with. Seeing the connections between bits of data visualized explicitly as in a graph helps you to see the data in a different light, and make connections that you might not have spotted otherwise. This graph has many uses at Packt, from customer segmentation to product recommendations. In the next section, I describe the process I use to generate recommendations from the database. Make the connection For product recommendations, I use what’s known as a hybrid filter. This considers both content based filtering (product x and y are about the same topic) and collaborative filtering (people who bought x also bought y). Each of these methods has strengths and weaknesses, so combining them into one algorithm provides a more accurate signal. The collaborative aspect is straightforward to implement in Cypher. For a particular product, we want to find out which other product is most frequently bought alongside it. We have all our products and customers stored as nodes, and purchases are stored as edges. Thus, the Cypher query we want looks like this: MATCH (n:Product {title:’Learning Cypher’})-[r:purchased*2]-(m:Product) WITH m.title AS suggestion, count(distinct r)/(n.purchased+m.purchased) AS alsoBought WHERE m<>n RETURN* ORDER BY alsoBought DESC and will very efficiently return the most commonly also purchased product. When calculating the weight, we divide by the total units sold of both titles, so we get a proportion returned. We do this so we don’t just get the titles with the most units; we’re effectively calculating the size of the intersection of the two titles’ audiences relative to their overall audience size. The content side of the algorithm looks very similar: MATCH (n:Product {title:’Learning Cypher’})-[r:is_about*2]-(m:Product) WITH m.title AS suggestion, count(distinct r)/(length(n.topics)+length(m.topics)) AS alsoAbout WHERE m<>n RETURN * ORDER BY alsoAbout DESC Implicit in this algorithm is knowledge that a title is_about a topic of some kind. This could be done manually, but where’s the fun in that? In Packt’s domain there already exists a huge, well moderated corpus of technology concepts and their usage: StackOverflow. 
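Before getting into how those StackOverflow topics are wired up, here is a minimal sketch of how the collaborative "also bought" query above might be run from Python with py2neo. This is not the app's actual code: the connection URI, the example title, and the py2neo 2.x graph.cypher.execute call are assumptions that simply mirror the Cypher and the py2neo usage shown elsewhere in this post, and the query is lightly adapted (the WHERE clause is moved before the WITH).

```python
# Hedged sketch only: connection details, labels and properties are assumed
# to match the Cypher shown above; adjust for your own Neo4j instance.
from py2neo import Graph

graph = Graph("http://localhost:7474/db/data/")  # assumed local Neo4j

ALSO_BOUGHT = """
MATCH (n:Product {title:{title}})-[r:purchased*2]-(m:Product)
WHERE m <> n
WITH m.title AS suggestion,
     count(distinct r) / (n.purchased + m.purchased) AS alsoBought
RETURN suggestion, alsoBought
ORDER BY alsoBought DESC
LIMIT 5
"""

# py2neo 2.x API, as used later in this post; newer releases expose graph.run()
for record in graph.cypher.execute(ALSO_BOUGHT, {"title": "Learning Cypher"}):
    print(record.suggestion, record.alsoBought)
```

The content-based half of the filter runs the same way; only the relationship type and the normalisation change, exactly as in the second query above.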
The tagging system on StackOverflow not only tells us about all the topics developers across the world are using, it also tells us how those topics are related, by looking at the co-occurrence of tags in questions. So in our graph, StackOverflow tags are nodes in their own right, which represent topics. These nodes are connected via edges, which are weighted to reflect their co-occurrence on StackOverflow: edge_weight(n,m) = (Number of questions tagged with both n & m)/(Number questions tagged with n or m) So, to find topics related to a given topic, we could execute a query like this: MATCH (n:StackOverflowTag {name:'Matplotlib'})-[r:related_to]-(m:StackOverflowTag) RETURN n.name, r.weight, m.name ORDER BY r.weight DESC LIMIT 10 Which would return the following: | n.name | r.weight | m.name ----+------------+----------+-------------------- 1 | Matplotlib | 0.065699 | Plot 2 | Matplotlib | 0.045678 | Numpy 3 | Matplotlib | 0.029667 | Pandas 4 | Matplotlib | 0.023623 | Python 5 | Matplotlib | 0.023051 | Scipy 6 | Matplotlib | 0.017413 | Histogram 7 | Matplotlib | 0.015618 | Ipython 8 | Matplotlib | 0.013761 | Matplotlib Basemap 9 | Matplotlib | 0.013207 | Python 2.7 10 | Matplotlib | 0.012982 | Legend There are many, more complex relationships you can define between topics like this, too. You can infer directionality in the relationship by looking at the local network, or you could start constructing Hyper graphs using the extensive StackExchange API. So we have our topics, but we still need to connect our content to topics. To do this, I’ve used a two stage process. Step 1 – Parsing out the topics We take all the copy (words) pertaining to a particular product as a document representing that product. This includes the title, chapter headings, and all the copy on the website. We use this because it’s already been optimized for search, and should thus carry a fair representation of what the title is about. We then parse this document and keep all the words which match the topics we’ve previously imported. #...code for fetching all the copy for all the products key_re = 'W(%s)W' % '|'.join(re.escape(i) for i in topic_keywords) for i in documents tags = re.findall(key_re, i[‘copy’]) i['tags'] = map(lambda x: tag_lookup[x],tags) Having done this for each product, we have a bag of words representing each product, where each word is a recognized topic. Step 2 – Finding the information From each of these documents, we want to know the topics which are most important for that document. To do this, we use the tf-idf algorithm. tf-idf stands for term frequency, inverse document frequency. The algorithm takes the number of times a term appears in a particular document, and divides it by the proportion of the documents that word appears in. The term frequency factor boosts terms which appear often in a document, whilst the inverse document frequency factor gets rid of terms which are overly common across the entire corpus (for example, the term ‘programming’ is common in our product copy, and whilst most of the documents ARE about programming, this doesn’t provide much discriminating information about each document). To do all of this, I use python (obviously) and the excellent scikit-learn library. Tf-idf is implemented in the class sklearn.feature_extraction.text.TfidfVectorizer. This class has lots of options you can fiddle with to get more informative results. 
import sklearn.feature_extraction.text as skt tagger = skt.TfidfVectorizer(input = 'content', encoding = 'utf-8', decode_error = 'replace', strip_accents = None, analyzer = lambda x: x, ngram_range = (1,1), max_df = 0.8, min_df = 0.0, norm = 'l2', sublinear_tf = False) It’s a good idea to use the min_df & max_df arguments of the constructor so as to cut out the most common/obtuse words, to get a more informative weighting. The ‘analyzer’ argument tells it how to get the words from each document, in our case, the documents are already lists of normalized words, so we don’t need anything additional done. #create vectors of all the documents vectors = tagger.fit_transform(map(lambda x: x['tags'],rows)).toarray() #get back the topic names to map to the graph t_map = tagger.get_feature_names() jobs = [] for ind, vec in enumerate(vectors): features = filter(lambda x: x[1]>0, zip(t_map,vec)) doc = documents[ind] for topic, weight in features: job = ‘’’MERGE (n:StackOverflowTag {name:’%s’}) MERGE (m:Product {id:’%s’}) CREATE UNIQUE (m)-[:is_about {source:’tf_idf’,weight:%d}]-(n) ’’’ % (topic, doc[‘id’], weight) jobs.append(job) We then execute all of the jobs using Py2neo’s Batch functionality. Having done all of this, we can now relate products to each other in terms of what topics they have in common: MATCH (n:Product {isbn10:'1783988363'})-[r:is_about]-(a)-[q:is_about]-(m:Product {isbn10:'1783289007'}) WITH a.name as topic, r.weight+q.weight AS weight RETURN topic ORDER BY weight DESC limit 6 Which returns: | topic ---+------------------ 1 | Machine Learning 2 | Image 3 | Models 4 | Algorithm 5 | Data 6 | Python Huzzah! I now have a graph into which I can throw any piece of content about programming or software, and it will fit nicely into the network of topics we’ve developed. Take a breath So, that’s how the graph came to be. To communicate with Neo4j from Python, I use the excellent py2neo module, developed by Nigel Small. This module has all sorts of handy abstractions to allow you to work with nodes and edges as native Python objects, and then update your Neo instance with any changes you’ve made. The graph I’ve spoken about is used for many purposes across the business, and has grown in size and scope significantly over the last year. For this project, I’ve taken from this graph everything relevant to Python. I started by getting all of our content which is_about Python, or about a topic related to python: titles = [i.n for i in graph.cypher.execute('''MATCH (n)-[r:is_about]-(m:StackOverflowTag {name:'Python'}) return distinct n''')] t2 = [i.n for i in graph.cypher.execute('''MATCH (n)-[r:is_about]-(m:StackOverflowTag)-[:related_to]-(m:StackOverflowTag {name:'Python'}) where has(n.name) return distinct n''')] titles.extend(t2) then hydrated this further by going one or two hops down each path in various directions, to get a large set of topics and content related to Python. Visualising the graph Since I started working with graphs, two visualisation tools I’ve always used are Gephi and Sigma.js. Gephi is a great solution for analysing and exploring graphical data, allowing you to apply a plethora of different layout options, find out more about the statistics of the network, and to filter and change how the graph is displayed. Sigma.js is a lightweight JavaScript library which allows you to publish beautiful graph visualizations in a browser, and it copes very well with even very large graphs. 
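The hand-off to Gephi described next starts from flat files rather than from the live database. The post lists "Export all the nodes and edges to csv files" as a step but doesn't show it, so here is one possible sketch of that export, again via the py2neo 2.x API used above; the Cypher, the property names (name, weight), and the file names are assumptions based on the relationships described earlier, not the app's real code.

```python
# One possible version of the "export nodes and edges to CSV" step; the query,
# property names and file names are assumptions.
import csv
from py2neo import Graph

graph = Graph("http://localhost:7474/db/data/")  # assumed local Neo4j

rows = graph.cypher.execute(
    "MATCH (n:StackOverflowTag)-[r:related_to]->(m:StackOverflowTag) "
    "RETURN n.name AS source, m.name AS target, r.weight AS weight"
)

nodes = set()
with open("edges.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target", "Weight"])  # edge headers Gephi recognises
    for row in rows:
        writer.writerow([row.source, row.target, row.weight])
        nodes.update([row.source, row.target])

with open("nodes.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Id", "Label"])  # node table headers for Gephi
    for name in sorted(nodes):
        writer.writerow([name, name])
```

From here, the two CSV files can be loaded through Gephi's import spreadsheet dialog, which is where the workflow below picks up.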
Gephi has a great plugin which allows you to export your graph straight into a web page which you can host, share and adapt. More recently, Linkurious have made it their mission to bring graph visualization to the masses. I highly advise trying the demo of their product. It really shows how much value it’s possible to get out of graph based data. Imagine if your Customer Relations team were able to do a single query to view the entire history of a case or customer, laid out as a beautiful graph, full of glyphs and annotations. Linkurious have built their product on top of Sigma.js, and they’ve made available much of the work they’ve done as the open source Linkurious.js. This is essentially Sigma.js, with a few changes to the API, and an even greater variety of plugins. On Github, each plugin has an API page in the wiki and a downloadable demo. It’s worth cloning the repository just to see the things it’s capable of! Publish It! So here’s the workflow I used to get the Python topic graph out of Neo4j and onto the web. Use Py2neo to graph the subgraph of content and topics pertinent to Python, as described above Add to this some other topics linked to the same books to give a fuller picture of the Python “world” Add in topic-topic edges and product-product edges to show the full breadth of connections observed in the data Export all the nodes and edges to csv files Import node and edge tables into Gephi. The reason I’m using Gephi as a middle step is so that I can fiddle with the visualisation in Gephi until it looks perfect. The layout plugin in Sigma is good, but this way the graph is presentable as soon as the page loads, the communities are much clearer, and I’m not putting undue strain on browsers across the world! The layout of the graph has been achieved using a number of plugins. Instead of using the pre-installed ForceAtlas layouts, I’ve used the OpenOrd layout, which I feel really shows off the communities of a large graph. There’s a really interesting and technical presentation about how this layout works here. Export the graph into gexf format, having applied some partition and ranking functions to make it more clear and appealing. Now it’s all down to Linkurious and its various plugins! You can explore the source code of the final page to see all the details, but here I’ll give an overview of the different plugins I’ve used for the different parts of the visualisation: First instantiate the graph object, pointing to a container (note the CSS of the container, without this, the graph won’t display properly: <style type="text/css"> #container { max-width: 1500px; height: 850px; margin: auto; background-color: #E5E5E5; } </style> … <div id="container"></div> … <script> s= new sigma({ container: 'container', renderer: { container: document.getElementById('container'), type: 'canvas' }, settings: { … } }); sigma.parsers.gexf - used for (trivially!) importing a gexf file into a sigma instance sigma.parsers.gexf( 'static/data/Graph1.gexf', s, function(s) { //callback executed once the data is loaded, use this to set up any aspects of the app which depend on the data }); sigma.plugins.filter - Adds the ability to very simply hide nodes/edges based on a callback function which returns a boolean. This powers the filtering widgets on the page. 
<input class="form-control" id="min-degree" type="range" min="0" max="0" value="0"> … function applyMinDegreeFilter(e) { var v = e.target.value; $('#min-degree-val').textContent = v; filter .undo('min-degree') .nodesBy( function(n, options) { return this.graph.degree(n.id) >= options.minDegreeVal; },{ minDegreeVal: +v }, 'min-degree' ) .apply(); }; $('#min-degree').change(applyMinDegreeFilter); sigma.plugins.locate - Adds the ability to zoom in on a single node or collection of nodes. Very useful if you’re filtering a very large initial graph function locateNode (nid) { if (nid == '') { locate.center(1); } else { locate.nodes(nid); } }; sigma.renderers.glyphs - Allows you to add custom glyphs to each node. Useful if you have many types of node. Outro This application has been a very fun little project to build. The improvements to Sigma wrought by Linkurious have resulted in an incredibly powerful toolkit to rapidly generate graph based applications with a great degree of flexibility and interaction potential. None of this would have been possible were it not for Python. Python is my right (left, I’m left handed) hand which I use for almost everything. Its versatility and expressiveness make it an incredibly robust Swiss army knife in any data-analysts toolkit.


Basics of Jupyter Notebook and Python

Packt Editorial Staff
11 Oct 2015
28 min read
In this article by Cyrille Rossant, coming from his book, Learning IPython for Interactive Computing and Data Visualization - Second Edition, we will see how to use IPython console, Jupyter Notebook, and we will go through the basics of Python. Originally, IPython provided an enhanced command-line console to run Python code interactively. The Jupyter Notebook is a more recent and more sophisticated alternative to the console. Today, both tools are available, and we recommend that you learn to use both. [box type="note" align="alignleft" class="" width=""]The first chapter of the book, Chapter 1, Getting Started with IPython, contains all installation instructions. The main step is to download and install the free Anaconda distribution at https://www.continuum.io/downloads (the version of Python 3 64-bit for your operating system).[/box] Launching the IPython console To run the IPython console, type ipython in an OS terminal. There, you can write Python commands and see the results instantly. Here is a screenshot: IPython console The IPython console is most convenient when you have a command-line-based workflow and you want to execute some quick Python commands. You can exit the IPython console by typing exit. [box type="note" align="alignleft" class="" width=""]Let's mention the Qt console, which is similar to the IPython console but offers additional features such as multiline editing, enhanced tab completion, image support, and so on. The Qt console can also be integrated within a graphical application written with Python and Qt. See http://jupyter.org/qtconsole/stable/ for more information.[/box] Launching the Jupyter Notebook To run the Jupyter Notebook, open an OS terminal, go to ~/minibook/ (or into the directory where you've downloaded the book's notebooks), and type jupyter notebook. This will start the Jupyter server and open a new window in your browser (if that's not the case, go to the following URL: http://localhost:8888). Here is a screenshot of Jupyter's entry point, the Notebook dashboard: The Notebook dashboard [box type="note" align="alignleft" class="" width=""]At the time of writing, the following browsers are officially supported: Chrome 13 and greater; Safari 5 and greater; and Firefox 6 or greater. Other browsers may work also. Your mileage may vary.[/box] The Notebook is most convenient when you start a complex analysis project that will involve a substantial amount of interactive experimentation with your code. Other common use-cases include keeping track of your interactive session (like a lab notebook), or writing technical documents that involve code, equations, and figures. In the rest of this section, we will focus on the Notebook interface. [box type="note" align="alignleft" class="" width=""]Closing the Notebook server To close the Notebook server, go to the OS terminal where you launched the server from, and press Ctrl + C. You may need to confirm with y.[/box] The Notebook dashboard The dashboard contains several tabs which are as follows: Files: shows all files and notebooks in the current directory Running: shows all kernels currently running on your computer Clusters: lets you launch kernels for parallel computing A notebook is an interactive document containing code, text, and other elements. A notebook is saved in a file with the .ipynb extension. This file is a plain text file storing a JSON data structure. A kernel is a process running an interactive session. When using IPython, this kernel is a Python process. 
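Since a notebook is just a plain JSON text file, you can peek inside one with nothing more than Python's standard json module. This is purely illustrative: the file name below is a placeholder, not one of the book's notebooks.

```python
# Illustrative only: "my_notebook.ipynb" is a placeholder file name.
import json

with open("my_notebook.ipynb") as f:
    nb = json.load(f)

print(nb["nbformat"])                               # notebook format version
print([cell["cell_type"] for cell in nb["cells"]])  # e.g. ['markdown', 'code', ...]
```

The kernel, by contrast, is a separate process and is not part of this file; only its name appears in the notebook metadata.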
There are kernels in many languages other than Python. [box type="note" align="alignleft" class="" width=""]We follow the convention to use the term notebook for a file, and Notebook for the application and the web interface.[/box] In Jupyter, notebooks and kernels are strongly separated. A notebook is a file, whereas a kernel is a process. The kernel receives snippets of code from the Notebook interface, executes them, and sends the outputs and possible errors back to the Notebook interface. Thus, in general, the kernel has no notion of the Notebook. A notebook is persistent (it's a file), whereas a kernel may be closed at the end of an interactive session and it is therefore not persistent. When a notebook is re-opened, it needs to be re-executed. In general, no more than one Notebook interface can be connected to a given kernel. However, several IPython consoles can be connected to a given kernel. The Notebook user interface To create a new notebook, click on the New button, and select Notebook (Python 3). A new browser tab opens and shows the Notebook interface as follows: A new notebook Here are the main components of the interface, from top to bottom: The notebook name, which you can change by clicking on it. This is also the name of the .ipynb file. The Menu bar gives you access to several actions pertaining to either the notebook or the kernel. To the right of the menu bar is the Kernel name. You can change the kernel language of your notebook from the Kernel menu. The Toolbar contains icons for common actions. In particular, the dropdown menu showing Code lets you change the type of a cell. Following is the main component of the UI: the actual Notebook. It consists of a linear list of cells. We will detail the structure of a cell in the following sections. Structure of a notebook cell There are two main types of cells: Markdown cells and code cells, and they are described as follows: A Markdown cell contains rich text. In addition to classic formatting options like bold or italics, we can add links, images, HTML elements, LaTeX mathematical equations, and more. A code cell contains code to be executed by the kernel. The programming language corresponds to the kernel's language. We will only use Python in this book, but you can use many other languages. You can change the type of a cell by first clicking on a cell to select it, and then choosing the cell's type in the toolbar's dropdown menu showing Markdown or Code. Markdown cells Here is a screenshot of a Markdown cell: A Markdown cell The top panel shows the cell in edit mode, while the bottom one shows it in render mode. The edit mode lets you edit the text, while the render mode lets you display the rendered cell. We will explain the differences between these modes in greater detail in the following section. Code cells Here is a screenshot of a complex code cell: Structure of a code cell This code cell contains several parts, as follows: The Prompt number shows the cell's number. This number increases every time you run the cell. Since you can run cells of a notebook out of order, nothing guarantees that code numbers are linearly increasing in a given notebook. The Input area contains a multiline text editor that lets you write one or several lines of code with syntax highlighting. The Widget area may contain graphical controls; here, it displays a slider. 
The Output area can contain multiple outputs, here: Standard output (text in black) Error output (text with a red background) Rich output (an HTML table and an image here) The Notebook modal interface The Notebook implements a modal interface similar to some text editors such as vim. Mastering this interface may represent a small learning curve for some users. Use the edit mode to write code (the selected cell has a green border, and a pen icon appears at the top right of the interface). Click inside a cell to enable the edit mode for this cell (you need to double-click with Markdown cells). Use the command mode to operate on cells (the selected cell has a gray border, and there is no pen icon). Click outside the text area of a cell to enable the command mode (you can also press the Esc key). Keyboard shortcuts are available in the Notebook interface. Type h to show them. We review here the most common ones (for Windows and Linux; shortcuts for Mac OS X may be slightly different). Keyboard shortcuts available in both modes Here are a few keyboard shortcuts that are always available when a cell is selected: Ctrl + Enter: run the cell Shift + Enter: run the cell and select the cell below Alt + Enter: run the cell and insert a new cell below Ctrl + S: save the notebook Keyboard shortcuts available in the edit mode In the edit mode, you can type code as usual, and you have access to the following keyboard shortcuts: Esc: switch to command mode Ctrl + Shift + -: split the cell Keyboard shortcuts available in the command mode In the command mode, keystrokes are bound to cell operations. Don't write code in command mode or unexpected things will happen! For example, typing dd in command mode will delete the selected cell! Here are some keyboard shortcuts available in command mode: Enter: switch to edit mode Up or k: select the previous cell Down or j: select the next cell y / m: change the cell type to code cell/Markdown cell a / b: insert a new cell above/below the current cell x / c / v: cut/copy/paste the current cell dd: delete the current cell z: undo the last delete operation Shift + =: merge the cell below h: display the help menu with the list of keyboard shortcuts Spending some time learning these shortcuts is highly recommended. References Here are a few references: Main documentation of Jupyter at http://jupyter.readthedocs.org/en/latest/ Jupyter Notebook interface explained at http://jupyter-notebook.readthedocs.org/en/latest/notebook.html A crash course on Python If you don't know Python, read this section to learn the fundamentals. Python is a very accessible language and is even taught to school children. If you have ever programmed, it will only take you a few minutes to learn the basics. Hello world Open a new notebook and type the following in the first cell: In [1]: print("Hello world!") Out[1]: Hello world! Here is a screenshot: "Hello world" in the Notebook [box type="note" align="alignleft" class="" width=""]Prompt string Note that the convention chosen in this article is to show Python code (also called the input) prefixed with In [x]: (which shouldn't be typed). This is the standard IPython prompt. Here, you should just type print("Hello world!") and then press Shift + Enter.[/box] Congratulations! You are now a Python programmer. Variables Let's use Python as a calculator. In [2]: 2 * 2 Out[2]: 4 Here, 2 * 2 is an expression statement. This operation is performed, the result is returned, and IPython displays it in the notebook cell's output. 
[box type="note" align="alignleft" class="" width=""]Division In Python 3, 3 / 2 returns 1.5 (floating-point division), whereas it returns 1 in Python 2 (integer division). This can be source of errors when porting Python 2 code to Python 3. It is recommended to always use the explicit 3.0 / 2.0 for floating-point division (by using floating-point numbers) and 3 // 2 for integer division. Both syntaxes work in Python 2 and Python 3. See http://python3porting.com/differences.html#integer-division for more details.[/box] Other built-in mathematical operators include +, -, ** for the exponentiation, and others. You will find more details at https://docs.python.org/3/reference/expressions.html#the-power-operator. Variables form a fundamental concept of any programming language. A variable has a name and a value. Here is how to create a new variable in Python: In [3]: a = 2 And here is how to use an existing variable: In [4]: a * 3 Out[4]: 6 Several variables can be defined at once (this is called unpacking): In [5]: a, b = 2, 6 There are different types of variables. Here, we have used a number (more precisely, an integer). Other important types include floating-point numbers to represent real numbers, strings to represent text, and booleans to represent True/False values. Here are a few examples: In [6]: somefloat = 3.1415 sometext = 'pi is about' # You can also use double quotes. print(sometext, somefloat) # Display several variables. Out[6]: pi is about 3.1415 Note how we used the # character to write comments. Whereas Python discards the comments completely, adding comments in the code is important when the code is to be read by other humans (including yourself in the future). String escaping String escaping refers to the ability to insert special characters in a string. For example, how can you insert ' and ", given that these characters are used to delimit a string in Python code? The backslash is the go-to escape character in Python (and in many other languages too). Here are a few examples: In [7]: print("Hello "world"") print("A list:n* item 1n* item 2") print("C:pathonwindows") print(r"C:pathonwindows") Out[7]: Hello "world" A list: * item 1 * item 2 C:pathonwindows C:pathonwindows The special character n is the new line (or line feed) character. To insert a backslash, you need to escape it, which explains why it needs to be doubled as . You can also disable escaping by using raw literals with a r prefix before the string, like in the last example above. In this case, backslashes are considered as normal characters. This is convenient when writing Windows paths, since Windows uses backslash separators instead of forward slashes like on Unix systems. A very common error on Windows is forgetting to escape backslashes in paths: writing "C:path" may lead to subtle errors. You will find the list of special characters in Python at https://docs.python.org/3.4/reference/lexical_analysis.html#string-and-bytes-literals. Lists A list contains a sequence of items. You can concisely instruct Python to perform repeated actions on the elements of a list. Let's first create a list of numbers as follows: In [8]: items = [1, 3, 0, 4, 1] Note the syntax we used to create the list: square brackets [], and commas , to separate the items. 
The built-in function len() returns the number of elements in a list: In [9]: len(items) Out[9]: 5 [box type="note" align="alignleft" class="" width=""]Python comes with a set of built-in functions, including print(), len(), max(), functional routines like filter() and map(), and container-related routines like all(), any(), range(), and sorted(). You will find the full list of built-in functions at https://docs.python.org/3.4/library/functions.html.[/box] Now, let's compute the sum of all elements in the list. Python provides a built-in function for this: In [10]: sum(items) Out[10]: 9 We can also access individual elements in the list, using the following syntax: In [11]: items[0] Out[11]: 1 In [12]: items[-1] Out[12]: 1 Note that indexing starts at 0 in Python: the first element of the list is indexed by 0, the second by 1, and so on. Also, -1 refers to the last element, -2 to the penultimate element, and so on. The same syntax can be used to alter elements in the list: In [13]: items[1] = 9 items Out[13]: [1, 9, 0, 4, 1] We can access sublists with the following syntax: In [14]: items[1:3] Out[14]: [9, 0] Here, 1:3 represents a slice going from element 1 included (this is the second element of the list) to element 3 excluded. Thus, we get a sublist with the second and third element of the original list. The first-included/last-excluded asymmetry leads to an intuitive treatment of overlaps between consecutive slices. Also, note that slicing a list returns a new list (a shallow copy of the selected elements), not a view; changing elements in the sublist does not change them in the original list. Python provides several other types of containers: Tuples are immutable and contain a fixed number of elements: In [15]: my_tuple = (1, 2, 3) my_tuple[1] Out[15]: 2 Dictionaries contain key-value pairs. They are extremely useful and common: In [16]: my_dict = {'a': 1, 'b': 2, 'c': 3} print('a:', my_dict['a']) Out[16]: a: 1 In [17]: print(my_dict.keys()) Out[17]: dict_keys(['c', 'a', 'b']) There is no notion of order in a dictionary. However, the native collections module provides an OrderedDict structure that keeps the insertion order (see https://docs.python.org/3.4/library/collections.html). Sets, like mathematical sets, contain distinct elements: In [18]: my_set = set([1, 2, 3, 2, 1]) my_set Out[18]: {1, 2, 3} A Python object is mutable if its value can change after it has been created. Otherwise, it is immutable. For example, a string is immutable; to change it, a new string needs to be created. A list, a dictionary, or a set is mutable; elements can be added or removed. By contrast, a tuple is immutable, and it is not possible to change the elements it contains without recreating the tuple. See https://docs.python.org/3.4/reference/datamodel.html for more details. Loops We can run through all elements of a list using a for loop: In [19]: for item in items: print(item) Out[19]: 1 9 0 4 1 There are several things to note here: The for item in items syntax means that a temporary variable named item is created at every iteration. This variable contains the value of every item in the list, one at a time. Note the colon : at the end of the for statement. Forgetting it will lead to a syntax error! The statement print(item) will be executed for all items in the list. Note the four spaces before print: this is called the indentation. You will find more details about indentation in the next subsection.
Python supports a concise syntax to perform a given operation on all elements of a list, as follows: In [20]: squares = [item * item for item in items] squares Out[20]: [1, 81, 0, 16, 1] This is called a list comprehension. A new list is created here; it contains the squares of all numbers in the list. This concise syntax leads to highly readable and Pythonic code. Indentation Indentation refers to the spaces that may appear at the beginning of some lines of code. This is a particular aspect of Python's syntax. In most programming languages, indentation is optional and is generally used to make the code visually clearer. But in Python, indentation also has a syntactic meaning. Particular indentation rules need to be followed for Python code to be correct. In general, there are two ways to indent some text: by inserting a tab character (also referred to as t), or by inserting a number of spaces (typically, four). It is recommended to use spaces instead of tab characters. Your text editor should be configured such that the Tab key on the keyboard inserts four spaces instead of a tab character. In the Notebook, indentation is automatically configured properly; so you shouldn't worry about this issue. The question only arises if you use another text editor for your Python code. Finally, what is the meaning of indentation? In Python, indentation delimits coherent blocks of code, for example, the contents of a loop, a conditional branch, a function, and other objects. Where other languages such as C or JavaScript use curly braces to delimit such blocks, Python uses indentation. Conditional branches Sometimes, you need to perform different operations on your data depending on some condition. For example, let's display all even numbers in our list: In [21]: for item in items: if item % 2 == 0: print(item) Out[21]: 0 4 Again, here are several things to note: An if statement is followed by a boolean expression. If a and b are two integers, the modulo operand a % b returns the remainder from the division of a by b. Here, item % 2 is 0 for even numbers, and 1 for odd numbers. The equality is represented by a double equal sign == to avoid confusion with the assignment operator = that we use when we create variables. Like with the for loop, the if statement ends with a colon :. The part of the code that is executed when the condition is satisfied follows the if statement. It is indented. Indentation is cumulative: since this if is inside a for loop, there are eight spaces before the print(item) statement. Python supports a concise syntax to select all elements in a list that satisfy certain properties. Here is how to create a sublist with only even numbers: In [22]: even = [item for item in items if item % 2 == 0] even Out[22]: [0, 4] This is also a form of list comprehension. Functions Code is typically organized into functions. A function encapsulates part of your code. Functions allow you to reuse bits of functionality without copy-pasting the code. Here is a function that tells whether an integer number is even or not: In [23]: def is_even(number): """Return whether an integer is even or not.""" return number % 2 == 0 There are several things to note here: A function is defined with the def keyword. After def comes the function name. A general convention in Python is to only use lowercase characters, and separate words with an underscore _. A function name generally starts with a verb. The function name is followed by parentheses, with one or several variable names called the arguments. 
These are the inputs of the function. There is a single argument here, named number. No type is specified for the argument. This is because Python is dynamically typed; you could pass a variable of any type. This function would work fine with floating point numbers, for example (the modulo operation works with floating point numbers in addition to integers). The body of the function is indented (and note the colon : at the end of the def statement). There is a docstring wrapped by triple quotes """. This is a particular form of comment that explains what the function does. It is not mandatory, but it is strongly recommended to write docstrings for the functions exposed to the user. The return keyword in the body of the function specifies the output of the function. Here, the output is a Boolean, obtained from the expression number % 2 == 0. It is possible to return several values; just use a comma to separate them (in this case, a tuple of Booleans would be returned). Once a function is defined, it can be called like this: In [24]: is_even(3) Out[24]: False In [25]: is_even(4) Out[25]: True Here, 3 and 4 are successively passed as arguments to the function. Positional and keyword arguments A Python function can accept an arbitrary number of arguments, called positional arguments. It can also accept optional named arguments, called keyword arguments. Here is an example: In [26]: def remainder(number, divisor=2): return number % divisor The second argument of this function, divisor, is optional. If it is not provided by the caller, it will default to the number 2, as shown here: In [27]: remainder(5) Out[27]: 1 There are two equivalent ways of specifying a keyword argument when calling a function. They are as follows: In [28]: remainder(5, 3) Out[28]: 2 In [29]: remainder(5, divisor=3) Out[29]: 2 In the first case, 3 is understood as the second argument, divisor. In the second case, the name of the argument is given explicitly by the caller. This second syntax is clearer and less error-prone than the first one. Functions can also accept arbitrary sets of positional and keyword arguments, using the following syntax: In [30]: def f(*args, **kwargs): print("Positional arguments:", args) print("Keyword arguments:", kwargs) In [31]: f(1, 2, c=3, d=4) Out[31]: Positional arguments: (1, 2) Keyword arguments: {'c': 3, 'd': 4} Inside the function, args is a tuple containing positional arguments, and kwargs is a dictionary containing keyword arguments. Passage by assignment When passing a parameter to a Python function, a reference to the object is actually passed (passage by assignment): If the passed object is mutable, it can be modified by the function If the passed object is immutable, it cannot be modified by the function Here is an example: In [32]: my_list = [1, 2] def add(some_list, value): some_list.append(value) add(my_list, 3) my_list Out[32]: [1, 2, 3] The add() function modifies an object defined outside it (in this case, the object my_list); we say this function has side-effects. A function with no side-effects is called a pure function: it doesn't modify anything in the outer context, and it deterministically returns the same result for any given set of inputs. Pure functions are to be preferred over functions with side-effects. Knowing this can help you spot out subtle bugs. There are further related concepts that are useful to know, including function scopes, naming, binding, and more. 
Here are a couple of links: Passage by reference at https://docs.python.org/3/faq/programming.html#how-do-i-write-a-function-with-output-parameters-call-by-reference Naming, binding, and scope at https://docs.python.org/3.4/reference/executionmodel.html Errors Let's discuss errors in Python. As you learn, you will inevitably come across errors and exceptions. The Python interpreter will most of the time tell you what the problem is, and where it occurred. It is important to understand the vocabulary used by Python so that you can more quickly find and correct your errors. Let's see the following example: In [33]: def divide(a, b): return a / b In [34]: divide(1, 0) Out[34]: --------------------------------------------------------- ZeroDivisionError Traceback (most recent call last) <ipython-input-2-b77ebb6ac6f6> in <module>() ----> 1 divide(1, 0) <ipython-input-1-5c74f9fd7706> in divide(a, b) 1 def divide(a, b): ----> 2 return a / b ZeroDivisionError: division by zero Here, we defined a divide() function, and called it to divide 1 by 0. Dividing a number by 0 is an error in Python. Here, a ZeroDivisionError exception was raised. An exception is a particular type of error that can be raised at any point in a program. It is propagated from the innards of the code up to the command that launched the code. It can be caught and processed at any point. You will find more details about exceptions at https://docs.python.org/3/tutorial/errors.html, and common exception types at https://docs.python.org/3/library/exceptions.html#bltin-exceptions. The error message you see contains the stack trace, the exception type, and the exception message. The stack trace shows all function calls between the raised exception and the script calling point. The top frame, indicated by the first arrow ---->, shows the entry point of the code execution. Here, it is divide(1, 0), which was called directly in the Notebook. The error occurred while this function was called. The next and last frame is indicated by the second arrow. It corresponds to line 2 in our function divide(a, b). It is the last frame in the stack trace: this means that the error occurred there. Object-oriented programming Object-oriented programming (OOP) is a relatively advanced topic. Although we won't use it much in this book, it is useful to know the basics. Also, mastering OOP is often essential when you start to have a large code base. In Python, everything is an object. A number, a string, or a function is an object. An object is an instance of a type (also known as class). An object has attributes and methods, as specified by its type. An attribute is a variable bound to an object, giving some information about it. A method is a function that applies to the object. For example, the object 'hello' is an instance of the built-in str type (string). The type() function returns the type of an object, as shown here: In [35]: type('hello') Out[35]: str There are native types, like str or int (integer), and custom types, also called classes, that can be created by the user. In IPython, you can discover the attributes and methods of any object with the dot syntax and tab completion. For example, typing 'hello'.u and pressing Tab automatically shows us the existence of the upper() method: In [36]: 'hello'.upper() Out[36]: 'HELLO' Here, upper() is a method available to all str objects; it returns an uppercase copy of a string. A useful string method is format(). 
This simple and convenient templating system lets you generate strings dynamically, as shown in the following example: In [37]: 'Hello {0:s}!'.format('Python') Out[37]: Hello Python The {0:s} syntax means "replace this with the first argument of format(), which should be a string". The variable type after the colon is especially useful for numbers, where you can specify how to display the number (for example, .3f to display three decimals). The 0 makes it possible to replace a given value several times in a given string. You can also use a name instead of a position—for example 'Hello {name}!'.format(name='Python'). Some methods are prefixed with an underscore _; they are private and are generally not meant to be used directly. IPython's tab completion won't show you these private attributes and methods unless you explicitly type _ before pressing Tab. In practice, the most important thing to remember is that appending a dot . to any Python object and pressing Tab in IPython will show you a lot of functionality pertaining to that object. Functional programming Python is a multi-paradigm language; it notably supports imperative, object-oriented, and functional programming models. Python functions are objects and can be handled like other objects. In particular, they can be passed as arguments to other functions (also called higher-order functions). This is the essence of functional programming. Decorators provide a convenient syntax construct to define higher-order functions. Here is an example using the is_even() function from the previous Functions section: In [38]: def show_output(func): def wrapped(*args, **kwargs): output = func(*args, **kwargs) print("The result is:", output) return wrapped The show_output() function transforms an arbitrary function func() to a new function, named wrapped(), that displays the result of the function, as follows: In [39]: f = show_output(is_even) f(3) Out[39]: The result is: False Equivalently, this higher-order function can also be used with a decorator, as follows: In [40]: @show_output def square(x): return x * x In [41]: square(3) Out[41]: The result is: 9 You can find more information about Python decorators at https://en.wikipedia.org/wiki/Python_syntax_and_semantics#Decorators and at http://www.thecodeship.com/patterns/guide-to-python-function-decorators/. Python 2 and 3 Let's finish this section with a few notes about Python 2 and Python 3 compatibility issues. There are still some Python 2 code and libraries that are not compatible with Python 3. Therefore, it is sometimes useful to be aware of the differences between the two versions. One of the most obvious differences is that print is a statement in Python 2, whereas it is a function in Python 3. Therefore, print "Hello" (without parentheses) works in Python 2 but not in Python 3, while print("Hello") works in both Python 2 and Python 3. 
There are several non-mutually exclusive options to write portable code that works with both versions: futures: A built-in module supporting backward-incompatible Python syntax 2to3: A built-in Python module to port Python 2 code to Python 3 six: An external lightweight library for writing compatible code Here are a few references: Official Python 2/3 wiki page at https://wiki.python.org/moin/Python2orPython3 The Porting to Python 3 book, by CreateSpace Independent Publishing Platform at http://www.python3porting.com/bookindex.html 2to3 at https://docs.python.org/3.4/library/2to3.html six at https://pythonhosted.org/six/ futures at https://docs.python.org/3.4/library/__future__.html The IPython Cookbook contains an in-depth recipe about choosing between Python 2 and 3, and how to support both. Going beyond the basics You now know the fundamentals of Python, the bare minimum that you will need in this book. As you can imagine, there is much more to say about Python. Following are a few further basic concepts that are often useful and that we cannot cover here, unfortunately. You are highly encouraged to have a look at them in the references given at the end of this section: range and enumerate pass, break, and continue, to be used in loops Working with files Creating and importing modules The Python standard library provides a wide range of functionality (OS, network, file systems, compression, mathematics, and more) Here are some slightly more advanced concepts that you might find useful if you want to strengthen your Python skills: Regular expressions for advanced string processing Lambda functions for defining small anonymous functions Generators for controlling custom loops Exceptions for handling errors with statements for safely handling contexts Advanced object-oriented programming Metaprogramming for modifying Python code dynamically The pickle module for persisting Python objects on disk and exchanging them across a network Finally, here are a few references: Getting started with Python: https://www.python.org/about/gettingstarted/ A Python tutorial: https://docs.python.org/3/tutorial/index.html The Python Standard Library: https://docs.python.org/3/library/index.html Interactive tutorial: http://www.learnpython.org/ Codecademy Python course: http://www.codecademy.com/tracks/python Language reference (expert level): https://docs.python.org/3/reference/index.html Python Cookbook, by David Beazley and Brian K. Jones, O'Reilly Media (advanced level, highly recommended if you want to become a Python expert) Summary In this article, we have seen how to launch the IPython console and Jupyter Notebook, the different aspects of the Notebook and its user interface, the structure of the notebook cell, keyboard shortcuts that are available in the Notebook interface, and the basics of Python.
How to Develop for Today's Wearable Tech with Pebble.js

Eugene Safronov
09 Oct 2015
7 min read
Pebble is a smartwatch that pairs with both Android and iPhone devices via Bluetooth. It has an e-paper display with LED backlight, accelerometer and compass sensors. On top of that battery lasts up to a week between charges. From the beginning Pebble team embraced the Developer community which resulted in powerful SDK. Although a primary language for apps development is C, there is a room for JavaScript developers as well. PebbleKit JS The PebbleKit JavaScript framework expands the ability of Pebble app to run JavaScript logic on the phone. It allows fast access to data location from the phone and has API for getting data from the web. Unfortunately app development still requires programming in C. I could recommend some great articles on how to get started. Pebble.js Pebble.js, in contrast to PebbleKit JS, allows developers to create watchapp using only JavaScript code. It is simple yet powerful enough for creating watch apps that fetch and display data from various web services or remotely control other smartdevices. The downside of that approach is connected to the way how Pebble.js works. It is built on top of the standard Pebble SDK and PebbleKit JS. It consists of a C app that runs on the watch and interacts with the phone in order to process user actions. The Pebble.js library provides an API to build user interface and then remotely controls the C app to display it. As a consequence of the described approach watchapp couldn't function without a connection to the phone. On a side note, I would mention that library is open source and still in beta, so breaking API changes are possible. First steps There are 2 options getting started with Pebble.js: Install Pebble SDK on your local machine. This option allows you to customize Pebble.js. Create a CloudPebble account and work with your appwatch projects online. It is the easiest way to begin Pebble development. CloudPebble The CloudPebble environment allows you to write, build and deploy your appwatch applications both on a simulator and a physical device. Everything is stored in the cloud so no headache with compilers, virtual machines or python dependencies (in my case installation of boost-python end up with errors on MacOS). Hello world As an introduction let's build the Hello World application with Pebble.js. Create a new project in CloudPebble: Then write the following code in the app.js file: // Import the UI elements var UI = require('ui'); // Create a simple Card var card = new UI.Card({ title: 'Hello World', body: 'This is your first Pebble app!' }); // Display to the user card.show(); Start the code on Pebble watch or simulator and you will get the same screen as below: StackExchange profile Getting some data from web services like weather or news is easy with an Ajax library call. For example let's construct an app view of your StackExchange profile: var UI = require('ui'); var ajax = require('ajax'); // Create a Card with title and subtitle var card = new UI.Card({ title:'Profile', subtitle:'Fetching...' }); // Display the Card card.show(); // Construct URL // https://api.stackexchange.com/docs/me#order=desc&sort=reputation&filter=default&site=stackoverflow&run=true var API_TOKEN = 'put api token here'; var API_KEY = 'secret key'; var URL = 'https://api.stackexchange.com/2.2/me?key=' + API_KEY + '&order=desc&sort=reputation&access_token=' + API_TOKEN + '&filter=default'; // Make the request ajax( { url: URL, type: 'json' }, function(data) { // Success! 
console.log('Successfully fetched StackOverflow profile'); var profile = data.items[0]; var badges = 'Badges: ' + profile.badge_counts.gold + ' ' + profile.badge_counts.silver + ' ' + profile.badge_counts.bronze; // Show to user card.subtitle('Rep: ' + profile.reputation); card.body(badges + 'nDaily change:' + profile.reputation_change_day + 'nWeekly change:' + profile.reputation_change_week + 'nMonthly change:' + profile.reputation_change_month); }, function(error) { // Failure! console.log('Failed fetching Stackoverflow data: ' + JSON.stringify(error)); } ); Egg timer Lastly I would like to create a small real life watchapp. I will demonstrate how to compose a Timer app for boiling eggs. Let's start with the building blocks that we need: Window is the basic building block in Pebble application. It allows you to add different elements and specify a position and size for them. A menu is a type of Window that displays a standard Pebble menu. Vibe allows you to trigger vibration on the wrist. It will signal that eggs are boiled. Egg size screen Users are able to select eggs size from options: Medium Large Extra-large var UI = require('ui'); var menu = new UI.Menu({ sections: [{ title: 'Egg size', items: [{ title: 'Medium', }, { title: 'Large' }, { title: 'Extra-large' }] }] }); menu.show(); Timer selection screen On the next step user selects timer duration. It depends whether he wants soft-boiled or hard-boiled eggs. The second level menu for medium size looks like: var mdMenu = new UI.Menu({ sections: [{ title: 'Egg timer', items: [{ title: 'Runny', subtitle: '2m' }, { title: 'Soft', subtitle: '3m' }, { title: 'Firm', subtitle: '4m' }, { title: 'Hard', subtitle: '5m' }] }] }); // open second level menu from the main menu.on('select', function(e) { if (e.itemIndex === 0){ mdMenu.show(); } else if (e.itemIndex === 1){ lgMenu.show(); } else { xlMenu.show(); } }); Timer screen When timer duration is selected we start a countdown. mdMenu.on('select', onTimerSelect); lgMenu.on('select', onTimerSelect); xlMenu.on('select', onTimerSelect); // timeouts mapping from header to seconds var timeouts = { '2m': 120, '3m': 180, '4m': 240, '5m': 300, '6m': 360, '7m': 420 }; function onTimerSelect(e){ var timeout = timeouts[e.item.subtitle]; timer(timeout); } The final bit of the watchapp is to display a timer, show a message and notify the user with vibration on the wrist when time is elapsed. var readyMessage = new UI.Card({ title: 'Done', body: 'Your eggs are ready!' }); function timer(timerInSec){ var intervalId = setInterval(function(){ timerInSec--; // notify with double vibration if (timerInSec == 1){ Vibe.vibrate('double'); } if (timerInSec > 0){ timerText.text(getTimeString(timerInSec)); } else { readyMessage.show(); timerWindow.hide(); clearInterval(intervalId); // notify with long vibration Vibe.vibrate('long'); } }, 1000); var timerWindow = new UI.Window(); var timerText = new UI.Text({ position: new Vector2(0, 50), size: new Vector2(144, 30), font: 'bitham-42-light', text: getTimeString(timerInSec), textAlign: 'center' }); timerWindow.add(timerText); timerWindow.show(); timerWindow.on('hide', function(){ clearInterval(intervalId); }); } // format remaining time into 00:00 string function getTimeString(timeInSec){ var minutes = parseInt(timeInSec / 60); var seconds = timeInSec % 60; return minutes + ':' + (seconds < 10 ? 
('0' + seconds) : seconds); } Conclusion You can do much more with Pebble.js: Get accelerometer values Display complex UI mixing geometric elements, text and images Animate elements on the screen Use the GPS and LocalStorage on the phone Timeline API is coming Pebble.js is best suited for quick prototyping and applications that require access to the Internet. The unfortunate part of JavaScript-written applications is that they require a constant connection to the phone. Pebble.js apps usually need more power and respond more slowly than a similar native app. Useful links Pebble.js tutorial Pebble blog Pebble.js docs Egg timer code About the author Eugene Safronov is a software engineer with a proven record of delivering high-quality software. He has extensive experience building successful teams and adjusting development processes to the project's needs. His primary focuses are Web (.NET, node.js stacks) and cross-platform mobile development (native and hybrid). He can be found on Twitter @sejoker.
Asynchronous Communication between Components

Packt
09 Oct 2015
12 min read
In this article by Andreas Niedermair, the author of the book Mastering ServiceStack, we will see the communication between asynchronous components. The recent release of .NET has added several new ways to further embrace asynchronous and parallel processing by introducing the Task Parallel Library (TPL) and async and await. (For more resources related to this topic, see here.) The need for asynchronous processing has been there since the early days of programming. Its main concept is to offload the processing to another thread or process to release the calling thread from waiting and it has become a standard model since the rise of GUIs. In such interfaces only one thread is responsible for drawing the GUI, which must not be blocked in order to remain available and also to avoid putting the application in a non-responding state. This paradigm is a core point in distributed systems, at some point, long running operations are offloaded to a separate component, either to overcome blocking or to avoid resource bottlenecks using dedicated machines, which also makes the processing more robust against unexpected application pool recycling and other such issues. A synonym for "fire-and-forget" is "one-way", which is also reflected by the design of static routes of ServiceStack endpoints, where the default is /{format}/oneway/{service}. Asynchronism adds a whole new level of complexity to our processing chain, as some callers might depend on a return value. This problem can be overcome by adding callback or another event to your design. Messaging or in general a producer consumer chain is a fundamental design pattern, which can be applied within the same process or inter-process, on the same or a cross machine to decouple components. Consider the following architecture: The client issues a request to the service, which processes the message and returns a response. The server is known and is directly bound to the client, which makes an on-the-fly addition of servers practically impossible. You'd need to reconfigure the clients to reflect the collection of servers on every change and implement a distribution logic for requests. Therefore, a new component is introduced, which acts as a broker (without any processing of the message, except delivery) between the client and service to decouple the service from the client. This gives us the opportunity to introduce more services for heavy load scenarios by simply registering a new instance to the broker, as shown in the following figure:. I left out the clustering (scaling) of brokers and also the routing of messages on purpose at this stage of introduction. In many cross process scenarios a database is introduced as a broker, which is constantly polled by services (and clients, if there's a response involved) to check whether there's a message to be processed or not. Adding a database as a broker and implementing your own logic can be absolutely fine for basic systems, but for more advanced scenarios it lacks some essential features, which Messages Queues come shipped with. Scalability: Decoupling is the biggest step towards a robust design, as it introduces the possibility to add more processing nodes to your data flow. Resilience: Messages are guaranteed to be delivered and processed as automatic retrying is available for non-acknowledged (processed) messages. If the retry count is exceeded, failed messages are stored in a Dead Letter Queue (DLQ) to be inspected later and are requeued after fixing the issue that caused the failure. 
In case of a partial failure of your infrastructure, clients can still produce messages that get delivered and processed as soon as there is even a single consumer back online. Pushing instead of polling: This is where asynchronism comes into play, as clients do not constantly poll for messages but instead it gets pushed by the broker when there's a new message in their subscribed queue. This minimizes the spinning and wait time, when the timer ticks only for 10 seconds. Guaranteed order: Most Message Queues offer a guaranteed order of the processing under defined conditions (mostly FIFO). Load balancing: With multiple services registered for messages, there is an inherent load balancing so that the heavy load scenarios can be handled better. In addition to this round-robin routing there are other routing logics, such as smallest-mailbox, tail-chopping, or random routing. Message persistence: Message Queues can be configured to persist their data to disk and even survive restarts of the host on which they are running. To overcome the downtime of the Message Queue you can even setup a cluster to offload the demand to other brokers while restarting a single node. Built-in priority: Message Queues usually have separate queues for different messages and even provide a separate in queue for prioritized messages. There are many more features, such as Time to live, security and batching modes, which we will not cover as they are outside the scope of ServiceStack. In the following example we will refer to two basic DTOs: public class Hello : ServiceStack.IReturn<HelloResponse> { public string Name { get; set; } } public class HelloResponse { public string Result { get; set; } } The Hello class is used to send a Name to a consumer that generates a message, which will be enqueued in the Message Queue as well. RabbitMQ RabbitMQ is a mature broker built on top of the Advanced Message Queuing Protocol (AMQP), which makes it possible to solve even more complex scenarios, as shown here: The messages will survive restarts of the RabbitMQ service and the additional guaranty of delivery is accomplished by depending upon an acknowledgement of the receipt (and processing) of the message, by default it is done by ServiceStack for typical scenarios. The client of this Message Queue is located in the ServiceStack.RabbitMq object's NuGet package (it uses the official client in the RabbitMQ.Client package under the hood). You can add additional protocols to RabbitMQ, such as Message Queue Telemetry Transport (MQTT) and Streaming Text Oriented Messaging Protocol (STOMP), with plugins to ease Interop scenarios. Due to its complexity, we will focus on an abstracted interaction with the broker. There are many books and articles available for a deeper understanding of RabbitMQ. A quick overview of the covered scenarios is available at https://www.rabbitmq.com/getstarted.html. The method of publishing a message with RabbitMQ does not differ much from RedisMQ: using ServiceStack; using ServiceStack.RabbitMq; using (var rabbitMqServer = new RabbitMqServer()) { using (var messageProducer = rabbitMqServer.CreateMessageProducer()) { var hello = new Hello { Name = "Demo" }; messageProducer.Publish(hello); } } This will create a Helloobject and publish it to the corresponding queue in RabbitMQ. 
To retrieve this message, we need to register a handler, as shown here: using System; using ServiceStack; using ServiceStack.RabbitMq; using ServiceStack.Text; var rabbitMqServer = new RabbitMqServer(); rabbitMqServer.RegisterHandler<Hello>(message => { var hello = message.GetBody(); var name = hello.Name; var result = "Hello {0}".Fmt(name); result.Print(); return null; }); rabbitMqServer.Start(); "Listening for hello messages".Print(); Console.ReadLine(); rabbitMqServer.Dispose(); This registers a handler for Hello objects and prints a message to the console. In favor of a straightforward example we are omitting all the parameters with default values of the constructor of RabbitMqServer, which will connect us to the local instance at port 5672. To change this, you can either provide a connectionString parameter (and optional credentials) or use a RabbitMqMessageFactory object to customize the connection. Setup Setting up RabbitMQ involves a bit of effort. At first you need to install Erlang from http://www.erlang.org/download.html, which is the runtime for RabbitMQ due to its functional and concurrent nature. Then you can grab the installer from https://www.rabbitmq.com/download.html, which will set RabbitMQ up and running as a service with a default configuration. Processing chain Due to its complexity, the processing chain with any mature Message Queue is different from what you know from RedisMQ. Exchanges are introduced in front of queues to route the messages to their respective queues according to their routing keys: The default exchange name is mx.servicestack (defined in ServiceStack.Messaging.QueueNames.Exchange) and is used in any Publish to call an IMessageProducer or IMessageQueueClient object. With IMessageQueueClient.Publish you can inject a routing key (queueName parameter), to customize the routing of a queue. Failed messages are published to the ServiceStack.Messaging.QueueNames.ExchangeDlq (mx.servicestack.dlq) and routed to queues with the name mq:{type}.dlq. Successful messages are published to ServiceStack.Messaging.QueueNames.ExchangeTopic (mx.servicestack.topic) and routed to the queue mq:{type}.outq. Additionally, there's also a priority queue to the in-queue with the name mq:{type}.priority. If you interact with RabbitMQ on a lower level, you can directly publish to queues and leave the routing via an exchange out of the picture. Each queue has features to define whether the queue is durable, deletes itself after the last consumer disconnected, or which exchange is to be used to publish dead messages with which routing key. More information on the concepts, different exchange types, queues, and acknowledging messages can be found at https://www.rabbitmq.com/tutorials/amqp-concepts.html. Replying directly back to the producer Messages published to a queue are dequeued in FIFO mode, hence there is no guarantee if the responses are delivered to the issuer of the initial message or not. 
To force a response to the originator you can make use of the ReplyTo property of a message: using System; using ServiceStack; using ServiceStack.Messaging; using ServiceStack.RabbitMq; using ServiceStack.Text; var rabbitMqServer = new RabbitMqServer(); var messageQueueClient = rabbitMqServer.CreateMessageQueueClient(); var queueName = messageQueueClient.GetTempQueueName(); var hello = new Hello { Name = "reply to originator" }; messageQueueClient.Publish(new Message<Hello>(hello) { ReplyTo = queueName }); var message = messageQueueClient.Get<HelloResponse>(queueName); var helloResponse = message.GetBody(); This code is more or less identical to the RedisMQ approach, but it does something different under the hood. The messageQueueClient.GetTempQueueName object creates a temporary queue, whose name is generated by ServiceStack.Messaging.QueueNames.GetTempQueueName. This temporary queue does not survive a restart of RabbitMQ, and gets deleted as soon as the consumer disconnects. As each queue is a separate Erlang process, you may encounter the process limits of Erlang and the maximum amount of file descriptors of your OS. Broadcasting a message In many scenarios a broadcast to multiple consumers is required, for example if you need to attach multiple loggers to a system it needs a lower level of implementation. The solution to this requirement is to create a fan-out exchange that will forward the message to all the queues instead of one connected queue, where one queue is consumed exclusively by one consumer, as shown: using ServiceStack; using ServiceStack.Messaging; using ServiceStack.RabbitMq; var fanoutExchangeName = string.Concat(QueueNames.Exchange, ".", ExchangeType.Fanout); var rabbitMqServer = new RabbitMqServer(); var messageProducer= (RabbitMqProducer) rabbitMqServer.CreateMessageProducer(); var channel = messageProducer.Channel; channel.ExchangeDeclare(exchange: fanoutExchangeName, type: ExchangeType.Fanout, durable: true, autoDelete: false, arguments: null); With the cast to RabbitMqProducer we have access to lower level actions, we need to declare and exchange this with the name mx.servicestack.fanout, which is durable and does not get deleted. Now, we need to bind a temporary and an exclusive queue to the exchange: var messageQueueClient = (RabbitMqQueueClient) rabbitMqServer.CreateMessageQueueClient(); var queueName = messageQueueClient.GetTempQueueName(); channel.QueueBind(queue: queueName, exchange: fanoutExchangeName, routingKey: QueueNames<Hello>.In); The call to messageQueueClient.GetTempQueueName() creates a temporary queue, which lives as long as there is just one consumer connected. This queue is bound to the fan-out exchange with the routing key mq:Hello.inq, as shown here: To publish the messages we need to use the RabbitMqProducer object (messageProducer): var hello = new Hello { Name = "Broadcast" }; var message = new Message<Hello>(hello); messageProducer.Publish(queueName: QueueNames<Hello>.In, message: message, exchange: fanoutExchangeName); Even though the first parameter of Publish is named queueName, it is propagated as the routingKey to the underlying PublishMessagemethod call. 
This will publish the message on the newly generated exchange with mq:Hello.inq as the routing key: Now, we need to encapsulate the handling of the message as: var messageHandler = new MessageHandler<Hello>(rabbitMqServer, message => { var hello = message.GetBody(); var name = hello.Name; name.Print(); return null; }); The MessageHandler<T> class is used internally in all the messaging solutions and looks for retries and replies. Now, we need to connect the message handler to the queue. using System; using System.IO; using System.Threading.Tasks; using RabbitMQ.Client; using RabbitMQ.Client.Exceptions; using ServiceStack.Messaging; using ServiceStack.RabbitMq; var consumer = new RabbitMqBasicConsumer(channel); channel.BasicConsume(queue: queueName, noAck: false, consumer: consumer); Task.Run(() => { while (true) { BasicGetResult basicGetResult; try { basicGetResult = consumer.Queue.Dequeue(); } catch (EndOfStreamException) { // this is ok return; } catch (OperationInterruptedException) { // this is ok return; } var message = basicGetResult.ToMessage<Hello>(); messageHandler.ProcessMessage(messageQueueClient, message); } }); This creates a RabbitMqBasicConsumer object, which is used to consume the temporary queue. To process messages we try to dequeuer from the Queue property in a separate task. This example does not handle the disconnects and reconnects from the server and does not integrate with the services (however, both can be achieved). Integrate RabbitMQ in your service The integration of RabbitMQ in a ServiceStack service does not differ overly from RedisMQ. All you have to do is adapt to the Configure method of your host. using Funq; using ServiceStack; using ServiceStack.Messaging; using ServiceStack.RabbitMq; public override void Configure(Container container) { container.Register<IMessageService>(arg => new RabbitMqServer()); container.Register<IMessageFactory>(arg => new RabbitMqMessageFactory()); var messageService = container.Resolve<IMessageService>(); messageService.RegisterHandler<Hello> (this.ServiceController.ExecuteMessage); messageService.Start(); } The registration of an IMessageService is needed for the rerouting of the handlers to your service; and also, the registration of an IMessageFactory is relevant if you want to publish a message in your service with PublishMessage. Summary In this article the messaging pattern was introduced along with all the available clients of existing Message Queues. Resources for Article: Further resources on this subject: ServiceStack applications[article] Web API and Client Integration[article] Building a Web Application with PHP and MariaDB – Introduction to caching [article]
Linux Shell Scripting

Packt
08 Oct 2015
22 min read
This article is by Ganesh Naik, the author of the book Learning Linux Shell Scripting, published by Packt Publishing (https://www.packtpub.com/networking-and-servers/learning-linux-shell-scripting). Whoever works with Linux will come across the shell as the first program to work with. Graphical User Interface (GUI) usage has become very popular due to its ease of use, but those who want to take advantage of the power of Linux will use the shell program by default. The shell is a program that gives the user direct interaction with the operating system. Let's understand the stages in the evolution of the Linux operating system. Linux was developed as a free and open source substitute for the UNIX OS. The chronology is as follows: The UNIX operating system was developed by Ken Thompson and Dennis Ritchie in 1969. It was released in 1970. They rewrote UNIX using the C language in 1972. In 1991, Linus Torvalds developed the Linux kernel for the free operating system. (For more resources related to this topic, see here.) Comparison of shells Initially, the UNIX OS used a shell program called the Bourne shell. Eventually, many more shell programs were developed for different flavors of UNIX. The following is brief information about different shells: Sh: Bourne Shell Csh: C Shell Ksh: Korn Shell Tcsh: Enhanced C Shell Bash: GNU Bourne Again Shell Zsh: extension to Bash, Ksh, and Tcsh Pdksh: extension to KSH A brief comparison of various shells is presented in the following table:

Feature                          Bourne  C    TC   Korn  Bash
Aliases                          no      yes  yes  yes   yes
Command-line editing             no      no   yes  yes   yes
Advanced pattern matching        no      no   no   yes   yes
Filename completion              no      yes  yes  yes   yes
Directory stacks (pushd, popd)   no      yes  yes  no    yes
History                          no      yes  yes  yes   yes
Functions                        yes     no   no   yes   yes
Key binding                      no      no   yes  no    yes
Job control                      no      yes  yes  yes   yes
Spelling correction              no      no   yes  no    yes
Prompt formatting                no      no   yes  no    yes

What we see here is that, generally, the syntax of all these shells is about 95% similar. Tasks done by shell Whenever we type any text in the shell terminal, it is the responsibility of the shell to execute the command properly. The activities done by the shell are as follows: Reading text and parsing the entered command Evaluating meta-characters such as wildcards, special characters, or history characters Processing I/O redirection, pipes, and background processing Signal handling Initializing programs for execution Working in shell Let's get started by opening the terminal, and we will familiarize ourselves with the Bash shell environment. Open the Linux terminal and type in: $ echo $SHELL /bin/bash The preceding output in the terminal says that the current shell is /bin/bash, that is, the Bash shell. $ bash --version GNU bash, version 2.05.0(1)-release (i386-redhat-linux-gnu) Our first script – Hello World We will now write our first shell script called hello.sh. You can use any editor of your choice, such as vi, gedit, or nano. I prefer to use the vi editor. Create a new file hello.sh as follows: #!/bin/bash # This is a comment line echo "Hello World" ls date Save the newly created file. The #!/bin/bash line is called the shebang line. The combination of the characters # and ! is called the magic number. The shell uses this to call the intended shell, /bin/bash in this case. This should always be the first line in a shell script. The next few lines in the shell script are self-explanatory: Any line starting with # will be treated as a comment line.
The exception to this is the first line, #!/bin/bash. The echo command will print Hello World on the screen, ls will display the directory content on the console, and the date command will show the current date and time.

We can execute the newly created file using either of the following techniques:

Technique one:
$ bash hello.sh

Technique two:
$ chmod +x hello.sh

By running the preceding command, we are adding executable permission to our newly created file. You will learn more about file permissions in the following sections of this same chapter.

$ ./hello.sh

By running the preceding command, we are executing hello.sh as an executable file. With technique one, we passed the filename as an argument to the bash shell. The output of executing hello.sh will be as follows:

Hello World
hello.sh
Sun Jan 18 22:53:06 IST 2015

Compiler and interpreter – difference in process

In any program development, the following are the two options:

Compilation: Using a compiler-based language, such as C, C++, Java, and other similar languages
Interpretation: Using interpreter-based languages, such as Bash shell scripting

When we use a compiler-based language, we compile the complete source code, and as a result of compilation, we get a binary executable file. We then execute the binary to check the behavior of our program. On the contrary, when we develop a shell script, that is, an interpreter-based program, every line of the program is input to the bash shell. The lines of the shell script are executed one by one, sequentially. Even if the second line of a script has an error, the first line will still be executed by the shell interpreter.

Command substitution

On the keyboard, there is an interesting key, the backward quote (`), normally situated below the Esc key. If we place text between two successive back quotes, the shell will execute that text as a command and substitute its output, instead of processing it as plain text. The $(command) form is an alternate syntax for the backtick form, so the following two are equivalent:

$(command) or `command`

Let's see an example:

$ echo "Today is date"

Output:

Today is date

Now modify the command as follows:

$ echo "Today is `date`"

Or:

$ echo "Today is $(date)"

Output:

Today is Fri Mar 20 15:55:58 IST 2015

Pipes

Pipes are used for inter-process communication:

$ command_1 | command_2

In this case, the output of command_1 will be sent as input to command_2. A simple example is as follows:

$ who | wc

The preceding command connects two processes: the output of who is fed directly into the input of wc through an in-memory pipe buffer, and wc then displays its result on the terminal; no intermediate file is created.

Understanding variables

Let's learn about creating variables in the shell. Declaring variables in Linux is very easy. We just need to use a variable name and initialize it with the required content:

$ person="Ganesh Naik"

To get the content of a variable, we need to prefix it with $:

$ echo person
person
$ echo $person
Ganesh Naik

The unset command can be used to delete a variable:

$ a=20
$ echo $a
$ unset a

The unset command will clear or remove the variable from the shell environment as well:

$ person="Ganesh Naik"
$ echo $person
$ set

The set command will show all variables declared in the shell.

Exporting variables

By using the export command, we make variables available to child processes or subshells. However, if we export a variable in a child process, that variable will not be available in the parent process.
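To make this behaviour concrete, here is a small illustrative session (the variable names and values are only examples, not from the book):

$ fruit="mango"          # not exported, known only to the current shell
$ export city="Pune"     # exported, copied into any child process
$ bash -c 'echo "fruit=$fruit city=$city"'
fruit= city=Pune

Only the exported variable reaches the child shell, and the child only gets a copy, so changing city inside the child would not alter it in the parent.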
We can view environment variables by either of the following command: $ env $ printenv Whenever we create a shell script and execute it, a new shell process is created and shell script runs in that process. Any exported variable values are available to the new shell or to any subprocess. We can export any variable by either of the following: $ export NAME $ declare -x NAME Interactive shell scripts – reading user input The read command is a shell built-in command for reading data from a file or a keyboard. The read command receives the input from a keyboard or a file, till it receives a newline character. Then it converts the newline character in null character. Read a value and store it in the variable: read VARIABLE echo $VARIABLE This will receive text from keyboard. The received text will be stored in shell variable VARIABLE. Working with command-line arguments Command line arguments are required for the following reasons: It informs the utility or command which file or group of files to process (reading/writing of files) Command line arguments tell the command/utility which option to use Check the following command line: $ my_program arg1 arg2 arg3 If my_command is a bash shell script, then we can access every command line positional parameters inside the script as following: $0 would contain "my_program" # Command $1 would contain "arg1" # First parameter $2 would contain "arg2" # Second parameter $3 would contain "arg3" # Third parameter Let's create a param.sh script as follows: #!/bin/bash echo "Total number of parameters are = $#" echo "Script name = $0" echo "First Parameter is $1" echo "Second Parameter is $2" echo "Third Parameter is $3" echo "Fourth Parameter is $4" echo "Fifth Parameter is $5" echo "All parameters are = $*" Run the script as follows: $ param.sh London Washington Delhi Dhaka Paris Output: Total number of parameters are = 5 Command is = ./hello.sh First Parameter is London Second Parameter is Washington Third Parameter is Delhi Fourth Parameter is Dhaka Fifth Parameter is Paris All parameters are = London Washington Delhi Dhaka Paris Understanding set Many a times we may not pass arguments on the command line; but, we may need to set parameters internally inside the script. We can declare parameters by the set command as follows: $ set USA Canada UK France $ echo $1 USA $ echo $2 Canada $ echo $3 UK $ echo $4 France Working with arrays Array is a list of variables. For example, we can create array FRUIT, which will contain many fruit names. The array does not have a limit on how many variables it may contain. It can contain any type of data. The first element in an array will have the index value as 0. $ FRUITS=(Mango Banana Apple) $ echo ${FRUITS[*]} Mango Banana Apple $ echo $FRUITS[*] Mango[*] $ echo ${FRUITS[2]} Apple $ FRUITS[3]=Orange $ echo ${FRUITS[*]} Mango Banana Apple Orange Debugging – Tracing execution (option -x) The -x option, short for xtrace, or execution trace, tells the shell to echo each command after performing the substitution steps. This will enable us to see the value of variables and commands. 
We can trace execution of shell script as follows: $ bash –x hello.sh Instead of the preceding way, we can enable debugging by the following way also: #!/bin/bash -x Let's test the earlier script as follows $ bash –x hello.sh Output: $ bash –x hello.sh + echo Hello student Hello student ++ date + echo The date is Fri May 1 00:18:52 IST 2015 The date is Fri May 1 00:18:52 IST 2015 + echo Your home shell is /bin/bash Your home shell is /bin/bash + echo Good-bye student Good-bye student Summary of various debugging options Option Description -n Syntax error checking. No commands execution. -v Runs in verbose mode. The shell will echo each command prior to executing the command. -x Tracing execution. The shell will echo each command after performing the substitution steps. Thus, we will see the value of variables and commands. Checking exit status of commands Automation using shell scripts involves checking, if the earlier command executed successfully or failed, if a file is present or not, and similar. You will learn various constructs such as if, case, and similar, where you will need to check certain conditions if they are true or false. Accordingly, our script should conditionally execute various commands. Let's enter the following command: $ ls Using bash shell, we can check if the preceding command executed successfully or failed as follows: $ echo $? The preceding command will return 0, if the command ls executed successfully. The result will be non-zero such as 1 or 2 or any other non-zero number, if the command has failed. Bash shell stores the status of the last command execution in the variable. If we need to check the status of the last command execution, then we should check the content of the variable. Understanding test command Let's learn the following example to check the content or value of the expressions. $ test $name = Ganesh $ echo $? 0 if success and 1 if failure. In the preceding example, we want to check if the content of the variable name is the same as Ganesh. To check this, we used the test command. The test will store the results of comparison in the variable. We can use the following syntax for the preceding test command. In this case, we used [ ] instead of the test command. We enclosed the expression to be evaluated in square brackets. $ [ $name = Ganesh ] # Brackets replace the test command $ echo $? 0 String comparison options for the test command The following is the summary of various options for string comparison using test. Test operator Tests true if -n string True if the length of the string is non-zero. -z string True if the length of the string is zero string1 != string2 True if the strings are not equal. string1 == string2 string1 = string2 True if the strings are equal. string1 > string2 True if string1 sorts after string2 lexicographically. string1 < string2 True if string1 sorts before string2 lexicographically. Suppose we want to check, whether the length of the string is non-zero, then we can check it as follows: test –n $string Or [ –n $string ] echo $? If the result is 0, then we can conclude that the string length is non-zero. If the content of ? is non-zero, then the string has length 0. Numerical comparison operators for the test command The following is the summary of various options for numerical comparison using test. 
Test operator Tests true if [integer_1 –eq integer_2] integer_1 is equal to integer_2 [integer_1 –ne integer_2] integer_1 is not equal to integer_2 [integer_1 –gt integer_2] integer_1 is greater than integer_2 [integer_1 –ge integer_2] integer_1 is greater than or equal to integer_2 [integer_1 –ge integer_2] integer_1 is less than integer_2 [integer_1 –le integer_2] integer_1 is less than or equal to integer_2 Let's write shell script for learning various numerical test operator usage: #!/bin/bash num1=10 num2=30 echo $(($num1 < $num2)) # compare for less than [ $num1 -lt $num2 ] # compare for less than echo $? [ $num1 -ne $num2 ] # compare for not equal echo $? [ $num1 -eq $num2 ] # compare for equal to echo $? File test options for the test command The following are the various options for file handling operations using the test command. Test operator Tests true if –b file_name This checks if the file is a Block special file –c file_name This checks if the file is a Character special file –d file_name This checks if the Directory is existing –e file_name This checks if the File exist –f file_name This checks if the file is a Regular file and not a directory –G file_name This checks if the file is existing and is owned by the effective group ID –g file_name This checks if file has a Set-group-ID set –k file_name This checks if the file has a Sticky bit set –L file_name This checks if the file is a symbolic link –p file_name This checks if the file is a named pipe –O file_name This checks if the file exists and is owned by the effective user ID –r file_name This checks if the file is readable –S file_name This checks if the file is a socket –s file_name This checks if the file has nonzero size –t fd This checks if the file has fd (file descriptor) and is opened on a terminal –u file_name This checks if the file has a Set-user-ID bit set –w file_name This checks if the file is writable –x file_name This checks if the file is executable File testing binary operators The following are various options for binary file operations using test. Test Operator Tests True If [ file_1 –nt file_2 ] This checks if the file is newer than file2 [ file_1 –ot file_2 ] This check if file1 is older than file2. [ file_1 –ef file_2 ] This check if file1 and file2 have the same device or inode numbers. Let's write the script for testing basic file attributes such as whether it is a file or folder and whether it has file size bigger than 0 #!/bin/bash # Check if file is Directory [ -d work ] echo $? # Check that is it a File [ -f test.txt ] echo $? # Check if File has size greater than 0 [ -s test.txt ] echo $? Logical test operators The following are the various options for logical operations using test. Test Operator Tests True If [ string_1 –a string_1 ] Both string_1 and string_2 are true [ string_1 –o string_2 ] Either string_1 or string_2 is true [ ! string_1 ] Not a string_1 match [[ pattern_1 && pattern_2 ]] Both pattern_1 and pattern_2 are true [[ pattern_1 || pattern_2 ]] Either pattern_1 or pattern_2 is true [[ ! pattern ]] Not a pattern match Reference: Bash reference manual—http://www.gnu.org/software/bash/ Conditional constructs – if else We use if command for checking the pattern or command status, and accordingly, we can make certain decisions to execute script or commands. The syntax of if conditional is as follows: if command then command command fi From the preceding syntax, we can clearly understand the working of the if conditional construct. 
Initially, the if statement executes the command. If the result of the command execution is true, that is, an exit status of 0, then all the commands enclosed between then and fi are executed. If the status of the command after if is false (non-zero), then all the commands after then are ignored and the control of execution goes directly to fi.

A simple example of checking the status of the last executed command using the if construct is as follows:

#!/bin/bash
if [ $? -eq 0 ]
then
    echo "Command was successful."
else
    echo "Command has failed."
fi

Whenever we run any command, the exit status of the command is stored in the ? variable. The if construct is very useful for checking the status of the last command.

Switching case

Apart from simple decision making with if, it is also possible to process multiple decision-making operations using the case command. In a case statement, the expression contained in a variable is compared with a number of patterns, and for each pattern matched, a command is executed. A case statement has the following structure:

case variable in
value1)
    command(s)
    ;;
value2)
    command(s)
    ;;
*)
    command(s)
    ;;
esac

To illustrate the case construct, we will write the following script. We will ask the user to enter any number from 1 to 4 and check the entered number with the case command. If the user enters any other number, then we will display an error message.

#!/bin/bash
echo "Please enter number from 1 to 4"
read number
case $number in
    1) echo "ONE"
       ;;
    2) echo "TWO"
       ;;
    3) echo "THREE"
       ;;
    4) echo "FOUR"
       ;;
    *) echo "SOME ANOTHER NUMBER"
       ;;
esac

Output:

Please enter number from 1 to 4
2
TWO

Looping with the for command

For iterative operations, the bash shell uses three types of loops: for, while, and until. By using the for looping command, we can execute a set of commands a finite number of times, once for every item in a list. In the for command, a user-defined variable is specified, and after the in keyword, a list of values is supplied. The user-defined variable takes each value from that list in turn, and all the statements between do and done are executed, until the end of the list is reached. The purpose of the for loop is to process a list of elements.

A simple script with a for loop could be as follows:

for command in clear date cal
do
    sleep 1
    $command
done

In the preceding script, the commands clear, date, and cal will be called one after the other. The sleep command pauses for one second before every command. If we need to loop continuously or infinitely, then the following is the syntax:

for ((;;))
do
    command
done

Let's write a simple script as follows. In this script, we will print the variable var 10 times:

#!/bin/bash
for var in {1..10}
do
    echo $var
done

Exiting from the current loop iteration with continue

With the help of the continue command, it is possible to exit from the current iteration of a loop and resume the next iteration. We use the for, while, or until commands for loop iterations. The following is a script with a for loop that uses the continue command to skip part of the loop body:

#!/bin/bash
for x in 1 2 3
do
    echo before $x
    continue 1
    echo after $x
done
exit 0

Exiting from a loop with break

In the previous section, we discussed how continue can be used to exit from the current iteration of a loop. The break command is another way to introduce a new condition within a loop.
Unlike continue, however, it causes the loop to be terminated altogether if the condition is met. In the following script, we are checking the directory content. If directory is found, then we exit the loop and display the message that the first directory is found. #!/bin/bash rm -rf sample* echo > sample_1 echo > sample_2 mkdir sample_3 echo > sample_4 for file in sample* do if [ -d "$file" ]; then break; fi done echo The first directory is $file rm -rf sample* exit 0 Working with loop using do while Similar to the for command, while is also the command for loop operations. The condition or expression next to while is evaluated. If it is a success or 0, then the commands inside do and done are executed. The purpose of a loop is to test a certain condition and execute a given command while the condition is true (while loop) or until the condition becomes true (until loop). In the following script, we are printing numbers 1 to 10 on the screen by using the while loop. #!/bin/bash declare -i x x=0 while [ $x -le 10 ] do echo $x x=$((x+1)) done Using until The until command is similar to the while command. The given statements in loop are executed, as long as it evaluates the condition as true. As soon as the condition becomes false, then the loop is exited. Syntax: until command do command(s) done In the following script, we are printing numbers 0 to 9 on the screen. When the value of variable x becomes 10, then the until loop stops executing. #!/bin/bash x=0 until [ $x -eq 10 ] do echo $x x=`expr $x + 1` done Understanding functions In the real-word scripts, we break down big tasks or scripts in smaller logical tasks. This modularization of scripts helps in better development and understanding of code. The smaller logical block of script can be written as a function. Functions can be defined on command line or inside scripts. Syntax for defining functions on command line is as follows: functionName { command_1; command_2; . . . } Let's write a very simple function for illustrating the preceding syntax. $ hello() { echo 'Hello world!' ; } We can use the preceding defined function as follows: $ hello Output: Hello world! Functions should be defined at the beginning of script. Command source and '.' Normally, whenever we enter any command, a new process is created. If we want to make functions from scripts to be made available in the current shell, then we need a technique that will run script in the current shell instead of creating a new shell environment. The solution to this problem is using the source or . commands. The commands source and . can be used to run shell script in the current shell instead of creating new process. This helps in declaring function and variables in current shell. The syntax is as follows: $ source filename [arguments] Or: $ . filename [arguments] $ source functions.sh Or: $ . functions.sh If we pass command line arguments, those will be handled inside script as $1, $2 …and similar: $ source functions.sh arg1 arg2 or $ . /path/to/functions.sh arg1 arg2 The source command does not create new shell; but runs the shell scripts in current shell, so that all the variables and functions will be available in current shell for usage. System startup, inittab, and run levels When we power on the Linux system, the shell scripts are run one after another and the Linux system is initialized. These scripts start various services, daemons, start databases, mount discs, and many more applications are started. 
Even during the shutdown of the system, certain shell scripts are executed so that important system data and information can be saved to the disk and applications are properly shut down. These are called boot, startup, and shutdown scripts. These scripts are copied to our computer during the installation of the Linux operating system. As a developer or administrator, understanding these scripts may help you in understanding and debugging the Linux system, and, if required, you can customize them.

During system startup, various scripts are called as per the run level to initialize the basic operating system. Once the basic operating system is initialized, the user login process starts. This process is explained in the following topics.

System-wide settings scripts

In the /etc/ folder, the following files are related to user-level initialization:

/etc/profile: A few distributions will have an additional folder such as /etc/profile.d/. All the scripts from the profile.d folder will be executed.
/etc/bash.bashrc: Scripts in the /etc/ folder will be called for all the users.

User-specific initialization scripts are located in the HOME folder of each user. These are as follows:

$HOME/.bash_profile: This contains user-specific bash environment default settings. This script is called during the login process.
$HOME/.bash_login: The second user environment initialization script, called during the login process.
$HOME/.profile: If present, this script internally calls the .bashrc script file.
$HOME/.bashrc: This is an interactive shell or terminal initialization script. If we customize .bashrc, such as by adding new alias commands or declaring new functions or environment variables, then we should execute .bashrc for the changes to take effect.

Summary

In this article, you learned the basics of Linux shell scripting: the shell environment, creating and using variables, command-line arguments, various debugging techniques, decision-making and looping constructs, testing numeric, string, logical, and file handling-related conditions, and writing functions. You also learned about system initialization, the various initialization scripts, and how to customize them. This article has been a very short introduction to Linux shell scripting. For complete information with detailed explanations and numerous sample scripts, you can refer to the book at https://www.packtpub.com/networking-and-servers/learning-linux-shell-scripting.

Resources for Article:

Further resources on this subject:

CoreOS – Overview and Installation [article]
Getting Started [article]
An Introduction to the Terminal [article]


First Principle and a Useful Way to Think

Packt
08 Oct 2015
8 min read
In this article, by Timothy Washington, author of the book Clojure for Finance, we will cover the following topics: Modeling the stock price activity Function evaluation First-Class functions Lazy evaluation Basic Clojure functions and immutability Namespace modifications and creating our first function (For more resources related to this topic, see here.) Modeling the stock price activity There are many types of banks. Commercial entities (large stores, parking areas, hotels, and so on) that collect and retain credit card information, are either quasi banks, or farm out credit operations to bank-like processing companies. There are more well-known consumer banks, which accept demand deposits from the public. There are also a range of other banks such as commercial banks, insurance companies and trusts, credit unions, and in our case, investment banks. As promised, this article will slowly build up a set of lagging price indicators that follow a moving stock price time series. In order to do that, I think it's useful to touch on stock markets, and to crudely model stock price activity. A stock (or equity) market, is a collection of buyers and sellers trading economic assets (usually companies). The stock (or shares) of those companies can be equities listed on an exchange (New York Stock Exchange, London Stock Exchange, and others), or may be those traded privately. In this exercise, we will do the following: Crudely model the stock price movement, which will give us a test bed for writing our lagging price indicators Introduce some basic features of the Clojure language Function evaluation The Clojure website has a cheatsheet (http://clojure.org/cheatsheet) with all of the language's core functions. The first function we'll look at is rand, a function that randomly gives you a number within a given range. So in your edgar/ project, launch a repl with the lein repl shell command. After a few seconds, you will enter repl (Read-Eval-Print-Loop). Again, Clojure functions are executed by being placed in the first position of a list. The function's arguments are placed directly afterwards. In your repl, evaluate (rand 35) or (rand 99) or (rand 43.21) or any number you fancy Run it many times to see that you can get any different floating point number, within 0 and the upper bound of the number you provided First-Class functions The next functions we'll look at are repeatedly and fn. repeatedly is a function that takes another function and returns an infinite (or length n if supplied) lazy sequence of calls to the latter function. This is our first encounter of a function that can take another function. We'll also encounter functions that return other functions. Described as First-Class functions, this falls out of lambda calculus and is one of the central features of functional programming. As such, we need to wrap our previous (rand 35) call in another function. fn is one of Clojure's core functions, and produces an anonymous, unnamed function. We can now supply this function to repeatedly. In your repl, if you evaluate (take 25 (repeatedly (fn [] (rand 35)))), you should see a long list of floating point numbers with the list's tail elided. Lazy evaluation We only took the first 25 of the (repeatedly (fn [] (rand 35))) result list, because the list (actually a lazy sequence) is infinite. Lazy evaluation (or laziness) is a common feature in functional programming languages. 
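To get a feel for this in the repl, try something like the following (the prices name is just an example binding, not part of the project):

(def prices (repeatedly (fn [] (rand 35)))) ; returns immediately, nothing is generated yet
(take 3 prices)                             ; only now are three random numbers produced

Defining prices costs almost nothing, because none of the random numbers are generated until take asks for them.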
Being infinite, Clojure chooses to delay evaluating most of the list until it's needed by some other function that pulls out some values. Laziness benefits us by increasing performance and letting us more easily construct control flow. We can avoid needless calculation, repeated evaluations, and potential error conditions in compound expressions. Let's try to pull out some values with the take function. take itself, returns another lazy sequence, of the first n items of a collection. Evaluating (take 25 (repeatedly (fn [] (rand 35)))) will pull out the first 25 repeatedly calls to rand which generates a float between 0 and 35. Basic Clojure functions and immutability There's many operations we can perform over our result list (or lazy sequence). One of the main approaches of functional programming is to take a data structure, and perform operations over top of it to produce a new data structure, or some atomic result (a string, number, and so on). This may sound inefficient at first. But most FP languages employ something called immutability to make these operations efficient. Immutable data structures are the ones that cannot change once they've been created. This is feasible as most immutable, FP languages use some kind of structural data sharing between an original and a modified version. The idea is that if we run evaluate (conj [1 2 3] 4), the resulting [1 2 3 4] vector shares the original vector of [1 2 3]. The only additional resource that's assigned is for any novelty that's been introduced to the data structure (the 4). There's a more detailed explanation of (for example) Clojure's persistent vectors here: conj: This conjoins an element to a collection—the collection decides where. So conjoining an element to a vector (conj [1 2 3] 4) versus conjoining an element to a list (conj '(1 2 3) 4) yield different results. Try it in your repl. map: This passes a function over one or many lists, yielding another list. (map inc [1 2 3]) increments each element by 1. reduce (or left fold): This passes a function over each element, accumulating one result. (reduce + (take 100 (repeatedly (fn [] (rand 35))))) sums the list. filter: This constrains the input by some condition. >=: This is a conditional function, which tests whether the first argument is greater than or equal to the second function. Try (>= 4 9) and (>= 9 1). fn: This is a function that creates a function. This unnamed or anonymous function can have any instructions you choose to put in there. So if we only want numbers above 12, we can put that assertion in a predicate function. Try entering the below expression into your repl: (take 25 (filter (fn [x] (>= x 12)) (repeatedly (fn [] (rand 35))))) Modifying the namespaces and creating our first function We now have the basis for creating a function. It will return a lazy infinite sequence of floating point numbers, within an upper and lower bound. defn is a Clojure function, which takes an anonymous function, and binds a name to it in a given namespace. A Clojure namespace is an organizational tool for mapping human-readable names to things like functions, named data structures and such. Here, we're going to bind our function to the name generate-prices in our current namespace. You'll notice that our function is starting to span multiple lines. This will be a good time to author the code in your text editor of choice. I'll be using Emacs: Open your text editor, and add this code to the file called src/edgar/core.clj. Make sure that (ns edgar.core) is at the top of that file. 
After adding the following code, you can then restart repl. (load "edgaru/core") uses the load function to load the Clojure code in your in src/edgaru/core.clj: (defn generate-prices [lower-bound upper-bound] (filter (fn [x] (>= x lower-bound)) (repeatedly (fn [] (rand upper-bound))))) The Read-Eval-Print-Loop In our repl, we can pull in code in various namespaces, with the require function. This applies to the src/edgar/core.clj file we've just edited. That code will be in the edgar.core namespace: In your repl, evaluate (require '[edgar.core :as c]). c is just a handy alias we can use instead of the long name. You can then generate random prices within an upper and lower bound. Take the first 10 of them like this (take 10 (c/generate-prices 12 35)). You should see results akin to the following output. All elements should be within the range of 12 to 35: (29.60706184716407 12.507593971664075 19.79939384292759 31.322074615579716 19.737852534147326 25.134649707849572 19.952195022152488 12.94569843904663 23.618693004455086 14.695872710062428) There's a subtle abstraction in the preceding code that deserves attention. (require '[edgar.core :as c]) introduces the quote symbol. ' is the reader shorthand for the quote function. So the equivalent invocation would be (require (quote [edgar.core :as c])). Quoting a form tells the Clojure reader not to evaluate the subsequent expression (or form). So evaluating '(a b c) returns a list of three symbols, without trying to evaluate a. Even though those symbols haven't yet been assigned, that's okay, because that expression (or form) has not yet been evaluated. But that begs a larger question. What is reader? Clojure (and all Lisps) are what's known as homoiconic. This means that Clojure code is also data. And data can be directly output and evaluated as code. The reader is the thing that parses our src/edgar/core.clj file (or (+ 1 1) input from the repl prompt), and produces the data structures that are evaluated. read and eval are the 2 essential processes by which Clojure code runs. The evaluation result is printed (or output), usually to the standard output device. Then we loop the process back to the read function. So, when the repl reads, your src/edgar/two.clj file, it's directly transforming that text representation into data and evaluating it. A few things fall out of that. For example, it becomes simpler for Clojure programs to directly read, transform and write out other Clojure programs. The implications of that will become clearer when we look at macros. But for now, know that there are ways to modify or delay the evaluation process, in this case by quoting a form. Summary In this article, we learned about basic features of the Clojure language and how to model the stock price activity. Besides these, we also learned function evaluation, First-Class functions, the lazy evaluation method, namespace modifications and creating our first function. Resources for Article: Further resources on this subject: Performance by Design[article] Big Data[article] The Observer Pattern [article]


Creating a City Information App with Customized Table Views

Packt
08 Oct 2015
19 min read
In this article by Cecil Costa, the author of Swift 2 Blueprints, we will cover the following: Project overview Setting it up The first scene Displaying cities information (For more resources related to this topic, see here.) Project overview The idea of this app is to give users information about cities such as the current weather, pictures, history, and cities that are around. How can we do it? Firstly, we have to decide on how the app is going to suggest a city to the user. Of course, the most logical city would be the city where the user is located, which means that we have to use the Core Location framework to retrieve the device's coordinates with the help of GPS. Once we have retrieved the user's location, we can search for cities next to it. To do this, we are going to use a service from http://www.geonames.org/. Other information that will be necessary is the weather. Of course, there are a lot of websites that can give us information on the weather forecast, but not all of them offer an API to use it for your app. In this case, we are going to use the Open Weather Map service. What about pictures? For pictures, we can use the famous Flickr. Easy, isn't it? Now that we have the necessary information, let's start with our app. Setting it up Before we start coding, we are going to register the needed services and create an empty app. First, let's create a user at geonames. Just go to http://www.geonames.org/login with your favorite browser, sign up as a new user, and confirm it when you receive a confirmation e-mail. It may look like everything has been done, however, you still need to upgrade your account to use the API services. Don't worry, it's free! So, open http://www.geonames.org/manageaccount and upgrade your account. Don't use the user demo provided by geonames, even for development. This user exceeds its daily quota very frequently. With geonames, we can receive information on cities by their coordinates, but we don't have the weather forecast and pictures. For weather forecasts, open http://openweathermap.org/register and register a new user and API. Lastly, we need a service for the cities' pictures. In this case, we are going to use Flickr. Just create a Yahoo! account and create an API key at https://www.flickr.com/services/apps/create/. While creating a new app, try to investigate the services available for it and their current status. Unfortunately, the APIs change a lot like their prices, their terms, and even their features. Now, we can start creating the app. Open Xcode, create a new single view application for iOS, and call it Chapter 2 City Info. Make sure that Swift is the main language like the following picture: The first task here is to add a library to help us work with JSON messages. In this case, a library called SwiftyJSON will solve our problem. Otherwise, it would be hard work to navigate through the NSJSONSerialization results. Download the SwiftyJSON library from https://github.com/SwiftyJSON/SwiftyJSON/archive/master.zip, then uncompress it, and copy the SwiftyJSON.swift file in your project. Another very common way of installing third party libraries or frameworks would be to use CocoaPods, which is commonly known as just PODs. This is a dependency manager, which downloads the desired frameworks with their dependencies and updates them. Check https://cocoapods.org/ for more information. Ok, so now it is time to start coding. We will create some functions and classes that should be common for the whole program. 
As you know, many functions return NSError if something goes wrong. However, sometimes, there are errors that are detected by the code, like when you receive a JSON message with an unexpected struct. For this reason, we are going to create a class that creates custom NSError. Once we have it, we will add a new file to the project (command + N) called ErrorFactory.swift and add the following code: import Foundation class ErrorFactory {{ static let Domain = "CityInfo" enum Code:Int { case WrongHttpCode = 100, MissingParams = 101, AuthDenied = 102, WrongInput = 103 } class func error(code:Code) -> NSError{ let description:String let reason:String let recovery:String switch code { case .WrongHttpCode: description = NSLocalizedString("Server replied wrong code (not 200, 201 or 304)", comment: "") reason = NSLocalizedString("Wrong server or wrong api", comment: "") recovery = NSLocalizedString("Check if the server is is right one", comment: "") case .MissingParams: description = NSLocalizedString("There are some missing params", comment: "") reason = NSLocalizedString("Wrong endpoint or API version", comment: "") recovery = NSLocalizedString("Check the url and the server version", comment: "") case .AuthDenied: description = NSLocalizedString("Authorization denied", comment: "") reason = NSLocalizedString("User must accept the authorization for using its feature", comment: "") recovery = NSLocalizedString("Open user auth panel.", comment: "") case .WrongInput: description = NSLocalizedString("A parameter was wrong", comment: "") reason = NSLocalizedString("Probably a cast wasn't correct", comment: "") recovery = NSLocalizedString("Check the input parameters.", comment: "") } return NSError(domain: ErrorFactory.Domain, code: code.rawValue, userInfo: [ NSLocalizedDescriptionKey: description, NSLocalizedFailureReasonErrorKey: reason, NSLocalizedRecoverySuggestionErrorKey: recovery ]) } } The previous code shows the usage of NSError that requires a domain, which is a string that differentiates the error type/origin and avoids collisions in the error code. The error code is just an integer that represents the error that occurred. We used an enumeration based on integer values, which makes it easier for the developer to remember and allows us to convert its enumeration to an integer easily with the rawValue property. The third argument of an NSError initializer is a dictionary that contains messages, which can be useful to the user (actually to the developer). Here, we have three keys: NSLocalizedDescriptionKey: This contains a basic description of the error NSLocalizedFailureReasonErrorKey: This explains what caused the error NSLocalizedRecoverySuggestionErrorKey: This shows what is possible to avoid this error As you might have noticed, for these strings, we used a function called NSLocalizedString, which will retrieve the message in the corresponding language if it is set to the Localizable.strings file. So, let's add a new file to our app and call it Helpers.swift; click on it for editing. URLs have special character combinations that represent special characters, for example, a whitespace in a URL is sent as a combination of %20 and a open parenthesis is sent with the combination of %28. The stringByAddingPercentEncodingWithAllowedCharacters string method allows us to do this character conversion. If you need more information on the percent encoding, you can check the Wikipedia entry at https://en.wikipedia.org/wiki/Percent-encoding. 
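Before writing that helper, here is a rough idea of what the encoding looks like for a single string; the sample value and the choice of character set below are only for illustration:

import Foundation

let city = "San Francisco"
let encoded = city.stringByAddingPercentEncodingWithAllowedCharacters(
    NSCharacterSet.URLQueryAllowedCharacterSet())
print(encoded) // Optional("San%20Francisco") – the space becomes %20

Exactly which characters are escaped depends on the character set you pass; the helper below uses URLHostAllowedCharacterSet, as in the original code.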
As we are going to work with web APIs, we will need to encode some texts before we send them to the corresponding server. Type the following function to convert a dictionary into a string with the URL encoding: func toUriEncoded(params: [String:String]) -> String { var records = [String]() for (key, value) in params { let valueEncoded = value.stringByAddingPercentEncodingWithAllowedCharacters(.URLHostAllowedCharacterSet()) records.append("(key)=(valueEncoded!)") } return "&".join(records) } Another common task is to call the main queue. You might have already used a code like dispatch_async(dispatch_get_main_queue(), {() -> () in … }), however, it is too long. We can reduce it by calling it something like M{…}. So, here is the function for it: func M(((completion: () -> () ) { dispatch_async(dispatch_get_main_queue(), completion) } A common task is to request for JSON messages. To do so, we just need to know the endpoint, the required parameters, and the callback. So, we can start with this function as follows: func requestJSON(urlString:String, params:[String:String] = [:], completion:(JSON, NSError?) -> Void){ let fullUrlString = "(urlString)?(toUriEncoded(params))" if let url = NSURL(string: fullUrlString) { NSURLSession.sharedSession().dataTaskWithURL(url) { (data, response, error) -> Void in if error != nil { completion(JSON(NSNull()), error) return } var jsonData = data! var jsonString = NSString(data: jsonData, encoding: NSUTF8StringEncoding)! Here, we have to add a tricky code, because the Flickr API is always returned with a callback function called jsonFlickrApi while wrapping the corresponding JSON. This callback must be removed before the JSON text is parsed. So, we can fix this issue by adding the following code: // if it is the Flickr response we have to remove the callback function jsonFlickrApi() // from the JSON string if (jsonString as String).characters.startsWith("jsonFlickrApi(".characters) { jsonString = jsonString.substringFromIndex("jsonFlickrApi(".characters.count) let end = (jsonString as String).characters.count - 1 jsonString = jsonString.substringToIndex(end) jsonData = jsonString.dataUsingEncoding(NSUTF8StringEncoding)! } Now, we can complete this function by creating a JSON object and calling the callback: let json = JSON(data:jsonData) completion(json, nil) }.resume() }else { completion(JSON(NSNull()), ErrorFactory.error(.WrongInput)) } } At this point, the app has a good skeleton. It means that, from now on, we can code the app itself. The first scene Create a project group (command + option + N) for the view controllers and move the ViewController.swift file (created by Xcode) to this group. As we are going to have more than one view controller, it is also a good idea to rename it to InitialViewController.swift: Now, open this file and rename its class from ViewController to InitialViewController: class InitialViewController: UIViewController { Once the class is renamed, we need to update the corresponding view controller in the storyboard by: Clicking on the storyboard. Selecting the view controller (the only one we have till now). Going to the Identity inspector by using the command+ option + 3 combination. Here, you can update the class name to the new one. Pressing enter and confirming that the module name is automatically updated from None to the product name. The following picture demonstrates where you should do this change and how it should be after the change: Great! Now, we can draw the scene. Firstly, let's change the view background color. 
To do it, select the view that hangs from the view controller. Go to the Attribute Inspector by pressing command+ option + 4, look for background color, and choose other, as shown in the following picture: When the color dialog appears, choose the Color Sliders option at the top and select the RGB Sliders combo box option. Then, you can change the color as per your choice. In this case, let's set it to 250 for the three colors: Before you start a new app, create a mockup of every scene. In this mockup, try to write down the color numbers for the backgrounds, fonts, and so on. Remember that Xcode still doesn't have a way to work with styles as websites do with CSS, meaning that if you have to change the default background color, for example, you will have to repeat it everywhere. On the storyboard's right-hand side, you have the Object Library, which can be easily accessed with the command + option + control + 3 combination. From there, you can search for views, view controllers, and gestures, and drag them to the storyboard or scene. The following picture shows a sample of it: Now, add two labels, a search bar, and a table view. The first label should be the app title, so let's write City Info on it. Change its alignment to center, the font to American Typewriter, and the font size to 24. On the other label, let's do the same, but write Please select your city and its font size should be 18. The scene must result in something similar to the following picture: Do we still need to do anything else on this storyboard scene? The answer is yes. Now it is time for the auto layout, otherwise the scene components will be misplaced when you start the app. There are different ways to add auto layout constraints to a component. An easy way of doing it is by selecting the component by clicking on it like the top label. With the control key pressed, drag it to the other component on which the constraint will be based like the main view. The following picture shows a sample of a constraint being created from a table to the main view: Another way is by selecting the component and clicking on the left or on the middle button, which are to the bottom-right of the interface builder screen. The following picture highlights these buttons: Whatever is your favorite way of adding constraints, you will need the following constraints and values for the current scene: City Info label Center X equals to the center of superview (main view), value 0 City Info label top equals to the top layout guide, value 0 Select your city label top vertical spacing of 8 to the City Info label Select your city label alignment center X to superview, value 0 Search bar top value 8 to select your city label Search bar trailing and leading space 0 to superview Table view top space (space 0) to the search bar Table view trailing and leading space 0 to the search bar Table view bottom 0 to superview Before continuing, it is a good idea to check whether the layout suits for every resolution. To do it, open the assistant editor with command + option + .return and change its view to Preview: Here, you can have a preview of your screen on the device. You can also rotate the screens by clicking on the icon with a square and a arched arrow over it: Click on the plus sign to the bottom-left of the assistant editor to add more screens: Once you are happy with your layout, you can move on to the next step. Although the storyboard is not yet done, we are going to leave it for a while. Click on the InitialViewController.swift file. 
Let's start receiving information on where the device is using the GPS. To do it, import the Core Location framework and set the view controller as a delegate: import CoreLocation class InitialViewController: UIViewController, CLLocationManagerDelegate { After this, we can set the core location manager as a property and initialize it on viewDidLoadMethod. Type the following code to set locationManager and initialize InitialViewController: var locationManager = CLLocationManager() override func viewDidLoad() { super.viewDidLoad() locationManager.delegate = self locationManager.desiredAccuracy = kCLLocationAccuracyThreeKilometers locationManager.distanceFilter = 3000 if locationManager.respondsToSelector(Selector("requestWhenInUseAuthorization")) { locationManager.requestWhenInUseAuthorization() } locationManager.startUpdatingLocation() } After initializing the location manager, we have to check whether the GPS is working or not by implementing the didUpdateLocations method. Right now, we are going to print the last location and nothing more: func locationManager(manager: CLLocationManager!, didUpdateLocations locations: [CLLocation]!){ let lastLocation = locations.last! print(lastLocation) } Now, we can test the app. However, we still need to perform one more step. Go to your Info.plist file by pressing command + option + J and the file name. Add a new entry with the NSLocationWhenInUseUsageDescription key and change its type to String and its value to This app needs to know your location. This last step is mandatory since iOS 8. Press play and check whether you have received a coordinate, but not very frequently. Displaying cities information The next step is to create a class to store the information received from the Internet. In this case, we can do it in a straightforward manner by copying the JSON object properties in our class properties. Create a new group called Models and, inside it, a file called CityInfo.swift. There you can code CityInfo as follows: class CityInfo { var fcodeName:String? var wikipedia:String? var geonameId: Int! var population:Int? var countrycode:String? var fclName:String? var lat : Double! var lng: Double! var fcode: String? var toponymName:String? var name:String! var fcl:String? init?(json:JSON){){){ // if any required field is missing we must not create the object. if let name = json["name"].string,,, geonameId = json["geonameId"].int, lat = json["lat"].double, lng = json["lng"].double { self.name = name self.geonameId = geonameId self.lat = lat self.lng = lng }else{ return nil } self.fcodeName = json["fcodeName"].string self.wikipedia = json["wikipedia"].string self.population = json["population"].int self.countrycode = json["countrycode"].string self.fclName = json["fclName"].string self.fcode = json["fcode"].string self.toponymName = json["toponymName"].string self.fcl = json["fcl"].string } } Pay attention that our initializer has a question mark on its header; this is called a failable initializer. Traditional initializers always return a new instance of the newly requested object. However, with failable initializers, you can return a new instance or a nil value, indicating that the object couldn't be constructed. In this initializer, we used an object of the JSON type, which is a class that belongs to the SwiftyJSON library/framework. You can easily access its members by using brackets with string indices to access the members of a json object, like json ["field name"], or using brackets with integer indices to access elements of a json array. 
Doesn't matter, the way you have to use the return type, it will always be a JSON object, which can't be directly assigned to the variables of another built-in types, such as integers, strings, and so on. Casting from a JSON object to a basic type can be done by accessing properties with the same name as the destination type, such as .string for casting to string objects, .int for casting to int objects, .array or an array of JSON objects, and so on. Now, we have to think about how this information is going to be displayed. As we have to display this information repeatedly, a good way to do so would be with a table view. Therefore, we will create a custom table view cell for it. Go to your project navigator, create a new group called Cells, and add a new file called CityInfoCell.swift. Here, we are going to implement a class that inherits from UITableViewCell. Note that the whole object can be configured just by setting the cityInfo property: import UIKit class CityInfoCell:UITableViewCell { @IBOutlet var nameLabel:UILabel! @IBOutlet var coordinates:UILabel! @IBOutlet var population:UILabel! @IBOutlet var infoImage:UIImageView! private var _cityInfo:CityInfo! var cityInfo:CityInfo { get { return _cityInfo } set (cityInfo){ self._cityInfo = cityInfo self.nameLabel.text = cityInfo.name if let population = cityInfo.population { self.population.text = "Pop: (population)" }else { self.population.text = "" } self.coordinates.text = String(format: "%.02f, %.02f", cityInfo.lat, cityInfo.lng) if let _ = cityInfo.wikipedia { self.infoImage.image = UIImage(named: "info") } } } } Return to the storyboard and add a table view cell from the object library to the table view by dragging it. Click on this table view cell and add three labels and one image view to it. Try to organize it with something similar to the following picture: Change the labels font family to American Typewriter, and the font size to 16 for the city name and 12 for the population and the location label..Drag the info.png and noinfo.png images to your Images.xcassets project. Go back to your storyboard and set the image to noinfo in the UIImageView attribute inspector, as shown in the following screenshot: As you know, we have to set the auto layout constraints. Just remember that the constraints will take the table view cell as superview. So, here you have the constraints that need to be set: City name label leading equals 0 to the leading margin (left) City name label top equals 0 to the super view top margin City name label bottom equals 0 to the super view bottom margin City label horizontal space 8 to the population label Population leading equals 0 to the superview center X Population top equals to -8 to the superview top Population trailing (right) equals 8 to the noinfo image Population bottom equals 0 to the location top Population leading equals 0 to the location leading Location height equals to 21 Location trailing equals 8 to the image leading Location bottom equals 0 to the image bottom Image trailing equals 0 to the superview trailing margin Image aspect ratio width equals 0 to the image height Image bottom equals -8 to the superview bottom Image top equals -8 to the superview top Has everything been done for this table view cell? Of course not. We still need to set its class and connect each component. Select the table view cell and change its class to CityInfoCell: As we are here, let's do a similar task that is to change the cell identifier to cityinfocell. 
This way, we can easily instantiate the cell from our code.

Now, you can connect the cell components with the ones we have in the CityInfoCell class, and also connect the table view with the view controller:

@IBOutlet var tableView: UITableView!

There are different ways to connect a view with the corresponding property. An easy way is to open the assistant view with the command + option + enter combination, leaving the storyboard on the left-hand side and the Swift file on the right-hand side. Then, you just need to drag the circle that appears on the left-hand side of the @IBOutlet or @IBAction attribute and connect it with the corresponding visual object on the storyboard.

After this, we need to set the table view delegate and data source, and also the search bar delegate, to the view controller. It means that the InitialViewController class needs to have the following header. Replace the current InitialViewController header with:

class InitialViewController: UIViewController, CLLocationManagerDelegate, UITableViewDataSource, UITableViewDelegate, UISearchBarDelegate {

Connect the table view and search bar delegate and data source with the view controller by control-dragging from the table view to the view controller's icon at the top of the screen, as shown in the following screenshot:

Summary

In this article, you learned how to create custom NSError objects, which are the traditional way of reporting that something went wrong. Every time a function returns NSError, you should try to solve the problem or report what has happened to the user. We could also appreciate the new way of trapping errors with try and catch a few times. This is a new feature in Swift 2, but it doesn't mean that it will replace NSError; they will be used in different situations.

Resources for Article:

Further resources on this subject:

Nodes [article]
Network Development with Swift [article]
Playing with Swift [article]

Real-time Communication with SocketIO

Daan van
07 Oct 2015
8 min read
In this blog post we will create a small game and explore how to create a simple server that will allow real-time communication between players. The game we are going to create is an interpretation of the playground game Tag. One player is it and needs to tag other players by moving around her circle and touching other players circle. We will use Node.js to create the server. Node.js is a platform that brings JavaScript to the server by leveraging Chrome's JavaScript Runtime and provides API's to easily create applications. Don not worry if you are unfamiliar with Node. There are detailed instructions to follow. Node consist of a small core and is meant to be enhanced by modules. One of the modules we are going to use is SocketIO. SocketIO harnesses WebSockets in a ease to use way. WebSockets are a recent addition to the protocols over TCP and allows for bi-directional communication, ideal for real-time games. Follow along If you want to follow along with the code examples, download Tag-follow-along.zip and unzip it in a suitable location. It uses Node.js, so you need to install that before we can start. Once it is installed we need to retrieve the dependencies for the project. Open a console, enter the Tag-follow-along directory and execute the following command npm install This will install the dependencies over the network. You can start the game by running npm run game If you now open http://localhost:3000 in your browser you should be greeted with Tag although there is not a lot going on yet. Tag We will create a game Tag. One player is it and needs to tag other client and server. Below you see a screen-shot of the game. A Game of Tag The big circle is it and needs to tag other circles. Two smaller circles are trying to flee, but they seem to be cornered. Four tiny circles can be seen as well. These are already tagged and unable to move. Game Overview Because this blog post is about real-time communication we are not going into the game mechanics in detail. If you want to learn more about HTML games take a look at HTML5 Games Development by Example. The important files in the project are listed below. ├── lib │   ├── extend.js │   └── game.js ├── package.json ├── public │   ├── css │   │   └── tag.css │   ├── index.html │   ├── js │   │   ├── application.js │   │   └── tag.js │   └── options.json └── server.js We will give a short description of the role each file plays. package.json is a description of the project. It is mainly used to manage the dependencies of the project. server.js is the source for the server. It has two responsibilities: serving static files from the public directory and running the game. game.js is the source for the actual game of Tag. It keeps track of players, their position and if they are tagged. tag.js the counter-part of game.js in the browser. application.js is responsible for setting up the game in the browser. Communication Communication between server and clients The diagram above shows how the server and clients communicate with each other. The clients know about the position of the players. Each time a player moves the client should inform the server. The server in turn keeps track of all the positions that the clients send and updates the game state accordingly. It also needs to inform the clients of the updated game state. This is contrary to traditional pull-based communication between clients. I.e. a client makes a request and a server responds. 
From the diagram it is clear that the server can push information to clients without the clients make a request for it. This is made possible by WebSockets a protocol providing full-duplex communication channels over a single TCP connection. We will use the SocketIO library to tap into the potential of WebSockets. Setting up Communication In order to let the client and the server communicate with each other we need to add some code. Open server.js and require SocketIO. var express = require('express'); var http = require('http'); var socketio = require('socket.io'); var Game = require('./lib/game'); var options = require('./public/options.json'); This allows us to use the library in our code. Next we need to get a handle on io by adding it to the server. Find the line var server = http.Server(app); and change it to var server = http.Server(app); var io = socketio(server); This sets up the server for two-way communication with clients. In order to make an actual connection open public/js/application.js and add the following line var socket = io(); Head over to server.js and we will make sure we get some feedback when a client connects. At the end of the server.js file add the following lines io.on('connection', function(socket){ console.log('%s connected', socket.id); socket.on('disconnect', function(){ console.log('%s disconnected', socket.id); }); }); This will write a log message each time a client connects and disconnects. Restart the server, use Ctrl-C to stop it and run npm run game to start it. If you now refresh your browser a few times you should see something like Listening on http://:::3000 jK0r5Eqhzn16wySjAAAA connected jK0r5Eqhzn16wySjAAAA disconnected 52cTgdk_p3IwcrMXAAAB connected 52cTgdk_p3IwcrMXAAAB disconnected X89V2p12k9Jsw2UUAAAC connected X89V2p12k9Jsw2UUAAAC disconnected hw-7y6phGkvRz-TmAAAD connected Register Players The game is responsible for keeping track of players. So when a client connects we need to inform the game of a new player. When a client disconnects we need to remove the player from the game. Update the server.js to reflect this io.on('connection', function(socket){ console.log('%s connected', socket.id); game.addPlayer(socket.id); socket.on('disconnect', function(){ console.log('%s disconnected', socket.id); game.removePlayer(socket.id); }); }); Keeping Track of Player Position Each client should inform the server of the position the player wants to be in. We can use the socket that we created to emit this information to the server. Open the development tools of the browser and select the Console tab. If you now move your mouse around inside the square you will see some log messages occur. Object { x: 28, y: 417 } Object { x: 123, y: 403 } Object { x: 210, y: 401 } Object { x: 276, y: 401 } Object { x: 397, y: 401 } Object { x: 480, y: 419 } Object { x: 519, y: 435 } Object { x: 570, y: 471 } The position is already logged in the mouseMoveHandler in public/js/application.js. Change it to function mouseMoveHandler(event){ console.log({ 'x': event.pageX - this.offsetLeft, 'y': event.pageY - this.offsetTop }); socket.emit('position', { 'x': event.pageX - this.offsetLeft, 'y': event.pageY - this.offsetTop }); } The socket.emit call will send the position to the server. Lets make sure the server handles when it receives the updated position. In server.js head over to the socket registration code and add the following snippet. 
socket.on('position', function(data){
  console.log('%s is at (%s, %s)', socket.id, data.x, data.y);
});
This registers an event handler for position events that the client sends. If you now restart the server, refresh your browser and move your mouse around in the Tag square, you should also see log messages in the server output.
kZyZkVjycq0_ZIvcAAAC is at (628, 133)
kZyZkVjycq0_ZIvcAAAC is at (610, 136)
kZyZkVjycq0_ZIvcAAAC is at (588, 139)
kZyZkVjycq0_ZIvcAAAC is at (561, 145)
kZyZkVjycq0_ZIvcAAAC is at (540, 148)
kZyZkVjycq0_ZIvcAAAC is at (520, 148)
kZyZkVjycq0_ZIvcAAAC is at (506, 148)
kZyZkVjycq0_ZIvcAAAC is at (489, 148)
kZyZkVjycq0_ZIvcAAAC is at (477, 148)
kZyZkVjycq0_ZIvcAAAC is at (469, 148)
Broadcasting Game State
Now that the server receives the position of the client when it changes, we need to close the communication loop by sending the updated game state back. In server.js, change the position event handler into
socket.on('position', function(data){
  console.log('%s is at (%s, %s)', socket.id, data.x, data.y);
  game.update(socket.id, data);
});
This updates the position of the player in the game. In the emitGameState function the game state is updated by calling the game's tick method. This method is called 60 times a second. Change it to also broadcast the game state to all connected clients.
function emitGameState() {
  game.tick();
  io.emit('state', game.state());
};
In the client we need to respond when the server sends us a state event. We can do this by registering a state event handler in public/js/application.js
socket.on('state', function(state){
  game.updateState(state);
});
If you now restart the server and reconnect your browser, you should see a big circle trying to follow your mouse around.
Multiple Clients
If you now open multiple browsers and let them all connect to http://localhost:3000, you could see the following picture
Multiple Clients
If you move your mouse in the Tag square in one of the browsers, you will see the corresponding circle move in all of the browsers.
Conclusion
We have seen that SocketIO is a very nice library that harnesses the power of WebSockets to enable real-time communication between client and server. Now start having fun with it!
About the author
Daan van Berkel is an enthusiastic software craftsman with a knack for presenting technical details in a clear and concise manner. Driven by the desire for understanding complex matters, Daan is always on the lookout for innovative uses of software.

Fingerprint detection using OpenCV 3

Packt
07 Oct 2015
11 min read
In this article by Joseph Howse, Quan Hua, Steven Puttemans, and Utkarsh Sinha, the authors of OpenCV Blueprints, we delve into the aspect of fingerprint detection using OpenCV. (For more resources related to this topic, see here.)
Fingerprint identification, how is it done?
We have already discussed the use of the first biometric, which is the face of the person trying to log in to the system. However since we mentioned that using a single biometric can be quite risky, we suggest adding secondary biometric checks to the system, like the fingerprint of a person. There are many off-the-shelf fingerprint scanners that are quite cheap and return the scanned image. However, you will still have to write your own registration software for these scanners, which can be done using OpenCV. Examples of such fingerprint images can be found below.
Examples of single individual thumb fingerprint in different scanning positions
This dataset can be downloaded from the FVC2002 competition website released by the University of Bologna. The website (http://bias.csr.unibo.it/fvc2002/databases.asp) contains 4 databases of fingerprints available for public download of the following structure: Four fingerprint capturing devices DB1 - DB4. For each device, the prints of 10 individuals are available. For each person, 8 different positions of prints were recorded. We will use this publicly available dataset to build our system upon. We will focus on the first capturing device, using up to 4 fingerprints of each individual for training the system and making an average descriptor of the fingerprint. Then we will use the other 4 fingerprints to evaluate our system and make sure that the person is still recognized by our system. You could apply exactly the same approach on the data grabbed from the other devices if you would like to investigate the difference between a system that captures almost binary images and one that captures grayscale images. However we will provide techniques for doing the binarization yourself.
Implement the approach in OpenCV 3
The complete fingerprint software for processing fingerprints derived from a fingerprint scanner can be found at https://github.com/OpenCVBlueprints/OpenCVBlueprints/tree/master/chapter_6/source_code/fingerprint/fingerprint_process/. In this subsection we will describe how you can implement this approach in the OpenCV interface. We will start by grabbing the image from the fingerprint system and applying binarization. This will enable us to remove any undesired noise from the image as well as help us to make the contrast better between the skin and the wrinkled surface of the finger.
// Start by reading in an image
Mat input = imread("/data/fingerprints/image1.png", IMREAD_GRAYSCALE);
// Binarize the image, through local thresholding
Mat input_binary;
threshold(input, input_binary, 0, 255, THRESH_BINARY | THRESH_OTSU);
The Otsu thresholding will automatically choose the best generic threshold for the image to obtain a good contrast between foreground and background information. This is because the image contains a bimodal distribution (which means that we have an image with a 2-peak histogram) of pixel values. For that image, we can approximately take a value in the middle of those peaks as the threshold value. (For images that are not bimodal, binarization won't be accurate.) Otsu allows us to avoid using a fixed threshold value, thus making the system more general for any capturing device.
However we do acknowledge that if you have only a single capturing device, then playing around with a fixed threshold value could result in a better image for that specific setup. The result of the thresholding can be seen below. In order to make thinning from the next skeletization step as effective as possible we need the inverse binary image. Comparison between grayscale and binarized fingerprint image Once we have a binary image we are actually already set to go to calculate our feature points and feature point descriptors. However, in order to improve the process a bit more, we suggest to skeletize the image. This will create more unique and stronger interest points. The following piece of code can apply the skeletization on top of the binary image. The skeletization is based on the Zhang-Suen line thinning approach. Special thanks to @bsdNoobz of the OpenCV Q&A forum who supplied this iteration approach. #include <opencv2/imgproc.hpp> #include <opencv2/highgui.hpp> using namespace std; using namespace cv; // Perform a single thinning iteration, which is repeated until the skeletization is finalized void thinningIteration(Mat& im, int iter) { Mat marker = Mat::zeros(im.size(), CV_8UC1); for (int i = 1; i < im.rows-1; i++) { for (int j = 1; j < im.cols-1; j++) { uchar p2 = im.at<uchar>(i-1, j); uchar p3 = im.at<uchar>(i-1, j+1); uchar p4 = im.at<uchar>(i, j+1); uchar p5 = im.at<uchar>(i+1, j+1); uchar p6 = im.at<uchar>(i+1, j); uchar p7 = im.at<uchar>(i+1, j-1); uchar p8 = im.at<uchar>(i, j-1); uchar p9 = im.at<uchar>(i-1, j-1); int A = (p2 == 0 && p3 == 1) + (p3 == 0 && p4 == 1) + (p4 == 0 && p5 == 1) + (p5 == 0 && p6 == 1) + (p6 == 0 && p7 == 1) + (p7 == 0 && p8 == 1) + (p8 == 0 && p9 == 1) + (p9 == 0 && p2 == 1); int B = p2 + p3 + p4 + p5 + p6 + p7 + p8 + p9; int m1 = iter == 0 ? (p2 * p4 * p6) : (p2 * p4 * p8); int m2 = iter == 0 ? (p4 * p6 * p8) : (p2 * p6 * p8); if (A == 1 && (B >= 2 && B <= 6) && m1 == 0 && m2 == 0) marker.at<uchar>(i,j) = 1; } } im &= ~marker; } // Function for thinning any given binary image within the range of 0-255. If not you should first make sure that your image has this range preset and configured! void thinning(Mat& im) { // Enforce the range tob e in between 0 - 255 im /= 255; Mat prev = Mat::zeros(im.size(), CV_8UC1); Mat diff; do { thinningIteration(im, 0); thinningIteration(im, 1); absdiff(im, prev, diff); im.copyTo(prev); } while (countNonZero(diff) > 0); im *= 255; } The code above can then simply be applied to our previous steps by calling the thinning function on top of our previous binary generated image. The code for this is: Apply thinning algorithm Mat input_thinned = input_binary.clone(); thinning(input_thinned); This will result in the following output: Comparison between binarized and thinned fingerprint image using skeletization techniques. When we got this skeleton image, the following step would be to look for crossing points on the ridges of the fingerprint, which are being then called minutiae points. We can do this by a keypoint detector that looks at a large change in local contrast, like the Harris corner detector. Since the Harris corner detector is both able to detect strong corners and edges, this is ideally for the fingerprint problem, where the most important minutiae are short edges and bifurcation, the positions where edges come together. More information about minutae points and Harris Corner detection can be found in the following publications: Ross, Arun A., Jidnya Shah, and Anil K. Jain. 
"Toward reconstructing fingerprints from minutiae points." Defense and Security. International Society for Optics and Photonics, 2005. Harris, Chris, and Mike Stephens. "A combined corner and edge detector." Alvey vision conference. Vol. 15. 1988. Calling the Harris Corner operation on a skeletonized and binarized image in OpenCV is quite straightforward. The Harris corners are stored as positions corresponding in the image with their cornerness response value. If we want to detect points with a certain cornerness, than we should simply threshold the image. Mat harris_corners, harris_normalised; harris_corners = Mat::zeros(input_thinned.size(), CV_32FC1); cornerHarris(input_thinned, harris_corners, 2, 3, 0.04, BORDER_DEFAULT); normalize(harris_corners, harris_normalised, 0, 255, NORM_MINMAX, CV_32FC1, Mat()); We now have a map with all the available corner responses rescaled to the range of [0 255] and stored as float values. We can now manually define a threshold, that will generate a good amount of keypoints for our application. Playing around with this parameter could improve performance in other cases. This can be done by using the following code snippet: float threshold = 125.0; vector<KeyPoint> keypoints; Mat rescaled; convertScaleAbs(harris_normalised, rescaled); Mat harris_c(rescaled.rows, rescaled.cols, CV_8UC3); Mat in[] = { rescaled, rescaled, rescaled }; int from_to[] = { 0,0, 1,1, 2,2 }; mixChannels( in, 3, &harris_c, 1, from_to, 3 ); for(int x=0; x<harris_normalised.cols; x++){ for(int y=0; y<harris_normalised.rows; y++){ if ( (int)harris_normalised.at<float>(y, x) > threshold ){ // Draw or store the keypoint location here, just like you decide. In our case we will store the location of the keypoint circle(harris_c, Point(x, y), 5, Scalar(0,255,0), 1); circle(harris_c, Point(x, y), 1, Scalar(0,0,255), 1); keypoints.push_back( KeyPoint (x, y, 1) ); } } } Comparison between thinned fingerprint and Harris corner response, as well as the selected Harris corners. Now that we have a list of keypoints we will need to create some of formal descriptor of the local region around that keypoint to be able to uniquely identify it among other keypoints. Chapter 3, Recognizing facial expressions with machine learning, discusses in more detail the wide range of keypoints out there. In this article, we will mainly focus on the process. Feel free to adapt the interface with other keypoint detectors and descriptors out there, for better or for worse performance. Since we have an application where the orientation of the thumb can differ (since it is not a fixed position), we want a keypoint descriptor that is robust at handling these slight differences. One of the mostly used descriptors for that is the SIFT descriptor, which stands for scale invariant feature transform. However SIFT is not under a BSD license and can thus pose problems to use in commercial software. A good alternative in OpenCV is the ORB descriptor. In OpenCV you can implement it in the following way. Ptr<Feature2D> orb_descriptor = ORB::create(); Mat descriptors; orb_descriptor->compute(input_thinned, keypoints, descriptors); This will enable us to calculate only the descriptors using the ORB approach, since we already retrieved the location of the keypoints using the Harris corner approach. At this point we can retrieve a descriptor for each detected keypoint of any given fingerprint. The descriptors matrix will contain a row for each keypoint containing the representation. 
Let us now start from the case where we have only a single reference image for each fingerprint. In that case we will have a database containing a set of feature descriptors for the training persons in the database. We then have a single new entry, consisting of multiple descriptors for the keypoints found at registration time. We now have to match these descriptors to the descriptors stored in the database, to see which one has the best match. The simplest way is to perform brute-force matching using the Hamming distance criterion between descriptors of different keypoints.
// Imagine we have a vector of single entry descriptors as a database
// We will still need to fill those once we compare everything, by using the code snippets above
vector<Mat> database_descriptors;
Mat current_descriptors;
// Create the matcher interface
Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("BruteForce-Hamming");
// Now loop over the database and start the matching
vector< vector< DMatch > > all_matches;
for(int entry=0; entry<database_descriptors.size(); entry++){
   vector< DMatch > matches;
   matcher->match(database_descriptors[entry], current_descriptors, matches);
   all_matches.push_back(matches);
}
We now have all the matches stored as DMatch objects. This means that for each matching couple we will have the original keypoint, the matched keypoint and a floating point score between both matches, representing the distance between the matched points. The idea about finding the best match seems pretty straightforward. We take a look at the amount of matches that have been returned by the matching process and weigh them by their Euclidean distance in order to add some certainty. We then look for the matching process that yielded the biggest score. This will be our best match and the match we want to return as the selected one from the database. If you want to avoid an imposter getting assigned the best matching score, you can again add a manual threshold on top of the scoring so that matches that are not good enough are ignored. However, it is possible, and should be taken into consideration, that if you set the threshold too high, people with only a small change will be rejected from the system, for example in the case where someone cuts their finger and thus changes their pattern too drastically.
Fingerprint matching process visualized.
To summarize, we saw how to detect fingerprints and implement it using OpenCV 3.
Resources for Article:
Further resources on this subject: Making subtle color shifts with curves [article] Tracking Objects in Videos [article] Hand Gesture Recognition Using a Kinect Depth Sensor [article]

Managing Pools for Desktops

Packt
07 Oct 2015
14 min read
In this article by Andrew Alloway, the author of VMware Horizon View High Availability, we will review strategies for providing High Availability for various types of VMware Horizon View desktop pools. (For more resources related to this topic, see here.) Overview of pools VMware Horizon View provides administrators with the ability to automatically provision and manage pools of desktops. As part of our provisioning of desktops, we must also consider how we will continue service for the individual users in the event of a host or storage failure. Generally High Availability requirements fall into two categories for each pool. We can have stateless desktops where the user information is not stored on the VM between sessions and Stateful desktops where the user information is stored on the desktop between sessions. Stateless desktops In a stateless configuration, we are not required to store data on the Virtual Desktops between user sessions. This allows us to use Local Storage instead of shared storage for our HA strategies as we can tolerate host failures without the use of shared disk. We can achieve a stateless desktop configuration using roaming profiles and/or View Persona profiles. This can greatly reduce cost and maintenance requirements for View Deployments. Stateless desktops are typical in the following environments: Task Workers: A group of workers where the tasks are well known and they all share a common set of core applications. Task workers can use roaming profiles to maintain data between user sessions. In a multi shift environment, having stateless desktops means we only need to provision as many desktops that will be used consecutively. Task Worker setups are typically found in the following scenarios: Data entry Call centers Finance, Accounts Payables, Accounts Receivables Classrooms (in some situations) Laboratories Healthcare terminals Kiosk Users: A group of users that do not login. Logins are typically automatic or without credentials. Kiosk users are typically untrusted users. Kiosk VMs should be locked down and restricted to only the core applications that need to be run. Kiosks are typically refreshed after logoff or at scheduled times after hours. Kiosks can be found in situations such as the following: Airline Check-In stations Library Terminals Classrooms (in some situations) Customer service terminals Customer Self-Serve Digital Signage Stateful desktops Statefull desktops have some advantages from reduced iops and higher disk performance due to the ability to choose thick provisioning. Stateful desktops are desktops that require user data to be stored on the VM or Desktop Host between user sessions. These machines typically are required by users who will extensively customize their desktop in non-trivial ways, require complex or unique applications that are not shared by a large group or require the ability to modify their VM Stateful Desktops are typically used for the following situations: Users who require the ability to modify the installed applications Developers IT Administrators Unique or specialized users Department Managers VIP staff/managers Dedicated pools Dedicated pools are View Desktops provisioned using thin or thick provisioning. Dedicated pools are typically used for Stateful Desktop deployments. Each desktop can be provisioned with a dedicated persistent disk used for storing the User Profile and data. Once assigned a desktop that user will always log into the same desktop ensuring that their profile is kept constant. 
During OS refresh, balances and recomposes the OS disk is reverted back to the base image. Dedicated Pools with persistent disks offer simplicity for managing desktops as minimal profile management takes place. It is all managed by the View Composer/View Connection Server. It also ensures that applications that store profile data will almost always be able to retrieve the profile data on the next login. Meaning that the administrator doesn't have to track down applications that incorrectly store data outside the roaming profile folder. HA considerations for dedicated pools Dedicated pools unfortunately have very difficult HA requirements. Storing the user profile with the VM means that the VM has to be stored and maintained in an HA aware fashion. This almost always results in a shared disk solution being required for Dedicated Pools. In the event of a host outage other hosts connected to the same storage can start up the VM. For shared storage, we can use NFS, iSCSI, Fibre Channel, or VMware Virtual SAN storage. Consider investing in storage systems with primary and backup controllers as we will be dependent on the disk controllers being always available. Backups are also a must with this system as there is very little recovery options in the event of a storage array failure. Floating Pools Floating pools are a pool of desktops where any user can be assigned to any desktop in the pool upon login. Floating pools are generally used for stateless desktop deployments. Floating pools can be used with roaming profiles or View Persona to provide a consistent user experience on login. Since floating pools are treated as disposable VMs, we open up additional options for HA. Floating pools are given 2 local disks, the OS disk which is a replica from the assigned base VM, and the Disposable Disk where the page file, hibernation file, and temp drive are located. When Floating pools are refreshed, recomposed or rebalanced, all changes made to the desktop by the users are lost. This is due to the Disposable Disk being discarded between refreshes and the OS disk being reverted back to the Base Image. As such any session information such as Profile, Temp directory, and software changes are lost between refreshes. Refreshes can be scheduled to occure after logoff, after every X days or can be manually refreshed. HA considerations for floating pools Floating pools can be protected in several ways depending on the environment. Since floating pools can be deployed on local storage we can protect against a host failure by provisioning the Floating Pool VMs on multiple separate hosts. In the event of a host failure the remaining Virtual Desktops will be used to log users in. If there is free capacity in the cluster more Virtual Desktops will be provisioned on other hosts. For environments with shared storage Floating Pools can still be deployed on the shared storage but it is a good idea to have a secondary shared storage device or a highly available storage device. In the event of a storage failure the VMs can be started on the secondary storage device. VMware Virtual SAN is inherently HA safe and there is no need for a secondary datastore when using Virtual SAN. Many floating pool environments will utilize a profile management solution such as Roaming Profiles or View Persona Management. In these situations it is essential to setup a redundant storage location for View Profiles and or Roaming Profiles. 
In practice a Windows DFS share is a convenient and easy way to guard profiles against loss in the event of an outage. DFS can be configured to replicate changes made to the profile in real time between hosts. If the Windows DFS server is provisioned as VMs on shared storage make sure to create a DRS rule to separate the VMs onto different hosts. Where possible DFS servers should be stored on separate disk arrays to ensure they data is preserved in the event of the Disk Array, or Storage Processor failure. For more information regarding Windows DFS you can visit the link below https://technet.microsoft.com/en-us/library/jj127250.aspx Manual pools Manual pools are custom dedicated desktops for each user. A VM is manually built for each user who is using the manual pool. Manual Pools are Stateful pools that generally do not utilize profile management technologies such as View Persona or Roaming Profiles. Like Dedicated pools once a user is assigned to a VM they will always log into the same VM. As such HA requirements for manual pools are very similar to dedicated pools. Manual desktops can be configured in almost any maner desired by the administrator. There are no requirements for more than one disk to be attached to the Manual Pool desktop. Manual pools can also be configured to utilize physical hardware as the Desktop such as Blade Servers, Desktop Computers or even Laptops. In this situation there are limited high availability options without investing in exotic and expensive hardware. As best practice the physical hosts should be built with redundant power supplies, ECC RAM, mirrored hard disks pending budget and HA requirements. There should be a good backup strategy for managing physical hosts connected to the Manual Pools. HA considerations for manual pools Manual pools like dedicated pools have a difficult HA requirement. Storing the user profile with the VM means that the VM has to be stored and maintained in an HA aware fashion. This almost always results in a shared disk solution being required for Manual Pools. In the event of a host outage other hosts connected to the same storage can start up the VM. For shared storage, we can use NFS, iSCSI, Fibre Channel, or VMware VSAN storage. Consider investing in storage systems with primary and backup controllers as we will be dependent on the disk controllers being always available. Backups are also a must with this system as there is very little recovery options in the event of a storage array failure. VSAN deployments are inherently HA safe and are excellent candidates for Manual Pool storage. Manual pools given their static nature also have the option of using replication technology to backup the VMs onto another disk. You can use VMware vSphere Replication to do automatic replication or use a variety of storage replication solutions offered by storage and backup vendors. In some cases it may be possible to use fault tolerance on the Virtual Desktops for truly high availability. Note that this would limit the individual VMs to a single vCPU which may be undesirable. Remote Desktop services pools Remote Desktop Services Pools (RDS Pools) are pools where the remote session or application is hosted on a Windows Remote Desktop Server. The application or remote session is run under the users' credentials. Usually all the user data is stored locally on the Remote Desktop Server but can also be stored remotely using Roaming Profiles or View Persona Profiles. Folder Redirection to a central network location is also used with RDS Pools. 
Typical uses for Remote Desktop Services are migrating users off legacy RDS environments, hosting applications, and providing access to troublesome applications or applications with large memory footprints. The Windows Remote Desktop Server can be either a VM or a standalone physical host. It can be combined with Windows Clustering technology to provide scalability and high availability. You can also deploy a load balancer solution to manage connections between multiple Windows Remote Desktop Servers.
Remote Desktop services pool HA considerations
Remote Desktop services HA revolves around protecting individual RDS VMs or provisioning a cluster of RDS servers. When a single VM is deployed with RDS, generally it is best to use vSphere HA and clustering features to protect the VM. If the RDS resources are larger than practical for a VM, then we must focus on protecting the individual host or clustering multiple hosts. When the Windows Remote Desktop Server is deployed as a VM, the following options are available:
Protect the VM with VMware HA, using shared storage: This allows vCenter to fail over the VM to another host in the event of a host failure. vSphere will be responsible for starting the VM on another host. The VM will resume from a crashed state.
Replicate the Virtual Machine to separate disks on separate hosts using VMware Virtual SAN: Same as above, but in this case the VM has been replicated to another host using Virtual SAN technology. The remote VM will be started up from a crashed state, using the last consistent hard drive image that was replicated.
Using replication technologies such as vSphere Replication: The VM will be periodically synchronized to a remote host. In the event of a host failure we can manually activate the remotely synchronized VM.
Use a vendor's storage-level replication: In this case we allow our storage vendor to provide replication technology to provide a redundant backup. This protects us in the event of a storage or host failure. These failovers can be automated or manual. Consult with your storage vendor for more information.
Protect the VM using backup technologies: This provides redundancy in the sense that we won't lose the VM if it fails. Unfortunately, you are at the mercy of your restore process to bring the VM back to life. The VM will resume from a crashed state. Always keep backups of production servers.
For RDS servers running on a dedicated server we could utilize the following:
Redundant power supplies: Redundant power supplies will keep the server going while a PSU is being replaced or becomes defective. It is also a good idea to have 2 separate power sources for each power supply. Simple things like a power bar going faulty or tripping a breaker could bring down the server if there are not two independent power sources.
Uninterruptible Power Supply: Battery backups are always a must for production-level equipment. Make sure to scale the UPS to provide adequate power and duration for your environment.
Redundant network interfaces: In rare circumstances a NIC can go bad or a cable can be damaged. In this case redundant NICs will prevent a server outage. Remember that to protect against a switch outage we should plug the NICs into separate switches.
Mirrored or redundant disks: Hard drives are one of the most common points of failure in computers. Mirrored hard drives or RAID configurations are a must for production-level equipment.
2 or more hosts: Clustering physical servers will ensure that host failures won't cause downtime.
Consider multi-site configurations for even more redundancy.
Shared Strategies for VMs and Hardware:
Provide High Availability to the RDS using Microsoft Network Load Balancer (NLB): Microsoft Network Load Balancer can provide load balancing to the RDS servers directly. In this situation the clients would connect to a single IP managed by the NLB, which would randomly be assigned to a server.
Provide High Availability using a load balancer to manage sessions between RDS servers: A hardware or software load balancer can be used instead of the Microsoft Network Load Balancer. Load balancer vendors provide a wide variety of capabilities and features for their load balancers. Consult your load balancer vendor for best practices.
Use DNS Round Robin to alternate between RDS hosts: One of the most cost-effective load balancing methods. It has the drawback of not being able to balance the load or to direct clients away from failed hosts. Updating DNS may delay adding new capacity to the cluster or delay removing a failed host from the cluster.
Remote Desktop Connection Broker with High Availability: We can provide RDS failover using the Connection Broker feature of our RDS server. For more details see the link below.
For more information regarding Remote Desktop Connection Broker with High Availability see: https://technet.microsoft.com/en-us/library/ff686148%28WS.10%29.aspx
Here is an example topology using physical or virtual Microsoft RDS Servers. We use a load balancing technology for the View Connection Servers as described in the previous chapter. We will then connect to the RDS via either a load balancer, DNS round robin, or cluster IP.
Summary
In this article, we covered the concept of stateful and stateless desktops and the consequences and techniques for supporting each in a highly available environment.
Resources for Article:
Further resources on this subject: Working with Virtual Machines [article] Storage Scalability [article] Upgrading VMware Virtual Infrastructure Setups [article]

Introduction to Data Analysis and Libraries

Packt
07 Oct 2015
13 min read
In this article by Martin Czygan and Phuong Vothihong, the authors of the book Getting Started with Python Data Analysis, we introduce data analysis and the libraries that support it. Data is raw information that can exist in any form, which is either usable or not. We can easily get data everywhere in our life; for example, the price of gold today is $1.158 per ounce. It does not have any meaning, except describing the gold price. This also shows that data is useful based on context. (For more resources related to this topic, see here.)
By relating and connecting data, information appears and allows us to expand our knowledge beyond the range of our senses. When we possess gold price data gathered over time, one piece of information we might have is that the price has continuously risen from $1.152 to $1.158 for three days. This is useful to someone who tracks gold prices. Knowledge helps people to create value in their lives and work. It is based on information that is organized, synthesized, or summarized to enhance comprehension, awareness, or understanding. It represents a state or potential for action and decisions. When the gold price continuously increases for three days, it will likely decrease on the next day; this is useful knowledge. The following figure illustrates the steps from data to knowledge; we call this process the data analysis process and we will introduce it in the next section:
In this article, we will cover the following topics:
Data analysis and process
Overview of libraries in data analysis using different programming languages
Common Python data analysis libraries
Data analysis and process
Data is getting bigger and more diversified every day. Therefore, analyzing and processing data to advance human knowledge or to create value are big challenges. To tackle these challenges, you will need domain knowledge and a variety of skills, drawing from areas such as computer science, artificial intelligence (AI) and machine learning (ML), statistics and mathematics, and knowledge domain, as shown in the following figure:
Let us go through data analysis and its domain knowledge:
Computer science: We need this knowledge to provide abstractions for efficient data processing. A basic Python programming experience is required. We will introduce Python libraries used in data analysis.
Artificial intelligence and machine learning: If computer science knowledge helps us to program data analysis tools, artificial intelligence and machine learning help us to model the data and learn from it in order to build smart products.
Statistics and mathematics: We cannot extract useful information from raw data if we do not use statistical techniques or mathematical functions.
Knowledge domain: Besides technology and general techniques, it is important to have an insight into the specific domain. What do the data fields mean? What data do we need to collect? Based on the expertise, we explore and analyze raw data by applying the above techniques, step by step.
Data analysis is a process composed of the following steps:
Data requirements: We have to define what kind of data will be collected based on the requirements or problem analysis. For example, if we want to detect a user's behavior while reading news on the internet, we should be aware of visited article links, date and time, article categories, and the user's time spent on different pages.
Data collection: Data can be collected from a variety of sources: mobile, personal computer, camera, or recording device.
It may also be obtained through different ways: communication, event, and interaction between person and person, person and device, or device and device. Data appears whenever and wherever in the world. The problem is, how can we find and gather it to solve our problem? This is the mission of this step. Data processing: Data that is initially obtained must be processed or organized for analysis. This process is performance-sensitive: How fast can we create, insert, update, or query data? For building a real product that has to process big data, we should consider this step carefully. What kind of database should we use to store data? What kind of data structure, such as analysis, statistics, or visualization, is suitable for our purposes? Data cleaning: After being processed and organized, the data may still contain duplicates or errors. Therefore, we need a cleaning step to reduce those situations and increase the quality of the results in the following steps. Common tasks include record matching, deduplication, or column segmentation. Depending on the type of data, we can apply several types of data cleaning. For example, a user's history of a visited news website might contain a lot of duplicate rows, because the user might have refreshed certain pages many times. For our specific issue, these rows might not carry any meaning when we explore the user's behavior. So, we should remove them before saving it to our database. Another situation we may encounter is click fraud on news—someone just wants to improve their website ranking or sabotage the website. In this case, the data will not help us to explore a user's behavior. We can use thresholds to check whether a visit page event comes from a real person or from malicious software. Exploratory data analysis: Now, we can start to analyze data through a variety of techniques referred to as exploratory data analysis. We may detect additional problems in data cleaning or discover requests for further data. Therefore, these steps may be iterative and repeated throughout the whole data analysis process. Data visualization techniques are also used to examine the data in graphs or charts. Visualization often facilitates the understanding of data sets, especially, if they are large or high-dimensional. Modelling and algorithms: A lot of mathematical formulas and algorithms may be applied to detect or predict useful knowledge from the raw data. For example, we can use similarity measures to cluster users who have exhibited similar news reading behavior and recommend articles of interest to them next time. Alternatively, we can detect users' gender based on their news reading behavior by applying classification models such as Support Vector Machine (SVM) or linear regression. Depending on the problem, we may use different algorithms to get an acceptable result. It can take a lot of time to evaluate the accuracy of the algorithms and to choose the best one to implement for a certain product. Data product: The goal of this step is to build data products that receive data input and generate output according to the problem requirements. We will apply computer science knowledge to implement our selected algorithms as well as manage the data storage. Overview of libraries in data analysis There are numerous data analysis libraries that help us to process and analyze data. They use different programming languages and have different advantages as well as disadvantages of solving various data analysis problems. 
Now, we introduce some common libraries that may be useful for you. They should give you an overview of libraries in the field. However, the rest of this focuses on Python-based libraries. Some of the libraries that use the Java language for data analysis are as follows: Weka: This is the library that I got familiar with, the first time I learned about data analysis. It has a graphical user interface that allows you to run experiments on a small dataset. This is great if you want to get a feel for what is possible in the data processing space. However, if you build a complex product, I think it is not the best choice because of its performance: sketchy API design, non-optimal algorithms, and little documentation (http://www.cs.waikato.ac.nz/ml/weka/). Mallet: This is another Java library that is used for statistical natural language processing, document classification, clustering, topic modelling, information extraction, and other machine learning applications on text. There is an add-on package to Mallet, called GRMM, that contains support for inference in general, graphical models, and training of Conditional random fields (CRF) with arbitrary graphical structure. In my experience, the library performance as well as the algorithms are better than Weka. However, its only focus is on text processing problems. The reference page is at http://mallet.cs.umass.edu/. Mahout: This is Apache's machine learning framework built on top of Hadoop; its goal is to build a scalable machine learning library. It looks promising, but comes with all the baggage and overhead of Hadoop. The Homepage is at http://mahout.apache.org/. Spark: This is a relatively new Apache project; supposedly up to a hundred times faster than Hadoop. It is also a scalable library that consists of common machine learning algorithms and utilities. Development can be done in Python as well as in any JVM language. The reference page is at https://spark.apache.org/docs/1.5.0/mllib-guide.html. Here are a few libraries that are implemented in C++: Vowpal Wabbit: This library is a fast out-of-core learning system sponsored by Microsoft Research and (previously) Yahoo! Research. It has been used to learn a tera-feature (1012) dataset on 1000 nodes in one hour. More information can be found in the publication at http://arxiv.org/abs/1110.4198. MultiBoost: This package is a multiclass, multilabel, and multitask classification boosting software implemented in C++. If you use this software, you should refer to the paper published in 2012, in the Journal of Machine Learning Research, MultiBoost: A Multi-purpose Boosting Package, D.Benbouzid, R. Busa-Fekete, N. Casagrande, F.-D. Collin, and B. Kégl. MLpack: This is also a C++ machine learning library, developed by the Fundamental Algorithmic and Statistical Tools Laboratory (FASTLab) at Georgia Tech. It focusses on scalability, speed, and ease-of-use and was presented at the BigLearning workshop of NIPS 2011. Its homepage is at http://www.mlpack.org/about.html. Caffe: The last C++ library we want to mention is Caffe. This is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors. You can find more information about it at http://caffe.berkeleyvision.org/. Other libraries for data processing and analysis are as follows: Statsmodels: This is a great Python library for statistical modelling and is mainly used for predictive and exploratory analysis. 
Modular toolkit for data processing (MDP):This is a collection of supervised and unsupervised learning algorithms and other data processing units that can be combined into data processing sequences and more complex feed-forward network architectures (http://mdp-toolkit.sourceforge.net/index.html). Orange: This is an open source data visualization and analysis for novices and experts. It is packed with features for data analysis and has add-ons for bioinformatics and text mining. It contains an implementation of self-organizing maps, which sets it apart from the other projects as well (http://orange.biolab.si/). Mirador: This is a tool for the visual exploration of complex datasets supporting Mac and Windows. It enables users to discover correlation patterns and derive new hypotheses from data (http://orange.biolab.si/). RapidMiner: This is another GUI-based tool for data mining, machine learning, and predictive analysis (https://rapidminer.com/). Theano: This bridges the gap between Python and lower-level languages. Theano gives very significant performance gains, particularly for large matrix operations and is, therefore, a good choice for deep learning models. However, it is not easy to debug because of the additional compilation layer. Natural language processing toolkit (NLTK): This is written in Python with very unique and salient features. Here, I could not list all libraries for data analysis. However, I think the above libraries are enough to take a lot of your time to learn and build data analysis applications. Python libraries in data analysis Python is a multi-platform, general purpose programming language that can run on Windows, Linux/Unix, and Mac OS X, and has been ported to the Java and the .NET virtual machines as well. It has a powerful standard library. In addition, it has many libraries for data analysis: Pylearn2, Hebel, Pybrain, Pattern, MontePython, and MILK. We will cover some common Python data analysis libraries such as Numpy, Pandas, Matplotlib, PyMongo, and scikit-learn. Now, to help you getting started, I will briefly present an overview of each library for those who are less familiar with the scientific Python stack. NumPy One of the fundamental packages used for scientific computing with Python is Numpy. Among other things, it contains the following: A powerful N-dimensional array object Sophisticated (broadcasting) functions for performing array computations Tools for integrating C/C++ and Fortran code Useful linear algebra operations, Fourier transforms, and random number capabilities. Besides this, it can also be used as an efficient multidimensional container of generic data. Arbitrary data types can be defined and integrated with a wide variety of databases. Pandas Pandas is a Python package that supports rich data structures and functions for analyzing data and is developed by the PyData Development Team. It is focused on the improvement of Python's data libraries. 
Pandas consists of the following things: A set of labelled array data structures; the primary of which are Series, DataFrame, and Panel Index objects enabling both simple axis indexing and multilevel/hierarchical axis indexing An integrated group by engine for aggregating and transforming datasets Date range generation and custom date offsets Input/output tools that loads and saves data from flat files or PyTables/HDF5 format Optimal memory versions of the standard data structures Moving window statistics and static and moving window linear/panel regression Because of these features, Pandas is an ideal tool for systems that need complex data structures or high-performance time series functions such as financial data analysis applications. Matplotlib Matplotlib is the single most used Python package for 2D-graphic. It provides both a very quick way to visualize data from Python and publication-quality figures in many formats: line plots, contour plots, scatter plots, or Basemap plot. It comes with a set of default settings, but allows customizing all kinds of properties. However, we can easily create our chart with the defaults of almost every property in Matplotlib. PyMongo MongoDB is a type of NoSQL database. It is highly scalable, robust, and perfect to work with JavaScript-based web applications because we can store data as JSON documents and use flexible schemas. PyMongo is a Python distribution containing tools for working with MongoDB. Many tools have also been written for working with PyMongo to add more features such as MongoKit, Humongolus, MongoAlchemy, and Ming. scikit-learn scikit-learn is an open source machine learning library using the Python programming language. It supports various machine learning models, such as classification, regression, and clustering algorithms, interoperated with the Python numerical and scientific libraries NumPy and SciPy. The latest scikit-learn version is 0.16.1, published in April 2015. Summary In this article, there were three main points that we presented. Firstly, we figured out the relationship between raw data, information and knowledge. Because of its contribution in our life, we continued to discuss an overview of data analysis and processing steps in the second part. Finally, we introduced a few common supported libraries that are useful for practical data analysis applications. Among those we will focus on Python libraries in data analysis. Resources for Article: Further resources on this subject: Exploiting Services with Python [Article] Basics of Jupyter Notebook and Python [Article] How to do Machine Learning with Python [Article]

Integrating Elasticsearch with the Hadoop ecosystem

Packt
07 Oct 2015
14 min read
In this article by Vishal Shukla, author of the book Elasticsearch for Hadoop, we will take a look at how ES-Hadoop can integrate with Pig and Spark with ease. Elasticsearch is great at getting insights into the indexed data. The Hadoop ecosystem does a great job in making Hadoop easily usable for different users by providing a comfortable interface. Some of the examples are Hive and Pig. Apart from these, Hadoop integrates well with other computing engines and platforms, such as Spark and Cascading. (For more resources related to this topic, see here.)
Pigging out Elasticsearch
For many use cases, Pig is one of the easiest ways to fiddle around with the data in the Hadoop ecosystem. Pig wins when it comes to ease of use and simple syntax for designing data flow pipelines without getting into complex programming. Assuming that you know Pig, we will cover how to move the data to and from Elasticsearch. If you don't know Pig yet, never mind. You can still carry on with the steps, and by the end of the article, you will at least know how to use Pig to perform data ingestion and reading with Elasticsearch.
Setting up Apache Pig for Elasticsearch
Let's start by setting up Apache Pig. At the time of writing this article, the latest Pig version available is 0.15.0. You can use the following steps to set up the same version:
First, download the Pig distribution using the following command:
$ sudo wget -O /usr/local/pig.tar.gz http://mirrors.sonic.net/apache/pig/pig-0.15.0/pig-0.15.0.tar.gz
Then, extract Pig to the desired location and rename it to a convenient name:
$ cd /usr/local
$ sudo tar -xvf pig.tar.gz
$ sudo mv pig-0.15.0 pig
Now, export the required environment variables by appending the following two lines in the /home/eshadoop/.bashrc file:
export PIG_HOME=/usr/local/pig
export PATH=$PATH:$PIG_HOME/bin
You can either log out and log in again to see the newly set environment variables or source the environment configuration with the following command:
$ source ~/.bashrc
Now, start the job history server daemon with the following command:
$ mr-jobhistory-daemon.sh start historyserver
You should see the Pig console with the following command:
$ pig
grunt>
It's easy to forget to start the job history daemon once you restart your machine or VM. You may make this daemon run on startup, or you need to ensure this manually. Now, we have Pig up and running. In order to use Pig with Elasticsearch, we must ensure that the ES-Hadoop JAR file is available in the Pig classpath. Let's take the ES-Hadoop JAR file and import it to HDFS using the following steps:
First, download the ES-Hadoop JAR used to develop the examples in this article, as shown in the following command:
$ wget http://central.maven.org/maven2/org/elasticsearch/elasticsearch-hadoop/2.1.1/elasticsearch-hadoop-2.1.1.jar
Then, create a directory to keep the downloaded JAR in, as follows:
$ sudo mkdir /opt/lib
Now, import the JAR to HDFS:
$ hadoop fs -mkdir /lib
$ hadoop fs -put elasticsearch-hadoop-2.1.1.jar /lib/elasticsearch-hadoop-2.1.1.jar
Throughout this article, we will use a crime dataset that is tailored from the open dataset provided at https://data.cityofchicago.org/. This tailored dataset can be downloaded from http://www.packtpub.com/support, where all the code files required for this article are available. Once you have downloaded the dataset, import it to HDFS at /ch07/crime_data.csv.
Importing data to Elasticsearch
Let's import the crime dataset to Elasticsearch using Pig with ES-Hadoop.
ES-Hadoop provides the EsStorage class as Pig Storage. In order to use the EsStorage class, you need to have the ES-Hadoop JAR registered with Pig. You can register the JAR located in the local filesystem, HDFS, or other shared filesystems. The REGISTER command registers a JAR file that contains UDFs (User-defined functions) with Pig, as shown in the following code:
grunt> REGISTER hdfs://localhost:9000/lib/elasticsearch-hadoop-2.1.1.jar;
Then, load the CSV data file as a relation with the following code:
grunt> SOURCE = load '/ch07/crimes_dataset.csv' using PigStorage(',') as (id:chararray, caseNumber:chararray, date:datetime, block:chararray, iucr:chararray, primaryType:chararray, description:chararray, location:chararray, arrest:boolean, domestic:boolean, lat:double, lon:double);
This command reads the CSV fields and maps each token in the data to the respective field in the preceding command. The resulting relation, SOURCE, represents a relation with the Bag data structure that contains multiple Tuples. Now, generate the target Pig relation that has a structure that matches closely to the target Elasticsearch index mapping, as shown in the following code:
grunt> TARGET = foreach SOURCE generate id, caseNumber, date, block, iucr, primaryType, description, location, arrest, domestic, TOTUPLE(lon, lat) AS geoLocation;
Here, we need the nested object with the geoLocation name in the target Elasticsearch document. We can achieve this with a Tuple to represent the lat and lon fields. TOTUPLE() helps us to create this tuple. We then assigned the geoLocation alias for this tuple. Let's store the TARGET relation to the Elasticsearch index with the following code:
grunt> STORE TARGET INTO 'esh_pig/crimes' USING org.elasticsearch.hadoop.pig.EsStorage('es.http.timeout = 5m', 'es.index.auto.create = true', 'es.mapping.names=arrest:isArrest, domestic:isDomestic', 'es.mapping.id=id');
We can specify the target index and type to store indexed documents. The EsStorage class can accept multiple Elasticsearch configurations. es.mapping.names maps the Pig field name to the Elasticsearch document's field name. You can use Pig's field id to assign a custom _id value for the Elasticsearch document using the es.mapping.id option. Similarly, you can set the _ttl and _timestamp metadata fields as well. Pig uses just one reducer in the default configuration. It is recommended to change this behavior to have a parallelism that matches the number of shards available, as shown in the following command:
grunt> SET default_parallel 5;
Pig also combines the input splits, irrespective of their size. This makes it efficient for small files by reducing the number of mappers. However, this can cause performance issues for large files. You can disable this behavior in the Pig script, as shown in the following command:
grunt> SET pig.splitCombination FALSE;
Executing the preceding commands will create the Elasticsearch index and import the crime data documents. If you observe the created documents in Elasticsearch, you can see that the geoLocation value is an array in the [-87.74274476, 41.87404405] format. This is because, by default, ES-Hadoop ignores the tuple field names and simply converts them into an ordered array.
If you wish to make your geoLocation field look like a key/value-based object with lat/lon keys, you can do so by including the following configuration in EsStorage:

es.mapping.pig.tuple.use.field.names=true

Writing from the JSON source
If you have your input as a well-formed JSON file, you can avoid conversions and transformations and directly pass the JSON documents to Elasticsearch for indexing. You may have the JSON data in Pig as chararray, bytearray, or in any other form that translates to well-formed JSON by calling the toString() method, as shown in the following code:

grunt> JSON_DATA = LOAD '/ch07/crimes.json' USING PigStorage() AS (json:chararray);
grunt> STORE JSON_DATA INTO 'esh_pig/crimes_json' USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true');

Type conversions
Take a look at the type mapping of the esh_pig index in Elasticsearch. It maps the geoLocation type to double. This is because Elasticsearch inferred the double type based on the field type we specified in Pig. To map geoLocation to geo_point, you must create the Elasticsearch mapping manually before executing the script; a minimal sketch of doing this follows the table below.

Although Elasticsearch provides data type detection based on the types of the fields in the incoming document, it is always good to create the type mapping beforehand in Elasticsearch. This is a one-time activity that you should do. Then, you can run the MapReduce, Pig, Hive, Cascading, or Spark jobs multiple times. This will avoid any surprises in the type detection.

For your reference, here is a list of some of the Pig and Elasticsearch field types that map to each other. The table doesn't list the no-brainer and absolutely intuitive type mappings:

Pig type        Elasticsearch type
chararray       string
bytearray       binary
tuple           array (default) or object
bag             array
map             object
bigdecimal      Not supported
biginteger      Not supported
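The article does not show the mapping call itself, so here is only a minimal sketch of creating the esh_pig index with an explicit geo_point mapping before running the Pig script. It sticks to the plain JDK (HttpURLConnection), assumes Elasticsearch is reachable at localhost:9200, and reuses the esh_pig index and crimes type names from the earlier STORE command; the class name CreateCrimesMapping is illustrative, and the same request can be issued with a simple curl PUT if you prefer.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CreateCrimesMapping {
    public static void main(String[] args) throws Exception {
        // Assumption: Elasticsearch runs locally on the default HTTP port.
        URL url = new URL("http://localhost:9200/esh_pig");

        // Create the index with an explicit geo_point mapping for geoLocation,
        // so that the dynamic mapping does not fall back to the inferred double type.
        String body = "{ \"mappings\": { \"crimes\": { \"properties\": {"
                + " \"geoLocation\": { \"type\": \"geo_point\" } } } } }";

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }

        // A 200 response with {"acknowledged":true} means the index and mapping were created.
        System.out.println("Response code: " + conn.getResponseCode());
        conn.disconnect();
    }
}

Run something like this once before executing the Pig STORE command; with the mapping in place, repeated Pig, Hive, or Spark runs will index geoLocation as a geo_point instead of a pair of doubles.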
Reading data from Elasticsearch
Reading data from Elasticsearch using Pig is as simple as writing a single command with the Elasticsearch query. Here is a snippet that dumps the tuples for crimes related to theft:

grunt> REGISTER hdfs://localhost:9000/lib/elasticsearch-hadoop-2.1.1.jar
grunt> ES = LOAD 'esh_pig/crimes' using org.elasticsearch.hadoop.pig.EsStorage('{"query" : { "term" : { "primaryType" : "theft" } } }');
grunt> dump ES;

Executing the preceding commands will print the tuples to the Pig console.

Giving Spark to Elasticsearch
Spark is a distributed computing system that provides a huge performance boost compared to Hadoop MapReduce. It works on the abstraction of RDDs (Resilient Distributed Datasets), which can be created for any data residing in Hadoop. Without any surprises, ES-Hadoop provides easy integration with Spark by enabling the creation of an RDD from the data in Elasticsearch. Spark's increasing support for integrating with various data sources, such as HDFS, Parquet, Avro, S3, Cassandra, relational databases, and streaming data, makes it special when it comes to data integration. This means that when you use ES-Hadoop with Spark, you can make all these sources integrate with Elasticsearch easily.

Setting up Spark
In order to set up Apache Spark to execute a job, you can perform the following steps:

First, download the Apache Spark distribution with the following command:
$ sudo wget -O /usr/local/spark.tgz http://www.apache.org/dyn/closer.cgi/spark/spark-1.4.1/spark-1.4.1-bin-hadoop2.4.tgz

Then, extract Spark to the desired location and rename it to a convenient name, as shown in the following commands:
$ cd /usr/local
$ sudo tar -xvf spark.tgz
$ sudo mv spark-1.4.1-bin-hadoop2.4 spark

Importing data to Elasticsearch
To import the crime dataset to Elasticsearch with Spark, let's see how we can write a Spark job. We will continue using Java to write Spark jobs for consistency. Here are the driver program's snippets:

SparkConf conf = new SparkConf().setAppName("esh-spark").setMaster("local[4]");
conf.set("es.index.auto.create", "true");
JavaSparkContext context = new JavaSparkContext(conf);

Set up the SparkConf object to configure the Spark job. As always, you can also set most of the options (such as es.index.auto.create) and other configurations that we have seen throughout the article. Using this configuration, we created the JavaSparkContext object as follows:

JavaRDD<String> textFile = context.textFile("hdfs://localhost:9000/ch07/crimes_dataset.csv");

Read the crime data CSV file as a JavaRDD. Here, the RDD is still of the type String, where each element represents a line:

JavaRDD<Crime> dataSplits = textFile.map(new Function<String, Crime>() {
    @Override
    public Crime call(String line) throws Exception {
        CSVParser parser = CSVParser.parse(line, CSVFormat.RFC4180);
        Crime c = new Crime();
        CSVRecord record = parser.getRecords().get(0);
        c.setId(record.get(0));
        ..
        ..
        String lat = record.get(10);
        String lon = record.get(11);
        Map<String, Double> geoLocation = new HashMap<>();
        geoLocation.put("lat", StringUtils.isEmpty(lat) ? null : Double.parseDouble(lat));
        geoLocation.put("lon", StringUtils.isEmpty(lon) ? null : Double.parseDouble(lon));
        c.setGeoLocation(geoLocation);
        return c;
    }
});

In the preceding snippet, we called the map() method on JavaRDD to map each input line to a Crime object. Note that we created a simple JavaBean class called Crime that implements the Serializable interface and maps to the Elasticsearch document structure. Using CSVParser, we parsed each field into the Crime object. We mapped the nested geoLocation object by embedding a Map in the Crime object and populating it with the lat and lon fields. This map() method returns another JavaRDD that contains the Crime objects. We can save it to Elasticsearch as shown in the following code:

JavaEsSpark.saveToEs(dataSplits, "esh_spark/crimes");

Save JavaRDD<Crime> to Elasticsearch with the JavaEsSpark class provided by ES-Hadoop. For all the ES-Hadoop integrations, such as Pig, Hive, Cascading, Apache Storm, and Spark, you can use all the standard ES-Hadoop configurations and techniques. This includes dynamic/multi-resource writes with a pattern similar to esh_spark/{primaryType}, and you can use JSON strings to directly import the data to Elasticsearch as well.

To control the Elasticsearch metadata of the documents being indexed, you can use the saveToEsWithMeta() method of JavaEsSpark. You can pass an instance of JavaPairRDD that contains Tuple2<Metadata, Object>, where Metadata represents a map that holds key/value pairs of the document metadata fields, such as id, ttl, timestamp, and version. A sketch of this approach follows.
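The saveToEsWithMeta() call is only described in prose above, so the following is a minimal, hedged sketch of how such a write could look. It reuses the dataSplits RDD and the Crime bean from the earlier snippet and assumes the bean exposes a getId() accessor (a hypothetical name; use whatever your bean actually defines). The Metadata enum and JavaEsSpark class come from the ES-Hadoop Spark integration.

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.PairFunction;
import org.elasticsearch.spark.rdd.Metadata;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

import scala.Tuple2;

public class SaveCrimesWithMeta {
    public static void save(JavaRDD<Crime> dataSplits) {
        // Pair each Crime with a metadata map; ES-Hadoop reads the keys of this map
        // (ID, TTL, TIMESTAMP, VERSION) and applies them as document metadata.
        JavaPairRDD<Map<Metadata, Object>, Crime> withMeta =
            dataSplits.mapToPair(new PairFunction<Crime, Map<Metadata, Object>, Crime>() {
                @Override
                public Tuple2<Map<Metadata, Object>, Crime> call(Crime crime) throws Exception {
                    Map<Metadata, Object> meta = new HashMap<>();
                    meta.put(Metadata.ID, crime.getId()); // hypothetical accessor on the Crime bean
                    meta.put(Metadata.TTL, "2d");         // illustrative; requires _ttl to be enabled in the mapping
                    return new Tuple2<Map<Metadata, Object>, Crime>(meta, crime);
                }
            });

        // Same target index/type as the earlier saveToEs() call.
        JavaEsSpark.saveToEsWithMeta(withMeta, "esh_spark/crimes");
    }
}

Apart from supplying the metadata, the write behaves like the plain saveToEs() call and accepts the same standard ES-Hadoop configuration options.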
Using SparkSQL
ES-Hadoop also bridges Elasticsearch with the SparkSQL module. SparkSQL 1.3+ versions provide the DataFrame abstraction, which represents a collection of Row objects. We will not discuss the details of DataFrame here. ES-Hadoop lets you persist your DataFrame instance to Elasticsearch transparently. Let's see how we can do this with the following code:

SQLContext sqlContext = new SQLContext(context);
DataFrame df = sqlContext.createDataFrame(dataSplits, Crime.class);

Create an SQLContext instance using the JavaSparkContext instance. Using the SQLContext instance, you can create a DataFrame by calling the createDataFrame() method and passing the existing JavaRDD<T> and Class<T>, where T is a JavaBean class that implements the Serializable interface. Note that passing the class instance is required to infer a schema for the DataFrame. If you wish to use a non-JavaBean-based RDD, you can create the schema manually. The article source code contains the implementations of both approaches for your reference. Take a look at the following code:

JavaEsSparkSQL.saveToEs(df, "esh_sparksql/crimes_reflection");

Once you have the DataFrame instance, you can save it to Elasticsearch with the JavaEsSparkSQL class, as shown in the preceding code.

Reading data from Elasticsearch
Here is the snippet of SparkEsReader that finds crimes related to theft:

JavaRDD<Map<String, Object>> esRDD = JavaEsSpark.esRDD(context, "esh_spark/crimes",
    "{\"query\" : { \"term\" : { \"primaryType\" : \"theft\" } } }").values();
for (Map<String, Object> item : esRDD.collect()) {
    System.out.println(item);
}

We used the same JavaEsSpark class to create an RDD containing the documents that match the Elasticsearch query.

Using SparkSQL
ES-Hadoop provides an org.elasticsearch.spark.sql data source provider to read the data from Elasticsearch using SparkSQL, as shown in the following code:

Map<String, String> options = new HashMap<>();
options.put("pushdown", "true");
options.put("es.nodes", "localhost");
DataFrame df = sqlContext.read()
    .options(options)
    .format("org.elasticsearch.spark.sql")
    .load("esh_sparksql/crimes_reflection");

The preceding code snippet uses the org.elasticsearch.spark.sql data source to load data from Elasticsearch. You can set the pushdown option to true to push the query execution down to Elasticsearch. This greatly increases efficiency, as the query execution is collocated with the data it operates on. Take a look at the following code:

df.registerTempTable("crimes");
DataFrame theftCrimes = sqlContext.sql("SELECT * FROM crimes WHERE primaryType='THEFT'");
for (Row row : theftCrimes.javaRDD().collect()) {
    System.out.println(row);
}

We registered a temporary table for the DataFrame and executed the SQL query on the SQLContext. Note that we need to collect the final results locally in order to print them in the driver class.

Summary
In this article, we looked at various Hadoop ecosystem technologies. We set up Pig with ES-Hadoop and developed a script to interact with Elasticsearch. You also learned how to use ES-Hadoop to integrate Elasticsearch with Spark and empower it with the powerful SparkSQL engine.

Resources for Article:
Further resources on this subject:
Extending ElasticSearch with Scripting [Article]
Elasticsearch Administration [Article]
Downloading and Setting Up ElasticSearch [Article]