
How-To Tutorials


Sending and Syncing Data

Packt
10 Aug 2015
4 min read
This article, by Steven F. Daniel, author of the book Android Wearable Programming, will give you the background and understanding you need to effectively build applications that communicate between an Android handheld device and an Android wearable. Android Wear ships with a number of APIs that make communication between the handheld and the wearable a breeze. We will learn the differences between the MessageApi, which is sometimes referred to as "fire and forget" messaging; the DataApi (the heart of the Data Layer API), which supports syncing data between a handheld and a wearable; and the NodeApi, which handles events related to local and connected device nodes.

Creating a wearable send and receive application

In this section, we will look at how to create an Android wearable application that sends an image and a message and displays them on the wearable device. In the next sections, we will walk through the steps required to send data to the Android wearable using the DataApi, NodeApi, and MessageApi. First, create a new project in Android Studio by following these simple steps:

1. Launch Android Studio, and then click on the File | New Project menu option.
2. Next, enter SendReceiveData in the Application name field. Then, provide a name for the Company Domain field. Now, choose Project location and select where you would like to save your application code. Click on the Next button to proceed to the next step.
3. Next, we need to specify the form factors on which our application will run. On this screen, we choose the minimum SDK version for our phone/tablet and Android Wear. Click on the Phone and Tablet option and choose API 19: Android 4.4 (KitKat) for Minimum SDK. Click on the Wear option and choose API 21: Android 5.0 (Lollipop) for Minimum SDK. Click on the Next button to proceed to the next step.
4. In the next step, we add a Blank Activity to our application project for the mobile section of our app. From the Add an activity to Mobile screen, choose the Blank Activity option from the list of activities shown and click on the Next button to proceed to the next step.
5. Next, we customize the properties of the Blank Activity so that our application can use it. Here we specify the name of our activity, the layout information, the title, and the menu resource file. From the Customize the Activity screen, enter MobileActivity for Activity Name and click on the Next button to proceed to the next step in the wizard.
6. In the next step, we add a Blank Wear Activity to our application project for the Android wearable section of our app. From the Add an activity to Wear screen, choose the Blank Wear Activity option from the list of activities shown and click on the Next button to proceed to the next step.
7. Next, we customize the properties of the Blank Wear Activity so that our Android wearable can use it. Here we specify the name of our activity and the layout information. From the Customize the Activity screen, enter WearActivity for Activity Name and click on the Next button to proceed to the next step in the wizard.
8. Finally, click on the Finish button. The wizard will generate your project, and after a few moments the Android Studio window will appear with your project displayed.
Summary

In this article, we learned about three APIs, DataApi, NodeApi, and MessageApi, and how we can use them and their associated methods to transmit information between the handheld mobile and the wearable. If, for whatever reason, the connected wearable node gets disconnected from the paired handheld device, the DataApi class is smart enough to automatically try sending again once the connection is reestablished.

Resources for Article:

Further resources on this subject:
- Speeding up Gradle builds for Android [article]
- Saying Hello to Unity and Android [article]
- Testing with the Android SDK [article]


Securing OpenStack Networking

Packt
10 Aug 2015
10 min read
In this article by Fabio Alessandro Locati, author of the book OpenStack Cloud Security, you will learn about the importance of firewalls, IDSes, and IPSes. You will also learn about Generic Routing Encapsulation (GRE) and VXLAN.

The importance of firewall, IDS, and IPS

The security of a network can and should be achieved in multiple ways. Three components that are critical to the security of a network are:

- Firewall
- Intrusion detection system (IDS)
- Intrusion prevention system (IPS)

Firewall

Firewalls are systems that control the traffic passing through them based on rules. This can sound like a router, but they are very different: a router allows communication between different networks, while a firewall limits communication between networks and hosts. The root of this confusion may be that very often a router has firewall functionality and vice versa. Firewalls need to be connected in series with your infrastructure.

The first paper on firewall technology appeared in 1988 and described the packet filter firewall, often known as the first generation firewall. This kind of firewall analyzes the packets passing through, and if a packet matches a rule, the firewall acts according to that rule. It analyzes each packet by itself, without considering other aspects such as other packets. It works on the first three layers of the OSI model, with very few features at layer 4, used specifically to check port numbers and protocols (UDP/TCP). First generation firewalls are still in use because, in a lot of situations, they do the job properly and are cheap and secure. Typical filtering rules for these firewalls prohibit (or allow) IPs of certain classes (or specific IPs) from accessing certain IPs, or allow traffic to a specific IP only on specific ports. There are no known attacks against this kind of firewall as such, but specific models can have specific bugs that can be exploited.

In 1990, a new generation of firewalls appeared. The initial name was circuit-level gateway, but today they are far more commonly known as stateful firewalls or second generation firewalls. These firewalls are able to understand when connections are being initialized and closed, so the firewall knows the current state of a connection when a packet arrives. To do so, this kind of firewall uses the first four layers of the networking stack. This allows the firewall to drop all packets that are neither establishing a new connection nor part of an already established connection. These firewalls are very powerful with the TCP protocol because it has states, while they offer very small advantages over first generation firewalls when handling UDP or ICMP packets, since those packets travel with no connection. In these cases, the firewall marks the connection as established as soon as the first valid packet passes through, and closes it after the connection times out. Performance-wise, a stateful firewall can be faster than a packet filter firewall because, if a packet is part of an active connection, no further tests are performed against it. These firewalls are more susceptible to bugs in their code, since reading more of the packet makes exploitation easier. Also, on many devices, it is possible to open connections (with SYN packets) until the firewall is saturated. In such cases, the firewall usually downgrades itself to a simple router, allowing all traffic to pass through it.
In 1991, improvements were made to the stateful firewall that allowed it to understand more about the protocol of the packets it was evaluating. Before 1994, firewalls of this kind had major problems, such as working as a proxy that the user had to interact with. In 1994, the first application firewall as we know it was born, doing its entire job completely transparently. To understand the protocol, this kind of firewall requires an understanding of all seven layers of the OSI model. As for security, the same considerations that apply to the stateful firewall apply to the application firewall as well.

Intrusion detection system (IDS)

IDSes are systems that monitor network traffic, looking for policy violations and malicious traffic. The goal of an IDS is not to block malicious activity, but instead to log and report it. These systems act in a passive mode, so you'll not see any traffic coming from them. This is very important because it makes them invisible to attackers, so you can gain information about the attack without the attacker knowing. IDSes need to be connected in parallel to your infrastructure.

Intrusion prevention system (IPS)

IPSes are sometimes referred to as Intrusion Detection and Prevention Systems (IDPS), since they are IDSes that are also able to fight back against malicious activity. IPSes have greater possibilities to act than IDSes. Other than reporting, like an IDS, they can also drop malicious packets, reset the connection, and block traffic from the offending IP address. IPSes need to be connected in series with your infrastructure.

Generic Routing Encapsulation (GRE)

GRE is a Cisco tunneling protocol that is difficult to position in the OSI model; the best place for it is between layers 2 and 3. Being above layer 2 (where VLANs are), we can use GRE inside a VLAN. We will not go deep into the technicalities of this protocol; I'd like to focus more on the advantages and disadvantages it has over VLAN.

The first advantage of (extended) GRE over VLAN is scalability. You cannot have more than 4,096 VLANs in an environment, while GRE tunnels do not have this limitation. If you are running a private cloud in a small corporation, 4,096 networks could be enough, but they will definitely not be enough if you work for a big corporation or if you are running a public cloud. Also, unless you use VTP for your VLANs, you'll have to add VLANs to each network device, while GRE tunnels don't need this.

The second advantage is security. Since you can deploy multiple GRE tunnels in a single VLAN, you can connect a machine to a single VLAN and multiple GRE networks without the risks that come with putting a port in trunking mode, which would be needed to bring more VLANs to the same physical port. For these reasons, GRE was a very common choice in a lot of OpenStack clusters deployed up to OpenStack Havana. The current preferred networking choice (since Icehouse) is Virtual Extensible LAN (VXLAN).

VXLAN

VXLAN is a network virtualization technology whose specification was originally created by Arista Networks, Cisco, and VMware, and many other companies have backed the project. Its goal is to offer a standardized overlay encapsulation protocol, and it was created because standard VLANs were too limited for current cloud needs and the GRE protocol was a Cisco protocol. It works by carrying layer 2 Ethernet frames within layer 4 UDP packets on port 4789. As for the maximum number of networks, the limit is 16 million logical networks.
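To make this concrete, here is a rough sketch of how a VXLAN-backed network might be created programmatically with the python-neutronclient library of that era. This is our illustration, not code from the book: the endpoint, credentials, network name, and segmentation ID are assumptions, and creating provider attributes requires admin rights.

```python
# Sketch only: assumes python-neutronclient (Icehouse era) and Keystone v2
# credentials; all names and values below are hypothetical.
from neutronclient.v2_0 import client

neutron = client.Client(username='admin',
                        password='secret',
                        tenant_name='admin',
                        auth_url='http://controller:5000/v2.0')

# Create a network explicitly backed by a VXLAN segment. The
# segmentation ID is the VNI, one of the ~16 million available.
network = neutron.create_network({
    'network': {
        'name': 'tenant-net',
        'provider:network_type': 'vxlan',
        'provider:segmentation_id': 101,
    }
})
print(network['network']['id'])
```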
Since the Icehouse release, the suggested standard for networking is VXLAN.

Flat network versus VLAN versus GRE in OpenStack Quantum

In OpenStack Quantum, you can choose among multiple technologies for your networks: flat networks, VLAN, GRE, and the most recent, VXLAN. Let's discuss them in detail:

- Flat network: It is often used in private clouds, since it is very easy to set up. The downside is that any virtual machine will see every other virtual machine in our cloud. I strongly discourage people from using this network design because it's unsafe and, in the long run, it will cause problems, as we have seen earlier.
- VLAN: It is sometimes used in bigger private clouds and sometimes even in small public clouds. The advantage is that many times you already have a VLAN-based installation in your company. The major disadvantages are the need to trunk ports for each physical host and possible problems in propagation. I discourage this approach, since in my opinion the advantages are very limited while the disadvantages are pretty strong.
- VXLAN: It should be used in any kind of cloud due to its technical advantages. It allows a huge number of networks, it's way more secure, and it often eases debugging.
- GRE: Until the Havana release, it was the suggested protocol, but since the Icehouse release the suggestion has been to move toward VXLAN, where the majority of the development is focused.

Design a secure network for your OpenStack deployment

As with the physical infrastructure, we have to design the network securely. We have seen that network security is critical and that there are a lot of possible attacks in this realm. Is it possible to design a secure environment to run OpenStack? Yes it is, if you remember a few rules:

- Create different networks, at the very least for management and external data (this network usually already exists in your organization and is the one where all your clients are)
- Never put ports in trunking mode if you use VLANs in your infrastructure; otherwise, physically separated networks will be needed

In such a layout, the management, tenant, and external networks could be either VLANs or real networks. Remember that to avoid VLAN trunking, you need at least as many physical ports as the number of VLANs the machine has to be subscribed to; port trunking can be a huge security hole.

- A management network is needed for the administrator to administer the machines and for the OpenStack services to speak to each other. This network is critical, since it may contain sensitive data, and for this reason it has to be disconnected from other networks or, if that's not possible, have very limited connectivity.
- The external network is used by virtual machines to access the Internet (and vice versa). In this network, all machines will need an IP address reachable from the Web.
- The tenant network, sometimes called the internal or guest network, is the network where the virtual machines can communicate with other virtual machines in the same cloud. This network, in some deployment cases, can be merged with the external network, but this choice has some security drawbacks.
- The API network is used to expose OpenStack APIs to the users. This network requires IP addresses reachable from the Web, and for this reason is often merged into the external network.
- There are cases where provider networks are needed to connect tenant networks to existing networks outside the OpenStack cluster.
Those networks are created by the OpenStack administrator and map directly to an existing physical network in the data center.

Summary

In this article, we have seen how networking works, which attacks we can expect, and how we can counter them. We have also seen how to implement a secure deployment of OpenStack Networking.

Resources for Article:

Further resources on this subject:
- Cloud distribution points [Article]
- Photo Stream with iCloud [Article]
- Integrating Accumulo into Various Cloud Platforms [Article]


Share and Share Alike

Packt
10 Aug 2015
13 min read
In this article by Kevin Harvey, author of the book Test-Driven Development with Django, we'll expose the data in our application via a REST API. As we do, we'll learn:

- The importance of documentation in the API development process
- How to write functional tests for API endpoints
- API patterns and best practices

It's an API world, we're just coding in it

It's very common nowadays to include a public REST API in your web project. Exposing your services or data to the world is generally done for one of two reasons:

- You've got interesting data, and other developers might want to integrate that information into a project they're working on
- You're building a secondary system that you expect your users to interact with, and that system needs to interact with your data (that is, a mobile or desktop app, or an AJAX-driven frontend)

We've got both reasons in our application. We're housing novel, interesting data in our database that someone might want to access programmatically. Also, it would make sense to build a desktop application that could interact with a user's own digital music collection, so they could actually hear the solos we're storing in our system.

Deceptive simplicity

The good news is that there are some great options for third-party Django plugins that allow you to build a REST API into an existing application. The bad news is that the simplicity of adding one of these packages can let you go off half-cocked, throwing an API on top of your project without a real plan for it. If you're lucky, you'll just wind up with a bird's nest of an API: inconsistent URLs, wildly varying payloads, and difficult authentication. In the worst-case scenario, your bolt-on API exposes data you didn't intend to make public, and you wind up with a self-inflicted security issue.

Never forget that an API is sort of invisible. Unlike traditional web pages, where bugs are very public and easy to describe, API bugs are only visible to other developers. Take special care to make sure your API behaves exactly as intended by writing thorough documentation and tests to make sure you've implemented it correctly.

Writing documentation first

"Documentation is king." - Kenneth Reitz

If you've spent any time at all working with Python or Django, you know what good documentation looks like. The Django folks in particular seem to understand this well: the key to getting developers to use your code is great documentation. In documenting an API, be explicit. Most of your API methods' docs should take the form of "if you send this, you will get back this", with real-world examples of input and output. A great side effect of prewriting documentation is that it makes the intention of your API crystal clear. You're allowing yourself to conjure up the API from thin air without getting bogged down in any of the details, so you can get a bird's-eye view of what you're trying to accomplish. Your documentation will keep you oriented throughout the development process.

Documentation-driven testing

Once you've got your documentation done, testing is simply a matter of writing test cases that match up with what you've promised. The actions of the test methods exercise HTTP methods, and your assertions check the responses. Test-Driven Development really shines when it comes to API development. There are great tools for sending JSON over the wire, but properly formatting JSON can be a pain, and reading it can be worse.
Enshrining test JSON in test methods and asserting it matches the real responses will save you a ton of headaches.

More developers, more problems

Good documentation and test coverage are exponentially more important when two groups are developing in tandem—one on the client application and one on the API. Changes to an API are hard for teams like this to deal with, and should come with a lot of warning (and apologies). If you have to make a change to an endpoint, it should break a lot of tests, and you should methodically go and fix them all. What's more, no one feels the pain of regression bugs like the developer of an API-consuming client. You really, really, really need to know that all the endpoints you've put out there are still going to work when you add features or refactor.

Building an API with Django REST Framework

Now that you're properly terrified of developing an API, let's get started. What sort of capabilities should we add? Here are a couple of possibilities:

- Exposing the Album, Track, and Solo information we have
- Creating new Solos or updating existing ones

Initial documentation

In the Python world it's very common for documentation to live in docstrings, as it keeps the description of how to use an object close to the implementation. We'll eventually do the same with our docs, but it's kind of hard to write a docstring for a method that doesn't exist yet. Let's open up a new Markdown file, API.md, right in the root of the project, just to get us started. If you've never used Markdown before, you can read an introduction to GitHub's version of Markdown at https://help.github.com/articles/markdown-basics/.

Here's a sample of what should go in API.md. Have a look at https://github.com/kevinharvey/jmad/blob/master/API.md for the full, rendered version.

```
# Get a Track with Solos

* URL: /api/tracks/<pk>/
* HTTP Method: GET

## Example Response

{
  "name": "All Blues",
  "slug": "all-blues",
  "album": {
    "name": "Kind of Blue",
    "url": "http://jmad.us/api/albums/2/"
  },
  "solos": [
    {
      "artist": "Cannonball Adderley",
      "instrument": "saxophone",
      "start_time": "4:05",
      "end_time": "6:04",
      "slug": "cannonball-adderley",
      "url": "http://jmad.us/api/solos/281/"
    },
    ...
  ]
}

# Add a Solo to a Track

* URL: /api/solos/
* HTTP Method: POST

## Example Request

{
  "track": "/api/tracks/83/",
  "artist": "Don Cherry",
  "instrument": "cornet",
  "start_time": "2:13",
  "end_time": "3:54"
}

## Example Response

{
  "url": "http://jmad.us/api/solos/64/",
  "artist": "Don Cherry",
  "slug": "don-cherry",
  "instrument": "cornet",
  "start_time": "2:13",
  "end_time": "3:54",
  "track": "http://jmad.us/api/tracks/83/"
}
```

There's not a lot of prose, and there needn't be. All we're trying to do is lay out the ins and outs of our API. It's important at this point to step back and look at the endpoints in their totality. Is there enough of a pattern that you can sort of guess what the next one is going to look like? Does it look like a fairly straightforward API to interact with? Does anything about it feel clunky? Would you want to work with this API yourself? Take time to think through any weirdness now, before anything gets out into the wild.

```
$ git commit -am 'Initial API Documentation'
$ git tag -a ch7-1-init-api-docs
```

Introducing Django REST Framework

Now that we've got some idea of what we're building, let's actually get it going. We'll be using Django REST Framework (http://www.django-rest-framework.org/).
Start by installing it in your environment:

```
$ pip install djangorestframework
```

Add rest_framework to your INSTALLED_APPS in jmad/settings.py:

```python
INSTALLED_APPS = (
    ...
    'rest_framework',
)
```

Now we're ready to start testing.

Writing tests for API endpoints

While there's no such thing as browser-based testing for an external API, it is important to write tests that cover its end-to-end processing. We need to be able to send in requests like the ones we've documented and confirm that we receive the responses our documentation promises. Django REST Framework (DRF from here on out) provides tools to help write tests for the application functionality it provides. We'll use rest_framework.test.APITestCase to write functional tests.

Let's kick off with the list of albums. Convert albums/tests.py to a package, add a test_api.py file, and then add the following:

```python
from rest_framework.test import APITestCase

from albums.models import Album


class AlbumAPITestCase(APITestCase):

    def setUp(self):
        self.kind_of_blue = Album.objects.create(name='Kind of Blue')
        self.a_love_supreme = Album.objects.create(name='A Love Supreme')

    def test_list_albums(self):
        """
        Test that we can get a list of albums
        """
        response = self.client.get('/api/albums/')
        self.assertEqual(response.status_code, 200)
        self.assertEqual(response.data[0]['name'], 'A Love Supreme')
        self.assertEqual(response.data[1]['url'],
                         'http://testserver/api/albums/1/')
```

Since much of this is very similar to other tests that we've seen before, let's talk about the important differences:

- We import and subclass APITestCase, which makes self.client an instance of rest_framework.test.APIClient. Both of these subclass their respective django.test counterparts and add a few niceties that help in testing APIs (none of which are showcased yet).
- We test response.data, which we expect to be a list of Albums. response.data will be a Python dict or list that corresponds to the JSON payload of the response.
- During the course of the test, APIClient (a subclass of Client) will use http://testserver as the protocol and hostname for the server, and our API should return host-specific URIs.

Run this test, and we get the following:

```
$ python manage.py test albums.tests.test_api
Creating test database for alias 'default'...
F
======================================================================
FAIL: test_list_albums (albums.tests.test_api.AlbumAPITestCase)
Test that we can get a list of albums
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/kevin/dev/jmad-project/jmad/albums/tests/test_api.py",
  line 17, in test_list_albums
    self.assertEqual(response.status_code, 200)
AssertionError: 404 != 200
----------------------------------------------------------------------
Ran 1 test in 0.019s

FAILED (failures=1)
```

We're failing because we're getting a 404 Not Found instead of a 200 OK status code. Proper HTTP communication is important in any web application, but it really comes into play when you're using AJAX. Most frontend libraries will properly classify responses as successful or erroneous based on the status code: making sure the codes are on point will save your frontend developer friends a lot of headaches. We're getting a 404 because we don't have a URL defined yet. Before we set up the route, let's add a quick unit test for routing.
Update the test case with one new import and method:

```python
from django.core.urlresolvers import resolve
...

    def test_album_list_route(self):
        """
        Test that we've got routing set up for Albums
        """
        route = resolve('/api/albums/')
        self.assertEqual(route.func.__name__, 'AlbumViewSet')
```

Here, we're just confirming that the URL routes to the correct view. Run it:

```
$ python manage.py test albums.tests.test_api.AlbumAPITestCase.test_album_list_route
...
django.core.urlresolvers.Resolver404: {'path': 'api/albums/',
'tried': [[<RegexURLResolver <RegexURLPattern list> (admin:admin)
^admin/>], [<RegexURLPattern solo_detail_view
^recordings/(?P<album>[\w-]+)/(?P<track>[\w-]+)/(?P<artist>[\w-]+)/$>],
[<RegexURLPattern None ^$>]]}
----------------------------------------------------------------------
Ran 1 test in 0.003s

FAILED (errors=1)
```

We get a Resolver404 error, which is expected, since Django shouldn't return anything at that path. Now we're ready to set up our URLs.

API routing with DRF's SimpleRouter

Take a look at the documentation for routers at http://www.django-rest-framework.org/api-guide/routers/. They're a very clean way of setting up URLs for DRF-powered views. Update jmad/urls.py like so:

```python
...
from rest_framework import routers

from albums.views import AlbumViewSet

router = routers.SimpleRouter()
router.register(r'albums', AlbumViewSet)

urlpatterns = [
    # Admin
    url(r'^admin/', include(admin.site.urls)),

    # API
    url(r'^api/', include(router.urls)),

    # Apps
    url(r'^recordings/(?P<album>[\w-]+)/(?P<track>[\w-]+)/(?P<artist>[\w-]+)/$',
        'solos.views.solo_detail',
        name='solo_detail_view'),
    url(r'^$', 'solos.views.index'),
]
```

Here's what we changed:

- We created an instance of SimpleRouter and used the register method to set up a route. The register method has two required arguments: a prefix to build the route methods from, and something called a viewset. Here we've supplied a nonexistent class, AlbumViewSet, which we'll come back to later.
- We've added a few comments to break up our urls.py, which was starting to look a little like a rat's nest. The actual API URLs are registered under the '^api/' path using Django's include function.

Run the URL test again, and we'll get an ImportError for AlbumViewSet. Let's add a stub to albums/views.py:

```python
class AlbumViewSet():
    pass
```

Run the test now, and we'll start to see some specific DRF error messages to help us build out our view:

```
$ python manage.py test albums.tests.test_api.AlbumAPITestCase.test_album_list_route
Creating test database for alias 'default'...
F
...
  File "/Users/kevin/.virtualenvs/jmad/lib/python3.4/site-packages/rest_framework/routers.py", line 60, in register
    base_name = self.get_default_base_name(viewset)
  File "/Users/kevin/.virtualenvs/jmad/lib/python3.4/site-packages/rest_framework/routers.py", line 135, in get_default_base_name
    assert queryset is not None, ''base_name' argument not specified, and could '
AssertionError: 'base_name' argument not specified, and could not
automatically determine the name from the viewset, as it does not
have a '.queryset' attribute.
```

After a fairly lengthy output, the test runner tells us that it was unable to get a base_name for the URL: we did not specify base_name in the register method, and the router couldn't guess the name because the viewset (AlbumViewSet) did not have a queryset attribute. In the router documentation, we came across the optional base_name argument for register (as well as the exact wording of this error). You can use that argument to control the name your URL gets. However, let's keep letting DRF do its default behavior.

We haven't read the documentation for viewsets yet, but we know that a regular Django class-based view expects a queryset parameter. Let's stick one on AlbumViewSet and see what happens:

```python
from .models import Album


class AlbumViewSet():
    queryset = Album.objects.all()
```

Run the test again, and we get:

```
django.core.urlresolvers.Resolver404: {'path': 'api/albums/',
'tried': [[<RegexURLResolver <RegexURLPattern list> (admin:admin)
^admin/>], [<RegexURLPattern solo_detail_view
^recordings/(?P<album>[\w-]+)/(?P<track>[\w-]+)/(?P<artist>[\w-]+)/$>],
[<RegexURLPattern None ^$>]]}
----------------------------------------------------------------------
Ran 1 test in 0.011s

FAILED (errors=1)
```

Huh? Another 404 is a step backwards. What did we do wrong? Maybe it's time to figure out what a viewset really is.
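For orientation, a viewset bundles a queryset, a serializer, and the list/detail behavior the router is looking for. A minimal sketch of where this is heading might look like the following; this is our sketch rather than the book's eventual code, and the AlbumSerializer shown here is an assumption (viewsets.ModelViewSet and serializers.HyperlinkedModelSerializer are standard DRF classes):

```python
# Sketch only: a viewset that would satisfy the router. The serializer
# name and field list are assumptions, chosen to match the API.md sample.
from rest_framework import serializers, viewsets

from .models import Album


class AlbumSerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = Album
        fields = ('name', 'url')


class AlbumViewSet(viewsets.ModelViewSet):
    # Subclassing ModelViewSet (instead of a bare class) is what gives
    # the router the list and detail routes it expects.
    queryset = Album.objects.all()
    serializer_class = AlbumSerializer
```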
Summary

In this article, we covered basic API design and testing patterns, including the importance of documentation when developing an API. In doing so, we took a deep dive into Django REST Framework and the utilities and testing tools available in it.

Resources for Article:

Further resources on this subject:
- Test-driven API Development with Django REST Framework [Article]
- Adding a developer with Django forms [Article]
- Code Style in Django [Article]


Creating Functions and Operations

Packt
10 Aug 2015
18 min read
In this article by Alex Libby, author of the book Sass Essentials, we will learn how to use operators and functions to construct a whole site theme from just a handful of colors, or to define font sizes for the entire site from a single value. Okay, so let's get started!

Creating values using functions and operators

Imagine a scenario where you're creating a masterpiece that has taken days to put together, with a stunning choice of colors that has taken almost as long as the building of the project, and yet the client isn't happy with the color choice. What to do? At this point, I'm sure that while you're all smiles to the customer, you'd be quietly cursing the amount of work they've just landed you with, this late on a Friday. Sound familiar? I'll bet you scrap the colors and go back to poring over lots of color combinations, right? It'll work, but it will surely take a lot more time and effort.

There's a better way to achieve this: instead of creating or choosing lots of different colors, we only need to choose one and create all of the others automatically. How? Easy! When working with Sass, we can use a little bit of simple math to build our color palette. One of the key tenets of Sass is its ability to work out values dynamically, using nothing more than a little simple math; we could define font sizes from H1 to H6 automatically, create new shades of colors, or even work out the right percentages to use when creating responsive sites! We will take a look at each of these examples throughout the article, but for now, let's focus on the principles of creating our colors using Sass.

Creating colors using functions

We can use simple math and functions to create just about any type of value, but colors are where these two really come into their own. The great thing about Sass is that we can work out the hex value for just about any color we want, from a limited range of colors. This can easily be done using techniques such as adding two values together, or subtracting one value from another. To get a feel for how the color operators work, head over to the official documentation at http://sass-lang.com/documentation/file.SASS_REFERENCE.html#color_operations—it is worth reading!

There's nothing wrong with adding or subtracting values—it's a perfectly valid option, and will result in a valid hex code when compiled. But would you know that both values are actually deep shades of blue? Therein lies the benefit of using functions; instead of using math operators, we can simply say this:

```scss
p { color: darken(#010203, 10%); }
```

This, I am sure you will agree, is easier to understand as well as being infinitely more readable! The use of functions opens up a world of opportunities for us. We can use any one of an array of functions such as lighten(), darken(), mix(), or adjust-hue() to get a feel for how easy it is to get the values. If we head over to http://jackiebalzer.com/color, we can see that the author has exploded a number of Sass (and Compass—we will use this later) functions, so we can see what colors are displayed, along with their numerical values, as soon as we change the initial two values. Okay, we could play with the site ad infinitum, but I feel a demo coming on—to explore the effects of using the color functions to generate new colors. Let's construct a simple demo.
For this exercise, we will dig up a copy of the colorvariables demo and modify it so that we're only assigning one color variable, not six. For this exercise, I will assume you are using Koala to compile the code. Okay, let's make a start:

1. We'll start by opening up a copy of colorvariables.scss in your favorite text editor and removing lines 1 to 15 from the start of the file.
2. Next, add the following lines, so that we should be left with this at the start of the file:

```scss
$darkRed: #a43;
$white: #fff;
$black: #000;

$colorBox1: $darkRed;
$colorBox2: lighten($darkRed, 30%);
$colorBox3: adjust-hue($darkRed, 35%);
$colorBox4: complement($darkRed);
$colorBox5: saturate($darkRed, 30%);
$colorBox6: adjust-color($darkRed, $green: 25);
```

3. Save the file as colorfunctions.scss.
4. We need a copy of the markup file to go with this code, so go ahead and extract a copy of colorvariables.html from the code download, saving it as colorfunctions.html in the root of our project area. Don't forget to change the link for the CSS file within to colorfunctions.css!
5. Fire up Koala, then drag and drop colorfunctions.scss from our project area over the main part of the application window to add it to the list.
6. Right-click on the file name and select Compile, and then wait for it to show Success in a green information box.

If we preview the results of our work in a browser, we should see six colored boxes appear. At this point, we have a working set of colors—granted, we might have to work a little on making sure that they all work together. But the key point here is that we have only specified one color, and the others are all calculated automatically through Sass. Now that we are only defining one color by default, how easy is it to change the colors in our code? Well, it is a cinch to do so. Let's try it out with the help of the SassMeister playground.

Changing the colors in use

We can easily change the values used in the code and continue to refresh the browser after each change. However, this isn't a quick way to figure out which colors work; to get a quicker response, there is an easier way: use the online Sass playground at http://www.sassmeister.com. This is the perfect way to try out different colors—the site automatically recompiles the code and updates the result as soon as we make a change. Try copying the HTML and SCSS code from our demo into the play area, ready for trying different calculations.

All the boxes work on the principle that we take a base color (in this case, $darkRed, or #a43), then adjust the color by either a percentage or a numeric value. When compiled, Sass calculates what the new value should be and uses this in the CSS. Take, for example, the color used for #box6, which comes out as a dark orange with a brown tone.

To get a feel for some of the functions that we can use to create new colors (or shades of existing colors), take a look at the main documentation at http://sass-lang.com/documentation/Sass/Script/Functions.html, or https://www.makerscabin.com/web/sass/learn/colors. These sites list a variety of different functions that we can use to create our masterpiece. We can also extend the functions that we have in Sass with the help of custom functions, such as the toolbox available at https://github.com/at-import/color-schemer—this may be worth a look. In our demo, we used a dark red color as our base.
If we're ever stuck for ideas on colors, or want to get the right HEX, RGB(A), or even HSL(A) codes, then there are dozens of sites online that will give us these values. Here are a couple that you can try:

- HSLa Explorer, by Chris Coyier—available at https://css-tricks.com/examples/HSLaExplorer/
- HSL Color Picker, by Brandon Mathis—available at http://hslpicker.com/

If we know the name but want to get a Sass value, then we can always try the list of 1,500+ colors at https://github.com/FearMediocrity/sass-color-palettes/blob/master/colors.scss. What's more, the list can easily be imported into our CSS, although it would make better sense to simply copy the chosen values into our Sass file and compile from there instead.

Mixing colors

The one thing that we've not discussed, but that is equally useful, is that we are not limited to using functions on their own; we can mix and match any number of functions to produce our colors. A great way to choose colors, and get the appropriate blend of functions to use, is http://sassme.arc90.com/. Using the available sliders, we can choose our color and get the appropriate functions to use in our Sass code.

In most cases, we will likely only need to use two functions (a mix of darken and adjust-hue, for example); if we are using more than two or three functions, then we should perhaps rethink our approach! In this case, a better alternative is to use Sass's mix() function, as follows:

```scss
$white: #fff;
$berry: hsl(267, 100%, 35%);
p { color: mix($white, $berry, 0.7); }
```

…which will give the following valid CSS:

```css
p { color: #5101b3; }
```

This is a useful alternative to use in place of the functions we've just touched on; after all, would you know what color adjust_hue(desaturate(darken(#db4e29, 2), 41), 67) would give? Granted, it is something of an extreme calculation; nonetheless, it is technically valid. If we use mix() instead, it matches more closely what we might do, for example, when mixing paint. After all, how else would we lighten its color, if not by adding a light-colored paint?

Okay, let's move on. What's next? I hear you ask. Well, so far we've used core Sass for all our functions, but it's time to go a little further afield. Let's take a look at how you can use external libraries to add extra functionality. In our next demo, we're going to introduce Compass, which you will often see being used with Sass.

Using an external library

So far, we've looked at using core Sass functions to produce our colors—nothing wrong with this; the question is, can we take things a step further? Absolutely, once we've gained some experience with using these functions, we can introduce custom functions (or helpers) that expand what we can do. A great library for this purpose is Compass, available at http://www.compass-style.org; we'll make use of it to change the colors we created in our earlier boxes demo, in the section Creating colors using functions. Compass is a CSS authoring framework that provides extra mixins and reusable patterns to add extra functionality to Sass. In our demo, we're using shade(), which is one of several color helpers provided by the Compass library. Let's make a start:

1. We're using Compass in this demo, so we'll begin by installing the library. To do this, fire up Command Prompt, then navigate to our project area.
2. We need to make sure that our RubyGems system software is up to date, so at Command Prompt, enter the following, and then press Enter:

```
gem update --system
```

3. Next, we're installing Compass itself—at the prompt, enter this command, and then press Enter:

```
gem install compass
```

4. Compass works best when we get it to create a project shell (or template) for us. To do this, first browse to http://www.compass-style.org/install, and then fill in your details in the Tell us about your project… area, leaving anything in grey text blank. This produces a set of commands—enter each at Command Prompt, pressing Enter each time.
5. Navigate back to Command Prompt. We need to compile our SCSS code, so go ahead and enter this command at the prompt (or copy and paste it), then press Enter:

```
compass watch --sourcemap
```

6. Next, extract a copy of the colorlibrary folder from the code download, and save it to the project area.
7. In colorlibrary.scss, comment out the existing line for $backgrd_box6_color, and add the following immediately below it:

```scss
$backgrd_box6_color: shade($backgrd_box5_color, 25%);
```

8. Save the changes to colorlibrary.scss. If all is well, Compass's watch facility should kick in and recompile the code automatically. To verify that this has been done, look in the css subfolder of the colorlibrary folder, and you should see both the compiled CSS and the source map files present.

If you find Compass compiles files into unexpected folders, then try using the following command to specify the source and destination folders when compiling:

```
compass watch --sass-dir sass --css-dir css
```

If all is well, when previewing the results in a browser window, we will see that Box 6 has gone a nice shade of deep red (if not almost brown). To really confirm that all the changes have taken place as required, we can fire up a DOM inspector such as Firebug; a quick check confirms that the color has indeed changed. If we explore even further, we can see that the compiled code shows that the original line for Box 6 has been commented out, and that we're using the new function from the Compass helper library.

This is a great way to push the boundaries of what we can do when creating colors. To learn more about using the Compass helper functions, it's worth exploring the official documentation at http://compass-style.org/reference/compass/helpers/colors/. We used the shade() function in our code, which darkens the color used. There is a key difference between this and using something such as darken() to perform the same change. To get a feel for the difference, take a look at the article on the CreativeBloq website at http://www.creativebloq.com/css3/colour-theming-sass-and-compass-6135593, which explains the difference very well.

The documentation is a little lacking in terms of how to use the color helpers; the key is not to treat them as if they were normal mixins or functions, but to simply reference them in our code. To explore more about how to use these functions, take a look at the article by Antti Hiljá at http://clubmate.fi/how-to-use-the-compass-helper-functions/. We can, of course, create mixins to build palettes—for a more complex example, take a look at http://www.zingdesign.com/how-to-generate-a-colour-palette-with-compass/ to understand how such a mixin can be created using Compass.

Okay, let's move on. So far, we've talked about using functions to manipulate colors; the flip side is that we are likely to use operators to manipulate values such as font sizes.
For now, let's change tack and take a look at creating new values for changing font sizes.

Changing font sizes using operators

We already talked about using functions to create practically any value. We've seen how to do it with colors; we can apply similar principles to creating font sizes too. In this case, we set a base font size (in the same way that we set a base color), and then simply increase or decrease font sizes as desired. In this instance, we won't use functions, but instead use standard math operators, such as add, subtract, or divide. When working with these operators, there are a couple of points to remember:

- Sass math functions preserve units—this means we can't operate on numbers with different units, such as adding a px value to a rem value, but we can work with numbers that can be converted to the same format, such as inches to centimeters.
- If we multiply two values with the same units, this will produce square units (that is, 10px * 10px == 100px * px). At the same time, px * px will throw an error, as it is an invalid unit in CSS.
- There are some quirks when working with / as a division operator—in most instances, it is normally used to separate two values, such as when defining a pair of font size values. However, if the value is surrounded by parentheses, used as part of another arithmetic expression, or stored in a variable, then it will be treated as a division operator. For full details, it is worth reading the relevant section in the official documentation at http://sass-lang.com/documentation/file.Sass_REFERENCE.html#division-and-slash.

With these in mind, let's create a simple demo—a perfect use of Sass is to automatically work out sizes from H1 through to H6. We could just do this in a simple text editor, but this time, let's break with tradition and build our demo directly in a session on http://www.sassmeister.com. We can then play around with the values set and see the effects of the changes immediately. If we're happy with the results of our work, we can copy the final version into a text editor and save the files as standard SCSS (or CSS) files.

Let's begin by browsing to http://www.sassmeister.com, and adding the following markup to the HTML window:

```html
<html>
  <head>
    <meta charset="utf-8" />
    <title>Demo: Assigning colors using variables</title>
    <link rel="stylesheet" type="text/css" href="css/colorvariables.css">
  </head>
  <body>
    <h1>The cat sat on the mat</h1>
    <h2>The cat sat on the mat</h2>
    <h3>The cat sat on the mat</h3>
    <h4>The cat sat on the mat</h4>
    <h5>The cat sat on the mat</h5>
    <h6>The cat sat on the mat</h6>
  </body>
</html>
```

Next, add the following to the SCSS window—we first set a base value of 3.0, followed by a starting color of #b26d61, a dark, moderate red:

```scss
$baseSize: 3.0;
$baseColor: #b26d61;
```

We need to add our H1 to H6 styles. The rem mixin was created by Chris Coyier, at https://css-tricks.com/snippets/css/less-mixin-for-rem-font-sizing/.
We first set the font size, followed by the font color, using either the base color set earlier or a function to produce a different shade:

```scss
h1 { font-size: $baseSize; color: $baseColor; }

h2 { font-size: ($baseSize - 0.2); color: darken($baseColor, 20%); }

h3 { font-size: ($baseSize - 0.4); color: lighten($baseColor, 10%); }

h4 { font-size: ($baseSize - 0.6); color: saturate($baseColor, 20%); }

h5 { font-size: ($baseSize - 0.8); color: $baseColor - 111; }

h6 { font-size: ($baseSize - 1.0); color: rgb(red($baseColor) + 10, 23, 145); }
```

SassMeister will automatically compile the code to produce valid CSS. Try changing the base size of 3.0 to a different value—using http://www.sassmeister.com, we can instantly see how this affects the overall size of each H value. Note how we subtract an increasing value from $baseSize before using the result as the font size for the relevant H value, and how, where needed, we can concatenate the appropriate unit using a plus (+) symbol.

You can see a similar example of this by Andy Baudoin as a CodePen, at http://codepen.io/baudoin/pen/HdliD/. He makes good use of nesting to display the color and strength of shade. Note that it uses a little JavaScript to add the text of the color that each line represents; this can be ignored, as it does not affect the Sass used in the demo. The great thing about using a site such as SassMeister is that we can play around with values and immediately see the results. For more details on using number operations in Sass, browse to the official documentation at http://sass-lang.com/documentation/file.Sass_REFERENCE.html#number_operations.

Okay, onwards we go. Let's turn our attention to creating something a little more substantial; we're going to create a complete site theme using the power of Sass and a few simple calculations.

Summary

Phew! What a tour! One of the key concepts of Sass is the use of functions and operators to create values, so let's take a moment to recap what we have covered throughout this article. We kicked off with a look at creating color values using functions, before discovering how we can mix and match different functions to create different shades, or use external libraries to add extra functionality to Sass. We then moved on to take a look at another key use of functions, with a look at defining different font sizes using standard math operators.

Resources for Article:

Further resources on this subject:
- Nesting, Extend, Placeholders, and Mixins [article]
- Implementation of SASS [article]
- Constructing Common UI Widgets [article]


Using the ArcPy Data Access Module with Feature Classes and Tables

Packt
10 Aug 2015
28 min read
In this article by Eric Pimpler, author of the book Programming ArcGIS with Python Cookbook - Second Edition, we will cover the following topics:

- Retrieving features from a feature class with SearchCursor
- Filtering records with a where clause
- Improving cursor performance with geometry tokens
- Inserting rows with InsertCursor
- Updating rows with UpdateCursor

We'll start this article with a basic question. What are cursors? Cursors are in-memory objects containing one or more rows of data from a table or feature class. Each row contains the attributes from each field in a data source, along with the geometry for each feature. Cursors allow you to search, add, insert, update, and delete data from tables and feature classes.

The arcpy data access module, or arcpy.da, was introduced in ArcGIS 10.1 and contains methods that allow you to iterate through each row in a cursor. Various types of cursors can be created depending on the needs of developers. For example, search cursors can be created to read values from rows, update cursors can be created to update values in rows or delete rows, and insert cursors can be created to insert new rows.

A number of cursor improvements were introduced with the arcpy data access module. Prior to ArcGIS 10.1, cursor performance was notoriously slow. Now, cursors are significantly faster. Esri has estimated that SearchCursors are up to 30 times faster, while InsertCursors are up to 12 times faster. In addition to these general performance improvements, the data access module also provides a number of new options that allow programmers to speed up processing. Rather than returning all the fields in a cursor, you can now specify that only a subset of fields be returned. This increases performance, as less data needs to be returned. The same applies to geometry. Traditionally, when accessing the geometry of a feature, the entire geometric definition would be returned. You can now use geometry tokens to return a portion of the geometry rather than the full geometry of the feature. You can also use lists and tuples rather than rows. There are also other new features, such as edit sessions and the ability to work with versions, domains, and subtypes.

There are three cursor functions in arcpy.da. Each returns a cursor object with the same name as the function. SearchCursor() creates a read-only SearchCursor object containing rows from a table or feature class. InsertCursor() creates an InsertCursor object that can be used to insert new records into a table or feature class. UpdateCursor() returns a cursor object that can be used to edit or delete records from a table or feature class. You can see the relationship between the cursor functions, the objects they create, and how they are used, as follows:

| Function | Object created | Usage |
| --- | --- | --- |
| SearchCursor() | SearchCursor | A read-only view of data from a table or feature class |
| InsertCursor() | InsertCursor | Adds rows to a table or feature class |
| UpdateCursor() | UpdateCursor | Edits or deletes rows in a table or feature class |

The SearchCursor() function is used to return a SearchCursor object. This object can only be used to iterate through a set of rows returned for read-only purposes. No insertions, deletions, or updates can occur through this object. An optional where clause can be set to limit the rows returned.

Once you've obtained a cursor instance, it is common to iterate the records, particularly with SearchCursor or UpdateCursor. There are some peculiarities that you need to understand when navigating the records in a cursor. Cursor navigation is forward-moving only. When a cursor is created, its pointer sits just above the first row. The first call to next() will move the pointer to the first row. Rather than calling the next() method, you can also use a for loop to process each of the records without the need to call next(). After performing whatever processing you need to do with a row, a subsequent call to next() will move the pointer to row 2. This process continues as long as you need to access additional rows. However, after a row has been visited, you can't go back a single record at a time. For instance, if the current row is row 3, you can't programmatically back up to row 2; you can only go forward. To revisit rows 1 and 2, you would need to either call the reset() method or recreate the cursor and move back through the object. As mentioned earlier, cursors are often navigated with for loops as well; in fact, this is a more common way to iterate a cursor and a more efficient way to code your scripts.
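To make the navigation rules concrete, here is a minimal sketch (our addition, not one of the book's recipes; it assumes the same Schools.shp sample used in the recipes below):

```python
# Sketch: forward-only navigation with arcpy.da.SearchCursor.
import arcpy

arcpy.env.workspace = "c:/ArcpyBook/Ch8"

with arcpy.da.SearchCursor("Schools.shp", ("Facility", "Name")) as cursor:
    print(cursor.fields)   # read-only tuple: ('Facility', 'Name')
    first = next(cursor)   # advance to row 1
    second = next(cursor)  # advance to row 2; there is no way to step back
    cursor.reset()         # the only way to return to the top
    for row in cursor:     # the more common (and more efficient) pattern
        print(row[1])
```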
The InsertCursor() function is used to create an InsertCursor object that allows you to programmatically add new records to feature classes and tables. To insert rows, call the insertRow() method on this object. You can also retrieve a read-only tuple containing the field names in use by the cursor through the fields property. A lock is placed on the table or feature class being accessed through the cursor, so it is important to always design your script in a way that releases the cursor when you are done.

The UpdateCursor() function can be used to create an UpdateCursor object that can update and delete rows in a table or feature class. As is the case with InsertCursor, this function places a lock on the data while it's being edited or deleted. If the cursor is used inside a Python with statement, the lock will automatically be freed after the data has been processed. This hasn't always been the case: prior to ArcGIS 10.1, cursors were required to be manually released using the Python del statement. Once an instance of UpdateCursor has been obtained, you can call the updateRow() method to update records in tables or feature classes and the deleteRow() method to delete a row.

The subject of data locks requires a little more explanation. The insert and update cursors must obtain a lock on the data source they reference. This means that no other application can concurrently access this data source. Locks are a way of preventing multiple users from changing data at the same time and thus corrupting the data. When the InsertCursor() and UpdateCursor() functions are called in your code, Python attempts to acquire a lock on the data. This lock must be specifically released after the cursor has finished processing, so that the running applications of other users, such as ArcMap or ArcCatalog, can access the data sources. If this isn't done, no other application will be able to access the data. Prior to ArcGIS 10.1 and the with statement, cursors had to be specifically unlocked through Python's del statement. Similarly, ArcMap and ArcCatalog also acquire data locks when updating or deleting data. If a data source has been locked by either of these applications, your Python code will not be able to access the data. Therefore, the best practice is to close ArcMap and ArcCatalog before running any standalone Python scripts that use insert or update cursors.
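The recipes that follow only exercise SearchCursor, so as a quick orientation, here is a minimal sketch of the insert and update sides of arcpy.da. The field names mirror the Schools.shp sample below, but the attribute values are illustrative assumptions, not data from the book:

```python
# Sketch: inserting, updating, and deleting rows with arcpy.da.
# The values 'NEW RIDGE' and 'CLOSED' are hypothetical.
import arcpy

arcpy.env.workspace = "c:/ArcpyBook/Ch8"

# InsertCursor: the with statement releases the data lock automatically.
with arcpy.da.InsertCursor("Schools.shp", ("Facility", "Name")) as cursor:
    cursor.insertRow(("HIGH SCHOOL", "NEW RIDGE"))

# UpdateCursor: update or delete rows as you iterate.
with arcpy.da.UpdateCursor("Schools.shp", ("Facility", "Name")) as cursor:
    for row in cursor:
        if row[1] == "NEW RIDGE":
            row[1] = "NEW RIDGE HIGH"
            cursor.updateRow(row)   # persist the edited row
        elif row[0] == "CLOSED":
            cursor.deleteRow()      # remove the current row
```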
If a data source has been locked by either of these applications, your Python code will not be able to access the data. Therefore, the best practice is to close ArcMap and ArcCatalog before running any standalone Python scripts that use insert or update cursors. In this article, we're going to cover the use of cursors to access and edit tables and feature classes, focusing on the arcpy.da module; however, many of the cursor concepts that existed before ArcGIS 10.1 still apply.

Retrieving features from a feature class with SearchCursor

There are many occasions when you need to retrieve rows from a table or feature class for read-only purposes. For example, you might want to generate a list of all land parcels in a city with a value greater than $100,000. In this case, you don't have any need to edit the data. Your needs are met simply by generating a list of rows that meet some sort of criteria. A SearchCursor object contains a read-only copy of rows from a table or feature class. These objects can also be filtered through the use of a where clause so that only a subset of the dataset is returned.

Getting ready

The SearchCursor() function is used to return a SearchCursor object. This object can only be used to iterate through a set of rows returned for read-only purposes. No insertions, deletions, or updates can occur through this object. An optional where clause can be set to limit the rows returned. In this article, you will learn how to create a basic SearchCursor object on a feature class through the use of the SearchCursor() function.

The SearchCursor object contains a fields property along with the next() and reset() methods. The fields property is a read-only structure in the form of a Python tuple, containing the fields requested from the feature class or table. You are going to hear the term tuple a lot in conjunction with cursors. If you haven't covered this topic before, tuples are a Python structure for storing a sequence of data, similar to Python lists. However, there are some important differences between Python tuples and lists. Tuples are defined as a sequence of values inside parentheses, while lists are defined as a sequence of values inside brackets. Unlike lists, tuples can't grow and shrink, which can be a very good thing when you want data values to occupy a specific position each time. This is the case with cursor objects, which use tuples to store data from fields in tables and feature classes.

How to do it…

Follow these steps to learn how to retrieve rows from a table or feature class inside a SearchCursor object:

Open IDLE and create a new script window.
Save the script as C:\ArcpyBook\Ch8\SearchCursor.py.
Import the arcpy.da module:
import arcpy.da
Set the workspace:
arcpy.env.workspace = "c:/ArcpyBook/Ch8"
Use a Python with statement to create a cursor:
with arcpy.da.SearchCursor("Schools.shp", ("Facility", "Name")) as cursor:
Loop through each row in the SearchCursor and print the name of the school. Make sure you indent the for loop inside the with block:
for row in sorted(cursor):
    print("School name: " + row[1])
The entire script should appear as follows:
import arcpy.da
arcpy.env.workspace = "c:/ArcpyBook/Ch8"
with arcpy.da.SearchCursor("Schools.shp", ("Facility", "Name")) as cursor:
    for row in sorted(cursor):
        print("School name: " + row[1])
Save the script. You can check your work by examining the C:\ArcpyBook\code\Ch8\SearchCursor_Step1.py solution file.
Run the script.
You should see the following output:

School name: ALLAN
School name: ALLISON
School name: ANDREWS
School name: BARANOFF
School name: BARRINGTON
School name: BARTON CREEK
School name: BARTON HILLS
School name: BATY
School name: BECKER
School name: BEE CAVE

How it works…

The with statement used with the SearchCursor() function will create, open, and close the cursor, so you no longer have to be concerned with explicitly releasing the lock on the cursor as you did prior to ArcGIS 10.1. The first parameter passed into the SearchCursor() function is a feature class, represented by the Schools.shp file. The second parameter is a Python tuple containing a list of fields that we want returned in the cursor. For performance reasons, it is a best practice to limit the fields returned in the cursor to only those that you need to complete the task. Here, we've specified that only the Facility and Name fields should be returned. The SearchCursor object is stored in a variable called cursor.

Inside the with block, we use a Python for loop to loop through each school returned. We also use the Python sorted() function to sort the contents of the cursor. To access the values from a field on the row, simply use the index number of the field you want to return. In this case, we want to return the contents of the Name column, which is index 1, since it is the second item in the tuple of field names that are returned.

Filtering records with a where clause

By default, SearchCursor will contain all rows in a table or feature class. However, in many cases, you will want to restrict the number of rows returned by some sort of criteria. Applying a filter through the use of a where clause limits the records returned.

Getting ready

By default, all rows from a table or feature class will be returned when you create a SearchCursor object. However, in many cases, you will want to restrict the records returned. You can do this by creating a query and passing it as a where clause parameter when calling the SearchCursor() function. In this article, you'll build on the script you created in the previous article by adding a where clause that restricts the records returned.

How to do it…

Follow these steps to apply a filter to a SearchCursor object that restricts the rows returned from a table or feature class:

Open IDLE and load the SearchCursor.py script that you created in the previous recipe.
Update the SearchCursor() function by adding a where clause that queries the FACILITY field for records that have the HIGH SCHOOL text:
with arcpy.da.SearchCursor("Schools.shp", ("Facility", "Name"), '"FACILITY" = \'HIGH SCHOOL\'') as cursor:
You can check your work by examining the C:\ArcpyBook\code\Ch8\SearchCursor_Step2.py solution file.
Save and run the script. The output will now be much smaller and restricted to high schools only:

High school name: AKINS
High school name: ALTERNATIVE LEARNING CENTER
High school name: ANDERSON
High school name: AUSTIN
High school name: BOWIE
High school name: CROCKETT
High school name: DEL VALLE
High school name: ELGIN
High school name: GARZA
High school name: HENDRICKSON
High school name: JOHN B CONNALLY
High school name: JOHNSTON
High school name: LAGO VISTA

How it works…

The where clause parameter accepts any valid SQL query, and is used in this case to restrict the number of records that are returned.

Improving cursor performance with geometry tokens

Geometry tokens were introduced in ArcGIS 10.1 as a performance improvement for cursors.
Rather than returning the entire geometry of a feature inside the cursor, only a portion of the geometry is returned. Returning the entire geometry of a feature can result in decreased cursor performance due to the amount of data that has to be returned. It's significantly faster to return only the specific portion of the geometry that is needed.

Getting ready

A token is provided as one of the fields in the field list passed into the constructor for a cursor, and is in the SHAPE@<Part of Feature to be Returned> format. The only exception to this format is the OID@ token, which returns the object ID of the feature. The following code example retrieves only the X and Y coordinates of a feature:

with arcpy.da.SearchCursor(fc, ("SHAPE@XY", "Facility", "Name")) as cursor:

The following table lists the commonly available geometry tokens. Not all cursors support the full list of tokens. Check the ArcGIS help files for information about the tokens supported by each cursor type.

Token                Returns
SHAPE@               The entire geometry object for the feature
SHAPE@XY             A tuple of the feature's centroid X and Y coordinates
SHAPE@TRUECENTROID   A tuple of the feature's true centroid X and Y coordinates
SHAPE@X              The feature's X coordinate
SHAPE@Y              The feature's Y coordinate
SHAPE@Z              The feature's Z coordinate
SHAPE@M              The feature's M value
SHAPE@AREA           The feature's area
SHAPE@LENGTH         The feature's length
SHAPE@JSON           The Esri JSON string representing the geometry
SHAPE@WKT            The well-known text representation of the geometry
SHAPE@WKB            The well-known binary representation of the geometry
OID@                 The value of the ObjectID field

The SHAPE@ token returns the entire geometry of the feature. Use this carefully though, because it is an expensive operation to return the entire geometry of a feature and can dramatically affect performance. If you don't need the entire geometry, then do not include this token!

In this article, you will use a geometry token to increase the performance of a cursor. You'll retrieve the X and Y coordinates of each land parcel from the parcels feature class, along with some attribute information about the parcel.

How to do it…

Follow these steps to add a geometry token to a cursor, which should improve the performance of this object:

Open IDLE and create a new script window.
Save the script as C:\ArcpyBook\Ch8\GeometryToken.py.
Import the arcpy.da module and the time module:
import arcpy.da
import time
Set the workspace:
arcpy.env.workspace = "c:/ArcpyBook/Ch8"
We're going to measure how long it takes to execute the code using a geometry token. Add the start time for the script:
start = time.clock()
Use a Python with statement to create a cursor that includes the centroid of each feature as well as the ownership information stored in the PY_FULL_OW field:
with arcpy.da.SearchCursor("coa_parcels.shp", ("PY_FULL_OW", "SHAPE@XY")) as cursor:
Loop through each row in the SearchCursor and print the owner and location of each parcel. Make sure you indent the for loop inside the with block:
for row in cursor:
    print("Parcel owner: {0} has a location of: {1}".format(row[0], row[1]))
Measure the elapsed time:
elapsed = (time.clock() - start)
Print the execution time:
print("Execution time: " + str(elapsed))
The entire script should appear as follows:
import arcpy.da
import time
arcpy.env.workspace = "c:/ArcpyBook/Ch8"
start = time.clock()
with arcpy.da.SearchCursor("coa_parcels.shp", ("PY_FULL_OW", "SHAPE@XY")) as cursor:
    for row in cursor:
        print("Parcel owner: {0} has a location of: {1}".format(row[0], row[1]))
elapsed = (time.clock() - start)
print("Execution time: " + str(elapsed))
You can check your work by examining the C:\ArcpyBook\code\Ch8\GeometryToken.py solution file.
Save the script.
Run the script. You should see something similar to the following output.
Note the execution time; your time will vary:

Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: (3110480.5197341456, 10070911.174956793)
Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: (3110670.413783513, 10070800.960865)
Parcel owner: CITY OF AUSTIN has a location of: (3143925.0013213265, 10029388.97419636)
Parcel owner: CITY OF AUSTIN % DOROTHY NELL ANDERSON ATTN BARRY LEE ANDERSON has a location of: (3134432.983822767, 10072192.047894118)
Execution time: 9.08046185109

Now, we're going to measure the execution time if the entire geometry is returned instead of just the portion of the geometry that we need:

Save a new copy of the script as C:\ArcpyBook\Ch8\GeometryTokenEntireGeometry.py.
Change the SearchCursor() function to return the entire geometry using SHAPE@ instead of SHAPE@XY:
with arcpy.da.SearchCursor("coa_parcels.shp", ("PY_FULL_OW", "SHAPE@")) as cursor:
You can check your work by examining the C:\ArcpyBook\code\Ch8\GeometryTokenEntireGeometry.py solution file.
Save and run the script. You should see the following output. Your time will vary from mine, but notice that the execution time is slower. In this case, it's only a little over a second slower, but we're only returning 2,600 features. If the feature class were significantly larger, as many are, this difference would be amplified:

Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: <geoprocessing describe geometry object object at 0x06B9BE00>
Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: <geoprocessing describe geometry object object at 0x2400A700>
Parcel owner: CITY OF AUSTIN has a location of: <geoprocessing describe geometry object object at 0x06B9BE00>
Parcel owner: CITY OF AUSTIN % DOROTHY NELL ANDERSON ATTN BARRY LEE ANDERSON has a location of: <geoprocessing describe geometry object object at 0x2400A700>
Execution time: 10.1211390896

How it works…

A geometry token can be supplied as one of the field names in the constructor for the cursor. These tokens are used to increase the performance of a cursor by returning only a portion of the geometry instead of the entire geometry. This can dramatically increase the performance of a cursor, particularly when you are working with large polyline or polygon datasets. If you only need specific properties of the geometry in your cursor, you should use these tokens.

Inserting rows with InsertCursor

You can insert a row into a table or feature class using an InsertCursor object. If you want to insert attribute values along with the new row, you'll need to supply the values in the order in which the fields were defined when the cursor was created.

Getting ready

The InsertCursor() function is used to create an InsertCursor object that allows you to programmatically add new records to feature classes and tables. The insertRow() method on the InsertCursor object adds the row. A row in the form of a list or tuple is passed into the insertRow() method. The values in the list must correspond to the field values defined when the InsertCursor object was created. As with the other cursor types, you can also limit the field names accessed by the cursor using the second parameter of the function. This function supports geometry tokens as well.

The following code example illustrates how you can use InsertCursor to insert new rows into a feature class. Here, we insert two new wildfire points into the California feature class. The row values to be inserted are defined in a list variable.
Then, an InsertCursor object is created, passing in the feature class and fields. Finally, the new rows are inserted into the feature class by using the insertRow() method:

rowValues = [('Bastrop', 'N', 3000, (-105.345, 32.234)),
             ('Ft Davis', 'N', 456, (-109.456, 33.468))]
fc = "c:/data/wildfires.gdb/California"
fields = ["FIRE_NAME", "FIRE_CONTAINED", "ACRES", "SHAPE@XY"]
with arcpy.da.InsertCursor(fc, fields) as cursor:
    for row in rowValues:
        cursor.insertRow(row)

In this article, you will use InsertCursor to add wildfires retrieved from a .txt file into a point feature class. When inserting rows into a feature class, you will need to know how to add the geometric representation of a feature to the feature class. This can be accomplished by using InsertCursor along with two miscellaneous objects: Array and Point. In this exercise, we will add point features in the form of wildfire incidents to an empty point feature class. In addition to this, you will use Python file manipulation techniques to read the coordinate data from a text file.

How to do it…

We will be importing the North American wildland fire incident data from a single day in October 2007. This data is contained in a comma-delimited text file containing one line for each fire incident on that particular day. Each fire incident has a latitude, longitude coordinate pair separated by commas, along with a confidence value. This data was derived by automated methods that use remote sensing data to determine the presence or absence of a wildfire. Confidence values can range from 0 to 100; higher numbers represent a greater confidence that this is indeed a wildfire:

Open the file at C:\ArcpyBook\Ch8\WildfireData\NorthAmericaWildfires_2007275.txt and examine the contents. You will notice that this is a simple comma-delimited text file containing the latitude and longitude values for each fire along with a confidence value. We will use Python to read the contents of this file line by line and insert new point features into the FireIncidents feature class located in the C:\ArcpyBook\Ch8\WildfireData\WildlandFires.mdb personal geodatabase.
Close the file.
Open ArcCatalog. Navigate to C:\ArcpyBook\Ch8\WildfireData. You should see a personal geodatabase called WildlandFires. Open this geodatabase and you will see a point feature class called FireIncidents. Right now, this is an empty feature class. We will add features by reading the text file you examined earlier and inserting points.
Right-click on FireIncidents and select Properties. Click on the Fields tab. The latitude/longitude values found in the file we examined earlier will be imported into the SHAPE field, and the confidence values will be written to the CONFIDENCEVALUE field.
Open IDLE and create a new script.
Save the script to C:\ArcpyBook\Ch8\InsertWildfires.py.
Import the arcpy module:
import arcpy
Set the workspace:
arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb"
Open the text file and read all the lines into a list:
f = open("C:/ArcpyBook/Ch8/WildfireData/NorthAmericaWildfires_2007275.txt","r")
lstFires = f.readlines()
Start a try block:
try:
Create an InsertCursor object using a with block. Make sure you indent inside the try statement. The cursor will be created in the FireIncidents feature class:
with arcpy.da.InsertCursor("FireIncidents", ("SHAPE@XY", "CONFIDENCEVALUE")) as cur:
Create a counter variable that will be used to print the progress of the script:
cntr = 1
Loop through the text file line by line using a for loop.
Since the text file is comma-delimited, we'll use the Python split() function to separate each value into a list variable called vals. We'll then pull out the individual latitude, longitude, and confidence value items and assign them to variables. Finally, we'll place these values into a list variable called rowValue, which is then passed into the insertRow() method of the InsertCursor object, and we print a progress message:

for fire in lstFires:
    if 'Latitude' in fire:
        continue
    vals = fire.split(",")
    latitude = float(vals[0])
    longitude = float(vals[1])
    confid = int(vals[2])
    rowValue = [(longitude, latitude), confid]
    cur.insertRow(rowValue)
    print("Record number " + str(cntr) + " written to feature class")
    #arcpy.AddMessage("Record number " + str(cntr) + " written to feature class")
    cntr = cntr + 1

Add the except block to print any errors that may occur:

except Exception as e:
    print(e.message)

Add a finally block to close the text file:

finally:
    f.close()

The entire script should appear as follows:

import arcpy

arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb"
f = open("C:/ArcpyBook/Ch8/WildfireData/NorthAmericaWildfires_2007275.txt","r")
lstFires = f.readlines()
try:
    with arcpy.da.InsertCursor("FireIncidents", ("SHAPE@XY", "CONFIDENCEVALUE")) as cur:
        cntr = 1
        for fire in lstFires:
            if 'Latitude' in fire:
                continue
            vals = fire.split(",")
            latitude = float(vals[0])
            longitude = float(vals[1])
            confid = int(vals[2])
            rowValue = [(longitude, latitude), confid]
            cur.insertRow(rowValue)
            print("Record number " + str(cntr) + " written to feature class")
            #arcpy.AddMessage("Record number " + str(cntr) + " written to feature class")
            cntr = cntr + 1
except Exception as e:
    print(e.message)
finally:
    f.close()

You can check your work by examining the C:\ArcpyBook\code\Ch8\InsertWildfires.py solution file.

Save and run the script. You should see messages being written to the output window as the script runs:

Record number 406 written to feature class
Record number 407 written to feature class
Record number 408 written to feature class
Record number 409 written to feature class
Record number 410 written to feature class
Record number 411 written to feature class

Open ArcMap and add the FireIncidents feature class to the table of contents. The points should be visible, as shown in the following screenshot:

You may want to add a basemap to provide some reference for the data. In ArcMap, click on the Add Basemap button and select a basemap from the gallery.

How it works…

Some additional explanation may be needed here. The lstFires variable contains a list of all the wildfires that were contained in the comma-delimited text file. The for loop will loop through each of these records one by one, assigning each individual record to the fire variable. We also include an if statement that is used to skip the first record in the file, which serves as the header. As I explained earlier, we then pull out the individual latitude, longitude, and confidence value items from the vals variable, which is just a Python list object, and assign them to variables called latitude, longitude, and confid. We then place these values into a new list variable called rowValue, in the order that we defined when we created the InsertCursor. Thus, the latitude and longitude pair should be placed first, followed by the confidence value.
Finally, we call the insertRow() method on the InsertCursor object assigned to the cur variable, passing in the new rowValue variable. We close by printing a message that indicates the progress of the script, and also create the except and finally blocks to handle errors and close the text file. Placing the f.close() method in the finally block ensures that it will execute and close the file even if there is an error in the previous try statement.

Updating rows with UpdateCursor

If you need to edit or delete rows from a table or feature class, you can use UpdateCursor. As is the case with InsertCursor, the contents of UpdateCursor can be limited through the use of a where clause.

Getting ready

The UpdateCursor() function can be used to either update or delete rows in a table or feature class. The returned UpdateCursor object places a lock on the data while it's being edited or deleted. If the cursor is used inside a Python with statement, the lock will automatically be freed after the data has been processed. This hasn't always been the case: prior to ArcGIS 10.1, cursors had to be manually released using the Python del statement. Once an instance of UpdateCursor has been obtained, you can then call the updateRow() method to update records in tables or feature classes, and the deleteRow() method to delete a row.

In this article, you're going to write a script that uses an UpdateCursor to update each feature in the FireIncidents feature class, assigning a value of poor, fair, good, or excellent to a new field that is more descriptive of the confidence values. Prior to updating the records, your script will add the new field to the FireIncidents feature class.

How to do it…

Follow these steps to create an UpdateCursor object that will be used to edit rows in a feature class:

Open IDLE and create a new script.
Save the script to C:\ArcpyBook\Ch8\UpdateWildfires.py.
Import the arcpy module:
import arcpy
Set the workspace:
arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb"
Start a try block:
try:
Add a new field called CONFID_RATING to the FireIncidents feature class. Make sure to indent inside the try statement:
arcpy.AddField_management("FireIncidents","CONFID_RATING","TEXT","10")
print("CONFID_RATING field added to FireIncidents")
Create a new instance of UpdateCursor inside a with block:
with arcpy.da.UpdateCursor("FireIncidents", ("CONFIDENCEVALUE", "CONFID_RATING")) as cursor:
Create a counter variable that will be used to print the progress of the script. Make sure you indent this line of code and all the lines of code that follow inside the with block:
cntr = 1
Loop through each of the rows in the FireIncidents feature class.
Update the CONFID_RATING field according to the following guidelines:

Confidence value 0 to 40 = POOR
Confidence value 41 to 60 = FAIR
Confidence value 61 to 85 = GOOD
Confidence value 86 to 100 = EXCELLENT

This can be translated into the following block of code:

for row in cursor:
    # update the confid_rating field
    if row[0] <= 40:
        row[1] = 'POOR'
    elif row[0] > 40 and row[0] <= 60:
        row[1] = 'FAIR'
    elif row[0] > 60 and row[0] <= 85:
        row[1] = 'GOOD'
    else:
        row[1] = 'EXCELLENT'
    cursor.updateRow(row)
    print("Record number " + str(cntr) + " updated")
    cntr = cntr + 1

Add the except block to print any errors that may occur:

except Exception as e:
    print(e.message)

The entire script should appear as follows:

import arcpy

arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb"
try:
    #create a new field to hold the values
    arcpy.AddField_management("FireIncidents","CONFID_RATING","TEXT","10")
    print("CONFID_RATING field added to FireIncidents")
    with arcpy.da.UpdateCursor("FireIncidents", ("CONFIDENCEVALUE", "CONFID_RATING")) as cursor:
        cntr = 1
        for row in cursor:
            # update the confid_rating field
            if row[0] <= 40:
                row[1] = 'POOR'
            elif row[0] > 40 and row[0] <= 60:
                row[1] = 'FAIR'
            elif row[0] > 60 and row[0] <= 85:
                row[1] = 'GOOD'
            else:
                row[1] = 'EXCELLENT'
            cursor.updateRow(row)
            print("Record number " + str(cntr) + " updated")
            cntr = cntr + 1
except Exception as e:
    print(e.message)

You can check your work by examining the C:\ArcpyBook\code\Ch8\UpdateWildfires.py solution file.

Save and run the script. You should see messages being written to the output window as the script runs:

Record number 406 updated
Record number 407 updated
Record number 408 updated
Record number 409 updated
Record number 410 updated

Open ArcMap and add the FireIncidents feature class. Open the attribute table, and you should see that a new CONFID_RATING field has been added and populated by the UpdateCursor:

When you insert, update, or delete data in cursors, the changes are permanent and can't be undone if you're working outside an edit session. However, with the new edit session functionality provided by ArcGIS 10.1, you can now make these changes inside an edit session to avoid these problems. We'll cover edit sessions soon.

How it works…

In this case, we've used UpdateCursor to update each of the features in a feature class. We first used the Add Field tool to add a new field called CONFID_RATING, which will hold new values that we assign based on values found in another field. The groups are poor, fair, good, and excellent, and are based on numeric values found in the CONFIDENCEVALUE field. We then created a new instance of UpdateCursor based on the FireIncidents feature class, returning the two fields mentioned previously. The script then loops through each of the features and assigns a value of poor, fair, good, or excellent to the CONFID_RATING field (row[1]), based on the numeric value found in CONFIDENCEVALUE. A Python if/elif/else structure is used to control the flow of the script based on the numeric value. The value for CONFID_RATING is then committed to the feature class by passing the row variable into the updateRow() method.
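Deleting rows follows the same pattern. The deleteRow() method removes the row most recently fetched by the cursor, so it takes no arguments. The following is a minimal sketch that assumes the same FireIncidents feature class; the confidence threshold of 40 is purely illustrative, and remember that deletions made outside an edit session are permanent, so consider experimenting on a copy of the data:

import arcpy

arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb"

with arcpy.da.UpdateCursor("FireIncidents", ("CONFIDENCEVALUE",)) as cursor:
    for row in cursor:
        # remove any fire incident with a low confidence value
        if row[0] < 40:
            cursor.deleteRow()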
Summary

In this article, we studied how to retrieve features from a feature class with SearchCursor, filter records with a where clause, improve cursor performance with geometry tokens, insert rows with InsertCursor, and update rows with UpdateCursor.

Resources for Article: Further resources on this subject: Adding Graphics to the Map [article] Introduction to Mobile Web ArcGIS Development [article] Python functions – Avoid repeating code [article]

Hands-on with Prezi Mechanics

Packt
10 Aug 2015
8 min read
In this article by J.J. Sylvia IV, author of the book Mastering Prezi for Business Presentations - Second Edition, we will see how to edit lines and use styled symbols. We will also look at the Grouping feature and get a brief introduction to the Prezi text editor. (For more resources related to this topic, see here.)

Editing lines

When editing lines or arrows, you can change them from being straight to curved by dragging the center point in any direction: This is extremely useful when creating the line drawings we saw earlier. It's also useful to get arrows pointing at various objects on your canvas:

Styled symbols

If you're on a tight deadline, or trying to create drawings with shapes simply isn't for you, then the styles available in Prezi may be of more interest to you. These are common symbols that Prezi has created in a few different styles that can be easily inserted into any of your presentations. You can select these from the same Symbols & shapes… option from the Insert menu where we found the symbols. You'll see several different styles to choose from on the right-hand side of your screen. Each of these categories has similar symbols, but styled differently. There is a wide variety of symbols available, ranging from people to social media logos. You can pick a style that best matches your theme or the atmosphere you've created for your presentation. Instead of creating your own person from shapes, you can select from a variety of people symbols available:

Although these symbols can be very handy, you should be aware that you can't edit them as part of your presentation. If you decide to use one, note that it will work as it is: there are no new hairstyles for these symbols.

Highlighter

The highlighter tool is extremely useful for pointing out key pieces of information, such as an interesting fact. To use it, navigate to the Insert menu and select the Highlighter option. Then, just drag the cursor across the text you'd like to highlight. Once you've done this, the highlighter marks become objects in their own right, so you can click on them to change their size or position just as you would do for a shape. To change the color of your highlighter, you will need to go into the Theme Wizard and edit the RGB values. We'll cover how to do this later when we discuss branding.

Grouping

Grouping is a great feature that allows you to move or edit several different elements of your presentation at once. This can be especially useful if you're trying to reorganize the layout of your Prezi after it's been created, or to add animations to several elements at once. Let's go back to the drawing we created earlier to see how this might work:

The first way to group items is to hold down the Ctrl key (Command on Mac OS) and to left-click on each element you want to group individually. In this case, I need to click on each individual line that makes up the flat top hair in the preceding image. This might be necessary if I only want to group the hair, for example:

Another method for grouping is to hold down the Shift key while dragging your mouse to select multiple items at once. In the preceding screenshot, I've selected my entire person at once. Now, I can easily rotate, resize, or move the entire person at once, without having to move each individual line or shape. If you select a group of objects, move them, and then realize that a piece is missing because it didn't get selected, just press the Ctrl+Z (Command+Z on Mac OS) keys on your keyboard to undo the move.
Then, broaden your selection and try again. Alternatively, you can hold down the Shift key and simply click on the piece you missed to add it to the group. If we want to keep these elements grouped together instead of having to reselect them each time we decide to make a change, we can click on the Group button that appears with this change. Now these items will stay grouped unless we click on the new Ungroup button, now located in the same place as the Group button previously was:

You can also use frames to group material together. If you already created frames as part of your layout, this might make the grouping process even easier.

Prezi text editor

Over the years, the Prezi text editor has evolved to be quite robust, and it's now possible to easily do all of your text editing directly within Prezi.

Spell checker

When you spell something incorrectly, Prezi will underline the word it doesn't recognize with a red line. This is just as you would see it in Microsoft Word or any other text editor. To correct the word, simply right-click on it (or Command + Click on Mac OS) and select the word you meant to type from the suggestions, as shown in the following screenshot:

The text drag-apart feature

So a colleague of yours has just e-mailed you the text that they want to appear in the Prezi you're designing for them? That's great news, as it'll help you understand the flow of the presentation. What's frustrating, though, is that you'll have to copy and paste every single line or paragraph across to put it in the right place on your canvas. At least, that used to be the case before Prezi introduced the drag-apart feature in the text editor. This means you can now easily drag a selection of text anywhere on your canvas without having to rely on the copy and paste options. Let's see how we can easily change the text we spellchecked previously, as shown in the following screenshot:

In order to drag your text apart, simply highlight the area you require, hold the mouse button down, and then drag the text anywhere on your canvas. Once you have separated your text, you can then edit the separate parts as you would edit any other individual object on your canvas. In this example, we can change the size of the company name and leave the other text as it is, which we couldn't do within a single textbox:

Building Prezis for colleagues

If you've kindly offered to build a Prezi for one of your colleagues, ask them to supply the text for it in Word format. You'll be able to run a spellcheck on it from there before you copy and paste it into Prezi. Any bad spellings you miss will also get highlighted on your Prezi canvas, but it's good to use both options as a safety net.

Font colors

Other than dragging text apart to make it stand out more on its own, you might want to highlight certain words so that they jump out at your audience even more. The great news is that you can now highlight individual lines of text or single words and change their color. To do so, just highlight a word by clicking and dragging your mouse across it. Then, click on the color picker at the top of the textbox to see the color menu, as shown in the following screenshot:

Select any of the colors available in the palette to change the color of that piece of text. Nothing else in the textbox will be affected apart from the text you have selected. This gives you much greater freedom to use colored text in your Prezi design, and doesn't leave you restricted as in older versions of the software.
Choose the right color

To make good use of this feature, we recommend that you use a color that contrasts strongly with the rest of your design. For example, if your design and corporate colors are blue, we suggest you use red or purple to highlight key words. Also, once you pick a color, stick to it throughout the presentation so that your audience knows they're seeing a key piece of information.

Bullet points and indents

Bullets and indents make it much easier to put together your business presentations, and help you give the audience some short, simple information as text in the same format they're used to seeing in other presentations. This can be done by simply selecting the main body of text and clicking on the bullet point icon at the top of the textbox. This is a really simple feature, but a useful one nonetheless. We'd obviously like to point out that too much text on any presentation is a bad thing. Keep it short and to the point. Also, remember that too many bullets can kill a presentation.

Summary

In this article, we discussed the basic mechanics of Prezi. Learning to combine these tools in creative ways will help you move from a Prezi novice to a Prezi master. Shapes can be used creatively to create content and drawings, and can be grouped together for easy movement and editing. Prezi also offers the basic text editing features explained in this article.

Resources for Article: Further resources on this subject: Turning your PowerPoint into a Prezi [Article] The Fastest Way to Go from an Idea to a Prezi [Article] Using Prezi - The Online Presentation Software Tool [Article]
Exploring Jenkins

Packt
10 Aug 2015
7 min read
In this article by Mitesh Soni, the author of the book Jenkins Essentials, we are introduced to Jenkins. (For more resources related to this topic, see here.)

Jenkins is an open source application written in Java. It is one of the most popular continuous integration (CI) tools used to build and test different kinds of projects. In this article, we will have a quick overview of Jenkins, its essential features, and its impact on DevOps culture. Before we can start using Jenkins, we need to install it. This article provides a step-by-step guide to installing Jenkins. Installing Jenkins is a very easy task, and the steps differ across OS flavors. This article will also cover the DevOps pipeline. To be precise, we will discuss the following topics in this article:

Introduction to Jenkins and its features
Installation of Jenkins on Windows and the CentOS operating system
How to change configuration settings in Jenkins
What is the deployment pipeline

On your mark, get set, go!

Introduction to Jenkins and its features

Let's first understand what continuous integration is. CI is one of the most popular application development practices in recent times. Developers integrate bug fixes, new feature development, or innovative functionality into the code repository. The CI tool verifies the integration process with an automated build and automated test execution to detect issues with the current source of an application, and provides quick feedback.

Jenkins is a simple, extensible, and user-friendly open source tool that provides CI services for application development. Jenkins supports SCM tools such as StarTeam, Subversion, CVS, Git, AccuRev, and so on. Jenkins can build Freestyle, Apache Ant, and Apache Maven-based projects. The concept of plugins makes Jenkins more attractive, easy to learn, and easy to use. There are various categories of plugins available, such as Source code management, Slave launchers and controllers, Build triggers, Build tools, Build notifiers, Build reports, other post-build actions, External site/tool integrations, UI plugins, Authentication and user management, Android development, iOS development, .NET development, Ruby development, Library plugins, and so on.

Jenkins defines interfaces or abstract classes that model a facet of a build system. Interfaces or abstract classes define an agreement on what needs to be implemented; Jenkins uses plugins to extend those implementations. To learn more about all plugins, visit https://wiki.jenkins-ci.org/x/GIAL. To learn how to create a new plugin, visit https://wiki.jenkins-ci.org/x/TYAL. To download different versions of plugins, visit https://updates.jenkins-ci.org/download/plugins/.

Features

Jenkins is one of the most popular CI servers in the market. The reasons for its popularity are as follows:

Easy installation on different operating systems.
Easy upgrades: Jenkins has very speedy release cycles.
A simple and easy-to-use user interface.
Easily extensible with the use of third-party plugins: over 400 plugins.
Easy configuration of the setup environment in the user interface. It is also possible to customize the user interface to your liking.
The master-slave architecture supports distributed builds to reduce the load on the CI server.
Jenkins comes with a test harness built around JUnit; test results are available in graphical and tabular forms.
Build scheduling based on the cron expression (to know more about cron, visit http://en.wikipedia.org/wiki/Cron).
Shell and Windows command execution in prebuild steps.
Notification support related to the build status.

Installation of Jenkins on Windows and CentOS

Go to https://jenkins-ci.org/. Find the Download Jenkins section on the home page of Jenkins's website. Download the war file or native packages based on your operating system. A Java installation is needed to run Jenkins. Install Java based on your operating system and set the JAVA_HOME environment variable accordingly.

Installing Jenkins on Windows

Select the native package available for Windows. It will download jenkins-1.xxx.zip; in our case, it will download jenkins-1.606.zip. Extract it and you will get the setup.exe and jenkins-1.606.msi files. Click on setup.exe and perform the following steps in sequence:

On the welcome screen, click Next.
Select the destination folder and click on Next.
Click on Install to begin installation. Please wait while the Setup Wizard installs Jenkins.
Once the Jenkins installation is completed, click on the Finish button.

Verify the Jenkins installation on the Windows machine by opening the URL http://<ip_address>:8080 on the system where you have installed Jenkins.

Installation of Jenkins on CentOS

To install Jenkins on CentOS, download the Jenkins repository definition to your local system at /etc/yum.repos.d/ and import the key. Use the wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat/jenkins.repo command to download the repository definition. Now, run yum install jenkins; it will resolve dependencies and prompt for installation. Reply with y and it will download the required packages to install Jenkins on CentOS. Verify the Jenkins status by issuing the service jenkins status command. Initially, it will be stopped. Start Jenkins by executing service jenkins start in the terminal. Verify the Jenkins installation on the CentOS machine by opening the URL http://<ip_address>:8080 on the system where you have installed Jenkins.

How to change configuration settings in Jenkins

Click on the Manage Jenkins link on the dashboard to configure the system and security, and to manage plugins, slave nodes, credentials, and so on. Click on the Configure System link to configure Java, Ant, Maven, and other third-party products' related information. Jenkins uses Groovy as its scripting language. To execute an arbitrary script for administration, troubleshooting, or diagnostics on the Jenkins dashboard, go to the Manage Jenkins link on the dashboard, click on Script Console, and run println(Jenkins.instance.pluginManager.plugins). To verify the system log, go to the Manage Jenkins link on the dashboard and click on the System Log link, or visit http://localhost:8080/log/all. To get more information on third-party libraries (version and license information) in Jenkins, go to the Manage Jenkins link on the dashboard and click on the About Jenkins link.

What is the deployment pipeline?

The application development life cycle is traditionally a lengthy and manual process. In addition, it requires effective collaboration between development and operations teams. The deployment pipeline is a demonstration of the automation involved in the application development life cycle, covering automated build execution, automated test execution, notification to the stakeholders, and deployment in different runtime environments. Effectively, the deployment pipeline is a combination of CI and continuous delivery, and hence is a part of DevOps practices. The following diagram depicts the deployment pipeline process:

Members of the development team check code into a source code repository.
CI products such as Jenkins are configured to poll for changes in the code repository. Changes in the repository are downloaded to the local workspace, and Jenkins triggers an automated build process, which is assisted by Ant or Maven. Automated test execution or unit testing, static code analysis, reporting, and notification of a successful or failed build process are also part of the CI process. Once the build is successful, it can be deployed to different runtime environments, such as testing, preproduction, production, and so on. Deploying a war file, in terms of a JEE application, is normally the final stage in the deployment pipeline. One of the biggest benefits of the deployment pipeline is the faster feedback cycle. Identifying issues in the application at an early stage, with no dependence on manual effort, makes the entire end-to-end process more effective. To read more, visit http://martinfowler.com/bliki/DeploymentPipeline.html and http://www.informit.com/articles/article.aspx?p=1621865&seqNum=2.

Summary

Congratulations! We reached the end of this article, and so we have Jenkins installed on our physical or virtual machine. So far, we have covered the basics of CI and an introduction to Jenkins and its features. We completed the installation of Jenkins on the Windows and CentOS platforms. In addition to this, we discussed the deployment pipeline and its importance in CI.

Resources for Article: Further resources on this subject: Jenkins Continuous Integration [article] Running Cucumber [article] Introduction to TeamCity [article]

Updating and building our masters

Packt
10 Aug 2015
20 min read
In this article by John Henry Krahenbuhl, the author of the book Axure Prototyping Blueprints, we determine that, with modification, we can use all of the masters from the previous community site. To support our new use cases, we need additional registration variables, a master to support user registration, and interactions for creating, and commenting on, posts. Next, we will create global variables and add new masters, as well as enhance the design and interactions of each master. (For more resources related to this topic, see here.)

Creating additional global variables

Based on project requirements, we identified that nine global variables will be required. To create global variables, on the main menu click on Project and then click on Global Variables…. In the Global Variables dialog, perform the following steps:

Click the green + sign and type Email. Click on the Default Value field and type songwriter@test.com.
Repeat step 1 eight more times to create additional variables, using the following table for the Variable Name and Default Value fields (leave the Default Value blank where none is shown):

Variable Name     Default Value
Password          Grammy
UserEmail
UserPassword
LoggedIn          No
TopicIndex        0
UserText
NewPostTopic
NewPostHeadline

Click on OK.

With our global variables created, we are now ready to create new masters, as well as update the design and interactions for existing masters. We will start by adding masters to the Masters pane.

Adding masters to the Masters pane

We will add a total of two masters to the Masters pane. To create our masters, perform the following steps:

In the Masters pane, click on the Add Master icon, type PostCommentary, and press Enter.
Again, in the Masters pane, click on the Add Master icon, type NewPost, and press Enter.
In the same Masters pane, right-click on the icon next to the Header master, mouse over Drop Behavior, and click on Lock to Master Location.

We are now ready to remodel the existing masters and complete the design and interactions for our new masters. We will start with the Header master.

Enhancing our Header master

Once completed, the Header master will look as follows:

To update the Header master, we will add an ErrorMessage label, delete the Search widgets, and update the menu items. To update widgets on the Header master, perform the following steps:
Click on the PasswordTextField at coordinates (815,10). If text is displayed on the text field, right-click and click on Edit Text. All text on the widget will be highlighted, press Delete. In the Widget Properties and Style pane, with the Properties tab selected, scroll to Text Field and perform the following steps: Click on the drop-down menu next to Type and select Password. Next to Hint Text, enter Password. Click on Hint Style. In the Set Interaction Styles dialog box, click on the checkbox next to Font Color. Click on the down arrow next to the Text Color icon . In the drop-down menu, in the # text field, enter 999999. Click on OK. Click on the SearchTextField at coordinates (730,82) and then on Delete. Click on the SearchButton at coordinates (890,80) and then on Delete. Next, we will convert all the Log In widgets into a dynamic panel named LoginDP. The LoginDP will allow us to transition between states and show different content when a user logs in. To create the LoginDP, in our header, select the following widgets: Named Widget Coordinates ErrorMessage (730,0) EmailTextField (730,10) PasswordTextField (815,10) LogInButton (894,10) NewUserLink (730,30) ForgotLink (815,30) With the preceding six widgets selected, right-click and click Convert to Dynamic Panel. In the Widget Interactions and Notes pane, click on the Dynamic Panel Name field and type LogInDP. All the Log In widgets are now on State1 of the LogInDP. We will now add widgets to State2 for the LogInDP. With the Log In widgets converted into the LogInDP, we will now add and design State2. In the Widget Manager pane, under the LogInDP, right-click on State1, and in the menu, click on Add State. Click on the State icon beside  State2 twice, to open in the design area. Perform the following steps: In the Widgets pane, drag the Label widget  and place it at coordinates (0,13) and do the these steps: Type Welcome, email@test.com. In the Widget Interactions and Notes pane, click in the Shape Name field and type WelcomeLabel. In the Widget Properties and Style pane, with the Style tab selected scroll to Font, change the font size to 9, and click on the Italic icon . In the Widgets pane, drag the Button Shape widget  and place it at coordinates (164,10). Type Log Out. In the toolbar, change w: to 56 and h: to 16. In the Widget Interactions and Notes pane, click on the Shape Name field and type LogOutButton. To complete the design of the Header master, we need to rename the menu items on the HzMenu. In the Masters pane, double-click on the Header master to open in the design area. Click on the HzMenu at coordinates (250,80). Perform the following steps: Click on the first menu item and type Random Musings. In the Widget Interactions and Notes pane, click on the Menu Item Name field and type RandomMusingsMenuItem. Click on Case 1 under the OnClick event and press the Delete key. Click on Create Link…. In the pop-up sitemap, click on Random Musings. Again, click on the first menu item and type Accolades and News. In the Widget Interactions and Notes pane, click on the Menu Item Name field and type AccoladesMenuItem. Click on Case 1 under the OnClick event and press the Delete key. Click on Create Link…. In the pop-up sitemap, click on Accolades and News. Click on the first menu item and type About. In the Widget Interactions and Notes pane, click on the Menu Item Name field and type AboutMenuItem. Click on Case 1 under the OnClick event and press the Delete key. Click on Create Link…. In the pop-up sitemap, click on About. 
We will now create a registration lightbox that will be shown when the user clicks on the NewUserLink. To display a dynamic panel in a lightbox, we will use the OnShow action with the option treat as lightbox set. We will use the Registration dynamic panel's Pin to Browser property to have the dynamic panel shown in the center and middle of the window. Learn more at http://www.axure.com/learn/dynamic-panels/basic/lightbox-tutorial. In the Masters pane, double-click on the icon  next to the Header master to open in the design area. In the Widgets pane, drag the Dynamic Panel widget  and place it at coordinates (310,200). In the toolbar, change w: to 250, h: to 250, and click on the Hidden checkbox. In the Widget Interactions and Notes pane, click on the Dynamic Panel Name field and type RegistrationLightBoxDP. In the Widget Manager pane with the Properties tab selected, click on Pin to Browser. In the Pin to Browser dialog box, click on the checkbox next to Pin to browser window and click on OK. In the Widget Manager pane, under the RegistrationLightBoxDP, click on the State icon  beside State1 twice to open in the design area. In the Widgets pane, drag the Rectangle widget  and place it at coordinates (0,0). In the Widget Interactions and Notes pane, click on the Shape Name field and type BackgroundRectangle. In the toolbar, change w: to 250 and h: to 250. Again in the Widgets pane, drag the Heading2 widget  and place it at coordinates (25,20). With the Heading2 widget selected, type Registration. In the toolbar, change w: to 141 and h: to 28. In the Widget Interactions and Notes pane, click on the Shape Name field and type RegistrationHeading. Repeat steps 8-10 to complete the design of the RegistrationLightBoxDP using the following table (* if applicable): Widget Coordinates Text* (Shown on Widget) Width* (w:) Height* (h:) Name field (In the Widget Interactions and Notes pane)   Label (25,67) Enter Email     EnterEmailLabel   Text Field (25,86)       EnterEmailField   Label (25,121) Enter Password     EnterPasswordLabel   Text Field (25,140)       EnterPasswordField   Button Shape (25,190) Submit 200 30 SubmitButton Click on the EnterEmailField text field at coordinates (25,86). In the Widget Properties and Style pane, with the Properties tab selected, scroll to Text Field and perform the following steps: Next to Hint Text, enter Email. Click on Hint Style. In the Set Interaction Styles dialog box, click on the checkbox next to Font Color. Click on the down arrow next to the Text Color icon . In the drop-down menu, in the # text field, enter 999999. Click on OK. Click on the EnterPasswordField text field at coordinates (25,140). In the Widget Properties and Style pane, with the Properties tab selected, scroll to Text Field and perform the following steps: Click on the drop-down menu next to Type and select Password. Next to Hint Text, enter Password. Click on Hint Style. In the Set Interaction Styles dialog box, click on the checkbox next to Font Color. Click on the down arrow next to the Text Color icon . In the drop-down menu, in the # text field, enter 999999. Click on OK. With the updates completed for the Header master, we are now ready to define the interactions. Refining the interactions for our Header master We will need to add additional interactions for Log In and Registration on our Header master. 
Interactions with our Header master will be triggered by the following named widgets and events:

Dynamic Panel            State    Widget         Event
LoginDP                  State1   LoginButton    OnClick
LoginDP                  State1   NewUserLink    OnClick
LoginDP                  State1   ForgotLink     OnClick
LoginDP                  State2   LogOutButton   OnClick
RegistrationLightBoxDP   State1   SubmitButton   OnClick

We will now define the interactions for each widget, starting with the LoginButton.

Defining interactions for the LoginButton

When the LoginButton is clicked, the OnClick event will evaluate whether the text entered in the EmailTextField and PasswordTextField equals the Email and Password variable values. If the values match, the LoginDP will be set to State2 and the text on the WelcomeLabel will be updated. If the variable values do not match, we will show an error message. We will define these actions by creating two cases: ValidateUser and ShowErrorMessage.

Validating the user's email and password

To define the ValidateUser case for the OnClick interaction, open the LogInDP State1 in the design area. Click on the LogInButton at coordinates (164,10). In the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ValidateUser. In the Case Editor dialog, perform the following steps:

You will see the Condition Builder window similar to the one shown in the following screenshot after the first and second conditions are defined:

Create the first condition. Click on the Add Condition button. In the Condition Builder dialog box, in the outlined condition box, perform the following steps: In the first dropdown, select text on widget. In the second dropdown, select EmailTextField. In the third dropdown, select equals. In the fourth dropdown, select value. In the fifth dropdown, select [[Email]]. Click the green + sign.
Create the second condition. Click on the Add Condition button. In the Condition Builder dialog box, in the outlined condition box, perform the following steps: In the first dropdown, select text on widget. In the second dropdown, select PasswordTextField. In the third dropdown, select equals. In the fourth dropdown, select value. In the fifth dropdown, select [[Password]]. Click on OK.

Once the following three actions are defined, you should see the Case Editor similar to the one shown in the following screenshot:

Create the first action. To set the panel state for the LogInDP dynamic panel, perform the following steps: Under Click to add actions, scroll to the Dynamic Panels drop-down menu and click on Set Panel State. Under Configure actions, click on the checkbox next to LoginDP. Next to Select the state, click on the dropdown and select State2.
Create the second action. To set the text for the WelcomeLabel, perform the following steps: Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text. Under Configure actions, click on the checkbox next to WelcomeLabel. Under Set text to, click on the dropdown and select value. In the text field, enter Welcome, [[Email]].
Create the third action. To set the value of the LoggedIn variable, perform the following steps: Under Click to add actions, scroll to the Variables drop-down menu and click on Set Variable Value. Under Configure actions, click on the checkbox next to LoggedIn. Under Set variable to, click on the first dropdown and click on value. In the text field, enter [[Email]].
Click on OK.

With the ValidateUser case completed, next we will create the ShowErrorMessage case.
Creating the ShowErrorMessage case

To create the ShowErrorMessage case, in the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowErrorMessage. Create the action. To show the ErrorMessage label, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Show.
Under Configure actions, under the LoginDP dynamic panel, click on the checkbox next to ErrorMessage.
Click on OK.

Next, we will enable the interaction for the NewUserLink.

Enabling interaction for the NewUserLink

When the NewUserLink is clicked, the OnClick event will show the RegistrationLightBoxDP dynamic panel as a lightbox, as shown in the following screenshot:

With the LoginDP State1 still open in the design area, click on the NewUserLink at coordinates (0,30). To enable the OnClick event, in the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowLightBox. Now, create the action; to show the RegistrationLightBoxDP, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Show.
Under Configure actions, click on the checkbox next to RegistrationLightBoxDP.
Next, go to More options, click on the dropdown, and select treat as lightbox.
Click on OK.

Next, we will activate interactions for the ForgotLink.

Activating interactions for the ForgotLink

When the ForgotLink is clicked, the OnClick event will show the RegistrationLightBoxDP dynamic panel as a lightbox, the RegistrationHeading text will be updated to display Forgot Password?, and the EnterPasswordLabel, as well as the EnterPasswordField, will be hidden. To enable the OnClick event, in the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowForgotLB. In the Case Editor dialog, perform the following steps:

Create the first action; to show the RegistrationLightBoxDP, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Show.
Under Configure actions, click on the checkbox next to RegistrationLightBoxDP.
Next, go to More options, click on the dropdown, and select treat as lightbox.

Create the second action; to set the text for the RegistrationHeading, perform the following steps:
Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text.
Under Configure actions, click on the checkbox next to RegistrationHeading.
Under Set text to, click on the dropdown and select value. In the text field, enter Forgot Password?.

Create the third action; to hide the EnterPasswordLabel and EnterPasswordField, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Hide.
Under Configure actions, under RegistrationLightBoxDP, click on the checkboxes next to EnterPasswordLabel and EnterPasswordField.
Click on OK.

We have now completed the interactions for State1 of LoginDP. Next, we will facilitate interactions for the LogOutButton.
Facilitating interactions for the LogOutButton

When the LogOutButton is clicked, the OnClick event will perform the following actions:
Hide the ErrorMessage on the LoginDP State1
Set the text for the PasswordTextField and EmailTextField
Set the panel state for LoginDP to State1
Set the variable value for LoggedIn

To enable the OnClick event, open the LoginDP State2 in the design area. Click on the LogOutButton at coordinates (164,10). In the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type LogOut. In the Case Editor dialog, perform the following steps:

Create the first action; to hide the ErrorMessage, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Hide.
Under Configure actions, under LoginDP, click on the checkbox next to ErrorMessage.

Create the second action; to set the text for the PasswordTextField and EmailTextField, perform the following steps:
Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text.
Under Configure actions, click on the checkbox next to PasswordTextField. Under Set text to, click on the dropdown and select value. In the text field, clear any text shown.
Under Configure actions, click on the checkbox next to EmailTextField. Under Set text to, click on the dropdown and select value. In the text field, enter Email.

Create the third action; to set the panel state for the LoginDP dynamic panel, perform the following steps:
Under Click to add actions, scroll to the Dynamic Panels drop-down menu and click on Set Panel State.
Under Configure actions, click on the checkbox next to LoginDP.
Next to Select the state, click on the dropdown and select State1.

Create the fourth action. To set the variable value of LoggedIn, perform the following steps:
Under Click to add actions, scroll to the Variables drop-down menu and click on Set Variable Value.
Under Configure actions, click on the checkbox next to LoggedIn.
Under Set variable to, click on the first dropdown and click on value. In the text field, enter No.
Click on OK.

We have now completed the interactions for State2 of the LoginDP. Next, we will construct interactions for the RegistrationLightBoxDP.

Constructing interactions for the RegistrationLightBoxDP

When the SubmitButton is clicked, the OnClick event hides the RegistrationLightBoxDP and sets the Email and Password variable values to the text entered in the EnterEmailField and EnterPasswordField. Also, if the text on the RegistrationHeading label is equal to Registration, LoginDP will be set to State2. We will define these actions by creating two cases: UpdateVariables and ShowLogInState.

Updating Variables and hiding the RegistrationLightBoxDP

In the Widget Manager pane, double-click on the RegistrationLightBoxDP State1 to open it in the design area. To define the UpdateVariables case for the OnClick interaction, click on the SubmitButton at coordinates (25,190). In the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type UpdateVariables. In the Case Editor dialog, perform the following steps. The following screenshot shows the Case Editor with the actions defined:

Create the first action; to set the variable values for the Email and Password variables, perform the following steps:
Under Click to add actions, scroll to the Variables drop-down menu and click on Set Variable Value.
Under Configure actions, click on the checkbox next to Email. Under Set variable to, click on the first dropdown and select text on widget. Click on the second dropdown and select EnterEmailField.
Under Configure actions, click on the checkbox next to Password. Under Set variable to, click on the first dropdown and select text on widget. Click on the second dropdown and select EnterPasswordField.

Create the second action; to hide the RegistrationLightBoxDP, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Hide.
Under Configure actions, click on the checkbox next to RegistrationLightBoxDP.
Click on OK.

With the UpdateVariables case completed, next we will create the ShowLogInState case.

Creating the ShowLogInState case

To create the ShowLogInState case, in the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowLogInState. In the Case Editor dialog, perform the following steps:

Click on the Add Condition button to create the first condition. In the Condition Builder dialog box, go to the outlined condition box and perform the following steps:
In the first dropdown, select text on widget.
In the second dropdown, select RegistrationHeading.
In the third dropdown, select equals.
In the fourth dropdown, select value.
In the fifth dropdown, select Registration.
Click on OK.

Create the first action; to set the text for the WelcomeLabel, perform the following steps:
Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text.
Under Configure actions, click on the checkbox next to WelcomeLabel.
Under Set text to, click on the dropdown and select value. In the text field, enter Welcome, [[Email]]. Click on OK.

Create the second action; to set the panel state for the LoginDP dynamic panel, perform the following steps:
Under Click to add actions, scroll to the Dynamic Panels drop-down menu and click on Set Panel State.
Under Configure actions, click on the checkbox next to LoginDP.
Next to Select the state, click on the dropdown and select State2.

Create the third action; to set the value of the LoggedIn variable, perform the following steps:
Under Click to add actions, scroll to the Variables drop-down menu and click on Set Variable Value.
Under Configure actions, click on the checkbox next to LoggedIn.
Under Set variable to, click on the first dropdown and click on value. In the text field, enter [[Email]].
Click on OK.

Under the OnClick event, right-click on the ShowErrorMessage case and click on Toggle IF/ELSE IF. With our Header master updated, we are now ready to refresh data for our Forum repeater.

Summary

We learned how to leverage masters and pages from our community site to create a new blog site. We enhanced the Header master and refined the interactions for our Header master.

Resources for Article:

Further resources on this subject:
Home Page Structure [article]
Axure RP 6 Prototyping Essentials: Advanced Interactions [article]
Common design patterns and how to prototype them [article]
Integrating with Muzzley

Packt
10 Aug 2015
12 min read
In this article by Miguel de Sousa, author of the book Internet of Things with Intel Galileo, we will cover the following topics:

Wiring the circuit
The Muzzley IoT ecosystem
Creating a Muzzley app
Lighting up the entrance door

(For more resources related to this topic, see here.)

One identified issue regarding IoT is that there will be lots of connected devices, and each one speaks its own language, not sharing the same protocols with other devices. This leads to an increasing number of apps to control each of those devices. Every time you purchase connected products, you'll be required to have the exclusive product app, and in the near future, where it is predicted that more devices will be connected to the Internet than people, this is indeed a problem, known as the basket of remotes. Many solutions have been appearing for this problem. Some of them focus on creating common communication standards between the devices, or even on creating their own protocols, such as the Intel Common Connectivity Framework (CCF). A different approach consists of predicting the devices' interactions, where collected data is used to predict and trigger actions on specific devices. An example using this approach is Muzzley. It not only supports a common way to speak with the devices, but also learns from the users' interaction, allowing them to control all their devices from a common app, and, by collecting usage data, it can predict users' actions and even make different devices work together.

In this article, we will start by understanding what Muzzley is and how we can integrate with it. We will then do some development to allow you to control your own building's entrance door. For this purpose, we will use Galileo as a bridge to communicate with a relay and the Muzzley cloud, allowing you to control the door from a common mobile app and from anywhere, as long as there is Internet access.

Wiring the circuit

In this article, we'll be using a real home AC inter-communicator with a building entrance door unlock button, and this will require you to do some homework. This integration will require you to open your inter-communicator and adjust the inner circuit, so be aware that there are always risks of damaging it. If you don't want to use a real inter-communicator, you can replace it with an LED or even with the buzzer module. If you want to use a real device, you can use a DC inter-communicator, but in this guide, we'll only be explaining how to do the wiring using an AC inter-communicator. The first thing you have to do is take a look at the device manual and check whether it works with AC current, and the voltage it requires. If you can't locate your product manual, search for it online. In this article, we'll be using the solid state relay. This relay accepts a voltage range from 24 V up to 380 V AC, and your inter-communicator should also work in this voltage range. You'll also need some electrical wires and electrical wire junctions:

Wire junctions and the solid state relay

This equipment will be used to adapt the door unlocking circuit to allow it to be controlled from the Galileo board using a relay. The main idea is to use a relay to close the door opener circuit, resulting in the door being unlocked. This can be accomplished by joining the inter-communicator switch wires with the relay wires.
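Before touching the wiring, it is worth seeing how little code the Galileo side needs. The following Arduino-style sketch is only a test harness, not the final Muzzley integration: it periodically pulses the relay pin so you can verify the unlock circuit once the wiring is done. Pin 13 matches the wiring described in the next section; the two-second pulse width and the ten-second pause are assumptions that you should tune to your own door lock:

// Test sketch (not the final Muzzley integration): pulse the relay so the
// door strike releases for a moment. Pin 13 matches the wiring described
// below; the 2-second pulse width is an assumed value -- tune it for your lock.
const int RELAY_PIN = 13;
const unsigned long PULSE_MS = 2000;

void setup() {
  pinMode(RELAY_PIN, OUTPUT);
  digitalWrite(RELAY_PIN, LOW);   // relay open: door stays locked
}

void loop() {
  digitalWrite(RELAY_PIN, HIGH);  // relay closed: door unlocks
  delay(PULSE_MS);
  digitalWrite(RELAY_PIN, LOW);   // lock again
  delay(10000);                   // wait 10 s before the next test pulse
}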
Use some wire and wire junctions to join the switch and relay wires, as displayed in the following image:

Wiring the circuit

The building/house AC circuit is represented in yellow, and S1 and S2 represent the inter-communicator switch (button). On pressing the button, we will also be closing this circuit, and the door will be unlocked. This way, the lock can be controlled both ways: using the original button and using the relay. Before starting to wire the circuit, make sure that the inter-communicator circuit is powered off. If you can't switch it off, you can always turn off your house's electrical board for a couple of minutes. Make sure that it is powered off by pressing the unlock button and trying to open the door. If you are not sure of what you must do, or don't feel comfortable doing it, ask for help from someone more experienced.

Open your inter-communicator, locate the switch, and perform the changes displayed in the preceding image (you may have to do some soldering). The Intel Galileo board will then activate the relay using pin 13, which you should wire to the relay's connector number 3, and the Galileo's ground (GND) should be connected to the relay's connector number 4. Beware that not all inter-communicator circuits work the same way, and although we try to provide a general way to do it, there is always a risk of damaging your device or being electrocuted. Do it at your own risk.

Power on your inter-communicator circuit and check whether you can open the door by pressing the unlock door button. If you prefer not to use the inter-communicator with the relay, you can always replace it with a buzzer or an LED to simulate the door opening. Also, since the relay is connected to Galileo's pin 13, with the same relay code, you'll have visual feedback from the Galileo's onboard LED.

The Muzzley IoT ecosystem

Muzzley is an Internet of Things ecosystem that is composed of connected devices, mobile apps, and cloud-based services. Devices can be integrated with Muzzley through the device cloud or the device itself:

It offers device control, a rules system, and a machine learning system that predicts and suggests actions based on device usage. The mobile app is available for Android, iOS, and Windows Phone. It can pack all your Internet-connected devices into a single common app, allowing them to be controlled together and to work with other devices that are available in real-world stores, or even other homemade connected devices, like the one we will create in this article. Muzzley is known for being one of the first-generation platforms with the ability to predict a user's actions by learning from the user's interaction with their own devices. Human behavior is mostly unpredictable, but for convenience, people end up creating routines in their daily lives. The interaction with home devices is an example where human behavior can be observed and learned by an automated system. Muzzley tries to take advantage of these behaviors by identifying the user's recurrent routines and making suggestions that could accelerate and simplify the interaction with the mobile app and devices. Devices that don't know of each other's existence get connected through the user's behavior and may create synergies among themselves. When the user starts using the Muzzley app, the interaction is observed by a profiler agent that tries to acquire a behavioral network of linked cause-effect events.
When the frequency of these network associations becomes important enough, the profiler agent emits a suggestion for the user to act upon. For instance, if every time a user arrives home, he switches on the house lights, checks the thermostat, and adjusts the air conditioner accordingly, the profiler agent will emit a set of suggestions based on this. The cause of the suggestion is identified, and shortcuts are offered for the effect-associated action. For instance, the user could receive the following suggestions in the Muzzley app: "You are arriving at a known location. Every time you arrive here, you switch on the «Entrance bulb». Would you like to do it now?"; or "You are arriving at a known location. The thermostat «Living room» says that the temperature is at 15 degrees Celsius. Would you like to set your «Living room» air conditioner to 21 degrees Celsius?"

When it comes to security and privacy, Muzzley takes them seriously, and all the collected data is used exclusively to analyze behaviors to help make your life easier. This is the system into which we will be integrating our door unlocker.

Creating a Muzzley app

The first step is to own a Muzzley developer account. If you don't have one yet, you can obtain one by visiting https://muzzley.com/developers, clicking on the Sign up button, and submitting the displayed form. To create an app, click on the top menu option Apps and then Create app. Name your app Galileo Lock and, if you want to, add a description to your project. As soon as you click on Submit, you'll see two buttons displayed, allowing you to select the integration type:

Muzzley allows you to integrate through the product manufacturer's cloud or directly with a device. In this example, we will be integrating directly with the device. To do so, click on Device to Cloud integration. Fill in the provider name as you wish and pick two image URLs to be used as the profile (for example, http://hub.packtpub.com/wp-content/uploads/2015/08/Commercial1.jpg) and channel (for example, http://hub.packtpub.com/wp-content/uploads/2015/08/lock.png) images. We can select one of two available ways to add our device: it can be done using UPnP discovery or by inserting a custom serial number. Pick the device discovery option Serial number and ignore the Interface UUID and Email Access List fields; we will come back to them later. Save your changes by pressing the Save changes button.

Lighting up the entrance door

Now that we can unlock our door from anywhere using a mobile phone with an Internet connection, a nice thing to have is the entrance lights turning on when you open the building door using your Muzzley app. To do this, you can use the Muzzley workers to define rules that perform an action when the door is unlocked using the mobile app. You'll need to own one of the Muzzley-enabled smart bulbs, such as Philips Hue, WeMo LED Lighting, Milight, Easybulb, or LIFX. You can find all the enabled devices in the app profiles selection list:

If you don't have those specific lighting devices but have another type of connected device, search the available list to see whether it is supported. If it is, you can use that instead. Add your bulb channel to your account. You should now find it listed in your channels under the category Lighting. If you click on it, you'll be able to control the lights. To activate the trigger option in the lock profile we created previously, go to the Muzzley website and head back to the Profile Spec app, located inside App Details.
Expand the property lock status by clicking on the arrow sign in the property #1 - Lock Status section, and then expand the controlInterfaces section. Create a new control interface by clicking on the +controlInterface button. In the new controlInterface #1 section, we'll need to define the possible label-value choices for this property when setting a rule. Feel free to insert an id, and in the control interface option, select the text-picker option. In the config field, we'll need to specify each of the available options, setting the display label and the real value that will be published. Insert the following JSON object:

{"options":[{"value":"true","label":"Lock"}, {"value":"false","label":"Unlock"}]}

Now we need to create a trigger. In the profile spec, expand the trigger section. Create a new trigger by clicking on the +trigger button. Inside the newly created section, select the equals condition. Create an input by clicking on +input, insert the ID value, and insert the ID of the control interface you have just created in the controlInterfaceId text field. Finally, add the path [{"source":"selection.value","target":"data.value"}] to map the data.

Open your mobile app and click on the workers icon. Clicking on Create Worker will display the worker creation menu, where you'll be able to select a channel component property as a trigger for some other channel component property:

Select the lock and select the Lock Status is equal to Unlock trigger. Save it and select the action button. In here, select the smart bulb you own and select the Status On option:

After saving this rule, give it a try and use your mobile phone to unlock the door. The smart bulb should then turn on. With this, you can configure many things in your home even before you arrive there. In this specific scenario, we used our door locker as a trigger to accomplish an action on a lightbulb. If you want, you can do the opposite and open the door when a lightbulb lights up a specific color, for instance. To do it, similar to how you configured your device trigger, you just have to set up the action options in your device profile page.

Summary

Everyday objects that surround us are being transformed into information ecosystems, and the way we interact with them is slowly changing. Although IoT is growing fast, it is nowadays at an early stage, and many issues must be solved in order to make it successfully scalable. By 2020, it is estimated that there will be more than 25 billion devices connected to the Internet. This fast growth, without security regulations and deep security studies, is leading to major concerns regarding the two biggest IoT challenges: security and privacy. Devices in our home that are remotely controllable, or personal data getting into the wrong hands, could be the recipe for a disaster. In this article, you have learned the basic steps of wiring the circuit of your Galileo board, creating a Muzzley app, and lighting up the entrance door of your building through your Muzzley app, using the Intel Galileo board as a bridge to communicate with the Muzzley cloud.

Resources for Article:

Further resources on this subject:
Getting Started with Intel Galileo [article]
Getting the current weather forecast [article]
Controlling DC motors using a shield [article]
Controls and Widgets

Packt
10 Aug 2015
25 min read
In this article by Chip Lambert and Shreerang Patwardhan, authors of the book Mastering jQuery Mobile, we will take our Civic Center application to the next level, and in the process of doing so, we will explore different widgets. We will explore the touch events provided by the jQuery Mobile framework further and then take a look at how this framework interacts with third-party plugins. We will be covering the following widgets and topics in this article:

Collapsible widget
Listview widget
Range slider widget
Radio button widget
Touch events
Third-party plugins
HammerJS
FastClick
Accessibility

(For more resources related to this topic, see here.)

Widgets

We have already made use of widgets as part of the Civic Center application. "Which? Where? When did that happen? What did I miss?" Don't panic, as you have missed nothing at all. All the components that we use as part of the jQuery Mobile framework are widgets. The page, buttons, and toolbars are all widgets. So what do we understand about widgets from their usage so far? One thing is pretty evident: widgets are feature-rich, and they have a lot of things that are customizable and that can be tweaked as per the requirements of the design. These customizable things are pretty much the methods and events that these small plugins offer to developers. So, all in all: widgets are feature-rich, stateful plugins that have a complete lifecycle, along with methods and events.

We will now explore a few widgets, as discussed before, and we will start off with the collapsible widget. A collapsible widget, more popularly known as the accordion control, is used to display and style a cluster of related content together so that it is easily accessible to the user. Let's see this collapsible widget in action. Pull up the index.html file. We will be adding the collapsible widget to the facilities page. You can jump directly to the content div of the facilities page. We will replace the simple-looking, unordered list and add the collapsible widget in its place. Add the following code in place of the <ul>...<li></li>...</ul> portion:

<div data-role="collapsibleset">
    <div data-role="collapsible">
        <h3>Banquet Halls</h3>
        <p>List of banquet halls will go here</p>
    </div>
    <div data-role="collapsible">
        <h3>Sports Arena</h3>
        <p>List of sports arenas will go here</p>
    </div>
    <div data-role="collapsible">
        <h3>Conference Rooms</h3>
        <p>List of conference rooms will come here</p>
    </div>
    <div data-role="collapsible">
        <h3>Ballrooms</h3>
        <p>List of ballrooms will come here</p>
    </div>
</div>

That was pretty simple. As you must have noticed, we are creating a group of collapsibles defined by a div with data-role="collapsibleset". Inside this div, we have multiple div elements, each with a data-role of "collapsible". These data roles instruct the framework to style each div as a collapsible. Let's break the individual collapsibles down further. Each collapsible div has to have a heading tag (h1-h6), which acts as the title for that collapsible. This heading can be followed by any HTML structure that is required as per your application's design. In our application, we added a paragraph tag with some dummy text for now. We will soon be replacing this text with another widget: listview. Before we proceed to look at how we will be doing this, let's see what the facilities page looks like right now:
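While we are on the subject of collapsibles, their look is easy to tweak through jQuery Mobile's standard data attributes. The following variation is not part of our Civic Center code; it is a sketch showing a panel that starts expanded and swaps the default plus/minus icons for carats, using icon names from jQuery Mobile 1.4's standard icon set:

<!-- Illustrative variation: starts expanded, with carat icons instead of plus/minus -->
<div data-role="collapsible" data-collapsed="false"
     data-collapsed-icon="carat-d" data-expanded-icon="carat-u">
    <h3>Banquet Halls</h3>
    <p>This panel starts expanded and uses carat icons.</p>
</div>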
Now let's take a look at another widget that we will include in our project: the listview widget. The listview widget is a very important widget from the mobile website standpoint. It is highly customizable and can play an important role in the navigation system of your web application as well. In our application, we will include a listview within the collapsible div elements that we have just created. Each collapsible will hold the relevant list items, which can be linked to a detailed page for each item. Without further discussion, let's take a look at the following code. We have replaced the contents of the first collapsible item within the paragraph tag with the code to include the listview widget. We will break up the code and discuss the minute details later:

<div data-role="collapsible">
    <h3>Banquet Halls</h3>
    <p>
        <span>We have 3 huge banquet halls named after 3 of the most celebrated chefs from across the world.</span>
        <ul data-role="listview" data-inset="true">
            <li>
                <a href="#">Gordon Ramsay</a>
            </li>
            <li>
                <a href="#">Anthony Bourdain</a>
            </li>
            <li>
                <a href="#">Sanjeev Kapoor</a>
            </li>
        </ul>
    </p>
</div>

That was pretty simple, right? We replaced the dummy text from the paragraph tag with a span that has some details concerning what that collapsible list is about, and then we have an unordered list with data-role="listview" and a property called data-inset="true". We have seen several data-roles before, and this one is no different. This data-role attribute informs the framework to style the unordered list's items as tappable buttons, while the data-inset property informs the framework to apply the inset appearance to the list items. Without this property, the list items would stretch from edge to edge on the mobile device. Try setting the data-inset property to false or removing the property altogether; you will see the results for yourself. Another thing worth noticing in the preceding code is that we have included an anchor tag within the li tags. This anchor tag informs the framework to add a right arrow icon on the extreme right of that list item. Again, this icon is customizable, along with its position and other styling attributes. Right now, our facilities page should appear as seen in the following image:

We will now add similar listview widgets within the remaining three collapsible items. The content for the next collapsible item, titled Sports Arena, should be as follows. Once added, this collapsible item, when expanded, should look as seen in the screenshot that follows the code:

<div data-role="collapsible">
    <h3>Sports Arena</h3>
    <p>
        <span>We have 3 huge sports arenas named after 3 of the most celebrated sports personalities from across the world.</span>
        <ul data-role="listview" data-inset="true">
            <li>
                <a href="#">Sachin Tendulkar</a>
            </li>
            <li>
                <a href="#">Roger Federer</a>
            </li>
            <li>
                <a href="#">Usain Bolt</a>
            </li>
        </ul>
    </p>
</div>

The code for the listview widget that should be included in the next collapsible item, titled Conference Rooms, follows. Once added, this collapsible item, when expanded, should look as seen in the image that follows the code:

<div data-role="collapsible">
    <h3>Conference Rooms</h3>
    <p>
        <span>
            We have 3 huge conference rooms named after the 3 largest technology companies.
        </span>
        <ul data-role="listview" data-inset="true">
            <li>
                <a href="#">Google</a>
            </li>
            <li>
                <a href="#">Twitter</a>
            </li>
            <li>
                <a href="#">Facebook</a>
            </li>
        </ul>
    </p>
</div>

The final collapsible item, Ballrooms, should hold the following code to include its share of the listview items:

<div data-role="collapsible">
    <h3>Ballrooms</h3>
    <p>
        <span>
            We have 3 huge ballrooms named after 3 different dance styles from across the world.
        </span>
        <ul data-role="listview" data-inset="true">
            <li>
                <a href="#">Ballet</a>
            </li>
            <li>
                <a href="#">Kathak</a>
            </li>
            <li>
                <a href="#">Paso Doble</a>
            </li>
        </ul>
    </p>
</div>

After adding these listview items, our facilities page should look as seen in the following image:

The facilities page now looks much better than it did earlier, and we now understand two more very important widgets available in jQuery Mobile: the collapsible widget and the listview widget. We will now explore two form widgets: the slider widget and the radio buttons widget. For this, we will be enhancing our catering page. Let's build a simple tool that will help the visitors of this site estimate the food expense based on the number of guests and the type of cuisine they choose. Let's get started then. First, we will add the required HTML to include the slider widget and the radio buttons widget. Scroll down to the content div of the catering page, where we have the paragraph tag containing some text about the Civic Center's catering services. Add the following code after the paragraph tag:

<form>
    <label style="font-weight: bold; padding: 15px 0px;" for="slider">Number of guests</label>
    <input type="range" name="slider" id="slider" data-highlight="true" min="50" max="1000" value="50">
    <fieldset data-role="controlgroup" id="cuisine-choices">
        <legend style="font-weight: bold; padding: 15px 0px;">Choose your cuisine</legend>
        <input type="radio" name="cuisine-choice" id="cuisine-choice-cont" value="35" checked="checked" />
        <label for="cuisine-choice-cont">Continental</label>
        <input type="radio" name="cuisine-choice" id="cuisine-choice-mex" value="12" />
        <label for="cuisine-choice-mex">Mexican</label>
        <input type="radio" name="cuisine-choice" id="cuisine-choice-ind" value="14" />
        <label for="cuisine-choice-ind">Indian</label>
    </fieldset>
    <p>
        The approximate cost will be: <span style="font-weight: bold;" id="totalCost"></span>
    </p>
</form>

That is not much code, but we are adding and initializing two new form widgets here. Let's take a look at the code in detail:

<label style="font-weight: bold; padding: 15px 0px;" for="slider">Number of guests</label>
<input type="range" name="slider" id="slider" data-highlight="true" min="50" max="1000" value="50">

We are initializing our first form widget here: the slider widget. The slider widget is an input element of the type range, which accepts a minimum value, a maximum value, and a default value. We will be using this slider to accept the number of guests. Since the Civic Center can cater to a maximum of 1,000 people, we will set the maximum limit to 1,000, and since we expect at least 50 guests, we set a minimum value of 50.
Since the minimum number of guests that we cater for is 50, we set the input's default value to 50. We also set the data-highlight attribute's value to true, which informs the framework that the selected area on the slider should be highlighted.

Next comes the group of radio buttons. The most important attribute to be considered here is data-role="controlgroup", set on the fieldset element. Adding this data-role combines the radio buttons into one single group, which gives the user a visual indication that one radio button out of the whole lot needs to be selected. The values assigned to each of the radio inputs here indicate the cost per person for that particular cuisine. This value will help us calculate the final dollar value for the selected number of guests and the type of cuisine.

Whenever you are using the form widgets, make sure you have the form elements in the hierarchy required by the jQuery Mobile framework. When the elements are in the required hierarchy, the framework can apply the required styles.

At the end of the previous code snippet, we have a paragraph tag where we will populate the approximate cost of catering for the selected number of guests and type of cuisine. The catering page should now look as seen in the following image. Right now, we only have the HTML widgets in place. When you drag the slider or select different radio buttons, you will only see the UI interactions of these widgets and the UI treatments that the framework applies to them; the total cost will not be populated yet. We will need to write some JavaScript logic to determine this value, and we will take a look at this in a minute. Before moving to the JavaScript part, make sure you have all the code that is needed:

Now let's take a look at the magic part of the code (read: JavaScript) that is going to make our widgets usable for the visitors of this Civic Center web application. Add the following JavaScript code in the script tag at the very end of our index.html file:

$(document).on('pagecontainershow', function(){
    var guests = 50;
    var cost = 35;
    var totalCost;
    $("#slider").on("slidestop", function(event, ui){
        guests = $('#slider').val();
        totalCost = costCal();
        $("#totalCost").text("$" + totalCost);
    });
    $("input:radio[name=cuisine-choice]").on("click", function() {
        cost = $(this).val();
        var totalCost = costCal();
        $("#totalCost").text("$" + totalCost);
    });
    function costCal(){
        return guests * cost;
    }
});

That is a pretty small chunk of code, and pretty simple too. We will be looking at a few very important events that are part of the framework and that come in very handy when developing web applications with jQuery Mobile. One of the most important things you must have already noticed is that we are not making use of jQuery's customary $(document).on('ready', function(){, but something that looks like the following code:

$(document).on('pagecontainershow', function(){

The million dollar question here is: "why doesn't DOM ready work in jQuery Mobile?" As part of jQuery, the first thing that we often learn is to execute our jQuery code as soon as the DOM is ready, and this is identified using the $(document).ready function.
In jQuery Mobile, pages are requested and injected into the same DOM as the user navigates from one page to another, so the DOM ready event is only useful for the first page. We need an event that executes when every page loads, and pagecontainershow is the one. The pagecontainershow event is triggered on the toPage after the transition animation has completed. Note that pagecontainershow is triggered on the pagecontainer element and not on the actual page.

In the function, we initialize the guests and cost variables to 50 and 35 respectively, as the minimum number of guests we can have is 50, and the "Continental" cuisine, which is selected by default, has a value of 35. We will calculate the estimated cost when the user changes the number of guests or selects a different radio button. This brings us to the next part of our code. We need to get the value of the number of guests as soon as the user stops sliding the slider. jQuery Mobile provides us with the slidestop event for this very purpose. As soon as the user stops sliding, we get the value of the slider and then call the costCal function, which returns the number of guests multiplied by the per-person cost of the selected cuisine. We then display this value in the paragraph at the bottom so that the user gets an estimated cost. We will discuss more of the touch events available as part of the jQuery Mobile framework in the next section. When the user selects a different radio button, we retrieve the value of the selected radio button, call the costCal function again, and update the value displayed in the paragraph at the bottom of our page. If you have the code correct and your functions are all working fine, you should see something similar to the following image:

Input with touch

We will take a look at a couple of touch events: tap and taphold. The tap event is triggered after a quick touch, whereas the taphold event is triggered after a sustained, long press touch. The jQuery Mobile tap event is the gesture equivalent of the standard click event, and is triggered on the release of the touch gesture. The following snippet of code should help you incorporate the tap event when you need to use it in your application:

$(".selector").on("tap", function(){
    console.log("tap event is triggered");
});

The jQuery Mobile taphold event triggers after a sustained, complete touch event, more commonly known as the long press event. The taphold event fires when the user taps and holds for a minimum of 750 milliseconds. You can also change this default value, but we will come to that in a minute. First, let's see how the taphold event is used:

$(".selector").on("taphold", function(){
    console.log("taphold event is triggered");
});

Now, to change the default duration for the long press event, we need to set the value of the following property:

$.event.special.tap.tapholdThreshold

Working with plugins

A number of times, we will come across scenarios where the capabilities of the framework are just not sufficient for all the requirements of your project. In such scenarios, we have to make use of third-party plugins in our project. We will be looking at two very interesting plugins in the course of this article, but before that, you need to understand what jQuery plugins exactly are. A jQuery plugin is simply a new method that has been used to extend jQuery's prototype object.
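As a rough sketch of what that means in practice (the highlight method and its behavior here are made up purely for illustration), a minimal jQuery plugin can be defined like this:

// A bare-bones jQuery plugin: it adds a hypothetical .highlight() method
// to jQuery's prototype object ($.fn), so it can be called on any selection.
(function ($) {
    $.fn.highlight = function (color) {
        // 'this' is the current jQuery selection; returning it keeps chaining intact
        return this.css('background-color', color || 'yellow');
    };
}(jQuery));

// Usage: $('.selector').highlight('#ffeeaa');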
When we include the jQuery plugin as part of our code, this new method becomes available for use within your application. When selecting jQuery plugins for your jQuery Mobile web application, make sure that the plugin is optimized for mobile devices and incorporates touch events as well, based on your requirements. The first plugin that we are going to look at is called FastClick, and it is developed by FT Labs. It is an open source plugin and so can be used as part of your application. FastClick is a simple, easy-to-use library designed to eliminate the 300 ms delay between a physical tap and the firing of the click event on mobile browsers.

Wait! What are we talking about? What is this 300 ms delay between tap and click? What exactly are we discussing? Sure, we understand the confusion. Let's explain this 300 ms delay issue. Click events have a 300 ms delay on touch devices, which makes web applications feel laggy on a mobile device and doesn't give users a native-like feel. If you go to a site that isn't mobile-optimized, it starts zoomed out, and you then have to either pinch and zoom or double tap some content so that it becomes readable. The double tap is a performance killer, because with every tap we have to wait to see whether it might be a double tap, and this wait is 300 ms. Here is how it plays out:

touchstart
touchend
Wait 300 ms in case of another tap
click

This pause of 300 ms applies to click events in JavaScript, but also to other click-based interactions such as links and form controls. Most mobile web browsers out there have this 300 ms delay on click events, but a few modern browsers, such as Chrome and Firefox for Android and iOS, are now removing it. However, if you are supporting older Android and iOS versions, with older mobile browsers, you might want to consider including the FastClick plugin in your application, which helps resolve this problem. Let's take a look at how we can use this plugin in any web application. First, you need to download the plugin files or clone their GitHub repository here: https://github.com/ftlabs/fastclick. Once you have done that, include a reference to the plugin's JavaScript file in your application:

<script type="application/javascript" src="path/fastclick.js"></script>

Make sure that the script is loaded prior to instantiating FastClick on any element of the page. FastClick recommends instantiating the plugin on the body element itself. We can do this using the following piece of code:

$(function(){
    FastClick.attach(document.body);
});

That is it! Your application is now free of the 300 ms click delay issue and will work as smoothly as a native application. We have just provided you with an introduction to the FastClick plugin; there are several more features that it provides. Make sure you visit their website, https://github.com/ftlabs/fastclick, for more details on what the plugin has to offer.

Another important plugin that we will look at is HammerJS. HammerJS, again, is an open source library that helps recognize gestures made by touch, mouse, and pointer events. Now, you might say that the jQuery Mobile framework already takes care of this, so why do we need a third-party plugin again? True, jQuery Mobile supports a variety of touch events, such as tap, tap and hold, and swipe, as well as the regular mouse events, but what if in our application we want to make use of some touch gestures such as pan, pinch, rotate, and so on, which are not supported by jQuery Mobile by default?
This is where HammerJS comes into the picture and plays nicely along with jQuery Mobile. Including HammerJS in your web application code is extremely simple and straightforward, like the FastClick plugin. You need to download the plugin files and then add a reference to the plugin's JavaScript file:

<script type="application/javascript" src="path/hammer.js"></script>

Once you have included the plugin, you need to create a new instance of the Hammer object and then start using the plugin for all the touch gestures you need to support:

// element_name is the DOM element to watch; options is an optional settings object
var hammerPan = new Hammer(element_name, options);
hammerPan.on('pan', function(){
    console.log("Inside Pan event");
});

By default, Hammer adds a set of recognizers: tap, double tap, swipe, pan, press, pinch, and rotate. The pinch and rotate recognizers are disabled by default, but can be turned on as and when required. HammerJS offers a lot of features that you might want to explore. Make sure you visit their website, http://hammerjs.github.io/, to understand the different features the library has to offer and how you can integrate this plugin within your existing or new jQuery Mobile projects.

Accessibility

Most of us today cannot imagine our lives without the Internet and our smartphones. Some will even argue that the Internet is the single largest revolutionary invention of all time, one that has touched numerous lives across the globe. At the click of a mouse or the touch of a fingertip, the world is now at your disposal, provided you can use the mouse, see the screen, and hear the audio; impairments might make it difficult for people to access the Internet. This makes us wonder how people with disabilities use the Internet, about their frustration in doing so, and about the efforts that must be taken to make websites accessible to all. Though estimates vary, most studies have revealed that about 15 percent of the world's population have some kind of disability. Not all of these people would have an issue with accessing the web, but let's assume 5 percent of them would. This 5 percent is still a considerable number of users, who cannot be ignored by businesses on the web, and efforts must be taken in the right direction to make the web accessible to these users with disabilities.

The jQuery Mobile framework comes with built-in support for accessibility; it is built with accessibility and universal access in mind. Any application that is built using jQuery Mobile is accessible via a screen reader as well. When you make use of the different jQuery Mobile widgets in your application, unknowingly you are also adding support for web accessibility into your application, as the framework adds all the necessary ARIA attributes to the elements in the DOM. Let's take a look at how the DOM looks for our facilities page:

Look at the highlighted Events button in the top right corner and its corresponding HTML (also highlighted) in the developer tools. You will notice that there are a few attributes added to the anchor tag that start with aria-. We did not add any of these aria- attributes when we wrote the code for the Events button; the jQuery Mobile library takes care of these things for you. The accessibility implementation is an ongoing process, and the developers of jQuery Mobile are working towards improving the support with every new release. We spoke about aria- attributes, but what do they really represent? WAI-ARIA stands for Web Accessibility Initiative - Accessible Rich Internet Applications.
It is a technical specification published by the World Wide Web Consortium (W3C) that specifies how to increase the accessibility of web pages. ARIA specifies the roles, properties, and states of a web page that make it accessible to all users. Accessibility is an extremely vast topic, so covering every detail of it here is not possible. However, there is excellent material available on the Internet on this subject, and we encourage you to read and understand it. Try to implement accessibility in your current or next project, even if it is not based on jQuery Mobile. Web accessibility is an extremely important thing to consider, especially when you are building web applications that will be consumed by a huge consumer base, on e-commerce websites for example.

Summary

In this article, we made use of some of the available widgets from the jQuery Mobile framework, and we built some interactivity into our existing Civic Center application. The widgets that we used included the range slider, the collapsible widget, the listview widget, and the radio button widget. We also evaluated and looked at how to use two different third-party plugins: FastClick and HammerJS. We concluded the article by taking a look at the concept of web accessibility.

Resources for Article:

Further resources on this subject:
Creating Mobile Dashboards [article]
Speeding up Gradle builds for Android [article]
Saying Hello to Unity and Android [article]
Understanding Hadoop Backup and Recovery Needs

Packt
10 Aug 2015
25 min read
In this article by Gaurav Barot, Chintan Mehta, and Amij Patel, authors of the book Hadoop Backup and Recovery Solutions, we will discuss backup and recovery needs. In the present age of information explosion, data is the backbone of business organizations of all sizes. We need a complete data backup and recovery system, and a strategy to ensure that critical data is available and accessible when the organization needs it. Data must be protected against loss, damage, theft, and unauthorized changes. If disaster strikes, data recovery must be swift and smooth so that business does not get impacted. Every organization has its own data backup and recovery needs and priorities, based on the applications and systems it uses. Today's IT organizations face the challenge of implementing reliable backup and recovery solutions in the most efficient, cost-effective manner. To meet this challenge, we need to carefully define our business requirements and recovery objectives before deciding on the right backup and recovery strategies or technologies to deploy.

(For more resources related to this topic, see here.)

Before jumping into the implementation approach, we first need to know about backup and recovery strategies and how to plan them efficiently.

Understanding the backup and recovery philosophies

Backup and recovery is becoming more challenging and complicated, especially with the explosion of data growth and the increasing need for data security today. Imagine big players such as Facebook, Yahoo! (the first to implement Hadoop), eBay, and more: how challenging it must be for them to handle unprecedented volumes and velocities of unstructured data, something which traditional relational databases can't handle and deliver. To emphasize the importance of backup, let's take a look at a study conducted in 2009. This was the time when Hadoop was evolving and a handful of bugs still existed in it. Yahoo! had about 20,000 nodes running Apache Hadoop in 10 different clusters, and HDFS lost only 650 blocks out of 329 million total blocks. Now hold on a second: those blocks were lost due to the bugs found in the Hadoop package. So, imagine what the scenario would be now; I am sure you would bet on losing hardly a block.

Being a backup manager, your utmost target is to think out, strategize, and execute a foolproof backup plan capable of retrieving data after any disaster. Simply put, the aim of the strategy is to protect the files in HDFS against disastrous situations and bring the files back to their normal state, just like James Bond resurrects after so many blows and probably death-like situations. Coming back to the backup manager's role, the following are the activities of this role:

Testing out various case scenarios to forestall any threats, if any, in the future
Building a stable recovery point and setup for backup and recovery situations
Preplanning and daily organization of the backup schedule
Constantly supervising the backup and recovery process and avoiding threats, if any
Repairing and constructing solutions for backup processes
The ability to reheal, that is, recover from data threats, if they arise (the resurrection power)
Data protection, which includes the tasks of maintaining data replicas for long-term storage
Moving data from one destination to another

Basically, backup and recovery strategies should cover all the areas mentioned here.
For any system, data, application, and configuration transaction logs are mission critical, though what to protect depends on the datasets, configurations, and applications that are used to design the backup and recovery strategies. Hadoop is all about big data processing. After gathering some exabytes for data processing, the following are the obvious questions that we may come up with:

What's the best way to back up data?
Do we really need to take a backup of these large chunks of data?
Where will we find more storage space if the current storage space runs out?
Will we have to maintain distributed systems?
What if our backup storage unit gets corrupted?

The answer to the preceding questions depends on the situation you may be facing; let's see a few situations.

One situation is where you may be dealing with a plethora of data. Hadoop is used for fact-finding semantics, and data is in abundance. Here, the span of the data is short; it is short-lived, and the important sources of the data are already backed up. In such a scenario, the policy of not backing up the data at all is feasible, as there are already three copies (replicas) in our data nodes (HDFS). Moreover, since Hadoop is still vulnerable to human error, a backup of the configuration files and the NameNode metadata (dfs.name.dir) should be created.

You may find yourself facing a situation where the data center on which Hadoop runs crashes and the data is not available for now; this results in a failure to connect with mission-critical data. A possible solution here is to back up Hadoop like any other cluster, using the hadoop command.

Replication of data using DistCp

To replicate data, the distcp command writes data to two different clusters. Let's look at the distcp command with a few examples and options. DistCp is a handy tool used for large inter/intra-cluster copying. It basically expands a list of files into input for map tasks, each of which will copy the files that are specified in the source list. Let's understand how to use distcp with some basic examples. The most common use case of distcp is intercluster copying. Let's see an example:

bash$ hadoop distcp2 hdfs://ka-16:8020/parth/ghiya hdfs://ka-001:8020/knowarth/parth

This command will expand the namespace under /parth/ghiya on the ka-16 NameNode into a temporary file, get its contents, divide them among a set of map tasks, and start the copy process on each TaskTracker from ka-16 to ka-001. The command used for copying can be generalized as follows:

hadoop distcp2 hftp://namenode-location:50070/basePath hdfs://namenode-location

Here, hftp://namenode-location:50070/basePath is the source and hdfs://namenode-location is the destination. In the preceding command, namenode-location refers to the hostname and 50070 is the NameNode's HTTP server port.

Updating and overwriting using DistCp

The -update option is used when we want to copy files from the source that don't exist on the target or that have contents differing from the target copies, which we do not want to erase. The -overwrite option overwrites the target files even if they exist at the source. These options are invoked by simply adding -update or -overwrite to the command, as shown in the examples that follow. In the example, we used distcp2, which is an advanced version of DistCp; the process will go smoothly even if we use the distcp command.
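Reusing the hosts from the earlier example, the two options would look roughly like this (illustrative commands; adjust the paths and hosts to your own clusters):

bash$ hadoop distcp2 -update hdfs://ka-16:8020/parth/ghiya hdfs://ka-001:8020/knowarth/parth
bash$ hadoop distcp2 -overwrite hdfs://ka-16:8020/parth/ghiya hdfs://ka-001:8020/knowarth/parth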
Now, let's look at the two versions of DistCp: the legacy DistCp, or just DistCp, and the new DistCp, or DistCp2:

During the intercluster copy process, files that were skipped during the copy keep all their file attributes (permissions, owner group information, and so on) unchanged when we copy using legacy DistCp. This is not the case in the new DistCp, where these values are updated even if a file is skipped.
Empty root directories among the source inputs were not created in the target folder by legacy DistCp, which is no longer the case in the new DistCp.

There is a common misconception that Hadoop protects against data loss, and that we therefore don't need to back up the data in the Hadoop cluster. Since Hadoop replicates data three times by default, this sounds like a safe statement; however, it is not 100 percent safe. While Hadoop protects from hardware failure on the data nodes (meaning that if one entire node goes down, you will not lose any data), there are other ways in which data loss may occur: Hadoop is highly susceptible to human error, corrupted data writes, accidental deletions, rack failures, and many such instances. Consider an example where a corrupt application destroys all the data replicas: during processing, it attempts to compute each replica and, on not finding a possible match, deletes the replica. User deletions are another example of how data can be lost, as Hadoop's trash mechanism is not enabled by default. Also, one of the most complicated and expensive-to-implement aspects of protecting data in Hadoop is the disaster recovery plan. There are many different approaches to this, and determining which approach is right requires a balance between cost, complexity, and recovery time.

A real-life scenario is Facebook. The data that Facebook holds increased exponentially from 15 TB to 30 PB, that is, 3,000 times the size of the Library of Congress. With increasing data, the problem faced was the physical movement of the machines to the new data center, which required manpower and also impacted services for a period of time. Data availability within a short period of time is a requirement for any service; that's when Facebook started exploring Hadoop. Conquering this problem while dealing with such large repositories of data is yet another headache. The reason Hadoop was invented was to keep the data bound to neighborhoods on commodity servers and reasonable local storage, and to provide maximum availability to data within the neighborhood. So, a data plan is incomplete without data backup and recovery planning: a big data deployment using Hadoop is a situation in which the focus on the ability to recover from a crisis is mandatory.

The backup philosophy

We need to determine whether Hadoop, the processes and applications that run on top of it (Pig, Hive, HDFS, and more), and specifically the data stored in HDFS, are mission critical. If the data center where Hadoop is running disappeared, would the business stop? Some of the key points that have to be taken into consideration are explained in the sections that follow; by combining these points, we will arrive at the core of the backup philosophy.

Changes since the last backup

Considering the backup philosophy that we need to construct, the first thing we are going to look at is changes. We have a sound application running, and then we add some changes.
Also, one of the most complicated and expensive-to-implement aspects of protecting data in Hadoop is the disaster recovery plan. There are many different approaches to this, and determining which approach is right requires a balance between cost, complexity, and recovery time.

A real-life scenario is Facebook. The data that Facebook holds has grown exponentially, from 15 TB to 30 PB—that is, 3,000 times the Library of Congress. With increasing data, the problem it faced was the physical movement of machines to a new data center, which required manpower and also impacted services for a period of time. Data availability within a short period of time is a requirement for any service; that's when Facebook started exploring Hadoop. Conquering the problem of dealing with such large repositories of data is yet another headache. The reason Hadoop was invented was to keep data bound to neighborhoods of commodity servers with reasonable local storage, and to provide maximum availability to the data within the neighborhood. So, a data plan is incomplete without backup and recovery planning: a big data implementation on Hadoop is one situation where a focus on the potential to recover from a crisis is mandatory.

The backup philosophy

We need to determine whether Hadoop, the processes and applications that run on top of it (Pig, Hive, HDFS, and more), and specifically the data stored in HDFS are mission critical. If the data center where Hadoop is running disappeared, would the business stop? Some of the key points that have to be taken into consideration are explained in the sections that follow; by combining these points, we will arrive at the core of the backup philosophy.

Changes since the last backup

Considering the backup philosophy that we need to construct, the first thing we are going to look at is changes. We have a sound application running, and then we add some changes. In case our system crashes and we need to go back to our last safe state, our backup strategy should have a clause covering the changes that have been made. These changes can be either database changes or configuration changes. Our clause should include the following points in order to construct a sound backup strategy:

- The changes we made since our last backup
- The count of files changed
- Ensuring that our changes are tracked
- The possibility of bugs in user applications since the last change, which may cause hindrance and make it necessary to go back to the last safe state

After applying new changes on top of the last backup, if the application doesn't work as expected, then high priority should be given to taking the application back to its last safe state or backup. This ensures that the user is not interrupted while using the application or product.

The rate of new data arrival

The next thing we are going to look at is how many changes we are dealing with. Is our application being updated so often that we cannot decide what the last stable version was? Data is produced at an ever-increasing rate; consider Facebook, which alone produces 250 TB of data a day. Data production occurs at an exponential rate, and soon terms such as zettabyte will become commonplace. Our clause should include the following points in order to construct a sound backup:

- The rate at which new data is arriving
- The need for backing up each and every change
- The time factor involved in backing up between two changes
- Policies for keeping reserve backup storage

The size of the cluster

The size of a cluster is yet another important factor: we have to select a cluster size that allows us to optimize the environment for our purpose with exceptional results. Recalling the Yahoo! example, Yahoo! has 10 clusters all over the world, covering 20,000 nodes, and it has the maximum number of nodes in its large clusters. Our clause should include the following points in order to construct a sound backup:

- Selecting the right resources, which will allow us to optimize our environment. The selection of the right resources will vary as per need; for instance, users with I/O-intensive workloads will go for more spindles per core. A Hadoop cluster contains four types of roles: NameNode, JobTracker, TaskTracker, and DataNode.
- Handling the complexities of optimizing a distributed data center.

Priority of the datasets

The next thing we are going to look at is the new datasets that keep arriving. With the increase in the rate of new data arrival, we always face the dilemma of what to back up. Are we tracking all the changes in the backup? And if we are backing up all the changes, will our performance be compromised? Our clause should include the following points in order to construct a sound backup:

- Making the right backup of the dataset
- Taking backups at a rate that will not compromise performance

Selecting the datasets or parts of datasets

The next thing we are going to look at is what exactly gets backed up. When we deal with large chunks of data, there's always a nagging thought: did we miss anything while selecting the datasets, or parts of datasets, that have not been backed up yet?
Our clause should include the following points in order to construct a sound backup:

- Backup of the necessary configuration files
- Backup of files and application changes

The timeliness of data backups

With such a huge amount of data collected daily (consider Facebook), the time interval between backups is yet another important factor. Do we back up our data daily? Every two days? Every three days? Should we back up small chunks of data daily, or should we back up larger chunks at longer intervals? Our clause should include the following points in order to construct a sound backup:

- Dealing with the impact of a large time interval between two backups
- Monitoring the backup strategy in a timely manner and following through on it

The frequency of data backups depends on various aspects. Firstly, it depends on the application and its usage. If it is I/O intensive, we may need more backups, as no dataset is worth losing. If it is not so I/O intensive, we may keep the frequency low. We can determine the timeliness of data backups from the following points:

- The amount of data that we need to back up
- The rate at which new updates are arriving
- Determining the window of possible data loss and making it as small as possible
- The critical datasets that need to be backed up
- The configuration and permission files that need to be backed up

Reducing the window of possible data loss

The next thing we are going to look at is how to minimize the window of possible data loss. If our backup frequency is high, what are the chances of data loss? What is our chance of recovering the latest files? Our clause should include the following points in order to construct a sound backup:

- The potential to recover the latest files in the case of a disaster
- Keeping the probability of data loss low

Backup consistency

The next thing we are going to look at is backup consistency. The probability of invalid backups should be low—or, even better, zero—because if invalid backups are not tracked, further copies of those invalid backups will be made, which will again disrupt our backup process. Our clause should include the following points in order to construct a sound backup:

- Avoid copying data while it is being changed
- Possibly, construct a shell script that takes timely backups (see the sketch after this list)
- Ensure that the shell script is bug-free

Avoiding invalid backups

Let's continue the discussion on invalid backups. As you saw, HDFS makes three copies of our data for the recovery process. What if the original was flawed with errors or bugs? The three copies will then be corrupted copies, and recovering these flawed copies will indeed be a catastrophe. Our clause should include the following points in order to construct a sound backup:

- Avoid long gaps between backups
- Have the right backup process in place, probably with an automated shell script
- Track unnecessary backups
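A minimal sketch of such a script, assuming DistCp to a standby cluster and an external scheduler such as cron to run it periodically; all host names and paths here are placeholders:

#!/bin/sh
# Illustrative only: copy a dated backup of /data to a standby cluster.
DATE=$(date +%Y-%m-%d)
SRC=hdfs://primary-nn:8020/data
DST=hdfs://backup-nn:8020/backups/$DATE
hadoop distcp -update "$SRC" "$DST" || { echo "backup failed" >&2; exit 1; }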
If our backup clause covers all the preceding points, we are surely on the way to making a good backup strategy. A good backup policy basically covers all these points, so that if a disaster occurs, it always aims to go back to the last stable state. That's all about backups. Moving on, let's say a disaster occurs and we need to go back to the last stable state; let's have a look at the recovery philosophy and all the points that make a sound recovery strategy.

The recovery philosophy

After a deadly storm, we always try to recover from the after-effects of the storm. Similarly, after a disaster, we try to recover from the effects of the disaster. In just one moment, the storage capacity that was a boon turns into a curse and just another expensive, useless thing. Starting off with the best question: what would the best recovery philosophy be? Well, obviously, the best philosophy is one where we may never have to perform recovery at all. There may also be scenarios where we need to do a manual recovery. Let's look at the possible levels of recovery before moving on to recovery in Hadoop:

- Recovery to the flawless state
- Recovery to the last supervised state
- Recovery to a possible past state
- Recovery to a sound state
- Recovery to a stable state

So, obviously, we want our recovery state to be flawless. But if that's not achievable, we are willing to compromise a little and allow the recovery to go to a possible past state we are aware of. Now, if that's not possible either, we are again ready to compromise a little and allow it to go to the last possible sound state. That's how we deal with recovery: first aim for the best and, failing that, compromise a little. Just as the saying goes, "the bigger the storm, the more work we have to do to recover," here too we can say, "the bigger the disaster, the more intense the recovery plan we have to take." So, the recovery philosophy that we construct should cover the following points:

- An automated setup that detects a crash and restores the system to the last working state, where the application runs as per its expected behavior.
- The ability to track modified files and copy them.
- Tracking the sequence of operations on files, just as an auditor trails his audits.
- Merging files that are copied separately.
- Keeping multiple version copies, to maintain version control.
- Being able to process updates without impacting the application's security and protection.
- Deleting the original copy only after carefully inspecting the changed copy.
- Processing new updates, but first making sure they are fully functional and will not hinder anything else; if they do, there should be a clause for going back to the last safe state.

Coming back to recovery in Hadoop, the first question we may think of is: what happens when the NameNode goes down? When the NameNode goes down, so does the metadata file (the file that stores data about file owners and file permissions, where the file blocks are stored on the data nodes, and more), and there will be no one present to route our read/write file requests to the data nodes. Our goal will be to recover the metadata file. HDFS provides an efficient way to handle NameNode failures. There are basically two places where we can find the metadata: first, fsimage, and second, the edit logs. Our clause should include the following points:

- Maintain three copies of the NameNode metadata.
- When we try to recover, we get four options—namely continue, stop, quit, and always—so choose wisely.
- Give preference to saving the safe part of the backups. If there is an ABORT! error, save the safe state.

Hadoop provides four recovery modes based on the four options it offers (continue, stop, quit, and always):

- Continue: This allows you to continue over the bad parts. This option will let you cross over a few stray blocks and carry on, trying to produce a full recovery. This can be thought of as the "prompt when an error is found" mode.
- Stop: This allows you to stop the recovery process and make an image file of the copy. The part at which we stopped won't be recovered, because we are not allowing it to be; in this case, we can say that we are in the safe-recovery mode.
- Quit: This exits the recovery process without making a backup at all. In this case, we can say that we are in the no-recovery mode.
- Always: This goes one step further than continue: it selects continue by default and thus avoids prompting for any further stray blocks that are found. This can be thought of as the "prompt only once" mode.

We will look at these modes in further discussions; a sketch of how this recovery mode is entered follows.
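The four prompts above are offered when the NameNode is started in metadata recovery mode. A minimal sketch of entering it; this assumes the NameNode is stopped first, and note that, depending on your Hadoop version, the command may be spelled hdfs namenode -recover instead:

bash$ hadoop namenode -recover

While replaying the edit log against the last fsimage, the NameNode stops at each problem it finds and offers the continue/stop/quit/always choices described above.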
Now, you may think that this backup and recovery philosophy is all very well, but wasn't Hadoop designed to handle these failures? Well, of course, it was invented for this purpose, but there's always the possibility of a mishap at some level. Are we so overconfident that we are not ready to take the precautions that can protect us, entrusting our data blindly to Hadoop? No, certainly we aren't. We are going to take every possible preventive step from our side. In the next topic, we look at this very question: why we need preventive measures to back up Hadoop.

Knowing the necessity of backing up Hadoop

Change is the fundamental law of nature. There may come a time when Hadoop is upgraded on the present cluster, just as we see system upgrades everywhere. As no upgrade is bug-free, there is a probability that existing applications may not work the way they used to. There may be scenarios where we don't want to lose any data, let alone start HDFS from scratch. This is a scenario where a backup is useful, as it lets a user go back to a point in time.

Looking at the HDFS replication process: the NameNode handles the client request to write a file on a DataNode. The DataNode then replicates the block and writes it to another DataNode, which repeats the same process; thus, we have three copies of the same block. How these DataNodes are selected for placing copies of blocks is another issue, which we are going to cover later under rack awareness. You will see how to place these copies efficiently so as to handle situations such as hardware failure. But the bottom line is that when a DataNode is down, there's no need to panic; we still have a copy on a different DataNode. This approach gives us various advantages, such as:

- Security: This ensures that blocks are stored on two different DataNodes
- High write capacity: The client writes to only a single DataNode; the replication factor is handled by the DataNodes
- Read options: Better options on where to read from; the NameNode maintains records of all the locations of the copies and their distances
- Block circulation: The client writes only a single block; the others are handled through the replication pipeline

During the write operation, a DataNode receives data from the client and passes data to the next DataNode simultaneously; thus, our performance factor is not compromised. Data never passes through the NameNode: the NameNode takes the client's request to write data on a DataNode and processes the request by deciding on the division of files into blocks and the replication factor. The following figure shows the replication pipeline, wherein a block of the file is written and three different copies are made at different DataNode locations.

After hearing about such a foolproof plan and seeing so many advantages, we again arrive at the same question: is there a need for backup in Hadoop? Of course there is. There often exists a common mistaken belief that Hadoop shelters you against data loss, which gives you the freedom not to take backups in your Hadoop cluster. Hadoop, by convention, replicates your data three times by default. Although reassuring, this is not safe and does not guarantee foolproof protection against data loss. Hadoop gives you the power to protect your data against hardware failures—in the scenario where one disk, node, or rack goes down, the data will still be preserved for you—however, there are many scenarios where data loss may occur.

Consider an example of a classic human-prone error: the storage location that the user provides during operations in Hive. If the user provides a location where data already exists and performs a query on the same table, the entire existing data will be deleted, be it of size 1 GB or 1 TB. In the following figure, the client issues a read operation, but we have a faulty program. Going through the process, the NameNode checks its metadata file for the location of the DataNode containing the block. But when the block is read from the DataNode, it does not match the requirements, so the NameNode classifies that block as an under-replicated block and moves on to the next copy of the block. Oops—the same situation again. This way, all the safe copies of the block get turned into under-replicated blocks; HDFS fails us, and we need some other backup strategy. When copies do not match what the NameNode expects, it discards the copy and replaces it with a fresh copy that it has. HDFS replicas are not your one-stop solution for protection against data loss.

The needs for recovery

Now, we need to decide to what level we want to recover. As you saw earlier, we have four modes available, which recover either to a safe copy, the last possible state, or no copy at all. Based on the needs decided in the disaster recovery plan we defined earlier, you need to take the appropriate steps. We need to look at the following factors:

- The performance impact (is it compromised?)
- How large a data footprint does my recovery method leave?
- What is the application downtime?
- Is there just one backup, or are there incremental backups?
- Is it easy to implement?
- What is the average recovery time that the method provides?

Based on the preceding aspects, we will decide which modes of recovery we need to implement. The following methods are available in Hadoop:

- Snapshots: Snapshots simply capture a moment in time and allow you to go back to a possible recovery state (see the sketch after this list).
- Replication: This involves copying data from one cluster to another, out of the vicinity of the first cluster, so that a fault in one cluster doesn't have an impact on the other.
- Manual recovery: Probably the most brutal one—moving data manually from one cluster to another. Clearly, its downsides are a large footprint and long application downtime.
- API: There is always custom development using the public API available.
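For instance, HDFS snapshots are taken per directory once that directory has been marked snapshottable. A minimal sketch using the standard commands; the path and snapshot name are placeholders of our own, and note that snapshot support requires Hadoop 2.1 or later:

bash$ hdfs dfsadmin -allowSnapshot /user/parth/data
bash$ hdfs dfs -createSnapshot /user/parth/data backup-2015-08-10

The snapshot then remains readable under /user/parth/data/.snapshot/backup-2015-08-10, even if files in the live directory are later deleted.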
We will now move on to the recovery areas in Hadoop.

Understanding recovery areas

Recovering data after some sort of disaster needs a well-defined business disaster recovery plan. So, the first step is to decide our business requirements, which will define the need for data availability, precision in data, and requirements for the uptime and downtime of the application. Any disaster recovery policy should basically cover the areas required by the disaster recovery principles. Recovery areas define those portions without which an application won't be able to come back to its normal state. If you are armed with proper information, you will be able to decide the priority of which areas need to be recovered. Recovery areas cover the following core components:

- Datasets
- NameNodes
- Applications
- Database sets in HBase

Let's go back to the Facebook example. Facebook uses a customized version of MySQL for its home page and other interests, but when it comes to Facebook Messenger, Facebook uses the NoSQL database provided by Hadoop. Looking at it from that point of view, Facebook has both of these among its recovery areas and needs different steps to recover each of them.

Summary

In this article, we went through the backup and recovery philosophy and all the points a good backup philosophy should include. We covered what a recovery philosophy constitutes and saw the modes available for recovery in Hadoop. Then, we looked at why backup is important even though HDFS provides the replication process. Lastly, we looked at the recovery needs and areas. Quite a journey, wasn't it? Well, hold on tight; these are just your first steps into the Hadoop User Group (HUG).

Resources for Article:

Further resources on this subject:
Cassandra Architecture [article]
Oracle GoldenGate 12c — An Overview [article]
Backup and Restore Improvements [article]
Using Handlebars with Express

Packt
10 Aug 2015
17 min read
In this article written by Paul Wellens, author of the book Practical Web Development, we cover a brief description of the following topics:

- Templates
- Node.js
- Express 4

Templates

Templates come in many shapes or forms. Traditionally, they are non-executable files with some pre-formatted text, used as the basis of a bazillion documents that can be generated with a computer program. I worked on a project where I had to write a program that would take a Microsoft Word template, containing parameters like $first, $name, $phone, and so on, and generate a specific Word document for every student in a school. Web templating does something very similar. It uses a templating processor that takes data from a source of information, typically a database, and a template—a generic HTML file with some kind of parameters inside. The processor then merges the data and template to generate a bunch of static web pages, or dynamically generates HTML on the fly.

If you have been using PHP to create dynamic webpages, you have been busy with web templating. Why? Because you have been inserting PHP code inside HTML files in between the <?php and ?> strings. Your templating processor was the Apache web server, which has many additional roles. By the time your browser gets to see the result of your code, it is pure HTML. This makes it an example of server-side templating. You could also use Ajax and PHP to transfer data in the JSON format and then have the browser process that data using JavaScript to create the HTML you need. Combine this with using templates and you have client-side templating.

Node.js

What Le Sacre du Printemps by Stravinsky did to the world of classical music, Node.js may have done to the world of web development. At its introduction, it shocked the world. By now, Node.js is considered by many as the coolest thing. Just like Le Sacre is a totally different kind of music—but by now every child who has seen Fantasia has heard it—Node.js is a different way of doing web development. Rather than writing an application and using a web server to serve up your code to a browser, the application and the web server are one and the same. This may sound scary, but you should not worry, as there is an entire community that has developed modules you can obtain using the npm tool. Before showing you an example, I need to point out an extremely important feature of Node.js: the language in which you will write your web server and application is JavaScript. So Node.js gives you server-side JavaScript.

Installing Node.js

How to install Node.js will differ depending on your OS, but the result is the same everywhere: it gives you two programs, node and npm.

npm

The node packaging manager (npm) is the tool that you use to look for and install modules. Each time you write code that needs a module, you will have to add a line like this:

var module = require('module');

The module will have to be installed first, or the code will fail. This is how it is done:

npm install module

or

npm -g install module

The latter will attempt to install the module globally; the former, in the directory where the command is issued. It will typically install the module in a folder called node_modules.

node

The node program is the command to use to start your Node.js program, for example:

node myprogram.js

node will start and interpret your code. Type Ctrl-C to stop node.
Now create a file myprogram.js containing the following text:

var http = require('http');

http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8080, 'localhost');

console.log('Server running at http://localhost:8080');

So, if you have installed Node.js, typing node myprogram.js in a terminal window will start up a web server (the http module ships with the standard installation). And, when you type http://localhost:8080 in a browser, you will see the world-famous two-word program example on your screen. This is the equivalent of getting the It works! page after testing your Apache web server installation. As a matter of fact, if you go to http://localhost:8080/it/does/not/matterwhat, the same will appear. Not very useful maybe, but it is a web server.

Serving up static content

This does not work in the way we are used to. URLs typically point to a file (or a folder, in which case the server looks for an index.html file), foo.html, or bar.php, and when present, it is served up to the client. So what if we want to do this with Node.js? We will need a module. Several exist to do the job; we will use node-static in our example. But first we need to install it:

npm install node-static

In our app, we will now create not only a web server, but a file server as well. It will serve all the files in the local directory public. It is good to have all the so-called static content together in a separate folder. These are basically all the files that will be served up to and interpreted by the client. As we will now end up with a mix of client code and server code, it is a good practice to separate them. When you use the Express framework, you have the option to have Express create these things for you. So, here is a second, more complete, Node.js example, including all its static content.

hello.js, our Node.js app:

var http = require('http');
var static = require('node-static');

var fileServer = new static.Server('./public');

http.createServer(function (req, res) {
  fileServer.serve(req, res);
}).listen(8080, 'localhost');

console.log('Server running at http://localhost:8080');

hello.html is stored in ./public:

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8" />
  <title>Hello world document</title>
  <link href="./styles/hello.css" rel="stylesheet">
</head>
<body>
  <h1>Hello, World</h1>
</body>
</html>

hello.css is stored in public/styles:

body {
  background-color: #FFDEAD;
}
h1 {
  color: teal;
  margin-left: 30px;
}
.bigbutton {
  height: 40px;
  color: white;
  background-color: teal;
  margin-left: 150px;
  margin-top: 50px;
  padding: 15px 15px 25px 15px;
  font-size: 18px;
}

So, if we now visit http://localhost:8080/hello, we will see our, by now too familiar, Hello World message with some basic styling, proving that our file server also delivered the CSS file. You can easily take it one step further and add JavaScript and the jQuery library, and put them in, for example, public/js/hello.js and public/js/jquery.js respectively.

Too many notes

With Node.js, you only install the modules that you need, so it does not include the kitchen sink by default! You will benefit from that as far as performance goes. Back in California, I have been a proud product manager of a PC UNIX product, and one of our coolest value-adds was a tool, called kconfig, that would allow people to customize what would be inside the UNIX kernel, so that it would only contain what was needed. This is what Node.js reminds me of. And it is written in C, as was UNIX. Déjà vu.
However, if we wanted our Node.js web server to do everything the Apache web server does, we would need a lot of modules. Our application code needs to be added to that as well. That means a lot of modules. Like the critics in the movie Amadeus said: too many notes.

Express 4

A good way to get the job done with fewer notes is by using the Express framework. On the expressjs.com website, it is called a minimal and flexible Node.js web application framework, providing a robust set of features for building web applications. This is a good way to describe what Express can do for you. It is minimal, so there is little overhead for the framework itself. It is flexible, so you can add just what you need. It gives a robust set of features, which means you do not have to create them yourself, and they have been tested by an ever-growing community. But we need to get back to templating, so all we are going to do here is explain how to get Express, and give one example.

Installing Express

As Express is also a Node module, we install it as such. In your project directory for your application, type:

npm install express

You will notice that a folder called express has been created inside node_modules, and inside that one, there is another collection of node modules. These are examples of what is called middleware. In the code example that follows, we assume app.js as the name of our JavaScript file, and app as the variable that you will use in that file for your instance of Express. This is for the sake of brevity; it would be better to use a string that matches your project name. We will now use Express to rewrite the hello.js example. All static resources in the public directory can remain untouched. The only change is in the Node app itself:

var express = require('express');
var path = require('path');

var app = express();

app.set('port', process.env.PORT || 3000);

var options = {
  dotfiles: 'ignore',
  extensions: ['htm', 'html'],
  index: false
};

app.use(express.static(path.join(__dirname, 'public'), options));

app.listen(app.get('port'), function () {
  console.log('Hello express started on http://localhost:' +
    app.get('port') + '; press Ctrl-C to terminate.');
});

This code uses so-called middleware (static) that is included with Express; a lot more is available from third parties. Well, compared to our Node.js example, it is about the same number of lines, but it looks a lot cleaner and it does more for us. You no longer need to explicitly include the HTTP module and other such things.

Templating and Express

We need to get back to templating now. Imagine all the JavaScript ecosystem we just described. Yes, we could still put our client JavaScript code in between the <script> tags, but what about the server JavaScript code? There is no such thing as <?javascript ?>! Node.js and Express support several templating languages that allow you to separate layout and content, and have the template system do the work of fetching the content and injecting it into the HTML. The default templating processor for Express appears to be Jade, which uses its own, albeit more compact than HTML, language. Unfortunately, that would mean that you have to learn yet another syntax to produce something. We propose to use handlebars.js instead. There are two reasons why we have chosen handlebars.js:

- It uses HTML as its language
- It is available on both the client and the server side

Getting the handlebars module for Express

Several Express modules exist for handlebars.
We happen to like the one with the surprising name express-handlebars. So, we install it, as follows:

npm install express-handlebars

Layouts

I almost called this section "templating without templates", as our first example will not use a parameter inside the templates. Most websites consist of several pages, either static or dynamically generated ones. All these pages usually have common parts: a header and footer, a navigation part or menu, and so on. This is the layout of our site. What distinguishes one page from another, usually, is some part in the body of the page, where the home page has different information than the other pages. With express-handlebars, you can separate layout and content. We will start with a very simple example. Inside your project folder that contains public, create a folder, views, with a subdirectory layouts. Inside the layouts subfolder, create a file called main.handlebars. This is your default layout. Building on top of the previous example, have it say:

<!doctype html>
<html>
<head>
  <title>Handlebars demo</title>
  <link href="./styles/hello.css" rel="stylesheet">
</head>
<body>
  {{{body}}}
</body>
</html>

Notice the {{{body}}} part. This token will be replaced by HTML. Handlebars escapes HTML; if we want our HTML to stay intact, we use {{{ }}} instead of {{ }}. body is a reserved word for handlebars. Create, in the folder views, a file called hello.handlebars with the following content. This will be one (of many) examples of the HTML that {{{body}}} will be replaced by:

<h1>Hello, World</h1>

Let's create a few more: june.handlebars with:

<h1>Hello, June Lake</h1>

And bodie.handlebars containing:

<h1>Hello, Bodie</h1>

Our first handlebars example

Now, create a file, handlehello.js, in the project folder. For convenience, we will keep the relevant code of the previous Express example:

var express = require('express');
var path = require('path');
var exphbs = require('express-handlebars');

var app = express();

app.engine('handlebars', exphbs({defaultLayout: 'main'}));
app.set('view engine', 'handlebars');
app.set('port', process.env.PORT || 3000);

var options = {
  dotfiles: 'ignore',
  etag: false,
  extensions: ['htm', 'html'],
  index: false
};

app.use(express.static(path.join(__dirname, 'public'), options));

app.get('/', function(req, res) {
  res.render('hello');   // this is the important part
});

app.get('/bodie', function(req, res) {
  res.render('bodie');
});

app.get('/june', function(req, res) {
  res.render('june');
});

app.listen(app.get('port'), function () {
  console.log('Hello express started on http://localhost:' +
    app.get('port') + '; press Ctrl-C to terminate.');
});

Everything that worked before still works, but if you type http://localhost:3000/, you will see a page with the layout from main.handlebars and {{{body}}} replaced by—you guessed it—the same Hello World with basic markup that looks the same as our hello.html example. Let's look at the new code. First, of course, we need to add a require statement for our express-handlebars module, giving us an instance of express-handlebars. The next two lines specify what the view engine is for this app and what extension is used for the templates and layouts. We pass one option to express-handlebars, defaultLayout, setting the default layout to be main. This way, we could have different versions of our app with different layouts, for example, one using Bootstrap and another using Foundation.
The res.render calls determine which views need to be rendered; if you type http://localhost:3000/june, you will get Hello, June Lake, rather than Hello World. But this is not yet all that useful, as in this implementation you still have a separate file for each Hello flavor. Let's create a true template instead.

Templates

In the views folder, create a file, town.handlebars, with the following content:

{{!-- Our first template with tokens --}}
<h1>Hello, {{town}} </h1>

Please note the comment line: this is the syntax for a handlebars comment. You could use HTML comments as well, of course, but the advantage of using handlebars comments is that they will not show up in the final output. Next, add this to your JavaScript file:

app.get('/lee', function(req, res) {
  res.render('town', { town: "Lee Vining"});
});

Now, we have a template that we can use over and over again with a different context, in this example, a different town name. All you have to do is pass a different second argument to the res.render call, and {{town}} in the template will be replaced by the value of town in the object. In general, what is passed as the second argument is referred to as the context.

Helpers

The token can also be replaced by the output of a function. After all, this is JavaScript. In the context of handlebars, we call those helpers. You can write your own, or use some of the cool built-in ones, such as #if and #each.

#if/else

Let us update town.handlebars as follows:

{{#if town}}
<h1>Hello, {{town}} </h1>
{{else}}
<h1>Hello, World </h1>
{{/if}}

This should be self-explanatory: if the variable town has a value, use it; if not, show the world. Note that what comes after #if can only be something that is either true or false, zero or not. The helper does not support a construct such as #if x < y.

#each

A very useful built-in helper is #each, which allows you to walk through an array of things and generate HTML accordingly. This is an example of the code that could be inside your app and the template you could use in your views folder:

app.js code snippet:

var californiapeople = {
  people: [
    {"name":"Adams","first":"Ansel","profession":"photographer","born":"San Francisco"},
    {"name":"Muir","first":"John","profession":"naturalist","born":"Scotland"},
    {"name":"Schwarzenegger","first":"Arnold","profession":"governator","born":"Germany"},
    {"name":"Wellens","first":"Paul","profession":"author","born":"Belgium"}
  ]
};

app.get('/californiapeople', function(req, res) {
  res.render('californiapeople', californiapeople);
});

template (californiapeople.handlebars):

<table class="cooltable">
{{#each people}}
  <tr><td>{{first}}</td><td>{{name}}</td>
  <td>{{profession}}</td></tr>
{{/each}}
</table>

Now we are well on our way to doing some true templating. You can also write your own helpers; a minimal sketch follows, though a full treatment is beyond the scope of an introductory article.
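As an illustration only, here is a sketch of registering a custom helper through the helpers option of the express-handlebars configuration; the shout helper and its name are our own invention, not code from the book:

app.engine('handlebars', exphbs({
  defaultLayout: 'main',
  helpers: {
    // {{shout town}} renders the value of town in uppercase
    shout: function (text) { return String(text).toUpperCase(); }
  }
}));

With the earlier /lee route, a template line such as <h1>Hello, {{shout town}}</h1> would then render Hello, LEE VINING.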
However, before we leave you, there is one cool feature of handlebars you need to know about: partials.

Partials

In web development, where you dynamically generate HTML to be part of a web page, it is often the case that you repeatedly need to do the same thing, albeit on a different page. There is a cool feature in express-handlebars that allows you to do that very same thing: partials. Partials are templates you can refer to inside a template, using a special syntax, drastically shortening your code that way. The partials are stored in a separate folder; by default, that would be views/partials, but you can even use subfolders. Let's redo the previous example but with a partial. So, our template is going to be extremely petite:

{{!-- people.handlebars inside views --}}
{{> peoplepartial }}

Notice the > sign; this is what indicates a partial. Now, here is the familiar-looking partial template:

{{!-- peoplepartial.handlebars inside views/partials --}}
<h1>Famous California people</h1>
<table>
{{#each people}}
  <tr><td>{{first}}</td><td>{{name}}</td>
  <td>{{profession}}</td></tr>
{{/each}}
</table>

And, following is the JavaScript code that triggers it:

app.get('/people', function(req, res) {
  res.render('people', californiapeople);
});

So, we give it the same context, but the view that is rendered is ridiculously simplistic, as there is a partial underneath that takes care of everything. Of course, these were all examples to demonstrate how handlebars and Express can work together really well, nothing more than that.

Summary

In this article, we talked about using templates in web development. Then, we zoomed in on using Node.js and Express, and introduced Handlebars.js. Handlebars.js is cool, as it lets you separate logic from layout, and you can use it server-side (which is where we focused) as well as client-side. Moreover, you will still be able to use HTML for your views and layouts, unlike with other templating processors. For those of you new to Node.js, I compared it to what Le Sacre du Printemps was to music. To all of you, I recommend the recording by the Los Angeles Philharmonic and Esa-Pekka Salonen. I had season tickets for this guy and went to his inaugural concert with Mahler's third symphony. PHP had not been written yet, but this particular performance I had heard on the radio while on the road in California, and it was magnificent. Check it out. And also check out Express and handlebars.

Resources for Article:

Let's Build with AngularJS and Bootstrap
The Bootstrap grid system
MODx Web Development: Creating Lists
The Splunk Interface

Packt
10 Aug 2015
17 min read
In this article by Vincent Bumgarner and James D. Miller, authors of the book Implementing Splunk - Second Edition, we will walk through the most common elements in the Splunk interface and touch upon concepts that will be covered in greater detail later. You may want to dive right into the search section, but an overview of the user interface elements might save you some frustration later. We will cover the following topics:

- Logging in and app selection
- A detailed explanation of the search interface widgets
- A quick overview of the admin interface

(For more resources related to this topic, see here.)

Logging into Splunk

The Splunk GUI (Splunk is also accessible through its command-line interface (CLI) and REST API) is web-based, which means that no client needs to be installed. Newer browsers with fast JavaScript engines, such as Chrome, Firefox, and Safari, work better with the interface. As of Splunk version 6.2.0, no browser extensions are required. Splunk versions 4.2 and earlier required Flash to render graphs; Flash can still be used by older browsers, or for older apps that reference Flash explicitly. The default port for a Splunk installation is 8000. The address will look like http://mysplunkserver:8000 or http://mysplunkserver.mycompany.com:8000. If you have installed Splunk on your local machine, the address can be some variant of http://localhost:8000, http://127.0.0.1:8000, http://machinename:8000, or http://machinename.local:8000.

Once you determine the address, the first page you will see is the login screen. The default username is admin with the password changeme. The first time you log in, you will be prompted to change the password for the admin user. It is a good idea to change this password to prevent unwanted changes to your deployment. By default, accounts are configured and stored within Splunk. Authentication can be configured to use another system, for instance, Lightweight Directory Access Protocol (LDAP). By default, Splunk authenticates locally; if LDAP is set up, the order is as follows: LDAP / Local.

The home app

After logging in, the default app is the Launcher app (some may refer to this as Home). This app is a launching pad for apps and tutorials. In earlier versions of Splunk, the Welcome tab provided two important shortcuts: Add data and the Launch search app. In version 6.2.0, the Home app is divided into distinct areas, or panes, that provide easy access to Explore Splunk Enterprise (Add Data, Splunk Apps, Splunk Docs, and Splunk Answers) as well as Apps (the app management page), Search & Reporting (the link to the Search app), and an area where you can set your default dashboard (Choose a home dashboard).

The Explore Splunk Enterprise pane shows links to:

- Add data: This links to the Add Data to Splunk page. This interface is a great start for getting local data flowing into Splunk (making it available to Splunk users). The Preview data interface takes an enormous amount of complexity out of configuring dates and line breaking.
- Splunk Apps: This allows you to find and install more apps from the Splunk Apps Marketplace (http://apps.splunk.com). This marketplace is a useful resource where Splunk users and employees post Splunk apps, mostly free but some premium ones as well.
- Splunk Answers: This is one of your links to the wide amount of Splunk documentation available, specifically http://answers.splunk.com, where you can engage with the Splunk community on Splunkbase (https://splunkbase.splunk.com/) and learn how to get the most out of your Splunk deployment.

The Apps section shows the apps that have GUI elements on your instance of Splunk. App is an overloaded term in Splunk: an app doesn't necessarily have a GUI at all; it is simply a collection of configurations wrapped into a directory structure that means something to Splunk. Search & Reporting is the link to the Splunk Search & Reporting app. Beneath the Search & Reporting link, Splunk provides an outline which, when you hover over it, displays a Find More Apps balloon tip. Clicking on the link opens the same Browse more apps page as the Splunk Apps link mentioned earlier.

Choose a home dashboard provides an intuitive way to select an existing (simple XML) dashboard and set it as part of your Splunk Welcome or Home page. This sets you at a familiar starting point each time you enter Splunk. The following image displays the Choose Default Dashboard dialog. Once you select an existing dashboard from the dropdown list, it will be part of your welcome screen every time you log into Splunk—until you change it. There are no dashboards installed by default after installing Splunk, except the Search & Reporting app. Once you have created additional dashboards, they can be selected as the default.

The top bar

The bar across the top of the window contains information about where you are, as well as quick links to preferences, other apps, and administration. The current app is specified in the upper-left corner. The following image shows the upper-left Splunk bar when using the Search & Reporting app. Clicking on the text takes you to the default page for that app. In most apps, the text next to the logo is simply changed, but the whole block can be customized with logos and alternate text by modifying the app's CSS.

The upper-right corner of the window, as seen in the previous image, contains action links that are almost always available:

- The name of the user who is currently logged in appears first. In this case, the user is Administrator. Clicking on the username allows you to select Edit Account (which will take you to the Your account page) or to Logout (of Splunk). Logout ends the session and forces the user to log in again. The following screenshot shows what the Your account page looks like.

This form presents the global preferences that a user is allowed to change. Other settings that affect users are configured through permissions on objects and settings on roles. (Note: preferences can also be configured using the CLI or by modifying specific Splunk configuration files.)

- Full name and Email address are stored for the administrator's convenience.
- Time zone can be changed for the logged-in user. This is a new feature in Splunk 4.3. Setting the time zone only affects the time zone used to display the data. It is very important that the date is parsed properly when events are indexed.
- Default app controls the starting page after login. Most users will want to change this to search.
- Restart backgrounded jobs controls whether unfinished queries should run again if Splunk is restarted.
- Set password allows you to change your password. This is only relevant if Splunk is configured to use internal authentication.
For instance, if the system is configured to use Windows Active Directory via LDAP (a very common configuration), users must change their password in Windows.

- Messages allows you to view any system-level error messages you may have pending. When there is a new message for you to review, a notification displays as a count next to the Messages menu. You can click the X to remove a message.
- The Settings link presents the user with the configuration pages for all Splunk Knowledge objects, Distributed Environment settings, System and Licensing, Data, and Users and Authentication settings. If you do not see some of these options, you do not have the permissions to view or edit them.
- The Activity menu lists shortcuts to Splunk Jobs, Triggered Alerts, and System Activity views. You can click Jobs (to open the search jobs manager window, where you can view and manage currently running searches), click Triggered Alerts (to view scheduled alerts that are triggered), or click System Activity (to see dashboards about user activity and the status of the system).
- Help lists links to video Tutorials, Splunk Answers, the Splunk Contact Support portal, and online Documentation.
- Find can be used to search for objects within your Splunk Enterprise instance. For example, if you type in error, it returns the saved objects that contain the term error. These saved objects include Reports, Dashboards, Alerts, and so on. You can also search for error in the Search & Reporting app by clicking Open error in search.

The Search & Reporting app

The Search & Reporting app (or just the search app) is where most actions in Splunk start. This app is a dashboard where you will begin your searching.

The summary view

Within the Search & Reporting app, the user is presented with the Summary view, which contains information about the data that the user searches for by default. This is an important distinction—in a mature Splunk installation, not all users will always search all data by default. But at first, if this is your first trip into Search & Reporting, you'll see the following. From the screen depicted in the previous screenshot, you can access the Splunk documentation related to What to Search and How to Search. Once you have at least some data indexed, Splunk will provide some statistics on the available data under What to Search (remember that this reflects only the indexes that this particular user searches by default; there are other events that are indexed by Splunk, including events that Splunk indexes about itself). This is seen in the following image.

In previous versions of Splunk, panels such as the All indexed data panel provided statistics for a user's indexed data. Other panels gave a breakdown of data using three important pieces of metadata—Source, Sourcetype, and Hosts. In the current version—6.2.0—you access this information by clicking on the button labeled Data Summary, which presents the following to the user. This dialog splits the information into three tabs—Hosts, Sources, and Sourcetypes.

- A host is a captured hostname for an event. In the majority of cases, the host field is set to the name of the machine where the data originated. There are cases where this is not known, so the host can also be configured arbitrarily.
- A source in Splunk is a unique path or name. In a large installation, there may be thousands of machines submitting data, but all data on the same path across these machines counts as one source.
When the data source is not a file, the value of the source can be arbitrary, for instance, the name of a script or a network port.

- A source type is an arbitrary categorization of events. There may be many sources, across many hosts, in the same source type. For instance, given the sources /var/log/access.2012-03-01.log and /var/log/access.2012-03-02.log on the hosts fred and wilma, you could reference all these logs with the source type access, or any other name that you like.

Let's move on now and discuss each of the Splunk widgets (just below the app name). The first widget is the navigation bar. As a general rule, within Splunk, items with downward triangles are menus; items without a downward triangle are links. Next we find the Search bar. This is where the magic starts; we'll go into great detail shortly.

Search

Okay, we've finally made it to search. This is where the real power of Splunk lies. For our first search, we will search for the word error (not case specific). Click in the search bar, type the word error, and then either press Enter or click on the magnifying glass to the right of the bar. Upon initiating the search, we are taken to the search results page. Note that the search we just executed was across All time (by default); to change the search time, you can utilize the Splunk time picker.

Actions

Let's inspect the elements on this page. Below the Search bar, we have the event count, action icons, and menus. Starting from the left, we have the following:

- The number of events matched by the base search. Technically, this may not be the number of results pulled from disk, depending on your search. Also, if your query uses commands, this number may not match what is shown in the event listing.
- Job: This opens the Search job inspector window, which provides very detailed information about the query that was run.
- Pause: This causes the current search to stop locating events but keeps the job open. This is useful if you want to inspect the current results to determine whether you want to continue a long-running search.
- Stop: This stops the execution of the current search but keeps the results generated so far. This is useful when you have found enough and want to inspect or share the results found so far.
- Share: This shares the search job. This option extends the job's lifetime to seven days and sets the read permissions to everyone.
- Export: This exports the results. Select this option to output to CSV, raw events, XML, or JavaScript Object Notation (JSON), and to specify the number of results to export.
- Print: This formats the page for printing and instructs the browser to print.
- Smart Mode: This controls the search experience. You can set it to speed up searches by cutting down on the event data it returns and, additionally, by reducing the number of fields that Splunk will extract by default from the data (Fast mode). You can, otherwise, set it to return as much event information as possible (Verbose mode). In Smart mode (the default setting), it toggles search behavior based on the type of search you're running.

Timeline

Now we'll skip to the timeline below the action icons. Along with providing a quick overview of the event distribution over a period of time, the timeline is also a very useful tool for selecting sections of time. Placing the pointer over the timeline displays a pop-up with the number of events in that slice of time. Clicking on the timeline selects the events for a particular slice of time. Clicking and dragging selects a range of time.
Once you have selected a period of time, clicking on Zoom to selection changes the time frame and reruns the search for that specific slice of time. Repeating this process is an effective way to drill down to specific events. Deselect shows all events for the time range selected in the time picker. Zoom out changes the window of time to a larger period around the events in the current time frame.

The field picker

To the left of the search results, we find the field picker. This is a great tool for discovering patterns and filtering search results.

Fields

The field list contains two lists:

- Selected Fields, which have their values displayed under the search event in the search results
- Interesting Fields, which are other fields that Splunk has picked out for you

Above the field list are two links: Hide Fields and All Fields.

- Hide Fields: Hides the field list area from view.
- All Fields: Takes you to the Selected Fields window.

Search results

We are almost through with all the widgets on the page. We still have a number of items to cover in the search results section, though, just to be thorough. As you can see in the previous screenshot, at the top of this section we have the number of events displayed. When viewing all results in their raw form, this number will match the number above the timeline. This value can be changed either by making a selection on the timeline or by using other search commands. Next, we have the action icons (described earlier) that affect these particular results. Under the action icons, we have four results tabs:

- Events list, which will show the raw events. This is the default view when running a simple search, as we have done so far.
- Patterns streamlines event pattern detection. It displays a list of the most common patterns among the set of events returned by your search. Each of these patterns represents a number of events that share a similar structure.
- Statistics populates when you run a search with transforming commands such as stats, top, chart, and so on. The previous keyword search for error does not display any results in this tab because it does not have any transforming commands.
- Visualization transforms searches and also populates the Visualization tab. The results area of the Visualization tab includes a chart and the statistics table used to generate the chart. Not all searches are eligible for visualization.

Under the tabs described just now is the timeline.

Options

Beneath the timeline (starting at the left) is a row of option links that includes:

- Show Fields: shows the Selected Fields screen
- List: allows you to select an output option (Raw, List, or Table) for displaying the search results
- Format: provides the ability to set Result display options, such as Show row numbers, Wrap results, the Max lines (to display), and Drilldown as on or off
- NN Per Page: is where you can indicate the number of results to show per page (10, 20, or 50)

To the right are options that you can use to choose a page of results, and to change the number of events per page. In prior versions of Splunk, these options were available from the Results display options popup dialog.

The events viewer

Finally, we make it to the actual events. Let's examine a single event. Starting at the left, we have:

- Event Details: Clicking here (indicated by the right-facing arrow) opens the selected event, providing specific information about the event by type, field, and value, and allows you the ability to perform specific actions on a particular event field.
In addition, Splunk version 6.2.0 offers a button labeled Event Actions to access workflow actions, a few of which are always available:

- Build Eventtype: Event types are a way to name events that match a certain query.
- Extract Fields: This launches an interface for creating custom field extractions.
- Show Source: This pops up a window with a simulated view of the original source.

Next comes the event number: raw search results are always returned in the order most recent first. Next to appear are any workflow actions that have been configured. Workflow actions let you create new searches or links to other sites, using data from an event. Next comes the parsed date from this event, displayed in the time zone selected by the user. This is an important and often confusing distinction. In most installations, everything is in one time zone—the servers, the user, and the events. When one of these three things is not in the same time zone as the others, things can get confusing. Next, we see the raw event itself. This is what Splunk saw as an event. With no help, Splunk can do a good job of finding the date and breaking lines appropriately, but as we will see later, with a little help, event parsing can be more reliable and more efficient. Below the event are the fields that were selected in the field picker. Clicking on a value adds the field value to the search.

Summary

As you have seen, the Splunk GUI provides a rich interface for working with search results. We have really only scratched the surface and will cover more elements later.

Resources for Article:

Further resources on this subject:
The Splunk Web Framework [Article]
Loading data, creating an app, and adding dashboards and reports in Splunk [Article]
Working with Apps in Splunk [Article]

Bayesian Network Fundamentals

Packt
10 Aug 2015
25 min read
In this article by Ankur Ankan and Abinash Panda, the authors of Mastering Probabilistic Graphical Models Using Python, we'll cover the basics of random variables, probability theory, and graph theory. We'll also see Bayesian models and the independencies in Bayesian models. A graphical model is essentially a way of representing a joint probability distribution over a set of random variables in a compact and intuitive form. There are two main types of graphical models, namely directed and undirected. We generally use a directed model, also known as a Bayesian network, when we mostly have a causal relationship between the random variables. Graphical models also give us tools to operate on these models to find conditional and marginal probabilities of variables, while keeping the computational complexity under control. (For more resources related to this topic, see here.)

Probability theory

To understand the concepts of probability theory, let's start with a real-life situation. Let's assume we want to go for an outing on a weekend. There are a lot of things to consider before going: the weather conditions, the traffic, and many other factors. If the weather is windy or cloudy, then it is probably not a good idea to go out. However, even if we have information about the weather, we cannot be completely sure whether to go or not; hence we have used the words probably or maybe. Similarly, if it is windy in the morning (or at the time we took our observations), we cannot be completely certain that it will be windy throughout the day. The same holds for cloudy weather; it might turn out to be a very pleasant day. Further, we are not completely certain of our observations. There are always some limitations in our ability to observe; sometimes, these observations could even be noisy. In short, uncertainty or randomness is the innate nature of the world. Probability theory provides us the necessary tools to study this uncertainty. It helps us look into options that are unlikely yet probable.

Random variable

Probability deals with the study of events. From our intuition, we can say that some events are more likely than others, but to quantify the likeliness of a particular event, we require probability theory. It helps us predict the future by assessing how likely the outcomes are. Before going deeper into probability theory, let's first get acquainted with the basic terminologies and definitions. A random variable is a way of representing an attribute of the outcome. Formally, a random variable X is a function that maps a possible set of outcomes Ω to some set E, which is represented as follows:

X : Ω → E

As an example, let us consider the outing example again. To decide whether to go or not, we may consider the skycover (to check whether it is cloudy or not). Skycover is an attribute of the day. Mathematically, the random variable skycover (X) is interpreted as a function that maps the day (Ω) to its skycover values (E). So, when we say the event X = 40.1, it represents the set of all the days {ω} such that f_skycover(ω) = 40.1, where f_skycover is the mapping function. Formally speaking, the event is the set {ω : f_skycover(ω) = 40.1}.

Random variables can either be discrete or continuous. A discrete random variable can take only a finite number of values. For example, the random variable representing the outcome of a coin toss can take only two values, heads or tails, and hence it is discrete. A continuous random variable, on the other hand, can take an infinite number of values. For example, a variable representing the speed of a car can take any real value.
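To make the formal definition concrete, here is a minimal Python sketch (our illustration, not from the article); the toy sample space of days and the skycover values are assumptions chosen purely for the example:

# A random variable is just a function X : Omega -> E.
# omega is a toy sample space of days; the skycover values are made up.
omega = {'mon': 40.1, 'tue': 10.0, 'wed': 40.1, 'thu': 73.2}

def skycover(day):
    # The mapping function f_skycover: maps an outcome (a day) to a value in E.
    return omega[day]

# The event "X = 40.1" is the set of all outcomes that map to 40.1.
event = {day for day in omega if skycover(day) == 40.1}
print(event)  # {'mon', 'wed'} (set ordering may vary)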
For any event whose outcome is represented by some random variable (X), we can assign some value to each of the possible outcomes of X, which represents how probable it is. This is known as the probability distribution of the random variable and is denoted by P(X). For example, consider a set of restaurants. Let X be a random variable representing the quality of food in a restaurant. It can take a set of values, such as {good, bad, average}. P(X) represents the probability distribution of X; say, P(X = good) = 0.3, P(X = average) = 0.5, and P(X = bad) = 0.2. This means there is a 30 percent chance of a restaurant serving good food, a 50 percent chance of it serving average food, and a 20 percent chance of it serving bad food.

Independence and conditional independence

In most situations, we are more interested in looking at multiple attributes at the same time. For example, to choose a restaurant, we won't be looking just at the quality of food; we might also want to look at other attributes, such as the cost, location, size, and so on. We can have a probability distribution over a combination of these attributes as well. This type of distribution is known as a joint probability distribution. Going back to our restaurant example, let the random variable for the quality of food be represented by Q, and the cost of food be represented by C. Q can have three categorical values, namely {good, average, bad}, and C can have the values {high, low}. So, the joint distribution P(Q, C) would have probability values for all the combinations of states of Q and C. P(Q = good, C = high) will represent the probability of a pricey restaurant with good quality food, while P(Q = bad, C = low) will represent the probability of a restaurant that is less expensive with bad quality food.

Let us consider another random variable representing an attribute of a restaurant, its location L. The cost of food in a restaurant is affected not only by the quality of food but also by the location (generally, a restaurant located in a very good location would be more costly than a restaurant in a not-so-good location). From our intuition, we can say that the probability of a costly restaurant at a very good location in a city would generally be higher than simply the probability of a costly restaurant, and the probability of a cheap restaurant at a prime location would generally be lower than simply the probability of a cheap restaurant. Formally speaking, P(C = high | L = good) will be different from P(C = high), and P(C = low | L = good) will be different from P(C = low). This indicates that the random variables C and L are not independent of each other.

These attributes or random variables need not always be dependent on each other. For example, the quality of food doesn't depend upon the location of the restaurant. So, P(Q = good | L = good) or P(Q = good | L = bad) would be the same as P(Q = good); that is, our estimate of the quality of food of the restaurant will not change even if we have knowledge of its location. Hence, these random variables are independent of each other. In general, two random variables X and Y can be considered independent of each other if:

P(X, Y) = P(X) P(Y)

They may also be considered independent if:

P(X | Y) = P(X)

We can easily derive this conclusion. We know the following from the chain rule of probability:

P(X, Y) = P(X) P(Y | X)

If Y is independent of X, that is, if X ⊥ Y, then P(Y | X) = P(Y). Then:

P(X, Y) = P(X) P(Y)

Extending this result to multiple variables, we can easily get to the conclusion that a set of random variables is independent of each other if their joint probability distribution is equal to the product of the probabilities of each individual random variable.
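As a quick numerical sanity check (a sketch of ours, not from the article), the following plain Python verifies this product rule on a toy joint distribution over two fair coins:

from itertools import product

# A toy joint distribution over two binary variables X and Y
# ('h' for heads, 't' for tails).
joint = {('h', 'h'): 0.25, ('h', 't'): 0.25,
         ('t', 'h'): 0.25, ('t', 't'): 0.25}

# Compute the marginals P(X) and P(Y) by summing out the other variable.
p_x = {x: sum(p for (x_, y), p in joint.items() if x_ == x) for x in 'ht'}
p_y = {y: sum(p for (x, y_), p in joint.items() if y_ == y) for y in 'ht'}

# X and Y are independent iff P(X, Y) = P(X) P(Y) for every combination.
independent = all(abs(joint[(x, y)] - p_x[x] * p_y[y]) < 1e-9
                  for x, y in product('ht', 'ht'))
print(independent)  # True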
Sometimes, the variables might not be independent of each other. To make this clearer, let's add another random variable, the number of people visiting the restaurant, N. Let's assume that, from our experience, we know that the number of people visiting depends only on the cost of food at the restaurant and its location (generally, fewer people visit costly restaurants). Does the quality of food Q affect the number of people visiting the restaurant? To answer this question, let's look at the random variables affecting N: the cost C and the location L. As C is directly affected by Q, we can conclude that Q affects N. However, let's consider a situation where we know that the restaurant is costly, that is, C = high, and let's ask the same question: "does the quality of food affect the number of people coming to the restaurant?". The answer is no. The number of people coming only depends on the price and location, so if we know that the cost is high, then we can easily conclude that fewer people will visit, irrespective of the quality of food. Hence, (N ⊥ Q | C). This type of independence is called conditional independence.

Installing tools

Let's now see some coding examples using pgmpy, to represent joint distributions and independencies. Here, we will mostly work with IPython and pgmpy (and a few other libraries) for coding examples. So, before moving ahead, let's get a basic introduction to these.

IPython

IPython is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, which offers enhanced introspection, rich media, additional shell syntax, tab completion, and a rich history. IPython provides the following features:

- Powerful interactive shells (terminal and Qt-based)
- A browser-based notebook with support for code, text, mathematical expressions, inline plots, and other rich media
- Support for interactive data visualization and use of GUI toolkits
- Flexible and embeddable interpreters to load into one's own projects
- Easy-to-use, high performance tools for parallel computing

You can install IPython using the following command:

>>> pip3 install ipython

To start the IPython command shell, you can simply type ipython3 in the terminal. For more installation instructions, you can visit http://ipython.org/install.html.

pgmpy

pgmpy is a Python library to work with Probabilistic Graphical Models. As it's currently not on PyPI, we will need to build it manually. You can get the source code from the Git repository using the following command:

>>> git clone https://github.com/pgmpy/pgmpy

Now cd into the cloned directory, switch to the branch used in this book, and build it with the following code:

>>> cd pgmpy
>>> git checkout book/v0.1
>>> sudo python3 setup.py install

For more installation instructions, you can visit http://pgmpy.org/install.html. With both IPython and pgmpy installed, you should now be able to run the examples.
Representing independencies using pgmpy

To represent independencies, pgmpy has two classes, namely IndependenceAssertion and Independencies. The IndependenceAssertion class is used to represent individual assertions of the form (X ⊥ Y) or (X ⊥ Y | Z). Let's see some code to represent assertions:

# Firstly we need to import IndependenceAssertion
In [1]: from pgmpy.independencies import IndependenceAssertion

# Each assertion is in the form of [X, Y, Z] meaning X is
# independent of Y given Z.
In [2]: assertion1 = IndependenceAssertion('X', 'Y')
In [3]: assertion1
Out[3]: (X _|_ Y)

Here, assertion1 represents that the variable X is independent of the variable Y. To represent conditional assertions, we just need to add a third argument to IndependenceAssertion:

In [4]: assertion2 = IndependenceAssertion('X', 'Y', 'Z')
In [5]: assertion2
Out[5]: (X _|_ Y | Z)

In the preceding example, assertion2 represents (X ⊥ Y | Z). IndependenceAssertion also allows us to represent assertions over sets of variables; to do this, we just need to pass lists of random variables as arguments instead of single variable names.

Moving on to the Independencies class, an Independencies object is used to represent a set of assertions. Often, in the case of Bayesian or Markov networks, we have more than one assertion corresponding to a given model, and to represent these independence assertions for the models, we generally use the Independencies object. Let's take a few examples:

In [8]: from pgmpy.independencies import Independencies

# There are multiple ways to create an Independencies object: we
# could either initialize an empty object or initialize it with some
# assertions.
In [9]: independencies = Independencies()  # Empty object
In [10]: independencies.get_assertions()
Out[10]: []

In [11]: independencies.add_assertions(assertion1, assertion2)
In [12]: independencies.get_assertions()
Out[12]: [(X _|_ Y), (X _|_ Y | Z)]

We can also directly initialize Independencies in these two ways:

In [13]: independencies = Independencies(assertion1, assertion2)
In [14]: independencies = Independencies(['X', 'Y'],
                                         ['A', 'B', 'C'])
In [15]: independencies.get_assertions()
Out[15]: [(X _|_ Y), (A _|_ B | C)]

Representing joint probability distributions using pgmpy

We can also represent joint probability distributions using pgmpy's JointProbabilityDistribution class. Let's say we want to represent the joint distribution over the outcomes of tossing two fair coins. In this case, the probability of each of the four possible outcomes would be 0.25, which is shown as follows:

In [16]: from pgmpy.factors import JointProbabilityDistribution as Joint
In [17]: distribution = Joint(['coin1', 'coin2'],
                              [2, 2],
                              [0.25, 0.25, 0.25, 0.25])

Here, the first argument includes the names of the random variables. The second argument is a list of the number of states of each random variable. The third argument is a list of probability values, assuming that the first variable changes its state the slowest. So, the preceding distribution represents the following:

In [18]: print(distribution)
+---------+---------+------------------+
| coin1   | coin2   |   P(coin1,coin2) |
+---------+---------+------------------+
| coin1_0 | coin2_0 |   0.2500         |
+---------+---------+------------------+
| coin1_0 | coin2_1 |   0.2500         |
+---------+---------+------------------+
| coin1_1 | coin2_0 |   0.2500         |
+---------+---------+------------------+
| coin1_1 | coin2_1 |   0.2500         |
+---------+---------+------------------+
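If the "first variable changes its state the slowest" ordering is unclear, this short sketch (our illustration, not from the article) reproduces the row order of the table above:

from itertools import product

# pgmpy lists probability values with the first variable changing its
# state the slowest, which is exactly the order itertools.product yields.
for (c1, c2) in product(['coin1_0', 'coin1_1'], ['coin2_0', 'coin2_1']):
    print(c1, c2)
# coin1_0 coin2_0
# coin1_0 coin2_1
# coin1_1 coin2_0
# coin1_1 coin2_1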
We can also conduct independence queries over these distributions in pgmpy:

In [19]: distribution.check_independence('coin1', 'coin2')
Out[19]: True

Conditional probability distribution

Let's take an example to understand conditional probability better. Let's say we have a bag containing three apples and five oranges, and we want to randomly take out fruits from the bag one at a time without replacing them. Let the random variables X1 and X2 represent the outcomes of the first try and the second try respectively. As there are three apples and five oranges in the bag initially, P(X1 = apple) = 3/8 and P(X1 = orange) = 5/8. Now, let's say that in our first attempt we got an orange. We cannot simply represent the probability of getting an apple or an orange in our second attempt, because the probabilities in the second attempt depend on the outcome of the first attempt; therefore, we use conditional probability to represent such cases. In the second attempt, we will have the following probabilities that depend on the outcome of our first try: P(X2 = apple | X1 = orange) = 3/7, P(X2 = orange | X1 = orange) = 4/7, P(X2 = apple | X1 = apple) = 2/7, and P(X2 = orange | X1 = apple) = 5/7.

The Conditional Probability Distribution (CPD) of two variables X and Y can be represented as P(X | Y), the probability of X given Y, that is, the probability of X after the event Y has occurred and we know its outcome. Similarly, we can have P(Y | X), the probability of Y after having an observation for X.

The simplest representation of a CPD is the tabular CPD. In a tabular CPD, we construct a table containing all the possible combinations of different states of the random variables and the probabilities corresponding to these states. Let's consider the earlier restaurant example. Let's begin by representing the marginal distribution of the quality of food with Q. As we mentioned earlier, it can be categorized into three values {good, normal, bad}. For example, P(Q) can be represented in tabular form as follows:

+---------+------+
| Quality | P(Q) |
+---------+------+
| Good    | 0.3  |
| Normal  | 0.5  |
| Bad     | 0.2  |
+---------+------+

Similarly, let's say P(L) is the probability distribution of the location of the restaurant. It can be represented as follows:

+----------+------+
| Location | P(L) |
+----------+------+
| Good     | 0.6  |
| Bad      | 0.4  |
+----------+------+

As the cost of the restaurant C depends on both the quality of food Q and its location L, we will be considering P(C | Q, L), which is the conditional distribution of C given Q and L:

+----------+---------------------+----------------------+
| Location | Good                | Bad                  |
+----------+------+--------+-----+------+--------+------+
| Quality  | Good | Normal | Bad | Good | Normal | Bad  |
+----------+------+--------+-----+------+--------+------+
| C = High | 0.8  | 0.6    | 0.1 | 0.6  | 0.6    | 0.05 |
| C = Low  | 0.2  | 0.4    | 0.9 | 0.4  | 0.4    | 0.95 |
+----------+------+--------+-----+------+--------+------+
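Before encoding this in pgmpy, here is a quick plain-Python sanity check (our sketch, not from the article): in a valid CPD P(C | Q, L), the probabilities must sum to 1 for every combination of the conditioning variables Q and L.

# The conditional distribution P(C | Q, L) from the table above,
# keyed by (quality, location).
cpd_cost = {
    ('good', 'good'): {'high': 0.8,  'low': 0.2},
    ('normal', 'good'): {'high': 0.6, 'low': 0.4},
    ('bad', 'good'): {'high': 0.1,  'low': 0.9},
    ('good', 'bad'): {'high': 0.6,  'low': 0.4},
    ('normal', 'bad'): {'high': 0.6, 'low': 0.4},
    ('bad', 'bad'): {'high': 0.05, 'low': 0.95},
}

# Every column of a CPD must be a proper distribution.
for (q, l), dist in cpd_cost.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9, (q, l)
print("every column of P(C | Q, L) sums to 1")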
Representing CPDs using pgmpy

Let's first see how to represent the tabular CPD using pgmpy for variables that have no conditional variables:

In [1]: from pgmpy.factors import TabularCPD

# For creating a TabularCPD object we need to pass three
# arguments: the variable name, its cardinality (that is, the number
# of states of the random variable), and the probability value
# corresponding to each state.
In [2]: quality = TabularCPD(variable='Quality',
                             variable_card=3,
                             values=[[0.3], [0.5], [0.2]])
In [3]: print(quality)
+----------------+-----+
| ['Quality', 0] | 0.3 |
+----------------+-----+
| ['Quality', 1] | 0.5 |
+----------------+-----+
| ['Quality', 2] | 0.2 |
+----------------+-----+

In [4]: quality.variables
Out[4]: OrderedDict([('Quality', [State(var='Quality', state=0),
                                  State(var='Quality', state=1),
                                  State(var='Quality', state=2)])])

In [5]: quality.cardinality
Out[5]: array([3])

In [6]: quality.values
Out[6]: array([0.3, 0.5, 0.2])

You can see here that the values of the CPD are a 1D array instead of the 2D array that you passed as an argument. Internally, pgmpy stores the values of a TabularCPD as a flattened numpy array.

In [7]: location = TabularCPD(variable='Location',
                              variable_card=2,
                              values=[[0.6], [0.4]])
In [8]: print(location)
+-----------------+-----+
| ['Location', 0] | 0.6 |
+-----------------+-----+
| ['Location', 1] | 0.4 |
+-----------------+-----+

However, when we have conditional variables, we also need to specify them and their cardinalities. Let's define the TabularCPD for the cost variable (note that the evidence names must match the variable names defined above):

In [9]: cost = TabularCPD(
                variable='Cost',
                variable_card=2,
                values=[[0.8, 0.6, 0.1, 0.6, 0.6, 0.05],
                        [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]],
                evidence=['Quality', 'Location'],
                evidence_card=[3, 2])
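To see where these CPDs are headed, the following sketch (ours, not from the article; it assumes the BayesianModel class available in the pgmpy version pinned above, which newer releases rename to BayesianNetwork) wires the three CPDs into a small network:

from pgmpy.models import BayesianModel

# Edges point from parent to child: quality and location determine cost.
model = BayesianModel([('Quality', 'Cost'), ('Location', 'Cost')])

# Attach the TabularCPDs defined above to the network.
model.add_cpds(quality, location, cost)

# check_model() verifies that the CPDs match the structure and sum to 1.
print(model.check_model())  # True if everything is consistent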
Graph theory

The second major framework for the study of probabilistic graphical models is graph theory. Graphs are the skeleton of PGMs, and are used to compactly encode the independence conditions of a probability distribution.

Nodes and edges

The foundation of graph theory was laid by Leonhard Euler when he solved the famous Seven Bridges of Konigsberg problem. The city of Konigsberg was set on both sides of the Pregel river and included two islands that were connected to each other and to the mainland by seven bridges. The problem was to find a walk that crosses each of the bridges exactly once. To visualize the problem, let's think of the graph in Fig 1.1:

Fig 1.1: The Seven Bridges of Konigsberg graph

Here, the nodes a, b, c, and d represent the land, and are known as the vertices of the graph. The line segments ab, bc, cd, da, ab, and bc connecting the land parts are the bridges and are known as the edges of the graph. So, we can think of the problem of crossing all the bridges once in a single walk as tracing along all the edges of the graph without lifting our pencils.

Formally, a graph G = (V, E) is an ordered pair of finite sets. The elements of the set V are known as the nodes or the vertices of the graph, and the elements of E are the edges or the arcs of the graph. The number of nodes, or the cardinality of V, denoted by |V|, is known as the order of the graph. Similarly, the number of edges, denoted by |E|, is known as the size of the graph. Here, we can see that the Konigsberg city graph shown in Fig 1.1 is of order 4 and size 7.

In a graph, we say that two vertices u, v ∈ V are adjacent if (u, v) ∈ E. In the City graph, all four vertices are adjacent to each other because there is an edge for every possible combination of two vertices in the graph. Also, for a vertex v ∈ V, we define the neighbors set of v as N(v) = {u | (u, v) ∈ E}. In the City graph, we can see that b and d are neighbors of c. Similarly, a, b, and c are neighbors of d.

We define an edge to be a self loop if the start vertex and the end vertex of the edge are the same. We can put it more formally: any edge of the form (u, u), where u ∈ V, is a self loop.

Until now, we have been talking only about graphs whose edges don't have a direction associated with them, which means that the edge (u, v) is the same as the edge (v, u). These types of graphs are known as undirected graphs. Similarly, we can think of a graph whose edges have a sense of direction associated with them. For these graphs, the edge set E would be a set of ordered pairs of vertices. These types of graphs are known as directed graphs. In the case of a directed graph, we also define the indegree and outdegree of a vertex. For a vertex v ∈ V, we define its outdegree as the number of edges originating from the vertex v, that is, |{u : (v, u) ∈ E}|. Similarly, the indegree is defined as the number of edges that end at the vertex v, that is, |{u : (u, v) ∈ E}|.

Walk, paths, and trails

For a graph G = (V, E) and u, v ∈ V, we define a u - v walk as an alternating sequence of vertices and edges, starting with u and ending with v. In the City graph of Fig 1.1, an example of an a - d walk is W : a, (a, b), b, (b, c), c, (c, d), d. If there aren't multiple edges between the same vertices, then we simply represent a walk by a sequence of vertices, as in the case of the Butterfly graph shown in Fig 1.2, where we can have a walk W : a, c, d, c, e:

Fig 1.2: Butterfly graph, an undirected graph

A walk with no repeated edges is known as a trail. For example, the walk a, b, c, d, a in the City graph is a trail. Also, a walk with no repeated vertices, except possibly the first and the last, is known as a path. For example, the walk a, b, c, d in the City graph is a path. Also, a graph is known as cyclic if there are one or more paths that start and end at the same node. Such paths are known as cycles. Similarly, if there are no cycles in a graph, it is known as an acyclic graph.

Bayesian models

In most real-life cases, when we are representing or modeling some event, we have to deal with a lot of random variables. Even if we consider all the random variables to be discrete, there would still be an exponentially large number of values in the joint probability distribution. Dealing with such a huge amount of data would be computationally expensive (and in some cases, even intractable), and it would also require a huge amount of memory to store the probability of each combination of states of these random variables. However, in most cases, many of these variables are marginally or conditionally independent of each other. By exploiting these independencies, we can reduce the number of values we need to store to represent the joint probability distribution.

For instance, in the previous restaurant example, the joint probability distribution across the four random variables that we discussed (that is, the quality of food Q, the location of the restaurant L, the cost of food C, and the number of people visiting N) would require us to store 23 independent values (taking N to be binary, the joint table has 3 × 2 × 2 × 2 = 24 entries, of which 23 are free parameters because the entries must sum to 1). By the chain rule of probability, we know the following:

P(Q, L, C, N) = P(Q) P(L|Q) P(C|L, Q) P(N|C, Q, L)

Now, let us try to exploit the marginal and conditional independence between the variables to make the representation more compact. Let's start by considering the independence between the location of the restaurant and the quality of food there. As both of these attributes are independent of each other, P(L|Q) would be the same as P(L). Therefore, we need to store only one parameter to represent it. From the conditional independence that we saw earlier, we know that (N ⊥ Q | C, L). Thus, P(N|C, Q, L) would be the same as P(N|C, L), needing only four parameters. Therefore, we now need only (2 + 1 + 6 + 4 = 13) parameters to represent the whole distribution. We can conclude that exploiting independencies helps in the compact representation of the joint probability distribution. This forms the basis for the Bayesian network.
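These parameter counts are easy to verify in code; the following sketch (ours, not from the article, and it assumes N is binary as above) recomputes both numbers:

# Free parameters of the full joint P(Q, L, C, N):
# Q has 3 states; L, C, and N have 2 each; the table's entries sum to 1.
full_joint = 3 * 2 * 2 * 2 - 1
print(full_joint)  # 23

# Free parameters of the factorized form P(Q) P(L) P(C|Q,L) P(N|C,L):
# each CPD column over a k-state variable contributes k - 1 parameters.
p_q = 3 - 1                      # 2
p_l = 2 - 1                      # 1
p_c_given_ql = 3 * 2 * (2 - 1)   # 6 conditioning combos, binary C
p_n_given_cl = 2 * 2 * (2 - 1)   # 4 conditioning combos, binary N
print(p_q + p_l + p_c_given_ql + p_n_given_cl)  # 13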
Representation

A Bayesian network is represented by a Directed Acyclic Graph (DAG) and a set of Conditional Probability Distributions (CPDs) in which:

- The nodes represent random variables
- The edges represent dependencies
- For each of the nodes, we have a CPD

In our previous restaurant example, the nodes would be as follows:

- Quality of food (Q)
- Location (L)
- Cost of food (C)
- Number of people (N)

As the cost of food depends on the quality of food (Q) and the location of the restaurant (L), there will be an edge each from Q → C and L → C. Similarly, as the number of people visiting the restaurant depends on the price of food and its location, there will be an edge each from L → N and C → N. The resulting structure of our Bayesian network is shown in Fig 1.3:

Fig 1.3: Bayesian network for the restaurant example

Factorization of a distribution over a network

Each node in our Bayesian network for restaurants has a CPD associated with it. For example, the CPD for the cost of food in the restaurant is P(C|Q, L), as it depends only on the quality of food and the location. For the number of people, it would be P(N|C, L). So, we can generalize that the CPD associated with each node is P(node | Par(node)), where Par(node) denotes the parents of the node in the graph. Assuming some probability values, we will finally get a network as shown in Fig 1.4:

Fig 1.4: Bayesian network of the restaurant along with CPDs

Let us go back to the joint probability distribution of all these attributes of the restaurant. Considering the independencies among the variables, we concluded the following:

P(Q, C, L, N) = P(Q) P(L) P(C|Q, L) P(N|C, L)

So now, looking at the Bayesian network (BN) for the restaurant, we can say that for any Bayesian network, the joint probability distribution over all its random variables {X1, X2, ..., Xn} can be represented as follows:

P(X1, X2, ..., Xn) = Π P(Xi | Par(Xi))

This is known as the chain rule for Bayesian networks. Also, we say that a distribution P factorizes over a graph G if P can be encoded as follows:

P(X1, X2, ..., Xn) = Π P(Xi | ParG(Xi))

Here, ParG(Xi) denotes the parents of Xi in the graph G.

Summary

In this article, we saw how we can represent a complex joint probability distribution using a directed graph and a conditional probability distribution associated with each node, which is collectively known as a Bayesian network.

Resources for Article:

Further resources on this subject:

- Web Scraping with Python [article]
- Exact Inference Using Graphical Models [article]
- wxPython: Design Approaches and Techniques [article]


NLTK for hackers

Packt
07 Aug 2015
9 min read
In this article written by Nitin Hardeniya, author of the book NLTK Essentials, we will learn that "Life is short, we need Python" is the mantra I follow and truly believe in. As fresh graduates, we learned and worked mostly with C/C++/Java. While these languages have amazing features, Python has a charm of its own. The day I started using Python, I loved it. I really did. The big coincidence here is that I finally ended up working with Python during my initial projects on the job. I started to love the data structures, libraries, and ecosystem Python has for beginners as well as for expert programmers. (For more resources related to this topic, see here.)

Python as a language has advanced very fast and spread widely. If you are a machine learning / natural language processing enthusiast, then Python is 'the' go-to language these days. Python has some amazing ways of dealing with strings, a very easy and elegant coding style, and most importantly a long list of open libraries. I can go on and on about Python and my love for it. But here I want to talk specifically about NLTK (Natural Language Toolkit), one of the most popular Python libraries for natural language processing.

NLTK is simply awesome, and in my opinion, it's the best way to learn and implement some of the most complex NLP concepts. NLTK has a variety of generic text preprocessing tools, such as tokenization, stop word removal, and stemming, and at the same time has some very NLP-specific tools, such as part of speech tagging, chunking, named entity recognition, and dependency parsing. NLTK provides some of the easiest solutions to all the above stages of NLP, and that's why it is the most preferred library for any text processing/text mining application. NLTK not only provides some pretrained models that can be applied directly to your dataset, it also provides ways to customize and build your own taggers, tokenizers, and so on. NLTK is a big library that has many tools available for an NLP developer. I have provided a cheat sheet of some of the most common steps and their solutions using NLTK. In our book, NLTK Essentials, I have tried to give you enough information to deal with all these processing steps using NLTK.

To show you the power of NLTK, let's try to develop a very easy application that finds topics in unstructured text and displays them as a word cloud.

Word cloud with NLTK

Instead of going further into the theoretical aspects of natural language processing, let's start with a quick dive into NLTK. I am going to start with some basic example use cases of NLTK. There is a good chance that you have already done something similar. First, I will give a typical Python programmer approach and then move on to NLTK for a much more efficient, robust, and clean solution.

We will start analyzing with some example text content:

>>> import urllib2
>>> # urllib2 is used to download the html content of the web link
>>> response = urllib2.urlopen('http://python.org/')
>>> # You can read the entire content of a file using the read() method
>>> html = response.read()
>>> print len(html)
47020

For the current example, I have taken the content from Python's home page: https://www.python.org/. We don't have any clue about the kind of topics that are discussed in this URL, so let's say that we want to start an exploratory data analysis (EDA). Typically in a text domain, EDA can have many meanings, but we will go with a simple case: what kinds of terms dominate the document? What are the topics? How frequent are they?
The process will involve some level of preprocessing. We will try to do this in a pure Python way first, and then we will do it using NLTK.

Let's start with cleaning the html tags. One way to do this is to select just the tokens, including numbers and characters. Anybody who has worked with regular expressions should be able to convert the html string into a list of tokens:

>>> # a naive split of the string on whitespace
>>> tokens = [tok for tok in html.split()]
>>> print "Total no of tokens :" + str(len(tokens))
>>> # first 100 tokens
>>> print tokens[0:100]
Total no of tokens :2860
['<!doctype', 'html>', '<!--[if', 'lt', 'IE', '7]>', '<html', 'class="no-js', 'ie6', 'lt-ie7', 'lt-ie8', 'lt-ie9">', '<![endif]-->', '<!--[if', 'IE', '7]>', '<html', 'class="no-js', 'ie7', 'lt-ie8', 'lt-ie9">', '<![endif]-->', ''type="text/css"', 'media="not', 'print,', 'braille,' ...]

As you can see, there is an excess of html tags and other unwanted characters when we use the preceding method. A cleaner version of the same task will look something like this:

>>> import re
>>> # using re.split, https://docs.python.org/2/library/re.html
>>> tokens = re.split('\W+', html)
>>> print len(tokens)
>>> print tokens[0:100]
5787
['', 'doctype', 'html', 'if', 'lt', 'IE', '7', 'html', 'class', 'no', 'js', 'ie6', 'lt', 'ie7', 'lt', 'ie8', 'lt', 'ie9', 'endif', 'if', 'IE', '7', 'html', 'class', 'no', 'js', 'ie7', 'lt', 'ie8', 'lt', 'ie9', 'endif', 'if', 'IE', '8', 'msapplication', 'tooltip', 'content', 'The', 'official', 'home', 'of', 'the', 'Python', 'Programming', 'Language', 'meta', 'name', 'apple' ...]

This looks much cleaner now, but you can still do more; I leave it to you to remove as much noise as you can. You could, for example, use word length as a criterion and remove words of length one; that would remove elements such as 7, 8, and so on, which are just noise in this case.

Now let's turn to NLTK for the same task. There is a function called clean_html() that can do all the work we were looking for:

>>> import nltk
>>> # http://www.nltk.org/api/nltk.html#nltk.util.clean_html
>>> clean = nltk.clean_html(html)
>>> # clean will have the entire string with all the html noise removed
>>> tokens = [tok for tok in clean.split()]
>>> print tokens[:100]
['Welcome', 'to', 'Python.org', 'Skip', 'to', 'content', '&#9660;', 'Close', 'Python', 'PSF', 'Docs', 'PyPI', 'Jobs', 'Community', '&#9650;', 'The', 'Python', 'Network', '&equiv;', 'Menu', 'Arts', 'Business' ...]

Cool, right? This is definitely much cleaner and easier to do.

No analysis in any EDA can start without a frequency distribution. Let's first compute one the plain Python way:

>>> import operator
>>> freq_dis = {}
>>> for tok in tokens:
...     if tok in freq_dis:
...         freq_dis[tok] += 1
...     else:
...         freq_dis[tok] = 1
>>> # We want to sort this dictionary on values ( freq in this case )
>>> sorted_freq_dist = sorted(freq_dis.items(), key=operator.itemgetter(1), reverse=True)
>>> print sorted_freq_dist[:25]
[('Python', 55), ('>>>', 23), ('and', 21), ('to', 18), (',', 18), ('the', 14), ('of', 13), ('for', 12), ('a', 11), ('Events', 11), ('News', 11), ('is', 10), ('2014-', 10), ('More', 9), ('#', 9), ('3', 9), ('=', 8), ('in', 8), ('with', 8), ('Community', 7), ('The', 7), ('Docs', 6), ('Software', 6), (':', 6), ('3:', 5), ('that', 5), ('sum', 5)]

Naturally, as this is Python's home page, Python and the >>> interpreter prompt are the most common terms, which also gives a sense of the website.
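Before moving to NLTK, it's worth noting that the standard library already improves on the manual dictionary above; this small sketch (ours, not from the article) does the same count with collections.Counter:

>>> # Counter does the counting and the sorting-by-frequency in one step.
>>> from collections import Counter
>>> freq_dis = Counter(tokens)
>>> print freq_dis.most_common(25)  # same top-25 list as the manual version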
A better and more efficient approach is to use NLTK's FreqDist() function. For this, we will take a look at the same code we developed before:

>>> import nltk
>>> Freq_dist_nltk = nltk.FreqDist(tokens)
>>> print Freq_dist_nltk
>>> for k, v in Freq_dist_nltk.items():
...     print str(k) + ':' + str(v)
<FreqDist: 'Python': 55, '>>>': 23, 'and': 21, ',': 18, 'to': 18, 'the': 14, 'of': 13, 'for': 12, 'Events': 11, 'News': 11, ...>
Python:55
>>>:23
and:21
,:18
to:18
the:14
of:13
for:12
Events:11
News:11

Let's now do something more interesting and plot this:

>>> Freq_dist_nltk.plot(50, cumulative=False)
>>> # below is the plot for the frequency distributions

We can see that the frequency drops off quickly after the first few words, and the curve goes into a long tail. Still, there is some noise: there are words such as the, of, for, and =. These are useless words, and there is a term for them: stop words, such as the, a, and an. Articles and pronouns are present in most documents; hence, they are not discriminative enough to be informative. In most NLP and information retrieval tasks, people generally remove stop words. Let's go back again to our running example:

>>> stopwords = [word.strip().lower() for word in open("PATH/english.stop.txt")]
>>> clean_tokens = [tok for tok in tokens if len(tok.lower()) > 1 and (tok.lower() not in stopwords)]
>>> Freq_dist_nltk = nltk.FreqDist(clean_tokens)
>>> Freq_dist_nltk.plot(50, cumulative=False)

This looks much cleaner now! After finishing this much, you should be able to generate a word cloud from the cleaned tokens. Please go to http://www.wordle.net/advanced for more word clouds.

Summary

To summarize, this article was intended to give you a brief introduction to natural language processing. The book does assume some background in NLP and programming in Python, but we have tried to give a very quick head start to Python and NLP.

Resources for Article:

Further resources on this subject:

- Hadoop Monitoring and its aspects [Article]
- Big Data Analysis (R and Hadoop) [Article]
- SciPy for Signal Processing [Article]