
How-To Tutorials


Sending and Syncing Data

Packt
10 Aug 2015
4 min read
This article, by Steven F. Daniel, author of the book Android Wearable Programming, will give you the background and understanding you need to effectively build applications that communicate between an Android handheld device and an Android wearable. Android Wear ships with a number of APIs that make communication between the handheld and the wearable a breeze. We will learn the differences between the MessageApi, which is sometimes referred to as "fire and forget" messaging; the DataApi (the heart of the Data Layer API), which supports syncing data between a handheld and a wearable; and the NodeApi, which handles events related to local and connected device nodes.

Creating a wearable send and receive application

In this section, we will look at how to create an Android wearable application that sends an image and a message and displays them on the wearable device. In the next sections, we will walk through the steps required to send data to the Android wearable using the DataApi, NodeApi, and MessageApi. First, create a new project in Android Studio by following these simple steps:

1. Launch Android Studio, and then click on the File | New Project menu option.
2. Next, enter SendReceiveData in the Application name field. Then, provide a name for the Company Domain field. Now, choose Project location and select where you would like to save your application code. Click on the Next button to proceed to the next step.
3. Next, we need to specify the form factors on which our application will run. On this screen, we choose the minimum SDK version for our phone/tablet and Android Wear. Click on the Phone and Tablet option and choose API 19: Android 4.4 (KitKat) for Minimum SDK. Click on the Wear option and choose API 21: Android 5.0 (Lollipop) for Minimum SDK. Click on the Next button to proceed to the next step.
4. In the next step, we add a Blank Activity to our application project for the mobile section of our app. From the Add an activity to Mobile screen, choose the Blank Activity option from the list of activities shown and click on the Next button to proceed to the next step.
5. Next, we customize the properties of the Blank Activity so that our application can use it. Here we specify the name of our activity, the layout information, the title, and the menu resource file. From the Customize the Activity screen, enter MobileActivity for Activity Name and click on the Next button to proceed to the next step in the wizard.
6. In the next step, we add a Blank Wear Activity to our application project for the Android wearable section of our app. From the Add an activity to Wear screen, choose the Blank Wear Activity option from the list of activities shown and click on the Next button to proceed to the next step.
7. Next, we customize the properties of the Blank Wear Activity so that our Android wearable can use it. Here we specify the name of our activity and the layout information. From the Customize the Activity screen, enter WearActivity for Activity Name and click on the Next button to proceed to the next step in the wizard.
8. Finally, click on the Finish button. The wizard will generate your project, and after a few moments the Android Studio window will appear with your project displayed.
Summary

In this article, we learned about three APIs, DataApi, NodeApi, and MessageApi, and how we can use them and their associated methods to transmit information between the handheld mobile and the wearable. If, for whatever reason, the connected wearable node gets disconnected from the paired handheld device, the DataApi class is smart enough to automatically try sending again once the connection is reestablished.

Resources for Article:

Further resources on this subject:
- Speeding up Gradle builds for Android [article]
- Saying Hello to Unity and Android [article]
- Testing with the Android SDK [article]


Securing OpenStack Networking

Packt
10 Aug 2015
10 min read
In this article by Fabio Alessandro Locati, author of the book OpenStack Cloud Security, you will learn about the importance of firewalls, IDSes, and IPSes. You will also learn about Generic Routing Encapsulation (GRE) and VXLAN.

The importance of firewall, IDS, and IPS

The security of a network can and should be achieved in multiple ways. Three components that are critical to the security of a network are:

- Firewall
- Intrusion detection system (IDS)
- Intrusion prevention system (IPS)

Firewall

Firewalls are systems that control the traffic passing through them based on rules. This can sound like a router, but they are very different: a router allows communication between different networks, while a firewall limits communication between networks and hosts. The root of this confusion may be that very often a router has firewall functionality and vice versa. Firewalls need to be connected in series with your infrastructure.

The first paper on firewall technology appeared in 1988 and described the packet filter firewall, often known as the first generation firewall. This kind of firewall analyzes the packets passing through, and if a packet matches a rule, the firewall acts according to that rule. It analyzes each packet by itself, without considering other aspects such as other packets. It works on the first three layers of the OSI model, with very few features at layer 4, used specifically to check port numbers and protocols (UDP/TCP). First generation firewalls are still in use because, in a lot of situations, they do the job properly and are cheap and secure. Typical filtering rules for these firewalls prohibit (or allow) IPs of certain classes (or specific IPs) from accessing certain IPs, or allow traffic to a specific IP only on specific ports. There are no known attacks against this kind of firewall as such, but specific models can have specific bugs that can be exploited.

In 1990, a new generation of firewalls appeared. The initial name was circuit-level gateway, but today they are far more commonly known as stateful firewalls or second generation firewalls. These firewalls are able to understand when connections are being initialized and closed, so the firewall knows the current state of a connection when a packet arrives. To do so, this kind of firewall uses the first four layers of the networking stack. This allows the firewall to drop all packets that are neither establishing a new connection nor part of an already established connection. These firewalls are very powerful with the TCP protocol because it has states, while they offer very small advantages over first generation firewalls when handling UDP or ICMP packets, since those packets travel with no connection. In these cases, the firewall marks the connection as established as soon as the first valid packet passes through, and closes it after the connection times out. Performance-wise, a stateful firewall can be faster than a packet filter firewall because, if a packet is part of an active connection, no further tests are performed against it. These firewalls are more susceptible to bugs in their code, since reading more of the packet makes exploitation easier. Also, on many devices, it is possible to open connections (with SYN packets) until the firewall is saturated. In such cases, the firewall usually downgrades itself to a simple router, allowing all traffic to pass through it.
In 1991, improvements were made to the stateful firewall that allowed it to understand more about the protocol of the packets it was evaluating. Before 1994, firewalls of this kind had major problems, such as working as a proxy that the user had to interact with. In 1994, the first application firewall as we know it was born, doing its entire job completely transparently. To understand the protocol, this kind of firewall requires an understanding of all seven layers of the OSI model. As for security, the same considerations that apply to the stateful firewall apply to the application firewall as well.

Intrusion detection system (IDS)

IDSes are systems that monitor network traffic, looking for policy violations and malicious traffic. The goal of an IDS is not to block malicious activity, but instead to log and report it. These systems act in a passive mode, so you'll not see any traffic coming from them. This is very important because it makes them invisible to attackers, so you can gain information about the attack without the attacker knowing. IDSes need to be connected in parallel to your infrastructure.

Intrusion prevention system (IPS)

IPSes are sometimes referred to as Intrusion Detection and Prevention Systems (IDPS), since they are IDSes that are also able to fight back against malicious activity. IPSes have greater possibilities to act than IDSes. Other than reporting, like an IDS, they can also drop malicious packets, reset the connection, and block traffic from the offending IP address. IPSes need to be connected in series with your infrastructure.

Generic Routing Encapsulation (GRE)

GRE is a Cisco tunneling protocol that is difficult to position in the OSI model; the best place for it is between layers 2 and 3. Being above layer 2 (where VLANs are), we can use GRE inside a VLAN. We will not go deep into the technicalities of this protocol; I'd like to focus more on the advantages and disadvantages it has over VLAN.

The first advantage of (extended) GRE over VLAN is scalability. You cannot have more than 4,096 VLANs in an environment, while GRE tunnels do not have this limitation. If you are running a private cloud in a small corporation, 4,096 networks could be enough, but they will definitely not be enough if you work for a big corporation or if you are running a public cloud. Also, unless you use VTP for your VLANs, you'll have to add VLANs to each network device, while GRE tunnels don't need this.

The second advantage is security. Since you can deploy multiple GRE tunnels in a single VLAN, you can connect a machine to a single VLAN and multiple GRE networks without the risks that come with putting a port in trunking mode, which would be needed to bring more VLANs to the same physical port. For these reasons, GRE was a very common choice in a lot of OpenStack clusters deployed up to OpenStack Havana. The current preferred networking choice (since Icehouse) is Virtual Extensible LAN (VXLAN).

VXLAN

VXLAN is a network virtualization technology whose specification was originally created by Arista Networks, Cisco, and VMware, and many other companies have backed the project. Its goal is to offer a standardized overlay encapsulation protocol, and it was created because standard VLANs were too limited for current cloud needs and the GRE protocol was a Cisco protocol. It works by carrying layer 2 Ethernet frames within layer 4 UDP packets on port 4789. As for the maximum number of networks, the limit is 16 million logical networks.
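To make this concrete, here is a rough sketch of how a VXLAN-backed network might be created programmatically with the python-neutronclient library of that era. This is our illustration, not code from the book: the endpoint, credentials, network name, and segmentation ID are assumptions, and creating provider attributes requires admin rights.

```python
# Sketch only: assumes python-neutronclient (Icehouse era) and Keystone v2
# credentials; all names and values below are hypothetical.
from neutronclient.v2_0 import client

neutron = client.Client(username='admin',
                        password='secret',
                        tenant_name='admin',
                        auth_url='http://controller:5000/v2.0')

# Create a network explicitly backed by a VXLAN segment. The
# segmentation ID is the VNI, one of the ~16 million available.
network = neutron.create_network({
    'network': {
        'name': 'tenant-net',
        'provider:network_type': 'vxlan',
        'provider:segmentation_id': 101,
    }
})
print(network['network']['id'])
```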
Since the Icehouse release, the suggested standard for networking is VXLAN.

Flat network versus VLAN versus GRE in OpenStack Quantum

In OpenStack Quantum, you can choose among multiple technologies for your networks: flat networks, VLAN, GRE, and the most recent, VXLAN. Let's discuss them in detail:

- Flat network: It is often used in private clouds, since it is very easy to set up. The downside is that any virtual machine will see every other virtual machine in our cloud. I strongly discourage people from using this network design because it's unsafe and, in the long run, it will cause problems, as we have seen earlier.
- VLAN: It is sometimes used in bigger private clouds and sometimes even in small public clouds. The advantage is that many times you already have a VLAN-based installation in your company. The major disadvantages are the need to trunk ports for each physical host and possible problems in propagation. I discourage this approach, since in my opinion the advantages are very limited while the disadvantages are pretty strong.
- VXLAN: It should be used in any kind of cloud due to its technical advantages. It allows a huge number of networks, it's way more secure, and it often eases debugging.
- GRE: Until the Havana release, it was the suggested protocol, but since the Icehouse release the suggestion has been to move toward VXLAN, where the majority of the development is focused.

Design a secure network for your OpenStack deployment

As with the physical infrastructure, we have to design the network securely. We have seen that network security is critical and that there are a lot of possible attacks in this realm. Is it possible to design a secure environment to run OpenStack? Yes it is, if you remember a few rules:

- Create different networks, at the very least for management and external data (this network usually already exists in your organization and is the one where all your clients are)
- Never put ports in trunking mode if you use VLANs in your infrastructure; otherwise, physically separated networks will be needed

In such a layout, the management, tenant, and external networks could be either VLANs or real networks. Remember that to avoid VLAN trunking, you need at least as many physical ports as the number of VLANs the machine has to be subscribed to; port trunking can be a huge security hole.

- A management network is needed for the administrator to administer the machines and for the OpenStack services to speak to each other. This network is critical, since it may contain sensitive data, and for this reason it has to be disconnected from other networks or, if that's not possible, have very limited connectivity.
- The external network is used by virtual machines to access the Internet (and vice versa). In this network, all machines will need an IP address reachable from the Web.
- The tenant network, sometimes called the internal or guest network, is the network where the virtual machines can communicate with other virtual machines in the same cloud. This network, in some deployment cases, can be merged with the external network, but this choice has some security drawbacks.
- The API network is used to expose OpenStack APIs to the users. This network requires IP addresses reachable from the Web, and for this reason is often merged into the external network.
- There are cases where provider networks are needed to connect tenant networks to existing networks outside the OpenStack cluster.
Those networks are created by the OpenStack administrator and map directly to an existing physical network in the data center.

Summary

In this article, we have seen how networking works, which attacks we can expect, and how we can counter them. We have also seen how to implement a secure deployment of OpenStack Networking.

Resources for Article:

Further resources on this subject:
- Cloud distribution points [Article]
- Photo Stream with iCloud [Article]
- Integrating Accumulo into Various Cloud Platforms [Article]


Share and Share Alike

Packt
10 Aug 2015
13 min read
In this article by Kevin Harvey, author of the book Test-Driven Development with Django, we'll expose the data in our application via a REST API. As we do, we'll learn:

- The importance of documentation in the API development process
- How to write functional tests for API endpoints
- API patterns and best practices

It's an API world, we're just coding in it

It's very common nowadays to include a public REST API in your web project. Exposing your services or data to the world is generally done for one of two reasons:

- You've got interesting data, and other developers might want to integrate that information into a project they're working on
- You're building a secondary system that you expect your users to interact with, and that system needs to interact with your data (that is, a mobile or desktop app, or an AJAX-driven frontend)

We've got both reasons in our application. We're housing novel, interesting data in our database that someone might want to access programmatically. Also, it would make sense to build a desktop application that could interact with a user's own digital music collection, so they could actually hear the solos we're storing in our system.

Deceptive simplicity

The good news is that there are some great options for third-party Django plugins that allow you to build a REST API into an existing application. The bad news is that the simplicity of adding one of these packages can let you go off half-cocked, throwing an API on top of your project without a real plan for it. If you're lucky, you'll just wind up with a bird's nest of an API: inconsistent URLs, wildly varying payloads, and difficult authentication. In the worst-case scenario, your bolt-on API exposes data you didn't intend to make public, and you wind up with a self-inflicted security issue.

Never forget that an API is sort of invisible. Unlike traditional web pages, where bugs are very public and easy to describe, API bugs are only visible to other developers. Take special care to make sure your API behaves exactly as intended by writing thorough documentation and tests to make sure you've implemented it correctly.

Writing documentation first

"Documentation is king." - Kenneth Reitz

If you've spent any time at all working with Python or Django, you know what good documentation looks like. The Django folks in particular seem to understand this well: the key to getting developers to use your code is great documentation. In documenting an API, be explicit. Most of your API methods' docs should take the form of "if you send this, you will get back this", with real-world examples of input and output. A great side effect of prewriting documentation is that it makes the intention of your API crystal clear. You're allowing yourself to conjure up the API from thin air without getting bogged down in any of the details, so you can get a bird's-eye view of what you're trying to accomplish. Your documentation will keep you oriented throughout the development process.

Documentation-driven testing

Once you've got your documentation done, testing is simply a matter of writing test cases that match up with what you've promised. The actions of the test methods exercise HTTP methods, and your assertions check the responses. Test-Driven Development really shines when it comes to API development. There are great tools for sending JSON over the wire, but properly formatting JSON can be a pain, and reading it can be worse.
Enshrining test JSON in test methods and asserting it matches the real responses will save you a ton of headaches.

More developers, more problems

Good documentation and test coverage are exponentially more important when two groups are developing in tandem—one on the client application and one on the API. Changes to an API are hard for teams like this to deal with, and should come with a lot of warning (and apologies). If you have to make a change to an endpoint, it should break a lot of tests, and you should methodically go and fix them all. What's more, no one feels the pain of regression bugs like the developer of an API-consuming client. You really, really, really need to know that all the endpoints you've put out there are still going to work when you add features or refactor.

Building an API with Django REST Framework

Now that you're properly terrified of developing an API, let's get started. What sort of capabilities should we add? Here are a couple of possibilities:

- Exposing the Album, Track, and Solo information we have
- Creating new Solos or updating existing ones

Initial documentation

In the Python world it's very common for documentation to live in docstrings, as it keeps the description of how to use an object close to the implementation. We'll eventually do the same with our docs, but it's kind of hard to write a docstring for a method that doesn't exist yet. Let's open up a new Markdown file, API.md, right in the root of the project, just to get us started. If you've never used Markdown before, you can read an introduction to GitHub's version of Markdown at https://help.github.com/articles/markdown-basics/.

Here's a sample of what should go in API.md. Have a look at https://github.com/kevinharvey/jmad/blob/master/API.md for the full, rendered version.

```
# Get a Track with Solos

* URL: /api/tracks/<pk>/
* HTTP Method: GET

## Example Response

{
  "name": "All Blues",
  "slug": "all-blues",
  "album": {
    "name": "Kind of Blue",
    "url": "http://jmad.us/api/albums/2/"
  },
  "solos": [
    {
      "artist": "Cannonball Adderley",
      "instrument": "saxophone",
      "start_time": "4:05",
      "end_time": "6:04",
      "slug": "cannonball-adderley",
      "url": "http://jmad.us/api/solos/281/"
    },
    ...
  ]
}

# Add a Solo to a Track

* URL: /api/solos/
* HTTP Method: POST

## Example Request

{
  "track": "/api/tracks/83/",
  "artist": "Don Cherry",
  "instrument": "cornet",
  "start_time": "2:13",
  "end_time": "3:54"
}

## Example Response

{
  "url": "http://jmad.us/api/solos/64/",
  "artist": "Don Cherry",
  "slug": "don-cherry",
  "instrument": "cornet",
  "start_time": "2:13",
  "end_time": "3:54",
  "track": "http://jmad.us/api/tracks/83/"
}
```

There's not a lot of prose, and there needn't be. All we're trying to do is lay out the ins and outs of our API. It's important at this point to step back and look at the endpoints in their totality. Is there enough of a pattern that you can sort of guess what the next one is going to look like? Does it look like a fairly straightforward API to interact with? Does anything about it feel clunky? Would you want to work with this API yourself? Take time to think through any weirdness now, before anything gets out into the wild.

```
$ git commit -am 'Initial API Documentation'
$ git tag -a ch7-1-init-api-docs
```

Introducing Django REST Framework

Now that we've got some idea of what we're building, let's actually get it going. We'll be using Django REST Framework (http://www.django-rest-framework.org/).
Start by installing it in your environment:

```
$ pip install djangorestframework
```

Add rest_framework to your INSTALLED_APPS in jmad/settings.py:

```python
INSTALLED_APPS = (
    ...
    'rest_framework',
)
```

Now we're ready to start testing.

Writing tests for API endpoints

While there's no such thing as browser-based testing for an external API, it is important to write tests that cover its end-to-end processing. We need to be able to send in requests like the ones we've documented and confirm that we receive the responses our documentation promises. Django REST Framework (DRF from here on out) provides tools to help write tests for the application functionality it provides. We'll use rest_framework.test.APITestCase to write functional tests.

Let's kick off with the list of albums. Convert albums/tests.py to a package, add a test_api.py file, and then add the following:

```python
from rest_framework.test import APITestCase

from albums.models import Album


class AlbumAPITestCase(APITestCase):

    def setUp(self):
        self.kind_of_blue = Album.objects.create(name='Kind of Blue')
        self.a_love_supreme = Album.objects.create(name='A Love Supreme')

    def test_list_albums(self):
        """
        Test that we can get a list of albums
        """
        response = self.client.get('/api/albums/')
        self.assertEqual(response.status_code, 200)
        self.assertEqual(response.data[0]['name'], 'A Love Supreme')
        self.assertEqual(response.data[1]['url'],
                         'http://testserver/api/albums/1/')
```

Since much of this is very similar to other tests that we've seen before, let's talk about the important differences:

- We import and subclass APITestCase, which makes self.client an instance of rest_framework.test.APIClient. Both of these subclass their respective django.test counterparts and add a few niceties that help in testing APIs (none of which are showcased yet).
- We test response.data, which we expect to be a list of Albums. response.data will be a Python dict or list that corresponds to the JSON payload of the response.
- During the course of the test, APIClient (a subclass of Client) will use http://testserver as the protocol and hostname for the server, and our API should return host-specific URIs.

Run this test, and we get the following:

```
$ python manage.py test albums.tests.test_api
Creating test database for alias 'default'...
F
======================================================================
FAIL: test_list_albums (albums.tests.test_api.AlbumAPITestCase)
Test that we can get a list of albums
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/kevin/dev/jmad-project/jmad/albums/tests/test_api.py",
  line 17, in test_list_albums
    self.assertEqual(response.status_code, 200)
AssertionError: 404 != 200
----------------------------------------------------------------------
Ran 1 test in 0.019s

FAILED (failures=1)
```

We're failing because we're getting a 404 Not Found instead of a 200 OK status code. Proper HTTP communication is important in any web application, but it really comes into play when you're using AJAX. Most frontend libraries will properly classify responses as successful or erroneous based on the status code: making sure the codes are on point will save your frontend developer friends a lot of headaches. We're getting a 404 because we don't have a URL defined yet. Before we set up the route, let's add a quick unit test for routing.
Update the test case with one new import and method:

```python
from django.core.urlresolvers import resolve
...

    def test_album_list_route(self):
        """
        Test that we've got routing set up for Albums
        """
        route = resolve('/api/albums/')
        self.assertEqual(route.func.__name__, 'AlbumViewSet')
```

Here, we're just confirming that the URL routes to the correct view. Run it:

```
$ python manage.py test albums.tests.test_api.AlbumAPITestCase.test_album_list_route
...
django.core.urlresolvers.Resolver404: {'path': 'api/albums/',
'tried': [[<RegexURLResolver <RegexURLPattern list> (admin:admin)
^admin/>], [<RegexURLPattern solo_detail_view
^recordings/(?P<album>[\w-]+)/(?P<track>[\w-]+)/(?P<artist>[\w-]+)/$>],
[<RegexURLPattern None ^$>]]}
----------------------------------------------------------------------
Ran 1 test in 0.003s

FAILED (errors=1)
```

We get a Resolver404 error, which is expected, since Django shouldn't return anything at that path. Now we're ready to set up our URLs.

API routing with DRF's SimpleRouter

Take a look at the documentation for routers at http://www.django-rest-framework.org/api-guide/routers/. They're a very clean way of setting up URLs for DRF-powered views. Update jmad/urls.py like so:

```python
...
from rest_framework import routers

from albums.views import AlbumViewSet

router = routers.SimpleRouter()
router.register(r'albums', AlbumViewSet)

urlpatterns = [
    # Admin
    url(r'^admin/', include(admin.site.urls)),

    # API
    url(r'^api/', include(router.urls)),

    # Apps
    url(r'^recordings/(?P<album>[\w-]+)/(?P<track>[\w-]+)/(?P<artist>[\w-]+)/$',
        'solos.views.solo_detail',
        name='solo_detail_view'),
    url(r'^$', 'solos.views.index'),
]
```

Here's what we changed:

- We created an instance of SimpleRouter and used the register method to set up a route. The register method has two required arguments: a prefix to build the route methods from, and something called a viewset. Here we've supplied a nonexistent class, AlbumViewSet, which we'll come back to later.
- We've added a few comments to break up our urls.py, which was starting to look a little like a rat's nest. The actual API URLs are registered under the '^api/' path using Django's include function.

Run the URL test again, and we'll get an ImportError for AlbumViewSet. Let's add a stub to albums/views.py:

```python
class AlbumViewSet():
    pass
```

Run the test now, and we'll start to see some specific DRF error messages to help us build out our view:

```
$ python manage.py test albums.tests.test_api.AlbumAPITestCase.test_album_list_route
Creating test database for alias 'default'...
F
...
  File "/Users/kevin/.virtualenvs/jmad/lib/python3.4/site-packages/rest_framework/routers.py", line 60, in register
    base_name = self.get_default_base_name(viewset)
  File "/Users/kevin/.virtualenvs/jmad/lib/python3.4/site-packages/rest_framework/routers.py", line 135, in get_default_base_name
    assert queryset is not None, ''base_name' argument not specified, and could '
AssertionError: 'base_name' argument not specified, and could not
automatically determine the name from the viewset, as it does not
have a '.queryset' attribute.
```

After a fairly lengthy output, the test runner tells us that it was unable to get a base_name for the URL: we did not specify base_name in the register method, and the router couldn't guess the name because the viewset (AlbumViewSet) did not have a queryset attribute. In the router documentation, we came across the optional base_name argument for register (as well as the exact wording of this error). You can use that argument to control the name your URL gets. However, let's keep letting DRF do its default behavior.

We haven't read the documentation for viewsets yet, but we know that a regular Django class-based view expects a queryset parameter. Let's stick one on AlbumViewSet and see what happens:

```python
from .models import Album


class AlbumViewSet():
    queryset = Album.objects.all()
```

Run the test again, and we get:

```
django.core.urlresolvers.Resolver404: {'path': 'api/albums/',
'tried': [[<RegexURLResolver <RegexURLPattern list> (admin:admin)
^admin/>], [<RegexURLPattern solo_detail_view
^recordings/(?P<album>[\w-]+)/(?P<track>[\w-]+)/(?P<artist>[\w-]+)/$>],
[<RegexURLPattern None ^$>]]}
----------------------------------------------------------------------
Ran 1 test in 0.011s

FAILED (errors=1)
```

Huh? Another 404 is a step backwards. What did we do wrong? Maybe it's time to figure out what a viewset really is.
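For orientation, a viewset bundles a queryset, a serializer, and the list/detail behavior the router is looking for. A minimal sketch of where this is heading might look like the following; this is our sketch rather than the book's eventual code, and the AlbumSerializer shown here is an assumption (viewsets.ModelViewSet and serializers.HyperlinkedModelSerializer are standard DRF classes):

```python
# Sketch only: a viewset that would satisfy the router. The serializer
# name and field list are assumptions, chosen to match the API.md sample.
from rest_framework import serializers, viewsets

from .models import Album


class AlbumSerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = Album
        fields = ('name', 'url')


class AlbumViewSet(viewsets.ModelViewSet):
    # Subclassing ModelViewSet (instead of a bare class) is what gives
    # the router the list and detail routes it expects.
    queryset = Album.objects.all()
    serializer_class = AlbumSerializer
```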
Summary

In this article, we covered basic API design and testing patterns, including the importance of documentation when developing an API. In doing so, we took a deep dive into Django REST Framework and the utilities and testing tools available in it.

Resources for Article:

Further resources on this subject:
- Test-driven API Development with Django REST Framework [Article]
- Adding a developer with Django forms [Article]
- Code Style in Django [Article]


Creating Functions and Operations

Packt
10 Aug 2015
18 min read
In this article by Alex Libby, author of the book Sass Essentials, we will learn how to use operators and functions to construct a whole site theme from just a handful of colors, or to define font sizes for the entire site from a single value. Okay, so let's get started!

Creating values using functions and operators

Imagine a scenario where you're creating a masterpiece that has taken days to put together, with a stunning choice of colors that has taken almost as long as the building of the project, and yet the client isn't happy with the color choice. What to do? At this point, I'm sure that while you're all smiles to the customer, you'd be quietly cursing the amount of work they've just landed you with, this late on a Friday. Sound familiar? I'll bet you scrap the colors and go back to poring over lots of color combinations, right? It'll work, but it will surely take a lot more time and effort.

There's a better way to achieve this: instead of creating or choosing lots of different colors, we only need to choose one and create all of the others automatically. How? Easy! When working with Sass, we can use a little bit of simple math to build our color palette. One of the key tenets of Sass is its ability to work out values dynamically, using nothing more than a little simple math; we could define font sizes from H1 to H6 automatically, create new shades of colors, or even work out the right percentages to use when creating responsive sites! We will take a look at each of these examples throughout the article, but for now, let's focus on the principles of creating our colors using Sass.

Creating colors using functions

We can use simple math and functions to create just about any type of value, but colors are where these two really come into their own. The great thing about Sass is that we can work out the hex value for just about any color we want, from a limited range of colors. This can easily be done using techniques such as adding two values together, or subtracting one value from another. To get a feel for how the color operators work, head over to the official documentation at http://sass-lang.com/documentation/file.SASS_REFERENCE.html#color_operations—it is worth reading!

There's nothing wrong with adding or subtracting values—it's a perfectly valid option, and will result in a valid hex code when compiled. But would you know that both values are actually deep shades of blue? Therein lies the benefit of using functions; instead of using math operators, we can simply say this:

```scss
p { color: darken(#010203, 10%); }
```

This, I am sure you will agree, is easier to understand as well as being infinitely more readable! The use of functions opens up a world of opportunities for us. We can use any one of an array of functions such as lighten(), darken(), mix(), or adjust-hue() to get a feel for how easy it is to get the values. If we head over to http://jackiebalzer.com/color, we can see that the author has exploded a number of Sass (and Compass—we will use this later) functions, so we can see what colors are displayed, along with their numerical values, as soon as we change the initial two values. Okay, we could play with the site ad infinitum, but I feel a demo coming on—to explore the effects of using the color functions to generate new colors. Let's construct a simple demo.
For this exercise, we will dig up a copy of the colorvariables demo and modify it so that we're only assigning one color variable, not six. For this exercise, I will assume you are using Koala to compile the code. Okay, let's make a start:

1. We'll start by opening up a copy of colorvariables.scss in your favorite text editor and removing lines 1 to 15 from the start of the file.
2. Next, add the following lines, so that we should be left with this at the start of the file:

```scss
$darkRed: #a43;
$white: #fff;
$black: #000;

$colorBox1: $darkRed;
$colorBox2: lighten($darkRed, 30%);
$colorBox3: adjust-hue($darkRed, 35%);
$colorBox4: complement($darkRed);
$colorBox5: saturate($darkRed, 30%);
$colorBox6: adjust-color($darkRed, $green: 25);
```

3. Save the file as colorfunctions.scss.
4. We need a copy of the markup file to go with this code, so go ahead and extract a copy of colorvariables.html from the code download, saving it as colorfunctions.html in the root of our project area. Don't forget to change the link for the CSS file within to colorfunctions.css!
5. Fire up Koala, then drag and drop colorfunctions.scss from our project area over the main part of the application window to add it to the list.
6. Right-click on the file name and select Compile, and then wait for it to show Success in a green information box.

If we preview the results of our work in a browser, we should see six colored boxes appear. At this point, we have a working set of colors—granted, we might have to work a little on making sure that they all work together. But the key point here is that we have only specified one color, and the others are all calculated automatically through Sass. Now that we are only defining one color by default, how easy is it to change the colors in our code? Well, it is a cinch to do so. Let's try it out with the help of the SassMeister playground.

Changing the colors in use

We can easily change the values used in the code and continue to refresh the browser after each change. However, this isn't a quick way to figure out which colors work; to get a quicker response, there is an easier way: use the online Sass playground at http://www.sassmeister.com. This is the perfect way to try out different colors—the site automatically recompiles the code and updates the result as soon as we make a change. Try copying the HTML and SCSS code from our demo into the play area, ready for trying different calculations.

All the boxes work on the principle that we take a base color (in this case, $darkRed, or #a43), then adjust the color by either a percentage or a numeric value. When compiled, Sass calculates what the new value should be and uses this in the CSS. Take, for example, the color used for #box6, which comes out as a dark orange with a brown tone.

To get a feel for some of the functions that we can use to create new colors (or shades of existing colors), take a look at the main documentation at http://sass-lang.com/documentation/Sass/Script/Functions.html, or https://www.makerscabin.com/web/sass/learn/colors. These sites list a variety of different functions that we can use to create our masterpiece. We can also extend the functions that we have in Sass with the help of custom functions, such as the toolbox available at https://github.com/at-import/color-schemer—this may be worth a look. In our demo, we used a dark red color as our base.
If we're ever stuck for ideas on colors, or want to get the right HEX, RGB(A), or even HSL(A) codes, then there are dozens of sites online that will give us these values. Here are a couple that you can try:

- HSLa Explorer, by Chris Coyier—available at https://css-tricks.com/examples/HSLaExplorer/
- HSL Color Picker, by Brandon Mathis—available at http://hslpicker.com/

If we know the name but want to get a Sass value, then we can always try the list of 1,500+ colors at https://github.com/FearMediocrity/sass-color-palettes/blob/master/colors.scss. What's more, the list can easily be imported into our CSS, although it would make better sense to simply copy the chosen values into our Sass file and compile from there instead.

Mixing colors

The one thing that we've not discussed, but that is equally useful, is that we are not limited to using functions on their own; we can mix and match any number of functions to produce our colors. A great way to choose colors, and get the appropriate blend of functions to use, is http://sassme.arc90.com/. Using the available sliders, we can choose our color and get the appropriate functions to use in our Sass code.

In most cases, we will likely only need to use two functions (a mix of darken and adjust-hue, for example); if we are using more than two or three functions, then we should perhaps rethink our approach! In this case, a better alternative is to use Sass's mix() function, as follows:

```scss
$white: #fff;
$berry: hsl(267, 100%, 35%);
p { color: mix($white, $berry, 0.7); }
```

…which will give the following valid CSS:

```css
p { color: #5101b3; }
```

This is a useful alternative to use in place of the functions we've just touched on; after all, would you know what color adjust_hue(desaturate(darken(#db4e29, 2), 41), 67) would give? Granted, it is something of an extreme calculation; nonetheless, it is technically valid. If we use mix() instead, it matches more closely what we might do, for example, when mixing paint. After all, how else would we lighten its color, if not by adding a light-colored paint?

Okay, let's move on. What's next? I hear you ask. Well, so far we've used core Sass for all our functions, but it's time to go a little further afield. Let's take a look at how you can use external libraries to add extra functionality. In our next demo, we're going to introduce Compass, which you will often see being used with Sass.

Using an external library

So far, we've looked at using core Sass functions to produce our colors—nothing wrong with this; the question is, can we take things a step further? Absolutely, once we've gained some experience with using these functions, we can introduce custom functions (or helpers) that expand what we can do. A great library for this purpose is Compass, available at http://www.compass-style.org; we'll make use of it to change the colors we created in our earlier boxes demo, in the section Creating colors using functions. Compass is a CSS authoring framework that provides extra mixins and reusable patterns to add extra functionality to Sass. In our demo, we're using shade(), which is one of several color helpers provided by the Compass library. Let's make a start:

1. We're using Compass in this demo, so we'll begin by installing the library. To do this, fire up Command Prompt, then navigate to our project area.
2. We need to make sure that our RubyGems system software is up to date, so at Command Prompt, enter the following, and then press Enter:

```
gem update --system
```

3. Next, we're installing Compass itself—at the prompt, enter this command, and then press Enter:

```
gem install compass
```

4. Compass works best when we get it to create a project shell (or template) for us. To do this, first browse to http://www.compass-style.org/install, and then fill in your details in the Tell us about your project… area, leaving anything in grey text blank. This produces a set of commands—enter each at Command Prompt, pressing Enter each time.
5. Navigate back to Command Prompt. We need to compile our SCSS code, so go ahead and enter this command at the prompt (or copy and paste it), then press Enter:

```
compass watch --sourcemap
```

6. Next, extract a copy of the colorlibrary folder from the code download, and save it to the project area.
7. In colorlibrary.scss, comment out the existing line for $backgrd_box6_color, and add the following immediately below it:

```scss
$backgrd_box6_color: shade($backgrd_box5_color, 25%);
```

8. Save the changes to colorlibrary.scss. If all is well, Compass's watch facility should kick in and recompile the code automatically. To verify that this has been done, look in the css subfolder of the colorlibrary folder, and you should see both the compiled CSS and the source map files present.

If you find Compass compiles files into unexpected folders, then try using the following command to specify the source and destination folders when compiling:

```
compass watch --sass-dir sass --css-dir css
```

If all is well, when previewing the results in a browser window, we will see that Box 6 has gone a nice shade of deep red (if not almost brown). To really confirm that all the changes have taken place as required, we can fire up a DOM inspector such as Firebug; a quick check confirms that the color has indeed changed. If we explore even further, we can see that the compiled code shows that the original line for Box 6 has been commented out, and that we're using the new function from the Compass helper library.

This is a great way to push the boundaries of what we can do when creating colors. To learn more about using the Compass helper functions, it's worth exploring the official documentation at http://compass-style.org/reference/compass/helpers/colors/. We used the shade() function in our code, which darkens the color used. There is a key difference between this and using something such as darken() to perform the same change. To get a feel for the difference, take a look at the article on the CreativeBloq website at http://www.creativebloq.com/css3/colour-theming-sass-and-compass-6135593, which explains the difference very well.

The documentation is a little lacking in terms of how to use the color helpers; the key is not to treat them as if they were normal mixins or functions, but to simply reference them in our code. To explore more about how to use these functions, take a look at the article by Antti Hiljá at http://clubmate.fi/how-to-use-the-compass-helper-functions/. We can, of course, create mixins to build palettes—for a more complex example, take a look at http://www.zingdesign.com/how-to-generate-a-colour-palette-with-compass/ to understand how such a mixin can be created using Compass.

Okay, let's move on. So far, we've talked about using functions to manipulate colors; the flip side is that we are likely to use operators to manipulate values such as font sizes.
For now, let's change tack and take a look at creating new values for changing font sizes.

Changing font sizes using operators

We already talked about using functions to create practically any value. We've seen how to do it with colors; we can apply similar principles to creating font sizes too. In this case, we set a base font size (in the same way that we set a base color), and then simply increase or decrease font sizes as desired. In this instance, we won't use functions, but instead use standard math operators, such as add, subtract, or divide. When working with these operators, there are a couple of points to remember:

- Sass math functions preserve units—this means we can't operate on numbers with different units, such as adding a px value to a rem value, but we can work with numbers that can be converted to the same format, such as inches to centimeters.
- If we multiply two values with the same units, this will produce square units (that is, 10px * 10px == 100px * px). At the same time, px * px will throw an error, as it is an invalid unit in CSS.
- There are some quirks when working with / as a division operator—in most instances, it is normally used to separate two values, such as when defining a pair of font size values. However, if the value is surrounded by parentheses, used as part of another arithmetic expression, or stored in a variable, then it will be treated as a division operator. For full details, it is worth reading the relevant section in the official documentation at http://sass-lang.com/documentation/file.Sass_REFERENCE.html#division-and-slash.

With these in mind, let's create a simple demo—a perfect use of Sass is to automatically work out sizes from H1 through to H6. We could just do this in a simple text editor, but this time, let's break with tradition and build our demo directly in a session on http://www.sassmeister.com. We can then play around with the values set and see the effects of the changes immediately. If we're happy with the results of our work, we can copy the final version into a text editor and save the files as standard SCSS (or CSS) files.

Let's begin by browsing to http://www.sassmeister.com, and adding the following markup to the HTML window:

```html
<html>
  <head>
    <meta charset="utf-8" />
    <title>Demo: Assigning colors using variables</title>
    <link rel="stylesheet" type="text/css" href="css/colorvariables.css">
  </head>
  <body>
    <h1>The cat sat on the mat</h1>
    <h2>The cat sat on the mat</h2>
    <h3>The cat sat on the mat</h3>
    <h4>The cat sat on the mat</h4>
    <h5>The cat sat on the mat</h5>
    <h6>The cat sat on the mat</h6>
  </body>
</html>
```

Next, add the following to the SCSS window—we first set a base value of 3.0, followed by a starting color of #b26d61, a dark, moderate red:

```scss
$baseSize: 3.0;
$baseColor: #b26d61;
```

We need to add our H1 to H6 styles. The rem mixin was created by Chris Coyier, at https://css-tricks.com/snippets/css/less-mixin-for-rem-font-sizing/.
We first set the font size, followed by the font color, using either the base color set earlier or a function to produce a different shade:

```scss
h1 { font-size: $baseSize; color: $baseColor; }

h2 { font-size: ($baseSize - 0.2); color: darken($baseColor, 20%); }

h3 { font-size: ($baseSize - 0.4); color: lighten($baseColor, 10%); }

h4 { font-size: ($baseSize - 0.6); color: saturate($baseColor, 20%); }

h5 { font-size: ($baseSize - 0.8); color: $baseColor - 111; }

h6 { font-size: ($baseSize - 1.0); color: rgb(red($baseColor) + 10, 23, 145); }
```

SassMeister will automatically compile the code to produce valid CSS. Try changing the base size of 3.0 to a different value—using http://www.sassmeister.com, we can instantly see how this affects the overall size of each H value. Note how we subtract an increasing value from $baseSize before using the result as the font size for the relevant H value, and how, where needed, we can concatenate the appropriate unit using a plus (+) symbol.

You can see a similar example of this by Andy Baudoin as a CodePen, at http://codepen.io/baudoin/pen/HdliD/. He makes good use of nesting to display the color and strength of shade. Note that it uses a little JavaScript to add the text of the color that each line represents; this can be ignored, as it does not affect the Sass used in the demo. The great thing about using a site such as SassMeister is that we can play around with values and immediately see the results. For more details on using number operations in Sass, browse to the official documentation at http://sass-lang.com/documentation/file.Sass_REFERENCE.html#number_operations.

Okay, onwards we go. Let's turn our attention to creating something a little more substantial; we're going to create a complete site theme using the power of Sass and a few simple calculations.

Summary

Phew! What a tour! One of the key concepts of Sass is the use of functions and operators to create values, so let's take a moment to recap what we have covered throughout this article. We kicked off with a look at creating color values using functions, before discovering how we can mix and match different functions to create different shades, or use external libraries to add extra functionality to Sass. We then moved on to take a look at another key use of functions, with a look at defining different font sizes using standard math operators.

Resources for Article:

Further resources on this subject:
- Nesting, Extend, Placeholders, and Mixins [article]
- Implementation of SASS [article]
- Constructing Common UI Widgets [article]


Using the ArcPy Data Access Module with Feature Classes and Tables

Packt
10 Aug 2015
28 min read
In this article by Eric Pimpler, author of the book Programming ArcGIS with Python Cookbook - Second Edition, we will cover the following topics:

- Retrieving features from a feature class with SearchCursor
- Filtering records with a where clause
- Improving cursor performance with geometry tokens
- Inserting rows with InsertCursor
- Updating rows with UpdateCursor

We'll start this article with a basic question. What are cursors? Cursors are in-memory objects containing one or more rows of data from a table or feature class. Each row contains the attributes from each field in a data source, along with the geometry for each feature. Cursors allow you to search, add, insert, update, and delete data from tables and feature classes.

The arcpy data access module, or arcpy.da, was introduced in ArcGIS 10.1 and contains methods that allow you to iterate through each row in a cursor. Various types of cursors can be created depending on the needs of developers. For example, search cursors can be created to read values from rows, update cursors can be created to update values in rows or delete rows, and insert cursors can be created to insert new rows.

A number of cursor improvements were introduced with the arcpy data access module. Prior to ArcGIS 10.1, cursor performance was notoriously slow. Now, cursors are significantly faster. Esri has estimated that SearchCursors are up to 30 times faster, while InsertCursors are up to 12 times faster. In addition to these general performance improvements, the data access module also provides a number of new options that allow programmers to speed up processing. Rather than returning all the fields in a cursor, you can now specify that only a subset of fields be returned. This increases performance, as less data needs to be returned. The same applies to geometry. Traditionally, when accessing the geometry of a feature, the entire geometric definition would be returned. You can now use geometry tokens to return a portion of the geometry rather than the full geometry of the feature. You can also use lists and tuples rather than rows. There are also other new features, such as edit sessions and the ability to work with versions, domains, and subtypes.

There are three cursor functions in arcpy.da. Each returns a cursor object with the same name as the function. SearchCursor() creates a read-only SearchCursor object containing rows from a table or feature class. InsertCursor() creates an InsertCursor object that can be used to insert new records into a table or feature class. UpdateCursor() returns a cursor object that can be used to edit or delete records from a table or feature class. You can see the relationship between the cursor functions, the objects they create, and how they are used, as follows:

| Function | Object created | Usage |
| --- | --- | --- |
| SearchCursor() | SearchCursor | A read-only view of data from a table or feature class |
| InsertCursor() | InsertCursor | Adds rows to a table or feature class |
| UpdateCursor() | UpdateCursor | Edits or deletes rows in a table or feature class |

The SearchCursor() function is used to return a SearchCursor object. This object can only be used to iterate through a set of rows returned for read-only purposes. No insertions, deletions, or updates can occur through this object. An optional where clause can be set to limit the rows returned.

Once you've obtained a cursor instance, it is common to iterate the records, particularly with SearchCursor or UpdateCursor. There are some peculiarities that you need to understand when navigating the records in a cursor. Cursor navigation is forward-moving only. When a cursor is created, its pointer sits just above the first row. The first call to next() will move the pointer to the first row. Rather than calling the next() method, you can also use a for loop to process each of the records without the need to call next(). After performing whatever processing you need to do with a row, a subsequent call to next() will move the pointer to row 2. This process continues as long as you need to access additional rows. However, after a row has been visited, you can't go back a single record at a time. For instance, if the current row is row 3, you can't programmatically back up to row 2; you can only go forward. To revisit rows 1 and 2, you would need to either call the reset() method or recreate the cursor and move back through the object. As mentioned earlier, cursors are often navigated with for loops as well; in fact, this is a more common way to iterate a cursor and a more efficient way to code your scripts.
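To make the navigation rules concrete, here is a minimal sketch (our addition, not one of the book's recipes; it assumes the same Schools.shp sample used in the recipes below):

```python
# Sketch: forward-only navigation with arcpy.da.SearchCursor.
import arcpy

arcpy.env.workspace = "c:/ArcpyBook/Ch8"

with arcpy.da.SearchCursor("Schools.shp", ("Facility", "Name")) as cursor:
    print(cursor.fields)   # read-only tuple: ('Facility', 'Name')
    first = next(cursor)   # advance to row 1
    second = next(cursor)  # advance to row 2; there is no way to step back
    cursor.reset()         # the only way to return to the top
    for row in cursor:     # the more common (and more efficient) pattern
        print(row[1])
```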
The InsertCursor() function is used to create an InsertCursor object that allows you to programmatically add new records to feature classes and tables. To insert rows, call the insertRow() method on this object. You can also retrieve a read-only tuple containing the field names in use by the cursor through the fields property. A lock is placed on the table or feature class being accessed through the cursor, so it is important to always design your script in a way that releases the cursor when you are done.

The UpdateCursor() function can be used to create an UpdateCursor object that can update and delete rows in a table or feature class. As is the case with InsertCursor, this function places a lock on the data while it's being edited or deleted. If the cursor is used inside a Python with statement, the lock will automatically be freed after the data has been processed. This hasn't always been the case: prior to ArcGIS 10.1, cursors were required to be manually released using the Python del statement. Once an instance of UpdateCursor has been obtained, you can call the updateRow() method to update records in tables or feature classes and the deleteRow() method to delete a row.

The subject of data locks requires a little more explanation. The insert and update cursors must obtain a lock on the data source they reference. This means that no other application can concurrently access this data source. Locks are a way of preventing multiple users from changing data at the same time and thus corrupting the data. When the InsertCursor() and UpdateCursor() functions are called in your code, Python attempts to acquire a lock on the data. This lock must be specifically released after the cursor has finished processing, so that the running applications of other users, such as ArcMap or ArcCatalog, can access the data sources. If this isn't done, no other application will be able to access the data. Prior to ArcGIS 10.1 and the with statement, cursors had to be specifically unlocked through Python's del statement. Similarly, ArcMap and ArcCatalog also acquire data locks when updating or deleting data. If a data source has been locked by either of these applications, your Python code will not be able to access the data. Therefore, the best practice is to close ArcMap and ArcCatalog before running any standalone Python scripts that use insert or update cursors.
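The recipes that follow only exercise SearchCursor, so as a quick orientation, here is a minimal sketch of the insert and update sides of arcpy.da. The field names mirror the Schools.shp sample below, but the attribute values are illustrative assumptions, not data from the book:

```python
# Sketch: inserting, updating, and deleting rows with arcpy.da.
# The values 'NEW RIDGE' and 'CLOSED' are hypothetical.
import arcpy

arcpy.env.workspace = "c:/ArcpyBook/Ch8"

# InsertCursor: the with statement releases the data lock automatically.
with arcpy.da.InsertCursor("Schools.shp", ("Facility", "Name")) as cursor:
    cursor.insertRow(("HIGH SCHOOL", "NEW RIDGE"))

# UpdateCursor: update or delete rows as you iterate.
with arcpy.da.UpdateCursor("Schools.shp", ("Facility", "Name")) as cursor:
    for row in cursor:
        if row[1] == "NEW RIDGE":
            row[1] = "NEW RIDGE HIGH"
            cursor.updateRow(row)   # persist the edited row
        elif row[0] == "CLOSED":
            cursor.deleteRow()      # remove the current row
```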
If a data source has been locked by either of these applications, your Python code will not be able to access the data. Therefore, the best practice is to close ArcMap and ArcCatalog before running any standalone Python scripts that use insert or update cursors. In this article, we're going to cover the use of cursors to access and edit tables and feature classes, focusing on the arcpy.da module; however, many of the cursor concepts that existed before ArcGIS 10.1 still apply.

Retrieving features from a feature class with SearchCursor

There are many occasions when you need to retrieve rows from a table or feature class for read-only purposes. For example, you might want to generate a list of all land parcels in a city with a value greater than $100,000. In this case, you don't have any need to edit the data. Your needs are met simply by generating a list of rows that meet some sort of criteria. A SearchCursor object contains a read-only copy of rows from a table or feature class. These objects can also be filtered through the use of a where clause so that only a subset of the dataset is returned.

Getting ready

The SearchCursor() function is used to return a SearchCursor object. This object can only be used to iterate through a set of rows returned for read-only purposes. No insertions, deletions, or updates can occur through this object. An optional where clause can be set to limit the rows returned. In this article, you will learn how to create a basic SearchCursor object on a feature class through the use of the SearchCursor() function.

The SearchCursor object contains a fields property along with the next() and reset() methods. The fields property is a read-only structure in the form of a Python tuple, containing the fields requested from the feature class or table. You are going to hear the term tuple a lot in conjunction with cursors. If you haven't covered this topic before, tuples are a Python structure for storing a sequence of data, similar to Python lists. However, there are some important differences between Python tuples and lists. Tuples are defined as a sequence of values inside parentheses, while lists are defined as a sequence of values inside brackets. Unlike lists, tuples can't grow and shrink, which can be a very good thing when you want data values to occupy a specific position each time. This is the case with cursor objects, which use tuples to store data from fields in tables and feature classes.

How to do it…

Follow these steps to learn how to retrieve rows from a table or feature class inside a SearchCursor object:

Open IDLE and create a new script window.
Save the script as C:\ArcpyBook\Ch8\SearchCursor.py.
Import the arcpy.da module:
import arcpy.da
Set the workspace:
arcpy.env.workspace = "c:/ArcpyBook/Ch8"
Use a Python with statement to create a cursor:
with arcpy.da.SearchCursor("Schools.shp", ("Facility", "Name")) as cursor:
Loop through each row in the SearchCursor and print the name of the school. Make sure you indent the for loop inside the with block:
for row in sorted(cursor):
    print("School name: " + row[1])
The entire script should appear as follows:
import arcpy.da
arcpy.env.workspace = "c:/ArcpyBook/Ch8"
with arcpy.da.SearchCursor("Schools.shp", ("Facility", "Name")) as cursor:
    for row in sorted(cursor):
        print("School name: " + row[1])
Save the script. You can check your work by examining the C:\ArcpyBook\code\Ch8\SearchCursor_Step1.py solution file.
Run the script.
You should see the following output:

School name: ALLAN
School name: ALLISON
School name: ANDREWS
School name: BARANOFF
School name: BARRINGTON
School name: BARTON CREEK
School name: BARTON HILLS
School name: BATY
School name: BECKER
School name: BEE CAVE

How it works…

The with statement used with the SearchCursor() function will create, open, and close the cursor, so you no longer have to be concerned with explicitly releasing the lock on the cursor as you did prior to ArcGIS 10.1. The first parameter passed into the SearchCursor() function is a feature class, represented by the Schools.shp file. The second parameter is a Python tuple containing a list of fields that we want returned in the cursor. For performance reasons, it is a best practice to limit the fields returned in the cursor to only those that you need to complete the task. Here, we've specified that only the Facility and Name fields should be returned. The SearchCursor object is stored in a variable called cursor.

Inside the with block, we use a Python for loop to loop through each school returned. We also use the Python sorted() function to sort the contents of the cursor. To access the values from a field on the row, simply use the index number of the field you want to return. In this case, we want to return the contents of the Name column, which is index 1, since it is the second item in the tuple of field names that are returned.

Filtering records with a where clause

By default, SearchCursor will contain all rows in a table or feature class. However, in many cases, you will want to restrict the number of rows returned by some sort of criteria. Applying a filter through the use of a where clause limits the records returned.

Getting ready

By default, all rows from a table or feature class will be returned when you create a SearchCursor object. However, in many cases, you will want to restrict the records returned. You can do this by creating a query and passing it as a where clause parameter when calling the SearchCursor() function. In this article, you'll build on the script you created in the previous article by adding a where clause that restricts the records returned.

How to do it…

Follow these steps to apply a filter to a SearchCursor object that restricts the rows returned from a table or feature class:

Open IDLE and load the SearchCursor.py script that you created in the previous recipe.
Update the SearchCursor() function by adding a where clause that queries the FACILITY field for records that have the HIGH SCHOOL text:
with arcpy.da.SearchCursor("Schools.shp", ("Facility", "Name"), '"FACILITY" = \'HIGH SCHOOL\'') as cursor:
You can check your work by examining the C:\ArcpyBook\code\Ch8\SearchCursor_Step2.py solution file.
Save and run the script. The output will now be much smaller and restricted to high schools only:

High school name: AKINS
High school name: ALTERNATIVE LEARNING CENTER
High school name: ANDERSON
High school name: AUSTIN
High school name: BOWIE
High school name: CROCKETT
High school name: DEL VALLE
High school name: ELGIN
High school name: GARZA
High school name: HENDRICKSON
High school name: JOHN B CONNALLY
High school name: JOHNSTON
High school name: LAGO VISTA

How it works…

The where clause parameter accepts any valid SQL query, and is used in this case to restrict the number of records that are returned.

Improving cursor performance with geometry tokens

Geometry tokens were introduced in ArcGIS 10.1 as a performance improvement for cursors.
Rather than returning the entire geometry of a feature inside the cursor, only a portion of the geometry is returned. Returning the entire geometry of a feature can result in decreased cursor performance due to the amount of data that has to be returned. It's significantly faster to return only the specific portion of the geometry that is needed.

Getting ready

A token is provided as one of the fields in the field list passed into the constructor for a cursor, and is in the SHAPE@<Part of Feature to be Returned> format. The only exception to this format is the OID@ token, which returns the object ID of the feature. The following code example retrieves only the X and Y coordinates of a feature:

with arcpy.da.SearchCursor(fc, ("SHAPE@XY", "Facility", "Name")) as cursor:

The following table lists the commonly available geometry tokens. Not all cursors support the full list of tokens. Check the ArcGIS help files for information about the tokens supported by each cursor type.

Token                Returns
SHAPE@               The entire geometry object for the feature
SHAPE@XY             A tuple of the feature's centroid X and Y coordinates
SHAPE@TRUECENTROID   A tuple of the feature's true centroid X and Y coordinates
SHAPE@X              The feature's X coordinate
SHAPE@Y              The feature's Y coordinate
SHAPE@Z              The feature's Z coordinate
SHAPE@M              The feature's M value
SHAPE@AREA           The feature's area
SHAPE@LENGTH         The feature's length
SHAPE@JSON           The Esri JSON string representing the geometry
SHAPE@WKT            The well-known text representation of the geometry
SHAPE@WKB            The well-known binary representation of the geometry
OID@                 The value of the ObjectID field

The SHAPE@ token returns the entire geometry of the feature. Use this carefully though, because it is an expensive operation to return the entire geometry of a feature and can dramatically affect performance. If you don't need the entire geometry, then do not include this token!

In this article, you will use a geometry token to increase the performance of a cursor. You'll retrieve the X and Y coordinates of each land parcel from the parcels feature class, along with some attribute information about the parcel.

How to do it…

Follow these steps to add a geometry token to a cursor, which should improve the performance of this object:

Open IDLE and create a new script window.
Save the script as C:\ArcpyBook\Ch8\GeometryToken.py.
Import the arcpy.da module and the time module:
import arcpy.da
import time
Set the workspace:
arcpy.env.workspace = "c:/ArcpyBook/Ch8"
We're going to measure how long it takes to execute the code using a geometry token. Add the start time for the script:
start = time.clock()
Use a Python with statement to create a cursor that includes the centroid of each feature as well as the ownership information stored in the PY_FULL_OW field:
with arcpy.da.SearchCursor("coa_parcels.shp", ("PY_FULL_OW", "SHAPE@XY")) as cursor:
Loop through each row in the SearchCursor and print the owner and location of each parcel. Make sure you indent the for loop inside the with block:
for row in cursor:
    print("Parcel owner: {0} has a location of: {1}".format(row[0], row[1]))
Measure the elapsed time:
elapsed = (time.clock() - start)
Print the execution time:
print("Execution time: " + str(elapsed))
The entire script should appear as follows:
import arcpy.da
import time
arcpy.env.workspace = "c:/ArcpyBook/Ch8"
start = time.clock()
with arcpy.da.SearchCursor("coa_parcels.shp", ("PY_FULL_OW", "SHAPE@XY")) as cursor:
    for row in cursor:
        print("Parcel owner: {0} has a location of: {1}".format(row[0], row[1]))
elapsed = (time.clock() - start)
print("Execution time: " + str(elapsed))
You can check your work by examining the C:\ArcpyBook\code\Ch8\GeometryToken.py solution file.
Save the script.
Run the script. You should see something similar to the following output.
Note the execution time; your time will vary:

Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: (3110480.5197341456, 10070911.174956793)
Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: (3110670.413783513, 10070800.960865)
Parcel owner: CITY OF AUSTIN has a location of: (3143925.0013213265, 10029388.97419636)
Parcel owner: CITY OF AUSTIN % DOROTHY NELL ANDERSON ATTN BARRY LEE ANDERSON has a location of: (3134432.983822767, 10072192.047894118)
Execution time: 9.08046185109

Now, we're going to measure the execution time if the entire geometry is returned instead of just the portion of the geometry that we need:

Save a new copy of the script as C:\ArcpyBook\Ch8\GeometryTokenEntireGeometry.py.
Change the SearchCursor() function to return the entire geometry using SHAPE@ instead of SHAPE@XY:
with arcpy.da.SearchCursor("coa_parcels.shp", ("PY_FULL_OW", "SHAPE@")) as cursor:
You can check your work by examining the C:\ArcpyBook\code\Ch8\GeometryTokenEntireGeometry.py solution file.
Save and run the script. You should see the following output. Your time will vary from mine, but notice that the execution time is slower. In this case, it's only a little over a second slower, but we're only returning 2,600 features. If the feature class were significantly larger, as many are, this difference would be amplified:

Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: <geoprocessing describe geometry object object at 0x06B9BE00>
Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: <geoprocessing describe geometry object object at 0x2400A700>
Parcel owner: CITY OF AUSTIN has a location of: <geoprocessing describe geometry object object at 0x06B9BE00>
Parcel owner: CITY OF AUSTIN % DOROTHY NELL ANDERSON ATTN BARRY LEE ANDERSON has a location of: <geoprocessing describe geometry object object at 0x2400A700>
Execution time: 10.1211390896

How it works…

A geometry token can be supplied as one of the field names in the constructor for the cursor. These tokens are used to increase the performance of a cursor by returning only a portion of the geometry instead of the entire geometry. This can dramatically increase the performance of a cursor, particularly when you are working with large polyline or polygon datasets. If you only need specific properties of the geometry in your cursor, you should use these tokens.

Inserting rows with InsertCursor

You can insert a row into a table or feature class using an InsertCursor object. If you want to insert attribute values along with the new row, you'll need to supply the values in the order in which the fields were defined when the cursor was created.

Getting ready

The InsertCursor() function is used to create an InsertCursor object that allows you to programmatically add new records to feature classes and tables. The insertRow() method on the InsertCursor object adds the row. A row in the form of a list or tuple is passed into the insertRow() method. The values in the list must correspond to the field values defined when the InsertCursor object was created. As with the other cursor types, you can also limit the field names accessed by the cursor using the second parameter of the function. This function supports geometry tokens as well.

The following code example illustrates how you can use InsertCursor to insert new rows into a feature class. Here, we insert two new wildfire points into the California feature class. The row values to be inserted are defined in a list variable.
Then, an InsertCursor object is created, passing in the feature class and fields. Finally, the new rows are inserted into the feature class by using the insertRow() method:

rowValues = [('Bastrop', 'N', 3000, (-105.345, 32.234)),
             ('Ft Davis', 'N', 456, (-109.456, 33.468))]
fc = "c:/data/wildfires.gdb/California"
fields = ["FIRE_NAME", "FIRE_CONTAINED", "ACRES", "SHAPE@XY"]
with arcpy.da.InsertCursor(fc, fields) as cursor:
    for row in rowValues:
        cursor.insertRow(row)

In this article, you will use InsertCursor to add wildfires retrieved from a .txt file into a point feature class. When inserting rows into a feature class, you will need to know how to add the geometric representation of a feature to the feature class. This can be accomplished by using InsertCursor along with two miscellaneous objects: Array and Point. In this exercise, we will add point features in the form of wildfire incidents to an empty point feature class. In addition to this, you will use Python file manipulation techniques to read the coordinate data from a text file.

How to do it…

We will be importing the North American wildland fire incident data from a single day in October 2007. This data is contained in a comma-delimited text file containing one line for each fire incident on that particular day. Each fire incident has a latitude, longitude coordinate pair separated by commas, along with a confidence value. This data was derived by automated methods that use remote sensing data to determine the presence or absence of a wildfire. Confidence values can range from 0 to 100; higher numbers represent a greater confidence that this is indeed a wildfire:

Open the file at C:\ArcpyBook\Ch8\WildfireData\NorthAmericaWildfires_2007275.txt and examine the contents. You will notice that this is a simple comma-delimited text file containing the latitude and longitude values for each fire along with a confidence value. We will use Python to read the contents of this file line by line and insert new point features into the FireIncidents feature class located in the C:\ArcpyBook\Ch8\WildfireData\WildlandFires.mdb personal geodatabase.
Close the file.
Open ArcCatalog. Navigate to C:\ArcpyBook\Ch8\WildfireData. You should see a personal geodatabase called WildlandFires. Open this geodatabase and you will see a point feature class called FireIncidents. Right now, this is an empty feature class. We will add features by reading the text file you examined earlier and inserting points.
Right-click on FireIncidents and select Properties. Click on the Fields tab. The latitude/longitude values found in the file we examined earlier will be imported into the SHAPE field, and the confidence values will be written to the CONFIDENCEVALUE field.
Open IDLE and create a new script.
Save the script to C:\ArcpyBook\Ch8\InsertWildfires.py.
Import the arcpy module:
import arcpy
Set the workspace:
arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb"
Open the text file and read all the lines into a list:
f = open("C:/ArcpyBook/Ch8/WildfireData/NorthAmericaWildfires_2007275.txt","r")
lstFires = f.readlines()
Start a try block:
try:
Create an InsertCursor object using a with block. Make sure you indent inside the try statement. The cursor will be created in the FireIncidents feature class:
with arcpy.da.InsertCursor("FireIncidents", ("SHAPE@XY", "CONFIDENCEVALUE")) as cur:
Create a counter variable that will be used to print the progress of the script:
cntr = 1
Loop through the text file line by line using a for loop.
Since the text file is comma-delimited, we'll use the Python split() function to separate each value into a list variable called vals. We'll then pull out the individual latitude, longitude, and confidence value items and assign them to variables. Finally, we'll place these values into a list variable called rowValue, which is then passed into the insertRow() method of the InsertCursor object, and we print a progress message:

for fire in lstFires:
    if 'Latitude' in fire:
        continue
    vals = fire.split(",")
    latitude = float(vals[0])
    longitude = float(vals[1])
    confid = int(vals[2])
    rowValue = [(longitude, latitude), confid]
    cur.insertRow(rowValue)
    print("Record number " + str(cntr) + " written to feature class")
    #arcpy.AddMessage("Record number " + str(cntr) + " written to feature class")
    cntr = cntr + 1

Add the except block to print any errors that may occur:

except Exception as e:
    print(e.message)

Add a finally block to close the text file:

finally:
    f.close()

The entire script should appear as follows:

import arcpy

arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb"
f = open("C:/ArcpyBook/Ch8/WildfireData/NorthAmericaWildfires_2007275.txt","r")
lstFires = f.readlines()
try:
    with arcpy.da.InsertCursor("FireIncidents", ("SHAPE@XY", "CONFIDENCEVALUE")) as cur:
        cntr = 1
        for fire in lstFires:
            if 'Latitude' in fire:
                continue
            vals = fire.split(",")
            latitude = float(vals[0])
            longitude = float(vals[1])
            confid = int(vals[2])
            rowValue = [(longitude, latitude), confid]
            cur.insertRow(rowValue)
            print("Record number " + str(cntr) + " written to feature class")
            #arcpy.AddMessage("Record number " + str(cntr) + " written to feature class")
            cntr = cntr + 1
except Exception as e:
    print(e.message)
finally:
    f.close()

You can check your work by examining the C:\ArcpyBook\code\Ch8\InsertWildfires.py solution file.

Save and run the script. You should see messages being written to the output window as the script runs:

Record number 406 written to feature class
Record number 407 written to feature class
Record number 408 written to feature class
Record number 409 written to feature class
Record number 410 written to feature class
Record number 411 written to feature class

Open ArcMap and add the FireIncidents feature class to the table of contents. The points should be visible, as shown in the following screenshot:

You may want to add a basemap to provide some reference for the data. In ArcMap, click on the Add Basemap button and select a basemap from the gallery.

How it works…

Some additional explanation may be needed here. The lstFires variable contains a list of all the wildfires that were contained in the comma-delimited text file. The for loop will loop through each of these records one by one, assigning each individual record to the fire variable. We also include an if statement that is used to skip the first record in the file, which serves as the header. As I explained earlier, we then pull out the individual latitude, longitude, and confidence value items from the vals variable, which is just a Python list object, and assign them to variables called latitude, longitude, and confid. We then place these values into a new list variable called rowValue, in the order that we defined when we created the InsertCursor. Thus, the latitude and longitude pair should be placed first, followed by the confidence value.
Finally, we call the insertRow() method on the InsertCursor object assigned to the cur variable, passing in the new rowValue variable. We close by printing a message that indicates the progress of the script, and also create the except and finally blocks to handle errors and close the text file. Placing the f.close() method in the finally block ensures that it will execute and close the file even if there is an error in the previous try statement.

Updating rows with UpdateCursor

If you need to edit or delete rows from a table or feature class, you can use UpdateCursor. As is the case with InsertCursor, the contents of UpdateCursor can be limited through the use of a where clause.

Getting ready

The UpdateCursor() function can be used to either update or delete rows in a table or feature class. The returned UpdateCursor object places a lock on the data while it's being edited or deleted. If the cursor is used inside a Python with statement, the lock will automatically be freed after the data has been processed. This hasn't always been the case: prior to ArcGIS 10.1, cursors had to be manually released using the Python del statement. Once an instance of UpdateCursor has been obtained, you can then call the updateRow() method to update records in tables or feature classes, and the deleteRow() method to delete a row.

In this article, you're going to write a script that uses an UpdateCursor to update each feature in the FireIncidents feature class, assigning a value of poor, fair, good, or excellent to a new field that is more descriptive of the confidence values. Prior to updating the records, your script will add the new field to the FireIncidents feature class.

How to do it…

Follow these steps to create an UpdateCursor object that will be used to edit rows in a feature class:

Open IDLE and create a new script.
Save the script to C:\ArcpyBook\Ch8\UpdateWildfires.py.
Import the arcpy module:
import arcpy
Set the workspace:
arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb"
Start a try block:
try:
Add a new field called CONFID_RATING to the FireIncidents feature class. Make sure to indent inside the try statement:
arcpy.AddField_management("FireIncidents","CONFID_RATING","TEXT","10")
print("CONFID_RATING field added to FireIncidents")
Create a new instance of UpdateCursor inside a with block:
with arcpy.da.UpdateCursor("FireIncidents", ("CONFIDENCEVALUE", "CONFID_RATING")) as cursor:
Create a counter variable that will be used to print the progress of the script. Make sure you indent this line of code and all the lines of code that follow inside the with block:
cntr = 1
Loop through each of the rows in the FireIncidents feature class.
Update the CONFID_RATING field according to the following guidelines:

Confidence value 0 to 40 = POOR
Confidence value 41 to 60 = FAIR
Confidence value 61 to 85 = GOOD
Confidence value 86 to 100 = EXCELLENT

This can be translated into the following block of code:

for row in cursor:
    # update the confid_rating field
    if row[0] <= 40:
        row[1] = 'POOR'
    elif row[0] > 40 and row[0] <= 60:
        row[1] = 'FAIR'
    elif row[0] > 60 and row[0] <= 85:
        row[1] = 'GOOD'
    else:
        row[1] = 'EXCELLENT'
    cursor.updateRow(row)
    print("Record number " + str(cntr) + " updated")
    cntr = cntr + 1

Add the except block to print any errors that may occur:

except Exception as e:
    print(e.message)

The entire script should appear as follows:

import arcpy

arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb"
try:
    #create a new field to hold the values
    arcpy.AddField_management("FireIncidents","CONFID_RATING","TEXT","10")
    print("CONFID_RATING field added to FireIncidents")
    with arcpy.da.UpdateCursor("FireIncidents", ("CONFIDENCEVALUE", "CONFID_RATING")) as cursor:
        cntr = 1
        for row in cursor:
            # update the confid_rating field
            if row[0] <= 40:
                row[1] = 'POOR'
            elif row[0] > 40 and row[0] <= 60:
                row[1] = 'FAIR'
            elif row[0] > 60 and row[0] <= 85:
                row[1] = 'GOOD'
            else:
                row[1] = 'EXCELLENT'
            cursor.updateRow(row)
            print("Record number " + str(cntr) + " updated")
            cntr = cntr + 1
except Exception as e:
    print(e.message)

You can check your work by examining the C:\ArcpyBook\code\Ch8\UpdateWildfires.py solution file.

Save and run the script. You should see messages being written to the output window as the script runs:

Record number 406 updated
Record number 407 updated
Record number 408 updated
Record number 409 updated
Record number 410 updated

Open ArcMap and add the FireIncidents feature class. Open the attribute table, and you should see that a new CONFID_RATING field has been added and populated by the UpdateCursor:

When you insert, update, or delete data in cursors, the changes are permanent and can't be undone if you're working outside an edit session. However, with the new edit session functionality provided by ArcGIS 10.1, you can now make these changes inside an edit session to avoid these problems. We'll cover edit sessions soon.

How it works…

In this case, we've used UpdateCursor to update each of the features in a feature class. We first used the Add Field tool to add a new field called CONFID_RATING, which will hold new values that we assign based on values found in another field. The groups are poor, fair, good, and excellent, and are based on numeric values found in the CONFIDENCEVALUE field. We then created a new instance of UpdateCursor based on the FireIncidents feature class, returning the two fields mentioned previously. The script then loops through each of the features and assigns a value of poor, fair, good, or excellent to the CONFID_RATING field (row[1]), based on the numeric value found in CONFIDENCEVALUE. A Python if/elif/else structure is used to control the flow of the script based on the numeric value. The value for CONFID_RATING is then committed to the feature class by passing the row variable into the updateRow() method.
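Deleting rows follows the same pattern. The deleteRow() method removes the row most recently fetched by the cursor, so it takes no arguments. The following is a minimal sketch that assumes the same FireIncidents feature class; the confidence threshold of 40 is purely illustrative, and remember that deletions made outside an edit session are permanent, so consider experimenting on a copy of the data:

import arcpy

arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb"

with arcpy.da.UpdateCursor("FireIncidents", ("CONFIDENCEVALUE",)) as cursor:
    for row in cursor:
        # remove any fire incident with a low confidence value
        if row[0] < 40:
            cursor.deleteRow()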
Summary

In this article, we studied how to retrieve features from a feature class with SearchCursor, filter records with a where clause, improve cursor performance with geometry tokens, insert rows with InsertCursor, and update rows with UpdateCursor.

Resources for Article: Further resources on this subject: Adding Graphics to the Map [article] Introduction to Mobile Web ArcGIS Development [article] Python functions – Avoid repeating code [article]

Hands-on with Prezi Mechanics

Packt
10 Aug 2015
8 min read
In this article by J.J. Sylvia IV, author of the book Mastering Prezi for Business Presentations - Second Edition, we will see how to edit lines and use styled symbols. We will also look at the Grouping feature and get a brief introduction to the Prezi text editor. (For more resources related to this topic, see here.)

Editing lines

When editing lines or arrows, you can change them from being straight to curved by dragging the center point in any direction: This is extremely useful when creating the line drawings we saw earlier. It's also useful to get arrows pointing at various objects on your canvas:

Styled symbols

If you're on a tight deadline, or trying to create drawings with shapes simply isn't for you, then the styles available in Prezi may be of more interest to you. These are common symbols that Prezi has created in a few different styles that can be easily inserted into any of your presentations. You can select these from the same Symbols & shapes… option from the Insert menu where we found the symbols. You'll see several different styles to choose from on the right-hand side of your screen. Each of these categories has similar symbols, but styled differently. There is a wide variety of symbols available, ranging from people to social media logos. You can pick a style that best matches your theme or the atmosphere you've created for your presentation. Instead of creating your own person from shapes, you can select from a variety of people symbols available:

Although these symbols can be very handy, you should be aware that you can't edit them as part of your presentation. If you decide to use one, note that it will work as it is: there are no new hairstyles for these symbols.

Highlighter

The highlighter tool is extremely useful for pointing out key pieces of information, such as an interesting fact. To use it, navigate to the Insert menu and select the Highlighter option. Then, just drag the cursor across the text you'd like to highlight. Once you've done this, the highlighter marks become objects in their own right, so you can click on them to change their size or position just as you would do for a shape. To change the color of your highlighter, you will need to go into the Theme Wizard and edit the RGB values. We'll cover how to do this later when we discuss branding.

Grouping

Grouping is a great feature that allows you to move or edit several different elements of your presentation at once. This can be especially useful if you're trying to reorganize the layout of your Prezi after it's been created, or to add animations to several elements at once. Let's go back to the drawing we created earlier to see how this might work:

The first way to group items is to hold down the Ctrl key (Command on Mac OS) and to left-click on each element you want to group individually. In this case, I need to click on each individual line that makes up the flat top hair in the preceding image. This might be necessary if I only want to group the hair, for example:

Another method for grouping is to hold down the Shift key while dragging your mouse to select multiple items at once. In the preceding screenshot, I've selected my entire person at once. Now, I can easily rotate, resize, or move the entire person at once, without having to move each individual line or shape. If you select a group of objects, move them, and then realize that a piece is missing because it didn't get selected, just press the Ctrl+Z (Command+Z on Mac OS) keys on your keyboard to undo the move.
Then, broaden your selection and try again. Alternatively, you can hold down the Shift key and simply click on the piece you missed to add it to the group. If we want to keep these elements grouped together instead of having to reselect them each time we decide to make a change, we can click on the Group button that appears with this change. Now these items will stay grouped unless we click on the new Ungroup button, now located in the same place as the Group button previously was:

You can also use frames to group material together. If you already created frames as part of your layout, this might make the grouping process even easier.

Prezi text editor

Over the years, the Prezi text editor has evolved to be quite robust, and it's now possible to easily do all of your text editing directly within Prezi.

Spell checker

When you spell something incorrectly, Prezi will underline the word it doesn't recognize with a red line. This is just as you would see it in Microsoft Word or any other text editor. To correct the word, simply right-click on it (or Command + Click on Mac OS) and select the word you meant to type from the suggestions, as shown in the following screenshot:

The text drag-apart feature

So a colleague of yours has just e-mailed you the text that they want to appear in the Prezi you're designing for them? That's great news, as it'll help you understand the flow of the presentation. What's frustrating, though, is that you'll have to copy and paste every single line or paragraph across to put it in the right place on your canvas. At least, that used to be the case before Prezi introduced the drag-apart feature in the text editor. This means you can now easily drag a selection of text anywhere on your canvas without having to rely on the copy and paste options. Let's see how we can easily change the text we spellchecked previously, as shown in the following screenshot:

In order to drag your text apart, simply highlight the area you require, hold the mouse button down, and then drag the text anywhere on your canvas. Once you have separated your text, you can then edit the separate parts as you would edit any other individual object on your canvas. In this example, we can change the size of the company name and leave the other text as it is, which we couldn't do within a single textbox:

Building Prezis for colleagues

If you've kindly offered to build a Prezi for one of your colleagues, ask them to supply the text for it in Word format. You'll be able to run a spellcheck on it from there before you copy and paste it into Prezi. Any bad spellings you miss will also get highlighted on your Prezi canvas, but it's good to use both options as a safety net.

Font colors

Other than dragging text apart to make it stand out more on its own, you might want to highlight certain words so that they jump out at your audience even more. The great news is that you can now highlight individual lines of text or single words and change their color. To do so, just highlight a word by clicking and dragging your mouse across it. Then, click on the color picker at the top of the textbox to see the color menu, as shown in the following screenshot:

Select any of the colors available in the palette to change the color of that piece of text. Nothing else in the textbox will be affected apart from the text you have selected. This gives you much greater freedom to use colored text in your Prezi design, and doesn't leave you restricted as in older versions of the software.
Choose the right color

To make good use of this feature, we recommend that you use a color that contrasts strongly with the rest of your design. For example, if your design and corporate colors are blue, we suggest you use red or purple to highlight key words. Also, once you pick a color, stick to it throughout the presentation so that your audience knows they're seeing a key piece of information.

Bullet points and indents

Bullets and indents make it much easier to put together your business presentations, and help you give the audience some short, simple information as text in the same format they're used to seeing in other presentations. This can be done by simply selecting the main body of text and clicking on the bullet point icon at the top of the textbox. This is a really simple feature, but a useful one nonetheless. We'd obviously like to point out that too much text on any presentation is a bad thing. Keep it short and to the point. Also, remember that too many bullets can kill a presentation.

Summary

In this article, we discussed the basic mechanics of Prezi. Learning to combine these tools in creative ways will help you move from a Prezi novice to a Prezi master. Shapes can be used creatively to create content and drawings, and can be grouped together for easy movement and editing. Prezi also offers the basic text editing features explained in this article.

Resources for Article: Further resources on this subject: Turning your PowerPoint into a Prezi [Article] The Fastest Way to Go from an Idea to a Prezi [Article] Using Prezi - The Online Presentation Software Tool [Article]
Exploring Jenkins

Packt
10 Aug 2015
7 min read
In this article by Mitesh Soni, the author of the book Jenkins Essentials, we are introduced to Jenkins. (For more resources related to this topic, see here.)

Jenkins is an open source application written in Java. It is one of the most popular continuous integration (CI) tools used to build and test different kinds of projects. In this article, we will have a quick overview of Jenkins, its essential features, and its impact on DevOps culture. Before we can start using Jenkins, we need to install it. This article provides a step-by-step guide to installing Jenkins. Installing Jenkins is a very easy task, and the steps differ across OS flavors. This article will also cover the DevOps pipeline. To be precise, we will discuss the following topics in this article:

Introduction to Jenkins and its features
Installation of Jenkins on Windows and the CentOS operating system
How to change configuration settings in Jenkins
What is the deployment pipeline

On your mark, get set, go!

Introduction to Jenkins and its features

Let's first understand what continuous integration is. CI is one of the most popular application development practices in recent times. Developers integrate bug fixes, new feature development, or innovative functionality into the code repository. The CI tool verifies the integration process with an automated build and automated test execution to detect issues with the current source of an application, and provides quick feedback.

Jenkins is a simple, extensible, and user-friendly open source tool that provides CI services for application development. Jenkins supports SCM tools such as StarTeam, Subversion, CVS, Git, AccuRev, and so on. Jenkins can build Freestyle, Apache Ant, and Apache Maven-based projects. The concept of plugins makes Jenkins more attractive, easy to learn, and easy to use. There are various categories of plugins available, such as Source code management, Slave launchers and controllers, Build triggers, Build tools, Build notifiers, Build reports, other post-build actions, External site/tool integrations, UI plugins, Authentication and user management, Android development, iOS development, .NET development, Ruby development, Library plugins, and so on.

Jenkins defines interfaces or abstract classes that model a facet of a build system. Interfaces or abstract classes define an agreement on what needs to be implemented; Jenkins uses plugins to extend those implementations. To learn more about all plugins, visit https://wiki.jenkins-ci.org/x/GIAL. To learn how to create a new plugin, visit https://wiki.jenkins-ci.org/x/TYAL. To download different versions of plugins, visit https://updates.jenkins-ci.org/download/plugins/.

Features

Jenkins is one of the most popular CI servers in the market. The reasons for its popularity are as follows:

Easy installation on different operating systems.
Easy upgrades: Jenkins has very speedy release cycles.
A simple and easy-to-use user interface.
Easily extensible with the use of third-party plugins: over 400 plugins.
Easy configuration of the setup environment in the user interface. It is also possible to customize the user interface to your liking.
The master-slave architecture supports distributed builds to reduce the load on the CI server.
Jenkins comes with a test harness built around JUnit; test results are available in graphical and tabular forms.
Build scheduling based on the cron expression (to know more about cron, visit http://en.wikipedia.org/wiki/Cron).
Shell and Windows command execution in prebuild steps.
Notification support related to the build status.

Installation of Jenkins on Windows and CentOS

Go to https://jenkins-ci.org/. Find the Download Jenkins section on the home page of Jenkins's website. Download the war file or native packages based on your operating system. A Java installation is needed to run Jenkins. Install Java based on your operating system and set the JAVA_HOME environment variable accordingly.

Installing Jenkins on Windows

Select the native package available for Windows. It will download jenkins-1.xxx.zip; in our case, it will download jenkins-1.606.zip. Extract it and you will get the setup.exe and jenkins-1.606.msi files. Click on setup.exe and perform the following steps in sequence:

On the welcome screen, click Next.
Select the destination folder and click on Next.
Click on Install to begin installation. Please wait while the Setup Wizard installs Jenkins.
Once the Jenkins installation is completed, click on the Finish button.

Verify the Jenkins installation on the Windows machine by opening the URL http://<ip_address>:8080 on the system where you have installed Jenkins.

Installation of Jenkins on CentOS

To install Jenkins on CentOS, download the Jenkins repository definition to your local system at /etc/yum.repos.d/ and import the key. Use the wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat/jenkins.repo command to download the repository definition. Now, run yum install jenkins; it will resolve dependencies and prompt for installation. Reply with y and it will download the required packages to install Jenkins on CentOS. Verify the Jenkins status by issuing the service jenkins status command. Initially, it will be stopped. Start Jenkins by executing service jenkins start in the terminal. Verify the Jenkins installation on the CentOS machine by opening the URL http://<ip_address>:8080 on the system where you have installed Jenkins.

How to change configuration settings in Jenkins

Click on the Manage Jenkins link on the dashboard to configure the system and security, and to manage plugins, slave nodes, credentials, and so on. Click on the Configure System link to configure Java, Ant, Maven, and other third-party products' related information. Jenkins uses Groovy as its scripting language. To execute an arbitrary script for administration, troubleshooting, or diagnostics on the Jenkins dashboard, go to the Manage Jenkins link on the dashboard, click on Script Console, and run println(Jenkins.instance.pluginManager.plugins). To verify the system log, go to the Manage Jenkins link on the dashboard and click on the System Log link, or visit http://localhost:8080/log/all. To get more information on third-party libraries (version and license information) in Jenkins, go to the Manage Jenkins link on the dashboard and click on the About Jenkins link.

What is the deployment pipeline?

The application development life cycle is traditionally a lengthy and manual process. In addition, it requires effective collaboration between development and operations teams. The deployment pipeline is a demonstration of the automation involved in the application development life cycle, covering automated build execution, automated test execution, notification to the stakeholders, and deployment in different runtime environments. Effectively, the deployment pipeline is a combination of CI and continuous delivery, and hence is a part of DevOps practices. The following diagram depicts the deployment pipeline process:

Members of the development team check code into a source code repository.
CI products such as Jenkins are configured to poll for changes in the code repository. Changes in the repository are downloaded to the local workspace, and Jenkins triggers an automated build process, which is assisted by Ant or Maven. Automated test execution or unit testing, static code analysis, reporting, and notification of a successful or failed build process are also part of the CI process. Once the build is successful, it can be deployed to different runtime environments, such as testing, preproduction, production, and so on. Deploying a war file, in terms of a JEE application, is normally the final stage in the deployment pipeline. One of the biggest benefits of the deployment pipeline is the faster feedback cycle. Identifying issues in the application at an early stage, with no dependence on manual effort, makes the entire end-to-end process more effective. To read more, visit http://martinfowler.com/bliki/DeploymentPipeline.html and http://www.informit.com/articles/article.aspx?p=1621865&seqNum=2.

Summary

Congratulations! We reached the end of this article, and so we have Jenkins installed on our physical or virtual machine. So far, we have covered the basics of CI and an introduction to Jenkins and its features. We completed the installation of Jenkins on the Windows and CentOS platforms. In addition to this, we discussed the deployment pipeline and its importance in CI.

Resources for Article: Further resources on this subject: Jenkins Continuous Integration [article] Running Cucumber [article] Introduction to TeamCity [article]

Updating and building our masters

Packt
10 Aug 2015
20 min read
In this article by John Henry Krahenbuhl, the author of the book Axure Prototyping Blueprints, we determine that, with modification, we can use all of the masters from the previous community site. To support our new use cases, we need additional registration variables, a master to support user registration, and interactions for creating, and commenting on, posts. Next, we will create global variables and add new masters, as well as enhance the design and interactions of each master. (For more resources related to this topic, see here.)

Creating additional global variables

Based on project requirements, we identified that nine global variables will be required. To create global variables, on the main menu click on Project and then click on Global Variables…. In the Global Variables dialog, perform the following steps:

Click the green + sign and type Email. Click on the Default Value field and type songwriter@test.com.
Repeat step 1 eight more times to create additional variables, using the following table for the Variable Name and Default Value fields (leave the Default Value blank where none is shown):

Variable Name     Default Value
Password          Grammy
UserEmail
UserPassword
LoggedIn          No
TopicIndex        0
UserText
NewPostTopic
NewPostHeadline

Click on OK.

With our global variables created, we are now ready to create new masters, as well as update the design and interactions for existing masters. We will start by adding masters to the Masters pane.

Adding masters to the Masters pane

We will add a total of two masters to the Masters pane. To create our masters, perform the following steps:

In the Masters pane, click on the Add Master icon, type PostCommentary, and press Enter.
Again, in the Masters pane, click on the Add Master icon, type NewPost, and press Enter.
In the same Masters pane, right-click on the icon next to the Header master, mouse over Drop Behavior, and click on Lock to Master Location.

We are now ready to remodel the existing masters and complete the design and interactions for our new masters. We will start with the Header master.

Enhancing our Header master

Once completed, the Header master will look as follows:

To update the Header master, we will add an ErrorMessage label, delete the Search widgets, and update the menu items. To update widgets on the Header master, perform the following steps:
Click on the PasswordTextField at coordinates (815,10). If text is displayed on the text field, right-click and click on Edit Text. All text on the widget will be highlighted, press Delete. In the Widget Properties and Style pane, with the Properties tab selected, scroll to Text Field and perform the following steps: Click on the drop-down menu next to Type and select Password. Next to Hint Text, enter Password. Click on Hint Style. In the Set Interaction Styles dialog box, click on the checkbox next to Font Color. Click on the down arrow next to the Text Color icon . In the drop-down menu, in the # text field, enter 999999. Click on OK. Click on the SearchTextField at coordinates (730,82) and then on Delete. Click on the SearchButton at coordinates (890,80) and then on Delete. Next, we will convert all the Log In widgets into a dynamic panel named LoginDP. The LoginDP will allow us to transition between states and show different content when a user logs in. To create the LoginDP, in our header, select the following widgets: Named Widget Coordinates ErrorMessage (730,0) EmailTextField (730,10) PasswordTextField (815,10) LogInButton (894,10) NewUserLink (730,30) ForgotLink (815,30) With the preceding six widgets selected, right-click and click Convert to Dynamic Panel. In the Widget Interactions and Notes pane, click on the Dynamic Panel Name field and type LogInDP. All the Log In widgets are now on State1 of the LogInDP. We will now add widgets to State2 for the LogInDP. With the Log In widgets converted into the LogInDP, we will now add and design State2. In the Widget Manager pane, under the LogInDP, right-click on State1, and in the menu, click on Add State. Click on the State icon beside  State2 twice, to open in the design area. Perform the following steps: In the Widgets pane, drag the Label widget  and place it at coordinates (0,13) and do the these steps: Type Welcome, email@test.com. In the Widget Interactions and Notes pane, click in the Shape Name field and type WelcomeLabel. In the Widget Properties and Style pane, with the Style tab selected scroll to Font, change the font size to 9, and click on the Italic icon . In the Widgets pane, drag the Button Shape widget  and place it at coordinates (164,10). Type Log Out. In the toolbar, change w: to 56 and h: to 16. In the Widget Interactions and Notes pane, click on the Shape Name field and type LogOutButton. To complete the design of the Header master, we need to rename the menu items on the HzMenu. In the Masters pane, double-click on the Header master to open in the design area. Click on the HzMenu at coordinates (250,80). Perform the following steps: Click on the first menu item and type Random Musings. In the Widget Interactions and Notes pane, click on the Menu Item Name field and type RandomMusingsMenuItem. Click on Case 1 under the OnClick event and press the Delete key. Click on Create Link…. In the pop-up sitemap, click on Random Musings. Again, click on the first menu item and type Accolades and News. In the Widget Interactions and Notes pane, click on the Menu Item Name field and type AccoladesMenuItem. Click on Case 1 under the OnClick event and press the Delete key. Click on Create Link…. In the pop-up sitemap, click on Accolades and News. Click on the first menu item and type About. In the Widget Interactions and Notes pane, click on the Menu Item Name field and type AboutMenuItem. Click on Case 1 under the OnClick event and press the Delete key. Click on Create Link…. In the pop-up sitemap, click on About. 
We will now create a registration lightbox that will be shown when the user clicks on the NewUserLink. To display a dynamic panel in a lightbox, we will use the OnShow action with the option treat as lightbox set. We will use the Registration dynamic panel's Pin to Browser property to have the dynamic panel shown in the center and middle of the window. Learn more at http://www.axure.com/learn/dynamic-panels/basic/lightbox-tutorial. In the Masters pane, double-click on the icon  next to the Header master to open in the design area. In the Widgets pane, drag the Dynamic Panel widget  and place it at coordinates (310,200). In the toolbar, change w: to 250, h: to 250, and click on the Hidden checkbox. In the Widget Interactions and Notes pane, click on the Dynamic Panel Name field and type RegistrationLightBoxDP. In the Widget Manager pane with the Properties tab selected, click on Pin to Browser. In the Pin to Browser dialog box, click on the checkbox next to Pin to browser window and click on OK. In the Widget Manager pane, under the RegistrationLightBoxDP, click on the State icon  beside State1 twice to open in the design area. In the Widgets pane, drag the Rectangle widget  and place it at coordinates (0,0). In the Widget Interactions and Notes pane, click on the Shape Name field and type BackgroundRectangle. In the toolbar, change w: to 250 and h: to 250. Again in the Widgets pane, drag the Heading2 widget  and place it at coordinates (25,20). With the Heading2 widget selected, type Registration. In the toolbar, change w: to 141 and h: to 28. In the Widget Interactions and Notes pane, click on the Shape Name field and type RegistrationHeading. Repeat steps 8-10 to complete the design of the RegistrationLightBoxDP using the following table (* if applicable): Widget Coordinates Text* (Shown on Widget) Width* (w:) Height* (h:) Name field (In the Widget Interactions and Notes pane)   Label (25,67) Enter Email     EnterEmailLabel   Text Field (25,86)       EnterEmailField   Label (25,121) Enter Password     EnterPasswordLabel   Text Field (25,140)       EnterPasswordField   Button Shape (25,190) Submit 200 30 SubmitButton Click on the EnterEmailField text field at coordinates (25,86). In the Widget Properties and Style pane, with the Properties tab selected, scroll to Text Field and perform the following steps: Next to Hint Text, enter Email. Click on Hint Style. In the Set Interaction Styles dialog box, click on the checkbox next to Font Color. Click on the down arrow next to the Text Color icon . In the drop-down menu, in the # text field, enter 999999. Click on OK. Click on the EnterPasswordField text field at coordinates (25,140). In the Widget Properties and Style pane, with the Properties tab selected, scroll to Text Field and perform the following steps: Click on the drop-down menu next to Type and select Password. Next to Hint Text, enter Password. Click on Hint Style. In the Set Interaction Styles dialog box, click on the checkbox next to Font Color. Click on the down arrow next to the Text Color icon . In the drop-down menu, in the # text field, enter 999999. Click on OK. With the updates completed for the Header master, we are now ready to define the interactions. Refining the interactions for our Header master We will need to add additional interactions for Log In and Registration on our Header master. 
Interactions with our Header master will be triggered by the following named widgets and events:

Dynamic Panel            State    Widget         Event
LoginDP                  State1   LoginButton    OnClick
LoginDP                  State1   NewUserLink    OnClick
LoginDP                  State1   ForgotLink     OnClick
LoginDP                  State2   LogOutButton   OnClick
RegistrationLightBoxDP   State1   SubmitButton   OnClick

We will now define the interactions for each widget, starting with the LoginButton.

Defining interactions for the LoginButton

When the LoginButton is clicked, the OnClick event will evaluate whether the text entered in the EmailTextField and PasswordTextField equals the Email and Password variable values. If the values match, the LoginDP will be set to State2 and the text on the WelcomeLabel will be updated. If the variable values do not match, we will show an error message. We will define these actions by creating two cases: ValidateUser and ShowErrorMessage.

Validating the user's email and password

To define the ValidateUser case for the OnClick interaction, open the LogInDP State1 in the design area. Click on the LogInButton at coordinates (164,10). In the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ValidateUser. In the Case Editor dialog, perform the following steps:

You will see the Condition Builder window similar to the one shown in the following screenshot after the first and second conditions are defined:

Create the first condition. Click on the Add Condition button. In the Condition Builder dialog box, in the outlined condition box, perform the following steps: In the first dropdown, select text on widget. In the second dropdown, select EmailTextField. In the third dropdown, select equals. In the fourth dropdown, select value. In the fifth dropdown, select [[Email]]. Click the green + sign.
Create the second condition. Click on the Add Condition button. In the Condition Builder dialog box, in the outlined condition box, perform the following steps: In the first dropdown, select text on widget. In the second dropdown, select PasswordTextField. In the third dropdown, select equals. In the fourth dropdown, select value. In the fifth dropdown, select [[Password]]. Click on OK.

Once the following three actions are defined, you should see the Case Editor similar to the one shown in the following screenshot:

Create the first action. To set the panel state for the LogInDP dynamic panel, perform the following steps: Under Click to add actions, scroll to the Dynamic Panels drop-down menu and click on Set Panel State. Under Configure actions, click on the checkbox next to LoginDP. Next to Select the state, click on the dropdown and select State2.
Create the second action. To set the text for the WelcomeLabel, perform the following steps: Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text. Under Configure actions, click on the checkbox next to WelcomeLabel. Under Set text to, click on the dropdown and select value. In the text field, enter Welcome, [[Email]].
Create the third action. To set the value of the LoggedIn variable, perform the following steps: Under Click to add actions, scroll to the Variables drop-down menu and click on Set Variable Value. Under Configure actions, click on the checkbox next to LoggedIn. Under Set variable to, click on the first dropdown and click on value. In the text field, enter [[Email]].
Click on OK.

With the ValidateUser case completed, next we will create the ShowErrorMessage case.
Creating the ShowErrorMessage case

To create the ShowErrorMessage case, in the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowErrorMessage. Create the action. To show the ErrorMessage label, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Show.
Under Configure actions, under the LoginDP dynamic panel, click on the checkbox next to ErrorMessage.
Click on OK.

Next, we will enable the interaction for the NewUserLink.

Enabling interaction for the NewUserLink

When the NewUserLink is clicked, the OnClick event will show the RegistrationLightBoxDP dynamic panel as a lightbox, as shown in the following screenshot:

With the LoginDP State1 still open in the design area, click on the NewUserLink at coordinates (0,30). To enable the OnClick event, in the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowLightBox. Now, create the action; to show the RegistrationLightBoxDP, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Show.
Under Configure actions, click on the checkbox next to RegistrationLightBoxDP.
Next, go to More options, click on the dropdown, and select treat as lightbox.
Click on OK.

Next, we will activate interactions for the ForgotLink.

Activating interactions for the ForgotLink

When the ForgotLink is clicked, the OnClick event will show the RegistrationLightBoxDP dynamic panel as a lightbox, the RegistrationHeading text will be updated to display Forgot Password?, and the EnterPasswordLabel, as well as the EnterPasswordField, will be hidden. To enable the OnClick event, in the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowForgotLB. In the Case Editor dialog, perform the following steps:

Create the first action; to show the RegistrationLightBoxDP, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Show.
Under Configure actions, click on the checkbox next to RegistrationLightBoxDP.
Next, go to More options, click on the dropdown, and select treat as lightbox.

Create the second action; to set the text for the RegistrationHeading, perform the following steps:
Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text.
Under Configure actions, click on the checkbox next to RegistrationHeading.
Under Set text to, click on the dropdown and select value. In the text field, enter Forgot Password?.

Create the third action; to hide the EnterPasswordLabel and EnterPasswordField, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Hide.
Under Configure actions, under RegistrationLightBoxDP, click on the checkboxes next to EnterPasswordLabel and EnterPasswordField.
Click on OK.

We have now completed the interactions for State1 of LoginDP. Next, we will facilitate interactions for the LogOutButton.
Facilitating interactions for the LogOutButton

When the LogOutButton is clicked, the OnClick event will perform the following actions:
Hide the ErrorMessage on the LoginDP State1
Set the text for the PasswordTextField and EmailTextField
Set the panel state for LoginDP to State1
Set the variable value for LoggedIn

To enable the OnClick event, open the LoginDP State2 in the design area. Click on the LogOutButton at coordinates (164,10). In the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type LogOut. In the Case Editor dialog, perform the following steps:

Create the first action; to hide the ErrorMessage, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Hide.
Under Configure actions, under LoginDP, click on the checkbox next to ErrorMessage.

Create the second action; to set the text for the PasswordTextField and EmailTextField, perform the following steps:
Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text.
Under Configure actions, click on the checkbox next to PasswordTextField. Under Set text to, click on the dropdown and select value. In the text field, clear any text shown.
Under Configure actions, click on the checkbox next to EmailTextField. Under Set text to, click on the dropdown and select value. In the text field, enter Email.

Create the third action; to set the panel state for the LoginDP dynamic panel, perform the following steps:
Under Click to add actions, scroll to the Dynamic Panels drop-down menu and click on Set Panel State.
Under Configure actions, click on the checkbox next to LoginDP.
Next to Select the state, click on the dropdown and select State1.

Create the fourth action. To set the variable value of LoggedIn, perform the following steps:
Under Click to add actions, scroll to the Variables drop-down menu and click on Set Variable Value.
Under Configure actions, click on the checkbox next to LoggedIn.
Under Set variable to, click on the first dropdown and click on value. In the text field, enter No.
Click on OK.

We have now completed the interactions for State2 of the LoginDP. Next, we will construct interactions for the RegistrationLightBoxDP.

Constructing interactions for the RegistrationLightBoxDP

When the SubmitButton is clicked, the OnClick event hides the RegistrationLightBoxDP and sets the Email and Password variable values to the text entered in the EnterEmailField and EnterPasswordField. Also, if the text on the RegistrationHeading label is equal to Registration, LoginDP will be set to State2. We will define these actions by creating two cases: UpdateVariables and ShowLogInState.

Updating Variables and hiding the RegistrationLightBoxDP

In the Widget Manager pane, double-click on the RegistrationLightBoxDP State1 to open it in the design area. To define the UpdateVariables case for the OnClick interaction, click on the SubmitButton at coordinates (25,190). In the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type UpdateVariables. In the Case Editor dialog, perform the following steps. The following screenshot shows the Case Editor with the actions defined:

Create the first action; to set the variable values for the Email and Password variables, perform the following steps:
Under Click to add actions, scroll to the Variables drop-down menu and click on Set Variable Value.
Under Configure actions, click on the checkbox next to Email. Under Set variable to, click on the first dropdown and select text on widget. Click on the second dropdown and select EnterEmailField.
Under Configure actions, click on the checkbox next to Password. Under Set variable to, click on the first dropdown and select text on widget. Click on the second dropdown and select EnterPasswordField.

Create the second action; to hide the RegistrationLightBoxDP, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Hide.
Under Configure actions, click on the checkbox next to RegistrationLightBoxDP.
Click on OK.

With the UpdateVariables case completed, next we will create the ShowLogInState case.

Creating the ShowLogInState case

To create the ShowLogInState case, in the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowLogInState. In the Case Editor dialog, perform the following steps:

Click on the Add Condition button to create the first condition. In the Condition Builder dialog box, go to the outlined condition box and perform the following steps:
In the first dropdown, select text on widget.
In the second dropdown, select RegistrationHeading.
In the third dropdown, select equals.
In the fourth dropdown, select value.
In the fifth dropdown, select Registration.
Click on OK.

Create the first action; to set the text for the WelcomeLabel, perform the following steps:
Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text.
Under Configure actions, click on the checkbox next to WelcomeLabel.
Under Set text to, click on the dropdown and select value. In the text field, enter Welcome, [[Email]]. Click on OK.

Create the second action; to set the panel state for the LoginDP dynamic panel, perform the following steps:
Under Click to add actions, scroll to the Dynamic Panels drop-down menu and click on Set Panel State.
Under Configure actions, click on the checkbox next to LoginDP.
Next to Select the state, click on the dropdown and select State2.

Create the third action; to set the value of the LoggedIn variable, perform the following steps:
Under Click to add actions, scroll to the Variables drop-down menu and click on Set Variable Value.
Under Configure actions, click on the checkbox next to LoggedIn.
Under Set variable to, click on the first dropdown and click on value. In the text field, enter [[Email]].
Click on OK.

Under the OnClick event, right-click on the ShowErrorMessage case and click on Toggle IF/ELSE IF. With our Header master updated, we are now ready to refresh data for our Forum repeater.

Summary

We learned how to leverage masters and pages from our community site to create a new blog site. We enhanced the Header master and refined the interactions for our Header master.

Resources for Article:

Further resources on this subject:
Home Page Structure [article]
Axure RP 6 Prototyping Essentials: Advanced Interactions [article]
Common design patterns and how to prototype them [article]
Integrating with Muzzley

Packt
10 Aug 2015
12 min read
In this article by Miguel de Sousa, author of the book Internet of Things with Intel Galileo, we will cover the following topics:

Wiring the circuit
The Muzzley IoT ecosystem
Creating a Muzzley app
Lighting up the entrance door

(For more resources related to this topic, see here.)

One identified issue regarding IoT is that there will be lots of connected devices, and each one speaks its own language, not sharing the same protocols with other devices. This leads to an increasing number of apps to control each of those devices. Every time you purchase connected products, you'll be required to have the exclusive product app, and in the near future, where it is predicted that more devices will be connected to the Internet than people, this is indeed a problem, known as the basket of remotes. Many solutions have been appearing for this problem. Some of them focus on creating common communication standards between the devices, or even on creating their own protocols, such as the Intel Common Connectivity Framework (CCF). A different approach consists of predicting the devices' interactions, where collected data is used to predict and trigger actions on specific devices. An example using this approach is Muzzley. It not only supports a common way to speak with the devices, but also learns from the users' interaction, allowing them to control all their devices from a common app, and, by collecting usage data, it can predict users' actions and even make different devices work together.

In this article, we will start by understanding what Muzzley is and how we can integrate with it. We will then do some development to allow you to control your own building's entrance door. For this purpose, we will use Galileo as a bridge to communicate with a relay and the Muzzley cloud, allowing you to control the door from a common mobile app and from anywhere, as long as there is Internet access.

Wiring the circuit

In this article, we'll be using a real home AC inter-communicator with a building entrance door unlock button, and this will require you to do some homework. This integration will require you to open your inter-communicator and adjust the inner circuit, so be aware that there are always risks of damaging it. If you don't want to use a real inter-communicator, you can replace it with an LED or even with the buzzer module. If you want to use a real device, you can use a DC inter-communicator, but in this guide, we'll only be explaining how to do the wiring using an AC inter-communicator. The first thing you have to do is take a look at the device manual and check whether it works with AC current, and the voltage it requires. If you can't locate your product manual, search for it online. In this article, we'll be using the solid state relay. This relay accepts a voltage range from 24 V up to 380 V AC, and your inter-communicator should also work in this voltage range. You'll also need some electrical wires and electrical wire junctions:

Wire junctions and the solid state relay

This equipment will be used to adapt the door unlocking circuit to allow it to be controlled from the Galileo board using a relay. The main idea is to use a relay to close the door opener circuit, resulting in the door being unlocked. This can be accomplished by joining the inter-communicator switch wires with the relay wires.
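Before touching the wiring, it is worth seeing how little code the Galileo side needs. The following Arduino-style sketch is only a test harness, not the final Muzzley integration: it periodically pulses the relay pin so you can verify the unlock circuit once the wiring is done. Pin 13 matches the wiring described in the next section; the two-second pulse width and the ten-second pause are assumptions that you should tune to your own door lock:

// Test sketch (not the final Muzzley integration): pulse the relay so the
// door strike releases for a moment. Pin 13 matches the wiring described
// below; the 2-second pulse width is an assumed value -- tune it for your lock.
const int RELAY_PIN = 13;
const unsigned long PULSE_MS = 2000;

void setup() {
  pinMode(RELAY_PIN, OUTPUT);
  digitalWrite(RELAY_PIN, LOW);   // relay open: door stays locked
}

void loop() {
  digitalWrite(RELAY_PIN, HIGH);  // relay closed: door unlocks
  delay(PULSE_MS);
  digitalWrite(RELAY_PIN, LOW);   // lock again
  delay(10000);                   // wait 10 s before the next test pulse
}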
Use some wire and wire junctions to join the switch and relay wires, as displayed in the following image:

Wiring the circuit

The building/house AC circuit is represented in yellow, and S1 and S2 represent the inter-communicator switch (button). On pressing the button, we will also be closing this circuit, and the door will be unlocked. This way, the lock can be controlled both ways: using the original button and using the relay. Before starting to wire the circuit, make sure that the inter-communicator circuit is powered off. If you can't switch it off, you can always turn off your house's electrical board for a couple of minutes. Make sure that it is powered off by pressing the unlock button and trying to open the door. If you are not sure of what you must do, or don't feel comfortable doing it, ask for help from someone more experienced.

Open your inter-communicator, locate the switch, and perform the changes displayed in the preceding image (you may have to do some soldering). The Intel Galileo board will then activate the relay using pin 13, which you should wire to the relay's connector number 3, and the Galileo's ground (GND) should be connected to the relay's connector number 4. Beware that not all inter-communicator circuits work the same way, and although we try to provide a general way to do it, there is always a risk of damaging your device or being electrocuted. Do it at your own risk.

Power on your inter-communicator circuit and check whether you can open the door by pressing the unlock door button. If you prefer not to use the inter-communicator with the relay, you can always replace it with a buzzer or an LED to simulate the door opening. Also, since the relay is connected to Galileo's pin 13, with the same relay code, you'll have visual feedback from the Galileo's onboard LED.

The Muzzley IoT ecosystem

Muzzley is an Internet of Things ecosystem that is composed of connected devices, mobile apps, and cloud-based services. Devices can be integrated with Muzzley through the device cloud or the device itself:

It offers device control, a rules system, and a machine learning system that predicts and suggests actions based on device usage. The mobile app is available for Android, iOS, and Windows Phone. It can pack all your Internet-connected devices into a single common app, allowing them to be controlled together and to work with other devices that are available in real-world stores, or even other homemade connected devices, like the one we will create in this article. Muzzley is known for being one of the first-generation platforms with the ability to predict a user's actions by learning from the user's interaction with their own devices. Human behavior is mostly unpredictable, but for convenience, people end up creating routines in their daily lives. The interaction with home devices is an example where human behavior can be observed and learned by an automated system. Muzzley tries to take advantage of these behaviors by identifying the user's recurrent routines and making suggestions that could accelerate and simplify the interaction with the mobile app and devices. Devices that don't know of each other's existence get connected through the user's behavior and may create synergies among themselves. When the user starts using the Muzzley app, the interaction is observed by a profiler agent that tries to acquire a behavioral network of linked cause-effect events.
When the frequency of these network associations becomes important enough, the profiler agent emits a suggestion for the user to act upon. For instance, if every time a user arrives home, he switches on the house lights, checks the thermostat, and adjusts the air conditioner accordingly, the profiler agent will emit a set of suggestions based on this. The cause of the suggestion is identified, and shortcuts are offered for the effect-associated action. For instance, the user could receive the following suggestions in the Muzzley app: "You are arriving at a known location. Every time you arrive here, you switch on the «Entrance bulb». Would you like to do it now?"; or "You are arriving at a known location. The thermostat «Living room» says that the temperature is at 15 degrees Celsius. Would you like to set your «Living room» air conditioner to 21 degrees Celsius?"

When it comes to security and privacy, Muzzley takes them seriously, and all the collected data is used exclusively to analyze behaviors to help make your life easier. This is the system into which we will be integrating our door unlocker.

Creating a Muzzley app

The first step is to own a Muzzley developer account. If you don't have one yet, you can obtain one by visiting https://muzzley.com/developers, clicking on the Sign up button, and submitting the displayed form. To create an app, click on the top menu option Apps and then Create app. Name your app Galileo Lock and, if you want to, add a description to your project. As soon as you click on Submit, you'll see two buttons displayed, allowing you to select the integration type:

Muzzley allows you to integrate through the product manufacturer's cloud or directly with a device. In this example, we will be integrating directly with the device. To do so, click on Device to Cloud integration. Fill in the provider name as you wish and pick two image URLs to be used as the profile (for example, http://hub.packtpub.com/wp-content/uploads/2015/08/Commercial1.jpg) and channel (for example, http://hub.packtpub.com/wp-content/uploads/2015/08/lock.png) images. We can select one of two available ways to add our device: it can be done using UPnP discovery or by inserting a custom serial number. Pick the device discovery option Serial number and ignore the Interface UUID and Email Access List fields; we will come back to them later. Save your changes by pressing the Save changes button.

Lighting up the entrance door

Now that we can unlock our door from anywhere using a mobile phone with an Internet connection, a nice thing to have is the entrance lights turning on when you open the building door using your Muzzley app. To do this, you can use the Muzzley workers to define rules that perform an action when the door is unlocked using the mobile app. You'll need to own one of the Muzzley-enabled smart bulbs, such as Philips Hue, WeMo LED Lighting, Milight, Easybulb, or LIFX. You can find all the enabled devices in the app profiles selection list:

If you don't have those specific lighting devices but have another type of connected device, search the available list to see whether it is supported. If it is, you can use that instead. Add your bulb channel to your account. You should now find it listed in your channels under the category Lighting. If you click on it, you'll be able to control the lights. To activate the trigger option in the lock profile we created previously, go to the Muzzley website and head back to the Profile Spec app, located inside App Details.
Expand the property lock status by clicking on the arrow sign in the property #1 - Lock Status section, and then expand the controlInterfaces section. Create a new control interface by clicking on the +controlInterface button. In the new controlInterface #1 section, we'll need to define the possible label-value choices for this property when setting a rule. Feel free to insert an id, and in the control interface option, select the text-picker option. In the config field, we'll need to specify each of the available options, setting the display label and the real value that will be published. Insert the following JSON object:

{"options":[{"value":"true","label":"Lock"}, {"value":"false","label":"Unlock"}]}

Now we need to create a trigger. In the profile spec, expand the trigger section. Create a new trigger by clicking on the +trigger button. Inside the newly created section, select the equals condition. Create an input by clicking on +input, insert the ID value, and insert the ID of the control interface you have just created in the controlInterfaceId text field. Finally, add the path [{"source":"selection.value","target":"data.value"}] to map the data.

Open your mobile app and click on the workers icon. Clicking on Create Worker will display the worker creation menu, where you'll be able to select a channel component property as a trigger for some other channel component property:

Select the lock and select the Lock Status is equal to Unlock trigger. Save it and select the action button. In here, select the smart bulb you own and select the Status On option:

After saving this rule, give it a try and use your mobile phone to unlock the door. The smart bulb should then turn on. With this, you can configure many things in your home even before you arrive there. In this specific scenario, we used our door locker as a trigger to accomplish an action on a lightbulb. If you want, you can do the opposite and open the door when a lightbulb lights up a specific color, for instance. To do it, similar to how you configured your device trigger, you just have to set up the action options in your device profile page.

Summary

Everyday objects that surround us are being transformed into information ecosystems, and the way we interact with them is slowly changing. Although IoT is growing fast, it is nowadays at an early stage, and many issues must be solved in order to make it successfully scalable. By 2020, it is estimated that there will be more than 25 billion devices connected to the Internet. This fast growth, without security regulations and deep security studies, is leading to major concerns regarding the two biggest IoT challenges: security and privacy. Devices in our home that are remotely controllable, or personal data getting into the wrong hands, could be the recipe for a disaster. In this article, you have learned the basic steps of wiring the circuit of your Galileo board, creating a Muzzley app, and lighting up the entrance door of your building through your Muzzley app, using the Intel Galileo board as a bridge to communicate with the Muzzley cloud.

Resources for Article:

Further resources on this subject:
Getting Started with Intel Galileo [article]
Getting the current weather forecast [article]
Controlling DC motors using a shield [article]
Controls and Widgets

Packt
10 Aug 2015
25 min read
In this article by Chip Lambert and Shreerang Patwardhan, authors of the book Mastering jQuery Mobile, we will take our Civic Center application to the next level, and in the process of doing so, we will explore different widgets. We will explore the touch events provided by the jQuery Mobile framework further and then take a look at how this framework interacts with third-party plugins. We will be covering the following widgets and topics in this article:

Collapsible widget
Listview widget
Range slider widget
Radio button widget
Touch events
Third-party plugins
HammerJS
FastClick
Accessibility

(For more resources related to this topic, see here.)

Widgets

We have already made use of widgets as part of the Civic Center application. "Which? Where? When did that happen? What did I miss?" Don't panic, as you have missed nothing at all. All the components that we use as part of the jQuery Mobile framework are widgets. The page, buttons, and toolbars are all widgets. So what do we understand about widgets from their usage so far? One thing is pretty evident: widgets are feature-rich, and they have a lot of things that are customizable and that can be tweaked as per the requirements of the design. These customizable things are pretty much the methods and events that these small plugins offer to developers. So, all in all: widgets are feature-rich, stateful plugins that have a complete lifecycle, along with methods and events.

We will now explore a few widgets, as discussed before, and we will start off with the collapsible widget. A collapsible widget, more popularly known as the accordion control, is used to display and style a cluster of related content together so that it is easily accessible to the user. Let's see this collapsible widget in action. Pull up the index.html file. We will be adding the collapsible widget to the facilities page. You can jump directly to the content div of the facilities page. We will replace the simple-looking, unordered list and add the collapsible widget in its place. Add the following code in place of the <ul>...<li></li>...</ul> portion:

<div data-role="collapsibleset">
    <div data-role="collapsible">
        <h3>Banquet Halls</h3>
        <p>List of banquet halls will go here</p>
    </div>
    <div data-role="collapsible">
        <h3>Sports Arena</h3>
        <p>List of sports arenas will go here</p>
    </div>
    <div data-role="collapsible">
        <h3>Conference Rooms</h3>
        <p>List of conference rooms will come here</p>
    </div>
    <div data-role="collapsible">
        <h3>Ballrooms</h3>
        <p>List of ballrooms will come here</p>
    </div>
</div>

That was pretty simple. As you must have noticed, we are creating a group of collapsibles defined by a div with data-role="collapsibleset". Inside this div, we have multiple div elements, each with a data-role of "collapsible". These data roles instruct the framework to style each div as a collapsible. Let's break the individual collapsibles down further. Each collapsible div has to have a heading tag (h1-h6), which acts as the title for that collapsible. This heading can be followed by any HTML structure that is required as per your application's design. In our application, we added a paragraph tag with some dummy text for now. We will soon be replacing this text with another widget: listview. Before we proceed to look at how we will be doing this, let's see what the facilities page looks like right now:
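While we are on the subject of collapsibles, their look is easy to tweak through jQuery Mobile's standard data attributes. The following variation is not part of our Civic Center code; it is a sketch showing a panel that starts expanded and swaps the default plus/minus icons for carats, using icon names from jQuery Mobile 1.4's standard icon set:

<!-- Illustrative variation: starts expanded, with carat icons instead of plus/minus -->
<div data-role="collapsible" data-collapsed="false"
     data-collapsed-icon="carat-d" data-expanded-icon="carat-u">
    <h3>Banquet Halls</h3>
    <p>This panel starts expanded and uses carat icons.</p>
</div>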
Now let's take a look at another widget that we will include in our project: the listview widget. The listview widget is a very important widget from the mobile website standpoint. It is highly customizable and can play an important role in the navigation system of your web application as well. In our application, we will include a listview within the collapsible div elements that we have just created. Each collapsible will hold the relevant list items, which can be linked to a detailed page for each item. Without further discussion, let's take a look at the following code. We have replaced the contents of the first collapsible item within the paragraph tag with the code to include the listview widget. We will break up the code and discuss the minute details later:

<div data-role="collapsible">
    <h3>Banquet Halls</h3>
    <p>
        <span>We have 3 huge banquet halls named after 3 of the most celebrated chefs from across the world.</span>
        <ul data-role="listview" data-inset="true">
            <li>
                <a href="#">Gordon Ramsay</a>
            </li>
            <li>
                <a href="#">Anthony Bourdain</a>
            </li>
            <li>
                <a href="#">Sanjeev Kapoor</a>
            </li>
        </ul>
    </p>
</div>

That was pretty simple, right? We replaced the dummy text from the paragraph tag with a span that has some details concerning what that collapsible list is about, and then we have an unordered list with data-role="listview" and a property called data-inset="true". We have seen several data-roles before, and this one is no different. This data-role attribute informs the framework to style the unordered list's items as tappable buttons, while the data-inset property informs the framework to apply the inset appearance to the list items. Without this property, the list items would stretch from edge to edge on the mobile device. Try setting the data-inset property to false or removing the property altogether; you will see the results for yourself. Another thing worth noticing in the preceding code is that we have included an anchor tag within the li tags. This anchor tag informs the framework to add a right arrow icon on the extreme right of that list item. Again, this icon is customizable, along with its position and other styling attributes. Right now, our facilities page should appear as seen in the following image:

We will now add similar listview widgets within the remaining three collapsible items. The content for the next collapsible item, titled Sports Arena, should be as follows. Once added, this collapsible item, when expanded, should look as seen in the screenshot that follows the code:

<div data-role="collapsible">
    <h3>Sports Arena</h3>
    <p>
        <span>We have 3 huge sports arenas named after 3 of the most celebrated sports personalities from across the world.</span>
        <ul data-role="listview" data-inset="true">
            <li>
                <a href="#">Sachin Tendulkar</a>
            </li>
            <li>
                <a href="#">Roger Federer</a>
            </li>
            <li>
                <a href="#">Usain Bolt</a>
            </li>
        </ul>
    </p>
</div>

The code for the listview widget that should be included in the next collapsible item, titled Conference Rooms, follows. Once added, this collapsible item, when expanded, should look as seen in the image that follows the code:

<div data-role="collapsible">
    <h3>Conference Rooms</h3>
    <p>
        <span>
            We have 3 huge conference rooms named after the 3 largest technology companies.
        </span>
        <ul data-role="listview" data-inset="true">
            <li>
                <a href="#">Google</a>
            </li>
            <li>
                <a href="#">Twitter</a>
            </li>
            <li>
                <a href="#">Facebook</a>
            </li>
        </ul>
    </p>
</div>

The final collapsible item, Ballrooms, should hold the following code to include its share of the listview items:

<div data-role="collapsible">
    <h3>Ballrooms</h3>
    <p>
        <span>
            We have 3 huge ballrooms named after 3 different dance styles from across the world.
        </span>
        <ul data-role="listview" data-inset="true">
            <li>
                <a href="#">Ballet</a>
            </li>
            <li>
                <a href="#">Kathak</a>
            </li>
            <li>
                <a href="#">Paso Doble</a>
            </li>
        </ul>
    </p>
</div>

After adding these listview items, our facilities page should look as seen in the following image:

The facilities page now looks much better than it did earlier, and we now understand two more very important widgets available in jQuery Mobile: the collapsible widget and the listview widget. We will now explore two form widgets: the slider widget and the radio buttons widget. For this, we will be enhancing our catering page. Let's build a simple tool that will help the visitors of this site estimate the food expense based on the number of guests and the type of cuisine they choose. Let's get started then. First, we will add the required HTML to include the slider widget and the radio buttons widget. Scroll down to the content div of the catering page, where we have the paragraph tag containing some text about the Civic Center's catering services. Add the following code after the paragraph tag:

<form>
    <label style="font-weight: bold; padding: 15px 0px;" for="slider">Number of guests</label>
    <input type="range" name="slider" id="slider" data-highlight="true" min="50" max="1000" value="50">
    <fieldset data-role="controlgroup" id="cuisine-choices">
        <legend style="font-weight: bold; padding: 15px 0px;">Choose your cuisine</legend>
        <input type="radio" name="cuisine-choice" id="cuisine-choice-cont" value="35" checked="checked" />
        <label for="cuisine-choice-cont">Continental</label>
        <input type="radio" name="cuisine-choice" id="cuisine-choice-mex" value="12" />
        <label for="cuisine-choice-mex">Mexican</label>
        <input type="radio" name="cuisine-choice" id="cuisine-choice-ind" value="14" />
        <label for="cuisine-choice-ind">Indian</label>
    </fieldset>
    <p>
        The approximate cost will be: <span style="font-weight: bold;" id="totalCost"></span>
    </p>
</form>

That is not much code, but we are adding and initializing two new form widgets here. Let's take a look at the code in detail:

<label style="font-weight: bold; padding: 15px 0px;" for="slider">Number of guests</label>
<input type="range" name="slider" id="slider" data-highlight="true" min="50" max="1000" value="50">

We are initializing our first form widget here: the slider widget. The slider widget is an input element of the type range, which accepts a minimum value, a maximum value, and a default value. We will be using this slider to accept the number of guests. Since the Civic Center can cater to a maximum of 1,000 people, we will set the maximum limit to 1,000, and since we expect at least 50 guests, we set a minimum value of 50.
Since the minimum number of guests that we cater for is 50, we set the input's default value to 50. We also set the data-highlight attribute's value to true, which informs the framework that the selected area on the slider should be highlighted.

Next comes the group of radio buttons. The most important attribute to be considered here is data-role="controlgroup", set on the fieldset element. Adding this data-role combines the radio buttons into one single group, which gives the user a visual indication that one radio button out of the whole lot needs to be selected. The values assigned to each of the radio inputs here indicate the cost per person for that particular cuisine. This value will help us calculate the final dollar value for the selected number of guests and the type of cuisine.

Whenever you are using the form widgets, make sure you have the form elements in the hierarchy required by the jQuery Mobile framework. When the elements are in the required hierarchy, the framework can apply the required styles.

At the end of the previous code snippet, we have a paragraph tag where we will populate the approximate cost of catering for the selected number of guests and type of cuisine. The catering page should now look as seen in the following image. Right now, we only have the HTML widgets in place. When you drag the slider or select different radio buttons, you will only see the UI interactions of these widgets and the UI treatments that the framework applies to them; the total cost will not be populated yet. We will need to write some JavaScript logic to determine this value, and we will take a look at this in a minute. Before moving to the JavaScript part, make sure you have all the code that is needed:

Now let's take a look at the magic part of the code (read: JavaScript) that is going to make our widgets usable for the visitors of this Civic Center web application. Add the following JavaScript code in the script tag at the very end of our index.html file:

$(document).on('pagecontainershow', function(){
    var guests = 50;
    var cost = 35;
    var totalCost;
    $("#slider").on("slidestop", function(event, ui){
        guests = $('#slider').val();
        totalCost = costCal();
        $("#totalCost").text("$" + totalCost);
    });
    $("input:radio[name=cuisine-choice]").on("click", function() {
        cost = $(this).val();
        var totalCost = costCal();
        $("#totalCost").text("$" + totalCost);
    });
    function costCal(){
        return guests * cost;
    }
});

That is a pretty small chunk of code, and pretty simple too. We will be looking at a few very important events that are part of the framework and that come in very handy when developing web applications with jQuery Mobile. One of the most important things you must have already noticed is that we are not making use of jQuery's customary $(document).on('ready', function(){, but something that looks like the following code:

$(document).on('pagecontainershow', function(){

The million dollar question here is: "why doesn't DOM ready work in jQuery Mobile?" As part of jQuery, the first thing that we often learn is to execute our jQuery code as soon as the DOM is ready, and this is identified using the $(document).ready function.
In jQuery Mobile, pages are requested and injected into the same DOM as the user navigates from one page to another, so the DOM ready event is only useful for the first page. We need an event that executes when every page loads, and pagecontainershow is the one. The pagecontainershow event is triggered on the toPage after the transition animation has completed. Note that pagecontainershow is triggered on the pagecontainer element and not on the actual page.

In the function, we initialize the guests and cost variables to 50 and 35 respectively, as the minimum number of guests we can have is 50, and the "Continental" cuisine, which is selected by default, has a value of 35. We will calculate the estimated cost when the user changes the number of guests or selects a different radio button. This brings us to the next part of our code. We need to get the value of the number of guests as soon as the user stops sliding the slider. jQuery Mobile provides us with the slidestop event for this very purpose. As soon as the user stops sliding, we get the value of the slider and then call the costCal function, which returns the number of guests multiplied by the per-person cost of the selected cuisine. We then display this value in the paragraph at the bottom so that the user gets an estimated cost. We will discuss more of the touch events available as part of the jQuery Mobile framework in the next section. When the user selects a different radio button, we retrieve the value of the selected radio button, call the costCal function again, and update the value displayed in the paragraph at the bottom of our page. If you have the code correct and your functions are all working fine, you should see something similar to the following image:

Input with touch

We will take a look at a couple of touch events: tap and taphold. The tap event is triggered after a quick touch, whereas the taphold event is triggered after a sustained, long press touch. The jQuery Mobile tap event is the gesture equivalent of the standard click event, and is triggered on the release of the touch gesture. The following snippet of code should help you incorporate the tap event when you need to use it in your application:

$(".selector").on("tap", function(){
    console.log("tap event is triggered");
});

The jQuery Mobile taphold event triggers after a sustained, complete touch event, more commonly known as the long press event. The taphold event fires when the user taps and holds for a minimum of 750 milliseconds. You can also change this default value, but we will come to that in a minute. First, let's see how the taphold event is used:

$(".selector").on("taphold", function(){
    console.log("taphold event is triggered");
});

Now, to change the default duration for the long press event, we need to set the value of the following property:

$.event.special.tap.tapholdThreshold

Working with plugins

A number of times, we will come across scenarios where the capabilities of the framework are just not sufficient for all the requirements of your project. In such scenarios, we have to make use of third-party plugins in our project. We will be looking at two very interesting plugins in the course of this article, but before that, you need to understand what jQuery plugins exactly are. A jQuery plugin is simply a new method that has been used to extend jQuery's prototype object.
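As a rough sketch of what that means in practice (the highlight method and its behavior here are made up purely for illustration), a minimal jQuery plugin can be defined like this:

// A bare-bones jQuery plugin: it adds a hypothetical .highlight() method
// to jQuery's prototype object ($.fn), so it can be called on any selection.
(function ($) {
    $.fn.highlight = function (color) {
        // 'this' is the current jQuery selection; returning it keeps chaining intact
        return this.css('background-color', color || 'yellow');
    };
}(jQuery));

// Usage: $('.selector').highlight('#ffeeaa');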
When we include the jQuery plugin as part of our code, this new method becomes available for use within your application. When selecting jQuery plugins for your jQuery Mobile web application, make sure that the plugin is optimized for mobile devices and incorporates touch events as well, based on your requirements. The first plugin that we are going to look at is called FastClick, and it is developed by FT Labs. It is an open source plugin and so can be used as part of your application. FastClick is a simple, easy-to-use library designed to eliminate the 300 ms delay between a physical tap and the firing of the click event on mobile browsers.

Wait! What are we talking about? What is this 300 ms delay between tap and click? What exactly are we discussing? Sure, we understand the confusion. Let's explain this 300 ms delay issue. Click events have a 300 ms delay on touch devices, which makes web applications feel laggy on a mobile device and doesn't give users a native-like feel. If you go to a site that isn't mobile-optimized, it starts zoomed out, and you then have to either pinch and zoom or double tap some content so that it becomes readable. The double tap is a performance killer, because with every tap we have to wait to see whether it might be a double tap, and this wait is 300 ms. Here is how it plays out:

touchstart
touchend
Wait 300 ms in case of another tap
click

This pause of 300 ms applies to click events in JavaScript, but also to other click-based interactions such as links and form controls. Most mobile web browsers out there have this 300 ms delay on click events, but a few modern browsers, such as Chrome and Firefox for Android and iOS, are now removing it. However, if you are supporting older Android and iOS versions, with older mobile browsers, you might want to consider including the FastClick plugin in your application, which helps resolve this problem. Let's take a look at how we can use this plugin in any web application. First, you need to download the plugin files or clone their GitHub repository here: https://github.com/ftlabs/fastclick. Once you have done that, include a reference to the plugin's JavaScript file in your application:

<script type="application/javascript" src="path/fastclick.js"></script>

Make sure that the script is loaded prior to instantiating FastClick on any element of the page. FastClick recommends instantiating the plugin on the body element itself. We can do this using the following piece of code:

$(function(){
    FastClick.attach(document.body);
});

That is it! Your application is now free of the 300 ms click delay issue and will work as smoothly as a native application. We have just provided you with an introduction to the FastClick plugin; there are several more features that it provides. Make sure you visit their website, https://github.com/ftlabs/fastclick, for more details on what the plugin has to offer.

Another important plugin that we will look at is HammerJS. HammerJS, again, is an open source library that helps recognize gestures made by touch, mouse, and pointer events. Now, you might say that the jQuery Mobile framework already takes care of this, so why do we need a third-party plugin again? True, jQuery Mobile supports a variety of touch events, such as tap, tap and hold, and swipe, as well as the regular mouse events, but what if in our application we want to make use of some touch gestures such as pan, pinch, rotate, and so on, which are not supported by jQuery Mobile by default?
This is where HammerJS comes into the picture and plays nicely along with jQuery Mobile. Including HammerJS in your web application code is extremely simple and straightforward, like the FastClick plugin. You need to download the plugin files and then add a reference to the plugin's JavaScript file:

<script type="application/javascript" src="path/hammer.js"></script>

Once you have included the plugin, you need to create a new instance of the Hammer object and then start using the plugin for all the touch gestures you need to support:

// element_name is the DOM element to watch; options is an optional settings object
var hammerPan = new Hammer(element_name, options);
hammerPan.on('pan', function(){
    console.log("Inside Pan event");
});

By default, Hammer adds a set of recognizers: tap, double tap, swipe, pan, press, pinch, and rotate. The pinch and rotate recognizers are disabled by default, but can be turned on as and when required. HammerJS offers a lot of features that you might want to explore. Make sure you visit their website, http://hammerjs.github.io/, to understand the different features the library has to offer and how you can integrate this plugin within your existing or new jQuery Mobile projects.

Accessibility

Most of us today cannot imagine our lives without the Internet and our smartphones. Some will even argue that the Internet is the single largest revolutionary invention of all time, one that has touched numerous lives across the globe. At the click of a mouse or the touch of a fingertip, the world is now at your disposal, provided you can use the mouse, see the screen, and hear the audio; impairments might make it difficult for people to access the Internet. This makes us wonder how people with disabilities use the Internet, about their frustration in doing so, and about the efforts that must be taken to make websites accessible to all. Though estimates vary, most studies have revealed that about 15 percent of the world's population have some kind of disability. Not all of these people would have an issue with accessing the web, but let's assume 5 percent of them would. This 5 percent is still a considerable number of users, who cannot be ignored by businesses on the web, and efforts must be taken in the right direction to make the web accessible to these users with disabilities.

The jQuery Mobile framework comes with built-in support for accessibility; it is built with accessibility and universal access in mind. Any application that is built using jQuery Mobile is accessible via a screen reader as well. When you make use of the different jQuery Mobile widgets in your application, unknowingly you are also adding support for web accessibility into your application, as the framework adds all the necessary ARIA attributes to the elements in the DOM. Let's take a look at how the DOM looks for our facilities page:

Look at the highlighted Events button in the top right corner and its corresponding HTML (also highlighted) in the developer tools. You will notice that there are a few attributes added to the anchor tag that start with aria-. We did not add any of these aria- attributes when we wrote the code for the Events button; the jQuery Mobile library takes care of these things for you. The accessibility implementation is an ongoing process, and the developers of jQuery Mobile are working towards improving the support with every new release. We spoke about aria- attributes, but what do they really represent? WAI-ARIA stands for Web Accessibility Initiative - Accessible Rich Internet Applications.
It is a technical specification published by the World Wide Web Consortium (W3C) that specifies how to increase the accessibility of web pages. ARIA specifies the roles, properties, and states of a web page that make it accessible to all users. Accessibility is an extremely vast topic, so covering every detail of it here is not possible. However, there is excellent material available on the Internet on this subject, and we encourage you to read and understand it. Try to implement accessibility in your current or next project, even if it is not based on jQuery Mobile. Web accessibility is an extremely important thing to consider, especially when you are building web applications that will be consumed by a huge consumer base, on e-commerce websites for example.

Summary

In this article, we made use of some of the available widgets from the jQuery Mobile framework, and we built some interactivity into our existing Civic Center application. The widgets that we used included the range slider, the collapsible widget, the listview widget, and the radio button widget. We also evaluated and looked at how to use two different third-party plugins: FastClick and HammerJS. We concluded the article by taking a look at the concept of web accessibility.

Resources for Article:

Further resources on this subject:
Creating Mobile Dashboards [article]
Speeding up Gradle builds for Android [article]
Saying Hello to Unity and Android [article]
Understanding Hadoop Backup and Recovery Needs

Packt
10 Aug 2015
25 min read
In this article by Gaurav Barot, Chintan Mehta, and Amij Patel, authors of the book Hadoop Backup and Recovery Solutions, we will discuss backup and recovery needs. In the present age of information explosion, data is the backbone of business organizations of all sizes. We need a complete data backup and recovery system, and a strategy to ensure that critical data is available and accessible when the organization needs it. Data must be protected against loss, damage, theft, and unauthorized changes. If disaster strikes, data recovery must be swift and smooth so that business does not get impacted. Every organization has its own data backup and recovery needs and priorities, based on the applications and systems it uses. Today's IT organizations face the challenge of implementing reliable backup and recovery solutions in the most efficient, cost-effective manner. To meet this challenge, we need to carefully define our business requirements and recovery objectives before deciding on the right backup and recovery strategies or technologies to deploy.

(For more resources related to this topic, see here.)

Before jumping into the implementation approach, we first need to know about backup and recovery strategies and how to plan them efficiently.

Understanding the backup and recovery philosophies

Backup and recovery is becoming more challenging and complicated, especially with the explosion of data growth and the increasing need for data security today. Imagine big players such as Facebook, Yahoo! (the first to implement Hadoop), eBay, and more: how challenging it must be for them to handle unprecedented volumes and velocities of unstructured data, something which traditional relational databases can't handle and deliver. To emphasize the importance of backup, let's take a look at a study conducted in 2009. This was the time when Hadoop was evolving and a handful of bugs still existed in it. Yahoo! had about 20,000 nodes running Apache Hadoop in 10 different clusters, and HDFS lost only 650 blocks out of 329 million total blocks. Now hold on a second: those blocks were lost due to the bugs found in the Hadoop package. So, imagine what the scenario would be now; I am sure you would bet on losing hardly a block.

Being a backup manager, your utmost target is to think out, strategize, and execute a foolproof backup plan capable of retrieving data after any disaster. Simply put, the aim of the strategy is to protect the files in HDFS against disastrous situations and bring the files back to their normal state, just like James Bond resurrects after so many blows and probably death-like situations. Coming back to the backup manager's role, the following are the activities of this role:

Testing out various case scenarios to forestall any threats, if any, in the future
Building a stable recovery point and setup for backup and recovery situations
Preplanning and daily organization of the backup schedule
Constantly supervising the backup and recovery process and avoiding threats, if any
Repairing and constructing solutions for backup processes
The ability to reheal, that is, recover from data threats, if they arise (the resurrection power)
Data protection, which includes the tasks of maintaining data replicas for long-term storage
Moving data from one destination to another

Basically, backup and recovery strategies should cover all the areas mentioned here.
For any system, data, application, and configuration transaction logs are mission critical, though what to protect depends on the datasets, configurations, and applications that are used to design the backup and recovery strategies. Hadoop is all about big data processing. After gathering some exabytes for data processing, the following are the obvious questions that we may come up with:

What's the best way to back up data?
Do we really need to take a backup of these large chunks of data?
Where will we find more storage space if the current storage space runs out?
Will we have to maintain distributed systems?
What if our backup storage unit gets corrupted?

The answer to the preceding questions depends on the situation you may be facing; let's see a few situations.

One situation is where you may be dealing with a plethora of data. Hadoop is used for fact-finding semantics, and data is in abundance. Here, the span of the data is short; it is short-lived, and the important sources of the data are already backed up. In such a scenario, the policy of not backing up the data at all is feasible, as there are already three copies (replicas) in our data nodes (HDFS). Moreover, since Hadoop is still vulnerable to human error, a backup of the configuration files and the NameNode metadata (dfs.name.dir) should be created.

You may find yourself facing a situation where the data center on which Hadoop runs crashes and the data is not available for now; this results in a failure to connect with mission-critical data. A possible solution here is to back up Hadoop like any other cluster, using the hadoop command.

Replication of data using DistCp

To replicate data, the distcp command writes data to two different clusters. Let's look at the distcp command with a few examples and options. DistCp is a handy tool used for large inter/intra-cluster copying. It basically expands a list of files into input for map tasks, each of which will copy the files that are specified in the source list. Let's understand how to use distcp with some basic examples. The most common use case of distcp is intercluster copying. Let's see an example:

bash$ hadoop distcp2 hdfs://ka-16:8020/parth/ghiya hdfs://ka-001:8020/knowarth/parth

This command will expand the namespace under /parth/ghiya on the ka-16 NameNode into a temporary file, get its contents, divide them among a set of map tasks, and start the copy process on each TaskTracker from ka-16 to ka-001. The command used for copying can be generalized as follows:

hadoop distcp2 hftp://namenode-location:50070/basePath hdfs://namenode-location

Here, hftp://namenode-location:50070/basePath is the source and hdfs://namenode-location is the destination. In the preceding command, namenode-location refers to the hostname and 50070 is the NameNode's HTTP server port.

Updating and overwriting using DistCp

The -update option is used when we want to copy files from the source that don't exist on the target or that have contents differing from the target copies, which we do not want to erase. The -overwrite option overwrites the target files even if they exist at the source. These options are invoked by simply adding -update or -overwrite to the command, as shown in the examples that follow. In the example, we used distcp2, which is an advanced version of DistCp; the process will go smoothly even if we use the distcp command.
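Reusing the hosts from the earlier example, the two options would look roughly like this (illustrative commands; adjust the paths and hosts to your own clusters):

bash$ hadoop distcp2 -update hdfs://ka-16:8020/parth/ghiya hdfs://ka-001:8020/knowarth/parth
bash$ hadoop distcp2 -overwrite hdfs://ka-16:8020/parth/ghiya hdfs://ka-001:8020/knowarth/parth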
Now, let's look at the two versions of DistCp: the legacy DistCp, or just DistCp, and the new DistCp, or DistCp2:

During the intercluster copy process, files that were skipped during the copy keep all their file attributes (permissions, owner group information, and so on) unchanged when we copy using legacy DistCp. This is not the case in the new DistCp, where these values are updated even if a file is skipped.
Empty root directories among the source inputs were not created in the target folder by legacy DistCp, which is no longer the case in the new DistCp.

There is a common misconception that Hadoop protects against data loss, and that we therefore don't need to back up the data in the Hadoop cluster. Since Hadoop replicates data three times by default, this sounds like a safe statement; however, it is not 100 percent safe. While Hadoop protects from hardware failure on the data nodes (meaning that if one entire node goes down, you will not lose any data), there are other ways in which data loss may occur: Hadoop is highly susceptible to human error, corrupted data writes, accidental deletions, rack failures, and many such instances. Consider an example where a corrupt application destroys all the data replicas: during processing, it attempts to compute each replica and, on not finding a possible match, deletes the replica. User deletions are another example of how data can be lost, as Hadoop's trash mechanism is not enabled by default. Also, one of the most complicated and expensive-to-implement aspects of protecting data in Hadoop is the disaster recovery plan. There are many different approaches to this, and determining which approach is right requires a balance between cost, complexity, and recovery time.

A real-life scenario is Facebook. The data that Facebook holds increased exponentially from 15 TB to 30 PB, that is, 3,000 times the size of the Library of Congress. With increasing data, the problem faced was the physical movement of the machines to the new data center, which required manpower and also impacted services for a period of time. Data availability within a short period of time is a requirement for any service; that's when Facebook started exploring Hadoop. Conquering this problem while dealing with such large repositories of data is yet another headache. The reason Hadoop was invented was to keep the data bound to neighborhoods on commodity servers and reasonable local storage, and to provide maximum availability to data within the neighborhood. So, a data plan is incomplete without data backup and recovery planning: a big data deployment using Hadoop is a situation in which the focus on the ability to recover from a crisis is mandatory.

The backup philosophy

We need to determine whether Hadoop, the processes and applications that run on top of it (Pig, Hive, HDFS, and more), and specifically the data stored in HDFS, are mission critical. If the data center where Hadoop is running disappeared, would the business stop? Some of the key points that have to be taken into consideration are explained in the sections that follow; by combining these points, we will arrive at the core of the backup philosophy.

Changes since the last backup

Considering the backup philosophy that we need to construct, the first thing we are going to look at is changes. We have a sound application running, and then we add some changes.
Also, one of the most complicated and expensive-to-implement aspects of protecting data in Hadoop is the disaster recovery plan. There are many different approaches to this, and determining which approach is right requires a balance between cost, complexity, and recovery time.

A real-life scenario is Facebook. The data that Facebook holds has grown exponentially, from 15 TB to 30 PB—that is, 3,000 times the Library of Congress. With increasing data, the problem it faced was the physical movement of machines to a new data center, which required manpower and also impacted services for a period of time. Data availability within a short period of time is a requirement for any service; that's when Facebook started exploring Hadoop. Conquering the problem of dealing with such large repositories of data is yet another headache. The reason Hadoop was invented was to keep data bound to neighborhoods of commodity servers with reasonable local storage, and to provide maximum availability to the data within the neighborhood. So, a data plan is incomplete without backup and recovery planning: a big data implementation on Hadoop is one situation where a focus on the potential to recover from a crisis is mandatory.

The backup philosophy

We need to determine whether Hadoop, the processes and applications that run on top of it (Pig, Hive, HDFS, and more), and specifically the data stored in HDFS are mission critical. If the data center where Hadoop is running disappeared, would the business stop? Some of the key points that have to be taken into consideration are explained in the sections that follow; by combining these points, we will arrive at the core of the backup philosophy.

Changes since the last backup

Considering the backup philosophy that we need to construct, the first thing we are going to look at is changes. We have a sound application running, and then we add some changes. In case our system crashes and we need to go back to our last safe state, our backup strategy should have a clause covering the changes that have been made. These changes can be either database changes or configuration changes. Our clause should include the following points in order to construct a sound backup strategy:

- The changes we made since our last backup
- The count of files changed
- Ensuring that our changes are tracked
- The possibility of bugs in user applications since the last change, which may cause hindrance and make it necessary to go back to the last safe state

After applying new changes on top of the last backup, if the application doesn't work as expected, then high priority should be given to taking the application back to its last safe state or backup. This ensures that the user is not interrupted while using the application or product.

The rate of new data arrival

The next thing we are going to look at is how many changes we are dealing with. Is our application being updated so often that we cannot decide what the last stable version was? Data is produced at an ever-increasing rate; consider Facebook, which alone produces 250 TB of data a day. Data production occurs at an exponential rate, and soon terms such as zettabyte will become commonplace. Our clause should include the following points in order to construct a sound backup:

- The rate at which new data is arriving
- The need for backing up each and every change
- The time factor involved in backing up between two changes
- Policies for keeping reserve backup storage

The size of the cluster

The size of a cluster is yet another important factor: we have to select a cluster size that allows us to optimize the environment for our purpose with exceptional results. Recalling the Yahoo! example, Yahoo! has 10 clusters all over the world, covering 20,000 nodes, and it has the maximum number of nodes in its large clusters. Our clause should include the following points in order to construct a sound backup:

- Selecting the right resources, which will allow us to optimize our environment. The selection of the right resources will vary as per need; for instance, users with I/O-intensive workloads will go for more spindles per core. A Hadoop cluster contains four types of roles: NameNode, JobTracker, TaskTracker, and DataNode.
- Handling the complexities of optimizing a distributed data center.

Priority of the datasets

The next thing we are going to look at is the new datasets that keep arriving. With the increase in the rate of new data arrival, we always face the dilemma of what to back up. Are we tracking all the changes in the backup? And if we are backing up all the changes, will our performance be compromised? Our clause should include the following points in order to construct a sound backup:

- Making the right backup of the dataset
- Taking backups at a rate that will not compromise performance

Selecting the datasets or parts of datasets

The next thing we are going to look at is what exactly gets backed up. When we deal with large chunks of data, there's always a nagging thought: did we miss anything while selecting the datasets, or parts of datasets, that have not been backed up yet?
Our clause should include the following points in order to construct a sound backup:

- Backup of the necessary configuration files
- Backup of files and application changes

The timeliness of data backups

With such a huge amount of data collected daily (consider Facebook), the time interval between backups is yet another important factor. Do we back up our data daily? Every two days? Every three days? Should we back up small chunks of data daily, or should we back up larger chunks at longer intervals? Our clause should include the following points in order to construct a sound backup:

- Dealing with the impact of a large time interval between two backups
- Monitoring the backup strategy in a timely manner and following through on it

The frequency of data backups depends on various aspects. Firstly, it depends on the application and its usage. If it is I/O intensive, we may need more backups, as no dataset is worth losing. If it is not so I/O intensive, we may keep the frequency low. We can determine the timeliness of data backups from the following points:

- The amount of data that we need to back up
- The rate at which new updates are arriving
- Determining the window of possible data loss and making it as small as possible
- The critical datasets that need to be backed up
- The configuration and permission files that need to be backed up

Reducing the window of possible data loss

The next thing we are going to look at is how to minimize the window of possible data loss. If our backup frequency is high, what are the chances of data loss? What is our chance of recovering the latest files? Our clause should include the following points in order to construct a sound backup:

- The potential to recover the latest files in the case of a disaster
- Keeping the probability of data loss low

Backup consistency

The next thing we are going to look at is backup consistency. The probability of invalid backups should be low—or, even better, zero—because if invalid backups are not tracked, further copies of those invalid backups will be made, which will again disrupt our backup process. Our clause should include the following points in order to construct a sound backup:

- Avoid copying data while it is being changed
- Possibly, construct a shell script that takes timely backups (see the sketch after this list)
- Ensure that the shell script is bug-free

Avoiding invalid backups

Let's continue the discussion on invalid backups. As you saw, HDFS makes three copies of our data for the recovery process. What if the original was flawed with errors or bugs? The three copies will then be corrupted copies, and recovering these flawed copies will indeed be a catastrophe. Our clause should include the following points in order to construct a sound backup:

- Avoid long gaps between backups
- Have the right backup process in place, probably with an automated shell script
- Track unnecessary backups
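A minimal sketch of such a script, assuming DistCp to a standby cluster and an external scheduler such as cron to run it periodically; all host names and paths here are placeholders:

#!/bin/sh
# Illustrative only: copy a dated backup of /data to a standby cluster.
DATE=$(date +%Y-%m-%d)
SRC=hdfs://primary-nn:8020/data
DST=hdfs://backup-nn:8020/backups/$DATE
hadoop distcp -update "$SRC" "$DST" || { echo "backup failed" >&2; exit 1; }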
If our backup clause covers all the preceding points, we are surely on the way to making a good backup strategy. A good backup policy basically covers all these points, so that if a disaster occurs, it always aims to go back to the last stable state. That's all about backups. Moving on, let's say a disaster occurs and we need to go back to the last stable state; let's have a look at the recovery philosophy and all the points that make a sound recovery strategy.

The recovery philosophy

After a deadly storm, we always try to recover from the after-effects of the storm. Similarly, after a disaster, we try to recover from the effects of the disaster. In just one moment, the storage capacity that was a boon turns into a curse and just another expensive, useless thing. Starting off with the best question: what would the best recovery philosophy be? Well, obviously, the best philosophy is one where we may never have to perform recovery at all. There may also be scenarios where we need to do a manual recovery. Let's look at the possible levels of recovery before moving on to recovery in Hadoop:

- Recovery to the flawless state
- Recovery to the last supervised state
- Recovery to a possible past state
- Recovery to a sound state
- Recovery to a stable state

So, obviously, we want our recovery state to be flawless. But if that's not achievable, we are willing to compromise a little and allow the recovery to go to a possible past state we are aware of. Now, if that's not possible either, we are again ready to compromise a little and allow it to go to the last possible sound state. That's how we deal with recovery: first aim for the best and, failing that, compromise a little. Just as the saying goes, "the bigger the storm, the more work we have to do to recover," here too we can say, "the bigger the disaster, the more intense the recovery plan we have to take." So, the recovery philosophy that we construct should cover the following points:

- An automated setup that detects a crash and restores the system to the last working state, where the application runs as per its expected behavior.
- The ability to track modified files and copy them.
- Tracking the sequence of operations on files, just as an auditor trails his audits.
- Merging files that are copied separately.
- Keeping multiple version copies, to maintain version control.
- Being able to process updates without impacting the application's security and protection.
- Deleting the original copy only after carefully inspecting the changed copy.
- Processing new updates, but first making sure they are fully functional and will not hinder anything else; if they do, there should be a clause for going back to the last safe state.

Coming back to recovery in Hadoop, the first question we may think of is: what happens when the NameNode goes down? When the NameNode goes down, so does the metadata file (the file that stores data about file owners and file permissions, where the file blocks are stored on the data nodes, and more), and there will be no one present to route our read/write file requests to the data nodes. Our goal will be to recover the metadata file. HDFS provides an efficient way to handle NameNode failures. There are basically two places where we can find the metadata: first, fsimage, and second, the edit logs. Our clause should include the following points:

- Maintain three copies of the NameNode metadata.
- When we try to recover, we get four options—namely continue, stop, quit, and always—so choose wisely.
- Give preference to saving the safe part of the backups. If there is an ABORT! error, save the safe state.

Hadoop provides four recovery modes based on the four options it offers (continue, stop, quit, and always):

- Continue: This allows you to continue over the bad parts. This option will let you cross over a few stray blocks and carry on, trying to produce a full recovery. This can be thought of as the "prompt when an error is found" mode.
- Stop: This allows you to stop the recovery process and make an image file of the copy. The part at which we stopped won't be recovered, because we are not allowing it to be; in this case, we can say that we are in the safe-recovery mode.
- Quit: This exits the recovery process without making a backup at all. In this case, we can say that we are in the no-recovery mode.
- Always: This goes one step further than continue: it selects continue by default and thus avoids prompting for any further stray blocks that are found. This can be thought of as the "prompt only once" mode.

We will look at these modes in further discussions; a sketch of how this recovery mode is entered follows.
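The four prompts above are offered when the NameNode is started in metadata recovery mode. A minimal sketch of entering it; this assumes the NameNode is stopped first, and note that, depending on your Hadoop version, the command may be spelled hdfs namenode -recover instead:

bash$ hadoop namenode -recover

While replaying the edit log against the last fsimage, the NameNode stops at each problem it finds and offers the continue/stop/quit/always choices described above.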
Now, you may think that this backup and recovery philosophy is all very well, but wasn't Hadoop designed to handle these failures? Well, of course, it was invented for this purpose, but there's always the possibility of a mishap at some level. Are we so overconfident that we are not ready to take the precautions that can protect us, entrusting our data blindly to Hadoop? No, certainly we aren't. We are going to take every possible preventive step from our side. In the next topic, we look at this very question: why we need preventive measures to back up Hadoop.

Knowing the necessity of backing up Hadoop

Change is the fundamental law of nature. There may come a time when Hadoop is upgraded on the present cluster, just as we see system upgrades everywhere. As no upgrade is bug-free, there is a probability that existing applications may not work the way they used to. There may be scenarios where we don't want to lose any data, let alone start HDFS from scratch. This is a scenario where a backup is useful, as it lets a user go back to a point in time.

Looking at the HDFS replication process: the NameNode handles the client request to write a file on a DataNode. The DataNode then replicates the block and writes it to another DataNode, which repeats the same process; thus, we have three copies of the same block. How these DataNodes are selected for placing copies of blocks is another issue, which we are going to cover later under rack awareness. You will see how to place these copies efficiently so as to handle situations such as hardware failure. But the bottom line is that when a DataNode is down, there's no need to panic; we still have a copy on a different DataNode. This approach gives us various advantages, such as:

- Security: This ensures that blocks are stored on two different DataNodes
- High write capacity: The client writes to only a single DataNode; the replication factor is handled by the DataNodes
- Read options: Better options on where to read from; the NameNode maintains records of all the locations of the copies and their distances
- Block circulation: The client writes only a single block; the others are handled through the replication pipeline

During the write operation, a DataNode receives data from the client and passes data to the next DataNode simultaneously; thus, our performance factor is not compromised. Data never passes through the NameNode: the NameNode takes the client's request to write data on a DataNode and processes the request by deciding on the division of files into blocks and the replication factor. The following figure shows the replication pipeline, wherein a block of the file is written and three different copies are made at different DataNode locations.

After hearing about such a foolproof plan and seeing so many advantages, we again arrive at the same question: is there a need for backup in Hadoop? Of course there is. There often exists a common mistaken belief that Hadoop shelters you against data loss, which gives you the freedom not to take backups in your Hadoop cluster. Hadoop, by convention, replicates your data three times by default. Although reassuring, this is not safe and does not guarantee foolproof protection against data loss. Hadoop gives you the power to protect your data against hardware failures—in the scenario where one disk, node, or rack goes down, the data will still be preserved for you—however, there are many scenarios where data loss may occur.

Consider an example of a classic human-prone error: the storage location that the user provides during operations in Hive. If the user provides a location where data already exists and performs a query on the same table, the entire existing data will be deleted, be it of size 1 GB or 1 TB. In the following figure, the client issues a read operation, but we have a faulty program. Going through the process, the NameNode checks its metadata file for the location of the DataNode containing the block. But when the block is read from the DataNode, it does not match the requirements, so the NameNode classifies that block as an under-replicated block and moves on to the next copy of the block. Oops—the same situation again. This way, all the safe copies of the block get turned into under-replicated blocks; HDFS fails us, and we need some other backup strategy. When copies do not match what the NameNode expects, it discards the copy and replaces it with a fresh copy that it has. HDFS replicas are not your one-stop solution for protection against data loss.

The needs for recovery

Now, we need to decide to what level we want to recover. As you saw earlier, we have four modes available, which recover either to a safe copy, the last possible state, or no copy at all. Based on the needs decided in the disaster recovery plan we defined earlier, you need to take the appropriate steps. We need to look at the following factors:

- The performance impact (is it compromised?)
- How large a data footprint does my recovery method leave?
- What is the application downtime?
- Is there just one backup, or are there incremental backups?
- Is it easy to implement?
- What is the average recovery time that the method provides?

Based on the preceding aspects, we will decide which modes of recovery we need to implement. The following methods are available in Hadoop:

- Snapshots: Snapshots simply capture a moment in time and allow you to go back to a possible recovery state (see the sketch after this list).
- Replication: This involves copying data from one cluster to another, out of the vicinity of the first cluster, so that a fault in one cluster doesn't have an impact on the other.
- Manual recovery: Probably the most brutal one—moving data manually from one cluster to another. Clearly, its downsides are a large footprint and long application downtime.
- API: There is always custom development using the public API available.
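For instance, HDFS snapshots are taken per directory once that directory has been marked snapshottable. A minimal sketch using the standard commands; the path and snapshot name are placeholders of our own, and note that snapshot support requires Hadoop 2.1 or later:

bash$ hdfs dfsadmin -allowSnapshot /user/parth/data
bash$ hdfs dfs -createSnapshot /user/parth/data backup-2015-08-10

The snapshot then remains readable under /user/parth/data/.snapshot/backup-2015-08-10, even if files in the live directory are later deleted.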
We will now move on to the recovery areas in Hadoop.

Understanding recovery areas

Recovering data after some sort of disaster needs a well-defined business disaster recovery plan. So, the first step is to decide our business requirements, which will define the need for data availability, precision in data, and requirements for the uptime and downtime of the application. Any disaster recovery policy should basically cover the areas required by the disaster recovery principles. Recovery areas define those portions without which an application won't be able to come back to its normal state. If you are armed with proper information, you will be able to decide the priority of which areas need to be recovered. Recovery areas cover the following core components:

- Datasets
- NameNodes
- Applications
- Database sets in HBase

Let's go back to the Facebook example. Facebook uses a customized version of MySQL for its home page and other interests, but when it comes to Facebook Messenger, Facebook uses the NoSQL database provided by Hadoop. Looking at it from that point of view, Facebook has both of these among its recovery areas and needs different steps to recover each of them.

Summary

In this article, we went through the backup and recovery philosophy and all the points a good backup philosophy should include. We covered what a recovery philosophy constitutes and saw the modes available for recovery in Hadoop. Then, we looked at why backup is important even though HDFS provides the replication process. Lastly, we looked at the recovery needs and areas. Quite a journey, wasn't it? Well, hold on tight; these are just your first steps into the Hadoop User Group (HUG).

Resources for Article:

Further resources on this subject:
Cassandra Architecture [article]
Oracle GoldenGate 12c — An Overview [article]
Backup and Restore Improvements [article]
Using Handlebars with Express

Packt
10 Aug 2015
17 min read
In this article written by Paul Wellens, author of the book Practical Web Development, we cover a brief description of the following topics:

- Templates
- Node.js
- Express 4

Templates

Templates come in many shapes or forms. Traditionally, they are non-executable files with some pre-formatted text, used as the basis of a bazillion documents that can be generated with a computer program. I worked on a project where I had to write a program that would take a Microsoft Word template, containing parameters like $first, $name, $phone, and so on, and generate a specific Word document for every student in a school. Web templating does something very similar. It uses a templating processor that takes data from a source of information, typically a database, and a template—a generic HTML file with some kind of parameters inside. The processor then merges the data and template to generate a bunch of static web pages, or dynamically generates HTML on the fly.

If you have been using PHP to create dynamic webpages, you have been busy with web templating. Why? Because you have been inserting PHP code inside HTML files in between the <?php and ?> strings. Your templating processor was the Apache web server, which has many additional roles. By the time your browser gets to see the result of your code, it is pure HTML. This makes it an example of server-side templating. You could also use Ajax and PHP to transfer data in the JSON format and then have the browser process that data using JavaScript to create the HTML you need. Combine this with using templates and you have client-side templating.

Node.js

What Le Sacre du Printemps by Stravinsky did to the world of classical music, Node.js may have done to the world of web development. At its introduction, it shocked the world. By now, Node.js is considered by many as the coolest thing. Just like Le Sacre is a totally different kind of music—but by now every child who has seen Fantasia has heard it—Node.js is a different way of doing web development. Rather than writing an application and using a web server to serve up your code to a browser, the application and the web server are one and the same. This may sound scary, but you should not worry, as there is an entire community that has developed modules you can obtain using the npm tool. Before showing you an example, I need to point out an extremely important feature of Node.js: the language in which you will write your web server and application is JavaScript. So Node.js gives you server-side JavaScript.

Installing Node.js

How to install Node.js will differ depending on your OS, but the result is the same everywhere: it gives you two programs, node and npm.

npm

The node packaging manager (npm) is the tool that you use to look for and install modules. Each time you write code that needs a module, you will have to add a line like this:

var module = require('module');

The module will have to be installed first, or the code will fail. This is how it is done:

npm install module

or

npm -g install module

The latter will attempt to install the module globally; the former, in the directory where the command is issued. It will typically install the module in a folder called node_modules.

node

The node program is the command to use to start your Node.js program, for example:

node myprogram.js

node will start and interpret your code. Type Ctrl-C to stop node.
Now create a file myprogram.js containing the following text:

var http = require('http');

http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8080, 'localhost');

console.log('Server running at http://localhost:8080');

So, if you have installed Node.js, typing node myprogram.js in a terminal window will start up a web server (the http module ships with the standard installation). And, when you type http://localhost:8080 in a browser, you will see the world-famous two-word program example on your screen. This is the equivalent of getting the It works! page after testing your Apache web server installation. As a matter of fact, if you go to http://localhost:8080/it/does/not/matterwhat, the same will appear. Not very useful maybe, but it is a web server.

Serving up static content

This does not work in the way we are used to. URLs typically point to a file (or a folder, in which case the server looks for an index.html file), foo.html, or bar.php, and when present, it is served up to the client. So what if we want to do this with Node.js? We will need a module. Several exist to do the job; we will use node-static in our example. But first we need to install it:

npm install node-static

In our app, we will now create not only a web server, but a file server as well. It will serve all the files in the local directory public. It is good to have all the so-called static content together in a separate folder. These are basically all the files that will be served up to and interpreted by the client. As we will now end up with a mix of client code and server code, it is a good practice to separate them. When you use the Express framework, you have the option to have Express create these things for you. So, here is a second, more complete, Node.js example, including all its static content.

hello.js, our Node.js app:

var http = require('http');
var static = require('node-static');

var fileServer = new static.Server('./public');

http.createServer(function (req, res) {
  fileServer.serve(req, res);
}).listen(8080, 'localhost');

console.log('Server running at http://localhost:8080');

hello.html is stored in ./public:

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8" />
  <title>Hello world document</title>
  <link href="./styles/hello.css" rel="stylesheet">
</head>
<body>
  <h1>Hello, World</h1>
</body>
</html>

hello.css is stored in public/styles:

body {
  background-color: #FFDEAD;
}
h1 {
  color: teal;
  margin-left: 30px;
}
.bigbutton {
  height: 40px;
  color: white;
  background-color: teal;
  margin-left: 150px;
  margin-top: 50px;
  padding: 15px 15px 25px 15px;
  font-size: 18px;
}

So, if we now visit http://localhost:8080/hello, we will see our, by now too familiar, Hello World message with some basic styling, proving that our file server also delivered the CSS file. You can easily take it one step further and add JavaScript and the jQuery library, and put them in, for example, public/js/hello.js and public/js/jquery.js respectively.

Too many notes

With Node.js, you only install the modules that you need, so it does not include the kitchen sink by default! You will benefit from that as far as performance goes. Back in California, I have been a proud product manager of a PC UNIX product, and one of our coolest value-adds was a tool, called kconfig, that would allow people to customize what would be inside the UNIX kernel, so that it would only contain what was needed. This is what Node.js reminds me of. And it is written in C, as was UNIX. Déjà vu.
However, if we wanted our Node.js web server to do everything the Apache web server does, we would need a lot of modules. Our application code needs to be added to that as well. That means a lot of modules. Like the critics in the movie Amadeus said: too many notes.

Express 4

A good way to get the job done with fewer notes is by using the Express framework. On the expressjs.com website, it is called a minimal and flexible Node.js web application framework, providing a robust set of features for building web applications. This is a good way to describe what Express can do for you. It is minimal, so there is little overhead for the framework itself. It is flexible, so you can add just what you need. It gives a robust set of features, which means you do not have to create them yourself, and they have been tested by an ever-growing community. But we need to get back to templating, so all we are going to do here is explain how to get Express, and give one example.

Installing Express

As Express is also a Node module, we install it as such. In your project directory for your application, type:

npm install express

You will notice that a folder called express has been created inside node_modules, and inside that one, there is another collection of node modules. These are examples of what is called middleware. In the code example that follows, we assume app.js as the name of our JavaScript file, and app as the variable that you will use in that file for your instance of Express. This is for the sake of brevity; it would be better to use a string that matches your project name. We will now use Express to rewrite the hello.js example. All static resources in the public directory can remain untouched. The only change is in the Node app itself:

var express = require('express');
var path = require('path');

var app = express();

app.set('port', process.env.PORT || 3000);

var options = {
  dotfiles: 'ignore',
  extensions: ['htm', 'html'],
  index: false
};

app.use(express.static(path.join(__dirname, 'public'), options));

app.listen(app.get('port'), function () {
  console.log('Hello express started on http://localhost:' +
    app.get('port') + '; press Ctrl-C to terminate.');
});

This code uses so-called middleware (static) that is included with Express; a lot more is available from third parties. Well, compared to our Node.js example, it is about the same number of lines, but it looks a lot cleaner and it does more for us. You no longer need to explicitly include the HTTP module and other such things.

Templating and Express

We need to get back to templating now. Imagine all the JavaScript ecosystem we just described. Yes, we could still put our client JavaScript code in between the <script> tags, but what about the server JavaScript code? There is no such thing as <?javascript ?>! Node.js and Express support several templating languages that allow you to separate layout and content, and have the template system do the work of fetching the content and injecting it into the HTML. The default templating processor for Express appears to be Jade, which uses its own, albeit more compact than HTML, language. Unfortunately, that would mean that you have to learn yet another syntax to produce something. We propose to use handlebars.js instead. There are two reasons why we have chosen handlebars.js:

- It uses HTML as its language
- It is available on both the client and the server side

Getting the handlebars module for Express

Several Express modules exist for handlebars.
We happen to like the one with the surprising name express-handlebars. So, we install it, as follows:

npm install express-handlebars

Layouts

I almost called this section "templating without templates", as our first example will not use a parameter inside the templates. Most websites consist of several pages, either static or dynamically generated ones. All these pages usually have common parts: a header and footer, a navigation part or menu, and so on. This is the layout of our site. What distinguishes one page from another, usually, is some part in the body of the page, where the home page has different information than the other pages. With express-handlebars, you can separate layout and content. We will start with a very simple example. Inside your project folder that contains public, create a folder, views, with a subdirectory layouts. Inside the layouts subfolder, create a file called main.handlebars. This is your default layout. Building on top of the previous example, have it say:

<!doctype html>
<html>
<head>
  <title>Handlebars demo</title>
  <link href="./styles/hello.css" rel="stylesheet">
</head>
<body>
  {{{body}}}
</body>
</html>

Notice the {{{body}}} part. This token will be replaced by HTML. Handlebars escapes HTML; if we want our HTML to stay intact, we use {{{ }}} instead of {{ }}. body is a reserved word for handlebars. Create, in the folder views, a file called hello.handlebars with the following content. This will be one (of many) examples of the HTML that {{{body}}} will be replaced by:

<h1>Hello, World</h1>

Let's create a few more: june.handlebars with:

<h1>Hello, June Lake</h1>

And bodie.handlebars containing:

<h1>Hello, Bodie</h1>

Our first handlebars example

Now, create a file, handlehello.js, in the project folder. For convenience, we will keep the relevant code of the previous Express example:

var express = require('express');
var path = require('path');
var exphbs = require('express-handlebars');

var app = express();

app.engine('handlebars', exphbs({defaultLayout: 'main'}));
app.set('view engine', 'handlebars');
app.set('port', process.env.PORT || 3000);

var options = {
  dotfiles: 'ignore',
  etag: false,
  extensions: ['htm', 'html'],
  index: false
};

app.use(express.static(path.join(__dirname, 'public'), options));

app.get('/', function(req, res) {
  res.render('hello');   // this is the important part
});

app.get('/bodie', function(req, res) {
  res.render('bodie');
});

app.get('/june', function(req, res) {
  res.render('june');
});

app.listen(app.get('port'), function () {
  console.log('Hello express started on http://localhost:' +
    app.get('port') + '; press Ctrl-C to terminate.');
});

Everything that worked before still works, but if you type http://localhost:3000/, you will see a page with the layout from main.handlebars and {{{body}}} replaced by—you guessed it—the same Hello World with basic markup that looks the same as our hello.html example. Let's look at the new code. First, of course, we need to add a require statement for our express-handlebars module, giving us an instance of express-handlebars. The next two lines specify what the view engine is for this app and what extension is used for the templates and layouts. We pass one option to express-handlebars, defaultLayout, setting the default layout to be main. This way, we could have different versions of our app with different layouts, for example, one using Bootstrap and another using Foundation.
The res.render calls determine which views need to be rendered; if you type http://localhost:3000/june, you will get Hello, June Lake, rather than Hello World. But this is not yet all that useful, as in this implementation you still have a separate file for each Hello flavor. Let's create a true template instead.

Templates

In the views folder, create a file, town.handlebars, with the following content:

{{!-- Our first template with tokens --}}
<h1>Hello, {{town}} </h1>

Please note the comment line: this is the syntax for a handlebars comment. You could use HTML comments as well, of course, but the advantage of using handlebars comments is that they will not show up in the final output. Next, add this to your JavaScript file:

app.get('/lee', function(req, res) {
  res.render('town', { town: "Lee Vining"});
});

Now, we have a template that we can use over and over again with a different context, in this example, a different town name. All you have to do is pass a different second argument to the res.render call, and {{town}} in the template will be replaced by the value of town in the object. In general, what is passed as the second argument is referred to as the context.

Helpers

The token can also be replaced by the output of a function. After all, this is JavaScript. In the context of handlebars, we call those helpers. You can write your own, or use some of the cool built-in ones, such as #if and #each.

#if/else

Let us update town.handlebars as follows:

{{#if town}}
<h1>Hello, {{town}} </h1>
{{else}}
<h1>Hello, World </h1>
{{/if}}

This should be self-explanatory: if the variable town has a value, use it; if not, show the world. Note that what comes after #if can only be something that is either true or false, zero or not. The helper does not support a construct such as #if x < y.

#each

A very useful built-in helper is #each, which allows you to walk through an array of things and generate HTML accordingly. This is an example of the code that could be inside your app and the template you could use in your views folder:

app.js code snippet:

var californiapeople = {
  people: [
    {"name":"Adams","first":"Ansel","profession":"photographer","born":"San Francisco"},
    {"name":"Muir","first":"John","profession":"naturalist","born":"Scotland"},
    {"name":"Schwarzenegger","first":"Arnold","profession":"governator","born":"Germany"},
    {"name":"Wellens","first":"Paul","profession":"author","born":"Belgium"}
  ]
};

app.get('/californiapeople', function(req, res) {
  res.render('californiapeople', californiapeople);
});

template (californiapeople.handlebars):

<table class="cooltable">
{{#each people}}
  <tr><td>{{first}}</td><td>{{name}}</td>
  <td>{{profession}}</td></tr>
{{/each}}
</table>

Now we are well on our way to doing some true templating. You can also write your own helpers; a minimal sketch follows, though a full treatment is beyond the scope of an introductory article.
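As an illustration only, here is a sketch of registering a custom helper through the helpers option of the express-handlebars configuration; the shout helper and its name are our own invention, not code from the book:

app.engine('handlebars', exphbs({
  defaultLayout: 'main',
  helpers: {
    // {{shout town}} renders the value of town in uppercase
    shout: function (text) { return String(text).toUpperCase(); }
  }
}));

With the earlier /lee route, a template line such as <h1>Hello, {{shout town}}</h1> would then render Hello, LEE VINING.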
However, before we leave you, there is one cool feature of handlebars you need to know about: partials.

Partials

In web development, where you dynamically generate HTML to be part of a web page, it is often the case that you repeatedly need to do the same thing, albeit on a different page. There is a cool feature in express-handlebars that allows you to do that very same thing: partials. Partials are templates you can refer to inside a template, using a special syntax, drastically shortening your code that way. The partials are stored in a separate folder; by default, that would be views/partials, but you can even use subfolders. Let's redo the previous example but with a partial. So, our template is going to be extremely petite:

{{!-- people.handlebars inside views --}}
{{> peoplepartial }}

Notice the > sign; this is what indicates a partial. Now, here is the familiar-looking partial template:

{{!-- peoplepartial.handlebars inside views/partials --}}
<h1>Famous California people</h1>
<table>
{{#each people}}
  <tr><td>{{first}}</td><td>{{name}}</td>
  <td>{{profession}}</td></tr>
{{/each}}
</table>

And, following is the JavaScript code that triggers it:

app.get('/people', function(req, res) {
  res.render('people', californiapeople);
});

So, we give it the same context, but the view that is rendered is ridiculously simplistic, as there is a partial underneath that takes care of everything. Of course, these were all examples to demonstrate how handlebars and Express can work together really well, nothing more than that.

Summary

In this article, we talked about using templates in web development. Then, we zoomed in on using Node.js and Express, and introduced Handlebars.js. Handlebars.js is cool, as it lets you separate logic from layout, and you can use it server-side (which is where we focused) as well as client-side. Moreover, you will still be able to use HTML for your views and layouts, unlike with other templating processors. For those of you new to Node.js, I compared it to what Le Sacre du Printemps was to music. To all of you, I recommend the recording by the Los Angeles Philharmonic and Esa-Pekka Salonen. I had season tickets for this guy and went to his inaugural concert with Mahler's third symphony. PHP had not been written yet, but this particular performance I had heard on the radio while on the road in California, and it was magnificent. Check it out. And also check out Express and handlebars.

Resources for Article:

Let's Build with AngularJS and Bootstrap
The Bootstrap grid system
MODx Web Development: Creating Lists
The Splunk Interface

Packt
10 Aug 2015
17 min read
In this article by Vincent Bumgarner and James D. Miller, authors of the book Implementing Splunk - Second Edition, we will walk through the most common elements in the Splunk interface and touch upon concepts that will be covered in greater detail later. You may want to dive right into the search section, but an overview of the user interface elements might save you some frustration later. We will cover the following topics:

- Logging in and app selection
- A detailed explanation of the search interface widgets
- A quick overview of the admin interface

(For more resources related to this topic, see here.)

Logging into Splunk

The Splunk GUI (Splunk is also accessible through its command-line interface (CLI) and REST API) is web-based, which means that no client needs to be installed. Newer browsers with fast JavaScript engines, such as Chrome, Firefox, and Safari, work better with the interface. As of Splunk version 6.2.0, no browser extensions are required. Splunk versions 4.2 and earlier required Flash to render graphs; Flash can still be used by older browsers, or for older apps that reference Flash explicitly. The default port for a Splunk installation is 8000. The address will look like http://mysplunkserver:8000 or http://mysplunkserver.mycompany.com:8000. If you have installed Splunk on your local machine, the address can be some variant of http://localhost:8000, http://127.0.0.1:8000, http://machinename:8000, or http://machinename.local:8000.

Once you determine the address, the first page you will see is the login screen. The default username is admin with the password changeme. The first time you log in, you will be prompted to change the password for the admin user. It is a good idea to change this password to prevent unwanted changes to your deployment. By default, accounts are configured and stored within Splunk. Authentication can be configured to use another system, for instance, Lightweight Directory Access Protocol (LDAP). By default, Splunk authenticates locally; if LDAP is set up, the order is as follows: LDAP / Local.

The home app

After logging in, the default app is the Launcher app (some may refer to this as Home). This app is a launching pad for apps and tutorials. In earlier versions of Splunk, the Welcome tab provided two important shortcuts: Add data and the Launch search app. In version 6.2.0, the Home app is divided into distinct areas, or panes, that provide easy access to Explore Splunk Enterprise (Add Data, Splunk Apps, Splunk Docs, and Splunk Answers) as well as Apps (the app management page), Search & Reporting (the link to the Search app), and an area where you can set your default dashboard (Choose a home dashboard).

The Explore Splunk Enterprise pane shows links to:

- Add data: This links to the Add Data to Splunk page. This interface is a great start for getting local data flowing into Splunk (making it available to Splunk users). The Preview data interface takes an enormous amount of complexity out of configuring dates and line breaking.
- Splunk Apps: This allows you to find and install more apps from the Splunk Apps Marketplace (http://apps.splunk.com). This marketplace is a useful resource where Splunk users and employees post Splunk apps, mostly free but some premium ones as well.
- Splunk Answers: This is one of your links to the wide amount of Splunk documentation available, specifically http://answers.splunk.com, where you can engage with the Splunk community on Splunkbase (https://splunkbase.splunk.com/) and learn how to get the most out of your Splunk deployment.

The Apps section shows the apps that have GUI elements on your instance of Splunk. App is an overloaded term in Splunk: an app doesn't necessarily have a GUI at all; it is simply a collection of configurations wrapped into a directory structure that means something to Splunk. Search & Reporting is the link to the Splunk Search & Reporting app. Beneath the Search & Reporting link, Splunk provides an outline which, when you hover over it, displays a Find More Apps balloon tip. Clicking on the link opens the same Browse more apps page as the Splunk Apps link mentioned earlier.

Choose a home dashboard provides an intuitive way to select an existing (simple XML) dashboard and set it as part of your Splunk Welcome or Home page. This sets you at a familiar starting point each time you enter Splunk. The following image displays the Choose Default Dashboard dialog. Once you select an existing dashboard from the dropdown list, it will be part of your welcome screen every time you log into Splunk—until you change it. There are no dashboards installed by default after installing Splunk, except the Search & Reporting app. Once you have created additional dashboards, they can be selected as the default.

The top bar

The bar across the top of the window contains information about where you are, as well as quick links to preferences, other apps, and administration. The current app is specified in the upper-left corner. The following image shows the upper-left Splunk bar when using the Search & Reporting app. Clicking on the text takes you to the default page for that app. In most apps, the text next to the logo is simply changed, but the whole block can be customized with logos and alternate text by modifying the app's CSS.

The upper-right corner of the window, as seen in the previous image, contains action links that are almost always available:

- The name of the user who is currently logged in appears first. In this case, the user is Administrator. Clicking on the username allows you to select Edit Account (which will take you to the Your account page) or to Logout (of Splunk). Logout ends the session and forces the user to log in again. The following screenshot shows what the Your account page looks like.

This form presents the global preferences that a user is allowed to change. Other settings that affect users are configured through permissions on objects and settings on roles. (Note: preferences can also be configured using the CLI or by modifying specific Splunk configuration files.)

- Full name and Email address are stored for the administrator's convenience.
- Time zone can be changed for the logged-in user. This is a new feature in Splunk 4.3. Setting the time zone only affects the time zone used to display the data. It is very important that the date is parsed properly when events are indexed.
- Default app controls the starting page after login. Most users will want to change this to search.
- Restart backgrounded jobs controls whether unfinished queries should run again if Splunk is restarted.
- Set password allows you to change your password. This is only relevant if Splunk is configured to use internal authentication.
For instance, if the system is configured to use Windows Active Directory via LDAP (a very common configuration), users must change their password in Windows.

- Messages allows you to view any system-level error messages you may have pending. When there is a new message for you to review, a notification displays as a count next to the Messages menu. You can click the X to remove a message.
- The Settings link presents the user with the configuration pages for all Splunk Knowledge objects, Distributed Environment settings, System and Licensing, Data, and Users and Authentication settings. If you do not see some of these options, you do not have the permissions to view or edit them.
- The Activity menu lists shortcuts to Splunk Jobs, Triggered Alerts, and System Activity views. You can click Jobs (to open the search jobs manager window, where you can view and manage currently running searches), click Triggered Alerts (to view scheduled alerts that are triggered), or click System Activity (to see dashboards about user activity and the status of the system).
- Help lists links to video Tutorials, Splunk Answers, the Splunk Contact Support portal, and online Documentation.
- Find can be used to search for objects within your Splunk Enterprise instance. For example, if you type in error, it returns the saved objects that contain the term error. These saved objects include Reports, Dashboards, Alerts, and so on. You can also search for error in the Search & Reporting app by clicking Open error in search.

The Search & Reporting app

The Search & Reporting app (or just the search app) is where most actions in Splunk start. This app is a dashboard where you will begin your searching.

The summary view

Within the Search & Reporting app, the user is presented with the Summary view, which contains information about the data that the user searches for by default. This is an important distinction—in a mature Splunk installation, not all users will always search all data by default. But at first, if this is your first trip into Search & Reporting, you'll see the following. From the screen depicted in the previous screenshot, you can access the Splunk documentation related to What to Search and How to Search. Once you have at least some data indexed, Splunk will provide some statistics on the available data under What to Search (remember that this reflects only the indexes that this particular user searches by default; there are other events that are indexed by Splunk, including events that Splunk indexes about itself). This is seen in the following image.

In previous versions of Splunk, panels such as the All indexed data panel provided statistics for a user's indexed data. Other panels gave a breakdown of data using three important pieces of metadata—Source, Sourcetype, and Hosts. In the current version—6.2.0—you access this information by clicking on the button labeled Data Summary, which presents the following to the user. This dialog splits the information into three tabs—Hosts, Sources, and Sourcetypes.

- A host is a captured hostname for an event. In the majority of cases, the host field is set to the name of the machine where the data originated. There are cases where this is not known, so the host can also be configured arbitrarily.
- A source in Splunk is a unique path or name. In a large installation, there may be thousands of machines submitting data, but all data on the same path across these machines counts as one source.
When the data source is not a file, the value of the source can be arbitrary, for instance, the name of a script or a network port.

- A source type is an arbitrary categorization of events. There may be many sources, across many hosts, in the same source type. For instance, given the sources /var/log/access.2012-03-01.log and /var/log/access.2012-03-02.log on the hosts fred and wilma, you could reference all these logs with the source type access, or any other name that you like.

Let's move on now and discuss each of the Splunk widgets (just below the app name). The first widget is the navigation bar. As a general rule, within Splunk, items with downward triangles are menus; items without a downward triangle are links. Next we find the Search bar. This is where the magic starts; we'll go into great detail shortly.

Search

Okay, we've finally made it to search. This is where the real power of Splunk lies. For our first search, we will search for the word error (not case specific). Click in the search bar, type the word error, and then either press Enter or click on the magnifying glass to the right of the bar. Upon initiating the search, we are taken to the search results page. Note that the search we just executed was across All time (by default); to change the search time, you can utilize the Splunk time picker.

Actions

Let's inspect the elements on this page. Below the Search bar, we have the event count, action icons, and menus. Starting from the left, we have the following:

- The number of events matched by the base search. Technically, this may not be the number of results pulled from disk, depending on your search. Also, if your query uses commands, this number may not match what is shown in the event listing.
- Job: This opens the Search job inspector window, which provides very detailed information about the query that was run.
- Pause: This causes the current search to stop locating events but keeps the job open. This is useful if you want to inspect the current results to determine whether you want to continue a long-running search.
- Stop: This stops the execution of the current search but keeps the results generated so far. This is useful when you have found enough and want to inspect or share the results found so far.
- Share: This shares the search job. This option extends the job's lifetime to seven days and sets the read permissions to everyone.
- Export: This exports the results. Select this option to output to CSV, raw events, XML, or JavaScript Object Notation (JSON), and to specify the number of results to export.
- Print: This formats the page for printing and instructs the browser to print.
- Smart Mode: This controls the search experience. You can set it to speed up searches by cutting down on the event data it returns and, additionally, by reducing the number of fields that Splunk will extract by default from the data (Fast mode). You can, otherwise, set it to return as much event information as possible (Verbose mode). In Smart mode (the default setting), it toggles search behavior based on the type of search you're running.

Timeline

Now we'll skip to the timeline below the action icons. Along with providing a quick overview of the event distribution over a period of time, the timeline is also a very useful tool for selecting sections of time. Placing the pointer over the timeline displays a pop-up with the number of events in that slice of time. Clicking on the timeline selects the events for a particular slice of time. Clicking and dragging selects a range of time.
Once you have selected a period of time, clicking on Zoom to selection changes the time frame and reruns the search for that specific slice of time. Repeating this process is an effective way to drill down to specific events. Deselect shows all events for the time range selected in the time picker. Zoom out changes the window of time to a larger period around the events in the current time frame.

The field picker

To the left of the search results, we find the field picker. This is a great tool for discovering patterns and filtering search results.

Fields

The field list contains two lists:

- Selected Fields, which have their values displayed under the search event in the search results
- Interesting Fields, which are other fields that Splunk has picked out for you

Above the field list are two links: Hide Fields and All Fields.

- Hide Fields: Hides the field list area from view.
- All Fields: Takes you to the Selected Fields window.

Search results

We are almost through with all the widgets on the page. We still have a number of items to cover in the search results section, though, just to be thorough. As you can see in the previous screenshot, at the top of this section we have the number of events displayed. When viewing all results in their raw form, this number will match the number above the timeline. This value can be changed either by making a selection on the timeline or by using other search commands. Next, we have the action icons (described earlier) that affect these particular results. Under the action icons, we have four results tabs:

- Events list, which will show the raw events. This is the default view when running a simple search, as we have done so far.
- Patterns streamlines event pattern detection. It displays a list of the most common patterns among the set of events returned by your search. Each of these patterns represents a number of events that share a similar structure.
- Statistics populates when you run a search with transforming commands such as stats, top, chart, and so on. The previous keyword search for error does not display any results in this tab because it does not have any transforming commands.
- Visualization transforms searches and also populates the Visualization tab. The results area of the Visualization tab includes a chart and the statistics table used to generate the chart. Not all searches are eligible for visualization.

Under the tabs described just now is the timeline.

Options

Beneath the timeline (starting at the left) is a row of option links that includes:

- Show Fields: shows the Selected Fields screen
- List: allows you to select an output option (Raw, List, or Table) for displaying the search results
- Format: provides the ability to set Result display options, such as Show row numbers, Wrap results, the Max lines (to display), and Drilldown as on or off
- NN Per Page: is where you can indicate the number of results to show per page (10, 20, or 50)

To the right are options that you can use to choose a page of results, and to change the number of events per page. In prior versions of Splunk, these options were available from the Results display options popup dialog.

The events viewer

Finally, we make it to the actual events. Let's examine a single event. Starting at the left, we have:

- Event Details: Clicking here (indicated by the right-facing arrow) opens the selected event, providing specific information about the event by type, field, and value, and allows you the ability to perform specific actions on a particular event field.
In addition, Splunk version 6.2.0 offers a button labeled Event Actions to access workflow actions, a few of which are always available:

- Build Eventtype: Event types are a way to name events that match a certain query.
- Extract Fields: This launches an interface for creating custom field extractions.
- Show Source: This pops up a window with a simulated view of the original source.

Next comes the event number: raw search results are always returned in the order most recent first. Next to appear are any workflow actions that have been configured. Workflow actions let you create new searches or links to other sites, using data from an event. Next comes the parsed date from this event, displayed in the time zone selected by the user. This is an important and often confusing distinction. In most installations, everything is in one time zone—the servers, the user, and the events. When one of these three things is not in the same time zone as the others, things can get confusing. Next, we see the raw event itself. This is what Splunk saw as an event. With no help, Splunk can do a good job of finding the date and breaking lines appropriately, but as we will see later, with a little help, event parsing can be more reliable and more efficient. Below the event are the fields that were selected in the field picker. Clicking on a value adds the field value to the search.

Summary

As you have seen, the Splunk GUI provides a rich interface for working with search results. We have really only scratched the surface and will cover more elements later.

Resources for Article:

Further resources on this subject:
The Splunk Web Framework [Article]
Loading data, creating an app, and adding dashboards and reports in Splunk [Article]
Working with Apps in Splunk [Article]

Bayesian Network Fundamentals

Packt
10 Aug 2015
25 min read
In this article by Ankur Ankan and Abinash Panda, the authors of Mastering Probabilistic Graphical Models Using Python, we'll cover the basics of random variables, probability theory, and graph theory. We'll also see Bayesian models and the independencies in Bayesian models. A graphical model is essentially a way of representing a joint probability distribution over a set of random variables in a compact and intuitive form. There are two main types of graphical models, namely directed and undirected. We generally use a directed model, also known as a Bayesian network, when we mostly have a causal relationship between the random variables. Graphical models also give us tools to operate on these models to find conditional and marginal probabilities of variables, while keeping the computational complexity under control. (For more resources related to this topic, see here.)

Probability theory

To understand the concepts of probability theory, let's start with a real-life situation. Let's assume we want to go for an outing on a weekend. There are a lot of things to consider before going: the weather conditions, the traffic, and many other factors. If the weather is windy or cloudy, then it is probably not a good idea to go out. However, even if we have information about the weather, we cannot be completely sure whether to go or not; hence we have used the words probably or maybe. Similarly, if it is windy in the morning (or at the time we took our observations), we cannot be completely certain that it will be windy throughout the day. The same holds for cloudy weather; it might turn out to be a very pleasant day. Further, we are not completely certain of our observations. There are always some limitations in our ability to observe; sometimes, these observations could even be noisy. In short, uncertainty or randomness is the innate nature of the world. Probability theory provides us the necessary tools to study this uncertainty. It helps us look into options that are unlikely yet probable.

Random variable

Probability deals with the study of events. From our intuition, we can say that some events are more likely than others, but to quantify the likeliness of a particular event, we require probability theory. It helps us predict the future by assessing how likely the outcomes are. Before going deeper into probability theory, let's first get acquainted with the basic terminologies and definitions. A random variable is a way of representing an attribute of the outcome. Formally, a random variable X is a function that maps a possible set of outcomes Ω to some set E, which is represented as follows:

X : Ω → E

As an example, let us consider the outing example again. To decide whether to go or not, we may consider the skycover (to check whether it is cloudy or not). Skycover is an attribute of the day. Mathematically, the random variable skycover (X) is interpreted as a function that maps the day (Ω) to its skycover values (E). So, when we say the event X = 40.1, it represents the set of all the days {ω} such that f_skycover(ω) = 40.1, where f_skycover is the mapping function. Formally speaking, the event is the set {ω : f_skycover(ω) = 40.1}.

Random variables can either be discrete or continuous. A discrete random variable can take only a finite number of values. For example, the random variable representing the outcome of a coin toss can take only two values, heads or tails, and hence it is discrete. A continuous random variable, on the other hand, can take an infinite number of values. For example, a variable representing the speed of a car can take any real value.
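To make the formal definition concrete, here is a minimal Python sketch (our illustration, not from the article); the toy sample space of days and the skycover values are assumptions chosen purely for the example:

# A random variable is just a function X : Omega -> E.
# omega is a toy sample space of days; the skycover values are made up.
omega = {'mon': 40.1, 'tue': 10.0, 'wed': 40.1, 'thu': 73.2}

def skycover(day):
    # The mapping function f_skycover: maps an outcome (a day) to a value in E.
    return omega[day]

# The event "X = 40.1" is the set of all outcomes that map to 40.1.
event = {day for day in omega if skycover(day) == 40.1}
print(event)  # {'mon', 'wed'} (set ordering may vary)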
For any event whose outcome is represented by some random variable (X), we can assign some value to each of the possible outcomes of X, which represents how probable it is. This is known as the probability distribution of the random variable and is denoted by P(X). For example, consider a set of restaurants. Let X be a random variable representing the quality of food in a restaurant. It can take a set of values, such as {good, bad, average}. P(X) represents the probability distribution of X; say, P(X = good) = 0.3, P(X = average) = 0.5, and P(X = bad) = 0.2. This means there is a 30 percent chance of a restaurant serving good food, a 50 percent chance of it serving average food, and a 20 percent chance of it serving bad food.

Independence and conditional independence

In most situations, we are more interested in looking at multiple attributes at the same time. For example, to choose a restaurant, we won't be looking just at the quality of food; we might also want to look at other attributes, such as the cost, location, size, and so on. We can have a probability distribution over a combination of these attributes as well. This type of distribution is known as a joint probability distribution. Going back to our restaurant example, let the random variable for the quality of food be represented by Q, and the cost of food be represented by C. Q can have three categorical values, namely {good, average, bad}, and C can have the values {high, low}. So, the joint distribution P(Q, C) would have probability values for all the combinations of states of Q and C. P(Q = good, C = high) will represent the probability of a pricey restaurant with good quality food, while P(Q = bad, C = low) will represent the probability of a restaurant that is less expensive with bad quality food.

Let us consider another random variable representing an attribute of a restaurant, its location L. The cost of food in a restaurant is affected not only by the quality of food but also by the location (generally, a restaurant located in a very good location would be more costly than a restaurant in a not-so-good location). From our intuition, we can say that the probability of a costly restaurant at a very good location in a city would generally be higher than simply the probability of a costly restaurant, and the probability of a cheap restaurant at a prime location would generally be lower than simply the probability of a cheap restaurant. Formally speaking, P(C = high | L = good) will be different from P(C = high), and P(C = low | L = good) will be different from P(C = low). This indicates that the random variables C and L are not independent of each other.

These attributes or random variables need not always be dependent on each other. For example, the quality of food doesn't depend upon the location of the restaurant. So, P(Q = good | L = good) or P(Q = good | L = bad) would be the same as P(Q = good); that is, our estimate of the quality of food of the restaurant will not change even if we have knowledge of its location. Hence, these random variables are independent of each other. In general, two random variables X and Y can be considered independent of each other if:

P(X, Y) = P(X) P(Y)

They may also be considered independent if:

P(X | Y) = P(X)

We can easily derive this conclusion. We know the following from the chain rule of probability:

P(X, Y) = P(X) P(Y | X)

If Y is independent of X, that is, if X ⊥ Y, then P(Y | X) = P(Y). Then:

P(X, Y) = P(X) P(Y)

Extending this result to multiple variables, we can easily get to the conclusion that a set of random variables is independent of each other if their joint probability distribution is equal to the product of the probabilities of each individual random variable.
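As a quick numerical sanity check (a sketch of ours, not from the article), the following plain Python verifies this product rule on a toy joint distribution over two fair coins:

from itertools import product

# A toy joint distribution over two binary variables X and Y
# ('h' for heads, 't' for tails).
joint = {('h', 'h'): 0.25, ('h', 't'): 0.25,
         ('t', 'h'): 0.25, ('t', 't'): 0.25}

# Compute the marginals P(X) and P(Y) by summing out the other variable.
p_x = {x: sum(p for (x_, y), p in joint.items() if x_ == x) for x in 'ht'}
p_y = {y: sum(p for (x, y_), p in joint.items() if y_ == y) for y in 'ht'}

# X and Y are independent iff P(X, Y) = P(X) P(Y) for every combination.
independent = all(abs(joint[(x, y)] - p_x[x] * p_y[y]) < 1e-9
                  for x, y in product('ht', 'ht'))
print(independent)  # True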
Sometimes, the variables might not be independent of each other. To make this clearer, let's add another random variable, the number of people visiting the restaurant, N. Let's assume that, from our experience, we know that the number of people visiting depends only on the cost of food at the restaurant and its location (generally, fewer people visit costly restaurants). Does the quality of food Q affect the number of people visiting the restaurant? To answer this question, let's look at the random variables affecting N: the cost C and the location L. As C is directly affected by Q, we can conclude that Q affects N. However, let's consider a situation where we know that the restaurant is costly, that is, C = high, and let's ask the same question: "does the quality of food affect the number of people coming to the restaurant?". The answer is no. The number of people coming only depends on the price and location, so if we know that the cost is high, then we can easily conclude that fewer people will visit, irrespective of the quality of food. Hence, (N ⊥ Q | C). This type of independence is called conditional independence.

Installing tools

Let's now see some coding examples using pgmpy, to represent joint distributions and independencies. Here, we will mostly work with IPython and pgmpy (and a few other libraries) for coding examples. So, before moving ahead, let's get a basic introduction to these.

IPython

IPython is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, which offers enhanced introspection, rich media, additional shell syntax, tab completion, and a rich history. IPython provides the following features:

- Powerful interactive shells (terminal and Qt-based)
- A browser-based notebook with support for code, text, mathematical expressions, inline plots, and other rich media
- Support for interactive data visualization and use of GUI toolkits
- Flexible and embeddable interpreters to load into one's own projects
- Easy-to-use, high performance tools for parallel computing

You can install IPython using the following command:

>>> pip3 install ipython

To start the IPython command shell, you can simply type ipython3 in the terminal. For more installation instructions, you can visit http://ipython.org/install.html.

pgmpy

pgmpy is a Python library to work with Probabilistic Graphical Models. As it's currently not on PyPI, we will need to build it manually. You can get the source code from the Git repository using the following command:

>>> git clone https://github.com/pgmpy/pgmpy

Now cd into the cloned directory, switch to the branch used in this book, and build it with the following code:

>>> cd pgmpy
>>> git checkout book/v0.1
>>> sudo python3 setup.py install

For more installation instructions, you can visit http://pgmpy.org/install.html. With both IPython and pgmpy installed, you should now be able to run the examples.
Representing independencies using pgmpy

To represent independencies, pgmpy has two classes, namely IndependenceAssertion and Independencies. The IndependenceAssertion class is used to represent individual assertions of the form (X ⊥ Y) or (X ⊥ Y | Z). Let's see some code to represent assertions:

# Firstly we need to import IndependenceAssertion
In [1]: from pgmpy.independencies import IndependenceAssertion

# Each assertion is in the form of [X, Y, Z] meaning X is
# independent of Y given Z.
In [2]: assertion1 = IndependenceAssertion('X', 'Y')
In [3]: assertion1
Out[3]: (X _|_ Y)

Here, assertion1 represents that the variable X is independent of the variable Y. To represent conditional assertions, we just need to add a third argument to IndependenceAssertion:

In [4]: assertion2 = IndependenceAssertion('X', 'Y', 'Z')
In [5]: assertion2
Out[5]: (X _|_ Y | Z)

In the preceding example, assertion2 represents (X ⊥ Y | Z). IndependenceAssertion also allows us to represent assertions over sets of variables; to do this, we just need to pass lists of random variables as arguments instead of single variable names.

Moving on to the Independencies class, an Independencies object is used to represent a set of assertions. Often, in the case of Bayesian or Markov networks, we have more than one assertion corresponding to a given model, and to represent these independence assertions for the models, we generally use the Independencies object. Let's take a few examples:

In [8]: from pgmpy.independencies import Independencies

# There are multiple ways to create an Independencies object: we
# could either initialize an empty object or initialize it with some
# assertions.
In [9]: independencies = Independencies()  # Empty object
In [10]: independencies.get_assertions()
Out[10]: []

In [11]: independencies.add_assertions(assertion1, assertion2)
In [12]: independencies.get_assertions()
Out[12]: [(X _|_ Y), (X _|_ Y | Z)]

We can also directly initialize Independencies in these two ways:

In [13]: independencies = Independencies(assertion1, assertion2)
In [14]: independencies = Independencies(['X', 'Y'],
                                         ['A', 'B', 'C'])
In [15]: independencies.get_assertions()
Out[15]: [(X _|_ Y), (A _|_ B | C)]

Representing joint probability distributions using pgmpy

We can also represent joint probability distributions using pgmpy's JointProbabilityDistribution class. Let's say we want to represent the joint distribution over the outcomes of tossing two fair coins. In this case, the probability of each of the four possible outcomes would be 0.25, which is shown as follows:

In [16]: from pgmpy.factors import JointProbabilityDistribution as Joint
In [17]: distribution = Joint(['coin1', 'coin2'],
                              [2, 2],
                              [0.25, 0.25, 0.25, 0.25])

Here, the first argument includes the names of the random variables. The second argument is a list of the number of states of each random variable. The third argument is a list of probability values, assuming that the first variable changes its state the slowest. So, the preceding distribution represents the following:

In [18]: print(distribution)
+---------+---------+------------------+
| coin1   | coin2   |   P(coin1,coin2) |
+---------+---------+------------------+
| coin1_0 | coin2_0 |   0.2500         |
+---------+---------+------------------+
| coin1_0 | coin2_1 |   0.2500         |
+---------+---------+------------------+
| coin1_1 | coin2_0 |   0.2500         |
+---------+---------+------------------+
| coin1_1 | coin2_1 |   0.2500         |
+---------+---------+------------------+
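If the "first variable changes its state the slowest" ordering is unclear, this short sketch (our illustration, not from the article) reproduces the row order of the table above:

from itertools import product

# pgmpy lists probability values with the first variable changing its
# state the slowest, which is exactly the order itertools.product yields.
for (c1, c2) in product(['coin1_0', 'coin1_1'], ['coin2_0', 'coin2_1']):
    print(c1, c2)
# coin1_0 coin2_0
# coin1_0 coin2_1
# coin1_1 coin2_0
# coin1_1 coin2_1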
We can also conduct independence queries over these distributions in pgmpy:

In [19]: distribution.check_independence('coin1', 'coin2')
Out[19]: True

Conditional probability distribution

Let's take an example to understand conditional probability better. Let's say we have a bag containing three apples and five oranges, and we want to randomly take out fruits from the bag one at a time without replacing them. Let the random variables X1 and X2 represent the outcomes of the first try and the second try respectively. As there are three apples and five oranges in the bag initially, P(X1 = apple) = 3/8 and P(X1 = orange) = 5/8. Now, let's say that in our first attempt we got an orange. We cannot simply represent the probability of getting an apple or an orange in our second attempt, because the probabilities in the second attempt depend on the outcome of the first attempt; therefore, we use conditional probability to represent such cases. In the second attempt, we will have the following probabilities that depend on the outcome of our first try: P(X2 = apple | X1 = orange) = 3/7, P(X2 = orange | X1 = orange) = 4/7, P(X2 = apple | X1 = apple) = 2/7, and P(X2 = orange | X1 = apple) = 5/7.

The Conditional Probability Distribution (CPD) of two variables X and Y can be represented as P(X | Y), the probability of X given Y, that is, the probability of X after the event Y has occurred and we know its outcome. Similarly, we can have P(Y | X), the probability of Y after having an observation for X.

The simplest representation of a CPD is the tabular CPD. In a tabular CPD, we construct a table containing all the possible combinations of different states of the random variables and the probabilities corresponding to these states. Let's consider the earlier restaurant example. Let's begin by representing the marginal distribution of the quality of food with Q. As we mentioned earlier, it can be categorized into three values {good, normal, bad}. For example, P(Q) can be represented in tabular form as follows:

+---------+------+
| Quality | P(Q) |
+---------+------+
| Good    | 0.3  |
| Normal  | 0.5  |
| Bad     | 0.2  |
+---------+------+

Similarly, let's say P(L) is the probability distribution of the location of the restaurant. It can be represented as follows:

+----------+------+
| Location | P(L) |
+----------+------+
| Good     | 0.6  |
| Bad      | 0.4  |
+----------+------+

As the cost of the restaurant C depends on both the quality of food Q and its location L, we will be considering P(C | Q, L), which is the conditional distribution of C given Q and L:

+----------+---------------------+----------------------+
| Location | Good                | Bad                  |
+----------+------+--------+-----+------+--------+------+
| Quality  | Good | Normal | Bad | Good | Normal | Bad  |
+----------+------+--------+-----+------+--------+------+
| C = High | 0.8  | 0.6    | 0.1 | 0.6  | 0.6    | 0.05 |
| C = Low  | 0.2  | 0.4    | 0.9 | 0.4  | 0.4    | 0.95 |
+----------+------+--------+-----+------+--------+------+
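Before encoding this in pgmpy, here is a quick plain-Python sanity check (our sketch, not from the article): in a valid CPD P(C | Q, L), the probabilities must sum to 1 for every combination of the conditioning variables Q and L.

# The conditional distribution P(C | Q, L) from the table above,
# keyed by (quality, location).
cpd_cost = {
    ('good', 'good'): {'high': 0.8,  'low': 0.2},
    ('normal', 'good'): {'high': 0.6, 'low': 0.4},
    ('bad', 'good'): {'high': 0.1,  'low': 0.9},
    ('good', 'bad'): {'high': 0.6,  'low': 0.4},
    ('normal', 'bad'): {'high': 0.6, 'low': 0.4},
    ('bad', 'bad'): {'high': 0.05, 'low': 0.95},
}

# Every column of a CPD must be a proper distribution.
for (q, l), dist in cpd_cost.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9, (q, l)
print("every column of P(C | Q, L) sums to 1")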
Representing CPDs using pgmpy

Let's first see how to represent the tabular CPD using pgmpy for variables that have no conditional variables:

In [1]: from pgmpy.factors import TabularCPD

# For creating a TabularCPD object we need to pass three
# arguments: the variable name, its cardinality (that is, the number
# of states of the random variable), and the probability value
# corresponding to each state.
In [2]: quality = TabularCPD(variable='Quality',
                             variable_card=3,
                             values=[[0.3], [0.5], [0.2]])
In [3]: print(quality)
+----------------+-----+
| ['Quality', 0] | 0.3 |
+----------------+-----+
| ['Quality', 1] | 0.5 |
+----------------+-----+
| ['Quality', 2] | 0.2 |
+----------------+-----+

In [4]: quality.variables
Out[4]: OrderedDict([('Quality', [State(var='Quality', state=0),
                                  State(var='Quality', state=1),
                                  State(var='Quality', state=2)])])

In [5]: quality.cardinality
Out[5]: array([3])

In [6]: quality.values
Out[6]: array([0.3, 0.5, 0.2])

You can see here that the values of the CPD are a 1D array instead of the 2D array that you passed as an argument. Internally, pgmpy stores the values of a TabularCPD as a flattened numpy array.

In [7]: location = TabularCPD(variable='Location',
                              variable_card=2,
                              values=[[0.6], [0.4]])
In [8]: print(location)
+-----------------+-----+
| ['Location', 0] | 0.6 |
+-----------------+-----+
| ['Location', 1] | 0.4 |
+-----------------+-----+

However, when we have conditional variables, we also need to specify them and their cardinalities. Let's define the TabularCPD for the cost variable (note that the evidence names must match the variable names defined above):

In [9]: cost = TabularCPD(
                variable='Cost',
                variable_card=2,
                values=[[0.8, 0.6, 0.1, 0.6, 0.6, 0.05],
                        [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]],
                evidence=['Quality', 'Location'],
                evidence_card=[3, 2])
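To see where these CPDs are headed, the following sketch (ours, not from the article; it assumes the BayesianModel class available in the pgmpy version pinned above, which newer releases rename to BayesianNetwork) wires the three CPDs into a small network:

from pgmpy.models import BayesianModel

# Edges point from parent to child: quality and location determine cost.
model = BayesianModel([('Quality', 'Cost'), ('Location', 'Cost')])

# Attach the TabularCPDs defined above to the network.
model.add_cpds(quality, location, cost)

# check_model() verifies that the CPDs match the structure and sum to 1.
print(model.check_model())  # True if everything is consistent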
Graph theory

The second major framework for the study of probabilistic graphical models is graph theory. Graphs are the skeleton of PGMs, and are used to compactly encode the independence conditions of a probability distribution.

Nodes and edges

The foundation of graph theory was laid by Leonhard Euler when he solved the famous Seven Bridges of Konigsberg problem. The city of Konigsberg was set on both sides of the Pregel river and included two islands that were connected to each other and to the mainland by seven bridges. The problem was to find a walk that crosses each of the bridges exactly once. To visualize the problem, let's think of the graph in Fig 1.1:

Fig 1.1: The Seven Bridges of Konigsberg graph

Here, the nodes a, b, c, and d represent the land, and are known as the vertices of the graph. The line segments ab, bc, cd, da, ab, and bc connecting the land parts are the bridges and are known as the edges of the graph. So, we can think of the problem of crossing all the bridges once in a single walk as tracing along all the edges of the graph without lifting our pencils.

Formally, a graph G = (V, E) is an ordered pair of finite sets. The elements of the set V are known as the nodes or the vertices of the graph, and the elements of E are the edges or the arcs of the graph. The number of nodes, or the cardinality of V, denoted by |V|, is known as the order of the graph. Similarly, the number of edges, denoted by |E|, is known as the size of the graph. Here, we can see that the Konigsberg city graph shown in Fig 1.1 is of order 4 and size 7.

In a graph, we say that two vertices u, v ∈ V are adjacent if (u, v) ∈ E. In the City graph, all four vertices are adjacent to each other because there is an edge for every possible combination of two vertices in the graph. Also, for a vertex v ∈ V, we define the neighbors set of v as N(v) = {u | (u, v) ∈ E}. In the City graph, we can see that b and d are neighbors of c. Similarly, a, b, and c are neighbors of d.

We define an edge to be a self loop if the start vertex and the end vertex of the edge are the same. We can put it more formally: any edge of the form (u, u), where u ∈ V, is a self loop.

Until now, we have been talking only about graphs whose edges don't have a direction associated with them, which means that the edge (u, v) is the same as the edge (v, u). These types of graphs are known as undirected graphs. Similarly, we can think of a graph whose edges have a sense of direction associated with them. For these graphs, the edge set E would be a set of ordered pairs of vertices. These types of graphs are known as directed graphs. In the case of a directed graph, we also define the indegree and outdegree of a vertex. For a vertex v ∈ V, we define its outdegree as the number of edges originating from the vertex v, that is, |{u : (v, u) ∈ E}|. Similarly, the indegree is defined as the number of edges that end at the vertex v, that is, |{u : (u, v) ∈ E}|.

Walk, paths, and trails

For a graph G = (V, E) and u, v ∈ V, we define a u - v walk as an alternating sequence of vertices and edges, starting with u and ending with v. In the City graph of Fig 1.1, an example of an a - d walk is W : a, (a, b), b, (b, c), c, (c, d), d. If there aren't multiple edges between the same vertices, then we simply represent a walk by a sequence of vertices, as in the case of the Butterfly graph shown in Fig 1.2, where we can have a walk W : a, c, d, c, e:

Fig 1.2: Butterfly graph, an undirected graph

A walk with no repeated edges is known as a trail. For example, the walk a, b, c, d, a in the City graph is a trail. Also, a walk with no repeated vertices, except possibly the first and the last, is known as a path. For example, the walk a, b, c, d in the City graph is a path. Also, a graph is known as cyclic if there are one or more paths that start and end at the same node. Such paths are known as cycles. Similarly, if there are no cycles in a graph, it is known as an acyclic graph.

Bayesian models

In most real-life cases, when we are representing or modeling some event, we have to deal with a lot of random variables. Even if we consider all the random variables to be discrete, there would still be an exponentially large number of values in the joint probability distribution. Dealing with such a huge amount of data would be computationally expensive (and in some cases, even intractable), and it would also require a huge amount of memory to store the probability of each combination of states of these random variables. However, in most cases, many of these variables are marginally or conditionally independent of each other. By exploiting these independencies, we can reduce the number of values we need to store to represent the joint probability distribution.

For instance, in the previous restaurant example, the joint probability distribution across the four random variables that we discussed (that is, the quality of food Q, the location of the restaurant L, the cost of food C, and the number of people visiting N) would require us to store 23 independent values (taking N to be binary, the joint table has 3 × 2 × 2 × 2 = 24 entries, of which 23 are free parameters because the entries must sum to 1). By the chain rule of probability, we know the following:

P(Q, L, C, N) = P(Q) P(L|Q) P(C|L, Q) P(N|C, Q, L)

Now, let us try to exploit the marginal and conditional independence between the variables to make the representation more compact. Let's start by considering the independence between the location of the restaurant and the quality of food there. As both of these attributes are independent of each other, P(L|Q) would be the same as P(L). Therefore, we need to store only one parameter to represent it. From the conditional independence that we saw earlier, we know that (N ⊥ Q | C, L). Thus, P(N|C, Q, L) would be the same as P(N|C, L), needing only four parameters. Therefore, we now need only (2 + 1 + 6 + 4 = 13) parameters to represent the whole distribution. We can conclude that exploiting independencies helps in the compact representation of the joint probability distribution. This forms the basis for the Bayesian network.
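These parameter counts are easy to verify in code; the following sketch (ours, not from the article, and it assumes N is binary as above) recomputes both numbers:

# Free parameters of the full joint P(Q, L, C, N):
# Q has 3 states; L, C, and N have 2 each; the table's entries sum to 1.
full_joint = 3 * 2 * 2 * 2 - 1
print(full_joint)  # 23

# Free parameters of the factorized form P(Q) P(L) P(C|Q,L) P(N|C,L):
# each CPD column over a k-state variable contributes k - 1 parameters.
p_q = 3 - 1                      # 2
p_l = 2 - 1                      # 1
p_c_given_ql = 3 * 2 * (2 - 1)   # 6 conditioning combos, binary C
p_n_given_cl = 2 * 2 * (2 - 1)   # 4 conditioning combos, binary N
print(p_q + p_l + p_c_given_ql + p_n_given_cl)  # 13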
Representation

A Bayesian network is represented by a Directed Acyclic Graph (DAG) and a set of Conditional Probability Distributions (CPDs) in which:

- The nodes represent random variables
- The edges represent dependencies
- For each of the nodes, we have a CPD

In our previous restaurant example, the nodes would be as follows:

- Quality of food (Q)
- Location (L)
- Cost of food (C)
- Number of people (N)

As the cost of food depends on the quality of food (Q) and the location of the restaurant (L), there will be an edge each from Q → C and L → C. Similarly, as the number of people visiting the restaurant depends on the price of food and its location, there will be an edge each from L → N and C → N. The resulting structure of our Bayesian network is shown in Fig 1.3:

Fig 1.3: Bayesian network for the restaurant example

Factorization of a distribution over a network

Each node in our Bayesian network for restaurants has a CPD associated with it. For example, the CPD for the cost of food in the restaurant is P(C|Q, L), as it depends only on the quality of food and the location. For the number of people, it would be P(N|C, L). So, we can generalize that the CPD associated with each node is P(node | Par(node)), where Par(node) denotes the parents of the node in the graph. Assuming some probability values, we will finally get a network as shown in Fig 1.4:

Fig 1.4: Bayesian network of the restaurant along with CPDs

Let us go back to the joint probability distribution of all these attributes of the restaurant. Considering the independencies among the variables, we concluded the following:

P(Q, C, L, N) = P(Q) P(L) P(C|Q, L) P(N|C, L)

So now, looking at the Bayesian network (BN) for the restaurant, we can say that for any Bayesian network, the joint probability distribution over all its random variables {X1, X2, ..., Xn} can be represented as follows:

P(X1, X2, ..., Xn) = Π P(Xi | Par(Xi))

This is known as the chain rule for Bayesian networks. Also, we say that a distribution P factorizes over a graph G if P can be encoded as follows:

P(X1, X2, ..., Xn) = Π P(Xi | ParG(Xi))

Here, ParG(Xi) denotes the parents of Xi in the graph G.

Summary

In this article, we saw how we can represent a complex joint probability distribution using a directed graph and a conditional probability distribution associated with each node, which is collectively known as a Bayesian network.

Resources for Article:

Further resources on this subject:

- Web Scraping with Python [article]
- Exact Inference Using Graphical Models [article]
- wxPython: Design Approaches and Techniques [article]


NLTK for hackers

Packt
07 Aug 2015
9 min read
In this article written by Nitin Hardeniya, author of the book NLTK Essentials, we will learn that "Life is short, we need Python" is the mantra I follow and truly believe in. As fresh graduates, we learned and worked mostly with C/C++/Java. While these languages have amazing features, Python has a charm of its own. The day I started using Python, I loved it. I really did. The big coincidence here is that I finally ended up working with Python during my initial projects on the job. I started to love the data structures, libraries, and ecosystem Python has for beginners as well as for expert programmers. (For more resources related to this topic, see here.)

Python as a language has advanced very fast and spread widely. If you are a machine learning / natural language processing enthusiast, then Python is 'the' go-to language these days. Python has some amazing ways of dealing with strings, a very easy and elegant coding style, and most importantly a long list of open libraries. I can go on and on about Python and my love for it. But here I want to talk specifically about NLTK (Natural Language Toolkit), one of the most popular Python libraries for natural language processing.

NLTK is simply awesome, and in my opinion, it's the best way to learn and implement some of the most complex NLP concepts. NLTK has a variety of generic text preprocessing tools, such as tokenization, stop word removal, and stemming, and at the same time has some very NLP-specific tools, such as part of speech tagging, chunking, named entity recognition, and dependency parsing. NLTK provides some of the easiest solutions to all the above stages of NLP, and that's why it is the most preferred library for any text processing/text mining application. NLTK not only provides some pretrained models that can be applied directly to your dataset, it also provides ways to customize and build your own taggers, tokenizers, and so on. NLTK is a big library that has many tools available for an NLP developer. I have provided a cheat sheet of some of the most common steps and their solutions using NLTK. In our book, NLTK Essentials, I have tried to give you enough information to deal with all these processing steps using NLTK.

To show you the power of NLTK, let's try to develop a very easy application that finds topics in unstructured text and displays them as a word cloud.

Word cloud with NLTK

Instead of going further into the theoretical aspects of natural language processing, let's start with a quick dive into NLTK. I am going to start with some basic example use cases of NLTK. There is a good chance that you have already done something similar. First, I will give a typical Python programmer approach and then move on to NLTK for a much more efficient, robust, and clean solution.

We will start analyzing with some example text content:

>>> import urllib2
>>> # urllib2 is used to download the html content of the web link
>>> response = urllib2.urlopen('http://python.org/')
>>> # You can read the entire content of a file using the read() method
>>> html = response.read()
>>> print len(html)
47020

For the current example, I have taken the content from Python's home page: https://www.python.org/. We don't have any clue about the kind of topics that are discussed in this URL, so let's say that we want to start an exploratory data analysis (EDA). Typically in a text domain, EDA can have many meanings, but we will go with a simple case: what kinds of terms dominate the document? What are the topics? How frequent are they?
The process will involve some level of preprocessing. We will try to do this in a pure Python way first, and then we will do it using NLTK.

Let's start with cleaning the html tags. One way to do this is to select just the tokens, including numbers and characters. Anybody who has worked with regular expressions should be able to convert the html string into a list of tokens:

>>> # a naive split of the string on whitespace
>>> tokens = [tok for tok in html.split()]
>>> print "Total no of tokens :" + str(len(tokens))
>>> # first 100 tokens
>>> print tokens[0:100]
Total no of tokens :2860
['<!doctype', 'html>', '<!--[if', 'lt', 'IE', '7]>', '<html', 'class="no-js', 'ie6', 'lt-ie7', 'lt-ie8', 'lt-ie9">', '<![endif]-->', '<!--[if', 'IE', '7]>', '<html', 'class="no-js', 'ie7', 'lt-ie8', 'lt-ie9">', '<![endif]-->', ''type="text/css"', 'media="not', 'print,', 'braille,' ...]

As you can see, there is an excess of html tags and other unwanted characters when we use the preceding method. A cleaner version of the same task will look something like this:

>>> import re
>>> # using re.split, https://docs.python.org/2/library/re.html
>>> tokens = re.split('\W+', html)
>>> print len(tokens)
>>> print tokens[0:100]
5787
['', 'doctype', 'html', 'if', 'lt', 'IE', '7', 'html', 'class', 'no', 'js', 'ie6', 'lt', 'ie7', 'lt', 'ie8', 'lt', 'ie9', 'endif', 'if', 'IE', '7', 'html', 'class', 'no', 'js', 'ie7', 'lt', 'ie8', 'lt', 'ie9', 'endif', 'if', 'IE', '8', 'msapplication', 'tooltip', 'content', 'The', 'official', 'home', 'of', 'the', 'Python', 'Programming', 'Language', 'meta', 'name', 'apple' ...]

This looks much cleaner now, but you can still do more; I leave it to you to remove as much noise as you can. You could, for example, use word length as a criterion and remove words of length one; that would remove elements such as 7, 8, and so on, which are just noise in this case.

Now let's turn to NLTK for the same task. There is a function called clean_html() that can do all the work we were looking for:

>>> import nltk
>>> # http://www.nltk.org/api/nltk.html#nltk.util.clean_html
>>> clean = nltk.clean_html(html)
>>> # clean will have the entire string with all the html noise removed
>>> tokens = [tok for tok in clean.split()]
>>> print tokens[:100]
['Welcome', 'to', 'Python.org', 'Skip', 'to', 'content', '&#9660;', 'Close', 'Python', 'PSF', 'Docs', 'PyPI', 'Jobs', 'Community', '&#9650;', 'The', 'Python', 'Network', '&equiv;', 'Menu', 'Arts', 'Business' ...]

Cool, right? This is definitely much cleaner and easier to do.

No analysis in any EDA can start without a frequency distribution. Let's first compute one the plain Python way:

>>> import operator
>>> freq_dis = {}
>>> for tok in tokens:
...     if tok in freq_dis:
...         freq_dis[tok] += 1
...     else:
...         freq_dis[tok] = 1
>>> # We want to sort this dictionary on values ( freq in this case )
>>> sorted_freq_dist = sorted(freq_dis.items(), key=operator.itemgetter(1), reverse=True)
>>> print sorted_freq_dist[:25]
[('Python', 55), ('>>>', 23), ('and', 21), ('to', 18), (',', 18), ('the', 14), ('of', 13), ('for', 12), ('a', 11), ('Events', 11), ('News', 11), ('is', 10), ('2014-', 10), ('More', 9), ('#', 9), ('3', 9), ('=', 8), ('in', 8), ('with', 8), ('Community', 7), ('The', 7), ('Docs', 6), ('Software', 6), (':', 6), ('3:', 5), ('that', 5), ('sum', 5)]

Naturally, as this is Python's home page, Python and the >>> interpreter prompt are the most common terms, which also gives a sense of the website.
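Before moving to NLTK, it's worth noting that the standard library already improves on the manual dictionary above; this small sketch (ours, not from the article) does the same count with collections.Counter:

>>> # Counter does the counting and the sorting-by-frequency in one step.
>>> from collections import Counter
>>> freq_dis = Counter(tokens)
>>> print freq_dis.most_common(25)  # same top-25 list as the manual version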
A better and more efficient approach is to use NLTK's FreqDist() function. For this, we will take a look at the same code we developed before:

>>> import nltk
>>> Freq_dist_nltk = nltk.FreqDist(tokens)
>>> print Freq_dist_nltk
>>> for k, v in Freq_dist_nltk.items():
...     print str(k) + ':' + str(v)
<FreqDist: 'Python': 55, '>>>': 23, 'and': 21, ',': 18, 'to': 18, 'the': 14, 'of': 13, 'for': 12, 'Events': 11, 'News': 11, ...>
Python:55
>>>:23
and:21
,:18
to:18
the:14
of:13
for:12
Events:11
News:11

Let's now do something more interesting and plot this:

>>> Freq_dist_nltk.plot(50, cumulative=False)
>>> # below is the plot for the frequency distributions

We can see that the frequency drops off quickly after the first few words, and the curve goes into a long tail. Still, there is some noise: there are words such as the, of, for, and =. These are useless words, and there is a term for them: stop words, such as the, a, and an. Articles and pronouns are present in most documents; hence, they are not discriminative enough to be informative. In most NLP and information retrieval tasks, people generally remove stop words. Let's go back again to our running example:

>>> stopwords = [word.strip().lower() for word in open("PATH/english.stop.txt")]
>>> clean_tokens = [tok for tok in tokens if len(tok.lower()) > 1 and (tok.lower() not in stopwords)]
>>> Freq_dist_nltk = nltk.FreqDist(clean_tokens)
>>> Freq_dist_nltk.plot(50, cumulative=False)

This looks much cleaner now! After finishing this much, you should be able to generate a word cloud from the cleaned tokens. Please go to http://www.wordle.net/advanced for more word clouds.

Summary

To summarize, this article was intended to give you a brief introduction to natural language processing. The book does assume some background in NLP and programming in Python, but we have tried to give a very quick head start to Python and NLP.

Resources for Article:

Further resources on this subject:

- Hadoop Monitoring and its aspects [Article]
- Big Data Analysis (R and Hadoop) [Article]
- SciPy for Signal Processing [Article]