Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7019 Articles
article-image-getting-started-java-driver-mongodb
Packt
11 Aug 2015
8 min read
Save for later

Getting Started with Java Driver for MongoDB

Packt
11 Aug 2015
8 min read
In this article by Francesco Marchioni, author of the book MongoDB for Java Developers, you will be able to perform all the create/read/update/delete (CRUD) operations that we have so far accomplished using the mongo shell. (For more resources related to this topic, see here.) Querying data We will now see how to use the Java API to query for your documents. Querying for documents with MongoDB resembles JDBC queries; the main difference is that the returned object is a com.mongodb.DBCursor class, which is an iterator over the database result. In the following example, we are iterating over the javastuff collection that should contain the documents inserted so far: package com.packtpub.mongo.chapter2;   import com.mongodb.DB; import com.mongodb.DBCollection; import com.mongodb.DBCursor; import com.mongodb.DBObject; import com.mongodb.MongoClient;   public class SampleQuery{   private final static String HOST = "localhost"; private final static int PORT = 27017;   public static void main( String args[] ){    try{      MongoClient mongoClient = new MongoClient( HOST,PORT );           DB db = mongoClient.getDB( "sampledb" );           DBCollection coll = db.getCollection("javastuff");      DBCursor cursor = coll.find();      try {        while(cursor.hasNext()) {          DBObject object = cursor.next();          System.out.println(object);        }      }      finally {        cursor.close();      }       }    catch(Exception e) {      System.err.println( e.getClass().getName() + ": " +       e.getMessage() );    } } } Depending on the documents you have inserted, the output could be something like this: { "_id" : { "$oid" : "5513f8836c0df1301685315b"} , "name" : "john" , "age" : 35 , "kids" : [ { "name" : "mike"} , { "name" : "faye"}] , "info" : { "email" : "john@mail.com" , "phone" : "876-134-667"}} . . . . Restricting the search to the first document The find operator executed without any parameter returns the full cursor of a collection; pretty much like the SELECT * query in relational DB terms. If you are interested in reading just the first document in the collection, you could use the findOne() operation to get the first document in the collection. This method returns a single document (instead of the DBCursor that the find() operation returns). As you can see, the findOne() operator directly returns a DBObject instead of a com.mongodb.DBCursor class: DBObject myDoc = coll.findOne(); System.out.println(myDoc); Querying the number of documents in a collection Another typical construct that you probably know from the SQL is the SELECT count(*) query that is useful to retrieve the number of records in a table. In MongoDB terms, you can get this value simply by invoking the getCount against a DBCollection class: DBCollection coll = db.getCollection("javastuff"); System.out.println(coll.getCount()); As an alternative, you could execute the count() method over the DBCursor object: DBCursor cursor = coll.find(); System.out.println(cursor.count()); Eager fetching of data using DBCursor When find is executed and a DBCursor is executed you have a pointer to a database document. This means that the documents are fetched in the memory as you call next() method on the DBCursor. On the other hand, you can eagerly load all the data into the memory by executing the toArray() method, which returns a java.util.List structure: List list = collection.find( query ).toArray(); The problem with this approach is that you could potentially fill up the memory with lots of documents, which are eagerly loaded. You are therefore advised to include some operators such as skip() and limit() to control the amount of data to be loaded into the memory: List list = collection.find( query ).skip( 100 ). limit( 10 ).toArray(); Just like you learned from the mongo shell, the skip operator can be used as an initial offset of your cursor whilst the limit construct can eventually load the first n occurrences in the cursor. Filtering through the records Typically, you will not need to fetch the whole set of documents in a collection. So, just like SQL uses WHERE conditions to filter records, in MongoDB you can restrict searches by creating a BasicDBObject and passing it to the find function as an argument. See the following example: DBCollection coll = db.getCollection("javastuff");   DBObject query = new BasicDBObject("name", "owen");   DBCursor cursor = coll.find(query);   try {    while(cursor.hasNext()) {        System.out.println(cursor.next());    } } finally {    cursor.close(); } In the preceding example, we retrieve the documents in the javastuff collection, whose name key equals to owen. That's the equivalent of an SQL query like this: SELECT * FROM javastuff WHERE name='owen' Building more complex searches As your collections keep growing, you will need to be more selective with your searches. For example, you could include multiple keys in your BasicDBObject that will eventually be passed to find. We can then apply the same functions in our queries. For example, here is how to find documents whose name does not equal ($ne) to Frank and whose age is greater than 10: DBCollection coll = db.getCollection("javastuff");   DBObject query = new      BasicDBObject("name", new BasicDBObject("$ne",        "frank")).append("age", new BasicDBObject("$gt", 10));   DBCursor cursor = coll.find(query); Updating documents Having learned about create and read, we are half way through our CRUD track. The next operation you will learn is update. The DBCollection class contains an update method that can be used for this purpose. Let's say we have the following document: > db.javastuff.find({"name":"frank"}).pretty() { "_id" : ObjectId("55142c27627b27560bd365b1"), "name" : "frank", "age" : 31, "info" : { "email" : "frank@mail.com", "phone" : "222-111-444" } } Now we want to change the age value for this document by setting it to 23: DBCollection coll = db.getCollection("javastuff");   DBObject newDocument = new BasicDBObject(); newDocument.put("age", 23);   DBObject searchQuery = new BasicDBObject().append("name", "owen");   coll.update(searchQuery, newDocument); You might think that would do the trick, but wait! Let's have a look at our document using the mongo shell: > db.javastuff.find({"age":23}).pretty()   { "_id" : ObjectId("55142c27627b27560bd365b1"), "age" : 23 } As you can see, the update statement has replaced the original document with another one, including only the keys and values we have passed to the update. In most cases, this is not what we want to achieve. If we want to update a particular value, we have to use the $set update modifier. DBCollection coll = db.getCollection("javastuff");   BasicDBObject newDocument = new BasicDBObject(); newDocument.append("$set", new BasicDBObject().append("age",   23));   BasicDBObject searchQuery = new BasicDBObject().append("name", "frank");   coll.update(searchQuery, newDocument); So, suppose we restored the initial document with all the fields, this is the outcome of the update using the $set update modifier: > db.javastuff.find({"age":23}).pretty() { "_id" : ObjectId("5514326e627b383428c2ccd8"), "name" : "frank", "age" : 23, "in,fo" : {    "email" : "frank@mail.com",    "phone" : "222-111-444" } } Please note that the DBCollection class overloads the method update with update (DBObject q, DBObject o, boolean upsert, boolean multi). The first parameter (upsert) determines whether the database should create the element if it does not exist. The second one (multi) causes the update to be applied to all matching objects. Deleting documents The operator to be used for deleting documents is obviously delete. As for other operators, it includes several variants. In its simplest form, when executed over a single document returned, it will remove it: MongoClient mongoClient = new MongoClient("localhost", 27017);   DB db = mongoClient.getDB("sampledb"); DBCollection coll = db.getCollection("javastuff"); DBObject doc = coll.findOne();   coll.remove(doc); Most of the time you will need to filter the documents to be deleted. Here is how to delete the document with the key frank: DBObject document = new BasicDBObject(); document.put("name", "frank");   coll.remove(document); Deleting a set of documents Bulk deletion of documents can be achieved by including the keys in a List and building an $in modifier expression that uses this list. Let's see, for example, how to delete all records whose age ranges from 0 to 49: BasicDBObject deleteQuery = new BasicDBObject(); List<Integer> list = new ArrayList<Integer>();   for (int i=0;i<50;i++) list.add(i);   deleteQuery.put("age", new BasicDBObject("$in", list)); Summary In this article, we covered how to perform the same operations that are available in the mongo shell. Resources for Article: Further resources on this subject: MongoDB data modeling [article] Apache Solr and Big Data – integration with MongoDB [article] About MongoDB [article]
Read more
  • 0
  • 0
  • 2052

article-image-managing-recipients
Packt
11 Aug 2015
17 min read
Save for later

Managing Recipients

Packt
11 Aug 2015
17 min read
In this article by Jonas Andersson and Mike Pfeiffer, authors of the book Microsoft Exchange Server PowerShell Cookbook - Third Edition, we will see how the mailboxes are managed in the Exchange Management Shell. (For more resources related to this topic, see here.) Adding, modifying, and removing mailboxes One of the most common tasks performed within the Exchange Management Shell is mailbox management. In this recipe, we'll take a look at the command syntax required to create, update, and remove mailboxes from your Exchange organization. The concepts outlined in this recipe can be used to perform basic day-to-day tasks and will be useful for more advanced scenarios, such as creating mailboxes in bulk. How to do it... Let's see how to add, modify, and delete mailboxes using the following steps: Let's start off by creating a mailbox-enabled Active Directory user account. To do this, we can use the New-Mailbox cmdlet, as shown in the following example: $password = ConvertTo-SecureString -AsPlainText P@ssw0rd `-ForceNew-Mailbox -UserPrincipalName dave@contoso.com `-Alias dave `-Database DAGDB1 `-Name 'Dave Jones' `-OrganizationalUnit Sales `-Password $password `-FirstName Dave `-LastName Jones `-DisplayName 'Dave Jones' Once the mailbox has been created, we can modify it using the Set-Mailbox cmdlet: Set-Mailbox -Identity dave `-UseDatabaseQuotaDefaults $false `-ProhibitSendReceiveQuota 5GB `-IssueWarningQuota 4GB To remove the Exchange attributes from the Active Directory user account and mark the mailbox in the database for removal, use the Disable-Mailbox cmdlet: Disable-Mailbox -Identity dave -Confirm:$false How it works... When running the New-Mailbox cmdlet, the -Password parameter is required, and you need to provide a value for it, using a secure string object. As you can see from the code, we used the ConvertTo-SecureString cmdlet to create a $password variable that stores a specified value as an encrypted string. This $password variable is then assigned to the -Password parameter when running the cmdlet. It's not required to first store this object in a variable; we could have done it inline, as shown next: New-Mailbox -UserPrincipalName dave@contoso.com `-Alias dave `-Database DAGDB1 `-Name 'Dave Jones' `-OrganizationalUnit Sales `-Password (ConvertTo-SecureString -AsPlainText P@ssw0rd -Force) `-FirstName Dave `-LastName Jones `-DisplayName 'Dave Jones' Keep in mind that the password used here needs to comply with your Active Directory password policies, which may enforce a minimum password length and have requirements for complexity. Only a few parameters are actually required when running New-Mailbox, but the cmdlet itself supports several useful parameters that can be used to set certain properties when creating the mailbox. You can run Get-Help New-Mailbox -Detailed to determine which additional parameters are supported. The New-Mailbox cmdlet creates a new Active Directory user and then mailbox-enables that account. We can also create mailboxes for existing users with the Enable-Mailbox cmdlet, using a syntax similar to the following: Enable-Mailbox steve -Database DAGDB1 The only requirement when running the Enable-Mailbox cmdlet is that you need to provide the identity of the Active Directory user that should be mailbox-enabled. In the previous example, we specified the database in which the mailbox should be created, but this is optional. The Enable-Mailbox cmdlet supports a number of other parameters that you can use to control the initial settings for the mailbox. You can use a simple one-liner to create mailboxes in bulk for existing Active Directory users: Get-User -RecipientTypeDetails User | Enable-Mailbox -Database DAGDB1 Notice that we run the Get-User cmdlet, specifying User as the value for the -RecipientTypeDetails parameter. This will retrieve only the accounts in Active Directory that have not been mailbox-enabled. We then pipe those objects down to the Enable-Mailbox cmdlet and mailboxes will be created for each of those users in one simple operation. Once the mailboxes have been created, they can be modified with the Set-Mailbox cmdlet. As you may recall from our original example, we used the Set-Mailbox cmdlet to configure the custom storage quota settings after creating a mailbox for Dave Jones. Keep in mind that the Set-Mailbox cmdlet supports over 100 parameters, so anything that can be done to modify a mailbox can be scripted. Bulk modifications to mailboxes can be done easily by taking advantage of the pipeline and the Set-Mailbox cmdlet. Instead of configuring storage quotas on a single mailbox, we can do it for multiple users at once: Get-Mailbox -OrganizationalUnit contoso.com/sales |Set-Mailbox -UseDatabaseQuotaDefaults $false `-ProhibitSendReceiveQuota 5GB `-IssueWarningQuota 4GB Here, we are simply retrieving every mailbox in the sales OU using the Get-Mailbox cmdlet. The objects returned from this command are piped down to Set-Mailbox, which then modifies the quota settings for each mailbox in one shot. The Disable-Mailbox cmdlet will strip the Exchange attributes from an Active Directory user and will disconnect the associated mailbox. By default, disconnected mailboxes are retained for 30 days. You can modify this setting on the database that holds the mailbox. In addition to this, you can also use the Remove-Mailbox cmdlet to delete both the Active Directory account and the mailbox at once: Remove-Mailbox -Identity dave -Confirm:$false After running this command, the mailbox will be purged once it exceeds the deleted mailbox retention setting on the database. One common mistake is when administrators use the Remove-Mailbox cmdlet when the Disable-Mailbox cmdlet should have been used. It's important to remember that the Remove-Mailbox cmdlet will delete the Active Directory user account and mailbox, while the Disable-Mailbox cmdlet only removes the mailbox, but the Active Directory user account still remains. There's more... When we ran the New-Mailbox cmdlet in the previous examples, we assigned a secure string object to the –Password parameter using the ConvertTo-SecureString cmdlet. This is a great technique to use when your scripts need complete automation, but you can also allow an operator to enter this information interactively. For example, you might build a script that prompts an operator for a password when creating one or more mailboxes. There are a couple of ways you can do this. First, you can use the Read-Host cmdlet to prompt the user running the script to enter a password: $pass = Read-Host "Enter Password" -AsSecureString Once a value has been entered in the shell, your script can assign the $pass variable to the -Password parameter of the New-Mailbox cmdlet. Alternatively, you can supply a value for the -Password parameter using the Get-Credential cmdlet: New-Mailbox -Name Dave -UserPrincipalName dave@contoso.com `-Password (Get-Credential).password You can see that the value we are assigning to the -Password parameter in this example is actually the password property of the object returned by the Get-Credential cmdlet. Executing this command will first launch a Windows authentication dialog box, where the caller can enter a username and password. Once the credential object has been created, the New-Mailbox cmdlet will run. Even though a username and password must be entered in the authentication dialog box, only the password value will be used when the command executes. Setting the Active Directory attributes Some of the Active Directory attributes that you may want to set when creating a mailbox might not be available when using the New-Mailbox cmdlet. Good examples of this are a user's city, state, company, and department attributes. In order to set these attributes, you'll need to call the Set-User cmdlet after the mailbox has been created: Set-User –Identity dave –Office IT –City Seattle –State Washington You can run Get-Help Set-User -Detailed to view all of the available parameters supported by this cmdlet. Notice that the Set-User cmdlet is an Active Directory cmdlet and not an Exchange cmdlet. In the examples using the New-Mailbox cmdlet to create new mailboxes, it is not required to use all these parameters from the example. The only required parameters are UserPrincipalName, Name, and Password. Working with contacts Once you've started managing mailboxes using the Exchange Management Shell, you'll probably notice that the concepts and command syntax used to manage contacts are very similar. The difference, of course, is that we need to use a different set of cmdlets. In addition, we also have two types of contacts to deal with in Exchange. We'll take a look at how you can manage both of them in this recipe. How to do it... Let's see how to work with contacts using the following steps: To create a mail-enabled contact, use the New-MailContact cmdlet: New-MailContact -Alias rjones `-Name "Rob Jones" `-ExternalEmailAddress rob@fabrikam.com `-OrganizationalUnit sales Mail-enabled users can be created with the New-MailUser cmdlet: New-MailUser -Name 'John Davis' `-Alias jdavis `-UserPrincipalName jdavis@contoso.com `-FirstName John `-LastName Davis `-Password (ConvertTo-SecureString -AsPlainText P@ssw0rd `-Force) `-ResetPasswordOnNextLogon $false `-ExternalEmailAddress jdavis@fabrikam.com How it works... Mail contacts are useful when you have external e-mail recipients that need to show up in your global address list. When you use the New-MailContact cmdlet, an Active Directory contact object is created and mail-enabled with the external e-mail address assigned. You can mail-enable an existing Active Directory contact using the Enable-MailContact cmdlet. Mail users are similar to mail contacts in the way that they have an associated external e-mail address. The difference is that these objects are mail-enabled Active Directory users and this explains why we need to assign a password when creating the object. You might use a mail user for a contractor who works onsite in your organization and needs to be able to log on to your domain. When users in your organization need to e-mail this person, they can select them from the global address list and messages sent to these recipients will be delivered to the external address configured for the account. When dealing with mailboxes, there are a couple of things that should be taken into consideration when it comes to removing contacts and mail users. You can remove the Exchange attributes from a contact using the Disable-MailContact cmdlet. The Remove-MailContact cmdlet will remove the contact object from the Active Directory and Exchange. Similarly, the Disable-MailUser and Remove-MailUser cmdlets work in the same fashion. There's more... Like mailboxes, mail contacts and mail-enabled user accounts have several Active Directory attributes that can be set, such as the job title, company, department, and so on. To update these attributes, you can use the Set-* cmdlets that are available for each respective type. For example, to update our mail contact, we can use the Set-Contact cmdlet with the following syntax: Set-Contact -Identity rjones `-Title 'Sales Contractor' `-Company Fabrikam `-Department Sales To modify the same settings for a mail-enabled user, use the Set-User cmdlet: Set-User -Identity jdavis `-Title 'Sales Contractor' `-Company Fabrikam `-Department Sales Both cmdlets can be used to modify a number of different settings. Use the help system to view all of the available parameters. Managing distribution groups In many Exchange environments, distribution groups are heavily relied upon and require frequent changes. This recipe will cover the creation of distribution groups and how to add members to groups, which might be useful when performing these tasks interactively in the shell or through automated scripts. How to do it... Let's see how to create distribution groups using the following steps: To create a distribution group, use the New-DistributionGroup cmdlet: New-DistributionGroup -Name Sales Once the group has been created, adding multiple members can be done easily using a one-liner, as follows: Get-Mailbox -OrganizationalUnit Sales |Add-DistributionGroupMember -Identity Sales We can also create distribution groups whose memberships are set dynamically: New-DynamicDistributionGroup -Name Accounting `-Alias Accounting `-IncludedRecipients MailboxUsers,MailContacts `-OrganizationalUnit Accounting `-ConditionalDepartment accounting,finance `-RecipientContainer contoso.com How it works... There are two types of distribution groups that can be created with Exchange. Firstly, there are regular distribution groups, which contain a distinct list of users. Secondly, there are dynamic distribution groups, whose members are determined at the time a message is sent based on a number of conditions or filters that have been defined. Both types have a set of cmdlets that can be used to add, remove, update, enable, or disable these groups. By default, when creating a standard distribution group, the group scope will be set to Universal. You can create a mail-enabled security group using the New-DistributionGroup cmdlet by setting the -Type parameter to Security. If you do not provide a value for the -Type parameter, the group will be created using the Distribution group type. You can mail-enable an existing Active Directory universal distribution group using the Enable-DistributionGroup cmdlet. After creating the Sales distribution group in our previous example, we added all of the mailboxes in the Sales OU to the group using the Add-DistributionGroupMember cmdlet. You can do this in bulk, or for one user at a time, using the –Member parameter: Add-DistributionGroupMember -Identity Sales -Member administrator Dynamic distribution groups determine their membership based on a defined set of filters and conditions. When we created the Accounting distribution group, we used the -IncludedRecipients parameter to specify that only the MailboxUsers and MailContacts object types would be included in the group. This eliminates resource mailboxes, groups, or mail users from being included as members. The group will be created in the Accounting OU based on the value used with the -OrganizationalUnit parameter. Using the –ConditionalDepartment parameter, the group will only include users that have a department setting of either Accounting or Finance. Finally, since the -RecipientContainer parameter is set to the FQDN of the domain, any user located in the Active Directory can potentially be included in the group. You can create more complex filters for dynamic distribution groups using a recipient filter. You can modify both group types using the Set-DistributionGroup and Set-DynamicDistributionGroup cmdlets. There's more... When dealing with other recipient types, there are a couple of things that should be taken into consideration when it comes to removing distribution groups. You can remove the Exchange attributes from a group using the Disable-DistributionGroup cmdlet. The Remove-DistributionGroup cmdlet will remove the group object from the Active Directory and Exchange. Managing resource mailboxes In addition to mailboxes, groups, and external contacts, recipients can also include specific rooms or pieces of equipment. Locations such as a conference room or a classroom can be given a mailbox so that they can be reserved for meetings. Equipment mailboxes can be assigned to physical, non-location specific resources, such as laptops or projectors, and can then be checked out to individual users or groups by booking a time with the mailbox. In this recipe, we'll take a look at how you can manage resource mailboxes using the Exchange Management Shell. How to do it... When creating a resource mailbox from within the shell, the syntax is similar to creating a mailbox for a regular user. For example, you still use the New-Mailbox cmdlet when creating a resource mailbox: New-Mailbox -Name "CR23" -DisplayName "Conference Room 23" `-UserPrincipalName CR23@contoso.com -Room How it works... There are two main differences when it comes to creating a resource mailbox, as opposed to a standard user mailbox. First, you need to use either the -Room switch parameter or the -Equipment switch parameter to define the type of resource mailbox that will be created. Second, you do not need to provide a password value for the user account. When using either of these resource mailbox switch parameters to create a mailbox, the New-Mailbox cmdlet will create a disabled Active Directory user account that will be associated with the mailbox. The entire concept of room and equipment mailboxes revolves around the calendars used by these resources. If you want to reserve a room or a piece of equipment, you book a time through Outlook or OWA with these resources for the duration that you'll need them. The requests sent to these resources need to be accepted, either by a delegate or automatically using the Resource Booking Attendant. To configure the room mailbox created in the previous example, to automatically accept new meeting requests, we can use the Set-CalendarProcessing cmdlet to set the Resource Booking Attendant for that mailbox to AutoAccept: Set-CalendarProcessing CR23 -AutomateProcessing AutoAccept When the Resource Booking Attendant is set to AutoAccept, the request will be immediately accepted as long as there is no conflict with another meeting. If there is a conflict, an e-mail message will be returned to the requestor, which explains that the request was declined due to scheduling conflicts. You can allow conflicts by adding the –AllowConflicts switch parameter to the previous command. When working with resource mailboxes with AutomateProcessing set to AutoAccept, you'll get an automated e-mail response from the resource after booking a time. This e-mail message will explain whether the request was accepted or declined, depending on your settings. You can add an additional text to the response message that the meeting organizer will receive using the following syntax: Set-CalendarProcessing -Identity CR23 `-AddAdditionalResponse $true `-AdditionalResponse 'For Assistance Contact Support at Ext. #3376' This example uses the Set-CalendarProcessing cmdlet to customize the response messages sent from the CR23 room mailbox. You can see here that we added a message that tells the user the help desk number to call if assistance is required. Keep in mind that you can only add an additional response text, when the AutomateProcessing property is set to AutoAccept. If you do not want to automate the calendar processing for a resource mailbox, then you'll need to add delegates that can accept or deny meetings for that resource. Again, we can turn to the Set-CalendarProcessing cmdlet to accomplish this: Set-CalendarProcessing -Identity CR23 `-ResourceDelegates "joe@contoso.com","steve@contoso.com" `-AutomateProcessing None In this example, we added two delegates to the resource mailbox and have turned off automated processing. When a request comes into the CR23 mailbox, both Steve and Joe will be notified and can accept or deny the request on behalf of the resource mailbox. There's more... When it comes to working with resource mailboxes, another useful feature is the ability to assign custom resource properties to rooms and equipment resources. For example, you may have a total of 5, 10, or 15 conference rooms, but maybe only four of these have whiteboards. It might be useful for your users to know this information when booking a resource for a meeting, where they will be conducting a training session. Using the shell, we can add custom resource properties to the Exchange organization by modifying the resource schema. Once these custom resource properties have been added, they can then be assigned to specific resource mailboxes. You can use the following code to add a whiteboard resource property to the Exchange organization's resource schema: Set-ResourceConfig -ResourcePropertySchema 'Room/Whiteboard' Now that the whiteboard resource property is available within the Exchange organization, we can add this to our Conference Room 23 mailbox using the following command: Set-Mailbox -Identity CR23 -ResourceCustom Whiteboard When users access the Select Rooms or Add Rooms dialog box in Outlook 2007, 2010, or 2013, they will see that Conference Room 23 has a whiteboard available. Converting mailboxes If you've moved from an old version, you may have a number of mailboxes that were used as resource mailboxes. Once these mailboxes have been moved they will be identified as Shared mailboxes. You can convert them to different types using the Set-Mailbox cmdlet so that they'll have all of the properties of a resource mailbox: Get-Mailbox conf* | Set-Mailbox -Type Room You can run the Set-Mailbox cmdlet against each mailbox one at a time and convert them to Room mailboxes, using the -Type parameter. Or, if you use a common naming convention, you may be able to do them in bulk by retrieving a list of mailboxes using a wildcard and piping them to Set-Mailbox, as shown previously. Summary In this article we have discussed how the various types of mailboxes are managed, how to set up the Active Directory attributes, how to work with contacts and distribution groups. Resources for Article: Further resources on this subject: Unleashing Your Development Skills with PowerShell [Article] Inventorying Servers with PowerShell [Article] Exchange Server 2010 Windows PowerShell: Working with Address Lists [Article]
Read more
  • 0
  • 0
  • 1194

article-image-working-virtual-machines
Packt
11 Aug 2015
7 min read
Save for later

Working with Virtual Machines

Packt
11 Aug 2015
7 min read
In this article by Yohan Rohinton Wadia the author of Learning VMware vCloud Air, we are going to walk through setting up and accessing virtual machines. (For more resources related to this topic, see here.) What is a virtual machine? Most of you reading this article must be aware of what a virtual machine is, but for the sake of simplicity, let's have a quick look at what it really is. A virtual machine is basically an emulation of a real or physical computer which runs on an operating system and can host your favorite applications as well. Each virtual machine consists of a set of files that govern the way the virtual machine is configured and run. The most important of these files would be a virtual drive, that acts just as a physical drive storing all your data, applications and operating system; and a configuration file that basically tells the virtual machine how much resources are dedicated to it, which networks or storage adapters to use, and so on. The beauty of these files is that you can port them from one virtualization platform to another and manage them more effectively and securely as compared to a physical server. The following diagram shows an overview of how a virtual machine works over a host: Virtual machine creation in vCloud Air is a very simple and straight forward process. vCloud Air provides you with three mechanisms using which you can create your own virtual machines briefly summarized as follows: Wizard driven: vCloud Air provides a simple wizard using which you can deploy virtual machines from pre-configured templates. This option is provided via the vCloud Air web interface itself. Using vCloud Director: vCloud Air provides an advanced option as well for users who want to create their virtual machines from scratch. This is done via the vCloud Director interface and is a bit more complex as compared to the wizard driven option. Bring your own media: Because vCloud Air natively runs on VMware vSphere and vCloud Director platforms, its relatively easy for you to migrate your own media, templates and vApps into vCloud Air using a special tool called as VMware vCloud Connector. Create a virtual machine using template As we saw earlier, VMware vCloud Air provides us with a default template using which you can deploy virtual machines in your public cloud in a matter of seconds. The process is a wizard driven activity where you can select and configure the virtual machine's resources such as CPU, memory, hard disk space all with a few simple clicks. The following steps will you create a virtual machine using a template: Login to your vCloud Air (https://vchs.vmware.com/login) using the username and password that we set during the sign in process. From the Home page, select the VPC on Demand tab. Once there, from the drop-down menu above the tabs, select your region and the corresponding VDC where you would like to deploy your first virtual machine. In this case, I have selected the UK-Slough-6 as the region and MyFirstVDC as the default VDC where I will deploy my virtual machines:If you have selected more than one VDC, you will be prompted to select a specific virtual data center before you start the wizard as a virtual machine cannot span across regions or VDCs. From the Virtual Machines tab, select the Create your first virtual machine option. This will bring up the VM launch wizard as shown here: As you can see here, there are two tabs provided by default: a VMware Catalog and another section called as My Catalog. This is an empty catalog by default but this is the place where all your custom templates and vApps will be shown if you have added them from the vCloud Director portal or purchased them from the Solutions Exchange site as well. Select any particular template to get started with. You can choose your virtual machine to be either powered by a 32 bit or a 64 bit operating system. In my case, I have selected a CentOS 6.4 64 bit template for this exercise. Click Continue once done. Templates provided by vCloud Air are either free or paid. The paid ones generally have a $ sign marked next to the OS architecture, indicating that you will be charged once you start using the virtual machine. You can track all your purchases using the vCloud Air billing statement. The next step is to define the basic configuration for your virtual machine. Provide a suitable name for your virtual machine. You can add an optional description to it as well. Next, select the CPU, memory and storage for the virtual machine. The CPU and memory resources are linked with each other so changing the CPU will automatically set the default vRAM for the virtual machine as well; however you can always increase the vRAM as per your needs. In this case, the virtual machine has 2 CPUs and 4 GB vRAM allocated to it. Select the amount of storage you want to provide to your virtual machine. VMware can allocate a maximum of 2 TB of storage as a single drive to a virtual machine. However as a best practice; it is always good to add more storage by adding multiple drives rather than storing it all on one single drive. You can optionally select your disks to be either standard or SSD-accelerated; both features we will discuss shortly. Virtual machine configuration Click on Create Virtual Machine once you are satisfied with your changes. Your virtual machine will now be provisioned within few minutes. By default, the virtual machine is not powered on after it is created. You can power it on by selecting the virtual machine and clicking on the Power On icon in the tool bar above the virtual machine: Status of the virtual machine created There you have it. Your very first virtual machine is now ready for use! Once powered on, you can select the virtual machine name to view its details along with a default password that is auto-generated by vCloud Air. Accessing virtual machines using the VMRC Once your virtual machines are created and powered on, you can access and view them easily using the virtual machine remote console (VMRC). There are two ways to invoke the VMRC, one is by selecting your virtual machine from the vCloud Air dashboard, selecting the Actions tab and select the option Open in Console as shown: The other way to do so is by selecting the virtual machine name. This will display the Settings page for that particular virtual machine. To launch the console select the Open Virtual Machine option as shown: Make a note of the Guest OS Password from the Guest OS section. This is the default password that will be used to log in to your virtual machine. To log in to the virtual machine, use the following credentials: Username: root Password: <Guest_OS_Password> This is shown in the following screenshot: You will be prompted to change this password on your first login. Provide a strong new password that contains at least one special character and contains an alphanumeric pattern as well. Summary There you have it! Your very own Linux virtual machine on the cloud! Resources for Article: Further resources on this subject: vCloud Networks [Article] Creating your first VM using vCloud technology [Article] Securing vCloud Using the vCloud Networking and Security App Firewall [Article]
Read more
  • 0
  • 0
  • 8028

article-image-exploring-jenkins
Packt
10 Aug 2015
7 min read
Save for later

Exploring Jenkins

Packt
10 Aug 2015
7 min read
In this article by Mitesh Soni, the author of the book Jenkins Essentials, introduces us to Jenkins. (For more resources related to this topic, see here.) Jenkins is an open source application written in Java. It is one of the most popular continuous integration (CI) tools used to build and test different kinds of projects. In this article, we will have a quick overview of Jenkins, essential features, and its impact on DevOps culture. Before we can start using Jenkins, we need to install it. In this article, we have provided a step-by-step guide to install Jenkins. Installing Jenkins is a very easy task and is different from the OS flavors. This article will also cover the DevOps pipeline. To be precise, we will discuss the following topics in this article: Introduction to Jenkins and its features Installation of Jenkins on Windows and the CentOS operating system How to change configuration settings in Jenkins What is the deployment pipeline On your mark, get set, go! Introduction to Jenkins and its features Let's first understand what continuous integration is. CI is one of the most popular application development practices in recent times. Developers integrate bug fix, new feature development, or innovative functionality in code repository. The CI tool verifies the integration process with an automated build and automated test execution to detect issues with the current source of an application, and provide quick feedback. Jenkins is a simple, extensible, and user-friendly open source tool that provides CI services for application development. Jenkins supports SCM tools such as StarTeam, Subversion, CVS, Git, AccuRev and so on. Jenkins can build Freestyle, Apache Ant, and Apache Maven-based projects. The concept of plugins makes Jenkins more attractive, easy to learn, and easy to use. There are various categories of plugins available such as Source code management, Slave launchers and controllers, Build triggers, Build tools, Build notifies, Build reports, other post-build actions, External site/tool integrations, UI plugins, Authentication and user management, Android development, iOS development, .NET development, Ruby development, Library plugins, and so on. Jenkins defines interfaces or abstract classes that model a facet of a build system. Interfaces or abstract classes define an agreement on what needs to be implemented; Jenkins uses plugins to extend those implementations. To learn more about all plugins, visit https://wiki.jenkins-ci.org/x/GIAL. To learn how to create a new plugin, visit https://wiki.jenkins-ci.org/x/TYAL. To download different versions of plugins, visit https://updates.jenkins-ci.org/download/plugins/. Features Jenkins is one of the most popular CI servers in the market. The reasons for its popularity are as follows: Easy installation on different operating systems. Easy upgrades—Jenkins has very speedy release cycles. Simple and easy-to-use user interface. Easily extensible with the use of third-party plugins—over 400 plugins. Easy to configure the setup environment in the user interface. It is also possible to customize the user interface based on likings. The master slave architecture supports distributed builds to reduce loads on the CI server. Jenkins is available with test harness built around JUnit; test results are available in graphical and tabular forms. Build scheduling based on the cron expression (to know more about cron, visit http://en.wikipedia.org/wiki/Cron). Shell and Windows command execution in prebuild steps. Notification support related to the build status. Installation of Jenkins on Windows and CentOS Go to https://jenkins-ci.org/. Find the Download Jenkins section on the home page of Jenkins's website. Download the war file or native packages based on your operating system. A Java installation is needed to run Jenkins. Install Java based on your operating system and set the JAVA_HOME environment variable accordingly. Installing Jenkins on Windows Select the native package available for Windows. It will download jenkins-1.xxx.zip. In our case, it will download jenkins-1.606.zip. Extract it and you will get setup.exe and jenkins-1.606.msi files. Click on setup.exe and perform the following steps in sequence. On the welcome screen, click Next: Select the destination folder and click on Next. Click on Install to begin installation. Please wait while the Setup Wizard installs Jenkins. Once the Jenkins installation is completed, click on the Finish button. Verify the Jenkins installation on the Windows machine by opening URL http://<ip_address>:8080 on the system where you have installed Jenkins. Installation of Jenkins on CentOS To install Jenkins on CentOS, download the Jenkins repository definition to your local system at /etc/yum.repos.d/ and import the key. Use the wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat/jenkins.repo command to download repo. Now, run yum install Jenkins; it will resolve dependencies and prompt for installation. Reply with y and it will download the required package to install Jenkins on CentOS. Verify the Jenkins status by issuing the service jenkins status command. Initially, it will be stopped. Start Jenkins by executing service jenkins start in the terminal. Verify the Jenkins installation on the CentOS machine by opening the URL http://<ip_address>:8080 on the system where you have installed Jenkins. How to change configuration settings in Jenkins Click on the Manage Jenkins link on the dashboard to configure system, security, to manage plugins, slave nodes, credentials, and so on. Click on the Configure System link to configure Java, Ant, Maven, and other third-party products' related information. Jenkins uses Groovy as its scripting language. To execute the arbitrary script for administration/trouble-shooting/diagnostics on the Jenkins dashboard, go to the Manage Jenkins link on the dashboard, click on Script Console, and run println(Jenkins.instance.pluginManager.plugins). To verify the system log, go to the Manage Jenkins link on the dashboard and click on the System Log link or visit http://localhost:8080/log/all. To get more information on third-party libraries—version and license information in Jenkins, go to the Manage Jenkins link on the dashboard and click on the About Jenkins link. What is the deployment pipeline? The application development life cycle is a traditionally lengthy and a manual process. In addition, it requires effective collaboration between development and operations teams. The deployment pipeline is a demonstration of automation involved in the application development life cycle containing the automated build execution and test execution, notification to the stakeholder, and deployment in different runtime environments. Effectively, the deployment pipeline is a combination of CI and continuous delivery, and hence is a part of DevOps practices. The following diagram depicts the deployment pipeline process: Members of the development team check code into a source code repository. CI products such as Jenkins are configured to poll changes from the code repository. Changes in the repository are downloaded to the local workspace and Jenkins triggers an automated build process, which is assisted by Ant or Maven. Automated test execution or unit testing, static code analysis, reporting, and notification of successful or failed build process are also part of the CI process. Once the build is successful, it can be deployed to different runtime environments such as testing, preproduction, production, and so on. Deploying a war file in terms of the JEE application is normally the final stage in the deployment pipeline. One of the biggest benefits of the deployment pipeline is the faster feedback cycle. Identification of issues in the application at early stages and no dependencies on manual efforts make this entire end-to-end process more effective. To read more, visit http://martinfowler.com/bliki/DeploymentPipeline.html and http://www.informit.com/articles/article.aspx?p=1621865&seqNum=2. Summary Congratulations! We reached the end of this article and hence we have Jenkins installed on our physical or virtual machine. Till now, we covered the basics of CI and the introduction to Jenkins and its features. We completed the installation of Jenkins on Windows and CentOS platforms. In addition to this, we discussed the deployment pipeline and its importance in CI. Resources for Article: Further resources on this subject: Jenkins Continuous Integration [article] Running Cucumber [article] Introduction to TeamCity [article]
Read more
  • 0
  • 0
  • 2321

article-image-using-arcpy-dataaccess-module-withfeature-classesand-tables
Packt
10 Aug 2015
28 min read
Save for later

Using the ArcPy DataAccess Module withFeature Classesand Tables

Packt
10 Aug 2015
28 min read
In this article by Eric Pimpler, author of the book Programming ArcGIS with Python Cookbook - Second Edition, we will cover the following topics: Retrieving features from a feature class with SearchCursor Filtering records with a where clause Improving cursor performance with geometry tokens Inserting rows with InsertCursor Updating rows with UpdateCursor (For more resources related to this topic, see here.) We'll start this article with a basic question. What are cursors? Cursors are in-memory objects containing one or more rows of data from a table or feature class. Each row contains attributes from each field in a data source along with the geometry for each feature. Cursors allow you to search, add, insert, update, and delete data from tables and feature classes. The arcpy data access module or arcpy.da was introduced in ArcGIS 10.1 and contains methods that allow you to iterate through each row in a cursor. Various types of cursors can be created depending on the needs of developers. For example, search cursors can be created to read values from rows. Update cursors can be created to update values in rows or delete rows, and insert cursors can be created to insert new rows. There are a number of cursor improvements that have been introduced with the arcpy data access module. Prior to the development of ArcGIS 10.1, cursor performance was notoriously slow. Now, cursors are significantly faster. Esri has estimated that SearchCursors are up to 30 times faster, while InsertCursors are up to 12 times faster. In addition to these general performance improvements, the data access module also provides a number of new options that allow programmers to speed up processing. Rather than returning all the fields in a cursor, you can now specify that a subset of fields be returned. This increases the performance as less data needs to be returned. The same applies to geometry. Traditionally, when accessing the geometry of a feature, the entire geometric definition would be returned. You can now use geometry tokens to return a portion of the geometry rather than the full geometry of the feature. You can also use lists and tuples rather than using rows. There are also other new features, such as edit sessions and the ability to work with versions, domains, and subtypes. There are three cursor functions in arcpy.da. Each returns a cursor object with the same name as the function. SearchCursor() creates a read-only SearchCursor object containing rows from a table or feature class. InsertCursor() creates an InsertCursor object that can be used to insert new records into a table or feature class. UpdateCursor() returns a cursor object that can be used to edit or delete records from a table or feature class. Each of these cursor objects has methods to access rows in the cursor. You can see the relationship between the cursor functions, the objects they create, and how they are used, as follows: Function Object created Usage SearchCursor() SearchCursor This is a read-only view of data from a table or feature class InsertCursor() InsertCursor This adds rows to a table or feature class UpdateCursor() UpdateCursor This edits or deletes rows in a table or feature class The SearchCursor() function is used to return a SearchCursor object. This object can only be used to iterate through a set of rows returned for read-only purposes. No insertions, deletions, or updates can occur through this object. An optional where clause can be set to limit the rows returned. Once you've obtained a cursor instance, it is common to iterate the records, particularly with SearchCursor or UpdateCursor. There are some peculiarities that you need to understand when navigating the records in a cursor. Cursor navigation is forward-moving only. When a cursor is created, the pointer of the cursor sits just above the first row in the cursor. The first call to next() will move the pointer to the first row. Rather than calling the next() method, you can also use a for loop to process each of the records without the need to call the next() method. After performing whatever processing you need to do with this row, a subsequent call to next() will move the pointer to row 2. This process continues as long as you need to access additional rows. However, after a row has been visited, you can't go back a single record at a time. For instance, if the current row is row 3, you can't programmatically back up to row 2. You can only go forward. To revisit rows 1 and 2, you would need to either call the reset() method or recreate the cursor and move back through the object. As I mentioned earlier, cursors are often navigated through the use of for loops as well. In fact, this is a more common way to iterate a cursor and a more efficient way to code your scripts. Cursor navigation is illustrated in the following diagram: The InsertCursor() function is used to create an InsertCursor object that allows you to programmatically add new records to feature classes and tables. To insert rows, call the insertRow() method on this object. You can also retrieve a read-only tuple containing the field names in use by the cursor through the fields property. A lock is placed on the table or feature class being accessed through the cursor. Therefore, it is important to always design your script in a way that releases the cursor when you are done. The UpdateCursor() function can be used to create an UpdateCursor object that can update and delete rows in a table or feature class. As is the case with InsertCursor, this function places a lock on the data while it's being edited or deleted. If the cursor is used inside a Python's with statement, the lock will automatically be freed after the data has been processed. This hasn't always been the case. Prior to ArcGIS 10.1, cursors were required to be manually released using the Python del statement. Once an instance of UpdateCursor has been obtained, you can then call the updateCursor() method to update records in tables or feature classes and the deleteRow() method to delete a row. The subject of data locks requires a little more explanation. The insert and update cursors must obtain a lock on the data source they reference. This means that no other application can concurrently access this data source. Locks are a way of preventing multiple users from changing data at the same time and thus, corrupting the data. When the InsertCursor() and UpdateCursor() methods are called in your code, Python attempts to acquire a lock on the data. This lock must be specifically released after the cursor has finished processing so that the running applications of other users, such as ArcMap or ArcCatalog, can access the data sources. If this isn't done, no other application will be able to access the data. Prior to ArcGIS 10.1 and the with statement, cursors had to be specifically unlocked through Python's del statement. Similarly, ArcMap and ArcCatalog also acquire data locks when updating or deleting data. If a data source has been locked by either of these applications, your Python code will not be able to access the data. Therefore, the best practice is to close ArcMap and ArcCatalog before running any standalone Python scripts that use insert or update cursors. In this article, we're going to cover the use of cursors to access and edit tables and feature classes. However, many of the cursor concepts that existed before ArcGIS 10.1 still apply. Retrieving features from a feature class with SearchCursor There are many occasions when you need to retrieve rows from a table or feature class for read-only purposes. For example, you might want to generate a list of all land parcels in a city with a value greater than $100,000. In this case, you don't have any need to edit the data. Your needs are met simply by generating a list of rows that meet some sort of criteria. A SearchCursor object contains a read-only copy of rows from a table or feature class. These objects can also be filtered through the use of a where clause so that only a subset of the dataset is returned. Getting ready The SearchCursor() function is used to return a SearchCursor object. This object can only be used to iterate a set of rows returned for read-only purposes. No insertions, deletions, or updates can occur through this object. An optional where clause can be set to limit the rows returned. In this article, you will learn how to create a basic SearchCursor object on a feature class through the use of the SearchCursor() function. The SearchCursor object contains a fields property along with the next() and reset() methods. The fields property is a read-only structure in the form of a Python tuple, containing the fields requested from the feature class or table. You are going to hear the term tuple a lot in conjunction with cursors. If you haven't covered this topic before, tuples are a Python structure to store a sequence of data similar to Python lists. However, there are some important differences between Python tuples and lists. Tuples are defined as a sequence of values inside parentheses, while lists are defined as a sequence of values inside brackets. Unlike lists, tuples can't grow and shrink, which can be a very good thing in some cases when you want data values to occupy a specific position each time. This is the case with cursor objects that use tuples to store data from fields in tables and feature classes. How to do it… Follow these steps to learn how to retrieve rows from a table or feature class inside a SearchCursor object: Open IDLE and create a new script window. Save the script as C:ArcpyBookCh8SearchCursor.py. Import the arcpy.da module: import arcpy.da Set the workspace: arcpy.env.workspace = "c:/ArcpyBook/Ch8" Use a Python with statement to create a cursor: with arcpy.da.SearchCursor("Schools.shp",("Facility","Name")) as cursor: Loop through each row in SearchCursor and print the name of the school. Make sure you indent the for loop inside the with block:for row in sorted(cursor): print("School name: " + row[1]) The entire script should appear as follows: import arcpy.da arcpy.env.workspace = "c:/ArcpyBook/Ch8" with arcpy.da.SearchCursor("Schools.shp",("Facility","Name")) as cursor:    for row in sorted(cursor):        print("School name: " + row[1]) Save the script. You can check your work by examining the C:ArcpyBookcodeCh8SearchCursor_Step1.py solution file. Run the script. You should see the following output: School name: ALLAN School name: ALLISON School name: ANDREWS School name: BARANOFF School name: BARRINGTON School name: BARTON CREEK School name: BARTON HILLS School name: BATY School name: BECKER School name: BEE CAVE How it works… The with statement used with the SearchCursor() function will create, open, and close the cursor. So, you no longer have to be concerned with explicitly releasing the lock on the cursor as you did prior to ArcGIS 10.1. The first parameter passed into the SearchCursor() function is a feature class, represented by the Schools.shp file. The second parameter is a Python tuple containing a list of fields that we want returned in the cursor. For performance reasons, it is a best practice to limit the fields returned in the cursor to only those that you need to complete the task. Here, we've specified that only the Facility and Name fields should be returned. The SearchCursor object is stored in a variable called cursor. Inside the with block, we use a Python for loop to loop through each school returned. We also use the Python sorted() function to sort the contents of the cursor. To access the values from a field on the row, simply use the index number of the field you want to return. In this case, we want to return the contents of the Name column, which will be the 1 index number, since it is the second item in the tuple of field names that are returned. Filtering records with a where clause By default, SearchCursor will contain all rows in a table or feature class. However, in many cases, you will want to restrict the number of rows returned by some sort of criteria. Applying a filter through the use of a where clause limits the records returned. Getting ready By default, all rows from a table or feature class will be returned when you create a SearchCursor object. However, in many cases, you will want to restrict the records returned. You can do this by creating a query and passing it as a where clause parameter when calling the SearchCursor() function. In this article, you'll build on the script you created in the previous article by adding a where clause that restricts the records returned. How to do it… Follow these steps to apply a filter to a SearchCursor object that restricts the rows returned from a table or feature class: Open IDLE and load the SearchCursor.py script that you created in the previous recipe. Update the SearchCursor() function by adding a where clause that queries the facility field for records that have the HIGH SCHOOL text: with arcpy.da.SearchCursor("Schools.shp",("Facility","Name"), '"FACILITY" = 'HIGH SCHOOL'') as cursor: You can check your work by examining the C:ArcpyBookcodeCh8SearchCursor_Step2.py solution file. Save and run the script. The output will now be much smaller and restricted to high schools only: High school name: AKINS High school name: ALTERNATIVE LEARNING CENTER High school name: ANDERSON High school name: AUSTIN High school name: BOWIE High school name: CROCKETT High school name: DEL VALLE High school name: ELGIN High school name: GARZA High school name: HENDRICKSON High school name: JOHN B CONNALLY High school name: JOHNSTON High school name: LAGO VISTA How it works… The where clause parameter accepts any valid SQL query, and is used in this case to restrict the number of records that are returned. Improving cursor performance with geometry tokens Geometry tokens were introduced in ArcGIS 10.1 as a performance improvement for cursors. Rather than returning the entire geometry of a feature inside the cursor, only a portion of the geometry is returned. Returning the entire geometry of a feature can result in decreased cursor performance due to the amount of data that has to be returned. It's significantly faster to return only the specific portion of the geometry that is needed. Getting ready A token is provided as one of the fields in the field list passed into the constructor for a cursor and is in the SHAPE@<Part of Feature to be Returned> format. The only exception to this format is the OID@ token, which returns the object ID of the feature. The following code example retrieves only the X and Y coordinates of a feature: with arcpy.da.SearchCursor(fc, ("SHAPE@XY","Facility","Name")) as cursor: The following table lists the available geometry tokens. Not all cursors support the full list of tokens. Check the ArcGIS help files for information about the tokens supported by each cursor type. The SHAPE@ token returns the entire geometry of the feature. Use this carefully though, because it is an expensive operation to return the entire geometry of a feature and can dramatically affect performance. If you don't need the entire geometry, then do not include this token! In this article, you will use a geometry token to increase the performance of a cursor. You'll retrieve the X and Y coordinates of each land parcel from the parcels feature class along with some attribute information about the parcel. How to do it… Follow these steps to add a geometry token to a cursor, which should improve the performance of this object: Open IDLE and create a new script window. Save the script as C:ArcpyBookCh8GeometryToken.py. Import the arcpy.da module and the time module: import arcpy.da import time Set the workspace: arcpy.env.workspace = "c:/ArcpyBook/Ch8" We're going to measure how long it takes to execute the code using a geometry token. Add the start time for the script: start = time.clock() Use a Python with statement to create a cursor that includes the centroid of each feature as well as the ownership information stored in the PY_FULL_OW field: with arcpy.da.SearchCursor("coa_parcels.shp",("PY_FULL_OW","SHAPE@XY")) as cursor: Loop through each row in SearchCursor and print the name of the parcel and location. Make sure you indent the for loop inside the with block: for row in cursor: print("Parcel owner: {0} has a location of: {1}".format(row[0], row[1])) Measure the elapsed time: elapsed = (time.clock() - start) Print the execution time: print("Execution time: " + str(elapsed)) The entire script should appear as follows: import arcpy.da import time arcpy.env.workspace = "c:/ArcpyBook/Ch9" start = time.clock() with arcpy.da.SearchCursor("coa_parcels.shp",("PY_FULL_OW", "SHAPE@XY")) as cursor:    for row in cursor:        print("Parcel owner: {0} has a location of: {1}".format(row[0], row[1])) elapsed = (time.clock() - start) print("Execution time: " + str(elapsed)) You can check your work by examining the C:ArcpyBookcodeCh8GeometryToken.py solution file. Save the script. Run the script. You should see something similar to the following output. Note the execution time; your time will vary: Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: (3110480.5197341456, 10070911.174956793) Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: (3110670.413783513, 10070800.960865) Parcel owner: CITY OF AUSTIN has a location of: (3143925.0013213265, 10029388.97419636) Parcel owner: CITY OF AUSTIN % DOROTHY NELL ANDERSON ATTN BARRY LEE ANDERSON has a location of: (3134432.983822767, 10072192.047894118) Execution time: 9.08046185109 Now, we're going to measure the execution time if the entire geometry is returned instead of just the portion of the geometry that we need: Save a new copy of the script as C:ArcpyBookCh8GeometryTokenEntireGeometry.py. Change the SearchCursor() function to return the entire geometry using SHAPE@ instead of SHAPE@XY: with arcpy.da.SearchCursor("coa_parcels.shp",("PY_FULL_OW", "SHAPE@")) as cursor: You can check your work by examining the C:ArcpyBookcodeCh8GeometryTokenEntireGeometry.py solution file. Save and run the script. You should see the following output. Your time will vary from mine, but notice that the execution time is slower. In this case, it's only a little over a second slower, but we're only returning 2600 features. If the feature class were significantly larger, as many are, this would be amplified: Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: <geoprocessing describe geometry object object at 0x06B9BE00> Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: <geoprocessing describe geometry object object at 0x2400A700> Parcel owner: CITY OF AUSTIN has a location of: <geoprocessing describe geometry object object at 0x06B9BE00> Parcel owner: CITY OF AUSTIN % DOROTHY NELL ANDERSON ATTN BARRY LEE ANDERSON has a location of: <geoprocessing describe geometry object object at 0x2400A700> Execution time: 10.1211390896 How it works… A geometry token can be supplied as one of the field names supplied in the constructor for the cursor. These tokens are used to increase the performance of a cursor by returning only a portion of the geometry instead of the entire geometry. This can dramatically increase the performance of a cursor, particularly when you are working with large polyline or polygon datasets. If you only need specific properties of the geometry in your cursor, you should use these tokens. Inserting rows with InsertCursor You can insert a row into a table or feature class using an InsertCursor object. If you want to insert attribute values along with the new row, you'll need to supply the values in the order found in the attribute table. Getting ready The InsertCursor() function is used to create an InsertCursor object that allows you to programmatically add new records to feature classes and tables. The insertRow() method on the InsertCursor object adds the row. A row in the form of a list or tuple is passed into the insertRow() method. The values in the list must correspond to the field values defined when the InsertCursor object was created. Similar to instances that include other types of cursors, you can also limit the field names returned using the second parameter of the method. This function supports geometry tokens as well. The following code example illustrates how you can use InsertCursor to insert new rows into a feature class. Here, we insert two new wildfire points into the California feature class. The row values to be inserted are defined in a list variable. Then, an InsertCursor object is created, passing in the feature class and fields. Finally, the new rows are inserted into the feature class by using the insertRow() method: rowValues = [(Bastrop','N',3000,(-105.345,32.234)), ('Ft Davis','N', 456, (-109.456,33.468))] fc = "c:/data/wildfires.gdb/California" fields = ["FIRE_NAME", "FIRE_CONTAINED", "ACRES", "SHAPE@XY"] with arcpy.da.InsertCursor(fc, fields) as cursor: for row in rowValues:    cursor.insertRow(row) In this article, you will use InsertCursor to add wildfires retrieved from a .txt file into a point feature class. When inserting rows into a feature class, you will need to know how to add the geometric representation of a feature into the feature class. This can be accomplished by using InsertCursor along with two miscellaneous objects: Array and Point. In this exercise, we will add point features in the form of wildfire incidents to an empty point feature class. In addition to this, you will use Python file manipulation techniques to read the coordinate data from a text file. How to do it… We will be importing the North American wildland fire incident data from a single day in October, 2007. This data is contained in a comma-delimited text file containing one line for each fire incident on this particular day. Each fire incident has a latitude, longitude coordinate pair separated by commas along with a confidence value. This data was derived by automated methods that use remote sensing data to derive the presence or absence of a wildfire. Confidence values can range from 0 to 100. Higher numbers represent a greater confidence that this is indeed a wildfire: Open the file at C:ArcpyBookCh8Wildfire DataNorthAmericaWildfire_2007275.txt and examine the contents.You will notice that this is a simple comma-delimited text file containing the longitude and latitude values for each fire along with a confidence value. We will use Python to read the contents of this file line by line and insert new point features into the FireIncidents feature class located in the C:ArcpyBookCh8 WildfireDataWildlandFires.mdb personal geodatabase. Close the file. Open ArcCatalog. Navigate to C:ArcpyBookCh8WildfireData.You should see a personal geodatabase called WildlandFires. Open this geodatabase and you will see a point feature class called FireIncidents. Right now, this is an empty feature class. We will add features by reading the text file you examined earlier and inserting points. Right-click on FireIncidents and select Properties. Click on the Fields tab.The latitude/longitude values found in the file we examined earlier will be imported into the SHAPE field and the confidence values will be written to the CONFIDENCEVALUE field. Open IDLE and create a new script. Save the script to C:ArcpyBookCh8InsertWildfires.py. Import the arcpy modules: import arcpy Set the workspace: arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb" Open the text file and read all the lines into a list: f = open("C:/ArcpyBook/Ch8/WildfireData/NorthAmericaWildfires_2007275.txt","r") lstFires = f.readlines() Start a try block: try: Create an InsertCursor object using a with block. Make sure you indent inside the try statement. The cursor will be created in the FireIncidents feature class: with arcpy.da.InsertCursor("FireIncidents",("SHAPE@XY","CONFIDENCEVALUE")) as cur: Create a counter variable that will be used to print the progress of the script: cntr = 1 Loop through the text file line by line using a for loop. Since the text file is comma-delimited, we'll use the Python split() function to separate each value into a list variable called vals. We'll then pull out the individual latitude, longitude, and confidence value items and assign them to variables. Finally, we'll place these values into a list variable called rowValue, which is then passed into the insertRow() function for the InsertCursor object, and we then print a message: for fire in lstFires:      if 'Latitude' in fire:        continue      vals = fire.split(",")      latitude = float(vals[0])      longitude = float(vals[1])      confid = int(vals[2])      rowValue = [(longitude,latitude),confid]      cur.insertRow(rowValue)      print("Record number " + str(cntr) + " written to feature class")      #arcpy.AddMessage("Record number" + str(cntr) + " written to feature class")      cntr = cntr + 1 Add the except block to print any errors that may occur: except Exception as e: print(e.message) Add a finally block to close the text file: finally: f.close() The entire script should appear as follows: import arcpy   arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb" f = open("C:/ArcpyBook/Ch8/WildfireData/NorthAmericaWildfires_2007275.txt","r") lstFires = f.readlines() try: with arcpy.da.InsertCursor("FireIncidents", ("SHAPE@XY","CONFIDENCEVALUE")) as cur:    cntr = 1    for fire in lstFires:      if 'Latitude' in fire:        continue      vals = fire.split(",")      latitude = float(vals[0])      longitude = float(vals[1])      confid = int(vals[2])      rowValue = [(longitude,latitude),confid]      cur.insertRow(rowValue)      print("Record number " + str(cntr) + " written to feature class")      #arcpy.AddMessage("Record number" + str(cntr) + "       written to feature class")      cntr = cntr + 1 except Exception as e: print(e.message) finally: f.close() You can check your work by examining the C:ArcpyBookcodeCh8InsertWildfires.py solution file. Save and run the script. You should see messages being written to the output window as the script runs: Record number: 406 written to feature class Record number: 407 written to feature class Record number: 408 written to feature class Record number: 409 written to feature class Record number: 410 written to feature class Record number: 411 written to feature class Open ArcMap and add the FireIncidents feature class to the table of contents. The points should be visible, as shown in the following screenshot: You may want to add a basemap to provide some reference for the data. In ArcMap, click on the Add Basemap button and select a basemap from the gallery. How it works… Some additional explanation may be needed here. The lstFires variable contains a list of all the wildfires that were contained in the comma-delimited text file. The for loop will loop through each of these records one by one, inserting each individual record into the fire variable. We also include an if statement that is used to skip the first record in the file, which serves as the header. As I explained earlier, we then pull out the individual latitude, longitude, and confidence value items from the vals variable, which is just a Python list object and assign them to variables called latitude, longitude, and confid. We then place these values into a new list variable called rowValue in the order that we defined when we created InsertCursor. Thus, the latitude and longitude pair should be placed first followed by the confidence value. Finally, we call the insertRow() function on the InsertCursor object assigned to the cur variable, passing in the new rowValue variable. We close by printing a message that indicates the progress of the script and also create the except and finally blocks to handle errors and close the text file. Placing the file.close() method in the finally block ensures that it will execute and close the file even if there is an error in the previous try statement. Updating rows with UpdateCursor If you need to edit or delete rows from a table or feature class, you can use UpdateCursor. As is the case with InsertCursor, the contents of UpdateCursor can be limited through the use of a where clause. Getting ready The UpdateCursor() function can be used to either update or delete rows in a table or feature class. The returned cursor places a lock on the data, which will automatically be released if used inside a Python with statement. An UpdateCursor object is returned from a call to this method. The UpdateCursor object places a lock on the data while it's being edited or deleted. If the cursor is used inside a Python with statement, the lock will automatically be freed after the data has been processed. This hasn't always been the case. Previous versions of cursors were required to be manually released using the Python del statement. Once an instance of UpdateCursor has been obtained, you can then call the updateCursor() method to update records in tables or feature classes and the deleteRow() method can be used to delete a row. In this article, you're going to write a script that updates each feature in the FireIncidents feature class by assigning a value of poor, fair, good, or excellent to a new field that is more descriptive of the confidence values using an UpdateCursor. Prior to updating the records, your script will add the new field to the FireIncidents feature class. How to do it… Follow these steps to create an UpdateCursor object that will be used to edit rows in a feature class: Open IDLE and create a new script. Save the script to C:ArcpyBookCh8UpdateWildfires.py. Import the arcpy module: import arcpy Set the workspace: arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb" Start a try block: try: Add a new field called CONFID_RATING to the FireIncidents feature class. Make sure to indent inside the try statement: arcpy.AddField_management("FireIncidents","CONFID_RATING", "TEXT","10") print("CONFID_RATING field added to FireIncidents") Create a new instance of UpdateCursor inside a with block: with arcpy.da.UpdateCursor("FireIncidents", ("CONFIDENCEVALUE","CONFID_RATING")) as cursor: Create a counter variable that will be used to print the progress of the script. Make sure you indent this line of code and all the lines of code that follow inside the with block: cntr = 1 Loop through each of the rows in the FireIncidents fire class. Update the CONFID_RATING field according to the following guidelines:     Confidence value 0 to 40 = POOR     Confidence value 41 to 60 = FAIR     Confidence value 61 to 85 = GOOD     Confidence value 86 to 100 = EXCELLENT This can be translated in the following block of code:    for row in cursor:      # update the confid_rating field      if row[0] <= 40:        row[1] = 'POOR'      elif row[0] > 40 and row[0] <= 60:        row[1] = 'FAIR'      elif row[0] > 60 and row[0] <= 85:        row[1] = 'GOOD'      else:        row[1] = 'EXCELLENT'      cursor.updateRow(row)                       print("Record number " + str(cntr) + " updated")      cntr = cntr + 1 Add the except block to print any errors that may occur: except Exception as e: print(e.message) The entire script should appear as follows: import arcpy   arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb" try: #create a new field to hold the values arcpy.AddField_management("FireIncidents", "CONFID_RATING","TEXT","10") print("CONFID_RATING field added to FireIncidents") with arcpy.da.UpdateCursor("FireIncidents",("CONFIDENCEVALUE", "CONFID_RATING")) as cursor:    cntr = 1    for row in cursor:      # update the confid_rating field      if row[0] <= 40:        row[1] = 'POOR'      elif row[0] > 40 and row[0] <= 60:        row[1] = 'FAIR'      elif row[0] > 60 and row[0] <= 85:        row[1] = 'GOOD'      else:        row[1] = 'EXCELLENT'      cursor.updateRow(row)                       print("Record number " + str(cntr) + " updated")      cntr = cntr + 1 except Exception as e: print(e.message) You can check your work by examining the C:ArcpyBookcodeCh8UpdateWildfires.py solution file. Save and run the script. You should see messages being written to the output window as the script runs: Record number 406 updated Record number 407 updated Record number 408 updated Record number 409 updated Record number 410 updated Open ArcMap and add the FireIncidents feature class. Open the attribute table and you should see that a new CONFID_RATING field has been added and populated by UpdateCursor: When you insert, update, or delete data in cursors, the changes are permanent and can't be undone if you're working outside an edit session. However, with the new edit session functionality provided by ArcGIS 10.1, you can now make these changes inside an edit session to avoid these problems. We'll cover edit sessions soon. How it works… In this case, we've used UpdateCursor to update each of the features in a feature class. We first used the Add Field tool to add a new field called CONFID_RATING, which will hold new values that we assign based on values found in another field. The groups are poor, fair, good, and excellent and are based on numeric values found in the CONFIDENCEVALUE field. We then created a new instance of UpdateCursor based on the FireIncidents feature class, and returned the two fields mentioned previously. The script then loops through each of the features and assigns a value of poor, fair, good, or excellent to the CONFID_RATING field (row[1]), based on the numeric value found in CONFIDENCEVALUE. A Python if/elif/else structure is used to control the flow of the script based on the numeric value. The value for CONFID_RATING is then committed to the feature class by passing the row variable into the updateRow() method. Summary In this article we studied, how to retrieve features from a feature class with SerchCursor, filtering records with a where clause, improving cursr performance with geoetry tokens, inserting rows with InsertCursor and updating rows with UpdateCursor. Resources for Article: Further resources on this subject: Adding Graphics to the Map [article] Introduction to Mobile Web ArcGIS Development [article] Python functions – Avoid repeating code [article]
Read more
  • 0
  • 0
  • 7055

article-image-integrating-muzzley
Packt
10 Aug 2015
12 min read
Save for later

Integrating with Muzzley

Packt
10 Aug 2015
12 min read
In this article by Miguel de Sousa, author of the book Internet of Things with Intel Galileo, we will cover the following topics: Wiring the circuit The Muzzley IoT ecosystem Creating a Muzzley app Lighting up the entrance door (For more resources related to this topic, see here.) One identified issue regarding IoT is that there will be lots of connected devices and each one speaks its own language, not sharing the same protocols with other devices. This leads to an increasing number of apps to control each of those devices. Every time you purchase connected products, you'll be required to have the exclusive product app, and, in the near future, where it is predicted that more devices will be connected to the Internet than people, this is indeed a problem, which is known as the basket of remotes. Many solutions have been appearing for this problem. Some of them focus on creating common communication standards between the devices or even creating their own protocol such as the Intel Common Connectivity Framework (CCF). A different approach consists in predicting the device's interactions, where collected data is used to predict and trigger actions on the specific devices. An example using this approach is Muzzley. It not only supports a common way to speak with the devices, but also learns from the users' interaction, allowing them to control all their devices from a common app, and on collecting usage data, it can predict users' actions and even make different devices work together. In this article, we will start by understanding what Muzzley is and how we can integrate with it. We will then do some development to allow you to control your own building's entrance door. For this purpose, we will use Galileo as a bridge to communicate with a relay and the Muzzley cloud, allowing you to control the door from a common mobile app and from anywhere as long as there is Internet access. Wiring the circuit In this article, we'll be using a real home AC inter-communicator with a building entrance door unlock button and this will require you to do some homework. This integration will require you to open your inter-communicator and adjust the inner circuit, so be aware that there are always risks of damaging it. If you don't want to use a real inter-communicator, you can replace it by an LED or even by the buzzer module. If you want to use a real device, you can use a DC inter-communicator, but in this guide, we'll only be explaining how to do the wiring using an AC inter-communicator. The first thing you have to do is to take a look at the device manual and check whether it works with AC current, and the voltage it requires. If you can't locate your product manual, search for it online. In this article, we'll be using the solid state relay. This relay accepts a voltage range from 24 V up to 380 V AC, and your inter-communicator should also work in this voltage range. You'll also need some electrical wires and electrical wires junctions: Wire junctions and the solid state relay This equipment will be used to adapt the door unlocking circuit to allow it to be controlled from the Galileo board using a relay. The main idea is to use a relay to close the door opener circuit, resulting in the door being unlocked. This can be accomplished by joining the inter-communicator switch wires with the relay wires. Use some wire and wire junctions to do it, as displayed in the following image: Wiring the circuit The building/house AC circuit is represented in yellow, and S1 and S2 represent the inter-communicator switch (button). On pressing the button, we will also be closing this circuit, and the door will be unlocked. This way, the lock can be controlled both ways, using the original button and the relay. Before starting to wire the circuit, make sure that the inter-communicator circuit is powered off. If you can't switch it off, you can always turn off your house electrical board for a couple of minutes. Make sure that it is powered off by pressing the unlock button and trying to open the door. If you are not sure of what you must do or don't feel comfortable doing it, ask for help from someone more experienced. Open your inter-communicator, locate the switch, and perform the changes displayed in the preceding image (you may have to do some soldering). The Intel Galileo board will then activate the relay using pin 13, where you should wire it to the relay's connector number 3, and the Galileo's ground (GND) should be connected to the relay's connector number 4. Beware that not all the inter-communicator circuits work the same way and although we try to provide a general way to do it, there're always the risk of damaging your device or being electrocuted. Do it at your own risk. Power on your inter-communicator circuit and check whether you can open the door by pressing the unlock door button. If you prefer not using the inter-communicator with the relay, you can always replace it with a buzzer or an LED to simulate the door opening. Also, since the relay is connected to Galileo's pin 13, with the same relay code, you'll have visual feedback from the Galileo's onboard LED. The Muzzley IoT ecosystem Muzzley is an Internet of Things ecosystem that is composed of connected devices, mobile apps, and cloud-based services. Devices can be integrated with Muzzley through the device cloud or the device itself: It offers device control, a rules system, and a machine learning system that predicts and suggests actions, based on the device usage. The mobile app is available for Android, iOS, and Windows phone. It can pack all your Internet-connected devices in to a single common app, allowing them to be controlled together, and to work with other devices that are available in real-world stores or even other homemade connected devices, like the one we will create in this article. Muzzley is known for being one of the first generation platforms with the ability to predict a users' actions by learning from the user's interaction with their own devices. Human behavior is mostly unpredictable, but for convenience, people end up creating routines in their daily lives. The interaction with home devices is an example where human behavior can be observed and learned by an automated system. Muzzley tries to take advantage of these behaviors by identifying the user's recurrent routines and making suggestions that could accelerate and simplify the interaction with the mobile app and devices. Devices that don't know of each others' existence get connected through the user behavior and may create synergies among themselves. When the user starts using the Muzzley app, the interaction is observed by a profiler agent that tries to acquire a behavioral network of the linked cause-effect events. When the frequency of these network associations becomes important enough, the profiler agent emits a suggestion for the user to act upon. For instance, if every time a user arrives home, he switches on the house lights, check the thermostat, and adjust the air conditioner accordingly, the profiler agent will emit a set of suggestions based on this. The cause of the suggestion is identified and shortcuts are offered for the effect-associated action. For instance, the user could receive in the Muzzley app the following suggestions: "You are arriving at a known location. Every time you arrive here, you switch on the «Entrance bulb». Would you like to do it now?"; or "You are arriving at a known location. The thermostat «Living room» says that the temperature is at 15 degrees Celsius. Would you like to set your «Living room» air conditioner to 21 degrees Celsius?" When it comes to security and privacy, Muzzley takes it seriously and all the collected data is used exclusively to analyze behaviors to help make your life easier. This is the system where we will be integrating our door unlocker. Creating a Muzzley app The first step is to own a Muzzley developer account. If you don't have one yet, you can obtain one by visiting https://muzzley.com/developers, clicking on the Sign up button, and submitting the displayed form. To create an app, click on the top menu option Apps and then Create app. Name your App Galileo Lock and if you want to, add a description to your project. As soon as you click on Submit, you'll see two buttons displayed, allowing you to select the integration type: Muzzley allows you to integrate through the product manufacturer cloud or directly with a device. In this example, we will be integrating directly with the device. To do so, click on Device to Cloud integration. Fill in the provider name as you wish and pick two image URLs to be used as the profile (for example, http://hub.packtpub.com/wp-content/uploads/2015/08/Commercial1.jpg) and channel (for example, http://hub.packtpub.com/wp-content/uploads/2015/08/lock.png) images. We can select one of two available ways to add our device: it can be done using UPnP discovery or by inserting a custom serial number. Pick the device discovery option Serial number and ignore the fields Interface UUID and Email Access List; we will come back for them later. Save your changes by pressing the Save changes button. Lighting up the entrance door Now that we can unlock our door from anywhere using the mobile phone with an Internet connection, a nice thing to have is the entrance lights turn on when you open the building door using your Muzzley app. To do this, you can use the Muzzley workers to define rules to perform an action when the door is unlocked using the mobile app. To do this, you'll need to own one of the Muzzley-enabled smart bulbs such as Philips Hue, WeMo LED Lighting, Milight, Easybulb, or LIFX. You can find all the enabled devices in the app profiles selection list: If you don't have those specific lighting devices but have another type of connected device, search the available list to see whether it is supported. If it is, you can use that instead. Add your bulb channel to your account. You should now find it listed in your channels under the category Lighting. If you click on it, you'll be able to control the lights. To activate the trigger option in the lock profile we created previously, go to the Muzzley website and head back to the Profile Spec app, located inside App Details. Expand the property lock status by clicking on the arrow sign in the property #1 - Lock Status section and then expand the controlInterfaces section. Create a new control interface by clicking on the +controlInterface button. In the new controlInterface #1 section, we'll need to define the possible choices of label-values for this property when setting a rule. Feel free to insert an id, and in the control interface option, select the text-picker option. In the config field, we'll need to specify each of the available options, setting the display label and the real value that will be published. Insert the following JSON object: {"options":[{"value":"true","label":"Lock"}, {"value":"false","label":"Unlock"}]}. Now we need to create a trigger. In the profile spec, expand the trigger section. Create a new trigger by clicking on the +trigger button. Inside the newly created section, select the equals condition. Create an input by clicking on +input, insert the ID value, insert the ID of the control interface you have just created in the controlInterfaceId text field. Finally, add the [{"source":"selection.value","target":"data.value"}].path to map the data. Open your mobile app and click on the workers icon. Clicking on Create Worker will display the worker creation menu to you. Here, you'll be able to select a channel component property as a trigger to some other channel component property: Select the lock and select the Lock Status is equal to Unlock trigger. Save it and select the action button. In here, select the smart bulb you own and select the Status On option: After saving this rule, give it a try and use your mobile phone to unlock the door. The smart bulb should then turn on. With this, you can configure many things in your home even before you arrive there. In this specific scenario, we used our door locker as a trigger to accomplish an action on a lightbulb. If you want, you can do the opposite and open the door when a lightbulb lights up a specific color for instance. To do it, similar to how you configured your device trigger, you just have to set up the action options in your device profile page. Summary Everyday objects that surround us are being transformed into information ecosystems and the way we interact with them is slowly changing. Although IoT is growing up fast, it is nowadays in an early stage, and many issues must be solved in order to make it successfully scalable. By 2020, it is estimated that there will be more than 25 billion devices connected to the Internet. This fast growth without security regulations and deep security studies are leading to major concerns regarding the two biggest IoT challenges—security and privacy. Devices in our home that are remotely controllable or even personal data information getting into the wrong hands could be the recipe for a disaster. In this article you have learned the basic steps in wiring the circuit of your Galileo board, creating a Muzzley app, and lighting up the entrance door of your building through your Muzzley app, by using Intel Galileo board as a bridge to communicate with Muzzley cloud. Resources for Article: Further resources on this subject: Getting Started with Intel Galileo [article] Getting the current weather forecast [article] Controlling DC motors using a shield [article]
Read more
  • 0
  • 0
  • 8991
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at €18.99/month. Cancel anytime
article-image-controls-and-widgets
Packt
10 Aug 2015
25 min read
Save for later

Controls and Widgets

Packt
10 Aug 2015
25 min read
In this article by Chip Lambert and Shreerang Patwardhan, author of the book, Mastering jQuery Mobile, we will take our Civic Center application to the next level and in the process of doing so, we will explore different widgets. We will explore the touch events provided by the jQuery Mobile framework further and then take a look at how this framework interacts with third-party plugins. We will be covering the following different widgets and topics in this article: Collapsible widget Listview widget Range slider widget Radio button widget Touch events Third-party plugins HammerJs FastClick Accessibility (For more resources related to this topic, see here.) Widgets We already made use of widgets as part of the Civic Center application. "Which? Where? When did that happen? What did I miss?" Don't panic as you have missed nothing at all. All the components that we use as part of the jQuery Mobile framework are widgets. The page, buttons, and toolbars are all widgets. So what do we understand about widgets from their usage so far? One thing is pretty evident, widgets are feature-rich and they have a lot of things that are customizable and that can be tweaked as per the requirements of the design. These customizable things are pretty much the methods and events that these small plugins offer to the developers. So all in all: Widgets are feature rich, stateful plugins that have a complete lifecycle, along with methods and events. We will now explore a few widgets as discussed before and we will start off with the collapsible widget. A collapsible widget, more popularly known as the accordion control, is used to display and style a cluster of related content together to be easily accessible to the user. Let's see this collapsible widget in action. Pull up the index.html file. We will be adding the collapsible widget to the facilities page. You can jump directly to the content div of the facilities page. We will replace the simple-looking, unordered list and add the collapsible widget in its place. Add the following code in place of the <ul>...<li></li>...</ul> portion: <div data-role="collapsibleset"> <div data-role="collapsible"> <h3>Banquet Halls</h3> <p>List of banquet halls will go here</p> </div> <div data-role="collapsible"> <h3>Sports Arena</h3> <p>List of sports arenas will go here</p> </div> <div data-role="collapsible">    <h3>Conference Rooms</h3> <p>List of conference rooms will come here</p> </div> <div data-role="collapsible"> <h3>Ballrooms</h3> <p>List of ballrooms will come here</p> </div> </div> That was pretty simple. As you must have noticed, we are creating a group of collapsibles defined by div with data-role="collapsibleset". Inside this div, we have multiple div elements each with data-role of "collapsible". These data roles instruct the framework to style div as a collapsible. Let's break individual collapsibles further. Each collapsible div has to have a heading tag (h1-h6), which acts as the title for that collapsible. This heading can be followed by any HTML structure that is required as per your application's design. In our application, we added a paragraph tag with some dummy text for now. We will soon be replacing this text with another widget—listview. Before we proceed to look at how we will be doing this, let's see what the facilities page is looking like right now: Now let's take a look at another widget that we will include in our project—the listview widget. The listview widget is a very important widget from the mobile website stand point. The listview widget is highly customizable and can play an important role in the navigation system of your web application as well. In our application, we will include listview within the collapsible div elements that we have just created. Each collapsible will hold the relevant list items which can be linked to a detailed page for each item. Without further discussion, let's take a look at the following code. We have replaced the contents of the first collapsible list item within the paragraph tag with the code to include the listview widget. We will break up the code and discuss the minute details later: <div data-role="collapsible"> <h3>Banquet Halls</h3> <p> <span>We have 3 huge banquet halls named after 3 most celebrated Chef's from across the world.</span> <ul data-role="listview" data-inset="true"> <li> <a href="#">Gordon Ramsay</a> </li> <li> <a href="#">Anthony Bourdain</a> </li> <li> <a href="#">Sanjeev Kapoor</a> </li> </ul> </p> </div> That was pretty simple, right? We replaced the dummy text from the paragraph tag with a span that has some details concerning what that collapsible list is about, and then we have an unordered list with data-role="listview" and some property called data-inset="true". We have seen several data-roles before, and this one is no different. This data-role attribute informs the framework to style the unordered list, such as a tappable button, while a data-inset property informs the framework to apply the inset appearance to the list items. Without this property, the list items would stretch from edge to edge on the mobile device. Try setting the data-inset property to false or removing the property altogether. You will see the results for yourself. Another thing worth noticing in the preceding code is that we have included an anchor tag within the li tags. This anchor tag informs the framework to add a right arrow icon on the extreme right of that list item. Again, this icon is customizable, along with its position and other styling attributes. Right now, our facilities page should appear as seen in the following image: We will now add similar listview widgets within the remaining three collapsible items. The content for the next collapsible item titled Sports Arena should be as follows. Once added, this collapsible item, when expanded, should look as seen in the screenshot that follows the code: <div data-role="collapsible">    <h3>Sports Arena</h3>    <p>        <span>We have 3 huge sport arenas named after 3 most celebrated sport personalities from across the world.       </span>        <ul data-role="listview" data-inset="true">            <li>                <a href="#">Sachin Tendulkar</a>            </li>            <li>                <a href="#">Roger Federer</a>            </li>            <li>                <a href="#">Usain Bolt</a>            </li>        </ul>    </p> </div> The code for the listview widgets that should be included in the next collapsible item titled Conference Rooms. Once added, this collapsible, item when expanded, should look as seen in the image that follows the code: <div data-role="collapsible">    <h3>Conference Rooms</h3>    <p>        <span>            We have 3 huge conference rooms named after 3 largest technology companies.        </span>        <ul data-role="listview" data-inset="true">            <li>                <a href="#">Google</a>            </li>            <li>                <a href="#">Twitter</a>            </li>            <li>                <a href="#">Facebook</a>            </li>        </ul>    </p> </div> The final collapsible list item – Ballrooms – should hold the following code, to include its share of the listview items: <div data-role="collapsible">    <h3>Ballrooms</h3>    <p>        <span>            We have 3 huge ball rooms named after 3 different dance styles from across the world.        </span>        <ul data-role="listview" data-inset="true">            <li>                <a href="#">Ballet</a>            </li>            <li>                <a href="#">Kathak</a>            </li>            <li>                <a href="#">Paso Doble</a>            </li>        </ul>    </p> </div> After adding these listview items, our facilities page should look as seen in the following image: The facilities page now looks much better than it did earlier, and we now understand a couple more very important widgets available in jQuery Mobile—the collapsible widget and the listview Widget. We will now explore two form widgets – slider widget and the radio buttons widget. For this, we will be enhancing our catering page. Let's build a simple tool that will help the visitors of this site estimate the food expense based on the number of guests and the type of cuisine that they choose. Let's get started then. First, we will add the required HTML, to include the slider widget and the radio buttons widget. Scroll down to the content div of the catering page, where we have the paragraph tag containing some text about the Civic Center's catering services. Add the following code after the paragraph tag: <form>    <label style="font-weight: bold; padding: 15px 0px;" for="slider">Number of guests</label>    <input type="range" name="slider" id="slider" data-highlight="true" min="50" max="1000" value="50">    <fieldset data-role="controlgroup" id="cuisine-choices">        <legend style="font-weight: bold; padding: 15px 0px;">Choose your cuisine</legend>        <input type="radio" name="cuisine-choice" id="cuisine-choice-cont" value="15" checked="checked" />        <label for="cuisine-choice-cont">Continental</label>        <input type="radio" name="cuisine-choice" id="cuisine-choice-mex" value="12" />        <label for="cuisine-choice-mex">Mexican</label>        <input type="radio" name="cuisine-choice" id="cuisine-choice-ind" value="14" />        <label for="cuisine-choice-ind">Indian</label>    </fieldset>    <p>        The approximate cost will be: <span style="font-weight: bold;" id="totalCost"></span>    </p> </form> That is not much code, but we are adding and initializing two new form widgets here. Let's take a look at the code in detail: <label style="font-weight: bold; padding: 15px 0px;" for="slider">Number of guests</label> <input type="range" name="slider" id="slider" data-highlight="true" min="50" max="1000" value="50"> We are initializing our first form widget here—the slider widget. The slider widget is an input element of the type range, which accepts a minimum value and maximum value and a default value. We will be using this slider to accept the number of guests. Since the Civic Center can cater to a maximum of 1,000 people, we will set the maximum limit to 1,000 and we expect that we have at least 50 guests, so we set a minimum value of 50. Since the minimum number of guests that we cater for is 50, we set the input's default value to 50. We also set the data-highlight attribute value to true, which informs the framework that the selected area on the slider should be highlighted. Next comes the group of radio buttons. The most important attribute to be considered here is the data-role="controlgroup" set on the fieldset element. Adding this data-role combines the radio buttons into one single group, which helps inform the user that one of the radio buttons is to be selected. This gives a visual indication to the user that one radio button out of the whole lot needs to be selected. The values assigned to each of the radio inputs here indicate the cost per person for that particular cuisine. This value will help us calculate the final dollar value for the number of selected guests and the type of cuisine. Whenever you are using the form widgets, make sure you have the form elements in the hierarchy as required by the jQuery Mobile framework. When the elements are in the required hierarchy, the framework can apply the required styles. At the end of the previous code snippet, we have a paragraph tag where we will populate the approximate cost of catering for the selected number of guests and the type of cuisine selected. The catering page should now look as seen in the following image. Right now, we only have the HTML widgets in place. When you drag the slider or select different radio buttons, you will only see the UI interactions of these widgets and the UI treatments that the framework applies to these widgets. However, the total cost will not be populated yet. We will need to write some JavaScript logic to determine this value, and we will take a look at this in a minute. Before moving to the JavaScript part, make sure you have all the code that is needed: Now let's take a look at the magic part of the code (read JavaScript) that is going to make our widgets usable for the visitors of this Civic Center web application. Add the following JavaScript code in the script tag at the very end of our index.html file: $(document).on('pagecontainershow', function(){    var guests = 50;    var cost = 35;    var totalCost;    $("#slider").on("slidestop", function(event, ui){        guests = $('#slider').val();        totalCost = costCal();        $("#totalCost").text("$" + totalCost);    });    $("input:radio[name=cuisine-choice]").on("click", function() {        cost = $(this).val();        var totalCost = costCal();        $("#totalCost").text("$" + totalCost);    });    function costCal(){        return guests * cost;    } }); That is a pretty small chunk of code and pretty simple too. We will be looking at a few very important events that are part of the framework and that come in very handy when developing web applications with jQuery Mobile. One of the most important things that you must have already noticed is that we are not making use of the customary $(document).on('ready', function(){ in Jquery, but something that looks as the following code: $(document).on('pagecontainershow', function(){ The million dollar question here is "why doesn't DOM already work in jQuery Mobile?" As part of jQuery, the first thing that we often learn to do is execute our jQuery code as soon as the DOM is ready, and this is identified using the $(document).ready function. In jQuery Mobile, pages are requested and injected into the same DOM as the user navigates from one page to another and so the DOM ready event is as useful as it executes only for the first page. Now we need an event that should execute when every page loads, and $(document).pagecontainershow is the one. The pagecontainershow element is triggered on the toPage after the transition animation has completed. The pagecontainershow element is triggered on the pagecontainer element and not on the actual page. In the function, we initialize the guests and the cost variables to 50 and 35 respectively, as the minimum number of guests we can have is 50 and the "Continental" cuisine is selected by default, which has a value of 35. We will be calculating the estimated cost when the user changes the number of guests or selects a different radio button. This brings us to the next part of our code. We need to get the value of the number of guests as soon as the user stops sliding the slider. jQuery Mobile provides us with the slidestop event for this very purpose. As soon as the user stops sliding, we get the value of the slider and then call the costCal function, which returns a value that is the number of guests multiplied by the cost of the selected cuisine per person. We then display this value in the paragraph at the bottom for the user to get an estimated cost. We will discuss some more about the touch events that are available as part of the jQuery Mobile framework in the next section. When the user selects a different radio button, we retrieve the value of the selected radio button, call the costCal function again, and update the value displayed in the paragraph at the bottom of our page. If you have the code correct and your functions are all working fine, you should see something similar to the following image: Input with touch We will take a look at a couple of touch events, which are tap and taphold. The tap event is triggered after a quick touch; whereas the taphold event is triggered after a sustained, long press touch. The jQuery Mobile tap event is the gesture equivalent of the standard click event that is triggered on the release of the touch gesture. The following snippet of code should help you incorporate the tap event when you need to use it in your application: $(".selector").on("tap", function(){    console.log("tap event is triggered"); }); The jQuery Mobile taphold event triggers after a sustained, complete touch event, which is more commonly known as the long press event. The taphold event fires when the user taps and holds for a minimum of 750 milliseconds. You can also change the default value, but we will come to that in a minute. First, let's see how the taphold event is used: $(".selector").on("taphold", function(){    console.log("taphold event is triggered"); }); Now to change the default value for the long press event, we need to set the value for the following piece of code: $.event.special.tap.tapholdThreshold Working with plugins A number of times, we will come across scenarios where the capabilities of the framework are just not sufficient for all the requirements of your project. In such scenarios, we have to make use of third-party plugins in our project. We will be looking at two very interesting plugins in the course of this article, but before that, you need to understand what jQuery plugins exactly are. A jQuery plugin is simply a new method that has been used to extend jQuery's prototype object. When we include the jQuery plugin as part of our code, this new method becomes available for use within your application. When selecting jQuery plugins for your jQuery Mobile web application, make sure that the plugin is optimized for mobile devices and incorporates touch events as well, based on your requirements. The first plugin that we are going to look at today is called FastClick and is developed by FT Labs. This is an open source plugin and so can be used as part of your application. FastClick is a simple, easy-to-use library designed to eliminate the 300 ms delay between a physical tap and the firing on the click event on mobile browsers. Wait! What are we talking about? What is this 300 ms delay between tap and click? What exactly are we discussing? Sure. We understand the confusion. Let's explain this 300 ms delay issue. The click events have a 300 ms delay on touch devices, which makes web applications feel laggy on a mobile device and doesn't give users a native-like feel. If you go to a site that isn't mobile-optimized, it starts zoomed out. You have to then either pinch and zoom or double tap some content so that it becomes readable. The double-tap is a performance killer, because with every tap we have to wait to see whether it might be a double tap—and this wait is 300 ms. Here is how it plays out: touchstart touchend Wait 300ms in case of another tap click This pause of 300 ms applies to click events in JavaScript, but also other click-based interactions such as links and form controls. Most mobile web browsers out there have this 300 ms delay on the click events, but now a few modern browsers such as Chrome and FireFox for Android and iOS are removing this 300 ms delay. However, if you are supporting the older Android and iOS versions, with older mobile browsers, you might want to consider including the FastClick plugin in your application, which helps resolve this problem. Let's take a look at how we can use this plugin in any web application. First, you need to download the plugin files, or clone their GitHub repository here: https://github.com/ftlabs/fastclick. Once you have done that, include a reference to the plugin's JavaScript file in your application: <script type="application/javascript" src="path/fastclick.js"></script> Make sure that the script is loaded prior to instantiating FastClick on any element of the page. FastClick recommends you to instantiate the plugin on the body element itself. We can do this using the following piece of code: $(function){    FastClick.attach(document.body); } That is it! Your application is now free of the 300 ms click delay issue and will work as smooth as a native application. We have just provided you with an introduction to the FastClick plugin. There are several more features that this plugin provides. Make sure you visit their website—https://github.com/ftlabs/fastclick—for more details on what the plugin has to offer. Another important plugin that we will look at is HammerJs. HammerJs, again is an open source library that helps recognize gestures made by touch, mouse, and pointerEvents. Now, you would say that the jQuery Mobile framework already takes care of this, so why do we need a third-party plugin again? True, jQuery Mobile supports a variety of touch events such as tap, tap and hold, and swipe, as well as the regular mouse events, but what if in our application we want to make use of some touch gestures such as pan, pinch, rotate, and so on, which are not supported by jQuery Mobile by default? This is where HammerJs comes into the picture and plays nicely along with jQuery Mobile. Including HammerJS in your web application code is extremely simple and straightforward, like the FastClick plugin. You need to download the plugin files and then add a reference to the plugin JavaScript file: <script type="application/javascript" src="path/hammer.js"></script> Once you have included the plugin, you need to create a new instance on the Hammer object and then start using the plugin for all the touch gestures you need to support: var hammerPan = new Hammer(element_name, options); hammerPan.on('pan', function(){    console.log("Inside Pan event"); }); By default, Hammer adds a set of events—tap, double tap, swipe, pan, press, pinch, and rotate. The pinch and rotate recognizers are disabled by default, but can be turned on as and when required. HammerJS offers a lot of features that you might want to explore. Make sure you visit their website—http://hammerjs.github.io/ to understand the different features the library has to offer and how you can integrate this plugin within your existing or new jQuery Mobile projects. Accessibility Most of us today cannot imagine our lives without the Internet and our smartphones. Some will even argue that the Internet is the single largest revolutionary invention of all time that has touched numerous lives across the globe. Now, at the click of a mouse or the touch of your fingertip, the world is now at your disposal, provided you can use the mouse, see the screen, and hear the audio—impairments might make it difficult for people to access the Internet. This makes us wonder about how people with disabilities would use the Internet, their frustration in doing so, and the efforts that must be taken to make websites accessible to all. Though estimates vary on this, most studies have revealed that about 15% of the world's population have some kind of disability. Not all of these people would have an issue with accessing the web, but let's assume 5% of these people would face a problem in accessing the web. This 5% is also a considerable amount of users, which cannot be ignored by businesses on the web, and efforts must be taken in the right direction to make the web accessible to these users with disabilities. jQuery Mobile framework comes with built-in support for accessibility. jQuery Mobile is built with accessibility and universal access in mind. Any application that is built using jQuery Mobile is accessible via the screen reader as well. When you make use of the different jQuery Mobile widgets in your application, unknowingly you are also adding support for web accessibility into your application. jQuery Mobile framework adds all the necessary aria attributes to the elements in the DOM. Let's take a look at how the DOM looks for our facilities page: Look at the highlighted Events button in the top right corner and its corresponding HTML (also highlighted) in the developer tools. You will notice that there are a few attributes added to the anchor tag that start with aria-. We did not add any of these aria- attributes when we wrote the code for the Events button. jQuery Mobile library takes care of these things for you. The accessibility implementation is an ongoing process and the awesome developers at jQuery Mobile are working towards improving the support every new release. We spoke about aria- attributes, but what do they really represent? WAI - ARIA stands for Web Accessibility Initiative – Accessible Rich Internet Applications. This was a technical specification published by the World Wide Web Consortium (W3C) and basically specifies how to increase the accessibility of web pages. ARIA specifies the roles, properties, and states of a web page that make it accessible to all users. Accessibility is extremely vast, hence covering every detail of it is not possible. However, there is excellent material available on the Internet on this topic and we encourage you to read and understand this. Try to implement accessibility into your current or next project even if it is not based on jQuery Mobile. Web accessibility is an extremely important thing that should be considered, especially when you are building web applications that will be consumed by a huge consumer base—on e-commerce websites for example. Summary In this article, we made use of some of the available widgets from the jQuery Mobile framework and we built some interactivity into our existing Civic Center application. The widgets that we used included the range slider, the collapsible widget, the listview widget, and the radio button widget. We evaluated and looked at how to use two different third-party plugins—FastClick and HammerJs. We concluded the article by taking a look at the concept of Web Accessibility. Resources for Article: Further resources on this subject: Creating Mobile Dashboards [article] Speeding up Gradle builds for Android [article] Saying Hello to Unity and Android [article]
Read more
  • 0
  • 0
  • 3200

article-image-setting-synchronous-replication
Packt
10 Aug 2015
17 min read
Save for later

Setting Up Synchronous Replication

Packt
10 Aug 2015
17 min read
In this article by the author, Hans-Jürgen Schönig, of the book, PostgreSQL Replication, Second Edition, we learn how to set up synchronous replication. In asynchronous replication, data is submitted and received by the slave (or slaves) after the transaction has been committed on the master. During the time between the master's commit and the point when the slave actually has fully received the data, it can still be lost. Here, you will learn about the following topics: Making sure that no single transaction can be lost Configuring PostgreSQL for synchronous replication Understanding and using application_name The performance impact of synchronous replication Optimizing replication for speed Synchronous replication can be the cornerstone of your replication setup, providing a system that ensures zero data loss. (For more resources related to this topic, see here.) Synchronous replication setup Synchronous replication has been made to protect your data at all costs. The core idea of synchronous replication is that a transaction must be on at least two servers before the master returns success to the client. Making sure that data is on at least two nodes is a key requirement to ensure no data loss in the event of a crash. Setting up synchronous replication works just like setting up asynchronous replication. Just a handful of parameters discussed here have to be changed to enjoy the blessings of synchronous replication. However, if you are about to create a setup based on synchronous replication, we recommend getting started with an asynchronous setup and gradually extending your configuration and turning it into synchronous replication. This will allow you to debug things more easily and avoid problems down the road. Understanding the downside to synchronous replication The most important thing you have to know about synchronous replication is that it is simply expensive. Synchronous replication and its downsides are two of the core reasons for which we have decided to include all this background information in this book. It is essential to understand the physical limitations of synchronous replication, otherwise you could end up in deep trouble. When setting up synchronous replication, try to keep the following things in mind: Minimize the latency Make sure you have redundant connections Synchronous replication is more expensive than asynchronous replication Always cross-check twice whether there is a real need for synchronous replication In many cases, it is perfectly fine to lose a couple of rows in the event of a crash. Synchronous replication can safely be skipped in this case. However, if there is zero tolerance, synchronous replication is a tool that should be used. Understanding the application_name parameter In order to understand a synchronous setup, a config variable called application_name is essential, and it plays an important role in a synchronous setup. In a typical application, people use the application_name parameter for debugging purposes, as it allows users to assign a name to their database connection. It can help track bugs, identify what an application is doing, and so on: test=# SHOW application_name; application_name ------------------ psql (1 row)   test=# SET application_name TO 'whatever'; SET test=# SHOW application_name; application_name ------------------ whatever (1 row) As you can see, it is possible to set the application_name parameter freely. The setting is valid for the session we are in, and will be gone as soon as we disconnect. The question now is: What does application_name have to do with synchronous replication? Well, the story goes like this: if this application_name value happens to be part of synchronous_standby_names, the slave will be a synchronous one. In addition to that, to be a synchronous standby, it has to be: connected streaming data in real-time (that is, not fetching old WAL records) Once a standby becomes synced, it remains in that position until disconnection. In the case of cascaded replication (which means that a slave is again connected to a slave), the cascaded slave is not treated synchronously anymore. Only the first server is considered to be synchronous. With all of this information in mind, we can move forward and configure our first synchronous replication. Making synchronous replication work To show you how synchronous replication works, this article will include a full, working example outlining all the relevant configuration parameters. A couple of changes have to be made to the master. The following settings will be needed in postgresql.conf on the master: wal_level = hot_standby max_wal_senders = 5   # or any number synchronous_standby_names = 'book_sample' hot_standby = on # on the slave to make it readable Then we have to adapt pg_hba.conf. After that, the server can be restarted and the master is ready for action. We recommend that you set wal_keep_segments as well to keep more transaction logs. We also recommend setting wal_keep_segments to keep more transaction logs on the master database. This makes the entire setup way more robust. It is also possible to utilize replication slots. In the next step, we can perform a base backup just as we have done before. We have to call pg_basebackup on the slave. Ideally, we already include the transaction log when doing the base backup. The --xlog-method=stream parameter allows us to fire things up quickly and without any greater risks. The --xlog-method=stream and wal_keep_segments parameters are a good combo, and in our opinion, should be used in most cases to ensure that a setup works flawlessly and safely. We have already recommended setting hot_standby on the master. The config file will be replicated anyway, so you save yourself one trip to postgresql.conf to change this setting. Of course, this is not fine art but an easy and pragmatic approach. Once the base backup has been performed, we can move ahead and write a simple recovery.conf file suitable for synchronous replication, as follows: iMac:slavehs$ cat recovery.conf primary_conninfo = 'host=localhost                    application_name=book_sample                    port=5432'   standby_mode = on The config file looks just like before. The only difference is that we have added application_name to the scenery. Note that the application_name parameter must be identical to the synchronous_standby_names setting on the master. Once we have finished writing recovery.conf, we can fire up the slave. In our example, the slave is on the same server as the master. In this case, you have to ensure that those two instances will use different TCP ports, otherwise the instance that starts second will not be able to fire up. The port can easily be changed in postgresql.conf. After these steps, the database instance can be started. The slave will check out its connection information and connect to the master. Once it has replayed all the relevant transaction logs, it will be in synchronous state. The master and the slave will hold exactly the same data from then on. Checking the replication Now that we have started the database instance, we can connect to the system and see whether things are working properly. To check for replication, we can connect to the master and take a look at pg_stat_replication. For this check, we can connect to any database inside our (master) instance, as follows: postgres=# x Expanded display is on. postgres=# SELECT * FROM pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid            | 62871 usesysid         | 10 usename         | hs application_name | book_sample client_addr     | ::1 client_hostname | client_port     | 59235 backend_start   | 2013-03-29 14:53:52.352741+01 state           | streaming sent_location   | 0/30001E8 write_location   | 0/30001E8 flush_location   | 0/30001E8 replay_location | 0/30001E8 sync_priority   | 1 sync_state       | sync This system view will show exactly one line per slave attached to your master system. The x command will make the output more readable for you. If you don't use x to transpose the output, the lines will be so long that it will be pretty hard for you to comprehend the content of this table. In expanded display mode, each column will be in one line instead. You can see that the application_name parameter has been taken from the connect string passed to the master by the slave (which is book_sample in our example). As the application_name parameter matches the master's synchronous_standby_names setting, we have convinced the system to replicate synchronously. No transaction can be lost anymore because every transaction will end up on two servers instantly. The sync_state setting will tell you precisely how data is moving from the master to the slave. You can also use a list of application names, or simply a * sign in synchronous_standby_names to indicate that the first slave has to be synchronous. Understanding performance issues At various points in this book, we have already pointed out that synchronous replication is an expensive thing to do. Remember that we have to wait for a remote server and not just the local system. The network between those two nodes is definitely not something that is going to speed things up. Writing to more than one node is always more expensive than writing to only one node. Therefore, we definitely have to keep an eye on speed, otherwise we might face some pretty nasty surprises. Consider what you have learned about the CAP theory earlier in this book. Synchronous replication is exactly where it should be, with the serious impact that the physical limitations will have on performance. The main question you really have to ask yourself is: do I really want to replicate all transactions synchronously? In many cases, you don't. To prove our point, let's imagine a typical scenario: a bank wants to store accounting-related data as well as some logging data. We definitely don't want to lose a couple of million dollars just because a database node goes down. This kind of data might be worth the effort of replicating synchronously. The logging data is quite different, however. It might be far too expensive to cope with the overhead of synchronous replication. So, we want to replicate this data in an asynchronous way to ensure maximum throughput. How can we configure a system to handle important as well as not-so-important transactions nicely? The answer lies in a variable you have already seen earlier in the book—the synchronous_commit variable. Setting synchronous_commit to on In the default PostgreSQL configuration, synchronous_commit has been set to on. In this case, commits will wait until a reply from the current synchronous standby indicates that it has received the commit record of the transaction and has flushed it to the disk. In other words, both servers must report that the data has been written safely. Unless both servers crash at the same time, your data will survive potential problems (crashing of both servers should be pretty unlikely). Setting synchronous_commit to remote_write Flushing to both disks can be highly expensive. In many cases, it is enough to know that the remote server has accepted the XLOG and passed it on to the operating system without flushing things to the disk on the slave. As we can be pretty certain that we don't lose two servers at the very same time, this is a reasonable compromise between performance and consistency with respect to data protection. Setting synchronous_commit to off The idea is to delay WAL writing to reduce disk flushes. This can be used if performance is more important than durability. In the case of replication, it means that we are not replicating in a fully synchronous way. Keep in mind that this can have a serious impact on your application. Imagine a transaction committing on the master and you wanting to query that data instantly on one of the slaves. There would still be a tiny window during which you can actually get outdated data. Setting synchronous_commit to local The local value will flush locally but not wait for the replica to respond. In other words, it will turn your transaction into an asynchronous one. Setting synchronous_commit to local can also cause a small time delay window, during which the slave can actually return slightly outdated data. This phenomenon has to be kept in mind when you decide to offload reads to the slave. In short, if you want to replicate synchronously, you have to ensure that synchronous_commit is set to either on or remote_write. Changing durability settings on the fly Changing the way data is replicated on the fly is easy and highly important to many applications, as it allows the user to control durability on the fly. Not all data has been created equal, and therefore, more important data should be written in a safer way than data that is not as important (such as log files). We have already set up a full synchronous replication infrastructure by adjusting synchronous_standby_names (master) along with the application_name (slave) parameter. The good thing about PostgreSQL is that you can change your durability requirements on the fly: test=# BEGIN; BEGIN test=# CREATE TABLE t_test (id int4); CREATE TABLE test=# SET synchronous_commit TO local; SET test=# x Expanded display is on. test=# SELECT * FROM pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid             | 62871 usesysid         | 10 usename         | hs application_name | book_sample client_addr     | ::1 client_hostname | client_port     | 59235 backend_start   | 2013-03-29 14:53:52.352741+01 state           | streaming sent_location   | 0/3026258 write_location   | 0/3026258 flush_location   | 0/3026258 replay_location | 0/3026258 sync_priority   | 1 sync_state       | sync   test=# COMMIT; COMMIT In this example, we changed the durability requirements on the fly. This will make sure that this very specific transaction will not wait for the slave to flush to the disk. Note, as you can see, sync_state has not changed. Don't be fooled by what you see here; you can completely rely on the behavior outlined in this section. PostgreSQL is perfectly able to handle each transaction separately. This is a unique feature of this wonderful open source database; it puts you in control and lets you decide which kind of durability requirements you want. Understanding the practical implications and performance We have already talked about practical implications as well as performance implications. But what good is a theoretical example? Let's do a simple benchmark and see how replication behaves. We are performing this kind of testing to show you that various levels of durability are not just a minor topic; they are the key to performance. Let's assume a simple test: in the following scenario, we have connected two equally powerful machines (3 GHz, 8 GB RAM) over a 1 Gbit network. The two machines are next to each other. To demonstrate the impact of synchronous replication, we have left shared_buffers and all other memory parameters as default, and only changed fsync to off to make sure that the effect of disk wait is reduced to practically zero. The test is simple: we use a one-column table with only one integer field and 10,000 single transactions consisting of just one INSERT statement: INSERT INTO t_test VALUES (1); We can try this with full, synchronous replication (synchronous_commit = on): real 0m6.043s user 0m0.131s sys 0m0.169s As you can see, the test has taken around 6 seconds to complete. This test can be repeated with synchronous_commit = local now (which effectively means asynchronous replication): real 0m0.909s user 0m0.101s sys 0m0.142s In this simple test, you can see that the speed has gone up by us much as six times. Of course, this is a brute-force example, which does not fully reflect reality (this was not the goal anyway). What is important to understand, however, is that synchronous versus asynchronous replication is not a matter of a couple of percentage points or so. This should stress our point even more: replicate synchronously only if it is really needed, and if you really have to use synchronous replication, make sure that you limit the number of synchronous transactions to an absolute minimum. Also, please make sure that your network is up to the job. Replicating data synchronously over network connections with high latency will kill your system performance like nothing else. Keep in mind that throwing expensive hardware at the problem will not solve the problem. Doubling the clock speed of your servers will do practically nothing for you because the real limitation will always come from network latency. The performance penalty with just one connection is definitely a lot larger than that with many connections. Remember that things can be done in parallel, and network latency does not make us more I/O or CPU bound, so we can reduce the impact of slow transactions by firing up more concurrent work. When synchronous replication is used, how can you still make sure that performance does not suffer too much? Basically, there are a couple of important suggestions that have proven to be helpful: Use longer transactions: Remember that the system must ensure on commit that the data is available on two servers. We don't care what happens in the middle of a transaction, because anybody outside our transaction cannot see the data anyway. A longer transaction will dramatically reduce network communication. Run stuff concurrently: If you have more than one transaction going on at the same time, it will be beneficial to performance. The reason for this is that the remote server will return the position inside the XLOG that is considered to be processed safely (flushed or accepted). This method ensures that many transactions can be confirmed at the same time. Redundancy and stopping replication When talking about synchronous replication, there is one phenomenon that must not be left out. Imagine we have a two-node cluster replicating synchronously. What happens if the slave dies? The answer is that the master cannot distinguish between a slow and a dead slave easily, so it will start waiting for the slave to come back. At first glance, this looks like nonsense, but if you think about it more deeply, you will figure out that synchronous replication is actually the only correct thing to do. If somebody decides to go for synchronous replication, the data in the system must be worth something, so it must not be at risk. It is better to refuse data and cry out to the end user than to risk data and silently ignore the requirements of high durability. If you decide to use synchronous replication, you must consider using at least three nodes in your cluster. Otherwise, it will be very risky, and you cannot afford to lose a single node without facing significant downtime or risking data loss. Summary Here, we outlined the basic concept of synchronous replication, and showed how data can be replicated synchronously. We also showed how durability requirements can be changed on the fly by modifying PostgreSQL runtime parameters. PostgreSQL gives users the choice of how a transaction should be replicated, and which level of durability is necessary for a certain transaction. Resources for Article: Further resources on this subject: Introducing PostgreSQL 9 [article] PostgreSQL – New Features [article] Installing PostgreSQL [article]
Read more
  • 0
  • 0
  • 4683

article-image-understanding-hadoop-backup-and-recovery-needs
Packt
10 Aug 2015
25 min read
Save for later

Understanding Hadoop Backup and Recovery Needs

Packt
10 Aug 2015
25 min read
In this article by Gaurav Barot, Chintan Mehta, and Amij Patel, authors of the book Hadoop Backup and Recovery Solutions, we will discuss backup and recovery needs. In the present age of information explosion, data is the backbone of business organizations of all sizes. We need a complete data backup and recovery system and a strategy to ensure that critical data is available and accessible when the organizations need it. Data must be protected against loss, damage, theft, and unauthorized changes. If disaster strikes, data recovery must be swift and smooth so that business does not get impacted. Every organization has its own data backup and recovery needs, and priorities based on the applications and systems they are using. Today's IT organizations face the challenge of implementing reliable backup and recovery solutions in the most efficient, cost-effective manner. To meet this challenge, we need to carefully define our business requirements and recovery objectives before deciding on the right backup and recovery strategies or technologies to deploy. (For more resources related to this topic, see here.) Before jumping onto the implementation approach, we first need to know about the backup and recovery strategies and how to efficiently plan them. Understanding the backup and recovery philosophies Backup and recovery is becoming more challenging and complicated, especially with the explosion of data growth and increasing need for data security today. Imagine big players such as Facebook, Yahoo! (the first to implement Hadoop), eBay, and more; how challenging it will be for them to handle unprecedented volumes and velocities of unstructured data, something which traditional relational databases can't handle and deliver. To emphasize the importance of backup, let's take a look at a study conducted in 2009. This was the time when Hadoop was evolving and a handful of bugs still existed in Hadoop. Yahoo! had about 20,000 nodes running Apache Hadoop in 10 different clusters. HDFS lost only 650 blocks, out of 329 million total blocks. Now hold on a second. These blocks were lost due to the bugs found in the Hadoop package. So, imagine what the scenario would be now. I am sure you will bet on losing hardly a block. Being a backup manager, your utmost target is to think, make, strategize, and execute a foolproof backup strategy capable of retrieving data after any disaster. Solely speaking, the plan of the strategy is to protect the files in HDFS against disastrous situations and revamp the files back to their normal state, just like James Bond resurrects after so many blows and probably death-like situations. Coming back to the backup manager's role, the following are the activities of this role: Testing out various case scenarios to forestall any threats, if any, in the future Building a stable recovery point and setup for backup and recovery situations Preplanning and daily organization of the backup schedule Constantly supervising the backup and recovery process and avoiding threats, if any Repairing and constructing solutions for backup processes The ability to reheal, that is, recover from data threats, if they arise (the resurrection power) Data protection is one of the activities and it includes the tasks of maintaining data replicas for long-term storage Resettling data from one destination to another Basically, backup and recovery strategies should cover all the areas mentioned here. For any system data, application, or configuration, transaction logs are mission critical, though it depends on the datasets, configurations, and applications that are used to design the backup and recovery strategies. Hadoop is all about big data processing. After gathering some exabytes for data processing, the following are the obvious questions that we may come up with: What's the best way to back up data? Do we really need to take a backup of these large chunks of data? Where will we find more storage space if the current storage space runs out? Will we have to maintain distributed systems? What if our backup storage unit gets corrupted? The answer to the preceding questions depends on the situation you may be facing; let's see a few situations. One of the situations is where you may be dealing with a plethora of data. Hadoop is used for fact-finding semantics and data is in abundance. Here, the span of data is short; it is short lived and important sources of the data are already backed up. Such is the scenario wherein the policy of not backing up data at all is feasible, as there are already three copies (replicas) in our data nodes (HDFS). Moreover, since Hadoop is still vulnerable to human error, a backup of configuration files and NameNode metadata (dfs.name.dir) should be created. You may find yourself facing a situation where the data center on which Hadoop runs crashes and the data is not available as of now; this results in a failure to connect with mission-critical data. A possible solution here is to back up Hadoop, like any other cluster (the Hadoop command is Hadoop). Replication of data using DistCp To replicate data, the distcp command writes data to two different clusters. Let's look at the distcp command with a few examples or options. DistCp is a handy tool used for large inter/intra cluster copying. It basically expands a list of files to input in order to map tasks, each of which will copy files that are specified in the source list. Let's understand how to use distcp with some of the basic examples. The most common use case of distcp is intercluster copying. Let's see an example: bash$ hadoop distcp2 hdfs://ka-16:8020/parth/ghiya hdfs://ka-001:8020/knowarth/parth This command will expand the namespace under /parth/ghiya on the ka-16 NameNode into the temporary file, get its content, divide them among a set of map tasks, and start copying the process on each task tracker from ka-16 to ka-001. The command used for copying can be generalized as follows: hadoop distcp2 hftp://namenode-location:50070/basePath hdfs://namenode-location Here, hftp://namenode-location:50070/basePath is the source and hdfs://namenode-location is the destination. In the preceding command, namenode-location refers to the hostname and 50070 is the NameNode's HTTP server post. Updating and overwriting using DistCp The -update option is used when we want to copy files from the source that don't exist on the target or have some different contents, which we do not want to erase. The -overwrite option overwrites the target files even if they exist at the source. The files can be invoked by simply adding -update and -overwrite. In the example, we used distcp2, which is an advanced version of DistCp. The process will go smoothly even if we use the distcp command. Now, let's look at two versions of DistCp, the legacy DistCp or just DistCp and the new DistCp or the DistCp2: During the intercluster copy process, files that were skipped during the copy process have all their file attributes (permissions, owner group information, and so on) unchanged when we copy using legacy DistCp or just DistCp. This, however, is not the case in new DistCp. These values are now updated even if a file is skipped. Empty root directories among the source inputs were not created in the target folder in legacy DistCp, which is not the case anymore in the new DistCp. There is a common misconception that Hadoop protects data loss; therefore, we don't need to back up the data in the Hadoop cluster. Since Hadoop replicates data three times by default, this sounds like a safe statement; however, it is not 100 percent safe. While Hadoop protects from hardware failure on the data nodes—meaning that if one entire node goes down, you will not lose any data—there are other ways in which data loss may occur. Data loss may occur due to various reasons, such as Hadoop being highly susceptible to human errors, corrupted data writes, accidental deletions, rack failures, and many such instances. Any of these reasons are likely to cause data loss. Consider an example where a corrupt application can destroy all data replications. During the process, it will attempt to compute each replication and on not finding a possible match, it will delete the replica. User deletions are another example of how data can be lost, as Hadoop's trash mechanism is not enabled by default. Also, one of the most complicated and expensive-to-implement aspects of protecting data in Hadoop is the disaster recovery plan. There are many different approaches to this, and determining which approach is right requires a balance between cost, complexity, and recovery time. A real-life scenario can be Facebook. The data that Facebook holds increases exponentially from 15 TB to 30 PB, that is, 3,000 times the Library of Congress. With increasing data, the problem faced was physical movement of the machines to the new data center, which required man power. Plus, it also impacted services for a period of time. Data availability in a short period of time is a requirement for any service; that's when Facebook started exploring Hadoop. To conquer the problem while dealing with such large repositories of data is yet another headache. The reason why Hadoop was invented was to keep the data bound to neighborhoods on commodity servers and reasonable local storage, and to provide maximum availability to data within the neighborhood. So, a data plan is incomplete without data backup and recovery planning. A big data execution using Hadoop states a situation wherein the focus on the potential to recover from a crisis is mandatory. The backup philosophy We need to determine whether Hadoop, the processes and applications that run on top of it (Pig, Hive, HDFS, and more), and specifically the data stored in HDFS are mission critical. If the data center where Hadoop is running disappeared, will the business stop? Some of the key points that have to be taken into consideration have been explained in the sections that follow; by combining these points, we will arrive at the core of the backup philosophy. Changes since the last backup Considering the backup philosophy that we need to construct, the first thing we are going to look at are changes. We have a sound application running and then we add some changes. In case our system crashes and we need to go back to our last safe state, our backup strategy should have a clause of the changes that have been made. These changes can be either database changes or configuration changes. Our clause should include the following points in order to construct a sound backup strategy: Changes we made since our last backup The count of files changed Ensure that our changes are tracked The possibility of bugs in user applications since the last change implemented, which may cause hindrance and it may be necessary to go back to the last safe state After applying new changes to the last backup, if the application doesn't work as expected, then high priority should be given to the activity of taking the application back to its last safe state or backup. This ensures that the user is not interrupted while using the application or product. The rate of new data arrival The next thing we are going to look at is how many changes we are dealing with. Is our application being updated so much that we are not able to decide what the last stable version was? Data is produced at a surpassing rate. Consider Facebook, which alone produces 250 TB of data a day. Data production occurs at an exponential rate. Soon, terms such as zettabytes will come upon a common place. Our clause should include the following points in order to construct a sound backup: The rate at which new data is arriving The need for backing up each and every change The time factor involved in backup between two changes Policies to have a reserve backup storage The size of the cluster The size of a cluster is yet another important factor, wherein we will have to select cluster size such that it will allow us to optimize the environment for our purpose with exceptional results. Recalling the Yahoo! example, Yahoo! has 10 clusters all over the world, covering 20,000 nodes. Also, Yahoo! has the maximum number of nodes in its large clusters. Our clause should include the following points in order to construct a sound backup: Selecting the right resource, which will allow us to optimize our environment. The selection of the right resources will vary as per need. Say, for instance, users with I/O-intensive workloads will go for more spindles per core. A Hadoop cluster contains four types of roles, that is, NameNode, JobTracker, TaskTracker, and DataNode. Handling the complexities of optimizing a distributed data center. Priority of the datasets The next thing we are going to look at are the new datasets, which are arriving. With the increase in the rate of new data arrivals, we always face a dilemma of what to backup. Are we tracking all the changes in the backup? Now, if are we backing up all the changes, will our performance be compromised? Our clause should include the following points in order to construct a sound backup: Making the right backup of the dataset Taking backups at a rate that will not compromise performance Selecting the datasets or parts of datasets The next thing we are going to look at is what exactly is backed up. When we deal with large chunks of data, there's always a thought in our mind: Did we miss anything while selecting the datasets or parts of datasets that have not been backed up yet? Our clause should include the following points in order to construct a sound backup: Backup of necessary configuration files Backup of files and application changes The timeliness of data backups With such a huge amount of data collected daily (Facebook), the time interval between backups is yet another important factor. Do we back up our data daily? In two days? In three days? Should we backup small chunks of data daily, or should we back up larger chunks at a later period? Our clause should include the following points in order to construct a sound backup: Dealing with any impacts if the time interval between two backups is large Monitoring a timely backup strategy and going through it The frequency of data backups depends on various aspects. Firstly, it depends on the application and usage. If it is I/O intensive, we may need more backups, as each dataset is not worth losing. If it is not so I/O intensive, we may keep the frequency low. We can determine the timeliness of data backups from the following points: The amount of data that we need to backup The rate at which new updates are coming Determining the window of possible data loss and making it as low as possible Critical datasets that need to be backed up Configuration and permission files that need to be backed up Reducing the window of possible data loss The next thing we are going to look at is how to minimize the window of possible data loss. If our backup frequency is great then what are the chances of data loss? What's our chance of recovering the latest files? Our clause should include the following points in order to construct a sound backup: The potential to recover latest files in the case of a disaster Having a low data-loss probability Backup consistency The next thing we are going to look at is backup consistency. The probability of invalid backups should be less or even better zero. This is because if invalid backups are not tracked, then copies of invalid backups will be made further, which will again disrupt our backup process. Our clause should include the following points in order to construct a sound backup: Avoid copying data when it's being changed Possibly, construct a shell script, which takes timely backups Ensure that the shell script is bug-free Avoiding invalid backups We are going to continue the discussion on invalid backups. As you saw, HDFS makes three copies of our backup for the recovery process. What if the original backup was flawed with errors or bugs? The three copies will be corrupted copies; now, when we recover these flawed copies, the result indeed will be a catastrophe. Our clause should include the following points in order to construct a sound backup: Avoid having a long backup frequency Have the right backup process, and probably having an automated shell script Track unnecessary backups If our backup clause covers all the preceding mentioned points, we surely are on the way to making a good backup strategy. A good backup policy basically covers all these points; so, if a disaster occurs, it always aims to go to the last stable state. That's all about backups. Moving on, let's say a disaster occurs and we need to go to the last stable state. Let's have a look at the recovery philosophy and all the points that make a sound recovery strategy. The recovery philosophy After a deadly storm, we always try to recover from the after-effects of the storm. Similarly, after a disaster, we try to recover from the effects of the disaster. In just one moment, storage capacity which was a boon turns into a curse and just another expensive, useless thing. Starting off with the best question, what will be the best recovery philosophy? Well, it's obvious that the best philosophy will be one wherein we may never have to perform recovery at all. Also, there may be scenarios where we may need to do a manual recovery. Let's look at the possible levels of recovery before moving on to recovery in Hadoop: Recovery to the flawless state Recovery to the last supervised state Recovery to a possible past state Recovery to a sound state Recovery to a stable state So, obviously we want our recovery state to be flawless. But if it's not achieved, we are willing to compromise a little and allow the recovery to go to a possible past state we are aware of. Now, if that's not possible, again we are ready to compromise a little and allow it to go to the last possible sound state. That's how we deal with recovery: first aim for the best, and if not, then compromise a little. Just like the saying goes, "The bigger the storm, more is the work we have to do to recover," here also we can say "The bigger the disaster, more intense is the recovery plan we have to take." So, the recovery philosophy that we construct should cover the following points: An automation system setup that detects a crash and restores the system to the last working state, where the application runs as per expected behavior. The ability to track modified files and copy them. Track the sequences on files, just like an auditor trails his audits. Merge the files that are copied separately. Multiple version copies to maintain a version control. Should be able to treat the updates without impacting the application's security and protection. Delete the original copy only after carefully inspecting the changed copy. Treat new updates but first make sure they are fully functional and will not hinder anything else. If they hinder, then there should be a clause to go to the last safe state. Coming back to recovery in Hadoop, the first question we may think of is what happens when the NameNode goes down? When the NameNode goes down, so does the metadata file (the file that stores data about file owners and file permissions, where the file is stored on data nodes and more), and there will be no one present to route our read/write file request to the data node. Our goal will be to recover the metadata file. HDFS provides an efficient way to handle name node failures. There are basically two places where we can find metadata. First, fsimage and second, the edit logs. Our clause should include the following points: Maintain three copies of the name node. When we try to recover, we get four options, namely, continue, stop, quit, and always. Choose wisely. Give preference to save the safe part of the backups. If there is an ABORT! error, save the safe state. Hadoop provides four recovery modes based on the four options it provides (continue, stop, quit, and always): Continue: This allows you to continue over the bad parts. This option will let you cross over a few stray blocks and continue over to try to produce a full recovery mode. This can be the Prompt when found error mode. Stop: This allows you to stop the recovery process and make an image file of the copy. Now, the part that we stopped won't be recovered, because we are not allowing it to. In this case, we can say that we are having the safe-recovery mode. Quit: This exits the recovery process without making a backup at all. In this, we can say that we are having the no-recovery mode. Always: This is one step further than continue. Always selects continue by default and thus avoids stray blogs found further. This can be the prompt only once mode. We will look at these in further discussions. Now, you may think that the backup and recovery philosophy is cool, but wasn't Hadoop designed to handle these failures? Well, of course, it was invented for this purpose but there's always the possibility of a mashup at some level. Are we overconfident and not ready to take precaution, which can protect us, and are we just entrusting our data blindly with Hadoop? No, certainly we aren't. We are going to take every possible preventive step from our side. In the next topic, we look at the very same topic as to why we need preventive measures to back up Hadoop. Knowing the necessity of backing up Hadoop Change is the fundamental law of nature. There may come a time when Hadoop may be upgraded on the present cluster, as we see many system upgrades everywhere. As no upgrade is bug free, there is a probability that existing applications may not work the way they used to. There may be scenarios where we don't want to lose any data, let alone start HDFS from scratch. This is a scenario where backup is useful, so a user can go back to a point in time. Looking at the HDFS replication process, the NameNode handles the client request to write a file on a DataNode. The DataNode then replicates the block and writes the block to another DataNode. This DataNode repeats the same process. Thus, we have three copies of the same block. Now, how these DataNodes are selected for placing copies of blocks is another issue, which we are going to cover later in Rack awareness. You will see how to place these copies efficiently so as to handle situations such as hardware failure. But the bottom line is when our DataNode is down there's no need to panic; we still have a copy on a different DataNode. Now, this approach gives us various advantages such as: Security: This ensures that blocks are stored on two different DataNodes High write capacity: This writes only on a single DataNode; the replication factor is handled by the DataNode Read options: This denotes better options from where to read; the NameNode maintains records of all the locations of the copies and the distance from the NameNode Block circulation: The client writes only a single block; others are handled through the replication pipeline During the write operation on a DataNode, it receives data from the client as well as passes data to the next DataNode simultaneously; thus, our performance factor is not compromised. Data never passes through the NameNode. The NameNode takes the client's request to write data on a DataNode and processes the request by deciding on the division of files into blocks and the replication factor. The following figure shows the replication pipeline, wherein a block of the file is written and three different copies are made at different DataNode locations: After hearing such a foolproof plan and seeing so many advantages, we again arrive at the same question: is there a need for backup in Hadoop? Of course there is. There often exists a common mistaken belief that Hadoop shelters you against data loss, which gives you the freedom to not take backups in your Hadoop cluster. Hadoop, by convention, has a facility to replicate your data three times by default. Although reassuring, the statement is not safe and does not guarantee foolproof protection against data loss. Hadoop gives you the power to protect your data over hardware failures; the scenario wherein one disk, cluster, node, or region may go down, data will still be preserved for you. However, there are many scenarios where data loss may occur. Consider an example where a classic human-prone error can be the storage locations that the user provides during operations in Hive. If the user provides a location wherein data already exists and they perform a query on the same table, the entire existing data will be deleted, be it of size 1 GB or 1 TB. In the following figure, the client gives a read operation but we have a faulty program. Going through the process, the NameNode is going to see its metadata file for the location of the DataNode containing the block. But when it reads from the DataNode, it's not going to match the requirements, so the NameNode will classify that block as an under replicated block and move on to the next copy of the block. Oops, again we will have the same situation. This way, all the safe copies of the block will be transferred to under replicated blocks, thereby HDFS fails and we need some other backup strategy: When copies do not match the way NameNode explains, it discards the copy and replaces it with a fresh copy that it has. HDFS replicas are not your one-stop solution for protection against data loss. The needs for recovery Now, we need to decide up to what level we want to recover. Like you saw earlier, we have four modes available, which recover either to a safe copy, the last possible state, or no copy at all. Based on your needs decided in the disaster recovery plan we defined earlier, you need to take appropriate steps based on that. We need to look at the following factors: The performance impact (is it compromised?) How large is the data footprint that my recovery method leaves? What is the application downtime? Is there just one backup or are there incremental backups? Is it easy to implement? What is the average recovery time that the method provides? Based on the preceding aspects, we will decide which modes of recovery we need to implement. The following methods are available in Hadoop: Snapshots: Snapshots simply capture a moment in time and allow you to go back to the possible recovery state. Replication: This involves copying data from one cluster and moving it to another cluster, out of the vicinity of the first cluster, so that if one cluster is faulty, it doesn't have an impact on the other. Manual recovery: Probably, the most brutal one is moving data manually from one cluster to another. Clearly, its downsides are large footprints and large application downtime. API: There's always a custom development using the public API available. We will move on to the recovery areas in Hadoop. Understanding recovery areas Recovering data after some sort of disaster needs a well-defined business disaster recovery plan. So, the first step is to decide our business requirements, which will define the need for data availability, precision in data, and requirements for the uptime and downtime of the application. Any disaster recovery policy should basically cover areas as per requirements in the disaster recovery principal. Recovery areas define those portions without which an application won't be able to come back to its normal state. If you are armed and fed with proper information, you will be able to decide the priority of which areas need to be recovered. Recovery areas cover the following core components: Datasets NameNodes Applications Database sets in HBase Let's go back to the Facebook example. Facebook uses a customized version of MySQL for its home page and other interests. But when it comes to Facebook Messenger, Facebook uses the NoSQL database provided by Hadoop. Now, looking from that point of view, Facebook will have both those things in recovery areas and will need different steps to recover each of these areas. Summary In this article, we went through the backup and recovery philosophy and what all points a good backup philosophy should have. We went through what a recovery philosophy constitutes. We saw the modes available for recovery in Hadoop. Then, we looked at why backup is important even though HDFS provides the replication process. Lastly, we looked at the recovery needs and areas. Quite a journey, wasn't it? Well, hold on tight. These are just your first steps into Hadoop User Group (HUG). Resources for Article: Further resources on this subject: Cassandra Architecture [article] Oracle GoldenGate 12c — An Overview [article] Backup and Restore Improvements [article]
Read more
  • 0
  • 0
  • 7059

article-image-securing-openstack-networking
Packt
10 Aug 2015
10 min read
Save for later

Securing OpenStack Networking

Packt
10 Aug 2015
10 min read
In this article by Fabio Alessandro Locati, author of the book OpenStack Cloud Security, you will learn about the importance of firewall, IDS, and IPS. You will also learn about Generic Routing Encapsulation, VXLAN. (For more resources related to this topic, see here.) The importance of firewall, IDS, and IPS The security of a network can and should be achieved in multiple ways. Three components that are critical to the security of a network are: Firewall Intrusion detection system (IDS) Intrusion prevention system (IPS) Firewall Firewalls are systems that control traffic passing through them based on rules. This can seem something like a router, but they are very different. The router allows communication between different networks while the firewall limits communication between networks and hosts. The root of this confusion may occur because very often the router will have the firewall functionality and vice versa. Firewalls need to be connected in a series to your infrastructure. The first paper on the firewall technology appeared in 1988 and designed the packet filter firewall. This kind of firewall is often known as first generation firewall. This kind of firewall analyzes the packages passing through and if the package matches a rule, the firewall will act accordingly to that rule. This firewall will analyze each package by itself and will not consider other aspects such as other packages. It works on the first three layers of the OSI model with very few features using layer 4 specifically to check port numbers and protocols (UDP/TCP). First generation firewalls are still in use, because in a lot of situations, to do the job properly and are cheap and secure. Examples of typical filtering those firewalls prohibit (or allow) to IPs of certain classes (or specific IPs), to access certain IPs, or allow traffic to a specific IP only on specific ports. There are no known attacks to those kind of firewalls, but specific models can have specific bugs that can be exploited. In 1990, a new generation of firewall appeared. The initial name was circuit-level gateway, but today it is far more commonly known as stateful firewalls or second generation firewall. These firewalls are able to understand when connections are being initialized and closed so that the firewall comes to know what is the current state of a connection when a package arrives. To do so, this kind of firewall uses the first four layers of the networking stack. This allows the firewall to drop all packages that are not establishing a new connection or are in an already established connection. These firewalls are very powerful with the TCP protocol because it has states, while they have very small advantages compared to first generation firewalls handling UDP or ICMP packages, since those packages travel with no connection. In these cases, the firewall sets the connection as established; only the first valid package passes through and closes it after the connection times out. Performance-wise, stateful firewall can be faster than packet firewall because if the package is part of an active connection, no further test will be performed against that package. These kinds of firewalls are more susceptible to bugs in their code since reading more about the package makes it easier to exploit. Also, on many devices, it is possible to open connections (with SYN packages) until the firewall is saturated. In such cases, the firewall usually downgrades itself as a simple router allowing all traffic to pass through it. In 1991, improvements were made to the stateful firewall allowing it to understand more about the protocol of the package it was evaluating. The firewalls of this kind before 1994 had major problems, such as working as a proxy that the user had to interact with. In 1994, the first application firewall, as we know it, was born doing all its job completely transparently. To be able to understand the protocol, this kind of firewall requires an understanding of all seven layers of the OSI model. As for security, the same as the stateful firewall does apply to the application firewall as well. Intrusion detection system (IDS) IDSs are systems that monitor the network traffic looking for policy violation and malicious traffic. The goal of the IDS is not to block malicious activity, but instead to log and report them. These systems act in a passive mode, so you'll not see any traffic coming from them. This is very important because it makes them invisible to attackers so you can gain information about the attack, without the attacker knowing. IDSs need to be connected in parallel to your infrastructure. Intrusion prevention system (IPS) IPSs are sometimes referred to as Intrusion Detection and Prevention Systems (IDPS), since they are IDS that are also able to fight back malicious activities. IPSs have greater possibility to act than IDSs. Other than reporting, like IDS, they can also drop malicious packages, reset the connection, and block the traffic from the offending IP address. IPSs need to be connected in series to your infrastructure. Generic Routing Encapsulation (GRE) GRE is a Cisco tuning protocol that is difficult to position in the OSI model. The best place for it to be is between layers 2 and 3. Being above layer 2 (where VLANs are), we can use GRE inside VLAN. We will not go deep into the technicalities of this protocol. I'd like to focus more on the advantages and disadvantages it has over VLAN. The first advantage of (extended) GRE over VLAN is scalability. In fact, VLAN is limited to 4,096, while GRE tunnels do not have this limitation. If you are running a private cloud and you are working in a small corporation, 4,096 networks could be enough, but will definitely not be enough if you work for a big corporation or if you are running a public cloud. Also, unless you use VTP for your VLANs, you'll have to add VLANs to each network device, while GREs don't need this. You cannot have more than 4,096 VLANs in an environment. The second advantage is security. Since you can deploy multiple GRE tunnels in a single VLAN, you can connect a machine to a single VLAN and multiple GRE networks without the risks that come with putting a port in trunking that is needed to bring more VLANs in the same physical port. For these reasons, GRE has been a very common choice in a lot of OpenStack clusters deployed up to OpenStack Havana. The current preferred networking choice (since Icehouse) is Virtual Extensible LAN (VXLAN). VXLAN VXLAN is a network virtualization technology whose specifications have been originally created by Arista Networks, Cisco, and VMWare, and many other companies have backed the project. Its goal is to offer a standardized overlay encapsulation protocol and it was created because the standard VLAN were too limited for the current cloud needs and the GRE protocol was a Cisco protocol. It works using layer 2 Ethernet frames within layer 4 UDP packages on port 4789. As for the maximum number of networks, the limit is 16 million logical networks. Since the Icehouse release, the suggested standard for networking is VXLAN. Flat network versus VLAN versus GRE in OpenStack Quantum In OpenStack Quantum, you can decide to use multiple technologies for your networks: flat network, VLAN, GRE, and the most recent, VXLAN. Let's discuss them in detail: Flat network: It is often used in private clouds since it is very easy to set up. The downside is that any virtual machine will see any other virtual machines in our cloud. I strongly discourage people from using this network design because it's unsafe, and in the long run, it will have problems, as we have seen earlier. VLAN: It is sometimes used in bigger private clouds and sometimes even in small public clouds. The advantage is that many times you already have a VLAN-based installation in your company. The major disadvantages are the need to trunk ports for each physical host and the possible problems in propagation. I discourage this approach, since in my opinion, the advantages are very limited while the disadvantages are pretty strong. VXLAN: It should be used in any kind of cloud due to its technical advantages. It allows a huge number of networks, its way more secure, and often eases debugging. GRE: Until the Havana release, it was the suggested protocol, but since the Icehouse release, the suggestion has been to move toward VXLAN, where the majority of the development is focused. Design a secure network for your OpenStack deployment As for the physical infrastructure, we have to design it securely. We have seen that the network security is critical and that there a lot of possible attacks in this realm. Is it possible to design a secure environment to run OpenStack? Yes it is, if you remember a few rules: Create different networks, at the very least for management and external data (this network usually already exists in your organization and is the one where all your clients are) Never put ports on trunking mode if you use VLANs in your infrastructure, otherwise physically separated networks will be needed The following diagram is an example of how to implement it: Here, the management, tenant external networks could be either VLAN or real networks. Remember that to not use VLAN trunking, you need at least the same amount of physical ports as of VLAN, and the machine has to be subscribed to avoid port trunking that can be a huge security hole. A management network is needed for the administrator to administer the machines and for the OpenStack services to speak to each other. This network is critical, since it may contain sensible data, and for this reason, it has to be disconnected from other networks, or if not possible, have very limited connectivity. The external network is used by virtual machines to access the Internet (and vice versa). In this network, all machines will need an IP address reachable from the Web. The tenant network, sometimes even called internal or guest network is the network where the virtual machines can communicate with other virtual machines in the same cloud. This network, in some deployment cases, can be merged with the external network, but this choice has some security drawbacks. The API network is used to expose OpenStack APIs to the users. This network requires IP addresses reachable from the Web, and for this reason, is often merged into the external network. There are cases where provider networks are needed to connect tenant networks to existing networks outside the OpenStack cluster. Those networks are created by the OpenStack administrator and map directly to an existing physical network in the data center. Summary In this article, we have seen how networking works, which attacks we can expect, and how we can counter them. Also, we have seen how to implement a secure deployment of OpenStack Networking. Resources for Article: Further resources on this subject: Cloud distribution points [Article] Photo Stream with iCloud [Article] Integrating Accumulo into Various Cloud Platforms [Article]
Read more
  • 0
  • 0
  • 12711
article-image-creating-functions-and-operations
Packt
10 Aug 2015
18 min read
Save for later

Creating Functions and Operations

Packt
10 Aug 2015
18 min read
In this article by Alex Libby, author of the book Sass Essentials, we will learn how to use operators or functions to construct a whole site theme from just a handful of colors, or defining font sizes for the entire site from a single value. You will learn how to do all these things in this article. Okay, so let's get started! (For more resources related to this topic, see here.) Creating values using functions and operators Imagine a scenario where you're creating a masterpiece that has taken days to put together, with a stunning choice of colors that has taken almost as long as the building of the project and yet, the client isn't happy with the color choice. What to do? At this point, I'm sure that while you're all smiles to the customer, you'd be quietly cursing the amount of work they've just landed you with, this late on a Friday. Sound familiar? I'll bet you scrap the colors and go back to poring over lots of color combinations, right? It'll work, but it will surely take a lot more time and effort. There's a better way to achieve this; instead of creating or choosing lots of different colors, we only need to choose one and create all of the others automatically. How? Easy! When working with Sass, we can use a little bit of simple math to build our color palette. One of the key tenets of Sass is its ability to work out values dynamically, using nothing more than a little simple math; we could define font sizes from H1 to H6 automatically, create new shades of colors, or even work out the right percentages to use when creating responsive sites! We will take a look at each of these examples throughout the article, but for now, let's focus on the principles of creating our colors using Sass. Creating colors using functions We can use simple math and functions to create just about any type of value, but colors are where these two really come into their own. The great thing about Sass is that we can work out the hex value for just about any color we want to, from a limited range of colors. This can easily be done using techniques such as adding two values together, or subtracting one value from another. To get a feel of how the color operators work, head over to the official documentation at http://sass-lang.com/documentation/file.SASS_REFERENCE.html#color_operations—it is worth reading! Nothing wrong with adding or subtracting values—it's a perfectly valid option, and will result in a valid hex code when compiled. But would you know that both values are actually deep shades of blue? Therein lies the benefit of using functions; instead of using math operators, we can simply say this: p { color: darken(#010203, 10%); } This, I am sure you will agree, is easier to understand as well as being infinitely more readable! The use of functions opens up a world of opportunities for us. We can use any one of the array of functions such as lighten(), darken(), mix(), or adjust-hue() to get a feel of how easy it is to get the values. If we head over to http://jackiebalzer.com/color, we can see that the author has exploded a number of Sass (and Compass—we will use this later) functions, so we can see what colors are displayed, along with their numerical values, as soon as we change the initial two values. Okay, we could play with the site ad infinitum, but I feel a demo coming on—to explore the effects of using the color functions to generate new colors. Let's construct a simple demo. For this exercise, we will dig up a copy of the colorvariables demo and modify it so that we're only assigning one color variable, not six. For this exercise, I will assume you are using Koala to compile the code. Okay, let's make a start: We'll start with opening up a copy of colorvariables.scss in your favorite text editor and removing lines 1 to 15 from the start of the file. Next, add the following lines, so that we should be left with this at the start of the file: $darkRed: #a43; $white: #fff; $black: #000;   $colorBox1: $darkRed; $colorBox2: lighten($darkRed, 30%); $colorBox3: adjust-hue($darkRed, 35%); $colorBox4: complement($darkRed); $colorBox5: saturate($darkRed, 30%); $colorBox6: adjust-color($darkRed, $green: 25); Save the file as colorfunctions.scss. We need a copy of the markup file to go with this code, so go ahead and extract a copy of colorvariables.html from the code download, saving it as colorfunctions.html in the root of our project area. Don't forget to change the link for the CSS file within to colorfunctions.css! Fire up Koala, then drag and drop colorfunctions.scss from our project area over the main part of the application window to add it to the list: Right-click on the file name and select Compile, and then wait for it to show Success in a green information box. If we preview the results of our work in a browser, we should see the following boxes appear: At this point, we have a working set of colors—granted, we might have to work a little on making sure that they all work together. But the key point here is that we have only specified one color, and that the others are all calculated automatically through Sass. Now that we are only defining one color by default, how easy is it to change the colors in our code? Well, it is a cinch to do so. Let's try it out using the help of the SassMeister playground. Changing the colors in use We can easily change the values used in the code, and continue to refresh the browser after each change. However, this isn't a quick way to figure out which colors work; to get a quicker response, there is an easier way: use the online Sass playground at http://www.sassmeister.com. This is the perfect way to try out different colors—the site automatically recompiles the code and updates the result as soon as we make a change. Try copying the HTML and SCSS code into the play area to view the result. The following screenshot shows the same code used in our demo, ready for us to try using different calculations: All images work on the principle that we take a base color (in this case, $dark-blue, or #a43), then adjust the color either by a percentage or a numeric value. When compiled, Sass calculates what the new value should be and uses this in the CSS. Take, for example, the color used for #box6, which is a dark orange with a brown tone, as shown in this screenshot: To get a feel of some of the functions that we can use to create new colors (or shades of existing colors), take a look at the main documentation at http://sass-lang.com/documentation/Sass/Script/Functions.html, or https://www.makerscabin.com/web/sass/learn/colors. These sites list a variety of different functions that we can use to create our masterpiece. We can also extend the functions that we have in Sass with the help of custom functions, such as the toolbox available at https://github.com/at-import/color-schemer—this may be worth a look. In our demo, we used a dark red color as our base. If we're ever stuck for ideas on colors, or want to get the right HEX, RGB(A), or even HSL(A) codes, then there are dozens of sites online that will give us these values. Here are a couple of them that you can try: HSLa Explorer, by Chris Coyier—this is available at https://css-tricks.com/examples/HSLaExplorer/. HSL Color Picker by Brandon Mathis—this is available at http://hslpicker.com/. If we know the name, but want to get a Sass value, then we can always try the list of 1,500+ colors at https://github.com/FearMediocrity/sass-color-palettes/blob/master/colors.scss. What's more, the list can easily be imported into our CSS, although it would make better sense to simply copy the chosen values into our Sass file, and compile from there instead. Mixing colors The one thing that we've not discussed, but is equally useful is that we are not limited to using functions on their own; we can mix and match any number of functions to produce our colors. A great way to choose colors, and get the appropriate blend of functions to use, is at http://sassme.arc90.com/. Using the available sliders, we can choose our color, and get the appropriate functions to use in our Sass code. The following image shows how: In most cases, we will likely only need to use two functions (a mix of darken and adjust hue, for example); if we are using more than two–three functions, then we should perhaps rethink our approach! In this case, a better alternative is to use Sass's mix() function, as follows: $white: #fff; $berry: hsl(267, 100%, 35%); p { mix($white, $berry, 0.7) } …which will give the following valid CSS: p { color: #5101b3; } This is a useful alternative to use in place of the command we've just touched on; after all, would you understand what adjust_hue(desaturate(darken(#db4e29, 2), 41), 67) would give as a color? Granted, it is something of an extreme calculation, nonetheless, it is technically valid. If we use mix() instead, it matches more closely to what we might do, for example, when mixing paint. After all, how else would we lighten its color, if not by adding a light-colored paint? Okay, let's move on. What's next? I hear you ask. Well, so far we've used core Sass for all our functions, but it's time to go a little further afield. Let's take a look at how you can use external libraries to add extra functionality. In our next demo, we're going to introduce using Compass, which you will often see being used with Sass. Using an external library So far, we've looked at using core Sass functions to produce our colors—nothing wrong with this; the question is, can we take things a step further? Absolutely, once we've gained some experience with using these functions, we can introduce custom functions (or helpers) that expand what we can do. A great library for this purpose is Compass, available at http://www.compass-style.org; we'll make use of this to change the colors which we created from our earlier boxes demo, in the section, Creating colors using functions. Compass is a CSS authoring framework, which provides extra mixins and reusable patterns to add extra functionality to Sass. In our demo, we're using shade(), which is one of the several color helpers provided by the Compass library. Let's make a start: We're using Compass in this demo, so we'll begin with installing the library. To do this, fire up Command Prompt, then navigate to our project area. We need to make sure that our installation RubyGems system software is up to date, so at Command Prompt, enter the following, and then press Enter: gem update --system Next, we're installing Compass itself—at the prompt, enter this command, and then press Enter: gem install compass Compass works best when we get it to create a project shell (or template) for us. To do this, first browse to http://www.compass-style.org/install, and then enter the following in the Tell us about your project… area: Leave anything in grey text as blank. This produces the following commands—enter each at Command Prompt, pressing Enter each time: Navigate back to Command Prompt. We need to compile our SCSS code, so go ahead and enter this command at the prompt (or copy and paste it), then press Enter: compass watch –sourcemap Next, extract a copy of the colorlibrary folder from the code download, and save it to the project area. In colorlibrary.scss, comment out the existing line for $backgrd_box6_color, and add the following immediately below it: $backgrd_box6_color: shade($backgrd_box5_color, 25%); Save the changes to colorlibrary.scss. If all is well, Compass's watch facility should kick in and recompile the code automatically. To verify that this has been done, look in the css subfolder of the colorlibrary folder, and you should see both the compiled CSS and the source map files present. If you find Compass compiles files in unexpected folders, then try using the following command to specify the source and destination folders when compiling: compass watch --sass-dir sass --css-dir css If all is well, we will see the boxes, when previewing the results in a browser window, as in the following image. Notice how Box 6 has gone a nice shade of deep red (if not almost brown)? To really confirm that all the changes have taken place as required, we can fire up a DOM inspector such as Firebug; a quick check confirms that the color has indeed changed: If we explore even further, we can see that the compiled code shows that the original line for Box 6 has been commented out, and that we're using the new function from the Compass helper library: This is a great way to push the boundaries of what we can do when creating colors. To learn more about using the Compass helper functions, it's worth exploring the official documentation at http://compass-style.org/reference/compass/helpers/colors/. We used the shade() function in our code, which darkens the color used. There is a key difference to using something such as darken() to perform the same change. To get a feel of the difference, take a look at the article on the CreativeBloq website at http://www.creativebloq.com/css3/colour-theming-sass-and-compass-6135593, which explains the difference very well. The documentation is a little lacking in terms of how to use the color helpers; the key is not to treat them as if they were normal mixins or functions, but to simply reference them in our code. To explore more on how to use these functions, take a look at the article by Antti Hiljá at http://clubmate.fi/how-to-use-the-compass-helper-functions/. We can, of course, create mixins to create palettes—for a more complex example, take a look at http://www.zingdesign.com/how-to-generate-a-colour-palette-with-compass/ to understand how such a mixin can be created using Compass. Okay, let's move on. So far, we've talked about using functions to manipulate colors; the flip side is that we are likely to use operators to manipulate values such as font sizes. For now, let's change tack and take a look at creating new values for changing font sizes. Changing font sizes using operators We already talked about using functions to create practically any value. Well, we've seen how to do it with colors; we can apply similar principles to creating font sizes too. In this case, we set a base font size (in the same way that we set a base color), and then simply increase or decrease font sizes as desired. In this instance, we won't use functions, but instead, use standard math operators, such as add, subtract, or divide. When working with these operators, there are a couple of points to remember: Sass math functions preserve units—this means we can't work on numbers with different units, such as adding a px value to a rem value, but can work with numbers that can be converted to the same format, such as inches to centimeters If we multiply two values with the same units, then this will produce square units (that is, 10px * 10px == 100px * px). At the same time, px * px will throw an error as it is an invalid unit in CSS. There are some quirks when working with / as a division operator —in most instances, it is normally used to separate two values, such as defining a pair of font size values. However, if the value is surrounded in parentheses, used as a part of another arithmetic expression, or is stored in a variable, then this will be treated as a division operator. For full details, it is worth reading the relevant section in the official documentation at http://sass-lang.com/documentation/file.Sass_REFERENCE.html#division-and-slash. With these in mind, let's create a simple demo—a perfect use for Sass is to automatically work out sizes from H1 through to H6. We could just do this in a simple text editor, but this time, let's break with tradition and build our demo directly into a session on http://www.sassmeister.com. We can then play around with the values set, and see the effects of the changes immediately. If we're happy with the results of our work, we can copy the final version into a text editor and save them as standard SCSS (or CSS) files. Let's begin by browsing to http://www.sassmeister.com, and adding the following HTML markup window: <html> <head>    <meta charset="utf-8" />    <title>Demo: Assigning colors using variables</title>    <link rel="stylesheet" type="text/css" href="css/     colorvariables.css"> </head> <body>    <h1>The cat sat on the mat</h1>    <h2>The cat sat on the mat</h2>    <h3>The cat sat on the mat</h3>    <h4>The cat sat on the mat</h4>    <h5>The cat sat on the mat</h5>    <h6>The cat sat on the mat</h6> </body> </html> Next, add the following to the SCSS window—we first set a base value of 3.0, followed by a starting color of #b26d61, or a dark, moderate red: $baseSize: 3.0; $baseColor: #b26d61; We need to add our H1 to H6 styles. The rem mixin was created by Chris Coyier, at https://css-tricks.com/snippets/css/less-mixin-for-rem-font-sizing/. We first set the font size, followed by setting the font color, using either the base color set earlier, or a function to produce a different shade: h1 { font-size: $baseSize; color: $baseColor; }   h2 { font-size: ($baseSize - 0.2); color: darken($baseColor, 20%); }   h3 { font-size: ($baseSize - 0.4); color: lighten($baseColor, 10%); }   h4 { font-size: ($baseSize - 0.6); color: saturate($baseColor, 20%); }   h5 { font-size: ($baseSize - 0.8); color: $baseColor - 111; }   h6 { font-size: ($baseSize - 1.0); color: rgb(red($baseColor) + 10, 23, 145); } SassMeister will automatically compile the code to produce a valid CSS, as shown in this screenshot: Try changing the base size of 3.0 to a different value—using http://www.sassmeister.com, we can instantly see how this affects the overall size of each H value. Note how we're multiplying the base variable by 10 to set the pixel value, or simply using the value passed to render each heading. In each instance, we can concatenate the appropriate unit using a plus (+) symbol. We then subtract an increasing value from $baseSize, before using this value as the font size for the relevant H value. You can see a similar example of this by Andy Baudoin as a CodePen, at http://codepen.io/baudoin/pen/HdliD/. He makes good use of nesting to display the color and strength of shade. Note that it uses a little JavaScript to add the text of the color that each line represents, and can be ignored; it does not affect the Sass used in the demo. The great thing about using a site such SassMeister is that we can play around with values and immediately see the results. For more details on using number operations in Sass, browse to the official documentation, which is at http://sass-lang.com/documentation/file.Sass_REFERENCE.html#number_operations. Okay, onwards we go. Let's turn our attention to creating something a little more substantial; we're going to create a complete site theme using the power of Sass and a few simple calculations. Summary Phew! What a tour! One of the key concepts of Sass is the use of functions and operators to create values, so let's take a moment to recap what we have covered throughout this article. We kicked off with a look at creating color values using functions, before discovering how we can mix and match different functions to create different shades, or using external libraries to add extra functionality to Sass. We then moved on to take a look at another key use of functions, with a look at defining different font sizes, using standard math operators. Resources for Article: Further resources on this subject: Nesting, Extend, Placeholders, and Mixins [article] Implementation of SASS [article] Constructing Common UI Widgets [article]
Read more
  • 0
  • 0
  • 4138

article-image-updating-and-building-our-masters
Packt
10 Aug 2015
20 min read
Save for later

Updating and building our masters

Packt
10 Aug 2015
20 min read
In this article by John Henry Krahenbuhl, the author of the book, Axure Prototyping Blueprints, we determine that with modification, we can use all of the masters from the previous community site. To support our new use cases, we need additional registration variables, a master to support user registration, and interactions for the creation of, and to comment on, posts. Next we will create global variables and add new masters, as well as enhance the design and interactions for each master. (For more resources related to this topic, see here.) Creating additional global variables Based on project requirements, we identified that nine global variables will be required. To create global variables, on the main menu click on Project and then click on Global Variables…. In the Global Variables dialog, perform the following steps: Click the green + sign and type Email. Click on the Default Value field and type songwriter@test.com. Repeat step 1 eight more times to create additional variables using the following table for the Variable Name and Default Value fields: Variable Name Default Value Password Grammy UserEmail   UserPassword   LoggedIn No TopicIndex 0 UserText   NewPostTopic   NewPostHeadline   Click on OK. With our global variables created, we are now ready to create new masters, as well as update the design and interactions for existing masters. We will start by adding masters to the Masters pane. Adding masters to the Masters pane We will add a total of two masters to the Masters pane. To create our masters, perform the following steps: In the Masters pane, click on the, Add Master icon ,type PostCommentary and press Enter. Again, in the Masters pane, click on the Add Master icon , type NewPost and press Enter. In the same Masters pane, right-click on the icon next to the Header master, mouse over Drop Behavior and click on Lock to Master Location. We are now ready to remodel the existing masters and complete the design and interactions for our new masters. We will start with the Header master. Enhancing our Header master Once completed, the Header master will look as follows: To update the Header master, we will add an ErrorMessage label, delete the Search widgets, and update the menu items. To update widgets on the Header master, perform the following steps: In the Masters pane, double-click on the icon  next to the Header master to open in the design area. In the Widgets pane, drag the Label widget  and place it at coordinates (730,0). Now, select the Text Field widget and type Your email or password is incorrect.. In the Widget Interactions and Notes pane, click in the Shape Name field and type ErrorMessage. In the Widget Properties and Style pane, with the Style tab selected, scroll to Font and perform the following steps: Change the font size to 8. Click on the down arrow next to the Text Color icon . In the drop-down menu, in the # text field, enter FF0000. In the toolbar, click on the checkbox next to Hidden. Click on the EmailTextField at coordinates (730,10). If text is displayed on the text field, right-click and click Edit Text. All text on the widget will be highlighted, click on Delete. In the Widget Properties and Style pane, with the Properties tab selected, scroll to Text Field and perform the following steps: Next to Hint Text, enter Email. Click Hint Style. In the Set Interaction Styles dialog box, click on the checkbox next to Font Color. Click on the down arrow next to the Text Color icon . In the drop-down menu, in the # text field, enter 999999. Click on OK. Click on the PasswordTextField at coordinates (815,10). If text is displayed on the text field, right-click and click on Edit Text. All text on the widget will be highlighted, press Delete. In the Widget Properties and Style pane, with the Properties tab selected, scroll to Text Field and perform the following steps: Click on the drop-down menu next to Type and select Password. Next to Hint Text, enter Password. Click on Hint Style. In the Set Interaction Styles dialog box, click on the checkbox next to Font Color. Click on the down arrow next to the Text Color icon . In the drop-down menu, in the # text field, enter 999999. Click on OK. Click on the SearchTextField at coordinates (730,82) and then on Delete. Click on the SearchButton at coordinates (890,80) and then on Delete. Next, we will convert all the Log In widgets into a dynamic panel named LoginDP. The LoginDP will allow us to transition between states and show different content when a user logs in. To create the LoginDP, in our header, select the following widgets: Named Widget Coordinates ErrorMessage (730,0) EmailTextField (730,10) PasswordTextField (815,10) LogInButton (894,10) NewUserLink (730,30) ForgotLink (815,30) With the preceding six widgets selected, right-click and click Convert to Dynamic Panel. In the Widget Interactions and Notes pane, click on the Dynamic Panel Name field and type LogInDP. All the Log In widgets are now on State1 of the LogInDP. We will now add widgets to State2 for the LogInDP. With the Log In widgets converted into the LogInDP, we will now add and design State2. In the Widget Manager pane, under the LogInDP, right-click on State1, and in the menu, click on Add State. Click on the State icon beside  State2 twice, to open in the design area. Perform the following steps: In the Widgets pane, drag the Label widget  and place it at coordinates (0,13) and do the these steps: Type Welcome, email@test.com. In the Widget Interactions and Notes pane, click in the Shape Name field and type WelcomeLabel. In the Widget Properties and Style pane, with the Style tab selected scroll to Font, change the font size to 9, and click on the Italic icon . In the Widgets pane, drag the Button Shape widget  and place it at coordinates (164,10). Type Log Out. In the toolbar, change w: to 56 and h: to 16. In the Widget Interactions and Notes pane, click on the Shape Name field and type LogOutButton. To complete the design of the Header master, we need to rename the menu items on the HzMenu. In the Masters pane, double-click on the Header master to open in the design area. Click on the HzMenu at coordinates (250,80). Perform the following steps: Click on the first menu item and type Random Musings. In the Widget Interactions and Notes pane, click on the Menu Item Name field and type RandomMusingsMenuItem. Click on Case 1 under the OnClick event and press the Delete key. Click on Create Link…. In the pop-up sitemap, click on Random Musings. Again, click on the first menu item and type Accolades and News. In the Widget Interactions and Notes pane, click on the Menu Item Name field and type AccoladesMenuItem. Click on Case 1 under the OnClick event and press the Delete key. Click on Create Link…. In the pop-up sitemap, click on Accolades and News. Click on the first menu item and type About. In the Widget Interactions and Notes pane, click on the Menu Item Name field and type AboutMenuItem. Click on Case 1 under the OnClick event and press the Delete key. Click on Create Link…. In the pop-up sitemap, click on About. We will now create a registration lightbox that will be shown when the user clicks on the NewUserLink. To display a dynamic panel in a lightbox, we will use the OnShow action with the option treat as lightbox set. We will use the Registration dynamic panel's Pin to Browser property to have the dynamic panel shown in the center and middle of the window. Learn more at http://www.axure.com/learn/dynamic-panels/basic/lightbox-tutorial. In the Masters pane, double-click on the icon  next to the Header master to open in the design area. In the Widgets pane, drag the Dynamic Panel widget  and place it at coordinates (310,200). In the toolbar, change w: to 250, h: to 250, and click on the Hidden checkbox. In the Widget Interactions and Notes pane, click on the Dynamic Panel Name field and type RegistrationLightBoxDP. In the Widget Manager pane with the Properties tab selected, click on Pin to Browser. In the Pin to Browser dialog box, click on the checkbox next to Pin to browser window and click on OK. In the Widget Manager pane, under the RegistrationLightBoxDP, click on the State icon  beside State1 twice to open in the design area. In the Widgets pane, drag the Rectangle widget  and place it at coordinates (0,0). In the Widget Interactions and Notes pane, click on the Shape Name field and type BackgroundRectangle. In the toolbar, change w: to 250 and h: to 250. Again in the Widgets pane, drag the Heading2 widget  and place it at coordinates (25,20). With the Heading2 widget selected, type Registration. In the toolbar, change w: to 141 and h: to 28. In the Widget Interactions and Notes pane, click on the Shape Name field and type RegistrationHeading. Repeat steps 8-10 to complete the design of the RegistrationLightBoxDP using the following table (* if applicable): Widget Coordinates Text* (Shown on Widget) Width* (w:) Height* (h:) Name field (In the Widget Interactions and Notes pane)   Label (25,67) Enter Email     EnterEmailLabel   Text Field (25,86)       EnterEmailField   Label (25,121) Enter Password     EnterPasswordLabel   Text Field (25,140)       EnterPasswordField   Button Shape (25,190) Submit 200 30 SubmitButton Click on the EnterEmailField text field at coordinates (25,86). In the Widget Properties and Style pane, with the Properties tab selected, scroll to Text Field and perform the following steps: Next to Hint Text, enter Email. Click on Hint Style. In the Set Interaction Styles dialog box, click on the checkbox next to Font Color. Click on the down arrow next to the Text Color icon . In the drop-down menu, in the # text field, enter 999999. Click on OK. Click on the EnterPasswordField text field at coordinates (25,140). In the Widget Properties and Style pane, with the Properties tab selected, scroll to Text Field and perform the following steps: Click on the drop-down menu next to Type and select Password. Next to Hint Text, enter Password. Click on Hint Style. In the Set Interaction Styles dialog box, click on the checkbox next to Font Color. Click on the down arrow next to the Text Color icon . In the drop-down menu, in the # text field, enter 999999. Click on OK. With the updates completed for the Header master, we are now ready to define the interactions. Refining the interactions for our Header master We will need to add additional interactions for Log In and Registration on our Header master. Interactions with our Header master will be triggered by the following named widgets and events: Dynamic Panel State Widget Event LoginDP State1 LoginButton OnClick LoginDP State1 NewUserLink OnClick LoginDP State1 ForgotLink OnClick LoginDP State2 LogOutButton OnClick RegistrationLightBoxDP State1 SubmitButton OnClick We will now define the interactions for each widget, starting with LoginButton. Defining interactions for the LoginButton When the LoginButton is clicked, the OnClick event will evaluate if the text entered in the EmailTextField and PasswordTextField equals the e-mail and password variable values. If the variables are valid, LoginDP will be set to State2 and text on the WelcomeLabel will be updated. If the variables values are not equal, we will show an error message. We will define these actions by creating two cases: ValidateUser and ShowErrorMessage. Validating the user's email and password To define the ValidateUser case for the OnClick interaction, open the LogInDP State1 in the design area. Click on the LogInButton at coordinates (164,10). In the Widget Interactions and Notes pane with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ValidateUser. In the Case Editor dialog, perform the following steps: You will see the Condition Builder window similar to the one shown in the following screenshot after the first and second conditions are defined: Create the first condition. Click on the Add Condition button. In the Condition Builder dialog box, in the outlined condition box, perform the following steps: In the first dropdown, select text on widget. In the second dropdown, select EmailTextField. In the third dropdown, select equals. In the fourth dropdown, select value. In the fifth dropdown, select [[Email]]. Click the green + sign. Create the second condition. Click on the Add Condition button. In the Condition Builder dialog box, in the outlined condition box, perform the following steps: In the first dropdown, select text on widget. In the second dropdown, select PasswordTextField. In the third dropdown, select equals. In the fourth dropdown, select value. In the fifth dropdown, select [[Password]]. Click on OK. Once the following three actions are defined, you should see the Case Editor similar to the one shown in the following screenshot: Create the first action. To set panel state for the LogInDP dynamic panel, perform the following steps: Under Click to add actions, scroll to the Dynamic Panels drop-down menu and click on Set Panel State. Under Configure actions, click on the checkbox next to LoginDP. Next to Select the state, click on the dropdown and select State2. Create the second action. To set text for the WelcomeLabel, perform the following steps: Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text. Under Configure actions, click the checkbox next to WelcomeLabel. Under Set text to, click on the dropdown and select value. In the text field, enter Welcome, [[Email]]. Create the third action. To set value of the LoggedIn variable, perform the following steps: Under Click to add actions, scroll to the Variables drop-down menu and click on Set Variable Value. Under Configure actions, click on the checkbox next to LoggedIn. Under Set variable to, click on the first dropdown and click on value. In the text field, enter [[Email]]. Click on OK. With the ValidateUser case completed, next we will create the ShowErrorMessage case. Creating the ShowErrorMessage case To create the ShowErrorMessage case, in the Widget Interactions and Notes pane with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowErrorMessage. Create the action. To show the ErrorMessage label, perform the following steps: Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown and click on Show. Under Configure actions, under LoginDP dynamic panel, click on the checkbox next to ErrorMessage. Click on OK. Next, we will enable the interaction for the NewUserLink. Enabling interaction for the NewUserLink When the NewUserLink is clicked, the OnClick event will show the RegistrationLightBox dynamic panel as a lightbox, as shown in the following screenshot: With the LogInDP State1 still opened in the design area, click on the NewUserLink at coordinates (0,30). To enable the OnClick event in the Widget Interactions and Notes pane with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowLightBox. Now, create the action; to show the RegistrationLightBox, perform the following steps: Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Show. Under Configure actions, click on the checkbox next to RegistrationLightBoxDP. Next go to More options, click on the dropdown and select treat as lightbox. Click on OK. Next, we will activate interactions for the ForgotLink. Activating interactions for the ForgotLink When the ForgotLink is clicked, the OnClick event will show the RegistrationLightBox dynamic panel as a lightbox, the RegistrationHeading text will be updated to display Forgot Password? and the EnterPassworldLabel, as well as the EnterPasswordField, will be hidden. To enable the OnClick event, in the Widget Interactions and Notes pane with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowForgotLB. In the Case Editor dialog, perform the following steps: Create the first action; to show the RegistrationLightBox, perform the following steps: Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown and click on Show. Under Configure actions, click on the checkbox next to RegistrationLightBoxDP. Next, go to More options, click on the dropdown and select treat as lightbox. Create the second action; to set text for the RegistrationHeading, perform the following steps: Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text. Under Configure actions, click on the checkbox next to RegistrationHeading. Under Set text to, click on the dropdown and select value. In the text field, enter Forgot Password?. Create the third action; to hide the EnterPasswordLabel and EnterPasswordField, perform the following steps: Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Hide. Under Configure actions, under RegistrationLightBoxDP, click on the checkboxes next to EnterPasswordLabel and EnterPasswordField. Click on OK. We have now completed the interactions for State1 of LoginDP. Next, we will facilitate interactions for the LogOutButton. Facilitating interactions for the LogOutButton When the LogOutButton is clicked, the OnClick event will perform the following actions: Hide the ErrorMessage on the LoginDP State1 Set text for PasswordTextField and EmailTextField Set panel state for LoginDP to State1 Set variable value for LoggedIn To enable the OnClick event, open the LogInDP State2 in the design area. Click on the LogInOut at coordinates (164,10). In the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type LogOut. In the Case Editor dialog, perform the following steps: Create the first action; to hide the ErrorMessage, perform the following steps: Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Hide. Under Configure actions, under LoginDP, click on the checkbox next to ErrorMessage. Create the second action; to set text for the PasswordTextField and EmailTextField, perform the following steps: Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text. Under Configure actions, click the checkbox next to PasswordTextField. Under Set text to, click the dropdown and select value. In the text field, clear any text shown. Under Configure actions, click the checkbox next to EmailTextField. Under Set text to, click on the dropdown and select value. In the text field, enter Email. Create the third action; to set panel state for the LogInDP dynamic panel, perform the following steps: Under Click to add actions, scroll to the Dynamic Panels drop-down menu and click on Set Panel State. Under Configure actions, click on the checkbox next to LoginDP. Next to Select the state, click on the dropdown and select State1. Create the fourth action. To set variable value of LoggedIn, perform the following steps: Under Click to add actions, scroll to the Variables drop-down menu and click on Set Variable Value. Under Configure actions, click on the checkbox next to LoggedIn. Under Set variable to, click on the first dropdown and click on value. In the text field, enter No. Click on OK. We have now completed interactions for State2 of the LoginDP. Next, we will construct interactions for the RegistrationLightBoxDP. Constructing interactions for the RegistrationLightBoxDP When the LoginButton is clicked, the OnClick event hides RegistrationLightBoxDp and sets the Email and Password variable values to the text entered in the EnterEmailField and EnterPasswordField. Also, if text on the RegistrationHeading label is equal to Registration, LoginDP will be set to State2. We will define these actions by creating two cases: UpdateVariables and ShowLogInState. Updating Variables and hiding the RegistrationLightBoxDP In the Widget Manger pane, double-click on the RegistrationLightBoxDP State1 to open in the design area. To define the UpdateVariables case for the OnClick interaction, click on the SubmitButton at coordinates (25,190). In the Widget Interactions and Notes pane with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type UpdateVariables. In the Case Editor dialog, perform the following steps: The following screenshot shows Case Editor with the actions defined: Create the first action; to set variable value for the Email and Password variables, perform the following steps: Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Variable Value. Under Configure actions, click on the checkbox next to Email. Under Set variable to, click on the first dropdown and select text on widget. Click on the second dropdown and select EnterEmailField. Under Configure actions, click on the checkbox next to Password. Under Set variable to, click on the first dropdown and select text on widget. Click on the second dropdown and select EnterPasswordField. Create the second action; to hide RegistrationLightBoxDP, perform the following steps: Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown and click on Hide. Under Configure actions, click on the checkbox next to RegistrationLightBoxDP. Click on OK. With the UpdateVariables case completed, next we will create the ShowLogInState case. Creating the ShowLoginState case To create the ShowLogInState case, in the Widget Interactions and Notes pane with the Interactions tab selected click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowLogInState. In the Case Editor dialog, perform the following steps: Click on the Add Condition button to create the first condition. In the Condition Builder dialog box, go to the outlined condition box and perform the following steps: In the first dropdown, select text on widget. In the second dropdown, select RegistrationHeadline. In the third dropdown, select equals. In the fourth dropdown, select value. In the fifth dropdown, select Registration. Click on OK. Create the first action; to set text for the WelcomeLabel, perform the following steps: Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text. Under Configure actions, click on the checkbox next to WelcomeLabel. Under Set text to, click on the dropdown and select value. In the text field, enter Welcome, [[Email]]. Click on OK. Create the second action; to set panel state for the LogInDP dynamic panel, perform the following steps: Under Click to add actions, scroll to the Dynamic Panels drop-down menu and click on Set Panel State. Under Configure actions, click on the checkbox next to LoginDP. Next to Select the state, click on the dropdown and select State2. Create the third action; to set value of the LoggedIn variable, perform the following steps: Under Click to add actions, scroll to the Variables drop-down menu and click on Set Variable Value. Under Configure actions, click on the checkbox next to LoggedIn. Under Set variable to, click on the first dropdown and click on value. In the text field, enter [[Email]]. Click on OK. Under the OnClick event, right-click on the ShowErrorMessage case and click on Toggle IF/ELSE IF. With our Header master updated, we are now ready to refresh data for our Forum repeater. Summary We learned how to leverage masters and pages from our community site to create a new blog site. We enhanced the Header master and refined the interactions for our Header master. Resources for Article: Further resources on this subject: Home Page Structure [article] Axure RP 6 Prototyping Essentials: Advanced Interactions [article] Common design patterns and how to prototype them [article]
Read more
  • 0
  • 0
  • 9264

article-image-bayesian-network-fundamentals
Packt
10 Aug 2015
25 min read
Save for later

Bayesian Network Fundamentals

Packt
10 Aug 2015
25 min read
In this article by Ankur Ankan and Abinash Panda, the authors of Mastering Probabilistic Graphical Models Using Python, we'll cover the basics of random variables, probability theory, and graph theory. We'll also see the Bayesian models and the independencies in Bayesian models. A graphical model is essentially a way of representing joint probability distribution over a set of random variables in a compact and intuitive form. There are two main types of graphical models, namely directed and undirected. We generally use a directed model, also known as a Bayesian network, when we mostly have a causal relationship between the random variables. Graphical models also give us tools to operate on these models to find conditional and marginal probabilities of variables, while keeping the computational complexity under control. (For more resources related to this topic, see here.) Probability theory To understand the concepts of probability theory, let's start with a real-life situation. Let's assume we want to go for an outing on a weekend. There are a lot of things to consider before going: the weather conditions, the traffic, and many other factors. If the weather is windy or cloudy, then it is probably not a good idea to go out. However, even if we have information about the weather, we cannot be completely sure whether to go or not; hence we have used the words probably or maybe. Similarly, if it is windy in the morning (or at the time we took our observations), we cannot be completely certain that it will be windy throughout the day. The same holds for cloudy weather; it might turn out to be a very pleasant day. Further, we are not completely certain of our observations. There are always some limitations in our ability to observe; sometimes, these observations could even be noisy. In short, uncertainty or randomness is the innate nature of the world. The probability theory provides us the necessary tools to study this uncertainty. It helps us look into options that are unlikely yet probable. Random variable Probability deals with the study of events. From our intuition, we can say that some events are more likely than others, but to quantify the likeliness of a particular event, we require the probability theory. It helps us predict the future by assessing how likely the outcomes are. Before going deeper into the probability theory, let's first get acquainted with the basic terminologies and definitions of the probability theory. A random variable is a way of representing an attribute of the outcome. Formally, a random variable X is a function that maps a possible set of outcomes ? to some set E, which is represented as follows: X : ? ? E As an example, let us consider the outing example again. To decide whether to go or not, we may consider the skycover (to check whether it is cloudy or not). Skycover is an attribute of the day. Mathematically, the random variable skycover (X) is interpreted as a function, which maps the day (?) to its skycover values (E). So when we say the event X = 40.1, it represents the set of all the days {?} such that  , where  is the mapping function. Formally speaking, . Random variables can either be discrete or continuous. A discrete random variable can only take a finite number of values. For example, the random variable representing the outcome of a coin toss can take only two values, heads or tails; and hence, it is discrete. Whereas, a continuous random variable can take infinite number of values. For example, a variable representing the speed of a car can take any number values. For any event whose outcome is represented by some random variable (X), we can assign some value to each of the possible outcomes of X, which represents how probable it is. This is known as the probability distribution of the random variable and is denoted by P(X). For example, consider a set of restaurants. Let X be a random variable representing the quality of food in a restaurant. It can take up a set of values, such as {good, bad, average}. P(X), represents the probability distribution of X, that is, if P(X = good) = 0.3, P(X = average) = 0.5, and P(X = bad) = 0.2. This means there is 30 percent chance of a restaurant serving good food, 50 percent chance of it serving average food, and 20 percent chance of it serving bad food. Independence and conditional independence In most of the situations, we are rather more interested in looking at multiple attributes at the same time. For example, to choose a restaurant, we won't only be looking just at the quality of food; we might also want to look at other attributes, such as the cost, location, size, and so on. We can have a probability distribution over a combination of these attributes as well. This type of distribution is known as joint probability distribution. Going back to our restaurant example, let the random variable for the quality of food be represented by Q, and the cost of food be represented by C. Q can have three categorical values, namely {good, average, bad}, and C can have the values {high, low}. So, the joint distribution for P(Q, C) would have probability values for all the combinations of states of Q and C. P(Q = good, C = high) will represent the probability of a pricey restaurant with good quality food, while P(Q = bad, C = low) will represent the probability of a restaurant that is less expensive with bad quality food. Let us consider another random variable representing an attribute of a restaurant, its location L. The cost of food in a restaurant is not only affected by the quality of food but also the location (generally, a restaurant located in a very good location would be more costly as compared to a restaurant present in a not-very-good location). From our intuition, we can say that the probability of a costly restaurant located at a very good location in a city would be different (generally, more) than simply the probability of a costly restaurant, or the probability of a cheap restaurant located at a prime location of city is different (generally less) than simply probability of a cheap restaurant. Formally speaking, P(C = high | L = good) will be different from P(C = high) and P(C = low | L = good) will be different from P(C = low). This indicates that the random variables C and L are not independent of each other. These attributes or random variables need not always be dependent on each other. For example, the quality of food doesn't depend upon the location of restaurant. So, P(Q = good | L = good) or P(Q = good | L = bad)would be the same as P(Q = good), that is, our estimate of the quality of food of the restaurant will not change even if we have knowledge of its location. Hence, these random variables are independent of each other. In general, random variables  can be considered as independent of each other, if: They may also be considered independent if: We can easily derive this conclusion. We know the following from the chain rule of probability: P(X, Y) = P(X) P(Y | X) If Y is independent of X, that is, if X | Y, then P(Y | X) = P(Y). Then: P(X, Y) = P(X) P(Y) Extending this result on multiple variables, we can easily get to the conclusion that a set of random variables are independent of each other, if their joint probability distribution is equal to the product of probabilities of each individual random variable. Sometimes, the variables might not be independent of each other. To make this clearer, let's add another random variable, that is, the number of people visiting the restaurant N. Let's assume that, from our experience we know the number of people visiting only depends on the cost of food at the restaurant and its location (generally, lesser number of people visit costly restaurants). Does the quality of food Q affect the number of people visiting the restaurant? To answer this question, let's look into the random variable affecting N, cost C, and location L. As C is directly affected by Q, we can conclude that Q affects N. However, let's consider a situation when we know that the restaurant is costly, that is, C = high and let's ask the same question, "does the quality of food affect the number of people coming to the restaurant?". The answer is no. The number of people coming only depends on the price and location, so if we know that the cost is high, then we can easily conclude that fewer people will visit, irrespective of the quality of food. Hence,  . This type of independence is called conditional independence. Installing tools Let's now see some coding examples using pgmpy, to represent joint distributions and independencies. Here, we will mostly work with IPython and pgmpy (and a few other libraries) for coding examples. So, before moving ahead, let's get a basic introduction to these. IPython IPython is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, which offers enhanced introspection, rich media, additional shell syntax, tab completion, and a rich history. IPython provides the following features: Powerful interactive shells (terminal and Qt-based) A browser-based notebook with support for code, text, mathematical expressions, inline plots, and other rich media Support for interactive data visualization and use of GUI toolkits Flexible and embeddable interpreters to load into one's own projects Easy-to-use and high performance tools for parallel computing You can install IPython using the following command: >>> pip3 install ipython To start the IPython command shell, you can simply type ipython3 in the terminal. For more installation instructions, you can visit http://ipython.org/install.html. pgmpy pgmpy is a Python library to work with Probabilistic Graphical models. As it's currently not on PyPi, we will need to build it manually. You can get the source code from the Git repository using the following command: >>> git clone https://github.com/pgmpy/pgmpy Now cd into the cloned directory switch branch for version used and build it with the following code: >>> cd pgmpy >>> git checkout book/v0.1 >>> sudo python3 setup.py install For more installation instructions, you can visit http://pgmpy.org/install.html. With both IPython and pgmpy installed, you should now be able to run the examples. Representing independencies using pgmpy To represent independencies, pgmpy has two classes, namely IndependenceAssertion and Independencies. The IndependenceAssertion class is used to represent individual assertions of the form of  or  . Let's see some code to represent assertions: # Firstly we need to import IndependenceAssertion In [1]: from pgmpy.independencies import IndependenceAssertion # Each assertion is in the form of [X, Y, Z] meaning X is # independent of Y given Z. In [2]: assertion1 = IndependenceAssertion('X', 'Y') In [3]: assertion1 Out[3]: (X _|_ Y) Here, assertion1 represents that the variable X is independent of the variable Y. To represent conditional assertions, we just need to add a third argument to IndependenceAssertion: In [4]: assertion2 = IndependenceAssertion('X', 'Y', 'Z') In [5]: assertion2 Out [5]: (X _|_ Y | Z) In the preceding example, assertion2 represents . IndependenceAssertion also allows us to represent assertions in the form of  . To do this, we just need to pass a list of random variables as arguments: In [4]: assertion2 = IndependenceAssertion('X', 'Y', 'Z') In [5]: assertion2 Out[5]: (X _|_ Y | Z) Moving on to the Independencies class, an Independencies object is used to represent a set of assertions. Often, in the case of Bayesian or Markov networks, we have more than one assertion corresponding to a given model, and to represent these independence assertions for the models, we generally use the Independencies object. Let's take a few examples: In [8]: from pgmpy.independencies import Independencies # There are multiple ways to create an Independencies object, we # could either initialize an empty object or initialize with some # assertions.   In [9]: independencies = Independencies() # Empty object In [10]: independencies.get_assertions() Out[10]: []   In [11]: independencies.add_assertions(assertion1, assertion2) In [12]: independencies.get_assertions() Out[12]: [(X _|_ Y), (X _|_ Y | Z)] We can also directly initialize Independencies in these two ways: In [13]: independencies = Independencies(assertion1, assertion2) In [14]: independencies = Independencies(['X', 'Y'],                                          ['A', 'B', 'C']) In [15]: independencies.get_assertions() Out[15]: [(X _|_ Y), (A _|_ B | C)] Representing joint probability distributions using pgmpy We can also represent joint probability distributions using pgmpy's JointProbabilityDistribution class. Let's say we want to represent the joint distribution over the outcomes of tossing two fair coins. So, in this case, the probability of all the possible outcomes would be 0.25, which is shown as follows: In [16]: from pgmpy.factors import JointProbabilityDistribution as         Joint In [17]: distribution = Joint(['coin1', 'coin2'],                              [2, 2],                              [0.25, 0.25, 0.25, 0.25]) Here, the first argument includes names of random variable. The second argument is a list of the number of states of each random variable. The third argument is a list of probability values, assuming that the first variable changes its states the slowest. So, the preceding distribution represents the following: In [18]: print(distribution) +--------------------------------------+ ¦ coin1   ¦ coin2   ¦   P(coin1,coin2) ¦ ¦---------+---------+------------------¦ ¦ coin1_0 ¦ coin2_0 ¦   0.2500         ¦ +---------+---------+------------------¦ ¦ coin1_0 ¦ coin2_1 ¦   0.2500         ¦ +---------+---------+------------------¦ ¦ coin1_1 ¦ coin2_0 ¦   0.2500         ¦ +---------+---------+------------------¦ ¦ coin1_1 ¦ coin2_1 ¦   0.2500         ¦ +--------------------------------------+ We can also conduct independence queries over these distributions in pgmpy: In [19]: distribution.check_independence('coin1', 'coin2') Out[20]: True Conditional probability distribution Let's take an example to understand conditional probability better. Let's say we have a bag containing three apples and five oranges, and we want to randomly take out fruits from the bag one at a time without replacing them. Also, the random variables  and  represent the outcomes in the first try and second try respectively. So, as there are three apples and five oranges in the bag initially,  and  . Now, let's say that in our first attempt we got an orange. Now, we cannot simply represent the probability of getting an apple or orange in our second attempt. The probabilities in the second attempt will depend on the outcome of our first attempt and therefore, we use conditional probability to represent such cases. Now, in the second attempt, we will have the following probabilities that depend on the outcome of our first try:  ,  ,  , and  . The Conditional Probability Distribution (CPD) of two variables  and  can be represented as  , representing the probability of  given  that is the probability of  after the event  has occurred and we know it's outcome. Similarly, we can have  representing the probability of  after having an observation for . The simplest representation of CPD is tabular CPD. In a tabular CPD, we construct a table containing all the possible combinations of different states of the random variables and the probabilities corresponding to these states. Let's consider the earlier restaurant example. Let's begin by representing the marginal distribution of the quality of food with Q. As we mentioned earlier, it can be categorized into three values {good, bad, average}. For example, P(Q) can be represented in the tabular form as follows: Quality P(Q) Good 0.3 Normal 0.5 Bad 0.2 Similarly, let's say P(L) is the probability distribution of the location of the restaurant. Its CPD can be represented as follows: Location P(L) Good 0.6 Bad 0.4 As the cost of restaurant C depends on both the quality of food Q and its location L, we will be considering P(C | Q, L), which is the conditional distribution of C, given Q and L: Location Good Bad Quality Good Normal Bad Good Normal Bad Cost             High 0.8 0.6 0.1 0.6 0.6 0.05 Low 0.2 0.4 0.9 0.4 0.4 0.95 Representing CPDs using pgmpy Let's first see how to represent the tabular CPD using pgmpy for variables that have no conditional variables: In [1]: from pgmpy.factors import TabularCPD   # For creating a TabularCPD object we need to pass three # arguments: the variable name, its cardinality that is the number # of states of the random variable and the probability value # corresponding each state. In [2]: quality = TabularCPD(variable='Quality',                              variable_card=3,                                values=[[0.3], [0.5], [0.2]]) In [3]: print(quality) +----------------------+ ¦ ['Quality', 0] ¦ 0.3 ¦ +----------------+-----¦ ¦ ['Quality', 1] ¦ 0.5 ¦ +----------------+-----¦ ¦ ['Quality', 2] ¦ 0.2 ¦ +----------------------+ In [4]: quality.variables Out[4]: OrderedDict([('Quality', [State(var='Quality', state=0),                                  State(var='Quality', state=1),                                  State(var='Quality', state=2)])])   In [5]: quality.cardinality Out[5]: array([3])   In [6]: quality.values Out[6]: array([0.3, 0.5, 0.2]) You can see here that the values of the CPD are a 1D array instead of a 2D array, which you passed as an argument. Actually, pgmpy internally stores the values of the TabularCPD as a flattened numpy array. In [7]: location = TabularCPD(variable='Location',                               variable_card=2,                              values=[[0.6], [0.4]]) In [8]: print(location) +-----------------------+ ¦ ['Location', 0] ¦ 0.6 ¦ +-----------------+-----¦ ¦ ['Location', 1] ¦ 0.4 ¦ +-----------------------+ However, when we have conditional variables, we also need to specify them and the cardinality of those variables. Let's define the TabularCPD for the cost variable: In [9]: cost = TabularCPD(                      variable='Cost',                      variable_card=2,                      values=[[0.8, 0.6, 0.1, 0.6, 0.6, 0.05],                              [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]],                      evidence=['Q', 'L'],                      evidence_card=[3, 2]) Graph theory The second major framework for the study of probabilistic graphical models is graph theory. Graphs are the skeleton of PGMs, and are used to compactly encode the independence conditions of a probability distribution. Nodes and edges The foundation of graph theory was laid by Leonhard Euler when he solved the famous Seven Bridges of Konigsberg problem. The city of Konigsberg was set on both sides by the Pregel river and included two islands that were connected and maintained by seven bridges. The problem was to find a walk to exactly cross all the bridges once in a single walk. To visualize the problem, let's think of the graph in Fig 1.1: Fig 1.1: The Seven Bridges of Konigsberg graph Here, the nodes a, b, c, and d represent the land, and are known as vertices of the graph. The line segments ab, bc, cd, da, ab, and bc connecting the land parts are the bridges and are known as the edges of the graph. So, we can think of the problem of crossing all the bridges once in a single walk as tracing along all the edges of the graph without lifting our pencils. Formally, a graph G = (V, E) is an ordered pair of finite sets. The elements of the set V are known as the nodes or the vertices of the graph, and the elements of  are the edges or the arcs of the graph. The number of nodes or cardinality of G, denoted by |V|, are known as the order of the graph. Similarly, the number of edges denoted by |E| are known as the size of the graph. Here, we can see that the Konigsberg city graph shown in Fig 1.1 is of order 4 and size 7. In a graph, we say that two vertices, u, v ? V are adjacent if u, v ? E. In the City graph, all the four vertices are adjacent to each other because there is an edge for every possible combination of two vertices in the graph. Also, for a vertex v ? V, we define the neighbors set of v as  . In the City graph, we can see that b and d are neighbors of c. Similarly, a, b, and c are neighbors of d. We define an edge to be a self loop if the start vertex and the end vertex of the edge are the same. We can put it more formally as, any edge of the form (u, u), where u ? V is a self loop. Until now, we have been talking only about graphs whose edges don't have a direction associated with them, which means that the edge (u, v) is same as the edge (v, u). These types of graphs are known as undirected graphs. Similarly, we can think of a graph whose edges have a sense of direction associated with it. For these graphs, the edge set E would be a set of ordered pair of vertices. These types of graphs are known as directed graphs. In the case of a directed graph, we also define the indegree and outdegree for a vertex. For a vertex v ? V, we define its outdegree as the number of edges originating from the vertex v, that is,  . Similarly, the indegree is defined as the number of edges that end at the vertex v, that is,  . Walk, paths, and trails For a graph G = (V, E) and u,v ? V, we define a u - v walk as an alternating sequence of vertices and edges, starting with u and ending with v. In the City graph of Fig 1.1, we can have an example of a - d walk as . If there aren't multiple edges between the same vertices, then we simply represent a walk by a sequence of vertices. As in the case of the Butterfly graph shown in Fig 1.2, we can have a walk W : a, c, d, c, e: Fig 1.2: Butterfly graph—a undirected graph A walk with no repeated edges is known as a trail. For example, the walk  in the City graph is a trail. Also, a walk with no repeated vertices, except possibly the first and the last, is known as a path. For example, the walk  in the City graph is a path. Also, a graph is known as cyclic if there are one or more paths that start and end at the same node. Such paths are known as cycles. Similarly, if there are no cycles in a graph, it is known as an acyclic graph. Bayesian models In most of the real-life cases when we would be representing or modeling some event, we would be dealing with a lot of random variables. Even if we would consider all the random variables to be discrete, there would still be exponentially large number of values in the joint probability distribution. Dealing with such huge amount of data would be computationally expensive (and in some cases, even intractable), and would also require huge amount of memory to store the probability of each combination of states of these random variables. However, in most of the cases, many of these variables are marginally or conditionally independent of each other. By exploiting these independencies, we can reduce the number of values we need to store to represent the joint probability distribution. For instance, in the previous restaurant example, the joint probability distribution across the four random variables that we discussed (that is, quality of food Q, location of restaurant L, cost of food C, and the number of people visiting N) would require us to store 23 independent values. By the chain rule of probability, we know the following: P(Q, L, C, N) = P(Q) P(L|Q) P(C|L, Q) P(N|C, Q, L) Now, let us try to exploit the marginal and conditional independence between the variables, to make the representation more compact. Let's start by considering the independency between the location of the restaurant and quality of food over there. As both of these attributes are independent of each other, P(L|Q) would be the same as P(L). Therefore, we need to store only one parameter to represent it. From the conditional independence that we have seen earlier, we know that  . Thus, P(N|C, Q, L) would be the same as P(N|C, L); thus needing only four parameters. Therefore, we now need only (2 + 1 + 6 + 4 = 13) parameters to represent the whole distribution. We can conclude that exploiting independencies helps in the compact representation of joint probability distribution. This forms the basis for the Bayesian network. Representation A Bayesian network is represented by a Directed Acyclic Graph (DAG) and a set of Conditional Probability Distributions (CPD) in which: The nodes represent random variables The edges represent dependencies For each of the nodes, we have a CPD In our previous restaurant example, the nodes would be as follows: Quality of food (Q) Location (L) Cost of food (C) Number of people (N) As the cost of food was dependent on the quality of food (Q) and the location of the restaurant (L), there will be an edge each from Q ? C and L ? C. Similarly, as the number of people visiting the restaurant depends on the price of food and its location, there would be an edge each from L ? N and C ? N. The resulting structure of our Bayesian network is shown in Fig 1.3: Fig 1.3: Bayesian network for the restaurant example Factorization of a distribution over a network Each node in our Bayesian network for restaurants has a CPD associated to it. For example, the CPD for the cost of food in the restaurant is P(C|Q, L), as it only depends on the quality of food and location. For the number of people, it would be P(N|C, L) . So, we can generalize that the CPD associated with each node would be P(node|Par(node)) where Par(node) denotes the parents of the node in the graph. Assuming some probability values, we will finally get a network as shown in Fig 1.4: Fig 1.4: Bayesian network of restaurant along with CPDs Let us go back to the joint probability distribution of all these attributes of the restaurant again. Considering the independencies among variables, we concluded as follows: P(Q,C,L,N) = P(Q)P(L)P(C|Q, L)P(N|C, L) So now, looking into the Bayesian network (BN) for the restaurant, we can say that for any Bayesian network, the joint probability distribution  over all its random variables {X1,X2,...,Xn} can be represented as follows: This is known as the chain rule for Bayesian networks. Also, we say that a distribution P factorizes over a graph G, if P can be encoded as follows: Here, ParG(X) is the parent of X in the graph G. Summary In this article, we saw how we can represent a complex joint probability distribution using a directed graph and a conditional probability distribution associated with each node, which is collectively known as a Bayesian network. Resources for Article:   Further resources on this subject: Web Scraping with Python [article] Exact Inference Using Graphical Models [article] wxPython: Design Approaches and Techniques [article]
Read more
  • 0
  • 0
  • 47571
article-image-introduction-wep
Packt
10 Aug 2015
4 min read
Save for later

An Introduction to WEP

Packt
10 Aug 2015
4 min read
In this article by Marco Alamanni, author of the book, Kali Linux Wireless Penetration Testing Essentials, has explained that the WEP protocol was introduced with the original 802.11 standard as a means to provide authentication and encryption to wireless LAN implementations. It is based on the RC4 (Rivest Cipher 4) stream cypher with a preshared secret key (PSK) of 40 or 104 bits, depending on the implementation. A 24 bit pseudo-random Initialization Vector (IV) is concatenated with the preshared key to generate the per-packet keystream used by RC4 for the actual encryption and decryption processes. Thus, the resulting keystream could be 64 or 128 bits long. (For more resources related to this topic, see here.) In the encryption phase, the keystream is XORed with the plaintext data to obtain the encrypted data, while in the decryption phase the encrypted data is XORed with the keystream to obtain the plaintext data. The encryption process is shown in the following diagram: Attacks against WEP First of all, we must say that WEP is an insecure protocol and has been deprecated by the Wi-Fi Alliance. It suffers from various vulnerabilities related to the generation of the keystreams, to the use of IVs and to the length of the keys. The IV is used to add randomness to the keystream, trying to avoid the reuse of the same keystream to encrypt different packets. This purpose has not been accomplished in the design of WEP, because the IV is only 24 bits long (with 2^24 = 16,777,216 possible values) and it is transmitted in clear-text within each frame. Thus, after a certain period of time (depending on the network traffic) the same IV, and consequently the same keystream, will be reused, allowing the attacker to collect the relative cypher texts and perform statistical attacks to recover the plain texts and the key. The first well-known attack against WEP was the Fluhrer, Mantin and Shamir (FMS) attack, back in 2001. The FMS attack relies on the way WEP generates the keystreams and on the fact that it also uses weak IVs to generate weak keystreams, making possible for an attacker to collect a sufficient number of packets encrypted with these keystreams, analyze them, and recover the key. The number of IVs to be collected to complete the FMS attack is about 250,000 for 40-bit keys and 1,500,000 for 104-bit keys. The FMS attack has been enhanced by Korek, improving its performances. Andreas Klein found more correlations between the RC4 keystream and the key than the ones discovered by Fluhrer, Mantin, and Shamir, that can used to crack the WEP key. In 2007, Pyshkin, Tews, and Weinmann (PTW) extended Andreas Klein's research and improved the FMS attack, significantly reducing the number of IVs needed to successfully recover the WEP key. Indeed, the PTW attack does not rely on weak IVs like the FMS attack does and is very fast and effective. It is able to recover a 104-bit WEP key with a success probability of 50 percent using less than 40,000 frames and with a probability of 95 percent with 85,000 frames. The PTW attack is the default method used by Aircrack-ng to crack WEP keys. Both the FMS and PTW attacks need to collect quite a large number of frames to succeed and can be conducted passively, sniffing the wireless traffic on the same channel of the target AP and capturing frames. The problem is that, in normal conditions, we will have to spend quite a long time to passively collect all the necessary packets for the attacks, especially with the FMS attack. To accelerate the process, the idea is to re-inject frames in the network to generate traffic in response so that we could collect the necessary IVs more quickly. A type of frame that is suitable for this purpose is the ARP request, because the AP broadcasts it and each time with a new IV. As we are not associated with the AP, if we send frames to it directly, they are discarded and a de-authentication frame is sent. Instead, we can capture ARP requests from associated clients and retransmit them to the AP. This technique is called the ARP Request Replay attack and is also adopted by Aircrack-ng for the implementation of the PTW attack. Summary In this article, we covered the WEP protocol, the attacks that have been developed to crack the keys. Resources for Article: Further resources on this subject: Kali Linux – Wireless Attacks [article] What is Kali Linux [article] Penetration Testing [article]
Read more
  • 0
  • 0
  • 9674

article-image-oracle-goldengate-12c-overview
Packt
10 Aug 2015
21 min read
Save for later

Oracle GoldenGate 12c — An Overview

Packt
10 Aug 2015
21 min read
In this article by John P Jeffries, author of the book Oracle GoldenGate 12c Implementer's Guide, he provides an introduction to Oracle GoldenGate by describing the key components, processes, and considerations required to build and implement a GoldenGate solution. John tells you how to address some of the issues that influence the decision-making process when you design a GoldenGate solution. He focuses on the additional configuration options available in Oracle GoldenGate 12c (For more resources related to this topic, see here.) 12c new features Oracle has provided some exciting new features in their 12c version of GoldenGate, some of which we have already touched upon. Following the official desupport of Oracle Streams in Oracle Database 12c, Oracle has essentially migrated some of the key features to its strategic product. You will find that GoldenGate now has a tighter integration with the Oracle database, enabling enhanced functionality. Let's explore some of the new features available in Oracle GoldenGate 12c. Integrated capture Integrated capture has been available since Oracle GoldenGate 11gR2 with Oracle Database 11g (11.2.0.3). Originally decoupled from the database, GoldenGate's new architecture provides the option to integrate its Extract process(es) with the Oracle database. This enables GoldenGate to access the database's data dictionary and undo tablespace, providing replication support for advanced features and data types. Oracle GoldenGate 12c still supports the original Extract configuration, known as Classic Capture. Integrated Replicat Integrated Replicat is a new feature in Oracle GoldenGate 12c for the delivery of data to Oracle Database 11g (11.2.0.4) or 12c. The performance enhancement provides better scalability and load balancing that leverages the database parallel apply servers for automatic, dependency-aware parallel Replicat processes. With Integrated Replicat, there is no need for users to manually split the delivery process into multiple threads and manage multiple parameter files. GoldenGate now uses a lightweight streaming API to prepare, coordinate, and apply the data to the downstream database. Oracle GoldenGate 12c still supports the original Replicat configuration, known as Classic Delivery. Downstream capture Downstream capture was one of my favorite Oracle Stream features. It allows for a combined in-memory capture and apply process that achieves very low latency even in heavy data load situations. Like Streams, GoldenGate builds on this feature by employing a real-time downstream capture process. This method uses Oracle Data Guard's log transportation mechanism, which writes changed data to standby redo logs. It provides a best-of-both-worlds approach, enabling a real-time mine configuration that falls back to archive log mining when the apply process cannot keep up. In addition, the real-time mine process is re-enabled automatically when the data throughput is less. Installation One of the major changes in Oracle GoldenGate 12c is the installation method. Like other Oracle products, Oracle GoldenGate 12c is now installed using the Java-based Oracle Universal Installer (OUI) in either the interactive or silent mode. OUI reads the Oracle Inventory on your system to discover existing installations (Oracle Homes), allowing you to install, deinstall, or clone software products. Upgrading to 12c Whether you wish to upgrade your current GoldenGate installation from Oracle GoldenGate 11g Release 2 or from an earlier version, the steps are the same. Simply stop all the GoldenGate running processes on your database server, backup the GoldenGate home, and then use OUI to perform the fresh installation. It is important to note, however, while restarting replication, ensure the capture process begins from the point at which it was gracefully stopped to guarantee against lost synchronization data. Multitenant database replication As the version suggests, Oracle GoldenGate 12c now supports data replication for Oracle Database 12c. Those familiar with the 12c database features will be aware of the multitenant container database (CDB) that provides database consolidation. Each CDB consists of a root container and one or more pluggable databases (PDB). The PDB can contain multiple schemas and objects, just like a conventional database that GoldenGate replicates data to and from. The GoldenGate Extract process pulls data from multiple PDBs or containers in the source, combining the changed data into a single trail file. Replicat, however, splits the data into multiple process groups in order to apply the changes to a target PDB. Coordinated Delivery The Coordinated Delivery option applies to the GoldenGate Replicat process when configured in the classic mode. It provides a performance gain by automatically splitting the delivered data from a remote trail file into multiple threads that are then applied to the target database in parallel. GoldenGate manages the coordination across selected events that require ordering, including DDL, primary key updates, event marker interface (EMI), and SQLEXEC. Coordinated Delivery can be used with both Oracle (from version 11.2.0.4) and non-Oracle databases. Event-based processing In GoldenGate 12c, event-based processing has been enhanced to allow specific events to be captured and acted upon automatically through an EMI. SQLEXEC provides the API to EMI, enabling programmatic execution of tasks following an event. Now it is possible, for example, to detect the start of a batch job or large transaction, trap the SQL statement(s), and ignore the subsequent multiple change records until the end of the source system transaction. The original DML can then be replayed on the target database as one transaction. This is a major step forward in the performance tuning for data replication. Enhanced security Recent versions of GoldenGate have included security features such as the encryption of passwords and data. Oracle GoldenGate 12c now supports a credential store, better known as an Oracle wallet, that securely stores an alias associated with a username and password. The alias is then referenced in the GoldenGate parameter files rather than the actual username and password. Conflict Detection and Resolution In earlier versions of GoldenGate, Conflict Detection and Resolution (CDR) has been somewhat lightweight and was not readily available out of the box. Although available in Oracle Streams, the GoldenGate administrator would have to programmatically resolve any data conflict in the replication process using GoldenGate built-in tools. In the 12c version, the feature has emerged as an easily configurable option through Extract and Replicat parameters. Dynamic Rollback Selective data back out of applied transactions is now possible using the Dynamic Rollback feature. The feature operates at table and record-level and supports point-in-time recovery. This potentially eliminates the need for a full database restore, following data corruption, erroneous deletions, or perhaps the removal of test data, thus avoiding hours of system downtime. Streams to GoldenGate migration Oracle Streams users can now migrate their data replication solution to Oracle GoldenGate 12c using a purpose-built utility. This is a welcomed feature given that Streams is no longer supported in Oracle Database 12c. The Streams2ogg tool auto generates Oracle GoldenGate configuration files that greatly simplify the effort required in the migration process. Performance In today's demand for real-time access to real-time data, high performance is the key. For example, businesses will no longer wait for information to arrive on their DSS to make decisions and users will expect the latest information to be available in the public cloud. Data has value and must be delivered in real time to meet the demand. So, how long does it take to replicate a transaction from the source database to its target? This is known as end-to-end latency, which typically has a threshold that must not be breeched in order to satisfy a predefined Service Level Agreement (SLA). GoldenGate refers to latency as lag, which can be measured at different intervals in the replication process. They are as follows: Source to Extract: The time taken for a record to be processed by the Extract compared to the commit timestamp on the database Replicat to target: The time taken for the last record to be processed by the Replicat process compared to the record creation time in the trail file A well-designed system may still encounter spikes in the latency, but it should never be continuous or growing. Peaks are typically caused by load on the source database system, where the latency increases with the number of transactions per second. Lag should be measured as an average over a specified period. Trying to tune GoldenGate when the design is poor is a difficult situation to be in. For the system to perform well, you may need to revisit the design. Availability Another important NFR is availability. Normally quoted as a percentage, the system must be available for the specified length of time. For example, NFR of 99.9 percent availability equates to a downtime of 8.76 hours in a year, which sounds quite a lot, especially if it were to occur all at once. Oracle's maximum availability architecture (MAA) offers enhanced availability through products such as Real Application Clusters (RAC) and Active Data Guard (ADG). However, as we previously described, the network plays a major role in data replication. The NFR relates to the whole system, so you need to be sure your design covers redundancy for all components. Event-based processing It is important in any data replication environment to capture and manage events, such as trail records containing specific data or operations or maybe the occurrence of a certain error. These are known as Event Markers. GoldenGate provides a mechanism to perform an action on a given event or condition. These are known as Event Actions and are triggered by Event Records. If you are familiar with Oracle Streams, Event Actions are like rules. The Event Marker System GoldenGate's Event Marker System, also known as event marker interface (EMI), allows custom DML-driven processing on an event. This comprises of an Event Record to trigger a given action. An Event Record can be either a trail record that satisfies a condition evaluated by a WHERE or FILTER clause or a record written to an event table that enables an action to occur. Typical actions are writing status information, reporting errors, ignoring certain records in a trail, invoking a shell script, or performing an administrative task. The following Replicat code describes the process of capturing an event and performing an action by logging DELETE operations made against the CREDITCARD_ACCOUNTS table using the EVENTACTIONS parameter: MAP SRC.CREDITCARD_ACCOUNTS, TARGET TGT.CREDITCARD_ACCOUNTS_DIM;TABLE SRC.CREDITCARD_ACCOUNTS, &FILTER (@GETENV ('GGHEADER', 'OPTYPE') = 'DELETE'), &EVENTACTIONS (LOG INFO); By default, all logged information is written to the process group report file, the GoldenGate error log, and the system messages file. On Linux, this is the /var/log/messages file. Note that the TABLE parameter is also used in the Replicat's parameter file. This is a means of triggering an Event Action to be executed by the Replicat when it encounters an Event Marker. The following code shows the use of the IGNORE option that prevents certain records from being extracted or replicated, which is particularly useful to filter out system type data. When used with the TRANSACTION option, the whole transaction and not just the Event Record is ignored: TABLE SRC.CREDITCARD_ACCOUNTS, &FILTER (@GETENV ('GGHEADER', 'OPTYPE') = 'DELETE'), &EVENTACTIONS (IGNORE TRANSACTION); The preceding code extends the previous code by stopping the Event Record itself from being replicated. Using Event Actions to improve batch performance All replication technologies typically suffer from one flaw that is the way in which the data is replicated. Consider a table that is populated with a million rows as part of a batch process. This may be a bulk insert operation that Oracle completes on the source database as one transaction. However, Oracle will write each change to its redo logs as Logical Change Records (LCRs). GoldenGate will subsequently mine the logs, write the LCRs to a remote trail, convert each one back to DML, and apply them to the target database, one row at a time. The single source transaction becomes one million transactions, which causes a huge performance overhead. To overcome this issue, we can use Event Actions to: Detect the DML statement (INSERT INTO TABLE SELECT ..) Ignore the data resulting from the SELECT part of the statement Replicate just the DML statement as an Event Record Execute just the DML statement on the target database The solution requires a statement table on both source and target databases to trigger the event. Also, both databases must be perfectly synchronized to avoid data integrity issues. User tokens User tokens are GoldenGate environment variables that are captured and stored in the trail record for replication. They can be accessed via the @GETENV function. We can use token data in column maps, stored procedures called by SQLEXEC, and, of course, in macros. Using user tokens to populate a heartbeat table A vast array of user tokens exist in GoldenGate. Let's start by looking at a common method of replicating system information to populate a heartbeat table that can be used to monitor performance. We can use the TOKENS option of the Extract TABLE parameter to define a user token and associate it with the GoldenGate environment data. The following Extract configuration code shows the token declarations for the heartbeat table: TABLE GGADMIN.GG_HB_OUT, &TOKENS (EXTGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &EXTTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV("JULIANTIMESTAMP")), &EXTLAG = @GETENV ("LAG","SEC"), &EXTSTAT_TOTAL = @GETENV ("DELTASTATS","DML"), &), FILTER (@STREQ (EXTGROUP, @GETENV("GGENVIRONMENT","GROUPNAME"))); For the data pump, the example Extract configuration is shown here: TABLE GGADMIN.GG_HB_OUT, &TOKENS (PMPGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &PMPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV("JULIANTIMESTAMP")), &PMPLAG = @GETENV ("LAG","SEC")); Also, for the Replicat, the following configuration populates the heartbeat table on the target database with the token data derived from Extract, data pump, and Replicat, containing system details and replication lag: MAP GGADMIN.GG_HB_OUT_SRC, TARGET GGADMIN.GG_HB_IN_TGT, &KEYCOLS (DB_NAME, EXTGROUP, PMPGROUP, REPGROUP), &INSERTMISSINGUPDATES, &COLMAP (USEDEFAULTS, &ID = 0, &SOURCE_COMMIT = @GETENV ("GGHEADER", "COMMITTIMESTAMP"), &EXTGROUP = @TOKEN ("EXTGROUP"), &EXTTIME = @TOKEN ("EXTTIME"), &PMPGROUP = @TOKEN ("PMPGROUP"), &PMPTIME = @TOKEN ("PMPTIME"), &REPGROUP = @TOKEN ("REPGROUP"), &REPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV("JULIANTIMESTAMP")), &EXTLAG = @TOKEN ("EXTLAG"), &PMPLAG = @TOKEN ("PMPLAG"), &REPLAG = @GETENV ("LAG","SEC"), &EXTSTAT_TOTAL = @TOKEN ("EXTSTAT_TOTAL")); As in the heartbeat table example, the defined user tokens can be called in a MAP statement using the @TOKEN function. The SOURCE_COMMIT and LAG metrics are self-explained. However, EXTSTAT_TOTAL, which is derived from DELTASTATS, is particularly useful to measure the load on the source system when you evaluate latency peaks. For applications, user tokens are useful to audit data and trap exceptions within the replicated data stream. Common user tokens are shown in the following code that replicates the token data to five columns of an audit table: MAP SRC.AUDIT_LOG, TARGET TGT.AUDIT_LOG, &COLMAP (USEDEFAULTS, &OSUSER = @TOKEN ("TKN_OSUSER"), &DBNAME = @TOKEN ("TKN_DBNAME"), &HOSTNAME = @TOKEN ("TKN_HOSTNAME"), &TIMESTAMP = @TOKEN ("TKN_COMMITTIME"), &BEFOREAFTERINDICATOR = @TOKEN ("TKN_ BEFOREAFTERINDICATOR"); The BEFOREAFTERINDICATOR environment variable is particularly useful to provide a status flag in order to check whether the data was from a Before or After image of an UPDATE or DELETE operation. By default, GoldenGate provides After images. To enable a Before image extraction, the GETUPDATEBEFORES Extract parameter must be used on the source database. Using logic in the data replication GoldenGate has a number of functions that enable the administrator to program logic in the Extract and Replicat process configuration. These provide generic functions found in the IF and CASE programming languages. In addition, the @COLTEST function enables conditional calculations by testing for one or more column conditions. This is typically used with the @IF function, as shown in the following code: MAP SRC.CREDITCARD_PAYMENTS, TARGET TGT.CREDITCARD_PAYMENTS_FACT,&COLMAP (USEDEFAULTS, &AMOUNT = @IF(@COLTEST(AMOUNT, MISSING, INVALID), 0, AMOUNT)); Here, the @COLTEST function tests the AMOUNT column in the source data to check whether it is MISSING or INVALID. The @IF function returns 0 if @COLTEST returns TRUE and returns the value of AMOUNT if FALSE. The target AMOUNT column is therefore set to 0 when the equivalent source is found to be missing or invalid; otherwise, a direct mapping occurs. The @CASE function tests a list of values for a match and then returns a specified value. If no match is found, @CASE will return a default value. There is no limit to the number of cases to test; however, if the list is very large, a database lookup may be more appropriate. The following code shows the simplicity of the @CASE statement. Here, the country name is returned from the country code: MAP SRC.CREDITCARD_STATEMENT, TARGET TGT.CREDITCARD_STATEMENT_DIM,&COLMAP (USEDEFAULTS, &COUNTRY = @CASE(COUNTRY_CODE, "UK", "United Kingdom", "USA","United States of America")); Other GoldenGate functions: @EVAL and @VALONEOF exist that perform tests. Similar to @CASE, @VALONEOF compares a column or string to a list of values. The difference being it evaluates more than one value against a single column or string. When the following code is used with @IF, it returns "EUROPE" when TRUE and "UNKNOWN" when FALSE: MAP SRC.CREDITCARD_STATEMENT, TARGET TGT.CREDITCARD_STATEMENT_DIM,&COLMAP (USEDEFAULTS, &REGION = @IF(@VALONEOF(COUNTRY_CODE, "UK","E", "D"),"EUROPE","UNKNOWN")); The @EVAL function evaluates a list of conditions and returns a specified value. Optionally, if none are satisfied, it returns a default value. There is no limit to the number of evaluations you can list. However, it is best to list the most common evaluations at the beginning to enhance performance. The following code includes the BEFORE option that compares the before value of the replicated source column to the current value of the target column. Depending on the evaluation, @EVAL will return "PAID MORE", "PAID LESS", or "PAID SAME": MAP SRC.CREDITCARD_ PAYMENTS, TARGET TGT.CREDITCARD_PAYMENTS, &COLMAP (USEDEFAULTS, &STATUS = @EVAL(AMOUNT < BEFORE.AMOUNT, "PAID LESS", AMOUNT > BEFORE.AMOUNT, "PAID MORE", AMOUNT = BEFORE.AMOUNT, "PAID SAME")); The BEFORE option can be used with other GoldenGate functions, including the WHERE and FILTER clauses. However, for the Before image to be written to the trail and to be available, the GETUPDATEBEFORES parameter must be enabled in the source database's Extract parameter file or the target database's Replicat parameter file, but not both. The GETUPDATEBEFORES parameter can be set globally for all tables defined in the Extract or individually per table using GETUPDATEBEFORES and IGNOREUPDATEBEFORES, as seen in the following code: EXTRACT EOLTP01USERIDALIAS srcdb DOMAIN adminSOURCECATALOG PDB1EXTTRAIL ./dirdat/aaGETAPPLOPSIGNOREREPLICATESGETUPDATEBEFORESTABLE SRC.CHECK_PAYMENTS;IGNOREUPDATEBEFORESTABLE SRC.CHECK_PAYMENTS_STATUS;TABLE SRC.CREDITCARD_ACCOUNTS;TABLE SRC.CREDITCARD_PAYMENTS; Tracing processes to find wait events If you have worked with Oracle software, particularly in the performance tuning space, you will be familiar with tracing. Tracing enables additional information to be gathered from a given process or function to diagnose performance problems or even bugs. One example is the SQL trace that can be enabled at a database session or the system level to provide key information, such as; wait events, parse, fetch, and execute times. Oracle GoldenGate 12c offers a similar tracing mechanism through its trace and trace2 options of the SEND GGSCI command. This is like the session-level SQL trace. Also, in a similar fashion to performing a database system trace, tracing can be enabled in the GoldenGate process parameter files that make it permanent until the Extract or Replicat is stopped. trace provides processing information, whereas trace2 identifies the processes with wait events. The following commands show tracing being dynamically enabled for 2 minutes on a running Replicat process: GGSCI (db12server02) 1> send ROLAP01 trace2 ./dirrpt/ROLAP01.trc Wait for 2 minutes, then turn tracing off: GGSCI (db12server02) 2> send ROLAP01 trace2 offGGSCI (db12server02) 3> exit To view the contents of the Replicat trace file, we can execute the following command. In the case of a coordinated Replicat, the trace file will contain information from all of its threads: $ view dirrpt/ROLAP01.trcstatistics between 2015-08-08 Wed HKT 11:55:27 and 2015-08-08 Wed HKT11:57:28RPT_PROD_Ol.LIMIT_TP_RESP : n=2 : op=Insert; total=3; avg=1.5000;max=3msecRPT_PROD_01.SUP_POOL_SMRY_HIST : n=1 : op=Insert; total=2; avg=2.0000;max=2msecRPT_PROD_01.EVENTS : n=1 : op=Insert; total=2; avg=2.0000; max=2msecRPT_PROD_01.DOC_SHIP_DTLS : n=17880 : op=FieldComp; total=22003;avg=1.2306; max=42msecRPT_PROD_01.BUY_POOL_SMRY_HIST : n=1 : op=Insert; total=2; avg=2.0000;max=2msecRPT_PROD_01.LIMIT_TP_LOG : n=2 : op-Insert; total=2; avg=1.0000;max=2msecRPT_PROD_01.POOL_SMRY : n=1 : op=FieldComp; total=2; avg=2.0000;max=2msec..===============================================summary==============Delete : n=2; total=2; avg=1.00;Insert : n=78; total=356; avg=4.56;FieldComp : n=85728; total=123018; avg=1.43;total_op_num=85808 : total_op_time=123376 ms : total_avg_time=1.44ms/optotal commit number=1 The trace file provides the following information: The table name The operation type (FieldComp is for a compressed field) The number of operations The average wait The maximum wait Summary Armed with the preceding information, we can quickly see what operations against which tables are taking the longest time. Exception handling Oracle GoldenGate 12c now supports Conflict Detection and Resolution (CDR). However, out-of-the-box, GoldenGate takes a catch all approach to exception handling. For example, by default, should any operational failure occur, a Replicat process will ABEND and roll back the transaction to the last known checkpoint. This may not be ideal in a production environment. The HANDLECOLLISIONS and NOHANDLECOLLISIONS parameters can be used to control whether or not a Replicat process tries to resolve the duplicate record error and the missing record error. The way to determine what error occurred and on which Replicat is to create an exceptions handler. Exception handling differs from CDR by trapping and reporting Oracle errors suffered by the data replication (DML and DDL). On the other hand, CDR detects and resolves inconsistencies in the replicated data, such as mismatches with before and after images. Exceptions can always be trapped by the Oracle error they produce. GoldenGate provides an exception handler parameter called REPERROR that allows the Replicat to continue processing data after a predefined error. For example, we can include the following configuration in our Replicat parameter file to ignore ORA-00001 "unique constraint (%s.%s) violated": REPERROR (DEFAULT, EXCEPTION)REPERROR (DEFAULT2, ABEND)REPERROR (-1, EXCEPTION) Cloud computing Cloud computing has grown enormously in the recent years. Oracle has named its latest version of products: 12c, the c standing for Cloud of course. The architecture of Oracle 12c Database allows a multitenant container database to support multiple pluggable databases—a key feature of cloud computing—rather than implement the inefficient schema consolidation, typical of the previous Oracle database version architecture, which is known to cause contention on shared resources during high load. The Oracle 12c architecture supports a database consolidation approach through its efficient memory management and dedicated background processes. Online computer companies such as Amazon have leveraged the cloud concept by offering Relational Database Services (RDS), which is becoming very popular for its speed of readiness, support, and low cost. The cloud environments are often huge, containing hundreds of servers, petabytes of storage, terabytes of memory, and countless CPU cores. The cloud has to support multiple applications in a multi-tiered, shared environment, often through virtualization technologies, where storage and CPUs are typically the driving factors for cost-effective options. Customers choose their hardware footprint that best suits their budget and system requirements, commonly known as Platform as a Service (PaaS). Cloud computing is an extension to grid computing that offers both public and private clouds. GoldenGate and Big Data It is increasingly evident that organizations need to quickly access, analyze, and report on their data across their Enterprise in order to be agile in a competitive market. Data is becoming more of an asset to companies; it adds value to a business, but may be stored in any number of current and legacy systems, making it difficult to realize its full potential. Known as big data, it has until recently been nearly impossible to perform real-time business analysis on the combined data from multiple sources. Nowadays, the ability to access all transactional data with low latency is essential. With the introduction of products such as Apache Hadoop, integration of structured data from an RDBMS, including semi-structured and unstructured data, offers a common playing field to support business intelligence. When coupled with ODI, GoldenGate for big data provides real-time delivery to a suite of Apache products, such as Flume, HDFS, Hive, and Hbase, to support big data analytics. Summary In this article, we have learned an introduction to Oracle GoldenGate by describing the key components, processes, and considerations required to build and implement a GoldenGate solution. Resources for Article: Further resources on this subject: What is Oracle Public Cloud? [Article] Oracle GoldenGate- Advanced Administration Tasks - I [Article] Oracle B2B Overview [Article]
Read more
  • 0
  • 0
  • 6636
Modal Close icon
Modal Close icon