How-To Tutorials

article-image-getting-started-java-driver-mongodb

11 Aug 2015

8 min read

Getting Started with Java Driver for MongoDB

11 Aug 2015

In this article by Francesco Marchioni, author of the book MongoDB for Java Developers, you will be able to perform all the create/read/update/delete (CRUD) operations that we have so far accomplished using the mongo shell. (For more resources related to this topic, see here.) Querying data We will now see how to use the Java API to query for your documents. Querying for documents with MongoDB resembles JDBC queries; the main difference is that the returned object is a com.mongodb.DBCursor class, which is an iterator over the database result. In the following example, we are iterating over the javastuff collection that should contain the documents inserted so far: package com.packtpub.mongo.chapter2; import com.mongodb.DB; import com.mongodb.DBCollection; import com.mongodb.DBCursor; import com.mongodb.DBObject; import com.mongodb.MongoClient; public class SampleQuery{ private final static String HOST = "localhost"; private final static int PORT = 27017; public static void main( String args[] ){ try{ MongoClient mongoClient = new MongoClient( HOST,PORT ); DB db = mongoClient.getDB( "sampledb" ); DBCollection coll = db.getCollection("javastuff"); DBCursor cursor = coll.find(); try { while(cursor.hasNext()) { DBObject object = cursor.next(); System.out.println(object); } } finally { cursor.close(); } } catch(Exception e) { System.err.println( e.getClass().getName() + ": " + e.getMessage() ); } } } Depending on the documents you have inserted, the output could be something like this: { "_id" : { "$oid" : "5513f8836c0df1301685315b"} , "name" : "john" , "age" : 35 , "kids" : [ { "name" : "mike"} , { "name" : "faye"}] , "info" : { "email" : "john@mail.com" , "phone" : "876-134-667"}} . . . . Restricting the search to the first document The find operator executed without any parameter returns the full cursor of a collection; pretty much like the SELECT * query in relational DB terms. If you are interested in reading just the first document in the collection, you could use the findOne() operation to get the first document in the collection. This method returns a single document (instead of the DBCursor that the find() operation returns). As you can see, the findOne() operator directly returns a DBObject instead of a com.mongodb.DBCursor class: DBObject myDoc = coll.findOne(); System.out.println(myDoc); Querying the number of documents in a collection Another typical construct that you probably know from the SQL is the SELECT count(*) query that is useful to retrieve the number of records in a table. In MongoDB terms, you can get this value simply by invoking the getCount against a DBCollection class: DBCollection coll = db.getCollection("javastuff"); System.out.println(coll.getCount()); As an alternative, you could execute the count() method over the DBCursor object: DBCursor cursor = coll.find(); System.out.println(cursor.count()); Eager fetching of data using DBCursor When find is executed and a DBCursor is executed you have a pointer to a database document. This means that the documents are fetched in the memory as you call next() method on the DBCursor. On the other hand, you can eagerly load all the data into the memory by executing the toArray() method, which returns a java.util.List structure: List list = collection.find( query ).toArray(); The problem with this approach is that you could potentially fill up the memory with lots of documents, which are eagerly loaded. You are therefore advised to include some operators such as skip() and limit() to control the amount of data to be loaded into the memory: List list = collection.find( query ).skip( 100 ). limit( 10 ).toArray(); Just like you learned from the mongo shell, the skip operator can be used as an initial offset of your cursor whilst the limit construct can eventually load the first n occurrences in the cursor. Filtering through the records Typically, you will not need to fetch the whole set of documents in a collection. So, just like SQL uses WHERE conditions to filter records, in MongoDB you can restrict searches by creating a BasicDBObject and passing it to the find function as an argument. See the following example: DBCollection coll = db.getCollection("javastuff"); DBObject query = new BasicDBObject("name", "owen"); DBCursor cursor = coll.find(query); try { while(cursor.hasNext()) { System.out.println(cursor.next()); } } finally { cursor.close(); } In the preceding example, we retrieve the documents in the javastuff collection, whose name key equals to owen. That's the equivalent of an SQL query like this: SELECT * FROM javastuff WHERE name='owen' Building more complex searches As your collections keep growing, you will need to be more selective with your searches. For example, you could include multiple keys in your BasicDBObject that will eventually be passed to find. We can then apply the same functions in our queries. For example, here is how to find documents whose name does not equal ($ne) to Frank and whose age is greater than 10: DBCollection coll = db.getCollection("javastuff"); DBObject query = new BasicDBObject("name", new BasicDBObject("$ne", "frank")).append("age", new BasicDBObject("$gt", 10)); DBCursor cursor = coll.find(query); Updating documents Having learned about create and read, we are half way through our CRUD track. The next operation you will learn is update. The DBCollection class contains an update method that can be used for this purpose. Let's say we have the following document: > db.javastuff.find({"name":"frank"}).pretty() { "_id" : ObjectId("55142c27627b27560bd365b1"), "name" : "frank", "age" : 31, "info" : { "email" : "frank@mail.com", "phone" : "222-111-444" } } Now we want to change the age value for this document by setting it to 23: DBCollection coll = db.getCollection("javastuff"); DBObject newDocument = new BasicDBObject(); newDocument.put("age", 23); DBObject searchQuery = new BasicDBObject().append("name", "owen"); coll.update(searchQuery, newDocument); You might think that would do the trick, but wait! Let's have a look at our document using the mongo shell: > db.javastuff.find({"age":23}).pretty() { "_id" : ObjectId("55142c27627b27560bd365b1"), "age" : 23 } As you can see, the update statement has replaced the original document with another one, including only the keys and values we have passed to the update. In most cases, this is not what we want to achieve. If we want to update a particular value, we have to use the $set update modifier. DBCollection coll = db.getCollection("javastuff"); BasicDBObject newDocument = new BasicDBObject(); newDocument.append("$set", new BasicDBObject().append("age", 23)); BasicDBObject searchQuery = new BasicDBObject().append("name", "frank"); coll.update(searchQuery, newDocument); So, suppose we restored the initial document with all the fields, this is the outcome of the update using the $set update modifier: > db.javastuff.find({"age":23}).pretty() { "_id" : ObjectId("5514326e627b383428c2ccd8"), "name" : "frank", "age" : 23, "in,fo" : { "email" : "frank@mail.com", "phone" : "222-111-444" } } Please note that the DBCollection class overloads the method update with update (DBObject q, DBObject o, boolean upsert, boolean multi). The first parameter (upsert) determines whether the database should create the element if it does not exist. The second one (multi) causes the update to be applied to all matching objects. Deleting documents The operator to be used for deleting documents is obviously delete. As for other operators, it includes several variants. In its simplest form, when executed over a single document returned, it will remove it: MongoClient mongoClient = new MongoClient("localhost", 27017); DB db = mongoClient.getDB("sampledb"); DBCollection coll = db.getCollection("javastuff"); DBObject doc = coll.findOne(); coll.remove(doc); Most of the time you will need to filter the documents to be deleted. Here is how to delete the document with the key frank: DBObject document = new BasicDBObject(); document.put("name", "frank"); coll.remove(document); Deleting a set of documents Bulk deletion of documents can be achieved by including the keys in a List and building an $in modifier expression that uses this list. Let's see, for example, how to delete all records whose age ranges from 0 to 49: BasicDBObject deleteQuery = new BasicDBObject(); List<Integer> list = new ArrayList<Integer>(); for (int i=0;i<50;i++) list.add(i); deleteQuery.put("age", new BasicDBObject("$in", list)); Summary In this article, we covered how to perform the same operations that are available in the mongo shell. Resources for Article: Further resources on this subject: MongoDB data modeling [article] Apache Solr and Big Data – integration with MongoDB [article] About MongoDB [article]

0
0
2052

Packt

11 Aug 2015

17 min read

Managing Recipients

Packt

11 Aug 2015

17 min read

0
0
1194

Packt

11 Aug 2015

7 min read

Working with Virtual Machines

Packt

11 Aug 2015

7 min read

0
0
8028

Packt

10 Aug 2015

7 min read

Exploring Jenkins

Packt

10 Aug 2015

7 min read

0
0
2321

article-image-using-arcpy-dataaccess-module-withfeature-classesand-tables

Packt

10 Aug 2015

28 min read

Using the ArcPy DataAccess Module withFeature Classesand Tables

Packt

10 Aug 2015

28 min read

0
0
7055

Packt

10 Aug 2015

12 min read

Integrating with Muzzley

Packt

10 Aug 2015

12 min read

In this article by Miguel de Sousa, author of the book Internet of Things with Intel Galileo, we will cover the following topics: Wiring the circuit The Muzzley IoT ecosystem Creating a Muzzley app Lighting up the entrance door (For more resources related to this topic, see here.) One identified issue regarding IoT is that there will be lots of connected devices and each one speaks its own language, not sharing the same protocols with other devices. This leads to an increasing number of apps to control each of those devices. Every time you purchase connected products, you'll be required to have the exclusive product app, and, in the near future, where it is predicted that more devices will be connected to the Internet than people, this is indeed a problem, which is known as the basket of remotes. Many solutions have been appearing for this problem. Some of them focus on creating common communication standards between the devices or even creating their own protocol such as the Intel Common Connectivity Framework (CCF). A different approach consists in predicting the device's interactions, where collected data is used to predict and trigger actions on the specific devices. An example using this approach is Muzzley. It not only supports a common way to speak with the devices, but also learns from the users' interaction, allowing them to control all their devices from a common app, and on collecting usage data, it can predict users' actions and even make different devices work together. In this article, we will start by understanding what Muzzley is and how we can integrate with it. We will then do some development to allow you to control your own building's entrance door. For this purpose, we will use Galileo as a bridge to communicate with a relay and the Muzzley cloud, allowing you to control the door from a common mobile app and from anywhere as long as there is Internet access. Wiring the circuit In this article, we'll be using a real home AC inter-communicator with a building entrance door unlock button and this will require you to do some homework. This integration will require you to open your inter-communicator and adjust the inner circuit, so be aware that there are always risks of damaging it. If you don't want to use a real inter-communicator, you can replace it by an LED or even by the buzzer module. If you want to use a real device, you can use a DC inter-communicator, but in this guide, we'll only be explaining how to do the wiring using an AC inter-communicator. The first thing you have to do is to take a look at the device manual and check whether it works with AC current, and the voltage it requires. If you can't locate your product manual, search for it online. In this article, we'll be using the solid state relay. This relay accepts a voltage range from 24 V up to 380 V AC, and your inter-communicator should also work in this voltage range. You'll also need some electrical wires and electrical wires junctions: Wire junctions and the solid state relay This equipment will be used to adapt the door unlocking circuit to allow it to be controlled from the Galileo board using a relay. The main idea is to use a relay to close the door opener circuit, resulting in the door being unlocked. This can be accomplished by joining the inter-communicator switch wires with the relay wires. Use some wire and wire junctions to do it, as displayed in the following image: Wiring the circuit The building/house AC circuit is represented in yellow, and S1 and S2 represent the inter-communicator switch (button). On pressing the button, we will also be closing this circuit, and the door will be unlocked. This way, the lock can be controlled both ways, using the original button and the relay. Before starting to wire the circuit, make sure that the inter-communicator circuit is powered off. If you can't switch it off, you can always turn off your house electrical board for a couple of minutes. Make sure that it is powered off by pressing the unlock button and trying to open the door. If you are not sure of what you must do or don't feel comfortable doing it, ask for help from someone more experienced. Open your inter-communicator, locate the switch, and perform the changes displayed in the preceding image (you may have to do some soldering). The Intel Galileo board will then activate the relay using pin 13, where you should wire it to the relay's connector number 3, and the Galileo's ground (GND) should be connected to the relay's connector number 4. Beware that not all the inter-communicator circuits work the same way and although we try to provide a general way to do it, there're always the risk of damaging your device or being electrocuted. Do it at your own risk. Power on your inter-communicator circuit and check whether you can open the door by pressing the unlock door button. If you prefer not using the inter-communicator with the relay, you can always replace it with a buzzer or an LED to simulate the door opening. Also, since the relay is connected to Galileo's pin 13, with the same relay code, you'll have visual feedback from the Galileo's onboard LED. The Muzzley IoT ecosystem Muzzley is an Internet of Things ecosystem that is composed of connected devices, mobile apps, and cloud-based services. Devices can be integrated with Muzzley through the device cloud or the device itself: It offers device control, a rules system, and a machine learning system that predicts and suggests actions, based on the device usage. The mobile app is available for Android, iOS, and Windows phone. It can pack all your Internet-connected devices in to a single common app, allowing them to be controlled together, and to work with other devices that are available in real-world stores or even other homemade connected devices, like the one we will create in this article. Muzzley is known for being one of the first generation platforms with the ability to predict a users' actions by learning from the user's interaction with their own devices. Human behavior is mostly unpredictable, but for convenience, people end up creating routines in their daily lives. The interaction with home devices is an example where human behavior can be observed and learned by an automated system. Muzzley tries to take advantage of these behaviors by identifying the user's recurrent routines and making suggestions that could accelerate and simplify the interaction with the mobile app and devices. Devices that don't know of each others' existence get connected through the user behavior and may create synergies among themselves. When the user starts using the Muzzley app, the interaction is observed by a profiler agent that tries to acquire a behavioral network of the linked cause-effect events. When the frequency of these network associations becomes important enough, the profiler agent emits a suggestion for the user to act upon. For instance, if every time a user arrives home, he switches on the house lights, check the thermostat, and adjust the air conditioner accordingly, the profiler agent will emit a set of suggestions based on this. The cause of the suggestion is identified and shortcuts are offered for the effect-associated action. For instance, the user could receive in the Muzzley app the following suggestions: "You are arriving at a known location. Every time you arrive here, you switch on the «Entrance bulb». Would you like to do it now?"; or "You are arriving at a known location. The thermostat «Living room» says that the temperature is at 15 degrees Celsius. Would you like to set your «Living room» air conditioner to 21 degrees Celsius?" When it comes to security and privacy, Muzzley takes it seriously and all the collected data is used exclusively to analyze behaviors to help make your life easier. This is the system where we will be integrating our door unlocker. Creating a Muzzley app The first step is to own a Muzzley developer account. If you don't have one yet, you can obtain one by visiting https://muzzley.com/developers, clicking on the Sign up button, and submitting the displayed form. To create an app, click on the top menu option Apps and then Create app. Name your App Galileo Lock and if you want to, add a description to your project. As soon as you click on Submit, you'll see two buttons displayed, allowing you to select the integration type: Muzzley allows you to integrate through the product manufacturer cloud or directly with a device. In this example, we will be integrating directly with the device. To do so, click on Device to Cloud integration. Fill in the provider name as you wish and pick two image URLs to be used as the profile (for example, http://hub.packtpub.com/wp-content/uploads/2015/08/Commercial1.jpg) and channel (for example, http://hub.packtpub.com/wp-content/uploads/2015/08/lock.png) images. We can select one of two available ways to add our device: it can be done using UPnP discovery or by inserting a custom serial number. Pick the device discovery option Serial number and ignore the fields Interface UUID and Email Access List; we will come back for them later. Save your changes by pressing the Save changes button. Lighting up the entrance door Now that we can unlock our door from anywhere using the mobile phone with an Internet connection, a nice thing to have is the entrance lights turn on when you open the building door using your Muzzley app. To do this, you can use the Muzzley workers to define rules to perform an action when the door is unlocked using the mobile app. To do this, you'll need to own one of the Muzzley-enabled smart bulbs such as Philips Hue, WeMo LED Lighting, Milight, Easybulb, or LIFX. You can find all the enabled devices in the app profiles selection list: If you don't have those specific lighting devices but have another type of connected device, search the available list to see whether it is supported. If it is, you can use that instead. Add your bulb channel to your account. You should now find it listed in your channels under the category Lighting. If you click on it, you'll be able to control the lights. To activate the trigger option in the lock profile we created previously, go to the Muzzley website and head back to the Profile Spec app, located inside App Details. Expand the property lock status by clicking on the arrow sign in the property #1 - Lock Status section and then expand the controlInterfaces section. Create a new control interface by clicking on the +controlInterface button. In the new controlInterface #1 section, we'll need to define the possible choices of label-values for this property when setting a rule. Feel free to insert an id, and in the control interface option, select the text-picker option. In the config field, we'll need to specify each of the available options, setting the display label and the real value that will be published. Insert the following JSON object: {"options":[{"value":"true","label":"Lock"}, {"value":"false","label":"Unlock"}]}. Now we need to create a trigger. In the profile spec, expand the trigger section. Create a new trigger by clicking on the +trigger button. Inside the newly created section, select the equals condition. Create an input by clicking on +input, insert the ID value, insert the ID of the control interface you have just created in the controlInterfaceId text field. Finally, add the [{"source":"selection.value","target":"data.value"}].path to map the data. Open your mobile app and click on the workers icon. Clicking on Create Worker will display the worker creation menu to you. Here, you'll be able to select a channel component property as a trigger to some other channel component property: Select the lock and select the Lock Status is equal to Unlock trigger. Save it and select the action button. In here, select the smart bulb you own and select the Status On option: After saving this rule, give it a try and use your mobile phone to unlock the door. The smart bulb should then turn on. With this, you can configure many things in your home even before you arrive there. In this specific scenario, we used our door locker as a trigger to accomplish an action on a lightbulb. If you want, you can do the opposite and open the door when a lightbulb lights up a specific color for instance. To do it, similar to how you configured your device trigger, you just have to set up the action options in your device profile page. Summary Everyday objects that surround us are being transformed into information ecosystems and the way we interact with them is slowly changing. Although IoT is growing up fast, it is nowadays in an early stage, and many issues must be solved in order to make it successfully scalable. By 2020, it is estimated that there will be more than 25 billion devices connected to the Internet. This fast growth without security regulations and deep security studies are leading to major concerns regarding the two biggest IoT challenges—security and privacy. Devices in our home that are remotely controllable or even personal data information getting into the wrong hands could be the recipe for a disaster. In this article you have learned the basic steps in wiring the circuit of your Galileo board, creating a Muzzley app, and lighting up the entrance door of your building through your Muzzley app, by using Intel Galileo board as a bridge to communicate with Muzzley cloud. Resources for Article: Further resources on this subject: Getting Started with Intel Galileo [article] Getting the current weather forecast [article] Controlling DC motors using a shield [article]

0
0
8991

How-To Tutorials

Packt

10 Aug 2015

25 min read

Controls and Widgets

Packt

10 Aug 2015

25 min read

0
0
3200

article-image-setting-synchronous-replication

Packt

10 Aug 2015

17 min read

Setting Up Synchronous Replication

Packt

10 Aug 2015

17 min read

0
0
4683

article-image-understanding-hadoop-backup-and-recovery-needs

Packt

10 Aug 2015

25 min read

Understanding Hadoop Backup and Recovery Needs

Packt

10 Aug 2015

25 min read

0
0
7059

article-image-securing-openstack-networking

Packt

10 Aug 2015

10 min read

Securing OpenStack Networking

Packt

10 Aug 2015

10 min read

In this article by Fabio Alessandro Locati, author of the book OpenStack Cloud Security, you will learn about the importance of firewall, IDS, and IPS. You will also learn about Generic Routing Encapsulation, VXLAN. (For more resources related to this topic, see here.) The importance of firewall, IDS, and IPS The security of a network can and should be achieved in multiple ways. Three components that are critical to the security of a network are: Firewall Intrusion detection system (IDS) Intrusion prevention system (IPS) Firewall Firewalls are systems that control traffic passing through them based on rules. This can seem something like a router, but they are very different. The router allows communication between different networks while the firewall limits communication between networks and hosts. The root of this confusion may occur because very often the router will have the firewall functionality and vice versa. Firewalls need to be connected in a series to your infrastructure. The first paper on the firewall technology appeared in 1988 and designed the packet filter firewall. This kind of firewall is often known as first generation firewall. This kind of firewall analyzes the packages passing through and if the package matches a rule, the firewall will act accordingly to that rule. This firewall will analyze each package by itself and will not consider other aspects such as other packages. It works on the first three layers of the OSI model with very few features using layer 4 specifically to check port numbers and protocols (UDP/TCP). First generation firewalls are still in use, because in a lot of situations, to do the job properly and are cheap and secure. Examples of typical filtering those firewalls prohibit (or allow) to IPs of certain classes (or specific IPs), to access certain IPs, or allow traffic to a specific IP only on specific ports. There are no known attacks to those kind of firewalls, but specific models can have specific bugs that can be exploited. In 1990, a new generation of firewall appeared. The initial name was circuit-level gateway, but today it is far more commonly known as stateful firewalls or second generation firewall. These firewalls are able to understand when connections are being initialized and closed so that the firewall comes to know what is the current state of a connection when a package arrives. To do so, this kind of firewall uses the first four layers of the networking stack. This allows the firewall to drop all packages that are not establishing a new connection or are in an already established connection. These firewalls are very powerful with the TCP protocol because it has states, while they have very small advantages compared to first generation firewalls handling UDP or ICMP packages, since those packages travel with no connection. In these cases, the firewall sets the connection as established; only the first valid package passes through and closes it after the connection times out. Performance-wise, stateful firewall can be faster than packet firewall because if the package is part of an active connection, no further test will be performed against that package. These kinds of firewalls are more susceptible to bugs in their code since reading more about the package makes it easier to exploit. Also, on many devices, it is possible to open connections (with SYN packages) until the firewall is saturated. In such cases, the firewall usually downgrades itself as a simple router allowing all traffic to pass through it. In 1991, improvements were made to the stateful firewall allowing it to understand more about the protocol of the package it was evaluating. The firewalls of this kind before 1994 had major problems, such as working as a proxy that the user had to interact with. In 1994, the first application firewall, as we know it, was born doing all its job completely transparently. To be able to understand the protocol, this kind of firewall requires an understanding of all seven layers of the OSI model. As for security, the same as the stateful firewall does apply to the application firewall as well. Intrusion detection system (IDS) IDSs are systems that monitor the network traffic looking for policy violation and malicious traffic. The goal of the IDS is not to block malicious activity, but instead to log and report them. These systems act in a passive mode, so you'll not see any traffic coming from them. This is very important because it makes them invisible to attackers so you can gain information about the attack, without the attacker knowing. IDSs need to be connected in parallel to your infrastructure. Intrusion prevention system (IPS) IPSs are sometimes referred to as Intrusion Detection and Prevention Systems (IDPS), since they are IDS that are also able to fight back malicious activities. IPSs have greater possibility to act than IDSs. Other than reporting, like IDS, they can also drop malicious packages, reset the connection, and block the traffic from the offending IP address. IPSs need to be connected in series to your infrastructure. Generic Routing Encapsulation (GRE) GRE is a Cisco tuning protocol that is difficult to position in the OSI model. The best place for it to be is between layers 2 and 3. Being above layer 2 (where VLANs are), we can use GRE inside VLAN. We will not go deep into the technicalities of this protocol. I'd like to focus more on the advantages and disadvantages it has over VLAN. The first advantage of (extended) GRE over VLAN is scalability. In fact, VLAN is limited to 4,096, while GRE tunnels do not have this limitation. If you are running a private cloud and you are working in a small corporation, 4,096 networks could be enough, but will definitely not be enough if you work for a big corporation or if you are running a public cloud. Also, unless you use VTP for your VLANs, you'll have to add VLANs to each network device, while GREs don't need this. You cannot have more than 4,096 VLANs in an environment. The second advantage is security. Since you can deploy multiple GRE tunnels in a single VLAN, you can connect a machine to a single VLAN and multiple GRE networks without the risks that come with putting a port in trunking that is needed to bring more VLANs in the same physical port. For these reasons, GRE has been a very common choice in a lot of OpenStack clusters deployed up to OpenStack Havana. The current preferred networking choice (since Icehouse) is Virtual Extensible LAN (VXLAN). VXLAN VXLAN is a network virtualization technology whose specifications have been originally created by Arista Networks, Cisco, and VMWare, and many other companies have backed the project. Its goal is to offer a standardized overlay encapsulation protocol and it was created because the standard VLAN were too limited for the current cloud needs and the GRE protocol was a Cisco protocol. It works using layer 2 Ethernet frames within layer 4 UDP packages on port 4789. As for the maximum number of networks, the limit is 16 million logical networks. Since the Icehouse release, the suggested standard for networking is VXLAN. Flat network versus VLAN versus GRE in OpenStack Quantum In OpenStack Quantum, you can decide to use multiple technologies for your networks: flat network, VLAN, GRE, and the most recent, VXLAN. Let's discuss them in detail: Flat network: It is often used in private clouds since it is very easy to set up. The downside is that any virtual machine will see any other virtual machines in our cloud. I strongly discourage people from using this network design because it's unsafe, and in the long run, it will have problems, as we have seen earlier. VLAN: It is sometimes used in bigger private clouds and sometimes even in small public clouds. The advantage is that many times you already have a VLAN-based installation in your company. The major disadvantages are the need to trunk ports for each physical host and the possible problems in propagation. I discourage this approach, since in my opinion, the advantages are very limited while the disadvantages are pretty strong. VXLAN: It should be used in any kind of cloud due to its technical advantages. It allows a huge number of networks, its way more secure, and often eases debugging. GRE: Until the Havana release, it was the suggested protocol, but since the Icehouse release, the suggestion has been to move toward VXLAN, where the majority of the development is focused. Design a secure network for your OpenStack deployment As for the physical infrastructure, we have to design it securely. We have seen that the network security is critical and that there a lot of possible attacks in this realm. Is it possible to design a secure environment to run OpenStack? Yes it is, if you remember a few rules: Create different networks, at the very least for management and external data (this network usually already exists in your organization and is the one where all your clients are) Never put ports on trunking mode if you use VLANs in your infrastructure, otherwise physically separated networks will be needed The following diagram is an example of how to implement it: Here, the management, tenant external networks could be either VLAN or real networks. Remember that to not use VLAN trunking, you need at least the same amount of physical ports as of VLAN, and the machine has to be subscribed to avoid port trunking that can be a huge security hole. A management network is needed for the administrator to administer the machines and for the OpenStack services to speak to each other. This network is critical, since it may contain sensible data, and for this reason, it has to be disconnected from other networks, or if not possible, have very limited connectivity. The external network is used by virtual machines to access the Internet (and vice versa). In this network, all machines will need an IP address reachable from the Web. The tenant network, sometimes even called internal or guest network is the network where the virtual machines can communicate with other virtual machines in the same cloud. This network, in some deployment cases, can be merged with the external network, but this choice has some security drawbacks. The API network is used to expose OpenStack APIs to the users. This network requires IP addresses reachable from the Web, and for this reason, is often merged into the external network. There are cases where provider networks are needed to connect tenant networks to existing networks outside the OpenStack cluster. Those networks are created by the OpenStack administrator and map directly to an existing physical network in the data center. Summary In this article, we have seen how networking works, which attacks we can expect, and how we can counter them. Also, we have seen how to implement a secure deployment of OpenStack Networking. Resources for Article: Further resources on this subject: Cloud distribution points [Article] Photo Stream with iCloud [Article] Integrating Accumulo into Various Cloud Platforms [Article]

0
0
12711

article-image-creating-functions-and-operations

Packt

10 Aug 2015

18 min read

Creating Functions and Operations

Packt

10 Aug 2015

18 min read

In this article by Alex Libby, author of the book Sass Essentials, we will learn how to use operators or functions to construct a whole site theme from just a handful of colors, or defining font sizes for the entire site from a single value. You will learn how to do all these things in this article. Okay, so let's get started! (For more resources related to this topic, see here.) Creating values using functions and operators Imagine a scenario where you're creating a masterpiece that has taken days to put together, with a stunning choice of colors that has taken almost as long as the building of the project and yet, the client isn't happy with the color choice. What to do? At this point, I'm sure that while you're all smiles to the customer, you'd be quietly cursing the amount of work they've just landed you with, this late on a Friday. Sound familiar? I'll bet you scrap the colors and go back to poring over lots of color combinations, right? It'll work, but it will surely take a lot more time and effort. There's a better way to achieve this; instead of creating or choosing lots of different colors, we only need to choose one and create all of the others automatically. How? Easy! When working with Sass, we can use a little bit of simple math to build our color palette. One of the key tenets of Sass is its ability to work out values dynamically, using nothing more than a little simple math; we could define font sizes from H1 to H6 automatically, create new shades of colors, or even work out the right percentages to use when creating responsive sites! We will take a look at each of these examples throughout the article, but for now, let's focus on the principles of creating our colors using Sass. Creating colors using functions We can use simple math and functions to create just about any type of value, but colors are where these two really come into their own. The great thing about Sass is that we can work out the hex value for just about any color we want to, from a limited range of colors. This can easily be done using techniques such as adding two values together, or subtracting one value from another. To get a feel of how the color operators work, head over to the official documentation at http://sass-lang.com/documentation/file.SASS_REFERENCE.html#color_operations—it is worth reading! Nothing wrong with adding or subtracting values—it's a perfectly valid option, and will result in a valid hex code when compiled. But would you know that both values are actually deep shades of blue? Therein lies the benefit of using functions; instead of using math operators, we can simply say this: p { color: darken(#010203, 10%); } This, I am sure you will agree, is easier to understand as well as being infinitely more readable! The use of functions opens up a world of opportunities for us. We can use any one of the array of functions such as lighten(), darken(), mix(), or adjust-hue() to get a feel of how easy it is to get the values. If we head over to http://jackiebalzer.com/color, we can see that the author has exploded a number of Sass (and Compass—we will use this later) functions, so we can see what colors are displayed, along with their numerical values, as soon as we change the initial two values. Okay, we could play with the site ad infinitum, but I feel a demo coming on—to explore the effects of using the color functions to generate new colors. Let's construct a simple demo. For this exercise, we will dig up a copy of the colorvariables demo and modify it so that we're only assigning one color variable, not six. For this exercise, I will assume you are using Koala to compile the code. Okay, let's make a start: We'll start with opening up a copy of colorvariables.scss in your favorite text editor and removing lines 1 to 15 from the start of the file. Next, add the following lines, so that we should be left with this at the start of the file: $darkRed: #a43; $white: #fff; $black: #000; $colorBox1: $darkRed; $colorBox2: lighten($darkRed, 30%); $colorBox3: adjust-hue($darkRed, 35%); $colorBox4: complement($darkRed); $colorBox5: saturate($darkRed, 30%); $colorBox6: adjust-color($darkRed, $green: 25); Save the file as colorfunctions.scss. We need a copy of the markup file to go with this code, so go ahead and extract a copy of colorvariables.html from the code download, saving it as colorfunctions.html in the root of our project area. Don't forget to change the link for the CSS file within to colorfunctions.css! Fire up Koala, then drag and drop colorfunctions.scss from our project area over the main part of the application window to add it to the list: Right-click on the file name and select Compile, and then wait for it to show Success in a green information box. If we preview the results of our work in a browser, we should see the following boxes appear: At this point, we have a working set of colors—granted, we might have to work a little on making sure that they all work together. But the key point here is that we have only specified one color, and that the others are all calculated automatically through Sass. Now that we are only defining one color by default, how easy is it to change the colors in our code? Well, it is a cinch to do so. Let's try it out using the help of the SassMeister playground. Changing the colors in use We can easily change the values used in the code, and continue to refresh the browser after each change. However, this isn't a quick way to figure out which colors work; to get a quicker response, there is an easier way: use the online Sass playground at http://www.sassmeister.com. This is the perfect way to try out different colors—the site automatically recompiles the code and updates the result as soon as we make a change. Try copying the HTML and SCSS code into the play area to view the result. The following screenshot shows the same code used in our demo, ready for us to try using different calculations: All images work on the principle that we take a base color (in this case, $dark-blue, or #a43), then adjust the color either by a percentage or a numeric value. When compiled, Sass calculates what the new value should be and uses this in the CSS. Take, for example, the color used for #box6, which is a dark orange with a brown tone, as shown in this screenshot: To get a feel of some of the functions that we can use to create new colors (or shades of existing colors), take a look at the main documentation at http://sass-lang.com/documentation/Sass/Script/Functions.html, or https://www.makerscabin.com/web/sass/learn/colors. These sites list a variety of different functions that we can use to create our masterpiece. We can also extend the functions that we have in Sass with the help of custom functions, such as the toolbox available at https://github.com/at-import/color-schemer—this may be worth a look. In our demo, we used a dark red color as our base. If we're ever stuck for ideas on colors, or want to get the right HEX, RGB(A), or even HSL(A) codes, then there are dozens of sites online that will give us these values. Here are a couple of them that you can try: HSLa Explorer, by Chris Coyier—this is available at https://css-tricks.com/examples/HSLaExplorer/. HSL Color Picker by Brandon Mathis—this is available at http://hslpicker.com/. If we know the name, but want to get a Sass value, then we can always try the list of 1,500+ colors at https://github.com/FearMediocrity/sass-color-palettes/blob/master/colors.scss. What's more, the list can easily be imported into our CSS, although it would make better sense to simply copy the chosen values into our Sass file, and compile from there instead. Mixing colors The one thing that we've not discussed, but is equally useful is that we are not limited to using functions on their own; we can mix and match any number of functions to produce our colors. A great way to choose colors, and get the appropriate blend of functions to use, is at http://sassme.arc90.com/. Using the available sliders, we can choose our color, and get the appropriate functions to use in our Sass code. The following image shows how: In most cases, we will likely only need to use two functions (a mix of darken and adjust hue, for example); if we are using more than two–three functions, then we should perhaps rethink our approach! In this case, a better alternative is to use Sass's mix() function, as follows: $white: #fff; $berry: hsl(267, 100%, 35%); p { mix($white, $berry, 0.7) } …which will give the following valid CSS: p { color: #5101b3; } This is a useful alternative to use in place of the command we've just touched on; after all, would you understand what adjust_hue(desaturate(darken(#db4e29, 2), 41), 67) would give as a color? Granted, it is something of an extreme calculation, nonetheless, it is technically valid. If we use mix() instead, it matches more closely to what we might do, for example, when mixing paint. After all, how else would we lighten its color, if not by adding a light-colored paint? Okay, let's move on. What's next? I hear you ask. Well, so far we've used core Sass for all our functions, but it's time to go a little further afield. Let's take a look at how you can use external libraries to add extra functionality. In our next demo, we're going to introduce using Compass, which you will often see being used with Sass. Using an external library So far, we've looked at using core Sass functions to produce our colors—nothing wrong with this; the question is, can we take things a step further? Absolutely, once we've gained some experience with using these functions, we can introduce custom functions (or helpers) that expand what we can do. A great library for this purpose is Compass, available at http://www.compass-style.org; we'll make use of this to change the colors which we created from our earlier boxes demo, in the section, Creating colors using functions. Compass is a CSS authoring framework, which provides extra mixins and reusable patterns to add extra functionality to Sass. In our demo, we're using shade(), which is one of the several color helpers provided by the Compass library. Let's make a start: We're using Compass in this demo, so we'll begin with installing the library. To do this, fire up Command Prompt, then navigate to our project area. We need to make sure that our installation RubyGems system software is up to date, so at Command Prompt, enter the following, and then press Enter: gem update --system Next, we're installing Compass itself—at the prompt, enter this command, and then press Enter: gem install compass Compass works best when we get it to create a project shell (or template) for us. To do this, first browse to http://www.compass-style.org/install, and then enter the following in the Tell us about your project… area: Leave anything in grey text as blank. This produces the following commands—enter each at Command Prompt, pressing Enter each time: Navigate back to Command Prompt. We need to compile our SCSS code, so go ahead and enter this command at the prompt (or copy and paste it), then press Enter: compass watch –sourcemap Next, extract a copy of the colorlibrary folder from the code download, and save it to the project area. In colorlibrary.scss, comment out the existing line for $backgrd_box6_color, and add the following immediately below it: $backgrd_box6_color: shade($backgrd_box5_color, 25%); Save the changes to colorlibrary.scss. If all is well, Compass's watch facility should kick in and recompile the code automatically. To verify that this has been done, look in the css subfolder of the colorlibrary folder, and you should see both the compiled CSS and the source map files present. If you find Compass compiles files in unexpected folders, then try using the following command to specify the source and destination folders when compiling: compass watch --sass-dir sass --css-dir css If all is well, we will see the boxes, when previewing the results in a browser window, as in the following image. Notice how Box 6 has gone a nice shade of deep red (if not almost brown)? To really confirm that all the changes have taken place as required, we can fire up a DOM inspector such as Firebug; a quick check confirms that the color has indeed changed: If we explore even further, we can see that the compiled code shows that the original line for Box 6 has been commented out, and that we're using the new function from the Compass helper library: This is a great way to push the boundaries of what we can do when creating colors. To learn more about using the Compass helper functions, it's worth exploring the official documentation at http://compass-style.org/reference/compass/helpers/colors/. We used the shade() function in our code, which darkens the color used. There is a key difference to using something such as darken() to perform the same change. To get a feel of the difference, take a look at the article on the CreativeBloq website at http://www.creativebloq.com/css3/colour-theming-sass-and-compass-6135593, which explains the difference very well. The documentation is a little lacking in terms of how to use the color helpers; the key is not to treat them as if they were normal mixins or functions, but to simply reference them in our code. To explore more on how to use these functions, take a look at the article by Antti Hiljá at http://clubmate.fi/how-to-use-the-compass-helper-functions/. We can, of course, create mixins to create palettes—for a more complex example, take a look at http://www.zingdesign.com/how-to-generate-a-colour-palette-with-compass/ to understand how such a mixin can be created using Compass. Okay, let's move on. So far, we've talked about using functions to manipulate colors; the flip side is that we are likely to use operators to manipulate values such as font sizes. For now, let's change tack and take a look at creating new values for changing font sizes. Changing font sizes using operators We already talked about using functions to create practically any value. Well, we've seen how to do it with colors; we can apply similar principles to creating font sizes too. In this case, we set a base font size (in the same way that we set a base color), and then simply increase or decrease font sizes as desired. In this instance, we won't use functions, but instead, use standard math operators, such as add, subtract, or divide. When working with these operators, there are a couple of points to remember: Sass math functions preserve units—this means we can't work on numbers with different units, such as adding a px value to a rem value, but can work with numbers that can be converted to the same format, such as inches to centimeters If we multiply two values with the same units, then this will produce square units (that is, 10px * 10px == 100px * px). At the same time, px * px will throw an error as it is an invalid unit in CSS. There are some quirks when working with / as a division operator —in most instances, it is normally used to separate two values, such as defining a pair of font size values. However, if the value is surrounded in parentheses, used as a part of another arithmetic expression, or is stored in a variable, then this will be treated as a division operator. For full details, it is worth reading the relevant section in the official documentation at http://sass-lang.com/documentation/file.Sass_REFERENCE.html#division-and-slash. With these in mind, let's create a simple demo—a perfect use for Sass is to automatically work out sizes from H1 through to H6. We could just do this in a simple text editor, but this time, let's break with tradition and build our demo directly into a session on http://www.sassmeister.com. We can then play around with the values set, and see the effects of the changes immediately. If we're happy with the results of our work, we can copy the final version into a text editor and save them as standard SCSS (or CSS) files. Let's begin by browsing to http://www.sassmeister.com, and adding the following HTML markup window: <html> <head> <meta charset="utf-8" /> <title>Demo: Assigning colors using variables</title> <link rel="stylesheet" type="text/css" href="css/ colorvariables.css"> </head> <body> <h1>The cat sat on the mat</h1> <h2>The cat sat on the mat</h2> <h3>The cat sat on the mat</h3> <h4>The cat sat on the mat</h4> <h5>The cat sat on the mat</h5> <h6>The cat sat on the mat</h6> </body> </html> Next, add the following to the SCSS window—we first set a base value of 3.0, followed by a starting color of #b26d61, or a dark, moderate red: $baseSize: 3.0; $baseColor: #b26d61; We need to add our H1 to H6 styles. The rem mixin was created by Chris Coyier, at https://css-tricks.com/snippets/css/less-mixin-for-rem-font-sizing/. We first set the font size, followed by setting the font color, using either the base color set earlier, or a function to produce a different shade: h1 { font-size: $baseSize; color: $baseColor; } h2 { font-size: ($baseSize - 0.2); color: darken($baseColor, 20%); } h3 { font-size: ($baseSize - 0.4); color: lighten($baseColor, 10%); } h4 { font-size: ($baseSize - 0.6); color: saturate($baseColor, 20%); } h5 { font-size: ($baseSize - 0.8); color: $baseColor - 111; } h6 { font-size: ($baseSize - 1.0); color: rgb(red($baseColor) + 10, 23, 145); } SassMeister will automatically compile the code to produce a valid CSS, as shown in this screenshot: Try changing the base size of 3.0 to a different value—using http://www.sassmeister.com, we can instantly see how this affects the overall size of each H value. Note how we're multiplying the base variable by 10 to set the pixel value, or simply using the value passed to render each heading. In each instance, we can concatenate the appropriate unit using a plus (+) symbol. We then subtract an increasing value from $baseSize, before using this value as the font size for the relevant H value. You can see a similar example of this by Andy Baudoin as a CodePen, at http://codepen.io/baudoin/pen/HdliD/. He makes good use of nesting to display the color and strength of shade. Note that it uses a little JavaScript to add the text of the color that each line represents, and can be ignored; it does not affect the Sass used in the demo. The great thing about using a site such SassMeister is that we can play around with values and immediately see the results. For more details on using number operations in Sass, browse to the official documentation, which is at http://sass-lang.com/documentation/file.Sass_REFERENCE.html#number_operations. Okay, onwards we go. Let's turn our attention to creating something a little more substantial; we're going to create a complete site theme using the power of Sass and a few simple calculations. Summary Phew! What a tour! One of the key concepts of Sass is the use of functions and operators to create values, so let's take a moment to recap what we have covered throughout this article. We kicked off with a look at creating color values using functions, before discovering how we can mix and match different functions to create different shades, or using external libraries to add extra functionality to Sass. We then moved on to take a look at another key use of functions, with a look at defining different font sizes, using standard math operators. Resources for Article: Further resources on this subject: Nesting, Extend, Placeholders, and Mixins [article] Implementation of SASS [article] Constructing Common UI Widgets [article]

0
0
4138

How-To Tutorials

article-image-updating-and-building-our-masters

Packt

10 Aug 2015

20 min read

Updating and building our masters

Packt

10 Aug 2015

20 min read

0
0
9264

How-To Tutorials

article-image-bayesian-network-fundamentals

Packt

10 Aug 2015

25 min read

Bayesian Network Fundamentals

Packt

10 Aug 2015

25 min read

In this article by Ankur Ankan and Abinash Panda, the authors of Mastering Probabilistic Graphical Models Using Python, we'll cover the basics of random variables, probability theory, and graph theory. We'll also see the Bayesian models and the independencies in Bayesian models. A graphical model is essentially a way of representing joint probability distribution over a set of random variables in a compact and intuitive form. There are two main types of graphical models, namely directed and undirected. We generally use a directed model, also known as a Bayesian network, when we mostly have a causal relationship between the random variables. Graphical models also give us tools to operate on these models to find conditional and marginal probabilities of variables, while keeping the computational complexity under control. (For more resources related to this topic, see here.) Probability theory To understand the concepts of probability theory, let's start with a real-life situation. Let's assume we want to go for an outing on a weekend. There are a lot of things to consider before going: the weather conditions, the traffic, and many other factors. If the weather is windy or cloudy, then it is probably not a good idea to go out. However, even if we have information about the weather, we cannot be completely sure whether to go or not; hence we have used the words probably or maybe. Similarly, if it is windy in the morning (or at the time we took our observations), we cannot be completely certain that it will be windy throughout the day. The same holds for cloudy weather; it might turn out to be a very pleasant day. Further, we are not completely certain of our observations. There are always some limitations in our ability to observe; sometimes, these observations could even be noisy. In short, uncertainty or randomness is the innate nature of the world. The probability theory provides us the necessary tools to study this uncertainty. It helps us look into options that are unlikely yet probable. Random variable Probability deals with the study of events. From our intuition, we can say that some events are more likely than others, but to quantify the likeliness of a particular event, we require the probability theory. It helps us predict the future by assessing how likely the outcomes are. Before going deeper into the probability theory, let's first get acquainted with the basic terminologies and definitions of the probability theory. A random variable is a way of representing an attribute of the outcome. Formally, a random variable X is a function that maps a possible set of outcomes ? to some set E, which is represented as follows: X : ? ? E As an example, let us consider the outing example again. To decide whether to go or not, we may consider the skycover (to check whether it is cloudy or not). Skycover is an attribute of the day. Mathematically, the random variable skycover (X) is interpreted as a function, which maps the day (?) to its skycover values (E). So when we say the event X = 40.1, it represents the set of all the days {?} such that , where is the mapping function. Formally speaking, . Random variables can either be discrete or continuous. A discrete random variable can only take a finite number of values. For example, the random variable representing the outcome of a coin toss can take only two values, heads or tails; and hence, it is discrete. Whereas, a continuous random variable can take infinite number of values. For example, a variable representing the speed of a car can take any number values. For any event whose outcome is represented by some random variable (X), we can assign some value to each of the possible outcomes of X, which represents how probable it is. This is known as the probability distribution of the random variable and is denoted by P(X). For example, consider a set of restaurants. Let X be a random variable representing the quality of food in a restaurant. It can take up a set of values, such as {good, bad, average}. P(X), represents the probability distribution of X, that is, if P(X = good) = 0.3, P(X = average) = 0.5, and P(X = bad) = 0.2. This means there is 30 percent chance of a restaurant serving good food, 50 percent chance of it serving average food, and 20 percent chance of it serving bad food. Independence and conditional independence In most of the situations, we are rather more interested in looking at multiple attributes at the same time. For example, to choose a restaurant, we won't only be looking just at the quality of food; we might also want to look at other attributes, such as the cost, location, size, and so on. We can have a probability distribution over a combination of these attributes as well. This type of distribution is known as joint probability distribution. Going back to our restaurant example, let the random variable for the quality of food be represented by Q, and the cost of food be represented by C. Q can have three categorical values, namely {good, average, bad}, and C can have the values {high, low}. So, the joint distribution for P(Q, C) would have probability values for all the combinations of states of Q and C. P(Q = good, C = high) will represent the probability of a pricey restaurant with good quality food, while P(Q = bad, C = low) will represent the probability of a restaurant that is less expensive with bad quality food. Let us consider another random variable representing an attribute of a restaurant, its location L. The cost of food in a restaurant is not only affected by the quality of food but also the location (generally, a restaurant located in a very good location would be more costly as compared to a restaurant present in a not-very-good location). From our intuition, we can say that the probability of a costly restaurant located at a very good location in a city would be different (generally, more) than simply the probability of a costly restaurant, or the probability of a cheap restaurant located at a prime location of city is different (generally less) than simply probability of a cheap restaurant. Formally speaking, P(C = high | L = good) will be different from P(C = high) and P(C = low | L = good) will be different from P(C = low). This indicates that the random variables C and L are not independent of each other. These attributes or random variables need not always be dependent on each other. For example, the quality of food doesn't depend upon the location of restaurant. So, P(Q = good | L = good) or P(Q = good | L = bad)would be the same as P(Q = good), that is, our estimate of the quality of food of the restaurant will not change even if we have knowledge of its location. Hence, these random variables are independent of each other. In general, random variables can be considered as independent of each other, if: They may also be considered independent if: We can easily derive this conclusion. We know the following from the chain rule of probability: P(X, Y) = P(X) P(Y | X) If Y is independent of X, that is, if X | Y, then P(Y | X) = P(Y). Then: P(X, Y) = P(X) P(Y) Extending this result on multiple variables, we can easily get to the conclusion that a set of random variables are independent of each other, if their joint probability distribution is equal to the product of probabilities of each individual random variable. Sometimes, the variables might not be independent of each other. To make this clearer, let's add another random variable, that is, the number of people visiting the restaurant N. Let's assume that, from our experience we know the number of people visiting only depends on the cost of food at the restaurant and its location (generally, lesser number of people visit costly restaurants). Does the quality of food Q affect the number of people visiting the restaurant? To answer this question, let's look into the random variable affecting N, cost C, and location L. As C is directly affected by Q, we can conclude that Q affects N. However, let's consider a situation when we know that the restaurant is costly, that is, C = high and let's ask the same question, "does the quality of food affect the number of people coming to the restaurant?". The answer is no. The number of people coming only depends on the price and location, so if we know that the cost is high, then we can easily conclude that fewer people will visit, irrespective of the quality of food. Hence, . This type of independence is called conditional independence. Installing tools Let's now see some coding examples using pgmpy, to represent joint distributions and independencies. Here, we will mostly work with IPython and pgmpy (and a few other libraries) for coding examples. So, before moving ahead, let's get a basic introduction to these. IPython IPython is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, which offers enhanced introspection, rich media, additional shell syntax, tab completion, and a rich history. IPython provides the following features: Powerful interactive shells (terminal and Qt-based) A browser-based notebook with support for code, text, mathematical expressions, inline plots, and other rich media Support for interactive data visualization and use of GUI toolkits Flexible and embeddable interpreters to load into one's own projects Easy-to-use and high performance tools for parallel computing You can install IPython using the following command: >>> pip3 install ipython To start the IPython command shell, you can simply type ipython3 in the terminal. For more installation instructions, you can visit http://ipython.org/install.html. pgmpy pgmpy is a Python library to work with Probabilistic Graphical models. As it's currently not on PyPi, we will need to build it manually. You can get the source code from the Git repository using the following command: >>> git clone https://github.com/pgmpy/pgmpy Now cd into the cloned directory switch branch for version used and build it with the following code: >>> cd pgmpy >>> git checkout book/v0.1 >>> sudo python3 setup.py install For more installation instructions, you can visit http://pgmpy.org/install.html. With both IPython and pgmpy installed, you should now be able to run the examples. Representing independencies using pgmpy To represent independencies, pgmpy has two classes, namely IndependenceAssertion and Independencies. The IndependenceAssertion class is used to represent individual assertions of the form of or . Let's see some code to represent assertions: # Firstly we need to import IndependenceAssertion In [1]: from pgmpy.independencies import IndependenceAssertion # Each assertion is in the form of [X, Y, Z] meaning X is # independent of Y given Z. In [2]: assertion1 = IndependenceAssertion('X', 'Y') In [3]: assertion1 Out[3]: (X _|_ Y) Here, assertion1 represents that the variable X is independent of the variable Y. To represent conditional assertions, we just need to add a third argument to IndependenceAssertion: In [4]: assertion2 = IndependenceAssertion('X', 'Y', 'Z') In [5]: assertion2 Out [5]: (X _|_ Y | Z) In the preceding example, assertion2 represents . IndependenceAssertion also allows us to represent assertions in the form of . To do this, we just need to pass a list of random variables as arguments: In [4]: assertion2 = IndependenceAssertion('X', 'Y', 'Z') In [5]: assertion2 Out[5]: (X _|_ Y | Z) Moving on to the Independencies class, an Independencies object is used to represent a set of assertions. Often, in the case of Bayesian or Markov networks, we have more than one assertion corresponding to a given model, and to represent these independence assertions for the models, we generally use the Independencies object. Let's take a few examples: In [8]: from pgmpy.independencies import Independencies # There are multiple ways to create an Independencies object, we # could either initialize an empty object or initialize with some # assertions. In [9]: independencies = Independencies() # Empty object In [10]: independencies.get_assertions() Out[10]: [] In [11]: independencies.add_assertions(assertion1, assertion2) In [12]: independencies.get_assertions() Out[12]: [(X _|_ Y), (X _|_ Y | Z)] We can also directly initialize Independencies in these two ways: In [13]: independencies = Independencies(assertion1, assertion2) In [14]: independencies = Independencies(['X', 'Y'], ['A', 'B', 'C']) In [15]: independencies.get_assertions() Out[15]: [(X _|_ Y), (A _|_ B | C)] Representing joint probability distributions using pgmpy We can also represent joint probability distributions using pgmpy's JointProbabilityDistribution class. Let's say we want to represent the joint distribution over the outcomes of tossing two fair coins. So, in this case, the probability of all the possible outcomes would be 0.25, which is shown as follows: In [16]: from pgmpy.factors import JointProbabilityDistribution as Joint In [17]: distribution = Joint(['coin1', 'coin2'], [2, 2], [0.25, 0.25, 0.25, 0.25]) Here, the first argument includes names of random variable. The second argument is a list of the number of states of each random variable. The third argument is a list of probability values, assuming that the first variable changes its states the slowest. So, the preceding distribution represents the following: In [18]: print(distribution) +--------------------------------------+ ¦ coin1 ¦ coin2 ¦ P(coin1,coin2) ¦ ¦---------+---------+------------------¦ ¦ coin1_0 ¦ coin2_0 ¦ 0.2500 ¦ +---------+---------+------------------¦ ¦ coin1_0 ¦ coin2_1 ¦ 0.2500 ¦ +---------+---------+------------------¦ ¦ coin1_1 ¦ coin2_0 ¦ 0.2500 ¦ +---------+---------+------------------¦ ¦ coin1_1 ¦ coin2_1 ¦ 0.2500 ¦ +--------------------------------------+ We can also conduct independence queries over these distributions in pgmpy: In [19]: distribution.check_independence('coin1', 'coin2') Out[20]: True Conditional probability distribution Let's take an example to understand conditional probability better. Let's say we have a bag containing three apples and five oranges, and we want to randomly take out fruits from the bag one at a time without replacing them. Also, the random variables and represent the outcomes in the first try and second try respectively. So, as there are three apples and five oranges in the bag initially, and . Now, let's say that in our first attempt we got an orange. Now, we cannot simply represent the probability of getting an apple or orange in our second attempt. The probabilities in the second attempt will depend on the outcome of our first attempt and therefore, we use conditional probability to represent such cases. Now, in the second attempt, we will have the following probabilities that depend on the outcome of our first try: , , , and . The Conditional Probability Distribution (CPD) of two variables and can be represented as , representing the probability of given that is the probability of after the event has occurred and we know it's outcome. Similarly, we can have representing the probability of after having an observation for . The simplest representation of CPD is tabular CPD. In a tabular CPD, we construct a table containing all the possible combinations of different states of the random variables and the probabilities corresponding to these states. Let's consider the earlier restaurant example. Let's begin by representing the marginal distribution of the quality of food with Q. As we mentioned earlier, it can be categorized into three values {good, bad, average}. For example, P(Q) can be represented in the tabular form as follows: Quality P(Q) Good 0.3 Normal 0.5 Bad 0.2 Similarly, let's say P(L) is the probability distribution of the location of the restaurant. Its CPD can be represented as follows: Location P(L) Good 0.6 Bad 0.4 As the cost of restaurant C depends on both the quality of food Q and its location L, we will be considering P(C | Q, L), which is the conditional distribution of C, given Q and L: Location Good Bad Quality Good Normal Bad Good Normal Bad Cost High 0.8 0.6 0.1 0.6 0.6 0.05 Low 0.2 0.4 0.9 0.4 0.4 0.95 Representing CPDs using pgmpy Let's first see how to represent the tabular CPD using pgmpy for variables that have no conditional variables: In [1]: from pgmpy.factors import TabularCPD # For creating a TabularCPD object we need to pass three # arguments: the variable name, its cardinality that is the number # of states of the random variable and the probability value # corresponding each state. In [2]: quality = TabularCPD(variable='Quality', variable_card=3, values=[[0.3], [0.5], [0.2]]) In [3]: print(quality) +----------------------+ ¦ ['Quality', 0] ¦ 0.3 ¦ +----------------+-----¦ ¦ ['Quality', 1] ¦ 0.5 ¦ +----------------+-----¦ ¦ ['Quality', 2] ¦ 0.2 ¦ +----------------------+ In [4]: quality.variables Out[4]: OrderedDict([('Quality', [State(var='Quality', state=0), State(var='Quality', state=1), State(var='Quality', state=2)])]) In [5]: quality.cardinality Out[5]: array([3]) In [6]: quality.values Out[6]: array([0.3, 0.5, 0.2]) You can see here that the values of the CPD are a 1D array instead of a 2D array, which you passed as an argument. Actually, pgmpy internally stores the values of the TabularCPD as a flattened numpy array. In [7]: location = TabularCPD(variable='Location', variable_card=2, values=[[0.6], [0.4]]) In [8]: print(location) +-----------------------+ ¦ ['Location', 0] ¦ 0.6 ¦ +-----------------+-----¦ ¦ ['Location', 1] ¦ 0.4 ¦ +-----------------------+ However, when we have conditional variables, we also need to specify them and the cardinality of those variables. Let's define the TabularCPD for the cost variable: In [9]: cost = TabularCPD( variable='Cost', variable_card=2, values=[[0.8, 0.6, 0.1, 0.6, 0.6, 0.05], [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]], evidence=['Q', 'L'], evidence_card=[3, 2]) Graph theory The second major framework for the study of probabilistic graphical models is graph theory. Graphs are the skeleton of PGMs, and are used to compactly encode the independence conditions of a probability distribution. Nodes and edges The foundation of graph theory was laid by Leonhard Euler when he solved the famous Seven Bridges of Konigsberg problem. The city of Konigsberg was set on both sides by the Pregel river and included two islands that were connected and maintained by seven bridges. The problem was to find a walk to exactly cross all the bridges once in a single walk. To visualize the problem, let's think of the graph in Fig 1.1: Fig 1.1: The Seven Bridges of Konigsberg graph Here, the nodes a, b, c, and d represent the land, and are known as vertices of the graph. The line segments ab, bc, cd, da, ab, and bc connecting the land parts are the bridges and are known as the edges of the graph. So, we can think of the problem of crossing all the bridges once in a single walk as tracing along all the edges of the graph without lifting our pencils. Formally, a graph G = (V, E) is an ordered pair of finite sets. The elements of the set V are known as the nodes or the vertices of the graph, and the elements of are the edges or the arcs of the graph. The number of nodes or cardinality of G, denoted by |V|, are known as the order of the graph. Similarly, the number of edges denoted by |E| are known as the size of the graph. Here, we can see that the Konigsberg city graph shown in Fig 1.1 is of order 4 and size 7. In a graph, we say that two vertices, u, v ? V are adjacent if u, v ? E. In the City graph, all the four vertices are adjacent to each other because there is an edge for every possible combination of two vertices in the graph. Also, for a vertex v ? V, we define the neighbors set of v as . In the City graph, we can see that b and d are neighbors of c. Similarly, a, b, and c are neighbors of d. We define an edge to be a self loop if the start vertex and the end vertex of the edge are the same. We can put it more formally as, any edge of the form (u, u), where u ? V is a self loop. Until now, we have been talking only about graphs whose edges don't have a direction associated with them, which means that the edge (u, v) is same as the edge (v, u). These types of graphs are known as undirected graphs. Similarly, we can think of a graph whose edges have a sense of direction associated with it. For these graphs, the edge set E would be a set of ordered pair of vertices. These types of graphs are known as directed graphs. In the case of a directed graph, we also define the indegree and outdegree for a vertex. For a vertex v ? V, we define its outdegree as the number of edges originating from the vertex v, that is, . Similarly, the indegree is defined as the number of edges that end at the vertex v, that is, . Walk, paths, and trails For a graph G = (V, E) and u,v ? V, we define a u - v walk as an alternating sequence of vertices and edges, starting with u and ending with v. In the City graph of Fig 1.1, we can have an example of a - d walk as . If there aren't multiple edges between the same vertices, then we simply represent a walk by a sequence of vertices. As in the case of the Butterfly graph shown in Fig 1.2, we can have a walk W : a, c, d, c, e: Fig 1.2: Butterfly graph—a undirected graph A walk with no repeated edges is known as a trail. For example, the walk in the City graph is a trail. Also, a walk with no repeated vertices, except possibly the first and the last, is known as a path. For example, the walk in the City graph is a path. Also, a graph is known as cyclic if there are one or more paths that start and end at the same node. Such paths are known as cycles. Similarly, if there are no cycles in a graph, it is known as an acyclic graph. Bayesian models In most of the real-life cases when we would be representing or modeling some event, we would be dealing with a lot of random variables. Even if we would consider all the random variables to be discrete, there would still be exponentially large number of values in the joint probability distribution. Dealing with such huge amount of data would be computationally expensive (and in some cases, even intractable), and would also require huge amount of memory to store the probability of each combination of states of these random variables. However, in most of the cases, many of these variables are marginally or conditionally independent of each other. By exploiting these independencies, we can reduce the number of values we need to store to represent the joint probability distribution. For instance, in the previous restaurant example, the joint probability distribution across the four random variables that we discussed (that is, quality of food Q, location of restaurant L, cost of food C, and the number of people visiting N) would require us to store 23 independent values. By the chain rule of probability, we know the following: P(Q, L, C, N) = P(Q) P(L|Q) P(C|L, Q) P(N|C, Q, L) Now, let us try to exploit the marginal and conditional independence between the variables, to make the representation more compact. Let's start by considering the independency between the location of the restaurant and quality of food over there. As both of these attributes are independent of each other, P(L|Q) would be the same as P(L). Therefore, we need to store only one parameter to represent it. From the conditional independence that we have seen earlier, we know that . Thus, P(N|C, Q, L) would be the same as P(N|C, L); thus needing only four parameters. Therefore, we now need only (2 + 1 + 6 + 4 = 13) parameters to represent the whole distribution. We can conclude that exploiting independencies helps in the compact representation of joint probability distribution. This forms the basis for the Bayesian network. Representation A Bayesian network is represented by a Directed Acyclic Graph (DAG) and a set of Conditional Probability Distributions (CPD) in which: The nodes represent random variables The edges represent dependencies For each of the nodes, we have a CPD In our previous restaurant example, the nodes would be as follows: Quality of food (Q) Location (L) Cost of food (C) Number of people (N) As the cost of food was dependent on the quality of food (Q) and the location of the restaurant (L), there will be an edge each from Q ? C and L ? C. Similarly, as the number of people visiting the restaurant depends on the price of food and its location, there would be an edge each from L ? N and C ? N. The resulting structure of our Bayesian network is shown in Fig 1.3: Fig 1.3: Bayesian network for the restaurant example Factorization of a distribution over a network Each node in our Bayesian network for restaurants has a CPD associated to it. For example, the CPD for the cost of food in the restaurant is P(C|Q, L), as it only depends on the quality of food and location. For the number of people, it would be P(N|C, L) . So, we can generalize that the CPD associated with each node would be P(node|Par(node)) where Par(node) denotes the parents of the node in the graph. Assuming some probability values, we will finally get a network as shown in Fig 1.4: Fig 1.4: Bayesian network of restaurant along with CPDs Let us go back to the joint probability distribution of all these attributes of the restaurant again. Considering the independencies among variables, we concluded as follows: P(Q,C,L,N) = P(Q)P(L)P(C|Q, L)P(N|C, L) So now, looking into the Bayesian network (BN) for the restaurant, we can say that for any Bayesian network, the joint probability distribution over all its random variables {X1,X2,...,Xn} can be represented as follows: This is known as the chain rule for Bayesian networks. Also, we say that a distribution P factorizes over a graph G, if P can be encoded as follows: Here, ParG(X) is the parent of X in the graph G. Summary In this article, we saw how we can represent a complex joint probability distribution using a directed graph and a conditional probability distribution associated with each node, which is collectively known as a Bayesian network. Resources for Article: Further resources on this subject: Web Scraping with Python [article] Exact Inference Using Graphical Models [article] wxPython: Design Approaches and Techniques [article]

0
0
47571

Packt

10 Aug 2015

4 min read

An Introduction to WEP

Packt

10 Aug 2015

4 min read

In this article by Marco Alamanni, author of the book, Kali Linux Wireless Penetration Testing Essentials, has explained that the WEP protocol was introduced with the original 802.11 standard as a means to provide authentication and encryption to wireless LAN implementations. It is based on the RC4 (Rivest Cipher 4) stream cypher with a preshared secret key (PSK) of 40 or 104 bits, depending on the implementation. A 24 bit pseudo-random Initialization Vector (IV) is concatenated with the preshared key to generate the per-packet keystream used by RC4 for the actual encryption and decryption processes. Thus, the resulting keystream could be 64 or 128 bits long. (For more resources related to this topic, see here.) In the encryption phase, the keystream is XORed with the plaintext data to obtain the encrypted data, while in the decryption phase the encrypted data is XORed with the keystream to obtain the plaintext data. The encryption process is shown in the following diagram: Attacks against WEP First of all, we must say that WEP is an insecure protocol and has been deprecated by the Wi-Fi Alliance. It suffers from various vulnerabilities related to the generation of the keystreams, to the use of IVs and to the length of the keys. The IV is used to add randomness to the keystream, trying to avoid the reuse of the same keystream to encrypt different packets. This purpose has not been accomplished in the design of WEP, because the IV is only 24 bits long (with 2^24 = 16,777,216 possible values) and it is transmitted in clear-text within each frame. Thus, after a certain period of time (depending on the network traffic) the same IV, and consequently the same keystream, will be reused, allowing the attacker to collect the relative cypher texts and perform statistical attacks to recover the plain texts and the key. The first well-known attack against WEP was the Fluhrer, Mantin and Shamir (FMS) attack, back in 2001. The FMS attack relies on the way WEP generates the keystreams and on the fact that it also uses weak IVs to generate weak keystreams, making possible for an attacker to collect a sufficient number of packets encrypted with these keystreams, analyze them, and recover the key. The number of IVs to be collected to complete the FMS attack is about 250,000 for 40-bit keys and 1,500,000 for 104-bit keys. The FMS attack has been enhanced by Korek, improving its performances. Andreas Klein found more correlations between the RC4 keystream and the key than the ones discovered by Fluhrer, Mantin, and Shamir, that can used to crack the WEP key. In 2007, Pyshkin, Tews, and Weinmann (PTW) extended Andreas Klein's research and improved the FMS attack, significantly reducing the number of IVs needed to successfully recover the WEP key. Indeed, the PTW attack does not rely on weak IVs like the FMS attack does and is very fast and effective. It is able to recover a 104-bit WEP key with a success probability of 50 percent using less than 40,000 frames and with a probability of 95 percent with 85,000 frames. The PTW attack is the default method used by Aircrack-ng to crack WEP keys. Both the FMS and PTW attacks need to collect quite a large number of frames to succeed and can be conducted passively, sniffing the wireless traffic on the same channel of the target AP and capturing frames. The problem is that, in normal conditions, we will have to spend quite a long time to passively collect all the necessary packets for the attacks, especially with the FMS attack. To accelerate the process, the idea is to re-inject frames in the network to generate traffic in response so that we could collect the necessary IVs more quickly. A type of frame that is suitable for this purpose is the ARP request, because the AP broadcasts it and each time with a new IV. As we are not associated with the AP, if we send frames to it directly, they are discarded and a de-authentication frame is sent. Instead, we can capture ARP requests from associated clients and retransmit them to the AP. This technique is called the ARP Request Replay attack and is also adopted by Aircrack-ng for the implementation of the PTW attack. Summary In this article, we covered the WEP protocol, the attacks that have been developed to crack the keys. Resources for Article: Further resources on this subject: Kali Linux – Wireless Attacks [article] What is Kali Linux [article] Penetration Testing [article]

0
0
9674

How-To Tutorials

article-image-oracle-goldengate-12c-overview

Packt

10 Aug 2015

21 min read

Oracle GoldenGate 12c — An Overview

Packt

10 Aug 2015

21 min read

In this article by John P Jeffries, author of the book Oracle GoldenGate 12c Implementer's Guide, he provides an introduction to Oracle GoldenGate by describing the key components, processes, and considerations required to build and implement a GoldenGate solution. John tells you how to address some of the issues that influence the decision-making process when you design a GoldenGate solution. He focuses on the additional configuration options available in Oracle GoldenGate 12c (For more resources related to this topic, see here.) 12c new features Oracle has provided some exciting new features in their 12c version of GoldenGate, some of which we have already touched upon. Following the official desupport of Oracle Streams in Oracle Database 12c, Oracle has essentially migrated some of the key features to its strategic product. You will find that GoldenGate now has a tighter integration with the Oracle database, enabling enhanced functionality. Let's explore some of the new features available in Oracle GoldenGate 12c. Integrated capture Integrated capture has been available since Oracle GoldenGate 11gR2 with Oracle Database 11g (11.2.0.3). Originally decoupled from the database, GoldenGate's new architecture provides the option to integrate its Extract process(es) with the Oracle database. This enables GoldenGate to access the database's data dictionary and undo tablespace, providing replication support for advanced features and data types. Oracle GoldenGate 12c still supports the original Extract configuration, known as Classic Capture. Integrated Replicat Integrated Replicat is a new feature in Oracle GoldenGate 12c for the delivery of data to Oracle Database 11g (11.2.0.4) or 12c. The performance enhancement provides better scalability and load balancing that leverages the database parallel apply servers for automatic, dependency-aware parallel Replicat processes. With Integrated Replicat, there is no need for users to manually split the delivery process into multiple threads and manage multiple parameter files. GoldenGate now uses a lightweight streaming API to prepare, coordinate, and apply the data to the downstream database. Oracle GoldenGate 12c still supports the original Replicat configuration, known as Classic Delivery. Downstream capture Downstream capture was one of my favorite Oracle Stream features. It allows for a combined in-memory capture and apply process that achieves very low latency even in heavy data load situations. Like Streams, GoldenGate builds on this feature by employing a real-time downstream capture process. This method uses Oracle Data Guard's log transportation mechanism, which writes changed data to standby redo logs. It provides a best-of-both-worlds approach, enabling a real-time mine configuration that falls back to archive log mining when the apply process cannot keep up. In addition, the real-time mine process is re-enabled automatically when the data throughput is less. Installation One of the major changes in Oracle GoldenGate 12c is the installation method. Like other Oracle products, Oracle GoldenGate 12c is now installed using the Java-based Oracle Universal Installer (OUI) in either the interactive or silent mode. OUI reads the Oracle Inventory on your system to discover existing installations (Oracle Homes), allowing you to install, deinstall, or clone software products. Upgrading to 12c Whether you wish to upgrade your current GoldenGate installation from Oracle GoldenGate 11g Release 2 or from an earlier version, the steps are the same. Simply stop all the GoldenGate running processes on your database server, backup the GoldenGate home, and then use OUI to perform the fresh installation. It is important to note, however, while restarting replication, ensure the capture process begins from the point at which it was gracefully stopped to guarantee against lost synchronization data. Multitenant database replication As the version suggests, Oracle GoldenGate 12c now supports data replication for Oracle Database 12c. Those familiar with the 12c database features will be aware of the multitenant container database (CDB) that provides database consolidation. Each CDB consists of a root container and one or more pluggable databases (PDB). The PDB can contain multiple schemas and objects, just like a conventional database that GoldenGate replicates data to and from. The GoldenGate Extract process pulls data from multiple PDBs or containers in the source, combining the changed data into a single trail file. Replicat, however, splits the data into multiple process groups in order to apply the changes to a target PDB. Coordinated Delivery The Coordinated Delivery option applies to the GoldenGate Replicat process when configured in the classic mode. It provides a performance gain by automatically splitting the delivered data from a remote trail file into multiple threads that are then applied to the target database in parallel. GoldenGate manages the coordination across selected events that require ordering, including DDL, primary key updates, event marker interface (EMI), and SQLEXEC. Coordinated Delivery can be used with both Oracle (from version 11.2.0.4) and non-Oracle databases. Event-based processing In GoldenGate 12c, event-based processing has been enhanced to allow specific events to be captured and acted upon automatically through an EMI. SQLEXEC provides the API to EMI, enabling programmatic execution of tasks following an event. Now it is possible, for example, to detect the start of a batch job or large transaction, trap the SQL statement(s), and ignore the subsequent multiple change records until the end of the source system transaction. The original DML can then be replayed on the target database as one transaction. This is a major step forward in the performance tuning for data replication. Enhanced security Recent versions of GoldenGate have included security features such as the encryption of passwords and data. Oracle GoldenGate 12c now supports a credential store, better known as an Oracle wallet, that securely stores an alias associated with a username and password. The alias is then referenced in the GoldenGate parameter files rather than the actual username and password. Conflict Detection and Resolution In earlier versions of GoldenGate, Conflict Detection and Resolution (CDR) has been somewhat lightweight and was not readily available out of the box. Although available in Oracle Streams, the GoldenGate administrator would have to programmatically resolve any data conflict in the replication process using GoldenGate built-in tools. In the 12c version, the feature has emerged as an easily configurable option through Extract and Replicat parameters. Dynamic Rollback Selective data back out of applied transactions is now possible using the Dynamic Rollback feature. The feature operates at table and record-level and supports point-in-time recovery. This potentially eliminates the need for a full database restore, following data corruption, erroneous deletions, or perhaps the removal of test data, thus avoiding hours of system downtime. Streams to GoldenGate migration Oracle Streams users can now migrate their data replication solution to Oracle GoldenGate 12c using a purpose-built utility. This is a welcomed feature given that Streams is no longer supported in Oracle Database 12c. The Streams2ogg tool auto generates Oracle GoldenGate configuration files that greatly simplify the effort required in the migration process. Performance In today's demand for real-time access to real-time data, high performance is the key. For example, businesses will no longer wait for information to arrive on their DSS to make decisions and users will expect the latest information to be available in the public cloud. Data has value and must be delivered in real time to meet the demand. So, how long does it take to replicate a transaction from the source database to its target? This is known as end-to-end latency, which typically has a threshold that must not be breeched in order to satisfy a predefined Service Level Agreement (SLA). GoldenGate refers to latency as lag, which can be measured at different intervals in the replication process. They are as follows: Source to Extract: The time taken for a record to be processed by the Extract compared to the commit timestamp on the database Replicat to target: The time taken for the last record to be processed by the Replicat process compared to the record creation time in the trail file A well-designed system may still encounter spikes in the latency, but it should never be continuous or growing. Peaks are typically caused by load on the source database system, where the latency increases with the number of transactions per second. Lag should be measured as an average over a specified period. Trying to tune GoldenGate when the design is poor is a difficult situation to be in. For the system to perform well, you may need to revisit the design. Availability Another important NFR is availability. Normally quoted as a percentage, the system must be available for the specified length of time. For example, NFR of 99.9 percent availability equates to a downtime of 8.76 hours in a year, which sounds quite a lot, especially if it were to occur all at once. Oracle's maximum availability architecture (MAA) offers enhanced availability through products such as Real Application Clusters (RAC) and Active Data Guard (ADG). However, as we previously described, the network plays a major role in data replication. The NFR relates to the whole system, so you need to be sure your design covers redundancy for all components. Event-based processing It is important in any data replication environment to capture and manage events, such as trail records containing specific data or operations or maybe the occurrence of a certain error. These are known as Event Markers. GoldenGate provides a mechanism to perform an action on a given event or condition. These are known as Event Actions and are triggered by Event Records. If you are familiar with Oracle Streams, Event Actions are like rules. The Event Marker System GoldenGate's Event Marker System, also known as event marker interface (EMI), allows custom DML-driven processing on an event. This comprises of an Event Record to trigger a given action. An Event Record can be either a trail record that satisfies a condition evaluated by a WHERE or FILTER clause or a record written to an event table that enables an action to occur. Typical actions are writing status information, reporting errors, ignoring certain records in a trail, invoking a shell script, or performing an administrative task. The following Replicat code describes the process of capturing an event and performing an action by logging DELETE operations made against the CREDITCARD_ACCOUNTS table using the EVENTACTIONS parameter: MAP SRC.CREDITCARD_ACCOUNTS, TARGET TGT.CREDITCARD_ACCOUNTS_DIM;TABLE SRC.CREDITCARD_ACCOUNTS, &FILTER (@GETENV ('GGHEADER', 'OPTYPE') = 'DELETE'), &EVENTACTIONS (LOG INFO); By default, all logged information is written to the process group report file, the GoldenGate error log, and the system messages file. On Linux, this is the /var/log/messages file. Note that the TABLE parameter is also used in the Replicat's parameter file. This is a means of triggering an Event Action to be executed by the Replicat when it encounters an Event Marker. The following code shows the use of the IGNORE option that prevents certain records from being extracted or replicated, which is particularly useful to filter out system type data. When used with the TRANSACTION option, the whole transaction and not just the Event Record is ignored: TABLE SRC.CREDITCARD_ACCOUNTS, &FILTER (@GETENV ('GGHEADER', 'OPTYPE') = 'DELETE'), &EVENTACTIONS (IGNORE TRANSACTION); The preceding code extends the previous code by stopping the Event Record itself from being replicated. Using Event Actions to improve batch performance All replication technologies typically suffer from one flaw that is the way in which the data is replicated. Consider a table that is populated with a million rows as part of a batch process. This may be a bulk insert operation that Oracle completes on the source database as one transaction. However, Oracle will write each change to its redo logs as Logical Change Records (LCRs). GoldenGate will subsequently mine the logs, write the LCRs to a remote trail, convert each one back to DML, and apply them to the target database, one row at a time. The single source transaction becomes one million transactions, which causes a huge performance overhead. To overcome this issue, we can use Event Actions to: Detect the DML statement (INSERT INTO TABLE SELECT ..) Ignore the data resulting from the SELECT part of the statement Replicate just the DML statement as an Event Record Execute just the DML statement on the target database The solution requires a statement table on both source and target databases to trigger the event. Also, both databases must be perfectly synchronized to avoid data integrity issues. User tokens User tokens are GoldenGate environment variables that are captured and stored in the trail record for replication. They can be accessed via the @GETENV function. We can use token data in column maps, stored procedures called by SQLEXEC, and, of course, in macros. Using user tokens to populate a heartbeat table A vast array of user tokens exist in GoldenGate. Let's start by looking at a common method of replicating system information to populate a heartbeat table that can be used to monitor performance. We can use the TOKENS option of the Extract TABLE parameter to define a user token and associate it with the GoldenGate environment data. The following Extract configuration code shows the token declarations for the heartbeat table: TABLE GGADMIN.GG_HB_OUT, &TOKENS (EXTGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &EXTTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV("JULIANTIMESTAMP")), &EXTLAG = @GETENV ("LAG","SEC"), &EXTSTAT_TOTAL = @GETENV ("DELTASTATS","DML"), &), FILTER (@STREQ (EXTGROUP, @GETENV("GGENVIRONMENT","GROUPNAME"))); For the data pump, the example Extract configuration is shown here: TABLE GGADMIN.GG_HB_OUT, &TOKENS (PMPGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &PMPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV("JULIANTIMESTAMP")), &PMPLAG = @GETENV ("LAG","SEC")); Also, for the Replicat, the following configuration populates the heartbeat table on the target database with the token data derived from Extract, data pump, and Replicat, containing system details and replication lag: MAP GGADMIN.GG_HB_OUT_SRC, TARGET GGADMIN.GG_HB_IN_TGT, &KEYCOLS (DB_NAME, EXTGROUP, PMPGROUP, REPGROUP), &INSERTMISSINGUPDATES, &COLMAP (USEDEFAULTS, &ID = 0, &SOURCE_COMMIT = @GETENV ("GGHEADER", "COMMITTIMESTAMP"), &EXTGROUP = @TOKEN ("EXTGROUP"), &EXTTIME = @TOKEN ("EXTTIME"), &PMPGROUP = @TOKEN ("PMPGROUP"), &PMPTIME = @TOKEN ("PMPTIME"), &REPGROUP = @TOKEN ("REPGROUP"), &REPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV("JULIANTIMESTAMP")), &EXTLAG = @TOKEN ("EXTLAG"), &PMPLAG = @TOKEN ("PMPLAG"), &REPLAG = @GETENV ("LAG","SEC"), &EXTSTAT_TOTAL = @TOKEN ("EXTSTAT_TOTAL")); As in the heartbeat table example, the defined user tokens can be called in a MAP statement using the @TOKEN function. The SOURCE_COMMIT and LAG metrics are self-explained. However, EXTSTAT_TOTAL, which is derived from DELTASTATS, is particularly useful to measure the load on the source system when you evaluate latency peaks. For applications, user tokens are useful to audit data and trap exceptions within the replicated data stream. Common user tokens are shown in the following code that replicates the token data to five columns of an audit table: MAP SRC.AUDIT_LOG, TARGET TGT.AUDIT_LOG, &COLMAP (USEDEFAULTS, &OSUSER = @TOKEN ("TKN_OSUSER"), &DBNAME = @TOKEN ("TKN_DBNAME"), &HOSTNAME = @TOKEN ("TKN_HOSTNAME"), &TIMESTAMP = @TOKEN ("TKN_COMMITTIME"), &BEFOREAFTERINDICATOR = @TOKEN ("TKN_ BEFOREAFTERINDICATOR"); The BEFOREAFTERINDICATOR environment variable is particularly useful to provide a status flag in order to check whether the data was from a Before or After image of an UPDATE or DELETE operation. By default, GoldenGate provides After images. To enable a Before image extraction, the GETUPDATEBEFORES Extract parameter must be used on the source database. Using logic in the data replication GoldenGate has a number of functions that enable the administrator to program logic in the Extract and Replicat process configuration. These provide generic functions found in the IF and CASE programming languages. In addition, the @COLTEST function enables conditional calculations by testing for one or more column conditions. This is typically used with the @IF function, as shown in the following code: MAP SRC.CREDITCARD_PAYMENTS, TARGET TGT.CREDITCARD_PAYMENTS_FACT,&COLMAP (USEDEFAULTS, &AMOUNT = @IF(@COLTEST(AMOUNT, MISSING, INVALID), 0, AMOUNT)); Here, the @COLTEST function tests the AMOUNT column in the source data to check whether it is MISSING or INVALID. The @IF function returns 0 if @COLTEST returns TRUE and returns the value of AMOUNT if FALSE. The target AMOUNT column is therefore set to 0 when the equivalent source is found to be missing or invalid; otherwise, a direct mapping occurs. The @CASE function tests a list of values for a match and then returns a specified value. If no match is found, @CASE will return a default value. There is no limit to the number of cases to test; however, if the list is very large, a database lookup may be more appropriate. The following code shows the simplicity of the @CASE statement. Here, the country name is returned from the country code: MAP SRC.CREDITCARD_STATEMENT, TARGET TGT.CREDITCARD_STATEMENT_DIM,&COLMAP (USEDEFAULTS, &COUNTRY = @CASE(COUNTRY_CODE, "UK", "United Kingdom", "USA","United States of America")); Other GoldenGate functions: @EVAL and @VALONEOF exist that perform tests. Similar to @CASE, @VALONEOF compares a column or string to a list of values. The difference being it evaluates more than one value against a single column or string. When the following code is used with @IF, it returns "EUROPE" when TRUE and "UNKNOWN" when FALSE: MAP SRC.CREDITCARD_STATEMENT, TARGET TGT.CREDITCARD_STATEMENT_DIM,&COLMAP (USEDEFAULTS, &REGION = @IF(@VALONEOF(COUNTRY_CODE, "UK","E", "D"),"EUROPE","UNKNOWN")); The @EVAL function evaluates a list of conditions and returns a specified value. Optionally, if none are satisfied, it returns a default value. There is no limit to the number of evaluations you can list. However, it is best to list the most common evaluations at the beginning to enhance performance. The following code includes the BEFORE option that compares the before value of the replicated source column to the current value of the target column. Depending on the evaluation, @EVAL will return "PAID MORE", "PAID LESS", or "PAID SAME": MAP SRC.CREDITCARD_ PAYMENTS, TARGET TGT.CREDITCARD_PAYMENTS, &COLMAP (USEDEFAULTS, &STATUS = @EVAL(AMOUNT < BEFORE.AMOUNT, "PAID LESS", AMOUNT > BEFORE.AMOUNT, "PAID MORE", AMOUNT = BEFORE.AMOUNT, "PAID SAME")); The BEFORE option can be used with other GoldenGate functions, including the WHERE and FILTER clauses. However, for the Before image to be written to the trail and to be available, the GETUPDATEBEFORES parameter must be enabled in the source database's Extract parameter file or the target database's Replicat parameter file, but not both. The GETUPDATEBEFORES parameter can be set globally for all tables defined in the Extract or individually per table using GETUPDATEBEFORES and IGNOREUPDATEBEFORES, as seen in the following code: EXTRACT EOLTP01USERIDALIAS srcdb DOMAIN adminSOURCECATALOG PDB1EXTTRAIL ./dirdat/aaGETAPPLOPSIGNOREREPLICATESGETUPDATEBEFORESTABLE SRC.CHECK_PAYMENTS;IGNOREUPDATEBEFORESTABLE SRC.CHECK_PAYMENTS_STATUS;TABLE SRC.CREDITCARD_ACCOUNTS;TABLE SRC.CREDITCARD_PAYMENTS; Tracing processes to find wait events If you have worked with Oracle software, particularly in the performance tuning space, you will be familiar with tracing. Tracing enables additional information to be gathered from a given process or function to diagnose performance problems or even bugs. One example is the SQL trace that can be enabled at a database session or the system level to provide key information, such as; wait events, parse, fetch, and execute times. Oracle GoldenGate 12c offers a similar tracing mechanism through its trace and trace2 options of the SEND GGSCI command. This is like the session-level SQL trace. Also, in a similar fashion to performing a database system trace, tracing can be enabled in the GoldenGate process parameter files that make it permanent until the Extract or Replicat is stopped. trace provides processing information, whereas trace2 identifies the processes with wait events. The following commands show tracing being dynamically enabled for 2 minutes on a running Replicat process: GGSCI (db12server02) 1> send ROLAP01 trace2 ./dirrpt/ROLAP01.trc Wait for 2 minutes, then turn tracing off: GGSCI (db12server02) 2> send ROLAP01 trace2 offGGSCI (db12server02) 3> exit To view the contents of the Replicat trace file, we can execute the following command. In the case of a coordinated Replicat, the trace file will contain information from all of its threads: $ view dirrpt/ROLAP01.trcstatistics between 2015-08-08 Wed HKT 11:55:27 and 2015-08-08 Wed HKT11:57:28RPT_PROD_Ol.LIMIT_TP_RESP : n=2 : op=Insert; total=3; avg=1.5000;max=3msecRPT_PROD_01.SUP_POOL_SMRY_HIST : n=1 : op=Insert; total=2; avg=2.0000;max=2msecRPT_PROD_01.EVENTS : n=1 : op=Insert; total=2; avg=2.0000; max=2msecRPT_PROD_01.DOC_SHIP_DTLS : n=17880 : op=FieldComp; total=22003;avg=1.2306; max=42msecRPT_PROD_01.BUY_POOL_SMRY_HIST : n=1 : op=Insert; total=2; avg=2.0000;max=2msecRPT_PROD_01.LIMIT_TP_LOG : n=2 : op-Insert; total=2; avg=1.0000;max=2msecRPT_PROD_01.POOL_SMRY : n=1 : op=FieldComp; total=2; avg=2.0000;max=2msec..===============================================summary==============Delete : n=2; total=2; avg=1.00;Insert : n=78; total=356; avg=4.56;FieldComp : n=85728; total=123018; avg=1.43;total_op_num=85808 : total_op_time=123376 ms : total_avg_time=1.44ms/optotal commit number=1 The trace file provides the following information: The table name The operation type (FieldComp is for a compressed field) The number of operations The average wait The maximum wait Summary Armed with the preceding information, we can quickly see what operations against which tables are taking the longest time. Exception handling Oracle GoldenGate 12c now supports Conflict Detection and Resolution (CDR). However, out-of-the-box, GoldenGate takes a catch all approach to exception handling. For example, by default, should any operational failure occur, a Replicat process will ABEND and roll back the transaction to the last known checkpoint. This may not be ideal in a production environment. The HANDLECOLLISIONS and NOHANDLECOLLISIONS parameters can be used to control whether or not a Replicat process tries to resolve the duplicate record error and the missing record error. The way to determine what error occurred and on which Replicat is to create an exceptions handler. Exception handling differs from CDR by trapping and reporting Oracle errors suffered by the data replication (DML and DDL). On the other hand, CDR detects and resolves inconsistencies in the replicated data, such as mismatches with before and after images. Exceptions can always be trapped by the Oracle error they produce. GoldenGate provides an exception handler parameter called REPERROR that allows the Replicat to continue processing data after a predefined error. For example, we can include the following configuration in our Replicat parameter file to ignore ORA-00001 "unique constraint (%s.%s) violated": REPERROR (DEFAULT, EXCEPTION)REPERROR (DEFAULT2, ABEND)REPERROR (-1, EXCEPTION) Cloud computing Cloud computing has grown enormously in the recent years. Oracle has named its latest version of products: 12c, the c standing for Cloud of course. The architecture of Oracle 12c Database allows a multitenant container database to support multiple pluggable databases—a key feature of cloud computing—rather than implement the inefficient schema consolidation, typical of the previous Oracle database version architecture, which is known to cause contention on shared resources during high load. The Oracle 12c architecture supports a database consolidation approach through its efficient memory management and dedicated background processes. Online computer companies such as Amazon have leveraged the cloud concept by offering Relational Database Services (RDS), which is becoming very popular for its speed of readiness, support, and low cost. The cloud environments are often huge, containing hundreds of servers, petabytes of storage, terabytes of memory, and countless CPU cores. The cloud has to support multiple applications in a multi-tiered, shared environment, often through virtualization technologies, where storage and CPUs are typically the driving factors for cost-effective options. Customers choose their hardware footprint that best suits their budget and system requirements, commonly known as Platform as a Service (PaaS). Cloud computing is an extension to grid computing that offers both public and private clouds. GoldenGate and Big Data It is increasingly evident that organizations need to quickly access, analyze, and report on their data across their Enterprise in order to be agile in a competitive market. Data is becoming more of an asset to companies; it adds value to a business, but may be stored in any number of current and legacy systems, making it difficult to realize its full potential. Known as big data, it has until recently been nearly impossible to perform real-time business analysis on the combined data from multiple sources. Nowadays, the ability to access all transactional data with low latency is essential. With the introduction of products such as Apache Hadoop, integration of structured data from an RDBMS, including semi-structured and unstructured data, offers a common playing field to support business intelligence. When coupled with ODI, GoldenGate for big data provides real-time delivery to a suite of Apache products, such as Flume, HDFS, Hive, and Hbase, to support big data analytics. Summary In this article, we have learned an introduction to Oracle GoldenGate by describing the key components, processes, and considerations required to build and implement a GoldenGate solution. Resources for Article: Further resources on this subject: What is Oracle Public Cloud? [Article] Oracle GoldenGate- Advanced Administration Tasks - I [Article] Oracle B2B Overview [Article]

0
0
6636

Getting Started with Java Driver for MongoDB

Managing Recipients

Working with Virtual Machines

Exploring Jenkins

Using the ArcPy DataAccess Module withFeature Classesand Tables

Integrating with Muzzley

Controls and Widgets

Setting Up Synchronous Replication

Understanding Hadoop Backup and Recovery Needs

Securing OpenStack Networking

Trending Topics

Creating Functions and Operations

Updating and building our masters

Bayesian Network Fundamentals

An Introduction to WEP

Oracle GoldenGate 12c — An Overview

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access