How-To Tutorials

24 Sep 2014

22 min read

Creating an Extension in Yii 2

24 Sep 2014

0
0
6494

How-To Tutorials

article-image-protecting-gpg-keys-beaglebone

Packt

24 Sep 2014

23 min read

Protecting GPG Keys in BeagleBone

Packt

24 Sep 2014

23 min read

In this article by Josh Datko, author of BeagleBone for Secret Agents, you will learn how to use the BeagleBone Black to safeguard e-mail encryption keys. (For more resources related to this topic, see here.) After our investigation into BBB hardware security, we'll now use that technology to protect your personal encryption keys for the popular GPG software. GPG is a free implementation of the OpenPGP standard. This standard was developed based on the work of Philip Zimmerman and his Pretty Good Privacy (PGP) software. PGP has a complex socio-political backstory, which we'll briefly cover before getting into the project. For the project, we'll treat the BBB as a separate cryptographic co-processor and use the CryptoCape, with a keypad code entry device, to protect our GPG keys when they are not in use. Specifically, we will do the following: Tell you a little about the history and importance of the PGP software Perform basic threat modeling to analyze your project Create a strong PGP key using the free GPG software Teach you to use the TPM to protect encryption keys History of PGP The software used in this article would have once been considered a munition by the U.S. Government. Exporting it without a license from the government, would have violated the International Traffic in Arms Regulations (ITAR). As late as the early 1990s, cryptography was heavily controlled and restricted. While the early 90s are filled with numerous accounts by crypto-activists, all of which are well documented in Steven Levy's Crypto, there is one man in particular who was the driving force behind the software in this project: Philip Zimmerman. Philip Zimmerman had a small pet project around the year 1990, which he called Pretty Good Privacy. Motivated by a strong childhood passion for codes and ciphers, combined with a sense of political activism against a government capable of strong electronic surveillance, he set out to create a strong encryption program for the people (Levy 2001). One incident in particular helped to motivate Zimmerman to finish PGP and publish his work. This was the language that the then U.S. Senator Joseph Biden added to Senate Bill #266, which would mandate that: "Providers of electronic communication services and manufacturers of electronic communications service equipment shall ensure that communication systems permit the government to obtain the plaintext contents of voice, data, and other communications when appropriately authorized by law." In 1991, in a rush to release PGP 1.0 before it was illegal, Zimmerman released his software as a freeware to the Internet. Subsequently, after PGP spread, the U.S. Government opened a criminal investigation on Zimmerman for the violation of the U.S. export laws. Zimmerman, in what is best described as a legal hack, published the entire source code of PGP, including instructions on how to scan it back into digital form, as a book. As Zimmerman describes: "It would be politically difficult for the Government to prohibit the export of a book that anyone may find in a public library or a bookstore." (Zimmerman, 1995) A book published in the public domain would no longer fall under ITAR export controls. The genie was out of the bottle; the government dropped its case against Zimmerman in 1996. Reflecting on the Crypto Wars Zimmerman's battle is considered a resilient victory. Many other outspoken supporters of strong cryptography, known as cypherpunks, also won battles popularizing and spreading encryption technology. But if the Crypto Wars were won in the early nineties, why hasn't cryptography become ubiquitous? Well, to a degree, it has. When you make purchases online, it should be protected by strong cryptography. Almost nobody would insist that their bank or online store not use cryptography and most probably feel more secure that they do. But what about personal privacy protecting software? For these tools, habits must change as the normal e-mail, chat, and web browsing tools are insecure by default. This change causes tension and resistance towards adoption. Also, security tools are notoriously hard to use. In the seminal paper on security usability, researchers conclude that the then PGP version 5.0, complete with a Graphical User Interface (GUI), was not able to prevent users, who were inexperienced with cryptography but all of whom had at least some college education, from making catastrophic security errors (Whitten 1999). Glenn Greenwald delayed his initial contact with Edward Snowden for roughly two months because he thought GPG was too complicated to use (Greenwald, 2014). Snowden absolutely refused to share anything with Greenwald until he installed GPG. GPG and PGP enable an individual to protect their own communications. Implicitly, you must also trust the receiving party not to forward your plaintext communication. GPG expects you to protect your private key and does not rely on a third party. While this adds some complexity and maintenance processes, trusting a third party with your private key can be disastrous. In August of 2013, Ladar Levison decided to shut down his own company, Lavabit, an e-mail provider, rather than turn over his users' data to the authorities. Levison courageously pulled the plug on his company rather then turn over the data. The Lavabit service generated and stored your private key. While this key was encrypted to the user's password, it still enabled the server to have access to the raw key. Even though the Lavabit service alleviated users from managing their private key themselves, it enabled the awkward position for Levison. To use GPG properly, you should never turn over your private key. For a complete analysis of Lavabit, see Moxie Marlinspike's blog post at http://www.thoughtcrime.org/blog/lavabit-critique/. Given the breadth and depth of state surveillance capabilities, there is a re-kindled interest in protecting one's privacy. Researchers are now designing secure protocols, with these threats in mind (Borisov, 2014). Philip Zimmerman ended the chapter on Why Do You Need PGP? in the Official PGP User's Guide with the following statement, which is as true today as it was when first inked: "PGP empowers people to take their privacy into their own hands. There's a growing social need for it." Developing a threat model We introduced the concept of a threat model. A threat model is an analysis of the security of the system that identifies assets, threats, vulnerabilities, and risks. Like any model, the depth of the analysis can vary. In the upcoming section, we'll present a cursory analysis so that you can start thinking about this process. This analysis will also help us understand the capabilities and limitations of our project. Outlining the key protection system The first step of our analysis is to clearly provide a description of the system we are trying to protect. In this project, we'll build a logical GPG co-processor using the BBB and the CryptoCape. We'll store the GPG keys on the BBB and then connect to the BBB over Secure Shell (SSH) to use the keys and to run GPG. The CryptoCape will be used to encrypt your GPG key when not in use, known as at rest. We'll add a keypad to collect a numeric code, which will be provided to the TPM. This will allow the TPM to unwrap your GPG key. The idea for this project was inspired by Peter Gutmann's work on open source cryptographic co-processors (Gutmann, 2000). The BBB, when acting as a co-processor to a host, is extremely flexible, and considering the power usage, relatively high in performance. By running sensitive code that will have access to cleartext encryption keys on a separate hardware, we gain an extra layer of protection (or at the minimum, a layer of indirection). Identifying the assets we need to protect Before we can protect anything, we must know what to protect. The most important assets are the GPG private keys. With these keys, an attacker can decrypt past encrypted messages, recover future messages, and use the keys to impersonate you. By protecting your private key, we are also protecting your reputation, which is another asset. Our decrypted messages are also an asset. An attacker may not care about your key if he/she can easily access your decrypted messages. The BBB itself is an asset that needs protecting. If the BBB is rendered inoperable, then an attacker has successfully prevented you from accessing your private keys, which is known as a Denial-Of-Service (DOS). Threat identification To identify the threats against our system, we need to classify the capabilities of our adversaries. This is a highly personal analysis, but we can generalize our adversaries into three archetypes: a well funded state actor, a skilled cracker, and a jealous ex-lover. The state actor has nearly limitless resources both from a financial and personnel point of view. The cracker is a skilled operator, but lacks the funding and resources of the state actor. The jealous ex-lover is not a sophisticated computer attacker, but is very motivated to do you harm. Unfortunately, if you are the target of directed surveillance from a state actor, you probably have much bigger problems than your GPG keys. This actor can put your entire life under monitoring and why go through the trouble of stealing your GPG keys when the hidden video camera in the wall records everything on your screen. Also, it's reasonable to assume that everyone you are communicating with is also under surveillance and it only takes one mistake from one person to reveal your plans for world domination. The adage by Benjamin Franklin is apropos here: Three may keep a secret if two of them are dead. However, properly using GPG will protect you from global passive surveillance. When used correctly, neither your Internet Service Provider, nor your e-mail provider, or any passive attacker would learn the contents of your messages. The passive adversary is not going to engage your system, but they could monitor a significant amount of Internet traffic in an attempt to collect it all. Therefore, the confidentiality of your message should remain protected. We'll assume the cracker trying to harm you is remote and does not have physical access to your BBB. We'll also assume the worst case that the cracker has compromised your host machine. In this scenario there is, unfortunately, a lot that the cracker can perform. He can install a key logger and capture everything, including the password that is typed on your computer. He will not be able to get the code that we'll enter on the BBB; however, he would be able to log in to the BBB when the key is available. The jealous ex-lover doesn't understand computers very well, but he doesn't need to, because he knows how to use a golf club. He knows that this BBB connected to your computer is somehow important to you because you've talked his ear off about this really cool project that you read in a book. He physically can destroy the BBB and with it, your private key (and probably the relationship as well!). Identifying the risks How likely are the previous risks? The risk of active government surveillance in most countries is fortunately low. However, the consequences of this attack are very damaging. The risk of being caught up in passive surveillance by a state actor, as we have learned from Edward Snowden, is very likely. However, by using GPG, we add protection against this threat. An active cracker seeking you harm is probably unlikely. Contracting keystroke-capturing malware, however, is probably not an unreasonable event. A 2013 study by Microsoft concluded that 8 out of every 1,000 computers were infected with malware. You may be tempted to play these odds but let's rephrase this statement: in a group of 125 computers, one is infected with malware. A school or university easily has more computers than this. Lastly, only you can assess the risk of a jealous ex-lover. For the full Microsoft report, refer to http://blogs.technet.com/b/security/archive/2014/03/31/united-states-malware-infection-rate-more-than-doubles-in-the-first-half-of-2013.aspx. Mitigating the identified risks If you find yourself the target of a state, this project alone is not going to help much. We can protect ourselves somewhat from the cracker with two strategies. The first is instead of connecting the BBB to your laptop or computer, you can use the BBB as a standalone machine and transfer files via a microSD card. This is known as an air-gap. With a dedicated monitor and keyboard, it is much less likely for software vulnerabilities to break the gap and infect the BBB. However, this comes as a high level of personal inconvenience, depending on how often you encrypt files. If you consider the risk of running the BBB attached to your computer too high, create an air-gapped BBB for maximum protection. If you deem the risk low, because you've hardened your computer and have other protection mechanism, then keep the BBB attached to the computer. An air-gapped computer can still be compromised. In 2010, a highly specialized worm known as Stuxnet was able to spread to networked isolated machines through USB flash drives. The second strategy is to somehow enter the GPG passphrase directly into the BBB without using the host's keyboard. After we complete the project, we'll suggest a mechanism to do this, but it is slightly more complicated. This would eliminate the threat of the key logger since the pin is directly entered. The mitigation against the ex-lover is to treat your BBB as you would your own wallet, and don't leave it out of your sight. It's slightly larger than you would want, but it's certainly small enough to fit in a small backpack or briefcase. Summarizing our threat model Our threat model, while cursory, illustrates the thought process one should go through before using or developing security technologies. The term threat model is specific to the security industry, but it's really just proper planning. The purpose of this analysis is to find logic bugs and prevent you from spending thousands of dollars on high-tech locks for your front door when you keep your backdoor unlocked. Now that we understand what we are trying to protect and why it is important to use GPG, let's build the project. Generating GPG keys First, we need to install GPG on the BBB. It is mostly likely already installed, but you can check and install it with the following command: sudo apt-get install gnupg gnupg-curl Next, we need to add a secret key. For those that already have a secret key, you can import your secret key ring, secring.gpg, to your ~/.gnupg folder. For those that want to create a new key, on the BBB, proceed to the upcoming section. This project assumes some familiarity with GPG. If GPG is new to you, the Free Software Foundation maintains the Email Self-Defense guide which is a very approachable introduction to the software and can be found at https://emailselfdefense.fsf.org/en/index.html. Generating entropy If you decided to create a new key on the BBB, there are a few technicalities we must consider. First of all, GPG will need a lot of random data to generate the keys. The amount of random data available in the kernel is proportional to the amount of entropy that is available. You can check the available entropy with the following command: cat /proc/sys/kernel/random/entropy_avail If this command returns a relatively low number, under 200, then GPG will not have enough entropy to generate a key. On a PC, one can increase the amount of entropy by interacting with the computer such as typing on the keyboard or moving the mouse. However, such sources of entropy are difficult for embedded systems, and in our current setup, we don't have the luxury of moving a mouse. Fortunately, there are a few tools to help us. If your BBB is running kernel version 3.13 or later, we can use the hardware random number generator on the AM3358 to help us out. You'll need to install the rng-tools package. Once installed, you can edit /etc/default/rng-tools and add the following line to register the hardware random number generated for rng-tools: HRNGDEVICE=/dev/hwrng After this, you should start the rng-tools daemon with: /etc/init.d/rng-tools start If you don't have /dev/hwrng—and currently, the chips on the CryptoCape do not yet have character device support and aren't available to /dev/hwrng—then you can install haveged. This daemon implements the Hardware Volatile Entropy Gathering and Expansion (HAVEGE) algorithm, the details of which are available at http://www.irisa.fr/caps/projects/hipsor/. This daemon will ensure that the BBB maintains a pool of entropy, which will be sufficient for generating a GPG key on the BBB. Creating a good gpg.conf file Before you generate your key, we need to establish some more secure defaults for GPG. As we discussed earlier, it is still not as easy as it should be to use e-mail encryption. Riseup.net, an e-mail provider with a strong social cause, maintains an OpenPGP best practices guide at https://help.riseup.net/en/security/message-security/openpgp/best-practices. This guide details how to harden your GPG configuration and provides the motivation behind each option. It is well worth a read to understand the intricacies of GPG key management. Jacob Applebaum maintains an implementation of these best practices, which you should download from https://github.com/ioerror/duraconf/raw/master/configs/gnupg/gpg.conf and save as your ~/.gnupg/gpg.conf file. The configuration is well commented and you can refer to the best practices guide available at Riseup.net for more information. There are three entries, however, that you should modify. The first is default-key, which is the fingerprint of your primary GPG key. Later in this article, we'll show you how to retrieve that fingerprint. We can't perform this action now because we don't have a key yet. The second is keyserver-options ca-cert-file, which is the certificate authority for the keyserver pool. Keyservers host your public keys and a keyserver pool is a redundant collection of keyservers. The instructions on Riseup.net gives the details on how to download and install that certificate. Lastly, you can use Tor to fetch updates on your keys. The act of you requesting a public key from a keyserver signals that you have a potential interest in communicating with the owner of that key. This metadata might be more interesting to a passive adversary than the contents of your message, since it reveals your social network. Tor is apt at protecting traffic analysis. You probably don't want to store your GPG keys on the same BBB as your bridge, so a second BBB would help here. On your GPG BBB, you need to only run Tor as a client, which is its default configuration. Then you can update keyserver-options http-proxy to point to your Tor SOCKS proxy running on localhost. The Electronic Frontier Foundation (EFF) provides some hypothetical examples on the telling nature of metadata, for example, They (the government) know you called the suicide prevention hotline from the Golden Gate Bridge. But the topic of the call remains a secret. Refer to the EFF blog post at https://www.eff.org/deeplinks/2013/06/why-metadata-matters for more details. Generating the key Now you can generate your GPG key. Follow the on screen instructions and don't include a comment. Depending on your entropy source, this could take a while. This example took 10 minutes using haveged as the entropy collector. There are various opinions on what to set as the expiration date. If this is your first GPG, try one year at first. You can always make a new key or extend the same one. If you set the key to never expire and you lose the key, by forgetting the passphrase, people will still think it's valid unless you revoke it. Also, be sure to set the user ID to a name that matches some sort of identification, which will make it easier for people to verify that the holder of the private key is the same person as a certified piece of paper. The command to create a new key is gpg –-gen-key: Please select what kind of key you want: (1) RSA and RSA (default) (2) DSA and Elgamal (3) DSA (sign only) (4) RSA (sign only) Your selection? 1 RSA keys may be between 1024 and 4096 bits long. What keysize do you want? (2048) 4096 Requested keysize is 4096 bits Please specify how long the key should be valid. 0 = key does not expire <n> = key expires in n days <n>w = key expires in n weeks <n>m = key expires in n months <n>y = key expires in n years Key is valid for? (0) 1y Key expires at Sat 06 Jun 2015 10:07:07 PM UTC Is this correct? (y/N) y You need a user ID to identify your key; the software constructs the user ID from the Real Name, Comment and Email Address in this form: "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>" Real name: Tyrone Slothrop Email address: tyrone.slothrop@yoyodyne.com Comment: You selected this USER-ID: "Tyrone Slothrop <tyrone.slothrop@yoyodyne.com>" Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O You need a Passphrase to protect your secret key. We need to generate a lot of random bytes. It is a good idea to perform some other action (type on the keyboard, move the mouse, utilize the disks) during the prime generation; this gives the random number generator a better chance to gain enough entropy. ......+++++ ..+++++ gpg: key 0xABD9088171345468 marked as ultimately trusted public and secret key created and signed. gpg: checking the trustdb gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model gpg: depth: 0 valid: 1 signed: 0 trust: 0-, 0q, 0n, 0m, 0f, 1u gpg: next trustdb check due at 2015-06-06 pub 4096R/0xABD9088171345468 2014-06-06 [expires: 2015-06-06] Key fingerprint = CBF9 1404 7214 55C5 C477 B688 ABD9 0881 7134 5468 uid [ultimate] Tyrone Slothrop <tyrone.slothrop@yoyodyne.com> sub 4096R/0x9DB8B6ACC7949DD1 2014-06-06 [expires: 2015-06-06] gpg --gen-key 320.62s user 0.32s system 51% cpu 10:23.26 total From this example, we know that our secret key is 0xABD9088171345468. If you end up creating multiple keys, but use just one of them more regularly, you can edit your gpg.conf file and add the following line: default-key 0xABD9088171345468 Postgeneration maintenance In order for people to send you encrypted messages, they need to know your public key. Having your public key server can help distribute your public key. You can post your key as follows, and replace the fingerprint with your primary key ID: gpg --send-keys 0xABD9088171345468 GPG does not rely on third parties and expects you to perform key management. To ease this burden, the OpenPGP standards define the Web-of-Trust as a mechanism to verify other users' keys. Details on how to participate in the Web-of-Trust can be found in the GPG Privacy Handbook at https://www.gnupg.org/gph/en/manual/x334.html. You are also going to want to create a revocation certificate. A revocation certificate is needed when you want to revoke your key. You would do this when the key has been compromised, say if it was stolen. Or more likely, if the BBB fails and you can no longer access your key. Generate the certificate and follow the ensuing prompts replacing the ID with your key ID: gpg --output revocation-certificate.asc --gen-revoke 0xABD9088171345468 sec 4096R/0xABD9088171345468 2014-06-06 Tyrone Slothrop <tyrone.slothrop@yoyodyne.com> Create a revocation certificate for this key? (y/N) y Please select the reason for the revocation: 0 = No reason specified 1 = Key has been compromised 2 = Key is superseded 3 = Key is no longer used Q = Cancel (Probably you want to select 1 here) Your decision? 0 Enter an optional description; end it with an empty line: > Reason for revocation: No reason specified (No description given) Is this okay? (y/N) y You need a passphrase to unlock the secret key for user: "Tyrone Slothrop <tyrone.slothrop@yoyodyne.com>" 4096-bit RSA key, ID 0xABD9088171345468, created 2014-06-06 ASCII armored output forced. Revocation certificate created. Please move it to a medium which you can hide away; if Mallory gets access to this certificate he can use it to make your key unusable. It is smart to print this certificate and store it away, just in case your media become unreadable. But have some caution: The print system of your machine might store the data and make it available to others! Do take the advice and move this file off the BeagleBone. Printing it out and storing it somewhere safe is a good option, or burn it to a CD. The lifespan of a CD or DVD may not be as long as you think. The United States National Archives Frequently Asked Questions (FAQ) page on optical storage media states that: "CD/DVD experiential life expectancy is 2 to 5 years even though published life expectancies are often cited as 10 years, 25 years, or longer." Refer to their website http://www.archives.gov/records-mgmt/initiatives/temp-opmedia-faq.html for more details. Lastly, create an encrypted backup of your encryption key and consider storing that in a safe location on durable media. Using GPG With your GPG private key created or imported, you can now use GPG on the BBB as you would on any other computer. You may have already installed Emacs on your host computer. If you follow the GNU/Linux instructions, you can also install Emacs on the BBB. If you do, you'll enjoy automatic GPG encryption and decryption for files that end in the .gpg extension. For example, suppose you want to send a message to your good friend, Pirate Prentice, whose GPG key you already have. Compose your message in Emacs, and then save it with a .gpg extension. Emacs will prompt you to select the public keys for encryption and will automatically encrypt the buffer. If a GPG-encrypted message is encrypted to a public key, with which you have the corresponding private key, Emacs will automatically decrypt the message if it ends with .gpg. When using Emacs from the terminal, the prompt for encryption should look like the following screenshot: Summary This article covered and taught you about how GPG can protect e-mail confidentiality Resources for Article: Further resources on this subject: Making the Unit Very Mobile - Controlling Legged Movement [Article] Pulse width modulator [Article] Home Security by BeagleBone [Article]

0
0
23042

How-To Tutorials

Packt

24 Sep 2014

12 min read

Exploring the Usages of Delphi

Packt

24 Sep 2014

12 min read

This article written by Daniele Teti, the author of Delphi Cookbook, explains the process of writing enumerable types. It also discusses the steps to customize FireMonkey controls. (For more resources related to this topic, see here.) Writing enumerable types When the for...in loop was introduced in Delphi 2005, the concept of enumerable types was also introduced into the Delphi language. As you know, there are some built-in enumerable types. However, you can create your own enumerable types using a very simple pattern. To make your container enumerable, implement a single method called GetEnumerator, that must return a reference to an object, interface, or record, that implements the following three methods and one property (in the sample, the element to enumerate is TFoo): function GetCurrent: TFoo; function MoveNext: Boolean; property Current: TFoo read GetCurrent; There are a lot of samples related to standard enumerable types, so in this recipe you'll look at some not-so-common utilizations. Getting ready In this recipe, you'll see a file enumerable function as it exists in other, mostly dynamic, languages. The goal is to enumerate all the rows in a text file without actual opening, reading and closing the file, as shown in the following code: var row: String; begin for row in EachRows('....myfile.txt') do WriteLn(row); end; Nice, isn't it? Let's start… How to do it... We have to create an enumerable function result. The function simply returns the actual enumerable type. This type is not freed automatically by the compiler so you've to use a value type or an interfaced type. For the sake of simplicity, let's code to return a record type: function EachRows(const AFileName: String): TFileEnumerable; begin Result := TFileEnumerable.Create(AFileName); end; The TFileEnumerable type is defined as follows: type TFileEnumerable = record private FFileName: string; public constructor Create(AFileName: String); function GetEnumerator: TEnumerator<String>; end; . . . constructor TFileEnumerable.Create(AFileName: String); begin FFileName := AFileName; end; function TFileEnumerable.GetEnumerator: TEnumerator<String<; begin Result := TFileEnumerator.Create(FFileName); end; No logic here; this record is required only because you need a type that has a GetEnumerator method defined. This method is called automatically by the compiler when the type is used on the right side of the for..in loop. An interesting thing happens in the TFileEnumerator type, the actual enumerator, declared in the implementation section of the unit. Remember, this object is automatically freed by the compiler because it is the return of the GetEnumerator call: type TFileEnumerator = class(TEnumerator<String>) private FCurrent: String; FFile: TStreamReader; protected constructor Create(AFileName: String); destructor Destroy; override; function DoGetCurrent: String; override; function DoMoveNext: Boolean; override; end; { TFileEnumerator } constructor TFileEnumerator.Create(AFileName: String); begin inherited Create; FFile := TFile.OpenText(AFileName); end; destructor TFileEnumerator.Destroy; begin FFile.Free; inherited; end; function TFileEnumerator.DoGetCurrent: String; begin Result := FCurrent; end; function TFileEnumerator.DoMoveNext: Boolean; begin Result := not FFile.EndOfStream; if Result then FCurrent := FFile.ReadLine; end; The enumerator inherits from TEnumerator<String> because each row of the file is represented as a string. This class also gives a mechanism to implement the required methods. The DoGetCurrent (called internally by the TEnumerator<T>.GetCurrent method) returns the current line. The DoMoveNext method (called internally by the TEnumerator<T>.MoveNext method) returns true or false if there are more lines to read in the file or not. Remember that this method is called before the first call to the GetCurrent method. After the first call to the DoMoveNext method, FCurrent is properly set to the first row of the file. The compiler generates a piece of code similar to the following pseudo code: it = typetoenumerate.GetEnumerator; while it.MoveNext do begin S := it.Current; //do something useful with string S end it.free; There's more… Enumerable types are really powerful and help you to write less, and less error prone, code. There are some shortcuts to iterate over in-place data without even creating an actual container. If you have a bounce or integers or if you want to create a not homogenous for loop over some kind of data type, you can use the new TArray<T> type as shown here: for i in TArray<Integer>.Create(2, 4, 8, 16) do WriteLn(i); //write 2 4 8 16 TArray<T> is a generic type, so the same works also for strings: for s in TArray<String>.Create('Hello','Delphi','World') do WriteLn(s); It can also be used for Plain Old Delphi Object (PODO) or controls: for btn in TArray<TButton>.Create(btn1, btn31,btn2) do btn.Enabled := false See also http://docwiki.embarcadero.com/RADStudio/XE6/en/Declarations_and_Statements#Iteration_Over_Containers_Using_For_statements: This Embarcadero documentation will provide a detailed introduction to enumerable types Giving a new appearance to the standard FireMonkey controls using styles Since Version XE2, RAD Studio includes FireMonkey. FireMonkey is an amazing library. It is a really ambitious target for Embarcadero, but it's important for its long-term strategy. VCL is and will remain a Windows-only library, while FireMonkey has been designed to be completely OS and device independent. You can develop one application and compile it anywhere (if anywhere is contained in Windows, OS X, Android, and iOS; let's say that is a good part of anywhere). Getting ready A styled component doesn't know how it will be rendered on the screen, but the style. Changing the style, you can change the aspect of the component without changing its code. The relation between the component code and style is similar to the relation between HTML and CSS, one is the content and another is the display. In terms of FireMonkey, the component code contains the actual functionalities the component has, but the aspect is completely handled by the associated style. All the TStyledControl child classes support styles. Let's say you have to create an application to find a holiday house for a travel agency. Your customer wants a nice-looking application to search for the dream house their customers. Your graphic design department (if present) decided to create a semitransparent look-and-feel, as shown in the following screenshot, and you've to create such an interface. How to do that? This is the UI we want How to do it… In this case, you require some step-by-step instructions, so here they are: Create a new FireMonkey desktop application (navigate to File | New | FireMonkey Desktop Application). Drop a TImage component on the form. Set its Align property to alClient, and use the MultiResBitmap property and its property editor to load a nice-looking picture. Set the WrapMode property to iwFit and resize the form to let the image cover the entire form. Now, drop a TEdit component and a TListBox component over the TImage component. Name the TEdit component as EditSearch and the TListBox component as ListBoxHouses. Set the Scale property of the TEdit and TListBox components to the following values: Scale.X: 2 Scale.Y: 2 Your form should now look like this: The form with the standard components The actions to be performed by the users are very simple. They should write some search criteria in the Edit field and click on Return. Then, the listbox shows all the houses available for that criteria (with a "contains" search). In a real app, you require a database or a web service to query, but this is a sample so you'll use fake search criteria on fake data. Add the RandomUtilsU.pas file from the Commons folder of the project and add it to the uses clause of the main form. Create an OnKeyUp event handler for the TEdit component and write the following code inside it: procedure TForm1.EditSearchKeyUp(Sender: TObject; var Key: Word; var KeyChar: Char; Shift: TShiftState); var I: Integer; House: string; SearchText: string; begin if Key <> vkReturn then Exit; // this is a fake search... ListBoxHouses.Clear; SearchText := EditSearch.Text.ToUpper; //now, gets 50 random houses and match the criteria for I := 1 to 50 do begin House := GetRndHouse; if House.ToUpper.Contains(SearchText) then ListBoxHouses.Items.Add(House); end; if ListBoxHouses.Count > 0 then ListBoxHouses.ItemIndex := 0 else ListBoxHouses.Items.Add('<Sorry, no houses found>'); ListBoxHouses.SetFocus; end; Run the application and try it to familiarize yourself with the behavior. Now, you have a working application, but you still need to make it transparent. Let's start with the FireMonkey Style Designer (FSD). Just to be clear, at the time of writing, the FireMonkey Style Designer is far to be perfect. It works, but it is not a pleasure to work with it. However, it does its job. Right-click on the TEdit component. From the contextual menu, choose Edit Custom Style (general information about styles and the style editor can be found at http://docwiki.embarcadero.com/RADStudio/XE6/en/FireMonkey_Style_Designer and http://docwiki.embarcadero.com/RADStudio/XE6/en/Editing_a_FireMonkey_Style). Delphi opens a new tab that contains the FSD. However, to work with it, you need the Structure pane to be visible as well (navigate to View | Structure or Shift + Alt + F11). In the Structure pane, there are all the styles used by the TEdit control. You should see a Structure pane similar to the following screenshot: The Structure pane showing the default style for the TEdit control In the Structure pane, open the editsearchstyle1 node, select the background subnode, and go to the Object Inspector. In the Object Inspector window, remove the content of the SourceLookup property. The background part of the style is TActiveStyleObject. A TActiveStyleObject style is a style that is able to show a part of an image as default and another part of the same image when the component that uses it is active, checked, focused, mouse hovered, pressed, or selected. The image to use is in the SourceLookup property. Our TEdit component must be completely transparent in every state, so we removed the value of the SourceLookup property. Now the TEdit component is completely invisible. Click on Apply and Close and run the application. As you can confirm, the edit works but it is completely transparent. Close the application. When you opened the FSD for the first time, a TStyleBook component has been automatically dropped on the form and contains all your custom styles. Double-click on it and the style designer opens again. The edit, as you saw, is transparent, but it is not usable at all. You need to see at least where to click and write. Let's add a small bottom line to the edit style, just like a small underline. To perform the next step, you require the Tool Palette window and the Structure pane visible. Here is my preferred setup for this situation: The Structure pane and the Tool Palette window are visible at the same time using the docking mechanism; you can also use the floating windows if you wish Now, search for a TLine component in the Tool Palette window. Drag-and-drop the TLine component onto the editsearchstyle1 node in the Structure pane. Yes, you have to drop a component from the Tool Palette window directly onto the Structure pane. Now, select the TLine component in the Structure Pane (do not use the FSD to select the components, you have to use the Structure pane nodes). In the Object Inspector, set the following properties: Align: alContents HitTest: False LineType: ltTop RotationAngle: 180 Opacity: 0.6 Click on Apply and Close. Run the application. Now, the text is underlined by a small black line that makes it easy to identify that the application is transparent. Stop the application. Now, you've to work on the listbox; it is still 100 percent opaque. Right-click on the ListBoxHouses option and click on Edit Custom Style. In the Structure pane, there are some new styles related to the TListBox class. Select the listboxhousesstyle1 option, open it, and select its child style, background. In the Object Inspector, change the Opacity property of the background style to 0.6. Click on Apply and Close. That's it! Run the application, write Calif in the Edit field and press Return. You should see a nice-looking application with a semitransparent user interface showing your dream houses in California (just like it was shown in the screenshot in the Getting ready section of this recipe). Are you amazed by the power of FireMonkey styles? How it works... The trick used in this recipe is simple. If you require a transparent UI, just identify which part of the style of each component is responsible to draw the background of the component. Then, put the Opacity setting to a level less than 1 (0.6 or 0.7 could be enough for most cases). Why not simply change the Opacity property of the component? Because if you change the Opacity property of the component, the whole component will be drawn with that opacity. However, you need only the background to be transparent; the inner text must be completely opaque. This is the reason why you changed the style and not the component property. In the case of the TEdit component, you completely removed the painting when you removed the SourceLookup property from TActiveStyleObject that draws the background. As a thumb rule, if you have to change the appearance of a control, check its properties. If the required customization is not possible using only the properties, then change the style. There's more… If you are new to FireMonkey styles, probably most concepts in this recipe must have been difficult to grasp. If so, check the official documentation on the Embarcadero DocWiki at the following URL: http://docwiki.embarcadero.com/RADStudio/XE6/en/Customizing_FireMonkey_Applications_with_Styles Summary In this article, we discussed ways to write enumerable types in Delphi. We also discussed how we can use styles to make our FireMonkey controls look better. Resources for Article: Further resources on this subject: Adding Graphics to the Map [Article] Application Performance [Article] Coding for the Real-time Web [Article]

0
0
15011

article-image-how-to-build-a-recommender-by-running-mahout-on-spark

Pat Ferrel

24 Sep 2014

7 min read

How to Build a Recommender by Running Mahout on Spark

Pat Ferrel

24 Sep 2014

7 min read

Mahout on Spark: Recommenders There are big changes happening in Apache Mahout. For several years it was the go-to machine learning library for Hadoop. It contained most of the best-in-class algorithms for scalable machine learning, which means clustering, classification, and recommendations. But it was written for Hadoop and MapReduce. Today a number of new parallel execution engines show great promise in speeding calculations by 10-100x (Spark, H2O, Flink). That means that instead of buying 10 computers for a cluster, one will do. That should get your manager’s attention. After releasing Mahout 0.9, the team decided to begin an aggressive retool using Spark, building in the flexibility to support other engines, and both H2O and Flink have shown active interest. This post is about moving the heart of Mahout’s item-based collaborative filtering recommender to Spark. Where we are Mahout is currently on the 1.0 snapshot version, meaning we are working on what will be released as 1.0. For the past year or, some of the team has been working on a Scala-based DSL (Domain Specific Language), which looks like Scala with R-like algebraic expressions. Since Scala supports not only operator overloading but functional programming, it is a natural choice for building distributed code with rich linear algebra expressions. Currently we have an interactive shell that runs Scala with all of the R-like expression support. Think of it as R but supporting truly huge data in a completely distributed way. Many algorithms—the ones that can be expressed as simple linear algebra equations—are implemented with relative ease (SSVD, PCA). Scala also has lazy evaluation, which allows Mahout to slide a modern optimizer underneath the DSL. When an end product of a calculation is needed, the optimizer figures out the best path to follow and spins off the most efficient Spark jobs to accomplish the whole. Recommenders One of the first things we want to implement is the popular item-based recommenders. But here, too, we’ll introduce many innovations. It still starts from some linear algebra. Let’s take the case of recommending purchases on an e-commerce site. The problem can be defined in the following code example: r_p = recommendations for purchases for a given user. This is a vector of item-ids and strengths of recommendation. h_p = history of purchases for a given user A = the matrix of all purchases by all users. Rows are users, columns items, for now we will just flag a purchase so the matrix is all ones and zeros. r_p = [A’A]h_p A’A is the matrix A transposed then multiplied by A. This is the core cooccurrence or indicator matrix used in this style of recommender. Using the Mahout Scala DSL we could write the recommender as: val recs = (A.t %*% A) * userHist This would produce a reasonable recommendation, but from experience we know that A’A is better calculated using a method called the Log Likelihood Ratio, which is a probabilistic measure of the importance of a cooccurrence (http://en.wikipedia.org/wiki/Likelihood-ratio_test). In general, when you see something like A’A, it can be replaced with a similarity comparison for each row with every other row. This will produce a matrix whose rows are items and whose columns are the same items. The magnitude of the value in the matrix determines the strength of similarity of row item to the column item. In recommenders, the more similar the items, the more they were purchased by similar people. The previous line of code is replaced by the following: val recs = CooccurrenceAnalysis.cooccurence(A)* userHist However, this would take time to execute for each user as they visit the e-commerce site, so we’ll handle that outside of Mahout. First let’s talk about data preparation. Item Similarity Driver Creating the indicator matrix (A’A) is the core of this type of recommender. We have a quick flexible way to create this using text log files and creating output that is in an easy form to digest. The job of data prep is greatly streamlined in the Mahout 1.0 snapshot. In the past a user would have to do all the data prep themselves. This requires translating their own user and item IDs into Mahout IDs, putting the data into text tuple files and feeding them to the recommender. Out the other end you’d get a Hadoop binary file called a sequence file and you’d have to translate the Mahout IDs into something your application could understand. This is no longer required. To make this process much simpler we created the spark-itemsimilarity command line tool. After installing Mahout, Hadoop, and Spark, and assuming you have logged user purchases in some directories in HDFS, we can probably read them in, calculate the indicator matrix, and write it out with no other prep required. The spark-itemsimilarity command line takes in text-delimited files, extracts the user ID and item ID, runs the cooccurrence analysis, and outputs a text file with your application’s user and item IDs restored. Here is the sample input file where we’ve specified a simple comma-delimited format that field holds the user ID, the item ID, and the filter—purchase: Thu Jul 10 10:52:10.996,u1,purchase,iphone Fri Jul 11 13:52:51.018,u1,purchase,ipad Fri Jul 11 21:42:26.232,u2,purchase,nexus Sat Jul 12 09:43:12.434,u2,purchase,galaxy Sat Jul 12 19:58:09.975,u3,purchase,surface Sat Jul 12 23:08:03.457,u4,purchase,iphone Sun Jul 13 14:43:38.363,u4,purchase,galaxy spark-itemsimilarity will create a Spark distributed dataset (rdd) to back the Mahout DRM (distributed row matrix) that holds this data: User/item iPhone IPad Nexus Galaxy Surface u1 1 1 0 0 0 u2 0 0 1 1 0 u3 0 0 0 0 1 u4 1 0 0 1 0 The output of the job is the LLR computed “indicator matrix” and will contain this data: Item/item iPhone iPad Nexus Galaxy Surface iPhone 0 1.726092435 0 0 0 iPad 1.726092435 0 0 0 0 Nexus 0 0 0 1.726092435 0 Galaxy 0 0 1.726092435 0 0 Surface 0 0 0 0 0 Reading this, we see that self-similarities have been removed so the diagonal is all zeros. The iPhone is similar to the iPad and the Galaxy is similar to the Nexus. The output of the spark-itemsimilarity job can be formatted in various ways but by default it looks like this: galaxy<tab>nexus:1.7260924347106847 ipad<tab>iphone:1.7260924347106847 nexus<tab>galaxy:1.7260924347106847 iphone<tab>ipad:1.7260924347106847 surface On the e-commerce site for the page displaying the Nexus we can show that the Galaxy was purchased by the same people. Notice that application specific IDs are preserved here and the text file is very easy to parse in text-delimited format. The numeric values are the strength of similarity, and for the cases where there are many similar products, you can sort on that value if you want to show only the highest weighted recommendations. Still, this is only part way to an individual recommender. We have done the [A’A] part, but now we need to do the [A’A]h_p. Using the current user’s purchase history will personalize the recommendations. The next post in this series will talk about using a search engine to take this last step. About the author Pat is a serial entrepreneur, consultant, and Apache Mahout committer working on the next generation of Spark-based recommenders. He lives in Seattle and can be contacted through his site https://finderbots.com or @occam on Twitter. Want more Spark? We've got you covered - click here.

0
0
5915

article-image-windows-phone-8-applications

Packt

23 Sep 2014

17 min read

Windows Phone 8 Applications

Packt

23 Sep 2014

17 min read

0
0
1559

Packt

23 Sep 2014

15 min read

Installing RHEV Manager

Packt

23 Sep 2014

15 min read

This article by Pradeep Subramanian, author of Getting Started with Red Hat Enterprise Virtualization, describes setting up RHEV-M, including the installation, initial configuration, and connection to the administrator and user portal of the manager web interface. (For more resources related to this topic, see here.) Setting up the RHEL operating system for the manager Prior to starting the installation of RHEV-M, please make sure all the prerequisite are met to set up RHEV environment. Consider the following when setting up RHEL OS for RHEV-M: Install Red Hat Enterprise Linux 6 with latest minor update of 5, and during package selection step, select minimal or basic server as an option. Don't select any custom package. The hostname should be set to FQDN. Set up basic networking; use of static IP is recommended for your manager with a default gateway and primary and secondary DNS client configured. SELinux and iptables are enabled by default as part of the operating system installation. For more security, it's highly recommended to keep it on. To disable SELinux on Red Hat Enterprise Linux, please run the following command as the root user: # setenforce Permissive This command will switch off SELinux enforcement temporarily until the machine is rebooted. If you would like to permanently disable it, edit /etc/sysconfig/selinux and enter SELINUX=disabled. Registering with Red Hat Network To install RHEV-M, you need to first register your manager machine with Red Hat Network and subscribe to the relevant channels. You need to connect your machine to the Red Hat Network with a valid account with access to the relevant software channels to register your machine and deploy RHEV-M packages. If your environment does not have access to the Red Hat Network, you can perform an offline installation of RHEV-M. For more information, please refer to https://access.redhat.com/site/articles/216983. To register your machine with the Red Hat Network using RHN Classic, please run the following command from the shell and follow the onscreen instructions: # rhn_register This command will register your manager machine to the parent channel of your operating system version. It's strongly recommended to use Red Hat Subscription Manager to register and subscribe to the relevant channel. To use Red Hat Subscription Manager, please refer to the Subscribing to the Red Hat Enterprise Virtualization Manager Channels using Subscription Manager section from the RHEV 3.3 installation guide at https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.3/html/Installation_Guide/index.html. After successful registration of your manager machine to the Red Hat Network, subscribe the manager machine using the following command to subscribe to the relevant channels. Then download and install the manager-related software packages. The following command will prompt you to enter your Red Hat Network login credentials: # rhn-channel -a -c rhel-x86_64-server-6-rhevm-3.3 -c rhel-x86_64-server-supplementary-6 -c jbappplatform-6-x86_64-server-6-rpm Username: "yourrhnlogin" Password: XXXX To cross-check whether your manager machine is registered with Red Hat Network and subscribed to the relevant channels, please run the following command. This will return all the channels mentioned earlier plus the base channel of your operating system version, as shown in the following yum command output: # yum repolist repo id repo name status jbappplatform-6-x86_64-server-6-rpm Red Hat JBoss EAP (v 6) for 6Server x86_64 1,415 rhel-x86_64-server-6 Red Hat Enterprise Linux Server (v. 6 for 64-bit x86_64) 12,662 rhel-x86_64-server-6-rhevm-3.3 Red Hat Enterprise Virtualization Manager (v.3.3 x86_64) 164 rhel-x86_64-server-supplementary-6 RHEL Server Supplementary (v. 6 64-bit x86_64) 370 You are now ready to start downloading and installing the software required to set up and run your RHEV-M. Installing the RHEV-Manager packages Update your base Red Hat Enterprise Linux operating system to the latest up-to-date version by running the following command: # yum -y upgrade Reboot the machine if the upgrade installed the latest version of the kernel. After a successful upgrade, run the following command to install RHEV-M and its dependent packages: # yum -y install rhevm There are a few conditions you need to consider before configuring RHEV-M: We need a working DNS for forward and reverse lookup of FQDN. We are going to use the Red Hat IdM server configured with the DNS role in the rest of the article for domain name resolution of the entire virtualization infrastructure. Refer to the Red Hat Identity Management Guide for more information on how to add forward and reverse zone records to the configured IdM DNS at https://access.redhat.com/documentation/en- US/Red_Hat_Enterprise_Linux/6/html/Identity_Management_Guide/Working_with_DNS.html. You can't install Identity Management software on the same box where the manager is going to be deployed due to some package conflicts. To store ISO images of operating systems in order to create a virtual machine, you need Network File Server (NFS) with a planned NFS export path. If your manager machine has sufficient storage space to host all your ISOs, you can set up the ISO domain while configuring the manager to set up the NFS share automatically through the installer to store all your ISO images. If you have an existing NFS server, it's recommended to use a dedicated export for the ISO domain to store the ISO images instead of using the manager server to serve the NFS service. Here we are going to use a dedicated local mount point named /rhev-iso-library on the RHEV Manager box to store our ISO images to provision the virtual machine. Note that the mount point should be empty and only contain the user and group ownership and permission sets before running the installer: # chown -R 36:36 /rhev-iso-library ; chmod 0755 /rhev-iso-library It will also be useful to have the following information at hand: Ports to be used for HTTP and HTTPS communication. FQDN of the manager. A reverse lookup is performed on your hostname. At the time of writing this article, RHEV supported only the PostgreSQL database for use with RHEV-M. You can use a local database or remote database setup. Here we are going to use the local database. In the case of a remote database setup, keep all database login credentials ready. Please refer to the Preparing a PostgreSQL Database for Use with Red Hat Enterprise Virtualization Manager section for detailed information on setting up a remote database to use with manager at https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.3/html-single/Installation_Guide/index.html#Preparing_a_Postgres_Database_Server_for_use_with_Red_Hat_Enterprise_Virtualization_Manager. Password for internal admin account of RHEV-M. Organization name for the RHEV-M SSL certificate. Leave the default storage type to NFS for the initial default data center. We will create a new data center in the latter stage of our setup. Provide the file system path and display name for NFS ISO library configuration, so that the manager will configure NFS of the supplied filesystem path, and make it visible by the display name under the Storage tab section on administration portal of RHEV-M. Running the initial engine setup Once you're prepared with all the answers to the questions we discussed in the previous section, it's time to run the initial configuration script called engine-setup to perform the initial configuration and setting up of RHEV-M. The installer will ask you several questions, which have been discussed above, and based on your input, it will configure your RHEV-M. Leave the default settings as they are and press Enter if you feel the installer's default answers are appropriate to your setup. Once the installer takes in all your input, it will ask you for the final confirmation of your supplied configuration setting; type in OK and press Enter to continue the setup. For better understanding, please refer to the following output of the engine-setup installer while setting up a lab for this article. Log in to manager as the root user, and from the shell of your Manager machine, run the following engine-setup command: # engine-setup Once you execute this command, engine-setup performs the following set of tasks on the system: First check whether any updates are available for this system. Accept the default Yes and proceed further: Checking for product updates and update if available. Enter Default Yes. Set the hostname of the RHEV-M system. The administration portal web access will get bound to the FQDN entered here: Host fully qualified DNS name of this server [rhevmanager.example.com]: Set up the firewall rule on the manager system, and this will backup your existing firewall rule configured on the manager system if any: Do you want Setup to configure the firewall? (Yes, No) [Yes]: No Local will set up the PostgreSQL database instance on the manager system; optionally, you can choose Remote to use the existing remote PostgreSQL database instance to use with manager: Where is the database located? (Local, Remote) [Local]: If you selected Local, you will get an option to customize the PostgreSQL database setup by choosing the relevant option: Would you like Setup to automatically configure PostgreSQL, or prefer to perform that manually? (Automatic, Manual) [Automatic]: Set up the internal admin user password to access the manager web interface for initial setup of the virtualization infrastructure: Engine admin password: Confirm engine admin password: RHEV supports the use of clusters to manage Gluster storage bricks in addition to virtualization hosts. Choosing both will give the flexibility to use hypervisor hosts to host virtual machines as well as other sets of hypervisor hosts to manage Gluster storage bricks in your RHEV environment: Application mode (Both, Virt, Gluster) [Both]: Engine installer creates a data center named Default as part of the initial setup. The following step will ask you to select the type of storage to be used with the data center. Mixing storage domains of different types is not supported in the 3.3 release, but it is supported in the latest 3.4 release. Choose the default NFS option and proceed further. We are going to create a new data center, using the administration portal, from scratch after the engine setup and then select the storage type as ISCSI for the rest of this article: Default storage type: (NFS, FC, ISCSI, POSIXFS) [NFS]: The manager uses certificates to communicate securely with its hosts. Provide your organization's name for the certificate: Organization name for certificate [example.com]: The manager uses the Apache web server to present a landing page to users. The engine-setup script can make the landing page of the manager the default page presented by Apache: Do you wish to set the application as the default page of the web server? (Yes, No) [Yes]: By default, external SSL (HTTPS) communications with the manager are secured with the self-signed certificate created in the PKI configuration stage for secure communication with hosts. Another certificate may be chosen for external HTTPS connections without affecting how the manager communicates with hosts: Setup can configure apache to use SSL using a certificate issued from the internal CA. Do you wish Setup to configure that, or prefer to perform that manually? (Automatic, Manual) [Automatic]: Choose Yes to set up an NFS share on the manager system and provide the export path to be used to dump the ISO images in a later part. Finally, label the ISO domain with a name that will be unique and easily identifiable on the Storage tab of the administration portal: Configure an NFS share on this server to be used as an ISO Domain? (Yes, No) [Yes]: Local ISO domain path [/var/lib/exports/iso]: /rhev-iso-library Local ISO domain name [ISO_DOMAIN]: ISO_Datastore The engine-setup script can optionally configure a WebSocket proxy server in order to allow users to connect with virtual machines via the noVNC or HTML 5 consoles: Configure WebSocket Proxy on this machine? (Yes, No) [Yes]: The final step will ask you to provide proxy server credentials if the manager system is hosted behind the proxy server to access the Internet. RHEV supports vRed Hat Access Plugin, which will help you collect the logs and open a service request with Red Hat Global Support Services from the administration portal of the manager: Would you like transactions from the Red Hat Access Plugin sent from the RHEV Manager to be brokered through a proxy server? (Yes, No) [No]: Finally, if you feel all the input and configurations are satisfactory, press Enter to complete the engine setup. It will show you the configuration preview, and if you feel satisfied, press OK: Please confirm installation settings (OK, Cancel) [OK]: After the successful setup of RHEV-M, you can see the summary, which will show various bits of information such as how to access the admin portal of RHEV-M, the installed logs, the configured iptables firewall, the required ports, and so on. Connecting to the admin and user portal 006C Now access the admin portal, as shown in the following screenshot, using the following URLs: http://rhevmanager.example.com:80/ovirt-engine https://rhevmanager.example.com:443/ovirt-engine Use the user admin and password specified during the setup to log in to the oVirt engine (also called RHEV-M). Click on Administration Portal and log in using the credentials you set up for the admin account during the engine setup. Then click on User Portal and log in using the credentials you set up for the admin account during the engine setup. You will see a difference in the portal with a very trimmed-down user interface that is useful for self-service. We will see how to integrate the manager with other active directory services and efficiently use the user portal for self-service consumption later in the article. RHEV reporting RHEV bundles two optional components. The first is the history management database, which holds the historical information of various virtualization resources such as data centers, clusters, hosts, virtual machines, and others so that any other external application can consume them for reporting. The second optional component is the customized JasperServer and JasperReports. JasperServer is an open source reporting tool capable of generating and exporting reports in various formats such as PDF, Word, and CSV for end user consumption. To enable the reporting functionality, you need to install the specific components that we discussed. For simplicity, we are installing both the components at one go using the command described in the following section. Installing the RHEV history database and report server To install the history database and report servers, execute the following command: # yum install rhevm-dwh rhevm-reports Once you have installed the reporting components, you need to start with setting up the RHEV history database by using the following command: # rhevm-dwh-setup This will momentarily stop and start the oVirt engine service during the setup. Further, it will ask you to create a read-only user account to access the history database. Create it if you want to allow remote access to the history database and follow the onscreen instructions and finish the setup. Once the oVirt engine history database (also known as the RHEV Manager history database) is created, move on to setting up the report server. From the RHEV-M server, run the following command to set up the reporting server: # rhevm-reports-setup #setup will prompt to restart ovirt-engine service. In order to proceed the installer must stop the ovirt-engine service Would you like to stop the ovirt-engine service? (yes|no): #The command then performs a number of actions before prompting you to set the password for the Red Hat Enterprise Virtualization Manager Reports administrative users (rhevm-admin and superuser) Please choose a password for the reports admin user(s) (rhevm-admin and superuser): Downloading the example code You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. Follow the onscreen instructions and enter Yes to stop the oVirt-engine and set up a password for the default internal super user account called rhevm-admin to access and manage the report portal and proceed further with the setup. Note that this user is different from the internal admin account we set up during the engine setup of RHEV-M. The rhevm-admin user is used only for accessing and managing the report portal, not for the admin or user portal. Accessing the RHEV report portal After the successful installation and initial configuration setup of the report portal, you can access it by https://rhevmanager.example.com/rhevm-reports/login.html from your client machine. You can also access the report portal from the manager web interface by clicking on the Reports Portal hyperlink, which will redirect you to the report portal. Log in with rhevm-admin and the password credentials we set while running the RHEV-M report setup script in the previous section to generate reports and create and manage users to access the report portal. Initially, most of the report portal is empty since we are yet to set up and create the virtual infrastructure. It will take at least a day or two after the complete virtualization infrastructure setup to view various resources and generate reports. To learn more about using and gathering reports using the report portal, please refer to Reports, History Database Reports, and Dashboards at https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.3/html/Administration_Guide/chap-Reports_History_Database_Reports_and_Dashboards.html. Summary In this article, we discussed setting up our basic virtualization infrastructure, which includes installing RHEV-M and report server and connecting to various portals such as admin, user, and report portal. Resources for Article: Further resources on this subject: Designing a XenDesktop® Site [article] XenMobile™ Solutions Bundle [article] Installing Virtual Desktop Agent – server OS and desktop OS [article]

0
0
3307

article-image-setting-software-infrastructure-cloud

Packt

23 Sep 2014

42 min read

Setting up of Software Infrastructure on the Cloud

Packt

23 Sep 2014

42 min read

0
0
2848

Packt

23 Sep 2014

24 min read

Animations in Cocos2d-x

Packt

23 Sep 2014

24 min read

In this article, created by Siddharth Shekhar, the author of Learning Cocos2d-x Game Development, we will learn different tools that can be used to animate the character. Then, using these animations, we will create a simple state machine that will automatically check whether the hero is falling or is being boosted up into the air, and depending on the state, the character will be animated accordingly. We will cover the following in this article: Animation basics TexturePacker Creating spritesheet for the player Creating and coding the enemy animation Creating the skeletal animation Coding the player walk cycle (For more resources related to this topic, see here.) Animation basics First of all, let's understand what animation is. An animation is made up of different images that are played in a certain order and at a certain speed, for example, movies that run images at 30 fps or 24 fps, depending on which format it is in, NTSC or PAL. When you pause a movie, you are actually seeing an individual image of that movie, and if you play the movie in slow motion, you will see the frames or images that make up to create the full movie. In games while making animations, we will do the same thing: adding frames and running them at a certain speed. We will control the images to play in a particular sequence and interval by code. For an animation to be "smooth", you should have at least 24 images or frames being played in a second, which is known as frames per second (FPS). Each of the images in the animation is called a frame. Let's take the example of a simple walk cycle. Each walk cycle should be of 24 frames. You might say that it is a lot of work, and for sure it is, but the good news is that these 24 frames can be broken down into keyframes, which are important images that give the illusion of the character walking. The more frames you add between these keyframes, the smoother the animation will be. The keyframes for a walk cycle are Contact, Down, Pass, and Up positions. For mobile games, as we would like to get away with as minimal work as possible, instead of having all the 24 frames, some games use just the 4 keyframes to create a walk animation and then speed up the animation so that player is not able to see the missing frames. So overall, if you are making a walk cycle for your character, you will create eight images or four frames for each side. For a stylized walk cycle, you can even get away with a lesser number of frames. For the animation in the game, we will create images that we will cycle through to create two sets of animation: an idle animation, which will be played when the player is moving down, and a boost animation, which will get played when the player is boosted up into the air. Creating animation in games is done using two methods. The most popular form of animation is called spritesheet animation and the other is called skeletal animation. Spritesheet animation Spritesheet animation is when you keep all the frames of the animation in a single file accompanied by a data file that will have the name and location of each of the frames. This is very similar to the BitmapFont. The following is the spritesheet we will be using in the game. For the boost and idle animations, each of the frames for the corresponding animation will be stored in an array and made to loop at a particular predefined speed. The top four images are the frames for the boost animation. Whenever the player taps on the screen, the animation will cycle through these four images appearing as if the player is boosted up because of the jetpack. The bottom four images are for the idle animation when the player is dropping down due to gravity. In this animation, the character will look as if she is blinking and the flames from the jetpack are reduced and made to look as if they are swaying in the wind. Skeletal animation Skeletal animation is relatively new and is used in games such as Rayman Origins that have loads and loads of animations. This is a more powerful way of making animations for 2D games as it gives a lot of flexibility to the developer to create animations that are fast to produce and test. In the case of spritesheet animations, if you had to change a single frame of the animation, the whole spritesheet would have to be recreated causing delay; imagine having to rework 3000 frames of animations in your game. If each frame was hand painted, it would take a lot of time to produce the individual images causing delay in production time, not to mention the effort and time in redrawing images. The other problem is device memory. If you are making a game for the PC, it would be fine, but in the case of mobiles where memory is limited, spritesheet animation is not a viable option unless cuts are made to the design of the game. So, how does skeletal animation work? In the case of skeletal animation, each item to be animated is stored in a separate spritesheet along with the data file for the locations of the individual images for each body part and object to be animated, and another data file is generated that positions and rotates the individual items for each of the frames of the animation. To make this clearer, look at the spritesheet for the same character created with skeletal animation: Here, each part of the body and object to be animated is a separate image, unlike the method used in spritesheet animation where, for each frame of animation, the whole character is redrawn. TexturePacker To create a spritesheet animation, you will have to initially create individual frames in Photoshop, Illustrator, GIMP or any other image editing software. I have already made it and have each of the images for the individual frames ready. Next, you will have to use a software to create spritesheets from images. TexturePacker is a very popular software that is used by industry professionals to create spritesheets. You can download it from https://www.codeandweb.com/. These are the same guys who made PhysicsEditor, which we used to make shapes for Box2D. You can use the trial version of this software. While downloading, choose the version that is compatible with your operating system. Fortunately, TexturePacker is available for all the major operating systems, including Linux. Refer to the following screenshot to check out the steps to use TexturePacker: Once you have downloaded TexturePacker, you have three options: you can click to try the full version for a week, or you can purchase the license, or click on the essential version to use in the trial version. In the trial version, some of the professional features are disabled, so I recommend trying the professional features for a week. Once you click the option, you should see the following interface: Texture packer has three panels; let's start from the right. The right-hand side panel will display the names of all the images that you select to create the spritesheet. The center panel is a preview window that shows how the images are packed. The left-hand side panel gives you options to store the packed texture and data file to be published to and decide the maximum size of the packed image. The Layout section gives a lot of flexibility to set up the individual images in TexturePacker, and then you have the advanced section. Let's look at some of the key items on the panel on the left. The display section The display section consists of the following options: Data Format: As we saw earlier, each exported file creates a spritesheet that has a collection of images and a data file that keeps track of the positions on the spritesheet. The data format usually changes depending upon the framework or engine. In TexturePacker, you can select the framework that you are using to develop the game, and TexturePacker will create a data file format that is compatible with the framework. If you look at the drop-down menu, you can see a lot of popular frameworks and engines in the list such as 2DToolkit, OGRE, Cocos2d, Corona SDK, LibGDX, Moai, Sparrow/Starling, SpriteKit, and Unity. You can also create a regular JSON file too if you wish. Java Script Object Notification (JSON) is similar to an XML file that is used to store and retrieve data. It is a collection of names and value pairs used for data interchanging. Data file: This is the location where you want the exported file to be placed. Texture format: Usually, this is set to .png, but you can select the one that is most convenient. Apart from PNG, you also have PVR, which is used so that people cannot view the image readily and also provides image compression. Png OPT file: This is used to set the quality of PNG images. Image format: This sets the RGB format to be used; usually, you would want this to be set at the default value. AutoSD: If you are going to create images for different resolutions, this option allows you to create resources depending on the different resolutions you are developing the game for, without the need for going into the graphics software, shrinking the images and packing them again for all the resolutions. Content protection: This protects the image and data file with an encryption key so that people can't steal spritesheets from the game file. The Geometry section The Geometry section consists of the following options: Max size: You can specify the maximum width and height of the spritesheet depending upon the framework. Usually, all frameworks allow up to 4092 x 4092, but it mostly depends on the device. Fixed size: Apparently, if you want a fixed size, you will go with this option. Size constraint: Some frameworks prefer the spritesheets to be in the power of 2 (POT), for example, 32x32, 64x64, 256x256, and so on. If this is the case, you need to select the size accordingly. For Cocos2d, you can choose any size. Scale: This is used to scale up or scale down the image. The Layout section The Layout section consists of the following options: Algorithm: This is the algorithm that will be used to make sure that the images you select to create the spritesheet are packed in the most efficient way. If you are using the pro version, choose MaxRects, but if you are using the essential version, you will have to choose Basic. Border Padding / Shape Padding: Border padding packs the gap between the border of the spritesheet and the image that it is surrounding. Shape padding is the padding between the individual images of the spritesheets. If you find that the images are getting overlapped while playing the animation in the game, you might want to increase the values to avoid overlapping. Trim: This removes the extra alpha that is surrounding the image, which would unnecessarily increase the image size of the spritesheet. Advanced features The following are some miscellaneous options in TexturePacker: Texture path: This appends the path of the texture file at the beginning of the texture name Clean transparent pixels: This sets the transparent pixels color to #000 Trim sprite names: This will remove the extension from the names of the sprites (.png and .jpg), so while calling for the name of the frame, you will not have to use extensions Creating a spritesheet for the player Now that we understand the different items in the TextureSettings panel of TexturePacker, let's create our spritesheet for the player animation from individual frames provided in the Resources folder. Open up the folder in the system and select all the images for the player that contains the idle and boost frames. There will be four images for each of the animation. Select all eight images and click-and-drag all the images to the Sprites panel, which is the right-most panel of TexturePacker. Once you have all the images on the Sprites panel, the preview panel at the center will show a preview of the spritesheet that will be created: Now on the TextureSettings panel, for the Data format option, select cocos2d. Then, in the Data file option, click on the folder icon on the right and select the location where you would like to place the data file and give the name as player_anim. Once selected, you will see that the Texture file location also auto populates with the same location. The data file will have a format of .plist and the texture file will have an extension of .png. The .plist format creates data in a markup language similar to XML. Although it is more common on Mac, you can use this data type independent of the platform you use while developing the game using Cocos2d-x. Keep the rest of the settings the same. Save the file by clicking on the save icon on the top to a location where the data and spritesheet files are saved. This way, you can access them easily the next time if you want to make the same modifications to the spritesheet. Now, click on the Publish button and you will see two files, player_anim.plist and player_anim.png, in the location you specified in the Data file and Location file options. Copy and paste these two files in the Resources folder of the project so that we can use these files to create the player states. Creating and coding enemy animation Now, let's create a similar spritesheet and data file for the enemy also. All the required files for the enemy frames are provided in the Resources folder. So, once you create the spritesheet for the enemy, it should look something like the following screenshot. Don't worry if the images are shown in the wrong sequence, just make sure that the files are numbered correctly from 1 to 4 and it is in the sequence the animations needs to be played in. Now, place the enemy_anim.png spritesheet and data file in the Resources folder in the directory and add the following lines of code in the Enemy.cpp file to animate the enemy: //enemy animation CCSpriteBatchNode* spritebatch = CCSpriteBatchNode::create("enemy_anim.png"); CCSpriteFrameCache* cache = CCSpriteFrameCache::sharedSpriteFrameCache(); cache->addSpriteFramesWithFile("enemy_anim.plist"); this->createWithSpriteFrameName("enemy_idle_1.png"); this->addChild(spritebatch); //idle animation CCArray* animFrames = CCArray::createWithCapacity(4); char str1[100] = {0}; for(int i = 1; i <= 4; i++) { sprintf(str1, "enemy_idle_%d.png", i); CCSpriteFrame* frame = cache->spriteFrameByName( str1 ); animFrames->addObject(frame); } CCAnimation* idleanimation = CCAnimation::createWithSpriteFrames(animFrames, 0.25f); this->runAction (CCRepeatForever::create(CCAnimate::create(idleanimation))) ; This is very similar to the code for the player. The only difference is that for the enemy, instead of calling the function on the hero, we call it to the same class. So, now if you build and run the game, you should see the enemy being animated. The following is the screenshot from the updated code. You can now see the flames from the booster engine of the enemy. Sadly, he doesn't have a boost animation but his feet swing in the air. Now that we have mastered the spritesheet animation technique, let's see how to create a simple animation using the skeletal animation technique. Creating the skeletal animation Using this technique, we will create a very simple player walk cycle. For this, there is a software called Spine by Esoteric Software, which is a very widely used professional software to create skeletal animations for 2D games. The software can be downloaded from the company's website at http://esotericsoftware.com/spine-purchase: There are three versions of the software available: the trial, essential, and professional versions. Although majority of the features of the professional version are available in the essential version, it doesn't have ghosting, meshes, free-form deformation, skinning, and IK pinning, which is in beta stage. The inclusion of these features does speed up the animation process and certainly takes out a lot of manual work for the animator or illustrator. To learn more about these features, visit the website and hover the mouse over these features to have a better understanding of what they do. You can follow along by downloading the trial version, which can be done by clicking the Download trial link on the website. Spine is available for all platforms including Windows, Mac, and Linux. So download it for the OS of your choice. On Mac, after downloading and running the software, it will ask to install X11, or you can download and install it from http://xquartz.macosforge.org/landing/. After downloading and installing the plugin, you can open Spine. Once the software is up and running, you should see the following window: Now, create a new project by clicking on the spine icon on the top left. As we can see in the screenshot, we are now in the SETUP mode where we set up the character. On the Tree panel on the right-hand side, in the Hierarchy pane, select the Images folder. After selecting the folder, you will be able to select the path where the individual files are located for the player. Navigate to the player_skeletal_anim folder where all the images are present. Once selected, you will see the panel populate with the images that are present in the folder, namely the following: bookGame_player_Lleg bookGame_player_Rleg bookGame_player_bazooka bookGame_player_body bookGame_player_hand bookGame_player_head Now drag-and-drop all the files from the Images folder onto the scene. Don't worry if the images are not in the right order. In the Draw Order dropdown in the Hierarchy panel, you can move around the different items by drag-and-drop to make them draw in the order that you want them to be displayed. Once reordered, move the individual images on the screen to the appropriate positions: You can move around the images by clicking on the translate button on the bottom of the screen. If you hover over the buttons, you can see the names of the buttons. We will now start creating the bones that we will use to animate the character. In the panel on the bottom of the Tools section, click on the Create button. You should now see the cursor change to the bone creation icon. Before you create a bone, you have to always select the bone that will be the parent. In this case, we select the root bone that is in the center of the character. Click on it and drag downwards and hold the Shift key at the same time. Click-and-drag downwards up to the end of the blue dress of the character; make sure that the blue dress is highlighted. Now release the mouse button. The end point of this bone will be used as the hip joint from where the leg bones will be created for the character. Now select the end of the newly created bone, which you made in the last step, and click-and-drag downwards again holding Shift at the same time to make a bone that goes all the way to the end of the leg. With the leg still getting highlighted, release the mouse button. To create the bone for the other leg, create a new bone again starting from end of the first bone and the hip joint, and while the other leg is selected, release the mouse button to create a bone for the leg. Now, we will create a bone for the hand. Select the root node, the node in the middle of the character while holding Shift again, and draw a bone to the hand while the hand is highlighted. Create a bone for the head by again selecting the root node selected earlier. Draw a bone from the root node to the head while holding Shift and release the mouse button once you are near the ear of the character and the head is highlighted. You will notice that we never created a bone for the bazooka. For the bazooka, we will make the hand as the parent bone so that when the hand gets rotated, the bazooka also rotates along. Click on the bazooka node on the Hierarchy panel (not the image) and drag it to the hand node in the skeleton list. You can rotate each of the bones to check whether it is rotating properly. If not, you can move either the bones or images around by locking either one of them in its place so that you can move or rotate the other freely by clicking either the bones or the images button in the compensate button at the bottom of the screen. The following is the screenshot that shows my setup. You can use it to follow and create the bones to get a more satisfying animation. To animate the character, click on the SETUP button on the top and the layout will change to ANIMATE. You will see that a new timeline has appeared at the bottom. Click on the Animations tab in Hierarchy and rename the animation name from animation to runCycle by double-clicking on it. We will use the timeline to animate the character. Click on the Dopesheet icon at the bottom. This will show all the keyframes that we have made for the animation. As we have not created any, the dopesheet is empty. To create our first keyframe, we will click on the legs and rotate both the bones so that it reflects the contact pose of the walk cycle. Now to set a keyframe, click on the orange-colored key icon next to Rotate in the Transform panel at the bottom of the screen. Click on the translate key, as we will be changing the translation as well later. Once you click on it, the dopesheet will show the bones that you just rotated and also show what changes you made to the bone. Here, we rotated the bone, so you will see Rotation under the bones, and as we clicked on the translate key, it will show the Translate also. Now, frame 24 is the same as frame 0. So, to create the keyframe at frame 24, drag the timeline scrubber to frame 24 and click on the rotate and translate keys again. To set the keyframe at the middle where the contact pose happens but with opposite legs, rotate the legs to where the opposite leg was and select the keys to create a keyframe. For frames 6 and 18, we will keep the walk cycle very simple, so just raise the character above by selecting the root node, move it up in the y direction and click the orange key next to the translate button in the Transform panel at the bottom. Remember that you have to click it once in frame 6 and then move the timeline scrubber to frame 18, move the character up again, and click on the key again to create keyframes for both frames 6 and 18. Now the dopesheet should look as follow: Now to play the animation in a loop, click on the Repeat Animation button to the right of the Play button and then on the Play button. You will see the simple walk animation we created for the character. Next, we will export the data required to create the animation in Cocos2d-x. First, we will export the data for the animation. Click on the Spine button on top and select Export. The following window should pop up. Select JSON and choose the directory in which you would want to save the file to and click on Export: That is not all; we have to create a spritesheet and data file just as we created one in texture packer. There is an inbuilt tool in Spine to create a packed spritesheet. Again, click on the Spine icon and this time select Texture Packer. Here, in the input directory, select the Images folder from where we imported all the images initially. For the output directory, select the location to where the PNG and data files should be saved to. If you click on the settings button, you will see that it looks very similar to what we saw in TexturePacker. Keep the default values as they are. Click on Pack and give the name as player. This will create the .png and .atlas files, which are the spritesheet and data file, respectively: You have three files instead of the two in TexturePacker. There are two data files and an image file. While exporting the JSON file, if you didn't give it a name, you can rename the file manually to player.json just for consistency. Drag the player.atlas, player.json, and player.png files into the project folder. Finally, we come to the fun part where we actually use the data files to animate the character. For testing, we will add the animations to the HelloWorldScene.cpp file and check the result. Later, when we add the main menu, we will move it there so that it shows as soon as the game is launched. Coding the player walk cycle If you want to test the animations in the current project itself, add the following to the HelloWorldScene.h file first: #include <spine/spine-cocos2dx.h> Include the spine header file and create a variable named skeletonNode of the CCSkeletalAnimation type: extension::CCSkeletonAnimation* skeletonNode; Next, we initialize the skeletonNode variable in the HelloWorldScene.cpp file: skeletonNode = extension::CCSkeletonAnimation::createWithFile("player.json", "player.atlas", 1.0f); skeletonNode->addAnimation("runCycle",true,0,0); skeletonNode->setPosition(ccp(visibleSize.width/2 , skeletonNode->getContentSize().height/2)); addChild(skeletonNode); Here, we give the two data files into the createWithFile() function of CCSkeletonAnimation. Then, we initiate it with addAnimation and give it the animation name we gave when we created the animation in Spine, which is runCycle. We next set the position of the skeletonNode; we set it right above the bottom of the screen. Next, we add the skeletonNode to the display list. Now, if you build and run the project, you will see the player getting animated forever in a loop at the bottom of the screen: On the left, we have the animation we created using TexturePacker from CodeAndWeb, and in the middle, we have the animation that was created using Spine from Esoteric Software. Both techniques have their set of advantages, and it also depends upon the type and scale of the game that you are making. Depending on this, you can choose the tool that is more tuned to your needs. If you have a smaller number of animations in your game and if you have good artists, you could use regular spritesheet animations. If you have a lot of animations or don't have good animators in your team, Spine makes the animation process a lot less cumbersome. Either way, both tools in professional hands can create very good animations that will give life to the characters in the game and therefore give a lot of character to the game itself. Summary This article took a very brief look at animations and how to create an animated character in the game using the two of the most popular animation techniques used in games. We also looked at FSM and at how we can create a simple state machine between two states and make the animation change according to the state of the player at that moment. Resources for Article: Further resources on this subject: Moving the Space Pod Using Touch [Article] Sprites [Article] Cocos2d-x: Installation [Article]

0
0
11395

Packt

23 Sep 2014

5 min read

Galera Cluster Basics

Packt

23 Sep 2014

5 min read

This article written by Pierre MAVRO, the author of MariaDB High Performance, provides a brief introduction to Galera Cluster. (For more resources related to this topic, see here.) Galera Cluster is a synchronous multimaster solution created by Codership. It's a patch for MySQL and MariaDB with its own commands and configuration. On MariaDB, it has been officially promoted as the MariaDB Cluster. Galera Cluster provides certification-based replication. This means that each node certifies the replicated write set against other write sets. You don't have to worry about data integrity, as it manages it automatically and very well. Galera Cluster is a young product, but is ready for production. If you have already heard of MySQL Cluster, don't be confused; this is not the same thing at all. MySQL Cluster is a solution that has not been ported to MariaDB due to its complexity, code, and other reasons. MySQL Cluster provides availability and partitioning, while Galera Cluster provides consistency and availability. Galera Cluster is a simple yet powerful solution. How Galera Cluster works The following are some advantages of Galera Cluster: True multimaster: It can read and write to any node at any time Synchronous replication: There is no slave lag and no data is lost at node crash Consistent data: All nodes have the same state (same data exists between nodes at a point in time) Multithreaded slave: This enables better performance with any workload No need of an HA Cluster for management: There are no master-slave failover operations (such as Pacemaker, PCR, and so on) Hot standby: There is no downtime during failover Transparent to applications: No specific drivers or application changes are required No read and write splitting needed: There is no need to split the read and write requests WAN: Galera Cluster supports WAN replication Galera Cluster needs at least three nodes to work properly (because of the notion of quorum, election, and so on). You can also work with a two-node cluster, but you will need an arbiter (hence three nodes). The arbiter could be used on another machine available in the same LAN of your Galera Cluster, if possible. The multimaster replication has been designed for InnoDB/XtraDB. It doesn't mean you can't perform a replication with other storage engines! If you want to use other storage engines, you will be limited by the following: They can only write on a single node at a time to maintain consistency. Replication with other nodes may not be fully supported. Conflict management won't be supported. Applications that connect to Galera will only be able to write on a single node (IP/DNS) at the same time. As you can see in the preceding diagram, HTTP and App servers speak directly to their respective DBMS servers without wondering which node of the Galera Cluster they are targeting. Usually, without Galera Cluster, you can use a cluster software such as Pacemaker/Corosync to get a VIP on a master node that can switch over in case a problem occurs. No need to get PCR in that case; a simple VIP with a custom script will be sufficient to check whether the server is in sync with others is enough. Galera Cluster uses the following advanced mechanisms for replication: Transaction reordering: Transactions are reordered before commitment to other nodes. This increases the number of successful transaction certification pass tests. Write sets: This reduces the number of operations between nodes by writing sets in a single write set to avoid too much node coordination. Database state machine: Read-only transactions are processed on the local node. Write transactions are executed locally on shadow copies and then broadcasted as a read set to the other nodes for certification and commit. Group communication: High-level abstraction for communication between nodes to guarantee consistency (gcomm or spread). To get consistency and similar IDs between nodes, Galera Cluster uses GTID, similar to MariaDB 10 replication. However, it doesn't use the MariaDB GTID replication mechanism at all, as it has its own implementation for its own usage. Galera Cluster limitations Galera Cluster has limitations that prevent it from working correctly. Do not go live in production if you haven't checked that your application is in compliance with the limitations listed. The following are the limitations: Galera Cluster only fully supports InnoDB tables. TokuDB is planned but not yet available and MyISAM is partially supported. Galera Cluster uses primary keys on all your tables (mandatory) to avoid different query execution orders between all your nodes. If you do not do it on your own, Galera will create one. The delete operation is not supported on the tables without primary keys. Locking/unlocking tables and lock functions are not supported. They will be ignored if you try to use them. Galera Cluster disables query cache. XA transactions (global transactions) are not supported. Query logs can't be directed to a table, but can be directed to a file instead. Other less common limitations exist (please refer to the full list if you want to get them all: http://galeracluster.com/documentation-webpages/limitations.html) but in most cases, you shouldn't be annoyed with those ones. Summary This article introduced the benefits and drawbacks of Galera Cluster. It also discussed the features of Galera Cluster that makes it a good solution for write replications. Resources for Article: Further resources on this subject: Building a Web Application with PHP and MariaDB – Introduction to caching [Article] Using SHOW EXPLAIN with running queries [Article] Installing MariaDB on Windows and Mac OS X [Article]

0
0
6759

article-image-testing-your-recipes-and-getting-started-chefspec

Packt

23 Sep 2014

18 min read

Testing Your Recipes and Getting Started with ChefSpec

Packt

23 Sep 2014

18 min read

This article by John Ewart, the author of Chef Essentials, introduces how to model your infrastructure, provision hosts in the cloud, and what goes into a cookbook. One important aspect of developing cookbooks is writing tests so that your recipes do not degrade over time or have bugs introduced into them in the future. This article introduces you to the following concepts: Understanding test methodologies How RSpec structures your tests Using ChefSpec to test recipes Running your tests Writing tests that cover multiple platforms (For more resources related to this topic, see here.) These techniques will prove to be very useful to write robust, maintainable cookbooks that you can use to confidently manage your infrastructure. Tests enable you to perform the following: Identify mistakes in your recipe logic Test your recipes against multiple platforms locally Develop recipes faster with local test execution before running them on a host Catch the changes in dependencies that will otherwise break your infrastructure before they get deployed Write tests for bugs to prevent them from happening again in the future (regression) Testing recipes There are a number of ways to test your recipes. One approach is to simply follow the process of developing your recipes, uploading them to your Chef server, and deploying them to a host; repeat this until you are satisfied. This has the benefit of executing your recipes on real instances, but the drawback is that it is slow, particularly if you are testing on multiple platforms, and requires that you maintain a fleet of hosts. If your cookbook run times are reasonably short and you have a small number of platforms to support them, then this might be a viable option. There is a better option to test your recipes, and it is called ChefSpec. For those who have used RSpec, a Ruby testing library, these examples will be a natural extension of RSpec. If you have never used RSpec, the beginning of this article will introduce you to RSpec's testing language and mechanisms. RSpec RSpec is a framework to test Ruby code that allows you to use a domain-specific language to provide tests, much in the same way Chef provides a domain-specific language to manipulate an infrastructure. Instead of using a DSL to manage systems, RSpec's DSL provides a number of components to express the expectations of code and simulate the execution of portions of the system (also known as mocking). The following examples in RSpec should give you a high-level idea of what RSpec can do: # simple expectationit 'should add 2 and 2 together' dox = 2 + 2expect(x).to eq 4end# Ensure any instance of Object receives a call to 'foo'# and return a pre-defined value (mocking)it 'verifies that an instance receives :foo' doexpect_any_instance_of(Object) .to receive(:foo).and_return(:return_value)o = Object.new expect(o.foo).to eq(:return_value) end# Deep expectations (i.e client makes an HTTP call somewhere# inside it, make sure it happens as expected)it 'should make an authorized HTTP GET call' doexpect_any_instance_of(Net::HTTP::Get) .to receive(:basic_auth)@client.make_http_callend RSpec and ChefSpec As with most testing libraries, RSpec enables you to construct a set of expectations, build objects and interact with them, and verify that the expectations have been met. For example, one expects that when a user logs in to the system, a database record is created, tracking their login history. However, to keep tests running quickly, the application should not make an actual database call; in place of the actual database call, a mock method should be used. Here, our mock method will catch the message in the database in order to verify that it was going to be sent; then, it will return an expected result so that the code does not know the database is not really there. Mock methods are methods that are used to replace one call with another; you can think of them as stunt doubles. For example, rather than making your code actually connect to the database, you might want to write a method that acts as though it has successfully connected to the database and fetched the expected data. This can be extended to model Chef's ability to handle multiple platforms and environments very nicely; code should be verified to behave as expected on multiple platforms without having to execute recipes on those platforms. This means that you can test the expectations about Red Hat recipes from an OS X development machine or Windows recipes from an Ubuntu desktop, without needing to have hosts around to deploy to for testing purposes. Additionally, the development cycle time is greatly reduced as tests can be executed much faster with expectations than when they are performing some work on an end host. You may be asking yourself, "How does this replace testing on an actual host?" The answer is that it may not, and so you should use integration testing to validate that the recipes work when deployed to real hosts. What it does allow you to do is validate your expectations of what resources are being executed, which attributes are being used, and that the logical flow of your recipes are behaving properly before you push your code to your hosts. This forms a tighter development cycle for rapid development of features while providing a longer, more formal loop to ensure that the code behaves correctly in the wild. If you are new to testing software, and in particular, testing Ruby code, this is a brief introduction to some of the concepts that we will cover. Testing can happen at many different levels of the software life cycle: Single-module level (called unit tests) Multi-module level (known as functional tests) System-level testing (also referred to as integration testing) Testing basics In the test-driven-development (TDD) philosophy, tests are written and executed early and often, typically, even before code is written. This guarantees that your code conforms to your expectations from the beginning and does not regress to a previous state of non-conformity. This article will not dive into the TDD philosophy and continuous testing, but it will provide you with enough knowledge to begin testing the recipes that you write and feel confident that they will do the correct thing when deployed into your production environment. Comparing RSpec with other testing libraries RSpec is designed to provide a more expressive testing language. This means that the syntax of an RSpec test (also referred to as a spec test or spec) is designed to create a language that feels more like a natural language, such as English. For example, using RSpec, one could write the following: expect(factorial(4)).to eq 24 If you read the preceding code, it will come out like expect factorial of 4 to equal 24. Compare this to a similar JUnit test (for Java): assertEquals(24, factorial(4)); If you read the preceding code, it would sound more like assert that the following are equal, 24 and factorial of 4. While this is readable by most programmers, it does not feel as natural as the one we saw earlier. RSpec also provides context and describe blocks that allow you to group related examples and shared expectations between examples in the group to help improve organization and readability. For example, consider the following spec test: describe Array do it "should be empty when created" do Array.new.should == [] endend Compare the preceding test to a similar NUnit (.NET testing framework) test: namespace MyTest { using System.Collectionusing NUnit.Framework; [TestFixture] public class ArrayTest { [Test] public void NewArray() { ArrayList list = new ArrayList(); Assert.AreEqual(0, list.size()); }}} Clearly, the spec test is much more concise and easier to read, which is a goal of RSpec. Using ChefSpec ChefSpec brings the expressiveness of RSpec to Chef cookbooks and recipes by providing Chef-specific primitives and mechanisms on top of RSpec's simple testing language. For example, ChefSpec allows you to say things like: it 'creates a file' do expect(chef_run).to create_file('/tmp/myfile.txt') end Here, chef_run is an instance of a fully planned Chef client execution on a designated end host, as we will see later. Also, in this case, it is expected that it will create a file, /tmp/myfile.txt, and the test will fail if the simulated run does not create such a file. Getting started with ChefSpec In order to get started with ChefSpec, create a new cookbook directory (here it is $HOME/cookbooks/mycookbook) along with a recipes and spec directory: mkdir -p ~/cookbooks/mycookbookmkdir -p ~/cookbooks/mycookbook/recipesmkdir -p ~/cookbooks/mycookbook/spec Now you will need a simple metadata.rb file inside your cookbook (here, this will be ~/cookbooks/mycookbook/metadata.rb): maintainer "Your name here"maintainer_email "you@domain.com"license "Apache"description "Simple cookbook"long_description "Super simple cookbook"version "1.0"supports "debian" Once we have this, we now have the bare bones of a cookbook that we can begin to add recipes and tests to. Installing ChefSpec In order to get started with ChefSpec, you will need to install a gem that contains the ChefSpec libraries and all the supporting components. Not surprisingly, that gem is named chefspec and can be installed simply by running the following: gem install chefspec However, because Ruby gems often have a number of dependencies, the Ruby community has built a tool called Bundler to manage gem versions that need to be installed. Similar to how RVM provides interpreter-level version management and a way to keep your gems organized, Bundler provides gem-level version management. We will use Bundler for two reasons. In this case, we want to limit the number of differences between the versions of software you will be installing and the versions the author has installed to ensure that things are as similar as possible; secondly, this extends well to releasing production software—limiting the number of variables is critical to consistent and reliable behavior. Locking your dependencies in Ruby Bundler uses a file, specifically named Gemfile, to describe the gems that your project is dependent upon. This file is placed in the root of your project, and its contents inform Bundler which gems you are using, what versions to use, and where to find gems so that it can install them as needed. For example, here is the Gemfile that is being used to describe the gem versions that are used when writing these examples: source 'https://rubygems.org'gem 'chef', '11.10.0'gem 'chefspec', '3.2.0'gem 'colorize', '0.6.0' Using this will ensure that the gems you install locally match the ones that are used when writing these examples. This should limit the differences between your local testing environments if you run these examples on your workstation. In order to use a Gemfile, you will need to have Bundler installed. If you are using RVM, Bundler should be installed with every gemset you create; if not, you will need to install it on your own via the following code: gem install bundler Once Bundler is installed and a Gemfile that contains the previous lines is placed in the root directory of your cookbook, you can execute bundle install from inside your cookbook's directory: user@host:~/cookbooks/mycookbook $> bundle install Bundler will parse the Gemfile in order to download and install the versions of the gems that are defined inside. Here, Bundler will install chefspec, chef, and colorize along with any dependencies those gems require that you do not already have installed. Creating a simple recipe and a matching ChefSpec test Once these dependencies are installed, you will want to create a spec test inside your cookbook and a matching recipe. In keeping with the TDD philosophy, we will first create a file, default_spec.rb, in the spec directory. The name of the spec file should match the name of the recipe file, only with the addition of _spec at the end. If you have a recipe file named default.rb (which most cookbooks will), the matching spec test would be contained in a file named default_spec.rb. Let's take a look at a very simple recipe and a matching ChefSpec test. Writing a ChefSpec test The test, shown as follows, will verify that our recipe will create a new file, /tmp/myfile.txt: require 'chefspec'describe 'mycookbook::default' dolet(:chef_run) { ChefSpec::Runner.new.converge(described_recipe)} it 'creates a file' do expect(chef_run).to create_file('/tmp/myfile.txt') endend Here, RSpec uses a describe block similar to the way Chef uses a resource block (again, blocks are identified by the do ... end syntax or code contained inside curly braces) to describe a resource, in this case, the default recipe inside of mycookbook. The described resource has a number of examples, and each example is described by an it block such as the following, which comes from the previous spec test: it 'creates a file' doexpect(chef_run).to create_file('/tmp/myfile.txt') end The string given to the it block provides the example with a human-readable description of what the example is testing; in this case, we are expecting that the recipe creates a file. When our recipes are run through ChefSpec, the resources described are not actually created or modified. Instead, a model of what would happen is built as the recipes are executed. This means that ChefSpec can validate that an expected action would have occurred if the recipe were to be executed on an end host during a real client run. It's important to note that each example block resets expectations before it is executed, so any expectations defined inside of a given test will not fall through to other tests. Because most of the tests will involve simulating a Chef client run, we want to run the simulation every time. There are two options: execute the code in every example or use a shared resource that all the tests can take advantage of. In the first case, the test will look something like the following: it 'creates a file' dochef_run = ChefSpec::Runner.new.converge(described_recipe) expect(chef_run).to create_file('/tmp/myfile.txt') end The primary problem with this approach is remembering that every test will have to have the resource running at the beginning of the test. This translates to a large amount of duplicated code, and if the client needs to be configured differently, then the code needs to be changed for all the tests. To solve this problem, RSpec provides access to a shared resource through a built-in method, let. Using let allows a test to define a shared resource that is cached for each example and reset as needed for the following examples. This resource is then accessible inside of each block as a local variable, and RSpec takes care of knowing when to initialize it as needed. Our example test uses a let block to define the chef_run resource, which is described as a new ChefSpec runner for the described recipe, as shown in the following code: let(:chef_run) {ChefSpec::Runner.new.converge(described_recipe)} Here, described_recipe is a ChefSpec shortcut for the name of the recipe provided in the describe block. Again, this is a DRY (don't repeat yourself) mechanism that allows us to rename the recipe and then only have to change the name of the description rather than hunt through the code. These techniques make tests better able to adapt to changes in names and resources, which reduces code rot as time goes by. Building our recipe The recipe, as defined here, is a very simple recipe whose only job is to create a simple file, /tmp/myfile.txt, on the end host: file "/tmp/myfile.txt" doowner "root" group "root" mode "0755" action :createend Put this recipe into the recipes/default.rb file of your cookbook so that you have the following file layout: mycookbook/|- recipes/| |- default.rb|- spec/ |- default_spec.rb Executing tests In order to run the tests, we use the rspec application. This is a Ruby script that comes with the RSpec gem, which will run the test scripts as spec tests using the RSpec language. It will also use the ChefSpec extensions because in our spec test, we have included them via the line require 'chefspec' at the top of our default_spec.rb file. Here, rspec is executed through Bundler to ensure that the desired gem versions, as specified in our Gemfile, are used at runtime without having to explicitly load them. This is done using the bundle exec command: bundle exec rspec spec/default_spec.rb This will run RSpec using Bundler and process the default_spec.rb file. As it runs, you will see the results of your tests, a . (period) for tests that pass, and an F for any tests that fail. Initially, the output from rspec will look like this: Finished in 0.17367 seconds1 example, 0 failures RSpec says that it completed the execution in 0.17 seconds and that you had one example with zero failures. However, the results would be quite different if we have a failed test; RSpec will tell us which test failed and why. Understanding failures RSpec is very good at telling you what went wrong with your tests; it doesn't do you any good to have failing tests if it's impossible to determine what went wrong. When an expectation in your test is not met, RSpec will tell you which expectation was unmet, what the expected value was, and what value was seen. In order to see what happens when a test fails, modify your recipe to ensure that the test fails. Look in your recipe for the following file resource: file "/tmp/myfile.txt" do Replace the file resource with a different filename, such as myfile2.txt, instead of myfile.txt, like the following example: file "/tmp/myfile2.txt" do Next, rerun your spec tests; you will see that the test is now failing because the simulated Chef client execution did something that was unexpected by our spec test. An example of this new execution would look like the following: [user@host]$ bundle exec rspec spec/default_spec.rbFFailures:1) my_cookbook::default creates a file Failure/Error: expect(chef_run).to create_file('/tmp/myfile.txt') expected "file[/tmp/myfile.txt]" with action :create to be in Chef run. Other file resources: file[/tmp/myfile2.txt] # ./spec/default_spec.rb:9:in `block (2 levels) in <top (required)>'Finished in 0.18071 seconds1 example, 1 failure Notice that instead of a dot, the test results in an F; this is because the test is now failing. As you can see from the previous output, RSpec is telling us the following: The creates a file example in the 'my_cookbook::default' test suite failed Our example failed in the ninth line of default_spec.rb (as indicated by the line that contained ./spec/default_spec.rb:9) The file resource /tmp/myfile.txt was expected to be operated on with the :create action The recipe interacted with a file resource /tmp/myfile2.txt instead of /tmp/myfile.txt RSpec will continue to execute all the tests in the files specified on the command line, printing out their status as to whether they passed or failed. If your tests are well written and run in isolation, then they will have no effect on one another; it should be safe to execute all of them even if some fail so that you can see what is no longer working. Summary RSpec with ChefSpec extensions provides us with incredibly powerful tools to test our cookbooks and recipes. In this article, you have seen how to develop basic ChefSpec tests for your recipes, organize your spec tests inside of your cookbook, execute and analyze the output of your spec tests, and simulate the execution of your recipes across multiple platforms. Resources for Article: Further resources on this subject: Chef Infrastructure [article] Going Beyond the Basics [article] Setting Up a Development Environment [article]

0
0
9670

Packt

23 Sep 2014

8 min read

JavaScript Promises – Why Should I Care?

Packt

23 Sep 2014

8 min read

This article by Rami Sarieddine, the author of the book JavaScript Promises Essentials, introduces JavaScript promises and reasons why should you care about promises when comparing it to the common way of doing things asynchronously. (For more resources related to this topic, see here.) Why should I care about promises? What do promises have to do with all of this? Well, let's start by defining promises. "A promise represents the eventual result of an asynchronous operation." - Promises/A+ specification, http://promisesaplus.com/ So a promise object represents a value that may not be available yet, but will be resolved at some point in the future. Promises have states and at any point in time, can be in one of the following: Pending: The promise's value is not yet determined and its state may transition to either fulfilled or rejected. Fulfilled: The promise was fulfilled with success and now has a value that must not change. Additionally, it must not transition to any other state from the fulfilled state. Rejected: The promise is returned from a failed operation and must have a reason for failure. This reason must not change and the promise must not transition to any other state from this state. A promise may only move from the pending state to the fulfilled state or from the pending state to the rejected state. However, once a promise is either fulfilled or rejected, it must not transition to any other state and its value cannot change because it is immutable. The immutable characteristic of promises is super important. It helps evade undesired side-effects from listeners, which can cause unexpected changes in behavior, and in turn allows promises to be passed to other functions without affecting the caller function. From an API perspective, a promise is defined as an object that has a function as the value for the property then. The promise object has a primary then method that returns a new promise object. Its syntax will look like the following: then(onFulfilled, onRejected); The following two arguments are basically callback functions that will be called for completion of a promise: onFulfilled: This argument is called when a promise is fulfilled onRejected: This argument is called when a promise has failed Bear in mind that both the arguments are optional. Moreover, non-function values for the arguments will be ignored, so it might be a good practice to always check whether the arguments passed are functions before executing them. It is worth noting that when you research promises, you might come across two definitions/specs: one based on Promises/A+ and an older one based on Promises/A by CommonJS. The new promise returned by the then method is resolved when the given onFulfilled or onRejected callback is completed. The implementation reflects a very simple concept: when a promise is fulfilled, it has a value, and when it is rejected, it has a reason. The following is a simple example of how to use a promise: promise.then(function (value){ var result = JSON.parse(data).value; }, function (reason) { alert(error.message); }); The fact that the value returned from the callback handler is the fulfillment value for the returned promise allows promise operations to be chained together. Hence, we will have something like the following: $.getJSON('example.json').then(JSON.parse).then(function(response) { alert("Hello There: ", response); }); Well, you guessed it right! What the previous code sample does is chain the promise returned from the first then() call to the second then() call. Hence, the getJSON method will return a promise that contains the value of the JSON returned. Thus, we can call a then method on it, following which we will invoke another then call on the promise returned. This promise includes the value of JSON.parse. Eventually, we will take that value and display it in an alert. Can't I just use a callback? Callbacks are simple! We pass a function, it gets invoked at some point in the future, and we get to do things asynchronously. Additionally, callbacks are lightweight since we need to add extra libraries. Using functions as higher-order objects is already built into the JavaScript programming language; hence, we do not require additional code to use it. However, asynchronous programming in JavaScript can quickly become complicated if not dealt with care, especially callbacks. Callback functions tend to become difficult to maintain and debug when nested within long lines of code. Additionally, the use of anonymous inline functions in a callback can make reading the call stack very tedious. Also, when it comes to debugging, exceptions that are thrown back from within a deeply nested set of callbacks might not propagate properly up to the function that initiated the call within the chain, which makes it difficult to determine exactly where the error is located. Moreover, it is hard to structure a code that is based around callbacks as they roll out a messy code like a snowball. We will end up having something like the following code sample but on a much larger scale: function readJSON(filename, callback) { fs.readFile(filename, function (err, result) { if (err) return callback(err); try { result = JSON.parse(result, function (err, result) { fun.readAsync(result, function (err, result) { alert("I'm inside this loop now"); }); alert("I'm here now"); }); } catch (ex) { return callback(ex); } callback(null, result); }); } The sample code in the previous example is an excerpt of a deeply nested code that is sometimes referred to as the pyramid of doom. Such a code, when it grows, will make it a daunting task to read through, structure, maintain, and debug. Promises, on the other hand, provide an abstraction to manage interactions with asynchronous APIs and present a more managed approach towards asynchronous programming in JavaScript when compared to the use of callbacks and event handlers. We can think of promises as more of a pattern for asynchronous programming. Simply put, the promises pattern will allow the asynchronous programming to move from the continuation-passing style that is widespread to one where the functions we call return a value, called a promise that will represent the eventual results of that particular operation. It allows you to go from: call1(function (value1) { call2(value1, function(value2) { call3(value2, function(value3) { call4(value3, function(value4) { // execute some code }); }); }); }); To: Promise.asynCall(promisedStep1) .then(promisedStep2) .then(promisedStep3) .then(promisedStep4) .then(function (value4) { // execute some code }); If we list the properties that make promises easier to work with, they will be as follows: It is easier to read as in cleaner method signatures It allows us to attach more than one callback to a single promise It allows for values and errors to be passed along, and bubble up to the caller function It allows for chaining of promises What we can observe is that promises bring functional composition to synchronous capabilities by returning values, and error bubbling by throwing exceptions to the asynchronous functions. These are capabilities that we take for granted in the synchronous world. The following sample (dummy) code shows the difference between using callbacks to compose asynchronous functions communicating with each other and promises to do the same. The following is an example with callbacks: $("#testInpt").click(function () { firstCallBack(function (param) { getValues(param, function (result) { alert(result); }); }); }); The following is a code example that converts the previous callback functions to promise-returning functions that can be chained to each other: $("#testInpt").clickPromise() // promise-returning function .then(firstCallBack) .then(getValues) .then(alert); As we have seen, the flat chains that promises provide allow us to have code that is easier to read and eventually easier to maintain when compared to the traditional callback approach. Summary Promises are a pattern that allows for a standardized approach in asynchronous programming, which enables developers to write asynchronous code that is more readable and maintainable. Resources for Article: Further resources on this subject: REST – Where It Begins [Article] Uploading multiple files [Article] Grunt in Action [Article]

0
0
1915

article-image-using-socketio-and-express-together

Packt

23 Sep 2014

16 min read

Using Socket.IO and Express together

Packt

23 Sep 2014

16 min read

In this article by Joshua Johanan, the author of the book Building Scalable Apps with Redis and Node.js, tells us that Express application is just the foundation. We are going to add features until it is a fully usable app. We currently can serve web pages and respond to HTTP, but now we want to add real-time communication. It's very fortunate that we just spent most of this article learning about Socket.IO; it does just that! Let's see how we are going to integrate Socket.IO with an Express application. (For more resources related to this topic, see here.) We are going to use Express and Socket.IO side by side. Socket.IO does not use HTTP like a web application. It is event based, not request based. This means that Socket.IO will not interfere with Express routes that we have set up, and that's a good thing. The bad thing is that we will not have access to all the middleware that we set up for Express in Socket.IO. There are some frameworks that combine these two, but it still has to convert the request from Express into something that Socket.IO can use. I am not trying to knock down these frameworks. They simplify a complex problem and most importantly, they do it well (Sails is a great example of this). Our app, though, is going to keep Socket.IO and Express separated as much as possible with the least number of dependencies. We know that Socket.IO does not need Express, as all our examples have not used Express in any way. This has an added benefit in that we can break off our Socket.IO module and run it as its own application at a future point in time. The other great benefit is that we learn how to do it ourselves. We need to go into the directory where our Express application is. Make sure that our pacakage.json has all the additional packages for this article and run npm.install. The first thing we need to do is add our configuration settings. Adding Socket.IO to the config We will use the same config file that we created for our Express app. Open up config.js and change the file to what I have done in the following code: var config = {port: 3000,secret: 'secret',redisPort: 6379,redisHost: 'localhost',routes: { login: '/account/login', logout: '/account/logout'}};module.exports = config; We are adding two new attributes, redisPort and redisHost. This is because of how the redis package configures its clients. We also are removing the redisUrl attribute. We can configure all our clients with just these two Redis config options. Next, create a directory under the root of our project named socket.io. Then, create a file called index.js. This will be where we initialize Socket.IO and wire up all our event listeners and emitters. We are just going to use one namespace for our application. If we were to add multiple namespaces, I would just add them as files underneath the socket.io directory. Open up app.js and change the following lines in it: //variable declarations at the topVar io = require('./socket.io');//after all the middleware and routesvar server = app.listen(config.port);io.startIo(server); We will define the startIo function shortly, but let's talk about our app.listen change. Previously, we had the app.listen execute, and we did not capture it in a variable; now we are. Socket.IO listens using Node's http.createServer. It does this automatically if you pass in a number into its listen function. When Express executes app.listen, it returns an instance of the HTTP server. We capture that, and now we can pass the http server to Socket.IO's listen function. Let's create that startIo function. Open up index.js present in the socket.io location and add the following lines of code to it: var io = require('socket.io');var config = require('../config');var socketConnection = function socketConnection(socket){socket.emit('message', {message: 'Hey!'});};exports.startIo = function startIo(server){io = io.listen(server);var packtchat = io.of('/packtchat');packtchat.on('connection', socketConnection);return io;}; We are exporting the startIo function that expects a server object that goes right into Socket.IO's listen function. This should start Socket.IO serving. Next, we get a reference to our namespace and listen on the connection event, sending a message event back to the client. We also are loading our configuration settings. Let's add some code to the layout and see whether our application has real-time communication. We will need the Socket.IO client library, so link to it from node_modules like you have been doing, and put it in our static directory under a newly created js directory. Open layout.ejs present in the packtchatviews location and add the following lines to it: <script type="text/javascript" src="/js/socket.io.js"></script><script>var socket = io.connect("http://localhost:3000/packtchat");socket.on('message', function(d){console.log(d);});</script> We just listen for a message event and log it to the console. Fire up the node and load your application, http://localhost:3000. Check to see whether you get a message in your console. You should see your message logged to the console, as seen in the following screenshot: Success! Our application now has real-time communication. We are not done though. We still have to wire up all the events for our app. Who are you? There is one glaring issue. How do we know who is making the requests? Express has middleware that parses the session to see if someone has logged in. Socket.IO does not even know about a session. Socket.IO lets anyone connect that knows the URL. We do not want anonymous connections that can listen to all our events and send events to the server. We only want authenticated users to be able to create a WebSocket. We need to get Socket.IO access to our sessions. Authorization in Socket.IO We haven't discussed it yet, but Socket.IO has middleware. Before the connection event gets fired, we can execute a function and either allow the connection or deny it. This is exactly what we need. Using the authorization handler Authorization can happen at two places, on the default namespace or on a named namespace connection. Both authorizations happen through the handshake. The function's signature is the same either way. It will pass in the socket server, which has some stuff we need such as the connection's headers, for example. For now, we will add a simple authorization function to see how it works with Socket.IO. Open up index.js, present at the packtchatsocket.io location, and add a new function that will sit next to the socketConnection function, as seen in the following code: var io = require('socket.io');var socketAuth = function socketAuth(socket, next){return next();return next(new Error('Nothing Defined'));};var socketConnection = function socketConnection(socket){socket.emit('message', {message: 'Hey!'});};exports.startIo = function startIo(server){io = io.listen(server);var packtchat = io.of('/packtchat');packtchat.use(socketAuth);packtchat.on('connection', socketConnection);return io;}; I know that there are two returns in this function. We are going to comment one out, load the site, and then switch the lines that are commented out. The socket server that is passed in will have a reference to the handshake data that we will use shortly. The next function works just like it does in Express. If we execute it without anything, the middleware chain will continue. If it is executed with an error, it will stop the chain. Let's load up our site and test both by switching which return gets executed. We can allow or deny connections as we please now, but how do we know who is trying to connect? Cookies and sessions We will do it the same way Express does. We will look at the cookies that are passed and see if there is a session. If there is a session, then we will load it up and see what is in it. At this point, we should have the same knowledge about the Socket.IO connection that Express does about a request. The first thing we need to do is get a cookie parser. We will use a very aptly named package called cookie. This should already be installed if you updated your package.json and installed all the packages. Add a reference to this at the top of index.js present in the packtchatsocket.io location with all the other variable declarations: Var cookie = require('cookie'); And now we can parse our cookies. Socket.IO passes in the cookie with the socket object in our middleware. Here is how we parse it. Add the following code in the socketAuth function: var handshakeData = socket.request;var parsedCookie = cookie.parse(handshakeData.headers.cookie); At this point, we will have an object that has our connect.sid in it. Remember that this is a signed value. We cannot use it as it is right now to get the session ID. We will need to parse this signed cookie. This is where cookie-parser comes in. We will now create a reference to it, as follows: var cookieParser = require('cookie-parser'); We can now parse the signed connect.sid cookie to get our session ID. Add the following code right after our parsing code: var sid = cookieParser.signedCookie (parsedCookie['connect.sid'], config.secret); This will take the value from our parsedCookie and using our secret passphrase, will return the unsigned value. We will do a quick check to make sure this was a valid signed cookie by comparing the unsigned value to the original. We will do this in the following way: if (parsedCookie['connect.sid'] === sid) return next(new Error('Not Authenticated')); This check will make sure we are only using valid signed session IDs. The following screenshot will show you the values of an example Socket.IO authorization with a cookie: Getting the session We now have a session ID so we can query Redis and get the session out. The default session store object of Express is extended by connect-redis. To use connect-redis, we use the same session package as we did with Express, express-session. The following code is used to create all this in index.js, present at packtchatsocket.io: //at the top with the other variable declarationsvar expressSession = require('express-session');var ConnectRedis = require('connect-redis')(expressSession);var redisSession = new ConnectRedis({host: config.redisHost, port: config.redisPort}); The final line is creating the object that will connect to Redis and get our session. This is the same command used with Express when setting the store option for the session. We can now get the session from Redis and see what's inside of it. What follows is the entire socketAuth function along with all our variable declarations: var io = require('socket.io'),connect = require('connect'),cookie = require('cookie'),expressSession = require('express-session'),ConnectRedis = require('connect-redis')(expressSession),redis = require('redis'),config = require('../config'),redisSession = new ConnectRedis({host: config.redisHost, port: config.redisPort});var socketAuth = function socketAuth(socket, next){var handshakeData = socket.request;var parsedCookie = cookie.parse(handshakeData.headers.cookie);var sid = connect.utils.parseSignedCookie(parsedCookie['connect.sid'], config.secret);if (parsedCookie['connect.sid'] === sid) return next(new Error('Not Authenticated'));redisSession.get(sid, function(err, session){ if (session.isAuthenticated) { socket.user = session.user; socket.sid = sid; return next(); } else return next(new Error('Not Authenticated'));});}; We can use redisSession and sid to get the session out of Redis and check its attributes. As far as our packages are concerned, we are just another Express app getting session data. Once we have the session data, we check the isAuthenticated attribute. If it's true, we know the user is logged in. If not, we do not let them connect yet. We are adding properties to the socket object to store information from the session. Later on, after a connection is made, we can get this information. As an example, we are going to change our socketConnection function to send the user object to the client. The following should be our socketConnection function: var socketConnection = function socketConnection(socket){socket.emit('message', {message: 'Hey!'});socket.emit('message', socket.user);}; Now, let's load up our browser and go to http://localhost:3000. Log in and then check the browser's console. The following screenshot will show that the client is receiving the messages: Adding application-specific events The next thing to do is to build out all the real-time events that Socket.IO is going to listen for and respond to. We are just going to create the skeleton for each of these listeners. Open up index.js, present in packtchatsocket.io, and change the entire socketConnection function to the following code: var socketConnection = function socketConnection(socket){socket.on('GetMe', function(){});socket.on('GetUser', function(room){});socket.on('GetChat', function(data){});socket.on('AddChat', function(chat){});socket.on('GetRoom', function(){});socket.on('AddRoom', function(r){});socket.on('disconnect', function(){});}; Most of our emit events will happen in response to a listener. Using Redis as the store for Socket.IO The final thing we are going to add is to switch Socket.IO's internal store to Redis. By default, Socket.IO uses a memory store to save any data you attach to a socket. As we know now, we cannot have an application state that is stored only on one server. We need to store it in Redis. Therefore, we add it to index.js, present in packtchatsocket.io. Add the following code to the variable declarations: Var redisAdapter = require('socket.io-redis'); An application state is a flexible idea. We can store the application state locally. This is done when the state does not need to be shared. A simple example is keeping the path to a local temp file. When the data will be needed by multiple connections, then it must be put into a shared space. Anything with a user's session will need to be shared, for example. The next thing we need to do is add some code to our startIo function. The following code is what our startIo function should look like: exports.startIo = function startIo(server){io = io.listen(server);io.adapter(redisAdapter({host: config.redisHost, port: config.redisPort}));var packtchat = io.of('/packtchat');packtchat.use(socketAuth);packtchat.on('connection', socketConnection);return io;}; The first thing is to start the server listening. Next, we will call io.set, which allows us to set configuration options. We create a new redisStore and set all the Redis attributes (redisPub, redisSub, and redisClient) to a new Redis client connection. The Redis client takes a port and the hostname. Socket.IO inner workings We are not going to completely dive into everything that Socket.IO does, but we will discuss a few topics. WebSockets This is what makes Socket.IO work. All web servers serve HTTP, that is, what makes them web servers. This works great when all you want to do is serve pages. These pages are served based on requests. The browser must ask for information before receiving it. If you want to have real-time connections, though, it is difficult and requires some workaround. HTTP was not designed to have the server initiate the request. This is where WebSockets come in. WebSockets allow the server and client to create a connection and keep it open. Inside of this connection, either side can send messages back and forth. This is what Socket.IO (technically, Engine.io) leverages to create real-time communication. Socket.IO even has fallbacks if you are using a browser that does not support WebSockets. The browsers that do support WebSockets at the time of writing include the latest versions of Chrome, Firefox, Safari, Safari on iOS, Opera, and IE 11. This means the browsers that do not support WebSockets are all the older versions of IE. Socket.IO will use different techniques to simulate a WebSocket connection. This involves creating an Ajax request and keeping the connection open for a long time. If data needs to be sent, it will send it in an Ajax request. Eventually, that request will close and the client will immediately create another request. Socket.IO even has an Adobe Flash implementation if you have to support really old browsers (IE 6, for example). It is not enabled by default. WebSockets also are a little different when scaling our application. Because each WebSocket creates a persistent connection, we may need more servers to handle Socket.IO traffic then regular HTTP. For example, when someone connects and chats for an hour, there will have only been one or two HTTP requests. In contrast, a WebSocket will have to be open for the entire hour. The way our code base is written, we can easily scale up more Socket.IO servers by themselves. Ideas to take away from this article The first takeaway is that for every emit, there needs to be an on. This is true whether the sender is the server or the client. It is always best to sit down and map out each event and which direction it is going. The next idea is that of note, which entails building our app out of loosely coupled modules. Our app.js kicks everything that deals with Express off. Then, it fires the startIo function. While it does pass over an object, we could easily create one and use that. Socket.IO just wants a basic HTTP server. In fact, you can just pass the port, which is what we used in our first couple of Socket.IO applications (Ping-Pong). If we wanted to create an application layer of Socket.IO servers, we could refactor this code out and have all the Socket.IO servers run on separate servers other than Express. Summary At this point, we should feel comfortable about using real-time events in Socket.IO. We should also know how to namespace our io server and create groups of users. We also learned how to authorize socket connections to only allow logged-in users to connect. Resources for Article: Further resources on this subject: Exploring streams [article] Working with Data Access and File Formats Using Node.js [article] So, what is Node.js? [article]

0
0
18817

How-To Tutorials

article-image-adding-real-time-functionality-using-socketio

Packt

22 Sep 2014

18 min read

Adding Real-time Functionality Using Socket.io

Packt

22 Sep 2014

18 min read

0
0
15879

How-To Tutorials

article-image-building-publishing-and-supporting-your-forcecom-application

Packt

22 Sep 2014

39 min read

Building, Publishing, and Supporting Your Force.com Application

Packt

22 Sep 2014

39 min read

0
0
3412

article-image-visualization-tool-understand-data

Packt

22 Sep 2014

23 min read

Visualization as a Tool to Understand Data

Packt

22 Sep 2014

23 min read

In this article by Nazmus Saquib, the author of Mathematica Data Visualization, we will look at a few simple examples that demonstrate the importance of data visualization. We will then discuss the types of datasets that we will encounter over the course of this book, and learn about the Mathematica interface to get ourselves warmed up for coding. (For more resources related to this topic, see here.) In the last few decades, the quick growth in the volume of information we produce and the capacity of digital information storage have opened a new door for data analytics. We have moved on from the age of terabytes to that of petabytes and exabytes. Traditional data analysis is now augmented with the term big data analysis, and computer scientists are pushing the bounds for analyzing this huge sea of data using statistical, computational, and algorithmic techniques. Along with the size, the types and categories of data have also evolved. Along with the typical and popular data domain in Computer Science (text, image, and video), graphs and various categorical data that arise from Internet interactions have become increasingly interesting to analyze. With the advances in computational methods and computing speed, scientists nowadays produce an enormous amount of numerical simulation data that has opened up new challenges in the field of Computer Science. Simulation data tends to be structured and clean, whereas data collected or scraped from websites can be quite unstructured and hard to make sense of. For example, let's say we want to analyze some blog entries in order to find out which blogger gets more follows and referrals from other bloggers. This is not as straightforward as getting some friends' information from social networking sites. Blog entries consist of text and HTML tags; thus, a combination of text analytics and tag parsing, coupled with a careful observation of the results would give us our desired outcome. Regardless of whether the data is simulated or empirical, the key word here is observation. In order to make intelligent observations, data scientists tend to follow a certain pipeline. The data needs to be acquired and cleaned to make sure that it is ready to be analyzed using existing tools. Analysis may take the route of visualization, statistics, and algorithms, or a combination of any of the three. Inference and refining the analysis methods based on the inference is an iterative process that needs to be carried out several times until we think that a set of hypotheses is formed, or a clear question is asked for further analysis, or a question is answered with enough evidence. Visualization is a very effective and perceptive method to make sense of our data. While statistics and algorithmic techniques provide good insights about data, an effective visualization makes it easy for anyone with little training to gain beautiful insights about their datasets. The power of visualization resides not only in the ease of interpretation, but it also reveals visual trends and patterns in data, which are often hard to find using statistical or algorithmic techniques. It can be used during any step of the data analysis pipeline—validation, verification, analysis, and inference—to aid the data scientist. How have you visualized your data recently? If you still have not, it is okay, as this book will teach you exactly that. However, if you had the opportunity to play with any kind of data already, I want you to take a moment and think about the techniques you used to visualize your data so far. Make a list of them. Done? Do you have 2D and 3D plots, histograms, bar charts, and pie charts in the list? If yes, excellent! We will learn how to style your plots and make them more interactive using Mathematica. Do you have chord diagrams, graph layouts, word cloud, parallel coordinates, isosurfaces, and maps somewhere in that list? If yes, then you are already familiar with some modern visualization techniques, but if you have not had the chance to use Mathematica as a data visualization language before, we will explore how visualization prototypes can be built seamlessly in this software using very little code. The aim of this book is to teach a Mathematica beginner the data-analysis and visualization powerhouse built into Mathematica, and at the same time, familiarize the reader with some of the modern visualization techniques that can be easily built with Mathematica. We will learn how to load, clean, and dissect different types of data, visualize the data using Mathematica's built-in tools, and then use the Mathematica graphics language and interactivity functions to build prototypes of a modern visualization. The importance of visualization Visualization has a broad definition, and so does data. The cave paintings drawn by our ancestors can be argued as visualizations as they convey historical data through a visual medium. Map visualizations were commonly used in wars since ancient times to discuss the past, present, and future states of a war, and to come up with new strategies. Astronomers in the 17th century were believed to have built the first visualization of their statistical data. In the 18th century, William Playfair invented many of the popular graphs we use today (line, bar, circle, and pie charts). Therefore, it appears as if many, since ancient times, have recognized the importance of visualization in giving some meaning to data. To demonstrate the importance of visualization in a simple mathematical setting, consider fitting a line to a given set of points. Without looking at the data points, it would be unwise to try to fit them with a model that seemingly lowers the error bound. It should also be noted that sometimes, the data needs to be changed or transformed to the correct form that allows us to use a particular tool. Visualizing the data points ensures that we do not fall into any trap. The following screenshot shows the visualization of a polynomial as a circle: Figure1.1 Fitting a polynomial In figure 1.1, the points are distributed around a circle. Imagine we are given these points in a Cartesian space (orthogonal x and y coordinates), and we are asked to fit a simple linear model. There is not much benefit if we try to fit these points to any polynomial in a Cartesian space; what we really need to do is change the parameter space to polar coordinates. A 1-degree polynomial in polar coordinate space (essentially a circle) would nicely fit these points when they are converted to polar coordinates, as shown in figure 1.1. Visualizing the data points in more complicated but similar situations can save us a lot of trouble. The following is a screenshot of Anscombe's quartet: Figure1.2 Anscombe's quartet, generated using Mathematica Downloading the color images of this book We also provide you a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from: https://www.packtpub.com/sites/default/files/downloads/2999OT_coloredimages.PDF. Anscombe's quartet (figure 1.2), named after the statistician Francis Anscombe, is a classic example of how simple data visualization like plotting can save us from making wrong statistical inferences. The quartet consists of four datasets that have nearly identical statistical properties (such as mean, variance, and correlation), and gives rise to the same linear model when a regression routine is run on these datasets. However, the second dataset does not really constitute a linear relationship; a spline would fit the points better. The third dataset (at the bottom-left corner of figure 1.2) actually has a different regression line, but the outlier exerts enough influence to force the same regression line on the data. The fourth dataset is not even a linear relationship, but the outlier enforces the same regression line again. These two examples demonstrate the importance of "seeing" our data before we blindly run algorithms and statistics. Fortunately, for visualization scientists like us, the world of data types is quite vast. Every now and then, this gives us the opportunity to create new visual tools other than the traditional graphs, plots, and histograms. These visual signatures and tools serve the same purpose that the graph plotting examples previously just did—spy and investigate data to infer valuable insights—but on different types of datasets other than just point clouds. Another important use of visualization is to enable the data scientist to interactively explore the data. Two features make today's visualization tools very attractive—the ability to view data from different perspectives (viewing angles) and at different resolutions. These features facilitate the investigator in understanding both the micro- and macro-level behavior of their dataset. Types of datasets There are many different types of datasets that a visualization scientist encounters in their work. This book's aim is to prepare an enthusiastic beginner to delve into the world of data visualization. Certainly, we will not comprehensively cover each and every visualization technique out there. Our aim is to learn to use Mathematica as a tool to create interactive visualizations. To achieve that, we will focus on a general classification of datasets that will determine which Mathematica functions and programming constructs we should learn in order to visualize the broad class of data covered in this book. Tables The table is one of the most common data structures in Computer Science. You might have already encountered this in a computer science, database, or even statistics course, but for the sake of completeness, we will describe the ways in which one could use this structure to represent different kinds of data. Consider the following table as an example: Attribute 1 Attribute 2 … Item 1 Item 2 Item 3 When storing datasets in tables, each row in the table represents an instance of the dataset, and each column represents an attribute of that data point. For example, a set of two-dimensional Cartesian vectors can be represented as a table with two attributes, where each row represents a vector, and the attributes are the x and y coordinates relative to an origin. For three-dimensional vectors or more, we could just increase the number of attributes accordingly. Tables can be used to store more advanced forms of scientific, time series, and graph data. We will cover some of these datasets over the course of this book, so it is a good idea for us to get introduced to them now. Here, we explain the general concepts. Scalar fields There are many kinds of scientific dataset out there. In order to aid their investigations, scientists have created their own data formats and mathematical tools to analyze the data. Engineers have also developed their own visualization language in order to convey ideas in their community. In this book, we will cover a few typical datasets that are widely used by scientists and engineers. We will eventually learn how to create molecular visualizations and biomedical dataset exploration tools when we feel comfortable manipulating these datasets. In practice, multidimensional data (just like vectors in the previous example) is usually augmented with one or more characteristic variable values. As an example, let's think about how a physicist or an engineer would keep track of the temperature of a room. In order to tackle the problem, they would begin by measuring the geometry and the shape of the room, and put temperature sensors at certain places to measure the temperature. They will note the exact positions of those sensors relative to the room's coordinate system, and then, they will be all set to start measuring the temperature. Thus, the temperature of a room can be represented, in a discrete sense, by using a set of points that represent the temperature sensor locations and the actual temperature at those points. We immediately notice that the data is multidimensional in nature (the location of a sensor can be considered as a vector), and each data point has a scalar value associated with it (temperature). Such a discrete representation of multidimensional data is quite widely used in the scientific community. It is called a scalar field. The following screenshot shows the representation of a scalar field in 2D and 3D: Figure1.3 In practice, scalar fields are discrete and ordered Figure 1.3 depicts how one would represent an ordered scalar field in 2D or 3D. Each point in the 2D field has a well-defined x and y location, and a single temperature value gets associated with it. To represent a 3D scalar field, we can think of it as a set of 2D scalar field slices placed at a regular interval along the third dimension. Each point in the 3D field is a point that has {x, y, z} values, along with a temperature value. A scalar field can be represented using a table. We will denote each {x, y} point (for 2D) or {x, y, z} point values (for 3D) as a row, but this time, an additional attribute for the scalar value will be created in the table. Thus, a row will have the attributes {x, y, z, T}, where T is the temperature associated with the point defined by the x, y, and z coordinates. This is the most common representation of scalar fields. A widely used visualization technique to analyze scalar fields is to find out the isocontours or isosurfaces of interest. However, for now, let's take a look at the kind of application areas such analysis will enable one to pursue. Instead of temperature, one could think of associating regularly spaced points with any relevant scalar value to form problem-specific scalar fields. In an electrochemical simulation, it is important to keep track of the charge density in the simulation space. Thus, the chemist would create a scalar field with charge values at specific points. For an aerospace engineer, it is quite important to understand how air pressure varies across airplane wings; they would keep track of the pressure by forming a scalar field of pressure values. Scalar field visualization is very important in many other significant areas, ranging from from biomedical analysis to particle physics. In this book, we will cover how to visualize this type of data using Mathematica. Time series Another widely used data type is the time series. A time series is a sequence of data points that are measured usually over a uniform interval of time. Time series arise in many fields, but in today's world, they are mostly known for their applications in Economics and Finance. Other than these, they are frequently used in statistics, weather prediction, signal processing, astronomy, and so on. It is not the purpose of this book to describe the theory and mathematics of time series data. However, we will cover some of Mathematica's excellent capabilities for visualizing time series, and in the course of this book, we will construct our own visualization tool to view time series data. Time series can be easily represented using tables. Each row of the time series table will represent one point in the series, with one attribute denoting the time stamp—the time at which the data point was recorded, and the other attribute storing the actual data value. If the starting time and the time interval are known, then we can get rid of the time attribute and simply store the data value in each row. The actual timestamp of each value can be calculated using the initial time and time interval. Images and videos can be represented as tables too, with pixel-intensity values occupying each entry of the table. As we focus on visualization and not image processing, we will skip those types of data. Graphs Nowadays, graphs arise in all contexts of computer science and social science. This particular data structure provides a way to convert real-world problems into a set of entities and relationships. Once we have a graph, we can use a plethora of graph algorithms to find beautiful insights about the dataset. Technically, a graph can be stored as a table. However, Mathematica has its own graph data structure, so we will stick to its norm. Sometimes, visualizing the graph structure reveals quite a lot of hidden information. Graph visualization itself is a challenging problem, and is an active research area in computer science. A proper visualization layout, along with proper color maps and size distribution, can produce very useful outputs. Text The most common form of data that we encounter everywhere is text. Mathematica does not provide any specific visualization package for state-of-the-art text visualization methods. Cartographic data As mentioned before, map visualization is one of the ancient forms of visualization known to us. Nowadays, with the advent of GPS, smartphones, and publicly available country-based data repositories, maps are providing an excellent way to contrast and compare different countries, cities, or even communities. Cartographic data comes in various forms. A common form of a single data item is one that includes latitude, longitude, location name, and an attribute (usually numerical) that records a relevant quantity. However, instead of a latitude and longitude coordinate, we may be given a set of polygons that describe the geographical shape of the place. The attributable quantity may not be numerical, but rather something qualitative, like text. Thus, there is really no standard form that one can expect when dealing with cartographic data. Fortunately, Mathematica provides us with excellent data-mining and dissecting capabilities to build custom formats out of the data available to us. . Mathematica as a tool for visualization At this point, you might be wondering why Mathematica is suited for visualizing all the kinds of datasets that we have mentioned in the preceding examples. There are many excellent tools and packages out there to visualize data. Mathematica is quite different from other languages and packages because of the unique set of capabilities it presents to its user. Mathematica has its own graphics language, with which graphics primitives can be interactively rendered inside the worksheet. This makes Mathematica's capability similar to many widely used visualization languages. Mathematica provides a plethora of functions to combine these primitives and make them interactive. Speaking of interactivity, Mathematica provides a suite of functions to interactively display any of its process. Not only visualization, but any function or code evaluation can be interactively visualized. This is particularly helpful when managing and visualizing big datasets. Mathematica provides many packages and functions to visualize the kinds of datasets we have mentioned so far. We will learn to use the built-in functions to visualize structured and unstructured data. These functions include point, line, and surface plots; histograms; standard statistical charts; and so on. Other than these, we will learn to use the advanced functions that will let us build our own visualization tools. Another interesting feature is the built-in datasets that this software provides to its users. This feature provides a nice playground for the user to experiment with different datasets and visualization functions. From our discussion so far, we have learned that visualization tools are used to analyze very large datasets. While Mathematica is not really suited for dealing with petabytes or exabytes of data (and many other popularly used visualization tools are not suited for that either), often, one needs to build quick prototypes of such visualization tools using smaller sample datasets. Mathematica is very well suited to prototype such tools because of its efficient and fast data-handling capabilities, along with its loads of convenient functions and user-friendly interface. It also supports GPU and other high-performance computing platforms. Although it is not within the scope of this book, a user who knows how to harness the computing power of Mathematica can couple that knowledge with visualization techniques to build custom big data visualization solutions. Another feature that Mathematica presents to a data scientist is the ability to keep the workflow within one worksheet. In practice, many data scientists tend to do their data analysis with one package, visualize their data with another, and export and present their findings using something else. Mathematica provides a complete suite of a core language, mathematical and statistical functions, a visualization platform, and versatile data import and export features inside a single worksheet. This helps the user focus on the data instead of irrelevant details. By now, I hope you are convinced that Mathematica is worth learning for your data-visualization needs. If you still do not believe me, I hope I will be able to convince you again at the end of the book, when we will be done developing several visualization prototypes, each requiring only few lines of code! Getting started with Mathematica We will need to know a few basic Mathematica notebook essentials. Assuming you already have Mathematica installed on your computer, let's open a new notebook by navigating to File|New|Notebook, and do the following experiments. Creating and selecting cells In Mathematica, a chunk of code or any number of mathematical expressions can be written within a cell. Each cell in the notebook can be evaluated to see the output immediately below it. To start a new cell, simply start typing at the position of the blinking cursor. Each cell can be selected by clicking on the respective rightmost bracket. To select multiple cells, press Ctrl + right-mouse button in Windows or Linux (or cmd + right-mouse button on a Mac) on each of the cells. The following screenshot shows several cells selected together, along with the output from each cell: Figure1.4 Selecting and evaluating cells in Mathematica We can place a new cell in between any set of cells in order to change the sequence of instruction execution. Use the mouse to place the cursor in between two cells, and start typing your commands to create a new cell. We can also cut, copy, and paste cells by selecting them and applying the usual shortcuts (for example, Ctrl + C, Ctrl + X, and Ctrl + V in Windows/Linux, or cmd + C, cmd + X, and cmd + V in Mac) or using the Edit menu bar. In order to delete cell(s), select the cell(s) and press the Delete key. Evaluating a cell A cell can be evaluated by pressing Shift + Enter. Multiple cells can be selected and evaluated in the same way. To evaluate the full notebook, press Ctrl + A (to select all the cells) and then press Shift + Enter. In this case, the cells will be evaluated one after the other in the sequence in which they appear in the notebook. To see examples of notebooks filled with commands, code, and mathematical expressions, you can open the notebooks supplied with this article, which are the polar coordinates fitting and Anscombe's quartet examples, and select each cell (or all of them) and evaluate them. If we evaluate a cell that uses variables declared in a previous cell, and the previous cell was not already evaluated, then we may get errors. It is possible that Mathematica will treat the unevaluated variables as a symbolic expression; in that case, no error will be displayed, but the results will not be numeric anymore. Suppressing output from a cell If we don't wish to see the intermediate output as we load data or assign values to variables, we can add a semicolon (;) at the end of each line that we want to leave out from the output. Cell formatting Mathematica input cells treat everything inside them as mathematical and/or symbolic expressions. By default, every new cell you create by typing at the horizontal cursor will be an input expression cell. However, you can convert the cell to other formats for convenient typesetting. In order to change the format of cell(s), select the cell(s) and navigate to Format|Style from the menu bar, and choose a text format style from the available options. You can add mathematical symbols to your text by selecting Palettes|Basic Math Assistant. Note that evaluating a text cell will have no effect/output. Commenting We can write any comment in a text cell as it will be ignored during the evaluation of our code. However, if we would like to write a comment inside an input cell, we use the (* operator to open a comment and the *) operator to close it, as shown in the following code snippet: (* This is a comment *) The shortcut Ctrl + / (cmd + / in Mac) is used to comment/uncomment a chunk of code too. This operation is also available in the menu bar. Downloading the example code You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. Aborting evaluation We can abort the currently running evaluation of a cell by navigating to Evaluation|Abort Evaluation in the menu bar, or simply by pressing Alt + . (period). This is useful when you want to end a time-consuming process that you suddenly realize will not give you the correct results at the end of the evaluation, or end a process that might use up the available memory and shut down the Mathematica kernel. Further reading The history of visualization deserves a separate book, as it is really fascinating how the field has matured over the centuries, and it is still growing very strongly. Michael Friendly, from York University, published a historical development paper that is freely available online, titled Milestones in History of Data Visualization: A Case Study in Statistical Historiography. This is an entertaining compilation of the history of visualization methods. The book The Visual Display of Quantitative Information by Edward R. Tufte published by Graphics Press USA, is an excellent resource and a must-read for every data visualization practitioner. This is a classic book on the theory and practice of data graphics and visualization. Since we will not have the space to discuss the theory of visualization, the interested reader can consider reading this book for deeper insights. Summary In this article, we discussed the importance of data visualization in different contexts. We also introduced the types of dataset that will be visualized over the course of this book. The flexibility and power of Mathematica as a visualization package was discussed, and we will see the demonstration of these properties throughout the book with beautiful visualizations. Finally, we have taken the first step to writing code in Mathematica. Resources for Article: Further resources on this subject: Driving Visual Analyses with Automobile Data (Python) [article] Importing Dynamic Data [article] Interacting with Data for Dashboards [article]

0
0
9283

Creating an Extension in Yii 2

Protecting GPG Keys in BeagleBone

Exploring the Usages of Delphi

How to Build a Recommender by Running Mahout on Spark

Windows Phone 8 Applications

Installing RHEV Manager

Setting up of Software Infrastructure on the Cloud

Animations in Cocos2d-x

Galera Cluster Basics

Testing Your Recipes and Getting Started with ChefSpec

Trending Topics

JavaScript Promises – Why Should I Care?

Using Socket.IO and Express together

Adding Real-time Functionality Using Socket.io

Building, Publishing, and Supporting Your Force.com Application

Visualization as a Tool to Understand Data

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access