Special Effects

Packt
11 Apr 2016
16 min read
In this article by Maciej Szczesnik, author of Unity 5.x Animation Cookbook, we will cover the following recipes:

- Creating camera shakes with the Animation View and the Animator Controller
- Using the Animation View to animate public script variables
- Using additive Mecanim layers to add extra motion to a character
- Using Blend Shapes to morph an object into another one

(For more resources related to this topic, see here.)

Introduction

This article is all about encouraging you to experiment with Unity's animation system. In the following recipes, we will create interesting effects and use animations in new, creative ways.

Using Animation Events to trigger sound and visual effects

This recipe shows a simple, generic way of playing different sound and visual effects with Animation Events.

Getting ready

To start, you need a character with one looped animation—Jump. We also need a sound effect and a particle system, along with a transparent DustParticle.png texture for the particle system; it should resemble a small dust cloud. In the Rigs directory, you will find all the animations you need, and in the Resources folder, you'll find the other required assets. When you play the game, you will see a character using the Jump animation. It will also play a sound effect and a particle effect while landing.

How to do it...

To play sound and visual effects with Animation Events, follow these steps:

- Import the character with the Jump animation.
- In the Import Settings, Animation tab, select the Jump animation and make it loop.
- Go to the Events section.
- Scrub through the timeline in the Preview section and click on the Add Event button. The Edit Animation Event window will appear.
- Type Sound in the Function field and Jump in the String field. This will call a Sound function in a script attached to the character and pass the word Jump to it as a string parameter.
- Create another Animation Event. Set the Function field to Effect and the String field to Dust.
- Apply the Import Settings.
- Create an Animator Controller for the character with just the Jump animation in it.
- Place the character in a scene. Attach the controller to the Animator component of the character.
- Attach an Audio Source component to the character. Uncheck the Play On Awake option.
- Create an empty Game Object and name it Dust. Add a Particle System component to it. This will be our dust effect.
- Set the Particle System parameters as follows:
  - Duration to 1 second
  - Start Lifetime to 0.5 seconds
  - Start Speed to 0.4
  - Start Size to random between two constants: 1 and 2
  - Start Color to a light brown
  - Emission | Rate to 0
  - Emission | Bursts to one burst with time set to 0, min and max set to 5
  - Shape | Shape to Sphere
  - Shape | Radius to 0.2
- For Color Over Lifetime, create a gradient for the alpha channel. At the 0% and 100% marks, it should be set to 0; at the 10% and 90% marks, it should be set to 255.
- Create a new Material and set its shader by navigating to Particles | Alpha Blended.
- Drag and drop the transparent DustParticle.png texture into the Texture field of the Material.
- Drag and drop the Material into the Renderer | Material slot of our Dust Particle System.
- Create a Resources folder in the project's structure. Unity can load assets from the Resources folder at runtime without the need to reference them as prefabs.
- Drag and drop the Jump.ogg sound and the Dust Game Object into the Resources folder.
- Write a new script and name it TriggerEffects.cs. This script has two public void functions. Both are called from the Jump animation as Animation Events. In the first function, we load an Audio Clip from the Resources folder. We set the Audio Clip name in the Animation Event itself as a string parameter (it was set to Jump). When we successfully load the Audio Clip, we play it using the Audio Source component, a reference to which we store in the source variable.
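The recipe only shows the two event functions themselves, so the surrounding class is left to the reader. A minimal skeleton might look like the following sketch (the source field and its initialization in Start() follow the description above; the rest is an assumption):

```csharp
using UnityEngine;

// Hosts the Sound and Effect functions called by Animation Events.
public class TriggerEffects : MonoBehaviour {

    // Cached reference to the character's Audio Source component.
    private AudioSource source;

    void Start () {
        source = GetComponent<AudioSource>();
    }

    // The two public void event handlers, Sound and Effect, go here.
}
```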
We also randomize the pitch of the Audio Source to add a little variation when playing the Jump.ogg sound:

```csharp
public void Sound (string soundResourceName) {
    AudioClip clip = (AudioClip)Resources.Load(soundResourceName);
    if (clip != null) {
        source.pitch = Random.Range(0.9f, 1.2f);
        source.PlayOneShot(clip);
    }
}
```

In the second function, we try to load a prefab with the name specified as the function's parameter. We also set this name in the Animation Event (it was set to Dust). If we manage to load the prefab, we instantiate it, creating the dust effect under our character's feet:

```csharp
public void Effect (string effectResourceName) {
    GameObject effectResource = (GameObject)Resources.Load(effectResourceName);
    if (effectResource != null) {
        GameObject.Instantiate(effectResource, transform.position, Quaternion.identity);
    }
}
```

- Assign the script to our character and play the game to see the effect.

How it works...

In this recipe, we are using one important feature of Animation Events: the possibility of passing a string, int, or float parameter to our script's functions. This way, we can create one function to play all the sound effects associated with our character and pass clip names as string parameters from the Animation Events. The same concept is used to spawn the Dust effect. The Resources folder is needed to load any asset (prefab, texture, audio clip, and so on) with the Resources.Load(string path) function. This method is a convenient way to load assets by name.

There's more...

Our Dust effect has the AutoDestroy.cs script attached to make it disappear after a certain period of time. You can find that script in the Shared Scripts folder in the provided Unity project example.

Creating camera shakes with the Animation View and the Animator Controller

In this recipe, we will use a simple but very effective method to create camera shakes. These effects are often used to emphasize impacts or explosions in our games.

Getting ready...
You don't need anything special for this recipe. We will create everything from scratch in Unity. You can also download the provided example. When you open the Example.scene scene and play the game, you can press Space to see a simple camera shake effect.

How to do it...

To create a camera shake effect, follow these steps:

- Create an empty Game Object in the Scene View and name it CameraRig.
- Parent Main Camera to CameraRig.
- Select Main Camera and add an Animator component to it.
- Open the Animation View.
- Create a new Animation Clip and call it CamNormal. The camera should have no motion in this clip. Add keys for both the camera's position and its rotation.
- Create another Animation Clip and call it CameraShake. Animate the camera's rotation and position to create a shake effect. The animation should last about 0.5 seconds.
- Open the automatically created Main Camera controller.
- Add a Shake Trigger parameter.
- Create two transitions:
  - CamNormal to CameraShake with one condition: the Shake Trigger parameter. Has Exit Time is set to false, and Transition Duration is set to 0.2 seconds.
  - CameraShake to CamNormal with no conditions. Has Exit Time is set to true, and Transition Duration is set to 0.2 seconds.
- Write a new script and call it CamShake.cs. In this script's Update() function, we check whether the player pressed the Space key. If so, we fire the Shake Trigger in our controller:

```csharp
if (Input.GetKeyDown(KeyCode.Space)) {
    anim.SetTrigger("Shake");
}
```

As always, the anim variable holds the reference to the Animator component and is set in the Start() function with the GetComponent<Animator>() method.

- Assign the script to Main Camera.
- Play the game and press Space to see the effect.

How it works...

In this recipe, we've animated the camera's position and rotation relative to the CameraRig object. This way, we can still move CameraRig (or attach it to a character).
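Assembled from the description above, the complete CamShake.cs can be sketched as follows (only the trigger call is given in the recipe; the class wrapper and Start() initialization are filled in as described, so treat this as a sketch rather than the book's exact listing):

```csharp
using UnityEngine;

// Fires the Shake Trigger on the camera's Animator when Space is pressed.
public class CamShake : MonoBehaviour {

    // Reference to the Animator component on Main Camera.
    private Animator anim;

    void Start () {
        anim = GetComponent<Animator>();
    }

    void Update () {
        // Trigger the CamNormal -> CameraShake transition once per press.
        if (Input.GetKeyDown(KeyCode.Space)) {
            anim.SetTrigger("Shake");
        }
    }
}
```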
Our CameraShake animation affects only the local position and rotation of the camera. In the script, we simply fire the Shake Trigger to play the CameraShake animation once.

There's more...

You can create more sophisticated camera shake effects with Blend Trees. To do so, prepare several shake animations of different strengths and blend them in a Blend Tree using a Strength float parameter. This way, you will be able to set the shake's strength depending on different situations in the game (the distance from an explosion, for instance).

Using the Animation View to animate public script variables

In Unity, we can animate public script variables. The most standard types are supported. We can use this to achieve interesting effects that are not possible directly. For instance, we can animate the fog's color and density, which is not directly accessible through the Animation View.

Getting ready...

In this recipe, everything will be created from scratch, so you don't need to prepare any special assets. You can find the Example.scene scene there. If you open it and press Space, you can observe the fog changing color and density. This is achieved by animating the public variables of a script.

How to do it...

To animate public script variables, follow these steps:

- Create a new script and call it FogAnimator.cs.
- Create two public variables in this script: public float fogDensity and public Color fogColor.
- In the script's Update() function, we fire the ChangeFog Trigger in the controller when the player presses Space. We also set the RenderSettings.fogColor and RenderSettings.fogDensity parameters using our public variables, and we adjust the main camera's background color to match the fog color:

```csharp
if (Input.GetKeyDown(KeyCode.Space)) {
    anim.SetTrigger("ChangeFog");
}
RenderSettings.fogColor = fogColor;
RenderSettings.fogDensity = fogDensity;
Camera.main.backgroundColor = fogColor;
```

- Create a new Game Object and name it FogAnimator.
- Attach the FogAnimator.cs script to it.
- Select the FogAnimator game object and add an Animator component to it.
- Open the Animation View. Create a new Animation Clip. Make sure the Record button is pressed.
- Create an animation for the public float fogDensity and public Color fogColor parameters by changing their values.
- You can create any number of animations and connect them in the automatically created Animator Controller with transitions based on the ChangeFog Trigger (you need to add this parameter to the controller first). Remember that you don't need to create animations of the fog changing its color or density; you can rely on blending between animations in the controller. All you need is one key for the density and one for the color in each animation. In this example, all Transition Durations are set to 1 second, and every transition's Has Exit Time parameter is set to false.
- Make sure that fog is enabled in the Lighting settings.
- Play the game and press Space to see the effect.

How it works...

Normally, we can't animate the fog's color or density using the Animation View. But we can do this easily with a script that sets the RenderSettings.fogColor and RenderSettings.fogDensity parameters in every frame. We use animations to change the script's public variable values over time. This way, we've created a workaround that lets us animate fog in Unity. We've just scratched the surface of what's possible in terms of animating public script variables. Try experimenting with them to achieve awesome effects.

Using additive Mecanim layers to add extra motion to a character

In previous recipes, we used Mecanim layers in override mode. We can also set a layer to be additive. This can add extra movement to our base layer animations.

Getting ready...

We will need a character with three animations—Idle, TiredReference, and Tired.
The first animation is a normal, stationary idle. The second animation has no motion and is used as a reference pose to calculate the additive motion from the third animation; TiredReference can be the first frame of the Tired animation. In the Tired animation, we can see our character breathing heavily. You will find the same Humanoid character there. If you play the game and press Space, our character will start breathing heavily while still using the Idle animation. You can find all the required animations in the Rigs directory.

How to do it...

To use additive layers, follow these steps:

- Import the character into Unity and place it in a scene.
- Go to the Animation tab in the Import Settings. Find the TiredReference animation and check the Additive Reference Pose option (you can also use the normal Tired animation and specify the frame in the Pose Frame field).
- Loop the Idle and Tired animations.
- Create a new Animator Controller.
- Drag and drop the Idle animation into the controller and make it the default state.
- Find the Layers tab in the upper-left corner of the Animator window. Select it and click on the Plus button below to add a new layer.
- Name the newly created layer Tired. Click on the Gear icon and set Blending to Additive.
- Drag and drop the Tired animation into the newly created layer.
- Assign the controller to our character.
- Create a new script and call it Tired.cs. In this script's Update() function, we set the weight of the Tired layer when the player presses Space. The Tired layer has an index of 1. We use a weightTarget helper variable to set the new weight to 0 or 1, depending on its current value. This allows us to switch the additive layer on and off every time the player presses Space.
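The Update() logic needs a few fields alongside the usual Animator reference; a possible layout is sketched below (the field names match the snippet shown next, but the tiredLerpSpeed default value is a hypothetical choice, and the anim initialization mirrors the earlier recipes):

```csharp
using UnityEngine;

// Toggles the additive Tired layer on and off with the Space key.
public class Tired : MonoBehaviour {

    // Blend speed for the layer weight (hypothetical default).
    public float tiredLerpSpeed = 2f;

    private Animator anim;
    private float weight = 0f;        // current weight of the Tired layer
    private float weightTarget = 0f;  // 0 or 1, toggled with Space

    void Start () {
        anim = GetComponent<Animator>();
    }

    // Update() toggles weightTarget and lerps weight each frame.
}
```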
Finally, we interpolate the weight value over time to make the transition smoother, and we set the weight of our additive layer with the SetLayerWeight() function:

```csharp
if (Input.GetKeyDown(KeyCode.Space)) {
    if (weightTarget < 0.5f) {
        weightTarget = 1f;
    } else if (weightTarget > 0.5f) {
        weightTarget = 0f;
    }
}
weight = Mathf.Lerp(weight, weightTarget, Time.deltaTime * tiredLerpSpeed);
anim.SetLayerWeight(1, weight);
```

- Attach the script to the Humanoid character.
- Play the game and press Space to see the additive animation effect.

How it works...

Additive animations are calculated using the reference pose. Movements relative to this pose are then added to other animations. This way, we can not only override the base layer with other layers but also modify base movements by adding a secondary motion. Try experimenting with different additive animations. You can, for instance, make your character bend, aim, or change its overall body pose.

Using Blend Shapes to morph an object into another one

Previously, we used Blend Shapes to create face expressions. They are also an excellent tool for special effects. In this recipe, we will morph one object into another.

Getting ready...

To follow this recipe, we need to prepare an object with Blend Shapes. We've created a really simple example in Blender—a subdivided cube with one shape key that looks like a sphere. You will see a number of cubes there. If you hit the Space key in play mode, the cubes will morph into spheres. You can find the Cuboid.fbx asset with the required Blend Shapes in the Model directory.

How to do it...

To use Blend Shapes to morph objects, follow these steps:

- Import a model with at least one Blend Shape into Unity. You may need to go to the Import Settings | Model tab and choose Import BlendShapes.
- Place the model in the scene.
- Create a new script and call it ChangeShape.cs.
- This script is similar to the one from the previous recipe. In the Update() function, we change the weight of the first Blend Shape when the player presses Space. Again, we use a weightTarget helper variable to set the new weight to 0 or 100, depending on its current value (Blend Shapes have weights from 0 to 100 instead of 0 to 1). Finally, we interpolate the weight value over time to make the transition smoother. We use the SetBlendShapeWeight() function on the skinnedRenderer object. This variable is set in the Start() function with the GetComponent<SkinnedMeshRenderer>() function:

```csharp
if (Input.GetKeyDown(KeyCode.Space)) {
    if (weightTarget < 50f) {
        weightTarget = 100f;
    } else if (weightTarget > 50f) {
        weightTarget = 0f;
    }
}
weight = Mathf.Lerp(weight, weightTarget, Time.deltaTime * blendShapeLerpSpeed);
skinnedRenderer.SetBlendShapeWeight(0, weight);
```

- Attach the script to the model in the scene.
- Play the game and press Space to see the model morph.

How it works...

Blend Shapes store the vertex positions of a mesh. We have to create them in a 3D package. Unity imports Blend Shapes, and we can modify their weights at runtime using the SetBlendShapeWeight() function on the Skinned Mesh Renderer component. Blend Shapes can have trouble storing normals: if we import normals from our model, it may look weird after morphing. Setting the Normals option to Calculate in the Import Settings can sometimes help with this problem. If we choose this option, Unity will calculate normals based on the angle between the faces of our model. This is what allowed us to morph a hard-surfaced cube into a smooth sphere in this example.

Summary

This article covered some basic recipes that can be performed in Unity: using Animation Events, animating public script variables, additive Mecanim layers, Blend Shapes, and creating camera shakes.

Resources for Article:

Further resources on this subject:

- Animation features in Unity 5 [article]
- Saying Hello to Unity and Android [article]
- Learning NGUI for Unity [article]
Cardboard is Virtual Reality for Everyone

Packt
11 Apr 2016
22 min read
In this article, Jonathan Linowes and Matt Schoen, authors of the book Cardboard VR Projects for Android, introduce and define Google Cardboard. (For more resources related to this topic, see here.)

Welcome to the exciting new world of virtual reality! We're sure that, as an Android developer, you want to jump right in and start building cool stuff that can be viewed using Google Cardboard. Your users can then just slip their smartphone into a viewer and step into your virtual creations. Let's take an outside-in tour of VR, Google Cardboard, and its Android SDK to see how they all fit together.

Why is it called Cardboard?

It all started in early 2014 when Google employees David Coz and Damien Henry, in their spare time, built a simple and cheap stereoscopic viewer for their Android smartphone. They designed a device that can be constructed from ordinary cardboard, plus a couple of lenses for your eyes and a mechanism to trigger a button "click." The viewer is literally made from cardboard. They wrote software that renders a 3D scene with a split screen: one view for the left eye, and another view, with an offset, for the right eye. Peering through the device, you get a real sense of 3D immersion in the computer-generated scene. It worked!

The project was then proposed and approved as a "20% project" (where employees may dedicate one day a week to innovations), funded, and joined by other employees. In fact, Cardboard worked so well that Google decided to go forward, taking the project to the next level and releasing it to the public a few months later at Google I/O 2014. Since its inception, Google Cardboard has been accessible to hackers, hobbyists, and professional developers alike. Google open sourced the viewer design so anyone can download the schematics and make their own, from a pizza box or from whatever they have lying around. One can even go into business selling precut kits directly to consumers.
An assembled Cardboard viewer is shown in the following image:

The Cardboard project also includes a software development kit (SDK) that makes it easy to build VR apps. Google has released continuous improvements to the software, including both a native Java SDK and a plugin for the Unity 3D game engine (https://unity3d.com/). Since the release of Cardboard, a huge number of applications have been developed and made available on the Google Play Store. At Google I/O 2015, Version 2.0 introduced an upgraded design, improved software, and support for Apple iOS.

Google Cardboard has rapidly evolved in the eye of the market from an almost laughable toy into a serious new media device for certain types of 3D content and VR experiences. Google's own Cardboard demo app has been downloaded millions of times from the Google Play store. The New York Times distributed about a million cardboard viewers with its Sunday issue on November 8, 2015. Cardboard is useful for viewing 360-degree photos and playing low-fidelity 3D VR games. It is universally accessible to almost anyone because it runs on any Android or iOS smartphone. For developers who are integrating 3D VR content directly into Android apps, Google Cardboard is a way of experiencing virtual reality that is here to stay.

A gateway to VR

Even in the very short time it's been available, this generation of consumer virtual reality, whether Cardboard or Oculus Rift, has demonstrated itself to be instantly compelling, immersive, entertaining, and "a game changer" for just about everyone who tries it. Google Cardboard is especially easy to access, with a very low barrier to use. All you need is a smartphone, a low-cost Cardboard viewer (as low as $5 USD), and free apps downloaded from Google Play (or the Apple App Store for iOS). Google Cardboard has been called a gateway to VR, perhaps in reference to marijuana as a "gateway drug" to more dangerous illicit drug abuse.
We can play with this analogy for a moment, however decadent. Perhaps Cardboard will give you a small taste of VR's potential. You'll want more. And then more again. This will help you fulfill your desire for better, faster, more intense, and more immersive virtual experiences that can only be found in higher-end VR devices. At this point, perhaps there'll be no turning back; you're addicted!

Yet as a Rift user, I still also enjoy Cardboard. It's quick. It's easy. It's fun. And it really does work, provided I run apps that are appropriately designed for the device. I brought a Cardboard viewer in my backpack when visiting my family for the holidays. Everyone enjoyed it a lot. Many of my relatives didn't even get past the standard Google Cardboard demo app, especially its 360-degree photo viewer. That was engaging enough to entertain them for a while. Others jumped to a game or two or more. They wanted to keep playing and try new experiences. Perhaps it's just the novelty. Or perhaps it's the nature of this new medium. The point is that Google Cardboard provides an immersive experience that's enjoyable, useful, and very easily accessible. In short, it is amazing. Then, show them an HTC Vive or Oculus Rift. Holy Cow! That's really REALLY amazing!

We're not here to talk about the higher-end VR devices, except to contrast them with Cardboard and to keep things in perspective. Once you try desktop VR, is it hard to "go back" to mobile VR? Some folks say so. But that's almost silly. The fact is that they're really separate things. As discussed earlier, desktop VR comes with much higher processing power and other high-fidelity features, whereas mobile VR is limited to your smartphone. If one were to try to directly port a desktop VR app to a mobile device, there's a good chance that you'll be disappointed. It's best to think of each as a separate medium. Just like a desktop application or a console game is different from, but similar to, a mobile one.
The design criteria may be similar but different. The technologies are similar but different. The user expectations are similar but different. Mobile VR may be similar to desktop VR, but it's different. To emphasize how different Cardboard is from desktop VR devices, it's worth pointing out what Google has written in their manufacturer's specifications and guidelines:

"Do not include a headstrap with your viewer. When the user holds the Cardboard with their hands against the face, their head rotation speed is limited by the torso rotational speed (which is much slower than the neck rotational speed). This reduces the chance of "VR sickness" caused by rendering/IMU latency and increases the immersiveness in VR."

The implication is that Cardboard apps should be designed for shorter, simpler, and somewhat stationary experiences. Let's now consider the other ways in which Cardboard is a gateway to VR. We predict that Android will continue to grow as a primary platform for virtual reality into the future. More and more technology will get crammed into smartphones, and it will include features designed specifically with VR in mind:

- Faster processors and mobile GPUs
- Higher resolution screens
- Higher precision motion sensors
- Optimized graphics pipelines
- Better software
- Tons more VR apps

Mobile VR will not give way to desktop VR; it may even eventually replace it. Furthermore, maybe soon we'll see dedicated mobile VR headsets that have the guts of a smartphone built in without the cost of a wireless communications contract. No need to use your own phone. No more getting interrupted while in VR by an incoming call or notification. No more rationing battery life in case you need to receive an important call or otherwise use your phone. These dedicated VR devices will likely be Android-based.

The value of low-end VR

Meanwhile, Android and Google Cardboard are here today, on our phones, in our pockets, in our homes, at the office, and even in our schools.
Google Expeditions, for example, is Google's educational program for Cardboard (https://www.google.com/edu/expeditions/), which allows K-12 school children to take virtual field trips to "places a school bus can't," as they say, "around the globe, on the surface of Mars, on a dive to coral reefs, or back in time." The kits include Cardboard viewers and Android phones for each child in a classroom, plus an Android tablet for the teacher. They're connected over a network. The teacher can then guide students on virtual field trips, provide enhanced content, and create learning experiences that go way beyond a textbook or classroom video, as shown in the following image:

The entire Internet can be considered a worldwide publishing and media distribution network. It's a web of hyperlinked pages, text, images, music, video, JSON data, web services, and much more. It's also teeming with 360-degree photos and videos. There's also an ever-growing amount of three-dimensional content and virtual worlds. Would you consider writing an Android app today that doesn't display images? Probably not. There's a good chance that your app also needs to support sound files, videos, or other media. So, pay attention: three-dimensional Cardboard-enabled content is coming quickly. You might be interested in reading this article now because VR looks fun. But soon enough, it may be a customer-driven requirement for your next app.
Some examples of popular types of Cardboard apps include:

- 360-degree photo viewing, for example, Google's Cardboard demo (https://play.google.com/store/apps/details?id=com.google.samples.apps.cardboarddemo) and Cardboard Camera (https://play.google.com/store/apps/details?id=com.google.vr.cyclops)
- Video and cinema viewing, for example, a Cardboard theatre (https://play.google.com/store/apps/details?id=it.couchgames.apps.cardboardcinema)
- Roller coasters and thrill rides, for example, VR Roller Coaster (https://play.google.com/store/apps/details?id=com.frag.vrrollercoaster)
- Cartoonish 3D games, for example, Lamber VR (https://play.google.com/store/apps/details?id=com.archiactinteractive.LfGC&hl=en_GB)
- First person shooter games, for example, Battle 360 VR (https://play.google.com/store/apps/details?id=com.oddknot.battle360vr)
- Creepy scary stuff, for example, Sisters (https://play.google.com/store/apps/details?id=com.otherworld.Sisters)
- Educational experiences, for example, Titans of Space (https://play.google.com/store/apps/details?id=com.drashvr.titansofspacecb&hl=en_GB)
- Marketing experiences, for example, Volvo Reality (https://play.google.com/store/apps/details?id=com.volvo.volvoreality)

And much more. Thousands more. The most popular ones have had hundreds of thousands of downloads (the Cardboard demo app itself has millions of downloads).

Cardware!

Let's take a look at the variety of Cardboard devices that are available. There's a lot of variety. Obviously, the original Google design is actually made from cardboard. And manufacturers have followed suit, offering cardboard Cardboards directly to consumers—brands such as Unofficial Cardboard, DODOCase, and IAmCardboard were among the first. Google provides the specifications and schematics free of charge (refer to https://www.google.com/get/cardboard/manufacturers/). The basic viewer design consists of an enclosure body, two lenses, and an input mechanism.
The Works with Google Cardboard certification program indicates that a given viewer product meets the Google standards and works well with Cardboard apps. The viewer enclosure may be constructed from any material: cardboard, plastic, foam, aluminum, and so on. It should be lightweight and do a pretty good job of blocking ambient light. The lenses (I/O 2015 edition) are 34 mm diameter aspherical single lenses with an 80-degree circular FOV (field of view) and other specified parameters.

The input trigger ("clicker") can be one of several alternative mechanisms. The simplest is none, where the user must touch the smartphone screen directly with a finger to trigger a click. This may be inconvenient since the phone is sitting inside the viewer enclosure, but it works; plenty of viewers just include a hole to stick your finger through. Alternatively, the original Cardboard utilized a small ring magnet attached to the outside of the viewer. The user can slide the magnet, and this movement is sensed by the phone's magnetometer and recognized by the software as a "click". This design is not always reliable because the location of the magnetometer varies among phones. Also, using this method, it is harder to detect a "press and hold" interaction, which means that there is only one type of user input "event" to use within your application.

Lastly, Version 2.0 introduced a button input constructed from a conductive "strip" and "pillow" glued to a cardboard-based "hammer". When the button is pressed, the user's body charge is transferred onto the smartphone screen, as if they'd directly touched the screen with a finger. This clever solution avoids the unreliable magnetometer approach and instead uses the phone's native touchscreen input, albeit indirectly. It is also worth mentioning at this point that since your smartphone supports Bluetooth, it's possible to use a handheld Bluetooth controller with your Cardboard apps.
This is not part of the Cardboard specifications and requires some extra configuration: the use of a third-party input handler or controller support built into the app. A mini Bluetooth controller is shown in the following image:

Cardboard viewers are not necessarily made out of cardboard. Plastic viewers can get relatively costly. While they are sturdier than cardboard ones, they fundamentally have the same (assembled) design. Some devices include adjustable lenses, for the distance of the lenses from the screen and/or the distance between your eyes (IPD, or inter-pupillary distance). The Zeiss VR One, Homido, and Sunnypeak devices were among the first to become popular.

Some manufacturers have gone out of the box (pun intended) with innovations that are not necessarily compliant with Google's specifications but provide capabilities beyond the Cardboard design. A notable example is the Wearality viewer (http://www.wearality.com/), which includes an exclusive, patented 150-degree field of view (FOV) double Fresnel lens. It's so portable that it folds up like a pair of sunglasses. The Wearality viewer is shown in the following image:

Configuring your Cardboard viewer

With such a variety of Cardboard devices and variations in lens distance, field of view, distortion, and so on, Cardboard apps must be configured to a specific device's attributes. Google provides a solution to this as well. Each Cardboard viewer comes with a unique QR code and/or NFC chip, which you scan to configure the software for that device. If you're interested in calibrating your own device or customizing your parameters, check out the profile generator tools at https://www.google.com/get/cardboard/viewerprofilegenerator/.
To configure your phone for a specific Cardboard viewer, open the standard Google Cardboard app, and select the Settings icon in the center bottom section of the screen, as shown in the following image:

Then, point the camera at the QR code for your particular Cardboard viewer. Your phone is now configured for that specific viewer's parameters.

Developing apps for Cardboard

At the time of writing this article, Google provides two SDKs for Cardboard:

Cardboard SDK for Android (https://developers.google.com/cardboard/android)
Cardboard SDK for Unity (https://developers.google.com/cardboard/unity)

Let's consider the Unity option first.

Using Unity

Unity (http://unity3d.com/) is a popular, fully featured 3D game engine that supports building your games on a wide gamut of platforms, from PlayStation and Xbox, to Windows and Mac (and Linux!), to Android and iOS. Unity consists of many separate tools integrated into a powerful engine under a unified visual editor. There are tools for graphics, physics, scripting, networking, audio, animation, UI, and much more. It includes advanced computer graphics rendering, shading, textures, particles, and lighting, with all kinds of options for optimizing performance and fine-tuning the quality of your graphics for both 2D and 3D. If that's not enough, Unity hosts a huge Asset Store teeming with models, scripts, tools, and other assets created by its large community of developers. The Cardboard SDK for Unity provides a plugin package that you can import into the Unity Editor, containing prefabs (premade objects), C# scripts, and other assets. The package gives you what you need in order to add a stereo camera to your virtual 3D scene and build your projects to run as Cardboard apps on Android (and iOS).
If you're interested in learning more about using Unity to build VR applications for Cardboard, check out another book by Packt Publishing, Unity Virtual Reality Projects by Jonathan Linowes (https://www.packtpub.com/game-development/unity-virtual-reality-projects).

Going native

So, why not just use Unity for Cardboard development? Good question. It depends on what you're trying to do. Certainly, if you need all the power and features of Unity for your project, it's the way to go. But at what cost? With great power comes great responsibility (says Uncle Ben Parker). It is quick to learn but takes a lifetime to master (says the Go Master). Seriously though, Unity is a powerful engine that may be overkill for many applications. To take full advantage of it, you may need additional expertise in modeling, animation, level design, graphics, and game mechanics.

Cardboard applications built with Unity are bulky. An empty Unity scene built for Android generates an .apk file of at least 23 megabytes. In contrast, a simple native Cardboard application's .apk is under one megabyte. Along with this large app size comes a long loading time, possibly more than several seconds. It also impacts memory and battery usage. Unless you've paid for a Unity Android license, your app always starts with the Made With Unity splash screen. These may not be acceptable constraints for you.

In general, the closer you are to the metal, the better the performance you'll eke out of your application. When you write directly for Android, you have direct access to the features of the device, more control over memory and other resources, and more opportunities for customization and optimization. This is why native mobile apps tend to outperform mobile web apps. Lastly, one of the best reasons to develop with native Android and Java may be the simplest: you're anxious to build something now! If you're already an Android developer, then just use what you already know and love!
Take the straightest path from here to there. If you're familiar with Android development, then Cardboard development will come naturally. Using the Cardboard SDK for Android, you can program in Java, perhaps using an IDE (integrated development environment) such as Android Studio (also known as IntelliJ). As we'll see throughout this article, your Cardboard Android app is like other Android apps, including a manifest, resources, and Java code. As with any Android app, you will implement a MainActivity class, but yours will extend CardboardActivity and implement CardboardView.StereoRenderer. Your app will utilize OpenGL ES 2.0 graphics, shaders, and 3D matrix math. It will be responsible for updating the display on each frame, that is, re-rendering your 3D scene based on the direction the user is looking at that particular slice in time. It is particularly important in VR, but also in any 3D graphics context, to render a new frame as quickly as the display allows, usually at 60 FPS. Your app will handle user input via the Cardboard trigger and/or gaze-based control. That's what your app needs to do. However, there are still more nitty-gritty details that must be handled to make VR work. As noted in the Google Cardboard SDK guide (https://developers.google.com/cardboard/android/), the SDK simplifies many of these common VR development tasks, including the following:

Lens distortion correction
Head tracking
3D calibration
Side-by-side rendering
Stereo geometry configuration
User input event handling

Functions are provided in the SDK to handle these tasks for you. Building and deploying your applications for development, debugging, profiling, and eventually publishing on Google Play also follow the same Android workflows you may already be familiar with. That's cool. Of course, there's more to building an app than simply following an example.
We'll take a look at techniques for using data-driven geometric models, for abstracting shaders and OpenGL ES API calls, and for building user interface elements such as menus and icons. On top of all this, there are important suggested best practices for making your VR experiences work and for avoiding common mistakes.

An overview of VR best practices

More and more is being discovered and written each day about the dos and don'ts of designing and developing for VR. Google provides a couple of resources to help developers build great VR experiences, including the following:

Designing for Google Cardboard is a best practice document that helps you focus on overall usability as well as avoid common VR pitfalls (http://www.google.com/design/spec-vr/designing-for-google-cardboard/a-new-dimension.html).
Cardboard Design Lab is a Cardboard app that directly illustrates the principles of designing for VR, which you can explore in Cardboard itself. At Vision Summit 2016, the Cardboard team announced that they have released the source (Unity) project for developers to examine and extend (https://play.google.com/store/apps/details?id=com.google.vr.cardboard.apps.designlab and https://github.com/googlesamples/cardboard-unity/tree/master/Samples/CardboardDesignLab).

VR motion sickness is a real symptom and concern for virtual reality, caused in part by a lag in screen updates, or latency, when you're moving your head. Your brain expects the world around you to change exactly in sync with your actual motion. Any perceptible delay can make you feel uncomfortable, to say the least, and possibly nauseous. Latency can be reduced by rendering each frame faster and maintaining the recommended frames per second. Desktop VR apps are held to the high standard of 90 FPS, enabled by custom HMD screens. On mobile devices, the screen hardware often limits the refresh rate to 60 FPS, or in the worst case, 30 FPS.
There are additional causes of VR motion sickness and other user discomforts, which can be mitigated by following these design guidelines:

Always maintain head tracking. If the virtual world seems to freeze or pause, users may feel ill.
Display user interface elements, such as titles and buttons, in 3D virtual space. If rendered in 2D, they'll seem to be "stuck to your face" and feel uncomfortable.
When transitioning between scenes, fade to black; cut scenes are very disorienting, and fading to white might be uncomfortably bright for your users.
Users should remain in control of their movement within the app. Something about initiating the camera motion yourself helps to reduce motion sickness.
Avoid acceleration and deceleration. As humans, we feel acceleration but not constant velocity. If you are moving the camera inside the app, keep a constant velocity. Rollercoasters are fun, but even in real life they can make you feel sick.
Keep your users grounded. Floating as a virtual point in space can make you feel sick, whereas feeling like you're standing on the ground or sitting in a cockpit provides a sense of stability.
Maintain a reasonable distance from the eye for UI elements, such as buttons and reticle cursors. If they are too close, the user may have to look cross-eyed and can experience eye strain. Items that are too close may not converge at all and cause "double vision."

Building applications for virtual reality also differs from building conventional Android ones in other ways, such as the following:

When transitioning from a 2D application into VR, it is recommended that you provide a headset icon for the user to tap, as shown in the following image:

To exit VR, the user can hit the back button in the system bar (if present) or the home button.
The Cardboard sample apps use a "tilt-up" gesture to return to the main menu, which is a fine approach if you want to allow a "back" input without forcing the user to remove the phone from the device.
Make sure that you build your app to run in fullscreen mode (and not in Android's Lights Out mode).
Do not perform any API calls that will present the user with a 2D dialog box. The user would be forced to remove the phone from the viewer to respond.
Provide audio and haptic (vibration) feedback to convey information and indicate that user input has been recognized by the app.

So, let's say that your awesome Cardboard app is done and ready to publish. Now what? There's a line you can put in the AndroidManifest.xml file that marks the app as a Cardboard app. Google's Cardboard app includes a Google Play store browser used to find Cardboard apps. Then, just publish it as you would any normal Android application.

Summary

In this article, we started by defining Google Cardboard and saw how it fits in the spectrum of consumer virtual reality devices. We then contrasted Cardboard with higher-end VR, such as Oculus Rift, HTC Vive, and PlayStation VR, making the case for low-end VR as a separate medium in its own right. We talked a bit about developing for Cardboard, and considered why (and why not) to use the Unity 3D game engine versus writing a native Android app in Java with the Cardboard SDK. Lastly, we took a quick survey of design considerations for developing for VR, including ways to avoid motion sickness and tips for integrating Cardboard with Android apps in general.

Resources for Article:

Further resources on this subject:
VR Build and Run [article]
vROps – Introduction and Architecture [article]
Designing and Building a vRealize Automation 6.2 Infrastructure [article]
Setting Up and Cleaning Up

Packt
11 Apr 2016
34 min read
This article, by Mani Tadayon, author of the book RSpec Essentials, discusses support code that sets tests up and cleans up after them. Initialization, configuration, cleanup, and other support code related to RSpec specs are important in real-world RSpec usage. We will learn how to cleanly organize support code in real-world applications by covering the following topics:

Configuring RSpec with spec_helper.rb
Initialization and configuration of resources
Preventing tests from accessing the Internet with WebMock
Maintaining clean test state
Custom helper code
Loading support code on demand with tags

(For more resources related to this topic, see here.)

Configuring RSpec with spec_helper.rb

The RSpec specs that we've seen so far have functioned as standalone units. Specs in the real world, however, almost never work without supporting code to prepare the test environment before tests are run and to ensure it is cleaned up afterwards. In fact, the first line of nearly every real-world RSpec spec file loads a file that takes care of initialization, configuration, and cleanup:

```ruby
require 'spec_helper'
```

By convention, the entry point for all support code for specs is in a file called spec_helper.rb. Another convention is that specs are located in a folder called spec in the root folder of the project. The spec_helper.rb file is located in the root of this spec folder. Now that we know where it goes, what do we actually put in spec_helper.rb? Let's start with an example:

```ruby
# spec/spec_helper.rb
require 'rspec'

RSpec.configure do |config|
  config.order            = 'random'
  config.profile_examples = 3
end
```

To see what these two options do, let's create a couple of dummy spec files that include our spec_helper.rb.
Here's the first spec file:

```ruby
# spec/first_spec.rb
require 'spec_helper'

describe 'first spec' do
  it 'sleeps for 1 second' do
    sleep 1
  end

  it 'sleeps for 2 seconds' do
    sleep 2
  end

  it 'sleeps for 3 seconds' do
    sleep 3
  end
end
```

And here's our second spec file:

```ruby
# spec/second_spec.rb
require 'spec_helper'

describe 'second spec' do
  it 'sleeps for 4 seconds' do
    sleep 4
  end

  it 'sleeps for 5 seconds' do
    sleep 5
  end

  it 'sleeps for 6 seconds' do
    sleep 6
  end
end
```

Now let's run our two spec files and see what happens:

We note that we used --format documentation when running RSpec so that we see the order in which the tests were run (the default format just outputs a green dot for each passing test). From the output, we can see that the tests were run in a random order. We can also see the three slowest specs.

Although this was a toy example, I would recommend using both of these configuration options for RSpec. Running examples in a random order is very important, as it is the only reliable way of detecting bad tests that sometimes pass and sometimes fail based on the order in which the overall test suite is run. Also, keeping tests running fast is very important for maintaining a productive development flow, and seeing which tests are slow on every test run is the most effective way of encouraging developers to make the slow tests fast, or remove them from the test run. We'll return to both test order and test speed later. For now, let us just note that RSpec configuration is very important to keeping our specs reliable and fast.

Initialization and configuration of resources

Real-world applications rely on resources, such as databases, and external services, such as HTTP APIs. These must be initialized and configured for the application to work properly. When writing tests, dealing with these resources and services can be a challenge because of two opposing fundamental interests.
First, we would like the test environment to match as closely as possible the production environment, so that tests that interact with resources and services are realistic. For example, we may use a powerful database system in production that runs on many servers to provide the best performance. Should we spend money and effort to create and maintain a second production-grade database environment just for testing purposes?

Second, we would like the test environment to be simple and relatively easy to understand, so that we understand what we are actually testing. We would also like to keep our code modular so that components can be tested in isolation, or in simpler environments that are easier to create, maintain, and understand. If we think of the example of the system that relies on a database cluster in production, we may ask ourselves whether we are better off using a single-server setup for our test database. We could even go so far as to use an entirely different database for our tests, such as the file-based SQLite.

As always, there are no easy answers to such trade-offs. The important thing is to understand the costs and benefits, and adjust where we are on the continuum between production faithfulness and test simplicity as our system evolves, along with the goals it serves. For example, for a small hobbyist application or a project with a limited budget, we may choose to completely favor test simplicity. As the same code grows to become a successful fan site or a big-budget project, we may have a much lower tolerance for failure, and have both the motivation and resources to shift towards production faithfulness for our test environment.
Some rules of thumb to keep in mind:

Unit tests are better places for test simplicity
Integration tests are better places for production faithfulness
Try to cleverly increase production faithfulness in unit tests
Try to cleverly increase test simplicity in integration tests
In between unit and integration tests, be clear about what is and isn't faithful to the production environment

A case study of test simplicity with an external service

Let's put these ideas into practice. I haven't changed the application code, except to rename the module OldWeatherQuery. The test code is also slightly changed: it requires a spec_helper file and uses a subject block to define an alias for the module name, which makes it easier to rename the code without having to change many lines of test code. So let's look at our three files now. First, here's the application code:

```ruby
# old_weather_query.rb

require 'net/http'
require 'json'
require 'timeout'

module OldWeatherQuery
  extend self

  class NetworkError < StandardError
  end

  def forecast(place, use_cache=true)
    add_to_history(place)

    if use_cache
      cache[place] ||= begin
        @api_request_count += 1
        JSON.parse( http(place) )
      end
    else
      JSON.parse( http(place) )
    end
  rescue JSON::ParserError
    raise NetworkError.new("Bad response")
  end

  def api_request_count
    @api_request_count ||= 0
  end

  def history
    (@history || []).dup
  end

  def clear!
    @history           = []
    @cache             = {}
    @api_request_count = 0
  end

  private

  def add_to_history(s)
    @history ||= []
    @history << s
  end

  def cache
    @cache ||= {}
  end

  BASE_URI = 'http://api.openweathermap.org/data/2.5/weather?q='

  def http(place)
    uri = URI(BASE_URI + place)

    Net::HTTP.get(uri)
  rescue Timeout::Error
    raise NetworkError.new("Request timed out")
  rescue URI::InvalidURIError
    raise NetworkError.new("Bad place name: #{place}")
  rescue SocketError
    raise NetworkError.new("Could not reach #{uri.to_s}")
  end
end
```

Next is the spec file:

```ruby
# spec/old_weather_query_spec.rb

require_relative 'spec_helper'
require_relative '../old_weather_query'

describe OldWeatherQuery do
  subject(:weather_query) { described_class }

  describe 'caching' do
    let(:json_response) do
      '{"weather" : { "description" : "Sky is Clear"}}'
    end

    around(:example) do |example|
      actual = weather_query.send(:cache)
      expect(actual).to eq({})

      example.run

      weather_query.clear!
    end

    it "stores results in local cache" do
      weather_query.forecast('Malibu,US')

      actual = weather_query.send(:cache)
      expect(actual.keys).to eq(['Malibu,US'])
      expect(actual['Malibu,US']).to be_a(Hash)
    end

    it "uses cached result in subsequent queries" do
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'query history' do
    before do
      expect(weather_query.history).to eq([])
      allow(weather_query).to receive(:http).and_return("{}")
    end

    after do
      weather_query.clear!
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.history).to eq(places)
    end

    it "does not allow history to be modified" do
      expect {
        weather_query.history = ['Malibu,CN']
      }.to raise_error

      weather_query.history << 'Malibu,CN'
      expect(weather_query.history).to eq([])
    end
  end

  describe 'number of API requests' do
    before do
      expect(weather_query.api_request_count).to eq(0)
      allow(weather_query).to receive(:http).and_return("{}")
    end

    after do
      weather_query.clear!
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.api_request_count).to eq(3)
    end

    it "does not allow count to be modified" do
      expect {
        weather_query.api_request_count = 100
      }.to raise_error

      expect {
        weather_query.api_request_count += 10
      }.to raise_error

      expect(weather_query.api_request_count).to eq(0)
    end
  end
end
```

And last but not least, our spec_helper file, which has also changed only slightly: we now configure RSpec to show only one slow spec (to keep test results uncluttered) and to use color in the output, to distinguish passes and failures more easily:

```ruby
# spec/spec_helper.rb

require 'rspec'

RSpec.configure do |config|
  config.order            = 'random'
  config.profile_examples = 1
  config.color            = true
end
```

When we run these specs, something unexpected happens. Most of the time the specs pass, but sometimes they fail. If we keep running the specs with the same command, we'll see the tests pass and fail apparently at random.
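A common source of this kind of apparently random failure is state that only some code paths initialize. As a minimal plain-Ruby illustration (the Counter module below is hypothetical, not part of the weather query code):

```ruby
# Hypothetical Counter module (not from the article's code) showing how
# an uninitialized instance variable makes behavior depend on call order.
module Counter
  extend self

  def count
    @count ||= 0   # reading through this accessor safely initializes @count
  end

  def increment!
    @count += 1    # assumes @count exists; raises NoMethodError when it is nil
  end
end

begin
  Counter.increment!   # run first: @count is nil, so nil + 1 fails
rescue NoMethodError => e
  puts "increment! failed: #{e.class}"
end
```

If count happens to run before increment! — say, because a test asserted on the count first — the bug is masked, which is exactly the kind of order dependence that randomized test ordering is designed to expose.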
These are flaky tests, and we have exposed them because of the random order configuration we chose. If our tests run in a certain order, they fail. The problem could be simply in our tests. For example, we could have forgotten to clear state before or after a test. However, there could also be a problem with our code. In any case, we need to get to the bottom of the situation.

We first notice that at the end of the failing test run, RSpec tells us "Randomized with seed 318". We can use this information to run the tests in the order that caused the failure and start to debug and diagnose the problem. We do this by passing the --seed parameter with the value 318, as follows:

```
$ rspec spec/old_weather_query_spec.rb --seed 318
```

The problem has to do with the way that we increment @api_request_count without ensuring that it has been initialized. Looking at our code, we notice that the only places we initialize @api_request_count are in OldWeatherQuery.api_request_count and OldWeatherQuery.clear!. If we don't call either of these methods first, then OldWeatherQuery.forecast, the main method in this module, will always fail. Our tests sometimes pass because our setup code calls one of these methods first when tests are run in a certain order, but that is not at all how our code would likely be used in production. So basically, our code is completely broken, but our specs pass (sometimes). Based on this, we can create a simple spec that will always fail:

```ruby
describe 'api_request is not initialized' do
  it "does not raise an error" do
    weather_query.forecast('Malibu,US')
  end
end
```

At least now our tests fail deterministically. But this is not the end of our troubles with these specs. If we run our tests many times with the seed value of 318, we will start seeing a second failing test case that is even more random than the first. This is an OldWeatherQuery::NetworkError, and it indicates that our tests are actually making HTTP requests to the Internet!
Let's do an experiment to confirm this. We'll turn off our Wi-Fi access, unplug our Ethernet cables, and run our specs. When we run our tests without any Internet access, we will see three errors in total. One of them is the error with the uninitialized @api_request_count instance variable, and two of them are instances of OldWeatherQuery::NetworkError, which confirms that we are indeed making real HTTP requests in our code.

What's so bad about making requests to the Internet? After all, the test failures are indeed very random, and we had to purposely shut off our Internet access to replicate the errors. Flaky tests are actually the least of our problems. First, we could be performing destructive actions that affect real systems, accounts, and people! Imagine if we were testing an e-commerce application that charged customer credit cards by using a third-party payment API via HTTP. If our tests actually hit our payment provider's API endpoint over HTTP, we would get a lot of declined transactions (assuming we are not storing and using real credit card numbers), which could lead to our account being suspended due to suspicions of fraud, putting our e-commerce application out of service.

Also, if we were running a continuous integration (CI) server such as Jenkins that did not have access to the public Internet, we would get failures in our CI builds due to failing tests that attempted to access the Internet.

There are a few approaches to solving this problem. In our tests, we attempted to mock our HTTP requests, but obviously failed to do so effectively. A second approach is to allow actual HTTP requests but to configure a special server for testing purposes. Let's focus on figuring out why our HTTP mocks were not successful. In a small set of tests like in this example, it is not hard to hunt down the places where we are sending actual HTTP requests. In larger code bases with a lot of test support code, it may be harder.
Also, it would be nice to prevent access to the Internet altogether, so that we notice these issues as soon as we run the offending tests. Fortunately, Ruby has many excellent tools for testing, and there is one that addresses our needs exactly: WebMock (https://github.com/bblimke/webmock). We simply install the gem and add a couple of lines to our spec helper file to disable all network connections in our tests:

```ruby
require 'rspec'

# require the webmock gem
require 'webmock/rspec'

RSpec.configure do |config|
  # this is done by default, but let's make it clear
  WebMock.disable_net_connect!

  config.order            = 'random'
  config.profile_examples = 1
  config.color            = true
end
```

When we run our tests again, we'll see one or more instances of WebMock::NetConnectNotAllowedError, along with a backtrace to lead us to the point in our tests where the HTTP request was made.

If we examine our test code, we'll notice that we mock the OldWeatherQuery.http method in a few places. However, we forgot to set up the mock in the first describe block for caching, where we defined a json_response object but never mocked the OldWeatherQuery.http method to return json_response. We can solve the problem by mocking OldWeatherQuery.http throughout the entire test file. We'll also take this opportunity to clean up the initialization of @api_request_count in our code.
Here's what we have now:

```ruby
# new_weather_query.rb

require 'net/http'
require 'json'
require 'timeout'

module NewWeatherQuery
  extend self

  class NetworkError < StandardError
  end

  def forecast(place, use_cache=true)
    add_to_history(place)
    if use_cache
      cache[place] ||= begin
        increment_api_request_count
        JSON.parse( http(place) )
      end
    else
      JSON.parse( http(place) )
    end
  rescue JSON::ParserError => e
    raise NetworkError.new("Bad response: #{e.inspect}")
  end

  def increment_api_request_count
    @api_request_count ||= 0
    @api_request_count += 1
  end

  def api_request_count
    @api_request_count ||= 0
  end

  def history
    (@history || []).dup
  end

  def clear!
    @history           = []
    @cache             = {}
    @api_request_count = 0
  end

  private

  def add_to_history(s)
    @history ||= []
    @history << s
  end

  def cache
    @cache ||= {}
  end

  BASE_URI = 'http://api.openweathermap.org/data/2.5/weather?q='

  def http(place)
    uri = URI(BASE_URI + place)

    Net::HTTP.get(uri)
  rescue Timeout::Error
    raise NetworkError.new("Request timed out")
  rescue URI::InvalidURIError
    raise NetworkError.new("Bad place name: #{place}")
  rescue SocketError
    raise NetworkError.new("Could not reach #{uri.to_s}")
  end
end
```

And here is the spec file to go with it:

```ruby
# spec/new_weather_query_spec.rb

require_relative 'spec_helper'
require_relative '../new_weather_query'

describe NewWeatherQuery do
  subject(:weather_query) { described_class }

  after { weather_query.clear! }

  let(:json_response) { '{}' }

  before do
    allow(weather_query).to receive(:http).and_return(json_response)
  end

  describe 'api_request is initialized' do
    it "does not raise an error" do
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'caching' do
    let(:json_response) do
      '{"weather" : { "description" : "Sky is Clear"}}'
    end

    around(:example) do |example|
      actual = weather_query.send(:cache)
      expect(actual).to eq({})

      example.run
    end

    it "stores results in local cache" do
      weather_query.forecast('Malibu,US')

      actual = weather_query.send(:cache)
      expect(actual.keys).to eq(['Malibu,US'])
      expect(actual['Malibu,US']).to be_a(Hash)
    end

    it "uses cached result in subsequent queries" do
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'query history' do
    before do
      expect(weather_query.history).to eq([])
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.history).to eq(places)
    end

    it "does not allow history to be modified" do
      expect {
        weather_query.history = ['Malibu,CN']
      }.to raise_error

      weather_query.history << 'Malibu,CN'
      expect(weather_query.history).to eq([])
    end
  end

  describe 'number of API requests' do
    before do
      expect(weather_query.api_request_count).to eq(0)
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.api_request_count).to eq(3)
    end

    it "does not allow count to be modified" do
      expect {
        weather_query.api_request_count = 100
      }.to raise_error

      expect {
        weather_query.api_request_count += 10
      }.to raise_error

      expect(weather_query.api_request_count).to eq(0)
    end
  end
end
```

Now we've fixed a major bug in our code that slipped through our specs and used to pass randomly. We've made it so that our tests always pass, regardless of the order in which they are run, and without needing to access the Internet. Our test code and application code have also become clearer, as we've reduced duplication in a few places.

A case study of production faithfulness with a test resource instance

We're not done with our WeatherQuery example just yet. Let's take a look at how we would add a simple database to store our cached values. There are some serious limitations to the way we are caching with instance variables, which persist only within the scope of a single Ruby process. As soon as we stop or restart our app, the entire cache will be lost. In a production app, we would likely have many processes running the same code in order to serve traffic effectively. With our current approach, each process would have a separate cache, which would be very inefficient. We could easily save many HTTP requests if we were able to share the cache between processes and across restarts.

Economizing on these requests is not simply a matter of improved response time. We also need to consider that we cannot make unlimited requests to external services. For commercial services, we would pay for the number of requests we make. For free services, we are likely to get throttled if we exceed some threshold. Therefore, an effective caching scheme that reduces the number of HTTP requests we make to our external services is of vital importance to the function of a real-world app. Finally, our cache is very simplistic and has no expiration mechanism short of clearing all entries.
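Per-entry expiration is the missing piece. Before reaching for a database, the idea can be sketched in a few lines of plain Ruby; the TtlCache class below is a hypothetical illustration, not the implementation this article builds:

```ruby
# Hypothetical TTL cache: each entry expires individually, ttl seconds
# after it was stored, so stale forecasts get recomputed automatically.
class TtlCache
  def initialize(ttl_seconds)
    @ttl   = ttl_seconds
    @store = {}   # key => [value, expires_at]
  end

  # Returns the cached value if it is still fresh; otherwise runs the
  # block, stores the result with a new expiry timestamp, and returns it.
  def fetch(key)
    entry = @store[key]
    return entry[0] if entry && Time.now < entry[1]

    value = yield
    @store[key] = [value, Time.now + @ttl]
    value
  end
end
```

A shared store such as Redis gives the same behavior across processes and restarts, with expiration handled natively by the EXPIRE command.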
For a cache to be effective, we need to be able to store entries for individual locations for some period of time within which we don't expect the weather forecast to change much. This will keep the cache small and up to date. We'll use Redis (http://redis.io) as our database, since it is very fast, simple, and easy to set up. You can find instructions on the Redis website on how to install it, which is an easy process on any platform. Once you have Redis installed, you simply need to start the server locally, which you can do with the redis-server command. We'll also need to install the Redis Ruby client as a gem (https://github.com/redis/redis-rb). Let's start with a separate configuration file to set up our Redis client for our tests:

# spec/config/redis.rb

require 'rspec'
require 'redis'

ENV['WQ_REDIS_URL'] ||= 'redis://localhost:6379/15'

RSpec.configure do |config|
  if ! ENV['WQ_REDIS_URL'].is_a?(String)
    raise "WQ_REDIS_URL environment variable not set"
  end

  ::REDIS_CLIENT = Redis.new( :url => ENV['WQ_REDIS_URL'] )

  config.after(:example, :redis => true) do
    ::REDIS_CLIENT.flushdb
  end
end

Note that the after hook is filtered on the :redis tag, so it runs only for examples tagged with redis: true. We place this file in a new config folder under our main spec folder. The idea is to configure each resource separately in its own file to keep everything isolated and easy to understand. This will make maintenance easy and prevent problems with configuration management down the road. We don't do much in this file, but we do establish some important conventions. There is a single environment variable, which takes care of the Redis connection URL. By using an environment variable, we make it easy to change configuration and also allow flexibility in how these configurations are stored. Our code doesn't care if the Redis connection URL is stored in a simple .env file with key-value pairs or loaded from a configuration database.
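The convention used for the connection URL (an app-prefixed environment variable with a safe local default) can be captured in a few lines of code. Here it is sketched in Python for comparison; the variable name matches the config file above, everything else is illustrative:

```python
import os

def redis_url():
    # App-prefixed variable with a safe default: local Redis on the default
    # port, database 15, mirroring the WQ_REDIS_URL convention above.
    return os.environ.get("WQ_REDIS_URL", "redis://localhost:6379/15")

os.environ.pop("WQ_REDIS_URL", None)                   # start from a clean slate
assert redis_url() == "redis://localhost:6379/15"      # the default applies

os.environ["WQ_REDIS_URL"] = "redis://1.2.3.4:4321/0"
assert redis_url() == "redis://1.2.3.4:4321/0"         # an explicit override wins
os.environ.pop("WQ_REDIS_URL", None)
```

The key design point is that the default points at a throwaway test database, so forgetting to set the variable can never endanger development or production data.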
We can also easily override this value manually, simply by setting it when we run RSpec, like so:

$ WQ_REDIS_URL=redis://1.2.3.4:4321/0 rspec spec

Note that we also set a sensible default value, which is to run on the default Redis port of 6379 on our local machine, on database number 15, which is less likely to be used for local development. This prevents our tests from relying on our development database, or from polluting or destroying it.

It is also worth mentioning that we prefix our environment variable with WQ (short for weather query). Small details like this are very important for keeping our code easy to understand and for preventing dangerous clashes. We can imagine the kinds of confusion and clashes that could be caused if we relied on REDIS_URL and we had multiple apps running on the same server, all relying on Redis. It would be very easy to break many applications if we changed the value of REDIS_URL for a single app to point to a different instance of Redis.

We set a global constant, ::REDIS_CLIENT, to point to a Redis client. We will use this in our code to connect to Redis. Note that in real-world code, we would likely have a global namespace for the entire app, and we would define globals such as REDIS_CLIENT under that namespace rather than in the global Ruby namespace.

Finally, we configure RSpec to call the flushdb command after every example tagged with :redis to empty the database and keep state clean across tests. In our code, all tests interact with Redis, so this tag seems pointless. However, it is very likely that we would add code that had nothing to do with Redis, and using tags helps us to constrain the scope of our configuration hooks only to where they are needed. This will also prevent confusion about multiple hooks running for the same example. In general, we want to prevent global hooks where possible and make configuration hooks explicitly triggered where possible.

So what does our spec look like now?
Actually, it is almost exactly the same. Only a few lines have changed to work with the new Redis cache. See if you can spot them!

# spec/redis_weather_query_spec.rb

require_relative 'spec_helper'
require_relative '../redis_weather_query'

describe RedisWeatherQuery, redis: true do
  subject(:weather_query) { described_class }

  after { weather_query.clear! }

  let(:json_response) { '{}' }

  before do
    allow(weather_query).to receive(:http).and_return(json_response)
  end

  describe 'api_request is initialized' do
    it "does not raise an error" do
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'caching' do
    let(:json_response) do
      '{"weather" : { "description" : "Sky is Clear"}}'
    end

    around(:example) do |example|
      actual = weather_query.send(:cache).all
      expect(actual).to eq({})

      example.run
    end

    it "stores results in local cache" do
      weather_query.forecast('Malibu,US')

      actual = weather_query.send(:cache).all
      expect(actual.keys).to eq(['Malibu,US'])
      expect(actual['Malibu,US']).to be_a(Hash)
    end

    it "uses cached result in subsequent queries" do
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'query history' do
    before do
      expect(weather_query.history).to eq([])
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.history).to eq(places)
    end

    it "does not allow history to be modified" do
      expect {
        weather_query.history = ['Malibu,CN']
      }.to raise_error

      weather_query.history << 'Malibu,CN'
      expect(weather_query.history).to eq([])
    end
  end

  describe 'number of API requests'
do
    before do
      expect(weather_query.api_request_count).to eq(0)
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.api_request_count).to eq(3)
    end

    it "does not allow count to be modified" do
      expect {
        weather_query.api_request_count = 100
      }.to raise_error

      expect {
        weather_query.api_request_count += 10
      }.to raise_error

      expect(weather_query.api_request_count).to eq(0)
    end
  end
end

So what about the actual WeatherQuery code? It changes very little as well:

# redis_weather_query.rb

require 'net/http'
require 'json'
require 'timeout'

# require the new cache module
require_relative 'redis_weather_cache'

module RedisWeatherQuery
  extend self

  class NetworkError < StandardError
  end

  # ... same as before ...

  def clear!
    @history           = []
    @api_request_count = 0

    # no more clearing of cache here
  end

  private

  # ... same as before ...

  # the new cache module has a Hash-like interface
  def cache
    RedisWeatherCache
  end

  # ... same as before ...

end

We can see that we've preserved pretty much the same code and specs as before. Almost all of the new functionality is accomplished in a new module that caches with Redis. Here is what it looks like:

# redis_weather_cache.rb

require 'redis'

module RedisWeatherCache
  extend self

  CACHE_KEY             = 'weather_query:cache'
  EXPIRY_ZSET_KEY       = 'weather_query:expiry_tracker'
  EXPIRE_FORECAST_AFTER = 300 # 5 minutes

  def redis_client
    if !
defined?(::REDIS_CLIENT)
      raise("No REDIS_CLIENT defined!")
    end

    ::REDIS_CLIENT
  end

  def []=(location, forecast)
    redis_client.hset(CACHE_KEY, location, JSON.generate(forecast))
    redis_client.zadd(EXPIRY_ZSET_KEY, Time.now.to_i, location)
  end

  def [](location)
    remove_expired_entries

    raw_value = redis_client.hget(CACHE_KEY, location)

    if raw_value
      JSON.parse(raw_value)
    else
      nil
    end
  end

  def all
    redis_client.hgetall(CACHE_KEY).inject({}) do |memo, (location, forecast_json)|
      memo[location] = JSON.parse(forecast_json)
      memo
    end
  end

  def clear!
    redis_client.del(CACHE_KEY)
  end

  def remove_expired_entries
    # expired locations have a score, i.e. creation timestamp, less than a certain threshold
    expired_locations = redis_client.zrangebyscore(EXPIRY_ZSET_KEY, 0, Time.now.to_i - EXPIRE_FORECAST_AFTER)

    if ! expired_locations.empty?
      # remove the cache entry
      redis_client.hdel(CACHE_KEY, expired_locations)

      # also clear the expiry entry
      redis_client.zrem(EXPIRY_ZSET_KEY, expired_locations)
    end
  end
end

We'll avoid a detailed explanation of this code. We simply note that we accomplish all of the design goals we discussed at the beginning of the section: a persistent cache with expiration of individual values. We've accomplished this using some simple Redis functionality along with ZSET (sorted set) functionality, which is a bit more complex, and which we needed because Redis does not allow individual entries in a Hash to be expired on their own. We can see that by using method names such as RedisWeatherCache.[] and RedisWeatherCache.[]=, we've maintained a Hash-like interface, which made it easy to use this cache instead of the simple in-memory Ruby Hash we had in our previous iteration.
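The expiry scheme itself is independent of Redis: record each entry's creation timestamp in a score-ordered structure, and purge everything older than the threshold before every read. A minimal in-memory sketch of the same idea, written in Python for comparison (illustrative only, with an injectable clock so the demonstration doesn't have to sleep):

```python
import time

EXPIRE_FORECAST_AFTER = 300  # seconds, the same threshold as the Redis version

class ExpiringCache:
    def __init__(self, ttl=EXPIRE_FORECAST_AFTER, clock=time.time):
        self._data = {}       # plays the role of the Redis hash
        self._created = {}    # plays the role of the expiry ZSET (member -> score)
        self._ttl = ttl
        self._clock = clock   # injectable so tests can fake the passage of time

    def __setitem__(self, key, value):
        self._data[key] = value
        self._created[key] = self._clock()     # ZADD with the creation timestamp

    def __getitem__(self, key):
        self._remove_expired()
        return self._data.get(key)

    def _remove_expired(self):
        # Entries whose score falls at or below (now - ttl) are stale,
        # matching the ZRANGEBYSCORE 0 .. (now - ttl) query in the Ruby module.
        threshold = self._clock() - self._ttl
        for k in [k for k, ts in self._created.items() if ts <= threshold]:
            del self._data[k]       # the HDEL equivalent
            del self._created[k]    # the ZREM equivalent

# A simulated clock lets us observe expiry without waiting five minutes.
now = [1000.0]
cache = ExpiringCache(clock=lambda: now[0])
cache["Malibu,US"] = {"description": "Sky is Clear"}
assert cache["Malibu,US"] == {"description": "Sky is Clear"}
now[0] += 301                        # advance past the TTL
assert cache["Malibu,US"] is None    # the entry has been purged
```

Purging lazily on read, as both versions do, keeps writes cheap and requires no background timer.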
Our tests all pass and are still pretty simple, thanks to the modularity of this new cache code, the modular configuration file, and the previous fixes we made to our specs to remove Internet and run-order dependencies.

Summary

In this article, we delved into setting up and cleaning up state for real-world specs that interact with external services and local resources by extending our WeatherQuery example to address a big bug, isolate our specs from the Internet, and cleanly configure a Redis database to serve as a better cache.

Resources for Article:

Further resources on this subject: Creating your first heat map in R [article] Probability of R? [article] Programming on Raspbian [article]
Mastering of Fundamentals

Packt
08 Apr 2016
10 min read
In this article by Piotr Sikora, author of the book Professional CSS3, you will master the box model, troubleshooting of floated elements, positioning, and display types. After reading it, you will have a firmer grasp of the foundations of HTML and CSS. In this article, we shall cover the following topics:

- The traditional box model
- The basics of floating elements
- The foundations of positioning elements on a webpage
- Display types

(For more resources related to this topic, see here.)

Traditional box model

Understanding the box model is the foundation of CSS theory. You have to know the impact of width, height, margin, and borders on the size of the box, and how you can manage them to fit an element into a website. Many interview questions for coders and frontend developers are based on box model theory. Let's begin this important lesson, which will be the foundation for every subject.

Padding/margin/border/width/height

The ingredients of the final width and height of the box are:

- Width
- Height
- Margins
- Paddings
- Borders

For a better understanding of the box model, here is the image from the Chrome inspector:

For a clear and better understanding of the box model, let's analyze the image. In the box model, we have four edges:

- Content edge
- Padding edge
- Border edge
- Margin edge

The width and height of the box are based on:

- Width/height of the content
- Padding
- Border
- Margin

The width and height of the content in a box with the default box-sizing are controlled by the properties:

- min-width
- max-width
- width
- min-height
- max-height
- height

An important thing about the box model is how the background properties behave: the background covers the content area and the padding area (up to the padding edge). Let's get to the code and try to point out all the elements of the box model.
HTML:

<div class="element">
  Lorem ipsum dolor sit amet consectetur
</div>

CSS:

.element {
   background: pink;
   padding: 10px;
   margin: 20px;
   width: 100px;
   height: 100px;
   border: solid 10px black;
}

In the browser, we will see the following:

This is the view from the inspector of Google Chrome:

Let's check how the areas of the box model are placed in this specific example:

A basic task for an interviewed frontend developer: a box/element is described with the following styles:

.box {
    width: 100px;
    height: 200px;
    border: 10px solid #000;
    margin: 20px;
    padding: 30px;
}

Please count the final width and height (the real space that is needed for this element). So, as you can see, the problem is to count the width and height of the box.

Ingredients of the width:

- Width
- Border left
- Border right
- Padding left
- Padding right

Additionally, for the width of the space taken by the box:

- Margin left
- Margin right

Ingredients of the height:

- Height
- Border top
- Border bottom
- Padding top
- Padding bottom

Additionally, for the height of the space taken by the box:

- Margin top
- Margin bottom

So, when you sum up the element, you will have these equations:

Width:

Box width = width + borderLeft + borderRight + paddingLeft + paddingRight
Box width = 100px + 10px + 10px + 30px + 30px = 180px

Space width:

Space width = width + borderLeft + borderRight + paddingLeft + paddingRight + marginLeft + marginRight
Space width = 100px + 10px + 10px + 30px + 30px + 20px + 20px = 220px

Height:

Box height = height + borderTop + borderBottom + paddingTop + paddingBottom
Box height = 200px + 10px + 10px + 30px + 30px = 280px

Space height:

Space height = height + borderTop + borderBottom + paddingTop + paddingBottom + marginTop + marginBottom
Space height = 200px + 10px + 10px + 30px + 30px + 20px + 20px = 320px

Here, you can check it in a real browser:

Omitting problems with the traditional box model (box sizing)

The basic theory of the box model is pretty hard to learn.
You need to remember all the components of the width/height, even when you set the width and height explicitly. The hardest part for beginners is understanding padding, which shouldn't be counted as a component of the width and height: it should sit inside the box without impacting those values. To change this behavior, CSS3 brought the box-sizing property (supported since Internet Explorer 8). You can set the value:

box-sizing: border-box

What does it give you? Counting the box width and height finally becomes easier, because the padding and border are kept inside the box. So, if we take our previous class:

.box {
    width: 100px;
    height: 200px;
    border: 10px solid #000;
    margin: 20px;
    padding: 30px;
}

We can count the width and height easily:

Width = 100px
Height = 200px

Additionally, the space taken by the box:

Space width = 140px (because of the 20px margin on both sides: left and right)
Space height = 240px (because of the 20px margin on both sides: top and bottom)

Here is a sample from Chrome:

So, if you don't want to repeat all the problems of the traditional box model, you should set it globally for all elements:

* {
    box-sizing: border-box;
}

Of course, adding this rule can do more harm than good in an old project you inherit, for example from a new client who needs only some small changes, because every element in that project was built against the traditional box model. But for all new projects, you should use it.

Floating elements

Floating boxes are the most used technique in modern layouts. The theory of floating boxes is still used especially in grid systems and inline lists in CSS frameworks. For example, the inline-list class and mixin (in the Zurb Foundation framework) are based on floats.

Possibilities of floating elements

An element can be floated to the left and to the right. There is also a value that resets the floating.
The possible values are:

float: left;  /* will float the element to the left */
float: right; /* will float the element to the right */
float: none;  /* will reset the float */

Most known floating problems

When you are using floating elements, you can run into some issues. The best-known problems with floated elements are:

- Too-big elements (because of width, left/right margin, left/right padding, and a badly counted width, which is based on the box model)
- Uncleared floats

Each of these problems produces a specific effect, which you can easily recognize and then fix. Too-big elements can be recognized when elements that should share one line don't. What you should check first is whether box-sizing: border-box is applied; then check the width, padding, and margin. Uncleared floats are easy to recognize when elements from the next container wrap around the floated structure. It means that you have no clearfix in your floating container.

Define clearfix/class/mixin

When I was starting to develop HTML and CSS code, there was a method to clear the floats with the classes .cb or .clear, both defined as:

.clearboth, .cb {
    clear: both
}

This element was added in the container right after all the floated elements. It is important to remember to clear the floats, because a container that holds only floating elements won't take on the height of its highest floating element (it will have a height equal to 0). For example:

<div class="container">
    <div class="float">
        … content ...
    </div>
    <div class="float">
        … content ...
    </div>
    <div class="clearboth"></div>
</div>

Where the CSS looks like this:

.float {
    width: 100px;
    height: 100px;
    float: left;
}

.clearboth {
    clear: both
}

Nowadays, there is a better and faster way to clear floats.
You can do this with clearfix, which can be defined like this:

.clearfix:after {
    content: " ";
    visibility: hidden;
    display: block;
    height: 0;
    clear: both;
}

You can use it in HTML code:

<div class="container clearfix">
    <div class="float">
        … content ...
    </div>
    <div class="float">
        … content ...
    </div>
</div>

The main reason to switch to clearfix is that you save one tag (the one carrying the clearboth class). The recommended usage is based on the clearfix mixin, which you can define like this in SASS:

=clearfix
  &:after
    content: " "
    visibility: hidden
    display: block
    height: 0
    clear: both

So, every time you need to clear the floating in some container, you invoke it. Let's take the previous code as an example:

<div class="container">
    <div class="float">
        … content ...
    </div>
    <div class="float">
        … content ...
    </div>
</div>

The container can be described as:

.container
  +clearfix

Example of using floating elements

The best-known usage of floated elements is grids. A grid is mainly used to structure the data displayed on a webpage. In this article, let's check just a short draft of a grid.
Let's create the HTML code:

<div class="row">
    <div class="column_1of2">
        Lorem
    </div>
    <div class="column_1of2">
        Lorem
    </div>
</div>
<div class="row">
    <div class="column_1of3">
        Lorem
    </div>
    <div class="column_1of3">
        Lorem
    </div>
    <div class="column_1of3">
        Lorem
    </div>
</div>
<div class="row">
    <div class="column_1of4">
        Lorem
    </div>
    <div class="column_1of4">
        Lorem
    </div>
    <div class="column_1of4">
        Lorem
    </div>
    <div class="column_1of4">
        Lorem
    </div>
</div>

And the SASS:

*
  box-sizing: border-box

=clearfix
  &:after
    content: " "
    visibility: hidden
    display: block
    height: 0
    clear: both

.row
  +clearfix

.column_1of2
  background: orange
  width: 50%
  float: left
  &:nth-child(2n)
    background: red

.column_1of3
  background: orange
  width: (100% / 3)
  float: left
  &:nth-child(2n)
    background: red

.column_1of4
  background: orange
  width: 25%
  float: left
  &:nth-child(2n)
    background: red

The final effect:

As you can see, we have created the structure of a basic grid. In the places where the HTML code says Lorem, a full lorem ipsum was used to illustrate the grid system.

Summary

In this article, we studied the traditional box model and floating elements in detail.

Resources for Article:

Further resources on this subject: Flexbox in CSS [article] CodeIgniter Email and HTML Table [article] Developing Wiki Seek Widget Using Javascript [article]
Building Custom Widgets

Packt
08 Apr 2016
7 min read
This article by Yogesh Dhanapal and Jayakrishnan Vijayaraghavan, authors of the book ArcGIS for JavaScript developers by Example, will develop a custom widget. (For more resources related to this topic, see here.)

Building a custom widget

Let's create a custom widget in the app, which will do the following:

- Allow the user to draw a polygon on the map. The polygon should be symbolized with a semitransparent red fill and a dashed yellow outline.
- Fetch all the major wildfire events within the boundary of the polygon. These shall be shown as highlighted graphics, and the data should be shown in a grid.
- Provide internationalization support.

Modules required for the widget

Let's list the modules required to define the class, along with their corresponding callback function names.

The modules for class declaration and OOP:

- dojo/_base/declare (callback: declare)
- dijit/_WidgetBase (callback: _WidgetBase)
- dojo/_base/lang (callback: lang)

The modules for using HTML templates:

- dijit/_TemplatedMixin (callback: _TemplatedMixin)
- dojo/text! (callback: dijitTemplate)

The modules for using events:

- dojo/on (callback: on)
- dijit/a11yclick (callback: a11yclick)

The modules for manipulating DOM elements and their style:

- dojo/dom-style (callback: domStyle)
- dojo/dom-class (callback: domClass)
- dojo/domReady! (no callback)

The modules for using the draw toolbar and displaying graphics:

- esri/toolbars/draw (callback: Draw)
- esri/symbols/SimpleFillSymbol (callback: SimpleFillSymbol)
- esri/symbols/SimpleLineSymbol (callback: SimpleLineSymbol)
- esri/graphic (callback: Graphic)
- dojo/_base/Color (callback: Color)

The modules for querying data:

- esri/tasks/query (callback: Query)
- esri/tasks/QueryTask (callback: QueryTask)

The modules for internationalization support:

- dojo/i18n! (callback: nls)

Using the draw toolbar

The draw toolbar enables us to draw graphics on the map.
The draw toolbar has events associated with it. When a draw operation is completed, it returns the object drawn on the map as geometry. Perform the following steps to create a graphic using the draw toolbar:

Initiating the draw toolbar

The draw toolbar is provided by the module esri/toolbars/draw. The draw toolbar accepts the map object as an argument. Instantiate the draw toolbar within the postCreate function. The draw toolbar also accepts an additional, optional argument named options. One of the properties in the options object is named showTooltips. This can be set to true so that a tooltip is shown while drawing. The text in the tooltip can be customized; otherwise, a default tooltip associated with the draw geometry is displayed:

return declare([_WidgetBase, _TemplatedMixin], {
  //assigning html template to template string
  templateString: dijitTemplate,
  isDrawActive: false,
  map: null,
  tbDraw: null,
  constructor: function (options, srcRefNode) {
    this.map = options.map;
  },
  startup: function () {},
  postCreate: function () {
    this.inherited(arguments);
    this.tbDraw = new Draw(this.map, {showTooltips : true});
  }

The draw toolbar can be activated on the click event or touch event (in the case of smartphones or tablets) of a button, which is intended to indicate the start of a draw event. Dojo provides a module that takes care of touch as well as click events. The module is named dijit/a11yclick. To activate the draw toolbar, we need to provide the type of symbol to draw. The draw toolbar provides a list of constants, which correspond to the types of draw symbols. These constants are POINT, POLYGON, LINE, POLYLINE, FREEHAND_POLYGON, FREEHAND_POLYLINE, MULTI_POINT, RECTANGLE, TRIANGLE, CIRCLE, ELLIPSE, ARROW, UP_ARROW, DOWN_ARROW, LEFT_ARROW, and RIGHT_ARROW. While activating the draw toolbar, these constants must be used to define the type of draw operation required. Our objective is to draw a polygon on the click of a draw button.
The code is shown in the following screenshot:

The draw operation

Once the draw toolbar is activated, the draw operation will begin. For point geometry, the draw operation is just a single click. For a polyline and a polygon, a single click adds a vertex and a double-click ends the sketch. For a freehand polyline or polygon, a click-and-drag operation draws the geometry and a mouse-up operation ends the drawing.

The draw-end event handler

When the draw operation is complete, we need an event handler to do something with the shape that was drawn by the draw toolbar. The API provides a draw-end event, which is fired once the draw operation is complete. This event handler must be connected to the draw toolbar. It shall be defined within the this.own() function inside the postCreate() method of the widget. The event result can be passed to a named function or an anonymous function:

postCreate: function () {
  ...
  this.tbDraw.on("draw-end", lang.hitch(this, this.querybyGeometry));
},
...
querybyGeometry: function (evt) {
  this.isBusy(true);
  //Get the Drawn geometry
  var geometryInput = evt.geometry;
  ...
}

Symbolizing the drawn shape

In the draw-end event callback function, we will get the geometry of the drawn shape as the result object. To add this geometry back to the map, we need to symbolize it. A symbol is associated with the geometry it symbolizes, and the styling of the symbol is defined by the colors or picture used to fill it and by the size of the symbol. To symbolize a polygon, we need the SimpleFillSymbol and SimpleLineSymbol modules. We may also need the esri/color module to define the fill colors. Let's review a snippet to understand this better. This is a simple snippet to construct a symbol for a polygon with a semitransparent solid red fill and a yellow dash-dot outline.
In the preceding snippet, SimpleFillSymbol.STYLE_SOLID and SimpleLineSymbol.STYLE_DASHDOT are constants provided by the SimpleFillSymbol and SimpleLineSymbol modules, respectively. These constants are used for styling the polygon and the line. Two colors are defined in the construction of the symbol: one for filling the polygon and the other for coloring the outline. A color can be defined by four components:

- Red
- Green
- Blue
- Opacity

The Red, Green, and Blue components take values from 0 to 255, and Opacity takes values from 0 to 1. A combination of the Red, Green, and Blue components can be used to produce any color according to RGB color theory. So, to create a yellow color, we use the maximum of the Red component (255) and the maximum of the Green component (255); we don't want the Blue component to contribute to our color, so we use 0. An Opacity value of 0 means 100% transparency, and an Opacity value of 1 means 100% opaqueness. We have used 0.2 for the fill color. This means that we need our polygon to be 20% opaque, or 80% transparent. The default value for this component is 1.

A symbol is just a generic object, meaning any polygon geometry can use the symbol to render itself. Now, we need a container object to display the drawn geometry with the previously defined symbol on the map. A Graphic object, provided by the esri/graphic module, acts as a container object that can accept a geometry and a symbol. The graphic object can be added to the map's graphics layer. A graphics layer is always present in the map object and can be accessed by using the graphics property of the map (this.map.graphics).

Summary

In this article, we learned how to create classes and a customized widget with its required modules, and how to use the draw toolbar.
Resources for Article: Further resources on this subject: Using JavaScript with HTML[article] Learning to Create and Edit Data in ArcGIS[article] Introduction to Mobile Web ArcGIS Development[article]
Threading Basics

Packt
08 Apr 2016
6 min read
In this article by Eugene Agafonov, author of the book Multithreading with C# Cookbook - Second Edition, we will cover the basic tasks to work with threads in C#. You will learn the following recipes:

- Creating a thread in C#
- Pausing a thread
- Making a thread wait

(For more resources related to this topic, see here.)

Creating a thread in C#

Throughout the following recipes, we will use Visual Studio 2015 as the main tool to write multithreaded programs in C#. This recipe will show you how to create a new C# program and use threads in it. There is a free Visual Studio Community 2015 IDE, which can be downloaded from the Microsoft website and used to run the code samples.

Getting ready

To work through this recipe, you will need Visual Studio 2015. There are no other prerequisites.

How to do it...

To understand how to create a new C# program and use threads in it, perform the following steps:

1. Start Visual Studio 2015.
2. Create a new C# console application project. Make sure that the project uses .NET Framework 4.6 or higher; however, the code in this article will work with previous versions.
3. In the Program.cs file, add the following using directives:

using System;
using System.Threading;
using static System.Console;

4. Add the following code snippet below the Main method:

static void PrintNumbers()
{
  WriteLine("Starting...");
  for (int i = 1; i < 10; i++)
  {
    WriteLine(i);
  }
}

5. Add the following code snippet inside the Main method:

Thread t = new Thread(PrintNumbers);
t.Start();
PrintNumbers();

6. Run the program. The output will be something like the following screenshot:

How it works...

In steps 1 and 2, we created a simple console application in C#. Then, in step 3, we included the System.Threading namespace, which contains all the types needed for the program. Then, we used the using static feature from C# 6.0, which allows us to use the System.Console type's static methods without specifying the type name.
An instance of a program that is being executed can be referred to as a process. A process consists of one or more threads. This means that when we run a program, we always have one main thread that executes the program code. In step 4, we defined the PrintNumbers method, which will be used in both the main and the newly created threads. Then, in step 5, we created a thread that runs PrintNumbers. When we construct a thread, an instance of the ThreadStart or ParameterizedThreadStart delegate is passed to the constructor. The C# compiler creates this object behind the scenes when we just type the name of the method we want to run in a different thread. Then, we start the thread and run PrintNumbers in the usual manner on the main thread. As a result, there will be two sequences of numbers randomly interleaved with each other. This illustrates that the PrintNumbers method runs simultaneously on the main thread and on the other thread.

Pausing a thread

This recipe will show you how to make a thread wait for some time without wasting operating system resources.

Getting ready

To work through this recipe, you will need Visual Studio 2015. There are no other prerequisites.

How to do it...

To understand how to make a thread wait without wasting operating system resources, perform the following steps:

1. Start Visual Studio 2015.
2. Create a new C# console application project.
3. In the Program.cs file, add the following using directives:

using System;
using System.Threading;
using static System.Console;
using static System.Threading.Thread;

4. Add the following code snippet below the Main method:

static void PrintNumbers()
{
  WriteLine("Starting...");
  for (int i = 1; i < 10; i++)
  {
    WriteLine(i);
  }
}

static void PrintNumbersWithDelay()
{
  WriteLine("Starting...");
  for (int i = 1; i < 10; i++)
  {
    Sleep(TimeSpan.FromSeconds(2));
    WriteLine(i);
  }
}

5. Add the following code snippet inside the Main method:

Thread t = new Thread(PrintNumbersWithDelay);
t.Start();
PrintNumbers();

6. Run the program.
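The pause pattern in the steps above translates almost directly to other languages. Here is a Python sketch of the same experiment (for comparison only, not part of the recipe), using a short delay so it finishes quickly:

```python
import threading
import time

output = []

def print_numbers_with_delay(delay=0.05):
    # Analogue of PrintNumbersWithDelay: sleep before producing each number.
    for i in range(1, 10):
        time.sleep(delay)               # the Thread.Sleep equivalent
        output.append(("delayed", i))

def print_numbers():
    # Analogue of PrintNumbers: no sleeping at all.
    for i in range(1, 10):
        output.append(("fast", i))

t = threading.Thread(target=print_numbers_with_delay)  # new Thread(...)
t.start()                                              # t.Start()
print_numbers()       # runs on the main thread while the worker sleeps
t.join()              # wait for the worker before inspecting the result

# The undelayed numbers all land first, because the worker sleeps
# before producing each of its numbers.
assert output[:9] == [("fast", i) for i in range(1, 10)]
assert [n for kind, n in output[9:]] == list(range(1, 10))
```

threading.Thread plus join() mirrors the C# new Thread(...)/t.Start()/t.Join() sequence one-to-one, and the sleeping thread likewise consumes almost no CPU time while it waits.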
How it works...

When the program is run, it creates a thread that will execute the code in the PrintNumbersWithDelay method. Immediately after that, it runs the PrintNumbers method. The key feature here is adding the Thread.Sleep method call to the PrintNumbersWithDelay method. It causes the thread executing this code to wait a specified amount of time (2 seconds in our case) before printing each number. While a thread sleeps, it uses as little CPU time as possible. As a result, we will see that the code in the PrintNumbers method, which usually runs later, will be executed before the code in the PrintNumbersWithDelay method in a separate thread.

Making a thread wait

This recipe will show you how a program can wait for some computation in another thread to complete and use its result later in the code. Using the Thread.Sleep method is not enough because we don't know the exact time the computation will take.

Getting ready

To work through this recipe, you will need Visual Studio 2015. There are no other prerequisites.

How to do it...

To understand how a program waits for some computation in another thread to complete in order to use its result later, perform the following steps:

Start Visual Studio 2015. Create a new C# console application project.

In the Program.cs file, add the following using directives:

    using System;
    using System.Threading;
    using static System.Console;
    using static System.Threading.Thread;

Add the following code snippet below the Main method:

    static void PrintNumbersWithDelay()
    {
        WriteLine("Starting...");
        for (int i = 1; i < 10; i++)
        {
            Sleep(TimeSpan.FromSeconds(2));
            WriteLine(i);
        }
    }

Add the following code snippet inside the Main method:

    WriteLine("Starting...");
    Thread t = new Thread(PrintNumbersWithDelay);
    t.Start();
    t.Join();
    WriteLine("Thread completed");

Run the program.

How it works...

When the program is run, it runs a long-running thread that prints out numbers and waits two seconds before printing each number.
But in the main program, we called the t.Join method, which allows us to wait for thread t to complete. When it is complete, the main program continues to run. With the help of this technique, it is possible to synchronize execution steps between two threads. The first one waits until the other one is complete and then continues to work. While the first thread waits, it is in a blocked state (as it is in the previous recipe when you call Thread.Sleep).

Summary

In this article, we focused on performing some very basic operations with threads in the C# language. We covered a thread's life cycle, which includes creating a thread, suspending a thread, and making a thread wait.

Resources for Article:

Further resources on this subject:

Simplifying Parallelism Complexity in C# [article]
Watching Multiple Threads in C# [article]
Debugging Multithreaded Applications as Singlethreaded in C# [article]
article-image-getting-started-d3-es2016-and-nodejs
Packt
08 Apr 2016
25 min read

Getting Started with D3, ES2016, and Node.js
In this article by Ændrew Rininsland, author of the book Learning d3.js Data Visualization, Second Edition, we'll lay the foundations of what you'll need to run all the examples in the article. I'll explain how you can start writing ECMAScript 2016 (ES2016) today—which is the latest and most advanced version of JavaScript—and show you how to use Babel to transpile it to ES5, allowing your modern JavaScript to be run on any browser. We'll then cover the basics of using D3 to render a basic chart. (For more resources related to this topic, see here.)

What is D3.js?

D3 stands for Data-Driven Documents, and it has been developed by Mike Bostock and the D3 community since 2011. The successor to Bostock's earlier Protovis library, it allows pixel-perfect rendering of data by abstracting the calculation of things such as scales and axes into an easy-to-use domain-specific language (DSL). D3's idioms should be immediately familiar to anyone with experience of using the massively popular jQuery JavaScript library. Much like jQuery, in D3, you operate on elements by selecting them and then manipulating them via a chain of modifier functions. Especially within the context of data visualization, this declarative approach makes using it easier and more enjoyable than a lot of other tools out there.

The official website, https://d3js.org/, features many great examples that show off the power of D3, but understanding them is tricky at best. After finishing this article, you should be able to understand D3 well enough to figure out the examples. If you want to follow the development of D3 more closely, check out the source code hosted on GitHub at https://github.com/mbostock/d3.

The fine-grained control and its elegance make D3 one of the most—if not the most—powerful open source visualization libraries out there. This also means that it's not very suitable for simple jobs such as drawing a line chart or two—in that case you might want to use a library designed for charting.
Many use D3 internally anyway. One such interface is Axis, an open source app that I've written. It allows users to easily build basic line, pie, area, and bar charts without writing any code. Try it out at use.axisjs.org.

As a data manipulation library, D3 is based on the principles of functional programming, which is probably where a lot of confusion stems from. Unfortunately, functional programming goes beyond the scope of this article, but I'll explain all the relevant bits to make sure that everyone's on the same page.

What's ES2016?

One of the main changes in this edition is the emphasis on ES2016, the most modern version of JavaScript currently available. Formerly known as ES6 (Harmony), it pushes the JavaScript language's features forward significantly, allowing for new usage patterns that simplify code readability and increase expressiveness. If you've written JavaScript before and the examples in this article look pretty confusing, it means you're probably familiar with the older, more common ES5 syntax. But don't sweat! It really doesn't take too long to get the hang of the new syntax, and I will try to explain the new language features as we encounter them. Although it might seem a somewhat steep learning curve at the start, by the end, you'll have improved your ability to write code quite substantially and will be on the cutting edge of contemporary JavaScript development.

For a really good rundown of all the new toys you have with ES2016, check out this nice guide by the folks at Babel.js, which we will use extensively throughout this article: https://babeljs.io/docs/learn-es2015/.

Before I go any further, let me clear some confusion about what ES2016 actually is. Initially, the ECMAScript (or ES for short) standards were incremented by cardinal numbers, for instance, ES4, ES5, ES6, and ES7.
However, with ES6, they changed this so that a new standard is released every year in order to keep pace with modern development trends, and thus we refer to the year (2016) now. The big release was ES2015, which more or less maps to ES6. ES2016 is scheduled for ratification in June 2016, and builds on the previous year's standard, while adding a few fixes and two new features. You don't really need to worry about compatibility because we use Babel.js to transpile everything down to ES5 anyway, so it runs the same in Node.js and in the browser. For the sake of simplicity, I will use the word "ES2016" throughout in a general sense to refer to all modern JavaScript, but I'm not referring to the ECMAScript 2016 specification itself.

Getting started with Node and Git on the command line

I will try not to be too opinionated in this article about which editor or operating system you should use to work through it (though I am using Atom on Mac OS X), but you are going to need a few prerequisites to start. The first is Node.js. Node is widely used for web development nowadays, and it's actually just JavaScript that can be run on the command line. If you're on Windows or Mac OS X without Homebrew, use the installer at https://nodejs.org/en/. If you're on Mac OS X and are using Homebrew, I would recommend installing "n" instead, which allows you to easily switch between versions of Node:

    $ brew install n
    $ n latest

Regardless of how you do it, once you finish, verify by running the following lines:

    $ node --version
    $ npm --version

If it displays the versions of node and npm (I'm using 5.6.0 and 3.6.0, respectively), it means you're good to go. If it says something similar to Command not found, double-check whether you've installed everything correctly, and verify that Node.js is in your $PATH environment variable. Next, you'll want to clone the article's repository from GitHub.
Change to your project directory and type this:

    $ git clone https://github.com/aendrew/learning-d3
    $ cd learning-d3

This will clone the development environment and all the samples in the learning-d3/ directory as well as switch you into it. Another option is to fork the repository on GitHub and then clone your fork instead of mine as was just shown. This will allow you to easily publish your work on the cloud, enabling you to more easily seek support, display finished projects on GitHub pages, and even submit suggestions and amendments to the parent project. This will help us improve this article for future editions. To do this, fork aendrew/learning-d3 and replace aendrew in the preceding code snippet with your GitHub username.

Each chapter of this book is in a separate branch. To switch between them, type the following command:

    $ git checkout chapter1

Replace 1 with whichever chapter you want the examples for. Stay at master for now though. To get back to it, type this line:

    $ git stash save && git checkout master

The master branch is where you'll do a lot of your coding as you work through this article. It includes a prebuilt package.json file (used by npm to manage dependencies), which we'll use to aid our development over the course of this article. There's also a webpack.config.js file, which tells the build system where to put things, and there are a few other sundry config files. We still need to install our dependencies, so let's do that now:

    $ npm install

All of the source code that you'll be working on is in the src/ folder.
You'll notice it contains an index.html and an index.js file; almost always, we'll be working in index.js, as index.html is just a minimal container to display our work in:

    <!DOCTYPE html>
    <div id="chart"></div>
    <script src="/assets/bundle.js"></script>

To get things rolling, start the development server by typing the following line:

    $ npm start

This starts up the Webpack development server, which will transform our ES2016 JavaScript into backwards-compatible ES5, which can easily be loaded by most browsers. In the preceding HTML, bundle.js is the compiled code produced by Webpack. Now point Chrome to localhost:8080 and fire up the developer console (Ctrl + Shift + J for Linux and Windows and Option + Command + J for Mac). You should see a blank website and a blank JavaScript console with a Command Prompt waiting for some code:

A quick Chrome Developer Tools primer

Chrome Developer Tools are indispensable to web development. Most modern browsers have something similar, but to keep this article shorter, we'll stick to Chrome here for the sake of simplicity. Feel free to use a different browser. Firefox's Developer Edition is particularly nice. We are mostly going to use the Elements and Console tabs, Elements to inspect the DOM and Console to play with JavaScript code and look for any problems. The other six tabs come in handy for large projects: The Network tab will let you know how long files are taking to load and help you inspect the Ajax requests. The Profiles tab will help you profile JavaScript for performance. The Resources tab is good for inspecting client-side data. Timeline and Audits are useful when you have a global variable that is leaking memory and you're trying to work out exactly why your library is suddenly causing Chrome to use 500 MB of RAM. While I've used these in D3 development, they're probably more useful when building large web applications with frameworks such as React and Angular.
One of the favorites from Developer Tools is the CSS inspector at the right-hand side of the Elements tab. It can tell you what CSS rules are affecting the styling of an element, which is very good for hunting rogue rules that are messing things up. You can also edit the CSS and immediately see the results, as follows:

The obligatory bar chart example

No introductory chapter on D3 would be complete without a basic bar chart example. They are to D3 as "Hello World" is to everything else, and 90 percent of all data storytelling can be done in its simplest form with an intelligent bar or line chart. For a good example of this, look at the kinds of graphics The Economist includes with their articles—they frequently summarize the entire piece with a simple line chart. Coming from a newsroom development background, many of my examples will be related to some degree to current events or possible topics worth visualizing with data. The news development community has been really instrumental in creating the environment for D3 to flourish, and it's increasingly important for aspiring journalists to have proficiency in tools such as D3.

The first dataset that we'll use is UNHCR's regional population data. The documentation for this endpoint is at data.unhcr.org/wiki/index.php/Get-population-regional.html. We'll create a bar for each population of displaced people. The first step is to get a basic container set up, which we can then populate with all of our delicious new ES2016 code. At the top of index.js, put the following code:

    export class BasicChart {
      constructor(data) {
        var d3 = require('d3'); // Require D3 via Webpack
        this.data = data;
        this.svg = d3.select('div#chart').append('svg');
      }
    }

    var chart = new BasicChart();

If you open this in your browser, you'll get the following error on your console:

    Uncaught Error: Cannot find module "d3"

This is because we haven't installed it yet.
You'll notice on line 3 of the preceding code that we import D3 by requiring it. If you've used D3 before, you might be more familiar with it attached to the window global object. This is essentially the same as including a script tag that references D3 in your HTML document, the only difference being that Webpack uses the Node version and compiles it into your bundle.js. To install D3, you use npm. In your project directory, type the following line:

    $ npm install d3 --save

This will pull the latest version of D3 from npmjs.org to the node_modules directory and save it in your package.json file. The package.json file is really useful; instead of keeping all your dependencies inside of your Git repository, you can easily redownload them all just by typing this line:

    $ npm install

If you go back to your browser and switch quickly to the Elements tab, you'll notice a new SVG element as a child of #chart. Go back to index.js. Let's add a bit more to the constructor before I explain what's going on here:

    export class BasicChart {
      constructor(data) {
        var d3 = require('d3'); // Require D3 via Webpack
        this.data = data;
        this.svg = d3.select('div#chart').append('svg');
        this.margin = {
          left: 30,
          top: 30,
          right: 0,
          bottom: 0
        };
        this.svg.attr('width', window.innerWidth);
        this.svg.attr('height', window.innerHeight);
        this.width = window.innerWidth - this.margin.left - this.margin.right;
        this.height = window.innerHeight - this.margin.top - this.margin.bottom;
        this.chart = this.svg.append('g')
          .attr('width', this.width)
          .attr('height', this.height)
          .attr('transform', `translate(${this.margin.left}, ${this.margin.top})`);
      }
    }

Okay, here we have the most basic container you'll ever make.
All it does is attach data to the class:

    this.data = data;

This selects the #chart element on the page, appending an SVG element and assigning it to another class property:

    this.svg = d3.select('div#chart').append('svg');

Then it creates a third class property, chart, as a group that's offset by the margins:

    this.width = window.innerWidth - this.margin.left - this.margin.right;
    this.height = window.innerHeight - this.margin.top - this.margin.bottom;
    this.chart = this.svg.append('g')
      .attr('width', this.width)
      .attr('height', this.height)
      .attr('transform', `translate(${this.margin.left}, ${this.margin.top})`);

Notice the snazzy new ES2016 string interpolation syntax—using `backticks`, you can then echo out a variable by enclosing it in ${ and }. No more concatenating! The preceding code is not really all that interesting, but wouldn't it be awesome if you never had to type that out again? Well! Because you're the total boss and are learning ES2016 like all the cool kids, you won't ever have to. Let's create our first child class! We're done with BasicChart for the moment. Now, we want to create our actual bar chart class:

    export class BasicBarChart extends BasicChart {
      constructor(data) {
        super(data);
      }
    }

This is probably very confusing if you're new to ES6. First off, we're extending BasicChart, which means all the class properties that we just defined a minute ago are now available for our BasicBarChart child class. However, if we instantiate a new instance of this, we get the constructor function in our child class. How do we attach the data object so that it's available for both BasicChart and BasicBarChart? The answer is super(), which merely runs the constructor function of the parent class. In other words, even though we don't assign data to this.data as we did previously, it will still be available there when we need it. This is because it was assigned via the parent constructor through the use of super().
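The extends/super mechanics can be tried outside of D3 entirely. Here is a minimal sketch using hypothetical Base and Child classes (not the chart code), so it runs in plain Node without D3 or Webpack:

```javascript
// A minimal sketch of the extends/super mechanics described above,
// using made-up Base/Child classes so it runs without any dependencies.
class Base {
  constructor(data) {
    this.data = data; // assigned once, in the parent constructor
  }
}

class Child extends Base {
  constructor(data) {
    super(data); // runs Base's constructor; this.data is now set
  }
  first() {
    return this.data[0]; // parent-assigned property, visible in the child
  }
}

const c = new Child([10, 20, 30]);
console.log(c.data);    // the child never assigned this.data itself
console.log(c.first()); // 10
```

The child class works with this.data even though only the parent ever assigns it, which is exactly what happens with BasicChart and BasicBarChart.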
We're almost at the point of getting some bars onto that graph; hold tight! But first, we need to define our scales, which decide how D3 maps data to pixel values. Add this code to the constructor of BasicBarChart:

    let x = d3.scale.ordinal()
      .rangeRoundBands([this.margin.left, this.width - this.margin.right], 0.1);

The x scale is now a function that maps inputs from an as-yet-unknown domain (we don't have the data yet) to a range of values between this.margin.left and this.width - this.margin.right, that is, between 30 and the width of your viewport minus the right margin, with some spacing defined by the 0.1 value. Because it's an ordinal scale, the domain will have to be discrete rather than continuous. The rangeRoundBands means the range will be split into bands that are guaranteed to be round numbers.

Hoorah! We have hit our first fancy new ES2016 feature! The let is the new var—you can still use var to define variables, but you should use let instead because it's limited in scope to the block, statement, or expression in which it is used. Meanwhile, var is used for more global variables, or variables that you want available regardless of the block scope. For more on this, visit http://mdn.io/let. If you have no idea what I'm talking about here, don't worry. It just means that you should define variables with let because they're more likely to act as you think they should and are less likely to leak into other parts of your code. It will also throw an error if you use it before it's defined, which can help with troubleshooting and preventing sneaky bugs.

Still inside the constructor, we define another scale named y:

    let y = d3.scale.linear().range([this.height, this.margin.bottom]);

Similarly, the y scale is going to map a currently unknown linear domain to a range between this.height and this.margin.bottom, that is, your viewport height and 30. Inverting the range is important because D3.js considers the top of a graph to be y=0.
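To see why the inverted range works, here is a hand-rolled sketch of what a linear scale does (this is not D3's implementation, just the underlying arithmetic): map a domain [d0, d1] onto a range [r0, r1] proportionally.

```javascript
// A hand-rolled sketch of a linear scale: not D3's actual implementation,
// just the proportional mapping it performs. With an inverted range,
// larger data values land nearer the top of the chart (y=0).
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const height = 400;        // stand-ins for this.height
const marginBottom = 30;   // and this.margin.bottom
const y = linearScale([0, 100], [height, marginBottom]); // inverted range

console.log(y(0));   // 400 — zero sits at the bottom of the chart
console.log(y(100)); // 30  — the largest value sits near the top
```

Swapping the two range values would flip the chart, which is exactly the upside-down symptom described next.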
If ever you find yourself trying to troubleshoot why a D3 chart is upside down, try switching the range values. Now, we define our axes. Add this just after the preceding line, inside the constructor:

    let xAxis = d3.svg.axis().scale(x).orient('bottom');
    let yAxis = d3.svg.axis().scale(y).orient('left');

We've told each axis what scale to use when placing ticks and which side of the axis to put the labels on. D3 will automatically decide how many ticks to display, where they should go, and how to label them. Now the fun begins! We're going to load in our data using Node-style require statements this time around. This works because our sample dataset is in JSON and it's just a file in our repository. For now, this will suffice for our purposes—no callbacks, promises, or observables necessary! Put this at the bottom of the constructor:

    let data = require('./data/chapter1.json');

Once or maybe twice in your life, the keys in your dataset will match perfectly and you won't need to transform any data. This almost never happens, and today is not one of those times. We're going to use basic JavaScript array operations to filter out invalid data and map that data into a format that's easier for us to work with:

    let totalNumbers = data.filter((obj) => {
        return obj.population.length;
      })
      .map((obj) => {
        return {
          name: obj.name,
          population: Number(obj.population[0].value)
        };
      });

This runs the data that we just imported through Array.prototype.filter, whereby any elements without a population array are stripped out. The resultant collection is then passed through Array.prototype.map, which creates an array of objects, each comprised of a name and a population value. We've turned our data into a list of two-value dictionaries. Let's now supply the data to our BasicBarChart class and instantiate it for the first time.
Consider the line that says the following:

    var chart = new BasicChart();

Replace it with this line:

    var myChart = new BasicBarChart(totalNumbers);

The myChart.data will now equal totalNumbers! Go back to the constructor in the BasicBarChart class. Remember the x and y scales from before? We can finally give them a domain and make them useful. Again, a scale is simply a function that maps an input domain to an output range:

    x.domain(data.map((d) => { return d.name; }));
    y.domain([0, d3.max(data, (d) => { return d.population; })]);

Hey, there's another ES2016 feature! Instead of typing function() {} endlessly, you can now just put () => {} for anonymous functions. Other than being six keystrokes less, the "fat arrow" doesn't bind the value of this to something else, which can make life a lot easier. For more on this, visit http://mdn.io/Arrow_functions. Since most D3 elements are objects and functions at the same time, we can change the internal state of both scales without assigning the result to anything. The domain of x is a list of discrete values. The domain of y is a range from 0 to the d3.max of our dataset—the largest value. Now we're going to draw the axes on our graph:

    this.chart.append('g')
      .attr('class', 'axis')
      .attr('transform', `translate(0, ${this.height})`)
      .call(xAxis);

We've appended an element called g to the graph, given it the axis CSS class, and moved the element to a place in the bottom-left corner of the graph with the transform attribute. Finally, we call the xAxis function and let D3 handle the rest.
The drawing of the other axis works exactly the same, but with different arguments:

    this.chart.append('g')
      .attr('class', 'axis')
      .attr('transform', `translate(${this.margin.left}, 0)`)
      .call(yAxis);

Now that our graph is labeled, it's finally time to draw some data:

    this.chart.selectAll('rect')
      .data(data)
      .enter()
      .append('rect')
      .attr('class', 'bar')
      .attr('x', (d) => { return x(d.name); })
      .attr('width', x.rangeBand())
      .attr('y', (d) => { return y(d.population); })
      .attr('height', (d) => { return this.height - y(d.population); });

Okay, there's plenty going on here, but this code is saying something very simple: for all rectangles (rect) in the graph, load our data, go through it, append a rect for each item, and then define some attributes. Ignore the fact that there aren't any rectangles initially; what you're doing is creating a selection that is bound to data and then operating on it. I can understand that it feels a bit weird to operate on non-existent elements (this was personally one of my biggest stumbling blocks when I was learning D3), but it's an idiom that shows its usefulness later on when we start adding and removing elements due to changing data. The x scale helps us calculate the horizontal positions, and rangeBand gives the width of the bar. The y scale calculates vertical positions, and we manually get the height of each bar from y to the bottom. Note that whenever we needed a different value for every element, we defined an attribute as a function (x, y, and height); otherwise, we defined it as a value (width). Keep this in mind when you're tinkering.

Let's add some flourish and make each bar grow out of the horizontal axis. Time to dip our toes into animations! Modify the code you just added to resemble the following.
I've highlighted the lines that are different:

    this.chart.selectAll('rect')
      .data(data)
      .enter()
      .append('rect')
      .attr('class', 'bar')
      .attr('x', (d) => { return x(d.name); })
      .attr('width', x.rangeBand())
      .attr('y', () => { return y(this.margin.bottom); })
      .attr('height', 0)
      .transition()
        .delay((d, i) => { return i * 20; })
        .duration(800)
        .attr('y', (d) => { return y(d.population); })
        .attr('height', (d) => {
          return this.height - y(d.population);
        });

The difference is that we statically put all bars at the bottom (margin.bottom) and then entered a transition with .transition(). From here on, we define the transition that we want. First, we wanted each bar's transition delayed by 20 milliseconds using i*20. Most D3 callbacks will return the datum (or "whatever data has been bound to this element," which is typically set to d) and the index (or the ordinal number of the item currently being evaluated, which is typically i) while setting the this argument to the currently selected DOM element. Because of this last point, we use the fat arrow—so that we can still use the class's this.height property. Otherwise, we'd be trying to find the height property on our SVGRect element, which we're midway through trying to define! This gives the histogram a neat effect, gradually appearing from left to right instead of jumping up at once. Next, we say that we want each animation to last just shy of a second, with .duration(800). At the end, we define the final values for the animated attributes—y and height are the same as in the previous code—and D3 will take care of the rest. Save your file and the page should auto-refresh in the background. If everything went according to plan, you should have a chart that looks like the following:

According to this UNHCR data from June 2015, by far the largest number of displaced persons are from Syria. Hey, look at this—we kind of just did some data journalism here!
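Incidentally, the filter/map transform used on the UNHCR data can be tried on its own in plain Node. The records below are invented for illustration, but they follow the same shape as the endpoint's output (a name plus an array of population readings with string values):

```javascript
// A standalone sketch of the earlier filter/map step, run against
// made-up records in the same shape as the UNHCR endpoint's output.
// The region names and numbers here are invented for illustration only.
const data = [
  { name: 'Region A', population: [{ value: '4200000' }] },
  { name: 'Region B', population: [] }, // no readings; filtered out
  { name: 'Region C', population: [{ value: '1500000' }] }
];

const totalNumbers = data
  .filter((obj) => obj.population.length)       // drop records with no figures
  .map((obj) => ({
    name: obj.name,
    population: Number(obj.population[0].value) // strings become numbers
  }));

console.log(totalNumbers);
// Region B is gone; the remaining population values are numeric
// and ready to feed into the scales' domains.
```

This is the whole data-cleaning story of the chart in miniature: filter out what can't be plotted, then map what remains into plain name/value pairs.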
Remember that you can look at the entire code on GitHub at http://github.com/aendrew/learning-d3/tree/chapter1 if you didn't get something similar to the preceding screenshot. We still need to do just a bit more, mainly by using CSS to style the SVG elements. We could have just gone to our HTML file and added CSS, but then that means opening that yucky index.html file. And where's the fun in writing HTML when we're learning some newfangled JavaScript?! First, create an index.css file in your src/ directory:

    html, body {
      padding: 0;
      margin: 0;
    }

    .axis path, .axis line {
      fill: none;
      stroke: #eee;
      shape-rendering: crispEdges;
    }

    .axis text {
      font-size: 11px;
    }

    .bar {
      fill: steelblue;
    }

Then just add the following line to index.js:

    require('./index.css');

I know. Crazy, right?! No <style> tags needed! It's worth noting that anything involving require is the result of a Webpack loader; in this article, we've used both the CSS/Style and JSON loaders. Although the author of this text is a fan of Webpack, all we're doing is compiling the styles into bundle.js with Webpack instead of requiring them globally via a <style> tag. This is cool because instead of uploading a dozen files when deploying your finished code, you effectively deploy one optimized bundle. You can also scope CSS rules to be particular to when they're being included and all sorts of other nifty stuff; for more information, refer to github.com/webpack/css-loader#local-scope.

Looking at the preceding CSS, you can now see why we added all those classes to our shapes—we can now directly reference them when styling with CSS. We made the axes thin, gave them a light gray color, and used a smaller font for the labels. The bars should be light blue. Save and wait for the page to refresh. We've made our first D3 chart! I recommend fiddling with the values for width, height, and margin inside of BasicChart to get a feel of the power of D3.
You'll notice that everything scales and adjusts to any size without you having to change other code. Smashing!

Summary

In this article, you learned what D3 is and took a glance at the core philosophy behind how it works. You also set up your computer for prototyping of ideas and to play with visualizations. This environment will be assumed throughout the article. We went through a simple example and created an animated histogram using some of the basics of D3. You learned about scales and axes, that the vertical axis is inverted, that any property defined as a function is recalculated for every data point, and that we use a combination of CSS and SVG to make things beautiful. We also did a lot of fancy stuff with ES2016, Babel, and Webpack and got Node.js installed. Go us! Most of all, this article has given you the basic tools so that you can start playing with D3.js on your own. Tinkering is your friend! Don't be afraid to break stuff—you can always reset to a chapter's default state by running $ git reset --soft origin/chapter1, replacing 1 with whichever chapter you're on. Next, we'll be looking at all this a bit more in depth, specifically how the DOM, SVG, and CSS interact with each other. This article discussed quite a lot, so if some parts got away from you, don't worry.

Resources for Article:

Further resources on this subject:

An Introduction to Node.js Design Patterns [article]
Developing Node.js Web Applications [article]
Developing a Basic Site with Node.js and Express [article]
article-image-selecting-and-analyzing-digital-evidence
Packt
08 Apr 2016
13 min read

Selecting and Analyzing Digital Evidence
In this article, Richard Boddington, the author of Practical Digital Forensics, explains how the recovery and preservation of digital evidence has traditionally involved imaging devices and storing the data in bulk in a forensic file or, more effectively, in a forensic image container, notably the IlookIX .ASB container. The recovery of smaller, more manageable datasets from larger datasets on a device or network system using the ISeekDiscovery automaton is now a reality. Whether the practitioner examines an image container or an extraction of information in the ISeekDiscovery container, it should be possible to overview the recovered information and develop a clearer perception of the type of evidence that should be located. Once acquired, the image or device may be searched to find evidence, and locating evidence requires a degree of analysis combined with practitioner knowledge and experience. The process of selection involves analysis, and as new leads open up, the search for more evidence intensifies until, ultimately, a thorough search is completed. The searching process involves the analysis of possible evidence, from which evidence may be discarded, collected, or tagged for later reexamination, thereby instigating the selection process. The final two stages of the investigative process are the validation of the evidence, aimed at determining its reliability, relevance, authenticity, accuracy, and completeness, and finally, the presentation of the evidence to interested parties, such as the investigators, the legal team, and ultimately, the legal adjudicating body. (For more resources related to this topic, see here.)

Locating digital evidence

Locating evidence in the all-too-common large dataset requires some filtration of extraneous material, which has, until recently, been a mainly manual task of sorting the wheat from the chaff.
However, it is important to clear away the clutter and noise of busy operating systems and applications, from which only a small amount of evidence really needs to be gleaned. Search processes involve searching in a file system and inside files. Common searches for files are based on names or patterns in their names, keywords in their content, and temporal data (metadata) such as the last accessed or written time. A pragmatic approach to the examination is necessary, where the onus is on the practitioner to create a list of keywords or search terms to cull specific, probative, and case-related information from very large groups of files.

Searching desktops and laptops

A home computer network is normally linked to the Internet via a modem and various peripheral devices: a scanner, a printer, an external hard drive, a thumb drive, a digital camera, and a mobile phone, shared by a range of users. An office network would be a more complicated system. The connections between the devices, the Internet, and the terminal leave a range of traces and logging records on the terminal and on some of the devices and the Internet. E-mail messages will be recorded externally on the e-mail server, the printer may keep a record of print jobs, and the external storage devices and communication media also leave logs and data linked to the terminal. All of this data may assist in the reconstruction of key events and provide evidence related to the investigation.

Using the logical examination process (booting up the image), it is possible to recover a limited number of deleted files and reconstruct some of the key events of relevance to an investigation. It may not always be possible to boot up a forensic image and view it in its logical format, which is easier and more familiar to users.
However, viewing the data inside a forensic image in its physical format provides unaltered metadata and a greater number of deleted, hidden, and obscured files, which provide accurate information about applications and files. It is possible to view the containers that hold these histories and search records that have been recovered and stored in a forensic file container.

Selecting digital evidence

For those unfamiliar with investigations, it is quite common to misread the readily available evidence and draw incorrect conclusions. Business managers attempting to analyze what they consider to be the facts of a case would be wise to seek legal assistance in selecting and evaluating the evidence on which they may wish to base a case. Selecting the evidence involves analysis of the located evidence to determine what events occurred in the system, their significance, and their probative value to the case.

The selection analysis stage requires the practitioner to examine the available digital evidence carefully, ensuring that they do not misinterpret it or make imprudent presumptions without carefully cross-checking the information. It is a fact-finding process in which an attempt is made to develop a plausible reconstruction of the facts. As in conventional crime investigations, practitioners should look for evidence that suggests or indicates motive (why?), means (how?), and opportunity (when?) for suspects to commit the crime; in cases dependent on digital evidence, however, this can be a vexatious process. There are often too many potential suspects, which complicates the process of linking the suspect to the events.

The following figure shows a typical family network setup using Wi-Fi connections to the home modem, which facilitates connection to the Internet. In this case, the parents provided the broadband service for themselves and for three other family members.
The girlfriend of one of the children completed her university assignments on his computer and synchronized her iPad to his device.

The complexity of a typical household network and determining the identity of the transgressor

More effective forensic tools

Various forensic tools are available to assist the practitioner in selecting and collating data for examination, analysis, and investigation. Sorting order from the chaos of even a small personal computer can be a time-consuming and frustrating process. As the digital forensic discipline develops, better and more reliable forensic tools have been developed to assist practitioners in locating, selecting, and collating evidence from larger, complex datasets. To varying degrees, most digital forensic tools used to view and analyze forensic images or attached devices provide helpful user interfaces for locating and categorizing information relevant to the examination.

The most advanced application that provides access to and convenient viewing of files is the Category Explorer feature in ILookIX, which divides files by type, signature, and properties. Category Explorer also allows the practitioner to create custom categories to group files by relevance. For example, in a criminal investigation involving a conspiracy, the practitioner could create a category for the first individual and a category for the second individual. As files are reviewed, they would then be added to either or both categories. Unlike tags, files can be added to multiple categories, and the categories can be given descriptive names.

Deconstructing files

The deconstruction of files involves processing compound files such as archives, e-mail stores, registry stores, or other files to extract useful and usable data from a complex file format and generate reports. Manual deconstruction adds significantly to the time taken to complete an examination.
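As a rough sketch of what such deconstruction involves, a zip-style archive can be broken into its member files in a few lines of Python. This is a generic illustration only, not how ILookIX works internally:

```python
import zipfile
from io import BytesIO

def deconstruct_archive(data):
    """Break a zip-style compound file into its member files,
    returning (name, content) pairs for further examination."""
    members = []
    with zipfile.ZipFile(BytesIO(data)) as zf:
        for name in zf.namelist():
            # Each extracted member can now be categorized,
            # indexed, or deconstructed further in its own right.
            members.append((name, zf.read(name)))
    return members
```

Real forensic suites apply the same principle recursively to e-mail stores, registry hives, and thumbnail caches, not just archives.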
Deconstructable files are compound files that can be broken down further into smaller parts such as e-mails, archives, or thumb stores of JPG files. Once the deconstruction is completed, the files will move into either the deconstructed files folder or the deconstruction failed files folder. Deconstructable files will now be broken out further: e-mail, graphics, archives, and so on.

Searching for files

Indexing is the process of generating a table of text strings that can then be searched almost instantly any number of times. The two main uses of indexing are to create a dictionary to use when cracking passwords and to index words for almost-instant searching. Indexing is also valuable when creating a dictionary or using any of the analysis functions built in to ILookIX. ILookIX facilitates the indexing of the entire media at the time of initial processing, all at once. This can also be done after processing. Indexing facilitates searching through files and archives, the Windows Registry, e-mail lists, and unallocated space. This function is highly customizable via the setup option in order to optimize for searching or for creating a custom dictionary for password cracking. Sound indexing ensures speedy and accurate searching.

Searching is the process of having ILookIX look through the evidence for a specific item, such as a string of text or an expression. An expression, in terms of searching, is a pattern used to structure data in a search, such as a credit card number or e-mail address.

The Event Analysis tool

ILookIX's Event Analysis tool provides the practitioner with a graphical representation of events on the subject system, such as file creation, access, or modification; e-mails sent or received; and other events such as the modification of the Master File Table on an NTFS system. The application allows the practitioner to zoom in on any point on the graph to view more specific details.
Clicking on any bar on the graph will return the view to the main ILookIX window and display the items from the selected date bar in the List Pane. This can be most helpful when analyzing events during specific periods.

The Lead Analysis tool

Lead Analysis is an interactive evidence model embedded in ILookIX that allows the practitioner to assimilate known facts into a graphic representation that directly links unseen objects. It provides answers as the practitioner increases the detail of the design surface and brings into view specific relationships that could otherwise go unseen. The primary aim of Lead Analysis is to help discover links within the case data that may not be evident or intuitive, and that the practitioner may not be aware of directly or may have too little background knowledge of to form relationships manually. Instead of finding and making notes of various pieces of information, the analysis is presented as an easy-to-use link model. The complexity of the modeling is removed to give the clearest possible method of discovery.

The analysis is based on the current index database, so it is essential to index the case data prior to initiating an analysis. Once a list of potential links has been generated, it is important to review them to see whether any are potentially relevant. Highlight any that are, and it will then be possible to look for words in the catalogues if they have been included. In the example scenario, the word divorce was located, as it was known that Sarah was divorced from the owner of the computer (the initial suspect). By selecting any word with a single left-click and clicking on the green arrow to link it to Sarah, as shown in the following screenshot, relationships can be uncovered that are not always clear during the first inspection of the data. Each of the stated facts becomes one starting lead on the canvas.
If the nodes are related, it is easy to model the relationship by linking them together manually: select the first Lead Object to link, right-click, and select Add a New Port from the menu. This is then repeated for the second Lead Object the practitioner wants to link. By clicking on the new port of the selected object and dragging to the port of the Lead Object it should be linked to, a line will appear linking the two together. It is then possible to iterate this process using each start node or discovered node until it is possible to make sense of the total case data. A simple relationship between suspects, locations, and even concepts is illustrated in the following screenshot:

ILookIX Lead Analysis discovering relationships between various entities

Analyzing e-mail datasets

Analyzing and selecting evidence from large e-mail datasets is a common task for the practitioner. ILookIX's embedded E-mail Linkage Analysis application is an interactive evidence model that helps practitioners discover links between the correspondents within e-mail data. The analysis is presented as an easy-to-use link model; the complexity of the modeling is removed to provide the clearest possible method of discovery. The results of the analysis are saved at the end of the modeling session for future editing. If there is a large amount of e-mail to process, generating this analysis may take a few minutes.

Once the analysis is displayed, the user will see the e-mail linkage itself. A line between correspondents indicates that they have a relationship of some type. Line thickness indicates the frequency of traffic between two correspondents; thicker flow lines indicate more traffic. On the canvas, once the analysis is generated, the user may select any e-mail addressee node by left-clicking on it once.
ILookIX will then initiate a search for that addressee and list all e-mails in which the selected addressee was a correspondent. Creating the analysis is really simple, and one of the most immediately valuable resources it provides is group identification, as shown in the following screenshot. Users may make their own connection lines by clicking on an addressee node point and dragging to another node point. Nodes can be deleted to allow linkage between smaller groups of individuals.

The E-mail Linkage tool showing relationships of possible relevance to a case

The Volume Shadow Copy analysis tools

Shadow volumes, also known as the Volume Snapshot Service (VSS), use a service that creates point-in-time copies of files. The service is built in to versions of Windows Vista, 7, 8, and 10 and is turned on by default. ILookIX can recover true copies of overwritten files from shadow volumes, as long as they resided on the volume at the time the snapshot was created. VSS recovery is a method of recovering extant and deleted files from the volume snapshots available on the system. ILookIX, unlike any other forensic tool, is capable of reconstructing volume shadow copies, either differential or full, including deleted files and folders.

In the test scenario, the tool recovered a total of 87,000 files, equating to conventional tool recovery rates. Using ILookIX's Xtreme File Recovery, some 337,000 files were recovered. The Maximal Full Volume Shadow Snapshot application recovered a total of 797,00 files. Using the differential process, 354,000 files were recovered, which filtered out 17,000 additional files for further analysis. This enabled the detection of e-mail messages, attachments, and Windows Registry changes that would normally remain hidden.

Summary

This article described in detail the process of locating and selecting evidence in terms of a general process.
It also further explained the nature of digital evidence and provided examples of its value in supporting a legal case. Various advanced analysis and recovery tools were demonstrated, showing the reader how technology can speed up the location and selection processes and make them more efficient. Some of these tools are not new but have been enhanced, while others are innovative and seek out evidence normally unavailable to the practitioner.

Resources for Article:

Further resources on this subject:

Mobile Phone Forensics – A First Step into Android Forensics [article]

Introduction to Mobile Forensics [article]

BackTrack Forensics [article]

article-image-using-native-sdks-and-libraries-react-native
Emilio Rodriguez
07 Apr 2016
6 min read
Save for later

Using Native SDKs and Libraries in React Native

Emilio Rodriguez
07 Apr 2016
6 min read
When building an app in React Native, we may end up needing to use third-party SDKs or libraries. Most of the time, these are only available in their native versions and, therefore, only accessible as Objective-C or Swift libraries in the case of iOS apps, or as Java classes for Android apps. Only in a few cases are these libraries written in JavaScript, and even then, they may need pieces of functionality not available in React Native, such as DOM access or Node.js-specific functionality. In my experience, this is one of the main reasons driving developers, and IT decision makers in general, away from React Native when considering a mobile development framework for their production apps.

The creators of React Native were fully aware of this potential pitfall and left a door open in the framework to make sure integrating third-party software was not only possible but also quick, powerful, and doable by any non-iOS/Android native developer (that is, most React Native developers). As a JavaScript developer, having to write Objective-C or Java code may not be very appealing at first, but once you realize that the whole process of integrating a native SDK can take as little as eight lines of code split across two files (one header file and one implementation file), the fear quickly fades away and the feeling of being able to perform even the most complex task in a mobile app starts to take over. Suddenly, the whole power of iOS and Android can be at any React developer's disposal.

To better illustrate how to integrate a third-party SDK, we will use one of the easiest payment providers to integrate: Paymill. If we take a look at their site, we notice that only iOS and Android SDKs are available for mobile payments. That would leave out every app written in React Native if it weren't for the ability of this framework to communicate with native modules. For the sake of convenience, I will focus this article on the iOS module.
Step 1: Create two native files for our bridge.

We need to create an Objective-C class that will serve as a bridge between our React code and Paymill's native SDK. Normally, an Objective-C class is made of two files, a .m and a .h, holding the module implementation and the header for this module, respectively. To create the .h file, we can right-click on our project's main folder in Xcode > New File > Header File. In our case, I will call this file PaymillBridge.h. For React Native to communicate with our bridge, we need to make it implement the RCTBridgeModule protocol included in React Native. To do so, we only have to make sure our .h file looks like this:

// PaymillBridge.h
#import "RCTBridgeModule.h"

@interface PaymillBridge : NSObject <RCTBridgeModule>
@end

We can follow a similar process to create the .m file: right-click our project's main folder in Xcode > New File > Objective-C File. The module implementation file should include the RCT_EXPORT_MODULE macro (also provided in any React Native project):

// PaymillBridge.m
@implementation PaymillBridge

RCT_EXPORT_MODULE();

@end

A macro is just a predefined piece of functionality that can be imported just by calling it. This will make sure React is aware of this module and will make it available for importing in your app. Now we need to expose the method we need in order to use Paymill's services from our JavaScript code. For this example, we will be using Paymill's method to generate a token representing a credit card based on a public key and some credit card details: generateTokenWithPublicKey. To do so, we need to use another macro provided by React Native: RCT_EXPORT_METHOD.
// PaymillBridge.m
@implementation PaymillBridge

RCT_EXPORT_MODULE();

RCT_EXPORT_METHOD(generateTokenWithPublicKey:(NSString *)publicKey
                  cardDetails:(NSDictionary *)cardDetails
                  callback:(RCTResponseSenderBlock)callback)
{
  //… Implement the call as described in the SDK's documentation …
  callback(@[[NSNull null], token]);
}

@end

In this step, we have to write some Objective-C, but most likely it will be a very simple piece of code based on the examples stated in the SDK's documentation. One interesting point is how to send data from the native SDK to our React code. To do so, you need to pass a callback, as I did in the last parameter of our exported method. Callbacks in React Native's bridges have to be defined as RCTResponseSenderBlock. Once we do this, we can call the callback with an array of parameters, which will be sent as parameters to our JavaScript function in React Native (in our case, we decided to pass two parameters back: an error set to null, following the error-handling conventions of Node.js, and the token generated natively by Paymill).

Step 2: Call our bridge from our React Native code.

Once the module is properly set up, React Native makes it available in our app just by importing it from our JavaScript code:

// PaymentComponent.js
var Paymill = require('react-native').NativeModules.PaymillBridge;

Paymill.generateTokenWithPublicKey(
  '56s4ad6a5s4sd5a6',
  cardDetails,
  function(error, token){
    console.log(token);
  });

NativeModules holds the list of modules we created implementing the RCTBridgeModule protocol. React Native makes them available by the name we chose for our Objective-C class (PaymillBridge in our example). Then, we can call any exported native method as a normal JavaScript method from our React Native component or library.

Going Even Further

That should do it for any basic SDK, but React Native gives developers a lot more control over how to communicate with native modules.
For example, we may want to force the module to be run on the main thread. For that, we just need to add an extra method to our native module implementation:

// PaymillBridge.m
@implementation PaymillBridge
//...

- (dispatch_queue_t)methodQueue
{
  return dispatch_get_main_queue();
}

Just by adding this method to our PaymillBridge.m, React Native will force all the functionality related to this module to be run on the main thread, which is needed when calling main-thread-only iOS APIs. And there is more: promises, exporting constants, sending events to JavaScript, and so on. More complex functionality can be found in the official React Native documentation; the topics covered in this article, however, should solve 80 percent of the cases encountered when implementing most third-party SDKs.

About the Author

Emilio Rodriguez started working as a software engineer for Sun Microsystems in 2006. Since then, he has focused his efforts on building a number of mobile apps with React Native while contributing to the React Native project. These contributions helped him understand how deep and powerful this framework is.

article-image-how-use-currying-swift-fun-and-profit
Alexander Altman
06 Apr 2016
5 min read
Save for later

How to Use Currying in Swift for Fun and Profit

Alexander Altman
06 Apr 2016
5 min read
Swift takes inspiration from functional languages in a lot of its features, and one of those features is currying. The idea behind currying is relatively straightforward, and Apple has already taken the time to explain the basics of it in The Swift Programming Language. Nonetheless, there's a lot more to currying in Swift than first meets the eye.

What is currying?

Let's say we have a function, f, which takes two parameters, a: Int and b: String, and returns a Bool:

func f(a: Int, _ b: String) -> Bool {
  // … do something here …
}

Here, we're taking both a and b simultaneously as parameters to our function, but we don't have to do it that way! We can just as easily write this function to take just a as a parameter and then return another function that takes b as its only parameter and returns the final result:

func f(a: Int) -> ((String) -> Bool) {
  return { b in
    // … do something here …
  }
}

(I've added a few extra parentheses for clarity, but Swift is actually just fine if you write String -> Bool instead of ((String) -> Bool); the two notations mean exactly the same thing.)
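If it helps to see the same uncurried-to-curried transformation outside Swift, here is an equivalent sketch in plain Python. This is purely an analogy; the names f and f_curried and the body are mine, not from the article:

```python
def f(a, b):
    """Uncurried: takes both parameters at once."""
    # A stand-in body for the article's "do something here".
    return len(b) > a

def f_curried(a):
    """Curried: takes `a` and returns a function awaiting `b`."""
    def g(b):
        return len(b) > a  # `a` is captured from the outer call
    return g
```

In both languages, the curried version is called as f(a)("b"); the Swift closure formulation above behaves the same way.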
That closure-based formulation works, but you can also use a nested function for the exact same effect:

func f(a: Int) -> ((String) -> Bool) {
  func g(b: String) -> Bool {
    // … do something here …
  }
  return g
}

Of course, Swift wouldn't be Swift without providing a convenient syntax for things like this, so there is even a third way to write the curried version of f, and it's (usually) preferred over either of the previous two:

func f(a: Int)(_ b: String) -> Bool {
  // … do something here …
}

Any of these iterations of our curried function f can be called like this:

let res: Bool = f(1)("hello")

This should look very similar to the way you would call the original uncurried f:

let res: Bool = f(1, "hello")

Currying isn't limited to just two parameters, either; here's an example of a partially curried function of five parameters (taking them in groups of two, one, and two):

func weirdAddition(x: Int, use useX: Bool)(_ y: Int)(_ z: Int, use useZ: Bool) -> Int {
  return (useX ? x : 0) + y + (useZ ? z : 0)
}

How is currying used in Swift?

Believe it or not, Swift actually uses currying all over the place, even if you don't notice it. Probably the most prominent example is that of instance methods, which are just curried type methods:

// This:
NSColor.blueColor().shadowWithLevel(1/3)
// …is the same as this:
NSColor.shadowWithLevel(NSColor.blueColor())(1/3)

But there's a much deeper implication of currying's availability in Swift: all functions secretly take only one parameter! How is this possible, you ask? It has to do with how Swift treats tuples. A function that "officially" takes, say, three parameters actually only takes one parameter that happens to be a three-tuple. This is perhaps most visible when exploited via the higher-order collection methods:

func dotProduct(xVec: [Double], _ yVec: [Double]) -> Double {
  // Note that (this particular overload of) the `*` operator
  // has the signature `(Double, Double) -> Double`.
  return zip(xVec, yVec).map(*).reduce(0, combine: +)
}

It would seem that anything you can do with tuples, you can do with a function parameter list, and vice versa; in fact, that is almost true. The four features of function parameter lists that don't carry over directly into tuples are variadic, inout, defaulted, and @autoclosure parameters. You can, technically, form a variadic, inout, defaulted, or @autoclosure tuple type, but if you try to use it in any context other than as a function's parameter type, swiftc will give you an error.

What you definitely can do with tuples is use named values, notwithstanding the unfortunate prohibition on single-element tuples in Swift (named or not). Apple provides some information on tuples with named elements in The Swift Programming Language; it also gives an example of one in the same book. It should be noted that the names given to tuple elements are somewhat ephemeral in that they can very easily be introduced, eliminated, and altered via implicit conversions. This applies regardless of whether the tuple type is that of a standalone value or of a function's parameter:

// converting names in a function's parameter list
func printBoth(first x: Int, second y: String) {
  print(x, y, separator: ", ")
}
let printTwo: (a: Int, b: String) -> Void = printBoth

// converting names in a standalone tuple type
// (for some reason, Swift dislikes assigning `firstAndSecond`
// directly to `aAndB`, but going through `nameless` is fine)
let firstAndSecond: (first: Int, second: String) = (first: 1, second: "hello")
let nameless: (Int, String) = firstAndSecond
let aAndB: (a: Int, b: String) = nameless

Currying, with its connection to tuples, is a very powerful feature of Swift. Use it wherever it seems helpful, and the language will be more than happy to oblige.

About the author

Alexander Altman is a functional programming enthusiast who enjoys the mathematical and ergonomic aspects of programming language design.
He's been working with Swift since the language's first public release, and he is one of the core contributors to the TypeLift project.
article-image-caching-symfony
Packt
05 Apr 2016
15 min read
Save for later

Caching in Symfony

Packt
05 Apr 2016
15 min read
In this article by Sohail Salehi, author of the book Mastering Symfony, we are going to discuss performance improvement using cache. Caching is a vast subject and needs its own book to be covered properly. However, in our Symfony project, we are interested in two types of caches only:

Application cache

Database cache

We will see what caching facilities are provided in Symfony by default and how we can use them. We are going to apply the caching techniques to some methods in our projects and watch the performance improvement. By the end of this article, you will have a firm understanding of the usage of HTTP cache headers in the application layer and caching libraries.

(For more resources related to this topic, see here.)

Definition of cache

A cache is a temporary place that stores contents that can be served faster when they are needed. Considering that we already have a permanent place on disk to store our web contents (templates, code, and database tables), a cache sounds like duplicate storage. That is exactly what caches are. They are duplicates, and we need them because, in return for consuming extra space to store the same data, they provide very fast responses to some requests. So this is a very good trade-off between storage and performance.

To give you an example of how good this deal can be, consider the following image. On the left side, we have a usual client/server request/response model. Let's say the response latency is two seconds and there are only 100 users who hit the same content per hour:

On the right side, however, we have a cache layer that sits between the client and the server. What it does, basically, is receive the same request and pass it to the server. The server sends a response to the cache and, because this response is new to the cache, the cache saves a copy (duplicate) of the response and then passes it back to the client. The latency is now 2 + 0.2 seconds. However, it doesn't add up, does it?
The purpose of using a cache was to improve the overall performance and reduce latency, yet it has already added more delay to the cycle. With this result, how could it possibly be beneficial? The answer is in the following image:

Now, with the response being cached, imagine the same request comes through. (We have about 100 requests/hour for the same content, remember?) This time, the cache layer looks into its space, finds the response, and sends it back to the client, without bothering the server. The latency is 0.2 seconds. Of course, these are only imaginary numbers and situations; however, in the simplest form, this is how a cache works. It might not be very helpful on a low-traffic website; however, when we are dealing with thousands of concurrent users on a high-traffic website, we can appreciate the value of caching.

So, according to the previous images, we can define some terminology to use in this article as we continue. In the first image, when a client asked for that page, the cached copy didn't exist yet, and the cache layer had to store a copy of its contents for future references. This is called a Cache Miss. However, in the second image, we already had a copy of the contents stored in the cache and we benefited from it. This is called a Cache Hit.

Characteristics of a good cache

If you do a quick search, you will find that a good cache is defined as one that misses only once. In other words, a cache miss happens only if the content has not been requested before. This feature is necessary, but it is not sufficient. To clarify the situation a little bit, let's add two more terms here. A cache can be in one of the following states: fresh (it has the same contents as the original response) and stale (it has the old response's contents, which have since changed on the server). The important question here is: for how long should a cache be kept? We have the power to define the freshness of a cache by setting an expiration period.
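The miss-then-hit behavior and the expiration period described above can be modeled in a few lines. The following Python sketch is a toy model for illustration only, not Symfony's gateway cache:

```python
import time

class ToyCache:
    """Minimal expiration-model cache: a hit is served only
    while the stored copy is still considered fresh."""
    def __init__(self):
        self.store = {}   # key -> (response, expires_at)
        self.misses = 0

    def fetch(self, key, ttl, backend):
        entry = self.store.get(key)
        if entry and entry[1] > time.time():
            return entry[0]               # Cache Hit: server untouched
        self.misses += 1                  # Cache Miss: ask the server
        response = backend(key)
        self.store[key] = (response, time.time() + ttl)
        return response
```

With a two-second backend and 100 identical requests per hour, only the first request pays the full price; every subsequent request within the TTL is a hit.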
We will see how to do this in the coming sections. However, just because we have this power doesn't mean that we are always right about the content's freshness. Consider the situation shown in the following image:

If we cache a content for a long time, a cache miss won't happen again (which satisfies the preceding definition), but the content might lose its freshness as the dynamic resources change on the server. To give you an example, nobody likes to read the news of three months ago when they open the BBC website. Now, we can modify the definition of a good cache as follows: a cache strategy is considered to be good if a cache miss for the same content happens only once, while the cached contents are still fresh.

This means that defining the cache expiry time won't be enough; we need another strategy to keep an eye on cache freshness. This happens via a cache validation strategy. When the server sends a response, we can set the validation rules on the basis of what really matters on the server side, and this way, we can keep the contents stored in the cache fresh, as shown in the following image. We will see how to do this in Symfony soon.

Caches in a Symfony project

In this article, we will focus on two types of caches: the gateway cache (also called a reverse proxy cache) and the Doctrine cache. As you might have guessed, the gateway cache deals with all of the HTTP cache headers. Symfony comes with a very strong gateway cache out of the box. All you need to do is activate it in your front controller and then start defining your cache expiration and validation strategies inside your controllers. That said, it does not mean that you are forced or restricted to using the Symfony cache only. If you prefer other reverse proxy cache libraries (such as Varnish), you are welcome to use them.
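A validation check of the kind described above can be sketched conceptually. In this hypothetical Python fragment, the validator token is a hash of the response body, which is one common choice for an ETag, though HTTP does not mandate any particular scheme:

```python
import hashlib

def make_etag(content):
    """Derive a validator token from the response body."""
    return hashlib.sha256(content.encode()).hexdigest()[:16]

def revalidate(cached_etag, current_content):
    """Return (status, body): 304 means the cached copy is still
    valid, so the full body need not be sent again."""
    if cached_etag == make_etag(current_content):
        return 304, None
    return 200, current_content
```

The point of the validation model is the cheap 304 path: the server recomputes only the token, not the full response transfer, and the cache keeps serving its stored copy.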
The caching configuration in Symfony is transparent, so you don't need to change a single line inside your controllers when you change your caching library. Just modify your config.yml file and you will be good to go. However, we all know that caching is not for the application layer and views only. Sometimes, we need to cache database-related content as well. For our Doctrine ORM, this includes the metadata cache, query cache, and result cache. Doctrine comes with its own bundle to handle these types of caches, and it uses a wide range of libraries (APC, Memcached, Redis, and so on) to do the job. Again, we don't need to install anything to use this cache bundle. If we have Doctrine installed already, all we need to do is add some configuration, and then all the Doctrine caching power will be at our disposal. Putting these two caching types together gives us a big picture for caching our Symfony project: As you can see in this image, we might have a problem with the final cached page. Imagine that we have a static page that might change once a week, and in this page there are some blocks that might change on a daily or even hourly basis, as shown in the following image. The User dashboard in our project is a good example. Thus, if we set the expiration on the gateway cache to one week, we cannot reflect all of those rapid updates in our project and task controllers. To solve this problem, we can leverage Edge Side Includes (ESI) inside Symfony. Basically, any part of the page that has been defined inside an ESI tag can tell its own cache story to the gateway cache. Thus, we can have multiple cache strategies living side by side inside a single page. With this solution, our big picture will look as follows: So, we are going to use the default Symfony and Doctrine caching features for the application and model layers, and you can also use some popular third-party bundles for more advanced settings.
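The "multiple cache strategies side by side" idea behind ESI can be illustrated with a toy fragment cache in Python. This is purely conceptual (real ESI assembly is done by the gateway cache, not by application code, and the fragment names and TTLs here are made up):

```python
import time

class FragmentCache:
    """Toy per-fragment cache: each fragment carries its own TTL, so a
    mostly-static page shell can embed rapidly changing blocks."""

    def __init__(self):
        self._store = {}   # name -> (stored_at, html)

    def fetch(self, name, render, ttl):
        entry = self._store.get(name)
        now = time.time()
        if entry and now - entry[0] < ttl:
            return entry[1]            # fragment is still fresh
        html = render()                # regenerate only this fragment
        self._store[name] = (now, html)
        return html

cache = FragmentCache()

def build_page():
    # Hypothetical split: the layout may be cached for a week,
    # while the dashboard block expires every minute.
    shell = cache.fetch("shell", lambda: "<html>...layout...</html>",
                        ttl=7 * 24 * 3600)
    dashboard = cache.fetch("dashboard", lambda: "<div>tasks</div>", ttl=60)
    return shell + dashboard

page = build_page()
page2 = build_page()
```

Each fragment "tells its own cache story": when the dashboard block goes stale after a minute, only that block is re-rendered, while the week-long shell keeps being served from cache.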
If you completely understand the caching principles, moving to another caching bundle will be a breeze.

Key players in the HTTP cache header

Before diving into the Symfony application cache, let's familiarize ourselves with the elements that we need to handle in our cache strategies. To do so, open https://www.wikipedia.org/ in your browser, inspect any resource with the 304 response code, and ponder the request/response headers inside the Network tab: Among the response elements, there are four cache headers that we are interested in the most: Expires and Cache-Control, which are used for the expiration model, and ETag and Last-Modified, which are used for the validation model. Apart from these cache headers, we can have variations of the same cache (compressed/uncompressed) via the Vary header, and we can define a cache as private (accessible by a specific user) or public (accessible by everyone).

Using the Symfony reverse proxy cache

There is no complicated or lengthy procedure required to activate Symfony's gateway cache. Just open the front controller and uncomment the following lines:

// web/app.php
<?php
//...
require_once __DIR__.'/../app/AppKernel.php';
// uncomment this line
require_once __DIR__.'/../app/AppCache.php';
$kernel = new AppKernel('prod', false);
$kernel->loadClassCache();
// and this line
$kernel = new AppCache($kernel);
// ...
?>

Now, the kernel is wrapped by the Application Cache layer, which means that any request coming from the client will pass through this layer first.

Set the expiration for the dashboard page

Log in to your project and click on the Request/Response section in the debug toolbar. Then, scroll down to Response Headers and check the contents: As you can see, among the cache headers that we are interested in, only Cache-Control is sitting there, with some default values. When you don't set any value for Cache-Control, Symfony considers the page contents private, to keep them safe.
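Before we start setting headers ourselves, it helps to see what a fully configured cacheable response might carry. The header names below are the real HTTP ones; the values are purely illustrative, not taken from the running project:

```python
# Illustrative cache headers on a hypothetical response.
response_headers = {
    # Expiration model: tells the client/proxy how long the copy stays fresh.
    "Cache-Control": "public, s-maxage=600",
    "Expires": "Thu, 07 Apr 2016 10:00:00 GMT",
    # Validation model: lets the client/proxy ask "is my copy still good?"
    "ETag": '"686897696a7c876b7e"',
    "Last-Modified": "Mon, 04 Apr 2016 08:30:00 GMT",
    # Variations of the same cache (e.g. compressed vs. uncompressed).
    "Vary": "Accept-Encoding",
}

# Grouping the four headers we care about by the model they serve:
expiration_headers = {k for k in response_headers
                      if k in ("Cache-Control", "Expires")}
validation_headers = {k for k in response_headers
                      if k in ("ETag", "Last-Modified")}
```

Note that `public` in Cache-Control marks the response as cacheable by shared caches; replacing it with `private` would restrict it to the specific user's browser cache.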
Now, let's go to the Dashboard controller and add some gateway cache settings to the indexAction() method:

// src/AppBundle/Controller/DashboardController.php
<?php
namespace AppBundle\Controller;
use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\HttpFoundation\Response;

class DashboardController extends Controller
{
    public function indexAction()
    {
        $uId = $this->getUser()->getId();
        $util = $this->get('mava_util');
        $userProjects = $util->getUserProjects($uId);
        $currentTasks = $util->getUserTasks($uId, 'in progress');
        $response = new Response();
        $date = new \DateTime('+2 days');
        $response->setExpires($date);
        return $this->render(
            'CoreBundle:Dashboard:index.html.twig',
            array(
                'currentTasks' => $currentTasks,
                'userProjects' => $userProjects
            ),
            $response
        );
    }
}

You might have noticed that we didn't change the render() method. Instead, we added the response settings as the third parameter of this method. This is a good solution because now we can keep the current template structure, and adding new settings won't require any other changes in the code. However, you might wonder what other options we have. We can save the whole $this->render() method in a variable and assign the response settings to it as follows:

// src/AppBundle/Controller/DashboardController.php
<?php
// ...
$res = $this->render(
    'AppBundle:Dashboard:index.html.twig',
    array(
        'currentTasks' => $currentTasks,
        'userProjects' => $userProjects
    )
);
$res->setExpires($date);
return $res;
?>

That still looks like a lot of hard work for a simple response header setting, so let me introduce a better option.
We can use the @Cache annotation as follows:

// src/AppBundle/Controller/DashboardController.php
<?php
namespace AppBundle\Controller;
use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Sensio\Bundle\FrameworkExtraBundle\Configuration\Cache;

class DashboardController extends Controller
{
    /**
     * @Cache(expires="next Friday")
     */
    public function indexAction()
    {
        $uId = $this->getUser()->getId();
        $util = $this->get('mava_util');
        $userProjects = $util->getUserProjects($uId);
        $currentTasks = $util->getUserTasks($uId, 'in progress');
        return $this->render(
            'AppBundle:Dashboard:index.html.twig',
            array(
                'currentTasks' => $currentTasks,
                'userProjects' => $userProjects
            ));
    }
}

Have you noticed that the response object is completely removed from the code? With an annotation, all response headers are set internally, which helps keep the original code clean. Now that's what I call zero-fee maintenance. Let's check our response headers in Symfony's debug toolbar and see what they look like: The good thing about @Cache annotations is that they can be nested. Imagine you have a controller full of actions. You want all of them to have a shared maximum age of half an hour, except one that is supposed to be private and should expire in five minutes. This sounds like a lot of code if you are going to use the response objects directly, but with annotations, it will be as simple as this:

<?php
//...
/**
 * @Cache(smaxage="1800", public="true")
 */
class DashboardController extends Controller
{
    public function firstAction()
    {
        //...
    }

    public function secondAction()
    {
        //...
    }

    /**
     * @Cache(expires="300", public="false")
     */
    public function lastAction()
    {
        //...
    }
}

The annotation defined before the controller class applies to every single action, unless we explicitly add a new annotation for an action.

Validation strategy

In the previous example, we set the expiry period very long.
This means that if a new task is assigned to the user, it won't show up in his dashboard because of the wrong caching strategy. To fix this issue, we can validate the cache before using it. There are two ways of validation: We can check the content's date via the Last-Modified header: In this technique, we certify the freshness of the content via the time it was modified. In other words, if we keep track of the date and time of each change on a resource, then we can simply compare that date with the cache's date and find out whether it is still fresh. We can use the ETag header as a unique content signature: The other solution is to generate a unique string based on the contents and evaluate the cache's freshness based on this signature. We are going to try both of them in the Dashboard controller and see them in action. Using the right validation header is totally dependent on the current code. In some actions, calculating modified dates is way easier than creating a digital footprint, while in others, going through the date and time functions might look costly. There are also situations where generating both headers is critical. So the choice is totally dependent on the code base and what you are going to achieve. As you can see, we have two entities in the indexAction() method and, considering the current code, generating the ETag header looks practical. So the validation header will look as follows:

// src/AppBundle/Controller/DashboardController.php
<?php
//...
class DashboardController extends Controller
{
    /**
     * @Cache(ETag="userProjects ~ finishedTasks")
     */
    public function indexAction()
    {
        //...
    }
}

The next time a request arrives, the cache layer looks at the ETag value in the controller, compares it with its own ETag, and calls the indexAction() method only if there is a difference between the two.
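Under the hood, the validation handshake behaves roughly like the following sketch. This is plain Python standing in for the gateway cache's logic, not Symfony's actual implementation; the inputs mirror the `userProjects ~ finishedTasks` expression used in the annotation above:

```python
import hashlib

def make_etag(*parts):
    """Derive a content signature from whatever really matters on the
    server side (here, hypothetical project and task counts)."""
    raw = "~".join(str(p) for p in parts)
    return '"%s"' % hashlib.md5(raw.encode("utf-8")).hexdigest()

def handle_request(if_none_match, user_projects, finished_tasks):
    """Return (status, body, etag) for a conditional request."""
    etag = make_etag(user_projects, finished_tasks)
    if if_none_match == etag:
        # Signatures match: the cached copy is still valid,
        # so no body is sent back (304 Not Modified).
        return 304, None, etag
    # Signatures differ (or no cached copy): render the full response.
    body = "render dashboard for %d projects" % user_projects
    return 200, body, etag

# First request: no cached ETag yet -> full response.
status, body, etag = handle_request(None, user_projects=3, finished_tasks=7)
# Revalidation with unchanged state -> 304, empty body.
status2, body2, _ = handle_request(etag, user_projects=3, finished_tasks=7)
# A task was finished in the meantime -> signature changes, full response again.
status3, body3, _ = handle_request(etag, user_projects=3, finished_tasks=8)
```

The key point is that the expensive rendering only happens when the signature changes; a matching ETag costs one cheap hash comparison and an almost empty 304 response.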
How to mix expiration and validation strategies

Imagine that we want to keep the cache fresh for 10 minutes and simultaneously keep an eye on any changes to user projects or finished tasks. It is obvious that tasks won't finish every 10 minutes, and it is far beyond reality to expect changes in project status during this period. So what we can do to make our caching strategy efficient is combine Expiration and Validation and apply them to the Dashboard controller as follows:

// src/CoreBundle/Controller/DashboardController.php
<?php
//...
/**
 * @Cache(expires="600")
 */
class DashboardController extends Controller
{
    /**
     * @Cache(ETag="userProjects ~ finishedTasks")
     */
    public function indexAction()
    {
        //...
    }
}

Keep in mind that Expiration has a higher priority than Validation. In other words, the cache is fresh for 10 minutes regardless of the validation status. So when you visit your dashboard for the first time, a new cache plus a 304 (Not Modified) response is generated automatically, and you will hit the cache for the next 10 minutes. However, what happens after 10 minutes is a little different. Now the expiration condition is no longer satisfied; thus, the HTTP flow falls into the validation phase, and in case nothing has happened to your finished tasks or project status, a new expiration period is generated and you hit the cache again. However, if there is any change in your task or project status, then you will hit the server to get the real response, and a new cache built from the response's contents, a new expiration period, and a new ETag are generated and stored in the cache layer for future reference.

Summary

In this article, you learned about the basics of gateway and Doctrine caching. We saw how to set expiration and validation strategies using HTTP headers such as Cache-Control, Expires, Last-Modified, and ETag.
You learned how to set public and private access levels for a cache and use an annotation to define cache rules in the controller.

Resources for Article:

Further resources on this subject:
User Interaction and Email Automation in Symfony 1.3: Part1 [article]
The Symfony Framework – Installation and Configuration [article]
User Interaction and Email Automation in Symfony 1.3: Part2 [article]
Packt
04 Apr 2016
12 min read

Proxmox VE Fundamentals

In this article written by Rik Goldman, author of the book Learning Proxmox VE, we introduce Proxmox Virtual Environment (PVE), a mature, complete, well-supported, enterprise-class virtualization environment for servers. It is an open source tool, based on the Debian GNU/Linux distribution, that manages containers, virtual machines, storage, virtualized networks, and high-availability clustering through a well-designed, web-based interface or via the command-line interface. (For more resources related to this topic, see here.) Developers provided the first stable release of Proxmox VE in 2008; four years and eight point releases later, ZDNet's Ken Hess boldly, but quite sensibly, declared Proxmox VE "The Ultimate Hypervisor" (http://www.zdnet.com/article/proxmox-the-ultimate-hypervisor/). Four years after that, PVE is on version 4.1, in use by at least 90,000 hosts, with more than 500 commercial customers in 140 countries; the web-based administrative interface itself is translated into nineteen languages. This article will explore the fundamental technologies underlying PVE's hypervisor features: LXC, KVM, and QEMU. To do so, we will develop a working understanding of virtual machines, containers, and their appropriate use. We will cover the following topics:

Proxmox VE in brief
Virtualization and containerization with PVE
Proxmox VE virtual machines, KVM, and QEMU
Containerization with PVE and LXC

Proxmox VE in brief

With Proxmox VE, Proxmox Server Solutions GmbH (https://www.proxmox.com/en/about) provides us with an enterprise-ready, open source type II hypervisor. Later, you'll find some of the features that make Proxmox VE such a strong enterprise candidate. The license for Proxmox VE is very deliberately the GNU Affero General Public License (V3) (https://www.gnu.org/licenses/agpl-3.0.html).
From among the many free and open source compatible licenses available, this is a significant choice because it is "specifically designed to ensure cooperation with the community in the case of network server software." PVE is primarily administered from an integrated web interface, or from the command line locally or via SSH. Consequently, there is no need for a separate management server and the associated expenditure. In this way, Proxmox VE contrasts significantly with alternative enterprise virtualization solutions by vendors such as VMware. Proxmox VE instances/nodes can be incorporated into PVE clusters and centrally administered from a unified web interface. Proxmox VE provides for live migration: the movement of a virtual machine or container from one cluster node to another without any disruption of services. This is a rather unique feature of PVE, not common to competing products.

Feature                          Proxmox VE                                      VMware vSphere
Hardware requirements            Flexible                                        Strict compliance with HCL
Integrated management interface  Web- and shell-based (browser and SSH)          No; requires dedicated management
                                                                                 server at additional cost
Simple subscription structure    Yes; based on number of premium support         No
                                 tickets per year and CPU socket count
High availability                Yes                                             Yes
VM live migration                Yes                                             Yes
Supports containers              Yes                                             No
Virtual machine OS support       Windows and Linux                               Windows, Linux, and Unix
Community support                Yes                                             No
Live VM snapshots                Yes                                             Yes

Contrasting Proxmox VE and VMware vSphere features

For a complete catalog of features, see the Proxmox VE datasheet at https://www.proxmox.com/images/download/pve/docs/Proxmox-VE-Datasheet.pdf. Like its competitors, PVE is a hypervisor: a typical hypervisor is software that creates, runs, configures, and manages virtual machines based on an administrator or engineer's choices. PVE is known as a type II hypervisor because the virtualization layer is built upon an operating system. As a type II hypervisor, Proxmox VE is built on the Debian project.
Debian is a GNU/Linux distribution renowned for its reliability, commitment to security, and its thriving and dedicated community of contributing developers. A type II hypervisor, such as PVE, runs directly over the operating system. In Proxmox VE's case, the operating system is Debian; since the release of PVE 4.0, the underlying operating system has been Debian "Jessie." By contrast, a type I hypervisor (such as VMware's ESXi) runs directly on bare metal, without the mediation of an operating system; it has no additional function beyond managing virtualization and the physical hardware. Debian-based GNU/Linux distributions are arguably the most popular GNU/Linux distributions for the desktop. One characteristic that distinguishes Debian from competing distributions is its release policy: Debian releases only when its development community can stand behind it for its stability, security, and usability. Debian does not distinguish between long-term support releases and regular releases, as some other distributions do. Instead, all Debian releases receive strong support and critical updates through the first year following the next release. (Since 2007, a major release of Debian has been made about every two years. Debian 8, Jessie, was released just about on schedule in 2015.) Proxmox VE's reliance on Debian is thus a testament to its commitment to these values: stability, security, and usability over scheduled releases that favor cutting-edge features.
PVE provides its virtualization functionality through three open technologies, and through the efficiency with which they're integrated by its administrative web interface:

LXC
KVM
QEMU

To understand how this foundation serves Proxmox VE, we must first be able to clearly understand the relationship between virtualization (or, specifically, hardware virtualization) and containerization (OS virtualization). As we proceed, their respective use cases should become clear.

Virtualization and containerization with Proxmox VE

It is correct to ultimately understand containerization as a type of virtualization. However, here, we'll look first to conceptually distinguish a virtual machine from a container by focusing on contrasting characteristics. Simply put, virtualization is a technique through which we provide fully functional computing resources without a demand for the resources' physical organization, locations, or relative proximity. Briefly put, virtualization technology allows you to share and allocate the resources of a physical computer into multiple execution environments. Without context, virtualization is a vague term, encapsulating the abstraction of such resources as storage, networks, servers, desktop environments, and even applications from their concrete hardware requirements through software implementation solutions called hypervisors. Virtualization thus affords us more flexibility, more functionality, and a significant positive impact on our budgets, often realized with merely the resources we have at hand. In terms of PVE, virtualization most commonly refers to the abstraction of all aspects of a discrete computing system from its hardware. In this context, virtualization is, in other words, the creation of a virtual machine or VM, with its own operating system and applications. A VM may be initially understood as a computer that has the same functionality as a physical machine.
Likewise, it may be incorporated into, and communicated with via, a network exactly as a machine with physical hardware would be. Put yet another way, from inside a VM we will experience no difference that allows us to distinguish it from a physical computer. The virtual machine, moreover, doesn't have the physical footprint of its physical counterparts. The hardware it relies on is, in fact, provided by software that borrows hardware resources from a host installed on a physical machine (or bare metal). Nevertheless, the software components of the virtual machine, from the applications to the operating system, are distinctly separated from those of the host machine. This advantage is realized when it comes to allocating physical space for resources. For example, we may have a PVE server running a web server, database server, firewall, and log management system, all as discrete virtual machines. Rather than consuming the physical space, resources, and labor of maintaining four physical machines, we simply make physical room for the single Proxmox VE server and configure an appropriate virtual LAN as necessary. In a white paper entitled Putting Server Virtualization to Work, AMD articulates well the benefits of virtualization to businesses and developers (https://www.amd.com/Documents/32951B_Virtual_WP.pdf):

Top 5 business benefits of virtualization:
Increases server utilization
Improves service levels
Streamlines manageability and security
Decreases hardware costs
Reduces facility costs

The benefits of virtualization in a development and test environment:
Lowers capital and space requirements
Lowers power and cooling costs
Increases efficiencies through shorter test cycles
Enables faster time-to-market

To these benefits, let's add portability and encapsulation: the unique ability to migrate a live VM from one PVE host to another, without suffering a service outage.
Proxmox VE makes the creation and control of virtual machines possible through the combined use of two free and open source technologies: Kernel-based Virtual Machine (or KVM) and Quick Emulator (QEMU). Used together, we refer to this integration of tools as KVM-QEMU.

KVM

KVM has been an integral part of the Linux kernel since February 2007. This kernel module allows GNU/Linux users and administrators to take advantage of an architecture's hardware virtualization extensions; for our purposes, these extensions are AMD's AMD-V and Intel's VT-x for the x86_64 architecture. To really make the most of Proxmox VE's feature set, you'll therefore very much want to install on an x86_64 machine with a CPU with integrated virtualization extensions. For a full list of AMD and Intel processors supported by KVM, visit Intel at http://ark.intel.com/Products/VirtualizationTechnology or AMD at http://support.amd.com/en-us/kb-articles/Pages/GPU120AMDRVICPUsHyperVWin8.aspx.

QEMU

QEMU provides an emulation and virtualization interface that can be scripted or otherwise controlled by a user. Visualizing the relationship between KVM and QEMU Without Proxmox VE, we could essentially define the hardware, create a virtual disk, and start and stop a virtualized server from the command line using QEMU. Alternatively, we could rely on any one of an array of GUI frontends for QEMU (a list of GUIs available for various platforms can be found at http://wiki.qemu.org/Links#GUI_Front_Ends). Of course, working with these solutions is productive only if you're interested in what goes on behind the scenes in PVE when virtual machines are defined. Proxmox VE's management of virtual machines is itself managing QEMU through its API. Managing QEMU from the command line can be tedious. The following is a line from a script that launched Raspbian, a Debian remix intended for the architecture of the Raspberry Pi, on an x86 Intel machine running Ubuntu.
When we see how easy it is to manage VMs from Proxmox VE's administrative interfaces, we'll sincerely appreciate that relative simplicity:

qemu-system-arm -kernel kernel-qemu -cpu arm1176 -m 256 -M versatilepb -no-reboot -serial stdio -append "root=/dev/sda2 panic=1" -hda ./$raspbian_img -hdb swap

If you're familiar with QEMU's emulation features, it's perhaps important to note that we can't manage emulation through the tools and features Proxmox VE provides, despite its reliance on QEMU. From a bash shell provided by Debian, it's possible; however, the emulation can't be controlled through PVE's administration and management interfaces.

Containerization with Proxmox VE

Containers are a class of virtual machines (as containerization has enjoyed a renaissance since 2005, the term OS virtualization has become synonymous with containerization and is often used for clarity). However, by way of contrast with VMs, containers share operating system components, such as libraries and binaries, with the host operating system; a virtual machine does not. Visually contrasting virtual machines with containers

The container advantage

This arrangement potentially allows a container to run leaner and with fewer hardware resources borrowed from the host. For many authors, pundits, and users, containers also offer a demonstrable advantage in terms of speed and efficiency. (However, it should be noted here that as resources such as RAM and more powerful CPUs become cheaper, this advantage will diminish.) The Proxmox VE container is made possible through LXC from version 4.0 (it was made possible through OpenVZ in previous PVE versions). LXC is the third fundamental technology serving Proxmox VE's ultimate interest. Like KVM and QEMU, LXC (or Linux Containers) is an open source technology. It allows a host to run, and an administrator to manage, multiple operating system instances as isolated containers on a single physical host.
Conceptually then, a container very clearly represents a class of virtualization, rather than an opposing concept. Nevertheless, it's helpful to maintain a clear distinction between a virtual machine and a container as we come to terms with PVE. The ideal implementation of a Proxmox VE guest is contingent on our distinguishing and choosing between a virtual-machine solution and a container solution. Since Proxmox VE containers share components of the host operating system and can offer advantages in terms of efficiency, this text will guide you through the creation of containers whenever the intended guest can be fully realized with Debian Jessie as our hypervisor's operating system without sacrificing features. When our intent is a guest running a Microsoft Windows operating system, for example, a Proxmox VE container ceases to be a solution. In such a case, we turn, instead, to creating a virtual machine. We must rely on a VM precisely because the operating system components that Debian can share with a Linux container are not components a Microsoft Windows operating system can make use of.

Summary

In this article, we have come to terms with the three open source technologies that provide Proxmox VE's foundational features: containerization and virtualization with LXC, KVM, and QEMU. Along the way, we've come to understand that containers, while being a type of virtualization, have characteristics that distinguish them from virtual machines. These differences will be crucial as we determine which technology to rely on for a virtual server solution with Proxmox VE.

Resources for Article:

Further resources on this subject:
Deploying App-V 5 in a Virtual Environment [article]
Setting Up a Spark Virtual Environment [article]
Basic Concepts of Proxmox Virtual Environment [article]
Packt
04 Apr 2016
20 min read

Morphology – Getting Our Feet Wet

In this article by Deepti Chopra, Nisheeth Joshi, and Iti Mathur, authors of the book Mastering Natural Language Processing with Python, morphology may be defined as the study of the composition of words using morphemes. A morpheme is the smallest unit of the language that has a meaning. In this article, we will discuss stemming and lemmatizing, creating a stemmer and lemmatizer for non-English languages, developing a morphological analyzer and morphological generator using machine learning tools, creating a search engine, and many other concepts. In brief, this article will include the following topics:

Introducing morphology
Creating a stemmer and lemmatizer
Developing a stemmer for non-English languages
Creating a morphological analyzer
Creating a morphological generator
Creating a search engine

(For more resources related to this topic, see here.)

Introducing morphology

Morphology may be defined as the study of the production of tokens with the help of morphemes. A morpheme is the basic unit of language that carries a meaning. There are two types of morphemes: stems and affixes (suffixes, prefixes, infixes, and circumfixes). Stems are also referred to as free morphemes, since they can exist even without adding affixes. Affixes are referred to as bound morphemes, since they cannot exist in a free form and always exist along with free morphemes. Consider the word "unbelievable". Here, "believe" is a stem or free morpheme; it can even exist on its own. The morphemes "un" and "able" are affixes or bound morphemes; they cannot exist in a free form but exist together with a stem. There are three kinds of languages, namely isolating languages, agglutinative languages, and inflecting languages. Morphology has a different meaning in each of these languages. Isolating languages are those languages in which words are merely free morphemes and do not carry any tense (past, present, and future) or number (singular or plural) information.
Mandarin Chinese is an example of an isolating language. Agglutinative languages are those languages in which small words combine together to convey compound information. Turkish is an example of an agglutinative language. Inflecting languages are languages in which words are broken down into simpler units, but all these simpler units exhibit different meanings. Latin is an example of an inflecting language. There are morphological processes such as inflections, derivations, semi-affixes, combining forms, and cliticization. An inflection refers to transforming a word into a form that represents a person, number, tense, gender, case, aspect, or mood. Here, the syntactic category of the token remains the same. In derivation, the syntactic category of the word is also changed. Semi-affixes are bound morphemes that exhibit a word-like quality, for example, noteworthy, antisocial, anticlockwise, and so on.

Understanding stemmers

Stemming may be defined as the process of obtaining a stem from a word by eliminating the affixes from it. For example, in the word "raining", a stemmer would return the root word or stem "rain" by removing the affix "ing". In order to increase the accuracy of information retrieval, search engines mostly use stemming to get a stem and store it as an index word. Search engines call words with the same meaning synonyms, and this may be a kind of query expansion known as conflation. Martin Porter designed a well-known stemming algorithm known as the Porter stemming algorithm. This algorithm is basically designed to replace and eliminate some well-known suffixes present in English words. To perform stemming in NLTK, we can simply instantiate the PorterStemmer class and then perform stemming by calling the stem() method.
Let's take a look at the code for stemming using the PorterStemmer class in NLTK:

>>> import nltk
>>> from nltk.stem import PorterStemmer
>>> stemmerporter = PorterStemmer()
>>> stemmerporter.stem('working')
'work'
>>> stemmerporter.stem('happiness')
'happi'

The PorterStemmer class is trained and has knowledge of many stems and word forms in the English language. The process of stemming takes place in a series of steps and transforms a word into a shorter word, or into a word that has a similar meaning to the root word. The StemmerI interface defines the stem() method, and all stemmers inherit from this interface. The inheritance diagram is depicted here: Another stemming algorithm, known as the Lancaster stemming algorithm, was introduced at Lancaster University. Similar to the PorterStemmer class, the LancasterStemmer class is used in NLTK to implement Lancaster stemming. Let's consider the following code, which depicts Lancaster stemming in NLTK:

>>> import nltk
>>> from nltk.stem import LancasterStemmer
>>> stemmerlan = LancasterStemmer()
>>> stemmerlan.stem('working')
'work'
>>> stemmerlan.stem('happiness')
'happy'

We can also build our own stemmer in NLTK using RegexpStemmer. It works by accepting a string and eliminating it from the prefix or suffix of a word when a match is found. Let's consider an example of stemming using RegexpStemmer in NLTK:

>>> import nltk
>>> from nltk.stem import RegexpStemmer
>>> stemmerregexp = RegexpStemmer('ing')
>>> stemmerregexp.stem('working')
'work'
>>> stemmerregexp.stem('happiness')
'happiness'
>>> stemmerregexp.stem('pairing')
'pair'

We can use RegexpStemmer in cases where stemming cannot be performed using PorterStemmer or LancasterStemmer. The SnowballStemmer class is used to perform stemming in 13 languages other than English. In order to perform stemming using SnowballStemmer, an instance is first created for the language in which stemming needs to be performed, and then stemming is performed using the stem() method.
Consider the following example, which performs stemming in Spanish and French in NLTK using SnowballStemmer:

>>> import nltk
>>> from nltk.stem import SnowballStemmer
>>> SnowballStemmer.languages
('danish', 'dutch', 'english', 'finnish', 'french', 'german', 'hungarian', 'italian', 'norwegian', 'porter', 'portuguese', 'romanian', 'russian', 'spanish', 'swedish')
>>> spanishstemmer = SnowballStemmer('spanish')
>>> spanishstemmer.stem('comiendo')
'com'
>>> frenchstemmer = SnowballStemmer('french')
>>> frenchstemmer.stem('manger')
'mang'

The nltk.stem.api module contains the StemmerI class, in which the stem function is defined. Consider the following code present in NLTK, which enables stemming to be performed:

class StemmerI(object):
    """
    An interface that helps to eliminate morphological affixes
    from tokens; the process is known as stemming.
    """

    def stem(self, token):
        """
        Eliminate affixes from the token and return the stem.
        """
        raise NotImplementedError()

Here's a small script that tokenizes a file and stems every token using PorterStemmer:

import nltk
from nltk.stem.porter import PorterStemmer

def obtain_tokens():
    with open('/home/p/NLTK/sample1.txt') as stem:
        tok = nltk.word_tokenize(stem.read())
    return tok

def stemming(filtered):
    stem = []
    for x in filtered:
        stem.append(PorterStemmer().stem(x))
    return stem

if __name__ == "__main__":
    tok = obtain_tokens()
    print("tokens is %s" % tok)
    stem_tokens = stemming(tok)
    print("After stemming is %s" % stem_tokens)
    res = dict(zip(tok, stem_tokens))
    print("{tok:stemmed}=%s" % res)

Understanding lemmatization

Lemmatization is the process of transforming a word into its base or dictionary form, known as the lemma. The lemma may belong to a different syntactic category and can be quite different from the original word.
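To make the contrast concrete before turning to NLTK, here is a toy comparison of suffix chopping (what a stemmer does) and dictionary lookup (what a lemmatizer does). The lookup table here is made up purely for illustration; real lemmatizers consult a lexicon such as WordNet:

```python
# Toy contrast between stemming and lemmatization (illustration only).
LEMMAS = {"better": "good", "works": "work", "am": "be", "happiness": "happiness"}

def crude_stem(word):
    # crude suffix chopping, as a stemmer would do
    for suffix in ("iness", "ness", "s"):
        if word.endswith(suffix):
            stem = word[:-len(suffix)]
            return stem + "i" if suffix == "iness" else stem
    return word

def toy_lemmatize(word):
    # dictionary lookup, as a lemmatizer would do; fall back to the word itself
    return LEMMAS.get(word, word)

print(crude_stem("happiness"), toy_lemmatize("happiness"))  # happi happiness
print(crude_stem("works"), toy_lemmatize("works"))          # work work
print(toy_lemmatize("better"))                              # good
```

The stemmer blindly produces "happi", while the lemmatizer returns a real dictionary form, and can even map irregular forms such as "better" to "good", which no suffix rule could do.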
Consider an example of lemmatization in NLTK:

>>> import nltk
>>> from nltk.stem import WordNetLemmatizer
>>> lemmatizer_output = WordNetLemmatizer()
>>> lemmatizer_output.lemmatize('working')
'working'
>>> lemmatizer_output.lemmatize('working', pos='v')
'work'
>>> lemmatizer_output.lemmatize('works')
'work'

WordNetLemmatizer may be defined as a wrapper around the WordNet corpus; it makes use of the morphy() function present in WordNetCorpusReader to extract a lemma. If no lemma can be extracted, the word is returned in its original form. For example, for 'works', the lemma returned is the singular form, 'work'. This code snippet illustrates the difference between stemming and lemmatization:

>>> import nltk
>>> from nltk.stem import PorterStemmer
>>> stemmer_output = PorterStemmer()
>>> stemmer_output.stem('happiness')
'happi'
>>> from nltk.stem import WordNetLemmatizer
>>> lemmatizer_output = WordNetLemmatizer()
>>> lemmatizer_output.lemmatize('happiness')
'happiness'

In the preceding code, 'happiness' is converted to 'happi' by stemming. Lemmatization can't find a shorter lemma for 'happiness', so it returns the word 'happiness' unchanged.

Developing a stemmer for non-English languages

Polyglot is a package that provides morfessor models, which are used to obtain morphemes from tokens. The Morpho project's goal is to create unsupervised, data-driven methods for splitting words into morphemes, the smallest meaning-bearing units of language. Morphemes play an important role in natural language processing: they are useful in the automatic recognition and generation of language. Using the vocabulary dictionaries of polyglot, morfessor models were trained on 50,000 tokens of different languages. Here's the code to obtain the language table using polyglot:

from polyglot.downloader import downloader
print(downloader.supported_languages_table("morph2"))

The output obtained from the preceding code lists the supported languages: 1.
Piedmontese language 2. Lombard language 3. Gan Chinese 4. Sicilian 5. Scots 6. Kirghiz, Kyrgyz 7. Pashto, Pushto 8. Kurdish 9. Portuguese 10. Kannada 11. Korean 12. Khmer 13. Kazakh 14. Ilokano 15. Polish 16. Panjabi, Punjabi 17. Georgian 18. Chuvash 19. Alemannic 20. Czech 21. Welsh 22. Chechen 23. Catalan; Valencian 24. Northern Sami 25. Sanskrit (Saṁskṛta) 26. Slovene 27. Javanese 28. Slovak 29. Bosnian-Croatian-Serbian 30. Bavarian 31. Swedish 32. Swahili 33. Sundanese 34. Serbian 35. Albanian 36. Japanese 37. Western Frisian 38. French 39. Finnish 40. Upper Sorbian 41. Faroese 42. Persian 43. Sinhala, Sinhalese 44. Italian 45. Amharic 46. Aragonese 47. Volapük 48. Icelandic 49. Sakha 50. Afrikaans 51. Indonesian 52. Interlingua 53. Azerbaijani 54. Ido 55. Arabic 56. Assamese 57. Yoruba 58. Yiddish 59. Waray-Waray 60. Croatian 61. Hungarian 62. Haitian; Haitian Creole 63. Quechua 64. Armenian 65. Hebrew (modern) 66. Silesian 67. Hindi 68. Divehi; Dhivehi; Mald... 69. German 70. Danish 71. Occitan 72. Tagalog 73. Turkmen 74. Thai 75. Tajik 76. Greek, Modern 77. Telugu 78. Tamil 79. Oriya 80. Ossetian, Ossetic 81. Tatar 82. Turkish 83. Kapampangan 84. Venetian 85. Manx 86. Gujarati 87. Galician 88. Irish 89. Scottish Gaelic; Gaelic 90. Nepali 91. Cebuano 92. Zazaki 93. Walloon 94. Dutch 95. Norwegian 96. Norwegian Nynorsk 97. West Flemish 98. Chinese 99. Bosnian 100. Breton 101. Belarusian 102. Bulgarian 103. Bashkir 104. Egyptian Arabic 105. Tibetan Standard, Tib... 106. Bengali 107. Burmese 108. Romansh 109. Marathi (Marāthī) 110. Malay 111. Maltese 112. Russian 113. Macedonian 114. Malayalam 115. Mongolian 116. Malagasy 117. Vietnamese 118. Spanish; Castilian 119. Estonian 120. Basque 121. Bishnupriya Manipuri 122. Asturian 123. English 124. Esperanto 125. Luxembourgish, Letzeb... 126. Latin 127. Uighur, Uyghur 128. Ukrainian 129. Limburgish, Limburgan... 130. Latvian 131. Urdu 132. Lithuanian 133. Fiji Hindi 134. Uzbek 135. Romanian, Moldavian, ... 
The necessary models can be downloaded using the following code:

%%bash
polyglot download morph2.en morph2.ar

[polyglot_data] Downloading package morph2.en to
[polyglot_data]   /home/rmyeid/polyglot_data...
[polyglot_data]   Package morph2.en is already up-to-date!
[polyglot_data] Downloading package morph2.ar to
[polyglot_data]   /home/rmyeid/polyglot_data...
[polyglot_data]   Package morph2.ar is already up-to-date!

Consider this example, which obtains morphemes using polyglot:

from polyglot.text import Text, Word

tokens = ["unconditional", "precooked", "impossible", "painful", "entered"]
for s in tokens:
    s = Word(s, language="en")
    print("{:<20}{}".format(s, s.morphemes))

unconditional    ['un', 'conditional']
precooked        ['pre', 'cook', 'ed']
impossible       ['im', 'possible']
painful          ['pain', 'ful']
entered          ['enter', 'ed']

If tokenization cannot be performed properly (for example, when words are run together), we can use morphological analysis to split the text into its original constituents:

sent = "Ihopeyoufindthebookinteresting"
para = Text(sent)
para.language = "en"
para.morphemes
WordList(['I', 'hope', 'you', 'find', 'the', 'book', 'interesting'])

Morphological analyzers

Morphological analysis may be defined as the process of obtaining grammatical information about a token from its suffix information. Morphological analysis can be performed in three ways: morpheme-based morphology (the item-and-arrangement approach), lexeme-based morphology (the item-and-process approach), and word-based morphology (the word-and-paradigm approach). A morphological analyzer may be defined as a program that is responsible for analyzing the morphology of a given input token: it analyzes the token and generates morphological information, such as gender, number, class, and so on, as output. In order to perform morphological analysis on a given non-whitespace token, the PyEnchant dictionary is used.
Consider the following code, which splits a run-together string into words using the enchant dictionary:

>>> import enchant
>>> s = enchant.Dict("en_US")
>>> tok = []
>>> def tokenize(st1):
        if not st1:
            return
        for j in range(len(st1), -1, -1):
            if s.check(st1[0:j]):
                tok.append(st1[0:j])
                st1 = st1[j:]
                tokenize(st1)
                break
>>> tokenize("itismyfavouritebook")
>>> tok
['it', 'is', 'my', 'favourite', 'book']
>>> tok = []
>>> tokenize("ihopeyoufindthebookinteresting")
>>> tok
['i', 'hope', 'you', 'find', 'the', 'book', 'interesting']

We can determine the category of a word as follows:

Morphological hints: Suffix information helps us detect the category of a word. For example, the -ness and -ment suffixes occur with nouns.
Syntactic hints: Contextual information is conducive to determining the category of a word. For example, if we have found a word that has the noun category, syntactic hints help determine whether an adjective appears before or after it.
Semantic hints: A semantic hint is also useful in determining the category of a word. For example, if we already know that a word represents the name of a location, it will fall under the noun category.
Open class: This refers to the class of words that is not fixed; its membership keeps increasing as new words are added to the language. Words in an open class are usually nouns. Prepositions are mostly a closed class.
Morphology captured by the part-of-speech tagset: A part-of-speech tagset captures information that helps us perform morphology. For example, the word 'plays' would appear with the third person singular tag.

Omorfi (the open morphology of Finnish) is a package licensed under version 3 of the GNU GPL. It is used for numerous tasks, such as language modeling, morphological analysis, rule-based machine translation, information retrieval, statistical machine translation, morphological segmentation, ontologies, and spell checking and correction.
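Returning to the dictionary-based segmentation shown above: the same greedy, longest-match-first idea can be sketched with the standard library alone by substituting a hardcoded word set for enchant.Dict (the word list here is an assumption made purely for illustration; a real application would use a full dictionary):

```python
# Greedy longest-match segmentation with a tiny hardcoded dictionary.
WORDS = {"it", "is", "my", "favourite", "book", "i", "hope", "you", "find", "the"}

def segment(text):
    tokens = []
    while text:
        for j in range(len(text), 0, -1):      # try the longest prefix first
            if text[:j] in WORDS:
                tokens.append(text[:j])
                text = text[j:]
                break
        else:
            return tokens + [text]             # give up on an unknown remainder
    return tokens

print(segment("itismyfavouritebook"))
# ['it', 'is', 'my', 'favourite', 'book']
```

Note that greedy matching can mis-segment ambiguous inputs; production systems typically score alternative segmentations instead of committing to the first match.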
A morphological generator

A morphological generator is a program that performs the task of morphological generation. Morphological generation may be considered the opposite of morphological analysis: here, given the description of a word in terms of its number, category, stem, and so on, the original word is retrieved. For example, if root = go, part of speech = verb, tense = present, and the word occurs along with a third person, singular subject, then the morphological generator would generate its surface form, goes. There are many Python-based software packages that perform morphological analysis and generation. Some of them are as follows:

ParaMorfo: This is used to perform the morphological generation and analysis of Spanish and Guarani nouns, adjectives, and verbs
HornMorpho: This is used for the morphological generation and analysis of Oromo and Amharic nouns and verbs, as well as Tigrinya verbs
AntiMorfo: This is used for the morphological generation and analysis of Quechua adjectives, verbs, and nouns, as well as Spanish verbs
MorfoMelayu: This is used for the morphological analysis of Malay words

Other examples of software used to perform morphological analysis and generation are as follows:

Morph is a morphological generator and analyzer for the English language and the RASP system
Morphy is a morphological generator, analyzer, and POS tagger for German
Morphisto is a morphological generator and analyzer for German
Morfette performs supervised learning (inflectional morphology) for Spanish and French

Search engines

PyStemmer 1.0.1 provides Snowball stemming algorithms that are useful for performing information retrieval tasks and for constructing a search engine. It includes the Porter stemming algorithm and many other stemming algorithms for many languages, including many European languages.
We can construct a vector space search engine by converting texts into vectors. Here are the steps needed to construct a vector space search engine:

Stemming and elimination of stop words. A stemmer is a program that accepts words and converts them into stems. Tokens that have the same stem usually have similar meanings. Stop words are also eliminated from the text. Consider the following code for the removal of stop words and tokenization:

def eliminatestopwords(self, list):
    """
    Eliminate words which occur often and do not carry much
    significance from the context point of view.
    """
    return [word for word in list if word not in self.stopwords]

def tokenize(self, string):
    """
    Perform the task of splitting text into tokens
    """
    string = self.clean(string)
    words = string.split(" ")
    return [self.stemmer.stem(word, 0, len(word) - 1) for word in words]

Mapping keywords into vector dimensions. Here's the code required to perform the mapping of keywords into vector dimensions:

def obtainvectorkeywordindex(self, documentList):
    """
    In the document vectors, generate the keyword index for the
    given position of an element
    """
    # Map the documents onto a single string
    vocabstring = " ".join(documentList)
    vocablist = self.parser.tokenize(vocabstring)
    # Eliminate common words that have no search significance
    vocablist = self.parser.eliminatestopwords(vocablist)
    uniqueVocablist = util.removeDuplicates(vocablist)
    vectorIndex = {}
    offset = 0
    # Attach a position to each keyword; it maps to the dimension
    # used to depict this token
    for word in uniqueVocablist:
        vectorIndex[word] = offset
        offset += 1
    return vectorIndex  # (keyword:position)

Mapping of text strings to vectors. Here, a simple term count model is used.
The code to convert text strings into vectors is as follows:

def constructVector(self, wordString):
    # Initialise the vector with 0's
    vector = [0] * len(self.vectorKeywordIndex)
    tokList = self.parser.tokenize(wordString)
    tokList = self.parser.eliminatestopwords(tokList)
    for word in tokList:
        vector[self.vectorKeywordIndex[word]] += 1  # simple term count model is used
    return vector

Searching similar documents. By finding the cosine of the angle between the vectors of two documents, we can determine whether they are similar. If the cosine value is 1, the angle between the vectors is 0 degrees and the vectors are parallel (the documents are related). If the cosine value is 0, the angle is 90 degrees and the vectors are perpendicular (the documents are not related). This is the code to compute the cosine between text vectors using numpy:

from numpy import dot
from numpy.linalg import norm

def cosine(vec1, vec2):
    """
    cosine = ( X * Y ) / ||X|| x ||Y||
    """
    return float(dot(vec1, vec2) / (norm(vec1) * norm(vec2)))

Search keywords. We perform the mapping of keywords to a vector space. We construct a temporary text that represents the items to be searched and then compare it with the document vectors with the help of a cosine measurement.
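Before looking at the class-based search method, here is how all the steps fit together in a minimal, dependency-free sketch. The stop word list and the documents are invented for illustration, and the cosine here is written with the standard library so the whole pipeline runs without numpy:

```python
import math

STOPWORDS = {"the", "a", "is", "of"}

def tokenize(text):
    # lowercase, split on whitespace, drop stop words
    return [w for w in text.lower().split() if w not in STOPWORDS]

def build_index(documents):
    # keyword -> vector dimension
    vocab = sorted({w for doc in documents for w in tokenize(doc)})
    return {word: i for i, word in enumerate(vocab)}

def vectorize(text, index):
    vec = [0] * len(index)
    for word in tokenize(text):
        if word in index:
            vec[index[word]] += 1          # simple term count model
    return vec

def cosine(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    norms = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norms if norms else 0.0

docs = ["the cat sat on the mat", "the dog chased the cat", "pure python code"]
index = build_index(docs)
doc_vecs = [vectorize(d, index) for d in docs]
query = vectorize("cat", index)
scores = [cosine(query, dv) for dv in doc_vecs]
best = scores.index(max(scores))
print(docs[best])  # 'the dog chased the cat' (fewest non-query terms, highest cosine)
```

Note that the shorter matching document wins here: cosine similarity rewards documents in which the query terms make up a larger share of the vector.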
Here is the code needed to search the vector space:

def searching(self, searchinglist):
    """
    Search for documents that match, based on a list of items
    """
    askVector = self.buildQueryVector(searchinglist)
    ratings = [util.cosine(askVector, textVector) for textVector in self.documentVectors]
    ratings.sort(reverse=True)
    return ratings

The following code can be used to detect the language of a source text:

>>> import nltk
>>> import sys
>>> try:
        from nltk import wordpunct_tokenize
        from nltk.corpus import stopwords
    except ImportError:
        print('An error has occurred')

#----------------------------------------------------------------------

>>> def _calculate_languages_ratios(text):
        """
        Compute a per-language score for the given text and return a
        dictionary that looks like {'german': 2, 'french': 4, 'english': 1}
        """
        languages_ratios = {}
        '''
        nltk.wordpunct_tokenize() splits all punctuation into separate tokens:
        wordpunct_tokenize("I hope you like the book interesting .")
        ['I', 'hope', 'you', 'like', 'the', 'book', 'interesting', '.']
        '''
        tok = wordpunct_tokenize(text)
        words = [word.lower() for word in tok]
        # Compute the occurrence of unique stop words in the text
        for language in stopwords.fileids():
            stopwords_set = set(stopwords.words(language))
            words_set = set(words)
            common_elements = words_set.intersection(stopwords_set)
            languages_ratios[language] = len(common_elements)  # language "score"
        return languages_ratios

#----------------------------------------------------------------

>>> def detect_language(text):
        """
        Compute the probability of the given text being written in different
        languages and return the highest-scoring one. It uses a stop word
        calculation approach and finds the unique stop words present in the
        analyzed text.
""" ratios = _calculate_languages_ratios(text) most_rated_language = max(ratios, key=ratios.get) return most_rated_language if __name__=='__main__': text = ''' All over this cosmos, most of the people believe that there is an invisible supreme power that is the creator and the runner of this world. Human being is supposed to be the most intelligent and loved creation by that power and that is being searched by human beings in different ways into different things. As a result people reveal His assumed form as per their own perceptions and beliefs. It has given birth to different religions and people are divided on the name of religion viz. Hindu, Muslim, Sikhs, Christian etc. People do not stop at this. They debate the superiority of one over the other and fight to establish their views. Shrewd people like politicians oppose and support them at their own convenience to divide them and control them. It has intensified to the extent that even parents of a new born baby teach it about religious differences and recommend their own religion superior to that of others and let the child learn to hate other people just because of religion. Jonathan Swift, an eighteenth century novelist, observes that we have just enough religion to make us hate, but not enough to make us love one another. The word 'religion' does not have a derogatory meaning - A literal meaning of religion is 'A personal or institutionalized system grounded in belief in a God or Gods and the activities connected with this'. At its basic level, 'religion is just a set of teachings that tells people how to lead a good life'. It has never been the purpose of religion to divide people into groups of isolated followers that cannot live in harmony together. No religion claims to teach intolerance or even instructs its believers to segregate a certain religious group or even take the fundamental rights of an individual solely based on their religious choices. 
It is also said that 'Majhab nhi sikhata aaps mai bair krna'. But this very majhab or religion takes a very heinous form when it is misused by the shrewd politicians and the fanatics e.g. in Ayodhya on 6th December, 1992 some right wing political parties and communal organizations incited the Hindus to demolish the 16th century Babri Masjid in the name of religion to polarize Hindus votes. Muslim fanatics in Bangladesh retaliated and destroyed a number of temples, assassinated innocent Hindus and raped Hindu girls who had nothing to do with the demolition of Babri Masjid. This very inhuman act has been presented by Taslima Nasrin, a Banglsdeshi Doctor-cum-Writer in her controversial novel 'Lajja' (1993) in which, she seems to utilizes fiction's mass emotional appeal, rather than its potential for nuance and universality.
        '''

>>> language = detect_language(text)
>>> print(language)

The preceding code will search for stop words and detect the language of the text, which is English.

Summary

In this article, we discussed stemming, lemmatization, and morphological analysis and generation.

Resources for Article:

Further resources on this subject:
How is Python code organized [article]
Machine learning and Python – the Dream Team [article]
Putting the Fun in Functional Python [article]
How To Get Started with Redux in React Native

Emilio Rodriguez
04 Apr 2016
5 min read
In mobile development there is a need for architectural frameworks, but complex frameworks designed to be used in web environments may end up damaging the development process or even the performance of our app. Because of this, some time ago I decided to introduce in all of my React Native projects the leanest framework I ever worked with: Redux.

Redux is basically a state container for JavaScript apps. It is 100 percent library-agnostic, so you can use it with React, Backbone, or any other view library. Moreover, it is really small and has no dependencies, which makes it an awesome tool for React Native projects.

Step 1: Install Redux in your React Native project.

Redux can be added as an npm dependency into your project. Just navigate to your project's main folder and type:

npm install --save react-redux

By the time this article was written, React Native still depended on React Redux 3.1.0, since later versions depended on React 0.14, which is not 100 percent compatible with React Native. Because of this, you will need to pin version 3.1.0 as the one your project depends on.

Step 2: Set up a Redux-friendly folder structure.

Of course, setting up the folder structure for your project is totally up to every developer, but you need to take into account that you will need to maintain a number of actions, reducers, and components. Besides, it's also useful to keep a separate folder for your API and utility functions so that these won't mix with your app's core functionality. Having this in mind, this is my preferred folder structure under the src folder in any React Native project.

Step 3: Create your first action.

In this article we will be implementing a simple login functionality to illustrate how to integrate Redux inside React Native. A good point to start this implementation is the action, a basic function called from the component whenever we want the whole state of the app to be changed (i.e.
changing from the logged out state into the logged in state). To keep this example as concise as possible we won't be doing any API calls to a backend – only the pure Redux integration will be explained. Our action creator is a simple function returning an object (the action itself) with a type attribute expressing what happened with the app. No business logic should be placed here; our action creators should be really plain and descriptive.

Step 4: Create your first reducer.

Reducers are the ones in charge of updating the state of the app. Unlike in Flux, Redux only has one store for the whole app, but it will be conveniently name-spaced automatically by Redux once the reducers have been applied. In our example, the user reducer needs to be aware of when the user is logged in. Because of that, it needs to import the LOGIN_SUCCESS constant we defined in our actions before and export a default function, which will be called by Redux every time an action occurs in the app. Redux will automatically pass the current state of the app and the action that occurred. It's up to the reducer to realize whether it needs to modify the state or not, based on the action.type. That's why, almost every time, our reducer will be a function containing a switch statement, which modifies and returns the state based on what action occurred. It's important to state that Redux works with object references to identify when the state is changed. Because of this, the state should be cloned before any modification. It's also interesting to know that the action passed to the reducers can contain other attributes apart from type. For example, when doing a more complex login, the user's first name and last name can be added to the action by the action creator and used by the reducer to update the state of the app.

Step 5: Create your component.

This step is almost pure React Native coding. We need a component to trigger the action and to respond to the change of state in the app.
In our case it will be a simple View containing a button that disappears when logged in. This is a normal React Native component except for some pieces of the Redux boilerplate:

The three import lines at the top will require everything we need from Redux.
'mapStateToProps' and 'mapDispatchToProps' are two functions bound with 'connect' to the component: this lets Redux know that this component needs to be passed a piece of the state (everything under 'userReducers') and all the actions available in the app.

Just by doing this, we will have access to the login action (as it is used in onLoginButtonPress) and to the state of the app (as it is used in the !this.props.user.loggedIn statement).

Step 6: Glue it all from your index.ios.js.

For Redux to apply its magic, some initialization should be done in the main file of your React Native project (index.ios.js). This is pure boilerplate and only done once: Redux needs to inject a store holding the app state into the app. To do so, it requires a 'Provider' wrapping the whole app. This store is basically a combination of reducers. For this article we only need one reducer, but a full app will include many others, and each of them should be passed into the combineReducers function to be taken into account by Redux whenever an action is triggered.

About the Author

Emilio Rodriguez started working as a software engineer for Sun Microsystems in 2006. Since then, he has focused his efforts on building a number of mobile apps with React Native while contributing to the React Native project. These contributions helped him understand how deep and powerful this framework is.

Testing and Debugging Distributed Applications

Packt
01 Apr 2016
21 min read
In this article by Francesco Pierfederici, author of the book Distributed Computing with Python, the author states that "distributed systems, both large and small, can be extremely challenging to test and debug, as they are spread over a network, run on computers that can be quite different from each other, and might even be physically located in different continents altogether". Moreover, the computers we use could have different user accounts, different disks with different software packages, different hardware resources, and very uneven performance. Some can even be in a different time zone. Developers of distributed systems need to consider all these pieces of information when trying to foresee failure conditions. Operators have to work around all of these challenges when debugging errors.

(For more resources related to this topic, see here.)

The big picture

Testing and debugging monolithic applications is not simple, as every developer knows. However, there are a number of tools that dramatically make the task easier, including the pdb debugger, various profilers (notable mentions include cProfile and line_profiler), linters, static code analysis tools, and a host of test frameworks, a number of which have been included in the standard library of Python 3.3 and higher. The challenge with distributed applications is that most of the tools and packages that we can use for single-process applications lose much of their power when dealing with multiple processes, especially when these processes run on different computers. Debugging and profiling distributed applications written in C, C++, and Fortran can be done with tools such as Intel VTune, Allinea MAP, and DDT. Unfortunately, Python developers are left with very few or no options for the time being. Writing small- or medium-sized distributed systems is not terribly hard, as we have seen so far.
The main difference between writing monolithic programs and distributed applications is the large number of interdependent components running on remote hardware. This is what makes monitoring and debugging distributed code harder and less convenient. Fortunately, we can still use all the familiar debuggers and code analysis tools on our Python distributed applications. Unfortunately, these tools will only go so far, to the point that we will have to rely on old-fashioned logging and print statements to get the full picture of what went wrong.

Common problems – clocks and time

Time is a handy quantity to rely on. For instance, using timestamps is very natural when we want to join different streams of data, sort database records, and, in general, reconstruct the timeline for a series of events, which we often observe out of order. In addition, some tools (for example, GNU make) rely solely on file modification time and are easily confused by a clock skew between machines. For these reasons, clock synchronization among all the computers and systems we use is very important. If our computers are in different time zones, we might want to not only synchronize their clocks but also set them to Coordinated Universal Time (UTC) for simplicity. In all cases where changing clocks to UTC is not possible, a good piece of advice is to always process time in UTC within our code and to only convert to local time for display purposes. In general, clock synchronization in distributed systems is a fascinating and complex topic, and it is out of the scope of this article. Most readers, luckily, are likely to be well served by the Network Time Protocol (NTP), which is a perfectly fine synchronization solution for most situations. Most modern operating systems, including Windows, Mac OS X, and Linux, have great support for NTP. Another thing to consider when talking about time is the timing of periodic actions, such as polling loops or cronjobs.
Many applications need to spawn processes or perform actions (for example, sending a confirmation e-mail or checking whether new data is available) at regular intervals. A common pattern is to set up timers (either in our code or via the tools provided by the OS) and have all these timers go off at some time, usually at a specific hour and at regular intervals after that. The risk of this approach is that we can overload the system the very moment all these processes wake up and start their work. A surprisingly common example would be starting a significant number of processes that all need to read some configuration or data file from a shared disk. In these cases, everything goes fine until the number of processes becomes so large that the shared disk cannot handle the data transfer, thus slowing our application to a crawl. A common solution is the staggering of these timers in order to spread them out over a longer time interval. In general, since we do not always control all the code that we use, it is good practice to start our timers at some random number of minutes past the hour, just to be safe.

Another example of this situation would be an image-processing service that periodically polls a set of directories looking for new data. When new images are found, they are copied to a staging area, renamed, scaled, and potentially converted to a common format before being archived for later use. If we're not careful, it would be easy to overload the system if many images were to be uploaded at once. A better approach would be to throttle our application (maybe using a queue-based architecture) so that it would only start an appropriately small number of image processors so as to not flood the system.

Common problems – software environments

Another common challenge is making sure that the software installed on all the various machines we are ever going to use is consistent and consistently upgraded.
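Returning briefly to periodic timers: the staggering advice above can be sketched in a few lines. The period and job count below are arbitrary examples; a real system would hand these offsets to its scheduler or cron configuration:

```python
import random

# Instead of firing every job exactly on the hour, give each one a random
# offset within the period so they don't all wake up at the same instant.
# (Fixed seed used here only to make the sketch reproducible.)
def staggered_start_times(n_jobs, period_seconds=3600, seed=42):
    rng = random.Random(seed)
    return [rng.uniform(0, period_seconds) for _ in range(n_jobs)]

offsets = staggered_start_times(5)
print(sorted(round(off) for off in offsets))  # five offsets spread over the hour
```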
Unfortunately, it is frustratingly common to spend hours debugging a distributed application only to discover that for some unknown and seemingly impossible reason, some computers had an old version of the code and/or its dependencies. Sometimes, we might even find the code to have disappeared completely. The reasons for these discrepancies can be many: from a mount point that failed to a bug in our deployment procedures to a simple human mistake. A common approach, especially in the HPC world, is to always create a self-contained environment for our code before launching the application itself. Some projects go as far as preferring static linking of all dependencies to avoid having the runtime pick up the wrong version of a dynamic library. This approach works well if the application runtime is long compared to the time it takes to build its full environment, all of its software dependencies, and the application itself. It is not that practical otherwise. Python, fortunately, has the ability to create self-contained virtual environments. There are two related tools that we can use: pyvenv (available as part of the Python 3.5 standard library) and virtualenv (available in PyPI). Additionally, pip, the Python package management system, allows us to specify the exact version of each package we want to install in a requirements file. These tools, when used together, permit reasonable control on the execution environment. However, the devil, as it is often said, is in the details, and different computer nodes might use the exact same Python virtual environment but incompatible versions of some external library. In this respect, container technologies such as Docker (https://www.docker.com) and, in general, version-controlled virtual machines are promising ways out of the software runtime environment maelstrom in those environments where they can be used. 
In all other cases (HPC clusters come to mind), the best approach will probably be to not rely on the system software and to manage our own environments and the full software stack.

Common problems – permissions and environments

Different computers might run our code under different user accounts, and our application might expect to be able to read a file or write data into a specific directory, only to hit an unexpected permission error. Even in cases where the user accounts used by our code are all the same (down to the same user ID and group ID), their environment may be different on different hosts. Therefore, an environment variable we assumed to be defined might not be or, even worse, might be set to an incompatible value.

These problems are common when our code runs as a special, unprivileged user such as nobody. Defensive coding, especially when accessing the environment, and making sure to always fall back to sensible defaults when variables are undefined (that is, value = os.environ.get('SOME_VAR', fallback_value) instead of simply value = os.environ['SOME_VAR']) is often necessary.

A common approach, when this is possible, is to only run our applications under a specific user account that we control and specify the full set of environment variables our code needs in the deployment and application startup scripts (which will have to be version controlled as well).

Some systems, however, not only execute jobs under extremely limited user accounts, but they also restrict code execution to temporary sandboxes. In many cases, access to the outside network is also blocked. In these situations, one might have no other choice but to set up the full environment locally and copy it to a shared disk partition. Other data can be served from custom-built servers running as ancillary jobs just for this purpose.

In general, permission problems and user environment mismatches are very similar to problems with the software environment and should be tackled in concert.
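A small sketch of this defensive style follows; the variable names and default values are purely illustrative:

```python
import logging
import os

log = logging.getLogger(__name__)


def env_setting(name, default):
    """Read an environment variable, falling back to a sensible
    default (and logging the fact) when it is not defined."""
    try:
        return os.environ[name]
    except KeyError:
        log.warning('%s not set; falling back to %r', name, default)
        return default


# Purely illustrative names and defaults.
scratch_dir = env_setting('MYAPP_SCRATCH', '/tmp')
log_level = env_setting('MYAPP_LOG_LEVEL', 'INFO')
```

Logging the fallback, rather than failing silently, makes environment mismatches between hosts much easier to spot after the fact.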
Often times, developers find themselves wanting to isolate their code from the system as much as possible and create a small but self-contained environment with all the code and all the environment variables they need.

Common problems – the availability of hardware resources

The hardware resources that our application needs might or might not be available at any given point in time. Moreover, even if some resources were to be available at some point in time, nothing guarantees that they will stay available for much longer. A related problem we can face is network glitches, which are quite common in many environments (especially for mobile apps) and which, for most practical purposes, are indistinguishable from machine or application crashes.

Applications using a distributed computing framework or job scheduler can often rely on the framework itself to handle at least some common failure scenarios. Some job schedulers will even resubmit our jobs in case of errors or sudden machine unavailability. Complex applications, however, might need to implement their own strategies to deal with hardware failures.

In some cases, the best strategy is to simply restart the application when the necessary resources are available again. Other times, restarting from scratch would be cost prohibitive. In these cases, a common approach is to implement application checkpointing. What this means is that the application periodically writes its state to disk and is able to bootstrap itself from a previously saved state.

In implementing a checkpointing strategy, you need to balance the convenience of being able to restart an application midway with the performance hit of writing state to disk. Another consideration is the increase in code complexity, especially when many processes or threads are involved in reading and writing state information.
A good rule of thumb is that data or results that can be recreated easily and quickly do not warrant implementing application checkpointing. If, on the other hand, some processing requires a significant amount of time and one cannot afford to waste it, then application checkpointing might be in order.

For example, climate simulations can easily run for several weeks or months at a time. In these cases, it is important to checkpoint them every hour or so, as restarting from the beginning after a crash would be expensive. On the other hand, a process that takes an uploaded image and creates a thumbnail for, say, a web gallery runs quickly and is not normally worth checkpointing.

To be safe, state should always be written and updated atomically (for example, by writing to a temporary file and replacing the original only after the write completes successfully). The last thing we want is to restart from a corrupted state!

A situation familiar to HPC users as well as to users of AWS spot instances is one where some or all of our application's processes are evicted from the machines that they are running on. When this happens, a warning is typically sent to our processes (usually, a SIGQUIT signal) and, after a few seconds, they are unceremoniously killed (via a SIGKILL signal). For AWS spot instances, the time of termination is available through a web service in the instance metadata. In either case, our applications are given some time to save their state and quit in an orderly fashion.

Python has powerful facilities to catch and handle signals (refer to the signal module). For example, the following simple script shows how we can implement a bare-bones checkpointing strategy in our application:

#!/usr/bin/env python3.5
"""
Simple example showing how to catch signals in Python
"""
import json
import os
import signal
import sys


# Path to the file we use to store state. Note that we assume
# $HOME to be defined, which is far from being an obvious
# assumption!
STATE_FILE = os.path.join(os.environ['HOME'], '.checkpoint.json')


class Checkpointer:
    def __init__(self, state_path=STATE_FILE):
        """
        Read the state file, if present, and initialize from that.
        """
        self.state = {}
        self.state_path = state_path
        if os.path.exists(self.state_path):
            with open(self.state_path) as f:
                self.state.update(json.load(f))
        return

    def save(self):
        print('Saving state: {}'.format(self.state))
        with open(self.state_path, 'w') as f:
            json.dump(self.state, f)
        return

    def eviction_handler(self, signum, frame):
        """
        This is the function that gets called when a signal is trapped.
        """
        self.save()

        # Of course, using sys.exit is a bit brutal. We can do better.
        print('Quitting')
        sys.exit(0)
        return


if __name__ == '__main__':
    import time

    print('This is process {}'.format(os.getpid()))

    ckp = Checkpointer()
    print('Initial state: {}'.format(ckp.state))

    # Catch SIGQUIT.
    signal.signal(signal.SIGQUIT, ckp.eviction_handler)

    # Get a value from the state.
    i = ckp.state.get('i', 0)
    try:
        while True:
            i += 1
            ckp.state['i'] = i
            print('Updated in-memory state: {}'.format(ckp.state))
            time.sleep(1)
    except KeyboardInterrupt:
        ckp.save()

If we run the preceding script in a terminal window, we can then send it a SIGQUIT signal from another terminal window (for example, via kill -s SIGQUIT <process id>).
After this, we see the checkpointing in action: the script reports that it is saving its state and quits.

A common situation in distributed applications is that of being forced to run code in potentially heterogeneous environments: machines (real or virtual) of different performance, with different hardware resources (for example, with or without GPUs), and potentially different software environments (as we mentioned already).

Even in the presence of a job scheduler that helps us choose the right software and hardware environment, we should always log the full environment as well as the performance of each execution machine. In advanced architectures, these performance metrics can be used to improve the efficiency of job scheduling. PBS Pro, for instance, takes into consideration the historical performance figures of each job being submitted to decide where to execute it next. HTCondor continuously benchmarks each machine and makes those figures available for node selection and ranking.

Perhaps the most frustrating cases are those where, either due to the network itself or due to servers being overloaded, network requests take so long that our code hits its internal timeouts. This might lead us to believe that the counterpart service is not available. These bugs, especially when transient, can be quite hard to debug.

Challenges – the development environment

Another common challenge in distributed systems is the setup of a representative development and testing environment, especially for individuals or small teams. Ideally, in fact, the development environment should be identical to the worst-case scenario deployment environment. It should allow developers to test common failure scenarios, such as a disk filling up, varying network latencies, intermittent network connections, hardware and software failures, and so on: all things that are bound to happen in real life, sooner or later.
Large teams have the resources to set up development and test clusters, and they almost always have dedicated software quality teams stress-testing their code. Small teams, unfortunately, often find themselves forced to write code on their laptops and use a very simplified (and best-case scenario!) environment, made up of two or three virtual machines running on the laptops themselves, to emulate the real system.

This pragmatic solution works and is definitely better than nothing. However, we should remember that virtual machines running on the same host exhibit unrealistically high availability and low network latencies. In addition, nobody will accidentally upgrade them without us knowing or image them with the wrong operating system. The environment is simply too controlled and stable to be realistic.

A step closer to a realistic setup would be to create a small development cluster on, say, AWS, using the same VM images, with the same software stack and user accounts, that we are going to use in production.

All things said, there is simply no replacement for the real thing. For cloud-based applications, it is worth our while to at least test our code on a smaller version of the deployment setup. For HPC applications, we should be using either a test cluster, a partition of the operational cluster, or a test queue for development and testing. Ideally, we would develop on an exact clone of the operational system.

Cost considerations and ease of development will constantly push us toward the multiple-VMs-on-a-laptop solution; it is simple, essentially free, and it works without an Internet connection, which is an important point. We should, however, keep in mind that distributed applications are not impossibly hard to write; they just have more failure modes than their monolithic counterparts do. Some of these failure modes (especially those related to data access patterns) typically require a careful choice of architecture.
Correcting architectural choices dictated by false assumptions later on in the development stage can be costly. Convincing managers to give us the hardware resources that we need early on is usually difficult. In the end, this is a delicate balancing act.

A useful strategy – logging everything

Often times, logging is like taking backups or eating vegetables: we all know we should do it, but most of us forget. In distributed applications, we simply have no other choice; logging is essential. Not only that, logging everything is essential.

With many different processes running on potentially ephemeral remote resources at difficult-to-predict times, the only way to understand what happens is to have logging information and have it readily available and in an easily searchable format/system.

At the bare minimum, we should log process startup and exit time, exit code and exceptions (if any), all input arguments, all outputs, the full execution environment, the name and IP of the execution host, the current working directory, the user account as well as the full application configuration, and all software versions.

The idea is that if something goes wrong, we should be able to use this information to log onto the same machine (if still available), go to the same directory, and reproduce exactly what our code was doing. Of course, being able to exactly reproduce the execution environment might simply not be possible (often times, because it requires administrator privileges). However, we should always aim to be able to recreate a good approximation of that environment.

This is where job schedulers really shine; they allow us to choose a specific machine and specify the full job environment, which makes replicating failures easier. Logging software versions (not only the version of the Python interpreter, but also the version of all the packages used) helps diagnose outdated software stacks on remote machines.
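A startup routine along these lines could look like the following sketch; exactly which fields to record (and at which log level) is, of course, application-dependent:

```python
import getpass
import logging
import os
import socket
import sys


def execution_context():
    """Collect the information needed to reproduce a run: host, user,
    working directory, arguments, interpreter, and environment."""
    return {
        'host': socket.gethostname(),
        'pid': os.getpid(),
        'user': getpass.getuser(),
        'cwd': os.getcwd(),
        'argv': list(sys.argv),
        'python': '{} {}'.format(sys.executable, sys.version.split()[0]),
        'environ': dict(os.environ),
    }


# Log the whole context once, at process startup.
logging.basicConfig(level=logging.INFO)
logging.getLogger('startup').info('execution context: %r',
                                  execution_context())
```

The same dictionary can also be written alongside the application's outputs, so that every result carries a record of the environment that produced it.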
The Python package manager, pip, makes getting the list of installed packages easy: import pip; pip.main(['list']). Similarly, import sys; print(sys.executable, sys.version_info) displays the location and version of the interpreter.

It is also useful to create a system whereby all our classes and function calls emit logging messages with the same level of detail and at the same points in the object life cycle. Common approaches involve the use of decorators and, maybe a bit too esoteric for some, metaclasses. This is exactly what the autologging Python module (available on PyPI) does for us.

Once logging is in place, we face the question of where to store all these logging messages, whose traffic could be substantial at high verbosity levels in large applications. Simple installations will probably want to write log messages to text files on disk. More complex applications might want to store these messages in a database (which can be done by creating a custom handler for the Python logging module) or in specialized log aggregators such as Sentry (https://getsentry.com).

Closely related to logging is the issue of monitoring. Distributed applications can have many moving parts, and it is often essential to know which machines are up, which are busy, as well as which processes or jobs are currently running, waiting, or in an error state. Knowing which processes are taking longer than usual to complete their work is often an important warning sign that something might be wrong.

Several monitoring solutions for Python (often times integrated with our logging system) exist. The Celery project, for instance, recommends flower (http://flower.readthedocs.org) as a monitoring and control web application. HPC job schedulers, on the other hand, tend to lack common, general-purpose monitoring solutions that go beyond simple command-line clients.

Monitoring comes in handy in discovering potential problems before they become serious.
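One simple form of such monitoring is a periodic resource check; the sketch below warns when free disk space on a path drops below a threshold (the path and the 10% default are illustrative choices):

```python
import logging
import shutil


def check_free_space(path='/', min_free_fraction=0.10):
    """Warn when the fraction of free disk space on `path` drops below
    `min_free_fraction`; return True when there is enough room."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    if free_fraction < min_free_fraction:
        logging.warning('low disk space on %s: %.1f%% free',
                        path, 100 * free_fraction)
        return False
    return True
```

A check like this would typically run on a timer and, in a larger system, escalate (for example, by e-mail) rather than just log.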
It is in fact useful to monitor resources such as available disk space and trigger actions, or even simple warning e-mails, when they fall under a given threshold. Many centers monitor hardware performance and hard drive SMART data to detect early signs of potential problems. These issues are more likely to be of interest to operations personnel than to developers, but they are useful to keep in mind. They can also be integrated into our applications to implement strategies for handling performance degradations gracefully.

A useful strategy – simulating components

A good, although possibly expensive in terms of time and effort, test strategy is to simulate some or all of the components of our system. The reasons are multiple: on one hand, simulating or mocking software components allows us to test our interfaces to them more directly. In this respect, mock testing libraries, such as unittest.mock (part of the Python 3.5 standard library), are truly useful.

Another reason to simulate software components is to make them fail or misbehave on demand and see how our application responds. For instance, we could increase the response time of services such as REST APIs or databases to worst-case scenario levels and see what happens. Sometimes, we might exceed timeout values in some network calls, leading our application to incorrectly assume that the server has crashed.

Especially early on in the design and development of a complex distributed application, one can make overly optimistic assumptions about things such as network availability and performance or the response time of services such as databases or web servers. For this reason, having the ability to either completely bring a service offline or, more subtly, modify its behavior can tell us a lot about which of the assumptions in our code might be overly optimistic.
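As a small illustration of this idea, unittest.mock lets us stand in for a service and make it fail or stall on demand; the fetch_status interface and the check_service helper here are hypothetical:

```python
import time
from unittest import mock


def check_service(client, timeout=0.1):
    """Return True if the (hypothetical) service answers within
    `timeout` seconds, False if it errors out or is too slow."""
    start = time.monotonic()
    try:
        client.fetch_status()
    except ConnectionError:
        return False
    return (time.monotonic() - start) <= timeout


# A stand-in client that fails on demand...
failing = mock.Mock()
failing.fetch_status.side_effect = ConnectionError('service down')

# ...and one that is pathologically slow.
slow = mock.Mock()
slow.fetch_status.side_effect = lambda: time.sleep(0.5)
```

Running check_service against each mock exercises both failure paths without any real network at all, which is exactly the point of simulating components.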
The Netflix Chaos Monkey (https://github.com/Netflix/SimianArmy) approach of disabling random components of our system to see how our application copes with failures can be quite useful.

Summary

Writing and running small- or medium-sized distributed applications in Python is not hard. There are many high-quality frameworks that we can leverage, for example, Celery, Pyro, various job schedulers, Twisted, MPI bindings, or the multiprocessing module in the standard library.

The real difficulty, however, lies in monitoring and debugging our applications, especially because a large fraction of our code runs concurrently on many different, often remote, computers. The most insidious bugs are those that end up producing incorrect results (for example, because of data becoming corrupted along the way) rather than raising an exception, which most frameworks are able to catch and bubble up.

The monitoring and debugging tools that we can use with Python code are, sadly, not as sophisticated as the frameworks and libraries we use to develop that same code. The consequence is that large teams end up developing their own, often very specialized, distributed debugging systems from scratch, and small teams mostly rely on log messages and print statements. More work is needed in the area of debuggers for distributed applications in general, and for dynamic languages such as Python in particular.