Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7018 Articles
article-image-real-time-aggregation-streaming-data-using-spark-streaming-and-kafka
Anant Asthana
11 Apr 2016
10 min read
Save for later

Real-Time Aggregation on Streaming Data Using Spark Streaming and Kafka

Anant Asthana
11 Apr 2016
10 min read
This post goes over doing a few aggregations on streaming data using Spark Streaming and Kafka. We will be setting up a local environment for the purpose of the tutorial. If you have Spark and Kafka running on a cluster, you can skip the getting setup steps. The Challenge of Stream Computations Computations on streams can be challenging due to multiple reasons, including the size of a dataset. Certain metrics such as quantiles need to iterate over the entire dataset in a sorted order using standard formulae/ practices and they may not be the most suited approach, for example, mean = sum of value/ count. For a streaming dataset, this is not fully scalable. Instead, suppose we store the sum and count and each new item is added to the sum. For every new item, we increment the count, and whenever we need the average, we divide the sum by the count. Then we get the mean at that instance. Calculating Percentile Percentile requires finding the location of an item in a large dataset; for example, 90th percentile would mean the value that is over 90 percent of the values in a sorted dataset. To illustrate, in [9, 1, 8, 7, 6, 5, 2, 4, 3, 0], the 80th percentile would be 8. This means we need to sort the dataset and then find an item by its location. This clearly is not scalable. Scaling this operation involves using an algorithm called tdigest. This is a way of approximatingpercentile at scale. tdigest creates digests that create centroids at positions that are approximated at the appropriate quantiles. These digests can be added to get a complete digest that can be used to estimate the quantiles of the whole dataset. Spark allows us to do computations on partitions of data, unlike traditional Map Reduce. So we calculate the digests for every partition and add them in the reduce phase to get a complete digest. This is the only time we need to converge that data at one point (reduce operation). We then use Spark's broadcast feature to broadcast the value. This value is then used for filtering the dataset to leave us an RDD matching our criteria (top 5 percentile). We then use mapPartitions to send the values of each partition to Kafka (this could be any message handler, post, and so on). Nature of the Data We are using fictitious data. It contains two columns: user_id, and activity_type. We are going to compute popular users. The activity can be of the following types: profile.picture.like, profile.view, and message.private. Each of these activities will have a different score. Metrics We Would Like to Compute We would like to compute the most popular users, that is, the top 5 percentile of users (score and list of users). Prerequisites You must have Docker, Python 2.7, and JRE 1.7 installed, as well as Scala and basic familiarity with Spark and the concept of RDDs. Getting Setup with Kafka Download the Kafka container. For the purpose of this tutorial, we will run Kafka as a Docker container. The container can be run with Mac: docker run - p 2181 : 2181 - p 9092 : 9092 -- env ADVERTISED_HOST = `boot2docker ip` -- env ADVERTISED_PORT = 9092 spotify / kafka Linux (Docker installed directly on the machine): docker run - p 2181 : 2181 - p 9092 : 9092 -- env ADVERTISED_HOST = `127.0.0.1` -- env ADVERTISED_PORT = 9092 spotify / kafka More information about the container can be found at here. This should get you started with running a Kafka instance that we will be using for this tutorial. We also download the Kafka binaries locally to test the Kafka consumer, create topics, and so on. Kafka binaries can be found at here. Download and extract the latest version. The directory containing the Kafka binaries will be referred to as $KAFKA_HOME. Getting Setup with Spark The next step is to install Spark. We have two options to run spark: Run it on a Docker container Run it locally Running Spark Locally Download Spark binaries from here: wget http : //apache.claz.org/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz Extract the binaries: tar - xvf spark - 1.4 . 0 - bin - hadoop2 . 6.tgz Run pyspark shell using cd spark - 1.4 . 0 - bin - hadoop2 . 6 ./bin/pyspark If you have IPython installed, you can also use IPython with pyspark by using the following line: IPYTHON=1 ./bin/pyspark Running Spark as a Docker container docker run - i - t - h - p 8888 : 8888 - v my_code : /app sandbox anantasty/ ubuntu_spark_ipython : 1.0 bash This will mount a directory named my_code on your local system to the /app directory on the Docker container. The Spark shell starts with the Spark Context available as sc and the HiveContext available as the following: sqlContext Here is a simple Spark job for testing the installation: sc . parallelize ( range ( 1 , 100 )) sc . parallelize ( range ( 1 , 100 )) res = rdd . map ( lambda v : v * v ). reduce ( lambda x , y : x + y) print res This job just calculates the sum of squares of the first 1000 integers. Spark Streaming Basics Spark streaming is an extension of the core Spark API. It can be used to process high-throughput, fault-tolerant data streams. These data streams can be nested from various sources, such as ZeroMQ, Flume, Twitter, Kafka, and so on. Spark Streaming breaks the data into small batches, and these batches are then processed by Spark to generate the stream of results, again in batches. The code abstraction from this is called DStream, which represents a continuous stream of data. A DStream is a sequence of RDDs loaded incrementally. More information on Spark Streaming can be found in the Spark Streaming Programming guide. Kafka Basics Kafka is a publish-subscribe messaging system. It is distributed, partitioned, and replicated. Terminology: A category of feeds is called a topic; for example, weather data from two different stations could be different topics. The publishers are called Producers. The subscribers of these topics are called Consumers. The Kafka cluster has one or more servers each of which is called a broker. More details can be found at here. Generating Mock Data We can generate data in two ways: Statically generated data Continuous data generation We can use statically generated data to generate a dataset and use that in our Kafka producers. We could use the following method to generate random data: from numpy . random import random_integers MESSAGE_TYPES = [ 'profile.picture.like' , 'profile.view' , 'message.private'] START_INDEX = 1000 NUM_USERS = 100000 NUM_ROWS = 10000000 def generate_data ( start_index = START_INDEX , num_users = NUM_USERS, num_rows = NUM_ROWS ): users = random_integers ( start_index , start_index + num_users , num_rows) activity = random_integers ( 0 , len ( MESSAGE_TYPES ) - 1 , num_rows) activity_name = [ MESSAGE_TYPES [ i ] for i in activity] user_activity = zip ( users , activity_name) return user_activity We can also generate data on the fly using this code: import random MESSAGE_TYPES = [ 'profile.picture.like' , 'profile.view' , 'message.private'] START_INDEX = 1000 END_INDEX = 101000 def gen_random_message ( start_index = START_INDEX , end_index = END_INDEX ): return ( random . randint ( start_index , end_index ), random . choice ( MESSAGE_TYPES )) The full source code can be found at the GitHub repo. Now we can start the producer and use the following line: .$KAFKA_HOME/bin/ kafka - console - consumer . sh -- zookeeper 127.0 . 0.1 : 2181 -- topic messages We can see the Kafka messages being printed to the console. At this point, we have our producer ready. Aggregation and Processing Using Spark Streaming This process can be broken down into the following steps: Reading the message from the Kafka queue. Decoding the message. Converting the message type text to its numeric score. Updating the score counts for incoming data. Filtering for the most popular users. Reading Messages from the Kafka Queue Reading messages in pyspark is possible using the KafkaUtils module to create a stream from a Kafka queue. kvs = KafkaUtils . createDirectStream ( ssc , [ "messages" ], { "metadata.broker.list" : "localhost:9092" }) Load the message and convert the type text to key. This is done by using Python’s built-in json module and returning a tuple of the relevant values. If you notice, we used this: scores_b . value [ message [ 'activity' ]] Here, scores is a dictionary that maps the message type text to a numeric value. We then broadcast this dictionary out to all the nodes as score_b, using the following lines: scores = { 'profile.picture.like' : 2 , 'profile.view' : 1 , 'message.private' : 3} scores_b = sc . broadcast ( scores) Next, we access the dictionary using scores_b.value, which returns us the original dictionary. Spark uses a bit torrent style broadcast, where the master broadcasts the value to a few nodes and the other nodes replicate this value from those nodes. def load_msg ( msg ): message = json . loads ( msg [ 1 ]) return message [ 'user_id' ], scores_b . value [ message [ 'activity' ]] Now we count incoming messages and update the score count. For this step, we use the updateStateByKey function on the DStream. The updateStateByKey function returns a new DStream by applying the provided function to the previous state of the DStream and the new values. This function operates somewhat similarly to a reduce function. The function provided to updateStateByKey has the accumulated value from the previous operations and the new value, and we can aggregate or combine these in our function that we provide. We also have to note that the first value is used as the key by default, so in this case the userId is the key, which is ideal. The score is the value. def update_scorecount ( new_scores , score_sum ): if not score_sum: score_sum = 0 return sum ( new_scores ) + score_sum Now we can filter the most popular users. We compute the desired percentile and filter based on it. To calculate the percentile, we use the tdigest algorithm. This algorithm allows us to estimate the percentile value in a single pass and thus is very useful and efficient for streaming data. The orignal tdigest repo from Ted Dunning can be found at here. An open source Python implementation of this algorithm was used and it can be found at here. We create a digest_partitions function that takes values from a given partition and adds them to the digest. In the reduce step, these digests are added to provide a final digest that can provide us the percentile value. We then broadcast this percentile value, which we later use in our filter. We could have also performed the computation of the digest within the filter_most_popular function, but this way we can easily add some form of output such as a Kafka producer to publish the percentile value, if needed. def digest_partitions ( values ): digest = TDigest () digest . batch_update ( values) return [ digest] def compute_percentile ( rdd ): global percentile_broadcast percentile_limit = rdd . map ( lambda row : row [ 1 ]). mapPartitions( digest_partitions ). reduce ( add ). percentile ( args_broadcast . value . limit) percentile_broadcast = rdd . context . broadcast( percentile_limit) def filter_most_popular ( rdd ): global percentile_broadcast if percentile_broadcast: return rdd . filter ( lambda row : row [ 1 ] > percentile_broadcast . value) return rdd . context . parallelize ([]) This filtered RDD can now be broadcast using Kafka. To broadcast the values, we used a Keyed producer and key on the timestamp. We use the foreachPartition function to publish each partition of the RDD instead of publishing each value at once to avoid the overhead of creating a huge number of network connections to Kafka. def publish_popular_users ( popular_rdd ): key = 'popular_{}' . format ( int ( time ())) message_key = popular_rdd . context . broadcast ( key) def publish_partition ( partition ): kafka = KafkaClient ( args_broadcast . value . kafka_hosts) producer = KeyedProducer ( kafka , partitioner = RoundRobinPartitioner, async = True , batch_send = True) producer . send_messages ( 'popular_users' , message_key . value, *[ json . dumps ( user ) for user in partition ]) popular_rdd . foreachPartition ( publish_partition) The complete code can be found here. The code can be run using: SPARK_HOME/spark/bin/sparksubmit master local jars SPARK_KAFKA_JARS/target/sparkstreamingkafkaassembly_2.101.5.0SNAPSHOT.jar --executorcores 8 streaming_percentile.py About the Author Anant Asthana is a principal consultant and data scientist at Pythian.  He is also an avid outdoorsman and is very passionate about open source software.
Read more
  • 0
  • 0
  • 20880

article-image-special-effects
Packt
11 Apr 2016
16 min read
Save for later

Special Effects

Packt
11 Apr 2016
16 min read
In this article by Maciej Szczesnik, author of Unity 5.x Animation Cookbook, we will cover the following recipes: Creating camera shakes with the Animation View and Animator Controller Using the Animation View to animate public script variables Using additive Mecanim layers of add extra motion to a character Using Blend Shapes to morph an object into another one (For more resources related to this topic, see here.) Introduction This one is all about encouraging you to experiment with Unity's animation system. In the next ten recipes, we will create interesting effects and use animations in new, creative ways. Using Animation Events to trigger sound and visual effects This recipe shows a simple, generic way of playing different sound and visual effects with Animation Events. Getting ready To start with, you need to have a character with one, looped animation—Jump. We also need a sound effect and a particle system. We will need a transparent DustParticle.png texture for the particle system. It should resemble a small dust cloud. In the Rigs directory, you will find all the animations you need, and in the Resources folder, you'll find other required assets. When you play the game, you will see a character using the Jump animation. It will also play a sound effect and a particle effect while landing. How to do it... To play sound and visual effects with Animation Events, follow these steps: Import the character with the Jump animation. In the Import Settings, Animation tab, select the Jump animation. Make it loop. Go to the Events section. Scrub through the timeline in the Preview section, and click on Add Event Button. The Edit Animation Event window will appear.Edit Animation Event window Type Sound in the Function field and Jump in the String field. This will call a Sound function in a script attached to the character and pass the Jump word as a string parameter to it. Create another Animation Event. Set the Function field to Effect and the String field to Dust. Apply Import Settings. Create Animator Controller for the character with just the Jump animation in it. Place the character in a scene. Attach the controller to the Animator component of the character. Attach an Audio Source component to the character. Uncheck the Play On Awake option. Create an empty Game Object and name it Dust. Add a Particle System component to it. This will be our dust effect. Set the Particle System parameters as follows: Duration to 1 second. Start Life Time to 0,5 seconds.      Start Speed to 0,4.      Start Size to random between two constants: 1 and 2.      Start Color to a light brown.      Emission | Rate to 0.      Emission | Bursts to one burst with time set to 0, min and max set to 5.      Shape | Shape to Sphere.      Shape | Radius to 0.2.     For Color Over Lifetime, create a gradient for the alpha channel. In the 0% mark and 100% mark, it should be set to 0. In the 10% and 90% mark, it should be set to 255. Create a new Material and set the shader by navigating to Particles | Alpha Blended. Drag and drop a transparent texture of DustParticle.png into the Texture field of Material. Drag and drop Material by navigating to the Renderer | Material slot of our Dust Particle System. Create a Resources folder in the project's structure. Unity can load assets from the Resources folder in runtime without the need of referencing them as prefabs. Drag and drop the Jump.ogg sound and Dust Game Object into the Resources folder. Write a new script and name it TriggerEffects.cs. This script has two public void functions. Both are called from the Jump animation as Animation Events. In the first function, we load an Audio Clip from the Resources folder. We set the Audio Clip name in the Animation Event itself as a string parameter (it was set to Jump). When we successfully load the Audio Clip, we play it using the Audio Source component, reference to which we store in the source variable. We also randomize the pitch of the Audio Source to have a little variation when playing the Jump.ogg sound. public void Sound (string soundResourceName) { AudioClip clip = (AudioClip) Resources.Load(soundResourceName); if (clip != null) { source.pitch = Random.Range(0.9f, 1.2f); source.PlayOneShot(clip); } } In the second function, we try to load a prefab with the name specified as the function's parameter. We also set this name in the Animation Event (it was set to Dust). If we manage to load the prefab, we instantiate it, creating the dust effect under our character's feet. public void Effect (string effectResourceName) { GameObject effectResource = (GameObject)Resources.Load(effectResourceName); if (effectResource != null) { GameObject.Instantiate(effectResource, transform.position, Quaternion.identity); } } Assign the script to our character and play the game to see the effect. How it works... We are using one important feature of Animation Events in this recipe: the possibility of a passing string, int, or float parameter to our script's functions. This way, we can create one function to play all the sound effects associated with our character and pass clip names as string parameters from the Animation Events. The same concept is used to spawn the Dust effect. The Resources folder is needed to get any resource (prefab, texture, audio clip, and so on.) with the Resources.Load(string path) function. This method is convenient in order to load assets using their names. There's more... Our Dust effect has the AutoDestroy.cs script attached to make it disappear after a certain period of time. You can find that script in the Shared Scripts folder in the provided Unity project example. Creating camera shakes with the Animation View and the Animator Controller In this recipe, we will use a simple but very effective method to create camera shakes. These effects are often used to emphasize impacts or explosions in our games. Getting ready... You don't need anything special for this recipe. We will create everything from scratch in Unity. You can also download the provided example. When you open the Example.scene scene and play the game, you can press Space to see a simple camera shake effect. How to do it... To create a camera shake effect, follow these steps: Create an empty Game Object in Scene View and name it CameraRig. Parent Main Camera to CameraRig. Select Main Camera and add an Animator component to it. Open Animation View. Create a new Animation Clip and call it CamNormal. The camera should have no motion in this clip. Add keys for both the camera's position and its rotation. Create another Animation Clip and call it CameraShake. Animate the camera's rotation and position it to create a shake effect. The animation should be for about 0.5 seconds. Open the automatically created Main Camera controller. Add a Shake Trigger parameter. Create two transitions:      Navigate to CamNormal | CameraShake with this condition: Shake the Trigger parameter, Has Exit Time is set to false, and Transition Duration is set to 0.2 seconds.      Navigate to CameraShake | CamNormal with no conditions, Has Exit Time is set to true, and Transition Duration is set to 0.2 seconds. Write a new script and call it CamShake.cs. In this script's Update() function, we check whether the player pressed the Space key. If so, we trigger the Shake Trigger in our controller. if (Input.GetKeyDown(KeyCode.Space)) { anim.SetTrigger("Shake"); } As always, the anim variable holds the reference to the Animator component and is set in the Start() function with the GetComponent<Animator>() method. Assign the script to Main Camera. Play the game and press Space to see the effect. How it works... In this recipe, we've animated the camera's position and rotation relative to the CameraRig object. This way, we can still move CameraRig (or attach it to a character). Our CameraShake animation affects only the local position and rotation of the camera. In the script, we simply call the Shake Trigger to play the CameraShake animation once. There's more... You can create more sophisticated camera shake effects with Blend Trees. To do so, prepare several shake animations of different strengths and blend them in a Blend Tree using a Strengthfloat parameter. This way, you will be able to set the shake's strength, depending on different situations in the game (the distance from an explosion, for instance). Using the Animation View to animate public script variables In Unity, we can animate public script variables. The most standard types are supported. We can use this to achieve interesting effects that are not possible to achieve directly. For instance, we can animate the fog's color and density, which is not directly accessible through the Animation View. Getting ready... In this recipe, everything will be created from scratch, so you don't need to prepare any special assets. You can find the Example.scene scene there. If you open it and press Space, you can observe the fog changing color and density. This is achieved by animating the public variables of a script. Animated fog How to do it... To animate public script variables, follow these steps: Create a new script and call it FogAnimator.cs. Create two public variables in this script: public float fogDensity and public Color fogColor. In the script's Update() function, we call the o Trigger in the controller when the player presses Space. We also set the RenderSettings.fogColor and RenderSettings.fogDensity parameters using our public variables. We also adjust the main camera's background color to match the fog color. if (Input.GetKeyDown(KeyCode.Space)) { anim.SetTrigger("ChangeFog"); } RenderSettings.fogColor = fogColor; RenderSettings.fogDensity = fogDensity; Camera.main.backgroundColor = fogColor; Create a new Game Object and name it FogAnimator. Attach the FogAnimator.cs script to it. Select the FogAnimator game object and add an Animator component to it. Open the Animation View. Create a new Animation Clip. Make sure Record Button is pressed. Create an animation for the public float fogDensity and public Color fogColor parameters by changing their values. You can create any number of animations and connect them in the automatically created Animator Controller with transitions based on the ChangeFog Trigger (you need to add this parameter to the controller first). Here's an example controller:An example controller for different fog animations Remember that you don't need to create animations of the fog changing its color or density. You can rely on blending between animations in the controller. All you need to have is one key for the density and one for the color in each animation. In this example, all Transition Durations are set to 1 second, and every transition's Has Exit Time parameter is set to false. Make sure that the fog is enabled in the Lighting settings. Play the game and press the Space button to see the effect. How it works... Normally, we can't animate the fog's color or density using the Animation View. But we can do this easily with a script that sets the RenderSettings.fogColor and RenderSettings.fogDensity parameters in every frame. We use animations to change the script's public variables values in time. This way, we've created a workaround in order to animate fog in Unity. We've just scratched the surface of what's possible in terms of animating public script variables. Try experimenting with them to achieve awesome effects. Using additive Mecanim layers to add extra motion to a character In previous recipes, we used Mecanim layers in the override mode. We can set a layer to be additive. This can add additional movement to our base layer animations. Getting ready... We will need a character with three animations—Idle, TiredReference, and Tired. The first animation is a normal, stationary idle. The second animation has no motion and is used as a reference pose to calculate the additive motion from the third animation. TiredReference can be the first frame of the Tired animation. In the Tired animation, we can see our character breathing heavily. You will find the same Humanoid character there. If you play the game and press Space, our character will start breathing heavily while still using the Idle animation. You can find all the required animations in the Rigs directory. How to do it... To use additive layers, follow these steps: Import the character into Unity and place it in a scene. Go to the Animation tab in Import Settings. Find the TiredReference animation and check the Additive Reference Pose option (you can also use the normal Tired animation and specify the frame in the Pose Frame field). Loop the Idle and Tired animations. Create a new Animator Controller. Drag and drop the Idle animation into the controller and make it the default state. Find the Layers tab in upper-left corner of the Animator window. Select it and click on the Plus button below to add a new layer. Name the newly created layer Tired. Click on the Gear icon and set the Blending to Additive. Take a look at this diagram for reference:                                                                                                      Additive layer settings Drag and drop the Tired animation to the newly created layer. Assign the controller to our character. Create a new script and call it Tired.cs. In this script's Update() function, we set the weight of the Tired layer when the player presses Space. The Tired layer has an index of 1. We use a weightTarget helper variable to set the new weight to 0 or 1, depending on its current value. This allows us to switch the additive layer on and off every time the player presses Space. Finally, we interpolate the weight value in time to make the transition more smooth, and we set weight of our additive layer with the SetLayerWeight() function. if (Input.GetKeyDown(KeyCode.Space)) { if (weightTarget < 0.5f) { weightTarget = 1f; } else if (weightTarget > 0.5f) { weightTarget = 0f; } } weight = Mathf.Lerp(weight, weightTarget, Time.deltaTime * tiredLerpSpeed); anim.SetLayerWeight(1, weight); Attach the script to the Humanoid character. Play the game and press Space to see the additive animation effect. How it works... Additive animations are calculated using the reference pose. Movements relative to this pose are then added to other animations. This way, we can not only override the base layer with other layers but also modify base movements by adding a secondary motion. Try experimenting with different additive animations. You can, for instance, make your character bend, aim, or change its overall body pose. Using Blend Shapes to morph an object into another one Previously, we used Blend Shapes to create face expressions. This is also an excellent tool for special effects. In this recipe, we will morph one object into another. Getting ready... To follow this recipe, we need to prepare an object with Blend Shapes. We've created a really simple example in Blender—a subdivided cube with one shape key that looks like a sphere. Take a look at this screenshot for reference: A cube with a Blend Shape that turns it into a sphere You will see a number of cubes there. If you hit the Space key in play mode, the cubes will morph into spheres. You can find the Cuboid.fbx asset with the required Blend Shapes in the Model directory. How to do it... To use Blend Shapes to morph objects, follow these steps: Import the model with at least one Blend Shape to Unity. You may need to go to the Import Settings | Model tab and choose Import BlendShapes. Place the model in Scene. Create a new script and call it ChangeShape.cs. This script is similar to the one from the previous recipe. In the Update() function, we change the weight of the of the first Blend Shape when player presses Space. Again, we use a helper variable weightTarget to set the new weight to 0 or 100, depending on its current value. Blend Shapes have weights from 0 to 100 instead of 1. Finally, we interpolate the weight value in time to make the transition smoother. We use the SetBlendShapeWeight() function on the skinnedRenderer object. This variable is set in the Start() function with the GetComponent<SkinnedMeshRenderer>() function. if (Input.GetKeyDown(KeyCode.Space)) { if (weightTarget < 50f) { weightTarget = 100f; } else if (weightTarget > 50f) { weightTarget = 0f; } } weight = Mathf.Lerp(weight, weightTarget, Time.deltaTime * blendShapeLerpSpeed); skinnedRenderer.SetBlendShapeWeight(0, weight); Attach the script to the model on the scene. Play the game and press Space to see the model morph. How it works... Blend Shapes store vertices position of a mesh. We have to create them in a 3D package. Unity imports Blend Shapes and we can modify their weights in runtime using the SetBlendShapeWeight() function on the Skinned Mesh Renderer component. Blend Shapes have trouble with storing normals. If we import normals from our model it may look weird after morphing. Sometimes setting the Normals option to Calculate in the Import Settings can helps with the problem. If we choose this option Unity will calculate normals based on the angle between faces of our model. This allowed us to morph a hard surface cube into a smooth sphere in this example. Summary This article covers some basic recipes which can be performed using Unity. It also covers basic concept of of using Animation Layer, Mecanim layer and creating Camera shakes Resources for Article: Further resources on this subject: Animation features in Unity 5[article] Saying Hello to Unity and Android[article] Learning NGUI for Unity[article]
Read more
  • 0
  • 0
  • 13288

article-image-mastering-fundamentals
Packt
08 Apr 2016
10 min read
Save for later

Mastering of Fundamentals

Packt
08 Apr 2016
10 min read
In this article by Piotr Sikora, author of the book Professional CSS3, you will master box model, floating's troubleshooting positioning and display types. Readers, after this article, will be more aware of the foundation of HTML and CSS. In this article, we shall cover the following topics: Get knowledge about the traditional box model Basics of floating elements The foundation of positioning elements on webpage Get knowledge about display types (For more resources related to this topic, see here.) Traditional box model Understanding box model is the foundation in CSS theories. You have to know the impact of width, height, margin, and borders on the size of the box and how can you manage it to match the element on a website. Main questions for coders and frontend developers on interviews are based on box model theories. Let's begin this important lesson, which will be the foundation for every subject. Padding/margin/border/width/height The ingredients of final width and height of the box are: Width Height Margins Paddings Borders For a better understanding of box model, here is the image from Chrome inspector: For a clear and better understanding of box model, let's analyze the image: On the image, you can see that, in the box model, we have four edges: Content edge Padding edge Border edge Margin edge The width and height of the box are based on: Width/height of content Padding Border Margin The width and height of the content in box with default box-sizing is controlled by properties: Min-width Max-width Width Min-height Max-height Height An important thing about box model is how background properties will behave. Background will be included in the content section and in the padding section (to padding edge). Let's get a code and try to point all the elements of the box model. HTML: <div class="element">   Lorem ipsum dolor sit amet consecteur </div> CSS: .element {    background: pink;    padding: 10px;    margin: 20px;   width: 100px;   height: 100px;    border: solid 10px black; }   In the browser, we will see the following: This is the view from the inspector of Google Chrome: Let's check how the areas of box model are placed in this specific example: The basic task for interviewed Front End Developer is—the box/element is described with the styles: .box {     width: 100px;     height: 200px;     border: 10px solid #000;     margin: 20px;     padding: 30px; } Please count the final width and height (the real space that is needed for this element) of this element. So, as you can see, the problem is to count the width and height of the box. Ingridients of width: Width Border left Border right Padding left Padding right Additionally, for the width of the space taken by the box: Margin left Margin right Ingridients of height: Height Border top Border bottom Padding top Padding bottom Additionally, for height of the space taken by the box: Margin top Margin bottom So, when you will sum the element, you will have an equation: Width: Box width = width + borderLeft + borderRight + paddingLeft + paddingRight Box width = 100px + 10px + 10px + 30px + 30px = 180px Space width: width = width + borderLeft + borderRight + paddingLeft + paddingRight +  marginLeft + marginRight width = 100px + 10px + 10px + 30px + 30px + 20px + 20 px = 220px Height: Box height = height + borderTop + borderBottom + paddingTop + paddingBottom Box height  = 200px + 10px + 10px + 30px + 30px = 280px Space height: Space height = height + borderTop + borderBottom + paddingTop + paddingBottom +  marginTop + marginBottom Space height = 200px + 10px + 10px + 30px + 30px + 20px + 20px = 320px Here, you can check it in a real browser: Omiting problems with traditional box model (box sizing) The basic theory of box model is pretty hard to learn. You need to remember about all the elements of width/height, even if you set the width and height. The hardest for beginners is the understanding of padding, which shouldn't be counted as a component of width and height. It should be inside the box, and it should impact on these values. To change this behavior to support CSS3 since Internet Explorer 8, box sizing comes to picture. You can set the value: box-sizing: border-box What it gives to you? Finally, the counting of box width and height will be easier because box padding and border is inside the box. So, if we are taking our previous class: .box {     width: 100px;     height: 200px;     border: 10px solid #000;     margin: 20px;     padding: 30px; } We can count the width and height easily: Width = 100px Height = 200px Additionally, the space taken by the box: Space width = 140 px (because of the 20 px margin on both sides: left and right) Space height = 240 px (because of the 20 px margin on both sides: top and bottom) Here is a sample from Chrome: So, if you don't want to repeat all the problems of a traditional box model, you should use it globally for all the elements. Of course, it's not recommended in projects that you are getting in some old project, for example, from new client that needs some small changes:  * { width: 100px; } Adding the preceding code can make more harm than good because of the inheritance of this property for all the elements, which are now based on a traditional box model. But for all the new projects, you should use it. Floating elements Floating boxes are the most used in modern layouts. The theory of floating boxes was still used especially in grid systems and inline lists in CSS frameworks. For example, class and mixin inline-list (in Zurb Foundation framework) are based on floats. Possibilities of floating elements Element can be floated to the left and right. Of course, there is a method that is resetting floats too. The possible values are: float: left; // will float element to left float: right; // will float element to right float: none; // will reset float Most known floating problems When you are using floating elements, you can have some issues. Most known problems with floated elements are: Too big elements (because of width, margin left/right, padding left/right, and badly counted width, which is based on box model) Not cleared floats All of these problems provide a specific effect, which you can easily recognize and then fix. Too big elements can be recognized when elements are not in one line and it should. What you should check first is if the box-sizing: border-box is applied. Then, check the width, padding, and margin. Not cleared floats you can easily recognize when to floating structure some elements from next container are floated. It means that you have no clearfix in your floating container. Define clearfix/class/mixin When I was starting developing HTML and CSS code, there was a method to clear the floats with classes .cb or .clear, both defined as: .clearboth, .cb {     clear: both } This element was added in the container right after all the floated elements. This is important to remember about clearing the floats because the container which contains floating elements won't inherit the height of highest floating element (will have a height equal 0). For example: <div class="container">     <div class="float">         … content ...     </div>     <div class="float">         … content ...     </div>     <div class="clearboth"></div> </div> Where CSS looks like this: .float {     width: 100px;     height: 100px;     float: left; }   .clearboth {     clear: both } Nowadays, there is a better and faster way to clear floats. You can do this with clearfix, which can be defined like this: .clearfix:after {     content: " ";     visibility: hidden;     display: block;     height: 0;     clear: both; } You can use in HTML code: <div class="container clearfix">     <div class="float">         … content ...     </div>     <div class="float">         … content ...     </div> </div> The main reason to switch on clearfix is that you save one tag (with clears both classes). Recommended usage is based on the clearfix mixin, which you can define like this in SASS: =clearfix   &:after     content: " "     visibility: hidden     display: block     height: 0     clear: both So, every time you need to clear floating in some container, you need to invoke it. Let's take the previous code as an example: <div class="container">     <div class="float">         … content ...     </div>     <div class="float">         … content ...     </div> </div> A container can be described as: .container   +clearfix Example of using floating elements The most known usage of float elements is grids. Grid is mainly used to structure the data displayed on a webpage. In this article, let's check just a short draft of grid. Let's create an HTML code: <div class="row">     <div class="column_1of2">         Lorem     </div>     <div class="column_1of2">         Lorem     </div>   </div> <div class="row">     <div class="column_1of3">         Lorem     </div>     <div class="column_1of3">         Lorem     </div>     <div class="column_1of3">         Lorem     </div>   </div>   <div class="row">     <div class="column_1of4">         Lorem     </div>     <div class="column_1of4">         Lorem     </div>     <div class="column_1of4">         Lorem     </div>     <div class="column_1of4">         Lorem     </div> </div> And SASS: *   box-sizing: border-box =clearfix   &:after     content: " "     visibility: hidden     display: block     height: 0     clear: both .row   +clearfix .column_1of2   background: orange   width: 50%   float: left   &:nth-child(2n)     background: red .column_1of3   background: orange   width: (100% / 3)   float: left   &:nth-child(2n)     background: red .column_1of4   background: orange   width: 25%   float: left   &:nth-child(2n)     background: red The final effect: As you can see, we have created a structure of a basic grid. In places where HTML code is placed, Lorem here is a full lorem ipsum to illustrate the grid system. Summary In this article, we studied about the traditional box model and floating elements in detail. Resources for Article: Further resources on this subject: Flexbox in CSS [article] CodeIgniter Email and HTML Table [article] Developing Wiki Seek Widget Using Javascript [article]
Read more
  • 0
  • 0
  • 13522

article-image-threading-basics
Packt
08 Apr 2016
6 min read
Save for later

Threading Basics

Packt
08 Apr 2016
6 min read
In this article by Eugene Agafonov, author of the book Multithreading with C# Cookbook - Second Edition, we will cover the basic tasks to work with threads in C#. You will learn the following recipes: Creating a thread in C# Pausing a thread Making a thread wait (For more resources related to this topic, see here.) Creating a thread in C# Throughout the following recipes, we will use Visual Studio 2015 as the main tool to write multithreaded programs in C#. This recipe will show you how to create a new C# program and use threads in it. There is a free Visual Studio Community 2015 IDE, which can be downloaded from the Microsoft website and used to run the code samples. Getting ready To work through this recipe, you will need Visual Studio 2015. There are no other prerequisites. How to do it... To understand how to create a new C# program and use threads in it, perform the following steps: Start Visual Studio 2015. Create a new C# console application project. Make sure that the project uses .NET Framework 4.6 or higher; however, the code in this article will work with previous versions. In the Program.cs file, add the following using directives: using System; using System.Threading; using static System.Console; Add the following code snippet below the Main method: static void PrintNumbers() { WriteLine("Starting..."); for (int i = 1; i < 10; i++) { WriteLine(i); } } Add the following code snippet inside the Main method: Thread t = new Thread(PrintNumbers); t.Start(); PrintNumbers(); Run the program. The output will be something like the following screenshot: How it works... In steps 1 and 2, we created a simple console application in C# using .Net Framework version 4.0. Then, in step 3, we included the System.Threading namespace, which contains all the types needed for the program. Then, we used the using static feature from C# 6.0, which allows us to use the System.Console type's static methods without specifying the type name. An instance of a program that is being executed can be referred to as a process. A process consists of one or more threads. This means that when we run a program, we always have one main thread that executes the program code. In step 4, we defined the PrintNumbers method, which will be used in both the main and newly created threads. Then, in step 5, we created a thread that runs PrintNumbers. When we construct a thread, an instance of the ThreadStart or ParameterizedThreadStart delegate is passed to the constructor. The C# compiler creates this object behind the scenes when we just type the name of the method we want to run in a different thread. Then, we start a thread and run PrintNumbers in the usual manner on the main thread. As a result, there will be two ranges of numbers from 1 to 10 randomly crossing each other. This illustrates that the PrintNumbers method runs simultaneously on the main thread and on the other thread. Pausing a thread This recipe will show you how to make a thread wait for some time without wasting operating system resources. Getting ready To work through this recipe, you will need Visual Studio 2015. There are no other prerequisites. How to do it... To understand how to make a thread wait without wasting operating system resources, perform the following steps: Start Visual Studio 2015. Create a new C# console application project. In the Program.cs file, add the following using directives: using System; using System.Threading; using static System.Console; using static System.Threading.Thread; Add the following code snippet below the Main method: static void PrintNumbers() { WriteLine("Starting..."); for (int i = 1; i < 10; i++) { WriteLine(i); } } static void PrintNumbersWithDelay() { WriteLine("Starting..."); for (int i = 1; i < 10; i++) { Sleep(TimeSpan.FromSeconds(2)); WriteLine(i); } } Add the following code snippet inside the Main method: Thread t = new Thread(PrintNumbersWithDelay); t.Start(); PrintNumbers(); Run the program. How it works... When the program is run, it creates a thread that will execute a code in the PrintNumbersWithDelay method. Immediately after that, it runs the PrintNumbers method. The key feature here is adding the Thread.Sleep method call to a PrintNumbersWithDelay method. It causes the thread executing this code to wait a specified amount of time (2 seconds in our case) before printing each number. While a thread sleeps, it uses as little CPU time as possible. As a result, we will see that the code in the PrintNumbers method, which usually runs later, will be executed before the code in the PrintNumbersWithDelay method in a separate thread. Making a thread wait This recipe will show you how a program can wait for some computation in another thread to complete to use its result later in the code. It is not enough to use Thread.Sleep method because we don't know the exact time the computation will take. Getting ready To work through this recipe, you will need Visual Studio 2015. There are no other prerequisites. How to do it... To understand how a program waits for some computation in another thread to complete in order to use its result later, perform the following steps: Start Visual Studio 2015. Create a new C# console application project. In the Program.cs file, add the following using directives: using System; using System.Threading; using static System.Console; using static System.Threading.Thread; Add the following code snippet below the Main method: static void PrintNumbersWithDelay() { WriteLine("Starting..."); for (int i = 1; i < 10; i++) { Sleep(TimeSpan.FromSeconds(2)); WriteLine(i); } } Add the following code snippet inside the Main method: WriteLine("Starting..."); Thread t = new Thread(PrintNumbersWithDelay); t.Start(); t.Join(); WriteLine("Thread completed"); Run the program. How it works... When the program is run, it runs a long-running thread that prints out numbers and waits two seconds before printing each number. But in the main program, we called the t.Join method, which allows us to wait for thread t to complete. When it is complete, the main program continues to run. With the help of this technique, it is possible to synchronize execution steps between two threads. The first one waits until another one is complete and then continues to work. While the first thread waits, it is in a blocked state (as it is in the previous recipe when you call Thread.Sleep). Summary In this article, we focused on performing some very basic operations with threads in the C# language. We covered a thread's life cycle, which includes creating a thread, suspending a thread, and making a thread wait. Resources for Article: Further resources on this subject: Simplifying Parallelism Complexity in C#[article] Watching Multiple Threads in C#[article] Debugging Multithreaded Applications as Singlethreaded in C#[article]
Read more
  • 0
  • 0
  • 1519

article-image-getting-started-d3-es2016-and-nodejs
Packt
08 Apr 2016
25 min read
Save for later

Getting Started with D3, ES2016, and Node.js

Packt
08 Apr 2016
25 min read
In this article by Ændrew Rininsland, author of the book Learning d3.js Data Visualization, Second Edition, we'll lay the foundations of what you'll need to run all the examples in the article. I'll explain how you can start writing ECMAScript 2016 (ES2016) today—which is the latest and most advanced version of JavaScript—and show you how to use Babel to transpile it to ES5, allowing your modern JavaScript to be run on any browser. We'll then cover the basics of using D3 to render a basic chart. (For more resources related to this topic, see here.) What is D3.js? D3 stands for Data-Driven Documents, and it is being developed by Mike Bostock and the D3 community since 2011. The successor to Bostock's earlier Protovis library, it allows pixel-perfect rendering of data by abstracting the calculation of things such as scales and axes into an easy-to-use domain-specific language (DSL). D3's idioms should be immediately familiar to anyone with experience of using the massively popular jQuery JavaScript library. Much like jQuery, in D3, you operate on elements by selecting them and then manipulating via a chain of modifier functions. Especially within the context of data visualization, this declarative approach makes using it easier and more enjoyable than a lot of other tools out there. The official website, https://d3js.org/, features many great examples that show off the power of D3, but understanding them is tricky at best. After finishing this article, you should be able to understand D3 well enough to figure out the examples. If you want to follow the development of D3 more closely, check out the source code hosted on GitHub at https://github.com/mbostock/d3. The fine-grained control and its elegance make D3 one of the most—if not the most—powerful open source visualization libraries out there. This also means that it's not very suitable for simple jobs such as drawing a line chart or two—in that case you might want to use a library designed for charting. Many use D3 internally anyway. One such interface is Axis, an open source app that I've written. It allows users to easily build basic line, pie, area, and bar charts without writing any code. Try it out at use.axisjs.org. As a data manipulation library, D3 is based on the principles of functional programming, which is probably where a lot of confusion stems from. Unfortunately, functional programming goes beyond the scope of this article, but I'll explain all the relevant bits to make sure that everyone's on the same page. What’s ES2016? One of the main changes in this edition is the emphasis on ES2016, the most modern version of JavaScript currently available. Formerly known as ES6 (Harmony), it pushes the JavaScript language's features forward significantly, allowing for new usage patterns that simplify code readability and increase expressiveness. If you've written JavaScript before and the examples in this article look pretty confusing, it means you're probably familiar with the older, more common ES5 syntax. But don't sweat! It really doesn't take too long to get the hang of the new syntax, and I will try to explain the new language features as we encounter them. Although it might seem a somewhat steep learning curve at the start, by the end, you'll have improved your ability to write code quite substantially and will be on the cutting edge of contemporary JavaScript development. For a really good rundown of all the new toys you have with ES2016, check out this nice guide by the folks at Babel.js, which we will use extensively throughout this article: https://babeljs.io/docs/learn-es2015/. Before I go any further, let me clear some confusion about what ES2016 actually is. Initially, the ECMAScript (or ES for short) standards were incremented by cardinal numbers, for instance, ES4, ES5, ES6, and ES7. However, with ES6, they changed this so that a new standard is released every year in order to keep pace with modern development trends, and thus we refer to the year (2016) now. The big release was ES2015, which more or less maps to ES6. ES2016 is scheduled for ratification in June 2016, and builds on the previous year's standard, while adding a few fixes and two new features. You don't really need to worry about compatibility because we use Babel.js to transpile everything down to ES5 anyway, so it runs the same in Node.js and in the browser. For the sake of simplicity, I will use the word "ES2016" throughout in a general sense to refer to all modern JavaScript, but I'm not referring to the ECMAScript 2016 specification itself. Getting started with Node and Git on the command line I will try not to be too opinionated in this article about which editor or operating system you should use to work through it (though I am using Atom on Mac OS X), but you are going to need a few prerequisites to start. The first is Node.js. Node is widely used for web development nowadays, and it's actually just JavaScript that can be run on the command line. If you're on Windows or Mac OS X without Homebrew, use the installer at https://nodejs.org/en/. If you're on Mac OS X and are using Homebrew, I would recommend installing "n" instead, which allows you to easily switch between versions of Node: $ brew install n $ n latest Regardless of how you do it, once you finish, verify by running the following lines: $ node --version $ npm --version If it displays the versions of node and npm (I'm using 5.6.0 and 3.6.0, respectively), it means you're good to go. If it says something similar to Command not found, double-check whether you've installed everything correctly, and verify that Node.js is in your $PATH environment variable. Next, you'll want to clone the article's repository from GitHub. Change to your project directory and type this: $ git clone https://github.com/aendrew/learning-d3 $ cd $ learning-d3 This will clone the development environment and all the samples in the learning-d3/ directory as well as switch you into it. Another option is to fork the repository on GitHub and then clone your fork instead of mine as was just shown. This will allow you to easily publish your work on the cloud, enabling you to more easily seek support, display finished projects on GitHub pages, and even submit suggestions and amendments to the parent project. This will help us improve this article for future editions. To do this, fork aendrew/learning-d3 and replace aendrew in the preceding code snippet with your GitHub username. Each chapter of this book is in a separate branch. To switch between them, type the following command: $ git checkout chapter1 Replace 1 with whichever chapter you want the examples for. Stay at master for now though. To get back to it, type this line: $ git stash save && git checkout master The master branch is where you'll do a lot of your coding as you work through this article. It includes a prebuilt package.json file (used by npm to manage dependencies), which we'll use to aid our development over the course of this article. There's also a webpack.config.js file, which tells the build system where to put things, and there are a few other sundry config files. We still need to install our dependencies, so let's do that now: $ npm install All of the source code that you'll be working on is in the src/ folder. You'll notice it contains an index.html and an index.js file; almost always, we'll be working in index.js, as index.html is just a minimal container to display our work in: <!DOCTYPE html> <div id="chart"></div> <script src="/assets/bundle.js"></script> To get things rolling, start the development server by typing the following line: $ npm start This starts up the Webpack development server, which will transform our ES2016 JavaScript into backwards-compatible ES5, which can easily be loaded by most browsers. In the preceding HTML, bundle.js is the compiled code produced by Webpack. Now point Chrome to localhost:8080 and fire up the developer console (Ctrl +Shift + J for Linux and Windows and Option + Command + J for Mac). You should see a blank website and a blank JavaScript console with a Command Prompt waiting for some code: A quick Chrome Developer Tools primer Chrome Developer Tools are indispensable to web development. Most modern browsers have something similar, but to keep this article shorter, we'll stick to Chrome here for the sake of simplicity. Feel free to use a different browser. Firefox's Developer Edition is particularly nice. We are mostly going to use the Elements and Console tabs, Elements to inspect the DOM and Console to play with JavaScript code and look for any problems. The other six tabs come in handy for large projects: The Network tab will let you know how long files are taking to load and help you inspect the Ajax requests. The Profiles tab will help you profile JavaScript for performance. The Resources tab is good for inspecting client-side data. Timeline and Audits are useful when you have a global variable that is leaking memory and you're trying to work out exactly why your library is suddenly causing Chrome to use 500 MB of RAM. While I've used these in D3 development, they're probably more useful when building large web applications with frameworks such as React and Angular. One of the favorites from Developer Tools is the CSS inspector at the right-hand side of the Elements tab. It can tell you what CSS rules are affecting the styling of an element, which is very good for hunting rogue rules that are messing things up. You can also edit the CSS and immediately see the results, as follows: The obligatory bar chart example No introductory chapter on D3 would be complete without a basic bar chart example. They are to D3 as "Hello World" is to everything else, and 90 percent of all data storytelling can be done in its simplest form with an intelligent bar or line chart. For a good example of this, look at the kinds of graphics The Economist includes with their articles—they frequently summarize the entire piece with a simple line chart. Coming from a newsroom development background, many of my examples will be related to some degree to current events or possible topics worth visualizing with data. The news development community has been really instrumental in creating the environment for D3 to flourish, and it's increasingly important for aspiring journalists to have proficiency in tools such as D3. The first dataset that we'll use is UNHCR's regional population data. The documentation for this endpoint is at data.unhcr.org/wiki/index.php/Get-population-regional.html. We'll create a bar for each population of displaced people. The first step is to get a basic container set up, which we can then populate with all of our delicious new ES2016 code. At the top of index.js, put the following code: export class BasicChart {   constructor(data) {     var d3 = require('d3'); // Require D3 via Webpack     this.data = data;     this.svg = d3.select('div#chart').append('svg');   } } var chart = new BasicChart(); If you open this in your browser, you'll get the following error on your console: Uncaught Error: Cannot find module "d3" This is because we haven't installed it yet. You’ll notice on line 3 of the preceding code that we import D3 by requiring it. If you've used D3 before, you might be more familiar with it attached to the window global object. This is essentially the same as including a script tag that references D3 in your HTML document, the only difference being that Webpack uses the Node version and compiles it into your bundle.js. To install D3, you use npm. In your project directory, type the following line: $ npm install d3 --save This will pull the latest version of D3 from npmjs.org to the node_modules directory and save it in your package.json file. The package.json file is really useful; instead of keeping all your dependencies inside of your Git repository, you can easily redownload them all just by typing this line: $ npm install If you go back to your browser and switch quickly to the Elements tab, you'll notice a new SVG element as a child of #chart. Go back to index.js. Let's add a bit more to the constructor before I explain what's going on here: export class BasicChart {   constructor(data) {     var d3 = require('d3'); // Require D3 via Webpack     this.data = data;     this.svg = d3.select('div#chart').append('svg');     this.margin = {       left: 30,       top: 30,       right: 0,       bottom: 0     };     this.svg.attr('width', window.innerWidth);     this.svg.attr('height', window.innerHeight);     this.width = window.innerWidth - this.margin.left - this.margin.right;     this.height = window.innerHeight - this.margin.top - this.margin.bottom;     this.chart = this.svg.append('g')       .attr('width', this.width)       .attr('height', this.height) .attr('transform', `translate(${this.margin.left}, ${this.margin.top})`);   } } Okay, here we have the most basic container you'll ever make. All it does is attach data to the class: this.data = data; This selects the #chart element on the page, appending an SVG element and assigning it to another class property: this.svg = d3.select('div#chart').append('svg'); Then it creates a third class property, chart, as a group that's offset by the margins: this.width = window.innerWidth - this.margin.left - this.margin.right;   this.height = window.innerHeight - this.margin.top - this.margin.bottom;   this.chart = svg.append('g')     .attr('width', this.width)     .attr('height', this.height)     .attr('transform', `translate(${this.margin.left}, ${this.margin.top})`); Notice the snazzy new ES2016 string interpolation syntax—using `backticks`, you can then echo out a variable by enclosing it in ${ and }. No more concatenating! The preceding code is not really all that interesting, but wouldn't it be awesome if you never had to type that out again? Well! Because you're the total boss and are learning ES2016 like all the cool kids, you won't ever have to. Let's create our first child class! We're done with BasicChart for the moment. Now, we want to create our actual bar chart class: export class BasicBarChart extends BasicChart {   constructor(data) {     super(data);   } } This is probably very confusing if you're new to ES6. First off, we're extending BasicChart, which means all the class properties that we just defined a minute ago are now available for our BasicBarChart child class. However, if we instantiate a new instance of this, we get the constructor function in our child class. How do we attach the data object so that it's available for both BasicChart and BasicBarChart? The answer is super(), which merely runs the constructor function of the parent class. In other words, even though we don't assign data to this.data as we did previously, it will still be available there when we need it. This is because it was assigned via the parent constructor through the use of super(). We're almost at the point of getting some bars onto that graph; hold tight! But first, we need to define our scales, which decide how D3 maps data to pixel values. Add this code to the constructor of BasicBarChart: let x = d3.scale.ordinal()   .rangeRoundBands([this.margin.left, this.width - this.margin.right], 0.1); The x scale is now a function that maps inputs from an as-yet-unknown domain (we don't have the data yet) to a range of values between this.margin.left and this.width - this.margin.right, that is, between 30 and the width of your viewport minus the right margin, with some spacing defined by the 0.1 value. Because it's an ordinal scale, the domain will have to be discrete rather than continuous. The rangeRoundBands means the range will be split into bands that are guaranteed to be round numbers. Hoorah! We have fit our first new fancy ES2016 feature! The let is the new var—you can still use var to define variables, but you should use let instead because it's limited in scope to the block, statement, or expression on which it is used. Meanwhile, var is used for more global variables, or variables that you want available regardless of the block scope. For more on this, visit http://mdn.io/let. If you have no idea what I'm talking about here, don't worry. It just means that you should define variables with let because they're more likely to act as you think they should and are less likely to leak into other parts of your code. It will also throw an error if you use it before it's defined, which can help with troubleshooting and preventing sneaky bugs. Still inside the constructor, we define another scale named y: let y = d3.scale.linear().range([this.height, this.margin.bottom]); Similarly, the y scale is going to map a currently unknown linear domain to a range between this.height and this.margin.bottom, that is, your viewport height and 30. Inverting the range is important because D3.js considers the top of a graph to be y=0. If ever you find yourself trying to troubleshoot why a D3 chart is upside down, try switching the range values. Now, we define our axes. Add this just after the preceding line, inside the constructor: let xAxis = d3.svg.axis().scale(x).orient('bottom'); let yAxis = d3.svg.axis().scale(y).orient('left'); We've told each axis what scale to use when placing ticks and which side of the axis to put the labels on. D3 will automatically decide how many ticks to display, where they should go, and how to label them. Now the fun begins! We're going to load in our data using Node-style require statements this time around. This works because our sample dataset is in JSON and it's just a file in our repository. For now, this will suffice for our purposes—no callbacks, promises, or observables necessary! Put this at the bottom of the constructor: let data = require('./data/chapter1.json'); Once or maybe twice in your life, the keys in your dataset will match perfectly and you won't need to transform any data. This almost never happens, and today is not one of those times. We're going to use basic JavaScript array operations to filter out invalid data and map that data into a format that's easier for us to work with: let totalNumbers = data.filter((obj) => { return obj.population.length;   })   .map(     (obj) => {       return {         name: obj.name,         population: Number(obj.population[0].value)       };     }   ); This runs the data that we just imported through Array.prototype.filter, whereby any elements without a population array are stripped out. The resultant collection is then passed through Array.prototype.map, which creates an array of objects, each comprised of a name and a population value. We've turned our data into a list of two-value dictionaries. Let's now supply the data to our BasicBarChart class and instantiate it for the first time. Consider the line that says the following: var chart = new BasicChart(); Replace it with this line: var myChart = new BasicBarChart(totalNumbers); The myChart.data will now equal totalNumbers! Go back to the constructor in the BasicBarChart class. Remember the x and y scales from before? We can finally give them a domain and make them useful. Again, a scale is a simply a function that maps an input range to an output domain: x.domain(data.map((d) => { return d.name })); y.domain([0, d3.max(data, (d) => { return d.population; })]); Hey, there's another ES2016 feature! Instead of typing function() {} endlessly, you can now just put () => {} for anonymous functions. Other than being six keystrokes less, the "fat arrow" doesn't bind the value of this to something else, which can make life a lot easier. For more on this, visit http://mdn.io/Arrow_functions. Since most D3 elements are objects and functions at the same time, we can change the internal state of both scales without assigning the result to anything. The domain of x is a list of discrete values. The domain of y is a range from 0 to the d3.max of our dataset—the largest value. Now we're going to draw the axes on our graph: this.chart.append('g')         .attr('class', 'axis')         .attr('transform', `translate(0, ${this.height})`)         .call(xAxis); We've appended an element called g to the graph, given it the axis CSS class, and moved the element to a place in the bottom-left corner of the graph with the transform attribute. Finally, we call the xAxis function and let D3 handle the rest. The drawing of the other axis works exactly the same, but with different arguments: this.chart.append('g')         .attr('class', 'axis')         .attr('transform', `translate(${this.margin.left}, 0)`)         .call(yAxis); Now that our graph is labeled, it's finally time to draw some data: this.chart.selectAll('rect')         .data(data)         .enter()         .append('rect')         .attr('class', 'bar')         .attr('x', (d) => { return x(d.name); })         .attr('width', x.rangeBand())         .attr('y', (d) => { return y(d.population); })         .attr('height', (d) => { return this.height - y(d.population); }); Okay, there's plenty going on here, but this code is saying something very simple. This is what is says: For all rectangles (rect) in the graph, load our data Go through it For each item, append a rect Then define some attributes Ignore the fact that there aren't any rectangles initially; what you're doing is creating a selection that is bound to data and then operating on it. I can understand that it feels a bit weird to operate on non-existent elements (this was personally one of my biggest stumbling blocks when I was learning D3), but it's an idiom that shows its usefulness later on when we start adding and removing elements due to changing data. The x scale helps us calculate the horizontal positions, and rangeBand gives the width of the bar. The y scale calculates vertical positions, and we manually get the height of each bar from y to the bottom. Note that whenever we needed a different value for every element, we defined an attribute as a function (x, y, and height); otherwise, we defined it as a value (width). Keep this in mind when you're tinkering. Let's add some flourish and make each bar grow out of the horizontal axis. Time to dip our toes into animations! Modify the code you just added to resemble the following. I've highlighted the lines that are different: this.chart.selectAll('rect')   .data(data)   .enter()   .append('rect')   .attr('class', 'bar')   .attr('x', (d) => { return x(d.name); })   .attr('width', x.rangeBand())   .attr('y', () => { return y(this.margin.bottom); })   .attr('height', 0)   .transition()     .delay((d, i) => { return i*20; })     .duration(800)     .attr('y', (d) => { return y(d.population); })     .attr('height', (d) => {          return this.height - y(d.population);       }); The difference is that we statically put all bars at the bottom (margin.bottom) and then entered a transition with .transition(). From here on, we define the transition that we want. First, we wanted each bar's transition delayed by 20 milliseconds using i*20. Most D3 callbacks will return the datum (or "whatever data has been bound to this element," which is typically set to d) and the index (or the ordinal number of the item currently being evaluated, which is typically i) while setting the this argument to the currently selected DOM element. Because of this last point, we use the fat arrow—so that we can still use the class this.height property. Otherwise, we'd be trying to find the height property on our SVGRect element, which we're midway to trying to define! This gives the histogram a neat effect, gradually appearing from left to right instead of jumping up at once. Next, we say that we want each animation to last just shy of a second, with .duration(800). At the end, we define the final values for the animated attributes—y and height are the same as in the previous code—and D3 will take care of the rest. Save your file and the page should auto-refresh in the background. If everything went according to the plan, you should have a chart that looks like the following: According to this UNHCR data from June 2015, by far the largest number of displaced persons are from Syria. Hey, look at this—we kind of just did some data journalism here! Remember that you can look at the entire code on GitHub at http://github.com/aendrew/learning-d3/tree/chapter1 if you didn't get something similar to the preceding screenshot. We still need to do just a bit more, mainly by using CSS to style the SVG elements. We could have just gone to our HTML file and added CSS, but then that means opening that yucky index.html file. And where's the fun in writing HTML when we're learning some newfangled JavaScript?! First, create an index.css file in your src/ directory: html, body {   padding: 0;   margin: 0; }   .axis path, .axis line {   fill: none;   stroke: #eee;   shape-rendering: crispEdges; }   .axis text {   font-size: 11px; }   .bar {   fill: steelblue; } Then just add the following line to index.js: require('./index.css'); I know. Crazy, right?! No <style> tags needed! It's worth noting that anything involving require is the result of a Webpack loader; in this article, we've used both the CSS/Style and JSON loaders. Although the author of this text is a fan of Webpack, all we're doing is compiling the styles into bundle.js with Webpack instead of requiring them globally via a <style> tag. This is cool because instead of uploading a dozen files when deploying your finished code, you effectively deploy one optimized bundle. You can also scope CSS rules to be particular to when they’re being included and all sorts of other nifty stuff; for more information, refer to github.com/webpack/css-loader#local-scope. Looking at the preceding CSS, you can now see why we added all those classes to our shapes—we can now directly reference them when styling with CSS. We made the axes thin, gave them a light gray color, and used a smaller font for the labels. The bars should be light blue. Save and wait for the page to refresh. We've made our first D3 chart! I recommend fiddling with the values for width, height, and margin inside of BasicChart to get a feel of the power of D3. You'll notice that everything scales and adjusts to any size without you having to change other code. Smashing! Summary In this article, you learned what D3 is and took a glance at the core philosophy behind how it works. You also set up your computer for prototyping of ideas and to play with visualizations. This environment will be assumed throughout the article. We went through a simple example and created an animated histogram using some of the basics of D3. You learned about scales and axes, that the vertical axis is inverted, that any property defined as a function is recalculated for every data point, and that we use a combination of CSS and SVG to make things beautiful. We also did a lot of fancy stuff with ES2016, Babel, and Webpack and got Node.js installed. Go us! Most of all, this article has given you the basic tools so that you can start playing with D3.js on your own. Tinkering is your friend! Don't be afraid to break stuff—you can always reset to a chapter's default state by running $ git reset --soft origin/chapter1, replacing 1 with whichever chapter you're on. Next, we'll be looking at all this a bit more in depth, specifically how the DOM, SVG, and CSS interact with each other. This article discussed quite a lot, so if some parts got away from you, don't worry. Resources for Article: Further resources on this subject: An Introduction to Node.js Design Patterns [article] Developing Node.js Web Applications [article] Developing a Basic Site with Node.js and Express [article]
Read more
  • 0
  • 0
  • 2049

article-image-selecting-and-analyzing-digital-evidence
Packt
08 Apr 2016
13 min read
Save for later

Selecting and Analyzing Digital Evidence

Packt
08 Apr 2016
13 min read
In this article, Richard Boddington, the author of Practical Digital Forensics, explains how the recovery and preservation of digital evidence has traditionally involved imaging devices and storing the data in bulk in a forensic file or, more effectively, in a forensic image container, notably the IlookIX .ASB container. The recovery of smaller, more manageable datasets from larger datasets from a device or network system using the ISeekDiscovery automaton is now a reality. Whether the practitioner examines an image container or an extraction of information in the ISeekDiscovery container, it should be possible to overview the recovered information and develop a clearer perception of the type of evidence that should be located. Once acquired, the image or device may be searched to find evidence, and locating evidence requires a degree of analysis combined with practitioner knowledge and experience. The process of selection involves analysis, and as new leads open up, the search for more evidence intensifies until ultimately, a thorough search is completed. The searching process involves the analysis of possible evidence, from which evidence may be discarded, collected, or tagged for later reexamination, thereby instigating the selection process. The final two stages of the investigative process are the validation of the evidence, aimed at determining its reliability, relevance, authenticity, accuracy, and completeness, and finally, the presentation of the evidence to interested parties, such as the investigators, the legal team, and ultimately, the legal adjudicating body. (For more resources related to this topic, see here.) Locating digital evidence Locating evidence from the all-too-common large dataset requires some filtration of extraneous material, which has, until recently, been a mainly manual task of sorting the wheat from the chaff. But it is important to clear the clutter and noise of busy operating systems and applications from which only a small amount of evidence really needs to be gleaned. Search processes involve searching in a file system and inside files, and common searches for files are based on names or patterns in their names, keywords in their content, and temporal data (metadata) such as the last access or written time. A pragmatic approach to the examination is necessary, where the onus is on the practitioner to create a list of key words or search terms to cull specific, probative, and case-related information from very large groups of files. Searching desktops and laptops Home computer networks are normally linked to the Internet via a modem and various peripheral devices: a scanner, printer, external hard drive, thumb drive storage device, a digital camera, a mobile phone and a range of users. In an office network this would be a more complicated network system. The linked connections between the devices and the Internet with the terminal leave a range of traces and logging records in the terminal and on some of the devices and the Internet. E-mail messages will be recorded externally on the e-mail server, the printer may keep a record of print jobs, the external storage devices and the communication media also leave logs and data linked to the terminal. All of this data may assist in the reconstruction of key events and provide evidence related to the investigation. Using the logical examination process (booting up the image) it is possible to recover a limited number of deleted files and reconstruct some of the key events of relevance to an investigation. It may not always be possible to boot up a forensic image and view it in its logical format, which is easier and more familiar to users. However, viewing the data inside a forensic image in it physical format provides unaltered metadata and a greater number of deleted, hidden and obscured files that provide accurate information about applications and files. It is possible to view the containers that hold these histories and search records that have been recovered and stored in a forensic file container. Selecting digital evidence For those unfamiliar with investigations, it is quite common to misread the readily available evidence and draw incorrect conclusions. Business managers attempting to analyze what they consider are the facts of a case would be wise to seek legal assistance in selecting and evaluating evidence on which they may wish to base a case. Selecting the evidence involves analysis of the located evidence to determine what events occurred in the system, their significance, and the probative value to the case. The selection analysis stage requires the practitioner to carefully examine the available digital evidence ensuring that they do not misinterpret the evidence and make imprudent presumptions without carefully cross-checking the information. It is a fact-finding process where an attempt is made to develop a plausible reconstruction of the facts. As in conventional crime investigations, practitioners should look for evidence that suggests or indicates motive (why?), means (how?) and opportunity (when?) for suspects to commit the crime, but in cases dependent on digital evidence, it can be a vexatious process. There are often too many potential suspects, which complicates the process of linking the suspect to the events. The following figure shows a typical family network setup using Wi-Fi connections to the home modem that facilitates connection to the Internet. In this case, the parents provided the broadband service for themselves and for three other family members. One of the children's girlfriend completed her university assignments on his computer and synchronized her iPad to his device. The complexity of a typical household network and determining the identity of the transgressor The complexity of a typical household network and determining the identity of the transgressor More effective forensic tools Various forensic tools are available to assist the practitioner in selecting and collating data for examination analysis and investigation. Sorting order from the chaos of even a small personal computer can be a time-consuming and frustrating process. As the digital forensic discipline develops, better and more reliable forensic tools have been developed to assist practitioners in locating, selecting, and collating evidence from larger, complex datasets. To varying degrees, most digital forensic tools used to view and analyze forensic images or attached devices provide helpful user interfaces for locating and categorizing information relevant to the examination. The most advanced application that provides access and convenient viewing of files is the Category Explorer feature in ILookIX, which divides files by type, signature, and properties. Category Explorer also allows the practitioner to create custom categories to group files by relevance. For example, in a criminal investigation involving a conspiracy, the practitioner could create a category for the first individual and a category for the second individual. As files are reviewed, they would then be added to either or both categories. Unlike tags, files can be added to multiple categories, and the categories can be given descriptive names. Deconstructing files The deconstruction of files involves processing compound files such as archives, e-mail stores, registry stores, or other files to extract useful and usable data from a complex file format and generate reports. Manual deconstruction adds significantly to the time taken to complete an examination. Deconstructable files are compound files that can be further broken down into smaller parts such as e-mails, archives, or thumb stores of JPG files. Once the deconstruction is completed, the files will either move into the deconstructed files or deconstruction failed files folders. Deconstructable files will now be now broken out more—e-mail, graphics, archives, and so on. Searching for files Indexing is the process of generating a table of text strings that can then be searched almost instantly any number of times. The two main uses of indexing are to create a dictionary to use when cracking passwords and to index the words for almost-instant searching. Indexing is also valuable when creating a dictionary or using any of the analysis functions built in to ILookIX. ILookIX facilitates the indexing of the entire media at the time of initial processing, all at once. This can also be done after processing. Indexing facilitates searching through files and archives, Windows Registry, e-mail lists, and unallocated space. This function is highly customizable via the setup option in order to optimize for searching or for creating a custom dictionary for password cracking. Sound indexing ensures speedy and accurate searching. Searching is the process of having ILookIX look through the evidence for a specific item, such as a string of text or an expression. An expression, in terms of searching, is a pattern used to structure data in a search, such as a credit card number or e-mail address. The Event Analysis tool ILookIX's Event Analysis tool provides the practitioner a graphical representation of events on the subject system, such as file creation, access, or modification; e-mails sent or received; and other events such as the modification of the Master File Table on an NTFS system. The application allows the practitioner to zoom in on any point on the graph to view more specific details. Clicking on any bar on the graph will return the view to the main ILookIX window and display the items from the date bar selected in the List Pane. This can be most helpful when analyzing events during specific periods. The Lead Analysis tool Lead Analysis is an interactive evidence model embedded in ILookIX that allows the practitioner to assimilate known facts into a graphic representation that directly links unseen objects. It provides the answers as the practitioner increases the detail of the design surface and brings into view specific relationships that could go unseen otherwise. The primary aim of Lead Analysis is to help discover links within the case data that may not be evident or intuitive and the practitioner may not be aware of directly or that the practitioner has little background knowledge of to help form relationships manually. Instead of finding and making note of various pieces of information, the analysis is presented as an easy-to-use link model. The complexity of the modeling is removed so that it gives the clearest possible method of discovery. The analysis is based on the current index database, so it is essential to index case data prior to initiating an analysis. Once a list of potential links has been generated, it is important to review them to see whether any are potentially relevant. Highlight any that are, and it will then be possible to look for words in the catalogues if they have been included. In the example scenario, the word divorce was located as it was known that Sarah was divorced from the owner of the computer (the initial suspect). By selecting any word by left-clicking on it once and clicking on the green arrow to link it to Sarah, as shown below, relationships can be uncovered that are not always clear during the first inspection of the data. Each of the stated facts becomes one starting lead on the canvas. If the nodes are related, it is easy to model that relationship by manually linking them together by selecting the first Lead Project to link, right-clicking, and selecting Add a New Port from the menu. This is then repeated for the second Lead Object the practitioner wants to link. By simply clicking on the new port of the selected object that needs to be linked from and dragging to the port of the Lead Object that it should be linked to, a line will appear linking the two together. It is then possible to iterate this process using each start node or discovered node until it is possible to make sense of the total case data. A simple relationship between suspects, locations and even concepts is illustrated in the following screenshot: ILookIX Lead Analysis discovering relationships between various entities ILookIX Lead Analysis discovering relationships between various entities Analyzing e-mail datasets Analyzing and selecting evidence from large e-mail datasets is a common task for the practitioner. ILookIX's embedded application E-mail Linkage Analysis is an interactive evidence model to help practitioners discover links between the correspondents within e-mail data. The analysis is presented as an easy-to-use link model; the complexity of the modeling is removed to provide the clearest possible method of discovery. The results of analysis are saved at the end of the modeling session for future editing. If there is a large amount of e-mail to process, this analysis generation may take a few minutes. Once the analysis is displayed, the user will see the e-mail linkage itself. It is then possible to see a line between correspondents indicating that they have a relationship of some type. Here in particular, line thickness indicates the frequency of traffic between two correspondents; therefore, thicker flow lines indicate more traffic. On the canvas, once the analysis is generated, the user may select any e-mail addressee node by left-clicking on it once. Creating the analysis is really simple, and one of the most immediately valuable resources this provides is group identification, as shown in the following screenshot. ILookIX will initiate a search for that addressee and list all e-mails where your selected addressee was a correspondent. Users may make their own connection lines by clicking on an addressee node point and dragging to another node point. Nodes can be deleted to allow linkage between smaller groups of individuals. The E-mail Linkage tool showing relationships of possible relevance to a case The E-mail Linkage tool showing relationships of possible relevance to a case The Volume Shadow Copy analysis tools Shadow volumes, also known as the Volume Snapshot Service (VSS), use a service that creates point-in-time copies of files. The service is built in to versions of Windows Vista, 7, 8, and 10 and is turned on by default. ILookIX can recover true copies of overwritten files from shadow volumes, as long as they resided on the volume at the time that the snapshot was created. VSS recovery is a method of recovering extant and deleted files from the volume snapshots available on the system. IlookIX, unlike any other forensic tool, is capable of reconstructing volume shadow copies, either differential or full, including deleted files and folders. In the test scenario, the tool recovered a total of 87,000 files, equating to conventional tool recovery rates. Using ILookIX's Xtreme File Recovery, some 337,000 files were recovered. The Maximal Full Volume Shadow Snapshot application recovered a total of 797,00 files. Using the differential process, 354,000 files were recovered, which filtered out 17,000 additional files for further analysis. This enabled the detection of e-mail messages and attachments and Windows Registry changes that would normally remain hidden. Summary This article described in detail the process of locating and selecting evidence in terms of a general process. It also further explained the nature of digital evidence and provided examples of its value in supporting a legal case. Various advanced analysis and recovery tools were demonstrated that show the reader how technology can speed up and make more efficient the location and selection processes. Some of these tools are not new but have been enhanced, while others are innovative and seek out evidence normally unavailable to the practitioner. Resources for Article: Further resources on this subject: Mobile Phone Forensics – A First Step into Android Forensics [article] Introduction to Mobile Forensics [article] BackTrack Forensics [article]
Read more
  • 0
  • 0
  • 17087
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at ₹800/month. Cancel anytime
article-image-building-custom-widgets
Packt
08 Apr 2016
7 min read
Save for later

Building Custom Widgets

Packt
08 Apr 2016
7 min read
This article by Yogesh Dhanapal and Jayakrishnan Vijayaraghavan, authors of the book ArcGIS for JavaScript developers by Example, will develop a custom widget. (For more resources related to this topic, see here.) Building a custom widget Let's create a custom widget in the app, which will do the following: Allow the user to draw a polygon on the map. The polygon should be symbolized with a semitransparent red fill with a dashed yellow outline. The polygon should fetch all the major wild fire events within the boundary of the polygon. This shall be shown as highlighted in graphics and the data should in a grid. Internationalization support must be provided. Modules required for the widget Let's list the modules required to define class and their corresponding intended callback function decoration The modules for Class declaration and OOPS are illustrated in the following table: Modules Callback functions dojo/_base/declare declare dijit/_WidgetBase _WidgetBase dojo/_base/lang lang The modules for using HTML templates are illustrated in the following table: Modules Callback functions dijit/_TemplatedMixin _TemplatedMixin dojo/text! dijitTemplate The modules for using Event is illustrated in the following table: Modules Callback functions dojo/on on dijit/a11yclick a11yclick The modules for manipulating dom elements and their style are illustrated in the following table: Modules Callback functions dojo/dom-style domStyle dojo/dom-class domClass dojo/domReady! - Modules for using draw toolbar and displaying graphics Modules Callback functions esri/toolbars/draw Draw esri/symbols/SimpleFillSymbol SimpleFillSymbol esri/symbols/SimpleLineSymbol SimpleLineSymbol esri/graphic Graphic dojo/_base/Color Color Modules for querying data Modules Callback functions esri/tasks/query Query esri/tasks/QueryTask QueryTask Modules for internationalization support Module Callback functions dojo/i18n! nls Using the draw toolbar Draw toolbar enables us to draw graphics on the map. Draw toolbar has events associated with it. When a draw operation is completed, it returns the object drawn on the map as geometry. Perform the following steps to create a graphic using the draw toolbar: Initiating Draw toolbar The draw toolbar is provided by the module esri/toolbars/draw. The draw toolbar accepts the map object as an argument. Instantiate the draw toolbar within the postCreate function. The draw toolbar also accepts an additional optional argument named options. One of the properties in the options object is named showTooltips. This can be set to true so that we can see a tooltip associated while drawing. The text in the tooltip can be customized. Else, a default tooltip associated with draw geometry is displayed: return declare([_WidgetBase, _TemplatedMixin], { //assigning html template to template string templateString: dijitTemplate, isDrawActive: false, map: null, tbDraw: null, constructor: function (options, srcRefNode) { this.map = options.map; }, startup: function () {}, postCreate: function () { this.inherited(arguments); this.tbDraw = new Draw(this.map, {showTooltips : true}); } The Draw toolbar can be activated on the click event or touch event (in case of smartphones or tablets) of a button, which is intended to indicate the start of a draw event. Dojo provides a module that takes care of touch as well as click events. The module is named dijit/a11yclick. To activate the draw toolbar, we need to provide the type of symbol to draw. The draw toolbar provides a list of constants, which corresponds to the type of draw symbol. These constants are POINT, POLYGON, LINE, POLYLINE, FREEHAND_POLYGON, FREEHAND_POLYLINE, MULTI_POINT, RECTANGLE, TRIANGLE, CIRCLE, ELLIPSE, ARROW, UP_ARROW, DOWN_ARROW, LEFT_ARROW, and RIGHT_ARROW. While activating the draw toolbar, these constants must be used to define the type of Draw operation required. Our objective is to draw a polygon on the click of a draw button. The code is shown in the following screenshot: The draw operation Once the draw tool bar is activated, the draw operation will begin. For point geometry, the draw operation is just a single click. For a polyline and a polygon, the single click adds a vertex to the polyline and a double-click ends the sketch. For freehand polyline or polygon, the click-and-drag operation draws the geometry and a mouse-up operation ends the drawing. The draw-end event handler When the draw operation is complete, we need an event handler to do something with the shape that was drawn by the draw toolbar. The API provides a draw-end event, which is fired once the draw operation is complete. This event handler must be connected to the draw toolbar. This event handler shall be defined within the this.own() function inside the postCreate() method of the widget. The event result can be passed to a named function or an anonymous function: postCreate: function () { ... this.tbDraw.on("draw-end", lang.hitch(this, this.querybyGeometry)); }, ... querybyGeometry: function (evt) { this.isBusy(true); //Get the Drawn geometry var geometryInput = evt.geometry; ... } Symbolizing the drawn shape In the draw-end event call back function, we will get the geometry of the drawn shape as the result object. To add this geometry back to the map, we need to symbolize it. A symbol is associated with the geometry it symbolizes. Also, the styling of the symbol is defined by the colors or picture used to fill up the symbol and the size of the symbol. Just to symbolize a polygon, we need to use the SimpleFillSymbol and the SimpleLineSymbol modules. We may also need the esri/color module to define the fill colors. Let's review a snippet to understand this better. This is a simple snippet to construct a symbol for a polygon with semitransparent solid red color fill and a yellow dash-dot line. In the preceding snippet, SimpleFillSymbol.STYLE_SOLID and SimpleLineSymbol.STYLE_DASHDOT are the constants provided by the SimpleFIllSymbol and the SimpleLineSymbol modules, respectively. These constants are used for styling the polygon and the line. Two colors are defined in the construction of the symbol—one for filling up the polygon and the other for coloring the outline. A color can be defined by four components. They are as follows: Red Green Blue Opacity Red, Green, and Blue components take values from 0 to 255 and the Opacity takes values from 0 to 1. A combination of Red, Green, and Blue components can be used to produce any color according to the RGB color theory. So, to create a yellow color, we are using the maximum of Red component (255) and the maximum of Green Component (255); we don't want the Blue component to contribute to our color, so we will use 0. An Opacity value of 0 means 100% transparency and an opacity value of 1 means 100% opaqueness. We have used 0.2 for the fill color. This means that we need our polygon to be 20% opaque or 80% transparent. The default value for this component is 1. Symbol is just a generic object. It means that any polygon geometry can use the symbol to render itself. Now, we need a container object to display the drawn geometry with the previously defined symbol on the map. A Graphic object provided by the esri/Graphic module acts as a container object, which can accept a geometry and a symbol. The graphic object can be added to the map's graphic layer. A graphic layer is always present in the map object, which can be accessed by using the graphics property of the map (this.map.graphics). Summary In this article, we learned how to create classes and customized widget and its required modules, and how to use a draw toolbar. Resources for Article: Further resources on this subject: Using JavaScript with HTML[article] Learning to Create and Edit Data in ArcGIS[article] Introduction to Mobile Web ArcGIS Development[article]
Read more
  • 0
  • 0
  • 2384

article-image-using-native-sdks-and-libraries-react-native
Emilio Rodriguez
07 Apr 2016
6 min read
Save for later

Using Native SDKs and Libraries in React Native

Emilio Rodriguez
07 Apr 2016
6 min read
When building an app in React Native we may end up needing to use third-party SDKs or libraries. Most of the time, these are only available in their native version, and, therefore, only accessible as Objective-C or Swift libraries in the case of iOS apps or as Java Classes for Android apps. Only in a few cases these libraries are written in JavaScript and even then, they may need pieces of functionality not available in React Native such as DOM access or Node.js specific functionality. In my experience, this is one of the main reasons driving developers and IT decision makers in general to run away from React Native when considering a mobile development framework for their production apps. The creators of React Native were fully aware of this potential pitfall and left a door open in the framework to make sure integrating third-party software was not only possible but also quick, powerful, and doable by any non-iOS/Android native developer (i.e. most of the React Native developers). As a JavaScript developer, having to write Objective-C or Java code may not be very appealing in the beginning, but once you realize the whole process of integrating a native SDK can take as little as eight lines of code split in two files (one header file and one implementation file), the fear quickly fades away and the feeling of being able to perform even the most complex task in a mobile app starts to take over. Suddenly, the whole power of iOS and Android can be at any React developer’s disposal. To better illustrate how to integrate a third-party SDK we will use one of the easiest to integrate payment providers: Paymill. If we take a look at their site, we notice that only iOS and Android SDKs are available for mobile payments. That should leave out every app written in React Native if it wasn’t for the ability of this framework to communicate with native modules. For the sake of convenience I will focus this article on the iOS module. Step 1: Create two native files for our bridge. We need to create an Objective-C class, which will serve as a bridge between our React code and Paymill’s native SDK. Normally, an Objective-C class is made out of two files, a .m and a .h, holding the module implementation and the header for this module respectively. To create the .h file we can right-click on our project’s main folder in XCode > New File > Header file. In our case, I will call this file PaymillBridge.h. For React Native to communicate with our bridge, we need to make it implement the RTCBridgeModule included in React Native. To do so, we only have to make sure our .h file looks like this: // PaymillBridge.h #import "RCTBridgeModule.h" @interface PaymillBridge : NSObject <RCTBridgeModule> @end We can follow a similar process to create the .m file: Right-click our project’s main folder in XCode > New File > Objective-C file. The module implementation file should include the RCT_EXPORT_MODULE macro (also provided in any React Native project): // PaymillBridge.m @implementation PaymillBridge RCT_EXPORT_MODULE(); @end A macro is just a predefined piece of functionality that can be imported just by calling it. This will make sure React is aware of this module and would make it available for importing in your app. Now we need to expose the method we need in order to use Paymill’s services from our JavaScript code. For this example we will be using Paymill’s method to generate a token representing a credit card based on a public key and some credit card details: generateTokenWithPublicKey. To do so, we need to use another macro provided by React Native: RCT_EXPORT_METHOD. // PaymillBridge.m @implementation PaymillBridge RCT_EXPORT_MODULE(); RCT_EXPORT_METHOD(generateTokenWithPublicKey: (NSString *)publicKey cardDetails:(NSDictionary *)cardDetails callback:(RCTResponseSenderBlock)callback) { //… Implement the call as described in the SDK’s documentation … callback(@[[NSNull null], token]); } @end In this step we will have to write some Objective-C but most likely it would be a very simple piece of code using the examples stated in the SDK’s documentation. One interesting point is how to send data from the native SDK to our React code. To do so you need to pass a callback as you can see I did as the last parameter of our exported method. Callbacks in React Native’s bridges have to be defined as RCTResponseSenderBlock. Once we do this, we can call this callback passing an array of parameters, which will be sent as parameters for our JavaScript function in React Native (in our case we decided to pass two parameters back: an error set to null following the error handling conventions of node.js, and the token generated by Paymill natively). Step 2: Call our bridge from our React Native code. Once the module is properly set up, React Native makes it available in our app just by importing it from our JavaScript code: // PaymentComponent.js var Paymill = require('react-native').NativeModules.PaymillBridge; Paymill.generateTokenWithPublicKey( '56s4ad6a5s4sd5a6', cardDetails, function(error, token){ console.log(token); }); NativeModules holds the list of modules we created implementing the RCTBridgeModule. React Native makes them available by the name we chose for our Objective-C class name (PaymillBridge in our example). Then, we can call any exported native method as a normal JavaScript method from our React Native Component or library. Going Even Further That should do it for any basic SDK, but React Native gives developers a lot more control on how to communicate with native modules. For example, we may want to force the module to be run in the main thread. For that we just need to add an extra method to our native module implementation: // PaymillBridge.m @implementation PaymillBridge //... - (dispatch_queue_t)methodQueue { return dispatch_get_main_queue(); } Just by adding this method to our PaymillBridge.m React Native will force all the functionality related to this module to be run on the main thread, which will be needed when running main-thread-only iOS API. And there is more: promises, exporting constants, sending events to JavaScript, etc. More complex functionality can be found in the official documentation of React Native; the topics covered on this article, however, should solve 80 percent of the cases when implementing most of the third-party SDKs. About the Author Emilio Rodriguez started working as a software engineer for Sun Microsystems in 2006. Since then, he has focused his efforts on building a number of mobile apps with React Native while contributing to the React Native project. These contributions helped his understand how deep and powerful this framework is.
Read more
  • 0
  • 2
  • 33600

article-image-how-use-currying-swift-fun-and-profit
Alexander Altman
06 Apr 2016
5 min read
Save for later

How to Use Currying in Swift for Fun and Profit

Alexander Altman
06 Apr 2016
5 min read
Swift takes inspiration from functional languages in a lot of its features, and one of those features is currying. The idea behind currying is relatively straightforward, and Apple has already taken the time to explain the basics of it in The Swift Programming Language. Nonetheless, there's a lot more to currying in Swift than what first meets the eye. What is currying? Let's say we have a function, f, which takes two parameters, a: Int and b: String, and returns a Bool: func f(a: Int, _ b: String) -> Bool { // … do somthing here … } Here, we're taking both a and b simultaneously as parameters to our function, but we don't have to do it that way! We can just as easily write this function to take just a as a parameter and then return another function that takes b as it's only parameter and returns the final result: func f(a: Int) -> ((String) -> Bool) { return { b in // … do somthing here … } } (I've added a few extra parentheses for clarity, but Swift is actually just fine if you write String -> Bool instead of ((String) -> Bool); the two notations mean exactly the same thing.) This formulation uses a closure, but you can also use a nested function for the exact same effect: func f(a: Int) -> ((String) -> Bool) { func g(b: String) -> Bool { // … do somthing here … } return g } Of course, Swift wouldn't be Swift without providing a convenient syntax for things like this, so there is even a third way to write the curried version of f, and it's (usually) preferred over either of the previous two: func f(a: Int)(_ b: String) -> Bool { // … do somthing here … } Any of these iterations of our curried function f can be called like this: let res: Bool = f(1)("hello") Which should look very similar to the way you would call the original uncurried f: let res: Bool = f(1, "hello") Currying isn't limited to just two parameters either; here's an example of a partially curried function of five parameters (taking them in groups of two, one, and two): func weirdAddition(x: Int, use useX: Bool)(_ y: Int)(_ z: Int, use useZ: Bool) -> Int { return (useX ? x : 0) + y + (useZ ? z : 0) } How is currying used in Swift? Believe it or not, Swift actually uses currying all over the place, even if you don't notice it. Probably, the most prominent example is that of instance methods, which are just curried type methods: // This: NSColor.blueColor().shadowWithLevel(1/3) // …is the same as this: NSColor.shadowWithLevel(NSColor.blueColor())(1/3) But, there's a much deeper implication of currying's availability in Swift: all functions secretly take only one parameter! How is this possible, you ask? It has to do with how Swift treats tuples. A function that “officially” takes, say, three parameters, actually only takes one parameter that happens to be a three-tuple. This is perhaps most visible when exploited via the higher-order collections method: func dotProduct(xVec: [Double], _ yVec: [Double]) -> Double { // Note that (this particular overload of) the `*` operator // has the signature `(Double, Double) -> Double`. return zip(xVec, yVec).map(*).reduce(0, combine: +) } It would seem that anything you can do with tuples, you can do with a function parameter list and vice versa; in fact, that is almost true. The four features of function parameter lists that don't carry over directly into tuples are the variadic, inout, defaulted, and @autoclosure parameters. You can, technically, form a variadic, inout, defaulted, or @autoclosure tuple type, but if you try to use it in any context other than as a function's parameter type, swiftc will give you an error. What you definitely can do with tuples is use named values, notwithstanding the unfortunate prohibition on single-element tuples in Swift (named or not). Apple provides some information on tuples with named elements in The Swift Programming Language; it also gives an example of one in the same book. It should be noted that the names given to tuple elements are somewhat ephemeral in that they can very easily be introduced, eliminated, and altered via implicit conversions. This applies regardless of whether the tuple type is that of a standalone value or of a function's parameter: // converting names in a function's parameter list func printBoth(first x: Int, second y: String) { print(x, y, separator: ", ") } let printTwo: (a: Int, b: String) -> Void = printBoth // converting names in a standalone tuple type // (for some reason, Swift dislikes assigning `firstAndSecond` // directly to `aAndB`, but going through `nameless` is fine) let firstAndSecond: (first: Int, second: String) = (first: 1, second: "hello") let nameless: (Int, String) = firstAndSecond let aAndB: (a: Int, b: String) = nameless Currying, with its connection to tuples, is a very powerful feature of Swift. Use it wherever it seems helpful, and the language will be more than happy to oblige. About the author Alexander Altman is a functional programming enthusiast who enjoys the mathematical and ergonomic aspects of programming language design. He's been working with Swift since the language's first public release, and he is one of the core contributors to the TypeLift project.
Read more
  • 0
  • 0
  • 12534

article-image-caching-symfony
Packt
05 Apr 2016
15 min read
Save for later

Caching in Symfony

Packt
05 Apr 2016
15 min read
In this article by Sohail Salehi, author of the book, Mastering Symfony, we are going to discuss performance improvement using cache. Caching is a vast subject and needs its own book to be covered properly. However, in our Symfony project, we are interested in two types of caches only: Application cache Database cache We will see what caching facilities are provided in Symfony by default and how we can use them. We are going to apply the caching techniques on some methods in our projects and watch the performance improvement. By the end of this article, you will have a firm understanding about the usage of HTTP cache headers in the application layer and caching libraries. (For more resources related to this topic, see here.) Definition of cache Cache is a temporary place that stores contents that can be served faster when they are needed. Considering that we already have a permanent place on disk to store our web contents (templates, codes, and database tables), cache sounds like a duplicate storage. That is exactly what they are. They are duplicates and we need them because, in return for consuming an extra space to store the same data, they provide a very fast response to some requests. So this is a very good trade-off between storage and performance. To give you an example about how good this deal can be, consider the following image. On the left side, we have a usual client/server request/response model and let's say the response latency is two seconds and there are only 100 users who hit the same content per hour: On the right side, however, we have a cache layer that sits between the client and server. What it does basically is receive the same request and pass it to the server. The server sends a response to the cache and, because this response is new to the cache, it will save a copy (duplicate) of the response and then pass it back to the client. The latency is 2 + 0.2 seconds. However, it doesn't add up, does it? The purpose of using cache was to improve the overall performance and reduce the latency. It has already added more delays to the cycle. With this result, how could it possibly be beneficial? The answer is in the following image: Now, with the response being cached, imagine the same request comes through. (We have about 100 requests/hour for the same content, remember?) This time, the cache layer looks into its space, finds the response, and sends it back to the client, without bothering the server. The latency is 0.2 seconds. Of course, these are only imaginary numbers and situations. However, in the simplest form, this is how cache works. It might not be very helpful on a low traffic website; however, when we are dealing with thousands of concurrent users on a high traffic website, then we can appreciate the value of caching. So, according to the previous images, we can define some terminology and use them in this article as we continue. In the first image, when a client asked for that page, it wasn't exited and the cache layer had to store a copy of its contents for the future references. This is called Cache Miss. However, in the second image, we already had a copy of the contents stored in the cache and we benefited from it. This is called Cache Hit. Characteristics of a good cache If you do a quick search, you will find that a good cache is defined as the one which misses only once. In other words, this cache miss happens only if the content has not been requested before. This feature is necessary but it is not sufficient. To clarify the situation a little bit, let's add two more terminology here. A cache can be in one of the following states: fresh (has the same contents as the original response) and stale (has the old response's contents that have now changed on the server). The important question here is for how long should a cache be kept? We have the power to define the freshness of a cache via a setting expiration period. We will see how to do this in the coming sections. However, just because we have this power doesn't mean that we are right about the content's freshness. Consider the situation shown in the following image: If we cache a content for a long time, cache miss won't happen again (which satisfies the preceding definition), but the content might lose its freshness according to the dynamic resources that might change on the server. To give you an example, nobody likes to read the news of three months ago when they open the BBC website. Now, we can modify the definition of a good cache as follows: A cache strategy is considered to be good if cache miss for the same content happens only once, while the cached contents are still fresh. This means that defining the cache expiry time won't be enough and we need another strategy to keep an eye on cache freshness. This happens via a cache validation strategy. When the server sends a response, we can set the validation rules on the basis of what really matters on the server side, and this way, we can keep the contents stored in the cache fresh, as shown in the following image. We will see how to do this in Symfony soon. Caches in a Symfony project In this article, we will focus on two types of caches: The gateway cache (which is called reverse proxy cache as well) and doctrine cache. As you might have guessed, the gateway cache deals with all of the HTTP cache headers. Symfony comes with a very strong gateway cache out of the box. All you need to do is just activate it in your front controller then start defining your cache expiration and validation strategies inside your controllers. That said, it does not mean that you are forced or restrained to use the Symfony cache only. If you prefer other reverse proxy cache libraries (that is, Varnish or Django), you are welcome to use them. The caching configurations in Symfony are transparent such that you don't need to change a single line inside your controllers when you change your caching libraries. Just modify your config.yml file and you will be good to go. However, we all know that caching is not for application layers and views only. Sometimes, we need to cache any database-related contents as well. For our Doctrine ORM, this includes metadata cache, query cache, and result cache. Doctrine comes with its own bundle to handle these types of caches and it uses a wide range of libraries (APC, Memcached, Redis, and so on) to do the job. Again, we don't need to install anything to use this cache bundle. If we have Doctrine installed already, all we need to do is configure something and then all the Doctrine caching power will be at our disposal. Putting these two caching types together, we will have a big picture to cache our Symfony project: As you can see in this image, we might have a problem with the final cached page. Imagine that we have a static page that might change once a week, and in this page, there are some blocks that might change on a daily or even hourly basis, as shown in the following image. The User dashboard in our project is a good example. Thus, if we set the expiration on the gateway cache to one week, we cannot reflect all of those rapid updates in our project and task controllers. To solve this problem, we can leverage from Edge Side Includes (ESI) inside Symfony. Basically, any part of the page that has been defined inside an ESI tag can tell its own cache story to the gateway cache. Thus, we can have multiple cache strategies living side by side inside a single page. With this solution, our big picture will look as follows: Thus, we are going to use the default Symfony and Doctrine caching features for application and model layers and you can also use some popular third-party bundles for more advanced settings. If you completely understand the caching principals, moving to other caching bundles would be like a breeze. Key players in the HTTP cache header Before diving into the Symfony application cache, let's familiarize ourselves with the elements that we need to handle in our cache strategies. To do so, open https://www.wikipedia.org/ in your browser and inspect any resource with the 304 response code and ponder on request/response headers inside the Network tab: Among the response elements, there are four cache headers that we are interested in the most: expires and cache-control, which will be used for an expiration model, and etag and last-modified, which will be used for a validation model. Apart from these cache headers, we can have variations of the same cache (compressed/uncompressed) via the Vary header and we can define a cache as private (accessible by a specific user) or public (accessible by everyone). Using the Symfony reverse proxy cache There is no complicated or lengthy procedure required to activate the Symfony's gateway cache. Just open the front controller and uncomment the following lines: // web/app.php <?php //... require_once __DIR__.'/../app/AppKernel.php'; //un comment this line require_once __DIR__.'/../app/AppCache.php'; $kernel = new AppKernel('prod', false); $kernel->loadClassCache(); // and this line $kernel = new AppCache($kernel); // ... ?> Now, the kernel is wrapped around the Application Cache layer, which means that any request coming from the client will pass through this layer first. Set the expiration for the dashboard page Log in to your project and click on the Request/Response section in the debug toolbar. Then, scroll down to Response Headers and check the contents: As you can see, only cache-control is sitting there with some default values among the cache headers that we are interested in. When you don't set any value for Cache-Control, Symfony considers the page contents as private to keep them safe. Now, let's go to the Dashboard controller and add some gateway cache settings to the indexAction() method: // src/AppBundle/Controller/DashboardController.php <?php namespace AppBundleController; use SymfonyBundleFrameworkBundleControllerController; use SymfonyComponentHttpFoundationResponse; class DashboardController extends Controller { public function indexAction() { $uId = $this->getUser()->getId(); $util = $this->get('mava_util'); $userProjects = $util->getUserProjects($uId); $currentTasks= $util->getUserTasks($uId, 'in progress'); $response = new Response(); $date = new DateTime('+2 days'); $response->setExpires($date); return $this->render( 'CoreBundle:Dashboard:index.html.twig', array( 'currentTasks' => $currentTasks, 'userProjects' => $userProjects ), $response ); } } You might have noticed that we didn't change the render() method. Instead, we added the response settings as the third parameter of this method. This is a good solution because now we can keep the current template structure and adding new settings won't require any other changes in the code. However, you might wonder what other options do we have? We can save the whole $this->render() method in a variable and assign a response setting to it as follows: // src/AppBundle/Controller/DashboardController.php <?php // ... $res = $this->render( 'AppBundle:Dashboard:index.html.twig', array( 'currentTasks' => $currentTasks, 'userProjects' => $userProjects ) ); $res->setExpires($date); return $res; ?> Still looks like a lot of hard work for a simple response header setting. So let me introduce a better option. We can use the @Cache annotation as follows: // src/AppBundle/Controller/DashboardController.php <?php namespace AppBundleController; use SymfonyBundleFrameworkBundleControllerController; use SensioBundleFrameworkExtraBundleConfigurationCache; class DashboardController extends Controller { /** * @Cache(expires="next Friday") */ public function indexAction() { $uId = $this->getUser()->getId(); $util = $this->get('mava_util'); $userProjects = $util->getUserProjects($uId); $currentTasks= $util->getUserTasks($uId, 'in progress'); return $this->render( 'AppBundle:Dashboard:index.html.twig', array( 'currentTasks' => $currentTasks, 'userProjects' => $userProjects )); } } Have you noticed that the response object is completely removed from the code? With an annotation, all response headers are sent internally, which helps keep the original code clean. Now that's what I call zero-fee maintenance. Let's check our response headers in Symfony's debug toolbar and see what it looks like: The good thing about the @Cache annotation is that they can be nested. Imagine you have a controller full of actions. You want all of them to have a shared maximum age of half an hour except one that is supposed to be private and should be expired in five minutes. This sounds like a lot of code if you going are to use the response objects directly, but with an annotation, it will be as simple as this: <?php //... /** * @Cache(smaxage="1800", public="true") */ class DashboardController extends Controller { public function firstAction() { //... } public function secondAction() { //... } /** * @Cache(expires="300", public="false") */ public function lastAction() { //... } } The annotation defined before the controller class will apply to every single action, unless we explicitly add a new annotation for an action. Validation strategy In the previous example, we set the expiry period very long. This means that if a new task is assigned to the user, it won't show up in his dashboard because of the wrong caching strategy. To fix this issue, we can validate the cache before using it. There are two ways for validation: We can check the content's date via the Last-Modified header: In this technique, we certify the freshness of a content via the time it has been modified. In other words, if we keep track of the dates and times of each change on a resource, then we can simply compare that date with cache's date and find out if it is still fresh. We can use the ETag header as a unique content signature: The other solution is to generate a unique string based on the contents and evaluate the cache's freshness based on its signature. We are going to try both of them in the Dashboard controller and see them in action. Using the right validation header is totally dependent on the current code. In some actions, calculating modified dates is way easier than creating a digital footprint, while in others, going through the date and time function might looks costly. Of course, there are situations where generating both headers are critical. So creating it is totally dependent on the code base and what you are going to achieve. As you can see, we have two entities in the indexAction() method and, considering the current code, generating the ETag header looks practical. So the validation header will look as follows: // src/AppBundle/Controller/DashboardController.php <?php //... class DashboardController extends Controller { /** * @Cache(ETag="userProjects ~ finishedTasks") */ public function indexAction() { //... } } The next time a request arrives, the cache layer looks into the ETag value in the controller, compares it with its own ETag, and calls the indexAction() method; only, there is a difference between these two. How to mix expiration and validation strategies Imagine that we want to keep the cache fresh for 10 minutes and simultaneously keep an eye on any changes over user projects or finished tasks. It is obvious that tasks won't finish every 10 minutes and it is far beyond reality to expect changes on project status during this period. So what we can do to make our caching strategy efficient is that we can combine Expiration and Validation together and apply them to the Dashboard Controller as follows: // src/CoreBundle/Controller/DashboardController.php <?php //... /** * @Cache(expires="600") */ class DashboardController extends Controller { /** * @Cache(ETag="userProjects ~ finishedTasks") */ public function indexAction() { //... } } Keep in mind that Expiration has a higher priority over Validation. In other words, the cache is fresh for 10 minutes, regardless of the validation status. So when you visit your dashboard for the first time, a new cache plus a 302 response (not modified) is generated automatically and you will hit cache for the next 10 minutes. However, what happens after 10 minutes is a little different. Now, the expiration status is not satisfying; thus, the HTTP flow falls into the validation phase and in case nothing happened to the finished tasks status or the your project status, then a new expiration period is generated and you hit the cache again. However, if there is any change in your tasks or project status, then you will hit the server to get the real response, and a new cache from response's contents, new expiration period, and new ETag are generated and stored in the cache layer for future references. Summary In this article, you learned about the basics of gateway and Doctrine caching. We saw how to set expiration and validation strategies using HTTP headers such as Cache-Control, Expires, Last-Modified, and ETag. You learned how to set public and private access levels for a cache and use an annotation to define cache rules in the controller. Resources for Article: Further resources on this subject: User Interaction and Email Automation in Symfony 1.3: Part1 [article] The Symfony Framework – Installation and Configuration [article] User Interaction and Email Automation in Symfony 1.3: Part2 [article]
Read more
  • 0
  • 0
  • 15194
article-image-proxmox-ve-fundamentals
Packt
04 Apr 2016
12 min read
Save for later

Proxmox VE Fundamentals

Packt
04 Apr 2016
12 min read
In this article written by Rik Goldman author of the book Learning Proxmox VE, we introduce to you Proxmox Virtual Environment (PVE) which is a mature, complete, well-supported, enterprise-class virtualization environment for servers. It is an open source tool—based in the Debian GNU/Linux distribution—that manages containers, virtual machines, storage, virtualized networks, and high-availability clustering through a well-designed, web-based interface or via the command-line interface. (For more resources related to this topic, see here.) Developers provided the first stable release of Proxmox VE in 2008; 4 years and eight point releases later, ZDNet's Ken Hess boldly, but quite sensibly, declared Proxmox VE as Proxmox: The Ultimate Hypervisor (http://www.zdnet.com/article/proxmox-the-ultimate-hypervisor/). Four years later, PVE is on version 4.1, in use by at least 90,000 hosts, and more than 500 commercial customers in 140 countries; the web-based administrative interface itself is translated into nineteen languages. This article will explore the fundamental technologies underlying PVE's hypervisor features: LXC, KVM, and QEMU. To do so, we will develop a working understanding of virtual machines, containers, and their appropriate use. We will cover the following topics: Proxmox VE in brief Virtualization and containerization with PVE Proxmox VE virtual machines, KVM, and QEMU Containerization with PVE and LXC Proxmox VE in brief With Proxmox VE, Proxmox Server Solutions GmbH (https://www.proxmox.com/en/about) provides us with an enterprise-ready, open source type II hypervisor. Later, you'll find some of the features that make Proxmox VE such a strong enterprise candidate. The license for Proxmox VE is very deliberately the GNU Affero General Public License (V3) (https://www.gnu.org/licenses/agpl-3.0.html). From among the many free and open source compatible licenses available, this is a significant choice because it is "specifically designed to ensure cooperation with the community in the case of network server software." PVE is primarily administered from an integrated web interface or from the command line locally or via SSH. Consequently, there is no need for a separate management server and the associated expenditure. In this way, Proxmox VE significantly contrasts with alternative enterprise virtualization solutions by vendors such as VMware. Proxmox VE instances/nodes can be incorporated into PVE clusters, and centrally administered from a unified web interface. Proxmox VE provides for live migration—the movement of a virtual machine or container from one cluster node to another without any disruption of services. This is a rather unique feature to PVE and not common to competing products. Features Proxmox VE VMware vSphere Hardware requirements Flexible Strict compliance with HCL Integrated management interface Web- and shell-based (browser and SSH) No. Requires dedicated management server at additional cost Simple subscription structure Yes; based on number of premium support tickets per year and CPU socket count No High availability Yes Yes VM live migration Yes Yes Supports containers Yes No Virtual machine OS support Windows and Linux Windows, Linux, and Unix Community support Yes No Live VM snapshots Yes Yes Contrasting Proxmox VE and VMware vSphere features For a complete catalog of features, see the Proxmox VE datasheet at https://www.proxmox.com/images/download/pve/docs/Proxmox-VE-Datasheet.pdf. Like its competitors, PVE is a hypervisor: a typical hypervisor is software that creates, runs, configures, and manages virtual machines based on an administrator or engineer's choices. PVE is known as a type II hypervisor because the virtualization layer is built upon an operating system. As a type II hypervisor, Proxmox VE is built on the Debian project. Debian is a GNU/Linux distribution renowned for its reliability, commitment to security, and its thriving and dedicated community of contributing developers. A type II hypervisor, such as PVE, runs directly over the operating system. In Proxmox VE's case, the operating system is Debian; since the release of PVE 4.0, the underlying operating system has been Debian "Jessie." By contrast, a type I hypervisor (such as VMware's ESXi) runs directly on bare metal without the mediation of an operating system. It has no additional function beyond managing virtualization and the physical hardware. A type I hypervisor runs directly on hardware, without the mediation of an operating system. As a type II hypervisor, Proxmox VE is built on the Debian project. Debian is a GNU/Linux distribution renowned for its reliability, commitment to security, and its thriving and dedicated community of contributing developers. Debian-based GNU/Linux distributions are arguably the most popular GNU/Linux distributions for the desktop. One characteristic that distinguishes Debian from competing distribution is its release policy: Debian releases only when its development community can stand behind it for its stability, security, and usability. Debian does not distinguish between long-term support releases and regular releases as do some other distributions. Instead, all Debian releases receive strong support and critical updates through the first year following the next release. (Since 2007, a major release of Debian has been made about every two years. Debian 8, Jessie, was released just about on schedule in 2015. Proxmox VE's reliance on Debian is thus a testament to its commitment to these values: stability, security, and usability over scheduled releases that favor cutting-edge features. PVE provides its virtualization functionality through three open technologies, and the efficiency with which they're integrated by its administrative web interface: LXC KVM QEMU To understand how this foundation serves Proxmox VE, we must first be able to clearly understand the relationship between virtualization (or, specifically, hardware virtualization) and containerization (OS virtualization). As we proceed, their respective use cases should become clear. Virtualization and containerization with Proxmox VE It is correct to ultimately understand containerization as a type of virtualization. However, here, we'll look first to conceptually distinguish a virtual machine from a container by focusing on contrasting characteristics. Simply put, virtualization is a technique through which we provide fully-functional, computing resources without a demand for resources' physical organization, locations, or relative proximity. Briefly put, virtualization technology allows you to share and allocate the resources of a physical computer into multiple execution environments. Without context, virtualization is a vague term, encapsulating the abstraction of such resources as storage, networks, servers, desktop environments, and even applications from their concrete hardware requirements through software implementation solutions called hypervisors. Virtualization thus affords us more flexibility, more functionality, and a significant positive impact on our budgets—often realized with merely the resources we have at hand. In terms of PVE, virtualization most commonly refers to the abstraction of all aspects of a discrete computing system from its hardware. In this context, virtualization is the creation, in other words, of a virtual machine or VM, with its own operating system and applications. A VM may be initially understood as a computer that has the same functionality as a physical machine. Likewise, it may be incorporated and communicated with via a network exactly as a machine with physical hardware would. Put yet another way, from inside a VM, we will experience no difference from which we can distinguish it from a physical computer. The virtual machine, moreover, hasn't the physical footprint of its physical counterparts. The hardware it relies on is, in fact, provided by software that borrows from the hardware resources from a host installed on a physical machine (or bare metal). Nevertheless, the software components of the virtual machine, from the applications to the operating system, are distinctly separated from those of the host machine. This advantage is realized when it comes to allocating physical space for resources. For example, we may have a PVE server running a web server, database server, firewall, and log management system—all as discrete virtual machines. Rather than consuming the physical space, resources, and labor of maintaining four physical machines, we simply make physical room for the single Proxmox VE server and configure an appropriate virtual LAN as necessary. In a white paper entitled Putting Server Virtualization to Work, AMD articulates well the benefits of virtualization to businesses and developers (https://www.amd.com/Documents/32951B_Virtual_WP.pdf): Top 5 business benefits of virtualization: Increases server utilization Improves service levels Streamlines manageability and security Decreases hardware costs Reduces facility costs The benefits of virtualization with a development and test environment: Lowers capital and space requirements. Lowers power and cooling costs Increases efficiencies through shorter test cycles Faster time-to-market To these benefits, let's add portability and encapsulation: the unique ability to migrate a live VM from one PVE host to another—without suffering a service outage. Proxmox VE makes the creation and control of virtual machines possible through the combined use of two free and open source technologies: Kernel-based Virtual Machine (or KVM) and Quick Emulator (QEMU). Used together, we refer to this integration of tools as KVM-QEMU. KVM KVM has been an integral part of the Linux kernel since February, 2007. This kernel module allows GNU/Linux users and administrators to take advantage of an architecture's hardware virtualization extensions; for our purposes, these extensions are AMD's AMD-V and Intel's VT-X for the x86_64 architecture. To really make the most of Proxmox VE's feature set, you'll therefore very much want to install on an x86_64 machine with a CPU with integrated virtualization extensions. For a full list of AMD and Intel processors supported by KVM, visit Intel at http://ark.intel.com/Products/VirtualizationTechnology or AMD at http://support.amd.com/en-us/kb-articles/Pages/GPU120AMDRVICPUsHyperVWin8.aspx. QEMU QEMU provides an emulation and virtualization interface that can be scripted or otherwise controlled by a user. Visualizing the relationship between KVM and QEMU Without Proxmox VE, we could essentially define the hardware, create a virtual disk, and start and stop a virtualized server from the command line using QEMU. Alternatively, we could rely on any one of an array of GUI frontends for QEMU (a list of GUIs available for various platforms can be found at http://wiki.qemu.org/Links#GUI_Front_Ends). Of course, working with these solutions is productive only if you're interested in what goes on behind the scenes in PVE when virtual machines are defined. Proxmox VE's management of virtual machines is itself managing QEMU through its API. Managing QEMU from the command line can be tedious. The following is a line from a script that launched Raspbian, a Debian remix intended for the architecture of the Raspberry Pi, on an x86 Intel machine running Ubuntu. When we see how easy it is to manage VMs from Proxmox VE's administrative interfaces, we'll sincerely appreciate that relative simplicity: qemu-system-arm -kernel kernel-qemu -cpu arm1176 -m 256 -M versatilepb -no-reboot -serial stdio -append "root=/dev/sda2 panic=1" -hda ./$raspbian_img -hdb swap If you're familiar with QEMU's emulation features, it's perhaps important to note that we can't manage emulation through the tools and features Proxmox VE provides—despite its reliance on QEMU. From a bash shell provided by Debian, it's possible. However, the emulation can't be controlled through PVE's administration and management interfaces. Containerization with Proxmox VE Containers are a class of virtual machines (as containerization has enjoyed a renaissance since 2005, the term OS virtualization has become synonymous with containerization and is often used for clarity). However, by way of contrast with VMs, containers share operating system components, such as libraries and binaries, with the host operating system; a virtual machine does not. Visually contrasting virtual machines with containers The container advantage This arrangement potentially allows a container to run leaner and with fewer hardware resources borrowed from the host. For many authors, pundits, and users, containers also offer a demonstrable advantage in terms of speed and efficiency. (However, it should be noted here that as resources such as RAM and more powerful CPUs become cheaper, this advantage will diminish.) The Proxmox VE container is made possible through LXC from version 4.0 (it's made possible through OpenVZ in previous PVE versions). LXC is the third fundamental technology serving Proxmox VE's ultimate interest. Like KVM and QEMU, LXC (or Linux Containers) is an open source technology. It allows a host to run, and an administrator to manage, multiple operating system instances as isolated containers on a single physical host. Conceptually then, a container very clearly represents a class of virtualization, rather than an opposing concept. Nevertheless, it's helpful to maintain a clear distinction between a virtual machine and a container as we come to terms with PVE. The ideal implementation of a Proxmox VE guest is contingent on our distinguishing and choosing between a virtual-machine solution and a container solution. Since Proxmox VE containers share components of the host operating system and can offer advantages in terms of efficiency, this text will guide you through the creation of containers whenever the intended guest can be fully realized with Debian Jessie as our hypervisor's operating system without sacrificing features. When our intent is a guest running a Microsoft Windows operating system, for example, a Proxmox VE container ceases to be a solution. In such a case, we turn, instead, to creating a virtual machine. We must rely on a VM precisely because the operating system components that Debian can share with a Linux container are not components a Microsoft Windows operating system can make use of. Summary In this article, we have come to terms with the three open source technologies that provide Proxmox VE's foundational features: containerization and virtualization with LXC, KVM, and QEMU. Along the way, we've come to understand that containers, while being a type of virtualization, have characteristics that distinguish them from virtual machines. These differences will be crucial as we determine which technology to rely on for a virtual server solution with Proxmox VE. Resources for Article: Further resources on this subject: Deploying App-V 5 in a Virtual Environment[article] Setting Up a Spark Virtual Environment[article] Basic Concepts of Proxmox Virtual Environment[article]
Read more
  • 0
  • 0
  • 32251

article-image-how-get-started-redux-react-native
Emilio Rodriguez
04 Apr 2016
5 min read
Save for later

How To Get Started with Redux in React Native

Emilio Rodriguez
04 Apr 2016
5 min read
In mobile development there is a need for architectural frameworks, but complex frameworks designed to be used in web environments may end up damaging the development process or even the performance of our app. Because of this, some time ago I decided to introduce in all of my React Native projects the leanest framework I ever worked with: Redux. Redux is basically a state container for JavaScript apps. It is 100 percent library-agnostic so you can use it with React, Backbone, or any other view library. Moreover, it is really small and has no dependencies, which makes it an awesome tool for React Native projects. Step 1: Install Redux in your React Native project. Redux can be added as an npm dependency into your project. Just navigate to your project’s main folder and type: npm install --save react-redux By the time this article was written React Native was still depending on React Redux 3.1.0 since versions above depended on React 0.14, which is not 100 percent compatible with React Native. Because of this, you will need to force version 3.1.0 as the one to be dependent on in your project. Step 2: Set up a Redux-friendly folder structure. Of course, setting up the folder structure for your project is totally up to every developer but you need to take into account that you will need to maintain a number of actions, reducers, and components. Besides, it’s also useful to keep a separate folder for your API and utility functions so these won’t be mixing with your app’s core functionality. Having this in mind, this is my preferred folder structure under the src folder in any React Native project: Step 3: Create your first action. In this article we will be implementing a simple login functionality to illustrate how to integrate Redux inside React Native. A good point to start this implementation is the action, a basic function called from the component whenever we want the whole state of the app to be changed (i.e. changing from the logged out state into the logged in state). To keep this example as concise as possible we won’t be doing any API calls to a backend – only the pure Redux integration will be explained. Our action creator is a simple function returning an object (the action itself) with a type attribute expressing what happened with the app. No business logic should be placed here; our action creators should be really plain and descriptive. Step 4: Create your first reducer. Reducers are the ones in charge of updating the state of the app. Unlike in Flux, Redux only has one store for the whole app, but it will be conveniently name-spaced automatically by Redux once the reducers have been applied. In our example, the user reducer needs to be aware of when the user is logged in. Because of that, it needs to import the LOGIN_SUCCESS constant we defined in our actions before and export a default function, which will be called by Redux every time an action occurs in the app. Redux will automatically pass the current state of the app and the action occurred. It’s up to the reducer to realize if it needs to modify the state or not based on the action.type. That’s why almost every time our reducer will be a function containing a switch statement, which modifies and returns the state based on what action occurred. It’s important to state that Redux works with object references to identify when the state is changed. Because of this, the state should be cloned before any modification. It’s also interesting to know that the action passed to the reducers can contain other attributes apart from type. For example, when doing a more complex login, the user first name and last name can be added to the action by the action created and used by the reducer to update the state of the app. Step 5: Create your component. This step is almost pure React Native coding. We need a component to trigger the action and to respond to the change of state in the app. In our case it will be a simple View containing a button that disappears when logged in. This is a normal React Native component except for some pieces of the Redux boilerplate: The three import lines at the top will require everything we need from Redux ‘mapStateToProps’ and ‘mapDispatchToProps’ are two functions bound with ‘connect’ to the component: this makes Redux know that this component needs to be passed a piece of the state (everything under ‘userReducers’) and all the actions available in the app. Just by doing this, we will have access to the login action (as it is used in the onLoginButtonPress) and to the state of the app (as it is used in the !this.props.user.loggedIn statement) Step 6: Glue it all from your index.ios.js. For Redux to apply its magic, some initialization should be done in the main file of your React Native project (index.ios.js). This is pure boilerplate and only done once: Redux needs to inject a store holding the app state into the app. To do so, it requires a ‘Provider’ wrapping the whole app. This store is basically a combination of reducers. For this article we only need one reducer, but a full app will include many others and each of them should be passed into the combineReducers function to be taken into account by Redux whenever an action is triggered. About the Author Emilio Rodriguez started working as a software engineer for Sun Microsystems in 2006. Since then, he has focused his efforts on building a number of mobile apps with React Native while contributing to the React Native project. These contributions helped his understand how deep and powerful this framework is.
Read more
  • 0
  • 0
  • 44376

Packt
04 Apr 2016
20 min read
Save for later

Morphology – Getting Our Feet Wet

Packt
04 Apr 2016
20 min read
In this article by Deepti Chopra, Nisheeth Joshi, and Iti Mathur authors of the book Mastering Natural Language Processing with Python, morphology may be defined as the study of the composition of words using morphemes. A morpheme is the smallest unit of the language that has a meaning. In this article, we will discuss stemming and lemmatizing, creating a stemmer and lemmatizer for non-English languages, developing a morphological analyzer and morphological generator using machine learning tools, creating a search engine, and many other concepts. In brief, this article will include the following topics: Introducing morphology Creating a stemmer and lemmatizer Developing a stemmer for non-English languages Creating a morphological analyzer Creating a morphological generator Creating a search engine (For more resources related to this topic, see here.) Introducing morphology Morphology may be defined as the study of the production of tokens with the help of morphemes. A morpheme is the basic unit of language, which carries a meaning. There are two types of morphemes: stems and affixes (suffixes, prefixes, infixes, and circumfixes). Stems are also referred to as free morphemes since they can even exist without adding affixes. Affixes are referred to as bound morphemes since they cannot exist in a free form, and they always exist along with free morphemes. Consider the word "unbelievable". Here, "believe" is a stem or free morpheme. It can even exist on its own. The morphemes "un" and "able" are affixes or bound morphemes. They cannot exist in s free form but exist together with a stem. There are three kinds of languages, namely isolating languages, agglutinative languages, and inflecting languages. Morphology has different meanings in all these languages. Isolating languages are those languages in which words are merely free morphemes, and they do not carry any tense (past, present, and future) or number (singular or plural) information. Mandarin Chinese is an example of an isolating language. Agglutinative languages are those languages in which small words combine together to convey compound information. Turkish is an example of an agglutinative language. Inflecting languages are languages in which words are broken down into simpler units, but all these simpler units exhibit different meanings. Latin is an example of an inflecting language. There are morphological processes such as inflections, derivations, semi-affixes, combining forms, and cliticization. An inflection refers to transforming a word into a form so that it represents a person, number, tense, gender, case, aspect, and mood. Here, the syntactic category of the token remains the same. In derivation, the syntactic category of word is also changed. Semi-affixes are bound morphemes that exhibit a word-like quality, for example, noteworthy, antisocial, anticlockwise, and so on. Understanding stemmers Stemming may be defined as the process of obtaining a stem from a word by eliminating the affixes from it. For example, in the word "raining", a stemmer would return the root word or the stem word "rain" by removing the affix "ing" from "raining". In order to increase the accuracy of information retrieval, search engines mostly use stemming to get a stem and store it as an index word. Search engines call words with the same meaning synonyms, which may be a kind of query expansion known as conflation. Martin Porter has designed a well-known stemming algorithm known as the Porter Stemming Algorithm. This algorithm is basically designed to replace and eliminate some well-known suffices present in English words. To perform stemming in NLTK, we can simply perform the instantiation of the PorterStemmer class, and then perform stemming by calling the stem method. Let's take a look at the code for stemming using the PorterStemmer class in NLTK: >>> import nltk>>> from nltk.stem import PorterStemmer>>> stemmerporter = PorterStemmer()>>> stemmerporter.stem('working')'work'>>> stemmerporter.stem('happiness')'happi' The PorterStemmer class is trained and has the knowledge of many stems and word forms in the English language. The process of stemming takes place in a series of steps and transforms a word into a shorter word or this word may similar meaning to the root word. The stemmer I interface defines the stem() method, and all stemmers are inherited from this interface. The inheritance diagram is depicted here: Another Stemming algorithm, known as the Lancaster Stemming algorithm, was introduced in Lancaster University. Similar to the PorterStemmer class, the LancasterStemmer class is used in NLTK to implement Lancaster Stemming. Let's consider the following code, which depicts Lancaster stemming in NLTK: >>> import nltk >>> from nltk.stem import LancasterStemmer >>> stemmerlan=LancasterStemmer() >>> stemmerlan.stem('working') 'work' >>> stemmerlan.stem('happiness') 'happy' We can also build our own stemmer in NLTK using RegexpStemmer. This works by accepting a string and eliminates it from the prefix or suffix of a word when a match is found. Let's consider an example of stemming using RegexpStemmer in NLTK: >>> import nltk >>> from nltk.stem import RegexpStemmer >>> stemmerregexp=RegexpStemmer('ing') >>> stemmerregexp.stem('working') 'work' >>> stemmerregexp.stem('happiness') 'happiness' >>> stemmerregexp.stem('pairing') 'pair' We can use RegexpStemmer in cases where stemming cannot be performed using PorterStemmer and LancasterStemmer. The SnowballStemmer class is used to perform stemming in 13 languages other than English. In order to perform stemming using SnowballStemmer, firstly, an instance is created in the language where stemming needs to be performed, and then using the stem() method, stepping is performed. Consider the following example to perform stemming in Spanish and French in NLTK using SnowballStemmer: >>> import nltk >>> from nltk.stem import SnowballStemmer >>> SnowballStemmer.languages ('danish', 'dutch', 'english', 'finnish', 'french', 'german', 'hungarian', 'italian', 'norwegian', 'porter', 'portuguese', 'romanian', 'russian', 'spanish', 'swedish') >>> spanishstemmer=SnowballStemmer('spanish') >>> spanishstemmer.stem('comiendo') 'com' >>> frenchstemmer=SnowballStemmer('french') >>> frenchstemmer.stem('manger') 'mang' Nltk.stem.api consists of the stemmer I class in which the stem function is performed. Consider the following code present in NLTK, which enables stemming to be performed: Class StemmerI(object): """ It is an interface that helps to eliminate morphological affixes from the tokens and the process is known as stemming. """ def stem(self, token): """ Eliminate affixes from token and stem is returned. """ raise NotImplementedError() Here's the code used to perform stemming using multiple stemmers: >>> import nltk >>> from nltk.stem.porter import PorterStemmer >>> from nltk.stem.lancaster import LancasterStemmer >>> from nltk.stem import SnowballStemmer >>> def obtain_tokens(): With open('/home/p/NLTK/sample1.txt') as stem: tok = nltk.word_tokenize(stem.read()) return tokens >>> def stemming(filtered): stem=[] for x in filtered: stem.append(PorterStemmer().stem(x)) return stem >>> if_name_=="_main_": tok= obtain_tokens() >>> print("tokens is %s")%(tok) >>> stem_tokens= stemming(tok) >>> print("After stemming is %s")%stem_tokens >>> res=dict(zip(tok,stem_tokens)) >>> print("{tok:stemmed}=%s")%(result) Understanding lemmatization Lemmatization is the process in which we transform a word into a form that has a different word category. The word formed after lemmatization is entirely different from what it was initially. Consider an example of lemmatization in NLTK: >>> import nltk >>> from nltk.stem import WordNetLemmatizer >>> lemmatizer_output=WordNetLemmatizer() >>> lemmatizer_output.lemmatize('working') 'working' >>> lemmatizer_output.lemmatize('working',pos='v') 'work' >>> lemmatizer_output.lemmatize('works') 'work' WordNetLemmatizer may be defined as a wrapper around the so-called WordNet corpus, and it makes use of the morphy() function present in WordNetCorpusReader to extract a lemma. If no lemma is extracted, then the word is only returned in its original form. For example, for 'works', the lemma that is returned is in the singular form 'work'. This code snippet illustrates the difference between stemming and lemmatization: >>> import nltk >>> from nltk.stem import PorterStemmer >>> stemmer_output=PorterStemmer() >>> stemmer_output.stem('happiness') 'happi' >>> from nltk.stem import WordNetLemmatizer >>> lemmatizer_output.lemmatize('happiness') 'happiness' In the preceding code, 'happiness' is converted to 'happi' by stemming it. Lemmatization can't find the root word for 'happiness', so it returns the word "happiness". Developing a stemmer for non-English languages Polyglot is a software that is used to provide models called morfessor models, which are used to obtain morphemes from tokens. The Morpho project's goal is to create unsupervised data-driven processes. Its focuses on the creation of morphemes, which are the smallest units of syntax. Morphemes play an important role in natural language processing. They are useful in automatic recognition and the creation of language. With the help of the vocabulary dictionaries of polyglot, morfessor models on 50,000 tokens of different languages was used. Here's the code to obtain a language table using a polyglot: from polyglot.downloader import downloader print(downloader.supported_languages_table("morph2")) The output obtained from the preceding code is in the form of these languages listed as follows: 1. Piedmontese language 2. Lombard language 3. Gan Chinese 4. Sicilian 5. Scots 6. Kirghiz, Kyrgyz 7. Pashto, Pushto 8. Kurdish 9. Portuguese 10. Kannada 11. Korean 12. Khmer 13. Kazakh 14. Ilokano 15. Polish 16. Panjabi, Punjabi 17. Georgian 18. Chuvash 19. Alemannic 20. Czech 21. Welsh 22. Chechen 23. Catalan; Valencian 24. Northern Sami 25. Sanskrit (Saṁskṛta) 26. Slovene 27. Javanese 28. Slovak 29. Bosnian-Croatian-Serbian 30. Bavarian 31. Swedish 32. Swahili 33. Sundanese 34. Serbian 35. Albanian 36. Japanese 37. Western Frisian 38. French 39. Finnish 40. Upper Sorbian 41. Faroese 42. Persian 43. Sinhala, Sinhalese 44. Italian 45. Amharic 46. Aragonese 47. Volapük 48. Icelandic 49. Sakha 50. Afrikaans 51. Indonesian 52. Interlingua 53. Azerbaijani 54. Ido 55. Arabic 56. Assamese 57. Yoruba 58. Yiddish 59. Waray-Waray 60. Croatian 61. Hungarian 62. Haitian; Haitian Creole 63. Quechua 64. Armenian 65. Hebrew (modern) 66. Silesian 67. Hindi 68. Divehi; Dhivehi; Mald... 69. German 70. Danish 71. Occitan 72. Tagalog 73. Turkmen 74. Thai 75. Tajik 76. Greek, Modern 77. Telugu 78. Tamil 79. Oriya 80. Ossetian, Ossetic 81. Tatar 82. Turkish 83. Kapampangan 84. Venetian 85. Manx 86. Gujarati 87. Galician 88. Irish 89. Scottish Gaelic; Gaelic 90. Nepali 91. Cebuano 92. Zazaki 93. Walloon 94. Dutch 95. Norwegian 96. Norwegian Nynorsk 97. West Flemish 98. Chinese 99. Bosnian 100. Breton 101. Belarusian 102. Bulgarian 103. Bashkir 104. Egyptian Arabic 105. Tibetan Standard, Tib... 106. Bengali 107. Burmese 108. Romansh 109. Marathi (Marāthī) 110. Malay 111. Maltese 112. Russian 113. Macedonian 114. Malayalam 115. Mongolian 116. Malagasy 117. Vietnamese 118. Spanish; Castilian 119. Estonian 120. Basque 121. Bishnupriya Manipuri 122. Asturian 123. English 124. Esperanto 125. Luxembourgish, Letzeb... 126. Latin 127. Uighur, Uyghur 128. Ukrainian 129. Limburgish, Limburgan... 130. Latvian 131. Urdu 132. Lithuanian 133. Fiji Hindi 134. Uzbek 135. Romanian, Moldavian, ... The necessary models can be downloaded using the following code: %%bash polyglot download morph2.en morph2.ar [polyglot_data] Downloading package morph2.en to [polyglot_data] /home/rmyeid/polyglot_data... [polyglot_data] Package morph2.en is already up-to-date! [polyglot_data] Downloading package morph2.ar to [polyglot_data] /home/rmyeid/polyglot_data... [polyglot_data] Package morph2.ar is already up-to-date! Consider this example that obtains output from a polyglot: from polyglot.text import Text, Word tokens =["unconditional" ,"precooked", "impossible", "painful", "entered"] for s in tokens: s=Word(s, language="en") print("{:<20}{}".format(s,s.morphemes)) unconditional ['un','conditional'] precooked ['pre','cook','ed'] impossible ['im','possible'] painful ['pain','ful'] entered ['enter','ed'] If tokenization is not performed properly, then we can perform morphological analysis for the process of splitting text into its original constituents: sent="Ihopeyoufindthebookinteresting" para=Text(sent) para.language="en" para.morphemes WordList(['I','hope','you','find','the','book','interesting']) A morphological analyzers Morphological analysis may be defined as the process of obtaining grammatical information about a token given its suffix information. Morphological analysis can be performed in three ways: Morpheme-based morphology (or the item and arrangement approach), Lexeme-based morphology (or the item and process approach), and Word-based morphology (or the word and paradigm approach). A morphological analyzer may be defined as a program that is responsible for the analysis of the morphology of a given input token. It analyzes a given token and generates morphological information, such as gender, number, class, and so on, as an output. In order to perform morphological analysis on a given non-whitespace token, pyEnchant dictionary is used. Consider the following code that performs morphological analysis: >>> import enchant >>> s = enchant.Dict("en_US") >>> tok=[] >>> def tokenize(st1): if not st1:return for j in xrange(len(st1),-1,-1): if s.check(st1[0:j]): tok.append(st1[0:i]) st1=st[j:] tokenize(st1) break >>> tokenize("itismyfavouritebook") >>> tok ['it', 'is', 'my','favourite','book'] >>> tok=[ ] >>> tokenize("ihopeyoufindthebookinteresting") >>> tok ['i','hope','you','find','the','book','interesting'] We can determine the category of a word as follows: Morphological hints: Suffix information helps us to detect the category of a word. For example, -ness and –ment suffixes exist with nouns. Syntactic hints: Contextual information is conducive in determining the category of a word. For example, if we have found a word that has a noun category, then syntactic hints will be useful in determining whether an adjective will appear before the noun or after the noun category. Semantic hints: A semantic hint is also useful in determining the category of a word. For example, if we already know that a word represents the name of a location, then it will fall under the noun category. Open class: This refers to the class of words that are not fixed and each day, their number keeps on increasing whenever a new word is added to the list. Words in an open class are usually in the form of nouns. Prepositions are mostly a closed class. Morphology captured by the part of speech tagset: Part of Speech tagset capture information that helps us to perform morphology. For example, the word 'plays' would appear with the third person and singular noun. Omorfi (the open morphology of Finnish) is a package that has been licensed by version 3 of GNU GPL. It is used for the purpose of performing numerous tasks such as language modeling, morphological analysis, rule-based machine translations, information retrieval, statistical machine translations, morphological segmentation, ontologies, and spell checking and correction. A morphological generator A morphological generator is a program that performs the task of morphological generations. Morphological generation may be considered the opposite of morphological analysis. Here, given the description of a word in terms of its number, category, stem, and so on, the original word is retrieved. For example, if root = go, Part of Speech = verb, tense= present, and if it occurs along with a third person and singular subject, then the morphological generator would generate its surface form, that is, goes. There are many Python-based software that perform morphological analysis and generation. Some of them are as follows: ParaMorfo: This is used to perform the morphological generation and analysis of Spanish and Guarani nouns, adjectives, and verbs HornMorpho: This is used for the morphological generation and analysis of Oromo and Amharic nouns and verbs as well as Tigrinya verbs AntiMorfo: This is used for the morphological generation and analysis of Quechua adjectives, verbs, and nouns as well as Spanish verbs MorfoMelayu: This is used for the morphological analysis of Malay words Other examples of software that is used to perform morphological analysis and generation are as follows: Morph is a morphological generator and analyzer for the English language and the RASP system Morphy is a morphological generator, analyzer, and POS tagger for German Morphisto is a morphological generator and analyzer for German Morfette performs supervised learning (inflectional morphology) for Spanish and French Search engines PyStemmer 1.0.1 consists of Snowball stemming algorithms that are conducive for performing information retrieval tasks and the construction of a search engine. It consists of the Porter stemming algorithm and many other stemming algorithms that are useful for the purpose of performing stemming and information retrieval tasks in many languages, including many European languages. We can construct a vector space search engine by converting the texts into vectors. Here are the steps needed to construct a vector space search engine: Stemming and elimination of stop words. A stemmer is a program that accepts words and converts them into stems. Tokens that have same stem almost have the same meanings. Stop words are also eliminated from text. Consider the following code for the removal of stop words and tokenization: def eliminatestopwords(self,list): " " " Eliminate words which occur often and have not much significance from context point of view. " " " return[ word for word in list if word not in self.stopwords ] def tokenize(self,string): " " " Perform the task of splitting text into stop words and tokens " " " Str=self.clean(str) Words=str.split(" ") return [self.stemmer.stem(word,0,len(word)-1) for word in words] Mapping keywords into vector dimensions.Here's the code required to perform the mapping of keywords into vector dimensions: def obtainvectorkeywordindex(self, documentList): " " " In the document vectors, generate the keyword for the given position of element " " " #Perform mapping of text into strings vocabstring = " ".join(documentList) vocablist = self.parser.tokenise(vocabstring) #Eliminate common words that have no search significance vocablist = self.parser.eliminatestopwords(vocablist) uniqueVocablist = util.removeDuplicates(vocablist) vectorIndex={} offset=0 #Attach a position to keywords that performs mapping with dimension that is used to depict this token for word in uniqueVocablist: vectorIndex[word]=offset offset+=1 return vectorIndex #(keyword:position) Mapping of text strings to vectorsHere, a simple term count model is used. The code to convert text strings into vectors is as follows: def constructVector(self, wordString): # Initialise the vector with 0's Vector_val = [0] * len(self.vectorKeywordIndex) tokList = self.parser.tokenize(tokString) tokList = self.parser.eliminatestopwords(tokList) for word in toklist: vector[self.vectorKeywordIndex[word]] += 1; # simple Term Count Model is used return vector Searching similar documents By finding the cosine of an angle between the vectors of a document, we can prove whether two given documents are similar or not. If the cosine value is 1, then the angle value is 0 degrees and vectors are said to be parallel (this means that documents are related). If the cosine value is 0 and the value of the angle is 90 degrees, then vectors are said to be perpendicular (this means that documents are not related). This is the code to compute the cosine between the text vector using scipy: def cosine(vec1, vec2): """ cosine = ( X * Y ) / ||X|| x ||Y|| """ return float(dot(vec1,vec2) / (norm(vec1) * norm(vec2))) Search keywords We perform the mapping of keywords to a vector space. We construct a temporary text that represents items to be searched and then compare it with document vectors with the help of a cosine measurement. Here is the following code needed to search for the vector space: def searching(self,searchinglist): """ search for text that are matched on the basis of list of items """ askVector = self.buildQueryVector(searchinglist) ratings = [util.cosine(askVector, textVector) for textVector in self.documentVectors] ratings.sort(reverse=True) return ratings The following code can be used to detect languages from a source text: >>> import nltk >>> import sys >>> try: from nltk import wordpunct_tokenize from nltk.corpus import stopwords except ImportError: print( 'Error has occured') #---------------------------------------------------------------------- >>> def _calculate_languages_ratios(text): """ Compute probability of given document that can be written in different languages and give a dictionary that appears like {'german': 2, 'french': 4, 'english': 1} """ languages_ratios = {} ''' nltk.wordpunct_tokenize() splits all punctuations into separate tokens wordpunct_tokenize("I hope you like the book interesting .") [' I',' hope ','you ','like ','the ','book' ,'interesting ','.'] ''' tok = wordpunct_tokenize(text) wor = [word.lower() for word in tok] # Compute occurence of unique stopwords in a text for language in stopwords.fileids(): stopwords_set = set(stopwords.words(language)) words_set = set(words) common_elements = words_set.intersection(stopwords_set) languages_ratios[language] = len(common_elements) # language "score" return languages_ratios #---------------------------------------------------------------- >>> def detect_language(text): """ Compute the probability of given text that is written in different languages and obtain the one that is highest scored. It makes use of stopwords calculation approach, finds out unique stopwords present in a analyzed text. """ ratios = _calculate_languages_ratios(text) most_rated_language = max(ratios, key=ratios.get) return most_rated_language if __name__=='__main__': text = ''' All over this cosmos, most of the people believe that there is an invisible supreme power that is the creator and the runner of this world. Human being is supposed to be the most intelligent and loved creation by that power and that is being searched by human beings in different ways into different things. As a result people reveal His assumed form as per their own perceptions and beliefs. It has given birth to different religions and people are divided on the name of religion viz. Hindu, Muslim, Sikhs, Christian etc. People do not stop at this. They debate the superiority of one over the other and fight to establish their views. Shrewd people like politicians oppose and support them at their own convenience to divide them and control them. It has intensified to the extent that even parents of a new born baby teach it about religious differences and recommend their own religion superior to that of others and let the child learn to hate other people just because of religion. Jonathan Swift, an eighteenth century novelist, observes that we have just enough religion to make us hate, but not enough to make us love one another. The word 'religion' does not have a derogatory meaning - A literal meaning of religion is 'A personal or institutionalized system grounded in belief in a God or Gods and the activities connected with this'. At its basic level, 'religion is just a set of teachings that tells people how to lead a good life'. It has never been the purpose of religion to divide people into groups of isolated followers that cannot live in harmony together. No religion claims to teach intolerance or even instructs its believers to segregate a certain religious group or even take the fundamental rights of an individual solely based on their religious choices. It is also said that 'Majhab nhi sikhata aaps mai bair krna'. But this very majhab or religion takes a very heinous form when it is misused by the shrewd politicians and the fanatics e.g. in Ayodhya on 6th December, 1992 some right wing political parties and communal organizations incited the Hindus to demolish the 16th century Babri Masjid in the name of religion to polarize Hindus votes. Muslim fanatics in Bangladesh retaliated and destroyed a number of temples, assassinated innocent Hindus and raped Hindu girls who had nothing to do with the demolition of Babri Masjid. This very inhuman act has been presented by Taslima Nasrin, a Banglsdeshi Doctor-cum-Writer in her controversial novel 'Lajja' (1993) in which, she seems to utilizes fiction's mass emotional appeal, rather than its potential for nuance and universality. ''' >>> language = detect_language(text) >>> print(language) The preceding code will search for stop words and detect the language of the text, which is English. Summary In this article, we discussed stemming, lemmatization, and morphological analysis and generation. Resources for Article: Further resources on this subject: How is Python code organized[article] Machine learning and Python – the Dream Team[article] Putting the Fun in Functional Python[article]
Read more
  • 0
  • 0
  • 4569
article-image-integrating-objective-c
Packt
01 Apr 2016
11 min read
Save for later

Integrating with Objective-C

Packt
01 Apr 2016
11 min read
In this article written by Kyle Begeman author of the book Swift 2 Cookbook, we will cover the following recipes: Porting your code from one language to another Replacing the user interface classes Upgrading the app delegate Introduction Swift 2 is out, and we can see that it is going to replace Objective-C on iOS development sooner or later, however how should you migrate your Objective-C app? Is it necessary to rewrite everything again? Of course you don't have to rewrite a whole application in Swift from scratch, you can gradually migrate it. Imagine a four years app developed by 10 developers, it would take a long time to be rewritten. Actually, you've already seen that some of the codes we've used in this book have some kind of "old Objective-C fashion". The reason is that not even Apple computers could migrate the whole Objective-C code into Swift. (For more resources related to this topic, see here.) Porting your code from one language to another In the previous recipe we learned how to add a new code into an existing Objective-C project, however you shouldn't only add new code but also, as far as possible, you should migrate your old code to the new Swift language. If you would like to keep your application core on Objective-C that's ok, but remember that new features are going to be added on Swift and it will be difficult keeping two languages on the same project. In this recipe we are going to port part of the code, which is written in Objective-C to Swift. Getting ready Make a copy of the previous recipe, if you are using any version control it's a good time for committing your changes. How to do it… Open the project and add a new file called Setup.swift, here we are going to add a new class with the same name (Setup): class Setup { class func generate() -> [Car]{ var result = [Car]() for distance in [1.2, 0.5, 5.0] { var car = Car() car.distance = Float(distance) result.append(car) } var car = Car() car.distance = 4 var van = Van() van.distance = 3.8 result += [car, van] return result } } Now that we have this car array generator we can call it on the viewDidLoad method replacing the previous code: - (void)viewDidLoad { [super viewDidLoad]; vehicles = [Setup generate]; [self->tableView reloadData]; } Again press play and check that the application is still working. How it works… The reason we had to create a class instead of creating a function is that you can only export to Objective-C classes, protocols, properties, and subscripts. Bear that in mind in case of developing with the two languages. If you would like to export a class to Objective-C you have two choices, the first one is inheriting from NSObject and the other one is adding the @objc attribute before your class, protocol, property, or subscript. If you paid attention, our method returns a Swift array but it was converted to an NSArray, but as you might know, they are different kinds of array. Firstly, because Swift arrays are mutable and NSArray are not, and the other reason is that their methods are different. Can we use NSArray in Swift? The answer is yes, but I would recommend avoiding it, imagine once finished migrating to Swift your code still follows the old way, it would be another migration. There's more… Migrating from Objective-C is something that you should do with care, don't try to change the whole application at once, remember that some Swift objects behave differently from Objective-C, for example, dictionaries in Swift have the key and the value types specified but in Objective-C they can be of any type. Replacing the user interface classes At this moment you know how to migrate the model part of an application, however in real life we also have to replace the graphical classes. Doing it is not complicated but it could be a bit full of details. Getting ready Continuing with the previous recipe, make a copy of it or just commit the changes you have and let's continue with our migration. How to do it… First create a new file called MainViewController.swift and start importing the UIKit: import UIKit The next step is creating a class called MainViewController, this class must inherit from UIViewController and implement the protocols UITableViewDataSource and UITableViewDelegate: class MainViewController:UIViewController,UITableViewDataSource, UITableViewDelegate {  Then, add the attributes we had in the previous view controller, keep the same name you have used before: private var vehicles = [Car]() @IBOutlet var tableView:UITableView! Next, we need to implement the methods, let's start with the table view data source methods: func tableView(tableView: UITableView, numberOfRowsInSection section: Int) -> Int{ return vehicles.count } func tableView(tableView: UITableView, cellForRowAtIndexPath indexPath: NSIndexPath) -> UITableViewCell{ var cell:UITableViewCell? = self.tableView.dequeueReusableCellWithIdentifier ("vehiclecell") if cell == nil { cell = UITableViewCell(style: .Subtitle, reuseIdentifier: "vehiclecell") } var currentCar = self.vehicles[indexPath.row] cell!.textLabel?.numberOfLines = 1 cell!.textLabel?.text = "Distance (currentCar.distance * 1000) meters" var detailText = "Pax: (currentCar.pax) Fare: (currentCar.fare)" if currentCar is Van{ detailText += ", Volume: ( (currentCar as Van).capacity)" } cell!.detailTextLabel?.text = detailText cell!.imageView?.image = currentCar.image return cell! } Pay attention that this conversion is not 100% equivalent, the fare for example isn't going to be shown with two digits of precision, there is an explanation later of why we are not going to fix this now.  The next step is adding the event, in this case we have to do the action done when the user selects a car: func tableView(tableView: UITableView, willSelectRowAtIndexPath indexPath: NSIndexPath) -> NSIndexPath? { var currentCar = self.vehicles[indexPath.row] var time = currentCar.distance / 50.0 * 60.0 UIAlertView(title: "Car booked", message: "The car will arrive in (time) minutes", delegate: nil, cancelButtonTitle: "OK").show() return indexPath } As you can see, we need only do one more step to complete our code, in this case it's the view didLoad. Pay attention that another difference from Objective-C and Swift is that in Swift you have to specify that you are overloading an existing method: override func viewDidLoad() { super.viewDidLoad() vehicles = Setup.generate() self.tableView.reloadData() } } // end of class Our code is complete, but of course our application is still using the old code. To complete this operation, click on the storyboard, if the document outline isn't being displayed, click on the Editor menu and then on Show Document Outline: Now that you can see the document outline, click on View Controller that appears with a yellow circle with a square inside: Then on the right-hand side, click on the identity inspector, next go to the custom class and change the value of the class from ViewController to MainViewController. After that, press play and check that your application is running, select a car and check that it is working. Be sure that it is working with your new Swift class by paying attention on the fare value, which in this case isn't shown with two digits of precision. Is everything done? I would say no, it's a good time to commit your changes. Lastly, delete the original Objective-C files, because you won't need them anymore. How it works… As you can see, it's not so hard replacing an old view controller with a Swift one, the first thing you need to do is create a new view controller class with its protocols. Keep the same names you had on your old code for attributes and methods that are linked as IBActions, it will make the switch very straightforward otherwise you will have to link again. Bear in mind that you need to be sure that your changes are applied and that they are working, but sometimes it is a good idea to have something different, otherwise your application can be using the old Objective-C and you didn't realize it. Try to modernize our code using the Swift way instead of the old Objective-C style, for example, nowadays it's preferable using interpolation rather than using stringWithFormat. We also learned that you don't need to relink any action or outlet if you keep the same name. If you want to change the name of anything you might first keep its original name, test your app, and after that you can refactor it following the traditional factoring steps. Don't delete the original Objective-C files until you are sure that the equivalent Swift file is working on every functionality. There's more… This application had only one view controller, however applications usually have more than one view controller. In this case, the best way you can update them is one by one instead of all of them at the same time. Upgrading the app delegate As you know there is an object that controls the events of an application, which is called application delegate. Usually you shouldn't have much code here, but a few of them you might have. For example, you may deactivate the camera or the GPS requests when your application goes to the background and reactivate them when the app returns active. Certainly it is a good idea to update this file even if you don't have any new code on it, so it won't be a problem in the future. Getting ready If you are using the version control system, commit your changes from the last recipe or if you prefer just copy your application. How to do it… Open the previous application recipe and create a new Swift file called ApplicationDelegate.swift, then you can create a class with the same name. As in our previous class we didn't have any code on the application delegate, so we can differentiate it by printing on the log console. So add this traditional application delegate on your Swift file: class ApplicationDelegate: UIResponder, UIApplicationDelegate { var window: UIWindow? func application(application: UIApplication, didFinishLaunchingWithOptions launchOptions: [NSObject: AnyObject]?) -> Bool { print("didFinishLaunchingWithOptions") return true } func applicationWillResignActive(application: UIApplication) { print("applicationWillResignActive") } func applicationDidEnterBackground(application: UIApplication) { print("applicationDidEnterBackground") } func applicationWillEnterForeground(application: UIApplication) { print("applicationWillEnterForeground") } func applicationDidBecomeActive(application: UIApplication) { print("applicationDidBecomeActive") } func applicationWillTerminate(application: UIApplication) { print("applicationWillTerminate") } } Now go to your project navigator and expand the Supporting Files group, after that click on the main.m file. In this file we are going to import the magic file, the Swift header file: #import "Chapter_8_Vehicles-Swift.h" After that we have to specify that the application delegate is the new class we have, so replace the AppDelegate class on the UIApplicationMain call with ApplicationDelegate. Your main function should be like this: int main(int argc, char * argv[]) { @autoreleasepool { return UIApplicationMain(argc, argv, nil, NSStringFromClass([ApplicationDelegate class])); } } It's time to press play and check whether the application is working or not. Press the home button or the combination shift + command + H if you are using the simulator and again open your application. Have a look that you have some messages on your log console. Now that you are sure that your Swift code is working, remove the original app delegate and its importation on the main.m. Test your app just in case. You could consider that we had finished this part, but actually we still have another step to do: removing the main.m file. Now it is very easy, just click on the ApplicationDelegate.swift file and before the class declaration add the attribute @UIApplicationMain, then right click on the main.h and choose to delete it. Test it and your application is done. How it works… The application delegate class has always been specified at the starting of an application. In Objective-C, it follows the C start point, which is a function called main. In iOS, you can specify the class that you want to use as an application delegate. If you program for OS X the procedure is different, you have to go to your nib file and change its class name to the new one. Why did we have to change the main function and then eliminate it? The reason is that you should avoid massive changes, if something goes wrong you won't know the step where you failed, so probably you will have to rollback everything again. If you do your migration step by step ensuring that it is still working, it means that in case of finding an error, it will be easier to solve it. Avoid doing massive changes on your project, changing step by step will be easier to solve issues. There's more… In this recipe, we learned the last steps of how to migrate an app from Objective-C to Swift code, however we have to remember that programming is not only about applications, you can also have a framework. In the next recipe, we are going to learn how to create your own framework compatible with Swift and Objective-C. Summary This article shows you how Swift and Objective-C can live together and give you a step-by-step guide on how to migrate your Objective-C app to Swift. Resources for Article: Further resources on this subject: Concurrency and Parallelism with Swift 2 [article] Swift for Open Source Developers [article] Your First Swift 2 Project [article]
Read more
  • 0
  • 0
  • 14500

article-image-testing-and-debugging-distributed-applications
Packt
01 Apr 2016
21 min read
Save for later

Testing and Debugging Distributed Applications

Packt
01 Apr 2016
21 min read
In this article, by Francesco Pierfederici author of the book Distributed Computing with Python, the author likes to state that, "distributed systems, both large and small, can be extremely challenging to test and debug, as they are spread over a network, run on computers that can be quite different from each other, and might even be physically located in different continents altogether". Moreover, the computers we use could have different user accounts, different disks with different software packages, different hardware resources, and very uneven performance. Some can even be in a different time zone. Developers of distributed systems need to consider all these pieces of information when trying to foresee failure conditions. Operators have to work around all of these challenges when debugging errors. (For more resources related to this topic, see here.) The big picture Testing and debugging monolithic applications is not simple, as every developer knows. However, there are a number of tools that dramatically make the task easier, including the pdb debugger, various profilers (notable mentions include cProfile and line_profile), linters, static code analysis tools, and a host of test frameworks, a number of which have been included in the standard library of Python 3.3 and higher. The challenge with distributed applications is that most of the tools and packages that we can use for single-process applications lose much of their power when dealing with multiple processes, especially when these processes run on different computers. Debugging and profiling distributed applications written in C, C++, and Fortran can be done with tools such as Intel VTune, Allinea MAP, and DDT. Unfortunately, Python developers are left with very few or no options for the time being. Writing small- or medium-sized distributed systems is not terribly hard, as we saw in the pages so far. The main difference between writing monolithic programs and distributed applications is the large number of interdependent components running on remote hardware. This is what makes monitoring and debugging distributed code harder and less convenient. Fortunately, we can still use all familiar debuggers and code analysis tools on our Python distributed applications. Unfortunately, these tools will only go so far to the point that we will have to rely on old-fashioned logging and print statements to get the full picture on what went wrong. Common problems – clocks and time Time is a handy variable for use. For instance, using timestamps is very natural when we want to join different streams of data, sort database records, and in general, reconstruct the timeline for a series of events, which we often times observe are out of order. In addition, some tools (for example, GNU make) rely solely on file modification time and are easily confused by a clock skew between machines. For these reasons, clock synchronization among all computers and systems we use is very important. If our computers are in different time zones, we might want to not only synchronize their clocks but also set them to Coordinated Universal Time (UTC) for simplicity. In all the cases, when changing clocks to UTC is not possible, a good advice is to always process time in UTC within our code and to only covert local time for display purposes. In general, clock synchronization in distributed systems is a fascinating and complex topic, and it is out of the scope of this article. Most readers, luckily, are likely to be well served by the Network Time Protocol (NTP), which is a perfectly fine synchronization solution for most situations. Most modern operating systems, including Windows, Mac OS X, and Linux, have great support for NTP. Another thing to consider when talking about time is the timing of periodic actions, such as polling loops or cronjobs. Many applications need to spawn processes or perform actions (for example, sending a confirmation e-mail or checking whether new data is available) at regular intervals. A common pattern is to set up timers (either in our code or via the tools provided by the OS) and have all these timers go off at some time, usually at a specific hour and at regular intervals after that. The risk of this approach is that we can overload the system the very moment all these processes wake up and start their work. A surprisingly common example would be starting a significant number of processes that all need to read some configuration or data file from a shared disk. In these cases, everything goes fine until the number of processes becomes so large that the shared disk cannot handle the data transfer, thus slowing our application to a crawl. A common solution is the staggering of these timers in order to spread them out over a longer time interval. In general, since we do not always control all code that we use, it is good practice to start our timers at some random number of minutes past the hour, just to be safe. Another example of this situation would be an image-processing service that periodically polls a set of directories looking for new data. When new images are found, they are copied to a staging area, renamed, scaled, and potentially converted to a common format before being archived for later use. If we're not careful, it would be easy to overload the system if many images were to be uploaded at once. A better approach would be to throttle our application (maybe using a queue-based architecture) so that it would only start an appropriately small number of image processors so as to not flood the system. Common problems – software environments Another common challenge is making sure that the software installed on all the various machines we are ever going to use is consistent and consistently upgraded. Unfortunately, it is frustratingly common to spend hours debugging a distributed application only to discover that for some unknown and seemingly impossible reason, some computers had an old version of the code and/or its dependencies. Sometimes, we might even find the code to have disappeared completely. The reasons for these discrepancies can be many: from a mount point that failed to a bug in our deployment procedures to a simple human mistake. A common approach, especially in the HPC world, is to always create a self-contained environment for our code before launching the application itself. Some projects go as far as preferring static linking of all dependencies to avoid having the runtime pick up the wrong version of a dynamic library. This approach works well if the application runtime is long compared to the time it takes to build its full environment, all of its software dependencies, and the application itself. It is not that practical otherwise. Python, fortunately, has the ability to create self-contained virtual environments. There are two related tools that we can use: pyvenv (available as part of the Python 3.5 standard library) and virtualenv (available in PyPI). Additionally, pip, the Python package management system, allows us to specify the exact version of each package we want to install in a requirements file. These tools, when used together, permit reasonable control on the execution environment. However, the devil, as it is often said, is in the details, and different computer nodes might use the exact same Python virtual environment but incompatible versions of some external library. In this respect, container technologies such as Docker (https://www.docker.com) and, in general, version-controlled virtual machines are promising ways out of the software runtime environment maelstrom in those environments where they can be used. In all other cases, HPC clusters come to mind, the best approach will probably be to not rely on the system software and manage our own environments and the full-software stack. Common problems – permissions and environments Different computers might have run our code under different user accounts, and our application might expect to be able to read a file or write data into a specific directory and hit an unexpected permission error. Even in cases where the user accounts used by our code are all the same (down to the same user ID and group ID), their environment may be different on different hosts. Therefore, an environment variable we assumed to be defined might not be or, even worse, might be set to an incompatible value. These problems are common when our code runs as a special, unprivileged user such as nobody. Defensive coding, especially when accessing the environment, and making sure to always fall back to sensible defaults when variables are undefined (that is, value = os.environ.get('SOME_VAR', fallback_value) instead of simply value = os.environ.get['SOME_VAR']) is often necessary. A common approach, when this is possible, is to only run our applications under a specific user account that we control and specify the full set of environment variables our code needs in the deployment and application startup scripts (which will have to be version controlled as well). Some systems, however, not only execute jobs under extremely limited user accounts, but they also restrict code execution to temporary sandboxes. In many cases, access to the outside network is also blocked. In these situations, one might have no other choice but to set up the full environment locally and copy it to a shared disk partition. Other data can be served from custom-build servers running as ancillary jobs just for this purpose. In general, permission problems and user environment mismatches are very similar to problems with the software environment and should be tackled in concert. Often times, developers find themselves wanting to isolate their code from the system as much as possible and create a small, but self-contained environment with all the code and all the environment variables they need. Common problems – the availability of hardware resources The hardware resources that our application needs might or might not be available at any given point in time. Moreover, even if some resources were to be available at some point in time, nothing guarantees that they will stay available for much longer. A problems we can face related to this are network glitches, which are quite common in many environments (especially for mobile apps) and which, for most practical purposes, are undistinguishable from machine or application crashes. Applications using a distributed computing framework or job scheduler can often rely on the framework itself to handle at least some common failure scenarios. Some job schedulers will even resubmit our jobs in case of errors or sudden machine unavailability. Complex applications, however, might need to implement their own strategies to deal with hardware failures. In some cases, the best strategy is to simply restart the application when the necessary resources are available again. Other times, restarting from scratch would be cost prohibitive. In these cases, a common approach is to implement application checkpointing. What this means is that the application both writes its state to a disk periodically and is able to bootstrap itself from a previously saved state. In implementing a checkpointing strategy, you need to balance the convenience of being able to restart an application midway with the performance hit of writing a state to a disk. Another consideration is the increase in code complexity, especially when many processes or threads are involved in reading and writing state information. A good rule of thumb is that data or results that can be recreated easily and quickly do not warrant implementation of application checkpointing. If, on the other hand, some processing requires a significant amount of time and one cannot afford to waste it, then application checkpointing might be in order. For example, climate simulations can easily run for several weeks or months at a time. In these cases, it is important to checkpoint them every hour or so, as restarting from the beginning after a crash would be expensive. On the other hand, a process that takes an uploaded image and creates a thumbnail for, say, a web gallery runs quickly and is not normally worth checkpointing. To be safe, a state should always be written and updated automatically (for example, by writing to a temporary file and replacing the original only after the write completes successfully). The last thing we want is to restart from a corrupted state! Very familiar to HPC users as well as users of AWS, a spot instance is a situation where a fraction or the entirety of the processes of our application are evicted from the machines that they are running on. When this happens, a warning is typically sent to our processes (usually, a SIGQUIT signal) and after a few seconds, they are unceremoniously killed (via a SIGKILL signal). For AWS spot instances, the time of termination is available through a web service in the instance metadata. In either case, our applications are given some time to save the state and quit in an orderly fashion. Python has powerful facilities to catch and handle signals (refer to the signal module). For example, the following simple commands shows how we can implement a bare-bones checkpointing strategy in our application: #!/usr/bin/env python3.5 """ Simple example showing how to catch signals in Python """ import json import os import signal import sys     # Path to the file we use to store state. Note that we assume # $HOME to be defined, which is far from being an obvious # assumption! STATE_FILE = os.path.join(os.environ['HOME'],                                '.checkpoint.json')     class Checkpointer:     def __init__(self, state_path=STATE_FILE):         """         Read the state file, if present, and initialize from that.         """         self.state = {}         self.state_path = state_path         if os.path.exists(self.state_path):             with open(self.state_path) as f:                 self.state.update(json.load(f))         return       def save(self):         print('Saving state: {}'.format(self.state))         with open(self.state_path, 'w') as f:             json.dump(self.state, f)         return       def eviction_handler(self, signum, frame):         """         This is the function that gets called when a signal is trapped.         """         self.save()           # Of course, using sys.exit is a bit brutal. We can do better.         print('Quitting')         sys.exit(0)         return     if __name__ == '__main__':     import time       print('This is process {}'.format(os.getpid()))       ckp = Checkpointer()     print('Initial state: {}'.format(ckp.state))       # Catch SIGQUIT     signal.signal(signal.SIGQUIT, ckp.eviction_handler)       # Get a value from the state.     i = ckp.state.get('i', 0)     try:         while True:             i += 1             ckp.state['i'] = i             print('Updated in-memory state: {}'.format(ckp.state))             time.sleep(1)     except KeyboardInterrupt:         ckp.save() If we run the preceding script in a terminal window and then in another terminal window, we send it a SIGQUIT signal (for example, via kill -s SIGQUIT <process id>). After this, we see the checkpointing in action, as the following screenshot illustrates: A common situation in distributed applications is that of being forced to run code in potentially heterogeneous environments: machines (real or virtual) of different performances, with different hardware resources (for example, with or without GPUs), and potentially different software environments (as we mentioned already). Even in the presence of a job scheduler, to help us choose the right software and hardware environment, we should always log the full environment as well as the performance of each execution machine. In advanced architectures, these performance metrics can be used to improve the efficiency of job scheduling. PBS Pro, for instance, takes into consideration the historical performance figures of each job being submitted to decide where to execute it next. HTCondor continuously benchmarks each machine and makes those figures available for node selection and ranking. Perhaps, the most frustrating cases are where either due to the network itself or due to servers being overloaded, network requests take so long that our code hits its internal timeouts. This might lead us to believe that the counterpart service is not available. These bugs, especially when transient, can be quite hard to debug. Challenges – the development environment Another common challenge in distributed systems is the setup of a representative development and testing environment, especially for individuals or small teams. Ideally, in fact, the development environment should be identical to the worst-case scenario deployment environment. It should allow developers to test common failure scenarios, such as a disk filling up, varying network latencies, intermittent network connections, hardware and software failures, and so on—all things that are bound to happen in real time, sooner or later. Large teams have the resources to set up development and test clusters, and they almost always have dedicated software quality teams stress testing our code. Small teams, unfortunately, often find themselves forced to write code on their laptops and use a very simplified (and best-case scenario!) environment made up by two or three virtual machines running on the laptops themselves to emulate the real system. This pragmatic solution works and is definitely better than nothing. However, we should remember that virtual machines running on the same host exhibit unrealistically high-availability and low-network latencies. In addition, nobody will accidentally upgrade them without us knowing or image them with the wrong operating system. The environment is simply too controlled and stable to be realistic. A step closer to a realistic setup would be to create a small development cluster on, say, AWS using the same VM images, with the same software stack and user accounts that we are going to use in production. All things said, there is simply no replacement for the real thing. For cloud-based applications, it is worth our while to at least test our code on a smaller version of the deployment setup. For HPC applications, we should be using either a test cluster, a partition of the operational cluster, or a test queue for development and testing. Ideally, we would develop on an exact clone of the operational system. Cost consideration and ease of development will constantly push us to the multiple-VMs-on-a-laptop solution; it is simple, essentially free, and it works without an Internet connection, which is an important point. We should, however, keep in mind that distributed applications are not impossibly hard to write; they just have more failure modes than their monolithic counterparts do. Some of these failure modes (especially those related to data access patterns) typically require a careful choice of architecture. Correcting architectural choices dictated by false assumptions later on in the development stage can be costly. Convincing managers to give us the hardware resources that we need early on is usually difficult. In the end, this is a delicate balancing act. A useful strategy – logging everything Often times, logging is like taking backups or eating vegetables—we all know we should do it, but most of us forget. In distributed applications, we simply have no other choice—logging is essential. Not only that, logging everything is essential. With many different processes running on potentially ephemeral remote resources at difficult-to-predict times, the only way to understand what happens is to have logging information and have it readily available and in an easily searchable format/system. At the bare minimum, we should log process startup and exit time, exit code and exceptions (if any), all input arguments, all outputs, the full execution environment, the name and IP of the execution host, the current working directory, the user account as well as the full application configuration, and all software versions. The idea is that if something goes wrong, we should be able to use this information to log onto the same machine (if still available), go to the same directory, and reproduce exactly what our code was doing. Of course, being able to exactly reproduce the execution environment might simply not be possible (often times, because it requires administrator privileges). However, we should always aim to be able to recreate a good approximation of that environment. This is where job schedulers really shine; they allow us to choose a specific machine and specify the full job environment, which makes replicating failures easier. Logging software versions (not only the version of the Python interpreter, but also the version of all the packages used) helps diagnose outdated software stacks on remote machines. The Python package manager, pip, makes getting the list of installed packages easy: import pip; pip.main(['list']). Whereas, import sys; print(sys.executable, sys.version_info) displays the location and version of the interpreter. It is also useful to create a system whereby all our classes and function calls emit logging messages with the same level detail and at the same points in the object life cycle. Common approaches involve the use of decorators and, maybe a bit too esoteric for some, metaclasses. This is exactly what the autologging Python module (available on PyPI) does for us. Once logging is in place, we face the questions where to store all these logging messages and whose traffic could be substantial for high verbosity levels in large applications. Simple installations will probably want to write log messages to text files on a disk. More complex applications might want to store these messages in a database (which can be done by creating a custom handler for the Python logging module) or in specialized log aggregators such as Sentry (https://getsentry.com). Closely related to logging is the issue of monitoring. Distributed applications can have many moving parts, and it is often essential to know which machines are up, which are busy, as well as which processes or jobs are currently running, waiting, or in an error state. Knowing which processes are taking longer than usual to complete their work is often an important warning sign that something might be wrong. Several monitoring solutions for Python (often times, integrated with our logging system) exist. The Celery project, for instance, recommends flower (http://flower.readthedocs.org) as a monitoring and control web application. HPC job schedulers, on the other hand, tend to lack common, general-purpose, monitoring solutions that go beyond simple command-line clients. Monitoring comes in handy in discovering potential problems before they become serious. It is in fact useful to monitor resources such as available disk space and trigger actions or even simple warning e-mails when they fall under a given threshold. Many centers monitor hardware performance and hard drive SMART data to detect early signs of potential problems. These issues are more likely to be of interest to operations personnel rather than developers, but they are useful to keep in mind. They can also be integrated in our applications to implement strategies in order to handle performance degradations gracefully. A useful strategy – simulating components A good, although possibly expensive in terms of time and effort, test strategy is to simulate some or all of the components of our system. The reasons are multiple; on one hand, simulating or mocking software components allows us to test our interfaces to them more directly. In this respect, mock testing libraries, such as unittest.mock (part of the Python 3.5 standard library), are truly useful. Another reason to simulate software components is to make them fail or misbehave on demand and see how our application responds. For instance, we could increase the response time of services such as REST APIs or databases to worst-case scenario levels and see what happens. Sometimes, we might exceed timeout values in some network calls leading our application to incorrectly assume that the sever has crashed. Especially early on in the design and development of a complex distributed application, one can make overly optimistic assumptions about things such as network availability and performance or response time of services such as databases or web servers. For this reason, having the ability to either completely bring a service offline or, more subtly, modify its behavior can tell us a lot about which of the assumptions in our code might be overly optimistic. The Netflix Chaos Monkey (https://github.com/Netflix/SimianArmy) approach of disabling random components of our system to see how our application copes with failures can be quite useful. Summary Writing and running small- or medium-sized distributed applications in Python is not hard. There are many high-quality frameworks that we can leverage among others, for example, Celery, Pyro, various job schedulers, Twisted, MPI bindings, or the multiprocessing module in the standard library. The real difficulty, however, lies in monitoring and debugging our applications, especially because a large fraction of our code runs concurrently on many different, often remote, computers. The most insidious bugs are those that end up producing incorrect results (for example, because of data becoming corrupted along the way) rather than raising an exception, which most frameworks are able to catch and bubble up. The monitoring and debugging tools that we can use with Python code are, sadly, not as sophisticated as the frameworks and libraries we use to develop that same code. The consequence is that large teams end up developing their own, often times, very specialized distributed debugging systems from scratch and small teams mostly rely on log messages and print statements. More work is needed in the area of debuggers for distributed applications in general and for dynamic languages such as Python in particular. Resources for Article: Further resources on this subject: Python Data Structures [article] Python LDAP applications - extra LDAP operations and the LDAP URL library [article] Machine Learning Tasks [article]
Read more
  • 0
  • 0
  • 3629
Modal Close icon
Modal Close icon