19 Dec 2013

10 min read

Fast Array Operations with NumPy

Getting started with NumPy

NumPy is built around its multidimensional array object, numpy.ndarray. A NumPy array is a collection of elements of the same data type; this fundamental restriction lets NumPy pack the data efficiently, and storing the data this way lets NumPy handle arithmetic and mathematical operations at high speed.

Creating arrays

You can create NumPy arrays using the numpy.array function. It takes a list-like object (or another array) as input and, optionally, a string expressing its data type. You can interactively test array creation using an IPython shell as follows:

    In [1]: import numpy as np
    In [2]: a = np.array([0, 1, 2])

Every NumPy array has a data type that can be accessed through the dtype attribute. In the following example, dtype is a 64-bit integer (the default integer type depends on the platform):

    In [3]: a.dtype
    Out[3]: dtype('int64')

If we want those numbers to be treated as floats, we can either pass the dtype argument to np.array or cast the array to another data type using the astype method, as shown in the following code:

    In [4]: a = np.array([0, 1, 2], dtype='float32')
    In [5]: a.astype('float32')
    Out[5]: array([0., 1., 2.], dtype=float32)

To create an array with two dimensions (an array of arrays), we can initialize the array using a nested sequence as follows:

    In [6]: a = np.array([[0, 1, 2], [3, 4, 5]])
    In [7]: print(a)
    [[0 1 2]
     [3 4 5]]

The array created in this way has two dimensions, or axes in NumPy's jargon. Such an array is like a table that contains two rows and three columns. We can access the axes structure using the ndarray.shape attribute:

    In [7]: a.shape
    Out[7]: (2, 3)

Arrays can be reshaped as long as the product of the shape dimensions is equal to the total number of elements in the array. For example, we can reshape an array containing 16 elements in the following ways: (2, 8), (4, 4), or (2, 2, 4).
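The reshape rule can be checked directly. The following is a minimal sketch, assuming only that NumPy is installed; the variable names valid_shapes, reshaped, and mismatch_rejected are illustrative and not part of the original text. It reshapes a 16-element array into the three shapes mentioned above and confirms that a shape whose product is not 16 is rejected:

```python
import numpy as np

a = np.arange(16)  # 16 elements: 0, 1, ..., 15

# Each of these shapes multiplies out to 16, so the reshape succeeds.
valid_shapes = [(2, 8), (4, 4), (2, 2, 4)]
reshaped = {shape: a.reshape(shape) for shape in valid_shapes}

# A shape whose product is not 16 (here 3 * 5 = 15) raises a ValueError.
try:
    a.reshape(3, 5)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

for shape, arr in reshaped.items():
    print(shape, "->", arr.shape)
print("mismatched shape rejected:", mismatch_rejected)
```

The element order is preserved by the reshape; only the way the same 16 values are grouped into axes changes.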
To reshape an array we can either use the ndarray.reshape method or directly assign to the ndarray.shape attribute. The following code illustrates the use of the ndarray.reshape method:

    In [7]: a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
    In [7]: a.shape
    Out[7]: (16,)
    In [8]: a.reshape(4, 4) # In-place alternative: a.shape = (4, 4)
    Out[8]:
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11],
           [12, 13, 14, 15]])

Thanks to this property you are also free to add dimensions of size one. You can reshape an array with 16 elements to (16, 1), (1, 16), (16, 1, 1), and so on.

NumPy provides convenience functions, shown in the following code, to create arrays filled with zeros, filled with ones, or without an initialization value (empty; the actual values are meaningless and depend on the memory state). Those functions take the array shape as a tuple and, optionally, its dtype.

    In [8]: np.zeros((3, 3))
    In [9]: np.empty((3, 3))
    In [10]: np.ones((3, 3), dtype='float32')

In our examples we will use the numpy.random module to generate random floating point numbers in the [0, 1) interval. The numpy.random.rand function is used as follows:

    In [11]: np.random.rand(3, 3)

Sometimes it is convenient to initialize arrays that have the same shape as other arrays. Again, NumPy provides some handy functions for that purpose, such as zeros_like, empty_like, and ones_like. These functions are used as follows:

    In [12]: np.zeros_like(a)
    In [13]: np.empty_like(a)
    In [14]: np.ones_like(a)

Accessing arrays

The NumPy array interface is, on a shallow level, similar to that of Python lists. Arrays can be indexed using integers, and can also be iterated over using a for loop.

    In [15]: A = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
    In [16]: A[0]
    Out[16]: 0
    In [17]: [a for a in A]
    Out[17]: [0, 1, 2, 3, 4, 5, 6, 7, 8]

It is also possible to index an array in multiple dimensions.
If we take a (3,3) array (an array containing 3 triplets) and we index the first element, we obtain the first triplet, shown as follows:

    In [18]: A = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
    In [19]: A[0]
    Out[19]: array([0, 1, 2])

We can index the triplet again by adding another index separated by a comma. To get the second element of the first triplet we can index using [0, 1], as shown in the following code:

    In [20]: A[0, 1]
    Out[20]: 1

NumPy allows you to slice arrays in single and multiple dimensions. If we slice on the first dimension we will get a collection of triplets, shown as follows:

    In [21]: A[0:2]
    Out[21]:
    array([[0, 1, 2],
           [3, 4, 5]])

If we also slice the second dimension with [0:2], we extract the first two elements of every selected triplet, resulting in a (2, 2) array, shown in the following code:

    In [22]: A[0:2, 0:2]
    Out[22]:
    array([[0, 1],
           [3, 4]])

Intuitively, you can update values in the array by using both numerical indexes and slices. The syntax is as follows:

    In [23]: A[0, 1] = 8
    In [24]: A[0:2, 0:2] = [[1, 1], [1, 1]]

Indexing with the slicing syntax is fast because it doesn't make copies of the array. In NumPy terminology it returns a view over the same memory area. If we take a slice of the original array and then change one of its values, the original array will be updated as well. The following code illustrates this:

    In [25]: a = np.array([1, 1, 1, 1])
    In [26]: a_view = a[0:2]
    In [27]: a_view[0] = 2
    In [28]: print(a)
    [2 1 1 1]

We can take a look at another example that shows how the slicing syntax can be used in a real-world scenario. We define an array r_i, shown in the following line of code, which contains a set of 10 coordinates (x, y); its shape will be (10, 2):

    In [29]: r_i = np.random.rand(10, 2)

A typical operation is extracting the x component of each coordinate. In other words, you want to extract the items [0, 0], [1, 0], [2, 0], and so on, resulting in an array with shape (10,).
It is helpful to think that the first index is moving while the second one is fixed (at 0). With this in mind, we will slice every index on the first axis (the moving one) and take the first element (the fixed one) on the second axis, as shown in the following line of code:

    In [30]: x_i = r_i[:, 0]

On the other hand, the following expression keeps the first index fixed and the second index moving, giving the first (x, y) coordinate:

    In [31]: r_0 = r_i[0, :]

Slicing all the indexes over the last axis is optional; using r_i[0] has the same effect as r_i[0, :].

NumPy allows you to index an array using another NumPy array made of either integer or Boolean values, a feature called fancy indexing.

If you index with an array of integers, NumPy will interpret the integers as indexes and will return an array containing their corresponding values. If we index an array containing 10 elements with [0, 2, 3], we obtain an array of size 3 containing the elements at positions 0, 2, and 3. The following code illustrates this concept:

    In [32]: a = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
    In [33]: idx = np.array([0, 2, 3])
    In [34]: a[idx]
    Out[34]: array([9, 7, 6])

You can use fancy indexing on multiple dimensions by passing an array for each dimension. If we want to extract the elements [0, 2] and [3, 1], we have to pack all the indexes acting on the first axis in one array, and the ones acting on the second axis in another. This can be seen in the following code:

    In [35]: a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])
    In [36]: idx1 = np.array([0, 3])
    In [37]: idx2 = np.array([2, 1])
    In [38]: a[idx1, idx2]
    Out[38]: array([ 2, 10])

You can also use normal lists as index arrays, but not tuples.
For example, the following two statements are equivalent:

    >>> a[np.array([0, 1])] # is equivalent to
    >>> a[[0, 1]]

However, if you use a tuple, NumPy will interpret the following statement as an index on multiple dimensions:

    >>> a[(0, 1)] # is equivalent to
    >>> a[0, 1]

The index arrays are not required to be one-dimensional; we can extract elements from the original array in any shape. For example, we can select elements from the original array to form a (2,2) array, shown as follows:

    In [39]: idx1 = [[0, 1], [3, 2]]
    In [40]: idx2 = [[0, 2], [1, 1]]
    In [41]: a[idx1, idx2]
    Out[41]:
    array([[ 0,  5],
           [10,  7]])

The array slicing and fancy indexing features can be combined. For example, this is useful if we want to swap the x and y columns in a coordinate array. In the following code, the first index runs over all the elements (a slice), and for each of those we extract the element in position 1 (the y) first and then the one in position 0 (the x):

    In [42]: r_i = np.random.rand(10, 2)
    In [43]: r_i[:, [0, 1]] = r_i[:, [1, 0]]

When the index array is a Boolean array, slightly different rules apply. The Boolean array acts like a mask; every element corresponding to True is extracted and put in the output array. This procedure is shown as follows:

    In [44]: a = np.array([0, 1, 2, 3, 4, 5])
    In [45]: mask = np.array([True, False, True, False, False, False])
    In [46]: a[mask]
    Out[46]: array([0, 2])

The same rules apply when dealing with multiple dimensions. Furthermore, if the index array has the same shape as the original array, the elements corresponding to True will be selected and put in the resulting array.

Indexing in NumPy is a reasonably fast operation. When speed is critical, however, you can use the slightly faster numpy.take and numpy.compress functions to squeeze out a little more performance. The first argument of numpy.take is the array we want to operate on, and the second is the list of indexes we want to extract.
The last argument is axis; if not provided, the indexes will act on the flattened array, otherwise they will act along the specified axis.

    In [47]: r_i = np.random.rand(100, 2)
    In [48]: idx = np.arange(50) # integers 0 to 49
    In [49]: %timeit np.take(r_i, idx, axis=0)
    1000000 loops, best of 3: 962 ns per loop
    In [50]: %timeit r_i[idx]
    100000 loops, best of 3: 3.09 us per loop

The similar, but faster, version for Boolean arrays is numpy.compress, which works in the same way. The use of numpy.compress is shown as follows:

    In [51]: idx = np.ones(100, dtype='bool') # all True values
    In [52]: %timeit np.compress(idx, r_i, axis=0)
    1000000 loops, best of 3: 1.65 us per loop
    In [53]: %timeit r_i[idx]
    100000 loops, best of 3: 5.47 us per loop

Summary

This article covered the basics of NumPy arrays: how to create them and how to access their elements.
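The view and copy behaviour described in this article, and the relationship between the indexing syntax and numpy.take / numpy.compress, can be verified with a short script. This is a sketch rather than a benchmark: the variable names are illustrative, np.shares_memory is used here only as a convenient check, and the relative timings quoted in the article depend on the NumPy version and hardware, so they are worth re-measuring with %timeit on your own setup.

```python
import numpy as np

# Slicing returns a view: writing through it changes the original array.
a = np.array([1, 1, 1, 1])
a_view = a[0:2]
a_view[0] = 2
slice_shares_memory = np.shares_memory(a, a_view)

# Fancy indexing returns a copy: the result owns separate memory.
r_i = np.random.rand(100, 2)
idx = np.arange(50)  # integers 0 to 49
picked = r_i[idx]
fancy_shares_memory = np.shares_memory(r_i, picked)

# np.take and np.compress select the same rows as the indexing syntax.
mask = np.ones(100, dtype=bool)
take_matches = np.array_equal(np.take(r_i, idx, axis=0), r_i[idx])
compress_matches = np.array_equal(np.compress(mask, r_i, axis=0), r_i[mask])

print(a, slice_shares_memory, fancy_shares_memory)
print(take_matches, compress_matches)
```

Because numpy.take and numpy.compress return the same data as the bracket syntax, switching between them is purely a performance decision.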

Packt

19 Dec 2013

15 min read

Applied Modeling

Packt

19 Dec 2013

12 min read

Going Isometric

Cartesian to isometric equations

A very important thing to understand here is that the level data still remains the same 2D array; we will be altering only the rendering process. Later on, we will need to update the level data to accommodate large tiles, which will contain items that are bigger than the current tile size. Our two-dimensional top-down coordinates for a tile can be called Cartesian coordinates. The relationship between Cartesian and isometric coordinates is shown in the following code:

    //Cartesian to isometric:
    x_Iso = x_Cart - y_Cart;
    y_Iso = ( x_Cart + y_Cart ) / 2;
    //Isometric to Cartesian:
    x_Cart = ( 2 * y_Iso + x_Iso ) / 2;
    y_Cart = ( 2 * y_Iso - x_Iso ) / 2;

That is very simple, isn't it? We will use an IsoHelper class for this conversion, to which we can pass a point and get back the converted point.

An isometric view via a matrix transformation

Although the equations are simple and straightforward, the art needed for an isometric tile is a bit complicated. The artist needs to create the rhombus-shaped tile art with pixel precision, and it mostly has to be tileable in all four directions. An alternative approach is to use the square tiles themselves and skew them dynamically in code. Let us try to create the isometric view for the level data with the same tiles using this approach. The transformation matrix for the isometric transformation is as follows, which is essentially a rotation of 45 degrees and scaling by half along the Y axis:

    var m:Matrix = new Matrix(1,0.5,-1,0.5,0,0);

The code for the IsometricLevel class is shared as follows. You should initialize this class from the Starling document class using new Starling(IsometricLevel, stage). The following approach just applies the isometric transformation matrix to the RenderTexture image.
Minor changes in the init function are shown in the following code:

    var m:Matrix = new Matrix(1,0.5,-1,0.5,0,0);
    for(var i:int=0;i<levelData.length;i++){
        for(var j:int=0;j<levelData[0].length;j++){
            img=new Image(texAtlas.getTexture(paddedName(levelData[i][j])));
            img.x=j*tileWidth+borderX;
            img.y=i*tileWidth+borderY;
            rTex.draw(img);
        }
    }
    m.translate( 300, 0 );
    rTexImage.transformationMatrix = m;

We apply the transformation matrix to the RenderTexture image and translate it by 300 pixels so that the whole of it is visible; skewing would otherwise push part of the image out of the visible area of the screen. We will get the following result:

An alternate approach is to apply the transformation matrix to each individual tile image, find the corresponding isometric coordinates, and place the individual tiles accordingly, as shown in the following code:

    var m:Matrix = new Matrix(1,0.5,-1,0.5,0,0);
    var pt:Point=new Point();
    for(var i:int=0;i<levelData.length;i++){
        for(var j:int=0;j<levelData[0].length;j++){
            img = new Image(texAtlas.getTexture(paddedName(levelData[i][j])));
            img.transformationMatrix = m;
            pt.x=j*tileWidth+borderX;
            pt.y=i*tileWidth+borderY;
            pt=IsoHelper.cartToIso(pt);
            img.x=pt.x+300;
            img.y=pt.y;
            rTex.draw(img);
        }
    }

Here, we use the convenient cartToIso(pt) conversion function of our IsoHelper class to find the isometric coordinates corresponding to our Cartesian coordinates. We offset the drawing by 300 pixels to handle the skewing offset for the image.

This approach will work in some cases, but not all top-down tiles can simply be skewed and made into isometric tiles. For example, consider a tree in the top-down view: it will simply look like a skewed tree graphic after we apply the isometric transformation. So, the right approach is to create isometric tile art specifically and use the isometric equations to place the tiles correctly. Let us use the isometric tiles provided in the assets pack to create a sample level.
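The conversion equations and the transformation matrix describe the same mapping, which can be checked numerically. The following is a minimal sketch in Python rather than ActionScript; the function names cart_to_iso, iso_to_cart, and apply_matrix are hypothetical stand-ins (they are not the article's IsoHelper class), and apply_matrix assumes the usual Flash Matrix(a, b, c, d, tx, ty) convention of x' = a*x + c*y + tx and y' = b*x + d*y + ty.

```python
def cart_to_iso(x_cart, y_cart):
    """Cartesian to isometric, following the article's equations."""
    return x_cart - y_cart, (x_cart + y_cart) / 2


def iso_to_cart(x_iso, y_iso):
    """Isometric to Cartesian, the inverse of cart_to_iso."""
    return (2 * y_iso + x_iso) / 2, (2 * y_iso - x_iso) / 2


def apply_matrix(a, b, c, d, tx, ty, x, y):
    """Assumed Flash-style 2D affine transform of the point (x, y)."""
    return a * x + c * y + tx, b * x + d * y + ty


point = (120, 40)  # an arbitrary Cartesian point, for illustration only
iso = cart_to_iso(*point)
round_trip = iso_to_cart(*iso)
via_matrix = apply_matrix(1, 0.5, -1, 0.5, 0, 0, *point)

print(iso, round_trip, via_matrix)
```

With the matrix entries (1, 0.5, -1, 0.5, 0, 0), the transform reduces to x' = x - y and y' = (x + y) / 2, which is why skewing a square tile and placing a pre-drawn isometric tile with the equations land at the same screen position.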
Implementing the isometric view via isometric art

Please refer to the SampleIsometricDemo source folder, which implements a sample level of our game using isometric art and the previously mentioned equations. There are some differences in the approach that I will be explaining in the following sections. Most of them have to do with the change in level data, altering the registration point of larger tiles, and handling depth. We also need to offset the image drawing so that it fits in the screen area. We use a variable called screenOffset for this purpose. The render code is as follows:

    var pt:Point=new Point();
    for(var i:int=0;i<groundArray.length;i++){
        for(var j:int=0;j<groundArray[0].length;j++){
            //draw the ground
            img=new Image(texAtlas.getTexture(String(groundArray[i][j]).split(".")[0]));
            pt.x=j*tileWidth;
            pt.y=i*tileWidth;
            pt=IsoHelper.cartToIso(pt);
            img.x=pt.x+screenOffset.x;
            img.y=pt.y+screenOffset.y;
            rTex.draw(img);
            //draw overlay
            if(overlayArray[i][j]!="*"){
                img=new Image(texAtlas.getTexture(String(overlayArray[i][j]).split(".")[0]));
                img.x=pt.x+screenOffset.x;
                img.y=pt.y+screenOffset.y;
                if(regPoints[overlayArray[i][j]]!=null){
                    img.x+=regPoints[overlayArray[i][j]].x;
                    img.y-=regPoints[overlayArray[i][j]].y;
                }
                rTex.draw(img);
            }
        }
    }

The result is shown in the following screenshot:

Level data structure

The level data for our isometric level is no longer just a simple 2D array with index numbers, but a combination of multiple data structures. We have a 2D array for the ground tiles, another 2D array for overlay tiles, and a dictionary to store altered registration points of the overlay tiles.

Ground tiles are those tiles which exactly fit the isometric tile dimensions, which in this case are 80 x 40, and make up the bottom-most layer of the isometric level. These tiles don't take part in any depth sorting, as they are always rendered below all other items that populate the level.
Overlay tiles are items which may not fit into the isometric tile dimensions and have height, for instance, buildings, trees, bushes, rocks, and so on. Some of these could fit into the tile dimensions, but keeping them as separate overlay tiles gives us several advantages:

- We are free to place an overlay tile over any ground tile, which adds to flexibility
- We would need a lot of tiles if we tried to fit overlay tiles and ground tiles together for all permutations and combinations
- Effects such as tinting can be applied independently to the overlay tiles
- Depth handling becomes much easier
- Overlay tiles which are smaller than the tile size reduce the game size

Altering registration points

Starling considers all images as rectangular blocks with their registration point at the top-left corner. The registration point is the point which can be considered as the (0,0) of that image. Traditional Flash gave us the capability to alter the registration points by embedding images inside a Sprite or MovieClip. We can still do the same, but it would require the unnecessary creation of a lot of Sprites. Alternatively, we can use the pivotX and pivotY properties of Starling objects for the same result.

In our isometric level, we will need to precisely place overlay tiles inside the isometric grid space. An overlay tile does not have any standard size, as it can be any item: a tree, building, character, and so on. So, placing them correctly is tricky and very specific to the tile concerned. This leads us to have independent registration points for each overlay tile. We use a dictionary structure to save these values and use them as offsets while placing overlay tiles. For example, we need to place a bush image, nonwalk0009.png, exactly at the middle of an isometric grid, which means moving it 12 pixels to the left and 19 pixels to the top for proper alignment.
We save (12,19) as a new point inside our dictionary for the ID nonwalk0009.png, as follows:

    regPoints["nonwalk0009.png"]=new Point(12,19);

Finding a tile's precise placement point requires visual interaction; hence, we will build a level editor, which makes this easier.

Depth sorting

An isometric view needs us to handle the depth of items manually. For ground tiles, there is no depth issue, as they always form the lowest layer over which all the overlay items and characters are drawn. But overlay tiles and characters need to be drawn at specific depths for the scene to look right. By depth, I mean the order in which the images are drawn. An image drawn later will overlap one drawn earlier, thereby appearing to be in front of it.

For a level which does not change and has no moving items, we need to find the depth only once, for the initial render. But for a level with moving characters or vehicles, we need to find the depth on every frame in the game loop and render accordingly. The current sample level does not change over time, so we can simply render the level by looping through the array. Any overlay item placed at a higher I or J value will be rendered later, and hence will be shown in front, where I and J are array indices. Thus, items placed at higher indices appear closer to the camera; that is, for the same I, a higher J is closer to the camera and vice versa.

When we have a moving item, we need to find the array position it occupies based on its current screen position. Using these newly found array indices, we can compare them with the overlay tiles' indices and decide on the drawing sequence.
The code to find array indices from the screen position is as follows:

    //capture screen position
    var screenPos:Point=new Point(hero.x,hero.y);
    //convert to cartesian coordinates
    var cartPos:Point=IsoHelper.isoToCart(screenPos);
    //find tile indices from cartesian values
    var tilePos:Point=IsoHelper.getTileIndices(cartPos,tileWidth);

Understanding isometric movement

Isometric movement is very straightforward to implement. All we need to do is move the item in top-down Cartesian coordinates and draw it on the screen after converting to isometric coordinates. For example, if our character is at a point heroCart in the Cartesian system, then the following code moves him/her to the right:

    heroCart.x+=heroSpeed;
    //convert to isometric coordinates
    heroIso=IsoHelper.cartToIso(heroCart);
    heroImage.x=heroIso.x;
    heroImage.y=heroIso.y;
    rTex.draw(heroImage);

Detecting isometric collision

Collision detection for any tile-based game is done based on tiles. When designing, we will make sure that certain tiles are walkable while certain tiles are nonwalkable, which means that the characters can move over some tiles but not over others. So, when we calculate the movement of any character, we first make sure that the character won't end up on a nonwalkable tile. Thus, after each movement, we check whether the resulting position falls in a nonwalkable tile by finding array indices as mentioned previously. If it does, we ignore the movement; otherwise we proceed with the movement and update the on-screen character position.
    heroCart.x+=heroSpeed;
    //find new tile point
    var tilePos:Point=IsoHelper.getTileIndices(heroCart,tileWidth);
    //proceed only if the new tile position is walkable
    if(checkWalkable(tilePos)){
        //convert to isometric coordinates
        heroIso=IsoHelper.cartToIso(heroCart);
        heroImage.x=heroIso.x;
        heroImage.y=heroIso.y;
        rTex.draw(heroImage);
    }

You may be wondering whether the hero character needs some special consideration to be drawn at the right depth, but with the way we draw things, this gets handled automatically. We do not allow the hero to move onto a nonwalkable tile, that is, one with bushes, trees, and so on. So, any tile is either walkable or nonwalkable. The character gets drawn on top of a walkable tile, which does not contain any overlay items, and hence it will occupy the right depth.

In this method, a full tile has to be made either walkable or nonwalkable, but this may not be the case for all games. We may need tiles which block entry from a specific direction or block exit in a particular direction, such as a fence along one border of a tile. In such cases, the tile is still walkable, but valid movement is also checked by tracking the direction in which the character is moving. For our game, the first method will be used along with four-way freedom of movement.

In an isometric view, movement can be either in four directions or in eight directions, called four-way movement and eight-way movement respectively. Four-way movement is when we move along the X or Y axis alone in the Cartesian space. Eight-way movement happens when, in addition to the four-way directions, we also move the item diagonally. The logic remains the same.

Summary

In this article, we learned about the isometric projection and the equations that help us implement it based on the simpler Cartesian system. We implemented a sample isometric level using isometric art and also learned about matrix-based fake isometric rendering.
We analyzed the IsoHelper class, which facilitates easy conversion between Cartesian and isometric coordinates and also helps in finding array indices. We learned why altering the registration points is essential for perfectly placing the overlay tiles, and we found that our level data needs to track these registration points as well. We also learned how depth sorting, collision detection, and isometric movement are done with our tile-based approach.
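The movement and collision logic described in this article can be summarized in a few lines. The following is a sketch in Python, not the article's ActionScript code, and it rests on several assumptions that are not in the original: a tile width of 40, a small made-up LEVEL grid in which 1 marks a nonwalkable tile, and a get_tile_indices helper that simply divides Cartesian coordinates by the tile width (the real IsoHelper.getTileIndices is not shown in the article and may differ).

```python
import math

TILE_WIDTH = 40  # assumed Cartesian tile size, for illustration only

# Hypothetical level: 0 = walkable, 1 = nonwalkable; rows are i, columns are j.
LEVEL = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]


def cart_to_iso(x, y):
    """Cartesian to isometric, following the article's equations."""
    return x - y, (x + y) / 2


def get_tile_indices(x, y, tile_width):
    """Assumed tile lookup: row from y, column from x."""
    return math.floor(y / tile_width), math.floor(x / tile_width)


def is_walkable(row, col):
    inside = 0 <= row < len(LEVEL) and 0 <= col < len(LEVEL[0])
    return inside and LEVEL[row][col] == 0


def try_move(hero_cart, dx, dy):
    """Return the new Cartesian position, or the old one if the move is blocked."""
    new_x, new_y = hero_cart[0] + dx, hero_cart[1] + dy
    row, col = get_tile_indices(new_x, new_y, TILE_WIDTH)
    return (new_x, new_y) if is_walkable(row, col) else hero_cart


hero = (30, 50)                   # starts on tile row 1, column 0
blocked = try_move(hero, 15, 0)   # would enter the nonwalkable tile at row 1, column 1
moved = try_move(hero, 0, 35)     # enters the walkable tile at row 2, column 0
screen_pos = cart_to_iso(*moved)  # position at which the hero image would be drawn

print(blocked, moved, screen_pos)
```

One design choice in this sketch is that try_move computes the candidate position first and only commits it when the target tile is walkable, so a rejected move leaves the Cartesian position untouched.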

Packt

19 Dec 2013

10 min read

Platform as a Service and CloudBees

Platform as a Service (PaaS) is a crossover between IaaS and SaaS. This is a fuzzy definition, but it describes well both the existing actors in this industry and the possible confusions. A general presentation of PaaS uses a pyramid. Depending on what the graphic tries to demonstrate, the pyramid can be drawn upside down, as shown in the following diagram:

Cloud pyramids

The pyramid on the left-hand side shows XaaS platforms based on the target users' profiles. It demonstrates that IaaS is the basis for all Cloud services. It provides the required flexibility for PaaS to support applications that are exposed as SaaS to the end users. Some SaaS actually don't use a PaaS and rely directly on IaaS, but that doesn't really matter here.

The pyramid on the right-hand side represents the providers, and the size of the three levels suggests the number of providers in each category. IaaS only makes sense for highly concentrated, large-scale providers. PaaS can have more actors, probably focused on some ecosystem, but the need is to have a neutral and standard platform that is actually attractive for developers. SaaS is about all the possible applications running in Cloud. The top-level shape should then be far larger than what the graphic shows.

So, which platform?

With the previous definition, you only have a faint idea: PaaS is more than IaaS and less than SaaS. What is still missing is a definition of what the platform is about.

A platform is a standardization of the runtime that a developer expects in order to do his/her job. This depends on the software ecosystem you're considering. For a Java EE developer, a platform means having at least a servlet container, a managed DataSource to access the database, and a few comparable resources wrapped as standard Java EE APIs. A Play! framework developer will consider this overweight and only ask for a JVM with WebSocket support.
A PHP developer will expect a Linux/Apache/MySQL/PHP (LAMP) stack, similar to the one he/she has been using for years with a traditional server hosting service.

So, depending on the development ecosystem you're considering, platforms don't have the exact same meaning, but they all share a common principle. A platform is the common denominator for a software language ecosystem, where the application is everything that a specific developer will write or choose on their own. Java EE developers will ask for a container, and Ruby developers will ask for an RVM environment. What they run on top is their own choice.

With this definition, you understand that a platform is about the standardization of the runtime for a software ecosystem. Maybe some of you have patched OpenJDK to enable some magic features in the JVM (really?), but most of us just use the standard Oracle Java distribution. Such standardization makes it possible to share resources and engineering skills on a large scale, to reduce cost, and to provide a reliable runtime.

Cloud and clustering

Another consideration for a platform is clustering. Cloud is based on slicing resources into small virtual elements and letting the users select as many as they need. In most cases, this requires the application to support a clustering mode, as using more resources will require you to scale out on multiple hosts. Clustering has never been a trivial thing, and many developers aren't familiar with the related constraints.

The platform can help them by providing specialized services to distribute the load around the cluster's nodes. Some PaaS such as CloudBees or Google App Engine provide such features, while some don't. This is the major difference between PaaS offers. Some are IaaS-like preinstalled middleware services, while some offer a highly integrated platform.

A typical issue is state management. Java EE developers rely on HttpSession to store a user's data and retrieve it on subsequent interactions.
Modern frameworks tend to be stateless, but the state needs to be managed anyway. A PaaS has to provide options to developers, so that they can choose the best strategy to match their own business requirements. This is a typical clustering issue that is well addressed by PaaS, because the technical solutions (sticky sessions, session replication, distributed storage engines, and so on) have been implemented once, with all the skills required to do it right, and can be used by all platform users. Thanks to a PaaS, you don't need to be a clustering guru. This doesn't mean that it will magically let your legacy application scale out, but it gives you adequate tools to design the application for scalability.

Private versus public Clouds

Many companies are interested in Cloud, thanks to the press presenting every product announcement as the new revolution, and would like to benefit from it, but as a private resource.

If you go back to the comparison with electricity production in the Preface, this may make sense if you're well established. For Amazon or Google, having private power plants to supply giant data centers can make sense (although it doesn't seem that they have any, except as backends). For most companies, this would be a surprising choice.

The main reason is that the principle of the Cloud relies on the last letter of XaaS, the S that stands for Service. You can install an OpenStack or VMware farm in your data center, but then you won't have an IaaS. You will have some virtualization and flexibility that is probably far better than traditional dedicated hardware, but you miss the major change. You will still have to hire operators to administer the servers and the software stack. You will even have a more complex software stack (search for an OpenStack administrator and you'll understand).
Using Cloud makes sense because there are thousands of users all around the world sharing the same lower-level resources, and a centralized, highly specialized team to manage them all.

Building your own private PaaS is yet another challenge. This is not a simple middleware stack. This is not about providing virtual machine images with a preinstalled Tomcat server. What about maintenance, application scalability, deployment APIs, clustering, backup, data replication, high availability, monitoring, and support?

Support is a major added value of cloud services. I'm not just saying this because I'm a support engineer, but because when something fails, you need someone to help. You can't just wait on the promise of a patch provided by the community. The guy who's running your application needs to have significant knowledge of the platform. That's one reason that CloudBees is focusing on Java first, as this is the ecosystem and environment we know best (even though we have some Erlang and Ruby engineers whose favorite game is to troll about this displeasing language). With a private Cloud, you can probably have level-one support with an internal support team, but you can't handle all the issues. As with resource concentration, it takes scale to build an impressive knowledge base.

All those topics are ignored in most cases, as people only focus on the app:deploy automation, as opposed to the old-style deployments to dedicated hardware. If this is what you're looking for, you should know that Maven has been able to do this for years on all the Java EE containers using cargo. You can check this at http://cargo.codehaus.org.

Cloud isn't just about abstracting the runtime behind an API; it's about changing the way in which developers manage and access the runtime, so that it becomes a service they can consume without any need to worry about what's happening behind the scenes.

Security

The reason that companies claim to prefer a private cloud solution is security.
Amazon datacenters are far more secure than any private datacenter, due to both a strong security policy and anonymous user data. Security breaches are rarely about breaking encryption algorithms, as in Hollywood movies, but about social attacks, to which organizations are far more vulnerable. Few companies take care of administrative, financial, familial, or personal safety. Thanks to the combination of VPN, HTTPS, fixed IPs, and firewall filters, you can safely deploy an application on Amazon Cloud as an extension to your own network, to access data from your legacy Oracle or SAP mainframe hosted in your datacenter. As mobile applications demonstrate, your data is already going out from your private network. There's no concrete reason why your backend application can't be hosted outside your walls. CloudBees – embrace the development stack CloudBees PaaS has something special in its DNA that you won't find in other PaaS: by focusing on the Java ecosystem first, even with polyglot support, CloudBees understands well the Java ecosystem's complexity and its underlying practices. Heroku was one of the first successful PaaS, focusing on the Ruby runtime. Deployment of a Ruby application is just about sending source code to the platform using the following command: git push heroku master Ruby is a pleasant ecosystem because there are no long debates on build and provisioning tools of the kind we know in the Java world: Gemfile and Rake, period. In the Java ecosystem, there is a need to generate and compile the source code, and then sometimes post-process the classes; hence a large set of build tools is required. There's also a need to provision the runtime with dozens of dependencies, so a set of tools for dependency management, inter-project relations, and so on is required. With Agile development practices, automated testing has introduced a huge set of test frameworks that developers want to integrate into the deployment process.
The Java platform is not just about hosting a JVM or a servlet container; it's about managing Ant, Maven, SBT, or Gradle builds, as well as Grails-, Play-, Clojure-, and Scala-specific tooling. It's about hosting dependency repositories. It's about handling complex build processes that include multiple levels of testing and code analysis. The CloudBees platform has two major components: RUN@cloud is a PaaS, as described earlier, to host applications and provide high-level runtime services; DEV@cloud is a continuous integration and deployment SaaS based on Jenkins. Jenkins is not the subject of this article, but it is the de facto standard for continuous integration in, but not limited to, the Java ecosystem. With a large set of plugins, it can be extended to support a large set of tools, processes, and views of your project. The CloudBees team includes major Jenkins committers (including myself #selfpromotion), so it has deep knowledge of the Jenkins ecosystem and is well placed to offer it as a Cloud service. We can also help you to diagnose your project workflow by applying the best continuous integration and deployment practices. This helps you to be more efficient and to focus on your actual business development. The following screenshot displays the continuous Cloud delivery concept in CloudBees: With some CloudBees-specific plugins to help, DEV@cloud Jenkins creates a smooth code-build-deploy pipeline, comparable to Heroku's Git push, but with full control over the intermediary process that converts your source code into a runnable application. This is such a significant component for building a full stack for Java developers that CloudBees is the official provider of the continuous integration service for Google App Engine (http://googleappengine.blogspot.fr/2012/10/jenkins-meet-google-app-engine.html), Cloud Foundry (http://blog.cloudfoundry.com/2013/02/28/continuous-integration-to-cloud-foundry-com-using-jenkins-in-the-cloud/), and Amazon.
Summary This article introduced the Cloud principles and benefits, and compared CloudBees to its competitors. Resources for Article: Further resources on this subject: Framework Comparison: Backbase AJAX framework Vs Other Similar Framework (Part 2) [Article] Integrating Spring Framework with Hibernate ORM Framework: Part 2 [Article] Working with Zend Framework 2.0 [Article]

Packt

19 Dec 2013

5 min read

JBoss EAP6 Overview

(For more resources related to this topic, see here.) Understanding high availability To understand the term high availability, here is its definition from Wikipedia: "High availability is a system design approach and associated service implementation that ensures that a prearranged level of operational performance will be met during a contractual measurement period. Users want their systems, for example, hospitals, production computers, and the electrical grid to be ready to serve them at all times. If a user cannot access the system, it is said to be unavailable." In the IT field, when we mention the words "high availability", we usually think of the uptime of the server, and technologies such as clustering and load balancing can be used to achieve this. Clustering means using multiple servers to form a group. From their perspective, users see the cluster as a single entity and access it as if it were just a single point. The following figure shows the structure of a cluster: To achieve the previously mentioned goal, we usually use a controller of the cluster, called a load balancer, that sits in front of the cluster. Its job is to receive user requests and dispatch them to a node inside the cluster, and the node does the real work of processing the user requests. After the node processes the user request, the response is sent to the load balancer, and the load balancer sends it back to the user. The following figure shows the workflow: Besides load balancing user requests, the clustering system can also do failover inside itself. Failover means that when a node has crashed, the load balancer can switch to other running nodes to process user requests. In a cluster, some nodes may fail during runtime. If this happens, the requests to the failed nodes should be redirected to the healthy nodes. The process is shown in the following figure: To make failover possible, the nodes in a cluster should be able to replicate user data to one another.
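The dispatch and failover behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Apache httpd, mod_jk, or mod_cluster work internally: the `Node` and `LoadBalancer` classes, the round-robin policy, and the node names are all hypothetical and chosen only for illustration.

```python
from itertools import cycle


class Node:
    """A cluster node that processes requests while it is healthy."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class LoadBalancer:
    """Dispatches requests round-robin and fails over to healthy nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._ring = cycle(self.nodes)

    def dispatch(self, request):
        # Try each node at most once per request, starting from the next
        # node in the ring; skip nodes that turn out to be down.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            try:
                return node.handle(request)
            except ConnectionError:
                continue
        raise RuntimeError("no healthy node available")


nodes = [Node("node-1"), Node("node-2"), Node("node-3")]
balancer = LoadBalancer(nodes)
first = balancer.dispatch("req-1")   # served by node-1
nodes[1].healthy = False             # node-2 crashes
second = balancer.dispatch("req-2")  # node-2 is skipped, node-3 answers
```

The sketch only covers request routing. Keeping the user's session alive across the switch additionally requires the nodes to replicate session data, which is a separate concern.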
In JBoss EAP6, the Infinispan module, which is a data-grid solution provided by the JBoss community, does the web session replication. If one node fails, the user request can be redirected to another node, and the session with the user won't be lost. The following figure illustrates failover: To achieve the previously mentioned goals, the JBoss community has provided us with a powerful set of tools. In the next section we'll have an overview of them. JBoss EAP6 high availability As a Java EE application server, JBoss EAP6 uses modules coming from different open source projects: Web server (JBossWeb) EJB (JBoss EJB3) Web service (JBossWS/RESTEasy) Messaging (HornetQ) JPA and transaction management (Hibernate/Narayana) As we can see, JBoss EAP6 uses many open source projects, and each part may have its own considerations for achieving high availability. Now let's briefly look at these parts with respect to high availability: JBoss Web, Apache httpd, mod_jk, and mod_cluster Clustering for a web server may be the most popular topic and is well understood by the majority. There are a lot of good solutions in the market. For JBoss EAP6, the adopted solution is to use Apache httpd as the load balancer. httpd will dispatch the user requests to the EAP server. Red Hat has led two open source projects to work with httpd, which are called mod_jk and mod_cluster. In this article we'll learn how to use these two projects. EJB session bean JBoss EAP6 provides the @org.jboss.ejb3.annotation.Clustered annotation that we can use on both the @Stateless and @Stateful session beans. The clustered annotation is a JBoss EAP6/WildFly-specific implementation. When using @Clustered with @Stateless, the session bean can be load balanced; and when @Clustered is used with the @Stateful bean, the state of the bean will be replicated in the cluster. JBossWS and RESTEasy JBoss EAP6 provides two web service solutions out of the box. One is JBossWS and the other is RESTEasy.
JBossWS is a web service framework that implements the JAX-WS specification. RESTEasy is an implementation of the JAX-RS specification that helps you to build RESTful web services. HornetQ HornetQ is a high-performance messaging system provided by the JBoss community. The messaging system is designed to be asynchronous and has its own considerations for load balancing and failover. Hibernate and Narayana In the database and transaction management field, high availability is a huge topic. Each database vendor may have its own solutions for load balancing the database queries. PostgreSQL, for example, has open source solutions such as Slony and pgpool, which let us replicate the database from master to slave and distribute the user queries to different database nodes in a cluster. In the ORM layer, Hibernate also has projects such as Hibernate Shards that can deploy a database in a distributed way. JGroups and JBoss Remoting JGroups and JBoss Remoting are the cornerstones of the JBoss EAP6 clustering features, which enable it to support high availability. JGroups is a reliable communication system based on IP multicasting. JGroups is not limited to multicast and can use TCP too. JBoss Remoting is the underlying communication framework for multiple parts of JBoss EAP6. Summary In this article we learned the basic concepts of high availability and also had an overview of the basic functions of JBoss EAP6. This will help you to understand JBoss EAP6 better. Resources for Article: Further resources on this subject: Introduction to JBoss Clustering [Article] JBoss RichFaces 3.3 Supplemental Installation [Article] JBoss AS plug-in and the Eclipse Web Tools Platform [Article]

Packt

19 Dec 2013

4 min read

Reporting

(For more resources related to this topic, see here.) Creating a pie chart Earlier, we made the component test (CT) for display purposes only; now let's create a CT that actually runs. We will use the Direct function, so let's prepare that as well. In practice, we've done this already: duplicate a different app.html and change the JavaScript file as we have done before. Please see the source file for the code: 03_making_a_pie_chart/ct/dashboard/pie_app.html. Implementing the Direct function Next, prepare the Direct function to read the data. First, the config.php file defines the API. Let's gather the definitions together for all four graphs (source file: 04_implement_direct_function/php/config.php). .... 'MyAppDashBoard'=>array( 'methods'=>array( 'getPieData'=>array( 'len'=>0 ), 'getBarData'=>array( 'len'=>0 ), 'getLineData'=>array( 'len'=>0 ), 'getRadarData'=>array( 'len'=>0 ) ) .... Next, let's create the following methods to acquire data for the various charts: getPieData getBarData getLineData getRadarData First, implement the getPieData method, the Direct method that gets the data for the pie chart. Please see the actual content for the source code (source file: 04_implement_direct_function/php/classes/MyAppDashBoard.php). This method acquires the valid quotation and bill data items. For the data to be sent back to the client, set an array under items, with each entry holding the name and data keys. You will connect these definitions to the model in the next step. Preparing the store for the pie chart Charts need a store, so let's define the store and model (source file: 05_prepare_the_store_for_the_pie_chart/app/model/Pie.js). We'll create the MyApp.model.Pie class that has the name and data fields. Connect this with the data you set as the return value of the Direct function.
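To make the shape of the returned data concrete, here is a minimal sketch of the envelope the text describes, together with a reader that extracts rows from the items root and keeps only the model's name and data fields. It is written in Python rather than the article's PHP and Ext JS, the helper names `build_pie_payload` and `read_records` are hypothetical, and the sample numbers are invented for illustration.

```python
import json


def build_pie_payload(counts):
    """Wrap {label: value} pairs in the {"items": [...]} envelope."""
    items = [{"name": name, "data": value} for name, value in counts.items()]
    return {"items": items}


def read_records(payload, fields=("name", "data"), root="items"):
    """Mimic a JSON reader: take rows under `root`, keep only model fields."""
    records = []
    for row in payload[root]:
        missing = [f for f in fields if f not in row]
        if missing:
            raise KeyError(f"row {row!r} lacks model fields {missing}")
        records.append({f: row[f] for f in fields})
    return records


# Hypothetical sample numbers, not taken from the article's database.
payload = build_pie_payload({"Quotation": 12, "Bill": 7})
wire = json.dumps(payload)               # what would travel to the client
records = read_records(json.loads(wire))
```

The `KeyError` branch illustrates why the model fields and the returned keys must stay in sync: a field declared on the model but absent from the response has nothing to bind to.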
If you increase the number of fields inside the model you just defined, make sure to amend the returned field values as well; otherwise the new fields won't be applied to the chart. We'll use the model we made in the previous step and implement the store (source file: 05_prepare_the_store_for_the_pie_chart/app/model/ Pie.js). Ext.define('MyApp.store.Pie', { extend: 'Ext.data.Store', storeId: 'DashboardPie', model: 'MyApp.model.Pie', proxy: { type: 'direct', directFn: 'MyAppDashboard.getPieData', reader: { type: 'json', root: 'items' } } }) Here, we define the store using the model we made and set up the Direct function we made earlier in the proxy. Creating the View We have now prepared the presentation data. Now, let's quickly create the view to display it (source file: 06_making_the_view/app/view/dashboard/Pie.js). Ext.define('MyApp.view.dashboard.Pie', { extend: 'Ext.panel.Panel', alias : 'widget.myapp-dashboard-pie', title: 'Pie Chart', layout: 'fit', requires: [ 'Ext.chart.Chart', 'MyApp.store.Pie' ], initComponent: function() { var me = this, store; store = Ext.create('MyApp.store.Pie'); Ext.apply(me, { items: [{ xtype: 'chart', store: store, series: [{ type: 'pie', field: 'data', showInLegend: true, label: { field: 'name', display: 'rotate', contrast: true, font: '18px Arial' } }] }] }); me.callParent(arguments); } }); Implementing the controller With the previous code, data is not yet being read by the store, so nothing is displayed.
In the same way that reading was performed with onShow, let's implement the controller (source file: 06_making_the_view/app/controller/DashBoard.js): Ext.define('MyApp.controller.dashboard.DashBoard', { extend: 'MyApp.controller.Abstract', screenName: 'dashboard', init: function() { var me = this; me.control({ 'myapp-dashboard': { 'myapp-show': me.onShow, 'myapp-hide': me.onHide } }); }, onShow: function(p) { p.down('myapp-dashboard-pie chart').store.load(); }, onHide: function() { } }); For the charts we create from now on, it is a good idea to add the loading step to onShow as we create them. Let's take a look at our pie chart, which appears as follows: Summary You must agree this is starting to look like an application! The dashboard is the first screen you see right after logging in. Charts are extremely effective for visually checking a large and complicated amount of data. If you keep adding panels as and when you feel they are needed, you'll make the dashboard more practical. This sample will become a customizable base for you to use in future projects. Resources for Article: Further resources on this subject: So, what is Ext JS? [Article] Buttons, Menus, and Toolbars in Ext JS [Article] Displaying Data with Grids in Ext JS [Article]

Packt

19 Dec 2013

4 min read

Sharing Your BI Reports and Dashboards

(For more resources related to this topic, see here.) The final objective of the information in the BI reports and dashboards is to detect cause-effect business behavior and trends, and to trigger actions in response. These actions are supported by visual information, via scorecards and dashboards. This process requires interaction among several people. MicroStrategy includes the functionality to share our reports, scorecards, and dashboards, regardless of the location of the people. Reaching your audience MicroStrategy offers the option to share our reports via different channels that leverage the latest social technologies already present in the marketplace; that is, MicroStrategy integrates with Twitter and Facebook. Sharing this way avoids additional costs and maintains the do-it-yourself design premise, without any help from specialized IT personnel. Main menu The main menu of MicroStrategy shows a column named Status. When we click on that column, as shown in the following screenshot, the Share option appears: The Share button The other option is the Share button within our reports, that is, within the view that we want to share. Select the Share button located at the bottom of the screen, as shown in the following screenshot: The share options are the same, regardless of the location where you activate the option; the various alternate menus are shown in the following screenshot: E-mail sharing When you select the e-mail option from the Scorecards-Dashboards model, the system will ask you which e-mail program you want to use to send the e-mail; in our case, we select Outlook. MicroStrategy automatically prepares an e-mail with a link to share.
You can modify the text and select the recipients of the e-mail, as shown in the following screenshot: Recipients click on the URL included in the e-mail sent this way and can then analyze the report in read-only mode, with only the Filters panel enabled. The following screenshot shows how the user will review the report. The user is not allowed to make any modifications. This option does not require a MicroStrategy platform user account. When users click on the link, they are able to edit the filters and perform their analyses, as well as switch to any available layout, in our case, scorecards and dashboards. Any visualization object can also be maximized and minimized for better analysis, as shown in the following screenshot: In this option, the report can be viewed in fullscreen mode by clicking on the fullscreen button located at the top-right corner of the screen. In this sharing mode, the user is able to download the information in Excel and PDF formats for each visualization object. For instance, if you need all the data included in the grid for the stores in region 1 opened in the year 2000, perform the following steps: In the browser, open the URL that is generated when you select the e-mail share option. Select the ScoreCard tab. In the Open Year filter, type 2012 and in the Region filter, type 1. Now, maximize the grid. Two icons will appear in the top-left corner of the screen: one for exporting the data to Excel and the other for exporting it to PDF for each visualization object, as shown in the following screenshot: Please keep in mind that these two export options only apply to a specific visualization object; it is not possible to export the complete report with the functionality offered to the consumer.
Summary In this article, we learned how to share our scorecards and dashboards via several channels, such as e-mails, social networks (Twitter and Facebook), and blogs or corporate intranet sites. Resources for Article: Further resources on this subject: Participating in a business process (Intermediate) [Article] Self-service Business Intelligence, Creating Value from Data [Article] Exploring Financial Reporting and Analysis [Article]

Packt

19 Dec 2013

4 min read

Background Animation

Packt

18 Dec 2013

9 min read

Code Editing

Packt

18 Dec 2013

5 min read

Working with AMQP

(For more resources related to this topic, see here.) Broadcasting messages In this example we will see how to send the same message to a possibly large number of consumers. This is a typical messaging application: broadcasting to a huge number of clients, for example, when updating the scoreboard in a massively multiplayer game, or when publishing news in a social network application. In this article we discuss both the producer and consumer implementations. Since it is very common to have consumers using different technologies and programming languages, we use Java, Python, and Ruby to show interoperability with AMQP. We will also come to appreciate the benefits of having separate exchanges and queues in AMQP. Getting ready To follow this article you will need to set up Java, Python, and Ruby environments as described. How to do it… To work through this article we prepare four different programs: The Java publisher The Java consumer The Python consumer The Ruby consumer To prepare a Java publisher: Declare a fanout exchange: channel.exchangeDeclare(myExchange, "fanout"); Send one message to the exchange: channel.basicPublish(myExchange, "", null, jsonmessage.getBytes()); Then to prepare a Java consumer: Declare the same fanout exchange declared by the producer: channel.exchangeDeclare(myExchange, "fanout"); Autocreate a new temporary queue: String queueName = channel.queueDeclare().getQueue(); Bind the queue to the exchange: channel.queueBind(queueName, myExchange, ""); Define a custom, non-blocking consumer. Consume messages by invoking channel.basicConsume(). The source code of the Python consumer is very similar to that of the Java consumer, so there is no need to repeat the steps. In the Ruby consumer you need to use require "bunny" and then use the URI connection. We are now ready to put it all together and see the article in action: Start one instance of the Java producer; messages start getting published immediately.
Start one or more instances of the Java/Python/Ruby consumer; the consumers receive only the messages sent while they are running. Stop one of the consumers while the producer is running, and then restart it; you will see that the consumer has lost the messages sent while it was down. How it works… Both the producer and the consumers are connected to RabbitMQ with a single connection, but the logical path of the messages is depicted in the following figure: In step 1 we declared the exchange that we are using. The logic is the same as in the queue declaration: if the specified exchange doesn't exist, create it; otherwise, do nothing. The second argument of exchangeDeclare() is a string specifying the type of the exchange, fanout in this case. In step 2 the producer sends one message to the exchange. You can view the exchange along with the other defined exchanges by issuing the following command in the RabbitMQ command shell: rabbitmqctl list_exchanges The second argument in the call to channel.basicPublish() is the routing key, which is always ignored when used with a fanout exchange. The third argument, set to null, is the optional message properties. The fourth argument is just the message itself. When we started one consumer, it created its own temporary queue (step 4). Using the channel.queueDeclare() empty overload, we create a nondurable, exclusive, autodelete queue with an autogenerated name. Launching a couple of consumers and issuing rabbitmqctl list_queues, we can see two queues, one per consumer, with their odd names, along with the persistent myFirstQueue, as shown in the following screenshot: In step 5 we bound the queues to myExchange. It is possible to monitor these bindings too, by issuing the following command: rabbitmqctl list_bindings Monitoring is a very important aspect of AMQP; messages are routed by exchanges to the bound queues, and buffered in the queues. Exchanges do not buffer messages; they are just logical elements. The fanout exchange routes messages by just placing a copy of them in each bound queue.
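The fanout behavior described above can be reproduced with a small in-memory model. The following Python sketch does not talk to RabbitMQ and does not use any AMQP client library; `FanoutExchange` is a hypothetical toy class and each queue is just a `deque`. It only illustrates the routing rule: one copy per bound queue, no buffering in the exchange, and nothing for queues that were not bound at publish time.

```python
from collections import deque


class FanoutExchange:
    """Toy fanout exchange: copies each message into every bound queue."""

    def __init__(self):
        self.bound_queues = []

    def bind(self, queue):
        self.bound_queues.append(queue)

    def unbind(self, queue):
        # Compare by identity: two deques with equal contents are distinct queues.
        self.bound_queues = [q for q in self.bound_queues if q is not queue]

    def publish(self, message, routing_key=""):
        # The routing key is accepted but ignored, as with a fanout exchange.
        for queue in self.bound_queues:
            queue.append(message)
        return len(self.bound_queues)  # number of copies delivered


exchange = FanoutExchange()
lost = exchange.publish("news-0")        # no queue bound: nobody gets it

consumer_a, consumer_b = deque(), deque()
exchange.bind(consumer_a)
exchange.bind(consumer_b)
exchange.publish("news-1")               # both consumers get a copy

exchange.unbind(consumer_b)              # consumer B stops; its queue goes away
exchange.publish("news-2")               # only consumer A receives this one

consumer_b = deque()                     # B restarts with a fresh queue
exchange.bind(consumer_b)
exchange.publish("news-3")
```

After the run, consumer A holds all three messages published while it was bound, while the restarted consumer B holds only the last one, which mirrors the message loss observed when a consumer is stopped and restarted.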
So, if no queues are bound, the messages are not received by any consumer. As soon as we close one consumer, we implicitly destroy its private temporary queue (that's why the queues are autodelete; otherwise, these queues would be left behind unused, and the number of queues on the broker would increase indefinitely), and messages are not buffered to it anymore. When we restart the consumer, it will create a new, independent queue, and as soon as we bind it to myExchange, messages sent by the publisher will be buffered into this queue and pulled by the consumer itself. There's more… When RabbitMQ is started for the first time, it creates some predefined exchanges. Issuing rabbitmqctl list_exchanges, we can observe many existing exchanges, in addition to the one that we have defined in this article: All the amq.* exchanges listed here are already defined by all the AMQP-compliant brokers and can be used instead of defining your own exchanges; they do not need to be declared at all. We could have used amq.fanout in place of myLastnews.fanout_6, and this is a good choice for very simple applications. However, applications generally declare and use their own exchanges. See also With the overload used in the article, the exchange is non-autodelete (it won't be deleted as soon as the last client detaches from it) and non-durable (it won't survive server restarts). You can find more available options and overloads at http://www.rabbitmq.com/releases/rabbitmq-java-client/current-javadoc/. Summary In this article, we mainly used Java, since this language is widely used in enterprise software development, integration, and distribution. RabbitMQ is a perfect fit in this environment. Resources for Article: Further resources on this subject: RabbitMQ Acknowledgements [Article] Getting Started with ZeroMQ [Article] Setting up GlassFish for JMS and Working with Message Queues [Article]

Packt

18 Dec 2013

4 min read

Component-based approach of Unity

(For more resources related to this topic, see here.) First of all, you have a project, which is essentially a folder that contains all of the files and information about your game. Some of the files are called scenes (think of them as levels). A scene contains a number of game objects that you have added to it. The contents of your scenes are determined by you, and you can have as many of them as you want. You can also make your game switch between different scenes, thus making different sets of game objects active. On a smaller scale, you have game objects and components. A game object by itself is simply an invisible container that does not do anything. Without adding appropriate components to it, it cannot, for instance, appear in the scene, receive input from the player, or move and interact with other objects. Using components, you can easily assemble powerful game objects while reusing several small parts, each responsible for a simple task or behavior—rendering the game object, handling the input, taking damage, playing an audio effect, and so on—making your game much simpler to develop and manage. Unity relies heavily on this approach, so the better you grasp it, the faster you will get good at it. The only component that each and every game object in Unity has attached to it by default is Transform. It lets you define the game object's position, rotation, and scale. Normally, you can attach, detach, and destroy components in any given game object at will, but you cannot remove Transform. Each component has a number of properties that you can access and change: these can be integer or floating point numbers, strings of text, textures, scripts, references to game objects or other components. They are used to change the way a certain component behaves, to influence its appearance or interaction. Some of the properties include the position, rotation, and scale properties of the Transform component. 
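The relationship between game objects and components can be sketched outside the editor as plain classes. The Python model below is only an illustration of the idea and is not Unity's scripting API: the class and method names (`GameObject`, `add_component`, `get_component`, `remove_component`) are hypothetical, and the default values are assumptions made for the example.

```python
class Component:
    """Base class for behaviours that can be attached to a game object."""


class Transform(Component):
    def __init__(self):
        self.position = (0.0, 0.0, 0.0)
        self.rotation = (0.0, 0.0, 0.0)
        self.scale = (1.0, 1.0, 1.0)


class BoxCollider(Component):
    pass


class GameObject:
    """An empty container; behaviour comes only from attached components."""

    def __init__(self, name):
        self.name = name
        self.components = [Transform()]  # every object starts with a Transform

    def add_component(self, component):
        self.components.append(component)
        return component

    def get_component(self, kind):
        return next((c for c in self.components if isinstance(c, kind)), None)

    def remove_component(self, kind):
        if kind is Transform:
            raise ValueError("Transform cannot be removed")
        self.components = [c for c in self.components if not isinstance(c, kind)]


wall = GameObject("Wall")
wall.add_component(BoxCollider())
wall.get_component(Transform).position = (2.0, 0.0, 5.0)
wall.remove_component(BoxCollider)
```

The design point is composition: the `GameObject` class itself has no rendering, input, or physics code, so new behavior is added by attaching another small component rather than by subclassing the container.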
The following screenshot shows the Wall game object with the Transform, Mesh Filter, Box Collider, Mesh Renderer, and Script components attached to it. The properties of Transform are displayed. In order to reveal or hide a component's properties you need to left-click on its name or on the small arrow on the left of its icon. Unity has a number of predefined game objects that already have components attached to them, such as cameras, lights, and primitives. You can access them by choosing GameObject | Create from the main menu. Alternatively, you can create empty game objects by pressing command + Shift + N (Ctrl + Shift + N in Windows) and attach components to them using the Components submenu. The following figure shows the project structure that we have discussed. Note that there can be any number of scenes within a single project, any number of game objects within a single scene, any number of components attached to a single game object, and finally, any number of properties within a single component. One final thing that you need to know about components right now is that you can copy them by right-clicking on the name of the component in the Inspector panel and selecting Copy Component from the contextual menu shown in the following screenshot. You can also reset the properties of the components to their default values, remove components, and move them up or down for your convenience. Summary This article has covered the basic concept of the component-based approach of Unity, and the figures and screenshots demonstrate its various aspects. Resources for Article: Further resources on this subject: Mobile Game Design [Article] Unity Game Development: Welcome to the 3D world [Article] Interface Designing for Games in iOS [Article]

Packt

18 Dec 2013

13 min read

Getting Started with Apache Nutch

(For more resources related to this topic, see here.) Introduction of Apache Nutch Apache Nutch is a robust and scalable tool for web crawling, and it can be integrated with scripting languages such as Python. You can use it whenever your application contains a huge amount of data that you want to crawl. Apache Nutch is open source web crawler software used for crawling websites. If you understand Apache Nutch well, you can create your own search engine, similar to Google. With your own search engine, you can improve how your application's pages rank in search results and customize searching according to your needs. Apache Nutch is extensible and scalable. It provides facilities for parsing, indexing, creating your own search engine, customizing search according to your needs, scalability, robustness, and ScoringFilter for custom implementations. ScoringFilter is a Java class used when creating an Apache Nutch plugin; it is used for manipulating scoring variables. We can run Apache Nutch on a single machine as well as in a distributed environment such as Apache Hadoop. It is written in Java. We can use Apache Nutch to find broken links and to create a copy of all the visited pages to search over, for example, by building indexes. We can also find web page hyperlinks in an automated manner. Apache Nutch can be integrated with Apache Solr easily, and we can index all the webpages crawled by Apache Nutch into Apache Solr. We can then use Apache Solr for searching the webpages indexed by Apache Nutch. Apache Solr is a search platform built on top of Apache Lucene. It can be used for searching any type of data, for example, webpages. Crawling your first website Crawling is driven by the Apache Nutch crawling tool and certain related tools for building and maintaining several data structures. These include the web database, the index, and a set of segments.
Once Apache Nutch has indexed the webpages into Apache Solr, you can search for the required webpage(s) in Apache Solr. Apache Solr Installation Apache Solr is a search platform built on top of Apache Lucene. It can be used for searching any type of data, for example, webpages. It is a very powerful search mechanism and provides full-text search, dynamic clustering, database integration, rich document handling, and much more. Apache Solr will be used for indexing the URLs crawled by Apache Nutch, and you can then search in Apache Solr for the details crawled by Apache Nutch. Crawling your website using the crawl script Apache Nutch 2.2.1 comes with a crawl script that performs crawling by executing one single script. In earlier versions, we had to perform each step manually, such as generating data, fetching data, parsing data, and so on, to perform crawling. Crawling the web, the CrawlDb, and URL filters When a user invokes the crawling command in Apache Nutch 1.x, Apache Nutch generates the crawldb, which is simply a directory that contains details about the crawl. In Apache Nutch 2.x, the crawldb is not present. Instead, Apache Nutch keeps all the crawling data directly in the database. InjectorJob The injector adds the necessary URLs to the crawldb. The crawldb is the directory created by Apache Nutch for storing data related to crawling. You need to provide URLs to InjectorJob, either by downloading URLs from the internet or by writing your own file that contains URLs. Let's say you have created a directory called urls that contains all the URLs that need to be injected into the crawldb. The following command is used to perform the InjectorJob: #bin/nutch inject crawl/crawldb urls Here, urls is the directory that contains all the URLs that need to be injected into the crawldb, and crawl/crawldb is the directory in which the injected URLs will be placed. After performing this job, you have a number of unfetched URLs inside your database, that is, the crawldb.
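Preparing the urls directory and the inject call can be scripted. The following is a minimal Python sketch, assuming one URL per line in a seed file; the file name `seed.txt` and the helper names are hypothetical, and the command is only assembled as an argument list, not executed, so nothing here depends on a Nutch installation.

```python
import tempfile
from pathlib import Path


def write_seed_file(urls_dir, urls, filename="seed.txt"):
    """Write one URL per line, dropping blanks and duplicates, keeping order."""
    unique = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    urls_dir = Path(urls_dir)
    urls_dir.mkdir(parents=True, exist_ok=True)
    seed = urls_dir / filename
    seed.write_text("\n".join(unique) + "\n", encoding="utf-8")
    return seed


def inject_command(crawldb, urls_dir, nutch="bin/nutch"):
    """Build (but do not run) the inject command quoted in the text."""
    return [nutch, "inject", str(crawldb), str(urls_dir)]


workdir = Path(tempfile.mkdtemp())
seed = write_seed_file(workdir / "urls",
                       ["http://nutch.apache.org/", "", "http://nutch.apache.org/"])
command = inject_command("crawl/crawldb", "urls")
```

To actually run the step, the list in `command` could be passed to `subprocess.run` from the Nutch installation directory.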
GeneratorJob

Once the InjectorJob is done, it is time to fetch the injected URLs from the crawldb. Before fetching the URLs, you need to perform the GeneratorJob. The following command runs the GeneratorJob:

#bin/nutch generate crawl/crawldb crawl/segments

Here, crawl/crawldb is the directory from which the URLs are generated, and crawl/segments is the directory used by the GeneratorJob to fetch the information required for crawling.

FetcherJob

The job of the fetcher is to fetch the URLs generated by the GeneratorJob; it uses the input provided by the GeneratorJob. The following command runs the FetcherJob:

#bin/nutch fetch -all

The -all parameter means that this job fetches all the URLs generated by the GeneratorJob. You can use different input parameters according to your needs.

ParserJob

After the FetcherJob, the ParserJob parses the URLs fetched by the FetcherJob. The following command runs the ParserJob:

# bin/nutch parse -all

The -all parameter parses all the URLs fetched by the FetcherJob. You can use different input parameters according to your needs.

DbUpdaterJob

Once the ParserJob has completed, we need to update the database with the results of the FetcherJob. This updates the respective databases with the most recently fetched URLs. The following command runs the DbUpdaterJob:

# bin/nutch updatedb crawl/crawldb -all

After this job, the database contains updated entries for all the initial pages as well as new entries that correspond to the newly discovered pages linked from the initial set.

Invertlinks

Before indexing, we first need to invert all the links. After this, we will be able to index incoming anchor text with the pages.
The following command runs Invertlinks:

# bin/nutch invertlinks crawl/linkdb -dir crawl/segments

Apache Hadoop

Apache Hadoop is designed to run your application on a cluster of computers in which one computer is the master and the rest are slaves. The master computer directs the slave computers, and the slave computers do the data processing. Because the processing is divided among a number of slave computers, Apache Hadoop is well suited to processing huge amounts of data with high throughput. As the data grows, you increase the number of slave computers.

Integration of Apache Nutch with Apache Hadoop

Apache Nutch can be easily integrated with Apache Hadoop, which makes the process much faster than running Apache Nutch on a single machine. After integrating Apache Nutch with Apache Hadoop, we can perform crawling in an Apache Hadoop cluster environment, so the process is much faster and we get much higher throughput.

Apache Hadoop setup with cluster

This setup does not require purchasing a large amount of hardware to run Apache Nutch and Apache Hadoop. It is designed to make maximum use of the hardware available.

Formatting the HDFS filesystem using the NameNode

HDFS stands for Hadoop Distributed File System. It is the filesystem that Apache Hadoop uses for storage, so it holds all the data related to Apache Hadoop. It has two components, NameNode and DataNode: the NameNode manages the filesystem metadata, and the DataNodes actually store the data. It is highly configurable and well suited to many installations; the configuration needs to be tuned mainly for very large clusters.
The first step in getting started with Apache Hadoop is formatting the Hadoop filesystem, which is implemented on top of the local filesystem of your cluster (which includes only your local machine if you have followed along).

Setting up the deployment architecture of Apache Nutch

We have to set up Apache Nutch on each machine that we are using. In this case, we are using a six-machine cluster, so we have to set up Apache Nutch on each of the six machines. With a small number of machines in the cluster, we can do the setup manually on each one. But with more machines, say 100 machines in the cluster, we can't set up each machine manually. For that, we need a deployment tool such as Chef, or at least distributed SSH. You can refer to http://www.opscode.com/chef/ to get familiar with Chef, and to http://www.ibm.com/developerworks/aix/library/au-satdistadmin/ to get familiar with distributed SSH.

I will only demonstrate running Apache Hadoop on Ubuntu for a single-node cluster. If you want to run Apache Hadoop on Ubuntu for a multi-node cluster, you can follow the reference links provided above and configure it the same way.

Once we are done with the deployment of Apache Nutch to a single machine, we run the start-all.sh script, which starts the services on the master node and the data nodes. The script starts the Hadoop daemons on the master node, logs in to all the slave nodes using the ssh command as explained above, and starts the daemons on the slave nodes. The start-all.sh script expects Apache Nutch to be at the same location on each machine, and Apache Hadoop to store its data at the same file path on each machine. To start the daemons on the master and slave nodes, start-all.sh uses password-less login over SSH.
Introduction to Apache Nutch configuration with Eclipse

Apache Nutch can be easily configured with Eclipse. After that, we can perform crawling from Eclipse, so there is no need to perform crawling from the command line; all the crawling operations that we run from the command line can be run from Eclipse. The following instructions set up a development environment for Apache Nutch with the Eclipse IDE. They are intended as a comprehensive starting resource for configuring, building, crawling, and debugging Apache Nutch in that context.

The following are the prerequisites for Apache Nutch integration with Eclipse:

Get the latest version of Eclipse from http://www.eclipse.org/downloads/packages/release/juno/r
All the plugins required below are available from the Eclipse Marketplace. If the Marketplace client is not installed, you can get it as described at http://marketplace.eclipse.org/marketplace-client-intro
Once you've configured Eclipse, download Subclipse as described at http://subclipse.tigris.org/. If you face a problem with the 1.8.x release, try 1.6.x; this may resolve compatibility issues.
Download the IvyDE plugin for Eclipse from http://ant.apache.org/ivy/ivyde/download.cgi
Download the m2e plugin for Eclipse from http://marketplace.eclipse.org/content/maven-integration-eclipse

Introduction to Apache Accumulo

Accumulo is used as a datastore, in the same way that we use databases such as MySQL or Oracle. The key point of Apache Accumulo is that it runs on an Apache Hadoop cluster environment. Accumulo is a sorted, distributed key/value store that provides a robust, scalable, high-performance data storage and retrieval system. Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, Thrift, and ZooKeeper.
Apache Accumulo features a few novel improvements on the BigTable design, in the form of cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process.

Introduction to Apache Gora

The Apache Gora open source framework provides an in-memory data model and persistence for big data. Apache Gora supports persisting to column stores, key-value stores, document stores, and RDBMSs, and analyzing the data with extensive Apache Hadoop MapReduce support.

Supported datastores

Apache Gora presently supports the following datastores:

Apache Accumulo
Apache Cassandra
Apache HBase
Amazon DynamoDB

Use of Apache Gora

Although there are many excellent ORM frameworks for relational databases, data modeling in NoSQL data stores differs profoundly from that of their relational cousins. Data-model agnostic frameworks such as JDO are not sufficient for use cases where one has to use the full power of the data models in column stores. Gora fills this gap by giving the user an easy-to-use in-memory data model and persistence framework for big data, with data store specific mappings and built-in Apache Hadoop support.

Integration of Apache Nutch with Apache Accumulo

In this section, we cover the process of integrating Apache Nutch with Apache Accumulo. Apache Accumulo is used for storing huge amounts of data and is built on top of Apache Hadoop, ZooKeeper, and Thrift. A potential use of this integration is when our application has a huge amount of data to process and we want to run it in a cluster environment; in that case, we can use Apache Accumulo for data storage. Because Apache Accumulo runs only with Apache Hadoop, it is used mostly in cluster-based environments. We will start with the configuration of Apache Gora with Apache Nutch. Then we will set up Apache Hadoop and ZooKeeper.
Then we will install and configure Apache Accumulo. After that we will test Apache Accumulo, and at the end we will look at crawling with Apache Nutch on Apache Accumulo.

Setup Apache Hadoop and Apache Zookeeper for Apache Nutch

Apache ZooKeeper is a centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services. Distributed applications use all of these services in one way or another, and because ZooKeeper provides them, you don't have to write them from scratch. You can use these services to implement consensus, management, group, leader election, and presence protocols, and you can also build on them for your own requirements.

Apache Accumulo is built on top of Apache Hadoop and ZooKeeper, so we must configure Apache Accumulo together with Apache Hadoop and Apache ZooKeeper. You can refer to http://www.covert.io/post/18414889381/accumulo-nutch-and-gora for any queries related to the setup.

Integration of Apache Nutch with MySQL

In this section, we are going to integrate Apache Nutch with MySQL. After that, the webpages crawled by Apache Nutch will be stored in MySQL, so you can go to MySQL, check your crawled webpages, and perform the necessary operations on them. We will start with an introduction to MySQL and then cover why MySQL needs to be integrated with Apache Nutch. After that, we will look at the configuration of MySQL with Apache Nutch, and at the end we will crawl with Apache Nutch on MySQL. So let's start with the introduction to MySQL.
Summary

We covered the following:

Downloading Apache Hadoop and Apache Nutch
Performing crawling on an Apache Hadoop cluster in Apache Nutch
Apache Nutch configuration with Eclipse
Installation steps for building Apache Nutch with Eclipse
Crawling in Eclipse
Configuration of Apache Gora with Apache Nutch
Installation and configuration of Apache Accumulo
Crawling with Apache Nutch on Apache Accumulo
The need for integrating MySQL with Apache Nutch

Resources for Article:

Further resources on this subject:

Getting Started with the Alfresco Records Management Module [Article]
Making Big Data Work for Hadoop and Solr [Article]
Apache Solr PHP Integration [Article]


Packt

18 Dec 2013

16 min read

Hello OpenCL


The Wikipedia definition says that parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently (in parallel). There are many parallel computing programming standards or API specifications, such as OpenMP, OpenMPI, Pthreads, and so on. This book is all about OpenCL parallel programming. In this article, we will start with a discussion of the different types of parallel programming. We will then introduce you to OpenCL and its components, and take a look at the various hardware and software vendors of OpenCL and their OpenCL installation steps. Finally, at the end of the article, we will look at an example OpenCL program, SAXPY, and its implementation in detail.

Advances in computer architecture

Throughout the 20th century, computer architectures advanced many times over. The trend is continuing in the 21st century and will remain for a long time to come. Some of these trends in architecture follow Moore's Law. "Moore's law is the observation that, over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years". Many devices in the computer industry are linked to Moore's law, whether they are DSPs, memory devices, or digital cameras.

All the hardware advances would be of no use if there weren't any software advances. Algorithms and software applications grow in complexity as more and more user interaction comes into play. An algorithm can be highly sequential, or it may be parallelized using a parallel computing framework. Amdahl's Law is used to predict the speedup that can be obtained for an algorithm given n threads. This speedup depends on the fraction of the code that is strictly serial or non-parallelizable (B).
The time T(n) an algorithm takes to finish when executed on n threads of execution corresponds to:

T(n) = T(1) (B + (1-B)/n)

Therefore, the theoretical speedup that can be obtained for a given algorithm is:

Speedup(n) = 1/(B + (1-B)/n)

Amdahl's Law has a limitation: it does not fully exploit the computing power that becomes available as the number of processing cores increases. Gustafson's Law takes into account the scaling of the platform as more processing elements are added. This law assumes that the total amount of work that can be done in parallel varies linearly with the number of processing elements. Let an algorithm be decomposed into (a + b), where a is the serial execution time and b is the parallel execution time. Then the corresponding speedup for P parallel elements is given by:

Speedup = (a + P*b) / (a + b)

Now defining α as a/(a+b), the sequential execution component, gives the speedup for P processing elements:

Speedup(P) = P - α*(P - 1)

Given a problem that can be solved using OpenCL, the same problem can also be solved on different hardware with different capabilities. Gustafson's law suggests that with more computing units, the data set should also grow, that is, "fixed work per processor", whereas Amdahl's law gives the speedup that can be obtained for the existing data set if more computing units are added, that is, "fixed work for all processors".

Let's take the following example: let the serial component and the parallel component of execution be one unit each. In Amdahl's Law, the strictly serial fraction of code B equals 0.5. For two processors, the speedup is given by:

Speedup(2) = 1 / (0.5 + (1 - 0.5) / 2) = 1.33

Similarly, for four and eight processors, the speedup is given by:

Speedup(4) = 1.6 and Speedup(8) = 1.78

Adding more processors does not help much: even as n tends to infinity, the maximum speedup obtained is only 2.
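The two speedup formulas above are easy to evaluate numerically. The following Python sketch does so for a serial fraction of 0.5; the function names amdahl_speedup and gustafson_speedup are chosen for this illustration only and do not come from the book.

```python
def amdahl_speedup(serial_fraction, n):
    """Speedup(n) = 1 / (B + (1 - B) / n), for a fixed problem size."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def gustafson_speedup(alpha, p):
    """Speedup(P) = P - alpha * (P - 1), for a workload that grows with P."""
    return p - alpha * (p - 1)


# Serial and parallel parts of one unit each, so B = alpha = 0.5.
for units in (2, 4, 8):
    print(units,
          round(amdahl_speedup(0.5, units), 2),
          gustafson_speedup(0.5, units))
# 2 1.33 1.5
# 4 1.6 2.5
# 8 1.78 4.5
```

The Amdahl column (rounded to two decimals) flattens out and can never reach 2 for this serial fraction, while the Gustafson column keeps growing as compute units are added.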
On the other hand, in Gustafson's law, α = 1/(1+1) = 0.5 (which is also the serial component of the code). The speedup for two processors is given by:

Speedup(2) = 2 - 0.5(2 - 1) = 1.5

Similarly, for four and eight processors, the speedup is given by:

Speedup(4) = 2.5 and Speedup(8) = 4.5

The following figure shows the workload scaling factor of Gustafson's law compared to Amdahl's law with a constant workload:

Comparison of Amdahl's and Gustafson's Law

OpenCL is all about parallel programming, and Gustafson's law fits well into this book, as we will be dealing with OpenCL for data parallel applications. Workloads that are data parallel in nature can easily increase the data set and take advantage of scalable platforms by adding more compute units. For example, more pixels can be computed as more compute units are added.

Different parallel programming techniques

There are several different forms of parallel computing, such as bit-level, instruction-level, data, and task parallelism. This book will largely focus on data and task parallelism using heterogeneous devices. We have just introduced a term, heterogeneous devices. How do we tackle complex tasks "in parallel" using different types of computer architecture? Why do we need OpenCL when there are many already defined open standards for parallel computing? To answer these questions, let us discuss the pros and cons of the different parallel computing frameworks.

OpenMP

OpenMP is an API that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran. It is prevalent only on multi-core computer platforms with a shared memory subsystem.
A basic example of the OpenMP parallel directive is as follows:

    #pragma omp parallel
    {
        body;
    }

When you build the preceding code with OpenMP enabled, the directive is expanded into calls to the OpenMP shared library, libgomp, similar to the following:

    void subfunction (void *data)
    {
        use data;
        body;
    }

    setup data;
    GOMP_parallel_start (subfunction, &data, num_threads);
    subfunction (&data);
    GOMP_parallel_end ();

Here, GOMP_parallel_start has the following prototype:

    void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)

The OpenMP directives make it easy for the developer to modify existing code to exploit a multicore architecture. OpenMP, though a great parallel programming tool, does not support parallel execution on heterogeneous devices, and its reliance on a multicore architecture with a shared memory subsystem does not make it cost effective.

MPI

Message Passing Interface (MPI) has an advantage over OpenMP in that it can run on either a shared or a distributed memory architecture. Distributed memory computers are less expensive than large shared memory computers. But MPI has its own drawbacks, with inherent programming and debugging challenges. One major disadvantage of the MPI parallel framework is that performance is limited by the communication network between the nodes. Supercomputers have a massive number of processors that are interconnected using a high-speed network connection, or are organized as computer clusters, where the processors are in close proximity to each other. In clusters, there is an expensive, dedicated data bus for data transfers across the computers. MPI is used extensively in most of these compute monsters called supercomputers.

OpenACC

The OpenACC Application Program Interface (API) describes a collection of compiler directives to specify loops and regions of code in standard C, C++, and Fortran to be offloaded from a host CPU to an attached accelerator, providing portability across operating systems, host CPUs, and accelerators.
OpenACC is similar to OpenMP in terms of program annotation, but unlike OpenMP, which can only be accelerated on CPUs, OpenACC programs can also be accelerated on a GPU or on other accelerators. OpenACC aims to overcome the drawbacks of OpenMP by making parallel programming possible across heterogeneous devices. The OpenACC standard describes directives and APIs to accelerate applications. The ease of programming and the ability to scale existing code to use a heterogeneous processor promise a great future for OpenACC programming.

CUDA

Compute Unified Device Architecture (CUDA) is a parallel computing architecture developed by NVIDIA for graphics processing and GPGPU (General Purpose GPU) programming. There is a fairly good developer community following for the CUDA software framework. Unlike OpenCL, which is supported on GPUs from many vendors and even on many other devices, such as IBM's Cell B.E. processor or TI's DSP processors, CUDA is supported only on NVIDIA GPUs. Because of CUDA's lack of generalization and its focus on a very specific hardware platform from a single vendor, OpenCL is gaining traction.

CUDA or OpenCL?

CUDA is more proprietary and vendor specific, but it has its own advantages. It is easier to learn and to start writing code in CUDA than in OpenCL, due to its simplicity. Optimization of CUDA code is more deterministic across platforms, since fewer platforms are supported, all from a single vendor. It has also simplified a few programming constructs and mechanisms. So for a quick start, and if you are sure that you can stick to one device type (GPU) from a single vendor, that is, NVIDIA, CUDA can be a good choice. OpenCL, on the other hand, is supported on hardware from several vendors, and that hardware varies extensively even in its basic architecture, which makes it necessary to understand some slightly complicated concepts before starting OpenCL programming.
Also, because a huge range of hardware is supported, an OpenCL program is portable but may lose optimization when ported from one platform to another. Kernel development, where most of the effort goes, is practically identical between the two languages, so one should not worry too much about which one to choose. Choose the language that is convenient, but remember that your OpenCL application will be vendor agnostic. This book aims at attracting more developers to OpenCL.

There are many libraries that use OpenCL programming for acceleration. Some of them are MAGMA, clAMDBLAS, clAMDFFT, the BOLT C++ Template library, and JACKET, which accelerates MATLAB on GPUs. Besides this, there are also C++ and Java bindings available for OpenCL. Once you've figured out how to write your important "kernels", it's trivial to port them to either OpenCL or CUDA. A kernel is computation code that is executed by an array of threads. CUDA also has a vast set of CUDA accelerated libraries, such as CUBLAS, CUFFT, CUSPARSE, and Thrust, but it may not take a long time to port these libraries to OpenCL.

Renderscripts

Renderscripts is also an API specification, targeted at 3D rendering and general purpose compute operations on the Android platform. Android apps can accelerate their performance by using these APIs. It is also a cross-platform solution. When an app is run, the scripts are compiled into machine code for the device. This device can be a CPU, a GPU, or a DSP. The choice of which device to run on is made at runtime. If a platform does not have a GPU, the code may fall back to the CPU. Only Android supports this API specification as of now. The execution model in Renderscripts is similar to that of OpenCL.

Hybrid parallel computing model

Parallel programming models have their own advantages and disadvantages. With the advent of many different types of computer architectures, there is a need to use multiple programming models to achieve high performance.
For example, one may want to use MPI as the message passing framework, and then at each node level use OpenCL, CUDA, OpenMP, or OpenACC. Besides all the above programming models, many compilers, such as Intel ICC, GCC, and Open64, provide auto-parallelization options, which make the programmer's job easier by exploiting the underlying hardware architecture without requiring knowledge of any parallel computing framework. Compilers are known to be good at providing instruction-level parallelism, but tackling data-level or task-level auto-parallelism has its own limitations and complexities.

Introduction to OpenCL

The OpenCL standard was first introduced by Apple and later became part of the open standards organization Khronos Group. Khronos is a non-profit industry consortium that creates open standards for the authoring and acceleration of parallel computing, graphics, dynamic media, computer vision, and sensor processing on a wide variety of platforms and devices. The goal of OpenCL is to make certain types of parallel programming easier, and to provide vendor agnostic, hardware-accelerated parallel execution of code. OpenCL (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming of heterogeneous systems. It provides a uniform programming environment for software developers to write efficient, portable code for high-performance compute servers, desktop computer systems, and handheld devices using a diverse mix of multi-core CPUs, GPUs, and DSPs.

OpenCL gives developers a common set of easy-to-use tools to take advantage of any device with an OpenCL driver (processors, graphics cards, and so on) for the processing of parallel code. By creating an efficient, close-to-the-metal programming interface, OpenCL will form the foundation layer of a parallel computing ecosystem of platform-independent tools, middleware, and applications. We mentioned vendor agnostic; yes, that is what OpenCL is about.
The different vendors here can be AMD, Intel, NVIDIA, ARM, TI, and so on. The following diagram shows the different vendors and hardware architectures that use the OpenCL specification to leverage the hardware capabilities:

The heterogeneous system

The OpenCL framework defines a language to write "kernels". These kernels are functions that are capable of running on different compute devices. OpenCL defines an extended C language for writing compute kernels, and a set of APIs for creating and managing these kernels. The compute kernels are compiled with a runtime compiler, which compiles them on-the-fly during host application execution for the targeted device. This enables the host application to take advantage of all the compute devices in the system with a single set of portable compute kernels.

Based on your interest and hardware availability, you might want to do OpenCL programming with a "host and device" combination of "CPU and CPU" or "CPU and GPU". Each has its own programming strategy. On CPUs you can run very large kernels, as the CPU architecture supports out-of-order instruction-level parallelism and has large caches. For the GPU, you will be better off writing small kernels for better performance.

Hardware and software vendors

There are various hardware vendors who support OpenCL. Every OpenCL vendor provides OpenCL runtime libraries, and these runtimes are capable of running only on that vendor's specific hardware architectures. Not only across different vendors, but also within a single vendor, there may be different types of architectures that need a different approach to OpenCL programming. Now let's discuss the various hardware vendors who provide an implementation of OpenCL to exploit their underlying hardware.

Advanced Micro Devices, Inc. (AMD)

With the launch of the AMD A Series APU, one of the industry's first Accelerated Processing Units (APU), AMD is leading the effort to integrate both the x86_64 CPU and GPU dies in one chip.
It has four cores of CPU processing power, and also four or five graphics SIMD engines, depending on the silicon part that you wish to buy. The following figure shows the block diagram of the AMD APU architecture:

AMD architecture diagram, © 2011, Advanced Micro Devices, Inc.

An AMD GPU consists of a number of Compute Units (CU), and each CU has 16 ALUs. Further, each ALU is a VLIW4 SIMD processor that can execute a bundle of four or five independent instructions. Each CU can be issued a group of 64 work items, which form the work group (wavefront). AMD Radeon™ HD 6XXX graphics processors use this design. The following figure shows the HD 6XXX series compute unit, which has 16 SIMD engines, each of which has four processing elements:

AMD Radeon HD 6xxx Series SIMD Engine, © 2011, Advanced Micro Devices, Inc.

Starting with the AMD Radeon HD 7XXX series of graphics processors, there were significant architectural changes: AMD introduced the new Graphics Core Next (GCN) architecture. The following figure shows a GCN compute unit, which has four SIMD engines, each 16 lanes wide:

GCN Compute Unit, © 2011, Advanced Micro Devices, Inc.

A group of these Compute Units forms an AMD HD 7xxx graphics processor. In GCN, each CU includes four separate SIMD units for vector processing. Each of these SIMD units simultaneously executes a single operation across 16 work items, but each can be working on a separate wavefront. Apart from the APUs, AMD also provides discrete graphics cards. The latest families of graphics cards, HD 7XXX and beyond, use the GCN architecture.

NVIDIA®

One of NVIDIA's GPU architectures is codenamed "Kepler". The GeForce® GTX 680 is one Kepler architecture silicon part. Each Kepler GPU consists of a different configuration of Graphics Processing Clusters (GPC) and streaming multiprocessors.
The GTX 680 consists of four GPCs and eight SMXs, as shown in the following figure:

NVIDIA Kepler architecture, GTX 680, © NVIDIA®

The Kepler architecture is part of the GTX 6XX and GTX 7XX families of NVIDIA discrete cards. Prior to Kepler, NVIDIA had the Fermi architecture, which was part of the GTX 5XX family of discrete and mobile graphics processing units.

Intel®

Intel's OpenCL implementation is supported on the Sandy Bridge and Ivy Bridge processor families. The Sandy Bridge family architecture is comparable to AMD's APU: in these processor architectures, Intel also integrated a GPU into the same silicon as the CPU. Intel changed the design of the L3 cache and allowed the graphics cores to access the L3, which is also called the last level cache. It is because of this L3 sharing that graphics performance is good on Intel. Each of the CPUs, including the graphics execution unit, is connected via a ring bus. Also, each execution unit is a true parallel scalar processor. Sandy Bridge provides the graphics engines HD 2000, with six Execution Units (EU), and HD 3000 (12 EU), and Ivy Bridge provides HD 2500 (six EU) and HD 4000 (16 EU). The following figure shows the Sandy Bridge architecture with a ring bus, which acts as an interconnect between the cores and the HD graphics:

Intel Sandy Bridge architecture, © Intel®

ARM Mali™ GPUs

ARM also provides GPUs under the name of Mali graphics processors. The Mali T6XX series of processors comes with two, four, or eight graphics cores. These graphics engines deliver graphics compute capability to entry-level smartphones, tablets, and smart TVs. The following diagram shows the Mali T628 graphics processor:

ARM Mali T628 graphics processor, © ARM

The Mali T628 has eight shader cores, or graphics cores. These cores support the Renderscripts APIs in addition to OpenCL.

Besides the four key competitors, companies such as TI (DSP), Altera (FPGA), and Oracle are providing OpenCL implementations for their respective hardware.
We suggest that you get hold of the benchmark performance numbers for the different processor architectures we discussed, and try to compare them. This is an important first step towards comparing different architectures, and in the future you might want to select a particular OpenCL platform based on your application workload.
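The introduction names SAXPY as the example OpenCL program. To make the kernel idea from the earlier sections concrete (a function that is executed once per work item), here is a plain-Python emulation of a SAXPY kernel, y = a*x + y. This is only a sketch and not OpenCL code: the names saxpy_kernel and run_kernel are hypothetical, and the sequential loop stands in for work items that a real compute device would execute in parallel.

```python
def saxpy_kernel(gid, a, x, y):
    """Body run by one work item: update element gid of y in place."""
    y[gid] = a * x[gid] + y[gid]


def run_kernel(kernel, global_size, *args):
    """Emulate a one-dimensional launch: call the kernel once per index.

    A real device would run these calls concurrently; here they run in order.
    """
    for gid in range(global_size):
        kernel(gid, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
run_kernel(saxpy_kernel, len(x), 2.0, x, y)
print(y)  # [12.0, 24.0, 36.0, 48.0]
```

Because each call touches only its own element of y, the calls are independent of each other, which is what makes SAXPY a natural fit for data parallel execution.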


Packt

18 Dec 2013

3 min read

Enabling your new theme in Magento


Packt

18 Dec 2013

8 min read

Settings goals


