
How-To Tutorials

Normal maps

Packt
19 Jan 2017
12 min read
In this article by Raimondas Pupius, the author of the book Mastering SFML Game Development we will learn about normal maps and specular maps. (For more resources related to this topic, see here.) Lighting can be used to create visually complex and breath-taking scenes. One of the massive benefits of having a lighting system is the ability it provides to add extra details to your scene, which wouldn't have been possible otherwise. One way of doing so is using normal maps. Mathematically speaking, the word "normal" in the context of a surface is simply a directional vector that is perpendicular to the said surface. Consider the following illustration: In this case, what's normal is facing up because that's the direction perpendicular to the plane. How is this helpful? Well, imagine you have a really complex model with many vertices; it'd be extremely taxing to render the said model because of all the geometry that would need to be processed with each frame. A clever trick to work around this, known as normal mapping, is to take the information of all of those vertices and save them on a texture that looks similar to this one: It probably looks extremely funky, especially if being looked of physical release in grayscale, but try not to think of this in terms of colors, but directions. The red channel of a normal map encodes the –x and +x values. The green channel does the same for –y and +y values, and the blue channel is used for –z to +z. Looking back at the previous image now, it's easier to confirm which direction each individual pixel is facing. Using this information on geometry that's completely flat would still allow us to light it in such a way that it would make it look like it has all of the detail in there; yet, it would still remain flat and light on performance: These normal maps can be hand-drawn or simply generated using software such as Crazybump. Let's see how all of this can be done in our game engine. Implementing normal map rendering In the case of maps, implementing normal map rendering is extremely simple. We already have all the material maps integrated and ready to go, so at this time, it's simply a matter of sampling the texture of the tile-sheet normals: void Map::Redraw(sf::Vector3i l_from, sf::Vector3i l_to) { ... if (renderer->UseShader("MaterialPass")) { // Material pass. auto shader = renderer->GetCurrentShader(); auto textureName = m_tileMap.GetTileSet().GetTextureName(); auto normalMaterial = m_textureManager-> GetResource(textureName + "_normal"); for (auto x = l_from.x; x <= l_to.x; ++x) { for (auto y = l_from.y; y <= l_to.y; ++y) { for (auto layer = l_from.z; layer <= l_to.z; ++layer) { auto tile = m_tileMap.GetTile(x, y, layer); if (!tile) { continue; } auto& sprite = tile->m_properties->m_sprite; sprite.setPosition( static_cast<float>(x * Sheet::Tile_Size), static_cast<float>(y * Sheet::Tile_Size)); // Normal pass. if (normalMaterial) { shader->setUniform("material", *normalMaterial); renderer->Draw(sprite, &m_normals[layer]); } } } } } ... } The process is exactly the same as drawing a normal tile to a diffuse map, except that here we have to provide the material shader with the texture of the tile-sheet normal map. Also note that we're now drawing to a normal buffer texture. The same is true for drawing entities as well: void S_Renderer::Draw(MaterialMapContainer& l_materials, Window& l_window, int l_layer) { ... if (renderer->UseShader("MaterialPass")) { // Material pass. 
auto shader = renderer->GetCurrentShader(); auto textures = m_systemManager-> GetEntityManager()->GetTextureManager(); for (auto &entity : m_entities) { auto position = entities->GetComponent<C_Position>( entity, Component::Position); if (position->GetElevation() < l_layer) { continue; } if (position->GetElevation() > l_layer) { break; } C_Drawable* drawable = GetDrawableFromType(entity); if (!drawable) { continue; } if (drawable->GetType() != Component::SpriteSheet) { continue; } auto sheet = static_cast<C_SpriteSheet*>(drawable); auto name = sheet->GetSpriteSheet()->GetTextureName(); auto normals = textures->GetResource(name + "_normal"); // Normal pass. if (normals) { shader->setUniform("material", *normals); drawable->Draw(&l_window, l_materials[MaterialMapType::Normal].get()); } } } ... } You can try obtaining a normal texture through the texture manager. If you find one, you can draw it to the normal map material buffer. Dealing with particles isn't much different from what we've seen already, except for one little piece of detail: void ParticleSystem::Draw(MaterialMapContainer& l_materials, Window& l_window, int l_layer) { ... if (renderer->UseShader("MaterialValuePass")) { // Material pass. auto shader = renderer->GetCurrentShader(); for (size_t i = 0; i < container->m_countAlive; ++i) { if (l_layer >= 0) { if (positions[i].z < l_layer * Sheet::Tile_Size) { continue; } if (positions[i].z >= (l_layer + 1) * Sheet::Tile_Size) { continue; } } else if (positions[i].z < Sheet::Num_Layers * Sheet::Tile_Size) { continue; } // Normal pass. shader->setUniform("material", sf::Glsl::Vec3(0.5f, 0.5f, 1.f)); renderer->Draw(drawables[i], l_materials[MaterialMapType::Normal].get()); } } ... } As you can see, we're actually using the material value shader in order to give particles' static normals, which are always sort of pointing to the camera. A normal map buffer should look something like this after you render all the normal maps to it: Changing the lighting shader Now that we have all of this information, let's actually use it when calculating the illumination of the pixels inside the light pass shader: uniform sampler2D LastPass; uniform sampler2D DiffuseMap; uniform sampler2D NormalMap; uniform vec3 AmbientLight; uniform int LightCount; uniform int PassNumber; struct LightInfo { vec3 position; vec3 color; float radius; float falloff; }; const int MaxLights = 4; uniform LightInfo Lights[MaxLights]; void main() { vec4 pixel = texture2D(LastPass, gl_TexCoord[0].xy); vec4 diffusepixel = texture2D(DiffuseMap, gl_TexCoord[0].xy); vec4 normalpixel = texture2D(NormalMap, gl_TexCoord[0].xy); vec3 PixelCoordinates = vec3(gl_FragCoord.x, gl_FragCoord.y, gl_FragCoord.z); vec4 finalPixel = gl_Color * pixel; vec3 viewDirection = vec3(0, 0, 1); if(PassNumber == 1) { finalPixel *= vec4(AmbientLight, 1.0); } // IF FIRST PASS ONLY! vec3 N = normalize(normalpixel.rgb * 2.0 - 1.0); for(int i = 0; i < LightCount; ++i) { vec3 L = Lights[i].position - PixelCoordinates; float distance = length(L); float d = max(distance - Lights[i].radius, 0); L /= distance; float attenuation = 1 / pow(d/Lights[i].radius + 1, 2); attenuation = (attenuation - Lights[i].falloff) / (1 - Lights[i].falloff); attenuation = max(attenuation, 0); float normalDot = max(dot(N, L), 0.0); finalPixel += (diffusepixel * ((vec4(Lights[i].color, 1.0) * attenuation))) * normalDot; } gl_FragColor = finalPixel; } First, the normal map texture needs to be passed to it as well as sampled, which is where the first two highlighted lines of code come in. 
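The shader above packs a lot in, so before stepping through the rest of it, it can help to see the two key pieces of its math outside GLSL. The short C++ sketch below is purely illustrative and not part of the book's engine; the function names decodeNormal and diffuseFactor are ours. It decodes a normal-map texel exactly the way the shader does (channel * 2.0 - 1.0) and computes the N·L diffuse factor for one light.

```cpp
#include <SFML/Graphics.hpp>
#include <SFML/System/Vector3.hpp>
#include <algorithm>
#include <cmath>

// Illustrative only: decode an RGB texel from a normal map into a direction.
// Mirrors "normalpixel.rgb * 2.0 - 1.0" followed by normalize() in the shader.
sf::Vector3f decodeNormal(const sf::Color& texel) {
    sf::Vector3f n(texel.r / 255.f * 2.f - 1.f,
                   texel.g / 255.f * 2.f - 1.f,
                   texel.b / 255.f * 2.f - 1.f);
    float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return (len > 0.f) ? sf::Vector3f(n.x / len, n.y / len, n.z / len) : n;
}

// Diffuse factor for one light: max(dot(N, L), 0), where L is the normalized
// direction from the pixel towards the light, as in the light-pass shader.
float diffuseFactor(const sf::Vector3f& n, sf::Vector3f toLight) {
    float len = std::sqrt(toLight.x * toLight.x +
                          toLight.y * toLight.y +
                          toLight.z * toLight.z);
    if (len > 0.f) { toLight.x /= len; toLight.y /= len; toLight.z /= len; }
    return std::max(0.f, n.x * toLight.x + n.y * toLight.y + n.z * toLight.z);
}
```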
Once this is done, for each light we're drawing on the screen, the normal directional vector is calculated. This is done by first making sure that it can go into the negative range and then normalizing it. A normalized vector only represents a direction. Since the color values range from 0 to 255, negative values cannot be directly represented. This is why we first bring them into the right range by multiplying them by 2.0 and subtracting by 1.0. A dot product is then calculated between the normal vector and the normalized L vector, which now represents the direction from the light to the pixel. How much a pixel is lit up from a specific light is directly contingent upon the dot product, which is a value from 1.0 to 0.0 and represents magnitude. A dot product is an algebraic operation that takes in two vectors, as well as the cosine of the angle between them, and produces a scalar value between 0.0 and 1.0 that essentially represents how “orthogonal” they are. We use this property to light pixels less and less, given greater and greater angles between their normals and the light. Finally, the dot product is used again when calculating the final pixel value. The entire influence of the light is multiplied by it, which allows every pixel to be drawn differently as if it had some underlying geometry that was pointing in a different direction. The last thing left to do now is to pass the normal map buffer to the shader in our C++ code: void LightManager::RenderScene() { ... if (renderer->UseShader("LightPass")) { // Light pass. ... shader->setUniform("NormalMap", m_materialMaps[MaterialMapType::Normal]->getTexture()); ... } ... } This effectively enables normal mapping and gives us beautiful results such as this: The leaves, the character, and pretty much everything in this image now looks like they have a definition, ridges, and crevices; it is lit as if it had geometry, although it's paper-thin. Note the lines around each tile in this particular instance. This is one of the main reasons why normal maps for pixel art, such as tile sheets, shouldn't be automatically generated; it can sample the tiles adjacent to it and incorrectly add bevelled edges. Specular maps While normal maps provide us with the possibility to fake how bumpy a surface is, specular maps allow us to do the same with the shininess of a surface. This is what the same segment of the tile sheet we used as an example for a normal map looks like in a specular map: It's not as complex as a normal map since it only needs to store one value: the shininess factor. We can leave it up to each light to decide how much shine it will cast upon the scenery by letting it have its own values: struct LightBase { ... float m_specularExponent = 10.f; float m_specularStrength = 1.f; }; Adding support for specularity Similar to normal maps, we need to use the material pass shader to render to a specularity buffer texture: void Map::Redraw(sf::Vector3i l_from, sf::Vector3i l_to) { ... if (renderer->UseShader("MaterialPass")) { // Material pass. ... auto specMaterial = m_textureManager->GetResource( textureName + "_specular"); for (auto x = l_from.x; x <= l_to.x; ++x) { for (auto y = l_from.y; y <= l_to.y; ++y) { for (auto layer = l_from.z; layer <= l_to.z; ++layer) { ... // Normal pass. // Specular pass. if (specMaterial) { shader->setUniform("material", *specMaterial); renderer->Draw(sprite, &m_speculars[layer]); } } } } } ... 
} The texture for specularity is once again attempted to be obtained; it is passed down to the material pass shader if found. The same is true when you render entities: void S_Renderer::Draw(MaterialMapContainer& l_materials, Window& l_window, int l_layer) { ... if (renderer->UseShader("MaterialPass")) { // Material pass. ... for (auto &entity : m_entities) { ... // Normal pass. // Specular pass. if (specular) { shader->setUniform("material", *specular); drawable->Draw(&l_window, l_materials[MaterialMapType::Specular].get()); } } } ... } Particles, on the other hand, also use the material value pass shader: void ParticleSystem::Draw(MaterialMapContainer& l_materials, Window& l_window, int l_layer) { ... if (renderer->UseShader("MaterialValuePass")) { // Material pass. auto shader = renderer->GetCurrentShader(); for (size_t i = 0; i < container->m_countAlive; ++i) { ... // Normal pass. // Specular pass. shader->setUniform("material", sf::Glsl::Vec3(0.f, 0.f, 0.f)); renderer->Draw(drawables[i], l_materials[MaterialMapType::Specular].get()); } } } For now, we don't want any of them to be specular at all. This can obviously be tweaked later on, but the important thing is that we have that functionality available and yielding results, such as the following: This specularity texture needs to be sampled inside a light-pass shader, just like a normal texture. Let's see what this involves. Changing the lighting shader Just as before, a uniform sampler2D needs to be added to sample the specularity of a particular fragment: uniform sampler2D LastPass; uniform sampler2D DiffuseMap; uniform sampler2D NormalMap; uniform sampler2D SpecularMap; uniform vec3 AmbientLight; uniform int LightCount; uniform int PassNumber; struct LightInfo { vec3 position; vec3 color; float radius; float falloff; float specularExponent; float specularStrength; }; const int MaxLights = 4; uniform LightInfo Lights[MaxLights]; const float SpecularConstant = 0.4; void main() { ... vec4 specularpixel = texture2D(SpecularMap, gl_TexCoord[0].xy); vec3 viewDirection = vec3(0, 0, 1); // Looking at positive Z. ... for(int i = 0; i < LightCount; ++i){ ... float specularLevel = 0.0; specularLevel = pow(max(0.0, dot(reflect(-L, N), viewDirection)), Lights[i].specularExponent * specularpixel.a) * SpecularConstant; vec3 specularReflection = Lights[i].color * specularLevel * specularpixel.rgb * Lights[i].specularStrength; finalPixel += (diffusepixel * ((vec4(Lights[i].color, 1.0) * attenuation)) + vec4(specularReflection, 1.0)) * normalDot; } gl_FragColor = finalPixel; } We also need to add in the specular exponent and strength to each light's struct, as it's now part of it. Once the specular pixel is sampled, we need to set up the direction of the camera as well. Since that's static, we can leave it as is in the shader. The specularity of the pixel is then calculated by taking into account the dot product between the pixel’s normal and the light, the color of the specular pixel itself, and the specular strength of the light. Note the use of a specular constant in the calculation. This is a value that can and should be tweaked in order to obtain best results, as 100% specularity rarely ever looks good. Then, all that's left is to make sure the specularity texture is also sent to the light-pass shader in addition to the light's specular exponent and strength values: void LightManager::RenderScene() { ... if (renderer->UseShader("LightPass")) { // Light pass. ... 
shader->setUniform("SpecularMap", m_materialMaps[MaterialMapType::Specular]->getTexture()); ... for (auto& light : m_lights) { ... shader->setUniform(id + ".specularExponent", light.m_specularExponent); shader->setUniform(id + ".specularStrength", light.m_specularStrength); ... } } } The result may not be visible right away, but upon closer inspection of moving a light stream, we can see that correctly mapped surfaces will have a glint that will move around with the light: While this is nearly perfect, there's still some room for improvement. Summary Lighting is a very powerful tool when used right. Different aspects of a material may be emphasized depending on the setup of the game level, additional levels of detail can be added in without too much overhead, and the overall aesthetics of the project will be leveraged to new heights. The full version of “Mastering SFML Game Development” offers all of this and more by not only utilizing normal and specular maps, but also using 3D shadow-mapping techniques to create Omni-directional point light shadows that breathe new life into the game world. Resources for Article: Further resources on this subject: Common Game Programming Patterns [article] Sprites in Action [article] Warfare Unleashed Implementing Gameplay [article]

Background jobs on Django with Celery

Jean Jung
19 Jan 2017
7 min read
While doing web applications, you usually need to run some operations in the background to improve the application performance, or because a job really needs to run outside of the application environment. In both cases, if you are on Django, you are in good hands because you have Celery, the Distributed Task Queue written in Python. Celery is a tiny but complete project. You can find more information on the project page. In this post, we will see how it’s easy to integrate Celery with an existing project, and although we are focusing on Django here, creating a standalone Celery worker is a very similar process. Installing Celery The first step we will see is how to install Celery. If you already have it, please move to the next section and follow the next step! As every good Python package, Celery is distributed on pip. You can install it just by entering: pip install celery Choosing a message broker The second step is about choosing a message broker to act as the job queue. Celery can talk with a great variety of brokers; the main ones are: RabbitMQ Redis 1 Amazon SQS  ² Check for support on other brokers here. If you’re already using any of these brokers for other purposes, choose it as your primary option. In this section there is nothing more you have to do. Celery is very transparent and does not require any source modification to move from a broker to another, so feel free to try more than one after we end here. Ok let’s move on, but first do not forget to look the little notes below. ¹: For Redis (a great choice in my opinion), you have to install the celery[redis] package. ²: Celery has great features like web monitoring that do not work with this broker. Celery worker entrypoint When running Celery on a directory it will search for a file called celery.py, which is the application entrypoint, where the configs are loaded and the application object resides. Working with Django, this file is commonly stored on the project directory, along with the settings.py file; your file structure should look like this: your_project_name your_project_name __init__.py settings.py urls.py wsgi.py celery.py your_app_name __init__.py models.py views.py …. The settings read by that file will be on the same settings.py file that Django uses. At this point we can take a look at the official documentation celery.py file example. This code is basically the same for every project; just replace proj by your project name and save that file. Each part is described in the file comments. from __future__ import absolute_import, unicode_literals import os from celery import Celery # set the default Django settings module for the 'celery' program. os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings') app = Celery('proj') # Using a string here means the worker don't have to serialize # the configuration object to child processes. # - namespace='CELERY' means all celery-related configuration keys # should have a `CELERY_` prefix. app.config_from_object('django.conf:settings', namespace='CELERY') # Load task modules from all registered Django app configs. # This is not required, but as you can have more than one app # with tasks it’s better to do the autoload than declaring all tasks # in this same file. app.autodiscover_tasks() Settings By default, Celery depends only on the broker_url setting to work. As we’ve seen in the previous session, your settings will be stored alongside the Django ones but with the 0‘CELERY_’ prefix. 
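For instance, the handful of settings below is a sketch of what the settings.py additions might look like once the CELERY_ namespace is in place; the broker and result-backend URLs are placeholders for a local Redis and are not prescribed by this article:

```python
# settings.py -- illustrative values; only the broker URL is strictly required.
# Because celery.py used namespace='CELERY', Celery reads every option from a
# CELERY_-prefixed Django setting.
CELERY_BROKER_URL = 'redis://localhost:6379/0'       # where tasks are queued
CELERY_RESULT_BACKEND = 'redis://localhost:6379/1'   # optional: keep task results
CELERY_TASK_SERIALIZER = 'json'                      # optional: message format
```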
The broker_url format is as follows: CELERY_BROKER_URL = ‘broker://[[user]:[password]@]host[:port[/resource]]’ Where broker is an identifier that specifies the chosen broker, like amqp or redis; user and password are the authentication to the service. If needed, host and port are the addresses of the service and resource is a broker-specific path to the component resource. For example, if you’ve chosen a local Redis as your broker, your broker URL will be: CELERY_BROKER_URL = ‘redis://localhost:6379/0’ ¹ 1: Considering a default Redis installation with the database 0 being used. Doing this we have a functioning celery worker. How lucky! It’s so simple! But wait, what about the tasks? How do we write and execute them? Let’s see. Creating and running tasks Because of the superpowers Celery has, it can autoload tasks from Django app directories as we’ve seen before; you just have to declare your app tasks in a file called tasks.py in the app dir: your_project_name your_project_name __init__.py settings.py urls.py wsgi.py celery.py your_app_name __init__.py models.py views.py tasks.py …. In that file you just need to put functions decorated with the celery.shared_task decorator. So suppose we want to do a background mailer; the source will be like this: from __future__ import absolute_import, unicode_literals from celery import shared_task from django.core.mail import send_mail @shared_task def mailer(subject, message, recipient_list, from=’default@admin.com’): send_mail(subject, message, recipient_list, from) Then on the Django application, on any place you have to send an e-mail on background, just do the following: from __future__ import absolute_import from app.tasks import mailer …. def send_email_to_user(request): if request.user: mailer.delay(‘Alert Foo’, ‘The foo message’, [request.user.email]) delay is probably the most used way to submit a job to a Celery worker, but is not the only one. Check this reference to see what is possible to do. There are many features like task chaining, with future schedules and more! As you can have noticed, in a great majority of the files, we have used the from __future__ import absolute_import statement. This is very important, mainly with Python 2, because of the way Celery serializes messages to post tasks on brokers. You need to follow the same convention when creating and using tasks, as otherwise the namespace of the task will differ and the task will not get executed. The absolute import module forces you to use absolute imports, so you will avoid these problems. Check this link for more information. Running the worker If you get the source code above, put anything in the right place and run the Django development server to test your background jobs, they will not work! Wait. This is because you don’t have a Celery worker started yet. To start it, do a cd to the project main directory (the same as you run python manage.py runserver for example) and run: celery -A your_project_name worker -l info Replace your_project_name with your project and info with the desired log level. Keep this process running, start the Django server, and yes. Now you can see that anything works! Where to go now? Explore the Celery documentation and see all the available features, caveats, and help you can get from it. There is also an example project on the Celery GitHub page that you can use as a template for new projects or a guide to add celery to your existing project. 
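Before wrapping up, here is a quick, hedged sketch of the "task chaining" and "future schedules" features mentioned above, reusing the mailer task from earlier; build_report and send_report are hypothetical tasks shown only to illustrate the syntax:

```python
from celery import chain

from app.tasks import mailer, build_report, send_report  # the last two are hypothetical

# Schedule: run the mailer five minutes from now instead of immediately.
mailer.apply_async(
    args=('Alert Foo', 'The foo message', ['user@example.com']),
    countdown=300,
)

# Chain: each signature (.s) passes its return value on to the next task.
report_flow = chain(build_report.s(2017), send_report.s())
report_flow.apply_async()
```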
Summary

We've seen how to install and configure Celery to run alongside a new or existing Django project. We explored some of the broker options available and how simple it is to switch between them, along with some hints about brokers that don't offer all of Celery's features. We also walked through an example mailer task, how it was created, and how it is called from the Django application. Finally, I provided instructions for starting the worker so the tasks actually get executed.

References
[1] - Django project documentation
[2] - Celery project documentation
[3] - Redis project page
[4] - RabbitMQ project page
[5] - Amazon SQS page

About the author

Jean Jung is a Brazilian developer passionate about technology. He is currently a system analyst at EBANX, an international payment processing company for Latin America. He is very interested in Python and artificial intelligence, specifically machine learning, compilers, and operating systems. As a hobby, he's always looking for IoT projects with Arduino.

Clustering Model with Spark

Packt
19 Jan 2017
7 min read
In this article by Manpreet Singh Ghotra and Rajdeep Dua, coauthors of the book Machine Learning with Spark, Second Edition, we will analyze the case where we do not have labeled data available. Supervised learning methods are those where the training data is labeled with the true outcome that we would like to predict (for example, a rating for recommendations and class assignment for classification or a real target variable in the case of regression). (For more resources related to this topic, see here.) In unsupervised learning, the model is not supervised with the true target label. The unsupervised case is very common in practice, since obtaining labeled training data can be very difficult or expensive in many real-world scenarios (for example, having humans label training data with class labels for classification). However, we would still like to learn some underlying structure in the data and use these to make predictions. This is where unsupervised learning approaches can be useful. Unsupervised learning models are also often combined with supervised models, for example, applying unsupervised techniques to create new input features for supervised models. Clustering models are, in many ways, the unsupervised equivalent of classification models. With classification, we would try to learn a model that would predict which class a given training example belonged to. The model is essentially a mapping from a set of features to the class. In clustering, we would like to segment the data in such a way that each training example is assigned to a segment called a cluster. The clusters act much like classes, except that the true class assignments are unknown. Clustering models have many use cases that are the same as classification; these include the following: Segmenting users or customers into different groups based on behavior characteristics and metadata Grouping content on a website or products in a retail business Finding clusters of similar genes Segmenting communities in ecology Creating image segments for use in image analysis applications such as object detection Types of clustering models There are many different forms of clustering models available, ranging from simple to extremely complex ones. The Spark MLlibrary currently provides K-means clustering, which is among the simplest approaches available. However, it is often very effective, and its simplicity makes it is relatively easy to understand and is scalable. K-means clustering K-means attempts to partition a set of data points into K distinct clusters (where K is an input parameter for the model). More formally, K-means tries to find clusters so as to minimize the sum of squared errors (or distances) within each cluster. This objective function is known as the within cluster sum of squared errors (WCSS). It is the sum, over each cluster, of the squared errors between each point and the cluster center. Starting with a set of K initial cluster centers (which are computed as the mean vector for all data points in the cluster), the standard method for K-means iterates between two steps: Assign each data point to the cluster that minimizes the WCSS. The sum of squares is equivalent to the squared Euclidean distance; therefore, this equates to assigning each point to the closest cluster center as measured by the Euclidean distance metric. Compute the new cluster centers based on the cluster assignments from the first step. 
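In practice you rarely implement this loop yourself; Spark ML provides it out of the box. The Scala sketch below is illustrative rather than taken from the book (the toy data, column name, and parameter values are ours):

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object KMeansSketch extends App {
  val spark = SparkSession.builder()
    .appName("kmeans-sketch")
    .master("local[*]")
    .getOrCreate()

  // Toy 2-D points; a real job would load and vectorize a proper dataset.
  val data = Seq((0.0, 0.0), (0.1, 0.2), (9.0, 9.1), (9.2, 8.8))
    .map { case (x, y) => Tuple1(Vectors.dense(x, y)) }
  val points = spark.createDataFrame(data).toDF("features")

  val model = new KMeans()
    .setK(2)        // number of clusters, K
    .setMaxIter(20) // cap on the iterative assign/update loop
    .setSeed(42L)
    .fit(points)

  model.clusterCenters.foreach(println)
  spark.stop()
}
```

The setMaxIter setting (together with the optional setTol convergence tolerance) maps onto the stopping criteria described next.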
The algorithm proceeds until either a maximum number of iterations has been reached or convergence has been achieved. Convergence means that the cluster assignments no longer change during the first step; therefore, the value of the WCSS objective function does not change either. For more details, refer to Spark's documentation on clustering at http://spark.apache.org/docs/latest/mllib-clustering.html or refer to http://en.wikipedia.org/wiki/K-means_clustering. To illustrate the basics of K-means, we will use a simple dataset. We have five classes, which are shown in the following figure: Multiclass dataset However, assume that we don't actually know the true classes. If we use K-means with five clusters, then after the first step, the model's cluster assignments might look like this: Cluster assignments after the first K-means iteration We can see that K-means has already picked out the centers of each cluster fairly well. After the next iteration, the assignments might look like those shown in the following figure: Cluster assignments after the second K-means iteration Things are starting to stabilize, but the overall cluster assignments are broadly the same as they were after the first iteration. Once the model has converged, the final assignments could look like this: Final cluster assignments for K-means As we can see, the model has done a decent job of separating the five clusters. The leftmost three are fairly accurate (with a few incorrect points). However, the two clusters in the bottom-right corner are less accurate. This illustrates the following: The iterative nature of K-means The model's dependency on the method of initially selecting clusters' centers (here, we will use a random approach) How the final cluster assignments can be very good for well-separated data but can be poor for data that is more difficult Initialization methods The standard initialization method for K-means, usually simply referred to as the random method, starts by randomly assigning each data point to a cluster before proceeding with the first update step. Spark ML provides a parallel variant for this initialization method, called K-means++, which is the default initialization method used. Refer to http://en.wikipedia.org/wiki/K-means_clustering#Initialization_methods and http://en.wikipedia.org/wiki/K-means%2B%2B for more information. The results of using K-means++ are shown here. Note that this time, the difficult bottom-right points have been mostly correctly clustered. Final cluster assignments for K-means++ Variants There are many other variants of K-means; they focus on initialization methods or the core model. One of the more common variants is fuzzy K-means. This model does not assign each point to one cluster as K-means does (a so-called hard assignment). Instead, it is a soft version of K-means, where each point can belong to many clusters and is represented by the relative membership to each cluster. So, for K clusters, each point is represented as a K-dimensional membership vector, with each entry in this vector indicating the membership proportion in each cluster. Mixture models A mixture model is essentially an extension of the idea behind fuzzy K-means; however, it makes an assumption that there is an underlying probability distribution that generates the data. For example, we might assume that the data points are drawn from a set of K-independent Gaussian (normal) probability distributions. 
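Spark ML ships an implementation of this idea as well. The sketch below is illustrative rather than taken from the book, and it assumes the SparkSession and the points DataFrame (with its "features" column) from the K-means sketch above:

```scala
import org.apache.spark.ml.clustering.GaussianMixture

// Illustrative: fit a Gaussian mixture to the same "features" DataFrame
// used in the K-means sketch above.
val gmm = new GaussianMixture()
  .setK(2)
  .setMaxIter(50)
  .fit(points)

// Soft assignments: each row gets a probability vector over the K components.
gmm.transform(points).select("features", "probability").show(false)
println(gmm.weights.mkString(", "))
```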
The cluster assignments are also soft, so each point is represented by K membership weights in each of the K underlying probability distributions. Refer to http://en.wikipedia.org/wiki/Mixture_model for further details and for a mathematical treatment of mixture models. Hierarchical clustering Hierarchical clustering is a structured clustering approach that results in a multilevel hierarchy of clusters where each cluster might contain many subclusters (or child clusters). Each child cluster is, thus, linked to the parent cluster. This form of clustering is often also called tree clustering. Agglomerative clustering is a bottom-up approach where we have the following: Each data point begins in its own cluster The similarity (or distance) between each pair of clusters is evaluated The pair of clusters that are most similar are found; this pair is then merged to form a new cluster The process is repeated until only one top-level cluster remains Divisive clustering is a top-down approach that works in reverse, starting with one cluster, and at each stage, splitting a cluster into two, until all data points are allocated to their own bottom-level cluster. You can find more information at http://en.wikipedia.org/wiki/Hierarchical_clustering. Summary In this article, we explored a new class of model that learns structure from unlabeled data—unsupervised learning. You learned about various clustering models like the K-means model, mixture models, and the hierarchical clustering model. We also considered a simple dataset to illustrate the basics of K-means. Resources for Article: Further resources on this subject: Spark for Beginners [article] Setting up Spark [article] Holistic View on Spark [article]

Using the Firebase Real-Time Database

Oliver Blumanski
18 Jan 2017
5 min read
In this post, we are going to look at how to use the Firebase real-time database, along with an example. Here we are writing and reading data from the database using multiple platforms. To do this, we first need a server script that is adding data, and secondly we need a component that pulls the data from the Firebase database. Step 1 - Server Script to collect data Digest an XML feed and transfer the data into the Firebase real-time database. The script runs as cronjob frequently to refresh the data. Step 2 - App Component Subscribe to the data from a JavaScript component, in this case, React-Native. About Firebase Now that those two steps are complete, let's take a step back and talk about Google Firebase. Firebase offers a range of services such as a real-time database, authentication, cloud notifications, storage, and much more. You can find the full feature list here. Firebase covers three platforms: iOS, Android, and Web. The server script uses the Firebases JavaScript Web API. Having data in this real-time database allows us to query the data from all three platforms (iOS, Android, Web), and in addition, the real-time database allows us to subscribe (listen) to a database path (query), or to query a path once. Step 1 - Digest XML feed and transfer into Firebase Firebase Set UpThe first thing you need to do is to set up a Google Firebase project here In the app, click on "Add another App" and choose Web, a pop-up will show you the configuration. You can copy paste your config into the example script. Now you need to set the rules for your Firebase database. You should make yourself familiar with the database access rules. In my example, the path latestMarkets/ is open for write and read. In a real-world production app, you would have to secure this, having authentication for the write permissions. Here are the database rules to get started: { "rules": { "users": { "$uid": { ".read": "$uid === auth.uid", ".write": "$uid === auth.uid" } }, "latestMarkets": { ".read": true, ".write": true } } } The Server Script Code The XML feed contains stock market data and is frequently changing, except on the weekend. To build the server script, some NPM packages are needed: Firebase Request xml2json babel-preset-es2015 Require modules and configure Firebase web api: const Firebase = require('firebase'); const request = require('request'); const parser = require('xml2json'); // firebase access config const config = { apiKey: "apikey", authDomain: "authdomain", databaseURL: "dburl", storageBucket: "optional", messagingSenderId: "optional" } // init firebase Firebase.initializeApp(config) [/Code] I write JavaScript code in ES6. It is much more fun. It is a simple script, so let's have a look at the code that is relevant to Firebase. The code below is inserting or overwriting data in the database. 
For this script, I am happy to overwrite data: Firebase.database().ref('latestMarkets/'+value.Symbol).set({ Symbol: value.Symbol, Bid: value.Bid, Ask: value.Ask, High: value.High, Low: value.Low, Direction: value.Direction, Last: value.Last }) .then((response) => { // callback callback(true) }) .catch((error) => { // callback callback(error) }) Firebase Db first references the path: Firebase.database().ref('latestMarkets/'+value.Symbol) And then the action you want to do: // insert/overwrite (promise) Firebase.database().ref('latestMarkets/'+value.Symbol).set({}).then((result)) // get data once (promise) Firebase.database().ref('latestMarkets/'+value.Symbol).once('value').then((snapshot)) // listen to db path, get data on change (callback) Firebase.database().ref('latestMarkets/'+value.Symbol).on('value', ((snapshot) => {}) // ...... Here is the Github repository: Displaying the data in a React-Native app This code below will listen to a database path, on data change, all connected devices will synchronise the data: Firebase.database().ref('latestMarkets/').on('value', snapshot => { // do something with snapshot.val() }) To close the listener, or unsubscribe the path, one can use "off": Firebase.database().ref('latestMarkets/').off() I’ve created an example react-native app to display the data: The Github repository Conclusion In mobile app development, one big question is: "What database and cache solution can I use to provide online and offline capabilities?" One way to look at this question is like you are starting a project from scratch. If so, you can fit your data into Firebase, and then this would be a great solution for you. Additionally, you can use it for both web and mobile apps. The great thing is that you don't need to write a particular API, and you can access data straight from JavaScript. On the other hand, if you have a project that uses MySQL for example, the Firebase real-time database won't help you much. You would need to have a remote API to connect to your database in this case. But even if using the Firebase database isn't a good fit for your project, there are still other features, such as Firebase Storage or Cloud Messaging, which are very easy to use, and even though they are beyond the scope of this post, they are worth checking out. About the author Oliver Blumanski is a developer based out of Townsville, Australia. He has been a software developer since 2000, and can be found on GitHub at @blumanski.

Building Scalable Microservices

Packt
18 Jan 2017
33 min read
In this article by Vikram Murugesan, the author of the book Microservices Deployment Cookbook, we will see a brief introduction to concept of the microservices. (For more resources related to this topic, see here.) Writing microservices with Spring Boot Now that our project is ready, let's look at how to write our microservice. There are several Java-based frameworks that let you create microservices. One of the most popular frameworks from the Spring ecosystem is the Spring Boot framework. In this article, we will look at how to create a simple microservice application using Spring Boot. Getting ready Any application requires an entry point to start the application. For Java-based applications, you can write a class that has the main method and run that class as a Java application. Similarly, Spring Boot requires a simple Java class with the main method to run it as a Spring Boot application (microservice). Before you start writing your Spring Boot microservice, you will also require some Maven dependencies in your pom.xml file. How to do it… Create a Java class called com.packt.microservices.geolocation.GeoLocationApplication.java and give it an empty main method: package com.packt.microservices.geolocation; public class GeoLocationApplication { public static void main(String[] args) { // left empty intentionally } } Now that we have our basic template project, let's make our project a child project of Spring Boot's spring-boot-starter-parent pom module. This module has a lot of prerequisite configurations in its pom.xml file, thereby reducing the amount of boilerplate code in our pom.xml file. At the time of writing this, 1.3.6.RELEASE was the most recent version: <parent> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-parent</artifactId> <version>1.3.6.RELEASE</version> </parent> After this step, you might want to run a Maven update on your project as you have added a new parent module. If you see any warnings about the version of the maven-compiler plugin, you can either ignore it or just remove the <version>3.5.1</version> element. If you remove the version element, please perform a Maven update afterward. Spring Boot has the ability to enable or disable Spring modules such as Spring MVC, Spring Data, and Spring Caching. In our use case, we will be creating some REST APIs to consume the geolocation information of the users. So we will need Spring MVC. Add the following dependencies to your pom.xml file: <dependencies> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> </dependency> </dependencies> We also need to expose the APIs using web servers such as Tomcat, Jetty, or Undertow. Spring Boot has an in-memory Tomcat server that starts up as soon as you start your Spring Boot application. So we already have an in-memory Tomcat server that we could utilize. Now let's modify the GeoLocationApplication.java class to make it a Spring Boot application: package com.packt.microservices.geolocation; import org.springframework.boot.SpringApplication; import org.springframework.boot.autoconfigure.SpringBootApplication; @SpringBootApplication public class GeoLocationApplication { public static void main(String[] args) { SpringApplication.run(GeoLocationApplication.class, args); } } As you can see, we have added an annotation, @SpringBootApplication, to our class. 
The @SpringBootApplication annotation reduces the number of lines of code written by adding the following three annotations implicitly: @Configuration @ComponentScan @EnableAutoConfiguration If you are familiar with Spring, you will already know what the first two annotations do. @EnableAutoConfiguration is the only annotation that is part of Spring Boot. The AutoConfiguration package has an intelligent mechanism that guesses the configuration of your application and automatically configures the beans that you will likely need in your code. You can also see that we have added one more line to the main method, which actually tells Spring Boot the class that will be used to start this application. In our case, it is GeoLocationApplication.class. If you would like to add more initialization logic to your application, such as setting up the database or setting up your cache, feel free to add it here. Now that our Spring Boot application is all set to run, let's see how to run our microservice. Right-click on GeoLocationApplication.java from Package Explorer, select Run As, and then select Spring Boot App. You can also choose Java Application instead of Spring Boot App. Both the options ultimately do the same thing. You should see something like this on your STS console: If you look closely at the console logs, you will notice that Tomcat is being started on port number 8080. In order to make sure our Tomcat server is listening, let's run a simple curl command. cURL is a command-line utility available on most Unix and Mac systems. For Windows, use tools such as Cygwin or even Postman. Postman is a Google Chrome extension that gives you the ability to send and receive HTTP requests. For simplicity, we will use cURL. Execute the following command on your terminal: curl http://localhost:8080 This should give us an output like this: {"timestamp":1467420963000,"status":404,"error":"Not Found","message":"No message available","path":"/"} This error message is being produced by Spring. This verifies that our Spring Boot microservice is ready to start building on with more features. There are more configurations that are needed for Spring Boot, which we will perform later in this article along with Spring MVC. Writing microservices with WildFly Swarm WildFly Swarm is a J2EE application packaging framework from RedHat that utilizes the in-memory Undertow server to deploy microservices. In this article, we will create the same GeoLocation API using WildFly Swarm and JAX-RS. To avoid confusion and dependency conflicts in our project, we will create the WildFly Swarm microservice as its own Maven project. This article is just here to help you get started on WildFly Swarm. When you are building your production-level application, it is your choice to either use Spring Boot, WildFly Swarm, Dropwizard, or SparkJava based on your needs. Getting ready Similar to how we created the Spring Boot Maven project, create a Maven WAR module with the groupId com.packt.microservices and name/artifactId geolocation-wildfly. Feel free to use either your IDE or the command line. Be aware that some IDEs complain about a missing web.xml file. We will see how to fix that in the next section. How to do it… Before we set up the WildFly Swarm project, we have to fix the missing web.xml error. The error message says that Maven expects to see a web.xml file in your project as it is a WAR module, but this file is missing in your project. In order to fix this, we have to add and configure maven-war-plugin. 
Add the following code snippet to your pom.xml file's project section: <build> <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-war-plugin</artifactId> <version>2.6</version> <configuration> <failOnMissingWebXml>false</failOnMissingWebXml> </configuration> </plugin> </plugins> </build> After adding the snippet, save your pom.xml file and perform a Maven update. Also, if you see that your project is using a Java version other than 1.8. Again, perform a Maven update for the changes to take effect. Now, let's add the dependencies required for this project. As we know that we will be exposing our APIs, we have to add the JAX-RS library. JAX-RS is the standard JSR-compliant API for creating RESTful web services. JBoss has its own version of JAX-RS. So let's  add that dependency to the pom.xml file: <dependencies> <dependency> <groupId>org.jboss.spec.javax.ws.rs</groupId> <artifactId>jboss-jaxrs-api_2.0_spec</artifactId> <version>1.0.0.Final</version> <scope>provided</scope> </dependency> </dependencies> The one thing that you have to note here is the provided scope. The provide scope in general means that this JAR need not be bundled with the final artifact when it is built. Usually, the dependencies with provided scope will be available to your application either via your web server or application server. In this case, when Wildfly Swarm bundles your app and runs it on the in-memory Undertow server, your server will already have this dependency. The next step toward creating the GeoLocation API using Wildfly Swarm is creating the domain object. Use the com.packt.microservices.geolocation.GeoLocation.java file. Now that we have the domain object, there are two classes that you need to create in order to write your first JAX-RS web service. The first of those is the Application class. The Application class in JAX-RS is used to define the various components that you will be using in your application. It can also hold some metadata about your application, such as your basePath (or ApplicationPath) to all resources listed in this Application class. In this case, we are going to use /geolocation as our basePath. Let's see how that looks: package com.packt.microservices.geolocation; import javax.ws.rs.ApplicationPath; import javax.ws.rs.core.Application; @ApplicationPath("/geolocation") public class GeoLocationApplication extends Application { public GeoLocationApplication() {} } There are two things to note in this class; one is the Application class and the other is the @ApplicationPath annotation—both of which we've already talked about. Now let's move on to the resource class, which is responsible for exposing the APIs. If you are familiar with Spring MVC, you can compare Resource classes to Controllers. They are responsible for defining the API for any specific resource. The annotations are slightly different from that of Spring MVC. Let's create a new resource class called com.packt.microservices.geolocation.GeoLocationResource.java that exposes a simple GET API: package com.packt.microservices.geolocation; import java.util.ArrayList; import java.util.List; import javax.ws.rs.GET; import javax.ws.rs.Path; import javax.ws.rs.Produces; @Path("/") public class GeoLocationResource { @GET @Produces("application/json") public List<GeoLocation> findAll() { return new ArrayList<>(); } } All the three annotations, @GET, @Path, and @Produces, are pretty self explanatory. 
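The resource above returns a list of GeoLocation objects, but the article never lists that class. A minimal sketch inferred from the JSON payloads used later in the cURL examples (userId, timestamp, latitude, longitude) might look like this; the book's actual implementation may differ:

```java
package com.packt.microservices.geolocation;

// Minimal sketch of the domain object, inferred from the JSON payloads
// used in the cURL examples; the book's version may differ.
public class GeoLocation {

    private String userId;
    private long timestamp;
    private double latitude;
    private double longitude;

    public String getUserId() { return userId; }
    public void setUserId(String userId) { this.userId = userId; }

    public long getTimestamp() { return timestamp; }
    public void setTimestamp(long timestamp) { this.timestamp = timestamp; }

    public double getLatitude() { return latitude; }
    public void setLatitude(double latitude) { this.latitude = latitude; }

    public double getLongitude() { return longitude; }
    public void setLongitude(double longitude) { this.longitude = longitude; }
}
```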
Before we start writing the APIs and the service class, let's test the application from the command line to make sure it works as expected. With the current implementation, any GET request sent to the /geolocation URL should return an empty JSON array. So far, we have created the RESTful APIs using JAX-RS. It's just another JAX-RS project: In order to make it a microservice using Wildfly Swarm, all you have to do is add the wildfly-swarm-plugin to the Maven pom.xml file. This plugin will be tied to the package phase of the build so that whenever the package goal is triggered, the plugin will create an uber JAR with all required dependencies. An uber JAR is just a fat JAR that has all dependencies bundled inside itself. It also deploys our application in an in-memory Undertow server. Add the following snippet to the plugins section of the pom.xml file: <plugin> <groupId>org.wildfly.swarm</groupId> <artifactId>wildfly-swarm-plugin</artifactId> <version>1.0.0.Final</version> <executions> <execution> <id>package</id> <goals> <goal>package</goal> </goals> </execution> </executions> </plugin> Now execute the mvn clean package command from the project's root directory, and wait for the Maven build to be successful. If you look at the logs, you can see that wildfly-swarm-plugin will create the uber JAR, which has all its dependencies. You should see something like this in your console logs: After the build is successful, you will find two artifacts in the target directory of your project. The geolocation-wildfly-0.0.1-SNAPSHOT.war file is the final WAR created by the maven-war-plugin. The geolocation-wildfly-0.0.1-SNAPSHOT-swarm.jar file is the uber JAR created by the wildfly-swarm-plugin. Execute the following command in the same terminal to start your microservice: java –jar target/geolocation-wildfly-0.0.1-SNAPSHOT-swarm.jar After executing this command, you will see that Undertow has started on port number 8080, exposing the geolocation resource we created. You will see something like this: Execute the following cURL command in a separate terminal window to make sure our API is exposed. The response of the command should be [], indicating there are no geolocations: curl http://localhost:8080/geolocation Now let's build the service class and finish the APIs that we started. For simplicity purposes, we are going to store the geolocations in a collection in the service class itself. In a real-time scenario, you will be writing repository classes or DAOs that talk to the database that holds your geolocations. Get the com.packt.microservices.geolocation.GeoLocationService.java interface. We'll use the same interface here. Create a new class called com.packt.microservices.geolocation.GeoLocationServiceImpl.java that extends the GeoLocationService interface: package com.packt.microservices.geolocation; import java.util.ArrayList; import java.util.Collections; import java.util.List; public class GeoLocationServiceImpl implements GeoLocationService { private static List<GeoLocation> geolocations = new ArrayList<>(); @Override public GeoLocation create(GeoLocation geolocation) { geolocations.add(geolocation); return geolocation; } @Override public List<GeoLocation> findAll() { return Collections.unmodifiableList(geolocations); } } Now that our service classes are implemented, let's finish building the APIs. We already have a very basic stubbed-out GET API. Let's just introduce the service class to the resource class and call the findAll method. 
Similarly, let's use the service's create method for POST API calls. Add the following snippet to GeoLocationResource.java: private GeoLocationService service = new GeoLocationServiceImpl(); @GET @Produces("application/json") public List<GeoLocation> findAll() { return service.findAll(); } @POST @Produces("application/json") @Consumes("application/json") public GeoLocation create(GeoLocation geolocation) { return service.create(geolocation); } We are now ready to test our application. Go ahead and build your application. After the build is successful, run your microservice: let's try to create two geolocations using the POST API and later try to retrieve them using the GET method. Execute the following cURL commands in your terminal one by one: curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 41.803488, "longitude": -88.144040}' http://localhost:8080/geolocation This should give you something like the following output (pretty-printed for readability): { "latitude": 41.803488, "longitude": -88.14404, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 9.568012, "longitude": 77.962444}' http://localhost:8080/geolocation This command should give you an output similar to the following (pretty-printed for readability): { "latitude": 9.568012, "longitude": 77.962444, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } To verify whether your entities were stored correctly, execute the following cURL command: curl http://localhost:8080/geolocation This should give you an output like this (pretty-printed for readability): [ { "latitude": 41.803488, "longitude": -88.14404, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 }, { "latitude": 9.568012, "longitude": 77.962444, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } ] Whatever we have seen so far will give you a head start in building microservices with WildFly Swarm. Of course, there are tons of features that WildFly Swarm offers. Feel free to try them out based on your application needs. I strongly recommend going through the WildFly Swarm documentation for any advanced usages. Writing microservices with Dropwizard Dropwizard is a collection of libraries that help you build powerful applications quickly and easily. The libraries vary from Jackson, Jersey, Jetty, and so on. You can take a look at the full list of libraries on their website. This ecosystem of libraries that help you build powerful applications could be utilized to create microservices as well. As we saw earlier, it utilizes Jetty to expose its services. In this article, we will create the same GeoLocation API using Dropwizard and Jersey. To avoid confusion and dependency conflicts in our project, we will create the Dropwizard microservice as its own Maven project. This article is just here to help you get started with Dropwizard. When you are building your production-level application, it is your choice to either use Spring Boot, WildFly Swarm, Dropwizard, or SparkJava based on your needs. Getting ready Similar to how we created other Maven projects,  create a Maven JAR module with the groupId com.packt.microservices and name/artifactId geolocation-dropwizard. Feel free to use either your IDE or the command line. 
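If you prefer the command line for this step, one way (not prescribed by the book) is Maven's quickstart archetype; adjust it to your preferences:

```bash
mvn archetype:generate \
  -DgroupId=com.packt.microservices \
  -DartifactId=geolocation-dropwizard \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DinteractiveMode=false
```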
After the project is created, if you see that your project is using a Java version other than 1.8. Perform a Maven update for the change to take effect. How to do it… The first thing that you will need is the dropwizard-core Maven dependency. Add the following snippet to your project's pom.xml file: <dependencies> <dependency> <groupId>io.dropwizard</groupId> <artifactId>dropwizard-core</artifactId> <version>0.9.3</version> </dependency> </dependencies> Guess what? This is the only dependency you will need to spin up a simple Jersey-based Dropwizard microservice. Before we start configuring Dropwizard, we have to create the domain object, service class, and resource class: com.packt.microservices.geolocation.GeoLocation.java com.packt.microservices.geolocation.GeoLocationService.java com.packt.microservices.geolocation.GeoLocationImpl.java com.packt.microservices.geolocation.GeoLocationResource.java Let's see what each of these classes does. The GeoLocation.java class is our domain object that holds the geolocation information. The GeoLocationService.java class defines our interface, which is then implemented by the GeoLocationServiceImpl.java class. If you take a look at the GeoLocationServiceImpl.java class, we are using a simple collection to store the GeoLocation domain objects. In a real-time scenario, you will be persisting these objects in a database. But to keep it simple, we will not go that far. To be consistent with the previous, let's change the path of GeoLocationResource to /geolocation. To do so, replace @Path("/") with @Path("/geolocation") on line number 11 of the GeoLocationResource.java class. We have now created the service classes, domain object, and resource class. Let's configure Dropwizard. In order to make your project a microservice, you have to do two things: Create a Dropwizard configuration class. This is used to store any meta-information or resource information that your application will need during runtime, such as DB connection, Jetty server, logging, and metrics configurations. These configurations are ideally stored in a YAML file, which will them be mapped to your Configuration class using Jackson. In this application, we are not going to use the YAML configuration as it is out of scope for this article. If you would like to know more about configuring Dropwizard, refer to their Getting Started documentation page at http://www.dropwizard.io/0.7.1/docs/getting-started.html. Let's  create an empty Configuration class called GeoLocationConfiguration.java: package com.packt.microservices.geolocation; import io.dropwizard.Configuration; public class GeoLocationConfiguration extends Configuration { } The YAML configuration file has a lot to offer. Take a look at a sample YAML file from Dropwizard's Getting Started documentation page to learn more. The name of the YAML file is usually derived from the name of your microservice. The microservice name is usually identified by the return value of the overridden method public String getName() in your Application class. 
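Although this article runs Dropwizard without a YAML file, a minimal one for this service might look like the sketch below; the values are illustrative only, and the full schema is in the Dropwizard configuration reference. Such a file would typically be passed as a second argument when starting the service, for example java -jar target/geolocation-dropwizard-0.0.1-SNAPSHOT.jar server geolocation.yml.

```yaml
# geolocation.yml -- illustrative only; this article's example runs without it.
server:
  applicationConnectors:
    - type: http
      port: 8080
logging:
  level: INFO
```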
Now let's create the GeoLocationApplication.java application class: package com.packt.microservices.geolocation; import io.dropwizard.Application; import io.dropwizard.setup.Environment; public class GeoLocationApplication extends Application<GeoLocationConfiguration> { public static void main(String[] args) throws Exception { new GeoLocationApplication().run(args); } @Override public void run(GeoLocationConfiguration config, Environment env) throws Exception { env.jersey().register(new GeoLocationResource()); } } There are a lot of things going on here. Let's look at them one by one. Firstly, this class extends Application with the GeoLocationConfiguration generic. This makes an instance of your GeoLocationConfiguration class available at runtime, giving you access to all the properties you have defined in your YAML file, mapped onto the Configuration class. The next one is the run method. The run method takes two arguments: your configuration and environment. The Environment instance is a wrapper around other library-specific objects such as MetricsRegistry, HealthCheckRegistry, and JerseyEnvironment. For example, we could register our Jersey resources using the JerseyEnvironment instance. The env.jersey().register(new GeoLocationResource()) line does exactly that. The main method is pretty straightforward. All it does is call the run method. Before we can start the microservice, we have to configure this project to create a runnable uber JAR. Uber JARs are just fat JARs that bundle their dependencies in themselves. For this purpose, we will be using the maven-shade-plugin. Add the following snippet to the build section of the pom.xml file. If this is your first plugin, you might want to wrap it in a <plugins> element under <build>: <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-shade-plugin</artifactId> <version>2.3</version> <configuration> <createDependencyReducedPom>true</createDependencyReducedPom> <filters> <filter> <artifact>*:*</artifact> <excludes> <exclude>META-INF/*.SF</exclude> <exclude>META-INF/*.DSA</exclude> <exclude>META-INF/*.RSA</exclude> </excludes> </filter> </filters> </configuration> <executions> <execution> <phase>package</phase> <goals> <goal>shade</goal> </goals> <configuration> <transformers> <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" /> <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"> <mainClass>com.packt.microservices.geolocation.GeoLocationApplication</mainClass> </transformer> </transformers> </configuration> </execution> </executions> </plugin> The previous snippet does the following: It creates a runnable uber JAR with a reduced pom.xml file that does not include the dependencies that are added to the uber JAR. To learn more about this property, take a look at the documentation of maven-shade-plugin. It utilizes com.packt.microservices.geolocation.GeoLocationApplication as the class whose main method will be invoked when this JAR is executed. This is done by updating the MANIFEST file. It excludes all signatures from signed JARs. This is required to avoid security errors. Now that our project is properly configured, let's try to build and run it from the command line. To build the project, execute mvn clean package from the project's root directory in your terminal. This will create your final JAR in the target directory. 
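If you want to confirm that the shade plugin wired up the main class correctly, you can inspect the manifest of the generated JAR. Assuming unzip is available on your machine, something like the following should print a Main-Class entry pointing at com.packt.microservices.geolocation.GeoLocationApplication:
unzip -p target/geolocation-dropwizard-0.0.1-SNAPSHOT.jar META-INF/MANIFEST.MF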
Execute the following command to start your microservice: java -jar target/geolocation-dropwizard-0.0.1-SNAPSHOT.jar server The server argument instructs Dropwizard to start the Jetty server. After you issue the command, you should be able to see that Dropwizard has started the in-memory Jetty server on port 8080. If you see any warnings about health checks, ignore them. Your console logs should look something like this: We are now ready to test our application. Let's try to create two geolocations using the POST API and later try to retrieve them using the GET method. Execute the following cURL commands in your terminal one by one: curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 41.803488, "longitude": -88.144040}' http://localhost:8080/geolocation This should give you an output similar to the following (pretty-printed for readability): { "latitude": 41.803488, "longitude": -88.14404, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 9.568012, "longitude": 77.962444}' http://localhost:8080/geolocation This should give you an output like this (pretty-printed for readability): { "latitude": 9.568012, "longitude": 77.962444, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } To verify whether your entities were stored correctly, execute the following cURL command: curl http://localhost:8080/geolocation It should give you an output similar to the following (pretty-printed for readability): [ { "latitude": 41.803488, "longitude": -88.14404, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 }, { "latitude": 9.568012, "longitude": 77.962444, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } ] Excellent! You have created your first microservice with Dropwizard. Dropwizard offers more than what we have seen so far. Some of it is out of scope for this article. I believe the metrics API that Dropwizard uses could be used in any type of application. Writing your Dockerfile So far in this article, we have seen how to package our application and how to install Docker. Now that we have our JAR artifact and Docker set up, let's see how to Dockerize our microservice application using Docker. Getting ready In order to Dockerize our application, we will have to tell Docker how our image is going to look. This is exactly the purpose of a Dockerfile. A Dockerfile has its own syntax (or Dockerfile instructions) and will be used by Docker to create images. Throughout this article, we will try to understand some of the most commonly used Dockerfile instructions as we write our Dockerfile for the geolocation tracker microservice. How to do it… First, open your STS IDE and create a new file called Dockerfile in the geolocation project. The first line of the Dockerfile is always the FROM instruction followed by the base image that you would like to create your image from. There are thousands of images on Docker Hub to choose from. In our case, we would need something that already has Java installed on it. There are some images that are official, meaning they are well documented and maintained. Docker Official Repositories are very well documented, and they follow best practices and standards. Docker has its own team to maintain these repositories. 
This is essential in order to keep the repository clear, thus helping the user make the right choice of repository. To read more about Docker Official Repositories, take a look at https://docs.docker.com/docker-hub/official_repos/ We will be using the Java official repository. To find the official repository, go to hub.docker.com and search for java. You have to choose the one that says official. At the time of writing this, the Java image documentation says it will soon be deprecated in favor of the openjdk image. So the first line of our Dockerfile will look like this: FROM openjdk:8 As you can see, we have used version (or tag) 8 for our image. If you are wondering what type of operating system this image uses, take a look at the Dockerfile of this image, which you can get from the Docker Hub page. Docker images are usually tagged with the version of the software they are written for. That way, it is easy for users to pick from. The next step is creating a directory for our project where we will store our JAR artifact. Add this as your next line: RUN mkdir -p /opt/packt/geolocation This is a simple Unix command that creates the /opt/packt/geolocation directory. The –p flag instructs it to create the intermediate directories if they don't exist. Now let's create an instruction that will add the JAR file that was created in your local machine into the container at /opt/packt/geolocation. ADD target/geolocation-0.0.1-SNAPSHOT.jar /opt/packt/geolocation/ As you can see, we are picking up the uber JAR from target directory and dropping it into the /opt/packt/geolocation directory of the container. Take a look at the / at the end of the target path. That says that the JAR has to be copied into the directory. Before we can start the application, there is one thing we have to do, that is, expose the ports that we would like to be mapped to the Docker host ports. In our case, the in-memory Tomcat instance is running on port 8080. In order to be able to map port 8080 of our container to any port to our Docker host, we have to expose it first. For that, we will use the EXPOSE instruction. Add the following line to your Dockerfile: EXPOSE 8080 Now that we are ready to start the app, let's go ahead and tell Docker how to start a container for this image. For that, we will use the CMD instruction: CMD ["java", "-jar", "/opt/packt/geolocation/geolocation-0.0.1-SNAPSHOT.jar"] There are two things we have to note here. Once is the way we are starting the application and the other is how the command is broken down into comma-separated Strings. First, let's talk about how we start the application. You might be wondering why we haven't used the mvn spring-boot:run command to start the application. Keep in mind that this command will be executed inside the container, and our container does not have Maven installed, only OpenJDK 8. If you would like to use the maven command, take that as an exercise, and try to install Maven on your container and use the mvn command to start the application. Now that we know we have Java installed, we are issuing a very simple java –jar command to run the JAR. In fact, the Spring Boot Maven plugin internally issues the same command. The next thing is how the command has been broken down into comma-separated Strings. This is a standard that the CMD instruction follows. To keep it simple, keep in mind that for whatever command you would like to run upon running the container, just break it down into comma-separated Strings (in whitespaces). 
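This comma-separated list is what Docker calls the exec form of CMD. Docker also accepts a shell form, in which the command is run through /bin/sh -c. Both of the following are valid; the exec form used in this article is generally preferable because the Java process runs as PID 1 and receives stop signals directly:
# exec form - used in this article
CMD ["java", "-jar", "/opt/packt/geolocation/geolocation-0.0.1-SNAPSHOT.jar"]
# shell form - equivalent, but wrapped in /bin/sh -c
CMD java -jar /opt/packt/geolocation/geolocation-0.0.1-SNAPSHOT.jar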
Your final Dockerfile should look something like this: FROM openjdk:8 RUN mkdir -p /opt/packt/geolocation ADD target/geolocation-0.0.1-SNAPSHOT.jar /opt/packt/geolocation/ EXPOSE 8080 CMD ["java", "-jar", "/opt/packt/geolocation/geolocation-0.0.1-SNAPSHOT.jar"] This Dockerfile is one of the simplest implementations. Dockerfiles can sometimes get bigger due to the fact that you need a lot of customizations to your image. In such cases, it is a good idea to break it down into multiple images that can be reused and maintained separately. There are some best practices to follow whenever you create your own Dockerfile and image. Though we haven't covered that here as it is out of the scope of this article, you still should take a look at and follow them. To learn more about the various Dockerfile instructions, go to https://docs.docker.com/engine/reference/builder/. Building your Docker image We created the Dockerfile, which will be used in this article to create an image for our microservice. If you are wondering why we would need an image, it is the only way we can ship our software to any system. Once you have your image created and uploaded to a common repository, it will be easier to pull your image from any location. Getting ready Before you jump right into it, it might be a good idea to get yourself familiar with some of the most commonly used Docker commands. In this article, we will use the build command. Take a look at this URL to understand the other commands: https://docs.docker.com/engine/reference/commandline/#/image-commands. After familiarizing yourself with the commands, open up a new terminal, and change your directory to the root of the geolocation project. Make sure your docker-machine instance is running. If it is not running, use the docker-machine start command to run your docker-machine instance: docker-machine start default If you have to configure your shell for the default Docker machine, go ahead and execute the following command: eval $(docker-machine env default) How to do it… From the terminal, issue the following docker build command: docker build –t packt/geolocation. We'll try to understand the command later. For now, let's see what happens after you issue the preceding command. You should see Docker downloading the openjdk image from Docker Hub. Once the image has been downloaded, you will see that Docker tries to validate each and every instruction provided in the Dockerfile. When the last instruction has been processed, you will see a message saying Successfully built. This says that your image has been successfully built. Now let's try to understand the command. There are three things to note here: The first thing is the docker build command itself. The docker build command is used to build a Docker image from a Dockerfile. It needs at least one input, which is usually the location of the Dockerfile. Dockerfiles can be renamed to something other than Dockerfile and can be referred to using the –f option of the docker build command. An instance of this being used is when teams have different Dockerfiles for different build environments, for example, using DockerfileDev for the dev environment, DockerfileStaging for the staging environment, and DockerfileProd for the production environment. It is still encouraged as best practice to use other Docker options in order to keep the same Dockerfile for all environments. The second thing is the –t option. The –t option takes the name of the repo and a tag. 
In our case, we have not mentioned the tag, so by default, it will pick up latest as the tag. If you look at the repo name, it is different from the official openjdk image name. It has two parts: packt and geolocation. It is always a good practice to put the Docker Hub account name followed by the actual image name as the name of your repo. For now, we will use packt as our account name, we will see how to create our own Docker Hub account and use that account name here. The third thing is the dot at the end. The dot operator says that the Dockerfile is located in the current directory, or the present working directory to be more precise. Let's go ahead and verify whether our image was created. In order to do that, issue the following command on your terminal: docker images The docker images command is used to list down all images available in your Docker host. After issuing the command, you should see something like this: As you can see, the newly built image is listed as packt/geolocation in your Docker host. The tag for this image is latest as we did not specify any. The image ID uniquely identifies your image. Note the size of the image. It is a few megabytes bigger than the openjdk:8 image. That is most probably because of the size of our executable uber JAR inside the container. Now that we know how to build an image using an existing Dockerfile, we are at the end of this article. This is just a very quick intro to the docker build command. There are more options that you can provide to the command, such as CPUs and memory. To learn more about the docker build command, take a look at this page: https://docs.docker.com/engine/reference/commandline/build/ Running your microservice as a Docker container We successfully created our Docker image in the Docker host. Keep in mind that if you are using Windows or Mac, your Docker host is the VirtualBox VM and not your local computer. In this article, we will look at how to spin off a container for the newly created image. Getting ready To spin off a new container for our packt/geolocation image, we will use the docker run command. This command is used to run any command inside your container, given the image. Open your terminal and go to the root of the geolocation project. If you have to start your Docker machine instance, do so using the docker-machine start command, and set the environment using the docker-machine env command. How to do it… Go ahead and issue the following command on your terminal: docker run packt/geolocation Right after you run the command, you should see something like this: Yay! We can see that our microservice is running as a Docker container. But wait—there is more to it. Let's see how we can access our microservice's in-memory Tomcat instance. Try to run a curl command to see if our app is up and running: Open a new terminal instance and execute the following cURL command in that shell: curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 41.803488, "longitude": -88.144040}' http://localhost:8080/geolocation Did you get an error message like this? curl: (7) Failed to connect to localhost port 8080: Connection refused Let's try to understand what happened here. Why would we get a connection refused error when our microservice logs clearly say that it is running on port 8080? 
Yes, you guessed it right: the microservice is not running on your local computer; it is actually running inside the container, which in turn is running inside your Docker host. Here, your Docker host is the VirtualBox VM called default. So we have to replace localhost with the IP of the container. But getting the IP of the container is not straightforward. That is the reason we are going to map port 8080 of the container to the same port on the VM. This mapping will make sure that any request made to port 8080 on the VM will be forwarded to port 8080 of the container. Now go to the shell that is currently running your container, and stop your container. Usually, Ctrl + C will do the job. After your container is stopped, issue the following command: docker run –p 8080:8080 packt/geolocation The –p option does the port mapping from Docker host to container. The port number to the left of the colon indicates the port number of the Docker host, and the port number to the right of the colon indicates that of the container. In our case, both of them are same. After you execute the previous command, you should see the same logs that you saw before. We are not done yet. We still have to find the IP that we have to use to hit our RESTful endpoint. The IP that we have to use is the IP of our Docker Machine VM. To find the IP of the docker-machine instance, execute the following command in a new terminal instance: docker-machine ip default. This should give you the IP of the VM. Let's say the IP that you received was 192.168.99.100. Now, replace localhost in your cURL command with this IP, and execute the cURL command again: curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 41.803488, "longitude": -88.144040}' http://192.168.99.100:8080/geolocation This should give you an output similar to the following (pretty-printed for readability): { "latitude": 41.803488, "longitude": -88.14404, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } This confirms that you are able to access your microservice from the outside. Take a moment to understand how the port mapping is done. The following figure shows how your machine, VM, and container are orchestrated: This confirms that you are able to access your microservice from the outside. Summary We looked at an example of a geolocation tracker application to see how it can be broken down into smaller and manageable services. Next, we saw how to create the GeoLocationTracker service using the Spring Boot framework. Resources for Article: Further resources on this subject: Domain-Driven Design [article] Breaking into Microservices Architecture [article] A capability model for microservices [article]


Turn-based game framework in Duality (C#)

Lőrinc Serfőző
17 Jan 2017
6 min read
This post presents a simple turn-based game framework using the Duality game engine. The engine is written in C#, and scripting it requires the same language. The word 'scripting' is a bit misleading here, because it is really about extending the engine. In this text, the inner workings of Duality are not explained in detail, as that is better done by the official documentation on GitHub. However, if you are familiar with the vocabulary of game development, the tool is rather easy to pick up, and this guide can also be followed. In addition, these concepts can be tailored to fit any other game framework or engine. Required tools Duality can be downloaded from the official site. A C# compiler and a text editor are also needed. Visual Studio 2013 or higher is recommended, but other IDEs like MonoDevelop also work. Overall model Turn-based games were popular long before the computer era, and even these days they are among the most successful releases. These games often require analytical thinking instead of lightning-fast reflexes, and favor longer-term strategy over instinctive decisions. The following list gathers the most typical attributes of the genre, which we need to consider while building a turn-based game: The game process is divided into turns. Usually every player takes action once per turn. Everyone has to wait for their opportunity. The order of the players' actions can be fixed, or based on some sort of initiative mechanic. In the latter case, the order can change between turns. During the game, some players drop out and others can join. The system has to handle these changes. Implementation In the following paragraphs, a prototype-quality system is described in order to achieve these goals. In addition to discrete time measurement, turn-based games often utilize discrete measurement of space, for example a grid-based movement system. We are implementing that as well. The solution contains two distinct building blocks: a manager object, and an entity object which has multiple instances. The manager object, as its name suggests, arranges the order of the entities taking action in the turn and asks them to decide their own action, a movement direction in our case. It should not distinguish between player-controlled entities and AI ones. Thus the actual logic behind the entities can vary, but they need to implement the same methods, so it is clear that we need an interface for that. Defining the ICmpMovementEntity interface public enum Decision { NotDecided, Stay, UpMove, RightMove, DownMove, LeftMove } public interface ICmpMovementEntity // [1] { int Initiative { get; set; } // [2] Decision RequestDecision (); // [3] } The interface's name has the ICmp prefix. This follows a convention in Duality, and indicates that only Component objects should implement it. Initiative returns an integer, used by the manager object to determine the order of entities in the turn. The method RequestDecision returns the enum type Decision. Its value is NotDecided when there is no decision yet. In that case, the same entity is asked again at the next game loop update. If the returned value is Stay, the entity object remains in its place; otherwise it is moved in the returned direction. 
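Because the manager only ever talks to this interface, an AI-controlled entity is simply another ICmpMovementEntity implementation. The following component is a hypothetical example, not part of the original framework: its class name and its random-walk behavior are illustrative assumptions, and as with the other snippets, the using directives (System for Random, plus the Duality namespaces) are omitted:
[RequiredComponent(typeof(Transform))]
public class RandomWalkerCmp : Component, ICmpMovementEntity
{
    private static readonly Random random = new Random ();

    public int Initiative { get; set; } = 0;

    public Decision RequestDecision ()
    {
        // Decide immediately: either stay put or step in one of the four directions.
        switch (random.Next (5))
        {
            case 0: return Decision.Stay;
            case 1: return Decision.UpMove;
            case 2: return Decision.RightMove;
            case 3: return Decision.DownMove;
            default: return Decision.LeftMove;
        }
    }
}
An entity like this never returns NotDecided, so it always resolves its move on the first update in which the manager asks it.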
Implementing the manager object The skeleton of the manager object is the following: internal class TurnMovementManager { private const float GRID = 64; // [1] private readonly HashSet<ICmpMovementEntity> entitiesMovedInTurn = new HashSet<ICmpMovementEntity> (); // [2] private ICmpMovementEntity onTurnEntity; // [3] public void Tick (); // [4] private ICmpMovementEntity GetNextNotMovedEntity (); // [5] private void MoveEntity(ICmpMovementEntity entity, Decision decision); // [6] private void NextTurn (); // [7] } GRID is the discrete measurement step in the game world. entitiesMovedInTurn is the set of entities that have already taken action in the current turn. onTurnEntity keeps track of the entity that is asked to decide its move next. Tick is invoked on every game loop update. The main processing happens here. public void Tick () { onTurnEntity = GetNextNotMovedEntity (); if (onTurnEntity == null) { NextTurn (); return; } var decision = onTurnEntity.RequestDecision (); if (decision != Decision.NotDecided && decision != Decision.Stay) { entitiesMovedInTurn.Add (onTurnEntity); MoveEntity (onTurnEntity, decision); } } GetNextNotMovedEntity collects the entities not moved in the current turn, sorts them by their initiative, and returns the first. If there are no unmoved entities left, it returns null. private ICmpMovementEntity GetNextNotMovedEntity () { var entitiesInScene = Scene.Current.FindComponents<ICmpMovementEntity> (); var notMovedEntities = entitiesInScene.Where (ent => !entitiesMovedInTurn.Contains (ent)).ToList (); Comparison<ICmpMovementEntity> compare = (ent1, ent2) => ent2.Initiative.CompareTo (ent1.Initiative); notMovedEntities.Sort (compare); return notMovedEntities.FirstOrDefault (); } MoveEntity displaces the currently processed entity, according to its decision. private void MoveEntity (ICmpMovementEntity entity, Decision decision) { var entityComponent = onTurnEntity as Component; var transform = entityComponent.GameObj.Transform; Vector2 direction; switch (decision) { case Decision.UpMove: direction = -Vector2.UnitY; break; case Decision.RightMove: direction = Vector2.UnitX; break; case Decision.DownMove: direction = Vector2.UnitY; break; case Decision.LeftMove: direction = -Vector2.UnitX; break; case Decision.NotDecided: case Decision.Stay: default: throw new ArgumentOutOfRangeException(nameof(decision), decision, null); } transform.MoveByAbs(GRID * direction); } NextTurn clears the moved entity set. private void NextTurn () { entitiesMovedInTurn.Clear (); } Driving the manager object from the CorePlugin Developing games in the Duality engine is usually done via Core Plugin development. Every assembly that extends the base engine functionality needs to implement a CorePlugin object. These objects can be used to drive global logic, such as our manager class. The TurnbasedMovementCorePlugin class overrides the OnAfterUpdate method of its superclass to update a TurnMovementManager instance every frame. public class TurnbasedMovementCorePlugin : CorePlugin { private readonly TurnMovementManager turnMovementManager = new TurnMovementManager(); protected override void OnAfterUpdate () { base.OnAfterUpdate (); if (DualityApp.ExecContext == DualityApp.ExecutionContext.Game) { turnMovementManager.Tick (); } } } Trivial test implementation of an ICmpMovementEntity ICmpMovementEntity implementations can be complex, but for demonstration purposes, a simpler one is presented below. It is based on user input. 
[RequiredComponent(typeof(Transform))] public class TurnMovementTestCmp : Component, ICmpMovementEntity { public int Initiative { get; set; } = 1; public Decision RequestDecision () { if (DualityApp.Keyboard.KeyHit (Key.Space)) return Decision.RightMove; return Decision.NotDecided; } } Summary Of course, more complex ICmpMovementEntity implementations are needed for real game logic. I hope you enjoyed this post. In case you have any questions, feel free to post them below, or on the Duality forums. About the author Lőrinc Serfőző is a software engineer at Graphisoft, the company behind the BIM solution ArchiCAD. He is studying mechatronics engineering at Budapest University of Technology and Economics, an interdisciplinary field between the more traditional mechanical engineering, electrical engineering, and informatics, and has quickly grown a passion for software development. He is a supporter of open source software and contributes to the C# and OpenGL-based Duality game engine, creating free plugins and tools for its users.

Chef Language and Style

Packt
17 Jan 2017
14 min read
In this article by Matthias Marschall, author of the book, Chef Cookbook, Third Edition, we will cover the following section: Using community Chef style Using attributes to dynamically configure recipes Using templates Mixing plain Ruby with Chef DSL (For more resources related to this topic, see here.) Introduction If you want to automate your infrastructure, you will end up using most of Chef's language features. In this article, we will look at how to use the Chef Domain Specific Language (DSL) from basic to advanced level. Using community Chef style It's easier to read code that adheres to a coding style guide. It is important to deliver consistently styled code, especially when sharing cookbooks with the Chef community. In this article, you'll find some of the most important rules (out of many more—enough to fill a short book on their own) to apply to your own cookbooks. Getting ready As you're writing cookbooks in Ruby, it's a good idea to follow general Ruby principles for readable (and therefore maintainable) code. Chef Software, Inc. proposes Ian Macdonald's Ruby Style Guide (http://www.caliban.org/ruby/rubyguide.shtml#style), but to be honest, I prefer Bozhidar Batsov's Ruby Style Guide (https://github.com/bbatsov/ruby-style-guide) due to its clarity. Let's look at the most important rules for Ruby in general and for cookbooks specifically. How to do it… Let's walk through a few Chef style guide examples: Use two spaces per indentation level: remote_directory node['nagios']['plugin_dir'] do source 'plugins' end Use Unix-style line endings. Avoid Windows line endings by configuring Git accordingly: mma@laptop:~/chef-repo $ git config --global core.autocrlf true For more options on how to deal with line endings in Git, go to https://help.github.com/articles/dealing-with-line-endings. Align parameters spanning more than one line: variables( mon_host: 'monitoring.example.com', nrpe_directory: "#{node['nagios']['nrpe']['conf_dir']}/nrpe.d" ) Describe your cookbook in metadata.rb (you should always use the Ruby DSL: Version your cookbook according to Semantic Versioning standards (http://semver.org): version "1.1.0" List the supported operating systems by looping through an array using the each method: %w(redhat centos ubuntu debian).each do |os| supports os end Declare dependencies and pin their versions in metadata.rb: depends "apache2", ">= 1.0.4" depends "build-essential" Construct strings from variable values and static parts by using string expansion: my_string = "This resource changed #{counter} files" Download temporary files to Chef::Config['file_cache_path'] instead of /tmp or some local directory. Use strings to access node attributes instead of Ruby symbols: node['nagios']['users_databag_group'] Set attributes in my_cookbook/attributes/default.rb by using default: default['my_cookbook']['version'] = "3.0.11" Create an attribute namespace by using your cookbook name as the first level in my_cookbook/attributes/default.rb: default['my_cookbook']['version'] = "3.0.11" default['my_cookbook']['name'] = "Mine" How it works... Using community Chef style helps to increase the readability of your cookbooks. Your cookbooks will be read much more often than changed. Because of this, it usually pays off to put a little extra effort into following a strict style guide when writing cookbooks. There's more... Using Semantic Versioning (see http://semver.org) for your cookbooks helps to manage dependencies. 
If you change anything that might break cookbooks, depending on your cookbook, you need to consider this as a backwards incompatible API change. In such cases, Semantic Versioning demands that you increase the major number of your cookbook, for example from 1.1.3 to 2.0.0, resetting minor level and patch levels. Using attributes to dynamically configure recipes Imagine some cookbook author has hardcoded the path where the cookbook puts a configuration file, but in a place that does not comply with your rules. Now, you're in trouble! You can either patch the cookbook or rewrite it from scratch. Both options leave you with a headache and lots of work. Attributes are there to avoid such headaches. Instead of hardcoding values inside cookbooks, attributes enable authors to make their cookbooks configurable. By overriding default values set in cookbooks, users can inject their own values. Suddenly, it's next to trivial to obey your own rules. In this section, we'll see how to use attributes in your cookbooks. Getting ready Make sure you have a cookbook called my_cookbook and the run_list of your node includes my_cookbook. How to do it... Let's see how to define and use a simple attribute: Create a default file for your cookbook attributes: mma@laptop:~/chef-repo $ subl cookbooks/my_cookbook/attributes/default.rb Add a default attribute: default['my_cookbook']['message'] = 'hello world!' Use the attribute inside a recipe: mma@laptop:~/chef-repo $ subl cookbooks/my_cookbook/recipes/default.rb message = node['my_cookbook']['message'] Chef::Log.info("** Saying what I was told to say: #{message}") Upload the modified cookbook to the Chef server: mma@laptop:~/chef-repo $ knife cookbook upload my_cookbook Uploading my_cookbook [0.1.0] Run chef-client on your node: user@server:~$ sudo chef-client ...TRUNCATED OUTPUT... [2016-11-23T19:29:03+00:00] INFO: ** Saying what I was told to say: hello world! ...TRUNCATED OUTPUT... How it works… Chef loads all attributes from the attribute files before it executes the recipes. The attributes are stored with the node object. You can access all attributes stored with the node object from within your recipes and retrieve their current values. Chef has a strict order of precedence for attributes: Default is the lowest, then normal (which is aliased with set), and then override. Additionally, attribute levels set in recipes have precedence over the same level set in an attribute file. Also, attributes defined in roles and environments have the highest precedence. You will find an overview chart at https://docs.chef.io/attributes.html#attribute-precedence. There's more… You can set and override attributes within roles and environments. Attributes defined in roles or environments have the highest precedence (on their respective levels: default and override): Create a role: mma@laptop:~/chef-repo $ subl roles/german_hosts.rb name "german_hosts" description "This Role contains hosts, which should print out their messages in German" run_list "recipe[my_cookbook]" default_attributes "my_cookbook" => { "message" => "Hallo Welt!" } Upload the role to the Chef server: mma@laptop:~/chef-repo $ knife role from file german_hosts.rb Updated Role german_hosts! Assign the role to a node called server: mma@laptop:~/chef-repo $ knife node run_list add server 'role[german_hosts]' server: run_list: role[german_hosts] Run the Chef client on your node: user@server:~$ sudo chef-client ...TRUNCATED OUTPUT... [2016-11-23T19:40:56+00:00] INFO: ** Saying what I was told to say: Hallo Welt! 
...TRUNCATED OUTPUT... Calculating values in the attribute files Attributes set in roles and environments (as shown earlier) have the highest precedence and they're already available when the attribute files are loaded. This enables you to calculate attribute values based on role or environment-specific values: Set an attribute within a role: mma@laptop:~/chef-repo $ subl roles/german_hosts.rb name "german_hosts" description "This Role contains hosts, which should print out their messages in German" run_list "recipe[my_cookbook]" default_attributes "my_cookbook" => { "hi" => "Hallo", "world" => "Welt" } Calculate the message attribute, based on the two attributes hi and world: mma@laptop:~/chef-repo $ subl cookbooks/my_cookbook/attributes/default.rb default['my_cookbook']['message'] = "#{node['my_cookbook']['hi']} #{node['my_cookbook']['world']}!" Upload the modified cookbook to your Chef server and run the Chef client on your node to see that it works, as shown in the preceding example. See also Read more about attributes in Chef at https://docs.chef.io/attributes.html Using templates Configuration Management is all about configuring your hosts well. Usually, configuration is carried out by using configuration files. Chef's template resource allows you to recreate these configuration files with dynamic values that are driven by the attributes we've discussed so far in this article. You can retrieve dynamic values from data bags, attributes, or even calculate them on the fly before passing them into a template. Getting ready Make sure you have a cookbook called my_cookbook and that the run_list of your node includes my_cookbook. How to do it… Let's see how to create and use a template to dynamically generate a file on your node: Add a template to your recipe: mma@laptop:~/chef-repo $ subl cookbooks/my_cookbook/recipes/default.rb template '/tmp/message' do source 'message.erb' variables( hi: 'Hallo', world: 'Welt', from: node['fqdn'] ) end Add the ERB template file: mma@laptop:~/chef-repo $ mkdir -p cookbooks/my_cookbook/templates mma@laptop:~/chef-repo $ subl cookbooks/my_cookbook/templates/default/message.erb <%- 4.times do %> <%= @hi %>, <%= @world %> from <%= @from %>! <%- end %> Upload the modified cookbook to the Chef server: mma@laptop:~/chef-repo $ knife cookbook upload my_cookbook Uploading my_cookbook [0.1.0] Run the Chef client on your node: user@server:~$ sudo chef-client ...TRUNCATED OUTPUT... [2016-11-23T19:36:30+00:00] INFO: Processing template[/tmp/message] action create (my_cookbook::default line 9) [2016-11-23T19:36:31+00:00] INFO: template[/tmp/message] updated content ...TRUNCATED OUTPUT... Validate the content of the generated file: user@server:~$ sudo cat /tmp/message Hallo, Welt from vagrant.vm! Hallo, Welt from vagrant.vm! Hallo, Welt from vagrant.vm! Hallo, Welt from vagrant.vm! How it works… Chef uses Erubis as its template language. It allows you to use pure Ruby code by using special symbols inside your templates. These are commonly called the 'angry squid' You use <%= %> if you want to print the value of a variable or Ruby expression into the generated file. You use <%- %> if you want to embed Ruby logic into your template file. We used it to loop our expression four times. When you use the template resource, Chef makes all the variables you pass available as instance variables when rendering the template. We used @hi, @world, and @from in our earlier example. There's more… The node object is available in a template as well. 
Technically, you could access node attributes directly from within your template: <%= node['fqdn'] %> However, this is not a good idea because it will introduce hidden dependencies to your template. It is better to make dependencies explicit, for example, by declaring the fully qualified domain name (FQDN) of your node as a variable for the template resource inside your cookbook: template '/tmp/fqdn' do source 'fqdn.erb' variables( fqdn: node['fqdn'] ) end Avoid using the node object directly inside your templates because this introduces hidden dependencies to node variables in your templates. If you need a different template for a specific host or platform, you can put those specific templates into various subdirectories of the templates directory. Chef will try to locate the correct template by searching these directories from the most specific (host) to the least specific (default). You can place message.erb in the cookbooks/my_cookbook/templates/host-server.vm ("host-#{node[:fqdn]}") directory if it is specific to that host. If it is platform-specific, you can place it in cookbooks/my_cookbook/templates/ubuntu ("#{node[:platform]}"); and if it is specific to a certain platform version, you can place it in cookbooks/my_cookbook/templates/ubuntu-16.04 ("#{node[:platform]}-#{node[:platorm_version]}"). Only place it in the default directory if your template is the same for any host or platform. Know the templates/default directory means that a template file is the same for all hosts and platforms—it does not correspond to a recipe name. See also Read more about templates at https://docs.chef.io/templates.html Mixing plain Ruby with Chef DSL To create simple recipes, you only need to use resources provided by Chef such as template, remote_file, or service. However, as your recipes become more elaborate, you'll discover the need to do more advanced things such as conditionally executing parts of your recipe, looping, or even making complex calculations. Instead of declaring the gem_package resource ten times, simply use different name attributes; it is so much easier to loop through an array of gem names creating the gem_package resources on the fly. This is the power of mixing plain Ruby with Chef Domain Specific Language (DSL). We'll see a few tricks in the following sections. Getting ready Start a chef-shell on any of your nodes in Client mode to be able to access your Chef server, as shown in the following code: user@server:~$ sudo chef-shell --client loading configuration: /etc/chef/client.rb Session type: client ...TRUNCATED OUTPUT... run `help' for help, `exit' or ^D to quit. Ohai2u user@server! chef > How to do it… Let's play around with some Ruby constructs in chef-shell to get a feel for what's possible: Get all nodes from the Chef server by using search from the Chef DSL: chef > nodes = search(:node, "hostname:[* TO *]") => [#<Chef::Node:0x00000005010d38 @chef_server_rest=nil, @name="server", ...TRUNCATED OUTPUT... Sort your nodes by name by using plain Ruby: chef > nodes.sort! { |a, b| a.hostname <=> b.hostname }.collect { |n| n.hostname } => ["alice", "server"] Loop through the nodes, printing their operating systems: chef > nodes.each do |n| chef > puts n['os'] chef ?> end linux windows => [node[server], node[alice]] Log only if there are no nodes: chef > Chef::Log.warn("No nodes found") if nodes.empty? 
=> nil Install multiple Ruby gems by using an array, a loop, and string expansion to construct the gem names: chef > recipe_mode chef:recipe > %w{ec2 essentials}.each do |gem| chef:recipe > gem_package "knife-#{gem}" chef:recipe ?> end => ["ec2", "essentials"] How it works... Chef recipes are Ruby files, which get evaluated in the context of a Chef run. They can contain plain Ruby code, such as if statements and loops, as well as Chef DSL elements such as resources (remote_file, service, template, and so on). Inside your recipes, you can declare Ruby variables and assign them any values. We used the Chef DSL method search to retrieve an array of Chef::Node instances and stored that array in the variable nodes. Because nodes is a plain Ruby array, we can use all methods the array class provides such as sort! or empty? Also, we can iterate through the array by using the plain Ruby each iterator, as we did in the third example. Another common thing is to use if, else, or case for conditional execution. In the fourth example, we used if to only write a warning to the log file if the nodes array are empty. In the last example, we entered recipe mode and combined an array of strings (holding parts of gem names) and the each iterator with the Chef DSL gem_package resource to install two Ruby gems. To take things one step further, we used plain Ruby string expansion to construct the full gem names (knife-ec2 and knife-essentials) on the fly. There's more… You can use the full power of Ruby in combination with the Chef DSL in your recipes. Here is an excerpt from the default recipe from the nagios cookbook, which shows what's possible: # Sort by name to provide stable ordering nodes.sort! { |a, b| a.name <=> b.name } # maps nodes into nagios hostgroups service_hosts = {} search(:role, ‚*:*') do |r| hostgroups << r.name nodes.select { |n| n[‚roles'].include?(r.name) if n[‚roles'] }.each do |n| service_hosts[r.name] = n[node[‚nagios'][‚host_name_attribute']] end end First, they use Ruby to sort an array of nodes by their name attributes. Then, they define a Ruby variable called service_hosts as an empty Hash. After this, you will see some more array methods in action such as select, include?, and each. See also Find out more about how to use Ruby in recipes here: https://docs.chef.io/chef/dsl_recipe.html There's more… If you don't want to modify existing cookbooks, this is currently the only way to modify parts of recipes which are not meant to be configured via attributes. This approach is exactly the same thing as monkey-patching any Ruby class by reopening it in your own source files. This usually leads to brittle code, as your code now depends on implementation details of another piece of code instead of depending on its public interface (in Chef recipes, the public interface is its attributes). Keep such cookbook modifications in a separate place so that you can easily find out what you did later. If you bury your modifications deep inside your complicated cookbooks, you might experience issues later that are very hard to debug. Resources for Article: Further resources on this subject: Getting started with using Chef [article] Going Beyond the Basics [article] An Overview of Automation and Advent of Chef [article]


Flink Complex Event Processing

Packt
16 Jan 2017
13 min read
In this article by Tanmay Deshpande, the author of the book Mastering Apache Flink, we will learn the Table API provided by Apache Flink and how we can use it to process relational data structures. We will start learning more about the libraries provided by Apache Flink and how we can use them for specific use cases. To start with, let's try to understand a library called complex event processing (CEP). CEP is a very interesting but complex topic which has its value in various industries. Wherever there is a stream of events expected, naturally people want to perform complex event processing in all such use cases. Let's try to understand what CEP is all about. (For more resources related to this topic, see here.) What is complex event processing? CEP is a technique to analyze streams of disparate events occurring with high frequency and low latency. These days, streaming events can be found in various industries, for example: In the oil and gas domain, sensor data comes from various drilling tools or from upstream oil pipeline equipment In the security domain, activity data, malware information, and usage pattern data come from various end points In the wearable domain, data comes from various wrist bands with information about your heart beat rate, your activity, and so on In the banking domain, data from credit cards usage, banking activities, and so on It is very important to analyze the variation patterns to get notified in real time about any change in the regular assembly. CEP is able to understand the patterns across the streams of events, sub-events, and their sequences. CEP helps to identify meaningful patterns and complex relationships among unrelated events, and sends notifications in real and near real time to avoid any damage: The preceding diagram shows how the CEP flow works. Even though the flow looks simple, CEP has various abilities such as: Ability to produce results as soon as the input event stream is available Ability to provide computations like aggregation over time and timeout between two events of interest Ability to provide real time/near real time alerts and notifications on detection of complex event patterns Ability to connect and correlate heterogeneous sources and analyze patterns in them Ability to achieve high throughput, low latency processing There are various solutions available in the market. With big data technology advancements, we have multiple options like Apache Spark, Apache Samza, Apache Beam, among others, but none of them have a dedicated library to fit all solutions. Now let us try to understand what we can achieve with Flink's CEP library. Flink CEP Apache Flink provides the Flink CEP library which provides APIs to perform complex event processing. The library consists of the following core components: Event stream Pattern definition Pattern detection Alert generation Flink CEP works on Flink's streaming API called DataStream. A programmer needs to define the pattern to be detected from the stream of events and then Flink's CEP engine detects the pattern and takes appropriate action, such as generating alerts. In order to get started, we need to add following Maven dependency: <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-cep-scala_2.10 --> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-cep-scala_2.10</artifactId> <version>1.1.2</version> </dependency> Event stream A very important component of CEP is its input event stream. We have seen details of DataStream API. 
Now let's use that knowledge to implement CEP. The very first thing we need to do is define a Java POJO for the event. Let's assume we need to monitor a temperature sensor event stream. First we define an abstract class and then extend this class. While defining the event POJOs we need to make sure that we implement the hashCode() and equals() methods, as while comparing the events, compile will make use of them. The following code snippets demonstrate this. First, we write an abstract class as shown here: package com.demo.chapter05; public abstract class MonitoringEvent { private String machineName; public String getMachineName() { return machineName; } public void setMachineName(String machineName) { this.machineName = machineName; } @Override public int hashCode() { final int prime = 31; int result = 1; result = prime * result + ((machineName == null) ? 0 : machineName.hashCode()); return result; } @Override public boolean equals(Object obj) { if (this == obj) return true; if (obj == null) return false; if (getClass() != obj.getClass()) return false; MonitoringEvent other = (MonitoringEvent) obj; if (machineName == null) { if (other.machineName != null) return false; } else if (!machineName.equals(other.machineName)) return false; return true; } public MonitoringEvent(String machineName) { super(); this.machineName = machineName; } } Then we write the actual temperature event: package com.demo.chapter05; public class TemperatureEvent extends MonitoringEvent { public TemperatureEvent(String machineName) { super(machineName); } private double temperature; public double getTemperature() { return temperature; } public void setTemperature(double temperature) { this.temperature = temperature; } @Override public int hashCode() { final int prime = 31; int result = super.hashCode(); long temp; temp = Double.doubleToLongBits(temperature); result = prime * result + (int) (temp ^ (temp >>> 32)); return result; } @Override public boolean equals(Object obj) { if (this == obj) return true; if (!super.equals(obj)) return false; if (getClass() != obj.getClass()) return false; TemperatureEvent other = (TemperatureEvent) obj; if (Double.doubleToLongBits(temperature) != Double.doubleToLongBits(other.temperature)) return false; return true; } public TemperatureEvent(String machineName, double temperature) { super(machineName); this.temperature = temperature; } @Override public String toString() { return "TemperatureEvent [getTemperature()=" + getTemperature() + ", getMachineName()=" + getMachineName() + "]"; } } Now we can define the event source as shown follows. 
In Java: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); DataStream<TemperatureEvent> inputEventStream = env.fromElements(new TemperatureEvent("xyz", 22.0), new TemperatureEvent("xyz", 20.1), new TemperatureEvent("xyz", 21.1), new TemperatureEvent("xyz", 22.2), new TemperatureEvent("xyz", 22.1), new TemperatureEvent("xyz", 22.3), new TemperatureEvent("xyz", 22.1), new TemperatureEvent("xyz", 22.4), new TemperatureEvent("xyz", 22.7), new TemperatureEvent("xyz", 27.0)); In Scala: val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment val input: DataStream[TemperatureEvent] = env.fromElements(new TemperatureEvent("xyz", 22.0), new TemperatureEvent("xyz", 20.1), new TemperatureEvent("xyz", 21.1), new TemperatureEvent("xyz", 22.2), new TemperatureEvent("xyz", 22.1), new TemperatureEvent("xyz", 22.3), new TemperatureEvent("xyz", 22.1), new TemperatureEvent("xyz", 22.4), new TemperatureEvent("xyz", 22.7), new TemperatureEvent("xyz", 27.0)) Pattern API Pattern API allows you to define complex event patterns very easily. Each pattern consists of multiple states. To go from one state to another state, generally we need to define the conditions. The conditions could be continuity or filtered out events. Let's try to understand each pattern operation in detail. Begin The initial state can be defined as follows: In Java: Pattern<Event, ?> start = Pattern.<Event>begin("start"); In Scala: val start : Pattern[Event, _] = Pattern.begin("start") Filter We can also specify the filter condition for the initial state: In Java: start.where(new FilterFunction<Event>() { @Override public boolean filter(Event value) { return ... // condition } }); In Scala: start.where(event => ... /* condition */) Subtype We can also filter out events based on their sub-types, using the subtype() method. In Java: start.subtype(SubEvent.class).where(new FilterFunction<SubEvent>() { @Override public boolean filter(SubEvent value) { return ... // condition } }); In Scala: start.subtype(classOf[SubEvent]).where(subEvent => ... /* condition */) Or Pattern API also allows us define multiple conditions together. We can use OR and AND operators. In Java: pattern.where(new FilterFunction<Event>() { @Override public boolean filter(Event value) { return ... // condition } }).or(new FilterFunction<Event>() { @Override public boolean filter(Event value) { return ... // or condition } }); In Scala: pattern.where(event => ... /* condition */).or(event => ... /* or condition */) Continuity As stated earlier, we do not always need to filter out events. There can always be some pattern where we need continuity instead of filters. Continuity can be of two types – strict continuity and non-strict continuity. Strict continuity Strict continuity needs two events to succeed directly which means there should be no other event in between. This pattern can be defined by next(). In Java: Pattern<Event, ?> strictNext = start.next("middle"); In Scala: val strictNext: Pattern[Event, _] = start.next("middle") Non-strict continuity Non-strict continuity can be stated as other events are allowed to be in between the specific two events. This pattern can be defined by followedBy(). In Java: Pattern<Event, ?> nonStrictNext = start.followedBy("middle"); In Scala: val nonStrictNext : Pattern[Event, _] = start.followedBy("middle") Within Pattern API also allows us to do pattern matching based on time intervals. We can define a time-based temporal constraint as follows. 
In Java: next.within(Time.seconds(30)); In Scala: next.within(Time.seconds(10)) Detecting patterns To detect the patterns against the stream of events, we need run the stream though the pattern. The CEP.pattern() returns PatternStream. The following code snippet shows how we can detect a pattern. First the pattern is defined to check if temperature value is greater than 26.0 degrees in 10 seconds. In Java: Pattern<TemperatureEvent, ?> warningPattern = Pattern.<TemperatureEvent> begin("first") .subtype(TemperatureEvent.class).where(new FilterFunction<TemperatureEvent>() { public boolean filter(TemperatureEvent value) { if (value.getTemperature() >= 26.0) { return true; } return false; } }).within(Time.seconds(10)); PatternStream<TemperatureEvent> patternStream = CEP.pattern(inputEventStream, warningPattern); In Scala: val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment val input = // data val pattern: Pattern[TempEvent, _] = Pattern.begin("start").where(event => event.temp >= 26.0) val patternStream: PatternStream[TempEvent] = CEP.pattern(input, pattern) Use case – complex event processing on temperature sensor In earlier sections, we learnt various features provided by the Flink CEP engine. Now it's time to understand how we can use it in real-world solutions. For that let's assume we work for a mechanical company which produces some products. In the product factory, there is a need to constantly monitor certain machines. The factory has already set up the sensors which keep on sending the temperature of the machines at a given time. Now we will be setting up a system that constantly monitors the temperature value and generates an alert if the temperature exceeds a certain value. We can use the following architecture: Here we will be using Kafka to collect events from sensors. In order to write a Java application, we first need to create a Maven project and add the following dependency: <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-cep-scala_2.10 --> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-cep-scala_2.10</artifactId> <version>1.1.2</version> </dependency> <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java_2.10 --> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java_2.10</artifactId> <version>1.1.2</version> </dependency> <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-scala_2.10 --> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-scala_2.10</artifactId> <version>1.1.2</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-connector-kafka-0.9_2.10</artifactId> <version>1.0.0</version> </dependency> Next we need to do following things for using Kafka. First we need to define a custom Kafka deserializer. This will read bytes from a Kafka topic and convert it into TemperatureEvent. The following is the code to do this. 
EventDeserializationSchema.java: package com.demo.chapter05; import java.io.IOException; import java.nio.charset.StandardCharsets; import org.apache.flink.api.common.typeinfo.TypeInformation; import org.apache.flink.api.java.typeutils.TypeExtractor; import org.apache.flink.streaming.util.serialization.DeserializationSchema; public class EventDeserializationSchema implements DeserializationSchema<TemperatureEvent> { public TypeInformation<TemperatureEvent> getProducedType() { return TypeExtractor.getForClass(TemperatureEvent.class); } public TemperatureEvent deserialize(byte[] arg0) throws IOException { String str = new String(arg0, StandardCharsets.UTF_8); String[] parts = str.split("="); return new TemperatureEvent(parts[0], Double.parseDouble(parts[1])); } public boolean isEndOfStream(TemperatureEvent arg0) { return false; } } Next we create topics in Kafka called temperature: bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic temperature Now we move to Java code which would listen to these events in Flink streams: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); Properties properties = new Properties(); properties.setProperty("bootstrap.servers", "localhost:9092"); properties.setProperty("group.id", "test"); DataStream<TemperatureEvent> inputEventStream = env.addSource( new FlinkKafkaConsumer09<TemperatureEvent>("temperature", new EventDeserializationSchema(), properties)); Next we will define the pattern to check if the temperature is greater than 26.0 degrees Celsius within 10 seconds: Pattern<TemperatureEvent, ?> warningPattern = Pattern.<TemperatureEvent> begin("first").subtype(TemperatureEvent.class).where(new FilterFunction<TemperatureEvent>() { private static final long serialVersionUID = 1L; public boolean filter(TemperatureEvent value) { if (value.getTemperature() >= 26.0) { return true; } return false; } }).within(Time.seconds(10)); Next match this pattern with the stream of events and select the event. We will also add up the alert messages into results stream as shown here: DataStream<Alert> patternStream = CEP.pattern(inputEventStream, warningPattern) .select(new PatternSelectFunction<TemperatureEvent, Alert>() { private static final long serialVersionUID = 1L; public Alert select(Map<String, TemperatureEvent> event) throws Exception { return new Alert("Temperature Rise Detected:" + event.get("first").getTemperature() + " on machine name:" + event.get("first").getMachineName()); } }); In order to know the alerts generated, we will print the results: patternStream.print(); And we execute the stream: env.execute("CEP on Temperature Sensor"); Now we are all set to execute the application. So as and when we get messages in Kafka topics, the CEP will keep on executing. The actual execution will looks like the following. Example input: xyz=21.0 xyz=30.0 LogShaft=29.3 Boiler=23.1 Boiler=24.2 Boiler=27.0 Boiler=29.0 Example output: Connected to JobManager at Actor[akka://flink/user/jobmanager_1#1010488393] 10/09/2016 18:15:55 Job execution switched to status RUNNING. 
10/09/2016 18:15:55 Source: Custom Source(1/4) switched to SCHEDULED 10/09/2016 18:15:55 Source: Custom Source(1/4) switched to DEPLOYING 10/09/2016 18:15:55 Source: Custom Source(2/4) switched to SCHEDULED 10/09/2016 18:15:55 Source: Custom Source(2/4) switched to DEPLOYING 10/09/2016 18:15:55 Source: Custom Source(3/4) switched to SCHEDULED 10/09/2016 18:15:55 Source: Custom Source(3/4) switched to DEPLOYING 10/09/2016 18:15:55 Source: Custom Source(4/4) switched to SCHEDULED 10/09/2016 18:15:55 Source: Custom Source(4/4) switched to DEPLOYING 10/09/2016 18:15:55 CEPPatternOperator(1/1) switched to SCHEDULED 10/09/2016 18:15:55 CEPPatternOperator(1/1) switched to DEPLOYING 10/09/2016 18:15:55 Map -> Sink: Unnamed(1/4) switched to SCHEDULED 10/09/2016 18:15:55 Map -> Sink: Unnamed(1/4) switched to DEPLOYING 10/09/2016 18:15:55 Map -> Sink: Unnamed(2/4) switched to SCHEDULED 10/09/2016 18:15:55 Map -> Sink: Unnamed(2/4) switched to DEPLOYING 10/09/2016 18:15:55 Map -> Sink: Unnamed(3/4) switched to SCHEDULED 10/09/2016 18:15:55 Map -> Sink: Unnamed(3/4) switched to DEPLOYING 10/09/2016 18:15:55 Map -> Sink: Unnamed(4/4) switched to SCHEDULED 10/09/2016 18:15:55 Map -> Sink: Unnamed(4/4) switched to DEPLOYING 10/09/2016 18:15:55 Source: Custom Source(2/4) switched to RUNNING 10/09/2016 18:15:55 Source: Custom Source(3/4) switched to RUNNING 10/09/2016 18:15:55 Map -> Sink: Unnamed(1/4) switched to RUNNING 10/09/2016 18:15:55 Map -> Sink: Unnamed(2/4) switched to RUNNING 10/09/2016 18:15:55 Map -> Sink: Unnamed(3/4) switched to RUNNING 10/09/2016 18:15:55 Source: Custom Source(4/4) switched to RUNNING 10/09/2016 18:15:55 Source: Custom Source(1/4) switched to RUNNING 10/09/2016 18:15:55 CEPPatternOperator(1/1) switched to RUNNING 10/09/2016 18:15:55 Map -> Sink: Unnamed(4/4) switched to RUNNING 1> Alert [message=Temperature Rise Detected:30.0 on machine name:xyz] 2> Alert [message=Temperature Rise Detected:29.3 on machine name:LogShaft] 3> Alert [message=Temperature Rise Detected:27.0 on machine name:Boiler] 4> Alert [message=Temperature Rise Detected:29.0 on machine name:Boiler] We can also configure a mail client and use some external web hook to send e-mail or messenger notifications. The code for the application can be found on GitHub: https://github.com/deshpandetanmay/mastering-flink. Summary We learnt about complex event processing (CEP). We discussed the challenges involved and how we can use the Flink CEP library to solve CEP problems. We also learnt about Pattern API and the various operators we can use to define the pattern. In the final section, we tried to connect the dots and see one complete use case. With some changes, this setup can be used as it is present in various other domains as well. We will see how to use Flink's built-in Machine Learning library to solve complex problems. Resources for Article: Further resources on this subject: Getting Started with Apache Spark DataFrames [article] Getting Started with Apache Hadoop and Apache Spark [article] Integrating Scala, Groovy, and Flex Development with Apache Maven [article]

Tabular Models

Packt
16 Jan 2017
15 min read
In this article by Derek Wilson, the author of the book Tabular Modeling with SQL Server 2016 Analysis Services Cookbook, you will learn the following recipes: Opening an existing model Importing data Modifying model relationships Modifying model measures Modifying model columns Modifying model hierarchies Creating a calculated table Creating key performance indicators (KPIs) Modifying key performance indicators (KPIs) Deploying a modified model (For more resources related to this topic, see here.) Once the new data is loaded into the model, we will modify various pieces of the model, including adding a new Key Performance Indicator. Next, we will perform calculations to see how to create and modify measures and columns. Opening an existing model We will open the model. To make modifications to your deployed models, we will need to open the model in the Visual Studio designer. How to do it… Open your solution, by navigating to File | Open | Project/Solution. Then select the folder and solution Chapter3_Model and select Open. Your solution is now open and ready for modification. How it works… Visual Studio stores the model as a project inside of a solution. In Chapter 3 we created a new project and saved it as Chapter3_Model. To make modifications to the model we open it in Visual Studio. Importing data The crash data has many columns that store the data in codes. In order to make this data useful for reporting, we need to add description columns. In this section, we will create four code tables by importing data into a SQL Server database. Then, we will add the tables to your existing model. Getting ready In the database on your SQL Server, run the following scripts to create the four tables and populate them with the reference data: Create the Major Cause of Accident Reference Data table: CREATE TABLE [dbo].[MAJCSE_T](   [MAJCSE] [int] NULL,   [MAJOR_CAUSE] [varchar](50) NULL ) ON [PRIMARY] Then, populate the table with data: INSERT INTO MAJCSE_T VALUES (20, 'Overall/rollover'), (21, 'Jackknife'), (31, 'Animal'), (32, 'Non-motorist'), (33, 'Vehicle in Traffic'), (35, 'Parked motor vehicle'), (37, 'Railway vehicle'), (40, 'Collision with bridge'), (41, 'Collision with bridge pier'), (43, 'Collision with curb'), (44, 'Collision with ditch'), (47, 'Collision culvert'), (48, 'Collision Guardrail - face'), (50, 'Collision traffic barrier'), (53, 'impact with Attenuator'), (54, 'Collision with utility pole'), (55, 'Collision with traffic sign'), (59, 'Collision with mailbox'), (60, 'Collision with Tree'), (70, 'Fire'), (71, 'Immersion'), (72, 'Hit and Run'), (99, 'Unknown') Create the table to store the lighting conditions at the time of the crash: CREATE TABLE [dbo].[LIGHT_T](   [LIGHT] [int] NULL,   [LIGHT_CONDITION] [varchar](30) NULL ) ON [PRIMARY] Now, populate the data that shows the descriptions for the codes: INSERT INTO LIGHT_T VALUES (1, 'Daylight'), (2, 'Dusk'), (3, 'Dawn'), (4, 'Dark, roadway lighted'), (5, 'Dark, roadway not lighted'), (6, 'Dark, unknown lighting'), (9, 'Unknown') Create the table to store the road conditions: CREATE TABLE [dbo].[CSRFCND_T](   [CSRFCND] [int] NULL,   [SURFACE_CONDITION] [varchar](50) NULL ) ON [PRIMARY] Now populate the road condition descriptions: INSERT INTO CSRFCND_T VALUES (1, 'Dry'), (2, 'Wet'), (3, 'Ice'), (4, 'Snow'), (5, 'Slush'), (6, 'Sand, Mud'), (7, 'Water'), (99, 'Unknown') Finally, create the weather table: CREATE TABLE [dbo].[WEATHER_T](   [WEATHER] [int] NULL,   [WEATHER_CONDITION] [varchar](30) NULL ) ON [PRIMARY] Then populate the 
weather condition descriptions. INSERT INTO WEATHER_T VALUES (1, 'Clear'), (2, 'Partly Cloudy'), (3, 'Cloudy'), (5, 'Mist'), (6, 'Rain'), (7, 'Sleet, hail, freezing rain'), (9, 'Severe winds'), (10, 'Blowing Sand'), (99, 'Unknown') You now have the tables and data required to complete the recipes in this chapter. How to do it… From your open model, change to the Diagram view in model.bim. Navigate to Model | Import from Data Source then select Microsoft SQL Server on the Table Import Wizard and click on Next. Set your Server Name to Localhost and change the Database name to Chapter3 and click on Next. Enter your admin account username and password and click on Next. You want to select from a list of tables the four tables that were created at the beginning. Click on Finish to import the data. How it works… This recipe opens the table import wizard and allows us to select the four new tables that are to be added to the existing model. The data is then imported into your Tabular Model workspace. Once imported, the data is now ready to be used to enhance the model. Modifying model relationships We will create the necessary relationships for the new tables. These relationships will be used in the model in order for the SSAS engine to perform correct calculations. How to do it… Open your model to the diagram view and you will see the four tables that you imported from the previous recipe. Select the CSRFCND field in the CSRFCND_T table and drag the CSRFCND table in the Crash_Data table. Select the LIGHT field in the LIGHT_T table and drag to the LIGHT table in the Crash_Data table. Select the MAJCSE field in the MAJCSE_T table and drag to the MAJCSE table in the Crash_Data table. Select the WEATHER field in the WEATHER_T table and drag to the WEATHER table in the Crash_Data table. How it works… Each table in this section has a relationship built between the code columns and the Crash_Data table corresponding columns. These relationships allow for DAX calculations to be applied across the data tables. Modifying model measures Now that there are more tables in the model, we are going to add an additional measure to perform quick calculations on data. The measure will use a simple DAX calculation since it is focused on how to add or modify the model measures. How to do it… Open the Chapter 3 model project to the Model.bim folder and make sure you are in grid view. Select the cell under Count_of_Crashes and in the fx bar add the following DAX formula to create Sum_of_Fatalities: Sum_of_Fatalities:=SUM(Crash_Data[FATALITIES]) Then, hit Enter to create the calculation: In the properties window, enter Injury_Calculations in the Display Folder. Then, change the Format to Whole Number and change the Show Thousand Separator to True. Finally, add to Description Total Number of Fatalities Recorded: How it works… In this recipe, we added a new measure to the existing model that calculates the total number of fatalities on the Crash_Data table. Then we added a new folder for the users to see the calculation. We also modified the default behavior of the calculation to display as a whole number and show commas to make the numbers easier to interpret. Finally, we added a description to the calculation that users will be able to see in the reporting tools. If we did not make these changes in the model, each user will be required to make the changes each time they accessed the model. By placing the changes in the model, everyone will see the data in the same format. 
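If you need further measures that follow the same pattern, they are added in exactly the same way from the measure grid. As an illustration only (this measure is not part of the original recipe), an average-severity measure could be defined as: Avg_Fatalities_Per_Crash:=DIVIDE(SUM(Crash_Data[FATALITIES]), COUNTROWS(Crash_Data)) Using DIVIDE rather than the / operator returns BLANK instead of an error if the crash count is ever zero, and the same Display Folder, Format, and Description properties can then be set on it as shown above.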
Modifying model columns We will modify the properties of the columns on the WEATHER table. Modifications to the columns in a table make the information easier for your users to understand in the reporting tools. Some properties determine how the SSAS engine uses the fields when creating the model on the server. How to do it… In Model.bim, make sure you are in the grid view and change to the WEATHER_T tab. Select the WEATHER column to view the available Properties and make the following changes: set the Hidden property to True, the Unique property to True, Sort By Column to WEATHER_CONDITION, and Summarize By to Count. Next, select the WEATHER_CONDITION column and modify the following properties: set the Description to Weather at time of crash and the Default Label property to True. How it works… This recipe modified the properties of the columns to make it easier for your report users to access the data. The WEATHER code column was hidden so it will not be visible in the reporting tools, and the WEATHER_CONDITION column was sorted in alphabetical order. You set the default aggregation to Count and then added a description for the column. Now, when this dimension is added to a report, only the WEATHER_CONDITION column will be seen, pre-sorted on the WEATHER_CONDITION field. It will also use count as the aggregation type to provide the number of each type of weather condition. If you were to add another new weather description to the table, it would automatically be sorted correctly. Modifying model hierarchies Once you have created a hierarchy, you may want to remove or modify it in your model. We will make modifications to the Calendar_YQMD hierarchy. How to do it… Open Model.bim to the diagram view and find the Master_Calendar_T table. Review the Calendar_YQMD hierarchy and its included columns. Select the Quarter_Name column and right-click on it to bring up the menu. Select Remove from Hierarchy to delete Quarter_Name from the hierarchy and confirm on the next screen by selecting Remove from Hierarchy. Select the Calendar_YQMD hierarchy, right-click on it, and select Rename. Change the name to Calendar_YMD and hit Enter. How it works… In this recipe, we opened the diagram view and selected the Master_Calendar_T table to find the existing hierarchy. After selecting the Quarter_Name column in the hierarchy, we used the menus to view the available options for modifications. Then we selected the option to remove the column from the hierarchy. Finally, we updated the name of the hierarchy to let users know that the quarter column is not included. There’s more… Another option to remove fields from the hierarchy is to select the column and then press the Delete key. Likewise, you can double-click on the hierarchy name to bring up the edit window for the name. Then edit the name and hit Enter to save the change in the designer. Creating a calculated table Calculated tables are created dynamically using functions or DAX queries. They are very useful if you need to create a new table based on information in another table. For example, you could have a date table with 30 years of data. However, most of your users only look at the last five years of information when running most of their analysis. Instead of creating a new table, you can dynamically make a new table that only stores the last five years of dates. You will use a single DAX query to filter the Master_Calendar_T table to the last 5 years of data. How to do it… Open Model.bim to the grid view and then select the Table menu and New Calculated Table.
A new data tab is created. In the function box, enter this DAX formula to create a date calendar for the last 5 years: FILTER(MasterCalendar_T, MasterCalendar_T[Date]>=DATEADD(MasterCalendar_T[Date],6,YEAR)) Double-click on the CalculatedTable 1 tab and rename it to Last_5_Years_T. How it works… This recipe works by creating a new table in the model that is built from a DAX formula. In order to limit the number of years shown, the DAX formula reduces the available dates to the last 5 years. There’s more… After you create a calculated table, you will need to create the necessary relationships and hierarchies, just like for a regular table: Switch to the diagram view in Model.bim and you will be able to see the new table. Create a new hierarchy, name it Last_5_Years_YQM, and include Year, Quarter_Name, Month_Name, and Date. Replace the Master_Calendar_T relationship with one from the Date column of Last_5_Years_T to the Crash_Data.Crash_Date column. Now, the model will only display the last 5 years of crash data when using the Last_5_Years_T table in the reporting tools. The Crash_Data table still contains all of the records if you need to view more than 5 years of data. Creating key performance indicators (KPIs) Key performance indicators are business metrics that show the effectiveness of a business objective. They are used to track actual performance against a budgeted or planned value, such as Service Level Agreements or on-time performance. The advantage of creating a KPI is the ability to quickly see the actual value compared to the target value. To add a KPI, you will need to have a measure to use as the actual and another measure that returns the target value. In this recipe, we will create a KPI that tracks the number of fatalities and compares it to the prior year, with the goal of having fewer fatalities each year. How to do it… Open Model.bim to the grid view, select an empty cell, and create a new measure named Last_Year_Fatalities: Last_Year_Fatalities:=CALCULATE(SUM(Crash_Data[FATALITIES]), DATEADD(MasterCalendar_T[Date],-1, YEAR)) Select the already existing Sum_of_Fatalities measure, then right-click and select Create KPI…. On the Key Performance Indicator (KPI) window, select Last_Year_Fatalities as the Target Measure. Then, select the second set of icons that have red, yellow, and green with symbols. Finally, change the KPI color scheme to green, yellow, and red, make the scores 90 and 97, and then click on OK. The Sum_of_Fatalities measure will now have a small graph next to it in the measure grid to show that there is a KPI on that measure. How it works… You created a new calculation that compares the actual count of fatalities to the same number for the prior year. Then you created a new KPI that uses the actual measure and the Last_Year_Fatalities measure. In the KPI window, you set up thresholds to determine when a KPI is red, yellow, or green. For this example, you want to show that having fewer fatalities year over year is better. Therefore, when the KPI is 97% or higher, it will show red. For values in the range of 90% to 97% the KPI is yellow, and anything below 90% is green. By selecting the icons with both color and symbols, users who are color-blind can still determine the state of the KPI. Modifying key performance indicators (KPIs) Once you have created a KPI, you may want to remove or modify it in your model. You will make modifications to the KPI you just created on the Sum_of_Fatalities measure.
How to do it… Open Model.bim to the Grid view and select the Sum_of_Fatalities measure then right-click to bring up Edit KPI settings…. Edit the appropriate settings to modify an existing KPI. How it works… Just like models, KPIs will need to be modified after being initially designed. The icon next to a measure denotes that a KPI is defined on the measure. Right-clicking on the measure brings up the menu that allows you to enter the Edit KPI setting. Deploying a modified model Once you have completed the changes to your model, you have two options for deployment. First, you can deploy the model and replace the existing model. Alternatively, you can change the name of your model and deploy it as a new model. This is often useful when you need to test changes and maintain the existing model as is. How to do it… Open the Chapter3_model project in Visual Studio. Select the Project menu and select Chapter3_Model Properties… to bring up the Properties menu and review the Server and Database properties. To overwrite an existing model make no changes and click on OK. Select the Build menu from the Chapter3_Model project and select the Deploy Chapter3_Model option. On the following screens, enter the impersonation credentials for your data and hit OK to deploy the changes. How it works… the model that is on your local machine and submits the changes to the server. By not making any changes to the existing model properties, a new deployment will overwrite the old model. All of your changes are now published on the server and users can begin to leverage the changes. There’s more… Sometimes you might want to deploy your model to a different database without overwriting the existing environment. This could be to try out a new model or test different functionality with users that you might want to implement. You can modify the properties of the project to deploy to a different server such as development, UAT, or production. Likewise, you can also change the database name to deploy the model to the same server or different servers for testing. Open the Project menu and then select Chapter3_Model Properties. Change the name of the Database to Chapter4_Model and click on OK. Next, on the Build menu, select Deploy Chapter3_Model to deploy the model to the same server under the new name of Chapter4_Model. When you review the Analysis Services databases in SQL Server Management Studio, you will now see a database for Chapter3_Model and Chapter4_Model. Summary After building a model, we will need to maintain and enhance the model as the business users update or change their requirements. We will begin by adding additional tables to the model that contain the descriptive data columns for several code columns. Then we will create relationships between these new tables and the existing data tables. Resources for Article: Further resources on this subject: Say Hi to Tableau [article] Data Tables and DataTables Plugin in jQuery 1.3 with PHP [article] Data Science with R [article]

How to Add an Intent to Your Amazon Echo Skill

Antonio Cucciniello
16 Jan 2017
5 min read
If you are new to Amazon Echo Development and have no idea what an intent is, you have come to the right place. What is an intent, you ask? It is basically a way for your skill to know what to do when you say a specific phrase to your Amazon Echo. For example, If I said "Alexa, ask Greeter to say goodbye," and I wanted Alexa to respond with "Goodbye", I would create an intent that handles what the echo does (and ultimately how it responds) when it hears the phrase I gave it. Without adding your own intents, the echo skill would not be able to handle the different functions you would like it to. Now, this involves two different forms of work: one in your Amazon Developer Portal and the other in your code. Developer Portal First, we should talk about the Developer Portal work. There are two different parts to setting up your Developer Portal for an intent. Intent Schema When you are logged in to the Developer Portal for your skill, on the left hand side, go to the Interaction Model section. Once you are on that page, go to the Intent Schema section. Your Intent Schema is basically where you create your intents in the JSON format for Alexa to understand them. For a basic example, we will be creating an intent called HelloWorldIntent: Once your Intent Schema looks like this, you have now added a new intent to your project. In order for Alexa to know when you want to select this intent, you need to add Sample Utterances. Sample Utterances If you scroll down to the bottom of the page we were on to edit the Intent Schema, you will see a section called Sample Utterances. A Sample Utterance is a bunch of example phrases that a user would say to invoke this specific intent. This is how Alexa takes what you say and knows what to do with it. For example, we could add the statements in this image as Sample Utterances for our HelloWorldIntent: The format for this is: IntentName + the phrase you would like to say. So, now when a user is using this skill, he can say "Alexa, ask Greeter to say hello." Now Alexa would take that phrase and know what to do with it because you have coded to handle this intent. In Code Up to this point, everything you have done has been in the Amazon Developer Portal for your skill. Now that the boring part is out of the way, you can learn how to actually handle the intent's functionality through code. Intent Function The first step is to create a file in the project’s root directory called hello-world-intent.js. This is going to be a file that contains a function for handling your HelloWorldIntent. Here is what the code should look like for basic handling of the intent: // hello-world-intent.js module.exports = HelloWorldIntent function HelloWorldIntent (intent, session, response) { var output = 'Hello World' response.tell(output) return } If you are familiar with Node.js, you probably know of module.exports for allowing the programmer to add code from different files using require('./fileName.js'). This allows us to use the HelloWorldIntent function in our main js file. Our function is pretty basic, it takes three parameters: intent, session, and response. For the purpose of this lesson, we will only be using response. Response allows the user to send multiple types of responses through Alexa. For this example, we will be using response.tell(). At the begining of our function, we created a basic string that says "Hello World" and stored that in a variable called output. The line, response.tell(output), tells Alexa to respond to the user with "Hello World". 
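The same pattern extends to any other intent you define in the Developer Portal. For instance, the goodbye example from the introduction could be handled by an almost identical file, for example a hypothetical goodbye-intent.js (assuming the same response object used above):

// goodbye-intent.js
module.exports = GoodbyeIntent

function GoodbyeIntent (intent, session, response) {
  var output = 'Goodbye'
  response.tell(output)
  return
}

It would then need its own entry in the Intent Schema, its own Sample Utterances (for example, GoodbyeIntent say goodbye), and a line in the intentHandlers map exactly like the one shown for HelloWorldIntent in the next section.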
Main file In your main js file (the one you point AWS Lambda to), add the following line in the top section of your file: // main.js var HelloWorldIntent = require('./hello-world-intent.js') This is why we used module.exports before: in order to be able to import your HelloWorldIntent function into the main file. Now, go to the section of your file where you have: ServiceName.prototype.intentHandlers = {}. This is how you connect the work you did in the Developer Portal to the work you did in your code. To add an intent handler for an intent, add a line inside the brackets like this: 'IntentNameInDevPortal' : IntentFunctionCodeName. So in the end your code should look as follows: GreeterService.prototype.intentHandlers = { 'HelloWorldIntent' : HelloWorldIntent } Conclusion Congrats! You have just added your first intent from scratch. To summarize what happens with your skill from start to finish when invoking an intent, here is a quick breakdown: You say, "Alexa, ask Greeter to say hello." Alexa listens to what you are saying. She takes the words you said and tries to figure out what to do with them by comparing them to the Sample Utterances for the intents you have created. She realizes you are invoking the HelloWorldIntent. She now executes the code you have established for the HelloWorldIntent (the code in the HelloWorldIntent function). She responds with "Hello World." Possible Resources Use my skill here: Edit Docs Check out the Code for my skill on GitHub Alexa Skills Kit Custom Interaction Model Reference Implementing the Built-in Intents About the author Antonio is a Software Engineer with a background in C, C++, and JavaScript (Node.js) from New Jersey. His most recent project, called Edit Docs, is an Amazon Echo skill that allows users to edit Google Drive files using their voice. He loves building cool things with software, and reading books on self-help and improvement, finance, and entrepreneurship. To contact Antonio, email him at Antonio.cucciniello16@gmail.com, follow him on Twitter at @antocucciniello, and follow him on GitHub.

Basic Operations of Elasticsearch

Packt
16 Jan 2017
10 min read
In this article by Alberto Maria Angelo Paro, the author of the book ElasticSearch 5.0 Cookbook - Third Edition, you will learn the following recipes: Creating an index Deleting an index Opening/closing an index Putting a mapping in an index Getting a mapping (For more resources related to this topic, see here.) Creating an index The first operation to do before starting indexing data in Elasticsearch is to create an index--the main container of our data. An index is similar to the concept of database in SQL, a container for types (tables in SQL) and documents (records in SQL). Getting ready To execute curl via the command line you need to install curl for your operative system. How to do it... The HTTP method to create an index is PUT (but also POST works); the REST URL contains the index name: http://<server>/<index_name> For creating an index, we will perform the following steps: From the command line, we can execute a PUT call: curl -XPUT http://127.0.0.1:9200/myindex -d '{ "settings" : { "index" : { "number_of_shards" : 2, "number_of_replicas" : 1 } } }' The result returned by Elasticsearch should be: {"acknowledged":true,"shards_acknowledged":true} If the index already exists, a 400 error is returned: { "error" : { "root_cause" : [ { "type" : "index_already_exists_exception", "reason" : "index [myindex/YJRxuqvkQWOe3VuTaTbu7g] already exists", "index_uuid" : "YJRxuqvkQWOe3VuTaTbu7g", "index" : "myindex" } ], "type" : "index_already_exists_exception", "reason" : "index [myindex/YJRxuqvkQWOe3VuTaTbu7g] already exists", "index_uuid" : "YJRxuqvkQWOe3VuTaTbu7g", "index" : "myindex" }, "status" : 400 } How it works... Because the index name will be mapped to a directory on your storage, there are some limitations to the index name, and the only accepted characters are: ASCII letters [a-z] Numbers [0-9] point ".", minus "-", "&" and "_" During index creation, the replication can be set with two parameters in the settings/index object: number_of_shards, which controls the number of shards that compose the index (every shard can store up to 2^32 documents) number_of_replicas, which controls the number of replica (how many times your data is replicated in the cluster for high availability)A good practice is to set this value at least to 1. The API call initializes a new index, which means: The index is created in a primary node first and then its status is propagated to all nodes of the cluster level A default mapping (empty) is created All the shards required by the index are initialized and ready to accept data The index creation API allows defining the mapping during creation time. The parameter required to define a mapping is mapping and accepts multi mappings. So in a single call it is possible to create an index and put the required mappings. There's more... The create index command allows passing also the mappings section, which contains the mapping definitions. 
It is a shortcut to create an index with mappings, without executing an extra PUT mapping call: curl -XPOST localhost:9200/myindex -d '{ "settings" : { "number_of_shards" : 2, "number_of_replicas" : 1 }, "mappings" : { "order" : { "properties" : { "id" : {"type" : "keyword", "store" : "yes"}, "date" : {"type" : "date", "store" : "no" , "index":"not_analyzed"}, "customer_id" : {"type" : "keyword", "store" : "yes"}, "sent" : {"type" : "boolean", "index":"not_analyzed"}, "name" : {"type" : "text", "index":"analyzed"}, "quantity" : {"type" : "integer", "index":"not_analyzed"}, "vat" : {"type" : "double", "index":"no"} } } } }' Deleting an index The counterpart of creating an index is deleting one. Deleting an index means deleting its shards, mappings, and data. There are many common scenarios when we need to delete an index, such as: Removing the index to clean out unwanted/obsolete data (for example, old Logstash indices). Resetting an index for a scratch restart. Deleting an index that has missing shards, mainly due to failures, to bring the cluster back to a valid state (if a node dies while storing the single replica shard of an index, that index is missing a shard, so the cluster state becomes red; in this case, deleting the index brings the cluster back to a green status, but you lose the data contained in the deleted index). Getting ready To execute curl via the command line you need to install curl for your operating system. The index created in the previous recipe is also required, as we are going to delete it. How to do it... The HTTP method used to delete an index is DELETE. The following URL contains only the index name: http://<server>/<index_name> For deleting an index, we will perform the steps given as follows: Execute a DELETE call, by writing the following command: curl -XDELETE http://127.0.0.1:9200/myindex We check the result returned by Elasticsearch. If everything is all right, it should be: {"acknowledged":true} If the index doesn't exist, a 404 error is returned: { "error" : { "root_cause" : [ { "type" : "index_not_found_exception", "reason" : "no such index", "resource.type" : "index_or_alias", "resource.id" : "myindex", "index_uuid" : "_na_", "index" : "myindex" } ], "type" : "index_not_found_exception", "reason" : "no such index", "resource.type" : "index_or_alias", "resource.id" : "myindex", "index_uuid" : "_na_", "index" : "myindex" }, "status" : 404 } How it works... When an index is deleted, all the data related to the index is removed from disk and is lost. During the delete processing, first the cluster is updated, and then the shards are deleted from the storage. This operation is very fast; in a traditional filesystem it is implemented as a recursive delete. It's not possible to restore a deleted index if there is no backup. Calling the delete API with the special _all index name can also be used to remove all the indices. In production, it is good practice to disable the deletion of all indices by adding the following line to elasticsearch.yml: action.destructive_requires_name: true Opening/closing an index If you want to keep your data, but save resources (memory/CPU), a good alternative to deleting indices is to close them. Elasticsearch allows you to open/close an index to put it into online/offline mode. Getting ready To execute curl via the command line you need to install curl for your operating system. How to do it...
For opening/closing an index, we will perform the following steps: From the command line, we can execute a POST call to close an index using: curl -XPOST http://127.0.0.1:9200/myindex/_close If the call is successful, the result returned by Elasticsearch should be: {,"acknowledged":true} To open an index, from the command line, type the following command: curl -XPOST http://127.0.0.1:9200/myindex/_open If the call is successful, the result returned by Elasticsearch should be: {"acknowledged":true} How it works... When an index is closed, there is no overhead on the cluster (except for metadata state): the index shards are switched off and they don't use file descriptors, memory, and threads. There are many use cases when closing an index: Disabling date-based indices (indices that store their records by date), for example, when you keep an index for a week, month, or day and you want to keep online a fixed number of old indices (that is, two months) and some offline (that is, from two months to six months). When you do searches on all the active indices of a cluster and don't want search in some indices (in this case, using alias is the best solution, but you can achieve the same concept of alias with closed indices). An alias cannot have the same name as an index When an index is closed, calling the open restores its state. Putting a mapping in an index We saw how to build mapping by indexing documents. This recipe shows how to put a type mapping in an index. This kind of operation can be considered as the Elasticsearch version of an SQL created table. Getting ready To execute curl via the command line you need to install curl for your operative system. How to do it... The HTTP method to put a mapping is PUT (also POST works). The URL format for putting a mapping is: http://<server>/<index_name>/<type_name>/_mapping For putting a mapping in an index, we will perform the steps given as follows: If we consider the type order, the call will be: curl -XPUT 'http://localhost:9200/myindex/order/_mapping' -d '{ "order" : { "properties" : { "id" : {"type" : "keyword", "store" : "yes"}, "date" : {"type" : "date", "store" : "no" , "index":"not_analyzed"}, "customer_id" : {"type" : "keyword", "store" : "yes"}, "sent" : {"type" : "boolean", "index":"not_analyzed"}, "name" : {"type" : "text", "index":"analyzed"}, "quantity" : {"type" : "integer", "index":"not_analyzed"}, "vat" : {"type" : "double", "index":"no"} } } }' In case of success, the result returned by Elasticsearch should be: {"acknowledged":true} How it works... This call checks if the index exists and then it creates one or more type mapping as described in the definition. During mapping insert if there is an existing mapping for this type, it is merged with the new one. If there is a field with a different type and the type could not be updated, an exception expanding fields property is raised. To prevent an exception during the merging mapping phase, it's possible to specify the ignore_conflicts parameter to true (default is false). The put mapping call allows you to set the type for several indices in one shot; list the indices separated by commas or to apply all indexes using the _all alias. There's more… There is not a delete operation for mapping. It's not possible to delete a single mapping from an index. 
To remove or change a mapping you need to manage the following steps: Create a new index with the new/modified mapping Reindex all the records Delete the old index with incorrect mapping Getting a mapping After having set our mappings for processing types, we sometimes need to control or analyze the mapping to prevent issues. The action to get the mapping for a type helps us to understand structure or its evolution due to some merge and implicit type guessing. Getting ready To execute curl via command-line you need to install curl for your operative system. How to do it… The HTTP method to get a mapping is GET. The URL formats for getting mappings are: http://<server>/_mapping http://<server>/<index_name>/_mapping http://<server>/<index_name>/<type_name>/_mapping To get a mapping from the type of an index, we will perform the following steps: If we consider the type order of the previous chapter, the call will be: curl -XGET 'http://localhost:9200/myindex/order/_mapping?pretty=true' The pretty argument in the URL is optional, but very handy to pretty print the response output. The result returned by Elasticsearch should be: { "myindex" : { "mappings" : { "order" : { "properties" : { "customer_id" : { "type" : "keyword", "store" : true }, … truncated } } } } } How it works... The mapping is stored at the cluster level in Elasticsearch. The call checks both index and type existence and then it returns the stored mapping. The returned mapping is in a reduced form, which means that the default values for a field are not returned. Elasticsearch stores only not default field values to reduce network and memory consumption. Retrieving a mapping is very useful for several purposes: Debugging template level mapping Checking if implicit mapping was derivated correctly by guessing fields Retrieving the mapping metadata, which can be used to store type-related information Simply checking if the mapping is correct If you need to fetch several mappings, it is better to do it at index level or cluster level to reduce the numbers of API calls. Summary We learned how to manage indices and perform operations on documents. We'll discuss different operations on indices such as create, delete, update, open, and close. These operations are very important because they allow better define the container (index) that will store your documents. The index create/delete actions are similar to the SQL create/delete database commands. Resources for Article: Further resources on this subject: Elastic Stack Overview [article] Elasticsearch – Spicing Up a Search Using Geo [article] Downloading and Setting Up ElasticSearch [article]

Web Framework Behavior Tuning

Packt
12 Jan 2017
8 min read
In this article by Alex Antonov, the author of the book Spring Boot Cookbook – Second Edition, learn to use and configure spring resources and build your own Spring-based application using Spring Boot. In this article, you will learn about the following topics: Configuring route matching patterns Configuring custom static path mappings Adding custom connectors (For more resources related to this topic, see here.) Introduction We will look into enhancing our web application by doing behavior tuning, configuring the custom routing rules and patterns, adding additional static asset paths, and adding and modifying servlet container connectors and other properties, such as enabling SSL. Configuring route matching patterns When we build web applications, it is not always the case that a default, out-of-the-box, mapping configuration is applicable. At times, we want to create our RESTful URLs that contain characters such as . (dot), which Spring treats as a delimiter defining format, like path.xml, or we might not want to recognize a trailing slash, and so on. Conveniently, Spring provides us with a way to get this accomplished with ease. Let's imagine that the ISBN format does allow the use of dots to separate the book number from the revision with a pattern looking like [isbn-number].[revision]. How to do it… We will configure our application to not use the suffix pattern match of .* and not to strip the values after the dot when parsing the parameters. Let's perform the following steps: Let's add the necessary configuration to our WebConfiguration class with the following content: @Override public void configurePathMatch(PathMatchConfigurer configurer) { configurer.setUseSuffixPatternMatch(false). setUseTrailingSlashMatch(true); } Start the application by running ./gradlew clean bootRun. Let's open http://localhost:8080/books/978-1-78528-415-1.1 in the browser to see the following results: If we enter the correct ISBN, we will see a different result, as shown in the following screenshot: How it works… Let's look at what we did in detail. The configurePathMatch(PathMatchConfigurer configurer) method gives us an ability to set our own behavior in how we want Spring to match the request URL path to the controller parameters: configurer.setUseSuffixPatternMatch(false): This method indicates that we don't want to use the .* suffix so as to strip the trailing characters after the last dot. This translates into Spring parsing out 978-1-78528-415-1.1 as an {isbn} parameter for BookController. So, http://localhost:8080/books/978-1-78528-415-1.1 and http://localhost:8080/books/978-1-78528-415-1 will become different URLs. configurer.setUseTrailingSlashMatch(true): This method indicates that we want to use the trailing / in the URL as a match, as if it were not there. This effectively makes http://localhost:8080/books/978-1-78528-415-1 the same as http://localhost:8080/books/978-1-78528-415-1/. If you want to do further configuration on how the path matching takes place, you can provide your own implementation of PathMatcher and UrlPathHelper, but these will be required in the most extreme and custom-tailored situations and are not generally recommended. Configuring custom static path mappings It is possible to control how our web application deals with static assets and the files that exist on the filesystem or are bundled in the deployable archive. 
Let's say that we want to expose our internal application.properties file via the static web URL of http://localhost:8080/internal/application.properties from our application. To get started with this, proceed with the steps in the next section. How to do it… Let's add a new method, addResourceHandlers, to the WebConfiguration class with the following content: @Override public void addResourceHandlers(ResourceHandlerRegistry registry) { registry.addResourceHandler("/internal/**").addResourceLocations("classpath:/"); } Start the application by running ./gradlew clean bootRun. Let's open http://localhost:8080/internal/application.properties in the browser to see the following results: How it works… The method that we overrode, addResourceHandlers(ResourceHandlerRegistry registry), is another configuration method from WebMvcConfigurer, which gives us an ability to define custom mappings for static resource URLs and connect them with the resources on the filesystem or application classpath. In our case, we defined a mapping of anything that is being accessed via the / internal URL to be looked for in classpath:/ of our application. (For production environment, you probably don't want to expose the entire classpath as a static resource!) So, let's take a detailed look at what we did, as follows: registry.addResourceHandler("/internal/**"): This method adds a resource handler to the registry to handle our static resources, and it returns ResourceHandlerRegistration to us, which can be used to further configure the mapping in a chained fashion. /internal/** is a path pattern that will be used to match against the request URL using PathMatcher. We have seen how PathMatcher can be configured in the previous example but, by default, an AntPathMatcher implementation is used. We can configure more than one URL pattern to be matched to a particular resource location. addResourceLocations("classpath:/"):This method is called on the newly created instance of ResourceHandlerRegistration, and it defines the directories where the resources should be loaded from. These should be valid filesystems or classpath directories, and there can be more than one entered. If multiple locations are provided, they will be checked in the order in which they were entered. setCachePeriod (Integer cachePeriod): Using this method, we can also configure a caching interval for the given resource by adding custom connectors. Another very common scenario in the enterprise application development and deployment is to run the application with two separate HTTP port connectors: one for HTTP and the other for HTTPS. Adding custom connectors Another very common scenario in the enterprise application development and deployment is to run the application with two separate HTTP port connectors: one for HTTP and the other for HTTPS. Getting ready For this recipe, we will undo the changes that we implemented in the previous example. In order to create an HTTPS connector, we will need a few things; but, most importantly, we will need to generate a certificate keystore that is used to encrypt and decrypt the SSL communication with the browser. If you are using Unix or Mac, you can do it by running the following command: $JAVA_HOME/bin/keytool -genkey -alias tomcat -keyalg RSA On Windows, this can be achieved via the following code: "%JAVA_HOME%binkeytool" -genkey -alias tomcat -keyalg RSA During the creation of the keystore, you should enter the information that is appropriate to you, including passwords, name, and so on. 
For the purpose of this book, we will use the default password: changeit. Once the execution is complete, a newly generated keystore file will appear in your home directory under the name .keystore. You can find more information about preparing the certificate keystore at https://tomcat.apache.org/tomcat-8.0-doc/ssl-howto.html#Prepare_the_Certificate_Keystore. How to do it… With the keystore creation complete, we will need to create a separate properties file in order to store our configuration for the HTTPS connector, such as port and others. After that, we will create a configuration property binding object and use it to configure our new connector. Perform the following steps: First, we will create a new properties file named tomcat.https.properties in the src/main/resources directory from the root of our project with the following content: custom.tomcat.https.port=8443 custom.tomcat.https.secure=true custom.tomcat.https.scheme=https custom.tomcat.https.ssl=true custom.tomcat.https.keystore=${user.home}/.keystore custom.tomcat.https.keystore-password=changeit Next, we will create a nested static class named TomcatSslConnectorProperties in our WebConfiguration, with the following content: @ConfigurationProperties(prefix = "custom.tomcat.https") public static class TomcatSslConnectorProperties { private Integer port; private Boolean ssl= true; private Boolean secure = true; private String scheme = "https"; private File keystore; private String keystorePassword; //Skipping getters and setters to save space, but we do need them public void configureConnector(Connector connector) { if (port != null) connector.setPort(port); if (secure != null) connector.setSecure(secure); if (scheme != null) connector.setScheme(scheme); if (ssl!= null) connector.setProperty("SSLEnabled", ssl.toString()); if (keystore!= null &&keystore.exists()) { connector.setProperty("keystoreFile", keystore.getAbsolutePath()); connector.setProperty("keystorePassword", keystorePassword); } } } Now, we will need to add our newly created tomcat.http.properties file as a Spring Boot property source and enable TomcatSslConnectorProperties to be bound. This can be done by adding the following code right above the class declaration of the WebConfiguration class: @Configuration @PropertySource("classpath:/tomcat.https.properties") @EnableConfigurationProperties(WebConfiguration.TomcatSslConnectorProperties.class) public class WebConfiguration extends WebMvcConfigurerAdapter {...} Finally, we will need to create an EmbeddedServletContainerFactory Spring bean where we will add our HTTPS connector. We will do that by adding the following code to the WebConfiguration class: @Bean public EmbeddedServletContainerFactory servletContainer(TomcatSslConnectorProperties properties) { TomcatEmbeddedServletContainerFactory tomcat = new TomcatEmbeddedServletContainerFactory(); tomcat.addAdditionalTomcatConnectors( createSslConnector(properties)); return tomcat; } private Connector createSslConnector(TomcatSslConnectorProperties properties) { Connector connector = new Connector(); properties.configureConnector(connector); return connector; } Start the application by running ./gradlew clean bootRun. Let's open https://localhost:8443/internal/tomcat.https.properties in the browser to see the following results: Summary In this article, you learned how to fine-tune the behavior of a web application. This article has given a small gist about custom routes, asset paths, and amending routing patterns. 
You also learned how to add more connectors to the servlet container. Resources for Article: Further resources on this subject: Introduction to Spring Framework [article] Setting up Microsoft Bot Framework Dev Environment [article] Creating our first bot, WebBot [article]

A Professional Environment for React Native, Part 2

Pierre Monge
12 Jan 2017
4 min read
In Part 1 of this series, I covered the full environment and everything you need to start creating your own React Native applications. Now here in Part 2, we are going to dig in and go over the tools that you can take advantage of for maintaining those React Native apps. Maintaining the application Maintaining a React Native application, just like any software, is very complex and requires a lot of organization. In addition to having strict code (a good syntax with eslint or a good understanding of the code with flow), you must have intelligent code, and you must organize your files, filenames, and variables. It is necessary to have solutions for the maintenance of the application in the long term as well as have tools that provide feedback. Here are some tools that we use, which should be in place early in the cycle of your React Native development. GitHub GitHub is a fantastic tool, but you need to know how to control it. In my company, we have our own Git flow with a Dev branch, a master branch, release branches, bugs and other useful branches. It's up to you to make your own flow for Git! One of the most important things is the Pull Request, or the PR! And if there are many people on your project, it is important for your group to agree on the organization of the code. BugTracker & Tooling We use many tools in my team, but here is our Must-Have list to maintain the application: circleCI: This is a continuous integration tool that we integrate with GitHub. It allows us to pass recurrent tests with each new commit. BugSnag: This is a bug tracking tool that can be used in a React Native integration, which makes it possible to raise user bugs by the webs without the user noticing it. codePush: This is useful for deploying code on versions already in production. And yes, you can change business code while the application is already in production. I do not pay much attention to it, yet the states of applications (Debug, Beta, and Production) are a big part that has to be treated because it is a workset to have for quality work and a long application life. We also have quality assurance in our company, which allows us to validate a product before it is set up, which provides a regular process of putting a React Native app into production. As you can see, there are many tools that will help you maintain a React Native mobile application. Despite the youthfulness of the product, the community grows quickly and developers are excited about creating apps. There are more and more large companies using React Native, such as AirBnB , Wix, Microsoft, and many others. And with the technology growing and improving, there are more and more new tools and integrations coming to React Native. I hope this series has helped you create and maintain your own React Native applications. Here is a summary of the tools covered: Atom is a text editor that's modern, approachable, yet hackable to the core—a tool that you can customize to do anything, but you also need to use it productively without ever touching a config file. GitHub is a web-based Git repository hosting service. CircleCI is a modern continuous integration and delivery platform that software teams love to use. BugSnag monitors application errors to improve customer experiences and code quality. react-native-code-push is a plugin that provides client-side integration, allowing you to easily add a dynamic update experience to your React Native app. About the author Pierre Monge (liroo.pierre@gmail.com) is a 21-year-old student. 
He is a developer in C, JavaScript, and all things web development, and he has recently been creating mobile applications. He is currently working as an intern at a company named Azendoo, where he is developing a 100% React Native application.

Metric Analytics with Metricbeat

Packt
11 Jan 2017
5 min read
In this article by Bahaaldine Azarmi, the author of the book Learning Kibana 5.0, we will learn about metric analytics, which is fundamentally different in terms of data structure. (For more resources related to this topic, see here.) Author would like to spend a few lines on the following question: What is a metric? A metric is an event that contains a timestamp and usually one or more numeric values. It is appended to a metric file sequentially, where all lines of metrics are ordered based on the timestamp. As an example, here are a few system metrics: 02:30:00 AM    all    2.58    0.00    0.70    1.12    0.05     95.5502:40:00 AM    all    2.56    0.00    0.69    1.05    0.04     95.6602:50:00 AM    all    2.64    0.00    0.65    1.15    0.05     95.50 Unlike logs, metrics are sent periodically, for example, every 10 minutes (as the preceding example illustrates) whereas logs are usually appended to the log file when something happens. Metrics are often used in the context of software or hardware health monitoring, such as resource utilization monitoring, database execution metrics monitoring, and so on. Since version 5.0, Elastic had, at all layers of the solutions, new features to enhance the user experience of metrics management and analytics. Metricbeat is one of the new features in 5.0. It allows the user to ship metrics data, whether from the machine or from applications, to Elasticsearch, and comes with out-of-the-box dashboards for Kibana. Kibana also integrates Timelion with its core, a plugin which has been made for manipulating numeric data, such as metrics. In this article, we'll start by working with Metricbeat. Metricbeat in Kibana The procedure to import the dashboard has been laid out in the subsequent section. Importing the dashboard Before importing the dashboard, let's have a look at the actual metric data that Metricbeat ships. As I have Chrome opened while typing this article, I'm going to filter the data by process name, here chrome: Discover tab filtered by process name   Here is an example of one of the documents I have: { "_index": "metricbeat-2016.09.06", "_type": "metricsets", "_id": "AVcBFstEVDHwfzZYZHB8", "_score": 4.29527, "_source": { "@timestamp": "2016-09-06T20:00:53.545Z", "beat": { "hostname": "MacBook-Pro-de-Bahaaldine.local", "name": "MacBook-Pro-de-Bahaaldine.local" }, "metricset": { "module": "system", "name": "process", "rtt": 5916 }, "system": { "process": { "cmdline": "/Applications/Google Chrome.app/Contents/Versions/52.0.2743.116/Google Chrome Helper.app/Contents/MacOS/Google Chrome Helper --type=ppapi --channel=55142.2188.1032368744 --ppapi-flash-args --lang=fr", "cpu": { "start_time": "09:52", "total": { "pct": 0.0035 } }, "memory": { "rss": { "bytes": 67813376, "pct": 0.0039 }, "share": 0, "size": 3355303936 }, "name": "Google Chrome H", "pid": 76273, "ppid": 55142, "state": "running", "username": "bahaaldine" } }, "type": "metricsets" }, "fields": { "@timestamp": [ 1473192053545 ] } } Metricbeat document example The preceding document breaks down the utilization of resources for the chrome process. We can see, for example, the usage of CPU and memory, as well as the state of the process as a whole. Now how about visualizing the data in an actual dashboard? 
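For reference, documents like the one above are produced by Metricbeat's system module. A minimal metricbeat.yml along the following lines is enough to generate them; the hosts, period, and process filter shown here are assumptions to adapt to your own setup:

metricbeat.modules:
- module: system
  metricsets: ["cpu", "memory", "process"]
  period: 10s
  processes: ['.*']

output.elasticsearch:
  hosts: ["localhost:9200"]

With documents flowing into Elasticsearch, we can move on to the dashboards.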
Now how about visualizing the data in an actual dashboard? To do so, go into the Kibana folder located in the Metricbeat installation directory:

MacBook-Pro-de-Bahaaldine:kibana bahaaldine$ pwd
/elastic/metricbeat-5.0.0/kibana
MacBook-Pro-de-Bahaaldine:kibana bahaaldine$ ls
dashboard  import_dashboards.ps1  import_dashboards.sh  index-pattern  search  visualization

import_dashboards.sh is the script we will use to import the dashboards into Kibana. Execute it as follows:

./import_dashboards.sh -h

This should print out the help, which essentially gives you the list of arguments you can pass to the script. Here, we need to specify a username and a password because we are using the X-Pack security plugin, which secures our cluster:

./import_dashboards.sh -u elastic:changeme

You should get a series of logs stating that the dashboards have been imported, as in the following example:

Import visualization Servers-overview:
{"_index":".kibana","_type":"visualization","_id":"Servers-overview","_version":4,"forced_refresh":false,"_shards":{"total":2,"successful":1,"failed":0},"created":false}

At this point, you have metric data in Elasticsearch and dashboards created in Kibana, so you can now visualize the data.

Visualizing metrics

If you go back into the Kibana dashboard section and try to open the Metricbeat System Statistics dashboard, you should get something similar to the following:

Metricbeat Kibana dashboard

Your own dashboard will show the metrics based on the processes running on your computer. In my case, I have a bunch of them for which I can visualize the CPU and memory utilization, for example:

RAM and CPU utilization

What is important to verify here is that Metricbeat itself has a very low footprint on the overall system in terms of CPU and RAM, as shown here:

Metricbeat resource utilization

As we can see in the preceding diagram, Metricbeat only uses about 0.4% of the CPU and less than 0.1% of the memory on my MacBook Pro. On the other hand, if I want to find the most resource-consuming processes, I can check the Top processes data table, which gives the following information:

Top processes

Besides Google Chrome H, which uses a lot of CPU, zoom.us, a conferencing application, seems to put a lot of stress on my laptop. Rather than using the standard Kibana visualizations to manipulate our metrics, we'll use Timelion instead, and focus on this heavy CPU-consuming process use case.

Summary

In this article, we have seen how we can use Kibana in the context of technical metric analytics. We relied on the data that Metricbeat is able to ship from a machine and visualized the result both in a Kibana dashboard and in Kibana Timelion.

Resources for Article:

Further resources on this subject:

An Introduction to Kibana [article]
Big Data Analytics [article]
Static Data Management [article]
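As an aside before leaving this article: the Timelion exploration teased above is out of scope here, but a minimal sketch of the kind of expression it involves, assuming the default metricbeat-* index pattern and the field names visible in the document example earlier, looks something like this (check Timelion's .es() help in your Kibana version for the exact parameter names):

.es(index='metricbeat-*', q='system.process.name:chrome', metric='avg:system.process.cpu.total.pct').label('chrome CPU'),
.es(index='metricbeat-*', q='system.process.name:zoom.us', metric='avg:system.process.cpu.total.pct').label('zoom.us CPU')

Each .es() call pulls one series from Elasticsearch and averages the per-process CPU percentage over each time bucket, which is a convenient way to compare the two heaviest consumers identified in the Top processes table.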

Scale your Django App: Gunicorn + Apache + Nginx

Jean Jung
11 Jan 2017
6 min read
One question when starting with Django is "How do I scale my app?" Brandon Rhodes has answered this question in Foundations of Python Network Programming. Rhodes walks through several options, so in this post we will focus on the preferred and main one: Gunicorn + Apache + Nginx. The idea of this architecture is to have Nginx act as a proxy that delegates dynamic content to Gunicorn and static content to Apache. Since Django by itself does not handle static content, and Apache does it very well, we can take advantage of that. Below we will see how to configure everything.

Environment

Project directory: /var/www/myproject
Apache2
Nginx
Gunicorn

Project settings

STATIC_ROOT: /var/www/myproject/static
STATIC_URL: /static/
MEDIA_ROOT: /var/www/myproject/media
MEDIA_URL: /media/
ALLOWED_HOSTS: myproject.com

Gunicorn

Gunicorn is a Python and WSGI-compatible server, making it our first option when working with Django. It is possible to install Gunicorn from pip by running:

pip install gunicorn

To run the Gunicorn server, cd to your project directory and run:

gunicorn myproject.wsgi:application -b localhost:8000

By default, Gunicorn runs just one worker to serve the pages. If you feel you need more workers, you can start them by passing the number of workers to the --workers option. Gunicorn also runs in the foreground, so to keep it running you need to configure a service on your server; that, however, is not the focus of this post.

Visit localhost:8000 in your browser and check that your project is working. You will probably notice that your static files are not accessible. This is because Django itself cannot serve static files, and Gunicorn is not configured to serve them either. Let's fix that with Apache in the next section. If your page does not work at all here, check whether you are using a virtualenv and whether it is activated for the process running Gunicorn.

Apache

Installing Apache takes some time and is not the focus of this post; additionally, the great majority of readers will already have Apache, so if you don't know how to install it, follow this guide. If you have already configured Apache to serve static content, this will be very similar to what you have done before. If you have never done that, do not be afraid; it will be easy!

First of all, change the port Apache listens on. On Apache2, edit /etc/apache2/ports.conf and change the line:

Listen 80

to:

Listen 8001

You can choose another port too; just be sure to adjust the permissions on the static and media directories to match the needs of the user Apache runs as. Create a file at /etc/apache2/sites-enabled/myproject.com.conf and add this content:

<VirtualHost *:8001>
    ServerName static.myproject.com
    ServerAdmin webmaster@localhost

    CustomLog ${APACHE_LOG_DIR}/static.myproject.com-access.log combined
    ErrorLog ${APACHE_LOG_DIR}/static.myproject.com-error.log

    # Possible values include: debug, info, notice, warn, error, crit,
    # alert, emerg.
    LogLevel warn

    DocumentRoot /var/www/myproject

    Alias /static/ /var/www/myproject/static/
    Alias /media/ /var/www/myproject/media/

    <Directory /var/www/myproject/static>
        Require all granted
    </Directory>

    <Directory /var/www/myproject/media>
        Require all granted
    </Directory>
</VirtualHost>

Be sure to replace everything needed to fit your project. Your project still does not work as a whole, though, because Gunicorn does not know about Apache, and we don't want it to know anything about that. This is what Nginx is for, and it is covered in the next section.
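One assumption worth making explicit before wiring in Nginx: Apache can only serve files that actually exist under STATIC_ROOT, so if you have never collected your static files, do that first. A quick sanity check using the standard Django management command could look like the sketch below; the admin CSS path is only an illustrative file that exists when django.contrib.admin is enabled:

cd /var/www/myproject
# Gather every app's static files into STATIC_ROOT (/var/www/myproject/static)
python manage.py collectstatic --noinput
# Verify that Apache serves a known file on its new port
curl -I http://localhost:8001/static/admin/css/base.css

If the curl call returns 200 OK, Apache is ready to take the static traffic that Nginx will route to it.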
Nginx

Nginx is a very light and powerful web server. Unlike Apache, it does not spawn a new process for every request, so it works very well as a proxy. As with Apache, installing Nginx is not covered here; follow this reference to learn how to install it. Proxy configuration is very simple in Nginx; just create a file at /etc/nginx/conf.d/myproject.com.conf and put:

upstream dynamic {
    server 127.0.0.1:8000;
}

upstream static {
    server 127.0.0.1:8001;
}

server {
    listen 80;
    server_name myproject.com;

    # Root request handler to gunicorn upstream
    location / {
        proxy_pass http://dynamic;
    }

    # Static request handler to apache upstream
    location /static {
        proxy_pass http://static/static;
    }

    # Media request handler to apache upstream
    location /media {
        proxy_pass http://static/media;
    }

    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    error_page 500 502 503 504 /50x.html;
}

This way, you have everything working on a single machine. If you have more than one machine, you can dedicate machines to delivering static or dynamic content. The machine where Nginx runs is the proxy, and it needs to be visible from the Internet; the machines running Apache or Gunicorn only need to be visible from your local network. If you follow this architecture, you can simply change the Apache and Gunicorn configurations to listen on their default ports, adjust the domain names, and point the Nginx configuration at the new domains.

Where to go now?

For more details on deploying Gunicorn with Nginx, see the Gunicorn deployment page. You may also want to look at the Apache configuration page and the Nginx getting started page for more information about scalability and security.

Summary

In this post you saw how to configure the Nginx, Apache, and Gunicorn servers to deliver your Django app behind a proxy, balancing your requests between Apache and Gunicorn. We also covered how to start more Gunicorn workers and where to find details about scaling each of the servers being used.

References

[1] PEP 3333 - WSGI
[2] Gunicorn Project
[3] Apache Project
[4] Nginx Project
[5] Rhodes, B. & Goerzen, J. (2014). Foundations of Python Network Programming. New York, NY: Apress.

About the author

Jean Jung is a Brazilian developer passionate about technology. Currently a System Analyst at EBANX, an international cross-border payment processing company for Latin America, he is very interested in Python and Artificial Intelligence, specifically Machine Learning, Compilers, and Operating Systems. As a hobby, he is always looking for IoT projects to build with Arduino.