
How-To Tutorials

Turn-based game framework in Duality (C#)

Lőrinc Serfőző
17 Jan 2017
6 min read
This post presents a simple turn-based game framework using the Duality game engine. The engine is written in C#, and scripting it requires the same language. The word 'scripting' is a bit misleading here, because it is more of a process of extending the engine. In this text, the inner workings of Duality are not explained in detail, as that is better done by the official documentation on GitHub. However, if you are familiar with the vocabulary of game development, the tool is rather easy to pick up, and this guide can also be followed. In addition, these concepts can be tailored to fit any other game framework or engine.

Required tools

Duality can be downloaded from the official site. A C# compiler and a text editor are also needed. Visual Studio 2013 or higher is recommended, but other IDEs such as MonoDevelop also work.

Overall model

Turn-based games were popular long before the computer era, and even these days they are among the most successful releases. These games often require analytical thinking instead of lightning-fast reflexes, and favor longer-term strategy over instinctive decisions. The following list gathers the most typical attributes of the genre, which we need to consider while building a turn-based game:

- The game process is divided into turns. Usually every player takes action once per turn. Everyone has to wait for their opportunity.
- The order of the players' actions can be fixed, or based on some sort of initiative mechanic. In the latter case, the order can change between turns.
- During the game, some players drop out and others can join. The system has to handle these changes.

Implementation

In the following paragraphs, a prototype-quality system is described to achieve these goals. In addition to discrete time measurement, turn-based games often utilize discrete measurement of space, for example a grid-based movement system. We are implementing that as well. The solution contains two distinct building blocks:

- a manager object
- an entity object, which has multiple instances

The manager object, as its name suggests, arranges the order of the entities taking action in the turn and asks them to decide their own action, a movement direction in our case. It should not distinguish between player-controlled entities and AI-controlled ones. Thus the actual logic behind the entities can vary, but they all need to implement the same methods. It is clear that we need an interface for that.

Defining the ICmpMovementEntity interface

```csharp
public enum Decision
{
    NotDecided,
    Stay,
    UpMove,
    RightMove,
    DownMove,
    LeftMove
}

public interface ICmpMovementEntity // [1]
{
    int Initiative { get; set; }  // [2]
    Decision RequestDecision ();  // [3]
}
```

The interface's name has the ICmp prefix. This follows a convention in Duality, and indicates that only Component objects should implement it. Initiative returns an integer, used by the manager object to determine the order of entities in the turn. The method RequestDecision returns the enum type Decision. Its value is NotDecided when there is no decision yet; in that case, the same entity is asked again at the next game loop update. If the returned value is Stay, the entity object remains in its place, otherwise it is moved in the returned direction.
Implementing the manager object

The skeleton of the manager object is the following:

```csharp
internal class TurnMovementManager
{
    private const float GRID = 64; // [1]

    private readonly HashSet<ICmpMovementEntity> entitiesMovedInTurn =
        new HashSet<ICmpMovementEntity> (); // [2]
    private ICmpMovementEntity onTurnEntity; // [3]

    public void Tick (); // [4]

    private ICmpMovementEntity GetNextNotMovedEntity (); // [5]
    private void MoveEntity (ICmpMovementEntity entity, Decision decision); // [6]
    private void NextTurn (); // [7]
}
```

GRID is the discrete measurement step in the game world. entitiesMovedInTurn is the set of entities that have already taken action in the current turn. onTurnEntity keeps track of the entity that is asked to decide its move next. Tick is invoked on every game loop update. The main processing happens here.

```csharp
public void Tick ()
{
    onTurnEntity = GetNextNotMovedEntity ();
    if (onTurnEntity == null)
    {
        NextTurn ();
        return;
    }

    var decision = onTurnEntity.RequestDecision ();
    if (decision != Decision.NotDecided && decision != Decision.Stay)
    {
        entitiesMovedInTurn.Add (onTurnEntity);
        MoveEntity (onTurnEntity, decision);
    }
}
```

GetNextNotMovedEntity collects the entities that have not moved in the current turn, sorts them by their initiative, and returns the first. If there are no unmoved entities left, it returns null.

```csharp
private ICmpMovementEntity GetNextNotMovedEntity ()
{
    var entitiesInScene = Scene.Current.FindComponents<ICmpMovementEntity> ();
    var notMovedEntities = entitiesInScene.Where (ent => !entitiesMovedInTurn.Contains (ent)).ToList ();

    Comparison<ICmpMovementEntity> compare = (ent1, ent2) => ent2.Initiative.CompareTo (ent1.Initiative);
    notMovedEntities.Sort (compare);

    return notMovedEntities.FirstOrDefault ();
}
```

MoveEntity displaces the currently processed entity according to its decision.

```csharp
private void MoveEntity (ICmpMovementEntity entity, Decision decision)
{
    var entityComponent = onTurnEntity as Component;
    var transform = entityComponent.GameObj.Transform;

    Vector2 direction;
    switch (decision)
    {
        case Decision.UpMove:
            direction = -Vector2.UnitY;
            break;
        case Decision.RightMove:
            direction = Vector2.UnitX;
            break;
        case Decision.DownMove:
            direction = Vector2.UnitY;
            break;
        case Decision.LeftMove:
            direction = -Vector2.UnitX;
            break;
        case Decision.NotDecided:
        case Decision.Stay:
        default:
            throw new ArgumentOutOfRangeException (nameof (decision), decision, null);
    }

    transform.MoveByAbs (GRID * direction);
}
```

NextTurn clears the moved-entity set.

```csharp
private void NextTurn ()
{
    entitiesMovedInTurn.Clear ();
}
```

Driving the manager object from the CorePlugin

Developing games in the Duality engine is usually done via core plugin development. Every assembly that extends the base engine functionality needs to implement a CorePlugin object. These objects can be used to drive global logic, such as our manager class. The TurnbasedMovementCorePlugin class overrides the OnAfterUpdate method of its superclass to update a TurnMovementManager instance every frame.

```csharp
public class TurnbasedMovementCorePlugin : CorePlugin
{
    private readonly TurnMovementManager turnMovementManager = new TurnMovementManager ();

    protected override void OnAfterUpdate ()
    {
        base.OnAfterUpdate ();

        if (DualityApp.ExecContext == DualityApp.ExecutionContext.Game)
        {
            turnMovementManager.Tick ();
        }
    }
}
```
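Because the manager only talks to the ICmpMovementEntity interface, a computer-controlled entity plugs in exactly the same way as a player-controlled one. As a hypothetical illustration (not part of the original post; the class name and the System.Random usage are assumptions), a random-walk AI entity could look like this:

```csharp
// Hypothetical example: an AI-controlled entity that wanders in a random direction.
[RequiredComponent (typeof (Transform))]
public class RandomWalkerCmp : Component, ICmpMovementEntity
{
    private static readonly Random random = new Random ();

    public int Initiative { get; set; } = 0;

    public Decision RequestDecision ()
    {
        // Decide immediately: pick one of the four movement directions at random.
        switch (random.Next (4))
        {
            case 0: return Decision.UpMove;
            case 1: return Decision.RightMove;
            case 2: return Decision.DownMove;
            default: return Decision.LeftMove;
        }
    }
}
```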
Trivial test implementation of an ICmpMovementEntity

ICmpMovementEntity implementations can be complex, but for demonstration purposes, a simpler one is presented below. It is based on user input.

```csharp
[RequiredComponent (typeof (Transform))]
public class TurnMovementTestCmp : Component, ICmpMovementEntity
{
    public int Initiative { get; set; } = 1;

    public Decision RequestDecision ()
    {
        if (DualityApp.Keyboard.KeyHit (Key.Space))
            return Decision.RightMove;

        return Decision.NotDecided;
    }
}
```

Summary

Of course, more convoluted ICmpMovementEntity implementations are needed for real game logic. I hope you enjoyed this post. In case you have any questions, feel free to post them below, or on the Duality forums.

About the author

Lőrinc Serfőző is a software engineer at Graphisoft, the company behind the BIM solution ArchiCAD. He is studying mechatronics engineering at Budapest University of Technology and Economics, an interdisciplinary field between the more traditional mechanical engineering, electrical engineering, and informatics, and has quickly grown a passion for software development. He is a supporter of open source software and contributes to the C#- and OpenGL-based Duality game engine, creating free plugins and tools for its users.

Chef Language and Style

Packt
17 Jan 2017
14 min read
In this article by Matthias Marschall, author of the book Chef Cookbook, Third Edition, we will cover the following sections:

- Using community Chef style
- Using attributes to dynamically configure recipes
- Using templates
- Mixing plain Ruby with Chef DSL

Introduction

If you want to automate your infrastructure, you will end up using most of Chef's language features. In this article, we will look at how to use the Chef Domain Specific Language (DSL), from basic to advanced level.

Using community Chef style

It's easier to read code that adheres to a coding style guide. It is important to deliver consistently styled code, especially when sharing cookbooks with the Chef community. In this section, you'll find some of the most important rules (out of many more—enough to fill a short book on their own) to apply to your own cookbooks.

Getting ready

As you're writing cookbooks in Ruby, it's a good idea to follow general Ruby principles for readable (and therefore maintainable) code. Chef Software, Inc. proposes Ian Macdonald's Ruby Style Guide (http://www.caliban.org/ruby/rubyguide.shtml#style), but to be honest, I prefer Bozhidar Batsov's Ruby Style Guide (https://github.com/bbatsov/ruby-style-guide) due to its clarity. Let's look at the most important rules for Ruby in general and for cookbooks specifically.

How to do it…

Let's walk through a few Chef style guide examples.

Use two spaces per indentation level:

```ruby
remote_directory node['nagios']['plugin_dir'] do
  source 'plugins'
end
```

Use Unix-style line endings. Avoid Windows line endings by configuring Git accordingly:

```
mma@laptop:~/chef-repo $ git config --global core.autocrlf true
```

For more options on how to deal with line endings in Git, go to https://help.github.com/articles/dealing-with-line-endings.

Align parameters spanning more than one line:

```ruby
variables(
  mon_host: 'monitoring.example.com',
  nrpe_directory: "#{node['nagios']['nrpe']['conf_dir']}/nrpe.d"
)
```

Describe your cookbook in metadata.rb (you should always use the Ruby DSL). Version your cookbook according to Semantic Versioning standards (http://semver.org):

```ruby
version "1.1.0"
```

List the supported operating systems by looping through an array using the each method:

```ruby
%w(redhat centos ubuntu debian).each do |os|
  supports os
end
```

Declare dependencies and pin their versions in metadata.rb:

```ruby
depends "apache2", ">= 1.0.4"
depends "build-essential"
```

Construct strings from variable values and static parts by using string expansion:

```ruby
my_string = "This resource changed #{counter} files"
```

Download temporary files to Chef::Config['file_cache_path'] instead of /tmp or some local directory.

Use strings to access node attributes instead of Ruby symbols:

```ruby
node['nagios']['users_databag_group']
```

Set attributes in my_cookbook/attributes/default.rb by using default:

```ruby
default['my_cookbook']['version'] = "3.0.11"
```

Create an attribute namespace by using your cookbook name as the first level in my_cookbook/attributes/default.rb:

```ruby
default['my_cookbook']['version'] = "3.0.11"
default['my_cookbook']['name'] = "Mine"
```

How it works...

Using community Chef style helps to increase the readability of your cookbooks. Your cookbooks will be read much more often than changed. Because of this, it usually pays off to put a little extra effort into following a strict style guide when writing cookbooks.

There's more...

Using Semantic Versioning (see http://semver.org) for your cookbooks helps to manage dependencies.
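For instance, downstream cookbooks can rely on those version numbers when pinning dependencies. A minimal sketch (the cookbook name and constraint here are hypothetical, not from the original recipe):

```ruby
# metadata.rb of a hypothetical consuming cookbook
name    "my_app"
version "0.1.0"

# "~> 1.1" allows any 1.x release from 1.1.0 on, but not 2.0.0 —
# it trusts minor and patch bumps while guarding against major bumps.
depends "my_cookbook", "~> 1.1"
```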
If you change anything that might break cookbooks depending on your cookbook, you need to consider this a backwards-incompatible API change. In such cases, Semantic Versioning demands that you increase the major number of your cookbook, for example from 1.1.3 to 2.0.0, resetting the minor and patch levels.

Using attributes to dynamically configure recipes

Imagine some cookbook author has hardcoded the path where the cookbook puts a configuration file, but in a place that does not comply with your rules. Now you're in trouble! You can either patch the cookbook or rewrite it from scratch. Both options leave you with a headache and lots of work. Attributes are there to avoid such headaches. Instead of hardcoding values inside cookbooks, attributes enable authors to make their cookbooks configurable. By overriding default values set in cookbooks, users can inject their own values. Suddenly, it's next to trivial to obey your own rules. In this section, we'll see how to use attributes in your cookbooks.

Getting ready

Make sure you have a cookbook called my_cookbook and that the run_list of your node includes my_cookbook.

How to do it...

Let's see how to define and use a simple attribute.

Create a default file for your cookbook attributes:

```
mma@laptop:~/chef-repo $ subl cookbooks/my_cookbook/attributes/default.rb
```

Add a default attribute:

```ruby
default['my_cookbook']['message'] = 'hello world!'
```

Use the attribute inside a recipe:

```
mma@laptop:~/chef-repo $ subl cookbooks/my_cookbook/recipes/default.rb
```

```ruby
message = node['my_cookbook']['message']
Chef::Log.info("** Saying what I was told to say: #{message}")
```

Upload the modified cookbook to the Chef server:

```
mma@laptop:~/chef-repo $ knife cookbook upload my_cookbook
Uploading my_cookbook [0.1.0]
```

Run chef-client on your node:

```
user@server:~$ sudo chef-client
...TRUNCATED OUTPUT...
[2016-11-23T19:29:03+00:00] INFO: ** Saying what I was told to say: hello world!
...TRUNCATED OUTPUT...
```

How it works…

Chef loads all attributes from the attribute files before it executes the recipes. The attributes are stored with the node object. You can access all attributes stored with the node object from within your recipes and retrieve their current values. Chef has a strict order of precedence for attributes: default is the lowest, then normal (which is aliased with set), and then override. Additionally, attribute levels set in recipes have precedence over the same level set in an attribute file. Also, attributes defined in roles and environments have the highest precedence. You will find an overview chart at https://docs.chef.io/attributes.html#attribute-precedence.

There's more…

You can set and override attributes within roles and environments. Attributes defined in roles or environments have the highest precedence (on their respective levels: default and override).

Create a role:

```
mma@laptop:~/chef-repo $ subl roles/german_hosts.rb
```

```ruby
name "german_hosts"
description "This Role contains hosts, which should print out their messages in German"
run_list "recipe[my_cookbook]"
default_attributes "my_cookbook" => {
  "message" => "Hallo Welt!"
}
```

Upload the role to the Chef server:

```
mma@laptop:~/chef-repo $ knife role from file german_hosts.rb
Updated Role german_hosts!
```

Assign the role to a node called server:

```
mma@laptop:~/chef-repo $ knife node run_list add server 'role[german_hosts]'
server:
  run_list: role[german_hosts]
```

Run the Chef client on your node:

```
user@server:~$ sudo chef-client
...TRUNCATED OUTPUT...
[2016-11-23T19:40:56+00:00] INFO: ** Saying what I was told to say: Hallo Welt!
...TRUNCATED OUTPUT...
```
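A role can also force a value even when a cookbook sets its own. A hedged sketch reusing the role above (the use of override_attributes here is illustrative, not from the original recipe):

```ruby
# roles/german_hosts.rb — hypothetical variant using override_attributes,
# which wins over the default and normal attribute levels
name "german_hosts"
description "Hosts that must greet in German, regardless of cookbook defaults"
run_list "recipe[my_cookbook]"
override_attributes "my_cookbook" => {
  "message" => "Hallo Welt!"
}
```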
Calculating values in the attribute files

Attributes set in roles and environments (as shown earlier) have the highest precedence and they're already available when the attribute files are loaded. This enables you to calculate attribute values based on role- or environment-specific values.

Set an attribute within a role:

```
mma@laptop:~/chef-repo $ subl roles/german_hosts.rb
```

```ruby
name "german_hosts"
description "This Role contains hosts, which should print out their messages in German"
run_list "recipe[my_cookbook]"
default_attributes "my_cookbook" => {
  "hi" => "Hallo",
  "world" => "Welt"
}
```

Calculate the message attribute, based on the two attributes hi and world:

```
mma@laptop:~/chef-repo $ subl cookbooks/my_cookbook/attributes/default.rb
```

```ruby
default['my_cookbook']['message'] = "#{node['my_cookbook']['hi']} #{node['my_cookbook']['world']}!"
```

Upload the modified cookbook to your Chef server and run the Chef client on your node to see that it works, as shown in the preceding example.

See also

Read more about attributes in Chef at https://docs.chef.io/attributes.html

Using templates

Configuration management is all about configuring your hosts well. Usually, configuration is carried out by using configuration files. Chef's template resource allows you to recreate these configuration files with dynamic values that are driven by the attributes we've discussed so far in this article. You can retrieve dynamic values from data bags or attributes, or even calculate them on the fly before passing them into a template.

Getting ready

Make sure you have a cookbook called my_cookbook and that the run_list of your node includes my_cookbook.

How to do it…

Let's see how to create and use a template to dynamically generate a file on your node.

Add a template to your recipe:

```
mma@laptop:~/chef-repo $ subl cookbooks/my_cookbook/recipes/default.rb
```

```ruby
template '/tmp/message' do
  source 'message.erb'
  variables(
    hi: 'Hallo',
    world: 'Welt',
    from: node['fqdn']
  )
end
```

Add the ERB template file:

```
mma@laptop:~/chef-repo $ mkdir -p cookbooks/my_cookbook/templates
mma@laptop:~/chef-repo $ subl cookbooks/my_cookbook/templates/default/message.erb
```

```erb
<%- 4.times do %>
<%= @hi %>, <%= @world %> from <%= @from %>!
<%- end %>
```

Upload the modified cookbook to the Chef server:

```
mma@laptop:~/chef-repo $ knife cookbook upload my_cookbook
Uploading my_cookbook [0.1.0]
```

Run the Chef client on your node:

```
user@server:~$ sudo chef-client
...TRUNCATED OUTPUT...
[2016-11-23T19:36:30+00:00] INFO: Processing template[/tmp/message] action create (my_cookbook::default line 9)
[2016-11-23T19:36:31+00:00] INFO: template[/tmp/message] updated content
...TRUNCATED OUTPUT...
```

Validate the content of the generated file:

```
user@server:~$ sudo cat /tmp/message
Hallo, Welt from vagrant.vm!
Hallo, Welt from vagrant.vm!
Hallo, Welt from vagrant.vm!
Hallo, Welt from vagrant.vm!
```

How it works…

Chef uses Erubis as its template language. It allows you to use pure Ruby code by using special symbols inside your templates (commonly called the 'angry squid'). You use <%= %> if you want to print the value of a variable or Ruby expression into the generated file. You use <%- %> if you want to embed Ruby logic into your template file. We used it to loop our expression four times. When you use the template resource, Chef makes all the variables you pass available as instance variables when rendering the template. We used @hi, @world, and @from in the example above.

There's more…

The node object is available in a template as well.
Technically, you could access node attributes directly from within your template:

```erb
<%= node['fqdn'] %>
```

However, this is not a good idea, because it introduces hidden dependencies into your template. It is better to make dependencies explicit, for example, by declaring the fully qualified domain name (FQDN) of your node as a variable for the template resource inside your cookbook:

```ruby
template '/tmp/fqdn' do
  source 'fqdn.erb'
  variables(
    fqdn: node['fqdn']
  )
end
```

Avoid using the node object directly inside your templates, because this introduces hidden dependencies on node variables in your templates.

If you need a different template for a specific host or platform, you can put those specific templates into various subdirectories of the templates directory. Chef will try to locate the correct template by searching these directories from the most specific (host) to the least specific (default). You can place message.erb in the cookbooks/my_cookbook/templates/host-server.vm ("host-#{node[:fqdn]}") directory if it is specific to that host. If it is platform-specific, you can place it in cookbooks/my_cookbook/templates/ubuntu ("#{node[:platform]}"); and if it is specific to a certain platform version, you can place it in cookbooks/my_cookbook/templates/ubuntu-16.04 ("#{node[:platform]}-#{node[:platform_version]}"). Only place it in the default directory if your template is the same for every host and platform. Note that the templates/default directory means that a template file is the same for all hosts and platforms—it does not correspond to a recipe name.

See also

Read more about templates at https://docs.chef.io/templates.html
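Putting that lookup order together, a hypothetical cookbook might be laid out like this (the host and platform names are assumptions for illustration):

```
cookbooks/my_cookbook/templates/
  host-server.vm/message.erb    # 1. exact host match, checked first
  ubuntu-16.04/message.erb      # 2. platform + version
  ubuntu/message.erb            # 3. platform only
  default/message.erb           # 4. fallback for all other hosts/platforms
```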
Mixing plain Ruby with Chef DSL

To create simple recipes, you only need to use resources provided by Chef, such as template, remote_file, or service. However, as your recipes become more elaborate, you'll discover the need to do more advanced things, such as conditionally executing parts of your recipe, looping, or even making complex calculations. Instead of declaring the gem_package resource ten times with slightly different name attributes, it is so much easier to loop through an array of gem names, creating the gem_package resources on the fly. This is the power of mixing plain Ruby with the Chef Domain Specific Language (DSL). We'll see a few tricks in the following sections.

Getting ready

Start a chef-shell on any of your nodes in client mode to be able to access your Chef server, as shown in the following code:

```
user@server:~$ sudo chef-shell --client
loading configuration: /etc/chef/client.rb
Session type: client
...TRUNCATED OUTPUT...
run `help' for help, `exit' or ^D to quit.
Ohai2u user@server!
chef >
```

How to do it…

Let's play around with some Ruby constructs in chef-shell to get a feel for what's possible.

Get all nodes from the Chef server by using search from the Chef DSL:

```
chef > nodes = search(:node, "hostname:[* TO *]")
=> [#<Chef::Node:0x00000005010d38 @chef_server_rest=nil, @name="server", ...TRUNCATED OUTPUT...
```

Sort your nodes by name by using plain Ruby:

```
chef > nodes.sort! { |a, b| a.hostname <=> b.hostname }.collect { |n| n.hostname }
=> ["alice", "server"]
```

Loop through the nodes, printing their operating systems:

```
chef > nodes.each do |n|
chef >   puts n['os']
chef ?> end
linux
windows
=> [node[server], node[alice]]
```

Log only if there are no nodes:

```
chef > Chef::Log.warn("No nodes found") if nodes.empty?
=> nil
```

Install multiple Ruby gems by using an array, a loop, and string expansion to construct the gem names:

```
chef > recipe_mode
chef:recipe > %w{ec2 essentials}.each do |gem|
chef:recipe >   gem_package "knife-#{gem}"
chef:recipe ?> end
=> ["ec2", "essentials"]
```

How it works...

Chef recipes are Ruby files, which get evaluated in the context of a Chef run. They can contain plain Ruby code, such as if statements and loops, as well as Chef DSL elements such as resources (remote_file, service, template, and so on). Inside your recipes, you can declare Ruby variables and assign them any values. We used the Chef DSL method search to retrieve an array of Chef::Node instances and stored that array in the variable nodes. Because nodes is a plain Ruby array, we can use all the methods the Array class provides, such as sort! or empty?. Also, we can iterate through the array by using the plain Ruby each iterator, as we did in the third example. Another common thing is to use if, else, or case for conditional execution. In the fourth example, we used if to only write a warning to the log file if the nodes array is empty. In the last example, we entered recipe mode and combined an array of strings (holding parts of gem names) and the each iterator with the Chef DSL gem_package resource to install two Ruby gems. To take things one step further, we used plain Ruby string expansion to construct the full gem names (knife-ec2 and knife-essentials) on the fly.

There's more…

You can use the full power of Ruby in combination with the Chef DSL in your recipes. Here is an excerpt from the default recipe of the nagios cookbook, which shows what's possible:

```ruby
# Sort by name to provide stable ordering
nodes.sort! { |a, b| a.name <=> b.name }

# maps nodes into nagios hostgroups
service_hosts = {}
search(:role, '*:*') do |r|
  hostgroups << r.name
  nodes.select { |n| n['roles'].include?(r.name) if n['roles'] }.each do |n|
    service_hosts[r.name] = n[node['nagios']['host_name_attribute']]
  end
end
```

First, they use Ruby to sort an array of nodes by their name attributes. Then, they define a Ruby variable called service_hosts as an empty Hash. After this, you will see some more array methods in action, such as select, include?, and each.

See also

Find out more about how to use Ruby in recipes here: https://docs.chef.io/chef/dsl_recipe.html

There's more…

If you don't want to modify existing cookbooks, this is currently the only way to modify parts of recipes which are not meant to be configured via attributes. This approach is exactly the same as monkey-patching any Ruby class by reopening it in your own source files. It usually leads to brittle code, as your code now depends on implementation details of another piece of code instead of depending on its public interface (in Chef recipes, the public interface is the attributes). Keep such cookbook modifications in a separate place so that you can easily find out what you did later. If you bury your modifications deep inside complicated cookbooks, you might experience issues later that are very hard to debug.

Tabular Models

Packt
16 Jan 2017
15 min read
In this article by Derek Wilson, the author of the book Tabular Modeling with SQL Server 2016 Analysis Services Cookbook, you will learn the following recipes:

- Opening an existing model
- Importing data
- Modifying model relationships
- Modifying model measures
- Modifying model columns
- Modifying model hierarchies
- Creating a calculated table
- Creating key performance indicators (KPIs)
- Modifying key performance indicators (KPIs)
- Deploying a modified model

Once the new data is loaded into the model, we will modify various pieces of the model, including adding a new key performance indicator. Next, we will perform calculations to see how to create and modify measures and columns.

Opening an existing model

To make modifications to your deployed models, you will need to open the model in the Visual Studio designer.

How to do it…

Open your solution by navigating to File | Open | Project/Solution. Then select the folder and solution Chapter3_Model and select Open. Your solution is now open and ready for modification.

How it works…

Visual Studio stores the model as a project inside of a solution. In Chapter 3, we created a new project and saved it as Chapter3_Model. To make modifications to the model, we open it in Visual Studio.

Importing data

The crash data has many columns that store the data in codes. In order to make this data useful for reporting, we need to add description columns. In this section, we will create four code tables by importing data into a SQL Server database. Then, we will add the tables to your existing model.

Getting ready

In the database on your SQL Server, run the following scripts to create the four tables and populate them with the reference data.

Create the Major Cause of Accident reference data table:

```sql
CREATE TABLE [dbo].[MAJCSE_T](
  [MAJCSE] [int] NULL,
  [MAJOR_CAUSE] [varchar](50) NULL
) ON [PRIMARY]
```

Then, populate the table with data:

```sql
INSERT INTO MAJCSE_T VALUES
(20, 'Overall/rollover'), (21, 'Jackknife'), (31, 'Animal'), (32, 'Non-motorist'),
(33, 'Vehicle in Traffic'), (35, 'Parked motor vehicle'), (37, 'Railway vehicle'),
(40, 'Collision with bridge'), (41, 'Collision with bridge pier'), (43, 'Collision with curb'),
(44, 'Collision with ditch'), (47, 'Collision culvert'), (48, 'Collision Guardrail - face'),
(50, 'Collision traffic barrier'), (53, 'impact with Attenuator'), (54, 'Collision with utility pole'),
(55, 'Collision with traffic sign'), (59, 'Collision with mailbox'), (60, 'Collision with Tree'),
(70, 'Fire'), (71, 'Immersion'), (72, 'Hit and Run'), (99, 'Unknown')
```

Create the table to store the lighting conditions at the time of the crash:

```sql
CREATE TABLE [dbo].[LIGHT_T](
  [LIGHT] [int] NULL,
  [LIGHT_CONDITION] [varchar](30) NULL
) ON [PRIMARY]
```

Now, populate the data that shows the descriptions for the codes:

```sql
INSERT INTO LIGHT_T VALUES
(1, 'Daylight'), (2, 'Dusk'), (3, 'Dawn'), (4, 'Dark, roadway lighted'),
(5, 'Dark, roadway not lighted'), (6, 'Dark, unknown lighting'), (9, 'Unknown')
```

Create the table to store the road conditions:

```sql
CREATE TABLE [dbo].[CSRFCND_T](
  [CSRFCND] [int] NULL,
  [SURFACE_CONDITION] [varchar](50) NULL
) ON [PRIMARY]
```

Now populate the road condition descriptions:

```sql
INSERT INTO CSRFCND_T VALUES
(1, 'Dry'), (2, 'Wet'), (3, 'Ice'), (4, 'Snow'), (5, 'Slush'),
(6, 'Sand, Mud'), (7, 'Water'), (99, 'Unknown')
```

Finally, create the weather table:

```sql
CREATE TABLE [dbo].[WEATHER_T](
  [WEATHER] [int] NULL,
  [WEATHER_CONDITION] [varchar](30) NULL
) ON [PRIMARY]
```
Then populate the weather condition descriptions:

```sql
INSERT INTO WEATHER_T VALUES
(1, 'Clear'), (2, 'Partly Cloudy'), (3, 'Cloudy'), (5, 'Mist'), (6, 'Rain'),
(7, 'Sleet, hail, freezing rain'), (9, 'Severe winds'), (10, 'Blowing Sand'), (99, 'Unknown')
```

You now have the tables and data required to complete the recipes in this article.

How to do it…

From your open model, change to the diagram view in model.bim. Navigate to Model | Import from Data Source, then select Microsoft SQL Server on the Table Import Wizard and click on Next. Set your Server Name to Localhost, change the Database name to Chapter3, and click on Next. Enter your admin account username and password and click on Next. From the list of tables, select the four tables that were created at the beginning. Click on Finish to import the data.

How it works…

This recipe opens the Table Import Wizard and allows us to select the four new tables to be added to the existing model. The data is then imported into your tabular model workspace. Once imported, the data is ready to be used to enhance the model.

Modifying model relationships

We will create the necessary relationships for the new tables. These relationships will be used in the model in order for the SSAS engine to perform correct calculations.

How to do it…

Open your model to the diagram view and you will see the four tables that you imported in the previous recipe. Select the CSRFCND field in the CSRFCND_T table and drag it to the CSRFCND field in the Crash_Data table. Select the LIGHT field in the LIGHT_T table and drag it to the LIGHT field in the Crash_Data table. Select the MAJCSE field in the MAJCSE_T table and drag it to the MAJCSE field in the Crash_Data table. Select the WEATHER field in the WEATHER_T table and drag it to the WEATHER field in the Crash_Data table.

How it works…

Each table in this section has a relationship built between its code column and the corresponding column in the Crash_Data table. These relationships allow DAX calculations to be applied across the data tables.

Modifying model measures

Now that there are more tables in the model, we are going to add an additional measure to perform quick calculations on data. The measure will use a simple DAX calculation, since this recipe is focused on how to add or modify model measures.

How to do it…

Open the Chapter 3 model project to the Model.bim folder and make sure you are in grid view. Select the cell under Count_of_Crashes and, in the fx bar, add the following DAX formula to create Sum_of_Fatalities, then hit Enter to create the calculation:

```
Sum_of_Fatalities:=SUM(Crash_Data[FATALITIES])
```

In the Properties window, enter Injury_Calculations in the Display Folder. Then, change the Format to Whole Number and change Show Thousand Separator to True. Finally, add to the Description: Total Number of Fatalities Recorded.

How it works…

In this recipe, we added a new measure to the existing model that calculates the total number of fatalities in the Crash_Data table. Then we added a new folder for the users to see the calculation in. We also modified the default behavior of the calculation to display as a whole number with thousand separators, to make the numbers easier to interpret. Finally, we added a description to the calculation that users will be able to see in the reporting tools. If we did not make these changes in the model, each user would be required to make them every time they accessed the model. By placing the changes in the model, everyone sees the data in the same format.
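With the relationships and the measure pattern above in place, further measures can lean on the new lookup tables. A hedged sketch (the measure name is hypothetical; it assumes the CSRFCND_T relationship created earlier):

```
Wet_Road_Crashes:=CALCULATE(COUNTROWS(Crash_Data), CSRFCND_T[SURFACE_CONDITION] = "Wet")
```

Here, CALCULATE re-evaluates the row count of Crash_Data under a filter on the related surface-condition table; the filter only reaches the fact table because of the relationship defined in the previous recipe.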
Modifying model columns

We will modify the properties of the columns in the WEATHER_T table. Modifications to the columns in a table make the information easier for your users to understand in the reporting tools. Some properties also determine how the SSAS engine uses the fields when creating the model on the server.

How to do it…

In Model.bim, make sure you are in the grid view and change to the WEATHER_T tab. Select the WEATHER column to view the available Properties and make the following changes:

- Hidden property to True
- Unique property to True
- Sort By Column: select WEATHER_CONDITION
- Summarize By to Count

Next, select the WEATHER_CONDITION column and modify the following properties:

- Description: add Weather at time of crash
- Default Label property to True

How it works…

This recipe modified the properties of the columns to make the data easier for your report users to access. The WEATHER code column was hidden, so it will not be visible in the reporting tools, and WEATHER_CONDITION was sorted in alphabetical order. You set the default aggregation to Count and then added a description for the column. Now, when this dimension is added to a report, only the WEATHER_CONDITION column will be seen, pre-sorted on the WEATHER_CONDITION field. It will also use count as the aggregation type to provide the number of each type of weather condition. If you were to add another new description to the table, it would automatically be sorted correctly.

Modifying model hierarchies

Once you have created a hierarchy, you may want to remove or modify it in your model. We will make modifications to the Calendar_YQMD hierarchy.

How to do it…

Open Model.bim to the diagram view and find the Master_Calendar_T table. Review the Calendar_YQMD hierarchy and its included columns. Select the Quarter_Name column and right-click on it to bring up the menu. Select Remove from Hierarchy to delete Quarter_Name from the hierarchy, and confirm on the next screen by selecting Remove from Hierarchy. Select the Calendar_YQMD hierarchy, right-click on it, and select Rename. Change the name to Calendar_YMD and hit Enter.

How it works…

In this recipe, we opened the diagram view and selected the Master_Calendar_T table to find the existing hierarchy. After selecting the Quarter_Name column in the hierarchy, we used the menus to view the available options for modifications. Then we selected the option to remove the column from the hierarchy. Finally, we updated the name of the hierarchy to let users know that the quarter column is no longer included.

There's more…

Another option to remove fields from the hierarchy is to select the column and then press the Delete key. Likewise, you can double-click on the Calendar_YQMD hierarchy to bring up the edit window for the name. Then edit the name and hit Enter to save the change in the designer.

Creating a calculated table

Calculated tables are created dynamically using functions or DAX queries. They are very useful if you need to create a new table based on information in another table. For example, you could have a date table with 30 years of data, while most of your users only look at the last five years of information when running their analysis. Instead of creating a new physical table, you can dynamically make a new table that only stores the last five years of dates. You will use a single DAX query to filter the Master_Calendar_T table to the last 5 years of data.

How to do it…

Open Model.bim to the grid view, then select the Table menu and New Calculated Table.
A new data tab is created. In the function box, enter this DAX formula to create a date calendar for the last 5 years:

```
FILTER(MasterCalendar_T, MasterCalendar_T[Date]>=DATEADD(MasterCalendar_T[Date],6,YEAR))
```

Double-click on the CalculatedTable 1 tab and rename it to Last_5_Years_T.

How it works…

This works by creating a new table in the model that is built from a DAX formula. In order to limit the number of years shown, the DAX formula reduces the dates available to the last 5 years.

There's more…

After you create a calculated table, you will need to create the necessary relationships and hierarchies, just like for a regular table. Switch to the diagram view in Model.bim and you will be able to see the new table. Create a new hierarchy, name it Last_5_Years_YQM, and include Year, Quarter_Name, Month_Name, and Date. Replace the Master_Calendar_T relationship with a relationship from the Date column of Last_5_Years_T to the Crash_Data.Crash_Date column. Now the model will only display the last 5 years of crash data when using the Last_5_Years_T table in the reporting tools. The Crash_Data table still contains all of the records if you need to view more than 5 years of data.

Creating key performance indicators (KPIs)

Key performance indicators are business metrics that show the effectiveness of a business objective. They are used to track actual performance against a budgeted or planned value, such as service level agreements or on-time performance. The advantage of creating a KPI is the ability to quickly see the actual value compared to the target value. To add a KPI, you will need a measure to use as the actual value and another measure that returns the target value. In this recipe, we will create a KPI that tracks the number of fatalities and compares it to the prior year, with the goal of having fewer fatalities each year.

How to do it…

Open Model.bim to the grid view, select an empty cell, and create a new measure named Last_Year_Fatalities:

```
Last_Year_Fatalities:=CALCULATE(SUM(Crash_Data[FATALITIES]), DATEADD(MasterCalendar_T[Date],-1,YEAR))
```

Select the already existing Sum_of_Fatalities measure, then right-click and select Create KPI…. In the Key Performance Indicator (KPI) window, select Last_Year_Fatalities as the Target Measure. Then, select the second set of icons, the red, yellow, and green ones with symbols. Finally, change the KPI color scheme to green, yellow, and red, make the scores 90 and 97, and then click on OK. The Sum_of_Fatalities measure will now have a small graph next to it in the measure grid to show that there is a KPI on that measure.

How it works…

You created a new calculation that compares the actual count of fatalities to the same number for the prior year. Then you created a new KPI that uses the actual and Last_Year_Fatalities measures. In the KPI window, you set up thresholds to determine when the KPI is red, yellow, or green. For this example, you want to show that having fewer fatalities year over year is better. Therefore, when the KPI value is 97% or higher, the KPI shows red. For values in the range of 90% to 97% the KPI is yellow, and anything below 90% is green. By selecting the icons with both color and symbols, users that are color-blind can still determine the state of the KPI.

Modifying key performance indicators (KPIs)

Once you have created a KPI, you may want to remove or modify it. You will make modifications to the KPI defined on the Sum_of_Fatalities measure.
How to do it…

Open Model.bim to the grid view, select the Sum_of_Fatalities measure, and right-click to bring up Edit KPI settings…. Edit the appropriate settings to modify the existing KPI.

How it works…

Just like models, KPIs need to be modified after being initially designed. The icon next to a measure denotes that a KPI is defined on that measure. Right-clicking on the measure brings up the menu that allows you to enter the Edit KPI settings.

Deploying a modified model

Once you have completed the changes to your model, you have two options for deployment. First, you can deploy the model and replace the existing model. Alternatively, you can change the name of your model and deploy it as a new model. This is often useful when you need to test changes and maintain the existing model as is.

How to do it…

Open the Chapter3_Model project in Visual Studio. Select the Project menu and select Chapter3_Model Properties… to bring up the properties menu, and review the Server and Database properties. To overwrite an existing model, make no changes and click on OK. Select the Build menu from the Chapter3_Model project and select the Deploy Chapter3_Model option. On the following screens, enter the impersonation credentials for your data and hit OK to deploy the changes.

How it works…

Deployment takes the model that is on your local machine and submits the changes to the server. By not making any changes to the existing model properties, a new deployment will overwrite the old model. All of your changes are now published on the server, and users can begin to leverage them.

There's more…

Sometimes you might want to deploy your model to a different database without overwriting the existing environment. This could be to try out a new model, or to test different functionality with users before you implement it. You can modify the properties of the project to deploy to a different server, such as development, UAT, or production. Likewise, you can change the database name to deploy the model under a new name to the same server or to different servers for testing. Open the Project menu and then select Chapter3_Model Properties. Change the name of the Database to Chapter4_Model and click on OK. Next, on the Build menu, select Deploy Chapter3_Model to deploy the model to the same server under the new name of Chapter4_Model. When you review the Analysis Services databases in SQL Server Management Studio, you will now see a database for Chapter3_Model and one for Chapter4_Model.

Summary

After building a model, we need to maintain and enhance it as the business users update or change their requirements. We began by adding additional tables to the model that contain the descriptive data columns for several code columns. Then we created relationships between these new tables and the existing data tables.

How to Add an Intent to Your Amazon Echo Skill

Antonio Cucciniello
16 Jan 2017
5 min read
If you are new to Amazon Echo development and have no idea what an intent is, you have come to the right place. What is an intent, you ask? It is basically a way for your skill to know what to do when you say a specific phrase to your Amazon Echo. For example, if I said "Alexa, ask Greeter to say goodbye," and I wanted Alexa to respond with "Goodbye," I would create an intent that handles what the Echo does (and ultimately how it responds) when it hears the phrase I gave it. Without adding your own intents, the Echo skill would not be able to handle the different functions you would like it to. This involves two different kinds of work: one in your Amazon Developer Portal and the other in your code.

Developer Portal

First, we should talk about the Developer Portal work. There are two parts to setting up your Developer Portal for an intent.

Intent Schema

When you are logged in to the Developer Portal for your skill, go to the Interaction Model section on the left-hand side. Once you are on that page, go to the Intent Schema section. Your Intent Schema is where you create your intents in JSON format so Alexa can understand them. For a basic example, we will be creating an intent called HelloWorldIntent (a sketch of what the schema and utterances can look like follows this section). Once your Intent Schema declares HelloWorldIntent, you have added a new intent to your project. In order for Alexa to know when you want to select this intent, you need to add Sample Utterances.

Sample Utterances

If you scroll down to the bottom of the page where we edited the Intent Schema, you will see a section called Sample Utterances. A Sample Utterance is an example phrase that a user would say to invoke a specific intent. This is how Alexa takes what you say and knows what to do with it. The format is: IntentName + the phrase you would like to say. So now, when a user is using this skill, they can say "Alexa, ask Greeter to say hello." Alexa takes that phrase and knows what to do with it, because you have written code to handle this intent.
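Since the original screenshots are not reproduced here, the following is a hedged sketch of what the two Developer Portal pieces might contain for this skill. It uses the legacy custom-skill schema format from early 2017, and the exact utterance phrases are assumptions:

```json
{
  "intents": [
    { "intent": "HelloWorldIntent" }
  ]
}
```

And the Sample Utterances, one per line, each starting with the intent name:

```
HelloWorldIntent say hello
HelloWorldIntent to say hello
HelloWorldIntent hello
```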
In Code

Up to this point, everything you have done has been in the Amazon Developer Portal for your skill. Now that the boring part is out of the way, you can learn how to actually handle the intent's functionality through code.

Intent Function

The first step is to create a file in the project's root directory called hello-world-intent.js. This file contains a function for handling your HelloWorldIntent. Here is what the code should look like for basic handling of the intent:

```javascript
// hello-world-intent.js
module.exports = HelloWorldIntent

function HelloWorldIntent (intent, session, response) {
  var output = 'Hello World'
  response.tell(output)
  return
}
```

If you are familiar with Node.js, you probably know of module.exports, which allows the programmer to pull in code from different files using require('./fileName.js'). This lets us use the HelloWorldIntent function in our main JS file. Our function is pretty basic: it takes three parameters: intent, session, and response. For the purpose of this lesson, we will only be using response. Response allows you to send multiple types of responses through Alexa. For this example, we will be using response.tell(). At the beginning of our function, we created a basic string that says "Hello World" and stored it in a variable called output. The line response.tell(output) tells Alexa to respond to the user with "Hello World".

Main file

In your main JS file (the one you point AWS Lambda to), add the following line in the top section of your file:

```javascript
// main.js
var HelloWorldIntent = require('./hello-world-intent.js')
```

This is why we used module.exports before: in order to be able to import your HelloWorldIntent function into the main file. Now, go to the section of your file where you have ServiceName.prototype.intentHandlers = {}. This is how you connect the work you did in the Developer Portal to the work you did in your code. To add an intent handler for an intent, add a line inside the brackets like this: 'IntentNameInDevPortal' : IntentFunctionCodeName. So in the end your code should look as follows:

```javascript
GreeterService.prototype.intentHandlers = {
  'HelloWorldIntent' : HelloWorldIntent
}
```

Conclusion

Congrats! You have just added your first intent from scratch. To summarize what happens with your skill from start to finish when invoking an intent, here is a quick breakdown:

- You say, "Alexa, ask Greeter to say hello."
- Alexa listens to what you are saying.
- She takes the words you said and tries to figure out what to do with them by comparing them to the Sample Utterances for the intents you have created.
- She realizes you are invoking the HelloWorldIntent.
- She executes the code you have written for the HelloWorldIntent (the code in the HelloWorldIntent function).
- She responds with "Hello World."

Possible Resources

- Use my skill here: Edit Docs
- Check out the code for my skill on GitHub
- Alexa Skills Kit Custom Interaction Model Reference
- Implementing the Built-in Intents

About the author

Antonio is a software engineer with a background in C, C++, and JavaScript (Node.js) from New Jersey. His most recent project, called Edit Docs, is an Amazon Echo skill that allows users to edit Google Drive files using their voice. He loves building cool things with software, and reading books on self-help and improvement, finance, and entrepreneurship. To contact Antonio, email him at Antonio.cucciniello16@gmail.com, follow him on Twitter at @antocucciniello, and follow him on GitHub.

Flink Complex Event Processing

Packt
16 Jan 2017
13 min read
In this article by Tanmay Deshpande, the author of the book Mastering Apache Flink, we will start learning more about the libraries provided by Apache Flink and how we can use them for specific use cases. To start with, let's try to understand a library called complex event processing (CEP). CEP is a very interesting but complex topic that has its value in various industries. Wherever there is a stream of events expected, naturally people want to perform complex event processing. Let's try to understand what CEP is all about.

What is complex event processing?

CEP is a technique to analyze streams of disparate events occurring with high frequency and low latency. These days, streaming events can be found in various industries, for example:

- In the oil and gas domain, sensor data comes from various drilling tools or from upstream oil pipeline equipment
- In the security domain, activity data, malware information, and usage pattern data come from various endpoints
- In the wearables domain, data comes from various wrist bands with information about your heart rate, your activity, and so on
- In the banking domain, data comes from credit card usage, banking activities, and so on

It is very important to analyze variation patterns to get notified in real time about any change in the regular assembly. CEP is able to understand patterns across streams of events, sub-events, and their sequences. CEP helps to identify meaningful patterns and complex relationships among unrelated events, and sends notifications in real and near real time to avoid any damage. In short, a CEP flow matches an incoming event stream against defined patterns and produces alerts as soon as a pattern is detected. Even though the flow looks simple, CEP has various abilities, such as:

- The ability to produce results as soon as the input event stream is available
- The ability to provide computations such as aggregation over time and timeouts between two events of interest
- The ability to provide real-time/near real-time alerts and notifications on detection of complex event patterns
- The ability to connect and correlate heterogeneous sources and analyze patterns in them
- The ability to achieve high-throughput, low-latency processing

There are various solutions available in the market. With big data technology advancements, we have multiple options such as Apache Spark, Apache Samza, and Apache Beam, among others, but none of them have a dedicated library to fit all solutions. Now let us try to understand what we can achieve with Flink's CEP library.

Flink CEP

Apache Flink provides the Flink CEP library, which provides APIs to perform complex event processing. The library consists of the following core components:

- Event stream
- Pattern definition
- Pattern detection
- Alert generation

Flink CEP works on Flink's streaming API called DataStream. A programmer needs to define the pattern to be detected from the stream of events, and then Flink's CEP engine detects the pattern and takes appropriate action, such as generating alerts. In order to get started, we need to add the following Maven dependency:

```xml
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-cep-scala_2.10 -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-cep-scala_2.10</artifactId>
  <version>1.1.2</version>
</dependency>
```

Event stream

A very important component of CEP is its input event stream. We have seen the details of the DataStream API.
Now let's use that knowledge to implement CEP. The very first thing we need to do is define a Java POJO for the event. Let's assume we need to monitor a temperature sensor event stream. First we define an abstract class, and then we extend it. While defining the event POJOs, we need to make sure that we implement the hashCode() and equals() methods, as they are used when events are compared. The following code snippets demonstrate this. First, we write an abstract class as shown here:

```java
package com.demo.chapter05;

public abstract class MonitoringEvent {

    private String machineName;

    public String getMachineName() {
        return machineName;
    }

    public void setMachineName(String machineName) {
        this.machineName = machineName;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((machineName == null) ? 0 : machineName.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        MonitoringEvent other = (MonitoringEvent) obj;
        if (machineName == null) {
            if (other.machineName != null)
                return false;
        } else if (!machineName.equals(other.machineName))
            return false;
        return true;
    }

    public MonitoringEvent(String machineName) {
        super();
        this.machineName = machineName;
    }
}
```

Then we write the actual temperature event:

```java
package com.demo.chapter05;

public class TemperatureEvent extends MonitoringEvent {

    public TemperatureEvent(String machineName) {
        super(machineName);
    }

    private double temperature;

    public double getTemperature() {
        return temperature;
    }

    public void setTemperature(double temperature) {
        this.temperature = temperature;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = super.hashCode();
        long temp;
        temp = Double.doubleToLongBits(temperature);
        result = prime * result + (int) (temp ^ (temp >>> 32));
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (!super.equals(obj))
            return false;
        if (getClass() != obj.getClass())
            return false;
        TemperatureEvent other = (TemperatureEvent) obj;
        if (Double.doubleToLongBits(temperature) != Double.doubleToLongBits(other.temperature))
            return false;
        return true;
    }

    public TemperatureEvent(String machineName, double temperature) {
        super(machineName);
        this.temperature = temperature;
    }

    @Override
    public String toString() {
        return "TemperatureEvent [getTemperature()=" + getTemperature() + ", getMachineName()=" + getMachineName() + "]";
    }
}
```

Now we can define the event source as follows.
In Java:

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<TemperatureEvent> inputEventStream = env.fromElements(
    new TemperatureEvent("xyz", 22.0), new TemperatureEvent("xyz", 20.1),
    new TemperatureEvent("xyz", 21.1), new TemperatureEvent("xyz", 22.2),
    new TemperatureEvent("xyz", 22.1), new TemperatureEvent("xyz", 22.3),
    new TemperatureEvent("xyz", 22.1), new TemperatureEvent("xyz", 22.4),
    new TemperatureEvent("xyz", 22.7), new TemperatureEvent("xyz", 27.0));
```

In Scala:

```scala
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

val input: DataStream[TemperatureEvent] = env.fromElements(
  new TemperatureEvent("xyz", 22.0), new TemperatureEvent("xyz", 20.1),
  new TemperatureEvent("xyz", 21.1), new TemperatureEvent("xyz", 22.2),
  new TemperatureEvent("xyz", 22.1), new TemperatureEvent("xyz", 22.3),
  new TemperatureEvent("xyz", 22.1), new TemperatureEvent("xyz", 22.4),
  new TemperatureEvent("xyz", 22.7), new TemperatureEvent("xyz", 27.0))
```

Pattern API

The Pattern API allows you to define complex event patterns very easily. Each pattern consists of multiple states. To go from one state to another, we generally need to define conditions. The conditions can be continuity or filtered-out events. Let's try to understand each pattern operation in detail.

Begin

The initial state can be defined as follows.

In Java:

```java
Pattern<Event, ?> start = Pattern.<Event>begin("start");
```

In Scala:

```scala
val start : Pattern[Event, _] = Pattern.begin("start")
```

Filter

We can also specify a filter condition for the initial state.

In Java:

```java
start.where(new FilterFunction<Event>() {
    @Override
    public boolean filter(Event value) {
        return ... // condition
    }
});
```

In Scala:

```scala
start.where(event => ... /* condition */)
```

Subtype

We can also filter out events based on their subtypes, using the subtype() method.

In Java:

```java
start.subtype(SubEvent.class).where(new FilterFunction<SubEvent>() {
    @Override
    public boolean filter(SubEvent value) {
        return ... // condition
    }
});
```

In Scala:

```scala
start.subtype(classOf[SubEvent]).where(subEvent => ... /* condition */)
```

Or

The Pattern API also allows us to define multiple conditions together. We can use the OR and AND operators.

In Java:

```java
pattern.where(new FilterFunction<Event>() {
    @Override
    public boolean filter(Event value) {
        return ... // condition
    }
}).or(new FilterFunction<Event>() {
    @Override
    public boolean filter(Event value) {
        return ... // or condition
    }
});
```

In Scala:

```scala
pattern.where(event => ... /* condition */).or(event => ... /* or condition */)
```

Continuity

As stated earlier, we do not always need to filter out events. There can be patterns where we need continuity instead of filters. Continuity can be of two types: strict continuity and non-strict continuity.

Strict continuity

Strict continuity needs two events to succeed directly, which means there should be no other event in between. This pattern can be defined by next().

In Java:

```java
Pattern<Event, ?> strictNext = start.next("middle");
```

In Scala:

```scala
val strictNext: Pattern[Event, _] = start.next("middle")
```

Non-strict continuity

Non-strict continuity allows other events to occur between the two specific events. This pattern can be defined by followedBy().

In Java:

```java
Pattern<Event, ?> nonStrictNext = start.followedBy("middle");
```

In Scala:

```scala
val nonStrictNext : Pattern[Event, _] = start.followedBy("middle")
```

Within

The Pattern API also allows us to do pattern matching based on time intervals. We can define a time-based temporal constraint as follows.

In Java:

```java
next.within(Time.seconds(30));
```

In Scala:

```scala
next.within(Time.seconds(10))
```
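To tie these operations together, here is a hedged sketch (not from the book) chaining begin, where, followedBy, and within into a single pattern over the TemperatureEvent stream defined earlier: two readings of 26.0 degrees or more, possibly with other events in between, within 10 seconds.

```java
// Hypothetical combined pattern: a hot reading followed (non-strictly)
// by another hot reading within 10 seconds.
Pattern<TemperatureEvent, ?> riseTwice = Pattern.<TemperatureEvent>begin("first")
    .where(new FilterFunction<TemperatureEvent>() {
        @Override
        public boolean filter(TemperatureEvent value) {
            return value.getTemperature() >= 26.0;
        }
    })
    .followedBy("second")
    .where(new FilterFunction<TemperatureEvent>() {
        @Override
        public boolean filter(TemperatureEvent value) {
            return value.getTemperature() >= 26.0;
        }
    })
    .within(Time.seconds(10));
```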
Within The Pattern API also allows us to do pattern matching based on time intervals. We can define a time-based temporal constraint as follows. In Java: next.within(Time.seconds(30)); In Scala: next.within(Time.seconds(10)) Detecting patterns To detect patterns against the stream of events, we need to run the stream through the pattern. CEP.pattern() returns a PatternStream. The following code snippet shows how we can detect a pattern. First, the pattern is defined to check whether the temperature reaches 26.0 degrees or more within 10 seconds. In Java: Pattern<TemperatureEvent, ?> warningPattern = Pattern.<TemperatureEvent> begin("first") .subtype(TemperatureEvent.class).where(new FilterFunction<TemperatureEvent>() { public boolean filter(TemperatureEvent value) { if (value.getTemperature() >= 26.0) { return true; } return false; } }).within(Time.seconds(10)); PatternStream<TemperatureEvent> patternStream = CEP.pattern(inputEventStream, warningPattern); In Scala: val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment val input = // data val pattern: Pattern[TempEvent, _] = Pattern.begin("start").where(event => event.temp >= 26.0) val patternStream: PatternStream[TempEvent] = CEP.pattern(input, pattern) Use case – complex event processing on temperature sensor In the earlier sections, we learnt about the various features provided by the Flink CEP engine. Now it's time to understand how we can use it in real-world solutions. For that, let's assume we work for a mechanical company which produces some products. In the product factory, there is a need to constantly monitor certain machines. The factory has already set up sensors which keep sending the temperature of the machines at given intervals. Now we will be setting up a system that constantly monitors the temperature value and generates an alert if the temperature exceeds a certain value. We can use an architecture in which the sensors publish events to Kafka and Flink consumes and analyzes them (the original architecture diagram is omitted here). Here we will be using Kafka to collect events from sensors. In order to write a Java application, we first need to create a Maven project and add the following dependencies: <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-cep-scala_2.10 --> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-cep-scala_2.10</artifactId> <version>1.1.2</version> </dependency> <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java_2.10 --> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java_2.10</artifactId> <version>1.1.2</version> </dependency> <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-scala_2.10 --> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-scala_2.10</artifactId> <version>1.1.2</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-connector-kafka-0.9_2.10</artifactId> <version>1.0.0</version> </dependency> Next, we need to do the following things to use Kafka. First, we need to define a custom Kafka deserializer. This will read bytes from a Kafka topic and convert them into TemperatureEvent objects. The following is the code to do this.
EventDeserializationSchema.java: package com.demo.chapter05; import java.io.IOException; import java.nio.charset.StandardCharsets; import org.apache.flink.api.common.typeinfo.TypeInformation; import org.apache.flink.api.java.typeutils.TypeExtractor; import org.apache.flink.streaming.util.serialization.DeserializationSchema; public class EventDeserializationSchema implements DeserializationSchema<TemperatureEvent> { public TypeInformation<TemperatureEvent> getProducedType() { return TypeExtractor.getForClass(TemperatureEvent.class); } public TemperatureEvent deserialize(byte[] arg0) throws IOException { String str = new String(arg0, StandardCharsets.UTF_8); String[] parts = str.split("="); return new TemperatureEvent(parts[0], Double.parseDouble(parts[1])); } public boolean isEndOfStream(TemperatureEvent arg0) { return false; } } Next, we create a Kafka topic called temperature: bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic temperature Now we move to the Java code, which will listen to these events in a Flink stream: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); Properties properties = new Properties(); properties.setProperty("bootstrap.servers", "localhost:9092"); properties.setProperty("group.id", "test"); DataStream<TemperatureEvent> inputEventStream = env.addSource( new FlinkKafkaConsumer09<TemperatureEvent>("temperature", new EventDeserializationSchema(), properties)); Next, we will define the pattern to check if the temperature is greater than 26.0 degrees Celsius within 10 seconds: Pattern<TemperatureEvent, ?> warningPattern = Pattern.<TemperatureEvent> begin("first").subtype(TemperatureEvent.class).where(new FilterFunction<TemperatureEvent>() { private static final long serialVersionUID = 1L; public boolean filter(TemperatureEvent value) { if (value.getTemperature() >= 26.0) { return true; } return false; } }).within(Time.seconds(10)); Next, we match this pattern against the stream of events and select the matching event. We will also feed the alert messages into a results stream, as shown here: DataStream<Alert> patternStream = CEP.pattern(inputEventStream, warningPattern) .select(new PatternSelectFunction<TemperatureEvent, Alert>() { private static final long serialVersionUID = 1L; public Alert select(Map<String, TemperatureEvent> event) throws Exception { return new Alert("Temperature Rise Detected:" + event.get("first").getTemperature() + " on machine name:" + event.get("first").getMachineName()); } }); In order to know the alerts generated, we will print the results: patternStream.print(); And we execute the stream: env.execute("CEP on Temperature Sensor");
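The Alert POJO used in the select function is not shown in the original snippets; a minimal sketch (its single message field and the toString() format are assumptions inferred from the example output below) could look like this:

package com.demo.chapter05;

public class Alert {
    private String message;

    public Alert(String message) {
        this.message = message;
    }

    public String getMessage() {
        return message;
    }

    @Override
    public String toString() {
        // matches the "Alert [message=...]" lines in the example output
        return "Alert [message=" + message + "]";
    }
}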
Now we are all set to execute the application. As and when we get messages in the Kafka topic, the CEP will keep executing. The actual execution will look like the following. Example input: xyz=21.0 xyz=30.0 LogShaft=29.3 Boiler=23.1 Boiler=24.2 Boiler=27.0 Boiler=29.0 Example output: Connected to JobManager at Actor[akka://flink/user/jobmanager_1#1010488393] 10/09/2016 18:15:55 Job execution switched to status RUNNING. 10/09/2016 18:15:55 Source: Custom Source(1/4) switched to SCHEDULED 10/09/2016 18:15:55 Source: Custom Source(1/4) switched to DEPLOYING 10/09/2016 18:15:55 Source: Custom Source(2/4) switched to SCHEDULED 10/09/2016 18:15:55 Source: Custom Source(2/4) switched to DEPLOYING 10/09/2016 18:15:55 Source: Custom Source(3/4) switched to SCHEDULED 10/09/2016 18:15:55 Source: Custom Source(3/4) switched to DEPLOYING 10/09/2016 18:15:55 Source: Custom Source(4/4) switched to SCHEDULED 10/09/2016 18:15:55 Source: Custom Source(4/4) switched to DEPLOYING 10/09/2016 18:15:55 CEPPatternOperator(1/1) switched to SCHEDULED 10/09/2016 18:15:55 CEPPatternOperator(1/1) switched to DEPLOYING 10/09/2016 18:15:55 Map -> Sink: Unnamed(1/4) switched to SCHEDULED 10/09/2016 18:15:55 Map -> Sink: Unnamed(1/4) switched to DEPLOYING 10/09/2016 18:15:55 Map -> Sink: Unnamed(2/4) switched to SCHEDULED 10/09/2016 18:15:55 Map -> Sink: Unnamed(2/4) switched to DEPLOYING 10/09/2016 18:15:55 Map -> Sink: Unnamed(3/4) switched to SCHEDULED 10/09/2016 18:15:55 Map -> Sink: Unnamed(3/4) switched to DEPLOYING 10/09/2016 18:15:55 Map -> Sink: Unnamed(4/4) switched to SCHEDULED 10/09/2016 18:15:55 Map -> Sink: Unnamed(4/4) switched to DEPLOYING 10/09/2016 18:15:55 Source: Custom Source(2/4) switched to RUNNING 10/09/2016 18:15:55 Source: Custom Source(3/4) switched to RUNNING 10/09/2016 18:15:55 Map -> Sink: Unnamed(1/4) switched to RUNNING 10/09/2016 18:15:55 Map -> Sink: Unnamed(2/4) switched to RUNNING 10/09/2016 18:15:55 Map -> Sink: Unnamed(3/4) switched to RUNNING 10/09/2016 18:15:55 Source: Custom Source(4/4) switched to RUNNING 10/09/2016 18:15:55 Source: Custom Source(1/4) switched to RUNNING 10/09/2016 18:15:55 CEPPatternOperator(1/1) switched to RUNNING 10/09/2016 18:15:55 Map -> Sink: Unnamed(4/4) switched to RUNNING 1> Alert [message=Temperature Rise Detected:30.0 on machine name:xyz] 2> Alert [message=Temperature Rise Detected:29.3 on machine name:LogShaft] 3> Alert [message=Temperature Rise Detected:27.0 on machine name:Boiler] 4> Alert [message=Temperature Rise Detected:29.0 on machine name:Boiler] We can also configure a mail client and use an external web hook to send e-mail or messenger notifications. The code for the application can be found on GitHub: https://github.com/deshpandetanmay/mastering-flink. Summary We learnt about complex event processing (CEP). We discussed the challenges involved and how we can use the Flink CEP library to solve CEP problems. We also learnt about the Pattern API and the various operators we can use to define patterns. In the final section, we tried to connect the dots and see one complete use case. With minor changes, this setup can also be used in various other domains. Next, we will see how to use Flink's built-in Machine Learning library to solve complex problems. Resources for Article: Further resources on this subject: Getting Started with Apache Spark DataFrames [article] Getting Started with Apache Hadoop and Apache Spark [article] Integrating Scala, Groovy, and Flex Development with Apache Maven [article]

Basic Operations of Elasticsearch
Packt
16 Jan 2017
10 min read
In this article by Alberto Maria Angelo Paro, the author of the book ElasticSearch 5.0 Cookbook - Third Edition, you will learn the following recipes: Creating an index Deleting an index Opening/closing an index Putting a mapping in an index Getting a mapping (For more resources related to this topic, see here.) Creating an index The first operation to do before starting to index data in Elasticsearch is to create an index--the main container of our data. An index is similar to the concept of a database in SQL: a container for types (tables in SQL) and documents (records in SQL). Getting ready To execute curl via the command line, you need to install curl for your operating system. How to do it... The HTTP method to create an index is PUT (but POST also works); the REST URL contains the index name: http://<server>/<index_name> For creating an index, we will perform the following steps: From the command line, we can execute a PUT call: curl -XPUT http://127.0.0.1:9200/myindex -d '{ "settings" : { "index" : { "number_of_shards" : 2, "number_of_replicas" : 1 } } }' The result returned by Elasticsearch should be: {"acknowledged":true,"shards_acknowledged":true} If the index already exists, a 400 error is returned: { "error" : { "root_cause" : [ { "type" : "index_already_exists_exception", "reason" : "index [myindex/YJRxuqvkQWOe3VuTaTbu7g] already exists", "index_uuid" : "YJRxuqvkQWOe3VuTaTbu7g", "index" : "myindex" } ], "type" : "index_already_exists_exception", "reason" : "index [myindex/YJRxuqvkQWOe3VuTaTbu7g] already exists", "index_uuid" : "YJRxuqvkQWOe3VuTaTbu7g", "index" : "myindex" }, "status" : 400 } How it works... Because the index name will be mapped to a directory on your storage, there are some limitations to the index name, and the only accepted characters are: ASCII letters [a-z] Numbers [0-9] point ".", minus "-", "&" and "_" During index creation, the replication can be set with two parameters in the settings/index object: number_of_shards, which controls the number of shards that compose the index (every shard can store up to about 2^31 documents) number_of_replicas, which controls the number of replicas (how many times your data is replicated in the cluster for high availability). A good practice is to set this value to at least 1. The API call initializes a new index, which means: The index is created on a primary node first and then its status is propagated to all nodes of the cluster A default mapping (empty) is created All the shards required by the index are initialized and ready to accept data The index creation API allows defining the mappings at creation time. The parameter used to define them is mappings, and it accepts multiple mappings. So, in a single call it is possible to create an index and put the required mappings. There's more... The create index command also allows passing the mappings section, which contains the mapping definitions.
It is a shortcut to create an index with mappings, without executing an extra PUT mapping call: curl -XPOST localhost:9200/myindex -d '{ "settings" : { "number_of_shards" : 2, "number_of_replicas" : 1 }, "mappings" : { "order" : { "properties" : { "id" : {"type" : "keyword", "store" : "yes"}, "date" : {"type" : "date", "store" : "no" , "index":"not_analyzed"}, "customer_id" : {"type" : "keyword", "store" : "yes"}, "sent" : {"type" : "boolean", "index":"not_analyzed"}, "name" : {"type" : "text", "index":"analyzed"}, "quantity" : {"type" : "integer", "index":"not_analyzed"}, "vat" : {"type" : "double", "index":"no"} } } } }' Deleting an index The counterpart of creating an index is deleting one. Deleting an index means deleting its shards, mappings, and data. There are many common scenarios when we need to delete an index, such as: Removing the index to clean unwanted/obsolete data (for example, old Logstash indices). Resetting an index for a scratch restart. Deleting an index that has some missing shard, mainly due to some failures, to bring the cluster back to a valid state (if a node dies and it's storing a single replica shard of an index, this index is missing a shard, so the cluster state becomes red. In this case, you'll bring the cluster back to a green status, but you lose the data contained in the deleted index). Getting ready To execute curl via the command line, you need to install curl for your operating system. The index created in the previous recipe is also required, as it will be deleted. How to do it... The HTTP method used to delete an index is DELETE. The following URL contains only the index name: http://<server>/<index_name> For deleting an index, we will perform the steps given as follows: Execute a DELETE call, by writing the following command: curl -XDELETE http://127.0.0.1:9200/myindex We check the result returned by Elasticsearch. If everything is all right, it should be: {"acknowledged":true} If the index doesn't exist, a 404 error is returned: { "error" : { "root_cause" : [ { "type" : "index_not_found_exception", "reason" : "no such index", "resource.type" : "index_or_alias", "resource.id" : "myindex", "index_uuid" : "_na_", "index" : "myindex" } ], "type" : "index_not_found_exception", "reason" : "no such index", "resource.type" : "index_or_alias", "resource.id" : "myindex", "index_uuid" : "_na_", "index" : "myindex" }, "status" : 404 } How it works... When an index is deleted, all the data related to the index is removed from disk and is lost. During the delete processing, first the cluster is updated, and then the shards are deleted from the storage. This operation is very fast; in a traditional filesystem it is implemented as a recursive delete. It's not possible to restore a deleted index if there is no backup. The special _all index name can also be used to remove all the indices. In production it is good practice to disable the deletion of all indices by adding the following line to elasticsearch.yml: action.destructive_requires_name: true
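As an illustration (the index pattern below is hypothetical), several indices can be removed in a single call by listing them comma-separated or by using a wildcard, provided the destructive_requires_name protection described above has not been enabled:

curl -XDELETE 'http://127.0.0.1:9200/logstash-2016.09.*,logstash-2016.10.*'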
Opening/closing an index If you want to keep your data but save resources (memory/CPU), a good alternative to deleting indices is to close them. Elasticsearch allows you to open/close an index to put it into online/offline mode. Getting ready To execute curl via the command line, you need to install curl for your operating system. How to do it... For opening/closing an index, we will perform the following steps: From the command line, we can execute a POST call to close an index using: curl -XPOST http://127.0.0.1:9200/myindex/_close If the call is successful, the result returned by Elasticsearch should be: {"acknowledged":true} To open an index, from the command line, type the following command: curl -XPOST http://127.0.0.1:9200/myindex/_open If the call is successful, the result returned by Elasticsearch should be: {"acknowledged":true} How it works... When an index is closed, there is no overhead on the cluster (except for metadata state): the index shards are switched off and they don't use file descriptors, memory, or threads. There are many use cases for closing an index: Disabling date-based indices (indices that store their records by date), for example, when you keep an index for a week, month, or day and you want to keep a fixed number of old indices online (say, two months) and some offline (say, from two months to six months). When you do searches on all the active indices of a cluster and don't want to search in some indices (in this case, using an alias is the best solution, but you can achieve the same effect as an alias with closed indices). Note that an alias cannot have the same name as an index. When an index is closed, calling open restores its state. Putting a mapping in an index We saw how to build a mapping by indexing documents. This recipe shows how to put a type mapping in an index. This kind of operation can be considered the Elasticsearch version of the SQL CREATE TABLE command. Getting ready To execute curl via the command line, you need to install curl for your operating system. How to do it... The HTTP method to put a mapping is PUT (POST also works). The URL format for putting a mapping is: http://<server>/<index_name>/<type_name>/_mapping For putting a mapping in an index, we will perform the steps given as follows: If we consider the type order, the call will be: curl -XPUT 'http://localhost:9200/myindex/order/_mapping' -d '{ "order" : { "properties" : { "id" : {"type" : "keyword", "store" : "yes"}, "date" : {"type" : "date", "store" : "no" , "index":"not_analyzed"}, "customer_id" : {"type" : "keyword", "store" : "yes"}, "sent" : {"type" : "boolean", "index":"not_analyzed"}, "name" : {"type" : "text", "index":"analyzed"}, "quantity" : {"type" : "integer", "index":"not_analyzed"}, "vat" : {"type" : "double", "index":"no"} } } }' In case of success, the result returned by Elasticsearch should be: {"acknowledged":true} How it works... This call checks that the index exists and then creates one or more type mappings as described in the definition. During a mapping insert, if there is an existing mapping for this type, it is merged with the new one. If there is a field with a different type and the type cannot be updated, an exception is raised. To prevent an exception during the mapping merge phase, it's possible to set the ignore_conflicts parameter to true (the default is false). The put mapping call allows you to set the type for several indices in one shot: list the indices separated by commas, or apply the mapping to all indices using the _all alias. There's more… There is no delete operation for mappings. It's not possible to delete a single mapping from an index. To remove or change a mapping, you need to manage the following steps: Create a new index with the new/modified mapping Reindex all the records Delete the old index with the incorrect mapping One way to perform the reindex step is sketched below.
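A sketch of the reindex step using the _reindex API; the target index name myindex_v2 is just an example, and it should already have been created with the corrected mapping:

curl -XPOST 'http://localhost:9200/_reindex' -d '{
  "source": { "index": "myindex" },
  "dest": { "index": "myindex_v2" }
}'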
Getting a mapping After having set our mappings for processing types, we sometimes need to control or analyze the mapping to prevent issues. The action to get the mapping for a type helps us to understand its structure, or its evolution due to merges and implicit type guessing. Getting ready To execute curl via the command line, you need to install curl for your operating system. How to do it… The HTTP method to get a mapping is GET. The URL formats for getting mappings are: http://<server>/_mapping http://<server>/<index_name>/_mapping http://<server>/<index_name>/<type_name>/_mapping To get a mapping from the type of an index, we will perform the following steps: If we consider the type order defined previously, the call will be: curl -XGET 'http://localhost:9200/myindex/order/_mapping?pretty=true' The pretty argument in the URL is optional, but very handy to pretty-print the response output. The result returned by Elasticsearch should be: { "myindex" : { "mappings" : { "order" : { "properties" : { "customer_id" : { "type" : "keyword", "store" : true }, … truncated } } } } } How it works... The mapping is stored at the cluster level in Elasticsearch. The call checks both index and type existence and then returns the stored mapping. The returned mapping is in a reduced form, which means that the default values for a field are not returned. Elasticsearch stores only non-default field values to reduce network and memory consumption. Retrieving a mapping is very useful for several purposes: Debugging template-level mappings Checking if the implicit mapping was derived correctly by guessing fields Retrieving the mapping metadata, which can be used to store type-related information Simply checking if the mapping is correct If you need to fetch several mappings, it is better to do it at the index level or cluster level to reduce the number of API calls. Summary We learned how to manage indices and perform operations on documents. We discussed different operations on indices, such as create, delete, update, open, and close. These operations are very important because they allow you to better define the container (index) that will store your documents. The index create/delete actions are similar to the SQL create/delete database commands. Resources for Article: Further resources on this subject: Elastic Stack Overview [article] Elasticsearch – Spicing Up a Search Using Geo [article] Downloading and Setting Up ElasticSearch [article]
Web Framework Behavior Tuning
Packt
12 Jan 2017
8 min read
In this article by Alex Antonov, the author of the book Spring Boot Cookbook – Second Edition, you will learn to use and configure Spring resources and build your own Spring-based application using Spring Boot. In this article, you will learn about the following topics: Configuring route matching patterns Configuring custom static path mappings Adding custom connectors (For more resources related to this topic, see here.) Introduction We will look into enhancing our web application by tuning its behavior, configuring custom routing rules and patterns, adding additional static asset paths, and adding and modifying servlet container connectors and other properties, such as enabling SSL. Configuring route matching patterns When we build web applications, it is not always the case that a default, out-of-the-box mapping configuration is applicable. At times, we want to create RESTful URLs that contain characters such as . (dot), which Spring treats as a delimiter defining the format (as in path.xml), or we might want a trailing slash to be matched as well, and so on. Conveniently, Spring provides us with a way to get this accomplished with ease. Let's imagine that the ISBN format does allow the use of dots to separate the book number from the revision, with a pattern looking like [isbn-number].[revision]. How to do it… We will configure our application to not use the suffix pattern match of .* and not to strip the values after the dot when parsing the parameters. Let's perform the following steps: Let's add the necessary configuration to our WebConfiguration class with the following content: @Override public void configurePathMatch(PathMatchConfigurer configurer) { configurer.setUseSuffixPatternMatch(false). setUseTrailingSlashMatch(true); } Start the application by running ./gradlew clean bootRun. Let's open http://localhost:8080/books/978-1-78528-415-1.1 in the browser; the full string, including the revision, is now treated as the {isbn} parameter (the original screenshot is omitted here). If we enter the correct ISBN, we will see a different result (that screenshot is omitted as well). How it works… Let's look at what we did in detail. The configurePathMatch(PathMatchConfigurer configurer) method gives us the ability to set our own behavior in how we want Spring to match the request URL path to the controller parameters: configurer.setUseSuffixPatternMatch(false): This method indicates that we don't want to use the .* suffix, so as not to strip the trailing characters after the last dot. This translates into Spring parsing out 978-1-78528-415-1.1 as an {isbn} parameter for BookController. So, http://localhost:8080/books/978-1-78528-415-1.1 and http://localhost:8080/books/978-1-78528-415-1 will become different URLs. configurer.setUseTrailingSlashMatch(true): This method indicates that we want to use the trailing / in the URL as a match, as if it were not there. This effectively makes http://localhost:8080/books/978-1-78528-415-1 the same as http://localhost:8080/books/978-1-78528-415-1/. If you want to do further configuration on how the path matching takes place, you can provide your own implementation of PathMatcher and UrlPathHelper, but these will be required only in the most extreme and custom-tailored situations and are not generally recommended.
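To observe the new matching behavior without a browser, you can hit both URL variants from the command line (this assumes the BookController and sample data used throughout the book's example project):

curl http://localhost:8080/books/978-1-78528-415-1.1
curl http://localhost:8080/books/978-1-78528-415-1/

The first request now keeps everything after the last dot as part of the {isbn} value, and the second one matches even with the trailing slash.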
Configuring custom static path mappings It is possible to control how our web application deals with static assets and files that exist on the filesystem or are bundled in the deployable archive. Let's say that we want to expose our internal application.properties file via the static web URL of http://localhost:8080/internal/application.properties from our application. To get started with this, proceed with the steps in the next section. How to do it… Let's add a new method, addResourceHandlers, to the WebConfiguration class with the following content: @Override public void addResourceHandlers(ResourceHandlerRegistry registry) { registry.addResourceHandler("/internal/**").addResourceLocations("classpath:/"); } Start the application by running ./gradlew clean bootRun. Let's open http://localhost:8080/internal/application.properties in the browser; it should display the contents of the properties file (the original screenshot is omitted here). How it works… The method that we overrode, addResourceHandlers(ResourceHandlerRegistry registry), is another configuration method from WebMvcConfigurer, which gives us the ability to define custom mappings for static resource URLs and connect them with resources on the filesystem or application classpath. In our case, we defined a mapping of anything that is accessed via the /internal URL to be looked for in classpath:/ of our application. (For a production environment, you probably don't want to expose the entire classpath as a static resource!) So, let's take a detailed look at what we did, as follows: registry.addResourceHandler("/internal/**"): This method adds a resource handler to the registry to handle our static resources, and it returns ResourceHandlerRegistration to us, which can be used to further configure the mapping in a chained fashion. /internal/** is a path pattern that will be used to match against the request URL using PathMatcher. We have seen how PathMatcher can be configured in the previous example but, by default, an AntPathMatcher implementation is used. We can configure more than one URL pattern to be matched to a particular resource location. addResourceLocations("classpath:/"): This method is called on the newly created instance of ResourceHandlerRegistration, and it defines the directories where the resources should be loaded from. These should be valid filesystem or classpath directories, and there can be more than one entered. If multiple locations are provided, they will be checked in the order in which they were entered. setCachePeriod(Integer cachePeriod): Using this method, we can also configure a caching interval for the given resource. Adding custom connectors Another very common scenario in enterprise application development and deployment is to run the application with two separate HTTP port connectors: one for HTTP and the other for HTTPS. Getting ready For this recipe, we will undo the changes that we implemented in the previous example. In order to create an HTTPS connector, we will need a few things; but, most importantly, we will need to generate a certificate keystore that is used to encrypt and decrypt the SSL communication with the browser. If you are using Unix or Mac, you can do it by running the following command: $JAVA_HOME/bin/keytool -genkey -alias tomcat -keyalg RSA On Windows, this can be achieved via the following code: "%JAVA_HOME%\bin\keytool" -genkey -alias tomcat -keyalg RSA During the creation of the keystore, you should enter the information that is appropriate to you, including passwords, name, and so on.
For the purpose of this book, we will use the default password: changeit. Once the execution is complete, a newly generated keystore file will appear in your home directory under the name .keystore. You can find more information about preparing the certificate keystore at https://tomcat.apache.org/tomcat-8.0-doc/ssl-howto.html#Prepare_the_Certificate_Keystore. How to do it… With the keystore creation complete, we will need to create a separate properties file in order to store our configuration for the HTTPS connector, such as the port and others. After that, we will create a configuration property binding object and use it to configure our new connector. Perform the following steps: First, we will create a new properties file named tomcat.https.properties in the src/main/resources directory from the root of our project with the following content: custom.tomcat.https.port=8443 custom.tomcat.https.secure=true custom.tomcat.https.scheme=https custom.tomcat.https.ssl=true custom.tomcat.https.keystore=${user.home}/.keystore custom.tomcat.https.keystore-password=changeit Next, we will create a nested static class named TomcatSslConnectorProperties in our WebConfiguration, with the following content: @ConfigurationProperties(prefix = "custom.tomcat.https") public static class TomcatSslConnectorProperties { private Integer port; private Boolean ssl = true; private Boolean secure = true; private String scheme = "https"; private File keystore; private String keystorePassword; //Skipping getters and setters to save space, but we do need them public void configureConnector(Connector connector) { if (port != null) connector.setPort(port); if (secure != null) connector.setSecure(secure); if (scheme != null) connector.setScheme(scheme); if (ssl != null) connector.setProperty("SSLEnabled", ssl.toString()); if (keystore != null && keystore.exists()) { connector.setProperty("keystoreFile", keystore.getAbsolutePath()); connector.setProperty("keystorePassword", keystorePassword); } } } Now, we will need to add our newly created tomcat.https.properties file as a Spring Boot property source and enable TomcatSslConnectorProperties to be bound. This can be done by adding the following code right above the class declaration of the WebConfiguration class: @Configuration @PropertySource("classpath:/tomcat.https.properties") @EnableConfigurationProperties(WebConfiguration.TomcatSslConnectorProperties.class) public class WebConfiguration extends WebMvcConfigurerAdapter {...} Finally, we will need to create an EmbeddedServletContainerFactory Spring bean where we will add our HTTPS connector. We will do that by adding the following code to the WebConfiguration class: @Bean public EmbeddedServletContainerFactory servletContainer(TomcatSslConnectorProperties properties) { TomcatEmbeddedServletContainerFactory tomcat = new TomcatEmbeddedServletContainerFactory(); tomcat.addAdditionalTomcatConnectors( createSslConnector(properties)); return tomcat; } private Connector createSslConnector(TomcatSslConnectorProperties properties) { Connector connector = new Connector(); properties.configureConnector(connector); return connector; } Start the application by running ./gradlew clean bootRun. Let's open https://localhost:8443/internal/tomcat.https.properties in the browser; after accepting the self-signed certificate warning, you should see the contents of the properties file (the original screenshot is omitted here).
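The same check can be done from the command line; because the keystore holds a self-signed certificate, the -k flag tells curl to skip certificate verification (suitable for local testing only):

curl -k https://localhost:8443/internal/tomcat.https.properties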
Summary In this article, you learned how to fine-tune the behavior of a web application. The article gave a brief overview of custom routes, asset paths, and amended routing patterns. You also learned how to add more connectors to the servlet container. Resources for Article: Further resources on this subject: Introduction to Spring Framework [article] Setting up Microsoft Bot Framework Dev Environment [article] Creating our first bot, WebBot [article]

A Professional Environment for React Native, Part 2
Pierre Monge
12 Jan 2017
4 min read
In Part 1 of this series, I covered the full environment and everything you need to start creating your own React Native applications. Now here in Part 2, we are going to dig in and go over the tools that you can take advantage of for maintaining those React Native apps. Maintaining the application Maintaining a React Native application, just like any software, is very complex and requires a lot of organization. In addition to keeping the code strict (good syntax with eslint, a good understanding of the code with flow), you must write intelligent code and organize your files, filenames, and variables. It is necessary to have solutions for the long-term maintenance of the application, as well as tools that provide feedback. Here are some tools that we use, which should be in place early in the cycle of your React Native development. GitHub GitHub is a fantastic tool, but you need to know how to control it. In my company, we have our own Git flow with a Dev branch, a master branch, release branches, bug branches, and other useful branches. It's up to you to make your own flow for Git! One of the most important things is the Pull Request, or the PR! And if there are many people on your project, it is important for your group to agree on the organization of the code. BugTracker & Tooling We use many tools in my team, but here is our must-have list for maintaining the application: circleCI: This is a continuous integration tool that we integrate with GitHub. It allows us to run recurring tests with each new commit. BugSnag: This is a bug tracking tool that can be used in a React Native integration, which makes it possible to collect the bugs users run into, via the web, without the user noticing it. codePush: This is useful for deploying code to versions already in production. And yes, you can change business code while the application is already in production. I will not go into much detail about it, yet the states of applications (Debug, Beta, and Production) are a big part that has to be handled, because managing them properly is what enables quality work and a long application life. We also have quality assurance in our company, which allows us to validate a product before it is released, and which provides a regular process for putting a React Native app into production. As you can see, there are many tools that will help you maintain a React Native mobile application. Despite the youthfulness of the product, the community is growing quickly and developers are excited about creating apps. There are more and more large companies using React Native, such as AirBnB, Wix, Microsoft, and many others. And with the technology growing and improving, there are more and more new tools and integrations coming to React Native. I hope this series has helped you create and maintain your own React Native applications. Here is a summary of the tools covered: Atom is a text editor that's modern, approachable, yet hackable to the core: a tool that you can customize to do anything, but which you can also use productively without ever touching a config file. GitHub is a web-based Git repository hosting service. CircleCI is a modern continuous integration and delivery platform that software teams love to use. BugSnag monitors application errors to improve customer experiences and code quality. react-native-code-push is a plugin that provides client-side integration, allowing you to easily add a dynamic update experience to your React Native app. About the author Pierre Monge (liroo.pierre@gmail.com) is a 21-year-old student.
He is a developer in C, JavaScript, and all things web development, and he has recently been creating mobile applications. He is currently working as an intern at a company named Azendoo, where he is developing a 100% React Native application.

Scale your Django App: Gunicorn + Apache + Nginx
Jean Jung
11 Jan 2017
6 min read
One question when starting with Django is "How do I scale my app?" Brandon Rhodes has answered this question in Foundations of Python Network Programming. Rhodes shows us different options, so in this post we will focus on the preferred and main option: Gunicorn + Apache + Nginx. The idea of this architecture is to have Nginx as a proxy that delegates dynamic content to Gunicorn and static content to Apache. As Django, by itself, does not handle static content, and Apache does it very well, we can take advantage of that. Below we will see how to configure everything. Environment Project directory: /var/www/myproject Apache2 Nginx Gunicorn Project settings STATIC_ROOT: /var/www/myproject/static STATIC_URL: /static/ MEDIA_ROOT: /var/www/myproject/media MEDIA_URL: /media/ ALLOWED_HOSTS: myproject.com Gunicorn Gunicorn is a WSGI-compatible HTTP server written in Python, making it our first option when working with Django. It's possible to install Gunicorn from pip by running: pip install gunicorn To run the Gunicorn server, cd to your project directory and run: gunicorn myproject.wsgi:application -b localhost:8000 By default, Gunicorn runs just one worker to serve the pages. If you feel you need more workers, you can start them by passing the number of workers to the --workers option. Gunicorn also runs in the foreground; on a real server you would configure it as a service, but that is not the focus of this post. Visit localhost:8000 in your browser and see that your project is working. You will probably notice that your static files weren't accessible. This is because Django itself cannot serve static files, and Gunicorn is not configured to serve them either. Let's fix that with Apache in the next section. If your page does not work here, check whether you are using a virtualenv and whether it is enabled for the running Gunicorn process. Apache Installing Apache takes some time and is not the focus of this post; additionally, a great majority of readers will already have Apache, so if you don't know how to install Apache, follow this guide. If you have already configured Apache to serve static content, this will be very similar to what you have done. If you have never done that, do not be afraid; it will be easy! First of all, change Apache's listening port. On Apache2, edit /etc/apache2/ports.conf and change the line: Listen 80 To: Listen 8001 You can choose other ports too; just be sure to adjust the permissions on the static and media file directories to match the needs of the user running Apache. Create a file at /etc/apache2/sites-enabled/myproject.com.conf and add this content: <VirtualHost *:8001> ServerName static.myproject.com ServerAdmin webmaster@localhost CustomLog ${APACHE_LOG_DIR}/static.myproject.com-access.log combined ErrorLog ${APACHE_LOG_DIR}/static.myproject.com-error.log # Possible values include: debug, info, notice, warn, error, crit, # alert, emerg. LogLevel warn DocumentRoot /var/www/myproject Alias /static/ /var/www/myproject/static/ Alias /media/ /var/www/myproject/media/ <Directory /var/www/myproject/static> Require all granted </Directory> <Directory /var/www/myproject/media> Require all granted </Directory> </VirtualHost> Be sure to replace everything needed to fit your project's needs.
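After saving the configuration, reload Apache so it picks up the new virtual host; the command below assumes a Debian/Ubuntu-style layout, so use the equivalent for your distribution:

sudo service apache2 reload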
But your project still does not work well, because Gunicorn does not know about Apache, and we don't want it to know anything about that. This is where Nginx comes in, covered in the next section. Nginx Nginx is a very light and powerful web server. It is different from Apache in that it does not spawn a new process for every request, so it works very well as a proxy. As mentioned for Apache, installation is not covered in this post; refer to this reference to learn how to install Nginx. Proxy configuration is very simple in Nginx; just create a file at /etc/nginx/conf.d/myproject.com.conf and put: upstream dynamic { server 127.0.0.1:8000; } upstream static { server 127.0.0.1:8001; } server { listen 80; server_name myproject.com; # Root request handler to gunicorn upstream location / { proxy_pass http://dynamic; } # Static request handler to apache upstream location /static { proxy_pass http://static/static; } # Media request handler to apache upstream location /media { proxy_pass http://static/media; } proxy_set_header X-Real-IP $remote_addr; proxy_set_header Host $host; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; error_page 500 502 503 504 /50x.html; }
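Before relying on it, it is worth validating and reloading the Nginx configuration; again, the service command assumes a Debian/Ubuntu-style layout:

sudo nginx -t
sudo service nginx reload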
This way, you have everything working on your machine! If you have more than one machine, you can dedicate machines to delivering static or dynamic content. The machine where Nginx runs is the proxy, and it needs to be visible from the Internet. The machines running Apache or Gunicorn can be visible only from your local network. If you follow this architecture, you can just change the Apache and Gunicorn configurations to listen on the default ports, adjust the domain names, and set the Nginx configuration to deliver the connections over the new domains. Where to go now? For more details on deploying Gunicorn with Nginx, see the Gunicorn deployment page. You may also want to see the Apache configuration page and the Nginx getting started page for more information about scalability and security. Summary In this post you saw how to configure Nginx, Apache, and Gunicorn servers to deliver your Django app over a proxy environment, balancing your requests between Apache and Gunicorn. We also noted how to start more Gunicorn workers and where to find details about scaling each of the servers being used. References [1] - PEP 3333 - WSGI [2] - Gunicorn Project [3] - Apache Project [4] - Nginx Project [5] - Rhodes, B. & Goerzen, J. (2014). Foundations of Python Network Programming. New York, NY: Apress. About the author Jean Jung is a Brazilian developer passionate about technology. Currently a System Analyst at EBANX, an international cross-border payment processor for Latin America, he's very interested in Python and Artificial Intelligence, specifically Machine Learning, Compilers, and Operating Systems. As a hobby, he's always looking for IoT projects with Arduino.

ML Package
Packt
11 Jan 2017
18 min read
In this article, Denny Lee, the author of the book Learning PySpark, provides a brief implementation of, and the theory behind, the ML package. So, let's get to it! In this article, we will reuse a portion of the dataset. The data can be downloaded from http://www.tomdrabas.com/data/LearningPySpark/births_transformed.csv.gz. (For more resources related to this topic, see here.) Overview of the package At the top level, the package exposes three main abstract classes: a Transformer, an Estimator, and a Pipeline. We will shortly explain each with some short examples. Transformer The Transformer class, like the name suggests, transforms your data by (normally) appending a new column to your DataFrame. At a high level, when deriving from the Transformer abstract class, each and every new Transformer needs to implement a .transform(...) method. The method, as a first and normally the only obligatory parameter, requires passing a DataFrame to be transformed. This, of course, varies method-by-method in the ML package: other popular parameters are inputCol and outputCol; these, however, frequently default to some predefined values, such as 'features' for the inputCol parameter. There are many Transformers offered in spark.ml.feature, and we will briefly describe them here: Binarizer: Given a threshold, the method takes a continuous variable and transforms it into a binary one. Bucketizer: Similar to the Binarizer, this method takes a list of thresholds (the splits parameter) and transforms a continuous variable into a multinomial one. ChiSqSelector: For categorical target variables (think, classification models), this feature allows you to select a predefined number of features (parameterized by the numTopFeatures parameter) that explain the variance in the target the best. The selection is done, as the name of the method suggests, using a Chi-square test. It is one of the two-step methods: first, you need to .fit(...) your data (so the method can calculate the Chi-square tests). Calling the .fit(...) method (you pass your DataFrame as a parameter) returns a ChiSqSelectorModel object that you can then use to transform your DataFrame using the .transform(...) method. More information on Chi-square can be found here: http://ccnmtl.columbia.edu/projects/qmss/the_chisquare_test/about_the_chisquare_test.html. CountVectorizer: Useful for a tokenized text (such as [['Learning', 'PySpark', 'with', 'us'],['us', 'us', 'us']]). It is one of the two-step methods: first, you need to .fit(...), that is, learn the patterns from your dataset, before you can .transform(...) with the CountVectorizerModel returned by the .fit(...) method. The output from this transformer, for the tokenized text presented previously, would look similar to this: [(4, [0, 1, 2, 3], [1.0, 1.0, 1.0, 1.0]),(4, [3], [3.0])]. DCT: The Discrete Cosine Transform takes a vector of real values and returns a vector of the same length, but with the sum of cosine functions oscillating at different frequencies. Such transformations are useful to extract some underlying frequencies in your data, or for data compression. ElementwiseProduct: A method that returns a vector with elements that are products of the vector passed to the method, and a vector passed as the scalingVec parameter. For example, if you had a [10.0, 3.0, 15.0] vector and your scalingVec was [0.99, 3.30, 0.66], then the vector you would get would look as follows: [9.9, 9.9, 9.9].
HashingTF: A hashing trick transformer that takes a list of tokenized text and returns a vector (of predefined length) with counts. From PySpark's documentation: Since a simple modulo is used to transform the hash function to a column index, it is advisable to use a power of two as the numFeatures parameter; otherwise the features will not be mapped evenly to the columns. IDF: This method computes an Inverse Document Frequency for a list of documents. Note that the documents need to already be represented as a vector (for example, using either the HashingTF or CountVectorizer). IndexToString: A complement to the StringIndexer method. It uses the encoding from the StringIndexerModel object to reverse the string index to the original values. MaxAbsScaler: Rescales the data to be within the [-1, 1] range (thus, it does not shift the center of the data). MinMaxScaler: Similar to the MaxAbsScaler, with the difference that it scales the data to be in the [0.0, 1.0] range. NGram: This method takes a list of tokenized text and returns n-grams: pairs, triples, or n-mores of subsequent words. For example, if you had a ['good', 'morning', 'Robin', 'Williams'] vector you would get the following output: ['good morning', 'morning Robin', 'Robin Williams']. Normalizer: A method that scales the data to be of unit norm using the p-norm value (by default, it is L2). OneHotEncoder: A method that encodes a categorical column to a column of binary vectors. PCA: Performs data reduction using principal component analysis. PolynomialExpansion: Performs a polynomial expansion of a vector. For example, if you had a vector symbolically written as [x, y, z], the method would produce the following expansion: [x, x*x, y, x*y, y*y, z, x*z, y*z, z*z]. QuantileDiscretizer: Similar to the Bucketizer method, but instead of passing the splits parameter, you pass the numBuckets one. The method then decides, by calculating approximate quantiles over your data, what the splits should be. RegexTokenizer: A string tokenizer using regular expressions. RFormula: For those of you who are avid R users - you can pass a formula such as vec ~ alpha * 3 + beta (assuming your DataFrame has the alpha and beta columns) and it will produce the vec column given the expression. SQLTransformer: Similar to the previous one, but instead of R-like formulas you can use SQL syntax. The FROM statement should be selecting from __THIS__, indicating you are accessing the DataFrame. For example: SELECT alpha * 3 + beta AS vec FROM __THIS__. StandardScaler: Standardizes the column to have 0 mean and standard deviation equal to 1. StopWordsRemover: Removes stop words (such as 'the' or 'a') from a tokenized text. StringIndexer: Given a list of all the words in a column, it will produce a column of indices. Tokenizer: The default tokenizer that converts the string to lower case and then splits it on space(s). VectorAssembler: A highly useful transformer that collates multiple numeric (vectors included) columns into a single column with a vector representation. For example, if you had three columns in your DataFrame: df = spark.createDataFrame( [(12, 10, 3), (1, 4, 2)], ['a', 'b', 'c']) The output of calling: ft.VectorAssembler(inputCols=['a', 'b', 'c'], outputCol='features') .transform(df) .select('features') .collect() would look as follows: [Row(features=DenseVector([12.0, 10.0, 3.0])), Row(features=DenseVector([1.0, 4.0, 2.0]))] VectorIndexer: A method for indexing categorical columns into a vector of indices.
It works in a column-by-column fashion, selecting distinct values from the column, sorting them, and returning an index of the value from the map instead of the original value. VectorSlicer: Works on a feature vector, either dense or sparse: given a list of indices, it extracts the values from the feature vector. Word2Vec: This method takes a sentence (string) as an input and transforms it into a map of {string, vector} format, a representation that is useful in natural language processing. Note that there are many methods in the ML package that have an E letter next to them; this means the method is currently in beta (or Experimental) and it sometimes might fail or produce erroneous results. Beware.
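To make the Transformer mechanics concrete, here is a small sketch that is not part of the original text; it chains two of the transformers described above and assumes an active spark session:

import pyspark.ml.feature as ft

df = spark.createDataFrame([('Learning PySpark with us',)], ['text'])
tokenizer = ft.Tokenizer(inputCol='text', outputCol='words')
remover = ft.StopWordsRemover(inputCol='words', outputCol='filtered')

# The Tokenizer lowercases and splits on spaces; the StopWordsRemover
# then drops 'with' as a default English stop word.
remover.transform(tokenizer.transform(df)).select('filtered').collect()

The result should look like [Row(filtered=['learning', 'pyspark', 'us'])].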
Estimators Estimators can be thought of as statistical models that need to be estimated to make predictions or classify your observations. If deriving from the abstract Estimator class, the new model has to implement the .fit(...) method that fits the model given the data found in a DataFrame and some default or user-specified parameters. There are a lot of estimators available in PySpark, and we will now shortly describe the models available in Spark 2.0. Classification The ML package provides a data scientist with seven classification models to choose from. These range from the simplest ones (such as Logistic Regression) to more sophisticated ones. We will provide short descriptions of each of them in the following section: LogisticRegression: The benchmark model for classification. Logistic regression uses a logit function to calculate the probability of an observation belonging to a particular class. At the time of writing, PySpark ML supports only binary classification problems. DecisionTreeClassifier: A classifier that builds a decision tree to predict a class for an observation. Specifying the maxDepth parameter limits the depth the tree grows, the minInstancesPerNode determines the minimum number of observations in a tree node required to further split, the maxBins parameter specifies the maximum number of bins the continuous variables will be split into, and the impurity specifies the metric to measure and calculate the information gain from the split. GBTClassifier: A Gradient Boosted Trees model for classification. The model belongs to the family of ensemble models: models that combine multiple weak predictive models to form a strong one. At the moment, the GBTClassifier model supports binary labels, and continuous and categorical features. RandomForestClassifier: This model produces multiple decision trees (hence the name - forest) and uses the mode output of those decision trees to classify observations. The RandomForestClassifier supports both binary and multinomial labels. NaiveBayes: Based on Bayes' theorem, this model uses conditional probability theory to classify observations. The NaiveBayes model in PySpark ML supports both binary and multinomial labels. MultilayerPerceptronClassifier: A classifier that mimics the nature of a human brain. Deeply rooted in the theory of Artificial Neural Networks, the model is a black box, that is, it is not easy to interpret the internal parameters of the model. The model consists, at a minimum, of three fully connected layers of artificial neurons (the layer sizes are a parameter that needs to be specified when creating the model object): an input layer (that needs to be equal to the number of features in your dataset), a number of hidden layers (at least one), and an output layer with the number of neurons equal to the number of categories in your label. All the neurons in the input and hidden layers have a sigmoid activation function, whereas the activation function of the neurons in the output layer is softmax. OneVsRest: A reduction of a multiclass classification to a binary one. For example, in the case of a multinomial label, the model can train multiple binary logistic regression models. If, say, label == 2, the model will build a logistic regression where it converts label == 2 to 1 (all other label values would be set to 0) and then trains a binary model. All the models are then scored, and the model with the highest probability wins. Regression There are seven models available for regression tasks in the PySpark ML package. As with classification, these range from some basic ones (such as the obligatory Linear Regression) to more complex ones: AFTSurvivalRegression: Fits an Accelerated Failure Time regression model; it is a parametric model that assumes that a marginal effect of one of the features accelerates or decelerates a life expectancy (or process failure). It is highly applicable to processes with well-defined stages. DecisionTreeRegressor: Similar to the model for classification, with the obvious distinction that the label is continuous instead of binary (or multinomial). GBTRegressor: As with the DecisionTreeRegressor, the difference is the data type of the label. GeneralizedLinearRegression: A family of linear models with differing kernel functions (link functions). In contrast to linear regression, which assumes normality of error terms, the GLM allows the label to have different error term distributions: the GeneralizedLinearRegression model from the PySpark ML package supports gaussian, binomial, gamma, and poisson families of error distributions, with a host of different link functions. IsotonicRegression: A type of regression that fits a free-form, non-decreasing line to your data. It is useful for fitting datasets with ordered and increasing observations. LinearRegression: The most simple of regression models; it assumes a linear relationship between features and a continuous label, and normality of error terms. RandomForestRegressor: Similar to either DecisionTreeRegressor or GBTRegressor, the RandomForestRegressor fits a continuous label instead of a discrete one. Clustering Clustering is a family of unsupervised models that are used to find underlying patterns in your data. The PySpark ML package provides the four most popular models at the moment: BisectingKMeans: A combination of the k-means clustering method and hierarchical clustering. The algorithm begins with all observations in a single cluster and iteratively splits the data into k clusters. Check out this website for more information on the pseudo-algorithm: http://minethedata.blogspot.com/2012/08/bisecting-k-means.html. KMeans: This is the famous k-means algorithm that separates data into k clusters, iteratively searching for centroids that minimize the sum of square distances between each observation and the centroid of the cluster it belongs to. GaussianMixture: This method uses k Gaussian distributions with unknown parameters to dissect the dataset.
Using the Expectation-Maximization algorithm, the parameters for the Gaussians are found by maximizing the log-likelihood function. Beware that for datasets with many features this model might perform poorly due to the curse of dimensionality and numerical issues with Gaussian distributions. LDA: This model is used for topic modeling in natural language processing applications. There is also one recommendation model available in PySpark ML, but I will refrain from describing it here. Pipeline A Pipeline in PySpark ML is a concept of an end-to-end transformation-estimation process (with distinct stages) that ingests some raw data (in a DataFrame form), performs the necessary data carpentry (transformations), and finally estimates a statistical model (estimator). A Pipeline can be purely transformative, that is, consisting of Transformers only. A Pipeline can be thought of as a chain of multiple discrete stages. When a .fit(...) method is executed on a Pipeline object, all the stages are executed in the order they were specified in the stages parameter; the stages parameter is a list of Transformer and Estimator objects. The .fit(...) method of the Pipeline object executes the .transform(...) method for the Transformers and the .fit(...) method for the Estimators. Normally, the output of a preceding stage becomes the input for the following stage: when deriving from either the Transformer or Estimator abstract classes, one needs to implement the .getOutputCol() method that returns the value of the outputCol parameter specified when creating an object. Predicting chances of infant survival with ML In this section, we will use a portion of the dataset to present the ideas of PySpark ML. If you have not yet downloaded the data, it can be accessed here: http://www.tomdrabas.com/data/LearningPySpark/births_transformed.csv.gz. In this section, we will, once again, attempt to predict the chances of the survival of an infant. Loading the data First, we load the data with the help of the following code: import pyspark.sql.types as typ labels = [ ('INFANT_ALIVE_AT_REPORT', typ.IntegerType()), ('BIRTH_PLACE', typ.StringType()), ('MOTHER_AGE_YEARS', typ.IntegerType()), ('FATHER_COMBINED_AGE', typ.IntegerType()), ('CIG_BEFORE', typ.IntegerType()), ('CIG_1_TRI', typ.IntegerType()), ('CIG_2_TRI', typ.IntegerType()), ('CIG_3_TRI', typ.IntegerType()), ('MOTHER_HEIGHT_IN', typ.IntegerType()), ('MOTHER_PRE_WEIGHT', typ.IntegerType()), ('MOTHER_DELIVERY_WEIGHT', typ.IntegerType()), ('MOTHER_WEIGHT_GAIN', typ.IntegerType()), ('DIABETES_PRE', typ.IntegerType()), ('DIABETES_GEST', typ.IntegerType()), ('HYP_TENS_PRE', typ.IntegerType()), ('HYP_TENS_GEST', typ.IntegerType()), ('PREV_BIRTH_PRETERM', typ.IntegerType()) ] schema = typ.StructType([ typ.StructField(e[0], e[1], False) for e in labels ]) births = spark.read.csv('births_transformed.csv.gz', header=True, schema=schema) We specify the schema of the DataFrame; our severely limited dataset now only has 17 columns.
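As a quick sanity check that is not part of the original listing, you can confirm the load worked and the schema was applied:

births.count()        # returns the number of rows in the dataset
births.printSchema()  # prints the 17 columns with their declared types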
Creating transformers

Before we can use the dataset to estimate a model, we need to do some transformations. Since statistical models can only operate on numeric data, we will have to encode the BIRTH_PLACE variable. Before we do any of this, and since we will use a number of different feature transformations, let's import them all:

import pyspark.ml.feature as ft

To encode the BIRTH_PLACE column, we will use the OneHotEncoder method. However, the method cannot accept StringType columns; it can only deal with numeric types, so first we will cast the column to an IntegerType:

births = births \
    .withColumn('BIRTH_PLACE_INT',
                births['BIRTH_PLACE'].cast(typ.IntegerType()))

Having done this, we can now create our first Transformer:

encoder = ft.OneHotEncoder(
    inputCol='BIRTH_PLACE_INT',
    outputCol='BIRTH_PLACE_VEC')

Let's now create a single column with all the features collated together. We will use the VectorAssembler method:

featuresCreator = ft.VectorAssembler(
    inputCols=[col[0] for col in labels[2:]] + [encoder.getOutputCol()],
    outputCol='features'
)

The inputCols parameter passed to the VectorAssembler object is a list of all the columns to be combined together to form the outputCol, the 'features' column. Note that we use the output of the encoder object (by calling the .getOutputCol() method), so we do not have to remember to change this parameter's value should we change the name of the output column in the encoder object at any point.

It's now time to create our first estimator.

Creating an estimator

In this example, we will (once again) use the logistic regression model. However, since we will later showcase some more complex models from the .classification set of PySpark ML models, we load the whole section:

import pyspark.ml.classification as cl

Once loaded, let's create the model by using the following code:

logistic = cl.LogisticRegression(
    maxIter=10,
    regParam=0.01,
    labelCol='INFANT_ALIVE_AT_REPORT')

We would not have to specify the labelCol parameter if our target column had the name 'label'. Also, if the output of our featuresCreator were not called 'features', we would have to specify the featuresCol, most conveniently by calling the getOutputCol() method on the featuresCreator object.

Creating a pipeline

All that is left now is to create a Pipeline and fit the model. First, let's load the Pipeline from the ML package:

from pyspark.ml import Pipeline

Creating a Pipeline is really easy. Here's how our pipeline should look conceptually:

Converting this structure into a Pipeline is a walk in the park:

pipeline = Pipeline(stages=[
    encoder,
    featuresCreator,
    logistic
])

That's it! Our pipeline is now created, so we can (finally!) estimate the model.

Fitting the model

Before we fit the model, we need to split our dataset into training and testing datasets. Conveniently, the DataFrame API has the .randomSplit(...) method:

births_train, births_test = births \
    .randomSplit([0.7, 0.3], seed=666)

The first parameter is a list of dataset proportions that should end up in, respectively, the births_train and births_test subsets. The seed parameter provides a seed to the randomizer. You can also split the dataset into more than two subsets as long as the elements of the list sum up to 1, and you unpack the output into as many subsets. For example, we could split the births dataset into three subsets like this:

train, test, val = births \
    .randomSplit([0.7, 0.2, 0.1], seed=666)

The preceding code would put a random 70% of the births dataset into the train object, 20% would go to the test, and the val DataFrame would hold the remaining 10%.

Now it is about time to finally run our pipeline and estimate our model:

model = pipeline.fit(births_train)
test_model = model.transform(births_test)

The .fit(...) method of the pipeline object takes our training dataset as an input. Under the hood, the births_train dataset is passed first to the encoder object. The DataFrame that is created at the encoder stage then gets passed to the featuresCreator, which creates the 'features' column. Finally, the output from this stage is passed to the logistic object that estimates the final model. The .fit(...) method returns the PipelineModel object (the model object in the preceding snippet) that can then be used for prediction; we attain this by calling the .transform(...) method and passing the testing dataset created earlier.
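As a brief aside, and only a sketch on top of the example above: the fitted PipelineModel keeps its fitted stages in the .stages list, so you can pull out the trained logistic regression and inspect what was learned. The indexing below assumes the three-stage pipeline we defined:

# The last stage of the fitted pipeline is the LogisticRegressionModel
lr_model = model.stages[-1]

print(lr_model.intercept)     # the fitted intercept
print(lr_model.coefficients)  # the fitted β coefficients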
Here's what the test_model looks like after running the following command:

test_model.take(1)

It generates the following output:

As you can see, we get all the columns from the Transformers and Estimators. The logistic regression model outputs several columns: the rawPrediction is the value of the linear combination of features and the β coefficients, probability is the calculated probability for each of the classes, and finally, the prediction, which is our final class assignment.

Evaluating the performance of the model

Obviously, we would like to now test how well our model did. PySpark exposes a number of evaluation methods for classification and regression in the .evaluation section of the package:

import pyspark.ml.evaluation as ev

We will use the BinaryClassificationEvaluator to test how well our model performed:

evaluator = ev.BinaryClassificationEvaluator(
    rawPredictionCol='probability',
    labelCol='INFANT_ALIVE_AT_REPORT')

The rawPredictionCol can either be the rawPrediction column produced by the estimator or the probability column. Let's see how well our model performed:

print(evaluator.evaluate(test_model,
    {evaluator.metricName: 'areaUnderROC'}))
print(evaluator.evaluate(test_model,
    {evaluator.metricName: 'areaUnderPR'}))

The preceding code produces the following result:

The area under the ROC of 74% and the area under the PR of 71% indicate a well-defined model, but nothing extraordinary; if we had other features, we could drive these numbers up.

Saving the model

PySpark allows you to save the Pipeline definition for later use. It not only saves the pipeline structure, but also all the definitions of all the Transformers and Estimators:

pipelinePath = './infant_oneHotEncoder_Logistic_Pipeline'
pipeline.write().overwrite().save(pipelinePath)

So, you can load it up later and use it straight away to .fit(...) and predict:

loadedPipeline = Pipeline.load(pipelinePath)
loadedPipeline \
    .fit(births_train) \
    .transform(births_test) \
    .take(1)

The preceding code produces the same result (as expected).

Summary

In this article, we studied the ML package. We explained what a Transformer and an Estimator are, and showed their role in another concept introduced in the ML library: the Pipeline. Subsequently, we also presented how to use some of the methods to fine-tune the hyperparameters of models. Finally, we gave some examples of how to use some of the feature extractors and models from the library.

Resources for Article:

Further resources on this subject:

Package Management [article]
Everything in a Package with concrete5 [article]
Writing a Package in Python [article]
Hello, C#! Welcome, .NET Core!

Packt
11 Jan 2017
10 min read
In this article by Mark J. Price, author of the book C# 7 and .NET Core: Modern Cross-Platform Development - Second Edition, we will discuss setting up your development environment and understanding the similarities and differences between .NET Core, .NET Framework, .NET Standard Library, and .NET Native. Most people learn complex topics by imitation and repetition rather than reading a detailed explanation of theory, so I will not explain every keyword and step.

This article covers the following topics:

Setting up your development environment
Understanding .NET

(For more resources related to this topic, see here.)

Setting up your development environment

Before you start programming, you will need to set up your Integrated Development Environment (IDE), which includes a code editor for C#. The best IDE to choose is Microsoft Visual Studio 2017, but it only runs on the Windows operating system. To develop on alternative operating systems such as macOS and Linux, a good choice of IDE is Microsoft Visual Studio Code.

Using alternative C# IDEs

There are alternative IDEs for C#, for example, MonoDevelop and JetBrains Project Rider. They each have versions available for Windows, Linux, and macOS, allowing you to write code on one operating system and deploy to the same or a different one.

For the MonoDevelop IDE, visit http://www.monodevelop.com/
For JetBrains Project Rider, visit https://www.jetbrains.com/rider/

Cloud9 is a web browser-based IDE, so it's even more cross-platform than the others. Here is the link: https://c9.io/web/sign-up/free

Linux and Docker are popular server host platforms because they are relatively lightweight and more cost-effectively scalable when compared to operating system platforms that are aimed more at end users, such as Windows and macOS.

Using Visual Studio 2017 on Windows 10

You can use Windows 7 or later to run the code, but you will have a better experience if you use Windows 10. If you don't have Windows 10, then you can create a virtual machine (VM) to use for development. You can choose any cloud provider, but Microsoft Azure has preconfigured VMs that include properly licensed Windows 10 and Visual Studio 2017. You only pay for the minutes your VM is running, so it is a way for users of Linux, macOS, and older Windows versions to have all the benefits of using Visual Studio 2017.

Since October 2014, Microsoft has made a professional-quality edition of Visual Studio available to everyone for free. It is called the Community Edition. Microsoft has combined all its free developer offerings in a program called Visual Studio Dev Essentials. This includes the Community Edition, the free level of Visual Studio Team Services, Azure credits for test and development, and free training from Pluralsight, Wintellect, and Xamarin.

Installing Microsoft Visual Studio 2017

Download and install Microsoft Visual Studio Community 2017 or later: https://www.visualstudio.com/vs/visual-studio-2017/.

Choosing workloads

On the Workloads tab, choose the following:

Universal Windows Platform development
.NET desktop development
Web development
Azure development
.NET Core and Docker development

On the Individual components tab, choose the following:

Git for Windows
GitHub extension for Visual Studio

Click Install. You can choose to install everything if you want support for languages such as C++, Python, and R.

Completing the installation

Wait for the software to download and install. When the installation is complete, click Launch.
While you wait for Visual Studio 2017 to install, you can jump to the Understanding .NET section in this article.

Signing in to Visual Studio

The first time that you run Visual Studio 2017, you will be prompted to sign in. If you have a Microsoft account, for example, a Hotmail, MSN, Live, or Outlook e-mail address, you can use that account. If you don't, then register for a new one at the following link: https://signup.live.com/

You will see the Visual Studio user interface with the Start Page open in the central area. Like most Windows desktop applications, Visual Studio has a menu bar, a toolbar for common commands, and a status bar at the bottom. On the right is the Solution Explorer window that will list your open projects.

To have quick access to Visual Studio in the future, right-click on its entry in the Windows taskbar and select Pin this program to taskbar.

Using older versions of Visual Studio

The free Community Edition has been available since Visual Studio 2013 with Update 4. If you want to use a free version of Visual Studio older than 2013, then you can use one of the more limited Express editions. A lot of the code in this book will work with older versions if you bear in mind when the following features were introduced:

Year | C# | Features
2005 | 2 | Generics with <T>
2008 | 3 | Lambda expressions with => and manipulating sequences with LINQ (from, in, where, orderby, ascending, descending, select, group, into)
2010 | 4 | Dynamic typing with dynamic and multithreading with Task
2012 | 5 | Simplifying multithreading with async and await
2015 | 6 | String interpolation with $"" and importing static types with using static
2017 | 7 | Tuples (with deconstruction), patterns, out variables, literal improvements

Understanding .NET

.NET Framework, .NET Core, .NET Standard Library, and .NET Native are related and overlapping platforms for developers to build applications and services upon.

Understanding the .NET Framework platform

Microsoft's .NET Framework is a development platform that includes a Common Language Runtime (CLR) that manages the execution of code and a rich library of classes for building applications. Microsoft designed the .NET Framework to have the possibility of being cross-platform, but Microsoft put their implementation effort into making it work best with Windows. Practically speaking, the .NET Framework is Windows-only.

Understanding the Mono and Xamarin projects

Third parties developed a cross-platform .NET implementation named the Mono project (http://www.mono-project.com/). Mono is cross-platform, but it fell well behind the official implementation of the .NET Framework. It has found a niche as the foundation of the Xamarin mobile platform. Microsoft purchased Xamarin and now includes what used to be an expensive product for free with Visual Studio 2017. Microsoft has renamed the Xamarin Studio development tool to Visual Studio for the Mac. Xamarin is targeted at mobile development and building cloud services to support mobile apps.

Understanding the .NET Core platform

Today, we live in a truly cross-platform world. Modern mobile and cloud development have made Windows a much less important operating system. So, Microsoft has been working on an effort to decouple the .NET Framework from its close ties with Windows. While rewriting .NET to be truly cross-platform, Microsoft has taken the opportunity to refactor .NET to remove major parts that are no longer considered core.
This new product is branded as .NET Core, which includes a cross-platform implementation of the CLR known as CoreCLR and a streamlined library of classes known as CoreFX.

Streamlining .NET

.NET Core is much smaller than the current version of the .NET Framework because a lot has been removed. For example, Windows Forms and Windows Presentation Foundation (WPF) can be used to build graphical user interface (GUI) applications, but they are tightly bound to Windows, so they have been removed from .NET Core. The latest technology for building Windows apps is the Universal Windows Platform (UWP).

ASP.NET Web Forms and Windows Communication Foundation (WCF) are old web application and service technologies that fewer developers choose to use today, so they have also been removed from .NET Core. Instead, developers prefer to use ASP.NET MVC and ASP.NET Web API. These two technologies have been refactored and combined into a new product that runs on .NET Core named ASP.NET Core.

The Entity Framework (EF) 6.x is an object-relational mapping technology for working with data stored in relational databases such as Oracle and Microsoft SQL Server. It has gained baggage over the years, so the cross-platform version has been slimmed down and named Entity Framework Core.

Some data types in .NET that are included with both the .NET Framework and .NET Core have been simplified by removing some members. For example, in the .NET Framework, the File class has both a Close and a Dispose method, and either can be used to release the file resources. In .NET Core, there is only the Dispose method. This reduces the memory footprint of the assembly and simplifies the API you have to learn.

The .NET Framework 4.6 is about 200 MB. .NET Core is about 11 MB. Eventually, .NET Core may grow to a similar size. Microsoft's goal is not to make .NET Core smaller than the .NET Framework. The goal is to componentize .NET Core to support modern technologies and to have fewer dependencies, so that deployment requires only those components that your application really needs.

Understanding the .NET Standard

The situation with .NET today is that there are three forked .NET platforms, all controlled by Microsoft: .NET Framework, Xamarin, and .NET Core. Each has different strengths and weaknesses. This has led to the problem that a developer must learn three platforms, each with annoying quirks and limitations. So, Microsoft is working on defining the .NET Standard 2.0: a set of APIs that all .NET platforms must implement. Today, in 2016, there is the .NET Standard 1.6, but only .NET Core 1.0 supports it; the .NET Framework and Xamarin do not!

.NET Standard 2.0 will be implemented by the .NET Framework, .NET Core, and Xamarin. For .NET Core, this will add many of the missing APIs that developers need to port old code written for the .NET Framework to the new cross-platform .NET Core. .NET Standard 2.0 will probably be released towards the end of 2017, so I hope to write a third edition of this book for when that's finally released.

The future of .NET

The .NET Standard 2.0 is the near future of .NET, and it will make it much easier for developers to share code between any flavor of .NET, but we are not there yet. For cross-platform development, .NET Core is a great start, but it will take another version or two to become as mature as the current version of the .NET Framework.
This book will focus on .NET Core, but will use the .NET Framework when important or useful features have not (yet) been implemented in .NET Core.

Understanding the .NET Native compiler

Another .NET initiative is the .NET Native compiler. This compiles C# code to native CPU instructions ahead-of-time (AoT) rather than using the CLR to compile IL just-in-time (JIT) to native code later. The .NET Native compiler improves execution speed and reduces the memory footprint of applications. It supports the following:

UWP apps for Windows 10, Windows 10 Mobile, Xbox One, HoloLens, and Internet of Things (IoT) devices such as the Raspberry Pi
Server-side web development with ASP.NET Core
Console applications for use on the command line

Comparing .NET technologies

The following table summarizes and compares the .NET technologies:

Platform | Feature set | C# compiles to | Host OSes
.NET Framework | Mature and extensive | IL executed by a runtime | Windows only
Xamarin | Mature and limited to mobile features | | iOS, Android, Windows Mobile
.NET Core | Brand-new and somewhat limited | IL executed by a runtime | Windows, Linux, macOS, Docker
.NET Native | Brand-new and very limited | Native code | Windows 10 (UWP), ASP.NET Core, and console hosts

Summary

In this article, we have learned how to set up the development environment, and discussed the .NET technologies in detail.

Resources for Article:

Further resources on this subject:

Introduction to C# and .NET [article]
Reactive Programming with C# [article]
Functional Programming in C# [article]
Metric Analytics with Metricbeat

Packt
11 Jan 2017
5 min read
In this article by Bahaaldine Azarmi, the author of the book Learning Kibana 5.0, we will learn about metric analytics, which is fundamentally different in terms of data structure. (For more resources related to this topic, see here.)

The author would like to spend a few lines on the following question: what is a metric?

A metric is an event that contains a timestamp and usually one or more numeric values. It is appended to a metric file sequentially, where all lines of metrics are ordered based on the timestamp. As an example, here are a few system metrics:

02:30:00 AM    all    2.58    0.00    0.70    1.12    0.05    95.55
02:40:00 AM    all    2.56    0.00    0.69    1.05    0.04    95.66
02:50:00 AM    all    2.64    0.00    0.65    1.15    0.05    95.50

Unlike logs, metrics are sent periodically, for example, every 10 minutes (as the preceding example illustrates), whereas logs are usually appended to the log file when something happens. Metrics are often used in the context of software or hardware health monitoring, such as resource utilization monitoring, database execution metrics monitoring, and so on.

Since version 5.0, Elastic has added features at all layers of the solution to enhance the user experience of metrics management and analytics. Metricbeat is one of the new features in 5.0. It allows the user to ship metrics data, whether from the machine or from applications, to Elasticsearch, and comes with out-of-the-box dashboards for Kibana. Kibana also integrates Timelion with its core, a plugin which has been made for manipulating numeric data, such as metrics. In this article, we'll start by working with Metricbeat.

Metricbeat in Kibana

The procedure to import the dashboard is laid out in the following section.

Importing the dashboard

Before importing the dashboard, let's have a look at the actual metric data that Metricbeat ships. As I have Chrome open while typing this article, I'm going to filter the data by process name, here chrome:

Discover tab filtered by process name

Here is an example of one of the documents I have:

{
  "_index": "metricbeat-2016.09.06",
  "_type": "metricsets",
  "_id": "AVcBFstEVDHwfzZYZHB8",
  "_score": 4.29527,
  "_source": {
    "@timestamp": "2016-09-06T20:00:53.545Z",
    "beat": {
      "hostname": "MacBook-Pro-de-Bahaaldine.local",
      "name": "MacBook-Pro-de-Bahaaldine.local"
    },
    "metricset": {
      "module": "system",
      "name": "process",
      "rtt": 5916
    },
    "system": {
      "process": {
        "cmdline": "/Applications/Google Chrome.app/Contents/Versions/52.0.2743.116/Google Chrome Helper.app/Contents/MacOS/Google Chrome Helper --type=ppapi --channel=55142.2188.1032368744 --ppapi-flash-args --lang=fr",
        "cpu": {
          "start_time": "09:52",
          "total": {
            "pct": 0.0035
          }
        },
        "memory": {
          "rss": {
            "bytes": 67813376,
            "pct": 0.0039
          },
          "share": 0,
          "size": 3355303936
        },
        "name": "Google Chrome H",
        "pid": 76273,
        "ppid": 55142,
        "state": "running",
        "username": "bahaaldine"
      }
    },
    "type": "metricsets"
  },
  "fields": {
    "@timestamp": [
      1473192053545
    ]
  }
}

Metricbeat document example

The preceding document breaks down the utilization of resources for the chrome process. We can see, for example, the usage of CPU and memory, as well as the state of the process as a whole. Now how about visualizing the data in an actual dashboard?
To do so, go into the kibana folder located in the Metricbeat installation directory:

MacBook-Pro-de-Bahaaldine:kibana bahaaldine$ pwd
/elastic/metricbeat-5.0.0/kibana
MacBook-Pro-de-Bahaaldine:kibana bahaaldine$ ls
dashboard  import_dashboards.ps1  import_dashboards.sh  index-pattern  search  visualization

import_dashboards.sh is the file we will use to import the dashboards into Kibana. Execute the script like the following:

./import_dashboards.sh -h

This should print out the help, which, essentially, will give you the list of arguments you can pass to the script. Here, we need to specify a username and a password, as we are using the X-Pack security plugin, which secures our cluster:

./import_dashboards.sh -u elastic:changeme

You should normally get a bunch of logs stating that dashboards have been imported, as in the following example:

Import visualization Servers-overview: {"_index":".kibana","_type":"visualization","_id":"Servers-overview","_version":4,"forced_refresh":false,"_shards":{"total":2,"successful":1,"failed":0},"created":false}

Now, at this point, you have metric data in Elasticsearch and dashboards created in Kibana, so you can visualize the data.

Visualizing metrics

If you go back into the Kibana dashboard section and try to open the Metricbeat System Statistics dashboard, you should get something similar to the following:

Metricbeat Kibana dashboard

You should see in your own dashboard the metrics based on the processes that are running on your computer. In my case, I have a bunch of them, for which I can visualize the CPU and memory utilization, for example:

RAM and CPU utilization

As an example, what can be important here is to be sure that Metricbeat has a very low footprint on the overall system in terms of CPU or RAM, as shown here:

Metricbeat resource utilization

As we can see in the preceding diagram, Metricbeat only uses about 0.4% of the CPU and less than 0.1% of the memory on my MacBook Pro. On the other hand, if I want to get the most resource-consuming processes, I can check the Top processes data table, which gives the following information:

Top processes

Besides Google Chrome H, which uses a lot of CPU, zoom.us, a conferencing application, seems to bring a lot of stress to my laptop. Rather than using the standard Kibana visualizations to manipulate our metrics, we'll use Timelion instead, and focus on this heavy CPU-consuming processes use case.

Summary

In this article, we have seen how we can use Kibana in the context of technical metric analytics. We relied on the data that Metricbeat is able to ship from a machine and visualized the result both in a Kibana dashboard and in Kibana Timelion.

Resources for Article:

Further resources on this subject:

An Introduction to Kibana [article]
Big Data Analytics [article]
Static Data Management [article]
Elastic Stack Overview

Packt
10 Jan 2017
9 min read
In this article by Ravi Kumar Gupta and Yuvraj Gupta, from the book Mastering Elastic Stack, we will have an overview of the Elastic Stack. It's easy to read a log file of a few MBs or hundreds, and just as easy to keep data of this size in databases or files and still make sense of it. But then a day comes when this data takes up terabytes and petabytes, and even Notepad++ would refuse to open a data file of a few hundred MBs. Then we start to look for something for huge log management, or something that can index the data properly and make sense of it. If you Google this, you will stumble upon ELK Stack: Elasticsearch manages your data, Logstash reads the data from different sources, and Kibana makes a fine visualization of it. Recently, ELK Stack has evolved as Elastic Stack. We will get to know about it in this article.

The following are the points that will be covered in this article:

Introduction to ELK Stack
The birth of Elastic Stack
Who uses the Stack

(For more resources related to this topic, see here.)

Introduction to ELK Stack

It all began with Shay Banon, who started it as an open source project, Elasticsearch, successor of Compass, which gained popularity to become one of the top open source database engines. Later, based on the distributed model of working, Kibana was introduced to visualize the data present in Elasticsearch. Earlier, to put data into Elasticsearch we had rivers, which provided us with a specific input via which we inserted data into Elasticsearch.

However, with growing popularity, this setup required a tool via which we could insert data into Elasticsearch, with the flexibility to perform various transformations on the data to make unstructured data structured and to have full control over how to process the data. Based on this premise, Logstash was born, which was then incorporated into the Stack, and together these three tools, Elasticsearch, Logstash, and Kibana, were named ELK Stack.

The following diagram is a simple data pipeline using ELK Stack:

As we can see from the preceding figure, data is read using Logstash and indexed to Elasticsearch. Later we can use Kibana to read the indices from Elasticsearch and visualize it using charts and lists. Let's understand these components separately and the role they play in the making of the Stack.

Logstash

As we learned, rivers were used initially to put data into Elasticsearch before ELK Stack. For ELK Stack, Logstash is the entry point for all types of data. Logstash has many plugins to read data from a number of sources, and many output plugins to submit data to a variety of destinations; one of those is the Elasticsearch plugin, which helps to send data to Elasticsearch. After Logstash became popular, rivers eventually got deprecated, as they made the cluster unstable and performance issues were observed.

Logstash does not simply ship data from one end to another; it helps us with collecting raw data and modifying/filtering it to convert it into something meaningful, formatted, and organized. The updated data is then sent to Elasticsearch. If there is no plugin available to support reading data from a specific source, writing the data to a location, or modifying it in your way, Logstash is flexible enough to allow you to write your own plugins.

Simply put, Logstash is open source, highly flexible, rich with plugins, can read your data from your choice of location, normalizes it as per your defined configurations, and sends it to a particular destination as per the requirements.
Elasticsearch

All of the data read by Logstash is sent to Elasticsearch for indexing. There is a lot more than just indexing: Elasticsearch is not only used to index data; it is a full-text search engine, highly scalable and distributed, and offers much more. Elasticsearch manages and maintains your data in the form of indices and lets you query, access, and aggregate the data using its APIs. Elasticsearch is based on Lucene, thus providing you all of the features that Lucene does.

Kibana

Kibana uses the Elasticsearch APIs to read/query data from Elasticsearch indices, to visualize and analyze it in the form of charts, graphs, and tables. Kibana is a web application providing a highly configurable user interface that lets you query the data, create a number of charts to visualize it, and make actual sense of the data stored.

After a robust ELK Stack, as time passed, a few important and complex demands arose, such as authentication, security, notifications, and so on. This demand led to a few other tools such as Watcher (providing alerting and notification based on changes in data), Shield (authentication and authorization for securing clusters), Marvel (monitoring statistics of the cluster), ES-Hadoop, Curator, and Graph, as requirements arose.

The birth of Elastic Stack

All the jobs of reading data were done using Logstash, but that's resource-consuming. Since Logstash runs on the JVM, it consumes a good amount of memory. The community realized the need for improvement, and for making the pipelining process resource-friendly and lightweight. Back in 2015, Packetbeat was born, a project which was an effort to make a network packet analyzer that could read from different protocols, parse the data, and ship it to Elasticsearch. Being lightweight in nature did the trick, and a new concept of Beats was formed. Beats are written in the Go programming language. The project evolved a lot, and now ELK Stack was no longer just Elasticsearch, Logstash, and Kibana; Beats also became a significant component.

The pipeline now looked as follows:

Beat

A Beat reads data, parses it, and can ship it to either Elasticsearch or Logstash. The difference is that Beats are lightweight, serve a specific purpose, and are installed as agents. There are a few Beats available, such as Topbeat, Filebeat, Packetbeat, and so on, which are supported and provided by Elastic.co, and a good number of Beats have already been written by the community. If you have a specific requirement, you can write your own Beat using the libbeat library.

In simple words, Beats can be treated as very lightweight agents to ship data to either Logstash or Elasticsearch, offering you an infrastructure via the libbeat library to create your own Beats.

Together, Elasticsearch, Logstash, Kibana, and Beats become Elastic Stack, formerly known as ELK Stack. Elastic Stack did not just add Beats to its team; the components will now always share the same version. The starting version of the Elastic Stack will be 5.0.0, and the same version will apply to all the components. This versioning and release method is not only for Elastic Stack, but for other tools of the Elastic family as well. Due to so many tools, there was a problem of unification, wherein each tool had its own version, and versions were not compatible with each other, hence leading to a problem. To solve this, now all of the tools will be built, tested, and released together.

All of these components play a significant role in creating a pipeline.
While Beats and Logstash are used to collect the data, parse it, and ship it, Elasticsearch creates indices, which are finally used by Kibana to make visualizations. While Elastic Stack helps with a pipeline, other tools add security, notifications, monitoring, and other such capabilities to the setup.

Who uses Elastic Stack?

In the past few years, implementations of Elastic Stack have been increasing very rapidly. In this section, we will consider a few case studies to understand how Elastic Stack has helped this development.

Salesforce

Salesforce developed a new plugin named ELF (Event Log Files) to collect Salesforce logged data to enable auditing of user activities. The purpose was to analyze the data to understand user behavior and trends in Salesforce. The plugin is available on GitHub at https://github.com/developerforce/elf_elk_docker. This plugin simplifies the Stack configuration and allows us to download ELF to get indexed and finally sensible data that can be visualized using Kibana. This implementation utilizes Elasticsearch, Logstash, and Kibana.

CERN

There is not just one use case in which Elastic Stack has helped CERN (the European Organization for Nuclear Research), but five. At CERN, Elastic Stack is used for the following:

Messaging
Data monitoring
Cloud benchmarking
Infrastructure monitoring
Job monitoring

Multiple Kibana dashboards are used by CERN for a number of visualizations.

Green Man Gaming

This is an online gaming platform where game providers publish their games. The website wanted to make a difference by providing better gameplay. They started using Elastic Stack to do log analysis, search, and analysis of gameplay data. They began by setting up Kibana dashboards to gain insights about the counts of gamers by country and the currency used by gamers. That helped them to understand and streamline their support in order to provide an improved response.

Apart from these case studies, Elastic Stack is used by a number of other companies to gain insights into the data they own. Sometimes, not all of the components are used; that is, a Beat is not always used and Logstash is not always configured. Sometimes, only an Elasticsearch and Kibana combination is used.

If we look at the users within an organization, all of the roles that are expected to do big data analysis, business intelligence, data visualization, log analysis, and so on can utilize Elastic Stack for their technical forte. A few of these roles are data scientists, DevOps engineers, and so on.

Stack competitors

Well, it would be wrong to speak of Elastic Stack competitors, because Elastic Stack has itself emerged as a strong competitor to many other tools in the market in recent years and is growing rapidly.
A few of these are:

Open source:
Graylog: Visit https://www.graylog.org/ for more information
InfluxDB: Visit https://influxdata.com/ for more information

Others:
Logscape: Visit http://logscape.com/ for more information
Logsene: Visit http://sematext.com/logsene/ for more information
Splunk: Visit http://www.splunk.com/ for more information
Sumo Logic: Visit https://www.sumologic.com/ for more information

Kibana competitors:
Grafana: Visit http://grafana.org/ for more information
Graphite: Visit https://graphiteapp.org/ for more information

Elasticsearch competitors:
Lucene/Solr: Visit http://lucene.apache.org/solr/ or https://lucene.apache.org/ for more information
Sphinx: Visit http://sphinxsearch.com/ for more information

Most of these compare with respect to log management, while Elastic Stack is much more than that: it offers you the ability to analyze any type of data, not just logs.

Resources for Article:

Further resources on this subject:

AIO setup of OpenStack – preparing the infrastructure code environment [article]
Our App and Tool Stack [article]
Using the OpenStack Dashboard [article]
Installing and Using Vue.js

Packt
10 Jan 2017
14 min read
In this article by Olga Filipova, the author of the book Learning Vue.js 2, we explore the key concepts of the Vue.js framework to understand what happens behind the scenes. We will also analyze all the possible ways of installing Vue.js, and learn ways of debugging and testing our applications. (For more resources related to this topic, see here.)

So, in this article we are going to learn:

What the MVVM architecture paradigm is and how it applies to Vue.js
How to install, start, run, and debug a Vue application

MVVM architectural pattern

Do you remember how to create the Vue instance? We were instantiating it calling new Vue({…}). You also remember that in the options we were passing the element on the page where this Vue instance should be bound and the data object that contained properties we wanted to bind to our view. The data object is our model, and the DOM element where the Vue instance is bound is the view.

Classic View-Model representation where Vue instance binds one to another

In the meantime, our Vue instance is something that helps to bind our model to the view and vice versa. Our application thus follows the Model-View-ViewModel (MVVM) pattern, where the Vue instance is a ViewModel.

The simplified diagram of the Model View ViewModel pattern

Our Model contains data and some business logic, and our View is responsible for its representation. The ViewModel handles data binding, ensuring that data changed in the Model immediately affects the View layer and vice versa. Our Views thus become completely data-driven. The ViewModel becomes responsible for the control of data flow, making data binding fully declarative for us.

Installing, using, and debugging a Vue.js application

In this section, we will analyze all the possible ways of installing Vue.js. We will also create a skeleton for our application, and learn ways of debugging and testing our applications.

Installing Vue.js

There are a number of ways to install Vue.js, starting from the classic approach of including the downloaded script in HTML within <script> tags, to using tools like bower, npm, or Vue's command-line interface (vue-cli) to bootstrap the whole application. Let's have a look at all these methods and choose our favorite. In all these examples we will just show a header on a page saying Learning Vue.js.

Standalone

Download the vue.js file. There are two versions, minified and developer version. The development version is here: https://vuejs.org/js/vue.js. The minified version is here: https://vuejs.org/js/vue.min.js.

If you are developing, make sure you use the development, non-minified version of Vue. You will love the nice tips and warnings on the console.

Then just include vue.js in the script tags:

<script src="vue.js"></script>

Vue is registered in the global variable. You are ready to use it. Our example will then look as simple as the following:

<div id="app">
  <h1>{{ message }}</h1>
</div>
<script src="vue.js"></script>
<script>
  var data = {
    message: "Learning Vue.js"
  };

  new Vue({
    el: "#app",
    data: data
  });
</script>

CDN

Vue.js is available on the following CDNs:

jsdelivr: https://cdn.jsdelivr.net/vue/1.0.25/vue.min.js
cdnjs: https://cdnjs.cloudflare.com/ajax/libs/vue/1.0.25/vue.min.js
npmcdn: https://npmcdn.com/vue@1.0.25/dist/vue.min.js

Just put the URL in the source of the script tag and you are ready to use Vue!

<script src="https://cdnjs.cloudflare.com/ajax/libs/vue/1.0.25/vue.min.js"></script>

Beware though, the CDN version might not be synchronized with the latest available version of Vue.
Thus, the example will look exactly the same as in the standalone version, but instead of using the downloaded file in the <script> tags, we are using a CDN URL.

Bower

If you are already managing your application with bower and don't want to use other tools, there's also a bower distribution of Vue. Just call bower install:

# latest stable release
bower install vue

Our example will look exactly like the two previous examples, but it will include the file from the bower folder:

<script src="bower_components/vue/dist/vue.js"></script>

CSP-compliant

CSP (Content Security Policy) is a security standard that provides a set of rules that must be obeyed by the application in order to prevent security attacks. If you are developing applications for browsers, more than likely you know this policy pretty well!

For environments that require CSP-compliant scripts, there's a special version of Vue.js here: https://github.com/vuejs/vue/tree/csp/dist

Let's do our example as a Chrome application to see the CSP-compliant vue.js in action!

Start by creating a folder for our application example. The most important thing in a Chrome application is the manifest.json file, which describes your application. Let's create it. It should look like the following:

{
  "manifest_version": 2,
  "name": "Learning Vue.js",
  "version": "1.0",
  "minimum_chrome_version": "23",
  "icons": {
    "16": "icon_16.png",
    "128": "icon_128.png"
  },
  "app": {
    "background": {
      "scripts": ["main.js"]
    }
  }
}

The next step is to create our main.js file, which will be the entry point for the Chrome application. The script should listen for the application launching and open a new window with the given sizes. Let's create a window of 500x300 size and open it with index.html:

chrome.app.runtime.onLaunched.addListener(function() {
  // Center the window on the screen.
  var screenWidth = screen.availWidth;
  var screenHeight = screen.availHeight;
  var width = 500;
  var height = 300;

  chrome.app.window.create("index.html", {
    id: "learningVueID",
    outerBounds: {
      width: width,
      height: height,
      left: Math.round((screenWidth - width) / 2),
      top: Math.round((screenHeight - height) / 2)
    }
  });
});

At this point the Chrome-specific application magic is over, and now we shall just create our index.html file that will do the same thing as in the previous examples. It will include the vue.js file and our script where we will initialize our Vue application:

<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Vue.js - CSP-compliant</title>
</head>
<body>
  <div id="app">
    <h1>{{ message }}</h1>
  </div>
  <script src="assets/vue.js"></script>
  <script src="assets/app.js"></script>
</body>
</html>

Download the CSP-compliant version of vue.js and add it to the assets folder. Now let's create the app.js file and add the code that we have already written several times:

var data = {
  message: "Learning Vue.js"
};

new Vue({
  el: "#app",
  data: data
});

Add it to the assets folder. Do not forget to create two icons of 16 and 128 pixels and call them icon_16.png and icon_128.png.

Your code and structure in the end should look more or less like the following:

Structure and code for the sample Chrome application using vue.js

And now the most important thing: let's check if it works! It is very, very simple:

Go to the chrome://extensions/ URL in your Chrome browser.
Check the Developer mode checkbox.
Click on Load unpacked extension... and select the folder that we've just created.
Your app will appear in the list! Now just open a new tab, click on apps, and check that your app is there. Click on it!
Sample Chrome application using vue.js in the list of chrome apps

Congratulations! You have just created a Chrome application!

NPM

The NPM installation method is recommended for large-scale applications. Just run npm install vue:

# latest stable release
npm install vue
# latest stable CSP-compliant release
npm install vue@csp

And then require it:

var Vue = require("vue");

Or, for ES2015 lovers:

import Vue from "vue";

The HTML in our example will look exactly like in the previous examples:

<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Vue.js - NPM Installation</title>
</head>
<body>
  <div id="app">
    <h1>{{ message }}</h1>
  </div>
  <script src="main.js"></script>
</body>
</html>

Now let's create a script.js file that will look almost exactly the same as in the standalone or CDN version, with the only difference being that it will require vue.js:

var Vue = require("vue");

var data = {
  message: "Learning Vue.js"
};

new Vue({
  el: "#app",
  data: data
});

Let's install vue and browserify in order to be able to compile our script.js into the main.js file:

npm install vue --save-dev
npm install browserify --save-dev

In the package.json file, also add a script for build that will execute browserify on script.js, transpiling it into main.js. So our package.json file will look like this:

{
  "name": "learningVue",
  "scripts": {
    "build": "browserify script.js -o main.js"
  },
  "version": "0.0.1",
  "devDependencies": {
    "browserify": "^13.0.1",
    "vue": "^1.0.25"
  }
}

Now run:

npm install
npm run build

And open index.html in the browser.

I have a friend who at this point would say something like: really? So many steps, installations, commands, explanations… Just to output some header? I'm out!

If you are also thinking this, wait. Yes, this is true, now we've done something really simple in a rather complex way, but if you stay with me a bit longer, you will see how complex things become easy to implement if we use the proper tools. Also, do not forget to check your Pomodoro timer; maybe it's time to take a rest!

Vue-cli

Vue provides its own command-line interface that allows bootstrapping single-page applications using whatever workflow you want. It immediately provides hot reloading and structure for a test-driven environment. After installing vue-cli, just run vue init <desired boilerplate> <project-name> and then just install and run:

# install vue-cli
$ npm install -g vue-cli
# create a new project
$ vue init webpack learn-vue
# install and run
$ cd learn-vue
$ npm install
$ npm run dev

Now open your browser on localhost:8080. You just used vue-cli to scaffold your application.
Let's adapt it to our example. Open the source folder. In the src folder you will find an App.vue file. Do you remember we talked about Vue components that are like bricks from which you build your application? Do you remember that we were creating them and registering inside our main script file, and that I mentioned we would learn to build components in a more elegant way? Congratulations, you are looking at a component built in a fancy way!

Find the line that says import Hello from './components/Hello'. This is exactly how components are reused inside other components. Have a look at the template at the top of the component file. At some point it contains the tag <hello></hello>. This is exactly where the Hello component will appear in our HTML file. Have a look at this component; it is in the src/components folder. As you can see, it contains a template with {{ msg }} and a script that exports data with a defined msg. This is exactly the same as what we were doing in our previous examples without using components.

Let's slightly modify the code to make it the same as in the previous examples. In the Hello.vue file, change the msg in the data object:

<script>
export default {
  data () {
    return {
      msg: "Learning Vue.js"
    }
  }
}
</script>

In the App.vue component, remove everything from the template except the hello tag, so the template looks like this:

<template>
  <div id="app">
    <hello></hello>
  </div>
</template>

Now if you rerun the application, you will see our example with beautiful styles we didn't touch:

vue application bootstrapped using vue-cli

Besides the webpack boilerplate template, you can use the following configurations with your vue-cli:

webpack-simple: A simple Webpack + vue-loader setup for quick prototyping.
browserify: A full-featured Browserify + vueify setup with hot-reload, linting, and unit testing.
browserify-simple: A simple Browserify + vueify setup for quick prototyping.
simple: The simplest possible Vue setup in a single HTML file.

Dev build

My dear reader, I can see your shining eyes and I can read your mind. Now that you know how to install and use Vue.js and how it works, you definitely want to put your hands deep into the core code and contribute! For this you need to use the development version of Vue.js, which you have to download from GitHub and compile yourself.

Let's build our example with this development version of Vue. Create a new folder, for example, dev-build, and copy all the files from the npm example to this folder. Do not forget to copy the node_modules folder. You should cd into it and download the files from GitHub to it, then run npm install and npm run build:

cd <APP-PATH>/node_modules
git clone https://github.com/vuejs/vue.git
cd vue
npm install
npm run build

Now build our example application:

cd <APP-PATH>
npm install
npm run build

Open index.html in the browser; you will see the usual Learning Vue.js header.
Let's now try to change something in the vue.js source! Go to the node_modules/vue/src folder and open the config.js file. The second line defines the delimiters:

let delimiters = ['{{', '}}']

This defines the delimiters used in the HTML templates. The things inside these delimiters are recognized as Vue data or as JavaScript code. Let's change them! Let's replace "{{" and "}}" with double percentage signs! Go on and edit the file:

let delimiters = ['%%', '%%']

Now rebuild both the Vue source and our application, and refresh the browser. What do you see? After changing the Vue source and replacing the delimiters, the {{ }} delimiters no longer work! The message inside {{ }} is no longer recognized as data that we passed to Vue. In fact, it is being rendered as part of the HTML.

Now go to the index.html file and replace our curly-bracket delimiters with double percentage signs:

<div id="app">
  <h1>%% message %%</h1>
</div>

Rebuild our application and refresh the browser! What about now? You see how easy it is to change the framework's code and to try out your changes. I'm sure you have plenty of ideas about how to improve or add some functionality to Vue.js. So change it, rebuild, test, deploy! Happy pull requests!

Debug Vue application

You can debug your Vue application the same way you debug any other web application: use your developer tools, breakpoints, debugger statements, and so on. Vue also provides vue devtools, which makes it easier to debug Vue applications. You can download and install it from the Chrome web store: https://chrome.google.com/webstore/detail/vuejs-devtools/nhdogjmejiglipccpnnnanhbledajbpd

After installing it, open, for example, our shopping list application. Open the developer tools. You will see that a Vue tab has automatically appeared:

Vue devtools

In our case we only have one component: Root. As you can imagine, once we start working with components and have lots of them, they will all appear in the left part of the Vue devtools palette. Click on the Root component and inspect it. You'll see all the data attached to this component. If you try to change something, for example, add a shopping list item, check or uncheck a checkbox, change the title, and so on, all these changes will be immediately propagated to the data in the Vue devtools. You will immediately see the changes on the right side of it.

Let's try, for example, to add a shopping list item. Once you start typing, you see on the right how newItem changes accordingly:

The changes in the models are immediately propagated to the Vue devtools data

When we start adding more components and introduce complexity to our Vue applications, the debugging will certainly become more fun!

Summary

In this article we analyzed the behind the scenes of Vue.js. We learned how to install Vue.js, and we also learned how to debug a Vue application.

Resources for Article:

Further resources on this subject:

API with MongoDB and Node.js [Article]
Tips & Tricks for Ext JS 3.x [Article]
Working with Forms using REST API [Article]
Visualization Dashboard Design

Packt
10 Jan 2017
18 min read
In this article by David Baldwin, the author of the book Mastering Tableau, we will cover how to create effective dashboards. (For more resources related to this topic, see here.)

Since that fateful week in Manhattan, I've read Edward Tufte, Stephen Few, and other thought leaders in the data visualization space. This knowledge has been very fruitful. For instance, quite recently a colleague told me that one of his clients thought a particular dashboard had too many bar charts and he wanted some variation. I shared the following two quotes:

Show data variation, not design variation.
– Edward Tufte in The Visual Display of Quantitative Information

Variety might be the spice of life, but, if it is introduced on a dashboard for its own sake, the display suffers.
– Stephen Few in Information Dashboard Design

Those quotes proved helpful for my colleague. Hopefully the following information will prove helpful to you. Additionally, I would like to draw attention to Alberto Cairo, a relatively new voice providing new insight. Each of these authors should be considered a must-read for anyone working in data visualization.

This article covers the following topics:

Visualization design theory
Dashboard design
Sheet selection

Visualization design theory

Any discussion on designing dashboards should begin with information about constructing well-designed content. The quality of the dashboard layout and the utilization of technical tips and tricks do not matter if the content is subpar. In other words, we should consider the worksheets displayed on dashboards and ensure that those worksheets are well-designed. Therefore, our discussion will begin with a consideration of visualization design principles. Regarding these principles, it's tempting to declare a set of rules such as:

To plot change over time, use a line graph
To show breakdowns of the whole, use a treemap
To compare discrete elements, use a bar chart
To visualize correlation, use a scatter plot

But of course even a cursory review of the preceding list brings to mind many variations and alternatives! Thus, we will consider various rules while always keeping in mind that rules (at least rules such as these) are meant to be broken.

Formatting rules

The following formatting rules encompass fonts, lines, and bands. Fonts are, of course, an obvious formatting consideration. Lines and bands, however, may not be something you typically think of when formatting, especially when considering formatting from the perspective of Microsoft Word. But if we broaden formatting considerations to think of Adobe Illustrator, InDesign, and other graphic design tools, then lines and bands are certainly considered. This illustrates that data visualization is closely related to graphic design and that formatting considers much more than just textual layout.

Rule – keep the font choice simple

Typically, using one or two fonts on a dashboard is advisable. More fonts can create a confusing environment and interfere with readability. Fonts chosen for titles should be thick and solid, while body fonts should be easy to read. As of Tableau 10.0, choosing appropriate fonts is simple because of the new Tableau Font Family. Go to Format | Font to display the Format Font window to see and choose these new fonts:

Assuming your dashboard is primarily intended for the screen, sans serif fonts are best. On the rare occasion that a dashboard is primarily intended for print, you may consider serif fonts, particularly if the print resolution is high.
Rule – Trend line > Fever line > Reference line > Drop line > Zero line > Grid line

The preceding pseudo-formula is intended to communicate line visibility. For example, trend line visibility should be greater than fever line visibility. Visibility is usually enhanced by increasing line thickness, but may also be enhanced via color saturation or by choosing a dotted or dashed line over a solid line.

The trend line, if present, is usually the most visible line on the graph. Trend lines are displayed via the Analytics pane and can be adjusted via Format | Lines.

The fever line (for example, the line used on a time-series chart) should not be so heavy as to obscure twists and turns in the data. Although a fever line may be displayed as dotted or dashed by utilizing the Pages shelf, this is usually not advisable because it may obscure visibility. The thickness of a fever line can be adjusted by clicking on the Size shelf in the Marks View card.

Reference lines are usually less prevalent than either fever or trend lines and can be formatted by going to Format | Reference lines.

Drop lines are not frequently used. To deploy drop lines, right-click in a blank portion of the view and go to Drop lines | Show drop lines. Next, click on a point in the view to display a drop line. To format drop lines, go to Format | Droplines. Drop lines are relevant only if at least one axis is utilized in the visualization.

Zero lines (sometimes referred to as base lines) display only if zero or negative values are included in the view, or if positive numerical values are relatively close to zero. Format zero lines by going to Format | Lines.

Grid lines should be the most muted lines on the view and may be dispensed with altogether. Format grid lines by going to Format | Lines.

Rule – band in groups of three to five

Visualizations comprised of a tall table of text or horizontal bars should segment dimension members in groups of three to five.

Exercise – banding

1. Navigate to https://public.tableau.com/profile/david.baldwin#!/ to locate and download the workbook.
2. Navigate to the worksheet titled Banding.
3. Select the Superstore data source and place Product Name on the Rows shelf.
4. Double-click on Discount, Profit, Quantity, and Sales.
5. Navigate to Format | Shading and set Band Size under Row Banding so that three to five lines of text are encompassed by each band. Be sure to set an appropriate color for both Pane and Header:

Note that after completing the preceding five steps, Tableau defaulted to banding every other row. This default formatting is fine for a short table but is quite busy for a tall table. The band in groups of three to five rule is influenced by Dona W. Wong, who, in her book The Wall Street Journal Guide to Information Graphics, recommends separating long tables or bar charts with thin rules to separate the bars in groups of three to five to help the readers read across.

Color rules

It seems slightly ironic to discuss color rules in a black-and-white publication such as Mastering Tableau. Nonetheless, even in a monochromatic setting, a discussion of color is relevant. For example, exclusive use of black text communicates differently than using variations of gray. The following survey of color rules should be helpful to ensure that you use colors effectively in a variety of settings.

Rule – keep colors simple and limited

Stick to the basic hues and provide only a few (perhaps three to five) hue variations.
Color rules

It seems slightly ironic to discuss color rules in a black-and-white publication such as Mastering Tableau. Nonetheless, even in a monochromatic setting, a discussion of color is relevant. For example, exclusive use of black text communicates differently than using variations of gray. The following survey of color rules should help ensure that you use colors effectively in a variety of settings.

Rule – keep colors simple and limited

Stick to the basic hues and provide only a few (perhaps three to five) variations of those hues. Alberto Cairo, in his book The Functional Art: An Introduction to Information Graphics and Visualization, provides insight into why this is important:

The limited capacity of our visual working memory helps explain why it's not advisable to use more than four or five colors or pictograms to identify different phenomena on maps and charts.
–Alberto Cairo in The Functional Art

Rule – respect the psychological implications of colors

In Western society, there is a color vocabulary so pervasive that it's second nature. Exit signs marking stairwell locations are red. Traffic cones are orange. Baby boys are traditionally dressed in blue while baby girls wear pink. Similarly, in Tableau, reds and oranges should usually be associated with negative performance while blues and greens should be associated with positive performance. Using colors counterintuitively can cause confusion.

Rule – be colorblind-friendly

Colorblindness usually manifests as an inability to distinguish red from green or blue from yellow. Red/green and blue/yellow are each on opposite sides of the color wheel. Consequently, the challenges these color combinations present for colorblind individuals can easily be recreated with image-editing software such as Photoshop. If you are not colorblind, convert an image containing these color combinations to grayscale and observe. The challenge presented to the roughly 8.0% of males and 0.5% of females who are colorblind becomes immediately obvious!
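The grayscale check described above can also be scripted. Here is a minimal sketch using the Pillow imaging library; dashboard.png is a placeholder file name for a screenshot of your own view, and the conversion is only a rough approximation of colorblind perception.

from PIL import Image  # Pillow

# Convert a dashboard screenshot to grayscale. If two hues become
# indistinguishable here, they will likely also be hard to tell apart
# for many colorblind viewers.
original = Image.open("dashboard.png")
grayscale = original.convert("L")
grayscale.save("dashboard_grayscale.png")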
Rule – use pure colors sparingly

The colors resulting from the following exercise are a very vibrant red, green, and blue. Depending on the monitor, you may even find it difficult to stare directly at them. These are known as pure colors and should be used sparingly, perhaps only to highlight particularly important items.

Exercise – using pure colors

1. Open the workbook and navigate to the worksheet entitled Pure Colors.
2. Select the Superstore data source and place Category on both the Rows shelf and the Color shelf.
3. Set the fit to Entire View.
4. Click on the Color shelf and choose Edit Colors….
5. In the Edit Colors dialog box, double-click on the color icons to the left of each dimension member, that is, Furniture, Office Supplies, and Technology.
6. Within the resulting dialog box, set Furniture to an HTML value of #0000ff, Office Supplies to #ff0000, and Technology to #00ff00.

Rule – color variation over symbol variation

Deciphering different symbols takes more mental energy for the end user than distinguishing colors. Therefore, color variation should be used over symbol variation. This rule can actually be observed in Tableau's defaults: create a scatter plot and place a dimension with many members on the Color shelf and the Shape shelf, respectively. Note that, by default, the view will display 20 unique colors but only 10 unique shapes. Older versions of Tableau (such as Tableau 9.0) display warnings that include text such as "…the recommended maximum for this shelf is 10".
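The following matplotlib sketch (a hypothetical illustration with fabricated data) makes the same point by plotting identical points twice: once distinguishing six categories by color and once by marker shape. Most viewers will find the color-coded panel faster to decode.

import numpy as np
import matplotlib.pyplot as plt

# Invented points in six categories, purely for illustration.
rng = np.random.default_rng(5)
n_cat = 6
x = rng.normal(size=(n_cat, 25))
y = rng.normal(size=(n_cat, 25))
markers = ["o", "s", "^", "v", "D", "P"]

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4),
                                  sharex=True, sharey=True)

# Left: one marker, six colors -- groups pop out with little effort.
for i in range(n_cat):
    left.scatter(x[i], y[i], label=f"Cat {i + 1}")
left.set_title("Distinguish by color")

# Right: one color, six marker shapes -- decoding takes more work.
for i in range(n_cat):
    right.scatter(x[i], y[i], color="0.3", marker=markers[i])
right.set_title("Distinguish by shape")

left.legend(fontsize=8)
plt.tight_layout()
plt.show()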
Visualization type rules

We won't delve into a lengthy list of visualization type rules here. However, it does seem appropriate to review at least a couple. In the following, we will consider keeping shapes simple and using pie charts effectively.

Rule – keep shapes simple

Too many shape details impede comprehension, because the details draw the user's focus away from the data. Consider the following exercise, which uses two different shopping cart images.

Exercise – shapes

1. Open the workbook and navigate to the worksheet entitled Simple Shopping Cart.
2. Note that the visualization is a scatter plot showing the top 10 selling sub-categories in terms of total sales and profits.
3. On your computer, navigate to the Shapes directory located in the My Tableau Repository. On my computer, the path is C:\Users\David Baldwin\Documents\My Tableau Repository\Shapes.
4. Within the Shapes directory, create a folder named My Shapes.
5. Reference the link included in the comment section of the worksheet to download the assets.
6. In the downloaded material, find the images titled Shopping_Cart and Shopping_Cart_3D and copy them into the My Shapes directory created previously.
7. Within Tableau, access the Simple Shopping Cart worksheet.
8. Click on the Shape shelf and then select More Shapes.
9. Within the Edit Shape dialog box, click on the Reload Shapes button.
10. Select the My Shapes palette and set the shape to the simple shopping cart.
11. After closing the dialog box, click on the Size shelf and adjust as desired. Also adjust other aspects of the visualization as desired.
12. Navigate to the 3D Shopping Cart worksheet and repeat steps 8 to 11, this time using the 3D shopping cart instead of the simple one.

Compare the two visualizations. Which version of the shopping cart is more attractive? Likely the cart with the 3D look was your choice. So why not choose the more attractive image? Because making visualizations attractive is only of secondary concern. The primary goal is to display the data as clearly and efficiently as possible. A simple shape is grasped more quickly and intuitively than a complex shape. Besides, the cuteness of the 3D image will quickly wear off.

Rule – use pie charts sparingly

Edward Tufte makes an acrid (and somewhat humorous) comment against the use of pie charts in his book The Visual Display of Quantitative Information:

A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them. Given their low density and failure to order numbers along a visual dimension, pie charts should never be used.
–Edward Tufte

The present sentiment in data visualization circles is largely sympathetic to Tufte's criticism. There may, however, be some exceptions, that is, some circumstances where a pie chart is optimal. Consider the same part-to-whole data displayed with four different visualization types. Which best demonstrates that A accounts for 25% of the whole? Clearly, it is the pie chart! Therefore, perhaps it is fairer to refer to pie charts as limited, and to use them sparingly, rather than considering them inherently evil. A simplified two-panel version of this comparison is sketched below.
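As a rough illustration of this exception, the following matplotlib sketch (with invented part-to-whole values) draws the same data as both a pie chart and a bar chart; the quarter slice makes the 25% share of A immediately visible.

import matplotlib.pyplot as plt

# Invented part-to-whole data; A is exactly 25% of the total.
labels = ["A", "B", "C", "D"]
values = [25, 35, 20, 20]

fig, (left, right) = plt.subplots(1, 2, figsize=(9, 4))

# The right angle of the quarter slice makes "A is 25%" easy to see.
left.pie(values, labels=labels, startangle=90, counterclock=False)
left.set_title("Pie: A is clearly a quarter")

# The bar chart compares magnitudes well, but share-of-whole is
# harder to judge at a glance.
right.bar(labels, values)
right.set_title("Bar: share of whole is harder to judge")

plt.tight_layout()
plt.show()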
Compromises

In this section, we will transition from more-or-less strict rules to compromises. Often, building visualizations is a balancing act. It's common to encounter contradictory directions from books, blogs, consultants, and within organizations. One person may insist on utilizing every pixel of space while another urges simplicity and whitespace. One counsels a guided approach while another recommends building wide-open dashboards that allow end users to discover their own paths. The avant-garde may crave esoteric visualizations while those of a more conservative bent prefer to stay with the conventional. We now explore a few of the more common competing requests and suggest compromises.

Make the dashboard simple versus make the dashboard robust

Recently a colleague showed me a complex dashboard he had just completed. Although he was pleased that he had managed to get it working well, he felt the need to apologize, saying, "I know it's dense and complex, but it's what the client wanted." Occam's razor encourages the simplest possible solution for any problem. For my colleague's dashboard, the simplest solution was rather complex. This is OK! Complexity in Tableau dashboarding need not be shunned. But a clear understanding of some basic guidelines can help the author intelligently compromise between demands for simplicity and demands for robustness:

More frequent data updates necessitate simpler design. Some Tableau dashboards may be near real-time; third-party technology may be utilized to force a browser displaying a dashboard via Tableau Server to refresh every few minutes so that the absolute latest data displays. In such cases, the design should be quite simple: the end user must be able to take in all pertinent data at a glance and should not use that dashboard for extensive analysis. Conversely, a dashboard that is refreshed monthly can support high complexity and thus may be used for deep exploration.

Greater end-user expertise supports greater dashboard complexity. Know thy users. If they want easy, at-a-glance visualizations, keep the dashboards simple. If they like deep dives, design accordingly.

Smaller audiences require more precise design. If only a few people monitor a given dashboard, it may require a highly customized approach. In such cases, specifications may be detailed, complex, and difficult to execute and maintain, because the small user base has expectations that may not be natively easy to produce in Tableau.

Screen resolution and visualization complexity are proportional. Users with low-resolution devices will need to interact fairly simply with a dashboard, so the design of such a dashboard will likely be correspondingly uncomplicated. Conversely, high-resolution devices support greater complexity.

Greater distance from the screen requires larger dashboard elements. If the dashboard is designed for conference-room viewing, the elements on the dashboard may need to be fairly large to meet the viewing needs of those far from the screen, so the dashboard will likely be relatively simple. Conversely, a dashboard viewed primarily on end users' desktops can be more complex.

Although these points all address simple versus complex, do not equate simple with easy. A simple, elegantly designed dashboard can be more difficult to create than a complex one. In the words of Steve Jobs:

Simple can be harder than complex: You have to work hard to get your thinking clean to make it simple. But it's worth it in the end because once you get there, you can move mountains.
–Steve Jobs

Present dense information versus present sparse information

Normally, a line graph should have a maximum of four to five lines. However, there are times when you may wish to display many more. A compromise can be achieved by presenting many lines and empowering the end user to highlight as desired. Consider, for example, a line graph displaying the percentage of Internet usage by country from 2000 to 2012, with the countries showing the largest increases highlighted. Assuming that Highlight Selected Items has been activated within the Color legend, the end user can select items (countries, in this case) from the legend to highlight as desired. Or, even better, a worksheet can be created listing all countries and used in conjunction with a highlight action on a dashboard to focus attention on selected items on the line graph.
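A similar highlight effect can be prototyped outside Tableau. The following matplotlib sketch (with randomly generated placeholder series) draws many muted context lines and pulls a few highlighted series to the front.

import numpy as np
import matplotlib.pyplot as plt

# Invented usage series for 30 "countries", purely for illustration.
rng = np.random.default_rng(11)
years = np.arange(2000, 2013)
series = np.cumsum(rng.uniform(0, 6, size=(30, years.size)), axis=1)
highlight = [3, 12, 27]  # indices of the series to call out

fig, ax = plt.subplots(figsize=(8, 4.5))
for i, row in enumerate(series):
    if i in highlight:
        ax.plot(years, row, color="firebrick", linewidth=2.5, zorder=3)
    else:
        # Context lines stay visible but recede into the background.
        ax.plot(years, row, color="0.8", linewidth=1, zorder=1)

ax.set_xlabel("Year")
ax.set_ylabel("Internet usage (%)")
plt.show()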
Tell a story versus allow a story to be discovered

Alberto Cairo, in his excellent book The Functional Art: An Introduction to Information Graphics and Visualization, includes a section in which he interviews prominent data visualization and information graphics professionals. Two of these interviews are remarkable for their opposing views:

I… feel that many visualization designers try to transform the user into an editor. They create these amazing interactive tools with tons of bubbles, lines, bars, filters, and scrubber bars, and expect readers to figure the story out by themselves, and draw conclusions from the data. That's not an approach to information graphics I like.
–Jim Grimwade

The most fascinating thing about the rise of data visualization is exactly that anyone can explore all those large data sets without anyone telling us what the key insight is.
–Moritz Stefaner

Fortunately, a compromise position can be found in the Jim Grimwade interview:

[The New York Times presents] complex sets of data, and they let you go really deep into the figures and their connections. But beforehand, they give you some context, some pointers as to what you can do with those data. If you don't do this… you will end up with a visualization that may look really beautiful and intricate, but that will leave readers wondering, What has this thing really told me? What is this useful for?
–Jim Grimwade

Although the scenarios considered in the preceding quotes are likely quite different from the Tableau work you are involved in, the underlying principles remain the same. You can choose to tell a story or to build a platform that allows the discovery of numerous stories. Your choice will differ depending on the given dataset and audience. If you choose to create a platform for story discovery, be sure to take the New York Times approach suggested by Grimwade: provide hints, pointers, and good documentation to lead your end users to successfully interact with the story you wish to tell, or to successfully discover their own.

Document, document, document! But don't use any space!

We just considered the suggestion to provide hints, pointers, and good documentation, but there's an issue: these things take space, and dashboard space is precious. Tableau authors are often asked to squeeze more and more onto a dashboard and are therefore looking for ways to conserve space. Here are some suggestions for maximizing documentation on a dashboard while minimally impacting screen real estate.

Craft titles for clear communication

Titles are expected: not just a title for the dashboard and the worksheets on it, but also titles for legends, filters, and other objects. These titles can be used for effective and efficient documentation. For instance, a filter should not just read Market; instead, it should say something like Select a Market. Notice the imperative statement: the user is being told to do something, and this is a helpful hint. Adding a couple of words to a title will usually not impact dashboard space.

Use subtitles to relay instructions

A subtitle takes some extra space, but it does not have to be much. A small, italicized font immediately underneath a title is an obvious place a user will look for guidance. Consider an example: Red represents loss. This short sentence could be used as a subtitle that may eliminate the need for a legend and thus actually save space.

Use intuitive icons

Consider the use case of navigating from one dashboard to another. You could associate an action with hyperlinked text stating Click here to navigate to another dashboard, but this seems quite unnecessary when an action can be associated with a small, innocuous arrow, such as is natively used in PowerPoint, to communicate the same thing.

Store more extensive documentation in a tooltip associated with a help icon

A small question mark in the top-right corner of an application is common and clearly communicates where to go if additional help is required. It's easy to create a similar feature on a Tableau dashboard.

Summary

In this article we surveyed visualization design theory, covering formatting rules, color rules, and visualization type rules, and then considered how to compromise between competing demands: simplicity versus robustness, dense versus sparse information, and telling a story versus enabling story discovery, along with space-efficient ways to document a dashboard.