Working with Incanter Datasets

Packt
04 Feb 2015
28 min read
In this article by Eric Rochester, author of the book Clojure Data Analysis Cookbook, Second Edition, we will cover the following recipes:

Loading Incanter's sample datasets
Loading Clojure data structures into datasets
Viewing datasets interactively with view
Converting datasets to matrices
Using infix formulas in Incanter
Selecting columns with $
Selecting rows with $
Filtering datasets with $where
Grouping data with $group-by
Saving datasets to CSV and JSON
Projecting from multiple datasets with $join

(For more resources related to this topic, see here.)

Introduction

Incanter combines the power to do statistics using a fully-featured statistical language such as R (http://www.r-project.org/) with the ease and joy of Clojure. Incanter's core data structure is the dataset, so we'll spend some time in this article looking at how to use datasets effectively. While learning basic tools in this manner is often not the most exciting way to spend your time, it can still be incredibly useful.

At its most fundamental level, an Incanter dataset is a table of rows. Each row has the same set of columns, much like a spreadsheet. The data in each cell of an Incanter dataset can be a string or a numeric value. However, some operations require the data to be only numeric.

First, you'll learn how to populate and view datasets, then you'll learn different ways to query and project the parts of the dataset that you're interested in onto a new dataset. Finally, we'll take a look at how to save datasets and merge multiple datasets together.

Loading Incanter's sample datasets

Incanter comes with a set of default datasets that are useful for exploring Incanter's functions. I haven't made use of them in this book, since there is so much data available in other places, but they're a great way to get a feel for what you can do with Incanter. Some of these datasets—for instance, the Iris dataset—are widely used to teach and test statistical algorithms. It contains the species and the petal and sepal dimensions for 150 irises (50 of each of three species). This is the dataset that we'll access today. In this recipe, we'll load a dataset and see what it contains.

Getting ready

We'll need to include Incanter in our Leiningen project.clj file:

(defproject inc-dsets "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])

We'll also need to include the right Incanter namespaces into our script or REPL:

(use '(incanter core datasets))

How to do it…

Once the namespaces are available, we can access the datasets easily:

user=> (def iris (get-dataset :iris))
#'user/iris
user=> (col-names iris)
[:Sepal.Length :Sepal.Width :Petal.Length :Petal.Width :Species]
user=> (nrow iris)
150
user=> (set ($ :Species iris))
#{"versicolor" "virginica" "setosa"}

How it works…

We use the get-dataset function to access the built-in datasets. In this case, we're loading Fisher's Iris dataset, sometimes called Anderson's dataset. This is a multivariate dataset for discriminant analysis. It gives petal and sepal measurements for 150 different irises of three different species.

Incanter's sample datasets cover a wide variety of topics—from U.S. arrests to plant growth and ultrasonic calibration. They can be used to test different algorithms and analyses and to work with different types of data.

By the way, the names of these functions should be familiar to you if you've previously used R. Incanter often uses the names of R's functions instead of the Clojure names for the same operations. For example, the preceding code sample used nrow instead of count.
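Before moving on, here is a quick sketch (not part of the original recipe) of the kind of exploration these sample datasets invite; it assumes the incanter.stats namespace, which provides mean, is loaded alongside the namespaces used above:

(use '(incanter core datasets stats))

(def iris (get-dataset :iris))

;; average sepal length across all 150 flowers
(mean ($ :Sepal.Length iris))

;; number of rows belonging to a single species
;; ($where is covered in detail later in this article)
(nrow ($where {:Species "setosa"} iris))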
There's more…

Incanter's API documentation for get-dataset (http://liebke.github.com/incanter/datasets-api.html#incanter.datasets/get-dataset) lists more sample datasets, and you can refer to it for the latest information about the data that Incanter bundles.

Loading Clojure data structures into datasets

While they are good for learning, Incanter's built-in datasets probably won't be that useful for your work (unless you work with irises). Other recipes cover ways to get data from CSV files and other sources into Incanter. Incanter also accepts native Clojure data structures in a number of formats. We'll take a look at a couple of these in this recipe.

Getting ready

We'll just need Incanter listed in our project.clj file:

(defproject inc-dsets "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])

We'll also need to include this in our script or REPL:

(use 'incanter.core)

How to do it…

The primary function used to convert data into a dataset is to-dataset. While it can convert single, scalar values into a dataset, we'll start with slightly more complicated inputs.

Generally, you'll be working with at least a matrix. If you pass this to to-dataset, what do you get?

user=> (def matrix-set (to-dataset [[1 2 3] [4 5 6]]))
#'user/matrix-set
user=> (nrow matrix-set)
2
user=> (col-names matrix-set)
[:col-0 :col-1 :col-2]

All the data's here, but it could be labeled in a better way. Does to-dataset handle maps?

user=> (def map-set (to-dataset {:a 1, :b 2, :c 3}))
#'user/map-set
user=> (nrow map-set)
1
user=> (col-names map-set)
[:a :c :b]

So, map keys become the column labels. That's much more intuitive. Let's throw a sequence of maps at it:

user=> (def maps-set (to-dataset [{:a 1, :b 2, :c 3}
                                  {:a 4, :b 5, :c 6}]))
#'user/maps-set
user=> (nrow maps-set)
2
user=> (col-names maps-set)
[:a :c :b]

This is much more useful. We can also create a dataset by passing the column vector and the row matrix separately to dataset:

user=> (def matrix-set-2
         (dataset [:a :b :c]
                  [[1 2 3] [4 5 6]]))
#'user/matrix-set-2
user=> (nrow matrix-set-2)
2
user=> (col-names matrix-set-2)
[:c :b :a]

How it works…

The to-dataset function looks at the input and tries to process it intelligently. If given a sequence of maps, the column names are taken from the keys of the first map in the sequence. Ultimately, it uses the dataset constructor to create the dataset. The dataset constructor requires the data to be passed in as a column vector and a row matrix. When the data is already in this format, or when we need the most control—to rename the columns, for instance—we can use dataset directly.

Viewing datasets interactively with view

Being able to interact with our data programmatically is important, but sometimes it's also helpful to be able to look at it. This can be especially useful when you do data exploration.

Getting ready

We'll need to have Incanter in our project.clj file and script or REPL, so we'll use the same setup as we did for the Loading Incanter's sample datasets recipe, as follows. We'll also use the Iris dataset from that recipe.

(use '(incanter core datasets))

How to do it…

Incanter makes this very easy.
Let's take a look at just how simple it is:

First, we need to load the dataset, as follows:

user=> (def iris (get-dataset :iris))
#'user/iris

Then we just call view on the dataset:

user=> (view iris)

This function returns the Swing window frame that contains our data. The window should also be open on your desktop, although for me, it's usually hiding behind another window.

How it works…

Incanter's view function takes any object and tries to display it graphically. In this case, it simply displays the raw data as a table.

Converting datasets to matrices

Although datasets are often convenient, many times we'll want to treat our data as a matrix from linear algebra. In Incanter, matrices store a table of doubles. This provides good performance in a compact data structure. Moreover, we'll need matrices many times because some of Incanter's functions, such as trans, only operate on a matrix. Plus, matrices implement Clojure's ISeq interface, so interacting with them is also convenient.

Getting ready

For this recipe, we'll need the Incanter libraries, so we'll use this project.clj file:

(defproject inc-dsets "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])

We'll use the core and io namespaces, so we'll load these into our script or REPL:

(use '(incanter core io))

This line binds the file name to the identifier data-file:

(def data-file "data/all_160_in_51.P35.csv")

How to do it…

For this recipe, we'll create a dataset, convert it to a matrix, and then perform some operations on it:

First, we need to read the data into a dataset, as follows:

(def va-data (read-dataset data-file :header true))

Then, in order to convert it to a matrix, we just pass it to the to-matrix function. Before we do this, we'll pull out a few of the columns, since matrices can only contain floating-point numbers:

(def va-matrix
  (to-matrix ($ [:POP100 :HU100 :P035001] va-data)))

Now that it's a matrix, we can treat it like a sequence of rows. Here, we pass it to first in order to get the first row, take in order to get a subset of the matrix, and count in order to get the number of rows in the matrix:

user=> (first va-matrix)
A 1x3 matrix
-------------
8.19e+03 4.27e+03 2.06e+03

user=> (count va-matrix)
591

We can also use Incanter's matrix operators to get the sum of each column, for instance. The plus function takes each row and sums each column separately:

user=> (reduce plus va-matrix)
A 1x3 matrix
-------------
5.43e+06 2.26e+06 1.33e+06

How it works…

The to-matrix function takes a dataset of floating-point values and returns a compact matrix. Matrices are used by many of Incanter's more sophisticated analysis functions, as they're easy to work with.

There's more…

In this recipe, we saw the plus matrix operator. Incanter defines a full suite of these. You can learn more about matrices and see what operators are available at https://github.com/liebke/incanter/wiki/matrices.

Using infix formulas in Incanter

There's a lot to like about Lisp: macros, the simple syntax, and the rapid development cycle. Most of the time, it is fine to treat math operators as functions and use prefix notation, which is a consistent, function-first syntax. This allows you to treat math operators in the same way as everything else, so that you can pass them to reduce, or do anything else you want with them. However, we're not taught to read math expressions using prefix notation (with the operator first).
And especially when formulas get even a little complicated, tracing out exactly what's happening can get hairy.

Getting ready

For this recipe, we'll just need Incanter in our project.clj file, so we'll use the dependencies statement—as well as the use statement—from the Loading Clojure data structures into datasets recipe. For data, we'll use the matrix that we created in the Converting datasets to matrices recipe.

How to do it…

Incanter has a macro that converts a standard math notation to a Lisp notation. We'll explore that in this recipe:

The $= macro changes its contents to use an infix notation, which is what we're used to from math class:

user=> ($= 7 * 4)
28
user=> ($= 7 * 4 + 3)
31

We can also work on whole matrices or just parts of matrices. In this example, we perform a scalar multiplication of the matrix:

user=> ($= va-matrix * 4)
A 591x3 matrix
---------------
3.28e+04 1.71e+04 8.22e+03
2.08e+03 9.16e+02 4.68e+02
1.19e+03 6.52e+02 3.08e+02
...
1.41e+03 7.32e+02 3.72e+02
1.31e+04 6.64e+03 3.49e+03
3.02e+04 9.60e+03 6.90e+03

user=> ($= (first va-matrix) * 4)
A 1x3 matrix
-------------
3.28e+04 1.71e+04 8.22e+03

Using this, we can build complex expressions, such as this expression that takes the mean of the values in the first row of the matrix:

user=> ($= (sum (first va-matrix)) /
           (count (first va-matrix)))
4839.333333333333

Or we can build expressions that take the mean of each column, as follows:

user=> ($= (reduce plus va-matrix) / (count va-matrix))
A 1x3 matrix
-------------
9.19e+03 3.83e+03 2.25e+03

How it works…

Any time you're working with macros and you wonder how they work, you can always get at their output expressions easily, so you can see what the computer is actually executing. The tool to do this is macroexpand-1. This expands the macro one step and returns the result. Its sibling function, macroexpand, expands the expression until there is no macro expression left. Usually, this is more than we want, so we just use macroexpand-1.

Let's see what these macros expand into:

user=> (macroexpand-1 '($= 7 * 4))
(incanter.core/mult 7 4)
user=> (macroexpand-1 '($= 7 * 4 + 3))
(incanter.core/plus (incanter.core/mult 7 4) 3)
user=> (macroexpand-1 '($= 3 + 7 * 4))
(incanter.core/plus 3 (incanter.core/mult 7 4))

Here, we can see that the expression doesn't expand into Clojure's * or + functions, but uses Incanter's matrix functions, mult and plus, instead. This allows it to handle a variety of input types, including matrices, intelligently. Otherwise, it rearranges the expressions the way we'd expect. Also, by comparing the last two lines of code, we can see that it even handles operator precedence correctly.

Selecting columns with $

Often, you need to cut the data down to make it more useful. One common transformation is to pull out all the values from one or more columns into a new dataset. This can be useful for generating summary statistics or aggregating the values of some columns. The Incanter macro $ slices out parts of a dataset. In this recipe, we'll see this in action.

Getting ready

For this recipe, we'll need to have Incanter listed in our project.clj file:

(defproject inc-dsets "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]
                 [org.clojure/data.csv "0.1.2"]])

We'll also need to include these libraries in our script or REPL:

(require '[clojure.java.io :as io]
         '[clojure.data.csv :as csv]
         '[clojure.string :as str]
         '[incanter.core :as i])

Moreover, we'll need some data.
This time, we'll use some country data from the World Bank. Point your browser to http://data.worldbank.org/country and select a country. I picked China. Under World Development Indicators, there is a button labeled Download Data. Click on this button and select CSV. This will download a ZIP file. I extracted its contents into the data/chn directory in my project. I bound the filename for the primary data file to the data-file name.

How to do it…

We'll use the $ macro in several different ways to get different results. First, however, we'll need to load the data into a dataset, which we'll do in steps 1 and 2:

Before we start, we'll need a couple of utilities that load the data file into a sequence of maps and make a dataset out of those:

(defn with-header [coll]
  (let [headers (map #(keyword (str/replace % \space \-))
                     (first coll))]
    (map (partial zipmap headers) (next coll))))

(defn read-country-data [filename]
  (with-open [r (io/reader filename)]
    (i/to-dataset
      (doall (with-header
               (drop 2 (csv/read-csv r)))))))

Now, using these functions, we can load the data:

user=> (def chn-data (read-country-data data-file))

We can select columns to be pulled out from the dataset by passing the column names or numbers to the $ macro. It returns a sequence of the values in the column:

user=> (i/$ :Indicator-Code chn-data)
("AG.AGR.TRAC.NO" "AG.CON.FERT.PT.ZS" "AG.CON.FERT.ZS" …

We can select more than one column by listing all of them in a vector. This time, the results are in a dataset:

user=> (i/$ [:Indicator-Code :1992] chn-data)

|   :Indicator-Code |   :1992 |
|-------------------+---------|
|    AG.AGR.TRAC.NO |  770629 |
| AG.CON.FERT.PT.ZS |         |
|    AG.CON.FERT.ZS |         |
|    AG.LND.AGRI.K2 | 5159980 |
…

We can list as many columns as we want, although the formatting might suffer:

user=> (i/$ [:Indicator-Code :1992 :2002] chn-data)

|   :Indicator-Code |   :1992 |            :2002 |
|-------------------+---------+------------------|
|    AG.AGR.TRAC.NO |  770629 |                  |
| AG.CON.FERT.PT.ZS |         |  122.73027213719 |
|    AG.CON.FERT.ZS |         | 373.087159048868 |
|    AG.LND.AGRI.K2 | 5159980 |          5231970 |
…

How it works…

The $ function is just a wrapper over Incanter's sel function. It provides a good way to slice columns out of the dataset, so we can focus only on the data that actually pertains to our analysis.

There's more…

The indicator codes for this dataset are a little cryptic. However, the code descriptions are in the dataset too:

user=> (i/$ [0 1 2] [:Indicator-Code :Indicator-Name] chn-data)

|   :Indicator-Code |                                                :Indicator-Name |
|-------------------+-----------------------------------------------------------------|
|    AG.AGR.TRAC.NO |                               Agricultural machinery, tractors |
| AG.CON.FERT.PT.ZS |             Fertilizer consumption (% of fertilizer production) |
|    AG.CON.FERT.ZS |  Fertilizer consumption (kilograms per hectare of arable land) |
…

See also…

For information on how to pull out specific rows, see the next recipe, Selecting rows with $.

Selecting rows with $

The Incanter macro $ also pulls rows out of a dataset. In this recipe, we'll see this in action.
Getting ready

For this recipe, we'll use the same dependencies, imports, and data as we did in the Selecting columns with $ recipe.

How to do it…

Similar to how we use $ in order to select columns, there are several ways in which we can use it to select rows, shown as follows:

We can create a sequence of the values of one row using $, passing it the index of the row we want as well as :all for the columns:

user=> (i/$ 0 :all chn-data)
("AG.AGR.TRAC.NO" "684290" "738526" "52661" "" "880859" "" "" "" "59657" "847916" "862078" "891170" "235524" "126440" "469106" "282282" "817857" "125442" "703117" "CHN" "66290" "705723" "824113" "" "151281" "669675" "861364" "559638" "191220" "180772" "73021" "858031" "734325" "Agricultural machinery, tractors" "100432" "" "796867" "" "China" "" "" "155602" "" "" "770629" "747900" "346786" "" "398946" "876470" "" "795713" "" "55360" "685202" "989139" "798506" "")

We can also pull out a dataset containing multiple rows by passing more than one index into $ with a vector (there's a lot of data, even for three rows, so I won't show it here):

(i/$ (range 3) :all chn-data)

We can also combine the two ways to slice data in order to pull specific columns and rows. We can either pull out a single row or multiple rows:

user=> (i/$ 0 [:Indicator-Code :1992] chn-data)
("AG.AGR.TRAC.NO" "770629")
user=> (i/$ (range 3) [:Indicator-Code :1992] chn-data)

|   :Indicator-Code |  :1992 |
|-------------------+--------|
|    AG.AGR.TRAC.NO | 770629 |
| AG.CON.FERT.PT.ZS |        |
|    AG.CON.FERT.ZS |        |

How it works…

The $ macro is the workhorse used to slice rows and project (or select) columns from datasets. When it's called with two indexing parameters, the first is the row or rows and the second is the column or columns.

Filtering datasets with $where

While we can filter datasets before we import them into Incanter, Incanter makes it easy to filter and create new datasets from the existing ones. We'll take a look at its query language in this recipe.

Getting ready

We'll use the same dependencies, imports, and data as we did in the Selecting columns with $ recipe.

How to do it…

Once we have the data, we query it using the $where function:

For example, this creates a dataset with a row for the percentage of China's total land area that is used for agriculture:

user=> (def land-use
         (i/$where {:Indicator-Code "AG.LND.AGRI.ZS"}
                   chn-data))
user=> (i/nrow land-use)
1
user=> (i/$ [:Indicator-Code :2000] land-use)
("AG.LND.AGRI.ZS" "56.2891584865366")

The queries can be more complicated too. This expression picks out the data that exists for 1962 by filtering out any empty strings in that column:

user=> (i/$ (range 5) [:Indicator-Code :1962]
         (i/$where {:1962 {:$ne ""}} chn-data))

|   :Indicator-Code |             :1962 |
|-------------------+-------------------|
|    AG.AGR.TRAC.NO |             55360 |
|    AG.LND.AGRI.K2 |           3460010 |
|    AG.LND.AGRI.ZS |  37.0949187612906 |
|    AG.LND.ARBL.HA |         103100000 |
| AG.LND.ARBL.HA.PC | 0.154858284392508 |

Incanter's query language is even more powerful than this, but these examples should show you the basic structure and give you an idea of the possibilities.

How it works…

To better understand how to use $where, let's break apart the last example:

(i/$where {:1962 {:$ne ""}} chn-data)

The query is expressed as a hashmap from fields to values (the outer map in this example). As we saw in the first example, the value can be a raw value, either a literal or an expression. Here, though, the value is itself a hashmap of test operators, and this one tests for inequality.
(i/$where {:1962 {:$ne ""}} chn-data)

Each test pair is associated with a field in another hashmap (the inner map in the example above). In this example, both the hashmaps shown contain only one key-value pair. However, they might contain multiple pairs, which will all be ANDed together.

Incanter supports a number of test operators. The basic Boolean tests are :$gt (greater than), :$lt (less than), :$gte (greater than or equal to), :$lte (less than or equal to), :$eq (equal to), and :$ne (not equal). There are also some operators that take sets as parameters: :$in and :$nin (not in).

The last operator—:$fn—is interesting. It allows you to use any predicate function. For example, this will randomly select approximately half of the dataset:

(def random-half
  (i/$where {:Indicator-Code {:$fn (fn [_] (< (rand) 0.5))}}
            chn-data))

There's more…

For full details of the query language, see the documentation for incanter.core/query-dataset (http://liebke.github.com/incanter/core-api.html#incanter.core/query-dataset).

Grouping data with $group-by

Datasets often come with an inherent structure. Two or more rows might have the same value in one column, and we might want to leverage that by grouping those rows together in our analysis.

Getting ready

First, we'll need to declare a dependency on Incanter in the project.clj file:

(defproject inc-dsets "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]
                 [org.clojure/data.csv "0.1.2"]])

Next, we'll include Incanter core and io in our script or REPL:

(require '[incanter.core :as i]
         '[incanter.io :as i-io])

For data, we'll use the census race data for all the states. You can download it from http://www.ericrochester.com/clj-data-analysis/data/all_160.P3.csv. These lines will load the data into the race-data name:

(def data-file "data/all_160.P3.csv")
(def race-data (i-io/read-dataset data-file :header true))

How to do it…

Incanter lets you group rows for further analysis, or to summarize them, with the $group-by function. All you need to do is pass the data to $group-by with the column or function to group on:

(def by-state (i/$group-by :STATE race-data))

How it works…

This function returns a map where each key is a map of the fields and values represented by that grouping. For example, this is how the keys look:

user=> (take 5 (keys by-state))
({:STATE 29} {:STATE 28} {:STATE 31} {:STATE 30} {:STATE 25})

We can get the data for Virginia back out by querying the group map for state 51:

user=> (i/$ (range 3) [:GEOID :STATE :NAME :POP100]
           (by-state {:STATE 51}))

|  :GEOID | :STATE |         :NAME | :POP100 |
|---------+--------+---------------+---------|
| 5100148 |     51 | Abingdon town |    8191 |
| 5100180 |     51 |  Accomac town |     519 |
| 5100724 |     51 |  Alberta town |     298 |
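Because each value in this map is itself an Incanter dataset, we can follow the grouping with ordinary dataset operations. The following sketch (our own addition, not from the original recipe) totals the :POP100 column for every state in the grouped map created above:

(def population-by-state
  (into {}
        (for [[group-key group-ds] by-state]
          ;; group-key looks like {:STATE 51}; group-ds is a dataset
          [(:STATE group-key)
           (reduce + (i/$ :POP100 group-ds))])))

;; for example, the total for Virginia
(population-by-state 51)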
Saving datasets to CSV and JSON

Once you've done the work of slicing, dicing, cleaning, and aggregating your datasets, you might want to save them. Incanter by itself doesn't have a good way to do this. However, with the help of some Clojure libraries, it's not difficult at all.

Getting ready

We'll need to include a number of dependencies in our project.clj file:

(defproject inc-dsets "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]
                 [org.clojure/data.csv "0.1.2"]
                 [org.clojure/data.json "0.2.5"]])

We'll also need to include these libraries in our script or REPL:

(require '[incanter.core :as i]
         '[incanter.io :as i-io]
         '[clojure.data.csv :as csv]
         '[clojure.data.json :as json]
         '[clojure.java.io :as io])

Also, we'll use the same data that we introduced in the Selecting columns with $ recipe.

How to do it…

This process is really as simple as getting the data and saving it. We'll pull out the data for the year 2000 from the larger dataset. We'll use this subset of the data in both formats here:

(def data2000
  (i/$ [:Indicator-Code :Indicator-Name :2000] chn-data))

Saving data as CSV

To save a dataset as a CSV, all in one statement, open a file and use clojure.data.csv/write-csv to write the column names and data to it:

(with-open [f-out (io/writer "data/chn-2000.csv")]
  (csv/write-csv f-out [(map name (i/col-names data2000))])
  (csv/write-csv f-out (i/to-list data2000)))

Saving data as JSON

To save a dataset as JSON, open a file and use clojure.data.json/write to serialize the file:

(with-open [f-out (io/writer "data/chn-2000.json")]
  (json/write (:rows data2000) f-out))

How it works…

For CSV and JSON, as well as many other data formats, the process is very similar. Get the data, open the file, and serialize the data into it. There will be differences in how the output function wants the data (to-list or :rows), and there will be differences in how the output function is called (for instance, whether the file handle is the first or second argument). But generally, outputting datasets will be very similar and relatively simple.
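A quick way to confirm that the exports worked is to read the files back in. This sketch is our own addition rather than part of the original recipe; it assumes the files written above and the namespaces already required (including incanter.io as i-io):

;; the CSV comes straight back as an Incanter dataset
(def csv-round-trip
  (i-io/read-dataset "data/chn-2000.csv" :header true))
(i/nrow csv-round-trip)

;; the JSON can be read back as a sequence of maps
(def json-round-trip
  (with-open [r (io/reader "data/chn-2000.json")]
    (json/read r :key-fn keyword)))
(count json-round-trip)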
Projecting from multiple datasets with $join

So far, we've been focusing on splitting up datasets, on dividing them into groups of rows or groups of columns with functions and macros such as $ or $where. However, sometimes we'd like to move in the other direction. We might have two related datasets and want to join them together to make a larger one. For example, we might want to join crime data to census data, or take any two related datasets that come from separate sources and analyze them together.

Getting ready

First, we'll need to include these dependencies in our project.clj file:

(defproject inc-dsets "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]
                 [org.clojure/data.csv "0.1.2"]])

We'll use these statements for inclusions:

(require '[clojure.java.io :as io]
         '[clojure.data.csv :as csv]
         '[clojure.string :as str]
         '[incanter.core :as i])

For our data file, we'll use the same data that we introduced in the Selecting columns with $ recipe: China's development dataset from the World Bank.

How to do it…

In this recipe, we'll take a look at how to join two datasets using Incanter:

To begin with, we'll load the data from the data/chn/chn_Country_en_csv_v2.csv file. We'll use the with-header and read-country-data functions that were defined in the Selecting columns with $ recipe:

(def data-file "data/chn/chn_Country_en_csv_v2.csv")
(def chn-data (read-country-data data-file))

Currently, each row contains the data for one indicator across many years. However, for some analyses, it will be more helpful to have each row contain the data for one indicator for one year. To do this, let's first pull out the data from two years into separate datasets. Note that for the second dataset, we'll only include a column to match the first dataset (:Indicator-Code) and the data column (:2000):

(def chn-1990
  (i/$ [:Indicator-Code :Indicator-Name :1990] chn-data))
(def chn-2000
  (i/$ [:Indicator-Code :2000] chn-data))

Now, we'll join these datasets back together. This is contrived, but it's easy to see how we would do this in a more meaningful example. For example, we might want to join the datasets from two different countries:

(def chn-decade
  (i/$join [:Indicator-Code :Indicator-Code]
           chn-1990 chn-2000))

From this point on, we can use chn-decade just as we use any other Incanter dataset.

How it works…

Let's take a look at this in more detail:

(i/$join [:Indicator-Code :Indicator-Code] chn-1990 chn-2000)

The pair of column keywords in a vector ([:Indicator-Code :Indicator-Code]) are the keys that the datasets will be joined on. In this case, the :Indicator-Code column from both datasets is used, but the keys can be different for the two datasets. The first column listed will be taken from the first dataset (chn-1990), and the second column listed will be taken from the second dataset (chn-2000).

This returns a new dataset. Each row of this new dataset is a superset of the corresponding rows from the two input datasets.

Summary

In this article, we covered the basics of working with Incanter datasets. Datasets are the core data structures used by Incanter, and understanding them is necessary in order to use Incanter effectively.

Resources for Article:

Further resources on this subject:
The Hunt for Data [article]
Limits of Game Data Analysis [article]
Clojure for Domain-specific Languages - Design Concepts with Clojure [article]

OpenLayers' Key Components

Packt
04 Feb 2015
13 min read
In this article by Thomas Gratier, Paul Spencer, and Erik Hazzard, authors of the book OpenLayers 3 Beginner's Guide, we will look at the various components of OpenLayers and give a short description of each.

(For more resources related to this topic, see here.)

The OpenLayers library provides web developers with components useful for building web mapping applications. Following the principles of object-oriented design, these components are called classes. The relationship between all the classes in the OpenLayers library is part of the deliberate design, or architecture, of the library. There are two types of relationships that we, as developers using the library, need to know about: relationships between classes and inheritance between classes.

Relationships between classes describe how classes, or more specifically, instances of classes, are related to each other. There are several different conceptual ways in which classes can be related, but basically a relationship between two classes implies that one of the classes uses the other in some way, and often vice versa.

Inheritance between classes shows how the behavior of classes, and their relationships, are shared with other classes. Inheritance is really just a way of sharing common behavior between several different classes.

We'll start our discussion of the key components of OpenLayers by focusing on the first of these – the relationships between classes. We'll start by looking at the Map class – ol.Map.

It's all about the map

Instances of the Map class are at the center of every OpenLayers application. These objects are instances of the ol.Map class, and they use instances of other classes to do their job, which is to put an interactive map onto a web page. Almost every other class in OpenLayers is related to the Map class in some direct or indirect relationship. A diagram of the direct relationships we are most interested in shows the most important relationships between the Map class and the other classes it uses to do its job. It tells us several important things:

A map has 0 or 1 view instances, and it uses the name view to refer to it. A view may be associated with multiple maps, however.

A map may have 0 or more instances of layers managed by a Collection class, and a layer may be associated with 0 or 1 Map class. The Map class has a member variable named layers that it uses to refer to this collection.

A map may have 0 or more instances of overlays managed by a Collection class, and an overlay may be associated with 0 or 1 Map class. The Map class has a member variable named overlays that it uses to refer to this collection.

A map may have 0 or more instances of controls managed by a class called ol.Collection, and controls may be associated with 0 or 1 Map class. The Map class has a member variable named controls that it uses to refer to this collection.

A map may have 0 or more instances of interactions managed by a Collection class, and an interaction may be associated with 0 or 1 Map class. The Map class has a member variable named interactions that it uses to refer to this collection.

Although these are not the only relationships between the Map class and other classes, these are the ones we'll be working with the most.

The View class (ol.View) manages information about the current position of the Map class. If you are familiar with the programming concept of MVC (Model-View-Controller), be aware that the view class is not a View in the MVC sense.
It does not provide the presentation layer for the map; rather, it acts more like a controller (although there is not an exact parallel, because OpenLayers was not designed with MVC in mind).

The Layer class (ol.layer.Base) is the base class for classes that provide data to the map to be rendered.

The Overlay class (ol.Overlay) is an interactive visual element like a control, but it is tied to a specific geographic position.

The Control class (ol.control.Control) is the base class for a group of classes that collectively provide the user with the ability to interact with the Map. Controls have a visible user interface element (such as a button or a form input element) with which the user interacts.

The Interaction class (ol.interaction.Interaction) is the base class for a group of classes that also allow the user to interact with the map, but differ from controls in that they have no visible user interface element. For example, the DragPan interaction allows the user to click on and drag the map to pan around.

Controlling the Map's view

The OpenLayers view class, ol.View, represents a simple two-dimensional view of the world. It is responsible for determining where, and to some degree how, the user is looking at the world. It is responsible for managing the following information:

The geographic center of the map
The resolution of the map, which is to say how much of the map we can see around the center
The rotation of the map

Although you can create a map without a view, it won't display anything until a view is assigned to it. Every map must have a view in order to display any map data at all. However, a view may be shared between multiple instances of the Map class. This effectively synchronizes the center, resolution, and rotation of each of the maps. In this way, you can create two or more maps in different HTML containers on a web page, even showing different information, and have them look at the same world position. Changing the position of any of the maps (for instance, by dragging one) automatically updates the other maps at the same time!

Displaying map content

So, if the view is responsible for managing where the user is looking in the world, which component is responsible for determining what the user sees there? That's the job of layers and overlays.

A layer provides access to a source of geospatial data. There are two basic kinds of layers, that is, raster and vector layers:

In computer graphics, the term raster (raster graphics) refers to a digital image. In OpenLayers, a raster layer is one that displays images in your map at specific geographic locations.

In computer graphics, the term vector (vector graphics) refers to images that are defined in terms of geometric shapes, such as points, lines, and polygons, or mathematical formulae such as Bézier curves. In OpenLayers, a vector layer reads geospatial data from vector data (such as a KML file), and the data can then be drawn onto the map.

Layers are not the only way to display spatial information on the map. The other way is to use an overlay. We can create instances of ol.Overlay and add them to the map at specific locations. The overlay then positions its content (an HTML element) on the map at the specified location. The HTML element can then be used like any other HTML element. The most common use of overlays is to display spatially relevant information in a pop-up dialog in response to the mouse moving over, or clicking on, a geographic feature.
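To make these relationships concrete, here is a minimal sketch of how an OpenLayers 3 page might wire a map to a view, a raster (tile) layer, and an overlay. The element IDs (map and popup) and the coordinates are assumptions made for this illustration, not part of the original text:

// assumes <div id="map"></div> and <div id="popup">Hello</div> exist on the page
var view = new ol.View({
  center: ol.proj.transform([2.35, 48.85], 'EPSG:4326', 'EPSG:3857'),
  zoom: 5
});

var map = new ol.Map({
  target: 'map',                // the map renders into this element
  view: view,                   // 0 or 1 view per map
  layers: [                     // managed internally as an ol.Collection
    new ol.layer.Tile({ source: new ol.source.OSM() })
  ]
});

// an overlay ties an HTML element to a geographic position
var popup = new ol.Overlay({
  element: document.getElementById('popup'),
  position: view.getCenter()
});
map.addOverlay(popup);

// a second map constructed with the same view object would stay
// synchronized with the first one, as described above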
Interacting with the map

As mentioned earlier, the two components that allow users to interact with the map are interactions and controls. Let's look at them in a bit more detail.

Using interactions

Interactions are components that allow the user to interact with the map via some direct input, usually by using the mouse (or a finger with a touch screen). Interactions have no visible user interface. The default set of interactions is as follows:

ol.interaction.DoubleClickZoom: If you double-click the left mouse button, the map will zoom in by a factor of 2.

ol.interaction.DragPan: If you drag the map, it will pan as you move the mouse.

ol.interaction.PinchRotate: On touch-enabled devices, placing two fingers on the device and rotating them in a circular motion will rotate the map.

ol.interaction.PinchZoom: On touch-enabled devices, placing two fingers on the device and pinching them together or spreading them apart will zoom the map out and in, respectively.

ol.interaction.KeyboardPan: You can use the arrow keys to pan the map in the direction of the arrows.

ol.interaction.KeyboardZoom: You can use the + and - keys to zoom in and out.

ol.interaction.MouseWheelZoom: You can use the scroll wheel on a mouse to zoom the map in and out.

ol.interaction.DragZoom: If you hold the Shift key while dragging on the map, a rectangular region will be drawn, and when you release the mouse button, you will zoom into that area.

Controls

Controls are components that allow the user to modify the map state via some visible user interface element, such as a button. In the examples we've seen so far, we've seen zoom buttons in the top-left corner of the map and an attribution control in the bottom-right corner of the map. In fact, the default controls are:

ol.control.Zoom: This displays the zoom buttons in the top-left corner.

ol.control.Rotate: This is a button to reset rotation to 0; by default, it is only displayed when the map's rotation is not 0.

ol.control.Attribution: This displays attribution text for the layers currently visible in the map. By default, the attributions are collapsed to a single icon in the bottom-right corner, and clicking the icon will show the attributions.
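Both sets of defaults can be adjusted when the map is constructed. The following sketch is our own illustration rather than code from the original text; it shows how one might drop the mouse-wheel zoom interaction and add a scale-line control on top of the defaults:

var map = new ol.Map({
  target: 'map',
  view: new ol.View({ center: [0, 0], zoom: 2 }),
  layers: [new ol.layer.Tile({ source: new ol.source.OSM() })],
  // the default interactions, minus mouse-wheel zooming
  interactions: ol.interaction.defaults({ mouseWheelZoom: false }),
  // the default controls, plus a scale line
  controls: ol.control.defaults().extend([new ol.control.ScaleLine()])
});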
This concludes our brief overview of the central components of an OpenLayers application. We saw that the Map class is at the center of everything, and there are some key components—the view, layers, overlays, interactions, and controls—that it uses to accomplish its job of putting an interactive map onto a web page.

At the beginning of this article, we talked about both relationships and inheritance. So far, we've only covered the relationships. In the next section, we'll show the inheritance architecture of the key components and introduce three classes that have been working behind the scenes to make everything work.

OpenLayers' super classes

In this section, we will look at three classes in the OpenLayers library that we won't often work with directly, but which provide an enormous amount of functionality to most of the other classes in the library.

The first two classes, Observable and Object, are at the base of the inheritance tree for OpenLayers—the so-called super classes that most classes inherit from. The third class, Collection, isn't actually a super class but is used as the basis for many relationships between classes in OpenLayers—we've already seen that the Map class relationships with layers, overlays, interactions, and controls are managed by instances of the Collection class.

Before we jump into the details, consider the inheritance diagram for the components we've already discussed. As it shows, the Observable class, ol.Observable, is the base class for every component of OpenLayers that we've seen so far. In fact, there are very few classes in the OpenLayers library that do not inherit from the Observable class or one of its subclasses. Similarly, the Object class, ol.Object, is the base class for many classes in the library and is itself a subclass of Observable.

The Observable and Object classes aren't very glamorous. You can't see them in action, and they don't do anything very exciting from a user's perspective. What they do, though, is provide two common sets of behavior that you can expect to be able to use on almost every object you create or access through the OpenLayers library—event management and Key-Value Observing (KVO).

Event management with the Observable class

An event is basically what it sounds like—something happening. Events are a fundamental part of how various components of OpenLayers—the map, layers, controls, and pretty much everything else—communicate with each other. It is often important to know when something has happened and to react to it.

One type of event that is very useful is a user-generated event, such as a mouse click or a touch on a mobile device's screen. Knowing when the user has clicked and dragged on the Map class allows some code to react to this and move the map to simulate panning it. Other types of events are internal, such as the map being moved or data finishing loading. To continue the previous example, once the map has moved to simulate panning, another event is issued by OpenLayers to say that the map has finished moving, so that other parts of OpenLayers can react by updating the user interface with the center coordinates or by loading more data.

Key-Value Observing with the Object class

OpenLayers' Object class inherits from Observable and implements a software pattern called Key-Value Observing (KVO). With KVO, an object representing some data maintains a list of other objects that wish to observe it. When the data value changes, the observers are notified automatically.

Working with Collections

The last section of this article is about OpenLayers' Collection class, ol.Collection. As mentioned, the Collection class is not a super class like Observable and Object, but it is an integral part of the relationship model. Many classes in OpenLayers make use of the Collection class to manage one-to-many relationships. At its core, the Collection class is a JavaScript array with additional convenience methods. It also inherits directly from the Object class, and so it inherits the functionality of both Observable and Object. This makes the Collection class extremely powerful.

Collection properties

A Collection class, inheriting from the Object class, has one observable property, length. When a collection changes (elements are added or removed), its length property is updated. This means it also emits an event, change:length, when the length property is changed.

Collection events

A Collection class also inherits the functionality of the Observable class (via the Object class) and emits two other events—add and remove. Registered event handler functions for both events will receive a single argument, a CollectionEvent, which has an element property containing the element that was added or removed.
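As a rough sketch of how these three behaviors look in code (again, an illustration added here rather than code from the original text), the following listens for an Observable event on the map, observes a KVO property on the view, and reacts to Collection events on the map's layer collection:

// Observable: react to the map finishing a move
map.on('moveend', function () {
  console.log('map moved; center is now', map.getView().getCenter());
});

// KVO (Object): observe a property change on the view
map.getView().on('change:center', function () {
  console.log('view center changed');
});

// Collection: the map's layers are an ol.Collection that emits add/remove
var layers = map.getLayers();
layers.on('add', function (event) {
  console.log('layer added:', event.element);   // CollectionEvent.element
});
layers.push(new ol.layer.Tile({ source: new ol.source.OSM() }));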
Summary

This wraps up our overview of the key concepts in the OpenLayers library. We took a quick look at the key components of the library from two different aspects—relationships and inheritance. With the Map class as the central object of any OpenLayers application, we looked at its main relationships to other classes, including views, layers, overlays, interactions, and controls. We briefly introduced each of these classes to give an overview of their primary purpose.

We then investigated inheritance related to these objects and reviewed the super classes that provide functionality to most classes in the OpenLayers library—the Observable and Object classes. The Observable class provides a basic event mechanism, and the Object class adds observable properties with a powerful binding feature. Lastly, we looked at the Collection class. Although it isn't part of the inheritance structure, it is crucial to how one-to-many relationships work throughout the library (including the Map class relationships with layers, overlays, interactions, and controls).

Resources for Article:

Further resources on this subject:
OGC for ESRI Professionals [Article]
Improving proximity filtering with KNN [Article]
OpenLayers: Overview of Vector Layer [Article]

Introducing Salt

Packt
04 Feb 2015
11 min read
In this article by Colton Myers, author of the book Learning SaltStack, we will learn the basic architecture of a Salt deployment.

The two main pieces of Salt are the Salt master and the Salt minion. The master is the central hub. All minions connect to the master to receive instructions. From the master, you can run commands and apply configuration across hundreds or thousands of minions in seconds.

The minion, as mentioned before, connects to the master and treats the master as the source of all truth. Although minions can exist without a master, the full power of Salt is realized when you have minions and the master working together.

Salt is built on two major concepts: remote execution and configuration management. In the remote execution system, Salt leverages Python to accomplish complex tasks with single-function calls. The configuration management system in Salt, called States, builds upon the remote execution foundation to create repeatable, enforceable configuration for the minions.

With this bird's-eye view in mind, let's get Salt installed so that we can start learning how to use it to make managing our infrastructure easier!

(For more resources related to this topic, see here.)

Installing Salt

The dependencies for running Salt at the time of writing are as follows:

Python 2—version 2.6 or greater (not Python 3-compatible)
msgpack-python
YAML
Jinja2
MarkupSafe
Apache Libcloud
Requests
ZeroMQ—version 3.2.0 or greater
PyZMQ—version 2.2.0 or greater
PyCrypto
M2Crypto

The easiest way to ensure that the dependencies for Salt are met is to use system-specific package management systems, such as apt on Ubuntu systems, which will handle the dependency resolution automatically.

You can also use a script called Salt-Bootstrap to handle all of the system-specific commands for you. Salt-Bootstrap is an open source project with the goal of creating a Bourne shell-compatible script that will install Salt on any compatible server. The project is managed and hosted by the SaltStack team. You can find more information at https://github.com/saltstack/salt-bootstrap.

We will explore each of these methods of installation in turn.

Installation with system packages (Ubuntu)

The latest release of Salt for Ubuntu is provided in a Personal Package Archive (PPA), which is a type of package repository for Ubuntu. The easiest way to access the PPA to install Salt is using the add-apt-repository command, as follows:

# sudo add-apt-repository ppa:saltstack/salt

If the add-apt-repository command is not found, you can add it by installing the python-software-properties package:

sudo apt-get install python-software-properties

If you are using Ubuntu version 12.10 or greater, this step should not be required, as the add-apt-repository command is included in the base system.

After you have added the repository, you must update the package management database, as follows:

# sudo apt-get update

If the system asks whether you should accept a GPG key, press Enter to accept.

You should then be able to install the Salt master and the Salt minion with the following command:

# sudo apt-get install salt-master salt-minion

Assuming there are no errors after running this command, you should be done! Salt is now installed on your machine.

Note that we installed both the Salt master and the Salt minion. The term master refers to the central server—the server from which we will be controlling all of our other servers. The term minion refers to the servers connected to and controlled by a master.
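If you want to confirm that the packages landed before moving on, a quick check such as the following works on Ubuntu (a sketch; the package and service names assume the PPA packages installed above):

# list the installed Salt packages and their versions
dpkg -l salt-master salt-minion

# check that the services are present (the packages normally start them on install)
service salt-master status
service salt-minion status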
Installing with Salt-Bootstrap

Information about manual installation on other major Linux distributions can be found online at http://docs.saltstack.com. However, in most cases, it is easier and more straightforward to use a tool called Salt-Bootstrap. In-depth documentation can be found on the project page at https://github.com/saltstack/salt-bootstrap—however, the tool is actually quite easy to use, as follows:

# curl -L https://bootstrap.saltstack.com -o install_salt.sh
# sudo sh install_salt.sh -h

We won't include the help text for Bootstrap here, as it would take up too much space. However, it should be noted that, by default, Bootstrap will install only the Salt minion. We want both the Salt minion and the Salt master, which can be accomplished by passing in the -M flag, as follows:

# sudo sh install_salt.sh -M

The preceding command will result in a fully-functional installation of Salt on your machine! The supported operating system list is extensive, as follows:

Amazon Linux AMI 2012.09
Arch Linux
CentOS 5/6
Debian 6.x/7.x/8 (git installations only)
Fedora 17/18
FreeBSD 9.1/9.2/10
Gentoo Linux
Linaro
Linux Mint 13/14
OpenSUSE 12.x
Oracle Linux 5/6
RHEL 5/6
Scientific Linux 5/6
SmartOS
SuSE 11 SP1 and 11 SP2
Ubuntu 10.x/11.x/12.x/13.x/14.x

The version of Salt used for the examples in this book is the 2014.7 release. Here is the full version information:

# sudo salt --versions-report
           Salt: 2014.7.0
         Python: 2.7.6
         Jinja2: 2.7.2
       M2Crypto: 0.21.1
 msgpack-python: 0.3.0
   msgpack-pure: Not Installed
       pycrypto: 2.6.1
        libnacl: Not Installed
         PyYAML: 3.10
          ioflo: Not Installed
          PyZMQ: 14.0.1
           RAET: Not Installed
            ZMQ: 4.0.4
           Mako: 0.9.1

It's probable that the version of Salt you installed is a newer release and might have slightly different output. However, the examples should all still work in the latest version of Salt.

Configuring Salt

Now that we have the master and the minion installed on our machine, we must do a couple of pieces of configuration in order to allow them to talk to each other.

Firewall configuration

Since Salt minions connect to masters, the only firewall configuration that must be done is on the master. By default, ports 4505 and 4506 must be able to accept incoming connections on the master. The default install of Ubuntu 14.04, used for these examples, actually requires no firewall configuration out of the box to be able to run Salt; the required ports are already open. However, many distributions of Linux come with much more restrictive default firewall settings. The most common firewall software in use by default is iptables. Note that you might also have to change firewall settings on your network hardware if there is network filtering in place outside the software on the machine you're working on.

Firewall configuration is a topic that deserves its own book. However, our needs for the configuration of Salt are fairly simple. First, you must find the set of rules currently in effect for your system. This varies from system to system; for example, the file is located at /etc/sysconfig/iptables on RedHat distributions, while it is located at /etc/iptables/iptables.rules in Arch Linux.
Once you find that file, add the following lines to it, but be sure to do so above the line that says DROP:

-A INPUT -m state --state new -m tcp -p tcp --dport 4505 -j ACCEPT
-A INPUT -m state --state new -m tcp -p tcp --dport 4506 -j ACCEPT

For more information about configuring the firewall on your operating system of choice so that your Salt minion can connect successfully to your Salt master, see the Salt documentation at http://docs.saltstack.com/en/latest/topics/tutorials/firewall.html.

In version 2014.7.0, a new experimental transport option was introduced in Salt, called RAET. The use of this transport system is beyond the scope of this book. This book deals exclusively with the default, ZeroMQ-based transport in Salt.

Salt minion configuration

Out of the box, the Salt minion is configured to connect to a master at the location salt. The reason for this default is that, if DNS is configured correctly such that salt resolves to the master's IP address, no further configuration is needed. The minion will connect successfully to the master. However, in our example, we do not have any DNS configuration in place, so we must configure this ourselves.

The minion and master configuration files are located in the /etc/salt/ directory. The /etc/salt/ directory should be created as part of the installation of Salt, assuming you followed the preceding directions. If it does not exist for some reason, please create the directory, and create two files, minion and master, within it.

Open /etc/salt/minion with your text editor of choice (remember to use sudo!). We will be making a couple of changes to this file. First, find the commented-out line for the configuration option master. It should look like this:

#master: salt

Uncomment that line and change salt to localhost (as we have this minion connected to the local master). It should look like this:

master: localhost

If you cannot find the appropriate line in the file, just add the line shown previously to the top of the file.

You should also manually configure the minion ID so that you can more easily follow along with the examples in this text. Find the ID line:

#id:

Uncomment it and set it to myminion:

id: myminion

Again, if you cannot find the appropriate line in the file, just add the line shown previously to the top of the file. Save and close the file.

Without a manually-specified minion ID, the minion will try to intelligently guess what its minion ID should be at startup. For most systems, this means the minion ID will be set to the fully-qualified domain name (FQDN) of the system.

Starting the Salt master and Salt minion

Now we need to start (or restart) our Salt master and Salt minion. Assuming you're following along on Ubuntu (which I recommend), you can use the following commands:

# sudo service salt-minion restart
# sudo service salt-master restart

Packages in other supported distributions ship with init scripts for Salt. Use whichever service system is available to you to start or restart the Salt minion and Salt master.
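For example, on distributions that have switched to systemd, the equivalent commands would typically look like the following. This is a sketch based on the standard unit names shipped with the Salt packages:

# restart the services
sudo systemctl restart salt-minion
sudo systemctl restart salt-master

# make sure they come back up after a reboot
sudo systemctl enable salt-minion
sudo systemctl enable salt-master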
Accepting the minion key on the master

There is one last step remaining before we can run our first Salt commands: we must tell the master that it can trust the minion. To help us with this, Salt comes with the salt-key command for managing minion keys:

# sudo salt-key
Accepted Keys:
Unaccepted Keys:
myminion
Rejected Keys:

Notice that our minion, myminion, is listed in the Unaccepted Keys section. This means that the minion has contacted the master, the master has cached that minion's public key, and it is waiting for further instructions as to whether to accept the minion or not.

If your minion is not showing up in the output of salt-key, it's possible that the minion cannot reach the master on ports 4505 and 4506. Please refer to the Firewall configuration section described previously for more information. Troubleshooting information can also be found in the Salt documentation at http://docs.saltstack.com/en/latest/topics/troubleshooting/.

We can inspect the key's fingerprint to ensure that it matches our minion's key, as follows:

# sudo salt-key -f myminion
Unaccepted Keys:
myminion:  a8:1f:b0:c2:ab:9d:27:13:60:c9:81:b1:11:a3:68:e1

We can use the salt-call command to run a command on the minion to obtain the minion's key fingerprint, as follows:

# sudo salt-call --local key.finger
local:
    a8:1f:b0:c2:ab:9d:27:13:60:c9:81:b1:11:a3:68:e1

Since the fingerprints match, we can accept the key on the master, as follows:

# sudo salt-key -a myminion
The following keys are going to be accepted:
Unaccepted Keys:
myminion
Proceed? [n/Y] Y
Key for minion myminion accepted.

We can check that the minion key was accepted, as follows:

# sudo salt-key
Accepted Keys:
myminion
Unaccepted Keys:
Rejected Keys:

Success! We are ready to run our first Salt command!
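As a preview of what comes next (this example is our own addition, not part of the original article), the classic first command is test.ping, which asks every accepted minion to respond and confirms that the master, the minion, and the accepted key are all working together:

# sudo salt '*' test.ping
myminion:
    True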

Security and Interoperability

Packt
03 Feb 2015
28 min read
In this article by Peter Waher, author of the book Learning Internet of Things, we will focus on security, interoperability, and the issues we need to address during the design of the overall architecture of the Internet of Things (IoT) to avoid many of the unnecessary problems that might otherwise arise and to minimize the risk of painting yourself into a corner. You will learn the following: Risks with IoT Modes of attacking a system and some countermeasures The importance of interoperability in IoT (For more resources related to this topic, see here.) Understanding the risks There are many solutions and products marketed today under the label IoT that lack basic security architectures. It is very easy for a knowledgeable person to take control of devices for malicious purposes. Not only are devices at home at risk, but cars, trains, airports, stores, ships, logistics applications, building automation, utility metering applications, industrial automation applications, health services, and so on, are also at risk because of the lack of security measures in their underlying architecture. It has gone so far that many western countries have identified the lack of security measures in automation applications as a risk to national security, and rightly so. It is just a matter of time before somebody is literally killed as a result of an attack by a hacker on some vulnerable equipment connected to the Internet. And what are the economic consequences for a company that rolls out a product for use on the Internet that turns out to be vulnerable to well-known attacks? How has it come to this? After all the trouble Internet companies and applications have experienced during the rollout of the first two generations of the Web, are we repeating the same mistakes with IoT? Reinventing the wheel, but an inverted one One reason for what we discussed in the previous section might be the dissonance between management and engineers. While management knows how to manage known risks, they don't know how to measure them in the field of IoT and computer communication. This makes them incapable of understanding the consequences of architectural decisions made by their engineers. The engineers, in turn, might not be interested in focusing on risks, but on functionality, which is the fun part. Another reason might be that the generation of engineers who tackle IoT is not the same type of engineers as those who tackled application development on the Internet. Electronics engineers now resolve many problems already solved by computer science engineers decades earlier. Engineers working on machine-to-machine (M2M) communication paradigms, such as industrial automation, might have considered the problem solved when they discovered that machines could talk to each other over the Internet, that is, when the message-exchanging problem was solved. This is simply relabeling their previous M2M solutions as IoT solutions because the transport now occurs over the IP protocol. But, in the realm of the Internet, this is when the problems start. Transport is just one of the many problems that need to be solved. The third reason is that when engineers actually reuse solutions and previous experience, those solutions don't really fit well in many cases. The old communication patterns designed for web applications on the Internet are not applicable to IoT. So, even if the wheel in many cases is reinvented, it's not the same wheel.
In previous paradigms, publishers are a relatively few number of centralized high-value entities that reside on the Internet. On the other hand, consumers are many but distributed low-value entities, safely situated behind firewalls and well protected by antivirus software and operating systems that automatically update themselves. But in IoT, it might be the other way around: publishers (sensors) are distributed, very low-value entities that reside behind firewalls, and consumers (server applications) might be high-value centralized entities, residing on the Internet. It can also be the case that both the consumer and publisher are distributed, low-value entities who reside behind the same or different firewalls. They are not protected by antivirus software, and they do not autoupdate themselves regularly as new threats are discovered and countermeasures added. These firewalls might be installed and then expected to work for 10 years with no modification or update being made. The architectural solutions and security patterns developed for web applications do not solve these cases well. Knowing your neighbor When you decide to move into a new neighborhood, it might be a good idea to know your neighbors first. It's the same when you move a M2M application to IoT. As soon as you connect the cable, you have billions of neighbors around the world, all with access to your device. What kind of neighbors are they? Even though there are a lot of nice and ignorant neighbors on the Internet, you also have a lot of criminals, con artists, perverts, hackers, trolls, drug dealers, drug addicts, rapists, pedophiles, burglars, politicians, corrupt police, curious government agencies, murderers, demented people, agents from hostile countries, disgruntled ex-employees, adolescents with a strange sense of humor, and so on. Would you like such people to have access to your things or access to the things that belong to your children? If the answer is no (as it should be), then you must take security into account from the start of any development project you do, aimed at IoT. Remember that the Internet is the foulest cesspit there is on this planet. When you move from the M2M way of thinking to IoT, you move from a nice and security gated community to the roughest neighborhood in the world. Would you go unprotected or unprepared into such an area? IoT is not the same as M2M communication in a secure and controlled network. For an application to work, it needs to work for some time, not just in the laboratory or just after installation, hoping that nobody finds out about the system. It is not sufficient to just get machines to talk with each other over the Internet. Modes of attack To write an exhaustive list of different modes of attack that you can expect would require a book by itself. Instead, just a brief introduction to some of the most common forms of attack is provided here. It is important to have these methods in mind when designing the communication architecture to use for IoT applications. Denial of Service A Denial of Service (DoS) or Distributed Denial of Service (DDoS) attack is normally used to make a service on the Internet crash or become unresponsive, and in some cases, behave in a way that it can be exploited. The attack consists in making repetitive requests to a server until its resources gets exhausted. In a distributed version, the requests are made by many clients at the same time, which obviously increases the load on the target. It is often used for blackmailing or political purposes. 
However, as the attack gets more effective and difficult to defend against when the attack is distributed and the target centralized, the attack gets less effective if the solution itself is distributed. To guard against this form of attack, you need to build decentralized solutions where possible. In decentralized solutions, each target's worth is less, making it less interesting to attack. Guessing the credentials One way to get access to a system is to impersonate a client in the system by trying to guess the client's credentials. To make this type of attack less effective, make sure each client and each device has a long and unique, perhaps randomly generated, set of credentials. Never use preset user credentials that are the same for many clients or devices or factory default credentials that are easy to reset. Furthermore, set a limit to the number of authentication attempts per time unit permitted by the system; also, log an event whenever this limit is reached, from where to which credentials were used. This makes it possible for operators to detect systematic attempts to enter the system. Getting access to stored credentials One common way to illicitly enter a system is when user credentials are found somewhere else and reused. Often, people reuse credentials in different systems. There are various ways to avoid this risk from happening. One is to make sure that credentials are not reused in different devices or across different services and applications. Another is to randomize credentials, lessening the desire to reuse memorized credentials. A third way is to never store actual credentials centrally, even encrypted if possible, and instead store hashed values of these credentials. This is often possible since authentication methods use hash values of credentials in their computations. Furthermore, these hashes should be unique to the current installation. Even though some hashing functions are vulnerable in such a way that a new string can be found that generates the same hash value, the probability that this string is equal to the original credentials is miniscule. And if the hash is computed uniquely for each installation, the probability that this string can be reused somewhere else is even more remote. Man in the middle Another way to gain access to a system is to try and impersonate a server component in a system instead of a client. This is often referred to as a Man in the middle (MITM) attack. The reason for the middle part is that the attacker often does not know how to act in the server and simply forwards the messages between the real client and the server. In this process, the attacker gains access to confidential information within the messages, such as client credentials, even if the communication is encrypted. The attacker might even try to modify messages for their own purposes. To avoid this type of attack, it's important for all clients (not just a few) to always validate the identity of the server it connects to. If it is a high-value entity, it is often identified using a certificate. This certificate can both be used to verify the domain of the server and encrypt the communication. Make sure this validation is performed correctly, and do not accept a connection that is invalid or where the certificate has been revoked, is self-signed, or has expired. Another thing to remember is to never use an unsecure authentication method when the client authenticates itself with the server. 
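To make the server-validation advice above a little more concrete, here is a minimal sketch (not code from the book, and written in Go purely as an example language; the host name example.com is hypothetical). It relies on the platform's standard certificate verification and simply refuses to continue if verification fails; weakening verification, for instance by skipping certificate checks, reopens the door to MITM attacks.

package main

import (
    "crypto/tls"
    "log"
)

func main() {
    // Verification of the server certificate chain, expiry, and host name
    // is on by default; never set InsecureSkipVerify to true in production.
    conf := &tls.Config{ServerName: "example.com"}
    conn, err := tls.Dial("tcp", "example.com:443", conf)
    if err != nil {
        // An expired, self-signed, or mismatched certificate surfaces here
        // as an error; abort instead of continuing with the connection.
        log.Fatal("refusing connection: ", err)
    }
    defer conn.Close()
    log.Println("TLS handshake and certificate verification succeeded")
}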
If a server has been compromised, it might try to fool clients into using a less secure authentication method when they connect. By doing so, they can extract the client credentials and reuse them somewhere else. By using a secure authentication method, the server, even if compromised, will not be able to replay the authentication again or use it somewhere else. The communication is valid only once. Sniffing network communication If communication is not encrypted, everybody with access to the communication stream can read the messages using simple sniffing applications, such as Wireshark. If the communication is point-to-point, this means the communication can be heard by any application on the sending machine, the receiving machine, or any of the bridges or routers in between. If a simple hub is used instead of a switch somewhere, everybody on that network will also be able to eavesdrop. If the communication is performed using multicast messaging service, as can be done in UPnP and CoAP, anybody within the range of the Time to live (TTL) parameter (maximum number of router hops) can eavesdrop. Remember to always use encryption if sensitive data is communicated. If data is private, encryption should still be used, even if the data might not be sensitive at first glance. A burglar can know if you're at home by simply monitoring temperature sensors, water flow meters, electricity meters, or light switches at your home. Small variations in temperature alert to the presence of human beings. Change in the consumption of electrical energy shows whether somebody is cooking food or watching television. The flow of water shows whether somebody is drinking water, flushing a toilet, or taking a shower. No flow of water or a relatively regular consumption of electrical energy tells the burglar that nobody is at home. Light switches can also be used to detect presence, even though there are applications today that simulate somebody being home by switching the lights on and off. If you haven't done so already, make sure to download a sniffer to get a feel of what you can and cannot see by sniffing the network traffic. Wireshark can be downloaded from https://www.wireshark.org/download.html. Port scanning and web crawling Port scanning is a method where you systematically test a range of ports across a range of IP addresses to see which ports are open and serviced by applications. This method can be combined with different tests to see the applications that might be behind these ports. If HTTP servers are found, standard page names and web-crawling techniques can be used to try to figure out which web resources lie behind each HTTP server. CoAP is even simpler since devices often publish well-known resources. Using such simple brute-force methods, it is relatively easy to find (and later exploit) anything available on the Internet that is not secured. To avoid any private resources being published unknowingly, make sure to close all the incoming ports in any firewalls you use. Don't use protocols that require incoming connections. Instead, use protocols that create the connections from inside the firewall. Any resources published on the Internet should be authenticated so that any automatic attempt to get access to them fails. Always remember that information that might seem trivial to an individual might be very interesting if collected en masse. 
This information might be coveted not only by teenage pranksters but by public relations and marketing agencies, burglars, and government agencies (some would say this is a repetition). Search features and wildcards Don't make the mistake of thinking it's difficult to find the identities of devices published on the Internet. Often, it's the reverse. For devices that use multicast communication, such as those using UPnP and CoAP, anybody can listen in and see who sends the messages. For devices that use single-cast communication, such as those using HTTP or CoAP, port-scanning techniques can be used. For devices that are protected by firewalls and use message brokers to protect against incoming attacks, such as those that use XMPP and MQTT, search features or wildcards can be used to find the identities of devices managed by the broker, and in the case of MQTT, even what they communicate. You should always assume that the identity of all devices can be found, and that there's an interest in exploiting the device. For this reason, it's very important that each device authenticates any requests made to it if possible. Some protocols help you more with this than others, while others make such authentication impossible. XMPP only permits messages from accepted friends. The only thing the device needs to worry about is which friend requests to accept. This can be either configured by somebody else with access to the account or by using a provisioning server if the device cannot make such decisions by itself. The device does not need to worry about client authentication, as this is done by the brokers themselves, and the XMPP brokers always propagate the authenticated identities of everybody who send them messages. MQTT, on the other hand, resides in the other side of the spectrum. Here, devices cannot make any decision about who sees the published data or who makes a request since identities are stripped away by the protocol. The only way to control who gets access to the data is by building a proprietary end-to-end encryption layer on top of the MQTT protocol, thereby limiting interoperability. In between the two resides protocols such as HTTP and CoAP that support some level of local client authentication but lacks a good distributed identity and authentication mechanism. This is vital for IoT even though this problem can be partially solved in local intranets. Breaking ciphers Many believe that by using encryption, data is secure. This is not the case, as discussed previously, since the encryption is often only done between connected parties and not between end users of data (the so-called end-to-end encryption). At most, such encryption safeguards from eavesdropping to some extent. But even such encryption can be broken, partially or wholly, with some effort. Ciphers can be broken using known vulnerabilities in code where attackers exploit program implementations rather than the underlying algorithm of the cipher. This has been the method used in the latest spectacular breaches in code based on the OpenSSL library. To protect yourselves from such attacks, you need to be able to update code in devices remotely, which is not always possible. Other methods use irregularities in how the cipher works to figure out, partly or wholly, what is being communicated over the encrypted channel. This sometimes requires a considerable amount of effort. 
To safeguard against such attacks, it's important to realize that an attacker does not spend more effort into an attack than what is expected to be gained by the attack. By storing massive amounts of sensitive data centrally or controlling massive amounts of devices from one point, you increase the value of the target, increasing the interest of attacking it. On the other hand, by decentralizing storage and control logic, the interest in attacking a single target decreases since the value of each entity is comparatively lower. Decentralized architecture is an important tool to both mitigate the effects of attacks and decrease the interest in attacking a target. However, by increasing the number of participants, the number of actual attacks can increase, but the effort that can be invested behind each attack when there are many targets also decreases, making it easier to defend each one of the attacks using standard techniques. Tools for achieving security There are a number of tools that architects and developers can use to protect against malicious use of the system. An exhaustive discussion would fill a smaller library. Here, we will mention just a few techniques and how they not only affect security but also interoperability. Virtual Private Networks A method that is often used to protect unsecured solutions on the Internet is to protect them using Virtual Private Networks (VPNs). Often, traditional M2M solutions working well in local intranets need to expand across the Internet. One way to achieve this is to create such VPNs that allow the devices to believe they are in a local intranet, even though communication is transported across the Internet. Even though transport is done over the Internet, it's difficult to see this as a true IoT application. It's rather a M2M solution using the Internet as the mode of transport. Because telephone operators use the Internet to transport long distance calls, it doesn't make it Voice over IP (VoIP). Using VPNs might protect the solution, but it completely eliminates the possibility to interoperate with others on the Internet, something that is seen as the biggest advantage of using the IoT technology. X.509 certificates and encryption We've mentioned the use of certificates to validate the identity of high-value entities on the Internet. Certificates allow you to validate not only the identity, but also to check whether the certificate has been revoked or any of the issuers of the certificate have had their certificates revoked, which might be the case if a certificate has been compromised. Certificates also provide a Public Key Infrastructure (PKI) architecture that handles encryption. Each certificate has a public and private part. The public part of the certificate can be freely distributed and is used to encrypt data, whereas only the holder of the private part of the certificate can decrypt the data. Using certificates incurs a cost in the production or installation of a device or item. They also have a limited life span, so they need to be given either a long lifespan or updated remotely during the life span of the device. Certificates also require a scalable infrastructure for validating them. For these reasons, it's difficult to see that certificates will be used by other than high-value entities that are easy to administer in a network. 
It's difficult to see a cost-effective, yet secure and meaningful, implementation of validating certificates in low-value devices such as lamps, temperature sensors, and so on, even though it's theoretically possible to do so. Authentication of identities Authentication is the process of validating whether the identity provided is actually correct or not. Authenticating a server might be as simple as validating a domain certificate provided by the server, making sure it has not been revoked and that it corresponds to the domain name used to connect to the server. Authenticating a client might be more involved, as it has to authenticate the credentials provided by the client. Normally, this can be done in many different ways. It is vital for developers and architects to understand the available authentication methods and how they work to be able to assess the level of security used by the systems they develop. Some protocols, such as HTTP and XMPP, use the standardized Simple Authentication and Security Layer (SASL) to publish an extensible set of authentication methods that the client can choose from. This is good since it allows for new authentication methods to be added. But it also provides a weakness: clients can be tricked into choosing an unsecure authentication mechanism, thus unwittingly revealing their user credentials to an impostor. Make sure clients do not use unsecured or obsolete methods, such as PLAIN, BASIC, MD5-CRAM, MD5-DIGEST, and so on, even if they are the only options available. Instead, use secure methods such as SCRAM-SHA-1 or SCRAM-SHA-1-PLUS, or if client certificates are used, EXTERNAL or no method at all. If you're using an unsecured method anyway, make sure to log it to the event log as a warning, making it possible to detect impostors or at least warn operators that unsecure methods are being used. Other protocols do not use secure authentication at all. MQTT, for instance, sends user credentials in clear text (corresponding to PLAIN), making it a requirement to use encryption to hide user credentials from eavesdroppers or client-side certificates or pre-shared keys for authentication. Other protocols do not have a standardized way of performing authentication. In CoAP, for instance, such authentication is built on top of the protocol as security options. The lack of such options in the standard affects interoperability negatively. Usernames and passwords A common method to provide user credentials during authentication is by providing a simple username and password to the server. This is a very human concept. Some solutions use the concept of a pre-shared key (PSK) instead, as it is more applicable to machines, conceptually at least. If you're using usernames and passwords, do not reuse them between devices, just because it is simple. One way to generate secure, difficult-to-guess usernames and passwords is to randomly create them. In this way, they correspond more to pre-shared keys. One problem in using randomly created user credentials is how to administer them. Both the server and the client need to be aware of this information. The identity must also be distributed among the entities that are to communicate with the device. Here, the device creates its own random identity and creates the corresponding account in the XMPP server in a secure manner. There is no need for a common factory default setting. It then reports its identity to a thing registry or provisioning server where the owner can claim it and learn the newly created identity. 
This method never compromises the credentials and does not affect the cost of production negatively. Furthermore, passwords should never be stored in clear text if it can be avoided. This is especially important on servers where many passwords are stored. Instead, hashes of the passwords should be stored. Most modern authentication algorithms support the use of password hashes. Storing hashes minimizes the risk of unwanted generation of original passwords for attempted reuse in other systems. Using message brokers and provisioning servers Using message brokers can greatly enhance security in an IoT application and lower the complexity of implementation when it comes to authentication, as long as message brokers provide authenticated identity information in messages it forwards. In XMPP, all the federated XMPP servers authenticate clients connected to them as well as the federated servers themselves when they intercommunicate to transport messages between domains. This relieves clients from the burden of having to authenticate each entity in trying to communicate with it since they all have been securely authenticated. It's sufficient to manage security on an identity level. Even this step can be relieved further by the use of provisioning. Unfortunately, not all protocols using message brokers provide this added security since they do not provide information about the sender of packets. MQTT is an example of such a protocol. Centralization versus decentralization Comparing centralized and decentralized architectures is like comparing the process of putting all the eggs in the same basket and distributing them in many much smaller baskets. The effect of a breach of security is much smaller in the decentralized case; fewer eggs get smashed when you trip over. Even though there are more baskets, which might increase the risk of an attack, the expected gain of an attack is much smaller. This limits the motivation of performing a costly attack, which in turn makes it simpler to protect it against. When designing IoT architecture, try to consider the following points: Avoid storing data in a central position if possible. Only store the data centrally that is actually needed to bind things together. Distribute logic, data, and workload. Perform work as far out in the network as possible. This makes the solution more scalable, and it utilizes existing resources better. Use linked data to spread data across the Internet, and use standardized grid computation technologies to assemble distributed data (for example, SPARQL) to avoid the need to store and replicate data centrally. Use a federated set of small local brokers instead of trying to get all the devices on the same broker. Not all brokered protocols support federation, for example, XMPP supports it but MQTT does not. Let devices talk directly to each other instead of having a centralized proprietary API to store data or interpret communication between the two. Contemplate the use of cheap small and energy-efficient microcomputers such as the Raspberry Pi in local installations as an alternative to centralized operation and management from a datacenter. The need for interoperability What has made the Internet great is not a series of isolated services, but the ability to coexist, interchange data, and interact with the users. This is important to keep in mind when developing for IoT. Avoid the mistakes made by many operators who failed during the first Internet bubble. You cannot take responsibility for everything in a service. 
The new Internet economy is based on the interaction and cooperation between services and its users. Solves complexity The same must be true with the new IoT. Those companies that believe they can control the entire value chain, from things to services, middleware, administration, operation, apps, and so on, will fail, as the companies in the first Internet bubble failed. Companies that built devices with proprietary protocols, middleware, and mobile phone applications, where you can control your things, will fail. Why? Imagine a future where you have a thousand different things in your apartment from a hundred manufacturers. Would you want to download a hundred smart phone apps to control them? Would you like five different applications just to control your lights at home, just because you have light bulbs from five different manufacturers? An alternative would be to have one app to rule them all. There might be a hundred different such apps available (or more), but you can choose which one to use based on your taste and user feedback. And you can change if you want to. But for this to be possible, things need to be interoperable, meaning they should communicate using a commonly understood language. Reduces cost Interoperability does not only affect simplicity of installation and management, but also the price of solutions. Consider a factory that uses thousands (or hundreds of thousands) of devices to control and automate all processes within. Would you like to be able to buy things cheaply or expensively? Companies that promote proprietary solutions, where you're forced to use their system to control your devices, can force their clients to pay a high price for future devices and maintenance, or the large investment made originally might be lost. Will such a solution be able to survive against competitors who sell interoperable solutions where you can buy devices from multiple manufacturers? Interoperability provides competition, and competition drives down cost and increases functionality and quality. This might be a reason for a company to work against interoperability, as it threatens its current business model. But the alternative might be worse. A competitor, possibly a new one, might provide such a solution, and when that happens, the business model with proprietary solutions is dead anyway. The companies that are quickest in adapting a new paradigm are the ones who would most probably survive a paradigm shift, as the shift from M2M to IoT undoubtedly is. Allows new kinds of services and reuse of devices There are many things you cannot do unless you have an interoperable communication model from the start. Consider a future smart city. Here, new applications and services will be built that will reuse existing devices, which were installed perhaps as part of other systems and services. These applications will deliver new value to the inhabitants of the city without the need of installing new duplicate devices for each service being built. But such multiple use of devices is only possible if the devices communicate in an open and interoperable way. However, care has to be taken at the same time since installing devices in an open environment requires the communication infrastructure to be secure as well. To achieve the goal of building smart cities, it is vitally important to use technologies that allow you to have both a secure communication infrastructure and an interoperable one. 
Combining security and interoperability As we have seen, there are times where security is contradictory to interoperability. If security is meant to be taken as exclusivity, it opposes the idea of interoperability, which is by its very nature inclusive. Depending on the choice of communication infrastructure, you might have to use security measures that directly oppose the idea of an interoperable infrastructure, prohibiting third parties from accessing existing devices in a secure fashion. It is important during the architecture design phase, before implementation, to thoroughly investigate what communication technologies are available, and what they provide and what they do not provide. You might think that this is a minor issue, thinking that you can easily build what is missing on top of the chosen infrastructure. This is not true. All such implementation is by its very nature proprietary, and therefore not interoperable. This might drastically limit your options in the future, which in turn might drastically reduce anyone else's willingness to use your solution. The more a technology includes, in the form of global identity, authentication, authorization, different communication patterns, common language for interchange of sensor data, control operations and access privileges, provisioning, and so on, the more interoperable the solution becomes. If the technology at the same time provides a secure infrastructure, you have the possibility to create a solution that is both secure and interoperable without the need to build proprietary or exclusive solutions on top of it. Summary In this article, we presented the basic reasons why security and interoperability must be contemplated early on in the project and not added as late patchwork because it was shown to be necessary. Not only does such late addition limit interoperability and future use of the solution, it also creates solutions that can jeopardize not only yourself your company and your customers, but in the end, even national security. This article also presented some basic modes of attack and some basic defense systems to counter them. Resources for Article: Further resources on this subject: Rich Internet Application (RIA) – Canvas [article] ExtGWT Rich Internet Application: Crafting UI Real Estate [article] Sending Data to Google Docs [article]

Performing Task with Gulp

Packt
03 Feb 2015
8 min read
In this article by Travis Maynard, author of Getting Started with Gulp, we will create a task that will process CSS files. For CSS, we will combine all of the files into a single file and then preprocess it to enable additional features in our code. (For more resources related to this topic, see here.) Using gulp plugins Without plugins, gulp is simply a means of connecting and organizing small bits of functionality. The plugins we are going to install will add the functionality we need to properly modify and optimize our code. Like gulp, all of the gulp plugins we will be using are installed via npm. It is important to note that the gulp team cares deeply about their plugin ecosystem and spends a lot of time making sure they eliminate plugins that duplicate the functionality that has already been created. To enforce these plugin standards, they have implemented a blacklisting system that only shows the approved plugins. You can search for the approved plugins and modules by visiting http://gulpjs.com/plugins. It is important to note that if you search for gulp plugins in the npm registry, you will be shown all the plugins, including the blacklisted ones. So, just to be safe, stick to the official plugin search results to weed out any plugins that might lead you down a wrong path. Additionally, you can run gulp with the --verify flag to make it check whether any of your currently installed plugins and modules are blacklisted. In the following tasks, I will provide you with instructions on how to install gulp plugins as required. The code will look something like this: npm install gulp-plugin1 gulp-plugin2 gulp-plugin3 --save-dev This is simply a shorthand to save you time. You could just as easily run each of these commands separately, but it would only take more time: npm install gulp-plugin1 --save-dev npm install gulp-plugin2 --save-dev npm install gulp-plugin3 --save-dev Remember, on Mac and Linux systems, you may need to add in the additional sudo keyword to the beginning of your commands if you are in a protected area of your file system. Otherwise, you will receive permission errors and none of the modules will be installed. The styles task The first task we are going to add to our gulpfile will be our styles task. Our goal with this task is to combine all of our CSS files into a single file and then run those styles through a preprocessor such as Sass, Less, or Myth. In this example, we will use Myth, but you can simply substitute any other preprocessor that you would prefer to use. Installing gulp plugins For this task, we will be using two plugins: gulp-concat and gulp-myth. As mentioned in the preceding section, we will install both of these tasks at the same time using the shortcut syntax. In addition to these plugins, we need to install gulp as well since this is the first task that we will be writing. For the remaining tasks, it won't be necessary to install gulp again, as it will already be installed locally in our project. The command for installing gulp plugin is as follows: npm install gulp gulp-concat gulp-myth --save-dev The following two screenshots show the installation of the gulp plugin: While running these commands, make sure that you're in the root directory of your project. If you're following the naming conventions used throughout this book, then the folder should be named gulp-book. Including gulp plugins Once complete, you will need to include references to those plugins at the beginning of your gulpfile. 
To do this, simply open gulpfile.js and add the following lines to it: var gulp = require('gulp');var concat = require('gulp-concat');var myth = require('gulp-myth'); You can now match your gulpfile with the following screenshot: Writing the styles task With these references added, we can now begin to write our styles task. We will start off with the main task method and pass a string of styles to it as its identifier. This is the main method that will wrap all of the tasks we will be creating throughout the book. The code for the task() method is as follows: gulp.task('styles', function() {   // Code Goes Here}); Next, you will need to tell gulp where it can find the source files that you wish to process. You instruct gulp by including a path to the file, but the path can contain globbing wildcards such as * to reference multiple files within a single directory. To demonstrate this, we will target all of the files that are inside of our css directory in our project. gulp.task('styles', function() {   return gulp.src('app/css/*.css')       // Pipes Coming Soon}); We have used the * globbing pattern to tell gulp that our source is every file with a .css extension inside of our css folder. This is a very valuable pattern that you will use throughout the writing of your tasks. Once our source has been set up, we can begin piping in our plugins to modify our data. We will begin by concatenating our source files into a single CSS file named all.css: gulp.task('styles', function() {   return gulp.src('app/css/*.css')       .pipe(concat('all.css'))       // More Pipes Coming Soon}); In the preceding code, we added our concat reference that we included at the top of our gulpfile and passed it in a filename for the concatenated CSS file. In similar build systems, this would create a file and place it in a temporary location; however, with gulp, we can send this newly created file to the next step in our pipechain without writing out to any temporary files. Next, we will pipe in our concatenated CSS file into our preprocessor: gulp.task('styles', function() {   return gulp.src('app/css/*.css')       .pipe(concat('all.css'))       .pipe(myth())}); Finally, to finish the task, we must specify where we need to output our file. In our project, we will be outputting the file into a folder named dist that is located inside of our root project directory. To output a file, we will use gulp's .dest() method. This expects only a single argument, namely, the directory where you would like to output your processed file. The code for the dest() function is as follows: gulp.task('styles', function() {   return gulp.src('app/css/*.css')       .pipe(concat('all.css'))       .pipe(myth())       .pipe(gulp.dest('dist'));}); You can now match your gulpfile with the following screenshot: In the preceding code, we added our final pipe with the .dest() method and supplied it with our dist directory that I mentioned in one of the previous sections. This task will now put our concatenated and preprocessed file into our dist directory for us to include it in our application. This task is now essentially complete! We will continue to add additional functionality to it as we progress through the book, but for now our core functionality is in place. Other preprocessors It is important to note that concatenating our files is often not really necessary when using a preprocessor such as Sass. 
This is because it already includes an @import feature that allows you to separate your CSS files into partials based on their specific purpose and then pulls them all into a single file. If you are using this functionality within Sass, then we can very easily modify our task by installing the gulp-sass plugin and rearranging our pipes. To do so, you would simply install the gulp-sass plugin and then modify your task as follows: npm install gulp-sass --save-dev The code for gulp-sass task is as follows: gulp.task('styles', function() {   return gulp.src('app/css/*.scss')       .pipe(sass())       .pipe(gulp.dest('dist'));}); You can now remove the concatenation pipe as the gulp-sass plugin will hit those imports and pull everything up together for you. So, in this case, all you would need to do is simply change the source files over to .scss and remove the initial pipe that we used to concatenate our files. After those changes have been made, the pipechain will continue to work as expected. Reviewing the styles task Our styles task will first take in our CSS source files and then concatenate them into a single file that we have called all.css. Once they have been concatenated, we are going to pass our new all.css file into our pipe that will then preprocess it using Myth (again, you can substitute any preprocessor you prefer to use). Finally, we will save that concatenated and preprocessed file in our dist directory where we can finally include it in our website or application. Summary In this article, we learned how to write and run a function in gulpfile from the ground up. In it, we created a style task to process our CSS files. Our CSS task joins together all of our CSS files and then passes the joined file through a preprocessor so that we can use cutting-edge CSS features, such as variables and mathematical calculations. Resources for Article: Further resources on this subject: WebSockets in Wildfly [article] Creating CSS via the Stylus preprocessor [article] Alfresco Web Scrpits [article]

Adding Authentication

Packt
23 Jan 2015
15 min read
This article written by Mat Ryer, the author of Go Programming Blueprints, is focused on high-performance transmission of messages from the clients to the server and back again, but our users have no way of knowing who they are talking to. One solution to this problem is building of some kind of signup and login functionality and letting our users create accounts and authenticate themselves before they can open the chat page. (For more resources related to this topic, see here.) Whenever we are about to build something from scratch, we must ask ourselves how others have solved this problem before (it is extremely rare to encounter genuinely original problems), and whether any open solutions or standards already exist that we can make use of. Authorization and authentication are hardly new problems, especially in the world of the Web, with many different protocols out there to choose from. So how do we decide the best option to pursue? As always, we must look at this question from the point of view of the user. A lot of websites these days allow you to sign in using your accounts existing elsewhere on a variety of social media or community websites. This saves users the tedious job of entering all their account information over and over again as they decide to try out different products and services. It also has a positive effect on the conversion rates for new sites. In this article, we will enhance our chat codebase to add authentication, which will allow our users to sign in using Google, Facebook, or GitHub and you'll see how easy it is to add other sign-in portals too. In order to join the chat, users must first sign in. Following this, we will use the authorized data to augment our user experience so everyone knows who is in the room, and who said what. In this article, you will learn to: Use the decorator pattern to wrap http.Handler types to add additional functionality to handlers Serve HTTP endpoints with dynamic paths Use the Gomniauth open source project to access authentication services Get and set cookies using the http package Encode objects as Base64 and back to normal again Send and receive JSON data over a web socket Give different types of data to templates Work with channels of your own types Handlers all the way down For our chat application, we implemented our own http.Handler type in order to easily compile, execute, and deliver HTML content to browsers. Since this is a very simple but powerful interface, we are going to continue to use it wherever possible when adding functionality to our HTTP processing. In order to determine whether a user is authenticated, we will create an authentication wrapper handler that performs the check, and passes execution on to the inner handler only if the user is authenticated. Our wrapper handler will satisfy the same http.Handler interface as the object inside it, allowing us to wrap any valid handler. In fact, even the authentication handler we are about to write could be later encapsulated inside a similar wrapper if needed. Diagram of a chaining pattern when applied to HTTP handlers The preceding figure shows how this pattern could be applied in a more complicated HTTP handler scenario. Each object implements the http.Handler interface, which means that object could be passed into the http.Handle method to directly handle a request, or it can be given to another object, which adds some kind of extra functionality. The Logging handler might write to a logfile before and after the ServeHTTP method is called on the inner handler. 
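Purely as an illustration of the diagram above (this sketch is not part of the book's codebase; the loggingHandler and MustLog names are invented here), such a Logging wrapper could look like this:

package main

import (
    "log"
    "net/http"
)

// loggingHandler wraps another http.Handler and logs before and after
// the inner handler runs, following the decorator pattern in the diagram.
type loggingHandler struct {
    next http.Handler
}

func (h *loggingHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    log.Println("request started:", r.Method, r.URL.Path)
    h.next.ServeHTTP(w, r) // pass execution on to the wrapped handler
    log.Println("request finished:", r.Method, r.URL.Path)
}

// MustLog decorates any http.Handler with logging.
func MustLog(handler http.Handler) http.Handler {
    return &loggingHandler{next: handler}
}

func main() {
    hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("hello"))
    })
    http.Handle("/", MustLog(hello))
    log.Println(http.ListenAndServe(":8080", nil))
}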
Because the inner handler is just another http.Handler, any other handler can be wrapped in (or decorated with) the Logging handler. It is also common for an object to contain logic that decides which inner handler should be executed. For example, our authentication handler will either pass the execution to the wrapped handler, or handle the request itself by issuing a redirect to the browser. That's plenty of theory for now; let's write some code. Create a new file called auth.go in the chat folder: package main import ( "net/http" ) type authHandler struct { next http.Handler } func (h *authHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { if _, err := r.Cookie("auth"); err == http.ErrNoCookie { // not authenticated w.Header().Set("Location", "/login") w.WriteHeader(http.StatusTemporaryRedirect) } else if err != nil { // some other error panic(err.Error()) } else { // success - call the next handler h.next.ServeHTTP(w, r) } } func MustAuth(handler http.Handler) http.Handler { return &authHandler{next: handler} } The authHandler type not only implements the ServeHTTP method (which satisfies the http.Handler interface) but also stores (wraps) http.Handler in the next field. Our MustAuth helper function simply creates authHandler that wraps any other http.Handler. This is the pattern in general programming practice that allows us to easily add authentication to our code in main.go. Let us tweak the following root mapping line: http.Handle("/", &templateHandler{filename: "chat.html"}) Let us change the first argument to make it explicit about the page meant for chatting. Next, let's use the MustAuth function to wrap templateHandler for the second argument: http.Handle("/chat", MustAuth(&templateHandler{filename: "chat.html"})) Wrapping templateHandler with the MustAuth function will cause execution to run first through our authHandler, and only to templateHandler if the request is authenticated. The ServeHTTP method in our authHandler will look for a special cookie called auth, and use the Header and WriteHeader methods on http.ResponseWriter to redirect the user to a login page if the cookie is missing. Build and run the chat application and try to hit http://localhost:8080/chat: go build -o chat ./chat -host=":8080" You need to delete your cookies to clear out previous auth tokens, or any other cookies that might be left over from other development projects served through localhost. If you look in the address bar of your browser, you will notice that you are immediately redirected to the /login page. Since we cannot handle that path yet, you'll just get a 404 page not found error. Making a pretty social sign-in page There is no excuse for building ugly apps, and so we will build a social sign-in page that is as pretty as it is functional. Bootstrap is a frontend framework used to develop responsive projects on the Web. It provides CSS and JavaScript code that solve many user-interface problems in a consistent and good-looking way. While sites built using Bootstrap all tend to look the same (although there are plenty of ways in which the UI can be customized), it is a great choice for early versions of apps, or for developers who don't have access to designers. If you build your application using the semantic standards set forth by Bootstrap, it becomes easy for you to make a Bootstrap theme for your site or application and you know it will slot right into your code. 
We will use the version of Bootstrap hosted on a CDN so we don't have to worry about downloading and serving our own version through our chat application. This means that in order to render our pages properly, we will need an active Internet connection, even during development. If you prefer to download and host your own copy of Bootstrap, you can do so. Keep the files in an assets folder and add the following call to your main function (it uses http.Handle to serve the assets via your application): http.Handle("/assets/", http.StripPrefix("/assets", http.FileServer(http.Dir("/path/to/assets/")))) Notice how the http.StripPrefix and http.FileServer functions return objects that satisfy the http.Handler interface as per the decorator pattern that we implement with our MustAuth helper function. In main.go, let's add an endpoint for the login page: http.Handle("/chat", MustAuth(&templateHandler{filename: "chat.html"})) http.Handle("/login", &templateHandler{filename: "login.html"}) http.Handle("/room", r) Obviously, we do not want to use the MustAuth method for our login page because it will cause an infinite redirection loop. Create a new file called login.html inside our templates folder, and insert the following HTML code: <html> <head> <title>Login</title> <link rel="stylesheet" href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css"> </head> <body> <div class="container"> <div class="page-header"> <h1>Sign in</h1> </div> <div class="panel panel-danger"> <div class="panel-heading"> <h3 class="panel-title">In order to chat, you must be signed in</h3> </div> <div class="panel-body"> <p>Select the service you would like to sign in with:</p> <ul> <li> <a href="/auth/login/facebook">Facebook</a> </li> <li> <a href="/auth/login/github">GitHub</a> </li> <li> <a href="/auth/login/google">Google</a> </li> </ul> </div> </div> </div> </body> </html> Restart the web server and navigate to http://localhost:8080/login. You will notice that it now displays our sign-in page: Endpoints with dynamic paths Pattern matching for the http package in the Go standard library isn't the most comprehensive and fully featured implementation out there. For example, Ruby on Rails makes it much easier to have dynamic segments inside the path: "auth/:action/:provider_name" This then provides a data map (or dictionary) containing the values that it automatically extracted from the matched path. So if you visit auth/login/google, then params[:provider_name] would equal google, and params[:action] would equal login. The most the http package lets us specify by default is a path prefix, which we can do by leaving a trailing slash at the end of the pattern: "auth/" We would then have to manually parse the remaining segments to extract the appropriate data. This is acceptable for relatively simple cases, which suits our needs for the time being since we only need to handle a few different paths such as: /auth/login/google /auth/login/facebook /auth/callback/google /auth/callback/facebook If you need to handle more advanced routing situations, you might want to consider using dedicated packages such as Goweb, Pat, Routes, or mux. For extremely simple cases such as ours, the built-in capabilities will do. We are going to create a new handler that powers our login process. In auth.go, add the following loginHandler code: // loginHandler handles the third-party login process. 
// format: /auth/{action}/{provider} func loginHandler(w http.ResponseWriter, r *http.Request) { segs := strings.Split(r.URL.Path, "/") action := segs[2] provider := segs[3] switch action { case "login": log.Println("TODO handle login for", provider) default: w.WriteHeader(http.StatusNotFound) fmt.Fprintf(w, "Auth action %s not supported", action) } } In the preceding code, we break the path into segments using strings.Split before pulling out the values for action and provider. If the action value is known, we will run the specific code, otherwise we will write out an error message and return an http.StatusNotFound status code (which in the language of HTTP status code, is a 404 code). We will not bullet-proof our code right now but it's worth noticing that if someone hits loginHandler with too few segments, our code will panic because it expects segs[2] and segs[3] to exist. For extra credit, see whether you can protect against this and return a nice error message instead of a panic if someone hits /auth/nonsense. Our loginHandler is only a function and not an object that implements the http.Handler interface. This is because, unlike other handlers, we don't need it to store any state. The Go standard library supports this, so we can use the http.HandleFunc function to map it in a way similar to how we used http.Handle earlier. In main.go, update the handlers: http.Handle("/chat", MustAuth(&templateHandler{filename: "chat.html"})) http.Handle("/login", &templateHandler{filename: "login.html"}) http.HandleFunc("/auth/", loginHandler) http.Handle("/room", r) Rebuild and run the chat application: go build –o chat ./chat –host=":8080" Hit the following URLs and notice the output logged in the terminal: http://localhost:8080/auth/login/google outputs TODO handle login for google http://localhost:8080/auth/login/facebook outputs TODO handle login for facebook We have successfully implemented a dynamic path-matching mechanism that so far just prints out TODO messages; we need to integrate with authentication services in order to make our login process work. OAuth2 OAuth2 is an open authentication and authorization standard designed to allow resource owners to give clients delegated access to private data (such as wall posts or tweets) via an access token exchange handshake. Even if you do not wish to access the private data, OAuth2 is a great option that allows people to sign in using their existing credentials, without exposing those credentials to a third-party site. In this case, we are the third party and we want to allow our users to sign in using services that support OAuth2. From a user's point of view, the OAuth2 flow is: A user selects provider with whom they wish to sign in to the client app. The user is redirected to the provider's website (with a URL that includes the client app ID) where they are asked to give permission to the client app. The user signs in from the OAuth2 service provider and accepts the permissions requested by the third-party application. The user is redirected back to the client app with a request code. In the background, the client app sends the grant code to the provider, who sends back an auth token. The client app uses the access token to make authorized requests to the provider, such as to get user information or wall posts. To avoid reinventing the wheel, we will look at a few open source projects that have already solved this problem for us. 
Open source OAuth2 packages Andrew Gerrand has been working on the core Go team since February 2010, that is two years before Go 1.0 was officially released. His goauth2 package (see https://code.google.com/p/goauth2/) is an elegant implementation of the OAuth2 protocol written entirely in Go. Andrew's project inspired Gomniauth (see https://github.com/stretchr/gomniauth). An open source Go alternative to Ruby's omniauth project, Gomniauth provides a unified solution to access different OAuth2 services. In the future, when OAuth3 (or whatever next-generation authentication protocol it is) comes out, in theory, Gomniauth could take on the pain of implementing the details, leaving the user code untouched. For our application, we will use Gomniauth to access OAuth services provided by Google, Facebook, and GitHub, so make sure you have it installed by running the following command: go get github.com/stretchr/gomniauth Some of the project dependencies of Gomniauth are kept in Bazaar repositories, so you'll need to head over to http://wiki.bazaar.canonical.com to download them. Tell the authentication providers about your app Before we ask an authentication provider to help our users sign in, we must tell them about our application. Most providers have some kind of web tool or console where you can create applications to kick this process. Here's one from Google: In order to identify the client application, we need to create a client ID and secret. Despite the fact that OAuth2 is an open standard, each provider has their own language and mechanism to set things up, so you will most likely have to play around with the user interface or the documentation to figure it out in each case. At the time of writing this, in Google Developer Console , you navigate to APIs & auth | Credentials and click on the Create new Client ID button. In most cases, for added security, you have to be explicit about the host URLs from where requests will come. For now, since we're hosting our app locally on localhost:8080, you should use that. You will also be asked for a redirect URI that is the endpoint in our chat application and to which the user will be redirected after successfully signing in. The callback will be another action on our loginHandler, so the redirection URL for the Google client will be http://localhost:8080/auth/callback/google. Once you finish the authentication process for the providers you want to support, you will be given a client ID and secret for each provider. Make a note of these, because we will need them when we set up the providers in our chat application. If we host our application on a real domain, we have to create new client IDs and secrets, or update the appropriate URL fields on our authentication providers to ensure that they point to the right place. Either way, it's not bad practice to have a different set of development and production keys for security. Summary This article shows how to add OAuth to our chat application so that we can keep track of who is saying what, but let them log in using Google, Facebook, or GitHub. We also learned how to use handlers for efficient coding. This article also thought us how to make a pretty social sign-in page. Resources for Article: Further resources on this subject: WebSockets in Wildfly [article] Using Socket.IO and Express together [article] The Importance of Securing Web Services [article]
article-image-data-writing
Packt
23 Jan 2015
10 min read
Save for later

Data writing

Packt
23 Jan 2015
10 min read
In this article by P. Raja Malleswara Rao, author of the book Spring Batch Essentials, we will see how the Spring Batch provides the configuration to write the read and processed data to a different output (destination). The writer can integrate easily with different relational frameworks. It can also be customized for the different formats. (For more resources related to this topic, see here.) ItemWriter Spring Batch provides an interface in the form of ItemWriter to write bulk data.The following is the definition of the ItemWriter interface: public interface ItemWriter<T> { void write(List<? extends T> items) throws Exception; } Based on the destination platform onto which we have to write the data, we have the following item writers: Flat file item writers : These write the content onto a flat file (fixed width and delimited) XML item writers: These write the data onto an XML file Database item writers : These write the data onto a database Flat file item writers The data read from any of the existing formats can be processed to the desired format and then be written onto multiple formats, including flat files. The following are the APIs that help in flat file item writing. LineAggregator The LineAggregatorAPI concatenates multiple fields into a String to write onto the flat file. This works exactly the opposite way of LineTokenizer in the read operation. public interface LineAggregator<T> { public String aggregate(T item); } PassThroughLineAggregator PassThroughLineAggregator is an implementation of LineAggreagator that considers the object in use is already aggregated and simply returns the String from the object using the toString() method. public class PassThroughLineAggregator<T> implements LineAggregator<T> { public String aggregate(T item) { return item.toString(); } } The FlatFileItemWriter can be configured with the PassThroughLineAggregator, as follows: <bean id="itemWriter" class=" org.springframework.batch.item.file.FlatFileItemWriter"> <property name="resource" value="file:target/outputfiles/employee_output.txt"/> <property name="lineAggregator"> <bean class=" org.springframework.batch.item.file.transform.PassThroughLineAggregator"/> </property> </bean> FieldExtractor If the object writing is more than just writing its String form onto the file, FieldExtractor needs to be used, wherein each object gets converted to the array of fields, aggregated together to form a String to write onto the file. 
public interface FieldExtractor<T> {
Object[] extract(T item);
}
Field extractors are primarily of two types:
PassThroughFieldExtractor: For the scenario where the object collection just has to be converted to an array and passed on to be written
BeanWrapperFieldExtractor: With a field-level configuration of how each field of the object gets placed in the String to write onto the file; this works exactly the opposite way of BeanWrapperFieldSetMapper
The BeanWrapperFieldExtractor works as follows:
BeanWrapperFieldExtractor<Employee> extractor = new BeanWrapperFieldExtractor<Employee>();
extractor.setNames(new String[] { "id", "lastname", "firstname", "designation", "department", "yearofjoining" });
int id = 11;
String lastname = "Alden";
String firstname = "Richie";
String designation = "associate";
String department = "sales";
int yearofjoining = 1996;
Employee employee = new Employee(id, lastname, firstname, designation, department, yearofjoining);
Object[] values = extractor.extract(employee);
assertEquals(id, values[0]);
assertEquals(lastname, values[1]);
assertEquals(firstname, values[2]);
assertEquals(designation, values[3]);
assertEquals(department, values[4]);
assertEquals(yearofjoining, values[5]);
Writing delimited files
If the Java object can be written onto flat files in a delimited file format, we can perform it as shown in the following example. Let's consider the Employee object defined already. This object can be configured with the FlatFileItemWriter, the DelimitedLineAggregator, and the BeanWrapperFieldExtractor to produce the delimited flat file, as follows:
<bean id="itemWriter" class="org.springframework.batch.item.file.FlatFileItemWriter">
<property name="resource" ref="outputResource"/>
<property name="lineAggregator">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
<property name="delimiter" value=","/>
<property name="fieldExtractor">
<bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor">
<property name="names" value="id,lastname,firstname,designation,department,yearofjoining"/>
</bean>
</property>
</bean>
</property>
</bean>
Writing a fixed width file
Spring Batch supports fixed width file writing with the help of FormatterLineAggregator. Considering the same example data as delimited flat file writing, we can perform the fixed width file writing using the following configuration:
<bean id="itemWriter" class="org.springframework.batch.item.file.FlatFileItemWriter">
<property name="resource" ref="outputResource"/>
<property name="lineAggregator">
<bean class="org.springframework.batch.item.file.transform.FormatterLineAggregator">
<property name="fieldExtractor">
<bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor">
<property name="names" value="id,lastname,firstname,designation,department,yearofjoining"/>
</bean>
</property>
<property name="format" value="%-2d%-10s%-10s%-10s%-15s%-4d"/>
</bean>
</property>
</bean>
The format value is formed based on the following formatter conversions, where arg represents the argument for conversion:
b, B (general): Converts a Boolean to its String form; the value is false for null
h, H (general): Integer.toHexString(arg.hashCode())
s, S (general): If arg implements Formattable, then arg.formatTo(); otherwise, arg.toString()
c, C (character): A Unicode character
d (integral): A decimal integer
o (integral): An octal integer
x, X (integral): A hexadecimal integer
e, E (floating point): A decimal number in computerized scientific notation
f (floating point): A decimal number
g, G (floating point): Computerized scientific notation or decimal format, depending on the precision and the value after rounding
a, A (floating point): A hexadecimal floating point number with a significand and an exponent
t, T (date/time): The prefix for date and time conversion characters
% (percent): A literal % (u0025)
n (line separator): The platform-specific line separator
FlatFileItemWriter can be configured with the shouldDeleteIfExists option, to delete a file if it already exists in the specified location. A header and footer can be added to the flat file by implementing FlatFileHeaderCallback and FlatFileFooterCallback and including these beans with the headerCallback and footerCallback properties respectively.
XML item writers
The data can be written to the Extensible Markup Language (XML) format using StaxEventItemWriter. The Spring Batch configuration of this activity for the employee example can be the following:
<bean id="itemWriter" class="org.springframework.batch.item.xml.StaxEventItemWriter">
<property name="resource" ref="outputResource"/>
<property name="marshaller" ref="employeeMarshaller"/>
<property name="rootTagName" value="employees"/>
<property name="overwriteOutput" value="true"/>
</bean>
Using XStream to do the marshalling activity, the following is the configuration:
<bean id="employeeMarshaller" class="org.springframework.oxm.xstream.XStreamMarshaller">
<property name="aliases">
<util:map id="aliases">
<entry key="employee" value="batch.Employee"/>
<entry key="ID" value="java.lang.Integer"/>
</util:map>
</property>
</bean>
The Java code for the preceding configuration can be realized as follows:
StaxEventItemWriter staxItemWriter = new StaxEventItemWriter();
FileSystemResource resource = new FileSystemResource("export/employee_output.xml");
Map aliases = new HashMap();
aliases.put("employee", "batch.Employee");
aliases.put("ID", "java.lang.Integer");
Marshaller marshaller = new XStreamMarshaller();
marshaller.setAliases(aliases);
staxItemWriter.setResource(resource);
staxItemWriter.setMarshaller(marshaller);
staxItemWriter.setRootTagName("employees");
staxItemWriter.setOverwriteOutput(true);
ExecutionContext executionContext = new ExecutionContext();
staxItemWriter.open(executionContext);
Employee employee = new Employee();
employee.setID(11);
employee.setLastName("Alden");
employee.setFirstName("Richie");
employee.setDesignation("associate");
employee.setDepartment("sales");
employee.setYearOfJoining(1996);
staxItemWriter.write(Arrays.asList(employee));
Database item writers
Spring Batch supports database item writing with two possible access types: JDBC and ORM.
JDBC-based database writing
Spring Batch supports JDBC-based database writing with the help of JdbcBatchItemWriter, an implementation of ItemWriter that executes multiple SQL statements in batch mode.
The following is the sample configuration for the employee example with the JDBC-based database writing: <bean id="employeeWriter" class="org.springframework.batch.item.database.JdbcBatchItemWriter"> <property name="assertUpdates" value="true" /> <property name="itemPreparedStatementSetter"> <bean class="batch.EmployeePreparedStatementSetter" /> </property> <property name="sql" value="INSERT INTO EMPLOYEE (ID, LASTNAME, FIRSTNAME, DESIGNATION, DEPARTMENT, YEAROFJOINING) VALUES(?, ?, ?, ?, ?, ?)" /> <property name="dataSource" ref="dataSource" /> </bean> The ItemPreparedStatementSetter can be implemented for our example of Employee data as follows: public class EmployeePreparedStatementSetter implements ItemPreparedStatementSetter<Employee> { @Override public void setValues(Employee item, PreparedStatement ps) throws SQLException { ps.setInt(1, item.getId()); ps.setString(2, item.getLastName()); ps.setString(3, item.getFirstName()); ps.setString(4, item.getDesignation()); ps.setString(5, item.getDepartment()); ps.setInt(6, item.getYearOfJoining()); } } ORM-based database writing Object relational mapping(ORM) is defined as a programming technique to convert data between incompatible type systems in object-oriented programming languages. ORM takes care of the data persistence from the object oriented program to the database. Spring Batch supports multiple ORMs including Hibernate, JPA, and iBatis. In our example, the Employee class should be annotated to be used with ORM (Hibernate/JPA) for persistence as follows: @Entity("employee") public class Employee { @Id("id") private int id; @Column("lastName") private String lastName; @Column("firstName") private String firstName; @Column("designation") private String designation; @Column("department") private String department; @Column("yearOfJoining") private int yearOfJoining; public int getID() { return id; } public void setID(int id) { this.id = id; } public String getLastName() { return lastName; } public void setLastName(String lastName) { this.lastName = lastName; } public String getFirstName() { return firstName; } public void setFirstName(String firstName) { this.firstName = firstName; } public String getDesignation() { return designation; } public void setDesignation(String designation) { this.designation = designation; } public String getDepartment() { return department; } public void setDepartment(String department) { this.department = department; } public int getYearOfJoining() { return yearOfJoining; } public void setYearOfJoining(int yearOfJoining) { this.yearOfJoining = yearOfJoining; } } The annotations specify that the class Employee is representing a corresponding table in the database with a name as shown with @Entity, and each field corresponds to a column in the database as shown with the @ID and @Column annotations. The following is the configuration to be made with Hibernate for the employee example: <bean id="employeeWriter" class="org.springframework.batch.item.database.HibernateItemWriter"> <property name="hibernateTemplate" ref="hibernateTemplate" /> </bean> Similarly, for JPA and iBatis, the configurations can be made with JpaItemWriter and IbatisBatchItemWriter respectively. Custom item readers and writers Spring Batch supports custom item readers' and writers' configurations. This can be done easily by implementing the ItemReader and ItemWriter interfaces for the respective read and write operations with the business logic we want, and configuring the ItemReader and ItemWriter in the XML batch configuration. 
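For illustration, a minimal custom writer for our Employee example might look like the following sketch. The class name EmployeeConsoleItemWriter and the console output are our own choices for demonstration; only the ItemWriter contract itself comes from Spring Batch:
import java.util.List;
import org.springframework.batch.item.ItemWriter;
public class EmployeeConsoleItemWriter implements ItemWriter<Employee> {
    @Override
    public void write(List<? extends Employee> items) throws Exception {
        // Any business-specific output logic goes here; this sketch simply
        // prints one comma-separated line per employee in the chunk.
        for (Employee employee : items) {
            System.out.println(employee.getID() + "," + employee.getLastName()
                    + "," + employee.getFirstName() + "," + employee.getDesignation());
        }
    }
}
The class can then be declared as a bean and referenced from the step in place of one of the built-in writers.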
Summary
Through this article, we learned the essential data handling mechanism of writing data to different destinations, including flat files, XML, and databases. We also saw how Spring Batch supports custom formats: by implementing the ItemReader and ItemWriter interfaces, we can meet business needs that differ from the default formats.
Resources for Article:
Further resources on this subject: JAAS-based security authentication on JSPs [article] Serving and processing forms [article] Connecting to a web service (Should know) [article]

article-image-cloud
Packt
22 Jan 2015
14 min read
Save for later

In the Cloud

Packt
22 Jan 2015
14 min read
In this article by Rafał Kuć, author of the book Solr Cookbook - Third Edition, covers the cloud side of Solr—SolrCloud, setting up collections, replicas configuration, distributed indexing and searching, as well as aliasing and shard manipulation. We will also learn how to create a cluster. (For more resources related to this topic, see here.) Creating a new SolrCloud cluster Imagine a situation where one day you have to set up a distributed cluster with the use of Solr. The amount of data is just too much for a single server to handle. Of course, you can just set up a second server or go for another master server with another set of data. But before Solr 4.0, you would have to take care of the data distribution yourself. In addition to this, you would also have to take care of setting up replication, data duplication, and so on. With SolrCloud you don't have to do this—you can just set up a new cluster and this article will show you how to do that. Getting ready It shows you how to set up a Zookeeper cluster in order to be ready for production use. How to do it... Let's assume that we want to create a cluster that will have four Solr servers. We also would like to have our data divided between the four Solr servers in such a way that we have the original data on two machines and in addition to this, we would also have a copy of each shard available in case something happens with one of the Solr instances. I also assume that we already have our Zookeeper cluster set up, ready, and available at the address 192.168.1.10 on the 9983 port. For this article, we will set up four SolrCloud nodes on the same physical machine: We will start by running an empty Solr server (without any configuration) on port 8983. We do this by running the following command (for Solr 4.x): java -DzkHost=192.168.1.10:9983 -jar start.jar For Solr 5, we will run the following command: bin/solr -c -z 192.168.1.10:9983 Now we start another three nodes, each on a different port (note that different Solr instances can run on the same port, but they should be installed on different machines). We do this by running one command for each installed Solr server (for Solr 4.x): java -Djetty.port=6983 -DzkHost=192.168.1.10:9983 -jar start.jarjava -Djetty.port=4983 -DzkHost=192.168.1.10:9983 -jar start.jarjava -Djetty.port=2983 -DzkHost=192.168.1.10:9983 -jar start.jar For Solr 5, the commands will be as follows: bin/solr -c -p 6983 -z 192.168.1.10:9983bin/solr -c -p 4983 -z 192.168.1.10:9983bin/solr -c -p 2983 -z 192.168.1.10:9983 Now we need to upload our collection configuration to ZooKeeper. Assuming that we have our configuration in /home/conf/solrconfiguration/conf, we will run the following command from the home directory of the Solr server that runs first (the zkcli.sh script can be found in the Solr deployment example in the scripts/cloud-scripts directory): ./zkcli.sh -cmd upconfig -zkhost 192.168.1.10:9983 -confdir /home/conf/solrconfiguration/conf/ -confname collection1 Now we can create our collection using the following command: curl 'localhost:8983/solr/admin/collections?action=CREATE&name=firstCollection&numShards=2&replicationFactor=2&collection.configName=collection1' If we now go to http://localhost:8983/solr/#/~cloud, we will see the following cluster view: As we can see, Solr has created a new collection with a proper deployment. Let's now see how it works. How it works... 
We assume that we already have ZooKeeper installed—it is empty and doesn't have information about any collection, because we didn't create them. For Solr 4.x, we started by running Solr and telling it that we want it to run in SolrCloud mode. We did that by specifying the -DzkHost property and setting its value to the IP address of our ZooKeeper instance. Of course, in the production environment, you would point Solr to a cluster of ZooKeeper nodes—this is done using the same property, but the IP addresses are separated using the comma character. For Solr 5, we used the solr script provided in the bin directory. By adding the -c switch, we told Solr that we want it to run in the SolrCloud mode. The -z switch works exactly the same as the -DzkHost property for Solr 4.x—it allows you to specify the ZooKeeper host that should be used. Of course, the other three Solr nodes run exactly in the same manner. For Solr 4.x, we add the -DzkHost property that points Solr to our ZooKeeper. Because we are running all the four nodes on the same physical machine, we needed to specify the -Djetty.port property, because we can run only a single Solr server on a single port. For Solr 5, we use the -z property of the bin/solr script and we use the -p property to specify the port on which Solr should start. The next step is to upload the collection configuration to ZooKeeper. We do this because Solr will fetch this configuration from ZooKeeper when you will request the collection creation. To upload the configuration, we use the zkcli.sh script provided with the Solr distribution. We use the upconfig command (the -cmd switch), which means that we want to upload the configuration. We specify the ZooKeeper host using the -zkHost property. After that, we can say which directory our configuration is stored (the -confdir switch). The directory should contain all the needed configuration files such as schema.xml, solrconfig.xml, and so on. Finally, we specify the name under which we want to store our configuration using the -confname switch. After we have our configuration in ZooKeeper, we can create the collection. We do this by running a command to the Collections API that is available at the /admin/collections endpoint. First, we tell Solr that we want to create the collection (action=CREATE) and that we want our collection to be named firstCollection (name=firstCollection). Remember that the collection names are case sensitive, so firstCollection and firstcollection are two different collections. We specify that we want our collection to be built of two primary shards (numShards=2) and we want each shard to be present in two copies (replicationFactor=2). This means that we will have a primary shard and a single replica. Finally, we specify which configuration should be used to create the collection by specifying the collection.configName property. As we can see in the cloud, a view of our cluster has been created and spread across all the nodes. There's more... There are a few things that I would like to mention—the possibility of running a Zookeeper server embedded into Apache Solr and specifying the Solr server name. Starting an embedded ZooKeeper server You can also start an embedded Zookeeper server shipped with Solr for your test environment. In order to do this, you should pass the -DzkRun parameter instead of -DzkHost=192.168.0.10:9983, but only in the command that sends our configuration to the Zookeeper cluster. 
So the final command for Solr 4.x should look similar to this: java -DzkRun -jar start.jar In Solr 5.0, the same command will be as follows: bin/solr start -c By default, ZooKeeper will start on the port higher by 1,000 to the one Solr is started at. So if you are running your Solr instance on 8983, ZooKeeper will be available at 9983. The thing to remember is that the embedded ZooKeeper should only be used for development purposes and only one node should start it. Specifying the Solr server name Solr needs each instance of SolrCloud to have a name. By default, that name is set using the IP address or the hostname, appended with the port the Solr instance is running on, and the _solr postfix. For example, if our node is running on 192.168.56.1 and port 8983, it will be called 192.168.56.1:8983_solr. Of course, Solr allows you to change that behavior by specifying the hostname. To do this, start using the -Dhost property or add the host property to solr.xml. For example, if we would like one of our nodes to have the name of server1, we can run the following command to start Solr: java -DzkHost=192.168.1.10:9983 -Dhost=server1 -jar start.jar In Solr 5.0, the same command would be: bin/solr start -c -h server1 Setting up multiple collections on a single cluster Having a single collection inside the cluster is nice, but there are multiple use cases when we want to have more than a single collection running on the same cluster. For example, we might want users and books in different collections or logs from each day to be only stored inside a single collection. This article will show you how to create multiple collections on the same cluster. Getting ready This article will show you how to create a new SolrCloud cluster. We also assume that ZooKeeper is running on 192.168.1.10 and is listening on the 2181 port and that we already have four SolrCloud nodes running as a cluster. How to do it... As we already have all the prerequisites, such as ZooKeeper and Solr up and running, we need to upload our configuration files to ZooKeeper to be able to create collections: Assuming that we have our configurations in /home/conf/firstcollection/conf and /home/conf/secondcollection/conf, we will run the following commands from the home directory of the first run Solr server to upload the configuration to ZooKeeper (the zkcli.sh script can be found in the Solr deployment example in the scripts/cloud-scripts directory): ./zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir /home/conf/firstcollection/conf/ -confname firstcollection./zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir /home/conf/secondcollection/conf/ -confname secondcollection We have pushed our configurations into Zookeeper, so now we can create the collections we want. In order to do this, we use the following commands: curl 'localhost:8983/solr/admin/collections?action=CREATE&name=firstCollection&numShards=2&replicationFactor=2&collection.configName=firstcollection'curl 'localhost:8983/solr/admin/collections?action=CREATE&name=secondcollection&numShards=4&replicationFactor=1&collection.configName=secondcollection' Now, just to test whether everything went well, we will go to http://localhost:8983/solr/#/~cloud. As the result, we will see the following cluster topology: As we can see, both the collections were created the way we wanted. Now let's see how that happened. How it works... We assume that we already have ZooKeeper installed—it is empty and doesn't have information about any collections, because we didn't create them. 
We also assumed that we have our SolrCloud cluster configured and started. We start by uploading two configurations to ZooKeeper, one called firstcollection and the other called secondcollection. After that we are ready to create our collections. We start by creating the collection named firstCollection that is built of two primary shards and one replica. The second collection, called secondcollection is built of four primary shards and it doesn't have any replicas. We can see that easily in the cloud view of the deployment. The firstCollection collection has two shards—shard1 and shard2. Each of the shard has two physical copies—one green (which means active) and one with a black dot, which is the primary shard. The secondcollection collection is built of four physical shards—each shard has a black dot near its name, which means that they are primary shards. Splitting shards Imagine a situation where you reach a limit of your current deployment—the number of shards is just not enough. For example, the indexing throughput is lower and lower, because the disks are not able to keep up. Of course, one of the possible solutions is to spread the index across more shards; however, you already have a collection and you want to keep the data and reindexing is not an option, because you don't have the original data. Solr can help you with such situations by allowing splitting shards of already created collections. This article will show you how to do it. Getting ready This article will show you how to create a new SolrCloud cluster. We also assume that ZooKeeper is running on 192.168.1.10 and is listening on port 2181 and that we already have four SolrCloud nodes running as a cluster. How to do it... Let's assume that we already have a SolrCloud cluster up and running and it has one collection called books. So our cloud view (which is available at http://localhost:8983/solr/#/~cloud) looks as follows: We have four nodes and we don't utilize them fully. We can say that these two nodes in which we have our shards are almost fully utilized. What we can do is create a new collection and reindex the data or we can split shards of the already created collection. Let's go with the second option: We start by splitting the first shard. It is as easy as running the following command: curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=books&shard=shard1' After this, we can split the second shard by running a similar command to the one we just used: curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=books&shard=shard2' Let's take a look at the cluster cloud view now (which is available at http://localhost:8983/solr/#/~cloud): As we can see, both shards were split—shard1 was divided into shard1_0 and shard1_1 and shard2 was divided into shard2_0 and shard2_1. Of course, the data was copied as well, so everything is ready. However, the last step should be to delete the original shards. Solr doesn't delete them, because sometimes applications use shard names to connect to a given shard. However, in our case, we can delete them by running the following commands: curl 'http://localhost:8983/solr/admin/collections?action=DELETESHARD&collection=books&shard=shard1' curl 'http://localhost:8983/solr/admin/collections?action=DELETESHARD&collection=books&shard=shard2' Now if we would again look at the cloud view of the cluster, we will see the following: How it works... 
We start with a simple collection called books that is built of two primary shards and no replicas. This is the collection which shards we will try to divide it without stopping Solr. Splitting shards is very easy. We just need to run a simple command in the Collections API (the /admin/collections endpoint) and specify that we want to split a shard (action=SPLITSHARD). We also need to provide additional information such as which collection we are interested in (the collection parameter) and which shard we want to split (the shard parameter). You can see the name of the shard by looking at the cloud view or by reading the cluster state from ZooKeeper. After sending the command, Solr might force us to wait for a substantial amount of time—shard splitting takes time, especially on large collections. Of course, we can run the same command for the second shard as well. Finally, we end up with six shards—four new and two old ones. The original shard will still contain data, but it will start to re-route requests to newly created shards. The data was split evenly between the new shards. The old shards were left although they are marked as inactive and they won't have any more data indexed to them. Because we don't need them, we can just delete them using the action=DELETESHARD command sent to the same Collections API. Similar to the split shard command, we need to specify the collection name, which shard we want to delete, and the name of the shard. After we delete the initial shards, we now see that our cluster view shows only four shards, which is what we were aiming at. We can now spread the shards across the cluster. Summary In this article, we learned how to set up multiple collections. This article thought us how to increase the number of collections in a cluster. We also worked on a way used to split shards. Resources for Article: Further resources on this subject: Tuning Solr JVM and Container [Article] Apache Solr PHP Integration [Article] Administrating Solr [Article]

article-image-taming-big-data-using-hdinsight
Packt
22 Jan 2015
10 min read
Save for later

Taming Big Data using HDInsight

Packt
22 Jan 2015
10 min read
(For more resources related to this topic, see here.) Era of Big Data In this article by Rajesh Nadipalli, the author of HDInsight Essentials Second Edition, we will take a look at the concept of Big Data and how to tame it using HDInsight. We live in a digital era and are always connected with friends and family using social media and smartphones. In 2014, every second, about 5,700 tweets were sent and 800 links were shared using Facebook, and the digital universe was about 1.7 MB per minute for every person on earth (source: IDC 2014 report). This amount of data sharing and storing is unprecedented and is contributing to what is known as Big Data. The following infographic shows you the details of our current use of the top social media sites (source: https://leveragenewagemedia.com/). Another contributor to Big Data are the smart, connected devices such as smartphones, appliances, cars, sensors, and pretty much everything that we use today and is connected to the Internet. These devices, which will soon be in trillions, continuously collect data and communicate with each other about their environment to make intelligent decisions and help us live better. This digitization of the world has added to the exponential growth of Big Data. According to the 2014 IDC digital universe report, the growth trend will continue and double in size every two years. In 2013, about 4.4 zettabytes were created and in 2020, the forecast is 44 zettabytes, which is 44 trillion gigabytes, (source: http://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm). Business value of Big Data While we generated 4.4 zettabytes of data in 2013, only 5 percent of it was actually analyzed, and this is the real opportunity of Big Data. The IDC report forecasts that by 2020, we will analyze over 35 percent of the generated data by making smarter sensors and devices. This data will drive new consumer and business behavior that will drive trillions of dollars in opportunity for IT vendors and organizations analyzing this data. Let's take a look at some real use cases that have benefited from Big Data: IT systems in all major banks are constantly monitoring fraudulent activities and alerting customers within milliseconds. These systems apply complex business rules and analyze the historical data, geography, type of vendor, and other parameters based on the customer to get accurate results. Commercial drones are transforming agriculture by analyzing real-time aerial images and identifying the problem areas. These drones are cheaper and efficient than satellite imagery, as they fly under the clouds and can be used anytime. They identify the irrigation issues related to water, pests, or fungal infections thereby increasing the crop productivity and quality. These drones are equipped with technology to capture high-quality images every second and transfer them to a cloud-hosted Big Data system for further processing (reference: http://www.technologyreview.com/featuredstory/526491/agricultural-drones/). Developers of the blockbuster Halo 4 game were tasked to analyze player preferences and support an online tournament in the cloud. The game attracted over 4 million players in its first five days after its launch. The development team had to also design a solution that kept track of a leader board for the global Halo 4 Infinity challenge, which was open to all the players. The development team chose the Azure HDInsight service to analyze the massive amounts of unstructured data in a distributed manner. 
The results from HDInsight were reported using Microsoft SQL Server PowerPivot and Sharepoint and the business was extremely happy with the response times for their queries, which was a few hours or less, (source: http://www.microsoft.com/casestudies/Windows-Azure/343-Industries/343-Industries-Gets-New-User-Insights-from-Big-Data-in-the-Cloud/710000002102) Hadoop Concepts Apache Hadoop is the leading open source Big Data platform that can store and analyze massive amounts of structured and unstructured data efficiently and can be hosted on low-cost commodity hardware. There are other technologies that complement Hadoop under the Big Data umbrella such as MongoDB (a NoSQL database), Cassandra (a document database), and VoltDB (an in-memory database). This section describes Apache Hadoop core concepts and its ecosystem. A brief history of Hadoop Doug Cutting created Hadoop and named it after his kid's stuffed yellow elephant and has no real meaning. In 2004, the initial version of Hadoop was launched as Nutch Distributed Filesystem. In February 2006, the Apache Hadoop project was officially started as a standalone development for MapReduce and HDFS. By 2008, Yahoo adopted Hadoop as the engine of its web search with a cluster size of around 10,000. In the same year, Hadoop graduated as the top-level Apache project confirming its success. In 2012, Hadoop 2.x was launched with YARN enabling Hadoop to take on various types of workloads. Today, Hadoop is known by just about every IT architect and business executive as a open source Big Data platform and is used across all industries and sizes of organizations. Core components In this section, we will explore what Hadoop is actually comprised of. At the basic level, Hadoop consists of 4 layers: Hadoop Common: A set of common libraries and utilities used by Hadoop modules. Hadoop Distributed File System (HDFS): A scalable and fault tolerant distributed filesystem for data in any form. HDFS can be installed on commodity hardware and replicates data three times (which is configurable) to make the filesystem robust and tolerate partial hardware failures. Yet Another Resource Negotiator (YARN): From Hadoop 2.0, YARN is the cluster management layer to handle various workloads on the cluster. MapReduce: MapReduce is a framework that allows parallel processing of data in Hadoop. MapReduce breaks a job into smaller tasks and distributes the load to servers that have the relevant data. The design model is "move code and not data" making this framework efficient as it reduces the network and disk I/O required to move the data. The following diagram shows you the high-level Hadoop 2.0 core components: The preceding diagram shows you the components that form the basic Hadoop framework. In the past few years, a vast array of new components have emerged in the Hadoop ecosystem that take advantage of YARN making Hadoop faster, better, and suitable for various types of workloads. The following diagram shows you the Hadoop framework with these new components: Hadoop cluster layout Each Hadoop cluster has two types of machines, which are as follows: Master nodes: This includes HDFS Name Node, HDFS Secondary Name Node, and YARN Resource Manager. Worker nodes: This includes HDFS Data Nodes and YARN Node Managers. The data nodes and node managers are colocated for optimal data locality and performance. A network switch interconnects the master and worker nodes. 
It is recommended that you have separate servers for each of the master nodes; however, it is possible to deploy all the master nodes onto a single server for development or testing workloads. The following diagram shows you the typical cluster layout:
Let's review the key functions of the master and worker nodes:
Name node: This is the master for the distributed filesystem and maintains the metadata. This metadata has the listing of all the files and the location of each block of a file stored across the various slaves. Without a name node, HDFS is not accessible. From Hadoop 2.0 onwards, name node HA (High Availability) can be configured with active and standby servers.
Secondary name node: This is an assistant to the name node. It communicates only with the name node to take snapshots of HDFS metadata at intervals that are configured at the cluster level.
YARN resource manager: This server is a scheduler that allocates the available resources in the cluster among the competing applications.
Worker nodes: The Hadoop cluster will have several worker nodes that handle two types of functions: HDFS Data Node and YARN Node Manager. It is typical that each worker node handles both functions for optimal data locality. This means processing happens on the data that is local to the node and follows the principle "move code and not data".
HDInsight Overview
HDInsight is an enterprise-ready distribution of Hadoop that runs on Windows servers and on the Azure HDInsight cloud service (PaaS). It is a 100 percent Apache Hadoop-based service in the cloud. HDInsight was developed in partnership between Hortonworks and Microsoft. Enterprises can now harness the power of Hadoop on Windows servers and the Windows Azure cloud service. The following are the key differentiators for an HDInsight distribution:
Enterprise-ready Hadoop: HDInsight is backed by Microsoft support, and runs on standard Windows servers. IT teams can leverage Hadoop with the Platform as a Service (PaaS) reducing the operations overhead.
Analytics using Excel: With Excel integration, your business users can visualize and analyze Hadoop data in compelling new ways with an easy-to-use familiar tool. The Excel add-ons PowerBI, PowerPivot, Power Query, and Power Map integrate with HDInsight.
Develop in your favorite language: HDInsight has powerful programming extensions for languages, including .Net, C#, Java, and more.
Scale using the cloud offering: The Azure HDInsight service enables customers to scale quickly as per the project needs and have a seamless interface between HDFS and Azure Blob storage.
Connect an on-premises Hadoop cluster with the cloud: With HDInsight, you can move Hadoop data from an on-site data center to the Azure cloud for backup, dev/test, and cloud bursting scenarios.
Includes NoSQL transactional capabilities: HDInsight also includes Apache HBase, a columnar NoSQL database that runs on top of Hadoop and allows large online transactional processing (OLTP).
HDInsight Emulator: The HDInsight Emulator tool provides a local development environment for Azure HDInsight without the need for a cloud subscription. This can be installed using Microsoft Web Platform Installer.
Summary
We live in a connected digital era and are witnessing unprecedented growth of data. Organizations that are able to analyze Big Data are demonstrating significant return on investment by detecting fraud, improving operations, and reducing time to analyze with a scale-out architecture. Apache Hadoop is the leading open source Big Data platform with strong and diverse ecosystem projects that enable organizations to build a modern data architecture. At the core, Hadoop has two key components: the Hadoop Distributed File System, also known as HDFS, and a cluster resource manager known as YARN. YARN has enabled Hadoop to be a true multi-use data platform that can handle batch processing, real-time streaming, interactive SQL, and others.
Microsoft HDInsight is an enterprise-ready distribution of Hadoop on the cloud that has been developed in partnership between Hortonworks and Microsoft. The key benefits of HDInsight include scaling up/down as required, analysis using Excel, connecting an on-premises Hadoop cluster with the cloud, and flexible programming and support for NoSQL transactional databases.
Resources for Article:
Further resources on this subject: Hadoop and HDInsight in a Heartbeat [article] Sizing and Configuring your Hadoop Cluster [article] Introducing Kafka [article]

article-image-unboxing-docker
Packt
22 Jan 2015
10 min read
Save for later

Unboxing Docker

Packt
22 Jan 2015
10 min read
In this article by Shrikrishna Holla, author of the book Orchestrating Docker, in this article, you will learn how to install Docker on various systems, both in development and in production. For Linux-based systems, since a kernel is already available, installation is as simple as the apt-get install or yum install commands. However, to run Docker on non-Linux operating systems such as OSX and Windows, you will need to install a helper application developed by Docker Inc., called Boot2Docker. This will install a lightweight Linux VM on VirtualBox, which will make Docker available through port 2375, assigned by the Internet Assigned Numbers Authority (IANA). You will have installed Docker on your system, be it in development or production, and verified it. This article explains: Introducing Docker Installing Docker Ubuntu (14.04 and 12.04) Mac OSX and Windows (For more resources related to this topic, see here.) Docker was developed by DotCloud Inc. (Currently Docker Inc.), as the framework they built their Platform as a Service (PaaS) upon. When they found increasing developer interest in the technology, they released it as open source and have since announced that they will completely focus on the Docker technology's development, which is good news as it means continual support and improvement for the platform. There have been many tools and technologies aimed at making distributed applications possible, even easy to set up, but none of them have as wide an appeal as Docker does, which is primarily because of its cross-platform nature and friendliness towards both system administrators and developers. It is possible to set up Docker in any OS, be it Windows, OSX, or Linux, and Docker containers work the same way everywhere. This is extremely powerful, as it enables a write-once-run-anywhere workflow. Docker containers are guaranteed to run the same way, be it on your development desktop, a bare-metal server, virtual machine, data center, or cloud. No longer do you have the situation where a program runs on the developer's laptop but not on the server. The nature of the workflow that comes with Docker is such that developers can completely concentrate on building applications and getting them running inside the containers, whereas sysadmins can work on running the containers in deployment. This separation of roles and the presence of a single underlying tool to enable it simplifies the management of code and the deployment process. But don't virtual machines already provide all of these features? Virtual Machines (VMs) are fully virtualized. This means that they share minimal resources amongst themselves and each VM has its own set of resources allocated to it. While this allows fine-grained configuration of the individual VMs, minimal sharing also translates into greater resource usage, redundant running processes (an entire operating system needs to run!), and hence a performance overhead. Docker, on the other hand, builds on a container technology that isolates a process and makes it believe that it is running on a standalone operating system. The process still runs in the same operating system as its host, sharing its kernel. It uses a layered copy-on-write filesystem called Another Unionfs (UFS), which shares common portions of the operating system between containers. 
Greater sharing, of course, can only mean less isolation, but vast improvements in Linux process's resource management solutions such as namespaces and cgroups have allowed Docker to achieve VM-like sandboxing of processes and yet maintain a very small resource footprint. Installing Docker Docker is available in the standard repositories of most major Linux distributions. We will be looking at the installation procedures for Docker in Ubuntu 14.04 and 12.04 (Trusty and Precise), Mac OSX, and Windows. If you are currently using an operating system not listed above, you can look up the instructions for your operating system at https://docs.docker.com/installation/#installation. Installing Docker in Ubuntu Docker is supported by Ubuntu from Ubuntu 12.04 onwards. Remember that you still need a 64-bit operating system to run Docker. Let's take a look at the installation instructions for Ubuntu 14.04. Installing Docker in Ubuntu Trusty 14.04 LTS Docker is available as a package in the Ubuntu Trusty release's software repositories under the name of docker.io: $ sudo apt-get update $ sudo apt-get -y install docker.io That's it! You have now installed Docker onto your system. However, since the command has been renamed docker.io, you will have to run all Docker commands with docker.io instead of docker. The package is named docker.io because it conflicts with another KDE3/GNOME2 package called docker. If you rather want to run commands as docker, you can create a symbolic link to the /usr/local/bin directory. The second command adds autocomplete rules to bash: $ sudo ln -s /usr/bin/docker.io /usr/local/bin/docker $ sudo sed -i '$acomplete -F _docker docker' > /etc/bash_completion.d/docker.io Installing Docker in Ubuntu Precise 12.04 LTS Ubuntu 12.04 comes with an older kernel (3.2), which is incompatible with some of the dependencies of Docker. So we will have to upgrade it: $ sudo apt-get update $ sudo apt-get -y install linux-image-generic-lts-raring linux-headers-generic-lts-raring $ sudo reboot The kernel that we just installed comes with AUFS built in, which is also a Docker requirement. Now let's wrap up the installation: $ curl -s https://get.docker.io/ubuntu/ | sudo sh This is a curl script for easy installation. Looking at the individual pieces of this script will allow us to understand the process better: First, the script checks whether our Advanced Package Tool (APT) system can deal with https URLs, and installs apt-transport-https if it cannot: # Check that HTTPS transport is available to APT if [ ! -e /usr/lib/apt/methods/https ]; then apt-get update apt-get install -y apt-transport-https fi Then it will add the Docker repository to our local key chain: $ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9 You may receive a warning that the package isn't trusted. Answer yes to continue the installation. Finally, it adds the Docker repository to the APT sources list, and updates and installs the lxc-docker package: $ sudo sh -c "echo deb https://get.docker.io/ubuntu docker main /etc/apt/sources.list.d/docker.list" $ sudo apt-get update $ sudo apt-get install lxc-docker Docker versions before 0.9 had a hard dependency on LXC (Linux Containers) and hence couldn't be installed on VMs hosted on OpenVZ. 
But since 0.9, the execution driver has been decoupled from the Docker core, which allows us to use one of numerous isolation tools such as LXC, OpenVZ, systemd-nspawn, libvirt-lxc, libvirt-sandbox, qemu/kvm, BSD Jails, Solaris Zones, and even chroot! However, it comes by default with an execution driver for Docker's own containerization engine, called libcontainer, which is a pure Go library that can access the kernel's container APIs directly, without any other dependencies. To use any other containerization engine, say LXC, you can use the-e flag, like so: $ docker -d -e lxc. Now that we have Docker installed, we can get going at full steam! There is one problem though: software repositories like APT are usually behind times and often have older versions. Docker is a fast-moving project and a lot has changed in the last few versions. So it is always recommended to have the latest version installed. Upgrading Docker You can upgrade Docker as and when it is updated in the APT repositories. An alternative (and better) method is to build from source. It is recommended to upgrade to the newest stable version as the newer versions might contain critical security updates and bug fixes. Also, the examples in this book assume a Docker version greater than 1.0, whereas Ubuntu's standard repositories package a much older version. Mac OSX and Windows Docker depends on the Linux kernel, so we need to run Linux in a VM and install and use Docker through it. Boot2Docker is a helper application built by Docker Inc. that installs a VM containing a lightweight Linux distribution made specifically to run Docker containers. It also comes with a client that provides the same Application Program Interface (API) as that of Docker, but interfaces with the docker daemon running in the VM, allowing us to run commands from within the OSX/Windows terminal. To install Boot2Docker, carry out the following steps: Download the latest release of Boot2Docker for your operating system from http://boot2docker.io/. The installation image is shown as follows: Run the installer, which will install VirtualBox and the Boot2Docker management tool. Run Boot2docker. The first run will ask you for a Secure Shell (SSH) key passphrase. Subsequent runs of the script will connect you to a shell session in the virtual machine. If needed, the subsequent runs will initialize a new VM and start it. Alternately, to run Boot2Docker, you can also use the terminal command boot2docker: $ boot2docker init # First run $ boot2docker start $ export DOCKER_HOST=tcp://$(boot2docker ip 2>/dev/null):2375 You will have to run boot2docker init only once. It will ask you for an SSH key passphrase. This passphrase is subsequently used by boot2docker ssh to authenticate SSH access. Once you have initialized Boot2Docker, you can subsequently use it with the boot2docker start and boot2docker stop commands. DOCKER_HOST is an environment variable that, when set, indicates to the Docker client the location of the docker daemon. A port forwarding rule is set to the boot2Docker VM's port 2375 (where the docker daemon runs). You will have to set this variable in every terminal shell you want to use Docker in. Bash allows you to insert commands by enclosing subcommands within `` or $(). These will be evaluated first and the result will be substituted in the outer commands. If you are the kind that loves to poke around, the Boot2Docker default user is docker and the password is tcuser. 
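Whichever route you took (native Linux packages or Boot2Docker), a quick way to confirm that the client can reach the daemon is to run a few harmless commands. These are standard Docker commands; the exact versions and image names in the output will differ on your machine, and the last one assumes you are online so that the test image can be pulled:
$ docker version          # shows client and daemon versions if the connection works
$ docker info             # prints details about the daemon, storage driver, and containers
$ docker run hello-world  # pulls a tiny test image and runs it once
If docker version reports both a client and a server section, your installation is ready for the rest of the book.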
The boot2Docker management tool provides several commands: $ boot2docker Usage: boot2docker [<options>] {help|init|up|ssh|save|down|poweroff|reset| restart|config|status|info|ip|delete|download|version} [<args>] When using boot2Docker, the DOCKER_HOST environment variable has to be available in the terminal session for Docker commands to work. So, if you are getting the Post http:///var/run/docker.sock/v1.12/containers/create: dial unix /var/run/docker.sock: no such file or directory error, it means that the environment variable is not assigned. It is easy to forget to set this environment variable when you open a new terminal. For OSX users, to make things easy, add the following line to your .bashrc or .bash_profile shells: alias setdockerhost='export DOCKER_HOST=tcp://$(boot2docker ip 2>/dev/null):2375' Now, whenever you open a new terminal or get the above error, just run the following command: $ setdockerhost This image shows how the terminal screen will look like when you have logged into the Boot2Docker VM. Summary I hope you got hooked to Docker. Docker technology will take you into the Docker world and try to dazzle you with its awesomeness. In this article, you learned some history and some basics on Docker and how it works. We saw how it is different from and advantageous over VM. Then we proceeded to install Docker on our development setup, be it Ubuntu, Mac, or Windows. Now you can pat your self on the back and proceed with Docker technology. Resources for Article: Further resources on this subject: Managing Heroku from the Command Line [article] Target Exploitation [article] Wireless and Mobile Hacks [article]
article-image-lets-get-started-active-di-rectory
Packt
22 Jan 2015
11 min read
Save for later

Let's Get Started with Active Directory

Packt
22 Jan 2015
11 min read
In this article by Uma Yellapragada, author of the book Active Directory with PowerShell, we will see how the Powershell cmdlets and modules are used for managing Active Directory. (For more resources related to this topic, see here.) Welcome to managing Active Directory using PowerShell. There are lot of good books from Packt Publishing that you might want to refer to improve your PowerShell skills. Assuming that you know the basics of PowerShell, this book further helps you to manage Active Directory using PowerShell. Do not worry if you are not familiar with PowerShell. You can still make use of the content in this book because most of the one-liners quoted in this book are self-explanatory. This chapter will take you through some of the essential tools that are required for managing Active Directory using PowerShell: The Microsoft Active Directory PowerShell module The Quest Active Directory PowerShell module Native PowerShell cmdlets Details of how to get these tools, install, and configure them are also provided in this chapter. The content in this book completely relies on these tools to query Active Directory, so it is important to install and configure them before you proceed with further chapters in this book. Though you can install and use these tools on legacy operating systems such as Windows XP, Windows 7, Windows Server 2003, Windows Server 2003 R2, Windows Server 2008, Windows Server 2008 R2, and so on, we will focus mostly on using them on the latest versions of operating systems, such as Windows 8.1 and Windows Server 2012 R2. Most of the operations performed on Windows 8.1 and Windows Server 2012 work on its predecessors. Any noticeable differences will be highlighted as far as possible. Another reason for using the latest versions of operating systems for demonstration is the features list that they provide. When the Microsoft Active Directory PowerShell module was initially introduced with Windows Server 2008 R2, it came with 76 cmdlets. In Windows Server 2012, the number of cmdlets increased from 76 to 135. Similarly, the Windows Server 2012 R2 release has 147 Active Directory cmdlets. Looking at this pattern, it is clear that Microsoft is focusing on bringing more and more functionality into the Active Directory PowerShell module with its new releases. This means the types of actions we can perform with the Microsoft Active Directory module are increasing. Because of these reasons, Windows 8.1 and Windows Server 2012 R2 are being used for demonstration so that you can learn more about managing Active Directory using PowerShell. To see how many cmdlets a module has, use the following commands once you have the Active Directory PowerShell module installed using the approach that is discussed later in this chapter: Import-Module ActiveDirectory First, import the Active Directory module in a PowerShell window. You will see a progress bar as shown in the following screenshot: Once the module is imported, then you can run the following command to verify how many cmdlets Active Directory module has: (Get-Command -Module ActiveDirectory).Count As you can see in the following screenshot, there are 147 cmdlets available in Active Directory module on a Windows Server 2012 R2 server: Ways to automate Active Directory operations Active Directory operations can be automated in different ways. You can use C#, VB, command line tools (such as dsquery), VBScript, PowerShell, Perl, and so on. 
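As a quick illustration of the difference in style, here is the same task, looking up a user account, expressed first with the dsquery command-line tool and then with an Active Directory PowerShell cmdlet. The account name used here is only a placeholder:
dsquery user -samid "john.smith"
Get-ADUser -Identity "john.smith"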
Since this book focuses on using PowerShell, let's examine the methodologies that are widely used to automate Active Directory operations using PowerShell. There are three ways available to manage Active Directory using PowerShell, and each has its own advantages and operating environments:
The Microsoft Active Directory module
The Quest Active Directory PowerShell cmdlets
The native method of PowerShell
Let's dig into each of these and understand a bit more in terms of how to install, configure, and use them.
The Microsoft Active Directory module
As the name indicates, this PowerShell module is developed and supported by Microsoft itself. It contains a group of cmdlets that you can use to manage Active Directory Domain Services (AD DS) and Active Directory Lightweight Directory Services (AD LDS). The Microsoft Active Directory module was introduced with the Windows Server 2008 R2 operating system, and you need at least this version of the OS to make use of the module. The module comes as an optional feature on Windows Server 2008 R2, Windows Server 2012, and Windows Server 2012 R2, and it is installed by default when you install the AD DS or AD LDS server roles, or when you promote these servers to domain controllers. You can have this module installed on Windows 7 or Windows 8 by installing the Remote Server Administration Tools (RSAT) feature. This module works by querying Active Directory through a service called Active Directory Web Services (ADWS), which is available in Windows Server 2008 R2 or later operating systems. This means your domain should have at least one domain controller running Windows Server 2008 R2 or above for the module to work. Don't be disappointed if none of your domain controllers have been upgraded to Windows Server 2008 R2: Microsoft has released a component called Active Directory Management Gateway Service that works like the Windows Server 2008 R2 ADWS service and provides the same functionality on Windows Server 2003 or Windows Server 2008 domain controllers. You can read more about ADWS and the gateway service functionality at http://technet.microsoft.com/en-us/library/dd391908(v=ws.10).aspx
Installing Active Directory
As mentioned earlier, if you promote a Windows Server 2008 R2 or later operating system to a domain controller, there is no need to install this module explicitly; it comes with the domain controller installation process. Installing the Active Directory module on Windows 7, Windows 8, and Windows 8.1 is a two-step process: first, we install the Remote Server Administration Tools (RSAT) kit for the respective operating system; then we enable the Active Directory module, which is part of RSAT.
Installing the Remote Server Administration Tool kit
First, download the RSAT package from one of the following links, based on your operating system, and install it with administrative privileges:
RSAT for Windows 8.1: http://www.microsoft.com/en-us/download/details.aspx?id=39296
RSAT for Windows 8: http://www.microsoft.com/en-us/download/details.aspx?id=28972
RSAT for Windows 7 with SP1: http://www.microsoft.com/en-us/download/details.aspx?id=7887
Installing the Active Directory module
Once the RSAT package is installed, you need to enable Remote Server Administration Tools | Role Administration Tools | AD DS and AD LDS Tools | Active Directory module for Windows PowerShell via the Turn Windows features on or off wizard, which you will find in the Control Panel of the Windows 7 or Windows 8 operating systems.
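If you prefer the command line to the Turn Windows features on or off wizard, the same feature can usually be enabled from an elevated PowerShell prompt on Windows 8 or Windows 8.1 with the DISM cmdlets. Treat this as a hedged sketch: the exact feature name varies between Windows versions, so list the RSAT-related features first and substitute the name that is returned.
# List the RSAT-related optional features to find the exact feature name
Get-WindowsOptionalFeature -Online |
    Where-Object { $_.FeatureName -like 'RemoteServerAdministrationTools*' } |
    Select-Object FeatureName, State
# Enable the Active Directory module feature using the FeatureName found above
Enable-WindowsOptionalFeature -Online -FeatureName <FeatureName-from-the-list-above>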
To install Active Directory module on Windows Server 2008, Windows Server 2008 R2, and Windows Server 2012 member servers, there is no need to install additional components. They are already part of the available features and it's just a matter of adding the feature to the operating system. This can be done using PowerShell or a regular GUI approach. If you want to enable this feature using PowerShell in the aforementioned server operating systems, then use the following commands: Import-Module ServerManager Add-WindowsFeature RSAT-AD-PowerShell The RSAT package comes with the build on Windows Server 2008 R2 and Windows Server 2012. No need to install RSAT explicitly. The Server Manager PowerShell module in these operating systems contains the cmdlet, Add-WindowsFeature, which is used for installing features. In this case, we are installing Active Directory module for the Windows PowerShell feature in the AD DS and AD LDS tools. If you want to perform this installation on remote servers, you can use the PSRemoting feature in PowerShell. This is the best approach if you want to deploy Active Directory module on all your servers in your environment. This Active Directory module for Windows PowerShell can be installed using GUI interface as well. You need to use Server Manager to add Active Directory Module for Windows PowerShell using the Add Roles and Features Wizard as shown in following screenshot: Testing the functionality After installation, you can verify the functionality of Active Directory module by importing it and running a few basic cmdlets. A cmdlet is a simple command that is used in the Windows PowerShell environment. You can read more about cmdlets at http://msdn.microsoft.com/en-us/library/ms714395(v=vs.85).aspx. Your installation is successful if you see your domain information after running the Get-ADDomain cmdlet, as shown in the following: Import-Module ActiveDirectory Get-ADDomain One good thing about PowerShell is you can avoid the hassle of typing the whole command in the PowerShell window by using the Tab Expansion feature. You can type part of the command and press the Tab key to autocomplete it. If there are multiple commands (or cmdlets) that match the string you typed, then use Tab multiple times to select the one you need. It's pretty handy because some of the cmdlets in Active Directory are considerably long and it can get really frustrating to type them. Refer to the TechNet page at http://technet.microsoft.com/en-us/library/dd315316.aspx in order to understand how you can use this feature of PowerShell. Quest Active Directory PowerShell cmdlets Previously, you learned that Microsoft Active Directory (MS AD) module was introduced with Windows Server 2008 R2. So, how did system administrators manage their Active Directory environments before the introduction of MS AD module? Quest Active Directory PowerShell cmdlets were present at that time to simplify AD operations. This Quest module has a bunch of cmdlets to perform various operations in Active Directory. Even after Microsoft released Active Directory module, many people still use Quest AD cmdlets because of its simplicity and the wide variety of management options it provides. Quest AD module is part of the Quest ActiveRoles Server product, which is used for managing Active Directory objects. This Quest AD module is also referred to as ActiveRoles Management Shell for Active Directory because it is an integral part of the ActiveRoles product. 
Installing Quest Quest software (now acquired by Dell) allows you to download ActiveRoles Management Shell for free and you can download a copy from https://support.software.dell.com/download-install-detail/5024645. You will find two versions of Quest AD Management Shell in the download page. Be sure to download the latest one: v1.6.0. While trying to install the MSI, you might get a prompt saying Microsoft .NET Framework 3.5 Service Pack 1 or later is required. You will experience this even if you have .NET framework 4.0 installed on your computer. It seems the MSI is specifically looking for .NET 3.5 SP1. So, ensure that you have .NET Framework 3.5 SP1 installed before you start installing the Quest AD management Shell MSI. You might want to refer to the TechNet article at http://technet.microsoft.com/en-us/library/dn482071.aspx to understand NET Framework 3.5 installation process on Windows Server 2012 R2. After the completion of MSI, you can start using this module in two ways. You can either search in Program Files for the application with the name ActiveRoles Management Shell for Active Directory or you can add the Quest snap-in into the regular PowerShell window. It's preferred to add the snap-in directly into existing PowerShell windows rather than opening a new Quest AD Shell when you want to manage Active Directory using Quest cmdlets. Also if you are authoring any scripts based on Quest AD cmdlets, it is best to add the snap-in in your code rather than asking the script users to run it from a Quest AD Shell window. The Quest AD Snap-in can be added to an existing PowerShell window using the following command: Add-PSSnapin Quest.ActiveRoles.ADManagement After adding the snap-in, you can list the cmdlets provided by this snap-in using the following command: Get-Command -Module Quest.ActiveRoles.ADManagement Get-Command is the cmdlet used to list cmdlets or functions inside a given module or snap-in after importing them. The version (v1.6.0) of Quest AD Shell has 95 cmdlets. Unlike Microsoft Active Directory module, the number of cmdlets will not change from one operating system to another in Quest AD Shell. The list of cmdlets is the same irrespective of the operating system where the tool is installed. One advantage of Quest AD Shell is that it doesn't need Active Directory Web services, which is mandatory for Microsoft Active Directory module. Quest AD Shell works with Windows Server 2003-based domain controllers as well without the need to install Active Directory Management Gateway Service. Testing the functionality Open a new PowerShell window and try the following commands. The Get-QADRootDSE cmdlet should return your current domain information. All the Quest AD Shell cmdlets will have the word QAD prefixed to the noun: Add-PSSnapin -Name Quest.ActiveRoles.ADManagement Get-QADRootDSE Summary In this article, we reviewed the automations operations of Active Directory, its module. The remote server administration with its functionality and different cmdlets to perform the operations on it. Resources for Article: Further resources on this subject: So, what is PowerShell 3.0 WMI? [article] Unleashing Your Development Skills with PowerShell [article] How to use PowerShell Web Access to manage Windows Server [article]

article-image-highcharts-configurations
Packt
21 Jan 2015
53 min read
Save for later

Highcharts Configurations

Packt
21 Jan 2015
53 min read
This article is written by Joe Kuan, the author of Learning Highcharts 4. All Highcharts graphs share the same configuration structure and it is crucial for us to become familiar with the core components. However, it is not possible to go through all the configurations within the book. In this article, we will explore the functional properties that are most used and demonstrate them with examples. We will learn how Highcharts manages layout, and then explore how to configure axes, specify single series and multiple series data, followed by looking at formatting and styling tool tips in both JavaScript and HTML. After that, we will get to know how to polish our charts with various types of animations and apply color gradients. Finally, we will explore the drilldown interactive feature. In this article, we will cover the following topics: Understanding Highcharts layout Framing the chart with axes (For more resources related to this topic, see here.) Configuration structure In the Highcharts configuration object, the components at the top level represent the skeleton structure of a chart. The following is a list of the major components that are covered in this article: chart: This has configurations for the top-level chart properties such as layouts, dimensions, events, animations, and user interactions series: This is an array of series objects (consisting of data and specific options) for single and multiple series, where the series data can be specified in a number of ways xAxis/yAxis/zAxis: This has configurations for all the axis properties such as labels, styles, range, intervals, plotlines, plot bands, and backgrounds tooltip: This has the layout and format style configurations for the series data tool tips drilldown: This has configurations for drilldown series and the ID field associated with the main series title/subtitle: This has the layout and style configurations for the chart title and subtitle legend: This has the layout and format style configurations for the chart legend plotOptions: This contains all the plotting options, such as display, animation, and user interactions, for common series and specific series types exporting: This has configurations that control the layout and the function of print and export features For reference information concerning all configurations, go to http://api.highcharts.com. Understanding Highcharts' layout Before we start to learn how Highcharts layout works, it is imperative that we understand some basic concepts first. First, set a border around the plot area. To do that we can set the options of plotBorderWidth and plotBorderColor in the chart section, as follows:         chart: {                renderTo: 'container',                type: 'spline',                plotBorderWidth: 1,                plotBorderColor: '#3F4044'        }, The second border is set around the Highcharts container. Next, we extend the preceding chart section with additional settings:         chart: {                renderTo: 'container',                ....                borderColor: '#a1a1a1',                borderWidth: 2,                borderRadius: 3        }, This sets the container border color with a width of 2 pixels and corner radius of 3 pixels. As we can see, there is a border around the container and this is the boundary that the Highcharts display cannot exceed: By default, Highcharts displays have three different areas: spacing, labeling, and plot area. The plot area is the area inside the inner rectangle that contains all the plot graphics. 
The labeling area is the area where labels such as title, subtitle, axis title, legend, and credits go, around the plot area, so that it is between the edge of the plot area and the inner edge of the spacing area. The spacing area is the area between the container border and the outer edge of the labeling area. The following screenshot shows three different kinds of areas. A gray dotted line is inserted to illustrate the boundary between the spacing and labeling areas. Each chart label position can be operated in one of the following two layouts: Automatic layout: Highcharts automatically adjusts the plot area size based on the labels' positions in the labeling area, so the plot area does not overlap with the label element at all. Automatic layout is the simplest way to configure, but has less control. This is the default way of positioning the chart elements. Fixed layout: There is no concept of labeling area. The chart label is specified in a fixed location so that it has a floating effect on the plot area. In other words, the plot area side does not automatically adjust itself to the adjacent label position. This gives the user full control of exactly how to display the chart. The spacing area controls the offset of the Highcharts display on each side. As long as the chart margins are not defined, increasing or decreasing the spacing area has a global effect on the plot area measurements in both automatic and fixed layouts. Chart margins and spacing settings In this section, we will see how chart margins and spacing settings have an effect on the overall layout. Chart margins can be configured with the properties margin, marginTop, marginLeft, marginRight, and marginBottom, and they are not enabled by default. Setting chart margins has a global effect on the plot area, so that none of the label positions or chart spacing configurations can affect the plot area size. Hence, all the chart elements are in a fixed layout mode with respect to the plot area. The margin option is an array of four margin values covered for each direction, the same as in CSS, starting from north and going clockwise. Also, the margin option has a lower precedence than any of the directional margin options, regardless of their order in the chart section. Spacing configurations are enabled by default with a fixed value on each side. These can be configured in the chart section with the property names spacing, spacingTop, spacingLeft, spacingBottom, and spacingRight. In this example, we are going to increase or decrease the margin or spacing property on each side of the chart and observe the effect. The following are the chart settings:             chart: {                renderTo: 'container',                type: ...                marginTop: 10,                marginRight: 0,                spacingLeft: 30,                spacingBottom: 0            }, The following screenshot shows what the chart looks like: The marginTop property fixes the plot area's top border 10 pixels away from the container border. It also changes the top border into fixed layout for any label elements, so the chart title and subtitle float on top of the plot area. The spacingLeft property increases the spacing area on the left-hand side, so it pushes the y axis title further in. As it is in automatic layout (without declaring marginLeft), it also pushes the plot area's west border in. Setting marginRight to 0 will override all the default spacing on the chart's right-hand side and change it to fixed layout mode. 
Finally, setting spacingBottom to 0 makes the legend touch the lower bar of the container, so it also stretches the plot area downwards. This is because the bottom edge is still in automatic layout even though spacingBottom is set to 0. Chart label properties Chart labels such as xAxis.title, yAxis.title, legend, title, subtitle, and credits share common property names, as follows: align: This is for the horizontal alignment of the label. Possible keywords are 'left', 'center', and 'right'. As for the axis title, it is 'low', 'middle', and 'high'. floating: This is to give the label position a floating effect on the plot area. Setting this to true will cause the label position to have no effect on the adjacent plot area's boundary. margin: This is the margin setting between the label and the side of the plot area adjacent to it. Only certain label types have this setting. verticalAlign: This is for the vertical alignment of the label. The keywords are 'top', 'middle', and 'bottom'. x: This is for horizontal positioning in relation to alignment. y: This is for vertical positioning in relation to alignment. As for the labels' x and y positioning, they are not used for absolute positioning within the chart. They are designed for fine adjustment with the label alignment. The following diagram shows the coordinate directions, where the center represents the label location: We can experiment with these properties with a simple example of the align and y position settings, by placing both title and subtitle next to each other. The title is shifted to the left with align set to 'left', whereas the subtitle alignment is set to 'right'. In order to make both titles appear on the same line, we change the subtitle's y position to 15, which is the same as the title's default y value:  title: {     text: 'Web browsers ...',     align: 'left' }, subtitle: {     text: 'From 2008 to present',     align: 'right',     y: 15 }, The following is a screenshot showing both titles aligned on the same line: In the following subsections, we will experiment with how changes in alignment for each label element affect the layout behavior of the plot area. Title and subtitle alignments Title and subtitle have the same layout properties, and the only differences are that the default values and title have the margin setting. Specifying verticalAlign for any value changes from the default automatic layout to fixed layout (it internally switches floating to true). However, manually setting the subtitle's floating property to false does not switch back to automatic layout. The following is an example of title in automatic layout and subtitle in fixed layout:     title: {       text: 'Web browsers statistics'    },    subtitle: {       text: 'From 2008 to present',       verticalAlign: 'top',       y: 60       }, The verticalAlign property for the subtitle is set to 'top', which switches the layout into fixed layout, and the y offset is increased to 60. The y offset pushes the subtitle's position further down. Due to the fact that the plot area is not in an automatic layout relationship to the subtitle anymore, the top border of the plot area goes above the subtitle. However, the plot area is still in automatic layout towards the title, so the title is still above the plot area: Legend alignment Legends show different behavior for the verticalAlign and align properties. Apart from setting the alignment to 'center', all other settings in verticalAlign and align remain in automatic positioning. 
The following is an example of a legend located on the right-hand side of the chart. The verticalAlign property is switched to the middle of the chart, where the horizontal align is set to 'right':           legend: {                align: 'right',                verticalAlign: 'middle',                layout: 'vertical'          }, The layout property is assigned to 'vertical' so that it causes the items inside the legend box to be displayed in a vertical manner. As we can see, the plot area is automatically resized for the legend box: Note that the border decoration around the legend box is disabled in the newer version. To display a round border around the legend box, we can add the borderWidth and borderRadius options using the following:           legend: {                align: 'right',                verticalAlign: 'middle',                layout: 'vertical',                borderWidth: 1,                borderRadius: 3          }, Here is the legend box with a round corner border: Axis title alignment Axis titles do not use verticalAlign. Instead, they use the align setting, which is either 'low', 'middle', or 'high'. The title's margin value is the distance between the axis title and the axis line. The following is an example of showing the y-axis title rotated horizontally instead of vertically (which it is by default) and displayed on the top of the axis line instead of next to it. We also use the y property to fine-tune the title location:             yAxis: {                title: {                    text: 'Percentage %',                    rotation: 0,                    y: -15,                    margin: -70,                    align: 'high'                },                min: 0            }, The following is a screenshot of the upper-left corner of the chart showing that the title is aligned horizontally at the top of the y axis. Alternatively, we can use the offset option instead of margin to achieve the same result. Credits alignment Credits is a bit different from other label elements. It only supports the align, verticalAlign, x, and y properties in the credits.position property (shorthand for credits: { position: … }), and is also not affected by any spacing setting. Suppose we have a graph without a legend and we have to move the credits to the lower-left area of the chart, the following code snippet shows how to do it:             legend: {                enabled: false            },            credits: {                position: {                   align: 'left'                },                text: 'Joe Kuan',                href: 'http://joekuan.wordpress.com'            }, However, the credits text is off the edge of the chart, as shown in the following screenshot: Even if we move the credits label to the right with x positioning, the label is still a bit too close to the x axis interval label. We can introduce extra spacingBottom to put a gap between both labels, as follows:             chart: {                   spacingBottom: 30,                    ....            },            credits: {                position: {                   align: 'left',                   x: 20,                   y: -7                },            },            .... The following is a screenshot of the credits with the final adjustments: Experimenting with an automatic layout In this section, we will examine the automatic layout feature in more detail. 
For the sake of simplifying the example, we will start with only the chart title and without any chart spacing settings:      chart: {         renderTo: 'container',         // border and plotBorder settings         borderWidth: 2,         .....     },     title: {            text: 'Web browsers statistics,     }, From the preceding example, the chart title should appear as expected between the container and the plot area's borders: The space between the title and the top border of the container has the default setting spacingTop for the spacing area (a default value of 10-pixels high). The gap between the title and the top border of the plot area is the default setting for title.margin, which is 15-pixels high. By setting spacingTop in the chart section to 0, the chart title moves up next to the container top border. Hence the size of the plot area is automatically expanded upwards, as follows: Then, we set title.margin to 0; the plot area border moves further up, hence the height of the plot area increases further, as follows: As you may notice, there is still a gap of a few pixels between the top border and the chart title. This is actually due to the default value of the title's y position setting, which is 15 pixels, large enough for the default title font size. The following is the chart configuration for setting all the spaces between the container and the plot area to 0: chart: {     renderTo: 'container',     // border and plotBorder settings     .....     spacingTop: 0},title: {     text: null,     margin: 0,     y: 0} If we set title.y to 0, all the gap between the top edge of the plot area and the top container edge closes up. The following is the final screenshot of the upper-left corner of the chart, to show the effect. The chart title is not visible anymore as it has been shifted above the container: Interestingly, if we work backwards to the first example, the default distance between the top of the plot area and the top of the container is calculated as: spacingTop + title.margin + title.y = 10 + 15 + 15 = 40 Therefore, changing any of these three variables will automatically adjust the plot area from the top container bar. Each of these offset variables actually has its own purpose in the automatic layout. Spacing is for the gap between the container and the chart content; thus, if we want to display a chart nicely spaced with other elements on a web page, spacing elements should be used. Equally, if we want to use a specific font size for the label elements, we should consider adjusting the y offset. Hence, the labels are still maintained at a distance and do not interfere with other components in the chart. Experimenting with a fixed layout In the preceding section, we have learned how the plot area dynamically adjusted itself. In this section, we will see how we can manually position the chart labels. First, we will start with the example code from the beginning of the Experimenting with automatic layout section and set the chart title's verticalAlign to 'bottom', as follows: chart: {    renderTo: 'container',    // border and plotBorder settings    .....},title: {    text: 'Web browsers statistics',    verticalAlign: 'bottom'}, The chart title is moved to the bottom of the chart, next to the lower border of the container. 
Notice that this setting has changed the title into floating mode; more importantly, the legend still remains in the default automatic layout of the plot area: Be aware that we haven't specified spacingBottom, which has a default value of 15 pixels in height when applied to the chart. This means that there should be a gap between the title and the container bottom border, but none is shown. This is because the title.y position has a default value of 15 pixels in relation to spacing. According to the diagram in the Chart label properties section, this positive y value pushes the title towards the bottom border; this compensates for the space created by spacingBottom. Let's make a bigger change to the y offset position this time to show that verticalAlign is floating on top of the plot area:  title: {     text: 'Web browsers statistics',     verticalAlign: 'bottom',     y: -90 }, The negative y value moves the title up, as shown here: Now the title is overlapping the plot area. To demonstrate that the legend is still in automatic layout with regard to the plot area, here we change the legend's y position and the margin settings, which is the distance from the axis label:                legend: {                   margin: 70,                   y: -10               }, This has pushed up the bottom side of the plot area. However, the chart title still remains in fixed layout and its position within the chart hasn't been changed at all after applying the new legend setting, as shown in the following screenshot: By now, we should have a better understanding of how to position label elements, and their layout policy relating to the plot area. Framing the chart with axes In this section, we are going to look into the configuration of axes in Highcharts in terms of their functional area. We will start off with a plain line graph and gradually apply more options to the chart to demonstrate the effects. Accessing the axis data type There are two ways to specify data for a chart: categories and series data. For displaying intervals with specific names, we should use the categories field that expects an array of strings. Each entry in the categories array is then associated with the series data array. Alternatively, the axis interval values are embedded inside the series data array. Then, Highcharts extracts the series data for both axes, interprets the data type, and formats and labels the values appropriately. 
The following is a straightforward example showing the use of categories:     chart: {        renderTo: 'container',        height: 250,        spacingRight: 20    },    title: {        text: 'Market Data: Nasdaq 100'    },    subtitle: {        text: 'May 11, 2012'    },    xAxis: {        categories: [ '9:30 am', '10:00 am', '10:30 am',                       '11:00 am', '11:30 am', '12:00 pm',                       '12:30 pm', '1:00 pm', '1:30 pm',                       '2:00 pm', '2:30 pm', '3:00 pm',                       '3:30 pm', '4:00 pm' ],         labels: {             step: 3         }     },     yAxis: {         title: {             text: null         }     },     legend: {         enabled: false     },     credits: {         enabled: false     },     series: [{         name: 'Nasdaq',         color: '#4572A7',         data: [ 2606.01, 2622.08, 2636.03, 2637.78, 2639.15,                 2637.09, 2633.38, 2632.23, 2632.33, 2632.59,                 2630.34, 2626.89, 2624.59, 2615.98 ]     }] The preceding code snippet produces a graph that looks like the following screenshot: The first name in the categories field corresponds to the first value, 9:30 am, 2606.01, in the series data array, and so on. Alternatively, we can specify the time values inside the series data and use the type property of the x axis to format the time. The type property supports 'linear' (default), 'logarithmic', or 'datetime'. The 'datetime' setting automatically interprets the time in the series data into human-readable form. Moreover, we can use the dateTimeLabelFormats property to predefine the custom format for the time unit. The option can also accept multiple time unit formats. This is for when we don't know in advance how long the time span is in the series data, so each unit in the resulting graph can be per hour, per day, and so on. The following example shows how the graph is specified with predefined hourly and minute formats. The syntax of the format string is based on the PHP strftime function:     xAxis: {         type: 'datetime',          // Format 24 hour time to AM/PM          dateTimeLabelFormats: {                hour: '%I:%M %P',              minute: '%I %M'          }               },     series: [{         name: 'Nasdaq',         color: '#4572A7',         data: [ [ Date.UTC(2012, 4, 11, 9, 30), 2606.01 ],                  [ Date.UTC(2012, 4, 11, 10), 2622.08 ],                   [ Date.UTC(2012, 4, 11, 10, 30), 2636.03 ],                  .....                ]     }] Note that the x axis is in the 12-hour time format, as shown in the following screenshot: Instead, we can define the format handler for the xAxis.labels.formatter property to achieve a similar effect. Highcharts provides a utility routine, Highcharts.dateFormat, that converts the timestamp in milliseconds to a readable format. In the following code snippet, we define the formatter function using dateFormat and this.value. The keyword this is the axis's interval object, whereas this.value is the UTC time value for the instance of the interval:     xAxis: {         type: 'datetime',         labels: {             formatter: function() {                 return Highcharts.dateFormat('%I:%M %P', this.value);             }         }     }, Since the time values of our data points are in fixed intervals, they can also be arranged in a cut-down version. 
All we need is to define the starting point of time, pointStart, and the regular interval between them, pointInterval, in milliseconds: series: [{     name: 'Nasdaq',     color: '#4572A7',     pointStart: Date.UTC(2012, 4, 11, 9, 30),     pointInterval: 30 * 60 * 1000,     data: [ 2606.01, 2622.08, 2636.03, 2637.78,             2639.15, 2637.09, 2633.38, 2632.23,             2632.33, 2632.59, 2630.34, 2626.89,             2624.59, 2615.98 ] }] Adjusting intervals and background We have learned how to use axis categories and series data arrays in the last section. In this section, we will see how to format interval lines and the background style to produce a graph with more clarity. We will continue from the previous example. First, let's create some interval lines along the y axis. In the chart, the interval is automatically set to 20. However, it would be clearer to double the number of interval lines. To do that, simply assign the tickInterval value to 10. Then, we use minorTickInterval to put another line in between the intervals to indicate a semi-interval. In order to distinguish between interval and semi-interval lines, we set the semi-interval lines, minorGridLineDashStyle, to a dashed and dotted style. There are nearly a dozen line style settings available in Highcharts, from 'Solid' to 'LongDashDotDot'. Readers can refer to the online manual for possible values. The following is the first step to create the new settings:             yAxis: {                 title: {                     text: null                 },                 tickInterval: 10,                 minorTickInterval: 5,                 minorGridLineColor: '#ADADAD',                 minorGridLineDashStyle: 'dashdot'            } The interval lines should look like the following screenshot: To make the graph even more presentable, we add a striping effect with shading using alternateGridColor. Then, we change the interval line color, gridLineColor, to a similar range with the stripes. The following code snippet is added into the yAxis configuration:                 gridLineColor: '#8AB8E6',                 alternateGridColor: {                     linearGradient: {                         x1: 0, y1: 1,                         x2: 1, y2: 1                     },                     stops: [ [0, '#FAFCFF' ],                              [0.5, '#F5FAFF'] ,                              [0.8, '#E0F0FF'] ,                              [1, '#D6EBFF'] ]                   } The following is the graph with the new shading background: The next step is to apply a more professional look to the y axis line. We are going to draw a line on the y axis with the lineWidth property, and add some measurement marks along the interval lines with the following code snippet:                  lineWidth: 2,                  lineColor: '#92A8CD',                  tickWidth: 3,                  tickLength: 6,                  tickColor: '#92A8CD',                  minorTickLength: 3,                  minorTickWidth: 1,                  minorTickColor: '#D8D8D8' The tickWidth and tickLength properties add the effect of little marks at the start of each interval line. We apply the same color on both the interval mark and the axis line. Then we add the ticks minorTickLength and minorTickWidth into the semi-interval lines in a smaller size. 
This gives a nice measurement mark effect along the axis, as shown in the following screenshot: Now, we apply a similar polish to the xAxis configuration, as follows:            xAxis: {                type: 'datetime',                labels: {                    formatter: function() {                        return Highcharts.dateFormat('%I:%M %P', this.value);                    },                },                gridLineDashStyle: 'dot',                gridLineWidth: 1,                tickInterval: 60 * 60 * 1000,                lineWidth: 2,                lineColor: '#92A8CD',                tickWidth: 3,                tickLength: 6,                tickColor: '#92A8CD',            }, We set the x axis interval lines to the hourly format and switch the line style to a dotted line. Then, we apply the same color, thickness, and interval ticks as on the y axis. The following is the resulting screenshot: However, there are some defects along the x axis line. To begin with, the meeting point between the x axis and y axis lines does not align properly. Secondly, the interval labels at the x axis are touching the interval ticks. Finally, part of the first data point is covered by the y-axis line. The following is an enlarged screenshot showing the issues: There are two ways to resolve the axis line alignment problem, as follows: Shift the plot area 1 pixel away from the x axis. This can be achieved by setting the offset property of xAxis to 1. Increase the x-axis line width to 3 pixels, which is the same width as the y-axis tick interval. As for the x-axis label, we can simply solve the problem by introducing the y offset value into the labels setting. Finally, to avoid the first data point touching the y-axis line, we can impose minPadding on the x axis. What this does is to add padding space at the minimum value of the axis, the first point. The minPadding value is based on the ratio of the graph width. In this case, setting the property to 0.02 is equivalent to shifting along the x axis 5 pixels to the right (250 px * 0.02). The following are the additional settings to improve the chart:     xAxis: {         ....         labels: {                formatter: ...,                y: 17         },         .....         minPadding: 0.02,         offset: 1     } The following screenshot shows that the issues have been addressed: As we can see, Highcharts has a comprehensive set of configurable variables with great flexibility. Using plot lines and plot bands In this section, we are going to see how we can use Highcharts to place lines or bands along the axis. We will continue with the example from the previous section. Let's draw a couple of lines to indicate the day's highest and lowest index points on the y axis. The plotLines field accepts an array of object configurations for each plot line. There are no width and color default values for plotLines, so we need to specify them explicitly in order to see the line. The following is the code snippet for the plot lines:       yAxis: {               ... 
,               plotLines: [{                    value: 2606.01,                    width: 2,                    color: '#821740',                    label: {                        text: 'Lowest: 2606.01',                        style: {                            color: '#898989'                        }                    }               }, {                    value: 2639.15,                    width: 2,                    color: '#4A9338',                    label: {                        text: 'Highest: 2639.15',                        style: {                            color: '#898989'                        }                    }               }]         } The following screenshot shows what it should look like: We can improve the look of the chart slightly. First, the text label for the top plot line should not be next to the highest point. Second, the label for the bottom line should be remotely covered by the series and interval lines, as follows: To resolve these issues, we can assign the plot line's zIndex to 1, which brings the text label above the interval lines. We also set the x position of the label to shift the text next to the point. The following are the new changes:              plotLines: [{                    ... ,                    label: {                        ... ,                        x: 25                    },                    zIndex: 1                    }, {                    ... ,                    label: {                        ... ,                        x: 130                    },                    zIndex: 1               }] The following graph shows the label has been moved away from the plot line and over the interval line: Now, we are going to change the preceding example with a plot band area that shows the index change between the market's opening and closing values. The plot band configuration is very similar to plot lines, except that it uses the to and from properties, and the color property accepts gradient settings or color code. We create a plot band with a triangle text symbol and values to signify a positive close. Instead of using the x and y properties to fine-tune label position, we use the align option to adjust the text to the center of the plot area (replace the plotLines setting from the above example):               plotBands: [{                    from: 2606.01,                    to: 2615.98,                    label: {                        text: '▲ 9.97 (0.38%)',                        align: 'center',                        style: {                            color: '#007A3D'                        }                    },                    zIndex: 1,                    color: {                        linearGradient: {                            x1: 0, y1: 1,                            x2: 1, y2: 1                        },                        stops: [ [0, '#EBFAEB' ],                                 [0.5, '#C2F0C2'] ,                                 [0.8, '#ADEBAD'] ,                                 [1, '#99E699']                        ]                    }               }] The triangle is an alt-code character; hold down the left Alt key and enter 30 in the number keypad. See http://www.alt-codes.net for more details. This produces a chart with a green plot band highlighting a positive close in the market, as shown in the following screenshot: Extending to multiple axes Previously, we ran through most of the axis configurations. 
Here, we explore how we can use multiple axes, which are just an array of objects containing axis configurations. Continuing from the previous stock market example, suppose we now want to include another market index, Dow Jones, along with Nasdaq. However, both indices are different in nature, so their value ranges are vastly different. First, let's examine the outcome by displaying both indices with the common y axis. We change the title, remove the fixed interval setting on the y axis, and include data for another series:             chart: ... ,             title: {                 text: 'Market Data: Nasdaq & Dow Jones'             },             subtitle: ... ,             xAxis: ... ,             credits: ... ,             yAxis: {                 title: {                     text: null                 },                 minorGridLineColor: '#D8D8D8',                 minorGridLineDashStyle: 'dashdot',                 gridLineColor: '#8AB8E6',                 alternateGridColor: {                     linearGradient: {                         x1: 0, y1: 1,                         x2: 1, y2: 1                     },                     stops: [ [0, '#FAFCFF' ],                              [0.5, '#F5FAFF'] ,                              [0.8, '#E0F0FF'] ,                              [1, '#D6EBFF'] ]                 },                 lineWidth: 2,                 lineColor: '#92A8CD',                 tickWidth: 3,                 tickLength: 6,                 tickColor: '#92A8CD',                 minorTickLength: 3,                 minorTickWidth: 1,                 minorTickColor: '#D8D8D8'             },             series: [{               name: 'Nasdaq',               color: '#4572A7',               data: [ [ Date.UTC(2012, 4, 11, 9, 30), 2606.01 ],                          [ Date.UTC(2012, 4, 11, 10), 2622.08 ],                           [ Date.UTC(2012, 4, 11, 10, 30), 2636.03 ],                          ...                        ]             }, {               name: 'Dow Jones',               color: '#AA4643',               data: [ [ Date.UTC(2012, 4, 11, 9, 30), 12598.32 ],                          [ Date.UTC(2012, 4, 11, 10), 12538.61 ],                           [ Date.UTC(2012, 4, 11, 10, 30), 12549.89 ],                          ...                        ]             }] The following is the chart showing both market indices: As expected, the index changes that occur during the day have been normalized by the vast differences in value. Both lines look roughly straight, which falsely implies that the indices have hardly changed. Let us now explore putting both indices onto separate y axes. We should remove any background decoration on the y axis, because we now have a different range of data shared on the same background. The following is the new setup for yAxis:            yAxis: [{                  title: {                     text: 'Nasdaq'                 },               }, {                 title: {                     text: 'Dow Jones'                 },                 opposite: true             }], Now yAxis is an array of axis configurations. The first entry in the array is for Nasdaq and the second is for Dow Jones. This time, we display the axis title to distinguish between them. The opposite property is to put the Dow Jones y axis onto the other side of the graph for clarity. Otherwise, both y axes appear on the left-hand side. 
The next step is to align indices from the y-axis array to the series data array, as follows:             series: [{                 name: 'Nasdaq',                 color: '#4572A7',                 yAxis: 0,                 data: [ ... ]             }, {                 name: 'Dow Jones',                 color: '#AA4643',                 yAxis: 1,                 data: [ ... ]             }]          We can clearly see the movement of the indices in the new graph, as follows: Moreover, we can improve the final view by color-matching the series to the axis lines. The Highcharts.getOptions().colors property contains a list of default colors for the series, so we use the first two entries for our indices. Another improvement is to set maxPadding for the x axis, because the new y-axis line covers parts of the data points at the high end of the x axis:             xAxis: {                 ... ,                 minPadding: 0.02,                 maxPadding: 0.02                 },             yAxis: [{                 title: {                     text: 'Nasdaq'                 },                 lineWidth: 2,                 lineColor: '#4572A7',                 tickWidth: 3,                 tickLength: 6,                 tickColor: '#4572A7'             }, {                 title: {                     text: 'Dow Jones'                 },                 opposite: true,                 lineWidth: 2,                 lineColor: '#AA4643',                 tickWidth: 3,                 tickLength: 6,                 tickColor: '#AA4643'             }], The following screenshot shows the improved look of the chart: We can extend the preceding example and have more than a couple of axes, simply by adding entries into the yAxis and series arrays, and mapping both together. The following screenshot shows a 4-axis line graph: Summary In this article, major configuration components were discussed and experimented with, and examples shown. By now, we should be comfortable with what we have covered already and ready to plot some of the basic graphs with more elaborate styles. Resources for Article: Further resources on this subject: Theming with Highcharts [article] Integrating with other Frameworks [article] Highcharts [article]

article-image-dragging-ccnode-cocos2d-swift
Packt
21 Jan 2015
6 min read
Save for later

Dragging a CCNode in Cocos2D-Swift

Packt
21 Jan 2015
6 min read
 In this article by Ben Trengrove, author of the book Cocos2D Game Development Essentials, we will see how can we update our sprite position according to the touch movement. (For more resources related to this topic, see here.) Very often in development with Cocos2d you will want the ability to drag a node around the screen. It is not a built in behavior but it can be easily coded. To do it you will need to track the touch information. Using this information you will move the sprite to the updated position anytime the touch moves. Lets get started. Add a new Boolean property to your private interface. @interface HelloWorldScene ()@property (nonatomic, assign) BOOL dragging;@end Now, add the following code to the touchBegan method. -(void) touchBegan:(UITouch *)touch withEvent:(UIEvent *)event {  CGPoint touchLoc = [touch locationInNode:self];  if (CGRectContainsPoint(_sprite.boundingBox, touchLoc)) {    self.dragging = YES;    NSLog(@"Start dragging");  }} Add a touchMoved method with the following code. - (void)touchMoved:(UITouch *)touch withEvent:(UIEvent *)event {  CGPoint touchLoc = [touch locationInNode:self];  if (self.dragging) {    _sprite.position = touchLoc;  }} What is being done in these methods is first you check to see if the initial touch was inside the sprite. If it was, we set a Boolean to say that the user is dragging the node. They have in effect picked up the node. Next in the touchMoved method, it is as simple as if the user did touch down on the node and move, set the new position of the node to the touch location. Next we just have to implement the letting go of the sprite. This is done in touchEnded. Implement the touchEnded method as follows. - (void)touchEnded:(UITouch *)touch withEvent:(UIEvent *)event {  self.dragging = NO;} Now, if you build and run the app you will be able to drag around the sprite. There is one small problem however, if you don't grab the sprite in its center you will see that the node snaps its center to the touch. What you really want to happen is just move from where on the node it was touched. You will make this adjustment now. To make this fix you are going to have to calculate the offset on the initial touch from the nodes center point. This will be stored and applied to the final position of the node in touchMoved. Add another property to your private interface. @property (nonatomic, assign) CGPoint dragOffset; Modify your touchBegan method to the following: -(void) touchBegan:(UITouch *)touch withEvent:(UIEvent *)event {  CGPoint touchLoc = [touch locationInNode:self];  CGPoint touchOffset = [touch locationInNode:_sprite];  if (CGRectContainsPoint(_sprite.boundingBox, touchLoc)) {    self.dragging = YES;    NSLog(@"Start dragging");    self.dragOffset = touchOffset;  }} Notice that using the locationinnode method, you can calculate the position of the touch relative to the node. This information is only useful if the touch was indeed inside of the node so you only store it if that is the case. 
Now, modify your touchMoved method to the following: - (void)touchMoved:(UITouch *)touch withEvent:(UIEvent *)event {  CGPoint touchLoc = [touch locationInNode:self];  //Check if we are already dragging  if (self.dragging) {    CGPoint offsetPosition = ccpSub(touchLoc, self.dragOffset);//Calculate an offset to account for the anchor point        CGPoint anchorPointOffset = CGPointMake(_sprite.anchorPoint.x * _sprite.boundingBox.size.width, _sprite.anchorPoint.y * _sprite.boundingBox.size.height);//Add the offset and anchor point adjustment together to get the final position    CGPoint positionWithAnchorPoint = ccpAdd(offsetPosition, anchorPointOffset);    _sprite.position = positionWithAnchorPoint;  }} The offset position is subtracted from the touch location using the Cocos2d convenience function ccpSub. CcpSub subtracts a point from another point. Using the anchor point and size of the sprite, an adjustment is calculated to account for different anchor points. Once these two points have been calculated, they are added together to create a final sprite position. Build and run the app now, you will now have a very natural dragging mechanic. For reference, here is the complete scene. @interface HelloWorldScene ()@property (nonatomic, assign) BOOL dragging;@property (nonatomic, assign) CGPoint dragOffset;@end- (id)init{  // Apple recommend assigning self with supers return value  self = [super init];  if (!self) return(nil);  // Enable touch handling on scene node  self.userInteractionEnabled = YES;  // Create a colored background (Dark Grey)  CCNodeColor *background = [CCNodeColor nodeWithColor:[CCColor colorWithRed:0.2f green:0.2f blue:0.2f alpha:1.0f]];  [self addChild:background];  // Add a sprite  _sprite = [CCSprite spriteWithImageNamed:@"Icon-72.png"];  _sprite.position  = ccp(self.contentSize.width/2,self.contentSize.height/2);  _sprite.anchorPoint = ccp(0.5, 0.5);  [self addChild:_sprite];  // Create a back button  CCButton *backButton = [CCButton buttonWithTitle:@"[ Menu ]" fontName:@"Verdana-Bold" fontSize:18.0f];  backButton.positionType = CCPositionTypeNormalized;  backButton.position = ccp(0.85f, 0.95f); // Top Right of screen  [backButton setTarget:self selector:@selector(onBackClicked:)];  [self addChild:backButton];  // donereturn self;}// -----------------------------------------------------------------------#pragma mark - Touch Handler// ------------------------------------------------------------------------(void) touchBegan:(UITouch *)touch withEvent:(UIEvent *)event {  CGPoint touchLoc = [touch locationInNode:self];  CGPoint touchOffset = [touch locationInNode:_sprite];  if (CGRectContainsPoint(_sprite.boundingBox, touchLoc)) {    self.dragging = YES;    NSLog(@"Start dragging");    self.dragOffset = touchOffset;  }}- (void)touchMoved:(UITouch *)touch withEvent:(UIEvent *)event {  CGPoint touchLoc = [touch locationInNode:self];  if (self.dragging) {    CGPoint offsetPosition = ccpSub(touchLoc, self.dragOffset);    CGPoint anchorPointOffset = CGPointMake(_sprite.anchorPoint.x * _sprite.boundingBox.size.width, _sprite.anchorPoint.y * _sprite.boundingBox.size.height);    CGPoint positionWithAnchorPoint = ccpAdd(offsetPosition, anchorPointOffset);    _sprite.position = positionWithAnchorPoint;  }}- (void)touchEnded:(UITouch *)touch withEvent:(UIEvent *)event {  self.dragging = NO;} Summary In this article, we saw how to update your sprite position according to the touch movement. 
Resources for Article: Further resources on this subject: Why should I make cross-platform games? [article] Animations in Cocos2d-x [article] Moving the Space Pod Using Touch [article]
article-image-sentiment-analysis-twitter-data-part-1
Janu Verma
21 Jan 2015
4 min read
Save for later

Sentiment Analysis of Twitter Data - Part 1

Janu Verma
21 Jan 2015
4 min read
Twitter represents a fundamentally new instrument to make social measurements. Millions of people voluntarily express their opinions across any topic imaginable; this data source is incredibly valuable for both research and business. There have been numerous studies on this data for sociological, political, economic, and network analytical questions. We can tap the vast amount of data from Twitter to gauge “public opinion” towards certain topics by aggregating the individual tweet results over time. Sentiment analysis aims to determine how a certain person or group reacts to a specific topic. Traditionally, we would run surveys to gather data and do statistical analysis. With Twitter, it works by extracting tweets containing references to the desired topic, computing the sentiment polarity and strength of each tweet, and then aggregating the results for all such tweets. Companies use this information to gather public opinion on their products and services, and make data-informed decisions. We can also track changes in users’ opinions towards a topic over time, allowing us to identify the events that caused these changes. One of the first studies on Twitter data for sentiment was to study public perception of Obama’s performance as President. Another (fun) example could be to explore the variation of sentiment regarding the TV series “Game of Thrones.” The unpredictable episode “The Rains of Castamere” resulted in a lot of negative tweets and a peak in the sentiment score. Also, we can look at the geocoded information in the tweets and analyze the relation between location and mood. For example, people in California may be happy about event X, while New Yorkers didn’t like it much. Sentiment analysis employs natural language processing (NLP), text mining, and computational linguistics to extract subjective information from textual data.
Applications
Sentiment analysis techniques find applications in technology, finance, and research. Some important applications of sentiment analysis are:
predicting stocks
computing movie ratings
discerning product satisfaction
analyzing political or apolitical campaigns
Techniques
There are broadly two categories of sentiment analysis:
Lexical Methods: These techniques employ dictionaries of words annotated with their semantic polarity and sentiment strength. This is then used to calculate a score for the polarity and/or sentiment of the document. Usually this method gives high precision but low recall.
Machine Learning Methods: Such techniques require creating a model by training a classifier with labeled examples. This means that you must first gather a dataset with examples for positive, negative, and neutral classes, extract the features from the examples, and then train the algorithm based on the examples. These methods are used mainly for computing the polarity of the document.
The choice of method depends heavily upon the application, the domain, and the language. Using lexicon-based techniques with large dictionaries enables you to achieve very good results; nevertheless, these techniques require a lexicon, something which is not always available in all languages. On the other hand, Machine Learning based techniques can deliver good results, but they require training on labeled data. Here are some examples of companies that use sentiment analysis: AlchemyAPI, based in Denver, is a really cool company that provides resources to do sentiment analysis for an entity on a document or webpage.
Here are some examples of companies that use sentiment analysis:

AlchemyAPI, based in Denver, is a really cool company that provides resources to do sentiment analysis for an entity on a document or webpage.

The Stock Sonar uses sentiment analysis of unstructured text to determine whether online press is positive or negative towards businesses, by identifying lexical sentiment as well as business events.

About the Author

Janu Verma is a Quantitative Researcher at the Buckler Lab, Cornell University, where he works on problems in bioinformatics and genomics. His background is in mathematics and machine learning, and he leverages tools from these areas to answer questions in biology. Janu holds a Masters in Theoretical Physics from the University of Cambridge in the UK, and dropped out of the mathematics PhD program at Kansas State University after 3 years. He also writes about data science, machine learning, and mathematics at Random Inferences.
article-image-servicestack-applications
Packt
21 Jan 2015
9 min read
Save for later

ServiceStack applications

Packt
21 Jan 2015
9 min read
In this article by Kyle Hodgson and Darren Reid, authors of the book ServiceStack 4 Cookbook, we'll learn about unit testing ServiceStack applications. (For more resources related to this topic, see here.)

Unit testing ServiceStack applications

In this recipe, we'll focus on simple techniques to test individual units of code within a ServiceStack application. We will use the ServiceStack testing helper BasicAppHost as an application container, as it provides some useful helpers to inject a test double for our database. Our goal is small, fast tests that each exercise one unit of code within our application.

Getting ready

We are going to need some services to test, so we are going to use the PlacesToVisit application.

How to do it…

Create a new testing project. It's a common convention to name the testing project <ProjectName>.Tests, so in our case we'll call it PlacesToVisit.Tests.

Create a class within this project to contain the tests we'll write. Let's name it PlaceServiceTests, as the tests within it will focus on the PlaceService class. Annotate this class with the [TestFixture] attribute, as follows:

[TestFixture]
public class PlaceServiceTests
{

We'll want one method that runs when this set of tests begins, to set up the environment, and another that runs afterwards to tear the environment down. These will be annotated with the NUnit attributes TestFixtureSetUp and TestFixtureTearDown, respectively. Let's name them FixtureInit and FixtureTearDown.

In the FixtureInit method, we will use BasicAppHost to initialize our appHost test container. We'll make it a field so that we can easily access it in each test, as follows:

ServiceStackHost appHost;

[TestFixtureSetUp]
public void FixtureInit()
{
    appHost = new BasicAppHost(typeof(PlaceService).Assembly)
    {
        ConfigureContainer = container =>
        {
            container.Register<IDbConnectionFactory>(c =>
                new OrmLiteConnectionFactory(
                    ":memory:", SqliteDialect.Provider));
            container.RegisterAutoWiredAs<PlacesToVisitRepository,
                IPlacesToVisitRepository>();
        }
    }.Init();
}

The ConfigureContainer property on BasicAppHost allows us to pass in a function that we want the AppHost to run inside of its Configure method. In this case, you can see that we're registering OrmLiteConnectionFactory with an in-memory SQLite instance. This allows us to test code that uses a database without that database actually running. This useful technique could be considered a classic unit testing approach; the mockist approach might have been to mock the database instead.

The FixtureTearDown method will dispose of appHost, as you might imagine. This is how the code will look:

[TestFixtureTearDown]
public void FixtureTearDown()
{
    appHost.Dispose();
}

We haven't created any data in our in-memory database yet. We'll want to ensure the data is the same prior to each test, so our TestInit method is a good place to do that; it will be run before each and every test, as we'll annotate it with the [SetUp] attribute, as follows:

[SetUp]
public void TestInit()
{
    using (var db = appHost.Container
        .Resolve<IDbConnectionFactory>().Open())
    {
        db.DropAndCreateTable<Place>();
        db.InsertAll(PlaceSeedData.GetSeedPlaces());
    }
}

As our tests all focus on PlaceService, we make sure to create Place data.
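The PlaceSeedData.GetSeedPlaces helper isn't shown in this recipe. A minimal sketch of what it might look like follows; the specific places and the shape of the Place class are assumptions for illustration (the update test later in this recipe does rely on a seeded place named "Canberra"):

using System.Collections.Generic;

public static class PlaceSeedData
{
    // Hypothetical seed data; the real helper in the PlacesToVisit sample may
    // return different places. Place is assumed to carry Name and Description
    // properties, mirroring the CreatePlaceToVisit DTO used in the tests.
    public static List<Place> GetSeedPlaces()
    {
        return new List<Place>
        {
            new Place { Name = "Canberra", Description = "The capital of Australia" },
            new Place { Name = "Sydney", Description = "Harbour city" },
            new Place { Name = "Brisbane", Description = "River city" }
        };
    }
}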
Next, we'll begin writing tests. Let's start with one that asserts that we can create new places.

The first step is to create the new method, name it appropriately, and annotate it with the [Test] attribute, as follows:

[Test]
public void ShouldAddNewPlaces()
{

Next, we'll create an instance of PlaceService that we can test against. We'll use the Funq IoC TryResolve method for this:

var placeService = appHost.TryResolve<PlaceService>();

We'll want to create a new place and then query the database later to see whether the new one was added, so it's useful to start by getting a count of how many places there are based on just the seed data. Here's how you can get that count:

var startingCount = placeService
    .Get(new AllPlacesToVisitRequest())
    .Places
    .Count;

Since we're testing the ability to handle a CreatePlaceToVisit request, we'll need a test object that we can send to the service. Let's create one and then go ahead and post it:

var melbourne = new CreatePlaceToVisit
{
    Name = "Melbourne",
    Description = "A nice city to holiday"
};

placeService.Post(melbourne);

Having done that, we can get the updated count and then assert that there is one more item in the database than there was before:

var newCount = placeService
    .Get(new AllPlacesToVisitRequest())
    .Places
    .Count;
Assert.That(newCount == startingCount + 1);

Next, let's fetch the new record that was created and make an assertion that it's the one we want:

var newPlace = placeService.Get(new PlaceToVisitRequest
{
    Id = startingCount + 1
});
Assert.That(newPlace.Place.Name == melbourne.Name);
}

With this in place, if we run the test, we'll expect it to pass both assertions. This proves that we can add new places via PlaceService registered with Funq, and that when we do so we can retrieve them later as expected.

We can also build a similar test that asserts on our ability to update an existing place. Adding the code is simple, following the pattern we set out previously. We'll start with the arrange section of the test, creating the variables and objects we'll need:

[Test]
public void ShouldUpdateExistingPlaces()
{
    var placeService = appHost.TryResolve<PlaceService>();
    var startingPlaces = placeService
        .Get(new AllPlacesToVisitRequest())
        .Places;
    var startingCount = startingPlaces.Count;

    var canberra = startingPlaces
        .First(c => c.Name.Equals("Canberra"));
    const string canberrasNewName = "Canberra, ACT";
    canberra.Name = canberrasNewName;

Once they're in place, we'll act. In this case, the Put method on placeService has the responsibility for update operations:

placeService.Put(canberra.ConvertTo<UpdatePlaceToVisit>());

Think of the ConvertTo helper method from ServiceStack as an auto-mapper that converts our Place object for us. Now that we've updated the record for Canberra, we'll proceed to the assert section of the test, as follows:

var updatedPlaces = placeService
    .Get(new AllPlacesToVisitRequest())
    .Places;
var updatedCanberra = updatedPlaces
    .First(p => p.Id.Equals(canberra.Id));
var updatedCount = updatedPlaces.Count;

Assert.That(updatedCanberra.Name == canberrasNewName);
Assert.That(updatedCount == startingCount);
}
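As a side note, the boolean-style assertions above work fine, but NUnit also offers a constraint-based syntax that reports the expected and actual values on failure. A sketch of the same assertions in that style, purely as a stylistic alternative and not something the recipe requires:

// On failure, these report both the expected and actual values,
// which makes diagnosing a broken test quicker.
Assert.That(newCount, Is.EqualTo(startingCount + 1));
Assert.That(newPlace.Place.Name, Is.EqualTo(melbourne.Name));
Assert.That(updatedCanberra.Name, Is.EqualTo(canberrasNewName));
Assert.That(updatedCount, Is.EqualTo(startingCount));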
How it works…

These unit tests use a few different patterns that help us write concise tests, including test helpers of our own and helpers from the ServiceStack.Testing namespace. For instance, BasicAppHost allows us to set up an application host instance without actually hosting a web service. It also lets us provide a custom ConfigureContainer action to mock any of our services' dependencies and seed our testing data, as follows:

appHost = new BasicAppHost(typeof(PlaceService).Assembly)
{
    ConfigureContainer = container =>
    {
        container.Register<IDbConnectionFactory>(c =>
            new OrmLiteConnectionFactory(
                ":memory:", SqliteDialect.Provider));
        container.RegisterAutoWiredAs<PlacesToVisitRepository,
            IPlacesToVisitRepository>();
    }
}.Init();

To test any ServiceStack service, you can resolve it through the application host via TryResolve<ServiceType>(). This will have the IoC container instantiate an object of the requested type, which gives us the ability to test the Get method independently of other aspects of our web service, such as validation. This is shown in the following code:

var placeService = appHost.TryResolve<PlaceService>();

In this example, we are using an in-memory SQLite instance to mock our use of OrmLite for data access, which IPlacesToVisitRepository also uses, and we seed our test data in the ConfigureContainer hook of BasicAppHost. The combination of in-memory SQLite and BasicAppHost gives us fast unit tests, so we can iterate on our application services very quickly while ensuring we are not breaking any functionality associated with this component. In the example provided, we are running three tests in less than 100 milliseconds. If you are using the full version of Visual Studio, extensions such as NCrunch can run your unit tests continuously while you make changes to your code. The performance of ServiceStack components, combined with extensions like these, makes for a smooth developer experience with good productivity and code quality.

There's more…

In the examples in this article, we wrote tests that would pass, ran them, and saw that they passed (no surprise). While this makes explaining things a bit simpler, it's not really a best practice. You generally want to make sure your tests fail when presented with wrong data at some point; the authors have seen many cases where subtle bugs in test code were causing a test to pass that should not have passed. One best practice is to write tests so that they fail first and then make them pass, which guarantees that the test can actually detect the defect you're guarding against. This is commonly referred to as the red/green/refactor pattern.

Summary

In this article, we covered some techniques to unit test ServiceStack applications.

Resources for Article:

Further resources on this subject:

Building a Web Application with PHP and MariaDB – Introduction to caching [article]
Web API and Client Integration [article]
WebSockets in Wildfly [article]