
How-To Tutorials


Using NoSQL Databases

Packt
26 Apr 2016
11 min read
In this article by Valentin Bojinov, the author of the book RESTful Web API Design with Node.js, Second Edition, we will look for a better storage solution, one that can scale easily along with our REST-enabled application. These days, the so-called NoSQL databases are used heavily in cloud environments. They have the following advantages over traditional transactional SQL databases:

- They are schemaless; that is, they work with object representations rather than storing the object state in one or several tables, depending on its complexity.
- They are extendable, because they store actual objects. Data evolution is supported implicitly, so all you need to do is call the operation that stores the object.
- They are designed to be highly distributed and scalable. Nearly all modern NoSQL solutions out there support clustering and can scale further along with the load of your application. Additionally, most of them have REST-enabled interfaces over HTTP, which eases their usage behind a load balancer in high-availability scenarios.
- Classical database drivers are usually not available for traditional client-side languages, such as JavaScript, because they require native libraries or drivers. However, the idea of NoSQL originated from document data stores, so most of them support the JSON format, which is native to JavaScript.
- Last but not least, most NoSQL solutions are open source and available for free, with all the benefits that open source projects offer: community, examples, and freedom!

In this article, we will take a look at two NoSQL solutions: LevelDB and MongoDB. We will see how to design and test our database models, and finally, we will take a brief look at content delivery network (CDN) infrastructures. (For more resources related to this topic, see here.)

Key/value store – LevelDB

The first data store we will look at is LevelDB. It is an open source implementation developed by Google and written in C++. It is supported by a wide range of platforms, including Node.js. LevelDB is a key/value store; both the key and the value are represented as binary data, so their content can vary from simple strings to binary representations of serialized objects in any format, such as JSON or XML. As it is a key/value data store, working with it is similar to working with an associative array: a key identifies an object uniquely within the store. Furthermore, the keys are stored sorted for better performance.

But what makes LevelDB perform better than an arbitrary file storage implementation? It uses a "log-structured merge" topology, which stores all write operations in an in-memory log that is transferred (flushed) regularly to permanent storage called Sorted String Table (SST) files. Read operations first attempt to retrieve entries from a cache containing the most commonly returned results. The size of the read cache and the flush interval of the write log are configurable parameters, which can be adjusted further to match the application load. The following image shows this topology:

The storage is a collection of string-sorted files with a maximum size of about 2 MB. Each file consists of 4 KB segments that are readable by a single read operation. The table files are not sorted in a straightforward manner, but are organized into levels. The log level is on top, before all other levels. It is always flushed to level 0, which consists of at most four SST files. When that level is filled, one SST file is compacted to a lower level, that is, level 1.
The maximum size of level 1 is 10 MB. When it gets filled, a file goes from level 1 to level 2. LevelDB assumes that each lower level is ten times larger than the previous level, so we have the following level structure:

- Log, with a configurable size
- Level 0, consisting of at most four SST files
- Level 1, with a maximum size of 10 MB
- Level 2, with a maximum size of 100 MB
- Level 3, with a maximum size of 1,000 MB
- Level n, with a maximum size ten times that of level n-1

The hierarchical structure of this topology ensures that newer data stays in the top levels, while older data sits somewhere in the lower levels. A read operation always starts by searching for a given key in the cache, and if it is not found there, the operation traverses each level until the entry is found. An entry is considered nonexistent if its key is not found anywhere within all the levels.

LevelDB provides get, put, and delete operations to manipulate data records, as well as a batch operation that can be used to perform multiple data manipulations atomically; that is, either all or none of the operations in the batch are executed successfully. LevelDB can optionally use a compression library to reduce the size of the stored values. This compression is provided by Google's Snappy library, which is highly optimized for fast compression with low performance impact, so don't expect a particularly large compression ratio.

There are two popular libraries that enable LevelDB usage in Node: LevelDOWN and LevelUP. Initially, LevelDOWN acted as the foundation binding, implicitly provided with LevelUP, but after version 0.9 it was extracted and became available as a standalone binding for LevelDB. Currently, LevelUP has no explicit dependency on LevelDOWN defined; it needs to be installed separately, as LevelUP expects it to be available on Node's require() path. LevelDOWN is a pure C++ interface used to bind Node and LevelDB. Though it is slightly faster than LevelUP, it has some state-safety and API considerations that make it less preferable. To be concrete, LevelDOWN does not keep track of the state of the underlying instance, so it is up to the developers themselves not to open a connection more than once or to use a data-manipulation operation against a closed database connection, as this will cause errors. LevelUP provides state-safe operations out of the box and thus prevents out-of-state operations from being sent to its foundation, LevelDOWN. Let's move on to installing LevelUP by executing the following npm command:

```
npm install levelup leveldown
```

Even though the LevelUP module can be installed without LevelDOWN, it will not work at runtime, complaining that it can't find its underlying dependency.

Enough theory! Let's see what the LevelUP API looks like. The following code snippet instantiates LevelDB and inserts a dummy contact record into it. It also exposes a /contacts/:number route so that this very record can be returned as JSON output when queried appropriately.
Let's use it in a new project in the Enide studio, in a file named levelup.js:

```js
var express = require('express')
  , http = require('http')
  , path = require('path')
  , bodyParser = require('body-parser')
  , logger = require('morgan')
  , methodOverride = require('method-override')
  , errorHandler = require('errorhandler')
  , levelup = require('levelup');

var app = express();
var url = require('url');

// all environments
app.set('port', process.env.PORT || 3000);
app.set('views', __dirname + '/views');
app.set('view engine', 'jade');
app.use(methodOverride());
app.use(bodyParser.json());

// development only
if ('development' == app.get('env')) {
  app.use(errorHandler());
}

var db = levelup('./contact', {valueEncoding: 'json'});

db.put('+359777123456', {
  "firstname": "Joe",
  "lastname": "Smith",
  "title": "Mr.",
  "company": "Dev Inc.",
  "jobtitle": "Developer",
  "primarycontactnumber": "+359777123456",
  "othercontactnumbers": ["+359777456789", "+359777112233"],
  "primaryemailaddress": "joe.smith@xyz.com",
  "emailaddresses": ["j.smith@xyz.com"],
  "groups": ["Dev", "Family"]
});

app.get('/contacts/:number', function(request, response) {
  console.log(request.url + ' : querying for ' + request.params.number);
  db.get(request.params.number, function(error, data) {
    if (error) {
      response.writeHead(404, {'Content-Type': 'text/plain'});
      response.end('Not Found');
      return;
    }
    response.setHeader('content-type', 'application/json');
    response.send(data);
  });
});

console.log('Running at port ' + app.get('port'));
http.createServer(app).listen(app.get('port'));
```

As the contact is inserted into LevelDB before the HTTP server is created, the record identified by the +359777123456 key will be available in the database when we execute our first GET request. But before requesting any data, let's take a closer look at the usage of LevelUP. The get() function takes two arguments:

- The first argument is the key to be used in the query.
- The second argument is a callback function used to process the result. It, in turn, has two arguments of its own: an error value, set when something went wrong during the query, and the actual result entity from the database.

Let's start it with node levelup.js and execute some test requests with the REST Client tool against http://localhost:3000/contacts/%2B359777123456. This can be seen in the following screenshot:

As expected, the response is a JSON representation of the contact inserted statically into LevelUP during the initialization of the application. Requesting any other key will result in an "HTTP 404 Not Found" response. This example demonstrates how to bind a LevelUP operation to an HTTP operation and process its results, but it currently lacks support for inserting, editing, and deleting data. We will improve that with the next sample.
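Before extending the routes, it is worth illustrating the batch operation mentioned earlier. The following is a minimal, hedged sketch rather than code from the book: it assumes the same db instance with JSON value encoding, and the keys and contact fields shown are purely illustrative.

```js
// Hedged sketch of LevelUP's batch API: the listed operations are applied
// atomically, so either all of them succeed or nothing is written.
// The keys and values below are illustrative only.
db.batch([
  { type: 'put', key: '+359777999888', value: { firstname: 'Jane', lastname: 'Smith' } },
  { type: 'del', key: '+359777456789' }
], function (error) {
  if (error) {
    return console.log('Batch failed; nothing was written', error);
  }
  console.log('Batch applied atomically');
});
```

With that aside, let's extend the API.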
The next sample binds the HTTP GET, POST, and DELETE operations, exposed via an Express route, /contacts/:number, to LevelDB's get, put, and del handlers:

```js
var express = require('express')
  , http = require('http')
  , path = require('path')
  , bodyParser = require('body-parser')
  , logger = require('morgan')
  , methodOverride = require('method-override')
  , errorHandler = require('errorhandler')
  , levelup = require('levelup');

var app = express();
var url = require('url');

// all environments
app.set('port', process.env.PORT || 3000);
app.set('views', __dirname + '/views');
app.set('view engine', 'jade');
app.use(methodOverride());
app.use(bodyParser.json());

// development only
if ('development' == app.get('env')) {
  app.use(errorHandler());
}

var db = levelup('./contact', {valueEncoding: 'json'});

app.get('/contacts/:number', function(request, response) {
  console.log(request.url + ' : querying for ' + request.params.number);
  db.get(request.params.number, function(error, data) {
    if (error) {
      response.writeHead(404, {'Content-Type': 'text/plain'});
      response.end('Not Found');
      return;
    }
    response.setHeader('content-type', 'application/json');
    response.send(data);
  });
});

app.post('/contacts/:number', function(request, response) {
  console.log('Adding new contact with primary number ' + request.params.number);
  db.put(request.params.number, request.body, function(error) {
    if (error) {
      response.writeHead(500, {'Content-Type': 'text/plain'});
      response.end('Internal server error');
      return;
    }
    response.send(request.params.number + ' successfully inserted');
  });
});

app.del('/contacts/:number', function(request, response) {
  console.log('Deleting contact with primary number ' + request.params.number);
  db.del(request.params.number, function(error) {
    if (error) {
      response.writeHead(500, {'Content-Type': 'text/plain'});
      response.end('Internal server error');
      return;
    }
    response.send(request.params.number + ' successfully deleted');
  });
});

app.get('/contacts', function(request, response) {
  console.log('Listing all contacts');
  var is_first = true;
  response.setHeader('content-type', 'application/json');
  db.createReadStream()
    .on('data', function (data) {
      console.log(data.value);
      if (is_first == true) {
        response.write('[');
      } else {
        response.write(',');
      }
      response.write(JSON.stringify(data.value));
      is_first = false;
    })
    .on('error', function (error) {
      console.log('Error while reading', error);
    })
    .on('close', function () {
      console.log('Closing db stream');
    })
    .on('end', function () {
      console.log('Db stream closed');
      response.end(']');
    });
});

console.log('Running at port ' + app.get('port'));
http.createServer(app).listen(app.get('port'));
```

Perhaps the most interesting part of the preceding sample is the handler of the /contacts route. It writes a JSON array of all the contacts available in the database to the output stream of the HTTP response. LevelUP's createReadStream method exposes a data handler for every key/value pair available. As LevelDB is not aware of the format of its values, we use JSON.stringify to serialize each value into the response, and on top of this we can implement any kind of logic. Let's assume we want a function that flushes to the HTTP response only those contacts whose last name is Smith.
Then we will need to add filtering logic to the data handler (the separator handling is adjusted slightly here so that skipped records do not produce stray commas):

```js
db.createReadStream()
  .on('data', function (data) {
    if (data.value.lastname.toString() == 'Smith') {
      // open the array on the first emitted record, separate subsequent ones with commas
      if (is_first == true) {
        response.write('[');
      } else {
        response.write(',');
      }
      var jsonString = JSON.stringify(data.value);
      console.log('Adding Mr. ' + data.value.lastname + ' to the response');
      response.write(jsonString);
      is_first = false;
    } else {
      console.log('Skipping Mr. ' + data.value.lastname);
    }
  })
  .on('error', function (error) {
    console.log('Error while reading', error);
  })
  .on('close', function () {
    console.log('Closing db stream');
  })
  .on('end', function () {
    console.log('Db stream closed');
    // if nothing matched, we still need the opening bracket
    response.end(is_first ? '[]' : ']');
  });
```

This looks a bit artificial, doesn't it? Well, this is all that LevelDB can possibly offer us, since it can search only by a single key. This makes it an inappropriate option for data that has to be indexed by several different attributes. This is where document stores come into play.

Summary

In this article, we looked at one type of NoSQL database: LevelDB, a key/value data store. We utilized it to implement automated tests for the database layer.

Resources for Article:

Further resources on this subject:
- Node.js Fundamentals and Asynchronous JavaScript [article]
- An Introduction to Node.js Design Patterns [article]
- Making a Web Server in Node.js [article]


Advanced Shell Topics

Packt
26 Apr 2016
10 min read
In this article by Thomas Bitterman, the author of the book Mastering IPython 4.0, we will look at the tools the IPython interactive shell provides. With the split of the Jupyter and IPython projects, the command line provided by IPython will gain importance. This article covers the following topics:

- What is IPython?
- Installing IPython
- Starting out with the terminal
- IPython beyond Python
- Magic commands

(For more resources related to this topic, see here.)

What is IPython?

IPython is an open source platform for interactive and parallel computing. It started with the realization that the standard Python interpreter was too limited for sustained interactive use, especially in the areas of scientific and parallel computing. Overcoming these limitations resulted in a three-part architecture:

- An enhanced, interactive shell
- Separation of the shell from the computational kernel
- A new architecture for parallel computing

This article will provide a brief overview of the architecture before introducing some basic shell commands. Before proceeding further, however, IPython needs to be installed. Readers with experience in parallel and high-performance computing but new to IPython will find the following sections useful for quickly getting up to speed. Those experienced with IPython may skim the next few sections, noting where things have changed now that the notebook is no longer an integral part of development.

Installing IPython

The first step in installing IPython is to install Python. Instructions for the various platforms differ, but the instructions for each can be found on the Python home page at http://www.python.org. IPython requires Python 2.7 or ≥ 3.3; this article will use 3.5. Both Python and IPython are open source software, so downloading and installation are free.

A standard Python installation includes the pip package manager. pip is a handy command-line tool that can be used to download and install various Python libraries. Once Python is installed, IPython can be installed with this command:

```
pip install ipython
```

IPython comes with a test suite called iptest. To run it, simply issue the following command:

```
iptest
```

A series of tests will be run. It is possible (and likely on Windows) that some libraries will be missing, causing the associated tests to fail. Simply use pip to install those libraries and rerun the test until everything passes. It is also possible for all tests to pass without an important library being installed: the readline library (also known as PyReadline). IPython will work without it but will be missing some features that are useful in the IPython terminal, such as command completion and history navigation. To install readline, use pip:

```
pip install readline
pip install gnureadline
```

At this point, issuing the ipython command will start up an IPython interpreter:

```
ipython
```

IPython beyond Python

No one would use IPython if it were not more powerful than the standard terminal. Much of IPython's power comes from two features:

- Shell integration
- Magic commands

Shell integration

Any command starting with ! is passed directly to the operating system to be executed, and the result is returned. By default, the output is then printed out to the terminal. If desired, the result of the system command can be assigned to a variable. The result is treated as a multiline string, and the variable is a list containing one string element per line of output.
For example:

```
In [22]: myDir = !dir

In [23]: myDir
Out[23]:
[' Volume in drive C has no label.',
 ' Volume Serial Number is 1E95-5694',
 '',
 ' Directory of C:\Program Files\Python 3.5',
 '',
 '10/04/2015  08:43 AM    <DIR>          .',
 '10/04/2015  08:43 AM    <DIR>          ..',]
```

While this functionality is not entirely absent in straight Python (the os and subprocess libraries provide similar abilities), the IPython syntax is much cleaner. Additional functionality such as input and output caching, directory history, and automatic parentheses is also included.

History

The previous examples have had lines that were prefixed by elements such as In[23] and Out[15]. In and Out are arrays of strings, where each element is either an input command or the resulting output. They can be referred to using array notation, and "magic" commands can accept the subscript alone.

Magic commands

IPython also accepts commands that control IPython itself. These are called "magic" commands, and they start with % or %%. A complete list of magic commands can be found by typing %lsmagic in the terminal. Magics that start with a single % sign are called "line" magics; they accept the rest of the current line as arguments. Magics that start with %% are called "cell" magics; they accept not only the rest of the current line but also the following lines.

There are too many magic commands to go over in detail, but there are some related families to be aware of:

- OS equivalents: %cd, %env, %pwd
- Working with code: %run, %edit, %save, %load, %load_ext, %%capture
- Logging: %logstart, %logstop, %logon, %logoff, %logstate
- Debugging: %debug, %pdb, %run, %tb
- Documentation: %pdef, %pdoc, %pfile, %pprint, %psource, %pycat, %%writefile
- Profiling: %prun, %time, %run, %timeit
- Working with other languages: %%script, %%html, %%javascript, %%latex, %%perl, %%ruby

With magic commands, IPython becomes a more full-featured development environment. A development session might include the following steps:

1. Set up the OS-level environment with the %cd, %env, and ! commands.
2. Set up the Python environment with %load and %load_ext.
3. Create a program using %edit.
4. Run the program using %run.
5. Log the input/output with %logstart, %logstop, %logon, and %logoff.
6. Debug with %pdb.
7. Create documentation with %pdoc and %pdef.

This is not a tenable workflow for a large project, but for exploratory coding of smaller modules, magic commands provide a lightweight support structure.

Creating custom magic commands

IPython supports the creation of custom magic commands through function decorators. Luckily, one does not have to know how decorators work in order to use them. An example will explain. First, grab the required decorator from the appropriate library:

```
In [1]: from IPython.core.magic import (register_line_magic)
```

Then, prepend the decorator to a standard IPython function definition:

```
In [2]: @register_line_magic
   ...: def getBootDevice(line):
   ...:     sysinfo = !systeminfo
   ...:     for ln in sysinfo:
   ...:         if ln.startswith("Boot Device"):
   ...:             return(ln.split()[2])
   ...:
```

Your new magic is ready to go:

```
In [3]: %getBootDevice
Out[3]: '\Device\HarddiskVolume1'
```

Some observations are in order:

- Note that the function is, for the most part, standard Python. Also note the use of the !systeminfo shell command: one can freely mix both standard Python and IPython in IPython.
- The name of the function will be the name of the line magic.
- The parameter, "line," contains the rest of the line (in case any parameters are passed). A parameter is required, although it need not be used.
- The Out associated with calling this line magic is the return value of the magic. Any print statements executed as part of the magic are displayed on the terminal but are not part of Out (or _).

Cython

We are not limited to writing custom magic commands in Python. Several languages are supported, including R and Octave. We will look at one in particular: Cython.

Cython is a language that can be used to write C extensions for Python. The goal of Cython is to be a superset of Python, with support for optional static type declarations. The driving force behind Cython is efficiency. As a compiled language, there are performance gains to be had from running C code. The downside is that Python is much more productive in terms of programmer hours. Cython can translate Python code into compiled C code, achieving more efficient execution at runtime while retaining the programmer-friendliness of Python.

The idea of turning Python into C is not new to Cython. The default and most widely used interpreter for Python, CPython, is written in C. In some sense, then, running Python code means running C code, just through an interpreter. There are other Python interpreter implementations as well, including those in Java (Jython) and C# (IronPython). CPython has a foreign function interface to C; that is, it is possible to write C language functions that interface with CPython in such a way that data can be exchanged and functions invoked from one to the other. The primary use is to call C code from Python. There are, however, two primary drawbacks: writing code that works with the CPython foreign function interface is difficult in its own right, and doing so requires knowledge of Python, C, and CPython. Cython aims to remedy this problem by doing all the work of turning Python into C and interfacing with CPython internally. The programmer writes Cython code and leaves the rest to the Cython compiler.

Cython is very close to Python. The primary difference is the ability to specify C types for variables using the cdef keyword. Cython then handles type checking and conversion between Python values and C values, scoping issues, marshalling and unmarshalling of Python objects into C structures, and other cross-language issues.

Cython is enabled in IPython by loading an extension. In order to use the Cython extension, do this:

```
In [1]: %load_ext Cython
```

At this point, the cython cell magic can be invoked:

```
In [2]: %%cython
   ...: def sum(int a, int b):
   ...:     cdef int s = a+b
   ...:     return s
```

And the Cython function can now be called just as if it were a standard Python function:

```
In [3]: sum(1, 1)
Out[3]: 2
```

While this may seem like a lot of work for something that could have been written more easily in Python in the first place, that is the price to be paid for efficiency. If, instead of simply summing two numbers, a function is expensive to execute and is called multiple times (perhaps in a tight loop), it can be worth using Cython for a reduction in runtime. There are other languages that have merited the same treatment, GNU Octave and R among them.

Summary

In this article, we covered many of the basics of using IPython for development. We started out by just getting an instance of IPython running. The intrepid developer can perform all the steps by hand, but there are also various all-in-one distributions available that include popular modules upon installation. By default, IPython will use the pip package manager.
Again, the all-in-one distributions provide added value, this time in the form of advanced package management capability. At that point, all that is obviously available is a terminal, much like the standard Python terminal. IPython offers two additional sources of functionality, however: configuration and magic commands.

Magic commands fall into several categories: OS equivalents, working with code, logging, debugging, documentation, profiling, and working with other languages, among others. Add to this the ability to create custom magic commands (in IPython or another language) and the IPython terminal becomes a much more powerful alternative to the standard Python terminal.

Also included in IPython is the debugger, ipdb. It is very similar to the Python pdb debugger, so it should be familiar to Python developers.

All this is supported by the IPython architecture. The basic idea is that of a Read-Eval-Print loop in which the Eval section has been separated out into its own process. This decoupling allows different user interface components and kernels to communicate with each other, making for a flexible system.

This flexibility extends to the development environment. There are IDEs devoted to IPython (for example, Spyder and Canopy) and others that originally targeted Python but also work with IPython (for example, Eclipse). There are too many Python IDEs to list, and many should work with an IPython kernel "dropped in" as a superior replacement to a Python interpreter.

Resources for Article:

Further resources on this subject:
- Python Data Science Up and Running [article]
- Scientific Computing APIs for Python [article]
- Overview of Process Management in Microsoft Visio 2013 [article]


Introducing and Setting Up Go

Packt
26 Apr 2016
9 min read
In this article by Nathan Kozyra, the author of the book Learning Go Web Development, we will introduce Go and set it up for web work. One of the most common things you'll hear said about Go is that it's a systems language. Indeed, one of the earlier descriptions of Go, by the Go team itself, was that the language was built to be a modern systems language. It was constructed to combine the speed and power of languages such as C with the syntactical elegance and thrift of modern interpreted languages such as Python. You can see that this goal has been realized when you look at just a few snippets of Go code. From the Go FAQ, on why Go was created:

"Go was born out of frustration with existing languages and environments for systems programming."

Perhaps the largest part of present-day systems programming is the design of backend servers. Obviously, the Web comprises a huge, but not exclusive, percentage of that world. Go hasn't been considered a web language until recently. Unsurprisingly, it took a few years of developers dabbling, experimenting, and finally embracing the language to start taking it to new avenues. While Go is web-ready out of the box, it lacks a lot of the critical frameworks and tools people so often take for granted with web development now. As the community around Go grew, the scaffolding began to manifest in a lot of new and exciting ways. Combined with existing ancillary tools, Go is now a wholly viable option for end-to-end web development.

However, let's get back to the primary question: why Go? To be fair, it's not right for every web project, but any application that can benefit from high-performance, secure web serving out of the box, with the added benefit of a beautiful concurrency model, would make a good candidate.

We're not going to deal with a lot of low-level aspects of the Go language. For example, we assume that you're familiar with variable and constant declaration, and that you understand control structures. In this article, we will cover the following topics:

- Installing Go
- Structuring a project
- Importing packages

(For more resources related to this topic, see here.)

Installing Go

The most critical first step is, of course, making sure that Go is available and ready to start our first web server. While one of Go's biggest selling points is its cross-platform support (both building and using locally while targeting other operating systems), your life will be much easier on a *nix-compatible platform. If you're on Windows, don't fear. Natively, you may run into incompatible packages, firewall issues when using go run, and some other quirks, but 95% of the Go ecosystem will be available to you. You can also, very easily, run a virtual machine, and in fact that is a great way to simulate a potential production environment.

In-depth installation instructions are available at https://golang.org/doc/install, but we'll talk about a few quirky points here before moving on. For OS X and Windows, Go is provided as part of a binary installation package. For any Linux platform with a package manager, things can be pretty easy. To install via common Linux package managers:

- Ubuntu: sudo apt-get install golang
- CentOS: sudo yum install golang

On both OS X and Linux, you'll need to add a couple of environment variables to your path: PATH and GOPATH. First, you'll have to find the location of your Go binary's installation. This varies from distribution to distribution.
Once you've found that, you can configure PATH and GOPATH as follows:

```
export PATH=$PATH:/usr/local/go/bin
export GOPATH="/usr/share/go"
```

While the path to be used is not defined rigidly, some convention has coalesced around starting at a subdirectory directly under your user's home directory, such as $HOME/go or ~/go. As long as this location is set permanently and doesn't change, you won't run into issues with conflicts or missing packages.

You can test the impact of these changes by running the go env command. If you see any issues with this, it means that your directories are not correct. Note that this may not prevent Go from running, depending on whether the GOBIN directory is properly set, but it will prevent you from installing packages globally across your system.

To test the installation, you can grab any Go package using the go get command and create a Go file somewhere. As a quick example, first get a package at random; we'll use a package from the Gorilla framework:

```
go get github.com/gorilla/mux
```

If this runs without any issue, Go is finding your GOPATH correctly. To make sure that Go is able to access your downloaded packages, draw up a very quick program that attempts to utilize Gorilla's mux package and run it to verify whether the package is found:

```go
package main

import (
	"fmt"
	"github.com/gorilla/mux"
	"net/http"
)

func TestHandler(w http.ResponseWriter, r *http.Request) {
}

func main() {
	router := mux.NewRouter()
	router.HandleFunc("/test", TestHandler)
	http.Handle("/", router)
	fmt.Println("Everything is set up!")
}
```

Run go run test.go in the command line. It won't do much, but it will deliver the good news, as shown in the following screenshot:

Structuring a project

When you're first getting started and mostly playing around, there's no real problem with setting up your application lazily. For example, to get started as quickly as possible, you can create a simple hello.go file anywhere you like and compile it without any issue. But when you get into environments that require multiple or distinct packages (more on that shortly) or have more explicit cross-platform requirements, it makes sense to design your projects in a way that facilitates the use of the Go build tool.

The value of setting up your code in this manner lies in the way the Go build tool works. If you have packages local to your project, the build tool will look in the src directory first and then in your GOPATH. When you're building for other platforms, go build will utilize the local bin folder to organize the binaries. While building packages that are intended for mass use, you may also find that either starting your application under your GOPATH directory and then symbolically linking it to another directory, or doing the opposite, will allow you to develop without the need to subsequently go get your own code.

Code conventions

As with any language, being a part of the Go community means perpetual consideration of the way others create their code. Particularly if you're going to work in open source repositories, you'll want to generate your code the way others do, to reduce the amount of friction when people get or include your code. One incredibly helpful piece of tooling that the Go team has included is go fmt. fmt here, of course, means format, and that's exactly what this tool does: it automatically formats your code according to the designed conventions.
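As a quick, hedged illustration (the file name below is made up for this example), the formatter can be invoked from the command line in a few common ways:

```
gofmt -l .         # list the files whose formatting differs from gofmt's style
gofmt -w main.go   # rewrite main.go in place with standard formatting
go fmt ./...       # run gofmt on every package below the current directory
```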
By enforcing style conventions, the Go team has helped to mitigate one of the most common and pervasive debates that exists in a lot of other languages. While language communities tend to drive coding conventions, there are always little idiosyncrasies in the way individuals write programs. Let's use one of the most common examples around: where to put the opening brace. Some programmers like it on the same line as the statement:

```
for (int i = 0; i < 100; i++) {
  // do something
}
```

While others prefer it on the subsequent line:

```
for (int i = 0; i < 100; i++)
{
  // do something
}
```

These types of minor differences spark major, near-religious debates. The gofmt tool helps alleviate this by allowing you to yield to Go's directives. In fact, Go bypasses this obvious source of contention at the compiler level: the opening brace must sit on the same line as the statement, and if you use the latter style, the compiler will complain and all you'll get is a fatal error. However, other style choices have some flexibility, and these are enforced when you use the tool to format. Here, for example, is a piece of Go code before go fmt:

```go
func Double(n int) int {
if (n == 0) {
return 0
} else {
return n * 2
}
}
```

Arbitrary whitespace can be the bane of a team's existence when it comes to sharing and reading code, particularly when every team member is not on the same IDE. By running go fmt, we clean this up, thereby translating our whitespace according to Go's conventions:

```go
func Double(n int) int {
	if n == 0 {
		return 0
	} else {
		return n * 2
	}
}
```

Long story short: always run go fmt before shipping or pushing your code.

Importing packages

Beyond the most trivial application, one that cannot even produce a Hello World output, you must have some imported packages in a Go application. To say Hello World, for example, we'd need some sort of way to generate output. Unlike in many other languages, even the core language library is accessible via namespaced packages. In Go, namespaces are handled by repository endpoint URLs, for example github.com/nkozyra/gotest, which can be opened directly on GitHub (or any other public location) for review.

Handling private repositories

The go get tool easily handles packages hosted at repositories such as GitHub, Bitbucket, and Google Code (as well as a few others). You can also host your own projects, ideally as git projects, elsewhere, although that might introduce some dependencies and sources of errors, which you'd probably like to avoid. But what about private repos? While go get is a wonderful tool, without some additional configuration (SSH agent forwarding and so on) you'll find yourself looking at an error. You can work around this in a couple of ways, but one very simple method is to clone the repository locally, using your version control software directly.

Summary

This article serves as an introduction to the most basic concepts of Go and of producing for the Web in Go, but these points are critical foundational elements for being productive in the language and in the community. We've looked at coding conventions and package design and organization. Obviously, we're a long way from a real, mature application for the Web, but the building blocks are essential to getting there.

Resources for Article:

Further resources on this subject:
- ASP.NET 3.5 CMS: Adding Security and Membership (Part 2) [article]
- Working with Drupal Audio in Flash (part 2) [article]
- Posting on Your WordPress Blog [article]


Mobile Forensics and Its Challenges

Packt
25 Apr 2016
10 min read
In this article by Heather Mahalik and Rohit Tamma, authors of the book Practical Mobile Forensics, Second Edition, we will cover the following topics:

- Introduction to mobile forensics
- Challenges in mobile forensics

(For more resources related to this topic, see here.)

Why do we need mobile forensics?

In 2015, there were more than 7 billion mobile cellular subscriptions worldwide, up from less than 1 billion in 2000, according to the International Telecommunication Union (ITU). The world is witnessing technology and user migration from desktops to mobile phones. The following figure, sourced from statista.com, shows the actual and estimated growth of smartphones from the year 2009 to 2018 (in million units).

Gartner Inc. reports that global mobile data traffic reached 52 million terabytes (TB) in 2015, an increase of 59 percent from 2014, and the rapid growth is set to continue through 2018, when mobile data levels are estimated to reach 173 million TB.

Smartphones of today, such as the Apple iPhone, the Samsung Galaxy series, and BlackBerry phones, are compact forms of computers with high performance, huge storage, and enhanced functionality. Mobile phones are the most personal electronic devices that users access. They are used to perform simple communication tasks, such as calling and texting, while still providing support for Internet browsing, e-mail, taking photos and videos, creating and storing documents, identifying locations with GPS services, and managing business tasks. As new features and applications are incorporated into mobile phones, the amount of information stored on the devices is continuously growing. Mobile phones have become portable data carriers, and they keep track of all your moves. With the increasing prevalence of mobile phones in people's daily lives and in crime, data acquired from phones has become an invaluable source of evidence for investigations relating to criminal, civil, and even high-profile cases. It is rare to conduct a digital forensic investigation that does not include a phone. Mobile device call logs and GPS data were used to help solve the attempted bombing in Times Square, New York, in 2010. The details of the case can be found at http://www.forensicon.com/forensics-blotter/cell-phone-email-forensics-investigation-cracks-nyc-times-square-car-bombing-case/.

The science behind recovering digital evidence from mobile phones is called mobile forensics. Digital evidence is defined as information and data that is stored on, received, or transmitted by an electronic device and that is used in investigations. It encompasses any and all digital data that can be used as evidence in a case.

Mobile forensics

Digital forensics is a branch of forensic science focusing on the recovery and investigation of raw data residing in electronic or digital devices. The goal of the process is to extract and recover any information from a digital device without altering the data present on the device. Over the years, digital forensics has grown along with the rapid growth of computers and various other digital devices. There are various branches of digital forensics based on the type of digital device involved, such as computer forensics, network forensics, mobile forensics, and so on. Mobile forensics is the branch of digital forensics related to the recovery of digital evidence from mobile devices.
Forensically sound is a term used extensively in the digital forensics community to qualify and justify the use of a particular forensic technology or methodology. The main principle for a sound forensic examination of digital evidence is that the original evidence must not be modified. This is extremely difficult with mobile devices. Some forensic tools require a communication vector with the mobile device, so a standard write protection will not work during forensic acquisition. Other forensic acquisition methods may involve removing a chip or installing a bootloader on the mobile device prior to extracting data for forensic examination. In cases where the examination or data acquisition is not possible without changing the configuration of the device, the procedure and the changes must be tested, validated, and documented. Following proper methodology and guidelines is crucial in examining mobile devices, as it yields the most valuable data. As with any evidence gathering, not following the proper procedure during the examination can result in loss or damage of evidence or render it inadmissible in court.

The mobile forensics process is broken into three main categories: seizure, acquisition, and examination/analysis. Forensic examiners face some challenges while seizing the mobile device as a source of evidence. At the crime scene, if the mobile device is found switched off, the examiner should place the device in a Faraday bag to prevent changes should the device automatically power on. As shown in the following figure, Faraday bags are specifically designed to isolate the phone from the network.

A Faraday bag (Image courtesy: http://www.amazon.com/Black-Hole-Faraday-Bag-Isolation/dp/B0091WILY0)

If the phone is found switched on, switching it off has a lot of concerns attached to it. If the phone is locked by a PIN or password or is encrypted, the examiner will be required to bypass the lock or determine the PIN to access the device. Mobile phones are networked devices and can send and receive data through different sources, such as telecommunication systems, Wi-Fi access points, and Bluetooth. So, if the phone is in a running state, a criminal can securely erase the data stored on it by executing a remote wipe command. When a phone is switched on, it should be placed in a Faraday bag. If possible, prior to placing the mobile device in the Faraday bag, disconnect it from the network to protect the evidence by enabling flight mode and disabling all network connections (Wi-Fi, GPS, hotspots, and so on). This will also preserve the battery, which will drain while in a Faraday bag, and protect against leaks in the Faraday bag.

Once the mobile device is seized properly, the examiner may need several forensic tools to acquire and analyze the data stored on the phone. Mobile phones are dynamic systems that present a lot of challenges to the examiner in extracting and analyzing digital evidence. The rapid increase in the number of different kinds of mobile phones from different manufacturers makes it difficult to develop a single process or tool to examine all types of devices. Mobile phones are continuously evolving as existing technologies progress and new technologies are introduced. Furthermore, each mobile is designed with a variety of embedded operating systems. Hence, special knowledge and skills are required from forensic experts to acquire and analyze the devices.
Challenges in mobile forensics

One of the biggest forensic challenges when it comes to the mobile platform is the fact that data can be accessed, stored, and synchronized across multiple devices. As the data is volatile and can be quickly transformed or deleted remotely, more effort is required for the preservation of this data. Mobile forensics is different from computer forensics and presents unique challenges to forensic examiners. Law enforcement and forensic examiners often struggle to obtain digital evidence from mobile devices. The following are some of the reasons:

- Hardware differences: The market is flooded with different models of mobile phones from different manufacturers. Forensic examiners may come across different types of mobile models, which differ in size, hardware, features, and operating system. Also, with a short product development cycle, new models emerge very frequently. As the mobile landscape changes with each passing day, it is critical for the examiner to adapt to all the challenges and remain up to date on mobile device forensic techniques across various devices.
- Mobile operating systems: Unlike personal computers, where Windows has dominated the market for years, mobile devices widely use many operating systems, including Apple's iOS, Google's Android, RIM's BlackBerry OS, Microsoft's Windows Mobile, HP's webOS, Nokia's Symbian OS, and many others. Even within these operating systems, there are several versions, which makes the task of the forensic investigator even more difficult.
- Mobile platform security features: Modern mobile platforms contain built-in security features to protect user data and privacy. These features act as a hurdle during forensic acquisition and examination. For example, modern mobile devices come with default encryption mechanisms from the hardware layer to the software layer. The examiner might need to break through these encryption mechanisms to extract data from the devices.
- Lack of resources: As mentioned earlier, with the growing number of mobile phones, the tools required by a forensic examiner also increase. Forensic acquisition accessories, such as USB cables, batteries, and chargers for different mobile phones, have to be maintained in order to acquire those devices.
- Preventing data modification: One of the fundamental rules in forensics is to make sure that data on the device is not modified. In other words, any attempt to extract data from the device should not alter the data present on that device. But this is practically not possible with mobiles, because just switching on a device can change the data on it. Even if a device appears to be in an off state, background processes may still run. For example, in most mobiles, the alarm clock still works even when the phone is switched off. A sudden transition from one state to another may result in the loss or modification of data.
- Anti-forensic techniques: Anti-forensic techniques, such as data hiding, data obfuscation, data forgery, and secure wiping, make investigations on digital media more difficult.
- Dynamic nature of evidence: Digital evidence may be easily altered either intentionally or unintentionally. For example, browsing an application on the phone might alter the data stored by that application on the device.
- Accidental reset: Mobile phones provide features to reset everything. Resetting the device accidentally while examining it may result in the loss of data.
- Device alteration: The possible ways to alter devices may range from moving application data and renaming files to modifying the manufacturer's operating system. In this case, the expertise of the suspect should be taken into account.
- Passcode recovery: If the device is protected with a passcode, the forensic examiner needs to gain access to the device without damaging the data on it. While there are techniques to bypass the screen lock, they may not always work on all versions.
- Communication shielding: Mobile devices communicate over cellular networks, Wi-Fi networks, Bluetooth, and infrared. As device communication might alter the device data, the possibility of further communication should be eliminated after seizing the device.
- Lack of availability of tools: There is a wide range of mobile devices. A single tool may not support all of them or perform all the necessary functions, so a combination of tools needs to be used. Choosing the right tool for a particular phone might be difficult.
- Malicious programs: The device might contain malicious software or malware, such as a virus or a Trojan. Such malicious programs may attempt to spread to other devices over either a wired or a wireless interface.
- Legal issues: Mobile devices might be involved in crimes that cross geographical boundaries. In order to tackle these multijurisdictional issues, the forensic examiner should be aware of the nature of the crime and the regional laws.

Summary

Mobile devices store a wide range of information, such as SMS, call logs, browser history, chat messages, location details, and so on. Mobile device forensics includes many approaches and concepts that fall outside the boundaries of traditional digital forensics. Extreme care should be taken while handling the device, right from the evidence intake phase to the archiving phase. Examiners responsible for mobile devices must understand the different acquisition methods and the complexities of handling the data during analysis. Extracting data from a mobile device is half the battle. The operating system, security features, and type of smartphone will determine the amount of access you have to the data. It is important to follow sound forensic practices and make sure that the evidence is unaltered during the investigation.

Resources for Article:

Further resources on this subject:
- Forensics Recovery [article]
- Mobile Phone Forensics – A First Step into Android Forensics [article]
- Mobility [article]


Features of Sitecore

Packt
25 Apr 2016
17 min read
In this article by Yogesh Patel, the author of the book Sitecore Cookbook for Developers, we will discuss the importance of Sitecore and its good features. (For more resources related to this topic, see here.)

Why Sitecore?

Sitecore Experience Platform (XP) is not only an enterprise-level content management system (CMS), but rather a web framework or web platform, which is the global leader in experience management. It continues to be very popular because of its highly scalable and robust architecture, continuous innovation, and ease of implementation compared to other CMSs available. It also provides easier integration with many external platforms, such as customer relationship management (CRM), e-commerce, and so on.

Sitecore's architecture is built with the Microsoft .NET framework and provides great depth of APIs, flexibility, scalability, performance, and power to developers. It has great out-of-the-box capabilities, but one of its great strengths is the ease of extending these capabilities; hence, developers love Sitecore!

Sitecore provides many features and functionalities out of the box to help content owners and marketing teams. These features can be extended and highly customized to meet the needs of your unique business rules. Sitecore provides these features through different user-friendly interfaces that help content owners manage content and media easily and quickly. Sitecore user interfaces are supported on almost every modern browser. In addition, fully customized web applications can be layered in and integrated with other modules and tools using Sitecore as the core platform. It helps marketers to optimize the flow of content continuously for better results and more valuable outcomes. It also provides in-depth analytics, a personalized experience for end users, and marketing automation tools, which play a significant role for marketing teams. The following are a few of the many features of Sitecore.

CMS based on the .NET Framework

Sitecore provides building components on ASP.NET Web Forms as well as ASP.NET Model-View-Controller (MVC) frameworks, so developers can choose either approach to match the required architecture. Sitecore provides web controls and sublayouts when working with ASP.NET Web Forms, and view renderings, controller renderings, models, and item renderings when working with the ASP.NET MVC framework. Sitecore also provides two frameworks for preparing user interface (UI) applications for Sitecore clients: Sheer UI and SPEAK. Sheer UI applications are prepared using Extensible Application Markup Language (XAML), and most of the Sitecore applications are prepared using Sheer UI. Sitecore Process Enablement and Accelerator Kit (SPEAK) is the latest framework for developing Sitecore applications with a consistent interface quickly and easily. SPEAK gives you a predefined set of page layouts and components.

Component-based architecture

Sitecore is built on a component-based architecture, which provides us with loosely coupled, independent components. The main advantage of these components is their reusability and loosely coupled, independent behaviour. It aims to provide reusability of components at the page level, site level, and Sitecore instance level to support multisite or multitenant sites. Components in Sitecore are built with the normal layered approach, where the components are split into layers such as presentation, business logic, data, and so on.
Sitecore provides different presentation components, including layouts, sublayouts, web control renderings, MVC renderings, and placeholders. Sitecore manages different components in logical groupings by their templates, layouts, sublayouts, renderings, devices, media, content items, and so on.

Layout engine

The Sitecore layout engine extends the ASP.NET web application server to merge content with presentation logic dynamically when web clients request resources. A layout can be a web form page (.aspx) or an MVC view (.cshtml) file. A layout can have multiple placeholders to place content at predefined places, where the controls are placed. Controls can be HTML markup controls, such as a sublayout (.ascx) file or MVC view (.cshtml) file, or other renderings, such as web controls or controller renderings, which can contain business logic. Once the request criteria, such as item, language, and device, are resolved by the layout engine, it creates a platform to render the different controls and assemble their output into the relevant placeholders on the layout. The layout engine provides both static and dynamic binding, so, with dynamic binding, we can have clean HTML markup and reusability of all the controls or components. Binding of controls, layouts, and devices can be applied on Sitecore content items themselves, as shown in the following screenshot:

Once the layout engine renders the page, you can see how the controls are bound to the layout, as shown in the following image:

The layout engine in Sitecore is responsible for layout rendering, device detection, the rule engine, and personalization.

Multilingual support

In Sitecore, content can be maintained in any number of languages. It provides easier integration with external translation providers for seamless translation and also supports the dynamic creation of multilingual web pages. Sitecore also supports the language fallback feature at the field, item, and template level, which makes life easier for content owners and developers. It also supports chained fallback.

Multi-device support

Devices represent different types of web clients that connect to the Internet and place HTTP requests. Each device represents a different type of web client, and each can have unique markup requirements. As we saw, the layout engine applies the presentation components specified for the context device to the layout details of the context item. In the same way, developers can use devices to format the context item output using different collections of presentation components for various types of web clients. Dynamically assembled content can be transformed to conform to virtually any output format, such as mobile, tablet, desktop, print, or RSS. Sitecore also supports the device fallback feature, so that any web page not supported for the requesting device can still be served through the fallback device. It also supports chained fallback for devices.

Multi-site capabilities

There are many ways to manage multiple sites on a single Sitecore installation. For example, you can host multiple regional domains with different regional languages as the default language for a single site. For example, http://www.sitecorecookbook.com will serve English content, http://www.sitecorecookbook.de will serve German content of the same website, and so on. Another way is to create multiple websites for different subsidiaries or franchises of a company.
In this approach, you can share some common resources across all the sites, such as templates, renderings, user interface elements, and other content or media items, but have unique content and pages so that each website has a separate existence in Sitecore. Sitecore has security capabilities so that each franchise or subsidiary can manage their own website independently without affecting other websites. Developers have full flexibility to re-architect Sitecore's multisite architecture as per business needs. Sitecore also supports a multitenant multisite architecture so that each website can work as an individual physical website.

Caching

Caching plays a very important role in website performance. Sitecore contains multiple levels of caching such as prefetch cache, data cache, item cache, and HTML cache. Apart from this, Sitecore maintains other caches such as the standard values cache, filtered item cache, registry cache, media cache, user cache, proxy cache, AccessResult cache, and so on. This makes understanding all the Sitecore caches really important. Sitecore caching is a very vast topic to cover; you can read more about it at http://sitecoreblog.patelyogesh.in/2013/06/how-sitecore-caching-work.html.

Configuration factory

Sitecore is configured using IIS's configuration file, Web.config. The Sitecore configuration factory allows you to configure pipelines, events, scheduling agents, commands, settings, properties, and configuration nodes in Web.config files, which can be defined in the /configuration/sitecore path. Configurations inside this path can be spread out between multiple files to make it scalable. This process is often called config patching. Instead of touching the Web.config file, Sitecore provides the Sitecore.config file in the App_Config\Include directory, which contains all the important Sitecore configurations. Functionality-specific configurations are split into a number of .config files, which you can find in its subdirectories. These .config files are merged into a single configuration file at runtime, which you can evaluate using http://<domain>/sitecore/admin/showconfig.aspx. Thus, developers create custom .config files in the App_Config\Include directory to introduce, override, or delete settings, properties, configuration nodes, and attributes without touching Sitecore's default .config files. This makes managing .config files very easy from development to deployment. You can learn more about file patching from https://sdn.sitecore.net/upload/sitecore6/60/include_file_patching_facilities_sc6orlater-a4.pdf.

Dependency injection in .NET has become very common nowadays. If you want to build generic and reusable functionality, you will most likely reach for an inversion of control (IoC) framework. Fortunately, Sitecore provides a solution that allows you to easily use different IoC frameworks between projects. Using patch files, Sitecore allows you to define objects that will be available at runtime. These nodes are defined under /configuration/sitecore and can be retrieved using the Sitecore API. We can define types, constructors, methods, properties, and their input parameters in logical nodes inside the nodes of pipelines, events, scheduling agents, and so on. You can find more examples at http://sitecore-community.github.io/docs/documentation/Sitecore%20Fundamentals/Sitecore%20Configuration%20Factory/.
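To illustrate the config patching just described, here is a minimal sketch of a custom patch file that could be dropped into App_Config\Include; the setting name and value are purely illustrative assumptions, not taken from the article:

<?xml version="1.0"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Hypothetical override: patches a single setting's value
           without touching Sitecore's default .config files -->
      <setting name="MaxWorkerThreads">
        <patch:attribute name="value">100</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>

At runtime, a fragment like this is merged under /configuration/sitecore, and the result can be verified on the showconfig.aspx page mentioned above.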
Pipelines An operation to be performed in multiple steps can be carried out using the pipeline system, where each individual step is defined as a processor. Data processed from one processor is then carried to the next processor in arguments. The flow of the pipeline can be defined in XML format in the .config files. You can find default pipelines in the Sitecore.config file or patch file under the <pipelines> node (which are system processes) and the <processors> node (which are UI processes). The following image visualizes the pipeline and processors concept: Each processor in a pipeline contains a method named Process() that accepts a single argument, Sitecore.Pipelines.PipelineArgs, to get different argument values and returns void. A processor can abort the pipeline, preventing Sitecore from invoking subsequent processors. A page request traverses through different pipelines such as <preProcessRequest>, <httpRequestBegin>, <renderLayout>, <httpRequestEnd>, and so on. The <httpRequestBegin> pipeline is the heart of the Sitecore HTTP request execution process. It defines different processors to resolve the site, device, language, item, layout, and so on sequentially, which you can find in Sitecore.config as follows: <httpRequestBegin>   ...   <processor type="Sitecore.Pipelines.HttpRequest.SiteResolver,     Sitecore.Kernel"/>   <processor type="Sitecore.Pipelines.HttpRequest.UserResolver,     Sitecore.Kernel"/>   <processor type="     Sitecore.Pipelines.HttpRequest.DatabaseResolver,     Sitecore.Kernel"/>   <processor type="     Sitecore.Pipelines.HttpRequest.BeginDiagnostics,     Sitecore.Kernel"/>   <processor type="     Sitecore.Pipelines.HttpRequest.DeviceResolver,     Sitecore.Kernel"/>   <processor type="     Sitecore.Pipelines.HttpRequest.LanguageResolver,     Sitecore.Kernel"/>   ... </httpRequestBegin> There are more than a hundred pipelines, and the list goes on increasing after every new version release. Sitecore also allows us to create our own pipelines and processors. Background jobs When you need to do some long-running operations such as importing data from external services, sending e-mails to subscribers, resetting content item layout details, and so on, we can use Sitecore jobs, which are asynchronous operations in the backend that you can monitor in a foreground thread (Job Viewer) of Sitecore Rocks or by creating a custom Sitecore application. The jobs can be invoked from the user interface by users or can be scheduled. Sitecore provides APIs to invoke jobs with many different options available. You can simply create and start a job using the following code: public void Run() {   JobOptions options = new JobOptions("Job Name", "Job Category",     "Site Name", "current object", "Task Method to Invoke", new     object[] { rootItem })   {     EnableSecurity = true,     ContextUser = Sitecore.Context.User,     Priority = ThreadPriority.AboveNormal   };   JobManager.Start(options); } You can schedule tasks or jobs by creating scheduling agents in the Sitecore.config file. You can also set their execution frequency. 
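Before looking at a concrete agent configuration, here is a minimal sketch of what a custom pipeline processor might look like; the namespace, class name, and abort condition are hypothetical and only illustrate the Process() contract described in the Pipelines section above:

namespace Website.Pipelines
{
    using Sitecore.Pipelines;

    public class CheckMaintenanceMode
    {
        public void Process(PipelineArgs args)
        {
            // Placeholder for a real check (for example, a custom setting
            // or a flag item); the actual logic is up to the implementer.
            bool siteInMaintenance = false;

            if (siteInMaintenance)
            {
                // Aborting prevents Sitecore from invoking the
                // remaining processors in the pipeline.
                args.AbortPipeline();
            }
        }
    }
}

Such a processor would then be registered in the relevant pipeline through a configuration patch, as shown in the Configuration factory section.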
The following example shows you how Sitecore has configured PublishAgent, which publishes a site every 12 hours and simply executes the Run() method of the Sitecore.Tasks.PublishAgent class: <scheduling>   <agent type="Sitecore.Tasks.PublishAgent" method="Run"     interval="12:00:00">     <param desc="source database">master</param>     <param desc="target database">web</param>     <param desc="mode (full or smart or       incremental)">incremental</param>     <param desc="languages">en, da</param>   </agent> </scheduling> Apart from this, Sitecore also provides you with the facility to define scheduled tasks in the database, which has a great advantage of storing tasks in the database, so that we can handle its start and end date and time. We can use it once or make it recurring as well. Workflow and publishing Workflows are essential to the content author experience. Workflows ensure that items move through a predefined set of states before they become publishable. It is necessary to ensure that content receives the appropriate reviews and approvals before publication to the live website. Apart from workflow, Sitecore provides highly configurable security features, access permissions, and versioning. Sitecore also provides full workflow history like when and by whom the content was edited, reviewed, or approved. It also allows you to restrict publishing as well as identify when it is ready to be published. Publishing is an essential part of working in Sitecore. Every time you edit or create new content, you have to publish it to see it on your live website. When publishing happens, the item is copied from the master database to the web database. So, the content of the web database will be shown on the website. When multiple users are working on different content pages or media items, publishing restrictions and workflows play a vital role to make releases, embargoed, or go-live successful. There are three types of publishing available in Sitecore: Republish: This publishes every item even though items are already published. Smart Publish: Sitecore compares the internal revision identifier of the item in the master and web databases. If both identifiers are different, it means that the item is changed in the master database, hence Sitecore will publish the item or skip the item if identifiers are the same. Incremental Publish: Every modified item is added to the publish queue. Once incremental publishing is done, Sitecore will publish all the items found in the publish queue and clear it. Sitecore also supports the publishing of subitems as well as related items (such as publishing a content item will also publish related media items). Search Sitecore comes with out-of-the-box Lucene support. You can also switch your Sitecore search to Solr, which just needs to install Solr and enable Solr configurations already available. Sitecore by default indexes Sitecore content in Lucene index files. The Sitecore search engine lets you search through millions of items of the content tree quickly with the help of different types of queries with Lucene or Solr indexes. Sitecore provides you with the following functionalities for content search: We can search content items and documents such as PDF, Word, and so on. It allows you to search content items based on preconfigured fields. It provides APIs to create and search composite fields as per business needs. It provides content search APIs to sort, filter, and page search results. We can apply wildcards to search complex results and autosuggest. 
We can apply boosting to influence search results or elevate results by giving them more priority. We can create custom dictionaries and index files, using which we can offer "did you mean" suggestions to users. We can apply facets to refine search results, as we can see on e-commerce sites. We can apply different analyzers to find MoreLikeThis or similar results. We can tag content or media items to categorize them so that we can use features such as a tag cloud. It provides a scalable user interface to search content items and apply filters and operations to selected search results. It provides different indexing strategies to create transparent and diverse models for index maintenance. In short, Sitecore allows us to implement searching techniques similar to those available in Google and other search engines, which is especially valuable because content authors often struggle when working with a large number of items. You can read more about Sitecore search at https://doc.sitecore.net/sitecore_experience_platform/content_authoring/searching/searching.

Security model

Sitecore has the reputation of making it very easy to set up the security of users, roles, access rights, and so on. Sitecore follows the .NET security model, so we get all the basic features of .NET membership in Sitecore, which offers several advantages:

A variety of plug-and-play features provided directly by Microsoft
The option to replace or extend the default configuration with custom providers
It is also possible to store the accounts in different storage areas using several providers simultaneously
Sitecore provides item-level and field-level rights and an option to create custom rights as well
Dynamic user profile structure and role management is possible just through the user interface, which is simpler and easier compared to pure ASP.NET solutions
It provides easier implementation for integration with external systems
Even after having an extended wrapper on the .NET solution, we get the same performance as a pure ASP.NET solution

Experience analytics and personalization

Sitecore contains the state-of-the-art Analysis, Insights, Decisions, Automation (AIDA) framework, which is the heart of its marketing programs. It provides comprehensive analytics data and reports, insights from every website interaction with rules, behavior-based personalization, and marketing automation. Sitecore collects all the visitor interactions in a real-time, big data repository—the Experience Database (xDB)—to increase the availability, scalability, and performance of the website. The Sitecore Marketing Foundation provides the following features:

Sitecore uses MongoDB, a big marketing data repository that collects all customer interactions. It provides real-time data to marketers to automate interactions across all channels.
It provides a unified 360-degree view of individual website visitors and in-depth analytics reports.
It provides fundamental analytics measurement components such as goals and events to evaluate the effectiveness of online business and marketing campaigns.
It provides comprehensive conditions and actions to achieve conditional and behavioral or predictive personalization, which helps show customers what they are looking for instead of forcing them to see what we want to show.
Sitecore collects, evaluates, and processes omnichannel visitor behavioral patterns, which helps in planning more effective marketing campaigns and improving the user experience.
Sitecore provides an engagement plan to control how your website interacts with visitors.
It helps nurture relationships with your visitors by adapting personalized communication based on the engagement state they are in.
Sitecore provides an in-depth geolocation service, helpful in optimizing campaigns through segmentation, personalization, and profiling strategies.
The Sitecore Device Detection service is helpful in personalizing the user experience or promotions based on the device visitors use.
It provides different dimensions and reports to reflect data on the full taxonomy provided in the Marketing Control Panel.
It provides different charting controls to get smart reports.
It gives developers full flexibility to customize or extend all these features.

High performance and scalability

Sitecore supports heavy content management and content delivery usage with a large volume of data. Sitecore is architected for high performance and unlimited scalability. The Sitecore cache engine provides caching on the raw data as well as rendered output data, which gives a high-performance platform. Sitecore uses the event queue concept for scalability. Theoretically, it makes Sitecore scalable to any number of instances under a load balancer.

Summary

In this article, we discussed the importance of Sitecore and its key features. We also saw that Sitecore XP is not only an enterprise-level CMS, but also a web platform, which is the global leader in experience management.

Resources for Article:

Further resources on this subject:
Building a Recommendation Engine with Spark [article]
Configuring a MySQL linked server on SQL Server 2008 [article]
Features and utilities in SQL Developer Data Modeler [article]

Getting Started with Apache Hadoop and Apache Spark

Packt
22 Apr 2016
12 min read
In this article by Venkat Ankam, author of the book Big Data Analytics with Spark and Hadoop, we will understand the features of Hadoop and Spark and how we can combine them. (For more resources related to this topic, see here.) This article is divided into the following subtopics: Introducing Apache Spark and Why Hadoop + Spark?

Introducing Apache Spark

Hadoop and MapReduce have been around for 10 years and have proven to be the best solution to process massive data with high performance. However, MapReduce lacked performance in iterative computing, where the output between multiple MapReduce jobs had to be written to Hadoop Distributed File System (HDFS). In a single MapReduce job as well, it lacked performance because of the drawbacks of the MapReduce framework.

Let's take a look at the history of computing trends to understand how computing paradigms have changed over the last two decades. The trend was to reference the URI when the network was cheaper (in 1990), replicate when storage became cheaper (in 2000), and recompute when memory became cheaper (in 2010), as shown in Figure 1 (Trends of computing). So, what really changed over a period of time? Over a period of time, tape is dead, disk has become tape, and SSD has almost become disk. Now, caching data in RAM is the current trend.

Let's understand why memory-based computing is important and how it provides significant performance benefits. Figure 2 (Why memory?) indicates the data transfer rates from various mediums to the CPU. Disk to CPU is 100 MB/s, SSD to CPU is 600 MB/s, and over a network to CPU is 1 MB to 1 GB/s. However, the RAM to CPU transfer speed is astonishingly fast, at 10 GB/s. So, the idea is to cache all or part of the data in memory so that higher performance can be achieved.

Spark history

Spark started in 2009 as a research project in the UC Berkeley RAD Lab, which later became AMPLab. The researchers in the lab had previously been working on Hadoop MapReduce and observed that MapReduce was inefficient for iterative and interactive computing jobs. Thus, from the beginning, Spark was designed to be fast for interactive queries and iterative algorithms, bringing in ideas such as support for in-memory storage and efficient fault recovery. In 2011, AMPLab started to develop high-level components in Spark, such as Shark and Spark Streaming. These components are sometimes referred to as the Berkeley Data Analytics Stack (BDAS). Spark was first open sourced in March 2010 and transferred to the Apache Software Foundation in June 2013; in February 2014, it became a top-level Apache project. Spark has since become one of the largest open source communities in big data. Now, over 250+ contributors in 50+ organizations are contributing to Spark development, and the user base has increased tremendously, from small companies to Fortune 500 companies. Figure 3 shows the history of Apache Spark.

What is Apache Spark?

Let's understand what Apache Spark is and what makes it a force to reckon with in big data analytics: Apache Spark is a fast, enterprise-grade, large-scale data processing engine, which is interoperable with Apache Hadoop. It is written in Scala, which is both an object-oriented and functional programming language that runs in a JVM. Spark enables applications to distribute data in-memory reliably during processing.
This is the key to Spark's performance as it allows applications to avoid expensive disk access and performs computations at memory speeds. It is suitable for iterative algorithms by having every iteration access data through memory. Spark programs perform 100 times faster than MapReduce in-memory or 10 times faster on the disk (http://spark.apache.org/). It provides native support for Java, Scala, Python, and R languages with interactive shells for Scala, Python, and R. Applications can be developed easily and often 2 to 10 times less code is needed. Spark powers a stack of libraries including Spark SQL and DataFrames for interactive analytics, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for real-time analytics. You can combine these features seamlessly in the same application. Spark runs on Hadoop, Mesos, standalone resource managers, on-premise hardware, or in the cloud. What Apache Spark is not Hadoop provides us with HDFS for storage and MapReduce for compute. However, Spark does not provide any specific storage medium. Spark is mainly a compute engine, but you can store data in-memory or on Tachyon to process it. Spark has the ability to create distributed datasets from any file stored in the HDFS or other storage systems supported by Hadoop APIs (including your local filesystem, Amazon S3, Cassandra, Hive, HBase, Elasticsearch, and so on). It's important to note that Spark is not Hadoop and does not require Hadoop to run. It simply has support for storage systems implementing Hadoop APIs. Spark supports text files, SequenceFiles, Avro, Parquet, and any other Hadoop InputFormat. Can Spark replace Hadoop? Spark is designed to interoperate with Hadoop. It's not a replacement for Hadoop but for the MapReduce framework on Hadoop. All Hadoop processing frameworks (Sqoop, Hive, Pig, Mahout, Cascading, Crunch, and so on) using MapReduce as the engine now use Spark as an additional processing engine. MapReduce issues MapReduce developers faced challenges with respect to performance and converting every business problem to a MapReduce problem. Let's understand the issues related to MapReduce and how they are addressed in Apache Spark: MapReduce (MR) creates separate JVMs for every Mapper and Reducer. Launching JVMs takes time. MR code requires a significant amount of boilerplate coding. The programmer needs to think and design every business problem in terms of Map and Reduce, which makes it a very difficult program. One MR job can rarely do a full computation. You need multiple MR jobs to finish the complete task and the programmer needs to design and keep track of optimizations at all levels. An MR job writes the data to the disk between each job and hence is not suitable for iterative processing. A higher level of abstraction, such as Cascading and Scalding, provides better programming of MR jobs, but it does not provide any additional performance benefits. MR does not provide great APIs either. MapReduce is slow because every job in a MapReduce job flow stores data on the disk. Multiple queries on the same dataset will read the data separately and create a high disk I/O, as shown in Figure 4: Figure 4: MapReduce versus Apache Spark Spark takes the concept of MapReduce to the next level to store the intermediate data in-memory and reuse it, as needed, multiple times. This provides high performance at memory speeds, as shown in Figure 4. If I have only one MapReduce job, does it perform the same as Spark? 
No, the performance of the Spark job is superior to the MapReduce job because of in-memory computations and shuffle improvements. The performance of Spark is superior to MapReduce even when the memory cache is disabled. A new shuffle implementation (sort-based shuffle instead of hash-based shuffle), a new network module (based on netty instead of using block manager to send shuffle data), and a new external shuffle service make Spark perform the fastest petabyte sort (on 190 nodes with 46TB RAM) and terabyte sort. Spark sorted 100 TB of data using 206 EC2 i2.8x large machines in 23 minutes. The previous world record was 72 minutes, set by a Hadoop MapReduce cluster of 2,100 nodes. This means that Spark sorted the same data 3x faster using 10x less machines. All the sorting took place on the disk (HDFS) without using Spark's in-memory cache (https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html). To summarize, here are the differences between MapReduce and Spark: MapReduce Spark Ease of use Not easy to code and use Spark provides a good API and is easy to code and use Performance Performance is relatively poor when compared with Spark In-memory performance Iterative processing Every MR job writes the data to the disk and the next iteration reads from the disk Spark caches data in-memory Fault Tolerance Its achieved by replicating the data in HDFS Spark achieves fault tolerance by resilient distributed dataset (RDD) lineage Runtime Architecture Every Mapper and Reducer runs in a separate JVM Tasks are run in a preallocated executor JVM Shuffle Stores data on the disk Stores data in-memory and on the disk Operations Map and Reduce Map, Reduce, Join, Cogroup, and many more Execution Model Batch Batch, Interactive, and Streaming Natively supported Programming Languages Java Java, Scala, Python, and R Spark's stack Spark's stack components are Spark Core, Spark SQL and DataFrames, Spark Streaming, MLlib, and Graphx, as shown in Figure 5: Figure 5: The Apache Spark ecosystem Here is a comparison of Spark components versus Hadoop components: Spark Hadoop Spark Core MapReduce Apache Tez Spark SQL and DataFrames Apache Hive Impala Apache Tez Apache Drill Spark Streaming Apache Storm Spark MLlib Apache Mahout Spark GraphX Apache Giraph To understand the framework at a higher level, let's take a look at these core components of Spark and their integrations: Feature Details Programming languages Java, Scala, Python, and R. Scala, Python, and R shell for quick development. Core execution engine Spark Core: Spark Core is the underlying general execution engine for the Spark platform and all the other functionality is built on top of it. It provides Java, Scala, Python, and R APIs for the ease of development. Tungsten: This provides memory management and binary processing, cache-aware computation and code generation. Frameworks Spark SQL and DataFrames: Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as a distributed SQL query engine. Spark Streaming: Spark Streaming enables us to build scalable and fault-tolerant streaming applications. It integrates with a wide variety of data sources, including filesystems, HDFS, Flume, Kafka, and Twitter. MLlib: MLlib is a machine learning library to create data products or extract deep meaning from the data. MLlib provides a high performance because of in-memory caching of data. Graphx: GraphX is a graph computation engine with graph algorithms to build graph applications. 
Off-heap storage Tachyon: This provides reliable data sharing at memory speed within and across cluster frameworks/jobs. Spark's default OFF_HEAP (experimental) storage is Tachyon. Cluster resource managers Standalone: By default, applications are submitted to the standalone mode cluster and each application will try to use all the available nodes and resources. YARN: YARN controls the resource allocation and provides dynamic resource allocation capabilities. Mesos: Mesos has two modes, Coarse-grained and Fine-grained. The coarse-grained approach has a static number of resources just like the standalone resource manager. The fine-grained approach has dynamic resource allocation just like YARN. Storage HDFS, S3, and other filesystems with the support of Hadoop InputFormat. Database integrations HBase, Cassandra, Mongo DB, Neo4J, and RDBMS databases. Integrations with streaming sources Flume, Kafka and Kinesis, Twitter, Zero MQ, and File Streams. Packages http://spark-packages.org/ provides a list of third-party data source APIs and packages. Distributions Distributions from Cloudera, Hortonworks, MapR, and DataStax. The Spark ecosystem is a unified stack that provides you with the power of combining SQL, streaming, and machine learning in one program. The advantages of unification are as follows: No need of copying or ETL of data between systems Combines processing types in one program Code reuse One system to learn One system to maintain An example of unification is shown in Figure 6: Figure 6: Unification of the Apache Spark ecosystem Why Hadoop + Spark? Apache Spark shines better when it is combined with Hadoop. To understand this, let's take a look at Hadoop and Spark features. Hadoop features The Hadoop features are described as follows: Feature Details Unlimited scalability Stores unlimited data by scaling out HDFS Effectively manages the cluster resources with YARN Runs multiple applications along with Spark Thousands of simultaneous users Enterprise grade Provides security with Kerberos authentication and ACLs authorization Data encryption High reliability and integrity Multitenancy Wide range of applications Files: Strucutured, semi-structured, or unstructured Streaming sources: Flume and Kafka Databases: Any RDBMS and NoSQL database Spark features The Spark features are described as follows: Feature Details Easy development No boilerplate coding Multiple native APIs: Java, Scala, Python, and R REPL for Scala, Python, and R In-memory performance RDDs Direct Acyclic Graph (DAG) to unify processing Unification Batch, SQL, machine learning, streaming, and graph processing When both frameworks are combined, we get the power of enterprise-grade applications with in-memory performance, as shown in Figure 7: Figure 7: Spark applications on the Hadoop platform Frequently asked questions about Spark The following are the frequent questions that practitioners raise about Spark: My dataset does not fit in-memory. How can I use Spark? Spark's operators spill data to the disk if it does not fit in-memory, allowing it to run well on data of any size. Likewise, cached datasets that do not fit in-memory are either spilled to the disk or recomputed on the fly when needed, as determined by the RDD's storage level. By default, Spark will recompute the partitions that don't fit in-memory. The storage level can be changed to MEMORY_AND_DISK to spill partitions to the disk. 
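As a rough illustration of changing the storage level just mentioned, here is a minimal Scala sketch; the input path and filter condition are made-up assumptions, not part of the original example:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object StorageLevelDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("StorageLevelDemo"))

    // Hypothetical input; any Hadoop-supported storage system would work here
    val errors = sc.textFile("hdfs:///logs/app.log").filter(_.contains("ERROR"))

    // Partitions that do not fit in memory are spilled to local disk
    // instead of being recomputed, as with the default MEMORY_ONLY level
    errors.persist(StorageLevel.MEMORY_AND_DISK)

    println("Error lines: " + errors.count())
    sc.stop()
  }
}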
Figure 8 shows the performance difference between fully cached and on-disk data (Figure 8: Spark performance: fully cached versus on the disk).

How does fault recovery work in Spark? Spark's in-built fault tolerance, based on RDD lineage, will automatically recover from failures. Figure 9 shows the performance over a failure in the 6th iteration of a k-means algorithm (Figure 9: Fault recovery performance).

Summary

In this article, we saw an introduction to Apache Spark and the features of Hadoop and Spark, and discussed how we can combine them.

Resources for Article:

Further resources on this subject:
Adding a Spark to R [article]
Big Data Analytics [article]
Big Data Analysis (R and Hadoop) [article]
Introducing Dynamics CRM

Packt
21 Apr 2016
13 min read
In this article by Nicolae Tarla, the author of Microsoft Dynamics CRM 2016 Customization, you will learn about the Customer Relationship Management (CRM) market and the huge uptake it has seen in the last few years. Some of the drivers for this market are the need to enhance customer experience, provide faster and better services, and adapting to the customer’s growing digital presence. CRM systems, in general, are taking a central place in the new organizational initiatives. (For more resources related to this topic, see here.) Dynamics CRM is Microsoft’s response to a growing trend. The newest version is Dynamics CRM 2016. It is being offered in a variety of deployment scenarios. From the standard on-premise deployment to a private cloud or an online cloud offering from Microsoft, the choice depends on each customer, their type of project, and a large number of requirements, policies, and legal restrictions. We’ll first look at what environment we need to complete the examples presented. We will create a new environment based on a Microsoft Dynamics CRM Online trial. This approach will give us 30-day trial to experiment with an environment for free. The following topics will be covered: Introducing Dynamics CRM Dynamics CRM features Deployment models Global datacenter locations Customization requirements Getting setup Dynamics CRM 2016 is the current version of the popular Customer Relationship Management platform offered by Microsoft. This platform offers users the ability to integrate and connect data across their sales, marketing, and customer service activities, and to give staff an overall 360-degree view of all interactions and activities as they relate to a specific customer. Along with the standard platform functionality provided, we have a wide range of customization options, allowing us to extend and further customize solutions to solve a majority of other business requirements. In addition, we can integrate this platform with other applications, and create a seamless solution. While by no means the only available CRM platform on the market today, Microsoft Dynamics CRM 2016, is one of the fastest growing, gaining large acceptance at all levels from small to mid-size and enterprise level organizations. This is due to a multitude of reasons, some of which include the variety of deployment options, the scalability, the extensibility, the ease of integration with other systems, and the ease of use. Microsoft Dynamics CRM can be deployed in a variety of options. Starting with the offering from Microsoft, you can get CRM Online. Once we have a 30-day trial active, this can be easily turned into a full production environment by providing payment information and keeping the environment active. The data will live in the cloud, on one of the data centers provided by Microsoft. Alternatively, you can obtain hosting with a third-party provider. The whole environment can be hosted by a third party, and the service can be offered either as a SaaS solution, or a fully hosted environment. Usually, there is a difference in the way payment is processed, with a SaaS solution in most cases being offered in a monthly subscription model. Another option is to have the environment hosted in-house. This option is called on premise deployment and carries the highest up-front cost but gives you the ability to customize the system extensively. 
In addition to the higher up-front cost, the cost to maintain the environment, the hardware, and skilled people required to constantly administer the environment can easily add-up. As of recently, we now have the ability to host a virtual CRM environment in Azure. This offloads the cost of maintaining the local infrastructure in a fashion similar to a third-party-hosted solution but takes advantage of the scalability and performance of a large cloud solution maintained and supported fully by Microsoft. The following white paper released by Microsoft describes the deployment model using Azure Virtual Machines: http://www.microsoft.com/en-us/download/details.aspx?id=49193 Features of Dynamics CRM Some of the most notable features of the Dynamics CRM platform include: Scalability Extensibility Ability to integrate with other systems Ease of use Let’s look at each of the features in more detail. Scalability Dynamics CRM can scale over a wide range of deployment options. From a single box deployment, used mostly for development, all the way to a cloud offering that can span over a large number of servers, and host a large number of environments, the same base solution can handle all the scenarios in between with ease. Extensibility Dynamics CRM is a platform in which the base offering comes with prepackaged functionality for Sales, Service, and Marketing; a large variety of solutions can be built on top of Dynamics CRM. The extensibility model is called xRM and allows power users, non-developers, and developers alike to build custom solutions to handle various other business scenarios or integrate with other third-party platforms. The Dynamics CRM Marketplace is a great example of such solutions that are built to extend the core platform, and are offered for sale by various companies. These companies are called Independent Software Vendors (ISVs) and play a very important role in the ecosystem created by Microsoft. In time and with enough experience, some of them become the go-to partners for various implementations. If nothing else, the Dynamics Marketplace is a cool place to look at some of the solutions created, and search for specific applications. The idea of the marketplace became public sometime around 2010 and was integrated into Dynamics CRM 2011. At launch, it was designed as a searchable repository of solutions. It is a win-win for both solution providers and customers alike. Solutions can also be rated, thus giving customers better community feedback before committing to purchasing and implementing a foreign solution into their organization. The Dynamics Marketplace is hosted on Pinpoint, Microsoft’s online directory of software applications and professional services. On this platform, independent companies and certified partners offer their products and services. At the time of this writing, Pinpoint hosts a few marketplaces, including Office, Azure, Dynamics, and Cloud, and is available at the following location: https://pinpoint.microsoft.com/en-CA/Home/Dynamics Navigating to the Dynamics page you are presented with a search option as seen in the following screenshot: You now have the option to filter your results by Solution providers, Services, or Apps (Applications). 
In addition, you can further filter your results by distance to a geo-location derived from an address or postal code, as well as other categories as illustrated in the following screenshot: When searching for a solution provider, the results provide a high-level view of the organization, with a logo and a high-level description. The Ratings and Competencies count are displayed for easy visibility as shown here: Drilling down into the partner profile page, you can find additional details on the organization, the industries focus, details on the competencies, as well as a way to connect with the organization. Navigation to additional details, including Reviews and Locations is available on the profile page. The Dynamics Marketplace is also available, starting with Dynamics CRM 2011, as a part of the organization. A user with necessary permission can navigate to Settings | Dynamics Marketplace. This presents the user with a view by solutions available. Options for sorting and filtering include Popular, Newest, and Featured. Community rating is clearly visible and provides the necessary feedback to consider when evaluating new solutions. Ability to integrate with other systems There is a large variety of integration options available when working with Dynamics CRM. In addition, various deployment options offer more or fewer integration features. With CRM Online, you tend to get more integration options into cloud services; whereas, the on-premise solution has a limited number of configurable integration options, but can provide more integration using various third-party tools. The base solution comes with the ability to configure integration with the following common services: SharePoint for document management Yammer for social features In addition, you can use specific connectors provided by either Microsoft or other third-party providers for integration with specific solutions. When the preceding options are not available, you can still integrate with other solutions using a third-party integration tool. This allows real-time integration into legacy systems. Some of the most popular tools used for integration include, but are not limited to: Kingsway Software (https://www.kingswaysoft.com/) Scribe (http://www.scribesoft.com/) BizTalk (http://www.microsoft.com/en-us/server-cloud/products/biztalk/) Ease of use Dynamics CRM offers users a variety of options to interact with the system. You can access Dynamics CRM either through a browser, with support for all recent versions of the major browsers. The following browsers and versions are supported: Internet Explorer—versions 10 and above Edge—latest version Chrome—latest version on Windows 7 and above Firefox—latest version on Windows 7 and above Safari on Mac—using the latest publicly released version on OS x 10.8 and above In addition, a user can interact with the system directly from the very familiar interface of Outlook. The Dynamics CRM connector for Outlook allows users to get access to all the system data and features from within Outlook. In addition, a set of functions built specifically for Outlook allows users to track and interact with e-mails, tasks, and events from within Outlook. Further to the features provided through the Outlook integration, users of CRM for Outlook have the ability to work offline. Data can be taken offline, work can be done when disconnected, and can be synchronized back into the system when connectivity resumes. For mobile users, Dynamics CRM can be accessed from mobile devices and tablets. 
Dynamics CRM provides a standard web-based interface for most mobile devices, as well as specific applications for various platforms including Windows-based tablets, iPads, and Android tablets. With these apps, you can also take a limited sub-set of cached data offline, as well as have the ability to create new records and synchronize them back to CRM next time you go online. The quality of these mobile offerings has increased exponentially over the last few versions, and new features are being added with each new release. In addition, third-party providers have also built mobile solutions for Dynamics CRM. A quick search in the application markets for each platform will reveal several options for each platform. Global Data Centre Locations for Dynamics CRM Online Dynamics CRM Online is hosted at various locations in the world. Preview organizations can be created in all available locations, but features are sometimes rolled out on a schedule, in some locations faster than others. The format of the Dynamics CRM Online Organization URL describes the data center location. As such, the standard format is as follows: https://OrganizationName.crm[x].dynamics.com The OrganizationName is the name you have selected for your online organization. This is customizable, and is validated for uniqueness within the respective data center. The [x] represents a number. As of this writing, this number can be anywhere between 2, 4, 5, 6, 7, 9, or no number at all. This describes the global data center used to host your organization. The following table maps the data center to the URL format: URL Format: crm[x].dynamics.com Global Data Centre Location crm.dynamics.com NAM crm2.dynamics.com SAM crm4.dynamics.com EMEA crm5.dynamics.com APAC crm6.dynamics.com OCE crm7.dynamics.com JPN crm9.dynamics.com GCC Out of these global locations, usually the following get a preview of the new features first: Organization Global Location crm.dynamics.com North America crm4.dynamics.com Europe, the Middle East and Africa crm5.dynamics.com Asia-Pacific New data centers are being added on a regular basis. As of this writing, new data centers are being added in Europe and Canada, with others to follow as needed. Some of the drivers behind adding these new data centers revolve around not only performance improvements, as a data center located closer to a customer will provide theoretically better performance, but also a need for privacy and localization of data. Strict legislation around data residency has a great impact on the selection of the deployment model by customers who are bound to store all data local to the country of the operation. Overall, by the end of 2016, the plan is to have Dynamics CRM Online available in 105 markets. These markets (countries) will be served by data centers spread across five generic global regions. These data centers share services between Dynamics CRM Online and other services such as Azure and Office 365. Advantages of choosing Dynamics CRM online Choosing one of the available hosting models for Dynamics CRM is now not only a matter of preference. The decision can be driven by multiple factors. During the last few years, there has been a huge push for the cloud. Microsoft has been very focused on enhancing their online offering, and has continued to push more functionality and more resources in supporting the cloud model. As such, Dynamics CRM Online has become a force to reckon with. It is hosted on a very modern and high performing infrastructure. 
Microsoft has pushed literally billions of dollars in new data centers and infrastructure. This allows new customers to forego the necessary expenses on infrastructure associated with an on-premise deployment. Along with investments on infrastructure, the SLA (service level agreement) offered by Dynamics CRM Online is financially backed by Microsoft. Depending on the service selected, the uptime is guaranteed and backed financially. Application and Infrastructure are automatically handled for you by Microsoft so you don’t have to. This translates in much lower upfront costs, as well as reduced costs around ongoing maintenance and upgrades. The Dynamics CRM Online offering is also compliant with various regulatory requirements, and backed and verified through various third-party tests. Various rules, regulations, and policies in various locales are validated and certified by various organizations. Some of the various compliance policies evaluated include but are not limited to: Data Privacy and Confidentiality Policies Data Classification Information Security Privacy Data Stewardship Secure Infrastructure Identity and Access Control All these compliance requirements are in conformance with regulations stipulated by the International Standard Organization and other international and local standards. Independent auditors validate standards compliance. Microsoft is ISO 27001 certified. The Microsoft Trust Center website located at http://www.microsoft.com/en-us/trustcenter/CloudServices/Dynamics provides additional information on compliance, responsibilities, and warranties. Further to the aforementioned benefits, choosing cloud over a standard on-premise deployment offers other advantages around scalability, faster time to market, and higher value proposition. In addition to the standard benefits of an online deployment, one other great advantage is the ability to spin-up a 30-day trial instance of Dynamics CRM Online and convert it to a paid instance only when ready to go to production. This allows customizers and companies to get started and customize their solution in a free environment, with no additional costs attached. The 30-day trial instance gives us a 25-license instance, which allows us to not only customize the organization, but also test various roles and restrictions. Summary We learned to create a new environment based on a Microsoft Dynamics CRM Online trial Resources for Article: Further resources on this subject: Customization in Microsoft Dynamics CRM[article] Introduction to Reporting in Microsoft Dynamics CRM[article] Using Processes in Microsoft Dynamics CRM 2011[article]
Reusability patterns

Packt
21 Apr 2016
17 min read
In this article by Jaime Soriano Pastor and Alessandro Franceschi, the authors of Extending Puppet - Second Edition, you will learn that module reusability is a topic that has received more and more attention in the past few years; as more people started to use Puppet, the need for common, shared code to manage common things became more evident. The main characteristics of reusable modules are:

They can be used by different people without the need to modify their content
They support different OSes, and allow easy extension to new ones
They allow users to override the default files provided by the module
They might have an opinionated approach to the managed resources but don't force it
They follow a single responsibility principle and should manage only the application they are made for

Reusability, we must underline, is not an all-or-nothing feature; we might have different levels of reusability to fulfill the needs of a varying percentage of users. For example, a module might support Red Hat and Debian derivatives, but not Solaris or AIX; is it reusable? If we use the latter OSes, definitely not; if we don't use them, then yes, for us it is reusable. I am personally a bit extreme about reusability, and in my opinion a module should also:

Allow users to provide alternative classes for eventual dependencies from other modules, to ease interoperability
Allow any kind of treatment of the managed configuration files, be that file- or setting-based
Allow alternative installation methods
Allow users to provide their own classes for users or other resources, which could be managed in custom and alternative ways
Allow users to modify the default settings (calculated inside the module according to the underlying OS) for package and service names, file paths, and other more or less internal variables that are not always exposed as parameters
Expose parameters that allow removal of the resources provided by the module (this is a functionality feature more than a reusability one)
Abstract monitoring and firewalling features so that they are not directly tied to specific modules or applications

Managing files

Everything is a file in UNIX, and most of the time Puppet manages files. A module can expose parameters that allow its users to manipulate configuration files, and it can follow one or both of the file/setting approaches, as they are not mutually exclusive and can coexist. To manage the contents of a file, Puppet provides different alternative solutions:

Use templates, populated with variables that come from parameters, facts, or any scope (as the content argument for the file type: content => template('modulename/path/templatefile.erb'))
Use static files, served by the Puppet server
Manage the file content via concat (https://github.com/puppetlabs/puppetlabs-concat), a module that provides resources to build a file by joining different fragments
Manage the file contents via augeas, a native type that interfaces with the Augeas configuration editing tool (http://augeas.net/) Manage the contents with alternative in-file line editing tools For the first two cases, we can expose parameters that allow to define the module's main configuration file either directly via the source and content arguments, or by specifying the name of the template to be passed to the template() function: class redis (   $config_file           = $redis::params::file,   $config_file_source    = undef,   $config_file_template  = undef,   $config_file_content   = undef,   ) { Manage the configuration file arguments with: $managed_config_file_content = $config_file_content ? {     undef   => $config_file_template ? {       undef   => undef,       default => template($config_file_template),     },     default => $config_file_content,   } The $managed_config_file_content variable computed here takes the value of the $config_file_content, if present; otherwise, it uses the template defined with $config_file_template. If also this parameter is unset, the value is undef: if $redis::config_file {     file { 'redis.conf':       path    => $redis::config_file,       source  => $redis::config_file_source,       content => $redis::managed_config_file_content,     }   } } In this way, users can populate redis.conf via a custom template (placed in the site module): class { 'redis':   config_file_template => 'site/redis/redis.conf.erb', } Otherwise, they can also provide the content attribute directly: class { 'redis':   config_file_content => template('site/redis/redis.conf.erb'), } Finally, they can also provide a fileserver source path: class { 'redis':   config_file_source => 'puppet:///modules/site/redis/redis.conf', } In case users prefer to manage the file in other ways (augeas, concat, and so on), they can just include the main class, which, by default, does not manage the configuration file's contents and uses whatever method to alter them: class { 'redis': } A good module could also provide custom defines that allow easy and direct ways to alter configuration files' single lines, either using Augeas or other in-file line management tools. Managing configuration hash patterns If we want a full infrastructure as data setup and be able to manage all our configuration settings as data, we can follow two approaches, regarding the number, name, and kind of parameters to expose: Provide a parameter for each configuration entry we want to manage Provide a single parameter that expects a hash where any configuration entry may be defined The first approach requires a substantial and ongoing effort, as we have to keep our module's classes updated with all the current and future configuration settings our application may have. Its benefit is that it allows us to manage them as plain and easily readable data on, for example, Hiera YAML files. Such an approach is followed, for example, by the OpenStack modules (https://github.com/stackforge) where the configurations of the single components of OpenStack are managed on a settings-based approach, which is fed by the parameters of various classes and subclasses. For example, the Nova module (https://github.com/stackforge/puppet-nova) has many subclasses where the parameters that map to Nova's configuration entries are exposed and are applied via the nova_config native type, which is a basically a line editing tool that works line by line. 
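To make the first approach (one class parameter per configuration entry) more concrete, here is a minimal sketch under stated assumptions: the myapp module, its configuration file path, and its settings are hypothetical, and the ini_setting type is assumed to come from the puppetlabs-inifile module:

class myapp::config (
  $port         = '8080',
  $bind_address = '0.0.0.0',
) {

  # Each class parameter maps to exactly one setting in the
  # application's configuration file
  ini_setting { 'myapp port':
    ensure  => present,
    path    => '/etc/myapp/myapp.conf',
    section => 'main',
    setting => 'port',
    value   => $port,
  }

  ini_setting { 'myapp bind_address':
    ensure  => present,
    path    => '/etc/myapp/myapp.conf',
    section => 'main',
    setting => 'bind_address',
    value   => $bind_address,
  }
}

The benefit is that every setting remains plain, typed data that can be looked up from Hiera; the cost is that the class has to be kept in sync with the application's configuration options.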
An alternative and quicker approach is to just define a single parameter, like config_file_options_hash that accepts any settings as a hash: class openssh (   $config_file_options_hash   = { }, } Then, manage in a custom template the hash, either via a custom function, like the hash_lookup() provided by the stdmod shared module (https://github.com/stdmod/stdmod): # File Managed by Puppet [...]   Port <%= scope.function_hash_lookup(['Port','22']) %>   PermitRootLogin <%= scope.function_hash_lookup(['PermitRootLogin','yes']) %>   UsePAM <%= scope.function_hash_lookup(['UsePAM','yes']) %> [...] Otherwise, refer directly to a specific key of the config_file_options_hash parameter:  Port <%= scope.lookupvar('openssh::config_file_options_hash')['Port'] ||= '22' %>   PermitRootLogin <%= scope.lookupvar('openssh::config_file_options_hash')['PermitRootLogin'] ||= 'yes' %>   UsePAM <%= scope.lookupvar('openssh::config_file_options_hash')['UsePAM'] ||= 'yes' %> [...] Needless to say that Hiera is a good place to define these parameters; on a YAML-based backend, we can set these parameters with:  --- openssh::config_file_template: 'site/openssh/sshd_config.erb' 

openssh::config_file_options_hash:   Port: '22222'   PermitRootLogin: 'no' Otherwise, if we prefer to use an explicit parameterized class declaration: class { 'openssh':   config_file_template     => 'site/openssh/sshd_config.erb'   config_file_options_hash => {     Port            => '22222',     PermitRootLogin => 'no',   } } Managing multiple configuration files An application may have different configuration files and our module should provide ways to manage them. In these cases, we may have various options to implement in a reusable module: Expose parameters that let us configure the whole configuration directory Expose parameters that let us configure specific extra files Provide a general purpose define that eases management of configuration files To manage the whole configuration directory these parameters should be enough: class redis (   $config_dir_path            = $redis::params::config_dir,   $config_dir_source          = undef,   $config_dir_purge           = false,   $config_dir_recurse         = true,   ) {   $config_dir_ensure = $ensure ? {     'absent'  => 'absent',     'present' => 'directory',   }   if $redis::config_dir_source {     file { 'redis.dir':       ensure  => $redis::config_dir_ensure,       path    => $redis::config_dir_path,       source  => $redis::config_dir_source,       recurse => $redis::config_dir_recurse,       purge   => $redis::config_dir_purge,       force   => $redis::config_dir_purge,     }   } } Such a code would allow providing a custom location, on the Puppet Master, to use as source for the whole configuration directory:  class { 'redis':   config_dir_source => 'puppet:///modules/site/redis/conf/', } Provide a custom source for the whole config_dir_path and purge any unmanaged config file; all the destination files not present on the source directory would be deleted. Use this option only when we want to have complete control on the contents of a directory:  class { 'redis':   config_dir_source => [                   "puppet:///modules/site/redis/conf--${::fqdn}/", 
                  "puppet:///modules/site/redis/conf-${::role}/",                   'puppet:///modules/site/redis/conf/' ],   config_dir_purge  => true, } Consider that the source files, in this example, placed in the site module according to a naming hierarchy that allows overrides per node or role name, can only be static and not templates. If we want to provide parameters that allow direct management of alternative extra files, we can add parameters such as the following (stdmod compliant): class postgresql (   $hba_file_path             = $postgresql::params::hba_file_path,   $hba_file_template         = undef,   $hba_file_content          = undef,   $hba_file_options_hash     = { } ,   ) { […] } Finally, we can place in our module a general purpose define that allows users to provide the content for any file in the configuration directory. Here is an example https://github.com/example42/puppet-pacemaker/blob/master/manifests/conf.pp The usage is as easy as: pacemaker::conf { 'authkey':   source => 'site/pacemaker/authkey', } Managing users and dependencies Sometimes a module has to create a user or have some prerequisite packages installed in order to have its application running correctly. These are the kind of "extra" resources that can create conflicts among modules, as we may have them already defined somewhere else in the catalog via other modules. For example, we may want to manage users in our own way and don't want them to be created by an application module, or we may already have classes that manage the module's prerequisite. There's not a universally defined way to cope with these cases in Puppet, if not the principle of single point of responsibility, which might conflict with the need to have a full working module, when it requires external prerequisites. My personal approach, which I've not seen being used around, is to let the users define the name of alternative classes, if any, where such resources can be managed. On the code side, the implementation is quite easy: class elasticsearch (   $user_class          = 'elasticsearch::user',   ) { [...]   if $elasticsearch::user_class {     require $elasticsearch::user_class   } Also, of course, in elasticsearch/manifests/user.pp, we can define the module's default elasticsearch::user class. Module users can provide custom classes with: class { 'elasticsearch':   user_class => 'site::users::elasticsearch', } Otherwise, they decide to manage users in other ways and unset any class name: class { 'elasticsearch':   user_class => '', } Something similar can be done for a dependency class or other classes. In an outburst of a reusability spree, in some cases, I added parameters to let users define alternative classes for the typical module classes: class postgresql (   $install_class             = 'postgresql::install',   $config_class              = 'postgresql::config',   $setup_class               = 'postgresql::setup',   $service_class             = 'postgresql::service',   [… ] ) { […] } Maybe this is really too much, but, for example, giving users the option to define the install class to use, and have it integrated in the module's own relationships logic, may be useful for cases where we want to manage the installation in a custom way. Managing installation options Generally, it is recommended to always install applications via packages, eventually to be created onsite when we can't find fitting public repositories. 
Still, sometimes, we might need to, have to, or want to install an application in other ways; for example just downloading its archive, extracting it, and eventually compiling it. It may not be a best practice, but still it can be done, and people do it. Another reusability feature a module may provide is alternative methods to manage the installation of an application. Implementation may be as easy as: class elasticsearch (   $install_class       = 'elasticsearch::install',   $install             = 'package',   $install_base_url    = $elasticsearch::params::install_base_url,   $install_destination = '/opt',   ) { These options expose both the install method to be used, the name of the installation class (so that it can be overridden), the URL from where to retrieve the archive, and the destination at which to install it. In init.pp, we can include the install class using the parameter that sets its name: include $install_class In the default install class file (here install.pp) manage the install parameter with a case switch: class elasticsearch::install {   case $elasticsearch::install {     package: {       package { $elasticsearch::package:         ensure   => $elasticsearch::managed_package_ensure,         provider => $elasticsearch::package_provider,       }     }     upstream: {       puppi::netinstall { 'netinstall_elasticsearch':         url             => $elasticsearch::base_url,         destination_dir => $elasticsearch::install_destination,         owner           => $elasticsearch::user,         group           => $elasticsearch::user,       }     }     default: { fail('No valid install method defined') }   } } The puppi::netinstall defined in the preceding code comes from a module of mine (https://github.com/example42/puppi) and it's used to download, extract, and eventually execute custom commands on any kind of archive. Users can therefore define which installation method to use with the install parameter and they can even provide another class to manage in a custom way the installation of the application. Managing extra resources Many times, we have in our environment some customizations that cannot be managed just by setting different parameters or names. Sometimes, we have to create extra resources, which no public module may provide as they are too custom and specific. While we can place these extra resources in any class, we may include in our nodes; it may be useful to link this extra class directly to our module, providing a parameter that lets us specify the name of an extra custom class, which, if present, is included (and contained) by the module: class elasticsearch (   $my_class            = undef,   ) { [...]   if $elasticsearch::my_class {     include $elasticsearch::my_class     Class[$elasticsearch::my_class] ->      Anchor['elasticsearch::end']   } } Another method to let users create extra resources by passing a parameter to a class is based on the create_resources function. We have already seen it; it creates all the resources of a given type from a nested hash where their names and arguments can be defined. Here is an example from https://github.com/example42/puppet-network: class network (   $interfaces_hash           = undef,   […] ) { […]   if $interfaces_hash {     create_resources('network::interface', $interfaces_hash)   } } In this case, the type used is network::interface (provided by the same module) and it can be fed with a hash. 
On Hiera, with the YAML backend, it could look like this:

---
network::interfaces_hash:
  eth0:
    method: manual
    bond_master: 'bond3'
    allow_hotplug: 'bond3 eth0 eth1 eth2 eth3'
  eth1:
    method: manual
    bond_master: 'bond3'
  bond3:
    ipaddress: '10.10.10.3'
    netmask: '255.255.255.0'
    gateway: '10.10.10.1'
    dns_nameservers: '8.8.8.8 8.8.4.4'
    bond_mode: 'balance-alb'
    bond_miimon: '100'
    bond_slaves: 'none'

Summary

As we can imagine, the usage patterns that such a function allows are quite wide and interesting. Being able to base all the information we need to create a resource on pure data can definitely shift most of the logic and implementation that would otherwise be done with Puppet code and normal resources to the data backend.
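As a closing illustration of this data-driven approach, the same hash could also be passed to the class directly in Puppet code instead of through Hiera. The following is only a sketch for comparison, reusing a subset of the illustrative interface names and values from the example above; it is not taken from the module's documentation:

# Explicit class declaration equivalent to the Hiera data above (illustrative values)
class { 'network':
  interfaces_hash => {
    'eth0'  => {
      'method'      => 'manual',
      'bond_master' => 'bond3',
    },
    'bond3' => {
      'ipaddress' => '10.10.10.3',
      'netmask'   => '255.255.255.0',
      'gateway'   => '10.10.10.1',
      'bond_mode' => 'balance-alb',
    },
  },
}

Whichever way the data is supplied, create_resources('network::interface', $interfaces_hash) turns each key into a network::interface resource, with the nested hash providing its arguments.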
Hello World Program

Packt
20 Apr 2016
12 min read
In this article by Manoj Kumar, author of the book Learning Sinatra, we will write an application. Make sure that you have Ruby installed. We will get a basic skeleton app up and running and see how to structure the application. (For more resources related to this topic, see here.)

In this article, we will discuss the following topics:

- A project that will be used to understand Sinatra
- The Bundler gem
- The file structure of the application
- The responsibilities of each file

Before we begin writing our application, let's write the Hello World application.

Getting started

The Hello World program is as follows:

require 'sinatra'

get '/' do
  return 'Hello World!'
end

Save this code in a file named helloworld.rb and run it from the command line:

ruby helloworld.rb

Executing this will run the application and the server will listen on port 4567. If we point our browser to http://localhost:4567/, the output will be as shown in the following screenshot:

The application

To understand how to write a Sinatra application, we will take a small project and discuss every part of the program in detail.

The idea

We will make a ToDo app and use Sinatra along with a lot of other libraries. The features of the app will be as follows:

- Each user can have multiple to-do lists
- Each to-do list will have multiple items
- To-do lists can be private, public, or shared with a group
- Items in each to-do list can be assigned to a user or group

The modules that we will build are as follows:

- Users: This will manage the users and groups
- List: This will manage the to-do lists
- Items: This will manage the items for all the to-do lists

Before we start writing the code, let's see what the file structure will be like, understand why each of these files is required, and learn about some new files.

The file structure

It is always better to keep certain files in certain folders for better readability. We could dump all the files in the home folder; however, that would make it difficult for us to manage the code:

The app.rb file

This file is the base file that loads all the other files (such as models, libs, and so on) and starts the application. We can configure various settings of Sinatra here according to the various deployment environments.

The config.ru file

The config.ru file is generally used when we need to deploy our application with different application servers, such as Passenger, Unicorn, or Heroku. It also makes it easy to maintain the different deployment environments using config.ru.

Gemfile

This is one of the more interesting things that we can do with Ruby applications. As we know, we can use a variety of gems for different purposes. The gems are just pieces of code and are constantly updated. Therefore, sometimes, we need to use specific versions of gems to maintain the stability of our application. We list all the gems that we are going to use for our application with their versions. Before we discuss how to use this Gemfile, we will talk about the gem bundler.

Bundler

The gem bundler manages the installation of all the gems and their dependencies. Of course, we would need to install the gem bundler manually:

gem install bundler

This will install the latest stable version of the bundler gem. Once we are done with this, we need to create a new file with the name Gemfile (yes, with a capital G) and add the gems that we will use. It is not necessary to add all the gems to Gemfile before starting to write the application.
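To make this concrete, a first cut of the ToDo app's Gemfile might contain nothing more than a source and the sinatra gem; the version constraint below is only a placeholder, and the exact versions used in this article appear in the Gemfile section near the end:

source 'https://rubygems.org'

# Placeholder constraint; we will pin exact versions later
gem 'sinatra', '~> 1.4'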
We can add and remove gems as we require; however, after every change, we need to run the following: bundle install This will make sure that all the required gems and their dependencies are installed. It will also create a 'Gemfile.lock' file. Make sure that we do not edit this file. It contains all the gems and their dependencies information. Therefore, we now know why we should use Gemfile. This is the lib/routes.rb path for folder containing the routes file. What is a route? A route is the URL path for which the application serves a web page when requested. For example, when we type http://www.example.com/, the URL path is / and when we type http://www.example.com/something/, /something/ is the URL path. Now, we need to explicitly define all the routes for which we will be serving requests so that our application knows what to return. It is not important to have this file in the lib folder or to even have it at all. We can also write the routes in the app.rb file. Consider the following examples: get '/' do # code end post '/something' do # code end Both of the preceding routes are valid. The get and post method are the HTTP methods. The first code block will be executed when a GET request is made on / and the second one will be executed when a POST request is made on /something. The only reason we are writing the routes in a separate file is to maintain clean code. The responsibility of each file will be clearly understood in the following: models/: This folder contains all the files that define model of the application. When we write the models for our application, we will save them in this folder. public/: This folder contains all our CSS, JavaScript, and image files. views/: This folder will contain all the files that define the views, such as HTML, HAML, and ERB files. The code Now, we know what we want to build. You also have a rough idea about what our file structure would be. When we run the application, the rackup file that we load will be config.ru. This file tells the server what environment to use and which file is the main application to load. Before running the server, we need to write a minimum code. It includes writing three files, as follows: app.rb config.ru Gemfile We can, of course, write these files in any order we want; however, we need to make sure that all three files have sufficient code for the application to work. Let's start with the app.rb file. The app.rb file This is the file that config.ru loads when the application is executed. This file, in turn, loads all the other files that help it to understand the available routes and the underlying model: 1 require 'sinatra' 2 3 class Todo < Sinatra::Base 4 set :environment, ENV['RACK_ENV'] 5 6 configure do 7 end 8 9 Dir[File.join(File.dirname(__FILE__),'models','*.rb')].each { |model| require model } 10 Dir[File.join(File.dirname(__FILE__),'lib','*.rb')].each { |lib| load lib } 11 12 end What does this code do? Let's see what this code does in the following: 1 require 'sinatra' //This loads the sinatra gem into memory. 3 class Todo < Sinatra::Base 4 set :environment, ENV['RACK_ENV'] 5 6 configure do 7 end 8 9 Dir[File.join(File.dirname(__FILE__),'models','*.rb')].each { |model| require model } 10 Dir[File.join(File.dirname(__FILE__),'lib','*.rb')].each { |lib| load lib } 11 12 end This defines our main application's class. This skeleton is enough to start the basic application. We inherit the Base class of the Sinatra module. 
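Since app.rb loads every file in lib/, including lib/routes.rb, it may help to see roughly what that routes file could contain for our ToDo app. The following is only an illustrative sketch based on the structure described so far; the paths and response bodies are placeholders rather than code from the book:

# lib/routes.rb - illustrative sketch only
class Todo < Sinatra::Base
  # List all to-do lists
  get '/lists' do
    'All to-do lists'
  end

  # Show the items of a single list
  get '/lists/:id' do
    "Items for list #{params[:id]}"
  end

  # Create a new list
  post '/lists' do
    'Created a new to-do list'
  end
end

Keeping the routes in lib/ like this leaves app.rb focused on configuration and loading, which is what the rest of this section walks through.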
Before starting the application, we may want to change some basic configuration settings such as logging, error display, user sessions, and so on. We handle all these configurations through the configure blocks. Also, we might need different configurations for different environments. For example, in development mode, we might want to see all the errors; however, in production we don’t want the end user to see the error dump. Therefore, we can define the configurations for different environments. The first step would be to set the application environment to the concerned one, as follows: 4 set :environment, ENV['RACK_ENV'] We will later see that we can have multiple configure blocks for multiple environments. This line reads the system environment RACK_ENV variable and sets the same environment for the application. When we discuss config.ru, we will see how to set RACK_ENV in the first place: 6 configure do 7 end The following is how we define a configure block. Note that here we have not informed the application that to which environment do these configurations need to be applied. In such cases, this becomes the generic configuration for all the environments and this is generally the last configuration block. All the environment-specific configurations should be written before this block in order to avoid code overriding: 9 Dir[File.join(File.dirname(__FILE__),'models','*.rb')].each { |model| require model } If we see the file structure discussed earlier, we can see that models/ is a directory that contains the model files. We need to import all these files in the application. We have kept all our model files in the models/ folder: Dir[File.join(File.dirname(__FILE__),'models','*.rb')] This would return an array of files having the .rb extension in the models folder. Doing this, avoids writing one require line for each file and modifying this file again: 10 Dir[File.join(File.dirname(__FILE__),'lib','*.rb')].each { |lib| load lib } Similarly, we will import all the files in the lib/ folder. Therefore, in short, the app.rb configures our application according to the deployment environment and imports the model files and the other library files before starting the application. Now, let's proceed to write our next file. The config.ru file The config.ru is the rackup file of the application. This loads all the gems and app.rb. We generally pass this file as a parameter to the server, as follows: 1 require 'sinatra' 2 require 'bundler/setup' 3 Bundler.require 4 5 ENV["RACK_ENV"] = "development" 6 7 require File.join(File.dirname(__FILE__), 'app.rb') 8 9 Todo .start! W Working of the code Let's go through each of the lines, as follows: 1 require 'sinatra' 2 require 'bundler/setup' The first two lines import the gems. This is exactly what we do in other languages. The gem 'sinatra' command will include all the Sinatra classes and help in listening to requests, while the bundler gem will manage all the other gems. As we have discussed earlier, we will always use bundler to manage our gems. 3 Bundler.require This line of the code will check Gemfile and make sure that all the gems available match the version and all the dependencies are met. This does not import all the gems as all gems may not be needed in the memory at all times: 5 ENV["RACK_ENV"] = "development" This code will set the system environment RACK_ENV variable to development. This will help the server know which configurations does it need to use. 
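As noted above, we can have more than one configure block, one per environment, and the RACK_ENV value decides which of them apply. A minimal sketch of how that could look in app.rb follows; the individual settings (logging and exception display) are common Sinatra options picked purely for illustration, not settings prescribed by the book:

# Illustrative environment-specific configuration inside the Todo class
configure :development do
  enable :logging             # verbose output while developing
  set :show_exceptions, true  # show full error pages in the browser
end

configure :production do
  set :show_exceptions, false # do not leak stack traces to end users
end

configure do
  # generic settings shared by all environments go last
end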
We will later see how to manage a single configuration file with different settings for different environments and use one particular set of configurations for the given environment. If we use version control for our application, config.ru is not version controlled. It has to be customized on whether our environment is development, staging, testing, or production. We may version control a sample config.ru. We will discuss this when we talk about deploying our application. Next, we will require the main application file, as follows: 7 require File.join(File.dirname(__FILE__), 'app.rb') We see here that we have used the File class to include app.rb: File.dirname(__FILE__) It is a convention to keep config.ru and app.rb in the same folder. It is good practice to give the complete file path whenever we require a file in order to avoid breaking the code. Therefore, this part of the code will return the path of the folder containing config.ru. Now, we know that our main application file is in the same folder as config.ru, therefore, we do the following: File.join(File.dirname(__FILE__), 'app.rb') This would return the complete file path of app.rb and the line 7 will load the main application file in the memory. Now, all we need to do is execute app.rb to start the application, as follows: 9 Todo .start! We see that the start! method is not defined by us in the Todo class in app.rb. This is inherited from the Sinatra::Base class. It starts the application and listens to incoming requests. In short, config.ru checks the availability of all the gems and their dependencies, sets the environment variables, and starts the application. The easiest file to write is Gemfile. It has no complex code and logic. It just contains a list of gems and their version details. Gemfile In Gemfile, we need to specify the source from where the gems will be downloaded and the list of the gems. Therefore, let's write a Gemfile with the following lines: 1 source 'https://rubygems.org' 2 gem 'bundler', '1.6.0' 3 gem 'sinatra', '1.4.4' The first line specifies the source. The https://rubygems.org website is a trusted place to download gems. It has a large collection of gems hosted. We can view this page, search for gems that we want to use, read the documentation, and select the exact version for our application. Generally, the latest stable version of bundler is used. Therefore, we search the site for bundler and find out its version. We do the same for the Sinatra gem. Summary In this article, you learned how to build a Hello World program using Sinatra. Resources for Article: Further resources on this subject: Getting Ready for RubyMotion[article] Quick start - your first Sinatra application[article] Building tiny Web-applications in Ruby using Sinatra[article]
Working With Compliance

Packt
19 Apr 2016
10 min read
In this article by Abhijeet Shriram Janwalkar, the author of VMware vRealize Configuration Manager Cookbook, we will discuss how to check compliance, create exceptions so that we don't get any false positives, and finally, create some Alert Rules that will alert us when non-compliant rules are found. (For more resources related to this topic, see here.) Checking the compliance of the infrastructure After creating all the rules, rule groups, and templates, we need to check the compliance of the infrastructure. We will learn how to check how compliant we are against internal standards, or we can directly use standard compliance packs we have already downloaded and imported. We will use a standard imported template for this recipe. Getting ready All the heavy lifting should have been done on the VCM server: it should be ready with the templates and at least one machine group, which will have the machines for whom we need to check the compliance, or we can use the default machine groups available. Using our own machine group is preferable. How to do it... As mentioned earlier, we will use an imported standard template, International Organization for Standardization 27001-27002- Windows 2008 R2 Mbr Server Controls, and we will run this against the default machine group, All Machines. Follow these steps to check the compliance of the Windows servers: Once logged in to VCM, go to Compliance | Templates Make sure the correct machine group is selected from the top; this is how VCM decides which machines to apply the template to measure the compliance. If you want to change the machine group, click on the Machine Group, and from the popup, select the correct machine group. Select the required template from the right-hand side and click on Run.   Depending upon the organization policies, decide to enforce or not enforce the compliance. Not all rules are enforceable; also, we can cause issues such as breaking a working application; for example, if the print spooler service is required to be disabled and we disable the service when we enforce the compliance, this will create an issue on the printer farm as it will stop functioning. So it is better that we first learn what is non-compliant and then make necessary exceptions. We can then enforce compliance from VCM or can ask the respective server owners to take the necessary action.   In a few minutes, depending on how many machines there are to check in the machine group, the compliance run will finish. Click on Close.   The compliance status can be viewed by navigating to the template on the left-hand side and selecting the correct machine group from the top.   In our case, our support team needs to work a lot as we are non-compliant. How it works... When we ask VCM to check compliance, it first applies the filters available in the rule groups, and then, only the machines that pass those filters are considered. The compliance checks are performed on the data collected by VCM and are available in the database, unlike some other tools that perform the checks at the client end, after which the client submits the data to VCM. The process followed by VCM is better as this can be performed on servers that are offline at that time, and when we check the result, we get the value because of which the machine is non-compliant for a rule. Again, this has some issues as well: first, we need to make sure our VCM is clean. 
By this, I mean whether a machine is purged from VCM when it is decommissioned, or else we will have details of machines that are not present in the infrastructure, and that could affect our final compliance score. The second issue is that it does not give us live details as it works on the data in its database; again, this can produce false positives. To counter this issue, we can schedule a compliance check after a full data collection for that machine group, in which way we will not have stale data to process. Once the compliance has been checked, and if we have chosen to enforce the compliance, it will create jobs to enforce them and will start executing on the managed machines; for example, if we have rules to check the status of a service and expect certain services in the running state, then VCM will start those services. Creating compliance exceptions As you know, every rule has an exception, and this is applicable to compliance as well: you create a rule for blocking the SMTP port on all the servers, and then, you have mail servers that need this port active. Now, we can't block the port, but at the same time, we know this is a known and accepted deviation; hence, we don't want our compliance score to suffer a hit because of this. To solve this, what we can do is add an exception so that this will not create issues while checking compliance. Getting ready Our organization has a policy to disable unwanted services on servers, and the print spooler is considered an unwanted service, so it must be disabled on all the servers but, of course, the exceptions are the print servers. We will create an exception for the print server machine group to be excused from this mandate. We will need rules created in VCM along with a machine group that will include all the print servers. How to do it... Let's create an exception for our print servers by following these steps: Log in to VCM and go to Compliance | Machine Group Compliance | Exceptions. Click on Add.   Provide a descriptive Name and Description, and click on Next.   Select the template for which you want this machine group to be excluded. In this case, we are selecting the one created by our organization rules. Click on Next.   Select the machine group created for this exception; in our case, it is named Print Servers. Click on Next. Select Override non-compliant results to compliant.  I really don't know why there is another option, but there must be a use case that I am not aware of.   We want this exception only for our rule for the print spooler server, called Service_Print_Spooler; so select that rule. Depending upon you requirement, you can have the exception for a complete rule group as well. But having exception for a single rule is sufficient in our case. Click on Finish.   You can enable/disable this exception as per requirement.   How it works... Exceptions are considered when we do a compliance check, and a final score is calculated. By creating an exception, we make sure that we don't get a bad score just because we need to have some things non-compliant. Also, this helps when we are enforcing compliance like in the earlier case, where we enforced Service Status to be disabled then VCM disabled the print spooler service on all the servers including the print servers, and that would have affected productivity. So, creating compliance exception is a win-win situation for both teams: the security team has a nice compliant environment and the printer admin team has a working print farm. 
Creating compliance alert rules Nobody likes to wait and nobody likes to work on Excel, so what if we get a ticket in our ticketing tool if a managed machine is non-complaint. We can create alerts and then maybe integrate them with a ticketing tool that can create a ticket for us, or VCM can send an e-mail to configured e-mail IDs. Getting ready We will need a working VCM server that is configured to check compliance. How to do it... This is a two-step process; first, we need to create an alert rule and then associate the created alert with the machine group. So, let's begin creating a compliance alert: Log in to VCM and go to Administration | Alerts | Rules. Click on Add.   In the wizard, give descriptive name and add a description. Click on Next. Select Compliance Results Data as the Data Type and click on Next.   As we want to create an alert for the non-compliancy of the rules created for our organization standards, select the appropriate compliance template. If we want an alert for the ISO 27001-27002 standard, we should have opted for that template.   On the next page, accept the newly create rule by clicking on Finish (the button is not in the screenshot).   The next process is to associate this Rule to correct Machine group. So we will continue to step 6. Now move to Administration | AlertsàMachine Group Configuration, and select the Machine group for which you would like the alert to be generated and click Add.   Select the alert we created and click on Next.   Select Severity and click on Next (not shown in the screenshot).   Select the actions that need to be done when the alert is created: we can send an e-mail, we can send an SNMP trap to a monitoring system or VCO that will create an alert in the organization ticketing system, or we can write the log to the Windows event log, and then, from there it will be picked up by the monitoring system to create a ticket. We are choosing to send an e-mail to the concerned people or teams.   Provide the details of who should receive the e-mail, the sender's e-mail ID, the SMTP server, e-mail subject, and modify the message body.   If required, you can check for alerts and click on Finish to close the wizard (the button is not shown in the screenshot).   The alerts can be seen at Console| Alerts.   How it works... We can't just depend on reports for checking compliance, even though that is a good way to check the status, but getting alerts for a non-compliant machine can be more proactive than going through a report. When we check or schedule a check for compliance, the result can be stored in the VCM database and is fetched when we visit the Compliance tab on the VCM console, Alerts provide a more proactive approach: they tell you that there is something wrong and you need to check it, so after every compliance check, if that machine group has something non-compliant, an alert will be created and that will take configured actions like sending an e-mail, sending an SNMP trap, or writing an event to the Windows logs. Those can be proactively worked upon rather than going to the console and checking the reports.   Summary In this article, we learned how to check compliance using a standard imported template. We also learned how to create exceptions for our compliance rules so that standard services can be run without causing our score to go down. Finally, we looked at alerts and ticketing systems. Resources for Article: Further resources on this subject: VM, It Is Not What You Think! 
[article] vRealize Automation and the Deconstruction of Components [article] Deploying New Hosts with vCenter [article]
Introducing the Swift Programming Language

Packt
19 Apr 2016
25 min read
In this article by Steven Daniel, the author of Apple Watch App Development, we will introduce ourselves to Apple's Swift programming language. At WWDC 2014, Apple introduced a brand new programming language called Swift. The Swift programming language brings concise syntax, type safety, and modern programming language features to Mac and iOS developers. (For more resources related to this topic, see here.) Since its release, the Apple developer community has responded with great excitement, and developers are rapidly starting to adopt this new language within their own applications. The Swift language is the future of developing on Apple's platforms. This article includes the following topics: Learning how to register as an Apple developer Learning how to download and install Xcode development tools Introduction to the Swift programming language Learning how to work with Xcode playgrounds Introduction to the newest additions in Swift 2.0 Registering as an Apple developer Before you can begin building iOS applications for your iOS devices, you must first join as a registered user of Apple Developer Program in order to download all of the necessary components to your computer. The registration process is free and provides you with access to the iOS SDK and other developer resources that are really useful to get you started. The following short list outlines some of the things that you will be able to access once you become a registered member of Apple Developer Program: It provides helpful “Getting Started” guides to help you get up and running quickly It gives you helpful tips that show you how to submit your apps to App Store It provides the ability to download the current releases of iOS software It provides the ability to beta test the releases of iOS and the iOS SDK It provides access to Apple Developer Forums Whether you develop applications for the iPhone or iPad, these use the same OS and iOS SDK that allows you to create universal apps that will work with each of these devices. On the other hand, Apple Watch uses an entirely different OS called watchOS. To prepare your computer for iOS development, you need to register as an Apple developer. This free process gives you access to the basic levels of development that allow you to test your app using iOS Simulator without the ability to sell your app on Apple App Store. The steps are as follows: To sign up to Apple Developer Program, you will need to go to https://developer.apple.com/programs/ and then click on the Enroll button to proceed, as shown in the following screenshot: Next, click on the Start Your Enrollment button, as shown in the following screenshot: Once you sign up, you will then be able to download the iOS SDK and proceed with installing it onto your computer. You will then become an official member of Apple Developer Program. You will then be able to download beta software so that you can test them on your actual device hardware as well as having the freedom to distribute your apps to your end users. In the next section, we will look at how to download and install Xcode development tools. Getting and installing Xcode development tools In this section, we will take a look at what Integrated Development Environments (IDEs) and Software Development Kits (SDKs) are needed to develop applications for the iOS platform, which is Apple's operating system for mobile devices. 
We will explain the importance of each tool's role in the development cycle and the tools required to develop applications for the iOS platform, which are as follows: An Intel-based Mac computer running OS X Yosemite (10.10.2) or later with the latest point release and security patches installed is required. This is so that you can install the latest version of the Xcode development tool. Xcode 6.4 or later is required. Xcode is the main development tool for iOS. You need Xcode 6.4 minimum as this version includes Swift 1.2, and you must be registered as an Apple developer. The iOS SDK consists of the following components: Component Description Xcode This is the main IDE that enables you to develop, edit, and debug your native applications for the iOS and Mac platforms using the Objective-C or Swift programming languages. iOS Simulator This is a Cocoa-based application that enables you to debug your iOS applications on your computer without the need of having an iOS device. There are many iOS features that simply won't work within Simulator, so a device is required if an application uses features such as the Core Location and MapKit frameworks. Instruments These are the analysis tools that help you optimize your applications and monitor memory leaks during the execution of your application in real time. Dashcode This enables you to develop web-based iOS applications and dashboard widgets. Once you are registered, you will need to download and install Xcode developer tools by performing the following steps: Begin by downloading and installing Xcode from Mac App Store at https://itunes.apple.com/au/app/xcode/id497799835?mt=12. Select either the Free or Install button on the App Store page. Once it completes the installation process, you will be able to launch Xcode.app from your Applications folder. You can find additional development tools from the Apple developer website at https://developer.apple.com/. In the next section, we will be look at what exactly Xcode playgrounds are and how you can use them to experiment with designing code algorithms prior to incorporating the code into your project. So, let's get started. Introduction to Xcode playgrounds A playground is basically an interactive Swift coding environment that displays the results of each statement as updates are made without having the need to compile and run a project. You can use playgrounds to learn and explore Swift, prototype parts of your app, and create learning environments for others. The interactive Swift code environment lets you experiment with algorithms, explore system APIs, and even create your very own custom views without the need of having to create a project. Once you perfect your code in the playground, simply move this code into your project. Given that playgrounds are highly interactive, they are a wonderful vehicle for distributing code samples with instructive documentation and can even be used as an alternative medium for presentations. With the new Xcode 7 IDE, you can incorporate rich text comments with bold, italic, and bulleted lists with the addition of having the ability to embed images and links. You can even embed resources and support Swift source code in the playground to make the experience incredibly powerful and engaging, while the visible code remains simple. 
Playgrounds provide you with the ability to do the following: Share curriculum to teach programming with beautiful text and interactive code Design a new algorithm and watch its results every step of the way Create new tests and verify that they work before promoting them into your test suite Experiment with new APIs to hone your Swift coding skills Turn your experiments into documentation with example code that runs directly within the playground Let's begin by opening the Xcode IDE and explore how to create a new playground file for the first time. Perform the following steps: Open the Xcode.app application either using the finder in your Applications directory or using Apple's Launchpad. If you've never created or opened an Xcode project before, you will be presented with the following screen: In the Welcome to Xcode dialog, select the Get started with a playground option. If this dialog doesn't appear, you can navigate to File | New | Playground… or simply press Shift + Option + Command + N. Next, enter SwiftLanguageBasics as the name of your playground. Then, ensure that you choose iOS as the platform that we will target. Click on the Next button to proceed to the next step in the wizard. Specify the location where you would like to save your project. Then, click on the Create button to save your playground at the specified location. Once your project is created, you will be presented with the default playground template, as shown in the following screenshot: In the next section, you will begin learning about some of the Swift language basics, start adding lines of code within this playground file, and see the results that we get when they are executed. Introduction to the Swift language In this section, we will introduce some of the new and exciting features of the Swift programming language. So, let's get started. Variables, constants, strings, and semicolons Our next step is to familiarize ourselves with the differences between variables, constants, strings, and semicolons in a bit more detail. We will work with and use Xcode playgrounds to put each of these into practice. Variables A variable is a value that can change. Every variable contains a name, called the variable name, and must contain a data type. The data type indicates what sort of value the variable represents, such as whether it is an integer, a floating point number, or a string. Let's take a look at how we can put this into practice and create a variable in Swift. First, let's start by revealing the console output window by navigating to View | Debug Area | Show Debug Area. Next, clear the contents of the playground template and replace them with the following code snippet: /*: # Swift Language Basics – Variables : Created by Steven F. Daniel : Copyright © 2015 GENIESOFT STUDIOS. All Rights Reserved. */ var myGreeting = "Welcome to Learning the basics of Swift Programming" print(myGreeting, terminator: "") As you begin to type in the code, you should immediately see the Welcome to Learning the basics of Swift text magically appear in the right-hand pane in which the assignment takes place and it appears once more for the print statement. The right-hand pane is great to show you smaller output, but for longer debugging output, you would normally take a look at the Xcode console. Constants A constant is basically a value that cannot be changed. Creating these constant variables prevents you from performing accidental assignments and can even improve performance. 
Let's take a look at how we can put this into practice and create a constant variable in Swift. Clear the contents of the playground template and replace them with the following code snippet: /*: # Swift Language Basics – Constants : Created by Steven F. Daniel : Copyright © 2015 GENIESOFT STUDIOS. All Rights Reserved. */ let myGreeting = "Welcome to Learning the basics" print(myGreeting, terminator: "") myGreeting += " of Swift Programming" Take a look at the screenshot now: As you begin to type in the code, you will immediately receive an error message stating that you cannot assign myGreeting to our let value because the object is not mutable. In Swift, you can control the mutability of the built-in Swift types by using either the let or var keywords during declaration. Strings A string is basically an ordered collection of characters—for example "hello, world". In Swift, strings are represented by the String data type, which represents a collection of values of the char data type. You can use strings to insert constants, variables, literals, and expressions into longer strings in a process known as string interpolation, which we will cover later on in this article. This makes it easy to create custom string values for display, storage, and printing. Let's take a look at how we can put this into practice, create a String variable in Swift, and utilize some of the string methods. Clear the contents of the playground template and replace them with the following code snippet: /*: # Swift Language Basics – Strings : Created by Steven F. Daniel : Copyright © 2015 GENIESOFT STUDIOS. All Rights Reserved. */ import Foundation let myGreeting = "Welcome to Swift Language Basics, working with Strings" // Make our String uppercase print(myGreeting.uppercaseString) // Append exclamation mark at the end of the string var newGreeting = myGreeting.stringByAppendingString("!!!") print(newGreeting) Take a look at the screenshot now: As you can note in the preceding code snippet, we began by importing the Foundation framework class, which contains several APIs to deal with objects such as strings and dates. Next, we declared our myGreeting constant variable and then assigned a default string. We then used the uppercaseString method of the string object to perform a function to make all of the characters within our string uppercase. In our next step, we will declare a new variable called newGreeting and call the stringByAppendingString method to append additional characters at the end of our string. For more information on using the String class, you can consult the Swift programming language documentation at https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/StringsAndCharacters.html. Semicolons As you would have probably noticed so far, the code you wrote doesn't contain any semicolons. This is because in Swift, these are only required if you want to write multiple statements on a single line. Let's take a look at a code example to see how we can put this into practice. Delete the contents of the playground template and replace them with the following code snippet: /*: # Swift Language Basics – Semicolons : Created by Steven F. Daniel : Copyright © 2015 GENIESOFT STUDIOS. All Rights Reserved. 
*/ import Foundation var myGreeting = "Welcome to Swift Language" let newString = myGreeting + " Basics, ".uppercaseString + "working with semicolons"; print(newString) Take a look the following screenshot now: As you can note in the preceding code snippet, we began by declaring our myGreeting variable and then assigned a default string. In our next step, we declared a new variable called newString, concatenated the details from our myGreeting string, and used the uppercaseString method, which cycles through each character within our string, making our characters uppercase. Next, we appended the additional working with semicolons string to the end of our string and finally used the print statement to output the contents of our newString variable to the console window. As you must have noticed, we included a semicolon to the end of the statement; this is because in Swift, you are required to include semicolons if you want to write multiple statements on a single line. Numeric types and conversion In this section, we will take a look at how we can perform arithmetic operations on our Swift variables. In this example, we will look at how to calculate the area of a triangle, given a base and height value. Let's take a look at a code example to see how we can put this into practice. Clear the contents of the playground template and replace them with the following code snippet: /*: # Swift Language Basics - Numeric Types and Conversion : Created by Steven F. Daniel : Copyright © 2015 GENIESOFT STUDIOS. All Rights Reserved. */ import Foundation // method to calculate the area of a triangle func calcTriangleArea(triBase: Double, triHeight: Double) -> Double { return (triBase * triHeight) / 2 } // Declare our base and height of our triangle let base = 20.0 let height = 120.0 // Calculate and display the area of the triangle and print ("The calculated Area is: " + String(calcTriangleArea(base, triHeight: height))); Take a look at the following screenshot now: As you can note in the preceding code snippet, we started by creating our calcTriangleArea function method, which accepts a base and a height parameter value in order to calculate the area of the triangle. In our next step, we declared two variables, base and height, which contain the assigned values that will be used to calculate the base and the height of our triangle. Next, we made a call to our calcTriangleArea method, passing in the values for our base and height before finally using the print statement to output the calculated area of our triangle to the console window. An important feature of the Swift programming language is that all numeric data type conversions must be explicit, regardless of whether you want to convert to a data type containing more of less precision. Booleans, tuples, and string interpolation In this section, we will look at the various features that come with the Swift programming language. We will look at the improvements that Swift has over Objective-C when it comes to using Booleans and string interpolation before finally discussing how we can use tuples to access elements from a string. Booleans Boolean variables in Swift are basically defined using the Bool data type. This data type can only hold values containing either true or false. Let's take a look at a code example to see how we can put this into practice. Clear the contents of the playground template and replace them with the following code snippet: /*: # Swift Language Basics – Booleans : Created by Steven F. 
Daniel : Copyright © 2015 GENIESOFT STUDIOS. All Rights Reserved. */ import Foundation let displaySettings : Bool = true print("Display Settings is: " + (displaySettings ? "ON" : "OFF")) Take a look at the following screenshot now: As you can note from the preceding code snippet, we started by declaring our constant variable called displaySettings and assigned it a default Boolean value of true. Next, we performed a check to see whether the value of our displaySettings variable is set to true and called our print statement to output the Display Settings is: ON value to the console window. In Objective-C, you would assign a value of 1 and 0 to denote true and false; this is no longer the case with Swift because Swift doesn't treat 1 as true and 0 as false. You need to explicitly use the actual Boolean values to stay within Swift's data type system. Let's replace the existing playground code with the following code snippet to take a look at what would happen if we changed our value from true to 1: /*: # Swift Language Basics – Booleans : Created by Steven F. Daniel : Copyright © 2015 GENIESOFT STUDIOS. All Rights Reserved. */ import Foundation let displaySettings : Bool = 1 // This will cause an error!!! print("Display Settings is: " + (displaySettings ? "ON" : "OFF")) Take a look at the following screenshot now: As you can note from the previous screenshot, Swift detected that we were assigning an integer value to our Boolean data type and threw an error message. Tuples Tuples provide you with the ability to group multiple values into a single compound value. The values contained within a tuple can be any data type, and therefore are not required to be of the same type. Let's take a look at a code example, to see how we can put this into practice. Clear the contents of the playground template and replace them with the following code snippet: /*: # Swift Language Basics – Tuples : Created by Steven F. Daniel : Copyright © 2015 GENIESOFT STUDIOS. All Rights Reserved. */ import Foundation // Define our Address Details var addressDetails = ("Apple Inc.", "1 Infinite Loop", "Cupertino, California", "United States"); print(addressDetails.0) // Get the Name print(addressDetails.1) // Address print(addressDetails.2) // State print(addressDetails.3) // Country Take a look at the following screenshot now: As you can note from the preceding code snippet, we started by declaring a tuple variable called addressDetails that contains a combination of strings. Next, we accessed each of the tuple elements by referencing their index values and displayed each of these elements in the console window. Let's say that you want to modify the contents of the first element within your tuple. Add the following code snippet after your var addressDetails variable: // Modify the element within our String addressDetails.0 = "Apple Computers, Inc." Take a look at the following screenshot now: As you can note from the preceding screenshot, we modified our first component within our tuple to the Apple Computers, Inc value. If you do not want modifications to be made to your variable, you can just change the var keyword to let, and the assignment would result in a compilation error. You can also express your tuples by referencing them using their named elements. This makes it really useful as you can ensure that your users know exactly what the element refers to. 
If you express your tuples using their named elements, you will still be able to access your elements using their index notation, as can be seen in the following highlighted code snippet: /*: # Swift Language Basics – Tuples : Created by Steven F. Daniel : Copyright © 2015 GENIESOFT STUDIOS. All Rights Reserved. */ import Foundation // Define our Address Details var addressDetails = (company:"Apple Inc.", Address:"1 Infinite Loop", City:"Cupertino, California", Country:"United States"); // Accessing our Tuple by using their NAMES print("Accessing our Tuple using their NAMESn") print(addressDetails.company) // Get the Name print(addressDetails.Address) // Address print(addressDetails.City) // State print(addressDetails.Country) // Country // Accessing our Tuple by using their index notation print("nAccess our Tuple using their index notation:n") print(addressDetails.0) // Get the Name print(addressDetails.1) // Address print(addressDetails.2) // State print(addressDetails.3) // Country Take a look at the following screenshot now: As you can note from what we covered so far about tuples, these are really cool and are basically just like any other data type in Swift; they can be really powerful to use within your own programs. String interpolation String interpolation means embedding constants, variables, as well as expressions within your string literals. In this section, we will take a look at an example of how you can use this. Clear the contents of the playground template and replace them with the following code snippet: /*: # Swift Language Basics - String Interpolation : Created by Steven F. Daniel : Copyright © 2015 GENIESOFT STUDIOS. All Rights Reserved. */ import Foundation // Define our Address Details var addressDetails = (company:"Apple Inc.", Address:"1 Infinite Loop", City:"Cupertino, California", Country:"United States"); // Use String Interpolation to format output print("Apple Headquarters are located at: nn" + addressDetails.company + ",n" + addressDetails.Address + "n" + addressDetails.City + "n" + addressDetails.Country); Take a look at the following screenshot now: As you can note from the preceding code snippet, we started by declaring a tuple variable called addressDetails that contains a combination of strings. Next, we performed a string concatenation to generate our output in the format that we want by accessing each of the tuple elements using their index values and displaying each of these elements in the console window. Let's take this a step further and use string interpolation to place our address detail information into string variables. The result will still be the same, but I just want to show you the power of using tuples with the Swift programming language. Clear the contents of the playground template and replace them with the following highlighted code snippet: /*: # Swift Language Basics - String Interpolation : Created by Steven F. Daniel : Copyright © 2015 GENIESOFT STUDIOS. All Rights Reserved. 
*/ import Foundation // Use String Interpolation to place elements into string initializers var addressDetails = ("Apple Inc.", "1 Infinite Loop", "Cupertino, California", "United States"); let (Company, Address, City, Country) = addressDetails print("Apple Headquarters are located at: nn" + Company + ",n" + Address + "n" + City + "n" + Country); Take a look at the following screenshot now: As you can note from the preceding code snippet, we removed the named types from our addressDetails string contents, created a new type using the let keyword, and assigned placeholders for each of our tuple elements. This is very handy as it not only makes your code a lot more readable but you can also continue to create additional placeholders for the additional fields that you create. Controlling the flow In this section, we will take a look at how to use the for…in loop to iterate over a set of statements within the body of the loop until a specific condition is met. The for…in loops The for…in loops basically perform a set of statements over a certain number of times until a specific condition is met, which is typically handled by incrementing a counter each time until the loop ends. Let's take a look at a code example to see how we can put this into practice. Clear the contents of the playground template and replace them with the following code snippet: /*: # Swift Language Basics - Control Flow : Created by Steven F. Daniel : Copyright © 2015 GENIESOFT STUDIOS. All Rights Reserved. */ import Foundation // Perform a fibonacci loop using the For-In Loop var fibonacci = 0 var iTemp = 1 var jTemp = 0 for iterator in 0...19 { jTemp = fibonacci fibonacci += iTemp iTemp = jTemp print("Fibonacci: " + String(fibonacci), terminator: "n") } Take a look at the following screenshot now: The preceding code demonstrates the for…in loop and the closed range operator (...). These are often used together, but they are entirely independent. As you can note from the preceding code snippet, we declared the exact same variables: fibonacci, iTemp, and jTemp. Next, we used the for…in loop to iterate over our range, which is from 0 to 19, while displaying the current Fibonacci value in the console window. What's new in Swift 2.0 In this section, we will take a look at some of the new features that come as part of the Swift 2.0 programming language. Error handling Error handling is defined as the process of responding to and recovering from error conditions within your program. The Swift language provides first-class support for throwing, catching, propagating, and manipulating recoverable errors at runtime. In Swift, these are referred to as throwing functions and throwing methods. In Swift 2.0, error handling has vastly improved and adds an additional layer of safety to error checking. You can use the throws keyword to specify which functions and method are most likely to cause an error. You can implement and use the do, try, and catch keywords to handle something that could likely throw an error. Let's take a look at a code example to see how we can put this into practice. Clear the contents of the playground template and replace them with the following code snippet: /*: # Swift Language Basics - What's new in Swift 2.0 : Created by Steven F. Daniel : Copyright © 2015 GENIESOFT STUDIOS. All Rights Reserved. 
*/ import Foundation enum EncryptionError: ErrorType { case Empty case Short } // Method to handle the Encryption func encryptString(str: String, withPassword password: String) throws -> String { if password.characters.count > 0 { // Password is valid } else { throw EncryptionError.Empty } if password.characters.count >= 5 { // Password is valid } else { throw EncryptionError.Short } // Begin constructing our encrypted string let encrypted = password + str + password return String(encrypted.characters.reverse()) } // Call our method to encrypt our string do { let encrypted = try encryptString("Encrypted String Goes Here", withPassword: "123") print(encrypted) } catch EncryptionError.Empty { print("You must provide a password.") } catch EncryptionError.Short { print("Passwords must be at least five characters.") } catch { print("An error occurred!") } Take a look at the following screenshot now: As you can note in the preceding code, we began by creating an enum object that derives from the ErrorType class so that we could create and throw an error. Next, we created a method called encryptString that takes two parameters: str and password. This method performed a check to ensure that we didn't pass an empty password. If our method determines that we did not specify a valid password, we will automatically throw an error using EncryptionError.Empty and exit from this method. Alternatively, if we provide a valid password and string to encrypt, our string will be encrypted. Binding Binding in Swift is something new and provides a means of checking whether a variable contains a valid value prior to continuing and exiting from the method otherwise. Fortunately, Swift 2.0 provides you with exactly this, and it is called the guard keyword. Let's go back to our previous code snippet and take a look at how we can implement the guard statement to our conditional checking within our encryptedString method. Modify the contents of the playground template and replace them with the following highlighted sections: // Method to handle the Encryption func encryptString(str: String, withPassword password: String) throws -> String { guard password.characters.count > 0 else { throw EncryptionError.Empty } guard password.characters.count >= 5 else { throw EncryptionError.Short } // Begin constructing our encrypted string let encrypted = password + str + password return String(encrypted.characters.reverse()) } As you can note in the preceding code snippet, using the guard keyword, you can provide a code block to perform a conditional check within the else statement that will run if the condition fails. This will make your code cleaner as the guard statement lets you trap invalid parameters from being passed to a method. Any conditions you would have checked using if before you can now check using guard. Protocol extensions In Swift 2.0, you have the ability to extend protocols and add additional implementations for properties and methods. For example, you can choose to add additional methods to the String or Array classes, as follows: /* # What's new in Swift 2.0 - Protocol Extensions The first content line displayed in this block of rich text. */ import Foundation let greeting = "Working with Swift Rocks!" // Extend the String class to include additional methods extension CustomStringConvertible { var uCaseString: String { return "(self.description.uppercaseString)!!!" 
} } print(greeting.uCaseString) Take a look at the following screenshot now: As you can see in the preceding code, we extended the String class through the CustomStringConvertible protocol, which most Foundation class objects conform to. Protocol extensions give you a wide variety of ways to extend the base classes so that you can add and implement your very own custom functionality.
Summary
In this article, we explored how to download and install the Xcode development tools, and then moved on to using playgrounds to write Swift code and get to grips with some of the Swift programming language's features. Next, we looked at some of the newest features that come as part of the Swift 2.0 language.
Resources for Article:
Further resources on this subject: Exploring Swift [article] Playing with Swift [article] Introduction to WatchKit [article]

Creating Your Own Node Module

Soham Kamani
18 Apr 2016
6 min read
Node.js has a great community and one of the best package managers I have ever seen. One of the reasons npm is so great is because it encourages you to make small composable modules, which usually have just one responsibility. Many of the larger, more complex node modules are built by composing smaller node modules. As of this writing, npm has over 219,897 packages. One of the reasons this community is so vibrant is because it is ridiculously easy to make your own node module. This post will go through the steps to create your own node module, as well as some of the best practices to follow while doing so. Prerequisites and Installation node and npm are a given. Additionally, you should also configure your npm author details: npm set init.author.name "My Name" npm set init.author.email "your@email.com" npm set init.author.url "http://your-website.com" npm adduser These are the details that would show up on npmjs.org once you publish. Hello World The reason that I say creating a node module is ridiculously easy is because you only need two files to create the most basic version of a node module. First up, create a package.json file inside of a new folder by running the npm init command. This will ask you to choose a name. Of course, the name you are thinking of might already exist in the npm registry, so to check for this run the command npm ls owner module_name , where module_name is replaced by the namespace you want to check. If it exists, you will get information about the authors: $ npm owner ls forever indexzero <charlie.robbins@gmail.com> bradleymeck <bradley.meck@gmail.com> julianduque <julianduquej@gmail.com> jeffsu <me@jeffsu.com> jcrugzz <jcrugzz@gmail.com> If your namespace is free you would get an error message. Something similar to : $ npm owner ls does_not_exist npm ERR! owner ls Couldnt get owner data does_not_exist npm ERR! Darwin 14.5.0 npm ERR! argv "node" "/usr/local/bin/npm" "owner" "ls" "does_not_exist" npm ERR! node v0.12.4 npm ERR! npm v2.10.1 npm ERR! code E404 npm ERR! 404 Registry returned 404 GET on https://registry.npmjs.org/does_not_exist npm ERR! 404 npm ERR! 404 'does_not_exist' is not in the npm registry. npm ERR! 404 You should bug the author to publish it (or use the name yourself!) npm ERR! 404 npm ERR! 404 Note that you can also install from a npm ERR! 404 tarball, folder, http url, or git url. npm ERR! Please include the following file with any support request: npm ERR! /Users/sohamchetan/Documents/jekyll-blog/npm-debug.log After setting up package.json, add a JavaScript file: module.exports = function(){ return 'Hello World!'; } And that's it! Now execute npm publish . and your node module will be published to npmjs.org. Also, anyone can now install your node module by running npm install --save module_name, where module name is the "name" property contained in package.json. Now anyone can use your module like this : var someModule = require('module_name'); console.log(someModule()); // This will output "Hello World!" Dependencies As stated before, rarely will you find large scale node modules that do not depend on other smaller modules. This is because npm encourages modularity and composability. To add dependancies to your own module, simply install them. For example, one of the most depended upon packages is lodash, a utility library. To add this, run the command : npm install --save lodash Now you can use lodash everywhere in your module by "requiring" it, and when someone else downloads your module, they get lodash bundled along with it as well. 
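To make the dependency example concrete, here is a minimal sketch of what an index.js that uses lodash might look like. The module behaviour and the function name uniqueSorted are invented for illustration; the only real APIs used are require, module.exports, and lodash's uniq and sortBy.
// index.js - a minimal sketch of a module that depends on lodash
var _ = require('lodash');

// Return a sorted copy of the input array with duplicates removed.
module.exports = function uniqueSorted(items) {
  return _.sortBy(_.uniq(items));
};
A consumer of the published package would then use it in the usual way:
var uniqueSorted = require('module_name'); // the "name" from your package.json
console.log(uniqueSorted([3, 1, 3, 2]));   // [ 1, 2, 3 ]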
Additionally you would want to have some modules purely for development and not for distribution. These are dev-dependencies, and can be installed with the npm install --save-dev command. Dev dependencies will not install when someone else installs your node module. Configuring package.json The package.json file is what contains all the metadata for your node_module. A few fields are filled out automatically (like dependencies or devDependencies during npm installs). There are a few more fields in package.json that you should consider filling out so that your node module is best fitted to its purpose. "main": The relative path of the entry point of your module. Whatever is assigned to module.exports in this file is exported when someone "requires" your module. By default this is the index.js file. "keywords": It’s an array of keywords describing your module. Quite helpful when others from the community are searching for something that your module happens to solve. "license": I normally publish all my packages with an "MIT" licence because of its openness and popularity in the open source community. "version": This is pretty crucial because you cannot publish a node module with the same version twice. Normally, semver versioning should be followed. If you want to know more about the different properties you can set in package.json there's a great interactive guide you can check out. Using Yeoman Generators Although it's really simple to make a basic node module, it can be quite a task to make something substantial using just index.js nd package.json file. In these cases, there's a lot more to do, such as: Writing and running tests. Setting up a CI tool like Travis. Measuring code coverage. Installing standard dev dependencies for testing. Fortunately, there are many Yeoman generators to help you bootstrap your project. Check out generator-nm for setting up a basic project structure for a simple node module. If writing in ES6 is more your style, you can take a look at generator-nm-es6. These generators get your project structure, complete with a testing framework and CI integration so that you don't have to spend all your time writing boilerplate code. About the Author Soham Kamani is a full-stack web developer and electronics hobbyist.  He is especially interested in JavaScript, Python, and IoT.


Setting up a Build Chain with Grunt

Packt
18 Apr 2016
24 min read
In this article by Bass Jobsen, author of the book Sass and Compass Designer's Cookbook you will learn the following topics: Installing Grunt Installing Grunt plugins Utilizing the Gruntfile.js file Adding a configuration definition for a plugin Adding the Sass compiler task (For more resources related to this topic, see here.) This article introduces you to the Grunt Task Runner and the features it offers to make your development workflow a delight. Grunt is a JavaScript Task Runner that is installed and managed via npm, the Node.js package manager. You will learn how to take advantage of its plugins to set up your own flexible and productive workflow, which will enable you to compile your Sass code. Although there are many applications available for compiling Sass, Grunt is a more flexible, versatile, and cross-platform tool that will allow you to automate many development tasks, including Sass compilation. It can not only automate the Sass compilation tasks, but also wrap any other mundane jobs, such as linting and minifying and cleaning your code, into tasks and run them automatically for you. By the end of this article, you will be comfortable using Grunt and its plugins to establish a flexible workflow when working with Sass. Using Grunt in your workflow is vital. You will then be shown how to combine Grunt's plugins to establish a workflow for compiling Sass in real time. Grunt becomes a tool to automate integration testing, deployments, builds, and development in which you can use. Finally, by understanding the automation process, you will also learn how to use alternative tools, such as Gulp. Gulp is a JavaScript task runner for node.js and relatively new in comparison to Grunt, so Grunt has more plugins and a wider community support. Currently, the Gulp community is growing fast. The biggest difference between Grunt and Gulp is that Gulp does not save intermediary files, but pipes these files' content in memory to the next stream. A stream enables you to pass some data through a function, which will modify the data and then pass the modified data to the next function. In many situations, Gulp requires less configuration settings, so some people find Gulp more intuitive and easier to learn. In this article, Grunt has been chosen to demonstrate how to run a task runner; this choice does not mean that you will have to prefer the usage of Grunt in your own project. Both the task runners can run all the tasks described in this article. Simply choose the task runner that suits you best. This recipe demonstrates shortly how to compile your Sass code with Gulp. In this article, you should enter your commands in the command prompt. Linux users should open a terminal, while Mac users should run Terminal.app and Window users should use the cmd command for command line usage. Installing Grunt Grunt is essentially a Node.js module; therefore, it requires Node.js to be installed. The goal of this recipe is to show you how to install Grunt on your system and set up your project. Getting ready Installing Grunt requires both Node.js and npm. Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications, and npm is a package manager for Node.js. You can download the Node.js source code or a prebuilt installer for your platform at https://nodejs.org/en/download/. Notice that npm is bundled with node. Also, read the instructions at https://github.com/npm/npm#super-easy-install. How to do it... 
After installing Node.js and npm, installing Grunt is as simple as running a single command, regardless of the operating system that you are using. Just open the command line or the Terminal and execute the following command: npm install -g grunt-cli That's it! This command will install Grunt globally and make it accessible anywhere on your system. Run the grunt --version command in the command prompt in order to confirm that Grunt has been successfully installed. If the installation is successful, you should see the version of Grunt in the Terminal's output: grunt --version grunt-cli v0.1.11 After installing Grunt, the next step is to set it up for your project: Make a folder on your desktop and call it workflow. Then, navigate to it and run the npm init command to initialize the setup process: mkdir workflow && cd $_ && npm init Press Enter for all the questions and accept the defaults. You can change these settings later. This should create a file called package.json that will contain some information about the project and the project's dependencies. In order to add Grunt as a dependency, install the Grunt package as follows: npm install grunt --save-dev Now, if you look at the package.json file, you should see that Grunt is added to the list of dependencies: ..."devDependencies": {"grunt": "~0.4.5" } In addition, you should see an extra folder created. Called node_modules, it will contain Grunt and other modules that you will install later in this article. How it works... In the preceding section, you installed Grunt (grunt-cli) with the -g option. The -g option installs Grunt globally on your system. Global installation requires superuser or administrator rights on most systems. You need to run only the globally installed packages from the command line. Everything that you will use with the require() function in your programs should be installed locally in the root of your project. Local installation makes it possible to solve your project's specific dependencies. More information about global versus local installation of npm modules can be found at https://www.npmjs.org/doc/faq.html. There's more... Node package managers are available for a wide range of operation systems, including Windows, OSX, Linux, SunOS, and FreeBSD. A complete list of package managers can be found at https://github.com/joyent/node/wiki/Installing-Node.js-via-package-manager. Notice that these package managers are not maintained by the Node.js core team. Instead, each package manager has its own maintainer. See also The npm Registry is a public collection of packages of open source code for Node.js, frontend web apps, mobile apps, robots, routers, and countless other needs of the JavaScript community. You can find the npm Registry at https://www.npmjs.org/. Also, notice that you do not have to use Task Runners to create build chains. Keith Cirkel wrote about how to use npm as a build tool at http://blog.keithcirkel.co.uk/how-to-use-npm-as-a-build-tool/. Installing Grunt plugins Grunt plugins are the heart of Grunt. Every plugin serves a specific purpose and can also work together with other plugins. In order to use Grunt to set up your Sass workflow, you need to install several plugins. You can find more information about these plugins in this recipe's How it works... section. Getting ready Before you install the plugins, you should first create some basic files and folders for the project. You should install Grunt and create a package.json file for your project. 
Also, create an index.html file to inspect the results in your browser. Two empty folders should be created too. The scss folder contains your Sass code and the css folder contains the compiled CSS code. Navigate to the root of the project, repeat the steps from the Installing Grunt recipe of this article, and create some additional files and directories that you are going to work with throughout the article. In the end, you should end up with the following folder and file structure: How to do it... Grunt plugins are essentially Node.js modules that can be installed and added to the package.json file in the list of dependencies using npm. To do this, follow the ensuing steps: Navigate to the root of the project and run the following command, as described in the Installing Grunt recipe of this article: npm init Install the modules using npm, as follows: npm install grunt-contrib-sass load-grunt-tasks grunt-postcss --save-dev Notice the single space before the backslash in each line. For example, on the second line, grunt-contrib-sass , there is a space before the backslash at the end of the line. The space characters are necessary because they act as separators. The backslash at the end is used to continue the commands on the next line. The npm install command will download all the plugins and place them in the node_modules folder in addition to including them in the package.json file. The next step is to include these plugins in the Gruntfile.js file. How it works... Grunt plugins can be installed and added to the package.json file using the npm install command followed by the name of the plugins separated by a space, and the --save-dev flag: npm install nameOfPlugin1 nameOfPlugin2 --save-dev The --save-dev flag adds the plugin names and a tilde version range to the list of dependencies in the package.json file so that the next time you need to install the plugins, all you need to do is run the npm install command. This command looks for the package.json file in the directory from which it was called, and will automatically download all the specified plugins. This makes porting workflows very easy; all it takes is copying the package.json file and running the npm install command. Finally, the package.json file contains a JSON object with metadata. It is also worth explaining the long command that you have used to install the plugins in this recipe. This command installs the plugins that are continued on to the next line by the backslash. It is essentially equivalent to the following: npm install grunt-contrib-sass –-save-dev npm install load-grunt-tasks –-save-dev npm install grunt-postcss –-save-dev As you can see, it is very repetitive. However, both yield the same results; it is up to you to choose the one that you feel more comfortable with. The node_modules folder contains all the plugins that you install with npm. Every time you run npm install name-of-plugin, the plugin is downloaded and placed in the folder. If you need to port your workflow, you do not need to copy all the contents of the folder. In addition, if you are using a version control system, such as Git, you should add the node_modules folder to the .gitignore file so that the folder and its subdirectories are ignored. There's more... Each Grunt plugin also has its own metadata set in a package.json file, so plugins can have different dependencies. 
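As a point of reference before looking at the plugins' own metadata, the install command above will also have added a devDependencies block to your project's package.json. The fragment below is purely illustrative; the exact version ranges depend on the releases that are current when you run the command:
{
  "devDependencies": {
    "grunt": "~0.4.5",
    "grunt-contrib-sass": "~0.9.0",
    "grunt-postcss": "~0.7.0",
    "load-grunt-tasks": "~3.4.0"
  }
}
Returning to the plugins themselves, each one declares its metadata in the same format.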
For instance, the grunt-contrib-sass plugin, as described in the Adding the Sass compiler task recipe, has set its dependencies as follows: "dependencies": {     "async": "^0.9.0",     "chalk": "^0.5.1",     "cross-spawn": "^0.2.3",     "dargs": "^4.0.0",     "which": "^1.0.5"   } Besides the dependencies described previously, this task also requires you to have Ruby and Sass installed. In the following list, you will find the plugins used in this article, followed by a brief description: load-grunt-tasks: This loads all the plugins listed in the package.json file grunt-contrib-sass: This compiles Sass files into CSS code grunt-postcss: This enables you to apply one or more postprocessors to your compiled CSS code CSS postprocessors enable you to change your CSS code after compilation. In addition to installing plugins, you can remove them as well. You can remove a plugin using the npm uninstall name-of-plugin command, where name-of-plugin is the name of the plugin that you wish to remove. For example, if a line in the list of dependencies of your package.json file contains grunt-concurrent": "~0.4.2",, then you can remove it using the following command: npm uninstall grunt-concurrent Then, you just need to make sure to remove the name of the plugin from your package.json file so that it is not loaded by the load-grunt-tasks plugin the next time you run a Grunt task. Running the npm prune command after removing the items from the package.json file will also remove the plugins. The prune command removes extraneous packages that are not listed in the parent package's dependencies list. See also More information on the npm version's syntax can be found at https://www. npmjs.org/doc/misc/semver.html  Also, see http://caniuse.com/ for more information on the Can I Use database Utilizing the Gruntfile.js file The Gruntfile.js file is the main configuration file for Grunt that handles all the tasks and task configurations. All the tasks and plugins are loaded using this file. In this recipe, you will create this file and will learn how to load Grunt plugins using it. Getting ready First, you need to install Node and Grunt, as described in the Installing Grunt recipe of this article. You will also have to install some Grunt plugins, as described in the Installing Grunt plugins recipe of this article. How to do it... Once you have installed Node and Grunt, follow these steps: In your Grunt project directory (the folder that contains the package.json file), create a new file, save it as Gruntfile.js, and add the following lines to it: module.exports = function(grunt) {   grunt.initConfig({     pkg: grunt.file.readJSON('package.json'),       //Add the Tasks configurations here.   }); // Define Tasks here }; This is the simplest form of the Gruntfile.js file that only contains two information variables. The next step is to load the plugins that you installed in the Installing Grunt plugins recipe. Add the following lines at the end of your Gruntfile.js file: grunt.loadNpmTasks('grunt-sass'); In the preceding line of code, grunt-sass is the name of the plugin you want to load. That is all it takes to load all the necessary plugins. The next step is to add the configurations for each task to the Gruntfile.js file. How it works... Any Grunt plugin can be loaded by adding a line of JavaScript to the Gruntfile.js file, as follows: grunt.loadNpmTasks('name-of-module'); This line should be added every time a new plugin is installed so that Grunt can access the plugin's functions. 
However, it is tedious to load every single plugin that you install. In addition, you will soon notice that, as your project grows, the number of configuration lines will increase as well. The Gruntfile.js file should be written in JavaScript or CoffeeScript. Grunt tasks rely on configuration data defined in a JSON object passed to the grunt.initConfig method. JavaScript Object Notation (JSON) is an alternative for XML and used for data exchange. JSON describes name-value pairs written as "name": "value". All the JSON data is separated by commas with JSON objects written inside curly brackets and JSON arrays inside square brackets. Each object can hold more than one name/value pair with each array holding one or more objects. You can also group tasks into one task. Your alias groups of tasks using the following line of code: grunt.registerTask('alias',['task1', 'task2']); There's more... Instead of loading all the required Grunt plugins one by one, you can load them automatically with the load-grunt-tasks plugin. You can install this by using the following command in the root of your project: npm install load-grunt-tasks --save-dev Then, add the following line at the very beginning of your Gruntfile.js file after module.exports: require('load-grunt-tasks')(grunt); Now, your Gruntfile.js file should look like this: module.exports = function(grunt) {   require('load-grunt-tasks')(grunt);   grunt.initConfig({     pkg: grunt.file.readJSON('package.json'),       //Add the Tasks configurations here.   }); // Define Tasks here }; The load-grunt-tasks plugin loads all the plugins specified in the package.json file. It simply loads the plugins that begin with the grunt- prefix or any pattern that you specify. This plugin will also read dependencies, devDependencies, and peerDependencies in your package.json file and load the Grunt tasks that match the provided patterns. A pattern to load specifically chosen plugins can be added as a second parameter. You can load, for instance, all the grunt-contrib tasks with the following code in your Gruntfile.js file: require('load-grunt-tasks')(grunt, {pattern: 'grunt-contrib-*'}); See also Read more about the load-grunt-tasks module at https://github.com/sindresorhus/load-grunt-task Adding a configuration definition for a plugin Any Grunt task needs a configuration definition. The configuration definitions are usually added to the Gruntfile.js file itself and are very easy to set up. In addition, it is very convenient to define and work with them because they are all written in the JSON format. This makes it very easy to spot the configurations in the plugin's documentation examples and add them to your Gruntfile.js file. In this recipe, you will learn how to add the configuration for a Grunt task. Getting ready For this recipe, you will first need to create a basic Gruntfile.js file and install the plugin you want to configure. If you want to install the grunt-example plugin, you can install it using the following command in the root of your project: npm install grunt-example --save-dev How to do it... Once you have created the basic Gruntfile.js file (also refer to the Utilizing the Gruntfile.js file recipe of this article), follow this step: A simple form of the task configuration is shown in the following code. Start by adding it to your Gruntfile.js file wrapped inside grunt.initConfig{}: example: {   subtask: {    files: {      "stylesheets/main.css":      "sass/main.scss"     }   } } How it works... 
If you look closely at the task configuration, you will notice the files field that specifies what files are going to be operated on. The files field is a very standard field that appears in almost all the Grunt plugins simply due to the fact that many tasks require some or many file manipulations. There's more... The Don't Repeat Yourself (DRY) principle can be applied to your Grunt configuration too. First, define the name and the path added to the beginning of the Gruntfile.js file as follows: app {  dev : "app/dev" } Using the templates is a key in order to avoid hard coded values and inflexible configurations. In addition, you should have noticed that the template has been used using the <%= %> delimiter to expand the value of the development directory: "<%= app.dev %>/css/main.css": "<%= app.dev %>/scss/main.scss"   The <%= %> delimiter essentially executes inline JavaScript and replaces values, as you can see in the following code:   "app/dev/css/main.css": "app/dev/scss/main.scss" So, put simply, the value defined in the app object at the top of the Gruntfile.js file is evaluated and replaced. If you decide to change the name of your development directory, for example, all you need to do is change the app's variable that is defined at the top of your Gruntfile.js file. Finally, it is also worth mentioning that the value for the template does not necessarily have to be a string and can be a JavaScript literal. See also You can read more about templates in the Templates section of Grunt's documentation at http://gruntjs.com/configuring- tasks#templates Adding the Sass compiler task The Sass tasks are the core task that you will need for your Sass development. It has several features and options, but at the heart of it is the Sass compiler that can compile your Sass files into CSS. By the end of this recipe, you will have a good understanding of this plugin, how to add it to your Gruntfile.js file, and how to take advantage of it. In this recipe, the grunt-contrib-sass plugin will be used. This plugin compiles your Sass code by using Ruby Sass. You should use the grunt-sass plugin to compile Sass into CSS with node-sass (LibSass). Getting ready The only requirement for this recipe is to have the grunt-contrib-sass plugin installed and loaded in your Gruntfile.js file. If you have not installed this plugin in the Installing Grunt Plugins recipe of this article, you can do this using the following command in the root of your project: npm install grunt-contrib-sass --save-dev You should also install grunt local by running the following command: npm install grunt --save-dev Finally, your project should have the file and directory, as describe in the Installing Grunt plugins recipe of this article. How to do it... An example of the Sass task configuration is shown in the following code. Start by adding it to your Gruntfile.js file wrapped inside the grunt.initConfig({}) code. Now, your Gruntfile.js file should look as follows: module.exports = function(grunt) {   grunt.initConfig({     //Add the Tasks configurations here.     
sass: {                                            dist: {                                            options: {                                       style: 'expanded'         },         files: {                                         'stylesheets/main.css': 'sass/main.scss'  'source'         }       }     }   });     grunt.loadNpmTasks('grunt-contrib-sass');     // Define Tasks here    grunt.registerTask('default', ['sass']);  } Then, run the following command in your console: grunt sass The preceding command will create a new stylesheets/main.css file. Also, notice that the stylesheets/main.css.map file has also been automatically created. The Sass compiler task creates CSS sourcemaps to debug your code by default. How it works... In addition to setting up the task configuration, you should run the Grunt command to test the Sass task. When you run the grunt sass command, Grunt will look for a configuration called Sass in the Gruntfile.js file. Once it finds it, it will run the task with some default options if they are not explicitly defined. Successful tasks will end with the following message: Done, without errors. There's more... There are several other options that you can include in the Sass task. An option can also be set at the global Sass task level, so the option will be applied in all the subtasks of Sass. In addition to options, Grunt also provides targets for every task to allow you to set different configurations for the same task. In other words, if, for example, you need to have two different versions of the Sass task with different source and destination folders, you could easily use two different targets. Adding and executing targets are very easy. Adding more builds just follows the JSON notation, as shown here:    sass: {                                      // Task       dev: {                                    // Target         options: {                               // Target options           style: 'expanded'         },         files: {                                 // Dictionary of files         'stylesheets/main.css': 'sass/main.scss' // 'destination': 'source'         }       },       dist: {                               options: {                        style: 'expanded',           sourcemap: 'none'                  },         files: {                                      'stylesheets/main.min.css': 'sass/main.scss'         }       }     } In the preceding example, two builds are defined. The first one is named dev and the second is called dist. Each of these targets belongs to the Sass task, but they use different options and different folders for the source and the compiled Sass code. Moreover, you can run a particular target using grunt sass:nameOfTarget, where nameOfTarge is the name of the target that you are trying to use. So, for example, if you need to run the dist target, you will have to run the grunt sass:dist command in your console. However, if you need to run both the targets, you could simply run grunt sass and it would run both the targets sequentially. As already mentioned, the grunt-contrib-sass plugin compiles your Sass code by using Ruby Sass, and you should use the grunt-sass plugin to compile Sass to CSS with node-sass (LibSass). 
To switch to the grunt-sass plugin, you will have to install it locally first by running the following command in your console: npm install grunt-sass Then, replace grunt.loadNpmTasks('grunt-contrib-sass'); with grunt.loadNpmTasks('grunt-sass'); in the Gruntfile.js file; the basic options for grunt-contrib-sass and grunt-sass are very similar, so you have to change the options for the Sass task when switching to grunt-sass. Finally, notice that grunt-contrib-sass also has an option to turn Compass on. See also Please refer to Grunt's documentation for a full list of options, which is available at https://gruntjs/grunt-contrib-sass#options Also, read Grunt's documentation for more details about configuring your tasks and targets at http://gruntjs.com/configuring-tasks#task-configuration-and-targets github.com/ Summary In this article you studied about installing Grunt, installing Grunt plugins, utilizing the Gruntfile.js file, adding a configuration definition for a plugin and adding the Sass compiler task. Resources for Article: Further resources on this subject: Meeting SAP Lumira [article] Security in Microsoft Azure [article] Basic Concepts of Machine Learning and Logistic Regression Example in Mahout [article]

Configuring Redmine

Packt
18 Apr 2016
15 min read
In this article by Andriy Lesyuk, author of Mastering Redmine, whentalking about the web interface (that is, not system files), all of the global configuration of Redmine can be done on the Settings page of the Administration menu. This is actually the page that this articleis based around. Some settings on this page, however, depend on special system files or third-party tools that need to be installed. And these are the other things that we will discuss. You might expect to see detailed explanations for all the administration settings here, but instead, we will review in detail only a few of them, as I believe that the others do not need to be explained or can easily be tested. So generally, we will focus on hard-to-understand settings and thosesettings that need to be configured additionally in some special way or have some obscurities. So, why should you read this articleif you are not an administrator? Some features of Redmine are available only if they have been configured, so by reading this article, you will learn what extra features exist and get an idea of how to enable them. In this article, we will cover the following topics: The first thing to fix The general settings Authentication (For more resources related to this topic, see here.) The first thing to fix A fresh Redmine installation has only one user account, which has administrator privileges. You can see it in the following screenshot: This account is exactly the same by default on all Redmine installations. That's why it is extremely important to change its credentials immediately after you complete the installation, especially for Redmine instances that can be accessed publicly. The administrator credentials can be changed on the Users page of the Administration menu. To do this, click on the admin link. You will see this screen: In this form, you should specify a new password in the Password and Confirmation fields. Also, it's recommended that you change the login to something different. Additionally, consider specifying your e-mail instead of admin@example.net (at least), changing the First name and Last name. The general settings Everything that is possible to configure at the global level (the opposite is the project level) can be found under the Administration link in the top-left menu. Of course, this link is available only for administrators If you click on the Administrationlink, you will get the list of available administration pages on the sidebar to the right. Most of them are for managing Redmine objects, such as projects and trackers. We will be discussing only general, system-wide configuration. Most of the settings that we are going to review are compiled on the Settings page, as shown in the following screenshot: As all of these settings can't fit on a single page, Redmine organizes them into tabs. We will discuss the Authentication, Email notifications, Incoming emails, and Repositories tabs in the next sections. The General tab So let's start with the General tab, which can be seen in the previous screenshot. 
Settings in this tab control the general behavior of Redmine, thus Application title is the name of the website that is shown at the top of non-project pages, Welcome text is displayed on the start page of Redmine, Objects per page options specifies how many objects users will be able to see on a page, such settings as Search results per page and Days displayed on project activity allow to control the number of objects that are shown on search results and activity pages correspondingly, the Protocol setting specifies the preferred protocol that will be used in links to the website, Wiki history compression controls whether the history of Wiki changes should be compressed to save the space, and finally Maximum number of items in Atom feeds sets the limit for the amount of items that are returned in the Atom feed. Additionally, the General tab contains settings, which I want to discuss in detail. The Cache formatted text setting Redmine supports text formatting through the lightweight markup language Textile or Markdown. While conversion of text from such a language to HTML is quite fast, in some circumstances, you may want to cache the resulting HTML. If that is the case, the Cache formatted text checkbox is what you need. When this setting is enabled, all Textile or Markdown content that is larger than 2 KB will be cached. The cached HTML will be refreshed only when any changes are made to the source text, so you should take this into account if you are using a Wiki extension that generates the dynamic content (such as my WikiNG plugin). Unless performance is extremely critical for you, you should leave this checkbox unchecked. Other settings tips Here are some other tips for the General tab: The value of the Host name and path setting will be used to generate URLs in the e-mail messages that will be sent to users, so it's important to specify a proper value here. For the Text formatting, select the markup language that is best for you. It's also possible to select none here, but I would not recommend to do this. The Display tab As it comes from the name, this tab contains settings related to the look and feel of Redmine. Its settings can be seen in the following screenshot: Using the Theme setting users can choose a theme for the Redmine interface. The Default language setting allows to specify which language will be used for the interface, if Redmine fails to determine the language of the user. Thus, for not logged-in users it will attempt to use the preferred language of the user's browser, what can be disabled by the Force default language for anonymous users setting, and for logged-in users it will use the language that is chosen by users in their profiles, what can be disabled by the Force default language for logged-in users setting. By default the user's language also affects the start day of the week, and date and time formats, what can also be changed by the Start calendars on, Date format and Time format settings correspondingly. The display format of the user name is controlled by the Users display format setting. Finally, the Thumbnails size (in pixels) setting specifies the size of thumbnail images in pixels. Now let's check what the rest of settings mean. The Use Gravatar user icons setting Once I used a WordPress form to leave a comment on someone's blog. That form asked me to specify the first name, the last name, my e-mail address, and the text. After submitting it, I was surprised to see my photo near the comment. That's what Gravatar does. 
Gravatar stands for Globally Recognized Avatar. It's a web service that allows you to assign an image for each user's e-mail. Then, third-party sites can fetch the corresponding image by supplying a hash of the user's e-mail address. The Use Gravatar user icons setting enables this behavior for Redmine. Having this option checked is a good idea (unless potential users of your Redmine installation can be unable to access Internet because, for example, Redmine is going to be used in an isolated Intranet. The Default Gravatar image setting What happens if a Gravatar is not available for the user's e-mail? In such cases, the Gravatar service returns a default image, which depends on the Default Gravatar image setting. The following table shows the six available themes of the default avatar image: Theme Sample image Description None The default image, which is shown if no other theme is selected Wavatars A generated face with differing features and background Identicons A geometric pattern Monster IDs A generated monster image with different colors, face, and so on Retro A generated 8-bit, arcade-style pixelated face Mystery man A simple, cartoon-style silhouetted outline of a person   For all of these themes, except Mystery manandnone, Gravatar generates an avatar image that is based on the hash of the user's e-mail and is therefore unique to it. The Redmine Local Avatars plugin Consider installing the Redmine Local Avatars plugin by Andrew Chaika, Luca Pireddu, and Ricardo Santos, if you preferwant users to upload their avatars directly onto Redmine: https://github.com/thorin/redmine_local_avatars This plugin will also let your users take their pictures with web cameras. The Display attachment thumbnails setting If the Display attachment thumbnails setting is enabled, all image attachments—no matter what object (for example, Wiki or issue) they are attached to—will be also seen under the attachment list as clickable thumbnails. If the user clicks on such a thumbnail, the full-size image will be opened. The Redmine Lightbox 2 plugin In pure Redmine, full-size images are opened in the same browser window. To open them in a lightbox, you can use the Lightbox 2 plugin that was created by Genki Zhang and Tobias Fischer: https://github.com/paginagmbh/redmine_lightbox2 Note that in order for this setting to work, you must have the ImageMagick's convert tool installed. The API tab In addition to the web interface that is intended for human Redmine comes with a special REST application programming interface (API) that is intended for third-party applications. Thus, Redmine REST API is used by Redmine Mylyn Connector for Eclipse and RedmineApp for iPhone. This interface can be enabled and configured under the API tab of the Settings page which is shown in the following screenshot: Let's check what these settings mean: If you need to support integration of third-party tools, you should turn on Redmine REST API using the Enable REST web service checkbox. But it is safe to keep this setting disabled, if you are not using any external Redmine tools. Redmine API can also be used via JavaScript in the web browser, but not if the API client (that is, a website, that runs JavaScript) is on different domain. In such cases to bypass the browser's same-origin policy the API client may use the technique called JSONP. As this technique is considered to be insecure it should be explicitly enabled using the Enable JSONP support setting. So in most cases you should leave this option disabled. 
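To give a flavour of what the REST web service looks like once it is enabled, the following sketch requests a project's issues in JSON format and authenticates with a user's API key. The host name, project identifier, and key are placeholders; the key itself can be copied from the user's account page once the REST API has been enabled.
# List the first five issues of a project in JSON format,
# authenticating with the X-Redmine-API-Key header.
curl -H "X-Redmine-API-Key: 0123456789abcdef" \
  "https://redmine.example.com/issues.json?project_id=my-project&limit=5"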
The Files tab The Files tab contains settings related to file display and attachment as shown in the following screenshot: Here Allowed extensions and Disallowed extensions can be used to restrict file uploads by file extensions – thus you can use the former setting to allow certain extensions only or the latter one to forbid certain extensions only. Such settings as Maximum size of text files displayed inline and Maximum number of diff lines displayed control the amount of the file content that can be displayed. The rest settings are used more often: You may need to change the Maximum attachment size setting to a large value (which is in kB). Thus, project files (releases) are attachments as well, so if you expect your users to upload large files, consider changing this setting to a bigger value. The value of the Attachments and repositories encodings option is used to convert commit messages to UTF-8. Authentication There are two pages in Redmine intended for configuring the authentication. The first one is the Authentication tab on the Settings page, and the second one is the special LDAP Authentication page, which can be found in the Administration menu. Let's discuss these pages in detail. The Authentication tab The next tab in the administration settings is Authentication. The following screenshot shows the various options available under this tab: If the Authentication required setting is enabled, users won't be able to see the content of your Redmine without having logged in first. The Autologin setting can be used to let your users keep themselves logged in for some period of time using their browsers. The Self-registration setting controls, how user accounts are activated (the manual account activation option means that users should be enabled by administrators). The Allow users to delete their own account setting controls, whether users will be able to delete their accounts. The Minimum password length setting specifies the minimum size of the password in characters and the Require password change after setting can be used to force users to change their passwords periodically. The Lost password setting controls, whether users will be able to restore their passwords in cases when they, for example, have forgotten them. And finally the Maximum number of additional email addresses setting specifies the number of additional email addresses a user account may have. After a user logs in Redmine opens a user session. The lifetime of such session is controlled by the Session maximum lifetime setting (value disabled means that the session hangs forever). Such session can also be automatically terminated, if the user was not active for some time, what is controlled by the Session inactivity timeout setting (value disabled means that the session never expires). Now, let's discuss the very special setting, which we skipped. The Allow OpenID login and registration setting If you are running a public website with open registration, you perhaps know (or you will know if you want your Redmine installation to be public and open for user registration) that users do not like to register on each new site. This is understandable, as they do not want to create another password to remember or share their existing password with a new and therefore untrusted website. Besides, it's also a matter of sharing the e-mail address and—sometimes—remembering another login. That's when OpenID comes in handy. 
OpenID is an open-standard authentication protocol in which authentication (password verification) is performed by the OpenID provider. This popular protocol is currently supported by many companies, such as Yahoo!, PayPal, AOL, LiveJournal, IBM, VeriSign, and WordPress. In other words, servers of such companies can act as OpenID providers, and therefore users can log in to Redmine using their accounts that they have on these companies' websites if the Allow OpenID login and registration setting is enabled. Google used to support OpenID too, but they shut it down recently in favor of the OAuth2.0-based OpenID Connect authentication protocol. Despite the use of "OpenID" in its name, OpenID Connect is very different from OpenID. So, if your Redmine installation is (or is going to be) public, consider enabling this setting. But note that to log in using this protocol, your users will need to specify OpenID URL (the URL of the OpenID provider) in addition to Login and Password, as it can be seen on the following Redmine login form: LDAP authentication Just as OpenID is convenient for public sites to be used to authenticate external users, LDAP is convenient for private sites—to authenticate corporate users. Like OpenID, LDAP is a standard that describes how to authenticate against a special LDAP directory server, and is widely used by many applications such as MediaWiki, Apache, JIRA, Samba, SugarCRM, and so on. Also, as LDAP is an open protocol, it is supported by some other directory servers, such as Microsoft Active Directory and Apple Open Directory. For this reason, it is often used by companies as a centralized users' directory and an authentication server. To allow users to authenticate against an LDAP server, you should add it to the list of supported authentication modes on the LDAP authentication page, which is available in the Administration menu. To add a mode, click on the New authentication mode link. This will open the form: If the On-the-fly user creation option is checked, user accounts will be created automatically when users log in to the system for the first time. If this option is not checked, users will have to be added manually beforehand. Also, if you check this option, you need to specify all the attributes in the Attributes box, as they are going to be used to import user details from the LDAP server. Check with your LDAP server administrator to find out what values should be used in this form. In Redmine, LDAP authentication can be performed against many LDAP servers. Every such server is represented as an authentication source in the authentication mode list, which has just been mentioned. The corresponding source can also be seen in the user's profile and can even be changed to the internal Redmine authentication if needed. Summary I guess you have become a bit tired with all those general details, installations, configurations, integrations, and so on. You might expect to see detailed explanations for all the administration settings here, but instead, we will review in detail only a few of them, as I believe that the others do not need to be explained or can easily be tested. So generally, we will focus on hard-to-understand settings and those settings that need to be configured additionally in some special way or have some obscurities. Resources for Article: Further resources on this subject: Project management with Redmine [article] Redmine - Permissions and Security [article] Installing and customizing Redmine [article]


Finding Patterns in the Noise – Clustering and Unsupervised Learning

Packt
15 Apr 2016
17 min read
In this article by Joseph J., author of Mastering Predictive Analytics with Python, we will cover one of the natural questions to ask about a dataset: does it contain groups? For example, if we examine financial markets as a time series of prices over time, are there groups of stocks that behave similarly? Likewise, in a set of customer financial transactions from an e-commerce business, are there user accounts distinguished by patterns of similar purchasing activity? By identifying groups using the methods described in this article, we can understand the data as a set of larger patterns rather than just individual points. These patterns can help in making high-level summaries at the outset of a predictive modeling project, or serve as an ongoing way to report on the shape of the data we are modeling. Likewise, the groupings produced can serve as insights themselves, or they can provide starting points for the models. For example, the group to which a datapoint is assigned can become a feature of that observation, adding information beyond its individual values. Additionally, we can potentially calculate statistics (such as the mean and standard deviation) of other features within these groups, which may be more robust as model features than individual entries. (For more resources related to this topic, see here.)
In contrast to supervised methods, grouping or clustering algorithms are known as unsupervised learning, meaning that we have no response, such as a sale price or click-through rate, that is used to determine the optimal parameters of the algorithm. Rather, we identify similar datapoints and, as a secondary analysis, might ask whether the clusters we identify share a common pattern in their responses (which would suggest that the clustering is useful for finding groups associated with the outcome we are interested in).
The task of finding these groups, or clusters, has a few common ingredients that vary between algorithms. One is a notion of distance or similarity between items in the dataset, which allows us to compare them. A second is the number of groups we wish to identify; this can be specified up front using domain knowledge, or determined by running the algorithm with different numbers of groups and choosing the number that best describes the dataset, as judged by the numerical variance within the groups. Finally, we need a way to measure the quality of the groups we have identified; this can be done either visually or through the statistics that we will cover.
In this article, we will dive into: how to normalize data for use in a clustering algorithm and compute similarity measurements for both categorical and numerical data; how to use k-means to identify an optimal number of clusters by examining the loss function; how to use agglomerative clustering to identify clusters at different scales; how to use affinity propagation to automatically identify the number of clusters in a dataset; and how to use spectral methods to cluster data with nonlinear boundaries.
Similarity and distance
The first step in clustering any new dataset is to decide how to compare the similarity (or dissimilarity) between items. Sometimes the choice is dictated by the kind of similarity we are trying to measure; in other cases it is restricted by the properties of the dataset.
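Before turning to specific distance measures, here is a minimal sketch of one idea from the overview above: judging candidate numbers of groups by the within-cluster variance that k-means reports (exposed by scikit-learn as the inertia_ attribute). The data here is random and purely illustrative.
import numpy as np
from sklearn.cluster import KMeans

# Purely illustrative data: 300 points in 4 dimensions.
X = np.random.RandomState(0).randn(300, 4)

# Fit k-means for several candidate cluster counts and record the
# within-cluster sum of squares; an "elbow" in these values is one
# common heuristic for choosing the number of groups.
for k in range(2, 8):
    model = KMeans(n_clusters=k, random_state=0).fit(X)
    print(k, model.inertia_)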
In the following sections, we illustrate several kinds of distance for numerical, categorical, time series, and set-based data; while this list is not exhaustive, it should cover many of the common use cases you will encounter in business analysis. We will also cover the normalizations that may be needed for different data types prior to running clustering algorithms.
Numerical distances
Let's begin by looking at an example contained in the wine.data file. It contains a set of chemical measurements that describe the properties of different kinds of wines, and the quality class (I-III) to which each wine is assigned (Forina, M. et al, PARVUS - An Extendible Package for Data Exploration, Classification and Correlation, Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy). Open the file in an IPython notebook and look at the first few rows: Notice that in this dataset we have no column descriptions. We need to parse these from the accompanying dataset description file (wine.names). With the following code, we generate a regular expression that will match a header name (we match a pattern where a number followed by a parenthesis has a column name after it, as you can see in the list of column names in the file), and add these to an array of column names along with the first column, which is the class label of the wine (whether it belongs to category I-III). We then assign this list to the dataframe column names: Now that we have appended the column names, we can look at a summary of the dataset: How can we calculate the similarity between wines based on this data? One option would be to consider each of the wines as a point in a thirteen-dimensional space specified by its measurements (that is, each of the properties other than the class). Since the resulting space has thirteen dimensions, we can't directly visualize the datapoints using a scatterplot to see whether they are nearby, but we can calculate distances just the same as in a more familiar 2- or 3-dimensional space using the Euclidean distance formula, which is simply the length of the straight line between two points. This formula can be used whether the points lie in a 2-dimensional plot or in a more complex space such as this example, and is given by:
d(a, b) = sqrt((a1 - b1)^2 + (a2 - b2)^2 + ... + (an - bn)^2)
Here, a and b are rows of the dataset and n is the number of columns. One feature of the Euclidean distance is that columns whose scale is much different from the others can distort it. In our example, the values describing the magnesium content of each wine are ~100 times greater in magnitude than the features describing the alcohol content or ash percentage. If we were to calculate the distance between these datapoints, it would largely be determined by the magnesium concentration (as even small differences on this scale overwhelmingly determine the value of the distance calculation), rather than by any of the other properties. While this might sometimes be desirable, in most applications we do not favour one feature over another and want to give equal weight to all columns. To get a fair distance comparison between these points, we need to normalize the columns so that they fall into the same numerical range (have similar maxima and minima values). We can do so using the scale() function in scikit-learn:
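The book shows this step as a screenshot; the following is a minimal sketch of the same idea, assuming the file follows the UCI layout (the class label first, followed by the thirteen measurement columns). The shortened column names below are an assumption for readability, not the exact names parsed from the description file.
import pandas as pd
from sklearn.preprocessing import scale

# The raw file has no header row; the first column is the class label.
columns = ['class', 'alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash',
           'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols',
           'proanthocyanins', 'color_intensity', 'hue', 'od280_od315', 'proline']
wine = pd.read_csv('wine.data', header=None, names=columns)

# Scale the thirteen measurement columns to mean 0 and variance 1, then wrap
# the resulting NumPy array back in a DataFrame so that describe() can be used.
scaled = pd.DataFrame(scale(wine[columns[1:]]), columns=columns[1:])
print(scaled.describe())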
This normalization centers each column at 0 with variance 1, and in the case of normally distributed data this would produce a standard normal distribution. Also note that the scale() function returns a NumPy array, which is why we must wrap the output in a pandas DataFrame in order to use the pandas function describe(). Now that we've scaled the data, we can calculate the Euclidean distances between the points: We've now converted our dataset of 178 rows and 13 columns into a square matrix giving the distance between each pair of rows. In other words, row i, column j of this matrix represents the Euclidean distance between rows i and j of our dataset. This 'distance matrix' is the input we will use for the clustering algorithms in the following section.
If we just want to get a visual sense of how the points compare to each other, we could use multidimensional scaling (MDS) (Modern Multidimensional Scaling - Theory and Applications, Borg, I., Groenen, P., Springer Series in Statistics (1997); Nonmetric multidimensional scaling: a numerical method, Kruskal, J., Psychometrika, 29 (1964); and Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Kruskal, J., Psychometrika, 29 (1964)) to create a visualization. Multidimensional scaling attempts to find the set of lower-dimensional coordinates (here, two dimensions) that best represents the distances in the higher dimensions of a dataset (here, the pairwise Euclidean distances we calculated from the 13 dimensions). It does this by choosing coordinates x1, ..., xn that minimize the strain function:
Strain(x1, ..., xn) = (1 - (Sum_ij(d_ij * <x_i, x_j>))^2 / (Sum_ij(d_ij^2) * Sum_ij(<x_i, x_j>^2)))^(1/2)
where d_ij is the distance we've calculated between points i and j, and <x_i, x_j> is the dot product of their candidate coordinates. In other words, we find coordinates whose pairwise dot products best capture the variation in the pairwise distances. We can then plot the resulting coordinates, using the wine class to label points in the diagram. Note that the coordinates themselves have no interpretation (in fact, they could change each time we run the algorithm). Rather, it is the relative position of the points that we are interested in: Given that there are many ways we could have calculated the distance between datapoints, is the Euclidean distance a good choice here? Visually, based on the multidimensional scaling plot, we can see that there is separation between the classes based on the features we've used to calculate distance, so conceptually it appears to be a reasonable choice in this case. However, the decision also depends on what we are trying to compare; if we are interested in detecting wines with similar attributes in absolute values, then it is a good metric. However, what if we're not interested so much in the absolute composition of the wine, but in whether its variables follow similar trends among wines with different alcohol contents? In this case, we wouldn't be interested in the absolute difference in values, but rather in the correlation between the columns. This sort of comparison is common for time series, which we turn to next.
Correlations and time series
For time series data, we are often concerned with whether the patterns between series exhibit the same variation over time, rather than with their absolute differences in value. For example, if we were to compare stocks, we might want to identify groups of stocks whose prices move up and down in similar patterns over time. The absolute price is of less interest than this pattern of increase and decrease.
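Before diving into the stock example, here is a compact sketch of the pattern described above, which we will reuse below: build a square matrix of pairwise distances and project it to two dimensions with MDS. Note that scikit-learn's MDS minimizes a stress criterion rather than the exact strain formula quoted above, but it serves the same visual purpose; the scaled and wine variables are assumed to come from the earlier sketch.
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# Pairwise Euclidean distances between all rows, as a square matrix.
distances = squareform(pdist(scaled.values, metric='euclidean'))

# Find 2-D coordinates whose pairwise relationships approximate the
# precomputed distances, then plot them coloured by wine class.
coords = MDS(n_components=2, dissimilarity='precomputed',
             random_state=0).fit_transform(distances)
plt.scatter(coords[:, 0], coords[:, 1], c=wine['class'])
plt.show()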
Given that there are many ways we could have calculated the distance between datapoints, is the Euclidean distance a good choice here? Visually, based on the multidimensional scaling plot, we can see that there is separation between the classes based on the features we've used to calculate distance, so conceptually it appears to be a reasonable choice in this case. However, the decision also depends on what we are trying to compare: if we are interested in detecting wines with similar attributes in absolute terms, then it is a good metric. But what if we're interested not so much in the absolute composition of a wine as in whether its variables follow similar trends among wines with different alcohol contents? In that case, we wouldn't be interested in the absolute differences in values, but rather in the correlation between the columns. This sort of comparison is common for time series, which we turn to next.

Correlations and time series

For time series data, we are often concerned with whether the patterns between series exhibit the same variation over time, rather than with their absolute differences in value. For example, if we were to compare stocks, we might want to identify groups of stocks whose prices move up and down in similar patterns over time. The absolute price is of less interest than this pattern of increase and decrease.

Let's look at an example of the Dow Jones Industrial Average over time (Brown, M. S., Pelosi, M., and Dirska, H. (2013), Dynamic-radius Species-conserving Genetic Algorithm for the Financial Forecasting of Dow Jones Index Stocks, Machine Learning and Data Mining in Pattern Recognition, 7988, 27-41). This data contains the prices of a set of 30 stocks over a period of six months. Because all of the numerical values (the prices) are on the same scale, we won't normalize this data as we did with the wine measurements.

We notice two things about this data. First, the weekly closing price (the variable we will use to calculate correlation) is represented as a string. Second, the date is not in the right format for plotting. We will process both columns to fix this, converting them to a float and a datetime object, respectively. With this transformation done, we can make a pivot table that places the weekly closing prices in columns and the individual stocks in rows. As we can see, we only need the columns from the second onward to calculate correlations between rows. Let's calculate the correlation between these time series of stock prices by selecting the second through last columns of the data frame, computing the pairwise correlation-based distance metric, and visualizing it using MDS, as before.

It is important to note that the Pearson coefficient, which we've calculated here, is a measure of linear correlation between these time series. In other words, it captures the linear increase (or decrease) of the trend in one price relative to another, but won't necessarily capture nonlinear trends. We can see this by looking at the formula for the Pearson correlation, which is given by:

P(a, b) = cov(a, b) / (sd(a) * sd(b)) = Sum((ai - mean(a)) * (bi - mean(b))) / (sqrt(Sum((ai - mean(a))^2)) * sqrt(Sum((bi - mean(b))^2)))

This value varies from 1 (highly correlated) to -1 (inversely correlated), with 0 representing no correlation (such as a cloud of points). You might recognize the numerator of this equation as the covariance, which is a measure of how much the two datasets, a and b, vary together. You can understand this by considering that the numerator is maximized when corresponding points in both datasets are above or below their mean values at the same time. However, whether this accurately captures the similarity in the data depends upon the scale. In data that is distributed at regular intervals between a maximum and minimum, with roughly the same difference between consecutive values (which is essentially how a trend line appears), it captures this pattern well. However, consider a case in which the data is exponentially distributed, with orders of magnitude between the minimum and maximum, and in which the differences between consecutive datapoints also vary widely. Here, the Pearson correlation would be numerically dominated by only the largest terms, which might or might not represent the overall similarity in the data. The same numerical sensitivity applies to the denominator, which is the product of the standard deviations of the two datasets. Thus, the value of the correlation is maximized when the variation in the two datasets is roughly explained by the product of their individual variations; there is no 'left over' variation between the datasets that is not explained by their respective standard deviations.
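A sketch of the preprocessing, pivoting, and correlation-distance steps described above might look as follows. The file name dow_jones_index.data and the column names stock, date, and close are assumptions based on the UCI Dow Jones Index dataset; this version places the tickers in the index (so every column is a price), which differs slightly from the layout described above, and it assumes no weeks are missing:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.manifold import MDS

    stocks = pd.read_csv('dow_jones_index.data')

    # The closing price is stored as a string such as '$15.82': strip the dollar
    # sign and cast to float, and parse the date column into datetime objects.
    stocks['close'] = stocks['close'].str.replace('$', '', regex=False).astype(float)
    stocks['date'] = pd.to_datetime(stocks['date'])

    # Pivot so that each row is one stock (ticker) and each column a week's close.
    prices = stocks.pivot(index='stock', columns='date', values='close')

    # Pearson correlation between every pair of price series, converted into a
    # distance: identical series get distance 0, perfectly anti-correlated get 2.
    pearson_dist = 1 - np.corrcoef(prices.values)

    # Visualize the stocks with MDS on the precomputed correlation distances.
    mds = MDS(n_components=2, dissimilarity='precomputed', random_state=0)
    coords = mds.fit_transform(pearson_dist)
    plt.scatter(coords[:, 0], coords[:, 1])
    for ticker, (x, y) in zip(prices.index, coords):
        plt.annotate(ticker, (x, y))
    plt.show()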
Looking at the first two stocks in this dataset, the assumption of linearity appears to be a valid one for comparing datapoints. In addition to verifying that these stocks have a roughly linear correlation, the command that produces this comparison introduces some new pandas functions you may find useful. The first is iloc, which allows you to select rows from a dataframe by position. The second is transpose, which swaps the rows and columns. Here, we select the first two rows, transpose them, and then select all rows (prices) after the first, since the first holds the ticker symbols.

Despite the trend we see in this example, we could imagine a nonlinear relationship between prices. In such cases, it might be better to measure not the linear correlation of the prices themselves, but whether the high prices for one stock coincide with the high prices for another. In other words, the ranking of market days by price should be the same, even if the prices are nonlinearly related. We can calculate this rank correlation, also known as Spearman's rho, using scipy, with the following formula:

Rho(a, b) = 1 - 6 * Sum(di^2) / (n * (n^2 - 1))

where n is the number of datapoints in each of the two sets a and b, and di is the difference in ranks between the pair of datapoints ai and bi. Because we compare only the ranks of the data, not their actual values, this measure can capture variations up and down between two datasets even when they vary over very different numerical ranges. Let's see whether plotting the results using the Spearman correlation metric produces any differences in the pairwise distances of the stocks.

The Spearman correlation distances, judging by the x and y axes, appear closer to each other, suggesting that from the perspective of rank correlation the time series are more similar. Though they differ in their assumptions about how the two compared datasets are distributed numerically, the Pearson and Spearman correlations share the requirement that the two sets be of the same length. This is usually a reasonable assumption, and it will be true of most of the examples we consider in this book. However, for cases where we wish to compare time series of unequal lengths, we can use Dynamic Time Warping (DTW). Conceptually, the idea of DTW is to warp one time series to align it with the other, by allowing gaps to be opened in either series so that the two end up the same length. What the algorithm needs to resolve is where the most similar regions of the two series are, so that gaps can be placed in the appropriate locations. In its simplest implementation, DTW consists of the following steps, which are illustrated in the sketch below:

1. For a dataset a of length n and a dataset b of length m, construct an n-by-m cost matrix (in practice bordered by an extra top row and leftmost column).
2. Set the top row and the leftmost column of this matrix to infinity, with the exception of the origin cell, which is set to zero.
3. For each point i in series a and each point j in series b, compare their similarity using a cost function. To this cost, add the minimum of the elements at (i-1, j-1), (i-1, j), and (i, j-1), that is, moving diagonally, up, or left. These conceptually represent the costs of aligning the two elements directly versus opening a gap in one of the series.

At the end of step 3, we will have traced the minimum-cost path that aligns the two series, and the DTW distance is given by the bottom-right corner of the matrix, (n, m). A drawback of this algorithm is that step 3 computes a value for every pair of elements of series a and b; for long time series or large datasets, this can be computationally prohibitive.
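To make the recurrence concrete, here is a minimal, self-contained implementation of the steps just listed. It uses the absolute difference as the cost function and is intended only to illustrate the algorithm, not for use on large datasets:

    import numpy as np

    def dtw_distance(a, b):
        """Dynamic Time Warping distance between two 1-D sequences a and b."""
        n, m = len(a), len(b)
        # Cost matrix with an extra top row and leftmost column set to infinity,
        # except the origin cell, which anchors the alignment at zero.
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                # Local cost of aligning element i of a with element j of b.
                local = abs(a[i - 1] - b[j - 1])
                # Add the cheapest of: a direct match (diagonal), or opening a
                # gap in one of the two series (up or left).
                cost[i, j] = local + min(cost[i - 1, j - 1],
                                         cost[i - 1, j],
                                         cost[i, j - 1])
        # The bottom-right corner holds the DTW distance between the two series.
        return cost[n, m]

    # Example: two series of different lengths that follow a similar shape.
    series_a = np.array([1.0, 2.0, 3.0, 4.0, 3.0, 2.0])
    series_b = np.array([1.0, 1.5, 3.0, 4.0, 2.0])
    print(dtw_distance(series_a, series_b))

For two series of lengths n and m, this fills every cell of an (n+1) x (m+1) matrix, which is exactly the quadratic cost that the faster variants discussed next are designed to avoid.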
While a full discussion of algorithmic improvements is beyond the scope of our present examples, we refer interested readers to FastDTW (which we will use in our example) and SparseDTW as improvements that can be evaluated with many fewer calculations (Al-Naymat, G., Chawla, S., and Taheri, J. (2012), SparseDTW: A Novel Approach to Speed up Dynamic Time Warping; Salvador, S. and Chan, P. (2004), FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space, KDD Workshop on Mining Temporal and Sequential Data, pages 70-80). We can use the FastDTW algorithm to compare the stocks data as well and plot the resulting coordinates. First, we compare each pair of stocks and record their DTW distance in a matrix. For computational efficiency (the distance between stocks i and j equals the distance between stocks j and i), we calculate only the upper triangle of this matrix and then add its transpose (that is, the lower triangle) to obtain the full distance matrix. Finally, we can use MDS again to plot the results.

Compared to the distribution of coordinates along the x and y axes for the Pearson correlation and the rank correlation, the DTW distances appear to span a wider range, picking up more nuanced differences between the time series of stock prices. Now that we've looked at numerical and time series data, as a last example let's examine calculating similarity in categorical datasets.

Summary

In this section, we learned how to identify groups of similar items in a dataset, an exploratory analysis that we might frequently use as a first step in deciphering new datasets. We explored different ways of calculating the similarity between datapoints and described the kinds of data to which these metrics might best apply. We examined both divisive clustering algorithms, which split the data into smaller components starting from a single group, and agglomerative methods, where every datapoint starts as its own cluster. Using a number of datasets, we showed examples where these algorithms perform better or worse, and some ways to optimize them. We also saw our first (small) data pipeline, a clustering application in PySpark using streaming data.

Resources for Article:

Further resources on this subject:

Python Data Structures [article]
Big Data Analytics [article]
Data Analytics [article]