
How-To Tutorials

Working with WebStart and the Browser Plugin

Packt
06 Feb 2015
12 min read
In this article by Alex Kasko, Stanislav Kobylyanskiy, and Alexey Mironchenko, authors of the book OpenJDK Cookbook, we will cover the following topics:

- Building the IcedTea browser plugin on Linux
- Using the IcedTea Java WebStart implementation on Linux
- Preparing the IcedTea Java WebStart implementation for Mac OS X
- Preparing the IcedTea Java WebStart implementation for Windows

Introduction

For a long time, for end users, the Java applets technology was the face of the whole Java world. For a lot of non-developers, the word Java itself is a synonym for the Java browser plugin that allows running Java applets inside web browsers. The Java WebStart technology is similar to the Java browser plugin, but it runs remotely loaded Java applications as separate applications outside of web browsers.

The OpenJDK open source project does not contain implementations of either the browser plugin or the WebStart technology. The Oracle Java distribution, otherwise matching the OpenJDK codebase closely, provides its own closed source implementations of these technologies. The IcedTea-Web project contains free and open source implementations of the browser plugin and WebStart technologies. The IcedTea-Web browser plugin supports only GNU/Linux operating systems, whereas the WebStart implementation is cross-platform.

While the IcedTea implementation of WebStart is well-tested and production-ready, it has numerous incompatibilities with the Oracle WebStart implementation. These differences can be seen as corner cases; some of them are:

- Different behavior when parsing not-well-formed JNLP descriptor files: The Oracle implementation is generally more lenient toward malformed descriptors.
- Differences in JAR (re)downloading and caching behavior: The Oracle implementation uses caching more aggressively.
- Differences in sound support: This is due to differences in sound support between Oracle Java and IcedTea on Linux. Linux historically has multiple different sound providers (ALSA, PulseAudio, and so on), and IcedTea has wider support for different providers, which can lead to sound misconfiguration.

The IcedTea-Web browser plugin (as it is built on WebStart) has these incompatibilities too. On top of them, it can have more incompatibilities related to browser integration. User interface forms and general browser-related operations, such as access from/to JavaScript code, should work fine with both implementations. But historically, the browser plugin was widely used for security-critical applications such as online bank clients. Such applications usually require security facilities from browsers, such as access to certificate stores or hardware crypto-devices, and these can differ from browser to browser depending on the OS (a facility may, for example, be supported only on Windows), the browser version, the Java version, and so on. Because of that, many real-world applications can have problems running in the IcedTea-Web browser plugin on Linux.

Both WebStart and the browser plugin are built on the idea of downloading (possibly untrusted) code from remote locations, and proper privilege checking and sandboxed execution of that code is a notoriously complex task. Security issues reported in the Oracle browser plugin (the most widely known are the issues from the year 2012) are usually also fixed separately in IcedTea-Web.

Building the IcedTea browser plugin on Linux

The IcedTea-Web project is not inherently cross-platform; it is developed on Linux and for Linux, and so it can be built quite easily on popular Linux distributions.
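The recipe that follows relies on the prepackaged OpenJDK 7 binaries, which on Ubuntu 12.04 are IcedTea-based (the "How it works..." section explains why this matters). As a quick, optional sanity check, you can inspect the JVM version banner; the exact wording varies by distribution and update level, so treat the expected output below as an assumption rather than a guarantee:

java -version
# On Ubuntu 12.04 with openjdk-7-jdk installed, the output should mention
# something like "OpenJDK Runtime Environment (IcedTea ...)".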
The two main parts of the IcedTea-Web project (stored in corresponding directories in the source code repository) are netx and plugin. NetX is a pure Java implementation of the WebStart technology; we will look at it more thoroughly in the following recipes of this article. Plugin is an implementation of the browser plugin using the NPAPI plugin architecture, which is supported by multiple browsers. Plugin is written partly in Java and partly in native code (C++), and it officially supports only Linux-based operating systems.

There is a widespread opinion that the NPAPI architecture is dated, overcomplicated, and insecure, and that modern web browsers have enough built-in capabilities not to require external plugins; browsers have gradually been reducing their NPAPI support. Despite that, at the time of writing this book, the IcedTea-Web browser plugin worked on all major Linux browsers (Firefox and derivatives, Chromium and derivatives, and Konqueror).

We will build the IcedTea-Web browser plugin from sources using Ubuntu 12.04 LTS amd64.

Getting ready

For this recipe, we will need a clean Ubuntu 12.04 installation running with the Firefox web browser installed.

How to do it...

The following procedure will help you to build the IcedTea-Web browser plugin:

- Install the prepackaged binaries of OpenJDK 7:
  sudo apt-get install openjdk-7-jdk
- Install the GCC toolchain and build dependencies:
  sudo apt-get build-dep openjdk-7
- Install the specific dependency for the browser plugin:
  sudo apt-get install firefox-dev
- Download and decompress the IcedTea-Web source code tarball:
  wget http://icedtea.wildebeest.org/download/source/icedtea-web-1.4.2.tar.gz
  tar xzvf icedtea-web-1.4.2.tar.gz
- Run the configure script to set up the build environment:
  ./configure
- Run the build process:
  make
- Install the newly built plugin into the /usr/local directory:
  sudo make install
- Configure the Firefox web browser to use the newly built plugin library:
  mkdir ~/.mozilla/plugins
  cd ~/.mozilla/plugins
  ln -s /usr/local/IcedTeaPlugin.so libjavaplugin.so
- Check whether the IcedTea-Web plugin has appeared under Tools | Add-ons | Plugins.
- Open the http://java.com/en/download/installed.jsp web page to verify that the browser plugin works.

How it works...

The IcedTea browser plugin requires the IcedTea Java implementation to be compiled successfully. The prepackaged OpenJDK 7 binaries in Ubuntu 12.04 are based on IcedTea, so we installed them first. The plugin uses the GNU Autoconf build system, which is common among free software tools. The xulrunner-dev package is required to access the NPAPI headers. The built plugin may be installed into Firefox for the current user only, without requiring administrator privileges. For that, we created a symbolic link to our plugin in the place where Firefox expects to find the libjavaplugin.so plugin library.

There's more...

The plugin can also be installed into other browsers with NPAPI support, but the installation instructions can differ between browsers and between Linux distributions. As the NPAPI architecture does not depend on the operating system, in theory a plugin can be built for non-Linux operating systems, but currently no such ports are planned.

Using the IcedTea Java WebStart implementation on Linux

On the Java platform, the JVM needs to perform the class load process for each class it wants to use. This process is opaque to the JVM, and the actual bytecode for loaded classes may come from one of many sources.
For example, this mechanism allows Java applet classes to be loaded from a remote server into the Java process inside the web browser. Remote class loading may also be used to run remotely loaded Java applications in standalone mode, without integration with the web browser. This technique is called Java WebStart and was developed under Java Specification Request (JSR) number 56.

To run a Java application remotely, WebStart requires an application descriptor file written using the Java Network Launching Protocol (JNLP) syntax. This file is used to define the remote server to load the application from, along with some meta-information. The WebStart application may be launched from a web page by clicking on the JNLP link, or without the web browser, using a JNLP file obtained beforehand. In either case, the running application is completely separate from the web browser, but it uses a sandboxed security model similar to that of Java applets.

The OpenJDK project does not contain a WebStart implementation; the Oracle Java distribution provides its own closed-source WebStart implementation. An open source WebStart implementation exists as part of the IcedTea-Web project. It was initially based on the NETwork eXecute (NetX) project. Contrary to the applet technology, WebStart does not require any web browser integration. This allowed developers to implement the NetX module in pure Java, without native code. For integration with Linux-based operating systems, IcedTea-Web implements the javaws command as a shell script that launches the netx.jar file with the proper arguments.

In this recipe, we will build the NetX module from the official IcedTea-Web source tarball.

Getting ready

For this recipe, we will need a clean Ubuntu 12.04 installation running with the Firefox web browser installed.

How to do it...

The following procedure will help you to build the NetX module:

- Install the prepackaged binaries of OpenJDK 7:
  sudo apt-get install openjdk-7-jdk
- Install the GCC toolchain and build dependencies:
  sudo apt-get build-dep openjdk-7
- Download and decompress the IcedTea-Web source code tarball:
  wget http://icedtea.wildebeest.org/download/source/icedtea-web-1.4.2.tar.gz
  tar xzvf icedtea-web-1.4.2.tar.gz
- Run the configure script to set up a build environment, excluding the browser plugin from the build:
  ./configure --disable-plugin
- Run the build process:
  make
- Install the newly built files into the /usr/local directory:
  sudo make install
- Run the WebStart application example from the Java tutorial:
  javaws http://docs.oracle.com/javase/tutorialJWS/samples/deployment/dynamictree_webstartJWSProject/dynamictree_webstart.jnlp

How it works...

The javaws shell script is installed under the /usr/local directory. When launched with a path or a link to a JNLP file, javaws launches the netx.jar file, adding it to the boot classpath (for security reasons) and providing the JNLP link as an argument.

Preparing the IcedTea Java WebStart implementation for Mac OS X

The NetX WebStart implementation from the IcedTea-Web project is written in pure Java, so it can also be used on Mac OS X. IcedTea-Web provides the javaws launcher implementation only for Linux-based operating systems. In this recipe, we will create a simple implementation of the WebStart launcher script for Mac OS X.

Getting ready

For this recipe, we will need Mac OS X Lion with Java 7 (either the prebuilt OpenJDK or the Oracle one) installed. We will also need the netx.jar module from the IcedTea-Web project, which can be built using the instructions from the previous recipe.
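For orientation, the Linux javaws wrapper described in the previous recipe essentially boils down to the following shell sketch. The netx.jar path shown here is illustrative and may differ on your system, and the real installed script sets additional options; this is only a simplified view, not the actual IcedTea-Web script:

#!/bin/sh
# Simplified sketch of a javaws-style wrapper: put netx.jar on the boot
# classpath and pass the JNLP file or link through to the NetX main class.
NETX_JAR=/usr/local/share/icedtea-web/netx.jar   # illustrative path, adjust to your install
exec java -Xbootclasspath/a:"$NETX_JAR" net.sourceforge.jnlp.runtime.Boot "$@"

The launcher scripts we write for Mac OS X and Windows in the next recipes follow the same pattern.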
How to do it...

The following procedure will help you to run WebStart applications on Mac OS X:

- Download the JNLP descriptor example from the Java tutorials at http://docs.oracle.com/javase/tutorialJWS/samples/deployment/dynamictree_webstartJWSProject/dynamictree_webstart.jnlp.
- Test that this application can be run from the terminal using netx.jar:
  java -Xbootclasspath/a:netx.jar net.sourceforge.jnlp.runtime.Boot dynamictree_webstart.jnlp
- Create the wslauncher.sh bash script with the following contents:
  #!/bin/bash
  if [ "x$JAVA_HOME" = "x" ] ; then
      JAVA="$( which java 2>/dev/null )"
  else
      JAVA="$JAVA_HOME"/bin/java
  fi
  if [ "x$JAVA" = "x" ] ; then
      echo "Java executable not found"
      exit 1
  fi
  if [ "x$1" = "x" ] ; then
      echo "Please provide JNLP file as first argument"
      exit 1
  fi
  $JAVA -Xbootclasspath/a:netx.jar net.sourceforge.jnlp.runtime.Boot $1
- Mark the launcher script as executable:
  chmod 755 wslauncher.sh
- Run the application using the launcher script:
  ./wslauncher.sh dynamictree_webstart.jnlp

How it works...

The netx.jar file contains a Java application that can read JNLP files and download and run the classes described in the JNLP. But for security reasons, netx.jar cannot be launched directly as an application (using the java -jar netx.jar syntax). Instead, netx.jar is added to the privileged boot classpath and is run by specifying the main class directly. This allows netx.jar to run downloaded applications in sandbox mode. The wslauncher.sh script tries to find the Java executable using the PATH and JAVA_HOME environment variables and then launches the specified JNLP file through netx.jar.

There's more...

The wslauncher.sh script provides a basic solution for running WebStart applications from the terminal. To integrate netx.jar into your operating system environment properly (to be able to launch WebStart applications using JNLP links from the web browser), a native launcher or a custom platform scripting solution may be used. Such solutions lie outside the scope of this book.

Preparing the IcedTea Java WebStart implementation for Windows

The NetX WebStart implementation from the IcedTea-Web project is written in pure Java, so it can also be used on Windows; we have already used it on Linux and Mac OS X in the previous recipes in this article. In this recipe, we will create a simple implementation of the WebStart launcher script for Windows.

Getting ready

For this recipe, we will need a version of Windows running with Java 7 (either the prebuilt OpenJDK or the Oracle one) installed. We will also need the netx.jar module from the IcedTea-Web project, which can be built using the instructions from the previous recipe in this article.

How to do it...

The following procedure will help you to run WebStart applications on Windows:

- Download the JNLP descriptor example from the Java tutorials at http://docs.oracle.com/javase/tutorialJWS/samples/deployment/dynamictree_webstartJWSProject/dynamictree_webstart.jnlp.
- Test that this application can be run from the terminal using netx.jar:
  java -Xbootclasspath/a:netx.jar net.sourceforge.jnlp.runtime.Boot dynamictree_webstart.jnlp
- Create the launcher script with the following contents (the listing below repeats the wslauncher.sh bash variant from the previous recipe; on Windows, the same logic goes into the wslauncher.bat batch file discussed in the next section):
  #!/bin/bash
  if [ "x$JAVA_HOME" = "x" ] ; then
      JAVA="$( which java 2>/dev/null )"
  else
      JAVA="$JAVA_HOME"/bin/java
  fi
  if [ "x$JAVA" = "x" ] ; then
      echo "Java executable not found"
      exit 1
  fi
  if [ "x$1" = "x" ] ; then
      echo "Please provide JNLP file as first argument"
      exit 1
  fi
  $JAVA -Xbootclasspath/a:netx.jar net.sourceforge.jnlp.runtime.Boot $1
- Mark the launcher script as executable:
  chmod 755 wslauncher.sh
- Run the application using the launcher script:
  ./wslauncher.sh dynamictree_webstart.jnlp

How it works...

The netx.jar module must be added to the boot classpath, as it cannot be run directly for security reasons. The wslauncher.bat script tries to find the Java executable using the JAVA_HOME environment variable and then launches the specified JNLP file through netx.jar.

There's more...

The wslauncher.bat script may be registered as the default application for opening JNLP files. This will allow you to run WebStart applications from the web browser. However, the current script will show the batch window for a short period of time before launching the application. It also does not support looking for the Java executable in the Windows Registry. A more advanced script without those problems may be written using Visual Basic Script (or any other native scripting solution) or as a native executable launcher. Such solutions lie outside the scope of this book.

Summary

In this article, we covered the configuration and installation of the WebStart and browser plugin components, which are the biggest parts of the IcedTea-Web project.

Setting up our development environment and creating a game activity

Packt
06 Feb 2015
17 min read
In this article by John Horton, author of the book Learning Java by Building Android Games, we will learn how to set up our development environment by installing the JDK and Android Studio. We will also learn how to create a new game activity and lay out a game screen UI for it.

Setting up our development environment

The first thing we need to do is prepare our PC to develop for Android using Java. Fortunately, this is made quite simple for us. The next two tutorials have Windows-specific instructions and screenshots. However, it shouldn't be too difficult to vary the steps slightly to suit Mac or Linux. All we need to do is:

- Install a software package called the Java Development Kit (JDK), which allows us to develop in Java.
- Install Android Studio, a program designed to make Android development fast and easy. Android Studio uses the JDK and some other Android-specific tools that automatically get installed when we install Android Studio.

Installing the JDK

The first thing we need to do is get the latest version of the JDK. To complete this guide, perform the following steps:

- You need to be on the Java website, so visit http://www.oracle.com/technetwork/java/javase/downloads/index.html.
- Find the three buttons shown in the following screenshot and click on the one that says JDK (highlighted). They are on the right-hand side of the web page. Click on the DOWNLOAD button under the JDK option.
- You will be taken to a page that has multiple options to download the JDK. In the Product/File description column, you need to click on the option that matches your operating system. Windows, Mac, Linux, and some other less common options are all listed.
- A common question here is, "do I have 32- or 64-bit Windows?". To find out, right-click on your My Computer (This PC on Windows 8) icon, click on the Properties option, and look under the System heading at the System type entry, as shown in the following screenshot.
- Click on the somewhat hidden Accept License Agreement checkbox.
- Now click on the download option for your OS and system type as previously determined. Wait for the download to finish.
- In your Downloads folder, double-click on the file you just downloaded. The latest version at the time of writing, for a 64-bit Windows PC, was jdk-8u5-windows-x64. If you are using Mac/Linux or have a 32-bit OS, your filename will vary accordingly.
- In the first of several install dialogs, click on the Next button and you will see the next dialog box. Accept the defaults shown in the previous screenshot by clicking on Next.
- In the next dialog box, you can accept the default install location by clicking on Next.
- Next is the last dialog of the Java installer. Click on Close. The JDK is now installed.
- Next, we will make sure that Android Studio is able to use the JDK. Right-click on your My Computer (This PC on Windows 8) icon and navigate to Properties | Advanced system settings | Environment variables | New (under System variables, not under User variables). Now you can see the New System Variable dialog, as shown in the following screenshot.
- Type JAVA_HOME for Variable name and enter C:\Program Files\Java\jdk1.8.0_05 in the Variable value field. If you installed the JDK somewhere else, then the file path you enter in the Variable value field will need to point to wherever you put it. Your exact file path will likely have a different ending, matching the latest version of Java at the time you downloaded it.
- Click on OK to save your new settings.
- Now click on OK again to clear the Advanced system settings dialog.

Now we have the JDK installed on our PC. We are about halfway towards starting to learn Java programming, but we need a friendly way to interact with the JDK and to help us make Android games in Java.

Android Studio

We learned that Android Studio is a tool that simplifies Android development and uses the JDK to allow us to write and build Java programs. There are other tools you can use instead of Android Studio, and there are pros and cons to them all. For example, another extremely popular option is Eclipse. And, as with so many things in programming, a strong argument can be made as to why you should use Eclipse instead of Android Studio. I use both, but what I hope you will love about Android Studio are the following elements:

- It has a very neat and, despite still being under development, very refined and clean interface.
- It is much easier to get started with compared to Eclipse, because several Android tools that would otherwise need to be installed separately are already included in the package.
- Android Studio is being developed by Google, based on another product called IntelliJ IDEA. There is a chance it will be the standard way to develop Android in the not-too-distant future.

If you want to use Eclipse, that's fine. However, some of the keyboard shortcuts and user interface buttons will obviously be different. If you do not have Eclipse installed already and have no prior experience with Eclipse, then I recommend even more strongly that you go ahead with Android Studio.

Installing Android Studio

So, without any delay, let's get Android Studio installed and then we can begin our first game project:

- Visit https://developer.android.com/sdk/installing/studio.html.
- Click on the button labeled Download Android Studio to start the Android Studio download. This will take you to another web page with a very similar-looking button to the one you just clicked on.
- Accept the license by checking the checkbox, commence the download by clicking on the button labeled Download Android Studio for Windows, and wait for the download to complete. The exact text on the button will probably vary depending on the current latest version.
- In the folder in which you just downloaded Android Studio, right-click on the android-studio-bundle-135.12465-windows.exe file and click on Run as administrator. The end of your filename will vary depending on the version of Android Studio and your operating system.
- When asked whether you want to Allow the following program from an unknown publisher to make changes to your computer, click on Yes.
- On the next screen, click on Next.
- On the screen shown in the following screenshot, you can choose which users of your PC can use Android Studio. Choose whatever is right for you, as all options will work, and then click on Next.
- In the next dialog, leave the default settings and then click on Next. Then, on the Choose start menu folder dialog box, leave the defaults and click on Install.
- On the Installation complete dialog, click on Finish to run Android Studio for the first time.
- The next dialog is for users who have already used Android Studio, so assuming you are a first-time user, select the I do not have a previous version of Android Studio or I do not want to import my settings checkbox, and then click on OK.

That was the last piece of software we needed.

Math game – asking a question

Now that we have all that knowledge under our belts, we can use it to improve our math game.
First, we will create a new Android activity to be the actual game screen, as opposed to the start menu screen. We will then use the UI designer to lay out a simple game screen so that we can use our Java skills with variables, types, declaration, initialization, operators, and expressions to make our math game generate a question for the player. We can then link the start menu and game screens together with a push button.

Creating the new game activity

We will first need to create a new Java file for the game activity code and a related layout file to hold the game activity UI:

- Run Android Studio and select your Math Game Chapter 2 project. It might have been opened by default. Now we will create the new Android activity that will contain the actual game screen, which will run when the player taps the Play button on our main menu screen.
- To create a new activity, we need another layout file and another Java file. Fortunately, Android Studio will help us do this. To get started with creating all the files we need for a new activity, right-click on the src folder in the Project Explorer and then go to New | Activity. Now click on Blank Activity and then on Next.
- We now need to tell Android Studio a little bit about our new activity by entering information in the dialog box that appears. Change the Activity Name field to GameActivity. Notice how the Layout Name field is automatically changed for us to activity_game and the Title field is automatically changed to GameActivity.
- Click on Finish. Android Studio has created two files for us and has also registered our new activity in a manifest file, so we don't need to concern ourselves with it.
- If you look at the tabs at the top of the editor window, you will see that GameActivity.java has been opened up, ready for us to edit, as shown in the following screenshot. Ensure that GameActivity.java is active in the editor window by clicking on the GameActivity.java tab shown previously.
- Here, we can see some code that is unnecessary for us; removing it will make our working environment simpler and cleaner. We will simply use the code from MainActivity.java as a template for GameActivity.java and then make some minor changes.
- Click on the MainActivity.java tab in the editor window. Highlight all of the code in the editor window using Ctrl + A on the keyboard. Now copy all of the code in the editor window using Ctrl + C on the keyboard.
- Now click on the GameActivity.java tab. Highlight all of the code in the editor window using Ctrl + A on the keyboard. Now paste the copied code and overwrite the currently highlighted code using Ctrl + V on the keyboard.
- Notice that there is an error in our code, denoted by the red underlining shown in the following screenshot. This is because we pasted the code referring to MainActivity into our file that is called GameActivity. Simply change the text MainActivity to GameActivity and the error will disappear.
- Take a moment to see if you can work out what other minor change is necessary, before I tell you. Remember that setContentView loads our UI design. What we need to do is change setContentView to load the new design (which we will build next) instead of the home screen design. Change setContentView(R.layout.activity_main); to setContentView(R.layout.activity_game);. Save your work and we are ready to move on.
- Note the Project Explorer, where Android Studio puts the two new files it created for us. I have highlighted two folders in the next screenshot.
In future, I will simply refer to them as our Java code folder and our layout files folder.

You might wonder why we didn't simply copy and paste the MainActivity.java file to begin with and save ourselves the process of creating a new activity. The reason is that Android Studio does things behind the scenes. Firstly, it makes the layout template for us. It also registers the new activity for use through a file we will see later, called AndroidManifest.xml. This is necessary for the new activity to be able to work in the first place. All things considered, the way we did it is probably the quickest.

The code at this stage is exactly the same as the code for the home menu screen. We state the package name and import some useful classes provided by Android:

package com.packtpub.mathgamechapter3a.mathgamechapter3a;

import android.app.Activity;
import android.os.Bundle;

We create a new activity, this time called GameActivity:

public class GameActivity extends Activity {

Then we override the onCreate method and use the setContentView method to set our UI design as the contents of the player's screen. Currently, however, this UI is empty:

super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);

We can now think about the layout of our actual game screen.

Laying out the game screen UI

As we know, our math game will ask questions and offer the player some multiple choices to choose an answer from. There are lots of extra features we could add, such as difficulty levels, high scores, and much more. But for now, let's just stick to asking a simple, predefined question and offering a choice of three predefined possible answers. Keeping the UI design to the bare minimum suggests a layout. Our target UI will look somewhat like this:

The layout is hopefully self-explanatory, but let's ensure that we are really clear: when we come to building this layout in Android Studio, the section of the mock-up that displays 2 x 2 is the question and will be made up of three text views (one for each number and one for the operator); the = sign is also a separate view. Finally, the three options for the answer are made up of Button layout elements. This time, as we are going to be controlling them using our Java code, there are a few extra things we need to do to them. So let's go through it step by step:

- Open the file that will hold our game UI in the editor window. Do this by double-clicking on activity_game.xml. This is located in our layout files folder, which can be found in the Project Explorer.
- Delete the Hello World TextView, as it is not required.
- Find the Large Text element on the palette. It can be found under the Widgets section. Drag three elements onto the UI design area and arrange them near the top of the design as shown in the next screenshot. It does not have to be exact; just ensure that they are in a row and not overlapping.
- Notice in the Component Tree window that each of the three TextViews has been assigned a name automatically by Android Studio: textView, textView2, and textView3. Android Studio refers to these element names as ids. This is an important concept that we will be making use of. To confirm this, select any one of the TextViews by clicking on its name (id), either in the component tree as shown in the preceding screenshot or directly on it in the UI designer shown previously. Now look at the Properties window and find the id property. You might need to scroll a little to do this. Notice that the value of the id property is textView.
It is this id that we will use to interact with our UI from our Java code. So we want to change all the IDs of our TextViews to something useful and easy to remember:

- If you look back at our design, you will see that the UI element with the textView id is going to hold the number for the first part of our math question. So change the id to textPartA. Notice the lowercase t in text, the uppercase P in Part, and the uppercase A. You can use any combination of cases, and you can actually name the IDs anything you like. But, just as with naming conventions for Java variables, sticking to conventions here will make things less error-prone as our program gets more complicated.
- Now select textView2 and change its id to textOperator.
- Select the element currently with the id textView3 and change it to textPartB. This TextView will hold the later part of our question.
- Now add another Large Text element from the palette. Place it after the row of the three TextViews that we have just been editing. This Large Text element will simply hold our equals sign, and there is no plan to ever change it. So we don't need to interact with it in our Java code, and we don't even need to concern ourselves with changing the ID or knowing what it is. If this situation changed, we could always come back at a later time and edit its ID.
- However, this new TextView currently displays Large Text, and we want it to display an equals sign. So, in the Properties window, find the text property and enter the value =.
- We have changed the text property, and you might also like to change the text property for textPartA, textPartB, and textOperator. This is not absolutely essential, because we will soon see how we can change it via our Java code; however, if we change the text property to something more appropriate, then our UI designer will look more like it will when the game runs on a real device. So change the text property of textPartA to 2, textPartB to 2, and textOperator to x. Your UI design and Component tree should now look like this:
- For the buttons that will contain our multiple choice answers, drag three buttons in a row, below the = sign. Line them up neatly like our target design.
- Now, just as we did for the TextViews, find the id property of each button and, from left to right, change the id properties to buttonChoice1, buttonChoice2, and buttonChoice3.
- Why not enter some arbitrary numbers for the text property of each button so that the designer more accurately reflects what our game will look like, just as we did for our other TextViews? Again, this is not absolutely essential, as our Java code will control the button appearance.
- We are now actually ready to move on, but you probably agree that the UI elements look a little lost. It would look better if the buttons and text were bigger. All we need to do is find the textSize property for each TextView and for each Button and enter a number with the sp syntax. If you want your design to look just like our target design from earlier, enter 70sp for each of the TextView textSize properties and 40sp for each of the Button textSize properties. When you run the game on your real device, you might want to come back and adjust the sizes up or down a bit.
- We have a bit more to do before we can actually try out our game. Save the project and then we can move on.

As before, we have built our UI. This time, however, we have given all the important parts of our UI a unique, useful, and easy-to-identify ID.
As we will see, we are now able to communicate with our UI through our Java code.

Summary

In this article, we learned how to set up our development environment by installing the JDK and Android Studio. In addition to this, we also learned how to create a new game activity and lay out a game screen UI for it.

Creating Games with Cocos2d-x is Easy and 100-percent Free

Packt
06 Feb 2015
5 min read
This article, written by Raydelto Hernandez, the author of Cocos2d-x Android Game Development, looks at the history behind Cocos2d-x and explains why it is a beneficial choice for game development, not least because it is free and open source.

The launch of the Apple App Store back in 2008 boosted the reach of indie game developers, who since then have been able to reach millions of users and compete with large companies, outperforming them in some situations. This reality led to the trend of creating reusable game engines, such as Cocos2D-iPhone, written natively in Objective-C by the Argentine developer Ricardo Quesada; it allowed many independent developers to reach the top of the download charts.

Picking an existing game engine is a smart choice for indies and large companies alike, since it allows them to focus on the game logic rather than rewriting core features over and over again; thus, there are many game engines out there with all kinds of licenses and characteristics. The most popular game engines for mobile systems right now are Unity, Marmalade, and Cocos2d-x; all three have the capabilities to create 2D and 3D games. Determining which one is the best in terms of ease of use and available tools may be arguable, but there is one objective fact that can easily be verified: among these three engines, Cocos2d-x is the only one that you can use for free, no matter how much money you make using it.

We highlighted in this article's title that Cocos2d-x is completely free. We emphasize this because the other two frameworks also allow some forms of free usage; nevertheless, both at some point require payment for a usage license. In order to understand why Cocos2d-x is still free and open source, we need to understand how this tool was born.

Ricardo, an enthusiastic Python programmer, often participated in challenges to create a game from scratch in only one week. Back in those days, Ricardo and his team rewrote the core engine for each game, until they came up with the idea of creating a framework encapsulating core game capabilities that could be used in any two-dimensional game, and making it open source so that contributions could be received worldwide. That is how Cocos2d came to be originally written for fun.

With the launch of the first iPhone in 2007, Ricardo led the development of the port of the Cocos2d Python framework to the iPhone platform using its native language, Objective-C. Cocos2D-iPhone quickly became popular among indie game developers, some of whom turned into "appillionaires", as Chris Stevens called the individuals and enterprises that made millions of dollars during the App Store bubble period. This phenomenon made game development companies look at this framework, created by hobbyists, as a tool for creating their products. Zynga was one of the first big companies to adopt Cocos2d as their framework for delivering their famous Farmville game to the iPhone in 2009; the company has traded on NASDAQ since 2011 and has more than 2,000 employees.

In July 2010, a C++ port of Cocos2d-iPhone called Cocos2d-x was written in China with the objective of taking the power of the framework to other platforms, such as the Android operating system, which by that time was gaining market share at a spectacular rate.
In 2011, this Cocos2d port was acquired by Chukong Technologies, the third largest mobile game development company in China, which later hired the original Cocos2d-iPhone author to join their team. Today, Cocos2d-x-based games dominate the top grossing charts of Google Play and the App Store, especially in Asia. Recognized companies and leading studios such as Konami, Zynga, BANDAI NAMCO, Wooga, Disney Mobile, and Square Enix are using Cocos2d-x in their games.

Currently, there are 400,000 developers working on adding new functionality and making this framework as stable as possible, including engineers from Google, ARM, Intel, BlackBerry, and Microsoft, who officially support the ports to their products, such as Windows Phone, Windows, and the Windows Metro interface, and they are planning to support Cocos2d-x for the Xbox during this year.

Cocos2d-x is a very straightforward engine that takes only a small learning curve to grasp. I teach game development courses at many universities using this framework, and during the first week the students are capable of creating a game with the complexity of the famous title Doodle Jump. This can be achieved easily because the framework provides us with all the components required for a game, such as physics, audio handling, collision detection, animation, networking, data storage, user input, map rendering, scene transitions, 3D rendering, particle system rendering, font handling, menu creation, form display, thread handling, and so on, abstracting away the low-level logic and allowing us to focus on the game logic.

In conclusion, if you want to learn how to develop games for mobile platforms, I strongly recommend learning and using the Cocos2d-x framework: it is easy to use, totally free, and open source, which means that you can understand it better by reading its source, modify it if needed, and be sure that you will never be forced to pay a license fee if your game becomes a hit. Another big advantage of this framework is its widely available documentation, including the Packt Publishing collection of Cocos2d-x game development books.

Summary

This article discussed the history of Cocos2d-x, explained how it is used worldwide for game development today, and highlighted its role as a free and open source platform for building games.

Getting Your Own Video and Feeds

Packt
06 Feb 2015
18 min read
"One server to satisfy them all" could have been the name of this article by David Lewin, the author of BeagleBone Media Center. We now have a great media server where we can share any media, but we would like to be more independent so that we can choose the functionalities the server can have. The goal of this article is to let you cross the bridge, where you are going to increase your knowledge by getting your hands dirty. After all, you want to build your own services, so why not create your own contents as well. (For more resources related to this topic, see here.) More specifically, here we will begin by building a webcam streaming service from scratch, and we will see how this can interact with what we have implemented previously in the server. We will also see how to set up a service to retrieve RSS feeds. We will discuss the services in the following sections: Installing and running MJPG-Streamer Detecting the hardware device and installing drivers and libraries for a webcam Configuring RSS feeds with Leed Detecting the hardware device and installing drivers and libraries for a webcam Even though today many webcams are provided with hardware encoding capabilities such as the Logitech HD Pro series, we will focus on those without this capability, as we want to have a low budget project. You will then learn how to reuse any webcam left somewhere in a box because it is not being used. At the end, you can then create a low cost video conference system as well. How to know your webcam As you plug in the webcam, the Linux kernel will detect it, so you can read every detail it's able to retrieve about the connected device. We are going to see two ways to retrieve the webcam we have plugged in: the easy one that is not complete and the harder one that is complete. "All magic comes with a price."                                                                                     –Rumpelstiltskin, Once Upon a Time Often, at a certain point in your installation, you have to choose between the easy or the hard way. Most of the time, powerful Linux commands or tools are not thought to be easy at first but after some experiments you'll discover that they really can make your life better. Let's start with the fast and easy way, which is lsusb : debian@arm:~$ lsusb Bus 001 Device 002: ID 046d:0802 Logitech, Inc. Webcam C200 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub This just confirms that the webcam is running well and is seen correctly from the USB. Most of the time we want more details, because a hardware installation is not exactly as described in books or documentations, so you might encounter slight differences. This is why the second solution comes in. Among some of the advantages, you are able to know each step that has taken place when the USB device was discovered by the board and Linux, such as in a hardware scenario: debian@arm:~$ dmesg A UVC device (here, a Logitech C200) has been used to obtain these messages Most probably, you won't exactly have the same outputs, but they should be close enough so that you can interpret them easily when they are referred to: New USB device found: This is the main message. In case of any issue, we will check its presence elsewhere. This message indicates that this is a hardware error and not a software or configuration error that you need to investigate. idVendor and idProduct: This message indicates that the device has been detected. 
This information is interesting because it lets you look up the manufacturer's details. Most recent webcams are compatible with the Linux USB Video Class (UVC) driver; you can check yours at http://www.ideasonboard.org/uvc/#devices. Among all the messages, you should also look for the one that says registered new interface driver, because failing to find it can be a clue that Linux detected the device but wasn't able to set up a driver for it. The new device will be detected as /dev/video0. Nevertheless, your webcam can appear under a different device name depending on your BeagleBone configuration, for example, if a video-capable cape is already plugged in.

Setting up your webcam

Now we know what is seen at the USB level. The next step is to use the crucial Video4Linux driver, which is like a Swiss army knife for anything related to video capture:

debian@arm:~$ sudo apt-get install v4l-utils

The primary use of this tool is to query what the webcam can provide and some of its capabilities:

debian@arm:~$ v4l2-ctl --all

There are four distinct sections that let you know how your webcam will be used according to the current settings:

- Driver info (1): This contains the name, vendor, and product IDs that we found in the system messages, the driver info (the kernel's version), and the capabilities (the device is able to provide video streaming).
- Video capture supported format(s) (2): This lists which resolution(s) can be used. As this example uses an old webcam, there is not much to choose from, but you can easily have a lot of choices with today's devices. The pixel format is all about how the data is encoded, but more details can be retrieved about format capabilities (see the next paragraph). The remaining information is relevant only if you want to know the precise details.
- Crop capabilities (3): This contains your current settings. You can define the video crop window that will be used. If needed, use the crop settings: --set-crop-output=top=<x>,left=<y>,width=<w>,height=<h>
- Video input (4): This contains the input number (here we have used 0, which is the one that we found previously), its current status, and the famous frames per second, which gives you a local rate. This is not what you will obtain when streaming through a server, as network latencies will lower this value.

You can grab the capabilities for each parameter. For instance, if you want to see all the video formats the webcam can provide, type this command:

debian@arm:~$ v4l2-ctl --list-formats

Here, we see that we can also use the MJPEG format provided directly by the cam. While this part is not mandatory, such a hardware tour is interesting because you know what you can do with your device. It is also a good habit to be able to retrieve diagnostics when the webcam shows some bad signs. If you would like to get more in-depth knowledge about your device, install the uvcdynctrl package, which lets you retrieve all the supported formats and frame rates.

Installing and running MJPG-Streamer

Now that we have checked the chain from the hardware level up to the driver, we can install the software that will make use of Video4Linux for video streaming. Here comes MJPG-Streamer. This application aims to provide you with a JPEG stream on the network, available for browsers and all video applications. Besides this, we are also interested in this solution because it is made for systems with a modest CPU, and we can start MJPG-Streamer as a service.
With this streamer, you can also use built-in hardware compression and even control webcam features such as pan, tilt, rotation, zoom, and so on.

Installing MJPG-Streamer

Before installing MJPG-Streamer, we will install all the necessary dependencies:

debian@arm:~$ sudo apt-get install subversion libjpeg8-dev imagemagick

Next, we will retrieve the code from the project:

debian@arm:~$ svn checkout http://svn.code.sf.net/p/mjpg-streamer/code/ mjpg-streamer-code

You can now build the executable from the sources you just downloaded by performing the following steps:

- Enter the local directory you just downloaded:
  debian@arm:~$ cd mjpg-streamer-code/mjpg-streamer
- Then run the build:
  debian@beaglebone:~/mjpg-streamer-code/mjpg-streamer$ make

When the compilation is complete, we end up with some new files produced by the build: the executable and some plugins. That's all that is needed, so the application is now ready. We can try it out. Not so much to do after all, don't you think?

Starting the application

This section aims at getting you started quickly with MJPG-Streamer. At the end, we'll see how to start it as a service on boot. Before getting started, the server requires some plugins to be copied into the dedicated lib directory:

debian@beaglebone:~/mjpg-streamer-code/mjpg-streamer$ sudo cp input_uvc.so output_http.so /usr/lib

The MJPG-Streamer application has to know the path where these files can be found, so we define the following environment variable:

debian@beaglebone:~/mjpg-streamer-code/mjpg-streamer$ export LD_LIBRARY_PATH=/usr/lib:$LD_LIBRARY_PATH

Enough preparation! Time to start streaming:

debian@beaglebone:~/mjpg-streamer-code/mjpg-streamer$ ./mjpg_streamer -i "input_uvc.so" -o "output_http.so -w www"

As the program starts, the input parameters that will be taken into consideration are displayed. You can now identify this information, as it has been explained previously:

- The detected device from V4L2
- The resolution that will be displayed, according to your settings
- Which port will be opened
- Some controls that depend on your camera capabilities (tilt, pan, and so on)

If you need to change the port used by MJPG-Streamer, add -p xxxx at the end of the command, as follows:

debian@beaglebone:~/mjpg-streamer-code/mjpg-streamer$ ./mjpg_streamer -i "input_uvc.so" -o "output_http.so -w www -p 1234"

Let's add some security

If you want to add some security, then you should set the credentials:

debian@beaglebone:~/mjpg-streamer-code/mjpg-streamer$ ./mjpg_streamer -o "output_http.so -w ./www -c debian:temppwd"

Credentials can always be stolen and used without your consent. The best way to ensure that your stream stays confidential all along would be to encrypt it. So, if you intend to use strong encryption for secured applications, the crypto cape is worth a look at http://datko.net/2013/10/03/howto_crypto_beaglebone_black/.

"I'm famous" – your first stream

That's it. The webcam is made accessible to everyone across the network from the BeagleBone; you can access the video from your browser by connecting to http://192.168.0.15:8080/. You will then see the default welcome screen, bravo! This is your first contact with the MJPG-Streamer server.

You might wonder how to find out which ports are already assigned, so that you can pick a free one; a quick check is shown below.
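One quick way to answer that question is to list the ports already in use on the BeagleBone before choosing one for MJPG-Streamer. This is a small sketch rather than part of the original procedure, and it assumes the net-tools or iproute2 utilities are present on your Debian image:

debian@arm:~$ sudo netstat -tlnp    # lists listening TCP ports and the owning processes
debian@arm:~$ sudo ss -tlnp         # same idea with the newer iproute2 tool

Any port that does not appear in this list (and is above 1024, so no special privileges are required) is a reasonable candidate for the -p option shown earlier.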
Using our stream across the network

Now that the webcam is available across the network, you have several options to handle this:

- You can use the direct flow available from the home page. On the left-hand side menu, just click on the stream tab.
- Using VLC, you can open the stream with the direct link available at http://192.168.0.15:8080/?action=stream. The VideoLAN menu tab is an M3U playlist link generator that you can click on; this will generate a playlist file you can open thereafter. In this case, VLC is efficient, as you can transcode the webcam stream to any format you need. Although it's not mandatory, this solution is the most efficient, as it frees the BeagleBone's CPU so that your server can focus on providing services.
- Using MediaDrop, we can integrate this new stream into our shiny MediaDrop server, knowing that MediaDrop doesn't currently support direct local streams. You can create a new post with the related URL link in the message body, as shown in the following screenshot.

Starting the streaming service automatically on boot

In the beginning, we saw that MJPG-Streamer needs only one command line to be started. We could put it in a bash script, but starting it as a service on boot is far better. For this, use a console text editor (nano or vim) and create a file dedicated to this service. Let's call it start_mjpgstreamer and add the following commands:

#! /bin/sh
# /etc/init.d/start_mjpgstreamer
export LD_LIBRARY_PATH="/home/debian/mjpg-streamer/mjpg-streamer-code/mjpg-streamer:$LD_LIBRARY_PATH"
EXEC_PATH="/home/debian/mjpg-streamer/mjpg-streamer-code/mjpg-streamer"
$EXEC_PATH/mjpg_streamer -i "input_uvc.so" -o "output_http.so -w $EXEC_PATH/www"

You can then use administrator rights to add it to the services:

debian@arm:~$ sudo /etc/init.d/start_mjpgstreamer start

On the next reboot, MJPG-Streamer will be started automatically.

Exploring new capabilities to install

For those about to explore, we salute you!

Plugins

Remember that at the beginning of this article, we began the demonstration with two plugins:

debian@beaglebone:~/mjpg-streamer-code/mjpg-streamer$ ./mjpg_streamer -i "input_uvc.so" -o "output_http.so -w www"

If we take a moment to look at these plugins, we will see that the first plugin is responsible for handling the webcam directly from the driver. Simply ask for help and options as follows:

debian@beaglebone:~/mjpg-streamer-code/mjpg-streamer$ ./mjpg_streamer --input "input_uvc.so --help"

The second plugin is about the web server settings:

- The path to the directory containing the final web server HTML pages. This implies that you can modify the existing pages with a little effort or create new ones based on those provided.
- Forcing a specific port to be used. As mentioned previously, each server needs a dedicated port; here you define which one this service will use.

You can discover many others by asking:

debian@arm:~$ ./mjpg_streamer --output "output_http.so --help"

Apart from input_uvc and output_http, you have other available plugins to play with. Take a look at the plugins directory.

Another tool for the webcam

The MJPG-Streamer project is dedicated to streaming over the network, but it is not the only one. For instance, do you have any specific needs, such as monitoring your house/son/cat/Jon Snow figurine? Buuuuzzz: if you answered yes to the last one, you just defined yourself as a geek. Well, in that case the Motion project is for you; just install the motion package and start it with the default motion.conf configuration.
You will then record videos and pictures of any moving object or person that is detected. Like MJPG-Streamer, Motion aims to be a low CPU consumer, so it works very well on the BeagleBone Black.

Configuring RSS feeds with Leed

Our server can handle videos, pictures, and music from any source, and it would be cool to have another tool to retrieve news from some RSS providers. This can be done with Leed, an RSS project designed for servers. The final result looks like the following screenshot.

This project has a "quick and easy" installation spirit, so you can give it a try without hassle. Leed (for Light Feed) allows you to access RSS feeds from any browser, so no RSS reader application is needed, and every user on your network can read them as well. You install it on the server and the feeds are updated automatically; the truth behind the scenes is that a cron task does this for you. You will be guided through setting up synchronization after the installation.

Creating the environment for Leed in three steps

We already have Apache, MySQL, and PHP installed, and we need a few other prerequisites to run Leed:

- Create a database for Leed
- Download the project code and set permissions
- Install Leed itself

Creating a database for Leed

You will begin by opening a MySQL session:

debian@arm:~$ mysql -u root -p

What we need here is a dedicated Leed user with its own database. This user will be set up using the following:

create user 'debian_leed'@'localhost' IDENTIFIED BY 'temppwd';
create database leed_db;
use leed_db;
grant create, insert, update, select, delete on leed_db.* to debian_leed@localhost;
exit

Downloading the project code and setting permissions

We prepared our server to have its environment ready for Leed, so after getting the latest version, we'll get it working with Apache by performing the following steps:

- From your home directory, retrieve the latest project code. It will also create a dedicated directory:
  debian@arm:~$ git clone https://github.com/ldleman/Leed.git
  debian@arm:~$ ls
  mediadrop mjpg-streamer Leed music
- Now, we need to put this new directory where the Apache server can find it:
  debian@arm:~$ sudo mv Leed /var/www/
- Change the permissions for the application:
  debian@arm:~$ chmod 777 /var/www/Leed/ -R

Installing Leed

When you go to the server address (http://192.168.0.15/leed/install.php), you'll get the installation screen. We now need to fill in the database details that we previously defined and add the Administrator credentials as well. Now save and quit. Don't worry about the explanations; we'll discuss these settings afterwards. It's important that all items in the prerequisites list on the right are green. Otherwise, a warning message will be displayed about wrong permission settings, as shown in the following screenshot. After the configuration, the installation is complete: Leed is now ready for you.

Setting up a cron job for feed updates

If you want automatic updates for your feeds, you'll need to define a synchronization task with cron:

- Modify the cron jobs:
  debian@arm:~$ sudo crontab -e
- Add the following line:
  0 * * * * wget -q -O /var/www/Leed/logsCron "http://192.168.0.15/Leed/action.php?action=synchronize"
- Save it and your feeds will be refreshed every hour.
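Before relying on cron, you may want to trigger a synchronization by hand once to confirm that the URL used in the crontab entry actually works. This is just a sanity check reusing the same command, not an extra step from the original setup:

debian@arm:~$ wget -q -O - "http://192.168.0.15/Leed/action.php?action=synchronize"
debian@arm:~$ sudo crontab -l    # confirm that the new entry was recorded

If the wget call returns without errors and fresh items appear in the web interface, the hourly job will behave the same way.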
Finally, a little cleanup: remove install.php for security reasons:

debian@arm:~$ rm /var/www/Leed/install.php

Using Leed to add your RSS feed

When you need to add some feeds, go to the Manage menu, and in Feed Options (on the right-hand side) select Preferences; you then just have to paste the RSS link and add it with the button:

You might find it useful to organize your feeds into groups, as we did for movies in MediaDrop. The Rename button will serve to achieve this goal. For example, here a TV Shows category has been created, so every feed related to this type will be organized on the main screen.

Some Leed preferences settings in a server environment

You will be asked to choose between two synchronisation modes: Complete and Graduated.

- Complete: This is to be used on a regular computer, as it updates all your feeds in a row, which is a CPU-consuming task.
- Graduated: This looks for the 10 oldest feeds and updates them if required.

You also have the possibility of allowing anonymous people to read your feeds. Setting Allow anonymous readers to Yes will let your guests access your feeds but not add any.

Extending Leed with plugins

If you want to extend Leed's capabilities, you can use the Leed Market (as the author calls it) from Feed options in the Manage menu. There, you'll be directed to the Leed Market space. Installation is just a matter of downloading the ZIP file with all the plugins:

debian@arm:~/Leed$ wget https://github.com/ldleman/Leed-market/archive/master.zip
debian@arm:~/Leed$ sudo unzip master.zip

Let's use the AdBlock plugin for this example:

1. Copy the content of the AdBlock plugin directory where Leed can see it:

debian@arm:~/Leed$ sudo cp -r Leed-market-master/adblock /var/www/Leed/plugins

2. Log in and set up the plugin by navigating to Manage | Available Plugins, and then activate adblock with Enable, as follows:

In this article, we covered:

- Some words about the hardware
- How to know your webcam
- Configuring RSS feeds with Leed

Summary

In this article, we had some good experiments with the hardware part of the server "from the ground up," ending by successfully setting up the webcam service on boot. We discovered hardware detection and a way to "talk" with our local webcam, and thus were able to see what happens when we plug a device into the BeagleBone. Through the topics, we also discovered video4linux to retrieve information about the device, and learned about configuring devices. Along the way, we encountered MJPG-Streamer. It's better to be on our own instead of being dependent on some GUI interfaces, where you always wonder where you need to click. In the end, our efforts have been rewarded, as we ended up with a web page we can use and modify according to our tastes. RSS news can also be provided by our server so that you can manage all your feeds in one place, read them anywhere, and even organize dedicated groups.

Plenty of concepts have been covered for hardware and software, so think of this article as a concrete example you can use and adapt to understand how Linux works. I hope you enjoyed this freedom of choice, as you drag ideas and drop them into your BeagleBone as services. We entered the DIY area, showing you ways to explore further. You can argue that we can choose the software but still use off-the-shelf commercial devices.

Resources for Article: Further resources on this subject: Using PVR with Raspbmc [Article] Pulse width modulator [Article] Making the Unit Very Mobile - Controlling Legged Movement [Article]

Lync 2013 Hybrid and Lync Online

Packt
06 Feb 2015
27 min read
In this article, by the authors, Fabrizio Volpe, Alessio Giombini, Lasse Nordvik Wedø, and António Vargas of the book, Lync Server Cookbook, we will cover the following recipes: Introducing Lync Online Administering with the Lync Admin Center Using Lync Online Remote PowerShell Using Lync Online cmdlets Introducing Lync in a hybrid scenario Planning and configuring a hybrid deployment Moving users to the cloud Moving users back on-premises Debugging Lync Online issues (For more resources related to this topic, see here.) Introducing Lync Online Lync Online is part of the Office 365 offer and provides online users with the same Instant Messaging (IM), presence, and conferencing features that we would expect from an on-premises deployment of Lync Server 2013. Enterprise Voice, however, is not available on Office 365 tenants (or at least, it is available only with limitations regarding both specific Office 365 plans and geographical locations). There is no doubt that forthcoming versions of Lync and Office 365 will add what is needed to also support all the Enterprise Voice features in the cloud. Right now, the best that we are able to achieve is to move workloads, homing a part of our Lync users (the ones with no telephony requirements) in Office 365, while the remaining Lync users are homed on-premises. These solutions might be interesting for several reasons, including the fact that we can avoid the costs of expanding our existing on-premises resources by moving a part of our Lync-enabled users to Office 365. The previously mentioned configuration, which involves different kinds of Lync tenants, is called a hybrid deployment of Lync, and we will see how to configure it and move our users from online to on-premises and vice versa. In this Article, every time we talk about Lync Online and Office 365, we will assume that we have already configured an Office tenant. Administering with the Lync Admin Center Lync Online provides the Lync Admin Center (LAC), a dedicated control panel, to manage Lync settings. To open it, access the Office 365 portal and select Service settings, Lync, and Manage settings in the Lync admin center, as shown in the following screenshot: LAC, if you compare it with the on-premises Lync Control Panel (or with the Lync Management Shell), offers few options. For example, it is not possible to create or delete users directly inside Lync. We will see some of the tasks we are able to perform in LAC, and then, we will move to the (more powerful) Remote PowerShell. There is an alternative path to open LAC. From the Office 365 portal, navigate to Users & Groups | Active Users. Select a user, after which you will see a Quick Steps area with an Edit Lync Properties link that will open the user-editable part of LAC. How to do it... LAC is divided into five areas: users, organization, dial-in conferencing, meeting invitation, and tools, as you can see in the following screenshot: The Users panel will show us the configuration of the Lync Online enabled users. 
It is possible to modify the settings with the Edit option (the small pencil icon on the right): I have tried to summarize all the available options (inside the general, external communications, and dial-in conferencing tabs) in the following screenshot: Some of the user's settings are worth a mention; in the General tab, we have the following:    The Record Conversations and meetings option enables the Start recording option in the Lync client    The Allow anonymous attendees to dial-out option controls whether the anonymous users that are dialing-in to a conference are required to call the conferencing service directly or are authorized for callback    The For compliance, turn off non-archived features option disables Lync features that are not recorded by In-Place Hold for Exchange When you place an Exchange 2013 mailbox on In-Place Hold or Litigation Hold, the Microsoft Lync 2013 content (instant messaging conversations and files shared in an online meeting) is archived in the mailbox. In the dial-in conferencing tab, we have the configuration required for dial-in conferencing. The provider's drop-down menu shows a list of third parties that are able to deliver this kind of feature. The Organization tab manages privacy for presence information, push services, and external access (the equivalent of the Lync federation on-premises). If you enable external access, we will have the option to turn on Skype federation, as we can see in the following screenshot: The Dial-In Conferencing option is dedicated to the configuration of the external providers. The Meeting Invitation option allows the user to customize the Lync Meeting invitation. The Tools options offer a collection of troubleshooting resources. See also For details about Exchange In-Place Hold, see the TechNet post In-Place Hold and Litigation Hold at http://technet.microsoft.com/en-us/library/ff637980(v=exchg.150).aspx. Using Lync Online Remote PowerShell The possibility to manage Lync using Remote PowerShell on a distant deployment has been available since Lync 2010. This feature has always required a direct connection from the management station to the Remote Lync, and a series of steps that is not always simple to set up. Lync Online supports Remote PowerShell using a dedicated (64-bit only) PowerShell module, the Lync Online Connector. It is used to manage online users, and it is interesting because there are many settings and automation options that are available only through PowerShell. Getting ready Lync Online Connector requires one of the following operating systems: Windows 7 (with Service Pack 1), Windows Server 2008 R2, Windows Server 2012, Windows Server 2012 R2, Windows 8, or Windows 8.1. At least PowerShell 3.0 is needed. To check it, we can use the $PSVersionTable variable. The result will be like the one in the following screenshot (taken on Windows 8.1, which uses PowerShell 4.0): How to do it... Download Windows PowerShell Module for Lync Online from the Microsoft site at http://www.microsoft.com/en-us/download/details.aspx?id=39366 and install it. It is useful to store our Office 365 credentials in an object (it is possible to launch the cmdlets at step 3 anyway, and we will be required with the Office 365 administrator credentials, but using this method, we will have to insert the authentication information again every time it is required). We can use the $credential = Get-Credential cmdlet in a PowerShell session. 
We will be prompted for our username and password for Lync Online, as shown in the following screenshot: To use the Online Connector, open a PowerShell session and use the New-CsOnlineSession cmdlet. One of the ways to start a remote PowerShell session is $session = New-CsOnlineSession -Credential $credential. Now, we need to import the session that we have created with Lync Online inside PowerShell, with the Import-PSSession $session cmdlet. A temporary Windows PowerShell module will be created, which contains all the Lync Online cmdlets. The name of the temporary module will be similar to the one we can see in the following screenshot: Now, we will have the cmdlets of the Lync Online module loaded in memory, in addition to any command that we already have available in PowerShell. How it works... The feature is based on a PowerShell module, the LyncOnlineConnector, shown in the following screenshot: It contains only two cmdlets, the Set-WinRMNetworkDelayMS and New-CsOnlineSession cmdlets. The latter will load the required cmdlets in memory. As we have seen in the previous steps, the Online Connector adds the Lync Online PowerShell cmdlets to the ones already available. This is something we will use when talking about hybrid deployments, where we will start from the Lync Management Shell and then import the module for Lync Online. It is a good habit to verify (and close) your previous remote sessions. This can be done by selecting a specific session (using Get-PSSession and then pointing to a specific session with the Remove-PSSession statement) or closing all the existing ones with the Get-PSSession | Remove-PSSession cmdlet. In the previous versions of the module, Microsoft Online Services Sign-In Assistant was required. This prerequisite was removed from the latest version. There's more... There are some checks that we are able to perform when using the PowerShell module for Lync Online. By launching the New-CsOnlineSession cmdlet with the –verbose switch, we will see all the messages related to the opening of the session. The result should be similar to the one shown in the following screenshot: Another verification comes from the Get-Command -Module tmp_gffrkflr.ufz command, where the module name (in this example, tmp_gffrkflr.ufz) is the temporary module we saw during the Import-PSSession step. The output of the command will show all the Lync Online cmdlets that we have loaded in memory. The Import-PSSession cmdlet imports all commands except the ones that have the same name of a cmdlet that already exists in the current PowerShell session. To overwrite the existing cmdlets, we can use the -AllowClobber parameter. See also During the introduction of this section, we also discussed the possibility to administer on-premises, remote Lync Server 2013 deployment with a remote PowerShell session. John Weber has written a great post about it in his blog Lync 2013 Remote Admin with PowerShell at http://tsoorad.blogspot.it/2013/10/lync-2013-remote-admin-with-powershell.html, which is helpful if you want to use the previously mentioned feature. Using Lync Online cmdlets In the previous recipe, we outlined the steps required to establish a remote PowerShell session with Lync Online. 
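To recap the whole sequence in one place, a minimal session could look like the following sketch. The account name admin@contoso.onmicrosoft.com is only a placeholder, and the cmdlets are the ones introduced in this recipe; treat it as an illustration rather than a definitive script.

# Connect, import the Lync Online cmdlets, run a command, and clean up.
$credential = Get-Credential "admin@contoso.onmicrosoft.com"   # placeholder admin account
$session = New-CsOnlineSession -Credential $credential
Import-PSSession $session -AllowClobber

# Any of the imported cmdlets can now be used, for example:
Get-CsTenant

# Close the remote session when the work is done
Get-PSSession | Remove-PSSession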
We have less than 50 cmdlets, as shown in the result of the Get-Command -Module command in the following screenshot: Some of them are specific for Lync Online, such as the following: Get-CsAudioConferencingProvider Get-CsOnlineUser Get-CsTenant Get-CsTenantFederationConfiguration Get-CsTenantHybridConfiguration Get-CsTenantLicensingConfiguration Get-CsTenantPublicProvider New-CsEdgeAllowAllKnownDomains New-CsEdgeAllowList New-CsEdgeDomainPattern Set-CsTenantFederationConfiguration Set-CsTenantHybridConfiguration Set-CsTenantPublicProvider Update-CsTenantMeetingUrl All the remaining cmdlets can be used either with Lync Online or with the on-premises version of Lync Server 2013. We will see the use of some of the previously mentioned cmdlets. How to do it... The Get-CsTenant cmdlet will list Lync Online tenants configured for use in our organization. The output of the command includes information such as the preferred language, registrar pool, domains, and assigned plan. The Get-CsTenantHybridConfiguration cmdlet gathers information about the hybrid configuration of Lync. Management of the federation capability for Lync Online (the feature that enables Instant Messaging and Presence information exchange with users of other domains) is based on the allowed domain and blocked domain lists, as we can see in the organization and external communications screen of LAC, shown in the following screenshot: There are similar ways to manage federation from the Lync Online PowerShell, but it required to put together different statements as follows:     We can use an accept all domains excluding the ones in the exceptions list approach. To do this, we have put the New-CsEdgeAllowAllKnownDomains cmdlet inside a variable. Then, we can use the Set-CsTenantFederationConfiguration cmdlet to allow all the domains (except the ones in the block list) for one of our domains on a tenant. We can use the example on TechNet (http://technet.microsoft.com/en-us/library/jj994088.aspx) and integrate it with Get-CsTenant.     If we prefer, we can use a block all domains but permit the ones in the allow list approach. It is required to define a domain name (pattern) for every domain to allow the New-CsEdgeDomainPattern cmdlet, and each one of them will be saved in a variable. Then, the New-CsEdgeAllowList cmdlet will create a list of allowed domains from the variables. Finally, the Set-CsTenantFederationConfiguration cmdlet will be used. The domain we will work on will be (again) cc3b6a4e-3b6b-4ad4-90be-6faa45d05642. The example on Technet (http://technet.microsoft.com/en-us/library/jj994023.aspx) will be used: $x = New-CsEdgeDomainPattern -Domain "contoso.com" $y = New-CsEdgeDomainPattern -Domain "fabrikam.com" $newAllowList = New-CsEdgeAllowList -AllowedDomain $x,$y Set-CsTenantFederationConfiguration -Tenant " cc3b6a4e-3b6b-4ad4-90be-6faa45d05642" -AllowedDomains $newAllowList The Get-CsOnlineUser cmdlet provides information about users enabled on Office 365. The result will show both users synced with Active Directory and users homed in the cloud. The command supports filters to limit the output; for example, the Get-CsOnlineUser -identity fab will gather information about the user that has alias = fab. This is an account synced from the on-premises Directory Services, so the value of the DirSyncEnabled parameter will be True. See also All the cmdlets of the Remote PowerShell for Lync Online are listed in the TechNet post Lync Online cmdlets at http://technet.microsoft.com/en-us/library/jj994021.aspx. 
This is the main source of details on the single statement. Introducing Lync in a hybrid scenario In a Lync hybrid deployment, we have the following: User accounts and related information homed in the on-premises Directory Services and replicated to Office 365. A part of our Lync users that consume on-premises resources and a part of them that use online (Office 365 / Lync Online) resources. The same (public) domain name used both online and on-premises (Lync-split DNS). Other Office 365 services and integration with other applications available to all our users, irrespective of where their Lync is provisioned. One way to define Lync hybrid configuration is by using an on-premises Lync deployment federated with an Office 365 / Lync Online tenant subscription. While it is not a perfect explanation, it gives us an idea of the scenario we are talking about. Not all the features of Lync Server 2013 (especially the ones related to Enterprise Voice) are available to Lync Online users. The previously mentioned motivations, along with others (due to company policies, compliance requirements, and so on), might recommend a hybrid deployment of Lync as the best available solution. What we have to clarify now is how to make those users on different deployments talk to each other, see each other's presence status, and so on. What we will see in this section is a high-level overview of the required steps. The Planning and configuring a hybrid deployment recipe will provide more details about the individual steps. The list of steps here is the one required to configure a hybrid deployment, starting from Lync on-premises. In the following sections, we will also see the opposite scenario (with our initial deployment in the cloud). How to do it... It is required to have an available Office 365 tenant configuration. Our subscription has to include Lync Online. We have to configure an Active Directory Federation Services (AD FS) server in our domain and make it available to the Internet using a public FQDN and an SSL certificate released from a third-party certification authority. Office 365 must be enabled to synchronize with our company's Directory Services, using Active Directory Sync. Our Office 365 tenant must be federated. The last step is to configure Lync for a hybrid deployment. There's more... One of the requirements for a hybrid distribution of Lync is an on-premises deployment of Lync Server 2013 or Lync Server 2010. For Lync Server 2010, it is required to have the latest available updates installed, both on the Front Ends and on the Edge servers. It is also required to have the Lync Server 2013 administrative tools installed on a separate server. More details about supported configuration are available on the TechNet post Planning for Lync Server 2013 hybrid deployments at http://technet.microsoft.com/en-us/library/jj205403.aspx. DNS SRV records for hybrid deployments, _sipfederationtls._tcp.<domain> and _sip._tls.<domain>, should point to the on-premises deployment. The lyncdiscover. <domain> record will point to the FQDN of the on-premises reverse proxy server. The _sip._tls. <domain> SRV record will resolve to the public IP of the Access Edge service of Lync on-premises. Depending on the kind of service we are using for Lync, Exchange, and SharePoint, only a part of the features related to the integration with the additional services might be available. For example, skills search is available only if we are using Lync and SharePoint on-premises. 
The following TechNet post Supported Lync Server 2013 hybrid configurations at http://technet.microsoft.com/en-us/library/jj945633.aspx offers a matrix of features / service deployment combinations. See also Interesting information about Lync Hybrid configuration is presented in sessions available on Channel9 and coming from the Lync Conference 2014 (Lync Online Hybrid Deep Dive at http://channel9.msdn.com/Events/Lync-Conference/Lync-Conference-2014/ONLI302) and from TechEd North America 2014 (Microsoft Lync Online Hybrid Deep Dive at http://channel9.msdn.com/Events/TechEd/NorthAmerica/2014/OFC-B341#fbid=). Planning and configuring a hybrid deployment The planning phase for a hybrid deployment starts from a simple consideration: do we have an on-premises deployment of Lync Server? If the previously mentioned scenario is true, do we want to move users to the cloud or vice versa? Although the first situation is by far the most common, we have to also consider the case in which we have our first deployment in the cloud. How to do it... This step is all that is required for the scenario that starts from Lync Online. We have to completely deploy our Lync on-premises. Establish a remote PowerShell session with Office 365. Use the shared SIP address cmdlet Set-CsTenantFederationConfiguration -SharedSipAddressSpace $True to enable Office 365 to use a Shared Session Initiation Protocol (SIP) address space with our on-premises deployment. To verify this, we can use the Get-CsTenantFederationConfiguration command. The SharedSipAddressSpace value should be set to True. All the following steps are for the scenario that starts from the on-premises Lync deployment. After we have subscribed with a tenant, the first step is to add the public domain we use for our Lync users to Office 365 (so that we can split it on the two deployments). To access the Office 365 portal, select Domains. The next step is Specify a domain name and confirm ownership. We will be required to type a domain name. If our domain is hosted on some specific providers (such as GoDaddy), the verification process can be automated, or we have to proceed manually. The process requires to add one DNS record (TXT or MX), like the ones shown in the following screenshot: If we need to check our Office 365 and on-premises deployments before continuing with the hybrid deployment, we can use the Setup Assistant for Office 365. The tool is available inside the Office 365 portal, but we have to launch it from a domain-joined computer (the login must be performed with the domain administrative credentials). In the Setup menu, we have a Quick Start and an Extend Your Setup option (we have to select the second one). The process can continue installing an app or without software installation, as shown in the following screenshot: The app (which makes the assessment of the existing deployment easier) is installed by selecting Next in the previous screen (it requires at least Windows 7 with Service Pack 1, .NET Framework 3.5, and PowerShell 2.0). Synchronization with the on-premises Active Directory is required. This last step federates Lync Server 2013 with Lync Online to allow communication between our users. The first cmdlet to use is Set-CSAccessEdgeConfiguration -AllowOutsideUsers 1 -AllowFederatedUsers 1 -UseDnsSrvRouting -EnablePartnerDiscovery 1. Note that the -EnablePartnerDiscovery parameter is required. Setting it to 1 enables automatic discovery of federated partner domains. It is possible to set it to 0. 
The second required cmdlet is:

New-CsHostingProvider -Identity LyncOnline -ProxyFqdn "sipfed.online.lync.com" -Enabled $true -EnabledSharedAddressSpace $true -HostsOCSUsers $true -VerificationLevel UseSourceVerification -IsLocal $false -AutodiscoverUrl https://webdir.online.lync.com/Autodiscover/AutodiscoverService.svc/root

The result of the commands is shown in the following screenshot:

If Lync Online is already defined, we have to use the Set-CsHostingProvider cmdlet, or we can remove it (Remove-CsHostingProvider -Identity LyncOnline) and then create it using the previously mentioned cmdlet.

There's more...

In the Lync hybrid scenario, users created in the on-premises directory are replicated to the cloud, while users generated in the cloud will not be replicated on-premises. Lync Online users are managed using the Office 365 portal, while the users on-premises are managed using the usual tools (Lync Control Panel and Lync Management Shell).

Moving users to the cloud

By moving users from Lync on-premises to the cloud, we will lose some of the parameters. The operation requires the Lync administrative tools and the PowerShell module for Lync Online to be installed on the same computer. If we install the module for Lync Online before the administrative tools for Lync 2013 Server, the OCSCore.msi file overwrites the LyncOnlineConnector.ps1 file, and New-CsOnlineSession will require a -TargetServer parameter. In this situation, we have to reinstall the Lync Online module (see the following post on the Microsoft support site at http://support.microsoft.com/kb/2955287).

Getting ready

Remember that to move a user to Lync Online, they must be enabled for both Lync Server on-premises and Lync Online (so we have to assign the user a license for Lync Online by using the Office 365 portal). Users with no assigned license will show the error Move-CsUser : HostedMigration fault: Error=(507), Description=(User must has an assigned license to use Lync Online.). For more details, refer to the Microsoft support site at http://support.microsoft.com/kb/2829501.

How to do it...

1. Open a new Lync Management Shell session and launch the remote session on Office 365 with the cmdlets' sequence we saw earlier. We have to add the -AllowClobber parameter so that the Lync Online module's cmdlets are able to overwrite the corresponding Lync Management Shell cmdlets:

$credential = Get-Credential
$session = New-CsOnlineSession -Credential $credential
Import-PSSession $session -AllowClobber

2. Open the Lync Admin Center (as we have seen in the dedicated section) by going to Service settings | Lync | Manage settings in the Lync Admin Center, and copy the first part of the URL, for example, https://admin0e.online.lync.com. Add the string /HostedMigration/hostedmigrationservice.svc to the previous URL (in our example, the result will be https://admin0e.online.lync.com/HostedMigration/hostedmigrationservice.svc).

3. The following cmdlet will move users from Lync on-premises to Lync Online. The required parameters are the identity of the Lync user and the URL that we prepared in step 2.
The user identity is fabrizio.volpe@absoluteuc.biz: Move-CsUser -Identity fabrizio.volpe@absoluteuc.biz –Target sipfed.online.lync.com -Credential $creds -HostedMigrationOverrideUrl https://admin0e.online.lync.com/HostedMigration/hostedmigrationservice.sVc Usually, we are required to insert (again) the Office 365 administrative credentials, after which we will receive a warning about the fact that we are moving our user to a different version of the service, like the one in the following screenshot: See the There's more... section of this recipe for details about user information that is migrated to Lync Online. We are able to quickly verify whether the user has moved to Lync Online by using the Get-CsUser | fl DisplayName,HostingProvider,RegistrarPool,SipAddress command. On-premises HostingProvider is equal to SRV: and RegistrarPool is madhatter.wonderland.lab (the name of the internal Lync Front End). Lync Online values are HostingProvider : sipfed.online.lync.com, and leave RegistrarPool empty, as shown in the following screenshot (the user Fabrizio is homed on-premises, while the user Fabrizio volpe is homed on the cloud): There's more... If we plan to move more than one user, we have to add a selection and pipe it before the cmdlet we have already used, removing the –identity parameter. For example, to move all users from an Organizational Unit (OU), (for example, the LyncUsers in the Wonderland.Lab domain) to Lync Online, we can use Get-CsUser -OU "OU=LyncUsers,DC=wonderland,DC=lab"| Move-CsUser -Target sipfed.online.lync.com -Credential $creds -HostedMigrationOverrideUrl https://admin0e.online.lync.com/HostedMigration/hostedmigrationservice.sVc. We are also able to move users based on a parameter to match using the Get-CsUser –Filter cmdlet. As we mentioned earlier, not all the user information is migrated to Lync Online. Migration contact list, groups, and access control lists are migrated, while meetings, contents, and schedules are lost. We can use the Lync Meeting Update Tool to update the meeting links (which have changed when our user's home server has changed) and automatically send updated meeting invitations to participants. There is a 64-bit version (http://www.microsoft.com/en-us/download/details.aspx?id=41656) and a 32-bit version (http://www.microsoft.com/en-us/download/details.aspx?id=41657) of the previously mentioned tool. Moving users back on-premises It is possible to move back users that have been moved from the on-premises Lync deployment to the cloud, and it is also possible to move on-premises users that have been defined and enabled directly in Office 365. In the latter scenario, it is important to create the user also in the on-premises domain (Directory Service). How to do it… The Lync Online user must be created in the Active Directory (for example, I will define the BornOnCloud user that already exists in Office 365). The user must be enabled in the on-premises Lync deployment, for example, using the Lync Management Shell with the following cmdlet: Enable-CsUser -Identity "BornOnCloud" -SipAddress "SIP:BornOnCloud@absoluteuc.biz" -HostingProviderProxyFqdn "sipfed.online.lync.com" Sync the Directory Services. 
Now, we have to save our Office 365 administrative credentials in a $cred = Get-Credential variable and then move the user from Lync Online to the on-premises Front End using the Lync Management Shell (the -HostedMigrationOverrideURL parameter has the same value that we used in the previous section): Move-CsUser -Identity BornOnCloud@absoluteuc.biz -Target madhatter.wonderland.lab -Credential $cred -HostedMigrationOverrideURL https://admin0e.online.lync.com/HostedMigration/hostedmigrationservice.svc The Get-CsUser | fl DisplayName,HostingProvider,RegistrarPool,SipAddress cmdlet is used to verify whether the user has moved as expected. See also Guy Bachar has published an interesting post on his blog Moving Users back to Lync on-premises from Lync Online (http://guybachar.wordpress.com/2014/03/31/moving-users-back-to-lync-on-premises-from-lync-online/), where he shows how he solved some errors related to the user motion by modifying the HostedMigrationOverrideUrl parameter. Debugging Lync Online issues Getting ready When moving from an on-premises solution to a cloud tenant, the first aspect we have to accept is that we will not have the same level of control on the deployment we had before. The tools we will list are helpful in resolving issues related to Lync Online, but the level of understanding on an issue they give to a system administrator is not the same we have with tools such as Snooper or OCSLogger. Knowing this, the more users we will move to the cloud, the more we will have to use the online instruments. How to do it… The Set up Lync Online external communications site on Microsoft Support (http://support.microsoft.com/common/survey.aspx?scid=sw;en;3592&showpage=1) is a guided walk-through that helps in setting up communication between our Lync Online users and external domains. The tool provides guidelines to assist in the setup of Lync Online for small to enterprise businesses. As you can see in the following screenshot, every single task is well explained: The Remote Connectivity Analyzer (RCA) (https://testconnectivity.microsoft.com/) is an outstanding tool to troubleshoot both Lync on-premises and Lync Online. The web page includes tests to analyze common errors and misconfigurations related to Microsoft services such as Exchange, Lync, and Office 365. To test different scenarios, it is necessary to use various network protocols and ports. If we are working on a firewall-protected network, using the RCA, we are also able to test services that are not directly available to us. For Lync Online, there are some tests that are especially interesting; in the Office 365 tab, the Office 365 General Tests section includes the Office 365 Lync Domain Name Server (DNS) Connectivity Test and the Office 365 Single Sign-On Test, as shown in the following screenshot: The Single Sign-On test is really useful in a scenario. The test requires our domain username and password, both synced with the on-premises Directory Services. The steps include searching the FQDN of our AD FS server on an Internet DNS, verifying the certificate and connectivity, and then validating the token that contains the credentials. The Client tab offers to download the Microsoft Connectivity Analyzer Tool and the Microsoft Lync Connectivity Analyzer Tool, which we will see in the following two dedicated steps: The Microsoft Connectivity Analyzer Tool makes many of the tests we see in the RCA available on our desktop. 
The list of prerequisites is provided in the article Microsoft Connectivity Analyzer Tool (http://technet.microsoft.com/library/jj851141(v=exchg.80).aspx), and includes Windows Vista/Windows 2008 or later versions of the operating system, .NET Framework 4.5, and an Internet browser, such as Internet Explorer, Chrome, or Firefox. For the Lync tests, a 64-bit operating system is mandatory, and the UCMA runtime 4.0 is also required (it is part of Lync Server 2013 setup, and is also available for download at http://www.microsoft.com/en-us/download/details.aspx?id=34992). The tools propose ways to solve different issues, and then, they run the same tests available on the RCA site. We are able to save the results in an HTML file. The Microsoft Lync Connectivity Analyzer Tool is dedicated to troubleshooting the clients for mobile devices (the Lync Windows Store app and Lync apps). It tests all the required configurations, including autodiscover and webticket services. The 32-bit version is available at http://www.microsoft.com/en-us/download/details.aspx?id=36536, while the 64-bit version can be downloaded from http://www.microsoft.com/en-us/download/details.aspx?id=36535. .NET Framework 4.5 is required. The tool itself requires a few configuration parameters; we have to insert the user information that we usually add in the Lync app, and we have to use a couple of drop-down menus to describe the scenario we are testing (on-premises or Internet, and the kind of client we are going to test). The Show drop-down menu enables us to look not only at a summary of the test results but also at the detailed information. The detailed view includes all the information and requests sent and received during the test, with the FQDN included in the answer ticket from our services, and so on, as shown in the following screenshot: The Troubleshooting Lync Online sign-in post is a support page, available in two different versions (admins and users), and is a walk-through to help admins (or users) to troubleshoot login issues. The admin version is available at http://support.microsoft.com/common/survey.aspx?scid=sw;en;3695&showpage=1, while the user version is available at http://support.microsoft.com/common/survey.aspx?scid=sw;en;3719&showpage=1. Based on our answers to the different scenario questions, the site will propose to information or solution steps. The following screenshot is part of the resolution for the log-I issues of a company that has an enterprise subscription with a custom domain: The Office 365 portal includes some information to help us monitor our Lync subscription. In the Service Health menu, navigate to Service Health; we have a list of all the incidents and service issues of the past days. In the Reports menu, we have statistics about our Office 365 consumption, including Lync. In the following screenshot, we can see the previously mentioned pages: There's more... One interesting aspect of the Microsoft Lync Connectivity Analyzer Tool that we have seen is that it enables testing for on-premises or Office 365 accounts (both testing from inside our network and from the Internet). The previously mentioned capability makes it a great tool to troubleshoot the configuration for Lync on the mobile devices that we have deployed in our internal network. This setup is usually complex, including hair-pinning and split DNS, so the diagnostic is important to quickly find misconfigured services. 
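Alongside the online tools above, a few quick client-side checks can be run straight from PowerShell. The following is only a sketch: Resolve-DnsName and Test-NetConnection are standard Windows cmdlets (available on Windows 8.1/Server 2012 R2 and later) rather than tools covered in this article, the record names are the ones discussed in the hybrid recipes, and absoluteuc.biz is the example SIP domain used earlier, to be replaced with your own.

# A minimal sketch of client-side DNS and port checks for a Lync domain.
$domain = "absoluteuc.biz"   # example SIP domain from this article; replace with yours

# The records discussed for hybrid deployments
Resolve-DnsName -Name "lyncdiscover.$domain"
Resolve-DnsName -Name "_sip._tls.$domain" -Type SRV
Resolve-DnsName -Name "_sipfederationtls._tcp.$domain" -Type SRV

# Confirm that HTTPS is reachable on the autodiscover name
Test-NetConnection -ComputerName "lyncdiscover.$domain" -Port 443

If the SRV queries return nothing, the public DNS (or split DNS) records are usually the first thing to review before moving on to the heavier diagnostic tools.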
See also

The Troubleshooting Lync Sign-in Errors (Administrators) page on Office.com at http://office.microsoft.com/en-001/communicator-help/troubleshooting-lync-sign-in-errors-administrators-HA102759022.aspx contains a list of messages related to sign-in errors with a suggested solution or a link to additional external resources.

Summary

In this article, we have learned about managing Lync 2013 and Lync Online and using Lync Online Remote PowerShell and Lync Online cmdlets.

Resources for Article: Further resources on this subject: Adding Dialogs [article] Innovation of Communication and Information Technologies [article] Choosing Lync 2013 Clients [article]

Upgrading the interface

Packt
06 Feb 2015
4 min read
In this article by Marco Schwartz and Oliver Manickum, authors of the book Programming Arduino with LabVIEW, we will see how to design an interface using LabVIEW. (For more resources related to this topic, see here.)

At this stage, we know that we have our two sensors working and that they were interfaced correctly with the LabVIEW interface. However, we can do better; for now, we simply have a text display of the measurements, which is not elegant to read. Also, the light-level measurement goes from 0 to 5, which doesn't mean anything to somebody who looks at the interface for the first time.

Therefore, we will modify the interface slightly. We will add a temperature gauge to display the data coming from the temperature sensor, and we will modify the output of the reading from the photocell to display the measurement from 0 (no light) to 100 percent (maximum brightness).

We first need to place the different display elements. To do this, perform the following steps:

1. Start with Front Panel. You can use a temperature gauge for the temperature and a simple slider indicator for Light Level. You will find both in the Indicators submenu of LabVIEW.
2. After that, simply place them on the right-hand side of the interface and delete the other indicators we used earlier. Also, name the new indicators accordingly so that we know which element we have to connect them to later.
3. Then, it is time to go back to Block Diagram to connect the new elements we just added in Front Panel.

For the temperature element, it is easy: you can simply connect the temperature gauge to the TMP36 output pin. For the light level, we will make slightly more complicated changes. We will divide the measured value coming from the Analog Read element by 5, thus obtaining an output value between 0 and 1. Then, we will multiply this value by 100 to end up with a value going from 0 to 100 percent of the ambient light level. To do so, perform the following steps:

1. The first step is to place two elements corresponding to the two mathematical operations we want to do: a divide operator and a multiply operator. You can find both of them in the Functions panel of LabVIEW. Simply place them close to the Analog Read element in your program.
2. After that, right-click on one of the inputs of each operator element, and go to Create | Constant to create a constant input for each block. Add a value of 5 for the division block, and add a value of 100 for the multiply block.
3. Finally, connect the output of the Analog Read element to the input of the division block, the output of this block to the input of the multiply block, and the output of the multiply block to the input of the Light Level indicator.

You can now go back to Front Panel to see the new interface in action. You can run the program again by clicking on the little arrow on the toolbar. You should immediately see that Temperature is now indicated by the gauge on the right and that Light Level changes immediately on the slider, depending on how much you cover the sensor with your hand.

Summary

In this article, we connected a temperature sensor and a light-level sensor to Arduino and built a simple LabVIEW program to read data from these sensors. Then, we built a nice graphical interface to visualize the data coming from these sensors.

There are many ways you can build other projects based on what you learned in this article. You can, for example, connect more temperature and/or light-level sensors to the Arduino board and display these measurements in the interface. You can also connect other kinds of sensors that are supported by LabVIEW, such as other analog sensors. For example, you can add a barometric pressure sensor or a humidity sensor to the project to build an even more complete weather-measurement station. Another interesting extension of this article would be to use the storage and plotting capabilities of LabVIEW to dynamically plot the history of the measured data inside the LabVIEW interface.

Resources for Article: Further resources on this subject: The Arduino Mobile Robot [article] Using the Leap Motion Controller with Arduino [article] Avoiding Obstacles Using Sensors [article]

NSB and Security

Packt
06 Feb 2015
14 min read
This article by Rich Helton, the author of Learning NServiceBus Sagas, delves into the details of NSB and its security. In this article, we will cover the following: Introducing web security Cloud vendors Using .NET 4 Adding NServiceBus Benefits of NSB (For more resources related to this topic, see here.) Introducing web security According to the Top 10 list of 2013 by the Open Web Application Security Project (OWASP), found at https://www.owasp.org/index.php/Top10#OWASP_Top_10_for_2013, injection flaws still remain at the top among the ways to penetrate a web site. This is shown in the following screenshot: An injection flaw is a means of being able to access information or the site by injecting data into the input fields. This is normally used to bypass proper authentication and authorization. Normally, this is the data that the website has not seen in the testing efforts or considered during development. For references, I will consider some slides found at http://www.slideshare.net/rhelton_1/cweb-sec-oct27-2010-final. An instance of an injection flaw is to put SQL commands in form fields and even URL fields to try to get SQL errors and returns with further information. If the error is not generic, and a SQL exception occurs, it will sometimes return with table names. It may deny authorization for sa under the password table in SQL Server 2008. Knowing this gives a person knowledge of the SQL Server version, the sa user is being used, and the existence of a password table. There are many tools and websites for people on the Internet to practice their web security testing skills, rather than them literally being in IT security as a professional or amateur. Many of these websites are well-known and posted at places such as https://www.owasp.org/index.php/Phoenix/Tools. General disclaimer I do not endorse or encourage others to practice on websites without written permission from the website owner. Some of the live sites are as follows, and most are used to test web scanners: http://zero.webappsecurity.com/: This is developed by SPI Dynamics (now HP Security) for Web Inspect. It is an ASP site. http://crackme.cenzic.com/Kelev/view/home.php: This PHP site is from Cenzic. http://demo.testfire.net/: This is developed by WatchFire (now IBM Rational AppScan). It is an ASP site. http://testaspnet.vulnweb.com/: This is developed by Acunetix. It is a PHP site. http://webscantest.com/: This is developed by NT OBJECTives NTOSpider. It is a PHP site. There are many more sites and tools, and one would have to research them themselves. There are tools that will only look for SQL Injection. Hacking professionals who are very gifted and spend their days looking for only SQL injection would find these useful. We will start with SQL injection, as it is one of the most popular ways to enter a website. But before we start an analysis report on a website hack, we will document the website. Our target site will be http://zero.webappsecurity.com/. 
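Before any active scanning, some of this documentation can be gathered passively from a console. The sketch below is only an illustration and is not part of the toolset discussed in this article: Resolve-DnsName and Invoke-WebRequest are standard PowerShell cmdlets (PowerShell 3.0+ on Windows 8/Server 2012 or later for the DNS cmdlet), the target is the practice site mentioned above, and, as with everything else here, it should only ever be pointed at sites you have written permission to test.

# A hedged example of basic, passive information gathering in PowerShell.
$target = "zero.webappsecurity.com"

# Resolve the public DNS records for the site
Resolve-DnsName -Name $target

# Fetch the home page and note the response status and headers (server banner, cookies, and so on)
$response = Invoke-WebRequest -Uri "http://$target/"
$response.StatusCode
$response.Headers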
We will start with the EC-Council's Certified Ethical Hacker program, where they divide footprinting and scanning into seven basic steps: Information gathering Determining the network range Identifying active machines Finding open ports and access points OS fingerprinting Fingerprinting services Mapping the network We could also follow the OWASP Web Testing checklist, which includes: Information gathering Configuration testing Identity management testing Authentication testing Session management testing Data validation testing Error handling Cryptography Business logic testing Client-side testing The idea is to gather as much information on the website as possible before launching an attack, as there is no information gathered so far. To gather information on the website, you don't actually have to scan the website yourself at the start. There are many scanners that scan the website before you start. There are Google Bots gathering search information about the site, the Netcraft search engine gathering statistics about the site, as well as many domain search engines with contact information. If another person has hacked the site, there are sites and blogs where hackers talk about hacking a specific site, including what tools they used. They may even post security scans on the Internet, which could be found by googling. There is even a site (https://archive.org/) that is called the WayBack Machine as it keeps previous versions of websites that it scans for in archive. These are just some basic pieces, and any person who has studied for their Certified Ethical Hacker's exam should have all of this on their fingertips. We will discuss some of the benefits that Microsoft and Particular.net have taken into consideration to assist those who develop solutions in C#. We can search at http://web.archive.org/web/ or http://zero.webappsecurity.com/ for changes from the WayBack Machine, and we will see something like this: From this search engine, we look at what the screens looked like 2003, and walk through various changes to the present 2014. Actually, there were errors on archive copying the site in 2003, so this machine directed us to the first best copy on May 11, 2006, as shown in the following screenshot: Looking with Netcraft, we can see that it was first started in 2004, last rebooted in 2014, and is running Ubuntu, as shown in this screenshot: Next, we can try to see what Google tells us. There are many Google Hacking Databases that keep track of keywords in the Google Search Engine API. These keywords are expressions such as file: passwd to search for password files in Ubuntu, and many more. This is not a hacking book, and this site is well-known, so we will just search for webappsecurity.com file:passwd. This gives me more information than needed. On the first item, I get a sample web scan report of the available vulnerabilities in the site from 2008, as shown in the following screenshot: We can also see which links Google has already found by running http://zero.webappsecurity.com/, as shown in this screenshot: In these few steps, I have enough information to bring a targeted website attack to check whether these vulnerabilities are still active or not. I know the operating system of the website and have details of the history of the website. This is before I have even considered running tools to approach the website. To scan the website, for which permission is always needed ahead of time, there are multiple web scanners available. 
For a list of web scanners, one website is http://sectools.org/tag/web-scanners/. One of the favorites is built by the famed Googler Michal Zalewski, and is called skipfish. Skipfish is an open source tool written in the C language, and it can be used in Windows by compiling it in Cygwin libraries, which are Linux virtual libraries and tools for Windows. Skipfish has its own man pages at http://dev.man-online.org/man1/skipfish/, and it can be downloaded from https://code.google.com/p/skipfish/. Skipfish performs web crawling, fuzzing, and tests for many issues such as XSS and SQL Injection. In Skipfish's case, its fussing uses dictionaries to add more paths to websites, extensions, and keywords that are normally found as attack vectors through the experience of hackers, to apply to the website being scanned. For instance, it may not be apparent from the pages being scanned that there is an admin/index.html page available, but the dictionary will try to check whether the page is available. Skipfish results will appear as follows: The issue with Skipfish is that it is noisy, because of its fuzzer. Skipfish will try many scans and checks for links that might not exist, which will take some time and can be a little noisy out of the box. There are many configurations, and there is throttling of the scanning to try to hide the noise. An associated scan in HP's WebInspect scanner will appear like this: These are just automated means to inspect a website. These steps are common, and much of this material is known in web security. After an initial inspection of a website, a person may start making decisions on how to check their information further. Manually checking websites An experienced web security person may now start proceeding through more manual checks and less automated checking of websites after taking an initial look at the website. For instance, type Admin as the user ID and password, or type Guest instead of Admin, and the list progresses based on experience. Then try the Admin and password combination, then the Admin and password123 combination, and so on. A person inspecting a website might have a lot of time to try to perform penetration testing, and might try hundreds of scenarios. There are many tools and scripts to automate the process. As security analysts, we find many sites that give admin access just by using Admin and Admin as the user ID and password, respectively. To enhance personal skills, there are many tutorials to walk through. One thing to do is to pull down a live website that you can set up for practice, such as WebGoat, and go through the steps outlined in the tutorials from sites such as http://webappsecmovies.sourceforge.net/webgoat/. These sites will show a person how to perform SQL Injection testing through the WebGoat site. As part of these tutorials, there are plugins of Firefox to test security scripts, HTML, debug pieces and tamper with the website through the browser, as shown in this screenshot: Using .NET 4 can help Every page that is deployed to the Internet (and in many cases, the Intranet as well), constantly gets probed and prodded by scans, viruses, and network noise. There are so many pokes, probes, and prods on networks these days that most of them are seen as noise. By default, .NET 4 offers some validation and out-of-the-box support for Web requests. Using .NET 4, you may discover that some input types such as double quotes, single quotes, and even < are blocked in some form fields. 
You will get an error like what is shown in the following screenshot when trying to pass some of the values: This is very basic validation, and it will reside in the .NET version 4 framework's pooling pieces of Internet Information Services (IIS) for Windows. To further offer security following Microsoft's best enterprise practices, we may also consider using Model-View-Controller (MVC) and Entity Frameworks (EF). To get this information, we can review Microsoft Application Architecture Guide at http://msdn.microsoft.com/en-us/library/ff650706.aspx. The MVC design pattern is the most commonly used pattern in software and is designed as follows: This is a very common design pattern, so why is this important in security? What is helpful is that we can validate data requests and responses through the controllers, as well as provide data annotations for each data element for more validation. A common attack that appeared through viruses through the years is the buffer overflow. A buffer overflow is used to send a lot of data to the data elements. Validation can check whether there is sufficient data to counteract the buffer overflow. EF is a Microsoft framework used to provide an object-relationship mapper. Not only can it easily generate objects to and from the SQL Server through Visual Studio, but it can also use objects instead of SQL scripting. Since it does not use SQL, SQL Injection, which is an attack involving injecting SQL commands through input fields, can be counteracted. Even though some of these techniques will help mitigate many attack vectors, the gateway to backend processes is usually through the website. There are many more injection attack vectors. If stored procedures are used for SQL Server, a scan be tried to access any stored procedures that the website may be calling, as well as for any default stored procedures that may be lingering from default installations from SQL Server. So how do we add further validation and decouple the backend processes in an organization from the website? NServiceBus to the rescue NServiceBus is the most popular C# platform framework used to implement an Enterprise Service Bus (ESB) for service-oriented architecture (SOA). Basically, NSB hosts Windows services through its NServiceBus.Host.exe program, and interfaces these services through different message queuing components. A C# MVC-EF program can call web services directly, and when the web service receives an error, the website will receive the error directly in the MVC program. This creates a coupling of the web service and the website, where changes in the website can affect the web services and actions in the web services can affect the website. Because of this coupling, websites may have a Please do not refresh the page until the process is finished warning. Normally, it is wise to step away from the phone, tablet, or computer until the website is loaded. It could be that even though you may not touch the website, another process running on the machine may. A virus scanner, update, or multiple other processes running on the device could cause any glitch in the refreshing of anything on the device. With all the scans that could be happening on a website and that others on the Internet could be doing, it seems quite odd that a page would say Please don't' touch me, I am busy. In order to decouple the website from the web services, a service needs to be deployed between the website and web service. 
It helps if that service has a lot of out-of-the-box security features as well, to help protect the interaction between the website and web service. For this reason, a product such as NServiceBus is most helpful, where others have already laid the groundwork to have advanced security features in services tested through the industry by their use. Being the most common C# ESB platform has its advantages, as developers and architects ensure the integrity of the framework well before a new design starts using it. Benefits of NSB NSB provides many components needed for automation that are only found in ESBs. ESBs provide the following: Separation of duties: There is separation of duties from the frontend to the backend, allowing the frontend to fire a message to a service and continue in its processing, and not worrying about the results until it needs an update. Also, separation of workflow responsibility exists through separating out NSB services. One service could be used to send payments to a bank, and another service could be used to provide feedback of the current status of payment to the MVC-EF database so that a user may see their payment status. Message durability: Messages are saved in queues between services so that in case services are stopped, they can start from the messages in the queues when they restart, and the messages will persist until told otherwise. Workflow retries: Messages, or endpoints, can be told to retry a number of times until they completely fail and send an error. The error is automated to return to an error queue. For instance, a web service message can be sent to a bank, and it can be set to retry the web service every 5 minutes for 20 minutes before giving up completely. This is useful during any network or server issues. Monitoring: NSB ServicePulse can keep a heartbeat on its services. Other monitoring can easily be done on the NSB queues to report on the number of messages. Encryption: Messages between services and endpoints can be easily encrypted. High availability: Multiple services or subscribers could be processing the same or similar messages from various services that are living on different servers. When one server or service goes down, others could be made available to take over those that are already running. Summary If any website is on the Internet, it is being scanned by a multitude of means, from websites and others. It is wise to decouple external websites from backend processes through a means such as NServiceBus. Websites that are not decoupled from the backend can be acted upon by the external processes that it may be accomplishing, such as a web service to validate a credit card. These websites may say Do not refresh this page. Other conditions might occur to the website and be beyond your reach, refreshing the page to affect that interaction. The best solution is to decouple the website from these processes through NServiceBus. Resources for Article: Further resources on this subject: Mobile Game Design [Article] CryENGINE 3: Breaking Ground with Sandbox [Article] CryENGINE 3: Fun Physics [Article]

Qlik Sense's Vision

Packt
06 Feb 2015
12 min read
In this article by Christopher Ilacqua, Henric Cronström, and James Richardson, authors of the book Learning Qlik® Sense, we will look at the evolving requirements that compel organizations to readdress how they deliver business intelligence and support data-driven decision-making. This is important as it supplies some of the reasons as to why Qlik® Sense is relevant and important to their success. The purpose of covering these factors is so that you can consider and plan for them in your organization. Among other things, in this article, we will cover the following topics: The ongoing data explosion The rise of in-memory processing Barrierless BI through Human-Computer Interaction The consumerization of BI and the rise of self-service The use of information as an asset The changing role of IT (For more resources related to this topic, see here.) Evolving market factors Technologies are developed and evolved in response to the needs of the environment they are created and used within. The most successful new technologies anticipate upcoming changes in order to help people take advantage of altered circumstances or reimagine how things are done. Any market is defined by both the suppliers—in this case, Qlik®—and the buyers, that is, the people who want to get more use and value from their information. Buyers' wants and needs are driven by a variety of macro and micro factors, and these are always in flux in some markets more than others. This is obviously and apparently the case in the world of data, BI, and analytics, which has been changing at a great pace due to a number of factors discussed further in the rest of this article. Qlik Sense has been designed to be the means through which organizations and the people that are a part of them thrive in a changed environment. Big, big, and even bigger data A key factor is that there's simply much more data in many forms to analyze than before. We're in the middle of an ongoing, accelerating data boom. According to Science Daily, 90 percent of the world's data was generated over the past two years. The fact is that with technologies such as Hadoop and NoSQL databases, we now have unprecedented access to cost-effective data storage. With vast amounts of data now storable and available for analysis, people need a way to sort the signal from the noise. People from a wider variety of roles—not all of them BI users or business analysts—are demanding better, greater access to data, regardless of where it comes from. Qlik Sense's fundamental design centers on bringing varied data together for exploration in an easy and powerful way. The slow spinning down of the disk At the same time, we are seeing a shift in how computation occurs and potentially, how information is managed. Fundamentals of the computing architectures that we've used for decades, the spinning disk and moving read head, are becoming outmoded. This means storing and accessing data has been around since Edison invented the cylinder phonograph in 1877. It's about time this changed. This technology has served us very well; it was elegant and reliable, but it has limitations. Speed limitations primarily. Fundamentals that we take for granted today in BI, such as relational and multidimensional storage models, were built around these limitations. So were our IT skills, whether we realized it at the time. With the use of in-memory processing and 64-bit addressable memory spaces, these limitations are gone! This means a complete change in how we think about analysis. 
Processing data in memory means we can do analysis that was impractical or impossible before with the old approach. With in-memory computing, analysis that would've taken days before, now takes just seconds (or much less). However, why does it matter? Because it allows us to use the time more effectively; after all, time is the most finite resource of all. In-memory computing enables us to ask more questions, test more scenarios, do more experiments, debunk more hypotheses, explore more data, and run more simulations in the short window available to us. For IT, it means no longer trying to second-guess what users will do months or years in advance and trying to premodel it in order to achieve acceptable response times. People hate watching the hourglass spin. Qlik Sense's predecessor QlikView® was built on the exploitation of in-memory processing; Qlik Sense has it at its core too. Ubiquitous computing and the Internet of Things You may know that more than a billion people use Facebook, but did you know that the majority of those people do so from a mobile device? The growth in the number of devices connected to the Internet is absolutely astonishing. According to Cisco's Zettabyte Era report, Internet traffic from wireless devices will exceed traffic from wired devices in 2014. If we were writing this article even as recently as a year ago, we'd probably be talking about mobile BI as a separate thing from desktop or laptop delivered analytics. The fact of the matter is that we've quickly gone beyond that. For many people now, the most common way to use technology is on a mobile device, and they expect the kind of experience they've become used to on their iOS or Android device to be mirrored in complex software, such as the technology they use for visual discovery and analytics. From its inception, Qlik Sense has had mobile usage in the center of its design ethos. It's the first data discovery software to be built for mobiles, and that's evident in how it uses HTML5 to automatically render output for the device being used, whatever it is. Plug in a laptop running Qlik Sense to a 70-inch OLED TV and the visual output is resized and re-expressed to optimize the new form factor. So mobile is the new normal. This may be astonishing but it's just the beginning. Mobile technology isn't just a medium to deliver information to people, but an acceleration of data production for analysis too. By 2020, pretty much everyone and an increasing number of things will be connected to the Internet. There are 7 billion people on the planet today. Intel predicts that by 2020, more than 31 billion devices will be connected to the Internet. So, that's not just devices used by people directly to consume or share information. More and more things will be put online and communicate their state: cars, fridges, lampposts, shoes, rubbish bins, pets, plants, heating systems—you name it. These devices will generate a huge amount of data from sensors that monitor all kinds of measurable attributes: temperature, velocity, direction, orientation, and time. This means an increasing opportunity to understand a huge gamut of data, but without the right technology and approaches it will be complex to analyze what is going on. Old methods of analysis won't work, as they don't move quickly enough. The variety and volume of information that can be analyzed will explode at an exponential rate. The rise of this type of big data makes us redefine how we build, deliver, and even promote analytics. 
It is an opportunity for those organizations that can exploit it through analysis; this can sort the signals from the noise and make sense of the patterns in the data. Qlik Sense is designed as just such a signal booster; it takes how users can zoom and pan through information too large for them to easily understand the product. Unbound Human-Computer Interaction We touched on the boundary between the computing power and the humans using it in the previous section. Increasingly, we're removing barriers between humans and technology. Take the rise of touch devices. Users don't want to just view data presented to them in a static form. Instead, they want to "feel" the data and interact with it. The same is increasingly true of BI. The adoption of BI tools has been too low because the technology has been hard to use. Adoption has been low because in the past BI tools often required people to conform to the tool's way of working, rather than reflecting the user's way of thinking. The aspiration for Qlik Sense (when part of the QlikView.Next project) was that the software should be both "gorgeous and genius". The genius part obviously refers to the built-in intelligence, the smarts, the software will have. The gorgeous part is misunderstood or at least oversimplified. Yes, it means cosmetically attractive (which is important) but much more importantly, it means enjoyable to use and experience. In other words, Qlik Sense should never be jarring to users but seamless, perhaps almost transparent to them, inducing a state of mental flow that encourages thinking about the question being considered rather than the tool used to answer it. The aim was to be of most value to people. Qlik Sense will empower users to explore their data and uncover hidden insights, naturally. Evolving customer requirements It is not only the external market drivers that impact how we use information. Our organizations and the people that work within them are also changing in their attitude towards technology, how they express ideas through data, and how increasingly they make use of data as a competitive weapon. Consumerization of BI and the rise of self-service The consumerization of any technology space is all about how enterprises are affected by, and can take advantage of, new technologies and models that originate and develop in the consumer marker, rather than in the enterprise IT sector. The reality is that individuals react quicker than enterprises to changes in technology. As such, consumerization cannot be stopped, nor is it something to be adopted. It can be embraced. While it's not viable to build a BI strategy around consumerization alone, its impact must be considered. Consumerization makes itself felt in three areas: Technology: Most investment in innovation occurs in the consumer space first, with enterprise vendors incorporating consumer-derived features after the fact. (Think about how vendors added the browser as a UI for business software applications.) Economics: Consumer offerings are often less expensive or free (to try) with a low barrier of entry. This drives prices down, including enterprise sectors, and alters selection behavior. People: Demographics, which is the flow of Millennial Generation into the workplace, and the blurring of home/work boundaries and roles, which may be seen from a traditional IT perspective as rogue users, with demands to BYOPC or device. 
In line with consumerization, BI users want to be able to pick up and just use the technology to create and share engaging solutions; they don't want to read the manual. This places a high degree of importance on the Human-Computer Interaction (HCI) aspects of a BI product (refer to the preceding list) and governed access to information and deployment design. Add mobility to this and you get a brand new sourcing and adoption dynamic in BI, one that Qlik engendered, and Qlik Sense is designed to take advantage of. Think about how Qlik Sense Desktop was made available as a freemium offer. Information as an asset and differentiator As times change, so do differentiators. For example, car manufacturers in the 1980s differentiated themselves based on reliability, making sure their cars started every single time. Today, we expect that our cars will start; reliability is now a commodity. The same is true for ERP systems. Originally, companies implemented ERPs to improve reliability, but in today's post-ERP world, companies are shifting to differentiating their businesses based on information. This means our focus changes from apps to analytics. And analytics apps, like those delivered by Qlik Sense, help companies access the data they need to set themselves apart from the competition. However, to get maximum return from information, the analysis must be delivered fast enough, and in sync with the operational tempo people need. Things are speeding up all the time. For example, take the fashion industry. Large mainstream fashion retailers used to work two seasons per year. Those that stuck to that were destroyed by fast fashion retailers. The same is true for old style, system-of-record BI tools; they just can't cope with today's demands for speed and agility. The rise of information activism A new, tech-savvy generation is entering the workforce, and their expectations are different than those of past generations. The Beloit College Mindset List for the entering class of 2017 gives the perspective of students entering college this year, how they see the world, and the reality they've known all their lives. For this year's freshman class, Java has never been just a cup of coffee and a tablet is no longer something you take in the morning. This new generation of workers grew up with the Internet and is less likely to be passive with data. They bring their own devices everywhere they go, and expect it to be easy to mash-up data, communicate, and collaborate with their peers. The evolution and elevation of the role of IT We've all read about how the role of IT is changing, and the question CIOs today must ask themselves is: "How do we drive innovation?". IT must transform from being gatekeepers (doers) to storekeepers (enablers), providing business users with self-service tools they need to be successful. However, to achieve this transformation, they need to stock helpful tools and provide consumable information products or apps. Qlik Sense is a key part of the armory that IT needs to provide to be successful in this transformation. Summary In this article, we looked at the factors that provide the wider context for the use of Qlik Sense. The factors covered arise out of both increasing technical capability and demands to compete in a globalized, information-centric world, where out-analyzing your competitors is a key success factor. Resources for Article: Further resources on this subject: Securing QlikView Documents [article] Conozca QlikView [article] Introducing QlikView elements [article]

Visualforce Development with Apex

Packt
06 Feb 2015
12 min read
In this article by Matt Kaufman and Michael Wicherski, authors of the book Learning Apex Programming, we will see how we can use Apex to extend the Salesforce1 Platform. We will also see how to create a customized Force.com page. (For more resources related to this topic, see here.) Apex on its own is a powerful tool to extend the Salesforce1 Platform. It allows you to define your own database logic and fully customize the behavior of the platform. Sometimes, controlling "what happens behind the scenes isn't enough. You might have a complex process that needs to step users through a wizard or need to present data in a format that isn't native to the Salesforce1 Platform, or maybe even make things look like your corporate website. Anytime you need to go beyond custom logic and implement a custom interface, you can turn to Visualforce. Visualforce is the user interface framework for the Salesforce1 Platform. It supports the use of HTML, JavaScript, CSS, and Flash—all of which enable you to build your own custom web pages. These web pages are stored and hosted by the Salesforce1 Platform and can be exposed to just your internal users, your external community users, or publicly to the world. But wait, there's more! Also included with Visualforce is a robust markup language. This markup language (which is also referred to as Visualforce) allows you to bind your web pages to data and actions stored on the platform. It also allows you to leverage Apex for code-based objects and actions. Like the rest of the platform, the markup portion of Visualforce is upgraded three times a year with new tags and features. All of these features mean that Visualforce is very powerful. s-con-what? Before the "introduction of Visualforce, the Salesforce1 Platform had a feature called s-controls. These were simple files where you could write HTML, CSS, and JavaScript. There was no custom markup language included. In order to make things look like the Force.com GUI, a lot of HTML was required. If you wanted to create just a simple input form for a new Account record, so much HTML code was required. 
The following is just a" small, condensed excerpt of what the HTML would look like if you wanted to recreate such a screen from scratch: <div class="bPageTitle"><div class="ptBody"><div class="content"> <img src="/s.gif" class="pageTitleIcon" title="Account" /> <h1 class="pageType">    Account Edit<span class="titleSeparatingColon">:</span> </h1> <h2 class="pageDescription"> New Account</h2> <div class="blank">&nbsp;</div> </div> <div class="links"></div></div><div   class="ptBreadcrumb"></div></div> <form action="/001/e" method="post" onsubmit="if   (window.ffInAlert) { return false; }if (window.sfdcPage   &amp;&amp; window.sfdcPage.disableSaveButtons) { return   window.sfdcPage.disableSaveButtons(); }"> <div class="bPageBlock brandSecondaryBrd bEditBlock   secondaryPalette"> <div class="pbHeader">    <table border="0" cellpadding="0" cellspacing="0"><tbody>      <tr>      <td class="pbTitle">      <img src="/s.gif" width="12" height="1" class="minWidth"         style="margin-right: 0.25em;margin-right: 0.25em;margin-       right: 0.25em;">      <h2 class="mainTitle">Account Edit</h2>      </td>      <td class="pbButton" id="topButtonRow">      <input value="Save" class="btn" type="submit">      <input value="Cancel" class="btn" type="submit">      </td>      </tr>    </tbody></table> </div> <div class="pbBody">    <div class="pbSubheader brandTertiaryBgr first       tertiaryPalette" >    <span class="pbSubExtra"><span class="requiredLegend       brandTertiaryFgr"><span class="requiredExampleOuter"><span       class="requiredExample">&nbsp;</span></span>      <span class="requiredMark">*</span>      <span class="requiredText"> = Required Information</span>      </span></span>      <h3>Account Information<span         class="titleSeparatingColon">:</span> </h3>    </div>    <div class="pbSubsection">    <table class="detailList" border="0" cellpadding="0"     cellspacing="0"><tbody>      <tr>        <td class="labelCol requiredInput">        <label><span class="requiredMark">*</span>Account         Name</label>      </td>      <td class="dataCol col02">        <div class="requiredInput"><div         class="requiredBlock"></div>        <input id="acc2" name="acc2" size="20" type="text">        </div>      </td>      <td class="labelCol">        <label>Website</label>      </td>      <td class="dataCol">        <span>        <input id="acc12" name="acc12" size="20" type="text">        </span>      </td>      </tr>    </tbody></table>    </div> </div> <div class="pbBottomButtons">    <table border="0" cellpadding="0" cellspacing="0"><tbody>    <tr>      <td class="pbTitle"><img src="/s.gif" width="12" height="1"       class="minWidth" style="margin-right: 0.25em;margin-right:       0.25em;margin-right: 0.25em;">&nbsp;</td>      <td class="pbButtonb" id="bottomButtonRow">      <input value=" Save " class="btn" title="Save"         type="submit">      <input value="Cancel" class="btn" type="submit">      </td>    </tr>    </tbody></table> </div> <div class="pbFooter secondaryPalette"><div class="bg"> </div></div> </div> </form> We did our best to trim down this HTML to as little as possible. Despite all of our efforts, it still "took up more space than we wanted. The really sad part is that all of that code only results in the following screenshot: Not only was it time consuming to write all this HTML, but odds were that we wouldn't get it exactly right the first time. 
Worse still, every time the business requirements changed, we had to go through the exhausting effort of modifying the HTML code. Something had to change in order to provide us relief. That something was the introduction of Visualforce and its markup language. Your own personal Force.com The markup "tags in Visualforce correspond to various parts of the Force.com GUI. These tags allow you to quickly generate HTML markup without actually writing any HTML. It's really one of the greatest tricks of the Salesforce1 Platform. You can easily create your own custom screens that look just like the built-in ones with less effort than it would take you to create a web page for your corporate website. Take a look at the Visualforce markup that corresponds to the HTML and screenshot we showed you earlier: <apex:page standardController="Account" > <apex:sectionHeader title="Account Edit" subtitle="New Account"     /> <apex:form>    <apex:pageBlock title="Account Edit" mode="edit" >      <apex:pageBlockButtons>        <apex:commandButton value="Save" action="{!save}" />        <apex:commandButton value="Cancel" action="{!cancel}" />      </apex:pageBlockButtons>      <apex:pageBlockSection title="Account Information" >        <apex:inputField value="{!account.Name}" />        <apex:inputField value="{!account.Website}" />      </apex:pageBlockSection>    </apex:pageBlock> </apex:form> </apex:page> Impressive! With "merely these 15 lines of markup, we can render nearly 100 lines of earlier HTML. Don't believe us, you can try it out yourself. Creating a Visualforce page Just like" triggers and classes, Visualforce pages can "be created and edited using the Force.com IDE. The Force.com GUI also includes a web-based editor to work with Visualforce pages. To create a new Visualforce page, perform these simple steps: Right-click on your project and navigate to New | Visualforce Page. The Create New Visualforce Page window appears as shown: Enter" the label and name for your "new page in the Label and Name fields, respectively. For this example, use myTestPage. Select the API version for the page. For this example, keep it at the default value. Click on Finish. A progress bar will appear followed by your new Visualforce page. Remember that you always want to create your code in a Sandbox or Developer Edition org, not directly in Production. It is technically possible to edit Visualforce pages in Production, but you're breaking all sorts of best practices when you do. Similar to other markup languages, every tag in a Visualforce page must be closed. Tags and their corresponding closing tags must also occur in a proper order. The values of tag attributes are enclosed by double quotes; however, single quotes can be used inside the value to denote text values. Every Visualforce page starts with the <apex:page> tag and ends with </apex:page> as shown: <apex:page> <!-- Your content goes here --> </apex:page> Within "the <apex:page> tags, you can paste "your existing HTML as long as it is properly ordered and closed. The result will be a web page hosted by the Salesforce1 Platform. Not much to see here If you are" a web developer, then there's a lot you can "do with Visualforce pages. Using HTML, CSS, and images, you can create really pretty web pages that educate your users. If you have some programming skills, you can also use JavaScript in your pages to allow for interaction. If you have access to web services, you can use JavaScript to call the web services and make a really powerful application. 
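Pages are not limited to static content, though. As a small hedged sketch (the GreetingController class, its property, and the page name are invented for illustration and are not taken from this article's examples), a page can also bind to a custom Apex controller:

// GreetingController.cls - a minimal custom Apex controller
public class GreetingController {
    public String greeting { get; set; }

    public GreetingController() {
        greeting = 'Hello from Apex';
    }
}

<!-- greetingPage - a Visualforce page bound to the controller above -->
<apex:page controller="GreetingController">
    <apex:outputText value="{!greeting}" />
</apex:page>

Any property exposed with get and set on the controller becomes available to the markup through the {!greeting} expression syntax.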
Check out the following Visualforce page for an example of what you can do: <apex:page> <script type="text/javascript"> function doStuff(){    var x = document.getElementById("myId");    console.log(x); } </script> <img src="http://www.thisbook.com/logo.png" /> <h1>This is my title</h1> <h2>This is my subtitle</h2> <p>In a world where books are full of code, there was only one     that taught you everything you needed to know about Apex!</p> <ol>    <li>My first item</li>    <li>Etc.</li> </ol> <span id="myId"></span> <iframe src="http://www.thisbook.com/mypage.html" /> <form action="http://thisbook.com/submit.html" >    <input type="text" name="yoursecret" /> </form> </apex:page> All of this code is standalone and really has nothing to do with the Salesforce1 Platform other than being hosted by it. However, what really makes Visualforce powerful is its ability to interact with your data, which allows your pages to be more dynamic. Even better, you" can write Apex code to control how "your pages behave, so instead of relying on client-side JavaScript, your logic can run server side. Summary In this article we learned how a few features of Apex and how we can use it to extend the SalesForce1 Platform. We also created a custom Force.com page. Well, you've made a lot of progress. Not only can you write code to control how the database behaves, but you can create beautiful-looking pages too. You're an Apex rock star and nothing is going to hold you back. It's time to show your skills to the world. If you want to dig deeper, buy the book and read Learning Apex Programming in a simple step-by-step fashion by using Apex, the language for extension of the Salesforce1 Platform. Resources for Article: Further resources on this subject: Learning to Fly with Force.com [article] Building, Publishing, and Supporting Your Force.com Application [article] Adding a Geolocation Trigger to the Salesforce Account Object [article]

Mobile Administration

Packt
06 Feb 2015
17 min read
In this article by Paul Goodey, author of the book Salesforce CRM – The Definitive Admin Handbook - Third Edition, we will look at the administration of Salesforce Mobile solutions that can significantly improve productivity and user satisfaction and help them access data and application functionality out of the office. (For more resources related to this topic, see here.) In the past, mobile devices that were capable of accessing software applications were very expensive. Often, these devices were regarded as a nice to have accessory by management and were seen as a company perk by field-based teams. Today, mobile devices are far more prevalent within the business environment, and organizations are increasingly realizing the benefits of using mobile phones and devices to access business applications. Salesforce has taken the lead in recognizing how mobiles have become the new standard for being connected in people's personal and professional lives. It has also highlighted how increasingly, the users of their apps are living lives connected to the Internet, but rather than sitting at a desk in the office, they are in between meetings, on the road, in planes, in trains, in cabs, or even in the queue for lunch. As a result, Salesforce has developed innovative mobile solutions that help you and your users embrace this mobile-first world in Salesforce CRM. Accessing Salesforce Mobile products Salesforce offers two varieties of mobile solutions, namely mobile browser apps and downloadable apps. Mobile browser apps, as the name suggests, are accessed using a web browser that is available on a mobile device. Downloadable apps are accessed by first downloading the client software from, say, the Apple App Store or Google Play and then installing it onto the mobile device. Mobile browser apps and downloadable apps offer various features and benefits and, as we'll see, are available for various Salesforce mobile products and device combinations. Most mobile devices these days have some degree of web browser capability, which can be used to access Salesforce CRM; however, some Salesforce mobile products are optimized for use with certain devices. By accessing a Salesforce mobile browser app, your users do not require anything to be installed. Supported mobile browsers for Salesforce are generally available on Android, Apple, BlackBerry, and Microsoft Windows 8.1 devices. Downloadable apps, on the other hand, will require the app to be first downloaded from the App Store for Apple® devices or from Google Play™ for Android™ devices and then installed on the mobile device. Salesforce mobile products' overview Salesforce has provided certain mobile products as downloadable apps only, while others have been provided as both downloadable and mobile browser-based. The following list outlines the various mobile app products, features, and capabilities used to access Salesforce CRM on mobile devices: SalesforceA Salesforce Touch Salesforce1 Salesforce Classic Salesforce Touch is no longer available and is mentioned here for completeness as this product has been recently incorporated into the Salesforce1 product. SalesforceA SalesforceA is a downloadable system administration app that allows you to manage your organization's users and view certain information for your Salesforce organization from your mobile device. Salesforce A is intended to be used by system administrators, as it is restricted to users with the Manage Users permission. 
The SalesforceA app provides the facilities to carry out user tasks, such as deactivating or freezing users, resetting passwords, unlocking users, editing user details, calling and emailing users, and assigning permission sets. These user task buttons are displayed as action icons, as shown in the following screenshot: These icons are presented in the action bar at the bottom of the mobile device screen, as shown in the following screenshot: In addition to the user tasks, you can view the system status and also switch between your user accounts in multiple organizations. This allows you to access different organizations and communities without having to log out and log back in to each user account. By staying logged in to multiple accounts in different organizations, you will save time by easily switching to the particular organization user account that you need to access. SalesforceA supported devices At the time of writing, the following devices are supported by Salesforce for use with the SalesforceA downloadable app: Android phones Apple iPhone Apple iPod Touch SalesforceA can be installed from Google Play™ for Android™ phones and the Apple® App Store for Apple devices. Salesforce Touch Salesforce Touch is the name of an earlier Salesforce mobile product and is no longer available. With the Spring 2014 release, Salesforce Touch was incorporated into the Salesforce1 app. Hence, both the Salesforce Touch mobile browser and Salesforce Touch downloadable apps are no longer available; however, the functionality that they once offered is available in Salesforce1, which is covered in this article. Salesforce1 Salesforce1 is Salesforce's next-generation mobile CRM platform that has been designed for Salesforce's customers, developers, and ISVs (independent software vendors) to connect mobile apps, browser apps, and third-party app services. Salesforce1 has been developed for a mobile-first environment and demonstrates how Salesforce's focus as a platform provider aims to connect enterprises with systems that can be programmed through APIs, along with mobile apps and services that can be utilized by marketing, sales, and customer service. There are two ways to use Salesforce1: either using a mobile browser app that users can access by logging into Salesforce from a supported mobile browser or downloadable apps that users can install from the App Store or Google Play. Either way, Salesforce1 allows users to access and update Salesforce data from an interface that has been optimized to navigate and work on their touchscreen mobile devices. Using Salesforce1, records can be viewed, edited, and created. Users can manage their activities, view their dashboards, and use Chatter. Salesforce1 also supports many standard objects and list views, all custom objects, plus the integration of other mobile apps and many of your organization's Salesforce customizations, including Visualforce tabs and pages. 
Salesforce1 supported devices At the time of writing this, the following devices are supported by Salesforce for the Salesforce1 mobile browser app: Android phones Apple iPad Apple iPhone BlackBerry Z10 Windows 8.1 phones (Beta support) Also, at the time of writing this, Salesforce specifies the following devices as being supported for the Salesforce1 downloadable app: Android phones Apple iPad Apple iPhone Salesforce1 data availability Your organization edition, the user's license type, along with the user's profile and any permission sets, determines the data that is available to the user within Salesforce1. Generally, users have the same visibility of objects, record types, fields, and page layouts that they have while accessing the full Salesforce browser app. However, at the time of writing this, not all data is available in the current release of the Salesforce1 app. In Winter 2015, these key objects are fully accessible from the Salesforce1 navigation menu: Accounts; Campaigns; Cases; Contacts; Contracts; Leads; Opportunities; Tasks; and Users. Dashboards and Events, however, are restricted to being viewable from only the Salesforce1 navigation menu. Custom objects are fully accessible if they have a tab that the user can access. For new users who are yet to build a history of recent objects, they initially see a set of default objects in the Recent section in the Salesforce1 navigation menu. The majority of standard and custom fields, and most of the related lists for the supported objects, are available on these records; however, at the time of writing this, the following exceptions exist: Rich text area field support varies (detailed shortly) Links on formula fields are not supported State and country picklist fields are not supported Related lists in Salesforce1 are restricted (detailed shortly) Rich text area field support varies Support for rich text area fields varies by the version of Salesforce1 and the type of device. For Android's downloadable apps, you can view and edit rich text area fields. However, for Android's mobile browser apps, you can only view rich text area fields; editing is not supported currently. For iOS's downloadable apps, you can view but not edit rich text area fields. However, for iOS's mobile browser apps, you can view and also edit rich text area fields. Finally, for both BlackBerry and Windows 8.1 mobile browser apps, you can neither view nor edit rich text area fields. Related lists in Salesforce1 Related lists in Salesforce1 are restricted and display the first four fields that are defined on the page layout for that object. The number of fields shown cannot be increased. If Chatter is enabled, users can also access feeds, people, groups, and Salesforce Files. When users are working with records in the full Salesforce app, it can take up to 15 days for this data to appear in the Recent section; thus, to make records appear under the Recent section sooner, ask users to pin them from their search results in the full Salesforce site. Salesforce1 administration You can manage your organization's access to Salesforce1 apps; there are two areas of administration: the mobile browser app that users can access by logging in to Salesforce from a supported mobile browser and the downloadable app that users can install from the App Store or Google Play. The upcoming sections describe the ways to control user access to each of these mobile apps. 
Salesforce1 mobile browser app access You can control whether users can access the Salesforce1 mobile browser app when they log into Salesforce from a mobile browser. To select or deselect this feature, navigate to Setup | Mobile Administration | Salesforce1 | Settings, as shown in the following screenshot: By selecting the Enable the Salesforce1 mobile browser app checkbox, all users are activated to access Salesforce1 from their mobile browsers. Deselecting this option turns off the mobile browser app, which means that users will automatically access the full Salesforce site from their mobile browser. By default, the mobile browser app is turned on in all Salesforce organizations. Salesforce1 desktop browser access Selecting the Enable the Salesforce1 mobile browser app checkbox, as described in the previous section, permits users who are activated to access Salesforce1 from their desktop browsers. Users can navigate to the Salesforce1 app within their desktop browser by appending “/one/one.app” to the end of the Salesforce URL. As an example, for the following Salesforce URL accessed from the server na10, you would enter the https://na10.salesforce.com/one/one.app desktop browser URL. Salesforce1 downloadable app access The Salesforce1 app is distributed as a managed package, and within Salesforce, it is implemented as a connected app. You might already see the Salesforce1 connected app in your list of installed apps as it might have been automatically installed in your organization. The list of included apps can change with each Salesforce release but, to simplify administration, each package is asynchronously installed in Salesforce organizations whenever any user in that organization first accesses Salesforce1. However, to manually install or reinstall the Salesforce1 package for connected apps, you can install it from the AppExchange. To view the details for the Salesforce1 app in the connected app settings, navigate to Setup | Manage Apps | Connected Apps. The apps that connect to your Salesforce organization are then listed as shown in the following screenshot: Salesforce1 notifications Notifications allow all users in your organization to receive mobile notifications in Salesforce1, for example, whenever they are mentioned in Chatter or whenever they receive approval requests. To activate mobile notifications, navigate to Setup | Mobile Administration | Notifications | Settings, as shown in the following screenshot: The settings for notifications can be set as follows: Enable in-app notifications: Set this option to keep users notified about relevant Salesforce activity while they are using Salesforce1. Enable push notifications: Set this option to keep users notified of relevant Salesforce activity when they are not using the Salesforce1 downloadable app. Include full content in push notifications: Keep this checkbox unchecked if you do not want users to receive full content in push notifications. This can prevent users from receiving potentially sensitive data that might be in comments, for example. If you set this option, a pop-up dialog appears, displaying terms and conditions where you must click on OK or Cancel. Salesforce1 branding This option allows you to customize the appearance of the Salesforce1 app so that it complies with any company branding requirements that might be in place. Salesforce1 branding is supported in downloadable apps' Version 5.2 or higher and also in the mobile browser app. 
To specify Salesforce1 branding, navigate to Setup | Mobile Administration | Salesforce1 | Branding, as shown in the following screenshot: Salesforce1 compact layouts In Salesforce1, compact layouts are used to display the key fields on a record and are specifically designed to view records on touchscreen mobile devices. As space is limited on mobile devices and quick recognition of records is important, the first four fields that you assign to a compact layout are displayed. If a mobile user does not have the required access to one of the first four fields that have been assigned to a compact layout, the next field, if more than four fields have been set on the layout, is used. If you are yet to create custom compact layouts, the records will be displayed using a read-only, predefined system default compact layout, and after you have created a custom compact layout, you can then set it as the primary compact layout for that object. As with the full Salesforce CRM site, if you have record types associated with an object, you can alter the primary compact layout assignment and assign specific compact layouts to different record types. You can also clone a compact layout from its detail page. The upcoming field types cannot be included on compact layouts: text area, long text area, rich text area, and multiselect picklists. Salesforce1 offline access In Salesforce1, the mechanism to handle offline access is determined by users' most recently used records. These records are cached for offline access; at the time of writing this, they are read-only. The cached data is encrypted and secured through persistent storage by Salesforce1's downloadable apps. Offline access is available in Salesforce1's downloadable apps Version 6.0 and higher and was first released in Summer 2014. Offline access is enabled by default when Salesforce1's downloadable app is installed. To manage these settings, navigate to Setup | Mobile Administration | Offline. Now, check or uncheck Enable Offline Sync for Salesforce1, as shown in the following screenshot: When offline access is enabled, data based on the objects is downloaded to each user's mobile device and presented in the Recent section of the Salesforce1 navigation menu and on the user's most recently viewed records. The data is encrypted and stored in a secure, persistent cache on the mobile device. Setting up Salesforce1 with the Salesforce1 Wizard The Salesforce1 Wizard simplifies the setting up of the Salesforce1 mobile app. The wizard offers a visual tour of the key setup steps and is useful if you are new to Salesforce1 or need to quickly set up the core Salesforce1 settings. The Salesforce1 Wizard guides you through the setting up of the following Salesforce1 configuration steps: Choose which items appear in the navigation menu Configure global actions Create a contact custom compact layout Optionally, invite users to start using the Salesforce1 app To access the Salesforce1 Wizard, navigate to Setup | Salesforce1 Setup. Now, click on Launch Quick Start Wizard within the Salesforce1 Setup page, as shown in the following screenshot: Upon clicking on the Let's Get Started section link (shown in the following screenshot), you will be presented with the Salesforce1 Setup visual tour, as shown in the next section. The Quick Start Wizard The Quick Start Wizard guides you through the minimum configuration steps required to set up Salesforce1. 
By clicking on the Launch Quick Start Wizard button, the process to complete the essential setup tasks for Salesforce1 is initiated and provides a step-by-step wizard guide. The five steps are: Customize the Navigation Menu: This step results in the setup of the navigation menu for all users in your organization. To reorder items, drag them up and down. To remove items, drag them to the Available Items list, as shown in the following screenshot: Arrange Global Actions: Global actions provide users with quick access to Salesforce functions and in this step, you will choose and arrange the Salesforce1 global actions, as shown in the following screenshot: Actions might might have a different appearance, depending upon your version of Salesforce1. Create a Custom Compact Layout for Contacts: Compact layouts are used to show the key fields on a record in the highlights area at the top of the record detail. In this step, you are able to create a custom compact layout for contacts to set, for example, a contact's name, e-mail, and phone number, as shown in the following screenshot: However, after you have completed the Quick Start Wizard, you can create compact layouts for other objects as required. Review: In this step, you are given the chance to preview the changes to verify the results of the changes, as shown in the following screenshot: The review step screen gives you a live preview that uses your current access as the logged-in user. Send Invitations: This is the final step of the Quick Start Wizard, which will provide you with a basic setup of Salesforce1 and allow you to get feedback on what you have implemented. In this step, you can invite your users to start using the Salesforce1 app, as shown in the following screenshot: This step can be skipped and you can always send invitations later from the Salesforce1 setup page. You can also implement additional options to customize the app, such as incorporating your own branding. Differences between Salesforce1 and the full Salesforce CRM browser app In the Winter 2015 release and at the time of writing this, Salesforce1 does not have all of the features of the full Salesforce CRM site; moreover, in some areas, it includes functionality that is not available in, or is different from, the complete Salesforce site. As an example, on the full Salesforce CRM site, compact layouts determine which fields appear in the Chatter feed item and which appear after a user creates a record via a publisher action. However, compact layouts in Salesforce1 are used to display the key fields on a record. For details about the features that differ between the full Salesforce CRM site and Salesforce1, refer to Salesforce1 Limits and Differences from the Full Salesforce Site within the Salesforce Help menu sections. Summary In this article, we looked at ways in which mobile has become the new normal way to stay connected in both our personal and professional lives. Salesforce has recognized this well; we are all spending time being connected to the cloud and using business applications. However, instead of sitting at a desk, users are often on the go. To try and help their customers become successful businesses of this mobile-first world, Salesforce has produced mobile solutions that can help user get things done regardless of where they are and what they are doing. We looked at SalesforceA, which is an admin specific app that can help you manage users and monitor the status of Salesforce while on the move. 
We discussed Salesforce Touch, which is being replaced with Salesforce1, and we also spoke about the features and benefits of Salesforce1, which is available as a downloadable app and a browser app.

Resources for Article:

Further resources on this subject:

Customization in Microsoft Dynamics CRM [Article]
Getting Started with Microsoft Dynamics CRM 2013 Marketing [Article]
Diagnostic leveraging of the Accelerated POC with the CRM Online service [Article]

Basic and Interactive Plots

Packt
06 Feb 2015
19 min read
In this article by Atmajitsinh Gohil, author of the book R Data Visualization Cookbook, we will cover the following topics: A simple bar plot A simple line plot Line plot to tell an effective story Merging histograms Making an interactive bubble plot (For more resources related to this topic, see here.) The main motivation behind this article is to introduce the basics of plotting in R and an element of interactivity via the googleVis package. The basic plots are important as many packages developed in R use basic plot arguments and hence understanding them creates a good foundation for new R users. We will start by exploring the scatter plots in R, which are the most basic plots for exploratory data analysis, and then delve into interactive plots. Every section will start with an introduction to basic R plots and we will build interactive plots thereafter. We will utilize the power of R analytics and implement them using the googleVis package to introduce the element of interactivity. The googleVis package is developed by Google and it uses the Google Chart API to create interactive plots. There are a range of plots available with the googleVis package and this provides us with an advantage to plot the same data on various plots and select the one that delivers an effective message. The package undergoes regular updates and releases, and new charts are implemented with every release. The readers should note that there are other alternatives available to create interactive plots in R, but it is not possible to explore all of them and hence I have selected googleVis to display interactive elements in a chart. I have selected these purely based on my experience with interactivity in plots. The other good interactive package is offered by GGobi. A simple bar plot A bar plot can often be confused with histograms. Histograms are used to study the distribution of data whereas bar plots are used to study categorical data. Both the plots may look similar to the naked eye but the main difference is that the width of a bar plot is not of significance, whereas in histograms the width of the bars signifies the frequency of data. In this recipe, I have made use of the infant mortality rate in India. The data is made available by the Government of India. The main objective is to study the basics of a bar plot in R as shown in the following screenshot: How to do it… We start the recipe by importing our data in R using the read.csv() function. R will search for the data under the current directory, and hence we use the setwd() function to set our working directory: setwd("D:/book/scatter_Area/chapter2") data = read.csv("infant.csv", header = TRUE) Once we import the data, we would like to process the data by ordering it. We order the data using the order() function in R. We would like R to order the column Total2011 in a decreasing order: data = data[order(data$Total2011, decreasing = TRUE),] We use the ifelse() function to create a new column. We would utilize this new column to add different colors to bars in our plot. We could also write a loop in R to do this task but we will keep this for later. The ifelse() function is quick and easy. We instruct R to assign yes if values in the column Total2011 are more than 12.2 and no otherwise. The 12.2 value is not randomly chosen but is the average infant mortality rate of India: new = ifelse(data$Total2011>12.2,"yes","no") Next, we would like to join the vector of yes and no to our original dataset. In R, we can join columns using the cbind() function. 
Rows can be combined using rbind(): data = cbind(data,new) When we initially plot the bar plot, we observe that we need more space at the bottom of the plot. We adjust the margins of a plot in R by passing the mar() argument within the par() function. The mar() function uses four arguments: bottom, left, top, and right spacing: par(mar = c(10,5,5,5)) Next, we generate a bar plot in R using the barplot() function. The abline() function is used to add a horizontal line on the bar plot: barplot(data$Total2011, las = 2, names.arg= data$India,width =0.80, border = NA,ylim=c(0,20), col = "#e34a33", main = "InfantMortality Rate of India in 2011")abline(h = 12.2, lwd =2, col = "white", lty =2) How it works… The order() function uses permutation to rearrange (decreasing or increasing) the rows based on the variable. We would like to plot the bars from highest to lowest, and hence we require to arrange the data. The ifelse() function is used to generate a new column. We would use this column under the There's more… section of this recipe. The first argument under the ifelse() function is the logical test to be performed. The second argument is the value to be assigned if the test is true, and the third argument is the value to be assigned if the logical test fails. The first argument in the barplot() function defines the height of the bars and horiz = TRUE (not used in our code) instructs R to plot the bars horizontally. The default setting in R will plot the bars vertically. The names.arg argument is used to label the bars. We also specify border = NA to remove the borders and las = 2 is specified to apply the direction to our labels. Try replacing the las values with 1,2,3, or 4 and observe how the orientation of our labels change.. The first argument in the abline() function assigns the position where the line is drawn, that is, vertical or horizontal. The lwd, lty, and col arguments are used to define the width, line type, and color of the line. There's more… While plotting a bar plot, it's a good practice to order the data in ascending or descending order. An unordered bar plot does not convey the right message and the plot is hard to read when there are more bars involved. When we observe a plot, we are interested to get the most information out, and ordering the data is the first step toward achieving this objective. We have not specified how we can use the ifelse() and cbind() functions in the plot. If we would like to color the plot with different colors to let the readers know which states have high infant mortality above the country level, we can do this by pasting col = (data$new) in place of col = "#e34a33". See also New York Times has a very interesting implementation of an interactive bar chart and can be accessed at http://www.nytimes.com/interactive/2007/09/28/business/20070930_SAFETY_GRAPHIC.html A simple line plot Line plots are simply lines connecting all the x and y dots. They are very easy to interpret and are widely used to display an upward or downward trend in data. In this recipe, we will use the googleVis package and create an interactive R line plot. We will learn how we can emphasize on certain variables in our data. The following line plot shows fertility rate: Getting ready We will use the googleVis package to generate a line plot. How to do it… In order to construct a line chart, we will install and load the googleVis package in R. 
We would also import the fertility data using the read.csv() function: install.packages("googleVis") library(googleVis) frt = read.csv("fertility.csv", header = TRUE, sep =",") The fertility data is downloaded from the OECD website. We can construct our line object using the gvisLineChart() function: gvisLineChart(frt, xvar = "Year","yvar=c("Australia","Austria","Belgium","Canada","Chile","OECD34"), options = list( width = 1100, height= 500, backgroundColor = " "#FFFF99",title ="Fertility Rate in OECD countries" , vAxis = "{title : 'Total Fertility " Rate',gridlines:{color:'#DEDECE',count : 4}, ticks : "   [0,1,2,3,4]}", series = "{0:{color:'black', visibleInLegend :false},        1:{color:'BDBD9D', visibleInLegend :false},        2:{color:'BDBD9D', visibleInLegend :false},            3:{color:'BDBD9D', visibleInLegend :false},           4:{color:'BDBD9D', visibleInLegend :false},          34:{color:'3333FF', visibleInLegend :true}}")) We can construct the visualization using the plot() function in R: plot(line) How it works… The first three arguments of the gvisLineChart() function are the data and the name of the columns to be plotted on the x-axis and y-axis. The options argument lists the chart API options to add and modify elements of a chart. For the purpose of this recipe, we will use part of the dataset. Hence, while we assign the series to be plotted under yvar = c(), we will specify the column names that we would like to be plotted in our chart. Note that the series starts at 0, and hence Australia, which is the first column, is in fact series 0 and not 1. For the purpose of this exercise, let's assume that we would like to demonstrate the mean fertility rate among all OECD economies to our audience. We can achieve this using series {} under option = list(). The series argument will allow us to specify or customize a specific series in our dataset. Under the gvisLineChart() function, we instruct the Google Chart API to color OECD series (series 34) and Australia (series 0) with a different color and also make the legend visible only for OECD and not the entire series. It would be best to display all the legends but we use this to show the flexibility that comes with the Google Chart API. Finally, we can use the plot() function to plot the chart in a browser. The following screenshot displays a part of the data. The dim() function gives us a general idea about the dimensions of the fertility data: New York Times Visualization often combines line plots with bar chart and pie charts. Readers should try constructing such visualization. We can use the gvisMerge() function to merge plots. The function allows merging of just two plots and hence the readers would have to use multiple gvisMerge() functions to create a very similar visualization. The same can also be constructed in R but we will lose the interactive element. See also The OECD website provides economic data related to OECD member countries. The data can be freely downloaded from the website http://www.oecd.org/statistics/. New York Times Visualization combines bar charts and line charts and can be accessed at http://www.nytimes.com/imagepages/2009/10/16/business/20091017_CHARTS_GRAPHIC.html. Line plot to tell an effective story In the previous recipe, we learned how to plot a very basic line plot and use some of the options. In this recipe, we will go a step further and make use of specific visual cues such as color and line width for easy interpretation. Line charts are a great tool to visualize time series data. 
The fertility data is discrete, but connecting the points over time provides our audience with a direction. The visualization shows the amazing progress countries such as Mexico and Turkey have achieved in reducing their fertility rate. The OECD defines the fertility rate as the number of children that would be born per woman, assuming no female mortality at child-bearing ages and the age-specific fertility rates of a specified country and reference period.

Line plots have been widely used by New York Times to create very interesting infographics, and this recipe is inspired by one of the New York Times visualizations. It is very important to understand that many of the infographics created by professionals are built using D3.js or Processing. We will not go into the details of these tools, but it is good to know how they work and how they can be used to create visualizations.

Getting ready

We need to install and load the googleVis package to construct a line chart.

How to do it…

To generate an interactive plot, we will load the fertility data in R using the read.csv() function. To generate a line chart that plots the entire dataset, we will use the gvisLineChart() function:

line = gvisLineChart(frt, xvar = "Year",
  yvar = c("Australia","Austria","Belgium","Canada","Chile","Czech.Republic",
    "Denmark","Estonia","Finland","France","Germany","Greece","Hungary",
    "Iceland","Ireland","Israel","Italy","Japan","Korea","Luxembourg","Mexico",
    "Netherlands","New.Zealand","Norway","Poland","Portugal","Slovakia","Slovenia",
    "Spain","Sweden","Switzerland","Turkey","United.Kingdom","United.States","OECD34"),
  options = list(width = 1200, backgroundColor = "#ADAD85",
    title = "Fertility Rate in OECD countries",
    vAxis = "{gridlines : {color:'#DEDECE', count : 3}, ticks : [0,1,2,3,4]}",
    series = "{0:{color:'BDBD9D', visibleInLegend :false},
      20:{color:'009933', visibleInLegend :true},
      31:{color:'996600', visibleInLegend :true},
      34:{color:'3333FF', visibleInLegend :true}}"))

To display our visualization in a new browser, we use the generic R plot() function:

plot(line)

How it works…

The arguments passed to the gvisLineChart() function are exactly the same as those discussed under the simple line plot, with some minor changes. We would like to plot the entire data for this exercise, and hence we have to state all the column names in yvar = c(). Also, we would like to color all the series with the same color but highlight Mexico, Turkey, and the OECD average. We achieve this in the previous code using the series option, where we further specify and customize colors and legend visibility for specific countries.

In this particular plot, we have made use of the same color for all the economies but have highlighted Mexico and Turkey to signify the development and growth that took place in the 5-year period. It would also be effective if our audience could compare the OECD average with Mexico and Turkey; this provides the audience with a benchmark they can compare against. If we plot all the legends, it may make the plot too crowded, and 34 legends may not make a very attractive plot. We avoid this by making only specific legends visible.

See also

D3 is a great tool to develop interactive visualizations and can be accessed at http://d3js.org/.

Processing is open source software developed at MIT and can be downloaded from https://processing.org/.

A good resource to pick colors and use them in our plots is http://www.w3schools.com/tags/ref_colorpicker.asp.
I have used New York Times infographics as an inspiration for this plot. You can find a collection of visualizations put out by New York Times in 2011 at http://www.smallmeans.com/new-york-times-infographics/.

Merging histograms

Histograms help in studying the underlying distribution. They are even more useful when we are trying to compare more than one histogram on the same plot; this provides us with greater insight into the skewness and the overall distribution. In this recipe, we will study how to plot a histogram using the googleVis package and how to merge more than one histogram on the same page. We will only merge two plots, but we can merge more plots and adjust the width of each plot. This makes it easier to compare all the plots on the same page. The following plot shows two merged histograms:

How to do it…

In order to generate a histogram, we will install the googleVis package as well as load the same in R:

install.packages("googleVis")
library(googleVis)

We have downloaded the prices of two different stocks and have calculated their daily returns over the entire period. We can load the data in R using the read.csv() function. Our main aim in this recipe is to plot two different histograms and place them side by side in a browser. Hence, we need to divide our data into three different data frames. For the purpose of this recipe, we will plot the aapl and msft data frames:

stk = read.csv("stock_cor.csv", header = TRUE, sep = ",")
aapl = data.frame(stk$AAPL)
msft = data.frame(stk$MSFT)
googl = data.frame(stk$GOOGL)

To generate the histograms, we implement the gvisHistogram() function:

al = gvisHistogram(aapl, options = list(histogram = "{bucketSize : 1}",
  legend = "none", title = "Distribution of AAPL Returns", width = 500,
  hAxis = "{showTextEvery: 5, title: 'Returns'}",
  vAxis = "{gridlines : {count:4}, title : 'Frequency'}"))

mft = gvisHistogram(msft, options = list(histogram = "{bucketSize : 1}",
  legend = "none", title = "Distribution of MSFT Returns", width = 500,
  hAxis = "{showTextEvery: 5, title: 'Returns'}",
  vAxis = "{gridlines : {count:4}, title : 'Frequency'}"))

We combine the two gvis objects in one browser using the gvisMerge() function:

mrg = gvisMerge(al, mft, horizontal = TRUE)
plot(mrg)

How it works…

The data.frame() function is used to construct a data frame in R. We require this step as we do not want to plot all three histograms on the same plot. Note the use of the $ notation in the data.frame() function.

The first argument in the gvisHistogram() function is our data stored as a data frame. We can display the individual histograms using the plot(al) and plot(mft) functions, but in this recipe, we will plot the final merged output. We observe that most of the attributes of the histogram function are the same as those discussed in previous recipes. The histogram functionality uses an algorithm to create buckets, but we can control this using bucketSize, as in histogram = "{bucketSize : 1}". Try using different bucket sizes and observe how the buckets in the histograms change. More options related to histograms can be found at the following link under the Controlling Buckets section:

https://developers.google.com/chart/interactive/docs/gallery/histogram#Buckets

We have utilized showTextEvery, which is also very specific to histograms. This option allows us to specify how many horizontal axis labels we would like to show. We have used 5 to make the histogram more compact.
Our main objective is to observe the distribution, and the plot serves our purpose. Finally, we implement plot() to display the chart in our favorite browser. We follow the same steps to plot the return distribution of Microsoft (MSFT). Now, we would like to place both plots side by side and view the differences in the distributions. We use the gvisMerge() function to generate the histograms side by side. In our recipe, we have two plots, for AAPL and MSFT. The default setting plots each chart vertically, but we can specify horizontal = TRUE to plot the charts horizontally.

Making an interactive bubble plot

My first encounter with a bubble plot was while watching a TED video of Hans Rosling. The video led me to search for ways of creating bubble plots in R; a very good introduction to this is available on the Flowing Data website. The advantage of a bubble plot is that it allows us to visualize a third variable, which in our case is the size of the bubble. In this recipe, I have made use of the googleVis package to plot a bubble plot, but you can also implement this in base R. The advantage of the Google Chart API is the interactivity and the ease with which the charts can be attached to a web page. Also note that we could use squares instead of circles, but this is not implemented in the Google Chart API yet. In order to implement a bubble plot, I have downloaded the crime dataset by state. The details regarding the link and definition of the crime data are available in the crime.txt file and are shown in the following screenshot:

How to do it…

As with all the plots in this article, we will install and load the googleVis package. We will also import our data file in R using the read.csv() function:

crm = read.csv("crimeusa.csv", header = TRUE, sep = ",")

We can construct our bubble chart using the gvisBubbleChart() function in R:

bub1 = gvisBubbleChart(crm, idvar = "States", xvar = "Robbery", yvar = "Burglary",
  sizevar = "Population", colorvar = "Year",
  options = list(legend = "none", width = 900, height = 600,
    title = "Crime per State in 2012",
    sizeAxis = "{maxSize : 40, minSize : 0.5}",
    vAxis = "{title : 'Burglary'}", hAxis = "{title : 'Robbery'}"))

bub2 = gvisBubbleChart(crm, idvar = "States", xvar = "Robbery", yvar = "Burglary",
  sizevar = "Population",
  options = list(legend = "none", width = 900, height = 600,
    title = "Crime per State in 2012",
    sizeAxis = "{maxSize : 40, minSize : 0.5}",
    vAxis = "{title : 'Burglary'}", hAxis = "{title : 'Robbery'}"))

How it works…

The gvisBubbleChart() function uses six attributes to create a bubble chart, which are as follows:

data: This is the data defined as a data frame, in our example, crm
idvar: This is the vector that is used to assign IDs to the bubbles, in our example, States
xvar: This is the column in the data to plot on the x-axis, in our example, Robbery
yvar: This is the column in the data to plot on the y-axis, in our example, Burglary
sizevar: This is the column used to define the size of the bubble
colorvar: This is the column used to define the color

We can define the minimum and maximum sizes of each bubble using minSize and maxSize, respectively, under options(). Note that we have used gvisMerge to portray the differences between the two bubble plots. In the plot on the right, we have not made use of colorvar, and hence all the bubbles are of the same color.

There's more…

The Google Chart API makes it easier for us to plot a bubble chart, but the same can be achieved using the basic R plot functions. We can make use of the symbols() function to create such a plot; a minimal base-R sketch is given at the end of this article.
The symbols need not be a bubble; it can be a square as well. By this time, you should have watched Hans' TED lecture and would be wondering how you could create a motion chart with bubbles floating around. The Google Charts API has the ability to create motion charts and the readers can definitely use the googleVis reference manual to learn about this. See also TED video by Hans Rosling can be accessed at http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen The Flowing Data website generates bubble charts using the basic R plot function and can be accessed at http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/ Animated Bubble Chart by New York Times can be accessed at http://2010games.nytimes.com/medals/map.html Summary This article introduces some of the basic R plots, such as line and bar charts. It also discusses the basic elements of interactive plots using the googleVis package in R. This article is a great resource for understanding the basic R plotting techniques. Resources for Article: Further resources on this subject: Using R for Statistics, Research, and Graphics [article] Data visualization [article] Visualization as a Tool to Understand Data [article]
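As a follow-up to the There's more… note above, here is a minimal base-R sketch of the symbols() approach. The column names (States, Robbery, Burglary, and Population) mirror the crime recipe, but the few rows of data below are made up purely for illustration, so treat this as a sketch rather than a reproduction of the actual crime dataset:

# hypothetical stand-in for the crimeusa.csv data used in the recipe
crm = data.frame(States = c("StateA", "StateB", "StateC"),
                 Robbery = c(120, 300, 80),
                 Burglary = c(400, 900, 250),
                 Population = c(5e6, 12e6, 2e6))
# scale the circles by the square root of population so that the area of
# each bubble, rather than its radius, reflects the third variable
symbols(crm$Robbery, crm$Burglary,
        circles = sqrt(crm$Population / pi), inches = 0.35,
        fg = "white", bg = "#e34a33",
        xlab = "Robbery", ylab = "Burglary", main = "Crime per State")
# label each bubble with the state name
text(crm$Robbery, crm$Burglary, labels = crm$States, cex = 0.7)

Replacing circles with squares = sqrt(crm$Population) would draw squares instead, which is the variation mentioned in the recipe.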
Warming Up

Packt
06 Feb 2015
11 min read
In this article by Bater Makhabel, author of Learning Data Mining with R, you will learn basic data mining terms such as data definition, preprocessing, and so on. (For more resources related to this topic, see here.) The most important data mining algorithms will be illustrated with R to help you grasp the principles quickly, including but not limited to, classification, clustering, and outlier detection. Before diving right into data mining, let's have a look at the topics we'll cover: Data mining Social network mining In the history of humankind, the results of data from every aspect is extensive, for example websites, social networks by user's e-mail or name or account, search terms, locations on map, companies, IP addresses, books, films, music, and products. Data mining techniques can be applied to any kind of old or emerging data; each data type can be best dealt with using certain, but not all, techniques. In other words, the data mining techniques are constrained by data type, size of the dataset, context of the tasks applied, and so on. Every dataset has its own appropriate data mining solutions. New data mining techniques always need to be researched along with new data types once the old techniques cannot be applied to it or if the new data type cannot be transformed onto the traditional data types. The evolution of stream mining algorithms applied to Twitter's huge source set is one typical example. The graph mining algorithms developed for social networks is another example. The most popular and basic forms of data are from databases, data warehouses, ordered/sequence data, graph data, text data, and so on. In other words, they are federated data, high dimensional data, longitudinal data, streaming data, web data, numeric, categorical, or text data. Big data Big data is large amount of data that does not fit in the memory of a single machine. In other words, the size of data itself becomes a part of the issue when studying it. Besides volume, two other major characteristics of big data are variety and velocity; these are the famous three Vs of big data. Velocity means data process rate or how fast the data is being processed. Variety denotes various data source types. Noises arise more frequently in big data source sets and affect the mining results, which require efficient data preprocessing algorithms. As a result, distributed filesystems are used as tools for successful implementation of parallel algorithms on large amounts of data; it is a certainty that we will get even more data with each passing second. Data analytics and visualization techniques are the primary factors of the data mining tasks related to massive data. Some data types that are important to big data are as follows: The data from the camera video, which includes more metadata for analysis to expedite crime investigations, enhanced retail analysis, military intelligence, and so on. The second data type is from embedded sensors, such as medical sensors, to monitor any potential outbreaks of virus. The third data type is from entertainment, information freely published through social media by anyone. The last data type is consumer images, aggregated from social media, and tagging on these like images are important. Here is a table illustrating the history of data size growth. It shows that information will be more than double every two years, changing the way researchers or companies manage and extract value through data mining techniques from data, revealing new data mining studies. 
Year        Data size    Comments
N/A                      1 MB (megabyte) = 2^20. The human brain holds about 200 MB of information.
N/A                      1 PB (petabyte) = 2^50. It is similar to the size of three years' observation data of Earth by NASA and is equivalent to 70.8 times the books in America's Library of Congress.
1999        1 EB         1 EB (exabyte) = 2^60. The world produced 1.5 EB of unique information.
2007        281 EB       The world produced about 281 exabytes of unique information.
2011        1.8 ZB       1 ZB (zettabyte) = 2^70. This is all the data gathered by human beings in 2011.
Very soon                1 YB (yottabyte) = 2^80.

Scalability and efficiency

Efficiency, scalability, performance, optimization, and the ability to perform in real time are important issues for almost any algorithm, and it is the same for data mining. These are always necessary metrics or benchmark factors of data mining algorithms. As the amount of data continues to grow, keeping data mining algorithms effective and scalable is necessary to extract information effectively from massive datasets in many data repositories or data streams. The shift of data storage from a single machine to wide distribution, the huge size of many datasets, and the computational complexity of the data mining methods are all factors that drive the development of parallel and distributed data-intensive mining algorithms.

Data source

Data serves as the input for the data mining system, and so data repositories are important. In an enterprise environment, databases and logfiles are common sources. In web data mining, web pages are the source of data. Data continuously fetched from various sensors is also a typical data source. Here are some free online data sources that are particularly helpful for learning about data mining:

Frequent Itemset Mining Dataset Repository: A repository with datasets for methods that find frequent itemsets (http://fimi.ua.ac.be/data/).

UCI Machine Learning Repository: This is a collection of datasets, suitable for classification tasks (http://archive.ics.uci.edu/ml/).

The Data and Story Library at StatLib: DASL (pronounced "dazzle") is an online library of data files and stories that illustrate the use of basic statistical methods. It aims to provide data on a wide variety of topics so that statistics teachers can find real-world examples that will be interesting to their students, and it offers a powerful search engine to locate the story or data file of interest (http://lib.stat.cmu.edu/DASL/).

WordNet: This is a lexical database for English (http://wordnet.princeton.edu).

Data mining

Data mining is the discovery of a model in data; it is also called exploratory data analysis, and it discovers useful, valid, unexpected, and understandable knowledge from the data. Some goals are shared with other sciences, such as statistics, artificial intelligence, machine learning, and pattern recognition. Data mining has frequently been treated as an algorithmic problem in most cases. Clustering, classification, association rule learning, anomaly detection, regression, and summarization are all tasks belonging to data mining.

The data mining methods can be summarized into two main categories of data mining problems: feature extraction and summarization.

Feature extraction

This is to extract the most prominent features of the data and ignore the rest. Here are some examples:

Frequent itemsets: This model makes sense for data that consists of baskets of small sets of items.
Similar items: Sometimes your data looks like a collection of sets and the objective is to find pairs of sets that have a relatively large fraction of their elements in common. It's a fundamental problem of data mining. Summarization The target is to summarize the dataset succinctly and approximately, such as clustering, which is the process of examining a collection of points (data) and grouping the points into clusters according to some measure. The goal is that points in the same cluster have a small distance from one another, while points in different clusters are at a large distance from one another. The data mining process There are two popular processes to define the data mining process in different perspectives, and the more widely adopted one is CRISP-DM: Cross-Industry Standard Process for Data Mining (CRISP-DM) Sample, Explore, Modify, Model, Assess (SEMMA), which was developed by the SAS Institute, USA CRISP-DM There are six phases in this process that are shown in the following figure; it is not rigid, but often has a great deal of backtracking: Let's look at the phases in detail: Business understanding: This task includes determining business objectives, assessing the current situation, establishing data mining goals, and developing a plan. Data understanding: This task evaluates data requirements and includes initial data collection, data description, data exploration, and the verification of data quality. Data preparation: Once available, data resources are identified in the last step. Then, the data needs to be selected, cleaned, and then built into the desired form and format. Modeling: Visualization and cluster analysis are useful for initial analysis. The initial association rules can be developed by applying tools such as generalized rule induction. This is a data mining technique to discover knowledge represented as rules to illustrate the data in the view of causal relationship between conditional factors and a given decision/outcome. The models appropriate to the data type can also be applied. Evaluation :The results should be evaluated in the context specified by the business objectives in the first step. This leads to the identification of new needs and in turn reverts to the prior phases in most cases. Deployment: Data mining can be used to both verify previously held hypotheses or for knowledge. SEMMA Here is an overview of the process for SEMMA: Let's look at these processes in detail: Sample: In this step, a portion of a large dataset is extracted Explore: To gain a better understanding of the dataset, unanticipated trends and anomalies are searched in this step Modify: The variables are created, selected, and transformed to focus on the model construction process Model: A variable combination of models is searched to predict a desired outcome Assess: The findings from the data mining process are evaluated by its usefulness and reliability Social network mining As we mentioned before, data mining finds a model on data and the mining of social network finds the model on graph data in which the social network is represented. Social network mining is one application of web data mining; the popular applications are social sciences and bibliometry, PageRank and HITS, shortcomings of the coarse-grained graph model, enhanced models and techniques, evaluation of topic distillation, and measuring and modeling the Web. Social network When it comes to the discussion of social networks, you will think of Facebook, Google+, LinkedIn, and so on. 
The essential characteristics of a social network are as follows: There is a collection of entities that participate in the network. Typically, these entities are people, but they could be something else entirely. There is at least one relationship between the entities of the network. On Facebook, this relationship is called friends. Sometimes, the relationship is all-or-nothing; two people are either friends or they are not. However, in other examples of social networks, the relationship has a degree. This degree could be discrete, for example, friends, family, acquaintances, or none as in Google+. It could be a real number; an example would be the fraction of the average day that two people spend talking to each other. There is an assumption of nonrandomness or locality. This condition is the hardest to formalize, but the intuition is that relationships tend to cluster. That is, if entity A is related to both B and C, then there is a higher probability than average that B and C are related. Here are some varieties of social networks: Telephone networks: The nodes in this network are phone numbers and represent individuals E-mail networks: The nodes represent e-mail addresses, which represent individuals Collaboration networks: The nodes here represent individuals who published research papers; the edge connecting two nodes represent two individuals who published one or more papers jointly Social networks are modeled as undirected graphs. The entities are the nodes, and an edge connects two nodes if the nodes are related by the relationship that characterizes the network. If there is a degree associated with the relationship, this degree is represented by labeling the edges. Here is an example in which Coleman's High School Friendship Data from the sna R package is used for analysis. The data is from a research on friendship ties between 73 boys in a high school in one chosen academic year; reported ties for all informants are provided for two time points (fall and spring). The dataset's name is coleman, which is an array type in R language. The node denotes a specific student and the line represents the tie between two students. Summary The book has, as showcased in this article, a lot more interesting coverage with regard to data mining and R. Deep diving into the algorithms associated with data mining and efficient methods to implement them using R. Resources for Article: Further resources on this subject: Multiplying Performance with Parallel Computing [article] Supervised learning [article] Using R for Statistics, Research, and Graphics [article]
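Since the coleman friendship data mentioned above ships with the sna package, a few lines of R are enough to load and draw it. This is a minimal sketch that assumes the sna package is installed and exposes the coleman dataset and the gplot() function as described in the text:

# load the sna package and Coleman's high-school friendship data
library(sna)
data(coleman)          # a 2 x 73 x 73 array: fall and spring adjacency matrices
fall = coleman[1, , ]  # ties reported in the fall
# draw the fall friendship network; isolated boys are hidden to reduce clutter
gplot(fall, gmode = "digraph", displayisolates = FALSE,
      vertex.col = "grey30", edge.col = "grey70")

Each node is a student and each edge is a reported friendship tie, which matches the description of the dataset given earlier in this section.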
PostgreSQL Cookbook - High Availability and Replication

Packt
06 Feb 2015
26 min read
In this article by Chitij Chauhan, author of the book PostgreSQL Cookbook, we will talk about various high availability and replication solutions, including some popular third-party replication tools such as Slony-I and Londiste. In this article, we will cover the following recipes: Setting up hot streaming replication Replication using Slony-I Replication using Londiste The important components for any production database is to achieve fault tolerance, 24/7 availability, and redundancy. It is for this purpose that we have different high availability and replication solutions available for PostgreSQL. From a business perspective, it is important to ensure 24/7 data availability in the event of a disaster situation or a database crash due to disk or hardware failure. In such situations, it becomes critical to ensure that a duplicate copy of the data is available on a different server or a different database, so that seamless failover can be achieved even when the primary server/database is unavailable. Setting up hot streaming replication In this recipe, we are going to set up a master-slave streaming replication. Getting ready For this exercise, you will need two Linux machines, each with the latest version of PostgreSQL installed. We will be using the following IP addresses for the master and slave servers: Master IP address: 192.168.0.4 Slave IP address: 192.168.0.5 Before you start with the master-slave streaming setup, it is important that the SSH connectivity between the master and slave is setup. How to do it... Perform the following sequence of steps to set up a master-slave streaming replication: First, we are going to create a user on the master, which will be used by the slave server to connect to the PostgreSQL database on the master server: psql -c "CREATE USER repuser REPLICATION LOGIN ENCRYPTED PASSWORD 'charlie';" Next, we will allow the replication user that was created in the previous step to allow access to the master PostgreSQL server. This is done by making the necessary changes as mentioned in the pg_hba.conf file: Vi pg_hba.conf host   replication   repuser   192.168.0.5/32   md5 In the next step, we are going to configure parameters in the postgresql.conf file. These parameters need to be set in order to get the streaming replication working: Vi /var/lib/pgsql/9.3/data/postgresql.conf listen_addresses = '*' wal_level = hot_standby max_wal_senders = 3 wal_keep_segments = 8 archive_mode = on       archive_command = 'cp %p /var/lib/pgsql/archive/%f && scp %p postgres@192.168.0.5:/var/lib/pgsql/archive/%f' checkpoint_segments = 8 Once the parameter changes have been made in the postgresql.conf file in the previous step, the next step will be to restart the PostgreSQL server on the master server, in order to let the changes take effect: pg_ctl -D /var/lib/pgsql/9.3/data restart Before the slave can replicate the master, we will need to give it the initial database to build off. For this purpose, we will make a base backup by copying the primary server's data directory to the standby. 
The rsync command needs to be run as the root user:

psql -U postgres -h 192.168.0.4 -c "SELECT pg_start_backup('label', true)"
rsync -a /var/lib/pgsql/9.3/data/ 192.168.0.5:/var/lib/pgsql/9.3/data/ --exclude postmaster.pid
psql -U postgres -h 192.168.0.4 -c "SELECT pg_stop_backup()"

Once the data directory, mentioned in the previous step, is populated, the next step is to enable the following parameter in the postgresql.conf file on the slave server:

hot_standby = on

The next step will be to copy the recovery.conf.sample file to the $PGDATA location on the slave server and then configure the following parameters:

cp /usr/pgsql-9.3/share/recovery.conf.sample /var/lib/pgsql/9.3/data/recovery.conf

standby_mode = on
primary_conninfo = 'host=192.168.0.4 port=5432 user=repuser password=charlie'
trigger_file = '/tmp/trigger.replication'
restore_command = 'cp /var/lib/pgsql/archive/%f "%p"'

The next step will be to start the slave server:

service postgresql-9.3 start

Now that the aforementioned replication steps are set up, we will test for replication. On the master server, log in and issue the following SQL commands:

psql -h 192.168.0.4 -d postgres -U postgres -W

postgres=# create database test;

postgres=# \c test

test=# create table testtable ( testint int, testchar varchar(40) );
CREATE TABLE

test=# insert into testtable values ( 1, 'What A Sight.' );
INSERT 0 1

On the slave server, we will now check whether the newly created database and the corresponding table, created in the previous step, have been replicated:

psql -h 192.168.0.5 -d test -U postgres -W

test=# select * from testtable;
 testint |   testchar
---------+---------------------------
       1 | What A Sight.
(1 row)

How it works...

The following is the explanation for the steps performed in the preceding section.

In the initial step of the preceding section, we create a user called repuser, which will be used by the slave server to make a connection to the primary server. In the second step of the preceding section, we make the necessary changes in the pg_hba.conf file to allow the master server to be accessed by the slave server using the repuser user ID that was created in step 1. We then make the necessary parameter changes on the master in step 3 of the preceding section to configure streaming replication. The following is a description of these parameters:

listen_addresses: This parameter is used to provide the IP address associated with the interface that you want to have PostgreSQL listen to. A value of * indicates all available IP addresses.

wal_level: This parameter determines the level of WAL logging done. Specify hot_standby for streaming replication.

wal_keep_segments: This parameter specifies the number of 16 MB WAL files to be retained in the pg_xlog directory. The rule of thumb is that more such files might be required to handle a large checkpoint.

archive_mode: Setting this parameter enables completed WAL segments to be sent to the archive storage.

archive_command: This parameter is basically a shell command that is executed whenever a WAL segment is completed. In our case, we are copying the file to the local machine and then using the secure copy command to send it across to the slave.

max_wal_senders: This parameter specifies the total number of concurrent connections allowed from the slave servers.

checkpoint_segments: This parameter specifies the maximum number of logfile segments between automatic WAL checkpoints.
Once the necessary configuration changes have been made on the master server, we then restart the PostgreSQL server on the master in order to let the new configuration changes take effect. This is done in step 4 of the preceding section. In step 5 of the preceding section, we are basically building the slave by copying the primary server's data directory to the slave. Now, with the data directory available on the slave, the next step is to configure it. We will now make the necessary parameter replication related parameter changes on the slave in the postgresql.conf directory on the slave server. We set the following parameters on the slave: hot_standby: This parameter determines whether you can connect and run queries when the server is in the archive recovery or standby mode. In the next step, we are configuring the recovery.conf file. This is required to be set up so that the slave can start receiving logs from the master. The parameters explained next are configured in the recovery.conf file on the slave. standby_mode: This parameter, when enabled, causes PostgreSQL to work as a standby in a replication configuration. primary_conninfo: This parameter specifies the connection information used by the slave to connect to the master. For our scenario, our master server is set as 192.168.0.4 on port 5432 and we are using the repuser userid with the password charlie to make a connection to the master. Remember that repuser was the userid which was created in the initial step of the preceding section for this purpose, that is, connecting to the master from the slave. trigger_file: When a slave is configured as a standby, it will continue to restore the XLOG records from the master. The trigger_file parameter specifies what is used to trigger a slave, in order to switch over its duties from standby and take over as master or primary server. At this stage, the slave is fully configured now and we can start the slave server; then, the replication process begins. This is shown in step 8 of the preceding section. In steps 9 and 10 of the preceding section, we are simply testing our replication. We first begin by creating a test database, then we log in to the test database and create a table by the name testtable, and then we begin inserting some records into the testtable table. Now, our purpose is to see whether these changes are replicated across the slave. To test this, we log in to the slave on the test database and then query the records from the testtable table, as seen in step 10 of the preceding section. The final result that we see is that all the records that are changed/inserted on the primary server are visible on the slave. This completes our streaming replication's setup and configuration. Replication using Slony-I Here, we are going to set up replication using Slony-I. We will be setting up the replication of table data between two databases on the same server. Getting ready The steps performed in this recipe are carried out on a CentOS Version 6 machine. It is also important to remove the directives related to hot streaming replication prior to setting up replication using Slony-I. We will first need to install Slony-I. The following steps need to be performed in order to install Slony-I: First, go to http://slony.info/downloads/2.2/source/ and download the given software. Once you have downloaded the Slony-I software, the next step is to unzip the .tar file and then go the newly created directory. 
Before doing this, please ensure that you have the postgresql-devel package for the corresponding PostgreSQL version installed before you install Slony-I: tar xvfj slony1-2.2.3.tar.bz2  cd slony1-2.2.3 In the next step, we are going to configure, compile, and build the software: ./configure --with-pgconfigdir=/usr/pgsql-9.3/bin/ make make install How to do it... You need to perform the following sequence of steps, in order to replicate data between two tables using Slony-I replication: First, start the PostgreSQL server if you have not already started it: pg_ctl -D $PGDATA start In the next step, we will be creating two databases, test1 and test2, which will be used as the source and target databases respectively: createdb test1 createdb test2 In the next step, we will create the t_test table on the source database, test1, and insert some records into it: psql -d test1 test1=# create table t_test (id numeric primary key, name varchar);   test1=# insert into t_test values(1,'A'),(2,'B'), (3,'C'); We will now set up the target database by copying the table definitions from the test1 source database: pg_dump -s -p 5432 -h localhost test1 | psql -h localhost -p 5432 test2 We will now connect to the target database, test2, and verify that there is no data in the tables of the test2 database: psql -d test2 test2=# select * from t_test; We will now set up a slonik script for the master-slave, that is source/target, setup. In this scenario, since we are replicating between two different databases on the same server, the only different connection string option will be the database name: cd /usr/pgsql-9.3/bin vi init_master.slonik   #!/bin/sh cluster name = mycluster; node 1 admin conninfo = 'dbname=test1 host=localhost port=5432 user=postgres password=postgres'; node 2 admin conninfo = 'dbname=test2 host=localhost port=5432 user=postgres password=postgres'; init cluster ( id=1); create set (id=1, origin=1); set add table(set id=1, origin=1, id=1, fully qualified name = 'public.t_test'); store node (id=2, event node = 1); store path (server=1, client=2, conninfo='dbname=test1 host=localhost port=5432 user=postgres password=postgres'); store path (server=2, client=1, conninfo='dbname=test2 host=localhost port=5432 user=postgres password=postgres'); store listen (origin=1, provider = 1, receiver = 2);  store listen (origin=2, provider = 2, receiver = 1); We will now create a slonik script for subscription to the slave, that is, target: cd /usr/pgsql-9.3/bin vi init_slave.slonik #!/bin/sh cluster name = mycluster; node 1 admin conninfo = 'dbname=test1 host=localhost port=5432 user=postgres password=postgres'; node 2 admin conninfo = 'dbname=test2 host=localhost port=5432 user=postgres password=postgres'; subscribe set ( id = 1, provider = 1, receiver = 2, forward = no); We will now run the init_master.slonik script created in step 6 and run this on the master, as follows: cd /usr/pgsql-9.3/bin   slonik init_master.slonik We will now run the init_slave.slonik script created in step 7 and run this on the slave, that is, target: cd /usr/pgsql-9.3/bin   slonik init_slave.slonik In the next step, we will start the master slon daemon: nohup slon mycluster "dbname=test1 host=localhost port=5432 user=postgres password=postgres" & In the next step, we will start the slave slon daemon: nohup slon mycluster "dbname=test2 host=localhost port=5432 user=postgres password=postgres" & Next, we will connect to the master, that is, the test1 source database, and insert some records in the t_test table: psql -d test1 
test1=# insert into t_test values (5,'E'); We will now test for the replication by logging on to the slave, that is, the test2 target database, and see whether the inserted records in the t_test table are visible: psql -d test2   test2=# select * from t_test; id | name ----+------ 1 | A 2 | B 3 | C 5 | E (4 rows) How it works... We will now discuss the steps performed in the preceding section: In step 1, we first start the PostgreSQL server if it is not already started. In step 2, we create two databases, namely test1 and test2, that will serve as our source (master) and target (slave) databases. In step 3, we log in to the test1 source database, create a t_test table, and insert some records into the table. In step 4, we set up the target database, test2, by copying the table definitions present in the source database and loading them into test2 using the pg_dump utility. In step 5, we log in to the target database, test2, and verify that there are no records present in the t_test table because in step 4, we only extracted the table definitions into the test2 database from the test1 database. In step 6, we set up a slonik script for the master-slave replication setup. In the init_master.slonik file, we first define the cluster name as mycluster. We then define the nodes in the cluster. Each node will have a number associated with a connection string, which contains database connection information. The node entry is defined both for the source and target databases. The store_path commands are necessary, so that each node knows how to communicate with the other. In step 7, we set up a slonik script for the subscription of the slave, that is, the test2 target database. Once again, the script contains information such as the cluster name and the node entries that are designated a unique number related to connection string information. It also contains a subscriber set. In step 8, we run the init_master.slonik file on the master. Similarly, in step 9, we run the init_slave.slonik file on the slave. In step 10, we start the master slon daemon. In step 11, we start the slave slon daemon. The subsequent steps, 12 and 13, are used to test for replication. For this purpose, in step 12 of the preceding section, we first log in to the test1 source database and insert some records into the t_test table. To check whether the newly inserted records have been replicated in the target database, test2, we log in to the test2 database in step 13. The result set obtained from the output of the query confirms that the changed/inserted records on the t_test table in the test1 database are successfully replicated across the target database, test2. For more information on Slony-I replication, go to http://slony.info/documentation/tutorial.html. There's more... If you are using Slony-I for replication between two different servers, in addition to the steps mentioned in the How to do it… section, you will also have to enable authentication information in the pg_hba.conf file existing on both the source and target servers. For example, let's assume that the source server's IP is 192.168.16.44 and the target server's IP is 192.168.16.56 and we are using a user named super to replicate the data. 
If this is the situation, then in the source server's pg_hba.conf file, we will have to enter the information, as follows: host         postgres         super     192.168.16.44/32           md5 Similarly, in the target server's pg_hba.conf file, we will have to enter the authentication information, as follows: host         postgres         super     192.168.16.56/32           md5 Also, in the shell scripts that were used for Slony-I, wherever the connection information for the host is localhost that entry will need to be replaced by the source and target server's IP addresses. Replication using Londiste In this recipe, we are going to show you how to replicate data using Londiste. Getting ready For this setup, we are using the same host CentOS Linux machine to replicate data between two databases. This can also be set up using two separate Linux machines running on VMware, VirtualBox, or any other virtualization software. It is assumed that the latest version of PostgreSQL, version 9.3, is installed. We used CentOS Version 6 as the Linux operating system for this exercise. To set up Londiste replication on the Linux machine, perform the following steps: Go to http://pgfoundry.org/projects/skytools/ and download the latest version of Skytools 3.2, that is, tarball skytools-3.2.tar.gz. Extract the tarball file, as follows: tar -xvzf skytools-3.2.tar.gz Go to the new location and build and compile the software: cd skytools-3.2 ./configure --prefix=/var/lib/pgsql/9.3/Sky –with-pgconfig=/usr/pgsql-9.3/bin/pg_config   make   make install Also, set the PYTHONPATH environment variable, as shown here. Alternatively, you can also set it in the .bash_profile script: export PYTHONPATH=/opt/PostgreSQL/9.2/Sky/lib64/python2.6/site-packages/ How to do it... We are going to perform the following sequence of steps to set up replication between two different databases using Londiste. First, create the two databases between which replication has to occur: createdb node1 createdb node2 Populate the node1 database with data using the pgbench utility: pgbench -i -s 2 -F 80 node1 Add any primary key and foreign keys to the tables in the node1 database that are needed for replication. Create the following .sql file and add the following lines to it: Vi /tmp/prepare_pgbenchdb_for_londiste.sql -- add primary key to history table ALTER TABLE pgbench_history ADD COLUMN hid SERIAL PRIMARY KEY;   -- add foreign keys ALTER TABLE pgbench_tellers ADD CONSTRAINT pgbench_tellers_branches_fk FOREIGN KEY(bid) REFERENCES pgbench_branches; ALTER TABLE pgbench_accounts ADD CONSTRAINT pgbench_accounts_branches_fk FOREIGN KEY(bid) REFERENCES pgbench_branches; ALTER TABLE pgbench_history ADD CONSTRAINT pgbench_history_branches_fk FOREIGN KEY(bid) REFERENCES pgbench_branches; ALTER TABLE pgbench_history ADD CONSTRAINT pgbench_history_tellers_fk FOREIGN KEY(tid) REFERENCES pgbench_tellers; ALTER TABLE pgbench_history ADD CONSTRAINT pgbench_history_accounts_fk FOREIGN KEY(aid) REFERENCES pgbench_accounts; We will now load the .sql file created in the previous step and load it into the database: psql node1 -f /tmp/prepare_pgbenchdb_for_londiste.sql We will now populate the node2 database with table definitions from the tables in the node1 database: pg_dump -s -t 'pgbench*' node1 > /tmp/tables.sql psql -f /tmp/tables.sql node2 Now starts the process of replication. 
We will first create the londiste.ini configuration file with the following parameters in order to set up the root node for the source database, node1: Vi londiste.ini   [londiste3] job_name = first_table db = dbname=node1 queue_name = replication_queue logfile = /home/postgres/log/londiste.log pidfile = /home/postgres/pid/londiste.pid In the next step, we are going to use the londiste.ini configuration file created in the previous step to set up the root node for the node1 database, as shown here: [postgres@localhost bin]$ ./londiste3 londiste3.ini create-root node1 dbname=node1   2014-12-09 18:54:34,723 2335 WARNING No host= in public connect string, bad idea 2014-12-09 18:54:35,210 2335 INFO plpgsql is installed 2014-12-09 18:54:35,217 2335 INFO pgq is installed 2014-12-09 18:54:35,225 2335 INFO pgq.get_batch_cursor is installed 2014-12-09 18:54:35,227 2335 INFO pgq_ext is installed 2014-12-09 18:54:35,228 2335 INFO pgq_node is installed 2014-12-09 18:54:35,230 2335 INFO londiste is installed 2014-12-09 18:54:35,232 2335 INFO londiste.global_add_table is installed 2014-12-09 18:54:35,281 2335 INFO Initializing node 2014-12-09 18:54:35,285 2335 INFO Location registered 2014-12-09 18:54:35,447 2335 INFO Node "node1" initialized for queue "replication_queue" with type "root" 2014-12-09 18:54:35,465 2335 INFO Don We will now run the worker daemon for the root node: [postgres@localhost bin]$ ./londiste3 londiste3.ini worker 2014-12-09 18:55:17,008 2342 INFO Consumer uptodate = 1 In the next step, we will create a slave.ini configuration file in order to make a leaf node for the node2 target database: Vi slave.ini [londiste3] job_name = first_table_slave db = dbname=node2 queue_name = replication_queue logfile = /home/postgres/log/londiste_slave.log pidfile = /home/postgres/pid/londiste_slave.pid We will now initialize the node in the target database: ./londiste3 slave.ini create-leaf node2 dbname=node2 –provider=dbname=node1 2014-12-09 18:57:22,769 2408 WARNING No host= in public connect string, bad idea 2014-12-09 18:57:22,778 2408 INFO plpgsql is installed 2014-12-09 18:57:22,778 2408 INFO Installing pgq 2014-12-09 18:57:22,778 2408 INFO   Reading from /var/lib/pgsql/9.3/Sky/share/skytools3/pgq.sql 2014-12-09 18:57:23,211 2408 INFO pgq.get_batch_cursor is installed 2014-12-09 18:57:23,212 2408 INFO Installing pgq_ext 2014-12-09 18:57:23,213 2408 INFO   Reading from /var/lib/pgsql/9.3/Sky/share/skytools3/pgq_ext.sql 2014-12-09 18:57:23,454 2408 INFO Installing pgq_node 2014-12-09 18:57:23,455 2408 INFO   Reading from /var/lib/pgsql/9.3/Sky/share/skytools3/pgq_node.sql 2014-12-09 18:57:23,729 2408 INFO Installing londiste 2014-12-09 18:57:23,730 2408 INFO   Reading from /var/lib/pgsql/9.3/Sky/share/skytools3/londiste.sql 2014-12-09 18:57:24,391 2408 INFO londiste.global_add_table is installed 2014-12-09 18:57:24,575 2408 INFO Initializing node 2014-12-09 18:57:24,705 2408 INFO Location registered 2014-12-09 18:57:24,715 2408 INFO Location registered 2014-12-09 18:57:24,744 2408 INFO Subscriber registered: node2 2014-12-09 18:57:24,748 2408 INFO Location registered 2014-12-09 18:57:24,750 2408 INFO Location registered 2014-12-09 18:57:24,757 2408 INFO Node "node2" initialized for queue "replication_queue" with type "leaf" 2014-12-09 18:57:24,761 2408 INFO Done We will now launch the worker daemon for the target database, that is, node2: [postgres@localhost bin]$ ./londiste3 slave.ini worker 2014-12-09 18:58:53,411 2423 INFO Consumer uptodate = 1 We will now create the configuration file, that 
is pgqd.ini, for the ticker daemon: vi pgqd.ini   [pgqd] logfile = /home/postgres/log/pgqd.log pidfile = /home/postgres/pid/pgqd.pid Using the configuration file created in the previous step, we will launch the ticker daemon: [postgres@localhost bin]$ ./pgqd pgqd.ini 2014-12-09 19:05:56.843 2542 LOG Starting pgqd 3.2 2014-12-09 19:05:56.844 2542 LOG auto-detecting dbs ... 2014-12-09 19:05:57.257 2542 LOG node1: pgq version ok: 3.2 2014-12-09 19:05:58.130 2542 LOG node2: pgq version ok: 3.2 We will now add all the tables to the replication on the root node: [postgres@localhost bin]$ ./londiste3 londiste3.ini add-table --all 2014-12-09 19:07:26,064 2614 INFO Table added: public.pgbench_accounts 2014-12-09 19:07:26,161 2614 INFO Table added: public.pgbench_branches 2014-12-09 19:07:26,238 2614 INFO Table added: public.pgbench_history 2014-12-09 19:07:26,287 2614 INFO Table added: public.pgbench_tellers Similarly, add all the tables to the replication on the leaf node: [postgres@localhost bin]$ ./londiste3 slave.ini add-table –all We will now generate some traffic on the node1 source database: pgbench -T 10 -c 5 node1 We will now use the compare utility available with the londiste3 command to check the tables in both the nodes; that is, both the source database (node1) and destination database (node2) have the same amount of data: [postgres@localhost bin]$ ./londiste3 slave.ini compare   2014-12-09 19:26:16,421 2982 INFO Checking if node1 can be used for copy 2014-12-09 19:26:16,424 2982 INFO Node node1 seems good source, using it 2014-12-09 19:26:16,425 2982 INFO public.pgbench_accounts: Using node node1 as provider 2014-12-09 19:26:16,441 2982 INFO Provider: node1 (root) 2014-12-09 19:26:16,446 2982 INFO Locking public.pgbench_accounts 2014-12-09 19:26:16,447 2982 INFO Syncing public.pgbench_accounts 2014-12-09 19:26:18,975 2982 INFO Counting public.pgbench_accounts 2014-12-09 19:26:19,401 2982 INFO srcdb: 200000 rows, checksum=167607238449 2014-12-09 19:26:19,706 2982 INFO dstdb: 200000 rows, checksum=167607238449 2014-12-09 19:26:19,715 2982 INFO Checking if node1 can be used for copy 2014-12-09 19:26:19,716 2982 INFO Node node1 seems good source, using it 2014-12-09 19:26:19,716 2982 INFO public.pgbench_branches: Using node node1 as provider 2014-12-09 19:26:19,730 2982 INFO Provider: node1 (root) 2014-12-09 19:26:19,734 2982 INFO Locking public.pgbench_branches 2014-12-09 19:26:19,734 2982 INFO Syncing public.pgbench_branches 2014-12-09 19:26:22,772 2982 INFO Counting public.pgbench_branches 2014-12-09 19:26:22,804 2982 INFO srcdb: 2 rows, checksum=-3078609798 2014-12-09 19:26:22,812 2982 INFO dstdb: 2 rows, checksum=-3078609798 2014-12-09 19:26:22,866 2982 INFO Checking if node1 can be used for copy 2014-12-09 19:26:22,877 2982 INFO Node node1 seems good source, using it 2014-12-09 19:26:22,878 2982 INFO public.pgbench_history: Using node node1 as provider 2014-12-09 19:26:22,919 2982 INFO Provider: node1 (root) 2014-12-09 19:26:22,931 2982 INFO Locking public.pgbench_history 2014-12-09 19:26:22,932 2982 INFO Syncing public.pgbench_history 2014-12-09 19:26:25,963 2982 INFO Counting public.pgbench_history 2014-12-09 19:26:26,008 2982 INFO srcdb: 715 rows, checksum=9467587272 2014-12-09 19:26:26,020 2982 INFO dstdb: 715 rows, checksum=9467587272 2014-12-09 19:26:26,056 2982 INFO Checking if node1 can be used for copy 2014-12-09 19:26:26,063 2982 INFO Node node1 seems good source, using it 2014-12-09 19:26:26,064 2982 INFO public.pgbench_tellers: Using node node1 as provider 2014-12-09 
19:26:26,100 2982 INFO Provider: node1 (root) 2014-12-09 19:26:26,108 2982 INFO Locking public.pgbench_tellers 2014-12-09 19:26:26,109 2982 INFO Syncing public.pgbench_tellers 2014-12-09 19:26:29,144 2982 INFO Counting public.pgbench_tellers 2014-12-09 19:26:29,176 2982 INFO srcdb: 20 rows, checksum=4814381032 2014-12-09 19:26:29,182 2982 INFO dstdb: 20 rows, checksum=4814381032 How it works... The following is an explanation of the steps performed in the preceding section: Initially, in step 1, we create two databases, that is node1 and node2, that are used as the source and target databases, respectively, from a replication perspective. In step 2, we populate the node1 database using the pgbench utility. In step 3 of the preceding section, we add and define the respective primary key and foreign key relationships on different tables and put these DDL commands in a .sql file. In step 4, we execute these DDL commands stated in step 3 on the node1 database; thus, in this way, we force the primary key and foreign key definitions on the tables in the pgbench schema in the node1 database. In step 5, we extract the table definitions from the tables in the pgbench schema in the node1 database and load these definitions in the node2 database. We will now discuss steps 6 to 8 of the preceding section. In step 6, we create the configuration file, which is then used in step 7 to create the root node for the node1 source database. In step 8, we will launch the worker daemon for the root node. Regarding the entries mentioned in the configuration file in step 6, we first define a job that must have a name, so that distinguished processes can be easily identified. Then, we define a connect string with information to connect to the source database, that is node1, and then we define the name of the replication queue involved. Finally, we define the location of the log and pid files. We will now discuss steps 9 to 11 of the preceding section. In step 9, we define the configuration file, which is then used in step 10 to create the leaf node for the target database, that is node2. In step 11, we launch the worker daemon for the leaf node. The entries in the configuration file in step 9 contain the job_name connect string in order to connect to the target database, that is node2, the name of the replication queue involved, and the location of log and pid involved. The key part in step 11 is played by the slave, that is the target database—to find the master or provider, that is source database node1. We will now talk about steps 12 and 13 of the preceding section. In step 12, we define the ticker configuration, with the help of which we launch the ticker process mentioned in step 13. Once the ticker daemon has started successfully, we have all the components and processes setup and needed for replication; however, we have not yet defined what the system needs to replicate. In step 14 and 15, we define the tables to the replication that is set on both the source and target databases, that is node1 and node2, respectively. Finally, we will talk about steps 16 and 17 of the preceding section. Here, at this stage, we are testing the replication that was set up between the node1 source database and the node2 target database. In step 16, we generate some traffic on the node1 source database by running pgbench with five parallel database connections and generating traffic for 10 seconds. In step 17, we check whether the tables on both the source and target databases have the same data. 
For this purpose, we use the compare command on the provider and subscriber nodes and then count and checksum the rows on both sides. A partial output from the preceding section tells you that the data has been successfully replicated between all the tables that are part of the replication set up between the node1 source database and the node2 destination database, as the count and checksum of rows for all the tables on the source and target destination databases are matching: 2014-12-09 19:26:18,975 2982 INFO Counting public.pgbench_accounts 2014-12-09 19:26:19,401 2982 INFO srcdb: 200000 rows, checksum=167607238449 2014-12-09 19:26:19,706 2982 INFO dstdb: 200000 rows, checksum=167607238449   2014-12-09 19:26:22,772 2982 INFO Counting public.pgbench_branches 2014-12-09 19:26:22,804 2982 INFO srcdb: 2 rows, checksum=-3078609798 2014-12-09 19:26:22,812 2982 INFO dstdb: 2 rows, checksum=-3078609798   2014-12-09 19:26:25,963 2982 INFO Counting public.pgbench_history 2014-12-09 19:26:26,008 2982 INFO srcdb: 715 rows, checksum=9467587272 2014-12-09 19:26:26,020 2982 INFO dstdb: 715 rows, checksum=9467587272   2014-12-09 19:26:29,144 2982 INFO Counting public.pgbench_tellers 2014-12-09 19:26:29,176 2982 INFO srcdb: 20 rows, checksum=4814381032 2014-12-09 19:26:29,182 2982 INFO dstdb: 20 rows, checksum=4814381032 Summary This article demonstrates the high availability and replication concepts in PostgreSQL. After reading this chapter, you will be able to implement high availability and replication options using different techniques including streaming replication, Slony-I replication and replication using Longdiste. Resources for Article: Further resources on this subject: Running a PostgreSQL Database Server [article] Securing the WAL Stream [article] Recursive queries [article]

Multiplying Performance with Parallel Computing

Packt
06 Feb 2015
22 min read
In this article, by Aloysius Lim and William Tjhi, authors of the book R High Performance Programming, we will learn how to write and execute parallel R code, where different parts of the code run simultaneously. So far, we have learned various ways to optimize the performance of R programs running serially, that is, in a single process. This does not take full advantage of the computing power of modern CPUs with multiple cores. Parallel computing allows us to tap into all the computational resources available and to speed up the execution of R programs by many times. We will examine the different types of parallelism and how to implement them in R, and we will take a closer look at a few performance considerations when designing the parallel architecture of R programs.

(For more resources related to this topic, see here.)

Data parallelism versus task parallelism

Many modern software applications are designed to run computations in parallel in order to take advantage of the multiple CPU cores available on almost any computer today. Many R programs can similarly be written in order to run in parallel. However, the extent of possible parallelism depends on the computing task involved. On one side of the scale are embarrassingly parallel tasks, where there are no dependencies between the parallel subtasks; such tasks can be made to run in parallel very easily. An example of this is building an ensemble of decision trees in a random forest algorithm—randomized decision trees can be built independently from one another and in parallel across tens or hundreds of CPUs, and can be combined to form the random forest. On the other end of the scale are tasks that cannot be parallelized, as each step of the task depends on the results of the previous step. One such example is a depth-first search of a tree, where the subtree to search at each step depends on the path taken in previous steps. Most algorithms fall somewhere in between, with some steps that must run serially and some that can run in parallel. With this in mind, careful thought must be given when designing parallel code that works correctly and efficiently.

Often an R program has some parts that have to be run serially and other parts that can run in parallel. Before making the effort to parallelize any of the R code, it is useful to have an estimate of the potential performance gains that can be achieved. Amdahl's law provides a way to estimate the best attainable performance gain when you convert code from serial to parallel execution. It divides a computing task into its serial and potentially-parallel parts and states that the time needed to execute the task in parallel will be no less than this formula:

T(n) = T(1)(P + (1-P)/n), where:

- T(n) is the time taken to execute the task using n parallel processes
- P is the proportion of the whole task that is strictly serial

The theoretical best possible speedup of the parallel algorithm is thus:

S(n) = T(1) / T(n) = 1 / (P + (1-P)/n)

For example, given a task that takes 10 seconds to execute on one processor, where half of the task can be run in parallel, the best possible time to run it on four processors is T(4) = 10(0.5 + (1-0.5)/4) = 6.25 seconds. The theoretical best possible speedup of the parallel algorithm with four processors is 1 / (0.5 + (1-0.5)/4) = 1.6x.
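If you want to tabulate these limits for your own workloads, Amdahl's law is easy to wrap in a few lines of R. The helper below is not from the book, just a quick sketch of the formula given above:

# A quick sketch of Amdahl's law: T(n) = T(1) * (P + (1 - P)/n), S(n) = T(1)/T(n)
amdahl <- function(T1, P, n) {
  time <- T1 * (P + (1 - P) / n)                  # best possible parallel runtime
  data.frame(processes = n, time = time, speedup = T1 / time)
}

# Reproduces the worked example: a 10-second task, half of which is strictly serial
amdahl(T1 = 10, P = 0.5, n = c(1, 2, 4, 8, 16))
# With 4 processes the best case is 6.25 seconds (a 1.6x speedup); even with 16
# processes the runtime can never drop below the 5 seconds of serial work.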
The following figure shows you how the theoretical best possible execution time decreases as more CPU cores are added. Notice that the execution time reaches a limit that is just above five seconds. This corresponds to the half of the task that must be run serially, where parallelism does not help.

Best possible execution time versus number of CPU cores

In general, Amdahl's law means that the fastest execution time for any parallelized algorithm is limited by the time needed for the serial portions of the algorithm. Bear in mind that Amdahl's law provides only a theoretical estimate. It does not account for the overheads of parallel computing (such as starting and coordinating tasks) and assumes that the parallel portions of the algorithm are infinitely scalable. In practice, these factors might significantly limit the performance gains of parallelism, so use Amdahl's law only to get a rough estimate of the maximum speedup possible.

There are two main classes of parallelism: data parallelism and task parallelism. Understanding these concepts helps to determine what types of tasks can be modified to run in parallel.

In data parallelism, a dataset is divided into multiple partitions. Different partitions are distributed to multiple processors, and the same task is executed on each partition of data. Take, for example, the task of finding the maximum value in a vector dataset, say one that has one billion numeric data points. A serial algorithm to do this would look like the following code, which iterates over every element of the data in sequence to search for the largest value. (This code is intentionally verbose to illustrate how the algorithm works; in practice, the max() function in R, though also serial in nature, is much faster.)

serialmax <- function(data) {
    max <- -Inf
    for (i in data) {
        if (i > max)
            max <- i
    }
    return(max)
}

One way to parallelize this algorithm is to split the data into partitions. If we have a computer with eight CPU cores, we can split the data into eight partitions of 125 million numbers each. Here is the pseudocode for how to perform the same task in parallel:

# Run this in parallel across 8 CPU cores
part.results <- run.in.parallel(serialmax(data.part))
# Compute global max
global.max <- serialmax(part.results)

This pseudocode runs eight instances of serialmax() in parallel—one for each data partition—to find the local maximum value in each partition. Once all the partitions have been processed, the algorithm finds the global maximum value by finding the largest value among the local maxima. This parallel algorithm works because the global maximum of a dataset must be the largest of the local maxima from all the partitions. The following figure depicts data parallelism pictorially.

The key behind data parallel algorithms is that each partition of data can be processed independently of the other partitions, and the results from all the partitions can be combined to compute the final results. This is similar to the mechanism of the MapReduce framework from Hadoop. Data parallelism allows algorithms to scale up easily as data volume increases—as more data is added to the dataset, more computing nodes can be added to a cluster to process new partitions of data.

Data parallelism
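To make the parallel maximum pseudocode concrete, here is a minimal runnable sketch using the parallel package, which is covered in detail later in this article. It is not part of the original example; it uses a small vector so that it finishes quickly, and it splits the data across however many cores detectCores() reports rather than assuming eight:

library(parallel)

data <- runif(1e6)                                   # stand-in for the large vector
ncores <- detectCores()
parts <- split(data, cut(seq_along(data), ncores, labels = FALSE))

cl <- makeCluster(ncores)
part.results <- unlist(parLapply(cl, parts, max))    # local maximum of each partition
stopCluster(cl)

global.max <- max(part.results)                      # largest of the local maxima
identical(global.max, max(data))                     # TRUE: matches the serial answer
# serialmax() defined above would give the same result as max(), only more slowly.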
Other examples of computations and algorithms that can be run in a data parallel way include:

- Element-wise matrix operations such as addition and subtraction: The matrices can be partitioned and the operations are applied to each pair of partitions.
- Means: The sums and number of elements in each partition can be added to find the global sum and number of elements, from which the mean can be computed.
- K-means clustering: After data partitioning, the K centroids are distributed to all the partitions. Finding the closest centroid is performed in parallel and independently across the partitions. The centroids are updated by first calculating the sums and the counts of their respective members in parallel, and then consolidating them in a single process to get the global means.
- Frequent itemset mining using the Partition algorithm: In the first pass, the frequent itemsets are mined from each partition of data to generate a global set of candidate itemsets; in the second pass, the supports of the candidate itemsets are summed from each partition to filter out the globally infrequent ones.

The other main class of parallelism is task parallelism, where tasks are distributed to and executed on different processors in parallel. The tasks on each processor might be the same or different, and the data that they act on might also be the same or different. The key difference between task parallelism and data parallelism is that the data is not divided into partitions.

An example of a task parallel algorithm performing the same task on the same data is the training of a random forest model. A random forest is a collection of decision trees built independently on the same data. During the training process for a particular tree, a random subset of the data is chosen as the training set, and the variables to consider at each branch of the tree are also selected randomly. Hence, even though the same data is used, the trees are different from one another. In order to train a random forest of say 100 decision trees, the workload could be distributed to a computing cluster with 100 processors, with each processor building one tree. All the processors perform the same task on the same data (or exact copies of the data), but the data is not partitioned.

The parallel tasks can also be different. For example, computing a set of summary statistics on the same set of data can be done in a task parallel way. Each process can be assigned to compute a different statistic—the mean, standard deviation, percentiles, and so on. Pseudocode of a task parallel algorithm might look like this:

# Run 4 tasks in parallel across 4 cores
for (task in tasks)
    run.in.parallel(task)
# Collect the results of the 4 tasks
results <- collect.parallel.output()
# Continue processing after all 4 tasks are complete

Implementing data parallel algorithms

Several R packages allow code to be executed in parallel. The parallel package that comes with R provides the foundation for most parallel computing capabilities in other packages. Let's see how it works with an example.

This example involves finding documents that match a regular expression. Regular expression matching is a fairly computationally expensive task, depending on the complexity of the regular expression. The corpus, or set of documents, for this example is a sample of the Reuters-21578 dataset for the topic corporate acquisitions (acq) from the tm package. Because this dataset contains only 50 documents, they are replicated 100,000 times to form a corpus of 5 million documents so that parallelizing the code will lead to meaningful savings in execution times.

library(tm)
data("acq")
textdata <- rep(sapply(content(acq), content), 1e5)

The task is to find documents that match the regular expression \d+(,\d+)? mln dlrs, which represents monetary amounts in millions of dollars. In this regular expression, \d+ matches a string of one or more digits, and (,\d+)? optionally matches a comma followed by one or more digits. For example, the strings 12 mln dlrs, 1,234 mln dlrs and 123,456,789 mln dlrs will match the regular expression.
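Before running the pattern over five million documents, it is worth checking interactively that it matches what you expect. This quick test is not part of the original example; note that the backslashes have to be doubled inside an R string literal:

pattern <- "\\d+(,\\d+)? mln dlrs"
grepl(pattern, c("12 mln dlrs", "1,234 mln dlrs", "profit rose 10 pct"))
## [1]  TRUE  TRUE FALSE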
First, we will measure the execution time to find these documents serially with grepl():

pattern <- "\\d+(,\\d+)? mln dlrs"
system.time(res1 <- grepl(pattern, textdata))
##   user  system elapsed
## 65.601   0.114  65.721

Next, we will modify the code to run in parallel and measure the execution time on a computer with four CPU cores:

library(parallel)
detectCores()
## [1] 4
cl <- makeCluster(detectCores())
part <- clusterSplit(cl, seq_along(textdata))
text.partitioned <- lapply(part, function(p) textdata[p])
system.time(res2 <- unlist(
    parSapply(cl, text.partitioned, grepl, pattern = pattern)))
##   user  system elapsed
##  3.708   8.007  50.806
stopCluster(cl)

In this code, the detectCores() function reveals how many CPU cores are available on the machine where this code is executed. Before running any parallel code, makeCluster() is called to create a local cluster of processing nodes with all four CPU cores. The corpus is then split into four partitions using the clusterSplit() function to determine the ideal split of the corpus such that each partition has roughly the same number of documents. The actual parallel execution of grepl() on each partition of the corpus is carried out by the parSapply() function. Each processing node in the cluster is given a copy of the partition of data that it is supposed to process, along with the code to be executed and other variables that are needed to run the code (in this case, the pattern argument). When all four processing nodes have completed their tasks, the results are combined in a similar fashion to sapply(). Finally, the cluster is destroyed by calling stopCluster().

It is good practice to ensure that stopCluster() is always called in production code, even if an error occurs during execution. This can be done as follows:

doSomethingInParallel <- function(...) {
    cl <- makeCluster(...)
    on.exit(stopCluster(cl))
    # do something
}

In this example, running the task in parallel on four processors resulted in a 23 percent reduction in the execution time. This is not in proportion to the amount of compute resources used to perform the task; with four times as many CPU cores working on it, a perfectly parallelizable task might experience as much as a 75 percent runtime reduction. However, remember Amdahl's law—the speed of parallel code is limited by the serial parts, which includes the overheads of parallelization. In this case, calling makeCluster() with the default arguments creates a socket-based cluster. When such a cluster is created, additional copies of R are run as workers. The workers communicate with the master R process using network sockets, hence the name. The worker R processes are initialized with the relevant packages loaded, and data partitions are serialized and sent to each worker process. These overheads can be significant, especially in data parallel algorithms where large volumes of data need to be transferred to the worker processes.

Besides parSapply(), parallel also provides the parApply() and parLapply() functions; together with parSapply(), these are analogous to the standard sapply(), apply(), and lapply() functions, respectively. In addition, the parLapplyLB() and parSapplyLB() functions provide load balancing, which is useful when the execution of each parallel task takes variable amounts of time. Finally, parRapply() and parCapply() are parallel row and column apply() functions for matrices.
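As a rough illustration (not from the book), these variants follow the same calling pattern as parSapply(), so trying them out is mostly a matter of swapping the function name; text.partitioned and pattern are reused from the example above, and the matrix here is only a throwaway example:

cl <- makeCluster(detectCores())

# Load-balanced variant: helpful when some partitions take much longer than others
res.lb <- parSapplyLB(cl, text.partitioned, grepl, pattern = pattern)

# Row-wise and column-wise apply over a matrix
m <- matrix(runif(1e6), nrow = 1000)
row.max <- parRapply(cl, m, max)
col.max <- parCapply(cl, m, max)

stopCluster(cl)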
On non-Windows systems, parallel supports another type of cluster that often incurs less overhead: forked clusters. In these clusters, new worker processes are forked from the parent R process with a copy of the data. However, the data is not actually copied in memory unless it is modified by a child process. This means that, compared to socket-based clusters, initializing child processes is quicker and the memory usage is often lower. Another advantage of using forked clusters is that parallel provides a convenient and concise way to run tasks on them via the mclapply(), mcmapply(), and mcMap() functions. (These functions start with mc because they were originally a part of the multicore package.) There is no need to explicitly create and destroy the cluster, as these functions do this automatically. We can simply call mclapply() and state the number of worker processes to fork via the mc.cores argument:

system.time(res3 <- unlist(
    mclapply(text.partitioned, grepl, pattern = pattern,
             mc.cores = detectCores())))
##    user  system elapsed
## 127.012   0.350  33.264

This shows a 49 percent reduction in execution time compared to the serial version, and a 35 percent reduction compared to parallelizing using a socket-based cluster. For this example, forked clusters provide the best performance. Due to differences in system configuration, you might see very different results when you try the examples in your own environment. When you develop parallel code, it is important to test the code in an environment that is similar to the one that it will eventually run in.

Implementing task parallel algorithms

Let's now see how to implement a task parallel algorithm using both socket-based and forked clusters. We will look at how to run the same task and different tasks on workers in a cluster.

Running the same task on workers in a cluster

To demonstrate how to run the same task on a cluster, the task for this example is to generate 500 million Poisson random numbers. We will do this by using L'Ecuyer's combined multiple-recursive generator, which is the only random number generator in base R that supports multiple streams to generate random numbers in parallel. The random number generator is selected by calling the RNGkind() function. We cannot just use any random number generator in parallel because the randomness of the data depends on the algorithm used to generate random data and the seed value given to each parallel task. Most other algorithms were not designed to produce random numbers in multiple parallel streams, and might produce multiple highly correlated streams of numbers, or worse, multiple identical streams!

First, we will measure the execution time of the serial algorithm:

RNGkind("L'Ecuyer-CMRG")
nsamples <- 5e8
lambda <- 10
system.time(random1 <- rpois(nsamples, lambda))
##   user  system elapsed
## 51.905   0.636  52.544

To generate the random numbers on a cluster, we will first distribute the task evenly among the workers. In the following code, the integer vector samples.per.process contains the number of random numbers that each worker needs to generate on a four-core CPU. The seq() function produces ncores+1 numbers evenly distributed between 0 and nsamples, with the first number being 0 and the next ncores numbers indicating the approximate cumulative number of samples across the worker processes.
The round() function rounds off these numbers into integers, and diff() computes the difference between them to give the number of random numbers that each worker process should generate.

ncores <- detectCores()
cl <- makeCluster(ncores)
samples.per.process <-
    diff(round(seq(0, nsamples, length.out = ncores + 1)))

Before we can generate the random numbers on a cluster, each worker needs a different seed from which it can generate a stream of random numbers. The seeds need to be set on all the workers before running the task, to ensure that all the workers generate different random numbers. For a socket-based cluster, we can call clusterSetRNGStream() to set the seeds for the workers, then run the random number generation task on the cluster. When the task is completed, we call stopCluster() to shut down the cluster:

clusterSetRNGStream(cl)
system.time(random2 <- unlist(
    parLapply(cl, samples.per.process, rpois,
              lambda = lambda)))
##   user  system elapsed
##  5.006   3.000  27.436
stopCluster(cl)

Using four parallel processes in a socket-based cluster reduces the execution time by 48 percent. The performance of this type of cluster for this example is better than that of the data parallel example because there is less data to copy to the worker processes—only an integer that indicates how many random numbers to generate.

Next, we run the same task on a forked cluster (again, this is not supported on Windows). The mclapply() function can set the random number seeds for each worker for us when the mc.set.seed argument is set to TRUE; we do not need to call clusterSetRNGStream(). Otherwise, the code is similar to that of the socket-based cluster:

system.time(random3 <- unlist(
    mclapply(samples.per.process, rpois,
             lambda = lambda,
             mc.set.seed = TRUE, mc.cores = ncores)))
##   user  system elapsed
## 76.283   7.272  25.052

On our test machine, the execution time of the forked cluster is slightly lower than, but close to, that of the socket-based cluster, indicating that the overheads for this task are similar for both types of clusters.

Running different tasks on workers in a cluster

So far, we have executed the same tasks on each parallel process. The parallel package also allows different tasks to be executed on different workers. For this example, the task is to generate not only Poisson random numbers, but also uniform, normal, and exponential random numbers. As before, we start by measuring the time to perform this task serially:

RNGkind("L'Ecuyer-CMRG")
nsamples <- 5e7
pois.lambda <- 10
system.time(random1 <- list(pois = rpois(nsamples,
                                         pois.lambda),
                            unif = runif(nsamples),
                            norm = rnorm(nsamples),
                            exp = rexp(nsamples)))
##   user  system elapsed
## 14.180   0.384  14.570

In order to run different tasks on different workers on socket-based clusters, a list of function calls and their associated arguments must be passed to parLapply(). This is a bit cumbersome, but parallel unfortunately does not provide an easier interface to run different tasks on a socket-based cluster. In the following code, the function calls are represented as a list of lists, where the first element of each sublist is the name of the function that runs on a worker, and the second element contains the function arguments. The function do.call() is used to call the given function with the given arguments.
ncores <- detectCores()
cl <- makeCluster(ncores)
calls <- list(pois = list("rpois", list(n = nsamples,
                                        lambda = pois.lambda)),
              unif = list("runif", list(n = nsamples)),
              norm = list("rnorm", list(n = nsamples)),
              exp = list("rexp", list(n = nsamples)))
clusterSetRNGStream(cl)
system.time(
    random2 <- parLapply(cl, calls,
                         function(call) {
                             do.call(call[[1]], call[[2]])
                         }))
##   user  system elapsed
##  2.185   1.629  10.403
stopCluster(cl)

On forked clusters on non-Windows machines, the mcparallel() and mccollect() functions offer a more intuitive way to run different tasks on different workers. For each task, mcparallel() sends the given task to an available worker. Once all the workers have been assigned their tasks, mccollect() waits for the workers to complete their tasks and collects the results from all the workers.

mc.reset.stream()
system.time({
    jobs <- list()
    jobs[[1]] <- mcparallel(rpois(nsamples, pois.lambda),
                            "pois", mc.set.seed = TRUE)
    jobs[[2]] <- mcparallel(runif(nsamples),
                            "unif", mc.set.seed = TRUE)
    jobs[[3]] <- mcparallel(rnorm(nsamples),
                            "norm", mc.set.seed = TRUE)
    jobs[[4]] <- mcparallel(rexp(nsamples),
                            "exp", mc.set.seed = TRUE)
    random3 <- mccollect(jobs)
})
##   user  system elapsed
## 14.535   3.569    7.97

Notice that we also had to call mc.reset.stream() to set the seeds for random number generation in each worker. This was not necessary when we used mclapply(), which calls mc.reset.stream() for us. However, mcparallel() does not, so we need to call it ourselves.

Summary

In this article, we learned about two classes of parallelism: data parallelism and task parallelism. Data parallelism is good for tasks that can be performed in parallel on partitions of a dataset. The dataset to be processed is split into partitions, and each partition is processed on a different worker process. Task parallelism, on the other hand, divides a set of similar or different tasks amongst the worker processes. In either case, Amdahl's law states that the maximum improvement in speed that can be achieved by parallelizing code is limited by the proportion of that code that can be parallelized.

Resources for Article:

Further resources on this subject:

- Using R for Statistics, Research, and Graphics [article]
- Learning Data Analytics with R and Hadoop [article]
- Aspects of Data Manipulation in R [article]

Remote Access

Packt
06 Feb 2015
32 min read
In this article by Jordan Krause, author of the book Windows Server 2012 R2 Administrator Cookbook, we will see how Windows Server 2012 R2 by Microsoft brings a whole new way of looking at remote access. Companies have historically relied on third-party tools to connect remote users into the network, such as traditional and SSL VPN provided by appliances from large networking vendors. I'm here to tell you those days are gone. Those of us running Microsoft-centric shops can now rely on Microsoft technologies to connect our remote workforce. Better yet is that these technologies are included with the Server 2012 R2 operating system, and have functionality that is much improved over anything that a traditional VPN can provide. Regular VPN does still have a place in the remote access space, and the great news is that you can also provide it with Server 2012 R2.

Our primary focus for this article will be DirectAccess (DA). DA is kind of like automatic VPN. There is nothing the user needs to do in order to be connected to work. Whenever they are on the Internet, they are also connected automatically to the corporate network. DirectAccess is an amazing way to have your Windows 7 and Windows 8 domain joined systems connected back to the network for data access and for management of those traveling machines. DirectAccess has actually been around since 2008, but the first version came with some steep infrastructure requirements and was not widely used. Server 2012 R2 brings a whole new set of advantages and makes implementation much easier than in the past. I still find many server and networking admins who have never heard of DirectAccess, so let's spend some time together exploring some of the common tasks associated with it.

In this article, we will cover the following recipes:

- Configuring DirectAccess, VPN, or a combination of the two
- Pre-staging Group Policy Objects (GPOs) to be used by DirectAccess
- Enhancing the security of DirectAccess by requiring certificate authentication
- Building your Network Location Server (NLS) on its own system

(For more resources related to this topic, see here.)

There are two "flavors" of remote access available in Windows Server 2012 R2. The most common way to implement the Remote Access role is to provide DirectAccess for your Windows 7 and Windows 8 domain joined client computers, and VPN for the rest. The DirectAccess machines are typically your company-owned corporate assets. One of the primary reasons that DirectAccess is usually only for company assets is that the client machines must be joined to your domain, because the DirectAccess configuration settings are brought down to the client through a GPO. I doubt you want home and personal computers joining your domain. VPN is therefore used for down level clients such as Windows XP, and for home and personal devices that want to access the network. Since this is a traditional VPN listener with all regular protocols available such as PPTP, L2TP, SSTP, it can even work to connect devices such as smartphones.

There is a third function available within the Server 2012 R2 Remote Access role, called the Web Application Proxy (WAP). This function is not used for connecting remote computers fully into the network as DirectAccess and VPN are; rather, WAP is used for publishing internal web resources out to the internet.
For example, if you are running Exchange and Lync Server inside your network and want to publish access to these web-based resources to the internet for external users to connect to, WAP would be a mechanism that could publish access to these resources. The term for publishing out to the internet like this is Reverse Proxy, and WAP can act as such. It can also behave as an ADFS Proxy. For further information on the WAP role, please visit: http://technet.microsoft.com/en-us/library/dn584107.aspx One of the most confusing parts about setting up DirectAccess is that there are many different ways to do it. Some are good ideas, while others are not. Before we get rolling with recipes, we are going to cover a series of questions and answers to help guide you toward a successful DA deployment. The first question that always presents itself when setting up DA is "How do I assign IP addresses to my DirectAccess server?". This is quite a loaded question, because the answer depends on how you plan to implement DA, which features you plan to utilize, and even upon how secure you believe your DirectAccess server to be. Let me ask you some questions, pose potential answers to those questions, and discuss the effects of making each decision. DirectAccess Planning Q&A Which client operating systems can connect using DirectAccess? Answer: Windows 7 Ultimate, Windows 7 Enterprise, and Windows 8.x Enterprise. You'll notice that the Professional SKU is missing from this list. That is correct, Windows 7 and Windows 8 Pro do not contain the DirectAccess connectivity components. Yes, this does mean that Surface Pro tablets cannot utilize DirectAccess out of the box. However, I have seen many companies now install Windows 8 Enterprise onto their Surface tablets, effectively turning them into "Surface Enterprises." This works fine and does indeed enable them to be DirectAccess clients. In fact, I am currently typing this text on a DirectAccess-connected Surface "Pro turned Enterprise" tablet. Do I need one or two NICs on my DirectAccess server? Answer: Technically, you could set it up either way. In practice however, it really is designed for dual-NIC implementation. Single NIC DirectAccess works okay sometimes to establish a proof-of-concept to test out the technology. But I have seen too many problems with single NIC implementation in the field to ever recommend it for production use. Stick with two network cards, one facing the internal network and one facing the Internet. Do my DirectAccess servers have to be joined to the domain? Answer: Yes. Does DirectAccess have site-to-site failover capabilities? Answer: Yes, though only Windows 8.x client computers can take advantage of it. This functionality is called Multi-Site DirectAccess. Multiple DA servers that are spread out geographically can be joined together in a multi-site array. Windows 8 client computers keep track of each individual entry point and are able to swing between them as needed or at user preference. Windows 7 clients do not have this capability and will always connect through their primary site. What are these things called 6to4, Teredo, and IP-HTTPS I have seen in the Microsoft documentation? Answer: 6to4, Teredo, and IP-HTTPS are all IPv6 transition tunneling protocols. All DirectAccess packets that are moving across the internet between DA client and DA server are IPv6 packets. 
If your internal network is IPv4, then when those packets reach the DirectAccess server they get turned down into IPv4 packets, by some special components called DNS64 and NAT64. While these functions handle the translation of packets from IPv6 into IPv4 when necessary inside the corporate network, the key point here is that all DirectAccess packets that are traveling over the Internet part of the connection are always IPv6. Since the majority of the Internet is still IPv4, this means that we must tunnel those IPv6 packets inside something to get them across the Internet. That is the job of 6to4, Teredo, and IP-HTTPS. 6to4 encapsulates IPv6 packets into IPv4 headers and shuttles them around the internet using protocol 41. Teredo similarly encapsulates IPv6 packets inside IPv4 headers, but then uses UDP port 3544 to transport them. IP-HTTPS encapsulates IPv6 inside IPv4 and then inside HTTP encrypted with TLS, essentially creating an HTTPS stream across the Internet. This, like any HTTPS traffic, utilizes TCP port 443. The DirectAccess traffic traveling inside either kind of tunnel is always encrypted, since DirectAccess itself is protected by IPsec. Do I want to enable my clients to connect using Teredo? Answer: Most of the time, the answer here is yes. Probably the biggest factor that weighs on this decision is whether or not you are still running Windows 7 clients. When Teredo is enabled in an environment, this gives the client computers an opportunity to connect using Teredo, rather than all clients connecting in over the IP-HTTPS protocol. IP-HTTPS is sort of the "catchall" for connections, but Teredo will be preferred by clients if it is available. For Windows 7 clients, Teredo is quite a bit faster than IP-HTTPS. So enabling Teredo on the server side means your Windows 7 clients (the ones connecting via Teredo) will have quicker response times, and the load on your DirectAccess server will be lessened. This is because Windows 7 clients who are connecting over IP-HTTPS are encrypting all of the traffic twice. This also means that the DA server is encrypting/decrypting everything that comes and goes twice. In Windows 8, there is an enhancement that brings IP-HTTPS performance almost on par with Teredo, and so environments that are fully cut over to Windows 8 will receive less benefit from the extra work that goes into making sure Teredo works. Can I place my DirectAccess server behind a NAT? Answer: Yes, though there is a downside. Teredo cannot work if the DirectAccess server is sitting behind a NAT. For Teredo to be available, the DA server must have an External NIC that has two consecutive public IP addresses. True public addresses. If you place your DA server behind any kind of NAT, Teredo will not be available and all clients will connect using the IP-HTTPS protocol. Again, if you are using Windows 7 clients, this will decrease their speed and increase the load on your DirectAccess server. How many IP addresses do I need on a standalone DirectAccess server? Answer: I am going to leave single NIC implementation out of this answer since I don't recommend it anyway. For scenarios where you are sitting the External NIC behind a NAT or, for any other reason, are limiting your DA to IP-HTTPS only, then we need one external address and one internal address. The external address can be a true public address or a private NATed DMZ address. Same with the internal; it could be a true internal IP or a DMZ IP. Make sure both NICs are not plugged into the same DMZ, however. 
For a better installation scenario that allows Teredo connections to be possible, you would need two consecutive public IP addresses on the External NIC and a single internal IP on the Internal NIC. This internal IP could be either true internal or DMZ. But the public IPs would really have to be public for Teredo to work. Do I need an internal PKI? Answer: Maybe. If you want to connect Windows 7 clients, then the answer is yes. If you are completely Windows 8, then technically you do not need internal PKI. But you really should use it anyway. Using an internal PKI, which can be a single, simple Windows CA server, increases the security of your DirectAccess infrastructure. You'll find out during this article just how easy it is to require certificates as part of the tunnel building authentication process. Configuring DirectAccess, VPN, or a combination of the two Now that we have some general ideas about how we want to implement our remote access technologies, where do we begin? Most services that you want to run on a Windows Server begin with a role installation, but the implementation of remote access begins before that. Let's walk through the process of taking a new server and turning it into a Microsoft Remote Access server. Getting ready All of our work will be accomplished on a new Windows Server 2012 R2. We are taking the two-NIC approach to networking, and so we have two NICs installed on this server. The Internal NIC is plugged into the corporate network and the External NIC is plugged into the Internet for the sake of simplicity. The External NIC could just as well be plugged into a DMZ. How to do it... Follow these steps to turn your new server into a Remote Access server: Assign IP addresses to your server. Remember, the most important part is making sure that the Default Gateway goes on the External NIC only. Join the new server to your domain. Install an SSL certificate onto your DirectAccess server that you plan to use for the IP-HTTPS listener. This is typically a certificate purchased from a public CA. If you're planning to use client certificates for authentication, make sure to pull down a copy of the certificate to your DirectAccess server. You want to make sure certificates are in place before you start with the configuration of DirectAccess. This way the wizards will be able to automatically pull in information about those certificates in the first run. If you don't, DA will set itself up to use self-signed certificates, which are a security no-no. Use Server Manager to install the Remote Access role. You should only do this after completing the steps listed earlier. If you plan to load balance multiple DirectAccess servers together at a later time, make sure to also install the feature called Network Load Balancing . After selecting your role and feature, you will be asked which Remote Access role services you want to install. For our purposes in getting the remote workforce connected back into the corporate network, we want to choose DirectAccess and VPN (RAS) .  Now that the role has been successfully installed, you will see a yellow exclamation mark notification near the top of Server Manager indicating that you have some Post-deployment Configuration that needs to be done. Do not click on Open the Getting Started Wizard ! Unfortunately, Server Manager leads you to believe that launching the Getting Started Wizard (GSW) is the logical next step. 
However, using the GSW as the mechanism for configuring your DirectAccess settings is kind of like roasting a marshmallow with a pair of tweezers. In order to ensure you have the full range of options available to you as you configure your remote access settings, and that you don't get burned later, make sure to launch the configuration this way: Click on the Tools menu from inside Server Manager and launch the Remote Access Management Console . In the left window pane, click on Configuration | DirectAccess and VPN . Click on the second link, the one that says Run the Remote Access Setup Wizard . Please note that once again the top option is to run that pesky Getting Started Wizard. Don't do it! I'll explain why in the How it works… section of this recipe. Now you have a choice that you will have to answer for yourself. Are you configuring only DirectAccess, only VPN, or a combination of the two? Simply click on the option that you want to deploy. Following your choice, you will see a series of steps (steps 1 through 4) that need to be accomplished. This series of mini-wizards will guide you through the remainder of the DirectAccess and VPN particulars. This recipe isn't large enough to cover every specific option included in those wizards, but at least you now know the correct way to bring a DirectAccess/VPN server into operation. How it works... The remote access technologies included in Server 2012 R2 have great functionality, but their initial configuration can be confusing. Following the procedure listed in this recipe will set you on the right path to be successful in your deployment, and prevent you from running into issues down the road. The reason that I absolutely recommend you stay away from using the "shortcut" deployment method provided by the Getting Started Wizard is twofold: GSW skips a lot of options as it sets up DirectAccess, so you don't really have any understanding of how it works after finishing. You may have DA up and running, but have no idea how it's authenticating or working under the hood. This holds so much potential for problems later, should anything suddenly stop working. GSW employs a number of bad security practices in order to save time and effort in the setup process. For example, using the GSW usually means that your DirectAccess server will be authenticating users without client certificates, which is not a best practice. Also, it will co-host something called the NLS website on itself, which is also not a best practice. Those who utilize the GSW to configure DirectAccess will find that their GPO, which contains the client connectivity settings, will be security-filtered to the Domain Computers group. Even though it also contains a WMI filter that is supposed to limit that policy application to mobile hardware such as laptops, this is a terribly scary thing to see inside GPO filtering settings. You probably don't want all of your laptops to immediately start getting DA connectivity settings, but that is exactly what the GSW does for you. Perhaps worst, the GSW will create and make use of self-signed SSL certificates to validate its web traffic, even the traffic coming in from the Internet! This is a terrible practice and is the number one reason that should convince you that clicking on the Getting Started Wizard is not in your best interests. 
Pre-staging Group Policy Objects (GPOs) to be used by DirectAccess

One of the great things about DirectAccess is that all of the connectivity settings the client computers need in order to connect are contained within a Group Policy Object (GPO). This means that you can turn new client computers into DirectAccess-connected clients without ever touching that system. Once configured properly, all you need to do is add the new computer account to an Active Directory security group, and during the next automatic Group Policy refresh cycle (usually within 90 minutes), that new laptop will be connecting via DirectAccess whenever outside the corporate network.

You can certainly choose not to pre-stage anything with the GPOs and DirectAccess will still work. When you get to the end of the DA configuration wizards, it will inform you that two new GPOs are about to be created inside Active Directory. One GPO is used to contain the DirectAccess server settings and the other GPO is used to contain the DirectAccess client settings. If you allow the wizard to handle the generation of these GPOs, it will create them, link them, filter them, and populate them with settings automatically. About half of the time I see folks do it this way and they are forever happy with letting the wizard manage those GPOs now and in the future. The other half of the time, it is desired that we maintain a little more personal control over the GPOs.

If you are setting up a new DA environment but your credentials don't have permission to create GPOs, the wizard is not going to be able to create them either. In this case, you will need to work with someone on your Active Directory team to get them created. Another reason to manage the GPOs manually is to have better control over placement of these policies. When you let the DirectAccess wizard create the GPOs, it will link them to the top level of your domain. It also sets Security Filtering on those GPOs so they are not going to be applied to everything in your domain, but when you open up the Group Policy Management Console you will always see those DirectAccess policies listed right up there at the top level of the domain. Sometimes this is simply not desirable. So for this reason also, you may want to choose to create and manage the GPOs by hand, so that we can secure placement and links where we specifically want them to be located.

The key factors here are to make sure your DirectAccess Server Settings GPO applies to only the DirectAccess server or servers in your environment, and that the DirectAccess Client Settings GPO applies to only the DA client computers that you plan to enable in your network. The best practice here is to specify this GPO to only apply to a specific Active Directory security group so that you have full control over which computer accounts are in that group. I have seen some folks do it based only on the OU links and include whole OUs in the filtering for the clients GPO (foregoing the use of an AD group at all), but doing it this way makes it quite a bit more difficult to add or remove machines from the access list in the future.
Getting ready

While the DirectAccess wizards themselves are run from the DirectAccess server, our work with this recipe is not. The Group Policy settings that we will be configuring are all accomplished within Active Directory, and we will be doing the work from a Domain Controller in our environment.

How to do it...

To pre-stage Group Policy Objects (GPOs) for use with DirectAccess:

1. On your Domain Controller, launch the Group Policy Management Console.
2. Expand Forest | Domains | Your Domain Name. There should be a listing here called Group Policy Objects. Right-click on that and choose New.
3. Name your new GPO something like DirectAccess Server Settings.
4. Click on the new DirectAccess Server Settings GPO and it should open up automatically to the Scope tab. We need to adjust the Security Filtering section so that this GPO only applies to our DirectAccess server. This is a critical step for each GPO to ensure the settings that are going to be placed here do not get applied to the wrong computers.
5. Remove Authenticated Users, which is prepopulated in that list. The list should now be empty.
6. Click the Add… button and search for the computer account of your DirectAccess server. Mine is called RA-01. By default this window will only search user accounts, so you will need to adjust Object Types to include Computers before it will allow you to add your server into this filtering list. Your Security Filtering list should now look like this:
7. Now click on the Details tab of your GPO. Change the GPO Status to be User configuration settings disabled. We do this because our GPO is only going to contain computer-level settings, nothing at the user level.
8. The last thing to do is link your GPO to an appropriate container. Since we have Security Filtering enabled, our GPO is only ever going to apply its settings to the RA-01 server; however, without creating a link, the GPO will not even attempt to apply itself to anything. My RA-01 server is sitting inside the OU called Remote Access Servers. So I will right-click on my Remote Access Servers OU and choose Link an Existing GPO….
9. Choose the new DirectAccess Server Settings from the list of available GPOs and click on the OK button. This creates the link and puts the GPO into action. Since there are not yet any settings inside the GPO, it won't actually make any changes on the server. The DirectAccess configuration wizards take care of populating the GPO with the settings that are needed.
10. Now we simply need to rinse and repeat all of these steps to create another GPO, something like DirectAccess Client Settings. You want to set up the client settings GPO in the same way. Make sure that it is filtering to only the Active Directory Security Group that you created to contain your DirectAccess client computers. And make sure to link it to an appropriate container that will include those computer accounts. So maybe your clients GPO will look something like this:

How it works...

Creating GPOs in Active Directory is a simple enough task, but it is critical that you configure the Links and Security Filtering correctly.
If you do not take care to ensure that these DirectAccess connection settings are only going to apply to the machines that actually need the settings, you could create a world of trouble by internal servers getting remote access connection settings and cause them issues with connection while inside the network. Enhancing the security of DirectAccess by requiring certificate authentication When a DirectAccess client computer builds its IPsec tunnels back to the corporate network, it has the ability to require a certificate as part of that authentication process. In earlier versions of DirectAccess, the one in Server 2008 R2 and the one provided by Unified Access Gateway ( UAG ), these certificates were required in order to make DirectAccess work. Setting up the certificates really isn't a big deal at all; as long as there is a CA server in your network you are already prepared to issue the certs needed at no cost. Unfortunately, though, there must have been enough complaints back to Microsoft in order for them to make these certificates "recommended" instead of "required" and they created a new mechanism in Windows 8 and Server 2012 called KerberosProxy that can be used to authenticate the tunnels instead. This allows the DirectAccess tunnels to build without the computer certificate, making that authentication process less secure. I'm here to strongly recommend that you still utilize certificates in your installs! They are not difficult to set up, and using them makes your tunnel authentication stronger. Further, many of you may not have a choice and will still be required to install these certificates. Only simple DirectAccess scenarios that are all Windows 8 on the client side can get away with the shortcut method of foregoing certs. Anybody who still wants to connect Windows 7 via DirectAccess will need to use certificates on all of their client computers, both Windows 7 and Windows 8. In addition to Windows 7 access, anyone who intends to use the advanced features of DirectAccess such as load balancing, multi-site, or two-factor authentication will also need to utilize these certificates. With any of these scenarios, certificates become a requirement again, not a recommendation. In my experience, almost everyone still has Windows 7 clients that would benefit from being DirectAccess connected, and it's always a good idea to make your DA environment redundant by having load balanced servers. This further emphasizes the point that you should just set up certificate authentication right out of the gate, whether or not you need it initially. You might decide to make a change later that would require certificates and it would be easier to have them installed from the get-go rather than trying to incorporate them later into a running DA environment. Getting ready In order to distribute certificates, you will need a CA server running in your network. Once certificates are distributed to the appropriate places, the rest of our work will be accomplished from our Server 2012 R2 DirectAccess server. How to do it... Follow these steps to make use of certificates as part of the DirectAccess tunnel authentication process: The first thing that you need to do is distribute certificates to your DirectAccess servers and all DirectAccess client computers. The easiest way to do this is by using the built-in Computer template provided by default in a Windows CA server. If you desire to build a custom certificate template for this purpose, you can certainly do so. 
I recommend that you duplicate the Computer template and build it from there. Whenever I create a custom template for use with DirectAccess, I try to make sure that it meets the following criterias: The Subject Name of the certificate should match the Common Name of the computer (which is also the FQDN of the computer). The Subject Alternative Name ( SAN ) of the certificate should match the DNS Name of the computer (which is also the FQDN of the computer). The certificate should serve the Intended Purposes of both Client Authentication and Server Authentication . You can issue the certificates manually using Microsoft Management Console (MMC). Otherwise, you can lessen your hands-on administrative duties by enabling Autoenrollment. Now that we have certificates distributed to our DirectAccess clients and servers, log in to your primary DirectAccess server and open up the Remote Access Management Console . Click on Configuration in the top-left corner. You should now see steps 1 through 4 listed. Click Edit… listed under Step 2 . Now you can either click Next twice or click on the word Authentication to jump directly to the authentication screen. Check the box that says Use computer certificates . Now we have to specify the Certification Authority server that issued our client certificates. If you used an intermediary CA to issue your certs, make sure to check the appropriate checkbox. Otherwise, most of the time, certificates are issued from a root CA and in this case you would simply click on the Browse… button and look for your CA in the list. This screen is sometimes confusing because people expect to have to choose the certificate itself from the list. This is not the case. What you are actually choosing from this list is the Certificate Authority server that issued the certificates. Make any other appropriate selections on the Authentication screen. For example, many times when we require client certificates for authentication, it is because we have Windows 7 computers that we want to connect via DirectAccess. If that is the case for you, select the checkbox for Enable Windows 7 client computers to connect via DirectAccess .  How it works... Requiring certificates as part of your DirectAccess tunnel authentication process is a good idea in any environment. It makes the solution more secure, and enables advanced functionality. The primary driver for most companies to require these certificates is the enablement of Windows 7 clients to connect via DirectAccess, but I suggest that anyone using DirectAccess in any capacity make use of these certs. They are simple to deploy, easy to configure, and give you some extra peace of mind that only computers who have a certificate issued directly to them from your own internal CA server are going to be able to connect through your DirectAccess entry point. Building your Network Location Server (NLS) on its own system If you zipped through the default settings when configuring DirectAccess, or worse used the Getting Started Wizard, chances are that your Network Location Server ( NLS ) is running right on the DirectAccess server itself. This is not the recommended method for using NLS, it really should be running on a separate web server. In fact, if you later want to do something more advanced such as setting up load balanced DirectAccess servers, you're going to have to move NLS off onto a different server anyway. So you might as well do it right the first time. NLS is a very simple requirement, yet a critical one. 
It is just a website, it doesn't matter what content the site has, and it only has to run inside your network. Nothing has to be externally available. In fact, nothing should be externally available, because you only want this site being accessed internally. This NLS website is a large part of the mechanism by which DirectAccess client computers figure out when they are inside the office and when they are outside. If they can see the NLS website, they know they are inside the network and will disable DirectAccess name resolution, effectively turning off DA. If they do not see the NLS website, they will assume they are outside the corporate network and enable DirectAccess name resolution. There are two gotchas with setting up an NLS website: The first is that it must be HTTPS, so it does need a valid SSL certificate. Since this website is only running inside the network and being accessed from domain-joined computers, this SSL certificate can easily be one that has been issued from your internal CA server. So no cost associated there. The second catch that I have encountered a number of times is that for some reason the default IIS splash screen page doesn't make for a very good NLS website. If you set up a standard IIS web server and use the default site as NLS, sometimes it works to validate the connections and sometimes it doesn't. Given that, I always set up a specific site that I create myself, just to be on the safe side. So let's work together to follow the exact process I always take when setting up NLS websites in a new DirectAccess environment. Getting ready Our NLS website will be hosted on an IIS server we have that runs Server 2012 R2. Most of the work will be accomplished from this web server, but we will also be creating a DNS record and will utilize a Domain Controller for that task. How to do it... Let's work together to set up our new Network Location Server website: First decide on an internal DNS name to use for this website and set it up in DNS of your domain. I am going to use nls.mydomain.local and am creating a regular Host (A) record that points nls.mydomain.local at the IP address of my web server. Now log in to that web server and let's create some simple content for this new website. Create a new folder called C:NLS. Inside your new folder, create a new Default.htm file. Edit this file and throw some simple text in there. I usually say something like This is the NLS website used by DirectAccess. Please do not delete or modify me!.  Remember, this needs to be an HTTPS website, so before we try setting up the actual website, we should acquire the SSL certificate that we need to use with this site. Since this certificate is coming from my internal CA server, I'm going to open up MMC on my web server to accomplish this task. Once MMC is opened, snap-in the Certificates module. Make sure to choose Computer account and then Local computer when it prompts you for which certificate store you want to open. Expand Certificates (Local Computer) | Personal | Certificates . Right-click on this Certificates folder and choose All Tasks | Request New Certificate… . Click Next twice and you should see your list of certificate templates that are available on your internal CA server. If you do not see one that looks appropriate for requesting a website certificate, you may need to check over the settings on your CA server to make sure the correct templates are configured for issuing. My template is called Custom Web Server . 
Since this is a web server certificate, there is some additional information that I need to provide in my request in order to successfully issue a certificate. So I go ahead and click on that link that says More information is required to enroll for this certificate. Click here to configure settings. .  Drop-down the Subject name | Type menu and choose the option Common name . Enter a common name for our website into the Value field, which in my case is nls.mydomain.local. Click the Add button and your CN should move over to the right side of the screen like this:  Click on OK then click on the Enroll button. You should now have an SSL certificate sitting in your certificates store that can be used to authenticate traffic moving to our nls.mydomain.local name. Open up Internet Information Services (IIS) Manager , and browse to the Sites folder. Go ahead and remove the default website that IIS automatically set up, so that we can create our own NLS website without any fear of conflict. Click on the Add Website… action. Populate the information as shown in the following screenshot. Make sure to choose your own IP address and SSL certificate from the lists, of course:  Click the OK button and you now have an NLS website running successfully in your network. You should be able to open up a browser on a client computer sitting inside the network and successfully browse to https://nls.mydomain.local. How it works... In this recipe, we configured a basic Network Location Server website for use with our DirectAccess environment. This site will do exactly what we need it to when our DA client computers try to validate whether they are inside or outside the corporate network. While this recipe meets our requirements for NLS, and in fact puts us into a good practice of installing DirectAccess with NLS being hosted on its own web server, there is yet another step you could take to make it even better. Currently this web server is a single point of failure for NLS. If this web server goes down or has a problem, we would have DirectAccess client computers inside the office who would think they are outside, and they would have some major name resolution problems until we sorted out the NLS problem. Given that, it is a great idea to make NLS redundant. You could cluster servers together, use Microsoft Network Load Balancing ( NLB ), or even use some kind of hardware load balancer if you have one available in your network. This way you could run the same NLS website on multiple web servers and know that your clients will still work properly in the event of a web server failure. Summary This article encourages you to use Windows Server 2012 R2 as the connectivity platform that brings your remote computers into the corporate network. We discussed DirectAccess and VPN in this article. We also saw how to configure DirectAccess and VPN, and how to secure DirectAccess using certificate authentication. Resources for Article: Further resources on this subject: Cross-premise Connectivity [article] Setting Up and Managing E-mails and Batch Processing [article] Upgrading from Previous Versions [article]
Modal Close icon