
Fine Tune Your Web Application by Profiling and Automation

Packt
07 Jun 2016
17 min read
In this article, James Singleton, author of the book ASP.NET Core 1.0 High Performance, sheds some light on how to improve the performance of your web application by profiling and testing it. We will cover writing automated tests to monitor performance, along with adding these to a Continuous Integration (CI) and deployment system so that you are constantly checking for regressions.

Profiling and measurement

It's impossible to overstate how important profiling, measuring, and analyzing reliable evidence is, especially when dealing with web application performance. Maybe you have used Glimpse or MiniProfiler to provide insights into the running of your web application, or perhaps you are familiar with the Visual Studio diagnostics tools and the Application Insights Software Development Kit (SDK).

There's another tool that's worth mentioning, and that's the Prefix profiler, which you can get at prefix.io. Prefix is a free, web-based ASP.NET profiler that supports ASP.NET Core. However, it doesn't yet support .NET Core (although this is planned), so you'll need to run ASP.NET Core on .NET Framework 4.6 for now. There's a live demo on their website (at demo.prefix.io) if you want to quickly check it out.

You may also want to look at the PerfView performance analysis tool from Microsoft, which is used in the development of .NET Core. You can download PerfView from https://www.microsoft.com/en-us/download/details.aspx?id=28567 as a ZIP file that you can just extract and run. It is useful for analyzing the memory of .NET applications, among other things. You can use PerfView for many debugging activities, for example, to snapshot the heap or force GC runs. We don't have space for a detailed walkthrough here, but the included instructions are good, and there are blogs on MSDN with guides and many video tutorials on Channel 9 at channel9.msdn.com/Series/PerfView-Tutorial if you need more information. Sysinternals tools (technet.microsoft.com/sysinternals) can also be helpful, but as they are not focused on .NET, they are less useful in this context.

While tools such as these are great, what would be even better is building performance monitoring into your development workflow. Automate everything that you can and make performance checks transparent, routine, and run by default. Manual processes are bad because steps can be skipped and errors can easily be made. You wouldn't dream of developing software by e-mailing files around or editing code directly on a production server, so why not automate your performance tests too?

Change control processes exist to ensure consistency and reduce errors. This is why using a Source Control Management (SCM) system, such as git or Team Foundation Server (TFS), is essential. It's also extremely useful to have a build server and perform Continuous Integration (CI) or even fully automated deployments. If the code that is deployed in production differs from what you have on your local workstation, then you have very little chance of success. This is one of the reasons why SQL Stored Procedures (SPs/sprocs) are difficult to work with, at least without rigorous version control. It's far too easy to modify an old version of an SP on a development database, accidentally revert a bug fix, and end up with a regression. If you must use sprocs, then you will need a versioning system, such as ReadyRoll (which Redgate has now acquired).
If you practice Continuous Delivery (CD), then you'll have a build server, such as JetBrains TeamCity, ThoughtWorks GoCD, or CruiseControl.NET, or a cloud service, such as AppVeyor. Perhaps you even automate your deployments using a tool such as Octopus Deploy and have your own internal NuGet feeds using software such as The Motley Fool's Klondike or a cloud service such as MyGet (which also supports npm, bower, and VSIX packages). Bypassing processes and doing things manually will cause problems, even if you follow a script. If it can be automated, then it probably should be, and this includes testing.

Automated testing

As previously mentioned, the key to improving almost everything is automation. Tests that are only run manually on developer workstations add very little value. It should of course be possible to run the tests on desktops, but this shouldn't be the official result because there's no guarantee that they will pass on a server (where correct functioning matters more). Although automation usually occurs on servers, it can be useful to automate tests running on developer workstations too. One way of doing this in Visual Studio is to use a plugin, such as NCrunch. This runs your tests as you work, which can be very useful if you practice Test-Driven Development (TDD) and write your tests before your implementations. You can read more about NCrunch and see the pricing at ncrunch.net, or there's a similar open source project at continuoustests.com. One way of enforcing testing is to use gated check-ins in TFS, but this can be a little draconian, and if you use an SCM like git, then it's easier to work on branches and simply block merges until all of the tests pass. You want to encourage developers to check in early and often because this makes merges easier. Therefore, it's a bad idea to have features in progress sitting on workstations for a long time (generally no longer than a day).

Continuous integration

CI systems automatically build and test all of your branches, and they feed this information back to your version control system. For example, using the GitHub API, you can block the merging of pull requests until the build server has reported success for the merge result (a rough sketch of reporting such a status appears at the end of this section). Both Bitbucket and GitLab offer free CI systems called pipelines, so you may not need any extra systems in addition to one for source control because everything is in one place. GitLab also offers an integrated Docker container registry, and there is an open source version that you can install locally. Docker is well supported by .NET Core and the new version of Visual Studio. You can do something similar with Visual Studio Team Services for CI builds and unit testing. Visual Studio also has git services built into it. This process works well for unit testing because unit tests must be quick so that you get feedback early. Shortening the iteration cycle is a good way of increasing productivity, and you'll want the lag to be as small as possible. However, running tests on each build isn't suitable for all types of testing because not all tests can be quick. In this case, you'll need an additional strategy so as not to slow down your feedback loop. There are many unit testing frameworks available for .NET, for example, NUnit, xUnit, and MSTest (Microsoft's unit test framework), along with multiple graphical ways of running tests locally, such as the Visual Studio Test Explorer and the ReSharper plugin. People have their favorites, but it doesn't really matter what you choose because most CI systems will support all of them.
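As promised above, here is a rough sketch (not from the book) of reporting a commit status through GitHub's commit status API from Node.js, so that merging can be blocked until the build passes. The owner, repository, commit SHA, and token are placeholders, and in practice your CI server would make this call for you; the point is only that the status is a plain HTTP request.

var https = require('https');

// Placeholder values; a real CI server would supply these.
var owner = 'example-org';
var repo = 'example-repo';
var sha = 'abc123';                        // The commit that was built.
var token = process.env.GITHUB_TOKEN;      // A personal access token.

var body = JSON.stringify({
  state: 'success',                        // Could also be 'pending' or 'failure'.
  context: 'ci/build',
  description: 'Build and unit tests passed'
});

var req = https.request({
  hostname: 'api.github.com',
  path: '/repos/' + owner + '/' + repo + '/statuses/' + sha,
  method: 'POST',
  headers: {
    'Authorization': 'token ' + token,
    'User-Agent': 'ci-status-example',     // GitHub requires a User-Agent header.
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(body)
  }
}, function (response) {
  console.log('GitHub responded with status ' + response.statusCode);
});

req.end(body);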
Slow testing

Some tests are slow, but even if each test is fast, they can easily add up to a lengthy run if you have a lot of them. This is especially true if they can't be parallelized and need to be run in sequence. Therefore, you should always aim to have each test stand on its own, without any dependencies on others. It's good practice to divide your tests into rings of importance so that you can at least run a subset of the most crucial ones on every CI build. However, if you have a large test suite or some tests that are unavoidably slow, then you may choose to only run these once a day (perhaps overnight) or every week (maybe over the weekend). Some testing is simply slow by nature, and performance testing can often fall into this category, for example, load testing or User Interface (UI) testing. These are usually classed as integration testing, rather than unit testing, because they require your code to be deployed to an environment for testing, and the tests can't simply exercise the binaries. To make use of such automated testing, you will need to have an automated deployment system in addition to your CI system. If you have enough confidence in your test system, then you can even have live deployments happen automatically. This works well if you also use feature switching to control the rollout of new features.

Realistic environments

Using a test environment that is as close to production (or as live-like) as possible is a good step toward ensuring reliable results. You can try to use a smaller set of servers and then scale your results up to get an estimate of live performance, but this assumes that you have an intimate knowledge of how your application scales and what hardware constraints will be the bottlenecks. A better option is to use your live environment, or rather what will become your production stack. You first create a staging environment that is identical to live, then you deploy your code to it and run your full test suite, including a comprehensive performance test, ensuring that it behaves correctly. Once you are happy, then you simply swap staging and production, perhaps using DNS or Azure staging slots. Your old live environment now either becomes your test environment, or if you use immutable cloud instances, then you can simply terminate it and spin up a new staging system. This concept is known as blue-green deployment. You don't necessarily have to move all users across at once in a big bang. You can move a few over first to test whether everything is correct.

Web UI testing tools

One of the most popular web testing tools is Selenium, which allows you to easily write tests and automate web browsers using WebDriver. Selenium is useful for many other tasks apart from testing, and you can read more about it at docs.seleniumhq.org. WebDriver is a protocol for remote controlling web browsers, and you can read about it at w3c.github.io/webdriver/webdriver-spec.html. Selenium uses real browsers, the same versions your users will access your web application with. This makes it excellent for getting representative results, but it can cause issues if it runs from the command line in an unattended fashion. For example, you may find your test server's memory full of dead browser processes, which have timed out.
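To give a flavor of what driving a real browser through WebDriver looks like, here is a minimal sketch (not from the book) using the selenium-webdriver package for Node.js. The URL and expected title are placeholders, and it assumes a matching browser driver (for example, geckodriver for Firefox) is installed on the test machine.

var webdriver = require('selenium-webdriver');
var until = webdriver.until;

// Start a real browser; 'firefox' could be swapped for 'chrome'.
var driver = new webdriver.Builder().forBrowser('firefox').build();

driver.get('https://example.com/')                             // Placeholder URL.
  .then(function () {
    return driver.wait(until.titleContains('Example'), 5000);  // Placeholder title check.
  })
  .then(function () {
    return driver.getTitle();
  })
  .then(function (title) {
    console.log('Page title was: ' + title);
  })
  .then(function () {
    return driver.quit();                                      // Always release the browser.
  });

Wrapping checks like this in a test framework, and making sure driver.quit() always runs, helps avoid the orphaned browser processes mentioned above.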
You may find it easier to use a dedicated headless test browser, which, while not exactly the same as what your users will see, is more suitable for automation. The best approach is of course to use a combination of both, perhaps running headless tests first and then running the same tests on real browsers with WebDriver. One of the most well-known headless test browsers is PhantomJS. This is based on the WebKit engine, so it should give similar results to Chrome and Safari. PhantomJS is useful for many things apart from testing, such as capturing screenshots, and many different testing frameworks can drive it. As the name suggests, JavaScript can control PhantomJS, and you can read more about it at phantomjs.org. WebKit is an open source engine for web browsers, which was originally part of the KDE Linux desktop environment. It is mainly used in Apple's Safari browser, but a fork called Blink is used in Google Chrome, Chromium, and Opera. You can read more at webkit.org. Other automatable testing browsers based on different engines are available, but they have some limitations. For example, SlimerJS (slimerjs.org) is based on the Gecko engine used by Firefox, but it is not fully headless.

You probably want to use a higher-level testing utility rather than scripting browser engines directly. One such utility that provides many useful abstractions is CasperJS (casperjs.org), which supports running on both PhantomJS and SlimerJS (a short test sketch appears later in this section). Another library is Capybara, which allows you to easily simulate user interactions in Ruby. It supports Selenium, WebKit, Rack, and PhantomJS (via Poltergeist), although it's more suitable for Rails apps. You can read more at jnicklas.github.io/capybara. There is also TrifleJS (triflejs.org), which uses the .NET WebBrowser class (the Internet Explorer Trident engine), but this is a work in progress. Additionally, there's Watir (watir.com), which is a set of Ruby libraries that target Internet Explorer and WebDriver. However, neither has been updated in a while, and IE has changed a lot recently. Microsoft Edge (codenamed Spartan) is the new version of IE, and the Trident engine has been forked to EdgeHTML. The JavaScript engine (Chakra) has been open sourced as ChakraCore (github.com/Microsoft/ChakraCore). It shouldn't matter too much what browser engine you use, and PhantomJS will work fine as a first pass for automated tests. You can always test with real browsers after using a headless one, perhaps with Selenium or with PhantomJS using WebDriver. When we refer to browser engines (WebKit/Blink, Gecko, and Trident/EdgeHTML), we generally mean only the rendering and layout engine, not the JavaScript engine (SFX/Nitro/FTL/B3, V8, SpiderMonkey, and Chakra/ChakraCore).

You'll probably still want to use a utility such as CasperJS to make writing tests easier, and you'll likely need a test framework, such as Jasmine (jasmine.github.io) or QUnit (qunitjs.com), too. You can also use a test runner that supports both Jasmine and QUnit, such as Chutzpah (mmanela.github.io/chutzpah). You can integrate your automated tests with many different CI systems, for example, Jenkins or JetBrains TeamCity. If you prefer a cloud-hosted option, then there's Travis CI (travis-ci.org) and AppVeyor (appveyor.com), which is also suitable for building .NET apps. You may prefer to run your integration and UI tests from your deployment system, for example, to verify a successful deployment in Octopus Deploy.
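For comparison with the WebDriver sketch earlier, here is a minimal CasperJS test sketch (not from the book) that checks a page loads with the expected title; the URL and the title are placeholders. It would be run with casperjs test homepage.js, assuming PhantomJS and CasperJS are installed.

casper.test.begin('Home page loads', 2, function suite(test) {
  casper.start('https://example.com/', function () {   // Placeholder URL.
    test.assertHttpStatus(200);                         // The page responded successfully.
    test.assertTitleMatch(/Example/);                   // Placeholder title check.
  });

  casper.run(function () {
    test.done();                                        // Tell the runner this suite is finished.
  });
});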
There are also dedicated, cloud-based web-application UI testing services available, such as BrowserStack (browserstack.com).

Automating UI performance tests

Automated UI tests are clearly great for checking functional regressions, but they are also useful for testing performance. You have programmatic access to the same information provided by the network inspector in the browser developer tools. You can integrate the YSlow (yslow.org) performance analyzer with PhantomJS, enabling your CI system to check for common web performance mistakes on every commit. YSlow came out of Yahoo!, and it provides rules used to identify bad practices, which can slow down web applications for users. It's a similar idea to Google's PageSpeed Insights service (which can be automated via its API). However, YSlow is pretty old, and things have moved on in web development recently, for example, HTTP/2. A modern alternative is "the coach" from sitespeed.io, and you can read more at github.com/sitespeedio/coach. You should check out their other open source tools too, such as the dashboard at dashboard.sitespeed.io, which uses Graphite and Grafana. You can also export the network results (in the industry standard HAR format) and analyze them however you like, for example, visualizing them graphically in waterfall format, as you might do manually with your browser developer tools. The HTTP Archive (HAR) format is a standard way of representing the content of monitored network data so that it can be exported to other software. You can copy or save as HAR in some browser developer tools by right-clicking on a network request.

DevOps

When using automation and techniques such as feature switching, it is essential to have a good view of your environments so that you know the utilization of all the hardware. Good tooling is important to perform this monitoring, and you want to easily be able to see the vital statistics of every server. This will consist of at least the CPU, memory, and disk space consumption, but it may include more, and you will want alarms set up to alert you if any of these stray outside allowed bands. The practice of DevOps is the culmination of all of the automation that we covered previously, with development, operations, and quality assurance testing teams all collaborating. The only missing pieces left now are provisioning and configuring infrastructure and then monitoring it while in use. Although DevOps is a culture, there is plenty of tooling that can help.

DevOps tooling

One of the primary themes of DevOps tooling is defining infrastructure as code. The idea is that you shouldn't manually perform a task, such as setting up a server, when you can create software to do it for you. You can then reuse these provisioning scripts, which will not only save you time but also ensure that all of the machines are consistent and free of mistakes or missed steps.

Provisioning

There are many systems available to commission and configure new machines. Some popular configuration management automation tools are Ansible (ansible.com), Chef (chef.io), and Puppet (puppet.com). Not all of these tools work great on Windows servers, partly because Linux is easier to automate. However, you can run ASP.NET Core on Linux and still develop on Windows using Visual Studio, while testing in a VM. Developing for a VM is a great idea because it solves the problems of setting up environments and the issues where code "works on my machine" but not in production. Vagrant (vagrantup.com) is a great command line tool to manage developer VMs.
It allows you to easily create, spin up, and share developer environments. The successor to Vagrant, Otto (ottoproject.io), takes this a step further and abstracts deployment too. Therefore, you can push to multiple cloud providers without worrying about the intricacies of CloudFormation, OpsWorks, or anything else. If you create your infrastructure as code, then your scripts can be versioned and tested, just like your application code. We'll stop before we get too far off-topic, but the point is that if you have reliable environments, which you can easily verify, instantiate, and perform testing on, then CI is a lot easier.

Monitoring

Monitoring is essential, especially for web applications, and there are many tools available to help with it. A popular open source infrastructure monitoring system is Nagios (nagios.org). Another more modern open source alerting and metrics tool is Prometheus (prometheus.io). If you use a cloud platform, then there will be monitoring built in, for example, AWS CloudWatch or Azure Diagnostics. There are also cloud services to directly monitor your website, such as Pingdom (pingdom.com), UptimeRobot (uptimerobot.com), Datadog (datadoghq.com), and PagerDuty (pagerduty.com). You probably already have a system in place to measure availability, but you can also use the same systems to monitor performance. This is not only helpful to ensure a responsive user experience, but it can also provide early warning signs that a failure is imminent. If you are proactive and take preventative action, then you can save yourself a lot of trouble reactively fighting fires. It helps to consider application support requirements at design time. Development, testing, and operations aren't competing disciplines, and you will succeed more often if you work as one team rather than simply throwing an application over the fence and saying it "worked in test, ops problem now".

Summary

In this article, we saw how we can integrate automated testing into a CI system in order to monitor for performance regressions. We also learned some strategies to roll out changes and ensure that tests accurately reflect real life. We also briefly covered some options for DevOps practices and cloud-hosting providers, which together make continuous performance testing much easier.

Resources for Article:

Further resources on this subject:
Designing your very own ASP.NET MVC Application [article]
Creating a NHibernate session to access database within ASP.NET [article]
Working With ASP.NET DataList Control [article]
Learning JavaScript Data Structures: Arrays

Packt
07 Jun 2016
18 min read
In this article by Loiane Groner, author of the book Learning JavaScript Data Structures and Algorithms, Second Edition, we will learn about arrays. An array is the simplest memory data structure. For this reason, all programming languages have a built-in array datatype. JavaScript also supports arrays natively, even though its first version was released without array support. In this article, we will dive into the array data structure and its capabilities. An array stores values sequentially that are all of the same datatype. Although JavaScript allows us to create arrays with values from different datatypes, we will follow best practices and assume that we cannot do this (most languages do not have this capability).

Why should we use arrays?

Let's consider that we need to store the average temperature of each month of the year for the city that we live in. We could use something similar to the following to store this information: var averageTempJan = 31.9; var averageTempFeb = 35.3; var averageTempMar = 42.4; var averageTempApr = 52; var averageTempMay = 60.8; However, this is not the best approach. If we store the temperature for only 1 year, we could manage 12 variables. However, what if we need to store the average temperature for more than 1 year? Fortunately, this is why arrays were created, and we can easily represent the same information mentioned earlier as follows: averageTemp[0] = 31.9; averageTemp[1] = 35.3; averageTemp[2] = 42.4; averageTemp[3] = 52; averageTemp[4] = 60.8; We can also represent the averageTemp array graphically.

Creating and initializing arrays

Declaring, creating, and initializing an array in JavaScript is as simple as shown by the following: var daysOfWeek = new Array(); //{1} var daysOfWeek = new Array(7); //{2} var daysOfWeek = new Array('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'); //{3} We can simply declare and instantiate a new array using the keyword new (line {1}). Also, using the keyword new, we can create a new array specifying the length of the array (line {2}). A third option would be passing the array elements directly to its constructor (line {3}). However, using the new keyword is not best practice. If we want to create an array in JavaScript, we can assign empty brackets ([]), as in the following example: var daysOfWeek = []; We can also initialize the array with some elements, as follows: var daysOfWeek = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']; If we want to know how many elements are in the array (its size), we can use the length property. The following code will give an output of 7: console.log(daysOfWeek.length);

Accessing elements and iterating an array

To access a particular position of the array, we can also use brackets, passing the index of the position we would like to access. For example, let's say we want to output all the elements from the daysOfWeek array. To do so, we need to loop the array and print the elements, as follows: for (var i=0; i<daysOfWeek.length; i++){ console.log(daysOfWeek[i]); } Let's take a look at another example. Let's say that we want to find out the first 20 numbers of the Fibonacci sequence.
The first two numbers of the Fibonacci sequence are 1 and 1, and each subsequent number is the sum of the previous two numbers: var fibonacci = []; //{1} fibonacci[1] = 1; //{2} fibonacci[2] = 1; //{3} for(var i = 3; i < 20; i++){ fibonacci[i] = fibonacci[i-1] + fibonacci[i-2]; //{4} } for(var i = 1; i<fibonacci.length; i++){ //{5} console.log(fibonacci[i]); //{6} } So, in line {1}, we declared and created an array. In lines {2} and {3}, we assigned the first two numbers of the Fibonacci sequence to the second and third positions of the array (in JavaScript, the first position of the array is always referenced by 0, and as there is no 0 in the Fibonacci sequence, we will skip it). Then, all we have to do is create the third to the twentieth number of the sequence (as we know the first two numbers already). To do so, we can use a loop and assign the sum of the previous two positions of the array to the current position (line {4}, starting from index 3 of the array up to the 19th index). Then, to take a look at the output (line {6}), we just need to loop the array from its first position to its length (line {5}). We can use console.log to output each index of the array (lines {5} and {6}), or we can also use console.log(fibonacci) to output the array itself. Most browsers have a nice array representation in console.log. If you would like to generate more than 20 numbers of the Fibonacci sequence, just change the number 20 to whatever number you like.

Adding elements

Adding and removing elements from an array is not that difficult; however, it can be tricky. For the examples we will use in this section, let's consider that we have the following numbers array initialized with numbers from 0 to 9: var numbers = [0,1,2,3,4,5,6,7,8,9]; If we want to add a new element to this array (for example, the number 10), all we have to do is reference the latest free position of the array and assign a value to it: numbers[numbers.length] = 10; In JavaScript, an array is a mutable object. We can easily add new elements to it. The object will grow dynamically as we add new elements to it. In many other languages, such as C and Java, we need to determine the size of the array, and if we need to add more elements to the array, we need to create a completely new array; we cannot simply add new elements to it as we need them.

Using the push method

However, there is also a method called push that allows us to add new elements to the end of the array. We can add as many elements as we want as arguments to the push method: numbers.push(11); numbers.push(12, 13); The output of the numbers array will be the numbers from 0 to 13.

Inserting an element in the first position

Now, let's say we need to add a new element to the array and would like to insert it in the first position, not the last one. To do so, first, we need to free the first position by shifting all the elements to the right. We can loop all the elements of the array, starting from the last position + 1 (length) and shifting the previous element to the new position, to finally assign the new value we want (-1) to the first position.
Run the following code for this: for (var i=numbers.length; i>=0; i--){ numbers[i] = numbers[i-1]; } numbers[0] = -1; We can represent this action with the following diagram.

Using the unshift method

The JavaScript array class also has a method called unshift, which inserts the values passed in the method's arguments at the start of the array: numbers.unshift(-2); numbers.unshift(-4, -3); So, using the unshift method, we can add the value -2 and then -3 and -4 to the beginning of the numbers array. The output of this array will be the numbers from -4 to 13.

Removing elements

So far, you have learned how to add values to the end and at the beginning of an array. Let's take a look at how we can remove a value from an array. To remove a value from the end of an array, we can use the pop method: numbers.pop(); The push and pop methods allow an array to emulate a basic stack data structure. The output of our array will be the numbers from -4 to 12. The length of our array is 17.

Removing an element from the first position

To remove a value from the beginning of the array, we can use the following code: for (var i=0; i<numbers.length; i++){ numbers[i] = numbers[i+1]; } We can represent the previous code using the following diagram: We shifted all the elements one position to the left. However, the length of the array is still the same (17), meaning we still have an extra element in our array (with an undefined value). The last time the code inside the loop was executed, i+1 was a reference to a position that does not exist. In some languages, such as Java, C/C++, or C#, the code would throw an exception, and we would have to end our loop at numbers.length - 1. As you can note, we have only overwritten the array's original values, and we did not really remove the value (as the length of the array is still the same and we have this extra undefined element).

Using the shift method

To actually remove an element from the beginning of the array, we can use the shift method, as follows: numbers.shift(); So, if we consider that our array has the values -4 to 12 and a length of 17, after we execute the previous code, the array will contain the values -3 to 12 and have a length of 16. The shift and unshift methods allow an array to emulate a basic queue data structure.

Adding and removing elements from a specific position

So far, you have learned how to add elements at the end and at the beginning of an array, and you have also learned how to remove elements from the beginning and end of an array. What if we also want to add or remove elements from any particular position of our array? How can we do this? We can use the splice method to remove an element from an array by simply specifying the position/index that we would like to delete from and how many elements we would like to remove, as follows: numbers.splice(5,3); This code will remove three elements, starting from index 5 of our array. This means numbers[5], numbers[6], and numbers[7] will be removed from the numbers array. The content of our array will be -3, -2, -1, 0, 1, 5, 6, 7, 8, 9, 10, 11, and 12 (as the numbers 2, 3, and 4 have been removed). As JavaScript arrays are objects, we can also use the delete operator to remove an element from the array, for example, delete numbers[0]. However, the position 0 of the array will have the value undefined, meaning that it would be the same as doing numbers[0] = undefined. For this reason, we should always use the splice, pop, or shift methods to remove elements.
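As a quick supplementary sketch (not from the book), the following lines show why delete leaves a hole in the array while splice actually shortens it:

var letters = ['a', 'b', 'c'];
delete letters[0];      // letters is now [undefined, 'b', 'c'] and letters.length is still 3.
letters.splice(0, 1);   // This removes the empty slot; letters is now ['b', 'c'] with length 2.
console.log(letters);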
Now, let's say we want to insert numbers 2 to 4 back into the array, starting from position 5. We can again use the splice method to do this: numbers.splice(5,0,2,3,4); The first argument of the method is the index we want to remove elements from or insert elements into. The second argument is the number of elements we want to remove (in this case, we do not want to remove any, so we will pass the value 0 (zero)). The third argument (onwards) represents the values we would like to insert into the array (the elements 2, 3, and 4). The output will be values from -3 to 12 again. Finally, let's execute the following code: numbers.splice(5,3,2,3,4); The output will be values from -3 to 12. This is because we are removing three elements, starting from index 5, and we are also adding the elements 2, 3, and 4, starting at index 5.

Two-dimensional and multidimensional arrays

At the beginning of this article, we used the temperature measurement example. We will now use this example one more time. Let's consider that we need to measure the temperature hourly for a few days. Now that we already know we can use an array to store the temperatures, we can easily write the following code to store the temperatures over two days: var averageTempDay1 = [72,75,79,79,81,81]; var averageTempDay2 = [81,79,75,75,73,72]; However, this is not the best approach; we can write better code! We can use a matrix (two-dimensional array) to store this information, in which each row will represent the day, and each column will represent an hourly measurement of temperature, as follows: var averageTemp = []; averageTemp[0] = [72,75,79,79,81,81]; averageTemp[1] = [81,79,75,75,73,72]; JavaScript only supports one-dimensional arrays; it does not support matrices. However, we can implement matrices or any multidimensional array using an array of arrays, as in the previous code. The same code can also be written as follows: //day 1 averageTemp[0] = []; averageTemp[0][0] = 72; averageTemp[0][1] = 75; averageTemp[0][2] = 79; averageTemp[0][3] = 79; averageTemp[0][4] = 81; averageTemp[0][5] = 81; //day 2 averageTemp[1] = []; averageTemp[1][0] = 81; averageTemp[1][1] = 79; averageTemp[1][2] = 75; averageTemp[1][3] = 75; averageTemp[1][4] = 73; averageTemp[1][5] = 72; In the previous code, we specified the value of each day and hour separately. We can also represent this example in a diagram where each row represents a day, and each column represents an hour of the day (temperature).

Iterating the elements of two-dimensional arrays

If we want to take a look at the output of the matrix, we can create a generic function to log its output: function printMatrix(myMatrix) { for (var i=0; i<myMatrix.length; i++){ for (var j=0; j<myMatrix[i].length; j++){ console.log(myMatrix[i][j]); } } } We need to loop through all the rows and columns. To do this, we need to use a nested for loop in which the variable i represents rows, and j represents the columns. We can call the following code to take a look at the output of the averageTemp matrix: printMatrix(averageTemp);

Multidimensional arrays

We can also work with multidimensional arrays in JavaScript. For example, let's create a 3 x 3 x 3 matrix.
Each cell contains the sum i (row) + j (column) + z (depth) of the matrix, as follows: var matrix3x3x3 = []; for (var i=0; i<3; i++){ matrix3x3x3[i] = []; for (var j=0; j<3; j++){ matrix3x3x3[i][j] = []; for (var z=0; z<3; z++){ matrix3x3x3[i][j][z] = i+j+z; } } } It does not matter how many dimensions we have in the data structure; we need to loop each dimension to access the cell. We can represent a 3 x 3 x 3 matrix with a cube diagram. To output the content of this matrix, we can use the following code: for (var i=0; i<matrix3x3x3.length; i++){ for (var j=0; j<matrix3x3x3[i].length; j++){ for (var z=0; z<matrix3x3x3[i][j].length; z++){ console.log(matrix3x3x3[i][j][z]); } } } If we had a 3 x 3 x 3 x 3 matrix, we would have four nested for statements in our code, and so on.

References for JavaScript array methods

Arrays in JavaScript are modified objects, meaning that every array we create has a few methods available to be used. JavaScript arrays are very interesting because they are very powerful and have more capabilities available than primitive arrays in other languages. This means that we do not need to write basic capabilities ourselves, such as adding and removing elements in/from the middle of the data structure. The following is a list of the core methods available in an array object. We have covered some methods already:

concat: This joins multiple arrays and returns a copy of the joined arrays.
every: This iterates every element of the array, verifying a desired condition (function) until false is returned.
filter: This creates an array with each element that evaluates to true in the function provided.
forEach: This executes a specific function on each element of the array.
join: This joins all the array elements into a string.
indexOf: This searches the array for specific elements and returns its position.
lastIndexOf: This returns the position of the last item in the array that matches the search criteria.
map: This creates a new array from the results of calling the provided function on every element of the array.
reverse: This reverses the array so that the last items become the first and vice versa.
slice: This returns a new array from the specified index.
some: This iterates every element of the array, verifying a desired condition (function) until true is returned.
sort: This sorts the array alphabetically or by the supplied function.
toString: This returns the array as a string.
valueOf: Similar to the toString method, this returns the array as a string.

We have already covered the push, pop, shift, unshift, and splice methods. Let's take a look at these new ones.

Joining multiple arrays

Consider a scenario where you have different arrays and you need to join all of them into a single array. We could iterate each array and add each element to the final array. Fortunately, JavaScript already has a method that can do this for us, named the concat method, which looks as follows: var zero = 0; var positiveNumbers = [1,2,3]; var negativeNumbers = [-3,-2,-1]; var numbers = negativeNumbers.concat(zero, positiveNumbers); We can pass as many arrays and objects/elements to this method as we desire. The arrays will be concatenated to the specified array in the order that the arguments are passed to the method. In this example, zero will be concatenated to negativeNumbers, and then positiveNumbers will be concatenated to the resulting array. The output of the numbers array will be the values -3, -2, -1, 0, 1, 2, and 3.
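As a short supplementary sketch (not from the book), this is how a few of the methods in the list above behave on a small array:

var sample = [3, 1, 2];
console.log(sample.join('-'));     // '3-1-2'
console.log(sample.indexOf(2));    // 2 (the position of the value 2)
console.log(sample.slice(1));      // [1, 2] (a new array from index 1 onwards)
console.log(sample.sort());        // [1, 2, 3] (sorts the array in place)
console.log(sample.reverse());     // [3, 2, 1] (reverses the sorted array in place)
console.log(sample.toString());    // '3,2,1'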
Iterator functions

Sometimes, we need to iterate the elements of an array. You learned that we can use a loop construct to do this, such as the for statement, as we saw in some previous examples. JavaScript also has some built-in iterator methods that we can use with arrays. For the examples in this section, we will need an array and also a function. We will use an array with values from 1 to 15 and also a function that returns true if the number is a multiple of 2 (even) and false otherwise. Run the following code: var isEven = function (x) { // returns true if x is a multiple of 2. console.log(x); return (x % 2 == 0) ? true : false; }; var numbers = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]; return (x % 2 == 0) ? true : false can also be represented as return (x % 2 == 0).

Iterating using the every method

The first method we will take a look at is the every method. The every method iterates each element of the array until the return of the function is false, as follows: numbers.every(isEven); In this case, the first element of the numbers array is the number 1. 1 is not a multiple of 2 (it is an odd number), so the isEven function will return false, and this will be the only time the function will be executed.

Iterating using the some method

Next, we have the some method. It has the same behavior as the every method; however, the some method iterates each element of the array until the return of the function is true: numbers.some(isEven); In our case, the first even number of our numbers array is 2 (the second element). The first element that will be iterated is the number 1; it will return false. Then, the second element that will be iterated is the number 2, which will return true, and the iteration will stop.

Iterating using forEach

If we need the array to be completely iterated no matter what, we can use the forEach function. It has the same result as using a for loop with the function's code inside it, as follows: numbers.forEach(function(x){ console.log((x % 2 == 0)); });

Using map and filter

JavaScript also has two other iterator methods that return a new array with a result. The first one is the map method, which is as follows: var myMap = numbers.map(isEven); The myMap array will have the following values: [false, true, false, true, false, true, false, true, false, true, false, true, false, true, false]. It stores the result of the isEven function that was passed to the map method. This way, we can easily know whether a number is even or not. For example, myMap[0] returns false because 1 is not even, and myMap[1] returns true because 2 is even. We also have the filter method. It returns a new array with the elements for which the function returned true, as follows: var evenNumbers = numbers.filter(isEven); In our case, the evenNumbers array will contain the elements that are multiples of 2: [2, 4, 6, 8, 10, 12, 14].

Using the reduce method

Finally, we have the reduce method. The reduce method also receives a function with the following parameters: previousValue, currentValue, index, and array. We can use this function to return a value that will be added to an accumulator, which will be returned after the reduce method stops being executed. It can be very useful if we want to sum up all the values in an array. Here's an example: numbers.reduce(function(previous, current, index){ return previous + current; }); The output will be 120. The JavaScript Array class also has two other important methods: map and reduce.
The method names are self-explanatory: the map method maps values using the given function, and the reduce method reduces the array to a single value using the given function. These three methods (map, filter, and reduce) are the base of functional programming in JavaScript.

Summary

In this article, we covered the most-used data structure: arrays. You learned how to declare, initialize, and assign values as well as add and remove elements. You also learned about two-dimensional and multidimensional arrays as well as the main methods of an array, which will be very useful when we start creating our own algorithms.
Asynchronous Control Flow Patterns with ES2015 and beyond

Packt
07 Jun 2016
6 min read
In this article by Luciano Mammino, the author of the book Node.js Design Patterns, Second Edition, we will explore async await, an innovative syntax that will be available in JavaScript as part of the release of ECMAScript 2017.

Async await using Babel

Callbacks, promises, and generators turn out to be the weapons at our disposal to deal with asynchronous code in JavaScript and in Node.js. As we have seen, generators are very interesting because they offer a way to actually suspend the execution of a function and resume it at a later stage. Now we can adopt this feature to write asynchronous code that allows developers to write functions that "appear" to block at each asynchronous operation, waiting for the results before continuing with the following statement. The problem is that generator functions are designed to deal mostly with iterators, and their usage with asynchronous code feels a bit cumbersome. It might be hard to understand, leading to code that is hard to read and maintain. But there is hope that there will be a cleaner syntax sometime in the near future. In fact, there is an interesting proposal that will be introduced with the ECMAScript 2017 specification that defines the async function's syntax. You can read more about the current status of the async await proposal at https://tc39.github.io/ecmascript-asyncawait/. The async function specification aims to dramatically improve the language-level model for writing asynchronous code by introducing two new keywords into the language: async and await. To clarify how these keywords are meant to be used and why they are useful, let's see a very quick example: const request = require('request'); function getPageHtml(url) { return new Promise(function(resolve, reject) { request(url, function(error, response, body) { resolve(body); }); }); } async function main() { const html = await getPageHtml('http://google.com'); console.log(html); } main(); console.log('Loading...'); In this code, there are two functions: getPageHtml and main. The first one is a very simple function that fetches the HTML code of a remote web page given its URL. It's worth noticing that this function returns a promise. The main function is the most interesting one because it's where the new async and await keywords are used. The first thing to notice is that the function is prefixed with the async keyword. This means that the function executes asynchronous code and allows it to use the await keyword within its body. The await keyword before the call to getPageHtml tells the JavaScript interpreter to "await" the resolution of the promise returned by getPageHtml before continuing to the next instruction. This way, the main function is internally suspended until the asynchronous code completes without blocking the normal execution of the rest of the program. In fact, we will see the string Loading… in the console and, after a moment, the HTML code of the Google landing page. Isn't this approach much more readable and easy to understand? Unfortunately, this proposal is not yet final, and even if it is approved, we will need to wait for the next version of the ECMAScript specification to come out and be integrated in Node.js to be able to use this new syntax natively. So what do we do today? Just wait? No, of course not! We can already leverage async await in our code thanks to transpilers such as Babel.
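As a small, hedged extension of the example above (not from the book), errors can be handled with an ordinary try/catch block around the await expression, provided that getPageHtml is changed to call reject(error) when the request fails:

const request = require('request');

function getPageHtml(url) {
  return new Promise(function(resolve, reject) {
    request(url, function(error, response, body) {
      if (error) {
        reject(error);      // Propagate the failure so that await can throw it.
      } else {
        resolve(body);
      }
    });
  });
}

async function main() {
  try {
    const html = await getPageHtml('http://google.com');
    console.log(html);
  } catch (err) {
    console.error('Request failed:', err.message);   // The rejected promise surfaces here.
  }
}

main();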
Installing and running Babel

Babel is a JavaScript compiler (or transpiler) that is able to convert JavaScript code into other JavaScript code using syntax transformers. Syntax transformers allow the use of new syntax such as ES2015, ES2016, JSX, and others to produce backward compatible equivalent code that can be executed in modern JavaScript runtimes, such as browsers or Node.js. You can install Babel in your project using NPM with the following command: npm install --save-dev babel-cli We also need to install the extensions to support async await parsing and transformation: npm install --save-dev babel-plugin-syntax-async-functions babel-plugin-transform-async-to-generator Now let's assume we want to run our previous example (called index.js). We need to launch the following command: node_modules/.bin/babel-node --plugins "syntax-async-functions,transform-async-to-generator" index.js This way, we are transforming the source code in index.js on the fly, applying the transformers to support async await. This new backward compatible code is stored in memory and then executed on the fly on the Node.js runtime. Babel can also be configured to act as a build processor that stores the generated code into files so that you can easily deploy and run the generated code. You can read more about how to install and configure Babel on the official website at https://babeljs.io.

Comparison

At this point, we should have a better understanding of the options we have to tame the asynchronous nature of JavaScript. Each one of the solutions presented has its own pros and cons. Let's summarize them as follows:

Plain JavaScript
Pros: Does not require any additional libraries or technology; offers the best performance; provides the best level of compatibility with third-party libraries; allows the creation of ad hoc and more advanced algorithms.
Cons: Might require extra code and relatively complex algorithms.

Async (library)
Pros: Simplifies the most common control flow patterns; is still a callback-based solution; good performance.
Cons: Introduces an external dependency; might still not be enough for advanced flows.

Promises
Pros: Greatly simplify the most common control flow patterns; robust error handling; part of the ES2015 specification; guarantee deferred invocation of onFulfilled and onRejected.
Cons: Require to promisify callback-based APIs; introduce a small performance hit.

Generators
Pros: Make a non-blocking API look like a blocking one; simplify error handling; part of the ES2015 specification.
Cons: Require a complementary control flow library; still require callbacks or promises to implement non-sequential flows; require to thunkify or promisify non-generator-based APIs.

Async await
Pros: Makes a non-blocking API look like a blocking one; clean and intuitive syntax.
Cons: Not yet available in JavaScript and Node.js natively; requires Babel or other transpilers and some configuration to be used today.

It is worth mentioning that we chose to present only the most popular solutions to handle asynchronous control flow, or the ones receiving a lot of momentum, but it's good to know that there are a few more options you might want to look at, for example, Fibers (https://npmjs.org/package/fibers) and Streamline (https://npmjs.org/package/streamline).

Summary

In this article, we analyzed how Babel can be used for performing async await and how to install Babel.
Game Development Using C++

Packt
07 Jun 2016
14 min read
C++ is one of the most popular languages for game development as it supports a variety of coding styles that provide low-level access to the system. In this article by Druhin Mukherjee, author of the book C++ Game Development Cookbook, we will go through the basics of game development using C++.

Creating your first simple game

Creating a simple text-based game is really easy. All we need to do is to create some rules and logic and we will have ourselves a game. Of course, as the game gets more complex, we need to add more functions. When the game reaches a point where there are multiple behaviors and states of objects and enemies, we should use classes and inheritance to achieve the desired result.

Getting ready

To work through this recipe, you will need a machine running Windows. You also need to have a working copy of Visual Studio installed on your Windows machine. No other prerequisites are required.

How to do it…

In this recipe, we will learn how to create a simple luck-based lottery game:

1. Open Visual Studio.
2. Create a new C++ project.
3. Select Win32 Console Application.
4. Add a Source.cpp file.
5. Add the following lines of code to it:

#include <iostream>
#include <cstdlib>
#include <ctime>

int main(void)
{
  srand(time(NULL)); // To not have the same numbers over and over again.

  while (true) { // Main loop.
    // Initialize and allocate.
    int inumber = rand() % 100 + 1; // System number is stored in here.
    int iguess;                     // User guess is stored in here.
    int itries = 0;                 // Number of tries is stored here.
    char canswer;                   // User answer to question is stored here.

    while (true) { // Get user number loop.
      // Get number.
      std::cout << "Enter a number between 1 and 100 (" << 20 - itries << " tries left): ";
      std::cin >> iguess;
      std::cin.ignore();

      // Check if tries are used up.
      if (itries >= 20) {
        break;
      }

      // Check number.
      if (iguess > inumber) {
        std::cout << "Too high! Try again.\n";
      } else if (iguess < inumber) {
        std::cout << "Too low! Try again.\n";
      } else {
        break;
      }

      // If not the number, increment tries.
      itries++;
    }

    // Check for tries.
    if (itries >= 20) {
      std::cout << "You ran out of tries!\n\n";
    } else {
      // Or, the user won.
      std::cout << "Congratulations!! " << std::endl;
      std::cout << "You got the right number in " << itries << " tries!\n";
    }

    while (true) { // Loop to ask the user if he/she would like to play again.
      // Get user response.
      std::cout << "Would you like to play again (Y/N)? ";
      std::cin >> canswer;
      std::cin.ignore();

      // Check if proper response.
      if (canswer == 'n' || canswer == 'N' || canswer == 'y' || canswer == 'Y') {
        break;
      } else {
        std::cout << "Please enter 'Y' or 'N'...\n";
      }
    }

    // Check the user's input and run again or exit.
    if (canswer == 'n' || canswer == 'N') {
      std::cout << "Thank you for playing!";
      break;
    } else {
      std::cout << "\n\n\n";
    }
  }

  // Safely exit.
  std::cout << "\n\nEnter anything to exit. . . ";
  std::cin.ignore();
  return 0;
}

How it works…

The game works by creating a random number from 1 to 100 and asking the user to guess that number. Hints are provided as to whether the number guessed is higher or lower than the actual number. The user is given just 20 tries to guess the number. We first need a pseudo-random seed, based on which we are going to generate a random number. The seed in this case is supplied to srand, and we have chosen the current time as the value to generate our random range.
We need to execute the program in an infinite loop so that the program breaks only when all tries are used up or when the user correctly guesses the number. We can set a variable for tries and increment it for every guess the user takes. The random number is generated by the rand function. We use rand() % 100 + 1 so that the random number is in the range 1 to 100. We ask the user to input the guessed number, and then we check whether that number is less than, greater than, or equal to the randomly generated number. We then display the correct message. If the user has guessed correctly, or all tries have been used, the program should break out of the main loop. At this point, we ask the user whether they want to play the game again. Then, depending on the answer, we go back into the main loop and start the process of selecting a random number.

Creating your first window

Creating a window is the first step in Windows programming. All our sprites and other objects will be drawn on top of this window. There is a standard way of drawing a window, so this part of the code will be repeated in all programs that use Windows programming to draw something.

Getting ready

You need to have a working copy of Visual Studio installed on your Windows machine.

How to do it…

In this recipe, we will find out how easy it is to create a window:

1. Open Visual Studio.
2. Create a new C++ project.
3. Select a Win32 Windows application.
4. Add a source file called Source.cpp.
5. Add the following lines of code to it:

#define WIN32_LEAN_AND_MEAN

#include <windows.h>   // Include all the windows headers.
#include <windowsx.h>  // Include useful macros.
#include "resource.h"

#define WINDOW_CLASS_NAME L"WINCLASS1"

void GameLoop()
{
  // One frame of game logic occurs here...
}

LRESULT CALLBACK WindowProc(HWND _hwnd, UINT _msg, WPARAM _wparam, LPARAM _lparam)
{
  // This is the main message handler of the system.
  PAINTSTRUCT ps; // Used in WM_PAINT.
  HDC hdc;        // Handle to a device context.

  // What is the message?
  switch (_msg)
  {
  case WM_CREATE:
  {
    // Do initialization stuff here.

    // Return success.
    return (0);
  } break;

  case WM_PAINT:
  {
    // Simply validate the window.
    hdc = BeginPaint(_hwnd, &ps);

    // You would do all your painting here...

    EndPaint(_hwnd, &ps);

    // Return success.
    return (0);
  } break;

  case WM_DESTROY:
  {
    // Kill the application, this sends a WM_QUIT message.
    PostQuitMessage(0);

    // Return success.
    return (0);
  } break;

  default:
    break;
  } // End switch.

  // Process any messages that we did not take care of...
  return (DefWindowProc(_hwnd, _msg, _wparam, _lparam));
}

int WINAPI WinMain(HINSTANCE _hInstance, HINSTANCE _hPrevInstance, LPSTR _lpCmdLine, int _nCmdShow)
{
  WNDCLASSEX winclass; // This will hold the class we create.
  HWND hwnd;           // Generic window handle.
  MSG msg;             // Generic message.

  HCURSOR hCrosshair = LoadCursor(_hInstance, MAKEINTRESOURCE(IDC_CURSOR2));

  // First fill in the window class structure.
  winclass.cbSize = sizeof(WNDCLASSEX);
  winclass.style = CS_DBLCLKS | CS_OWNDC | CS_HREDRAW | CS_VREDRAW;
  winclass.lpfnWndProc = WindowProc;
  winclass.cbClsExtra = 0;
  winclass.cbWndExtra = 0;
  winclass.hInstance = _hInstance;
  winclass.hIcon = LoadIcon(NULL, IDI_APPLICATION);
  winclass.hCursor = LoadCursor(_hInstance, MAKEINTRESOURCE(IDC_CURSOR2));
  winclass.hbrBackground = static_cast<HBRUSH>(GetStockObject(WHITE_BRUSH));
  winclass.lpszMenuName = NULL;
  winclass.lpszClassName = WINDOW_CLASS_NAME;
  winclass.hIconSm = LoadIcon(NULL, IDI_APPLICATION);

  // Register the window class.
  if (!RegisterClassEx(&winclass))
  {
    return (0);
  }

  // Create the window.
  hwnd = CreateWindowEx(NULL,          // Extended style.
    WINDOW_CLASS_NAME,                 // Class.
    L"Packt Publishing",               // Title.
    WS_OVERLAPPEDWINDOW | WS_VISIBLE,
    0, 0,                              // Initial x, y.
    400, 400,                          // Initial width, height.
    NULL,                              // Handle to parent.
    NULL,                              // Handle to menu.
    _hInstance,                        // Instance of this application.
    NULL);                             // Extra creation parameters.

  if (!(hwnd))
  {
    return (0);
  }

  // Enter the main event loop.
  while (true)
  {
    // Test if there is a message in queue, if so get it.
    if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
    {
      // Test if this is a quit.
      if (msg.message == WM_QUIT)
      {
        break;
      }

      // Translate any accelerator keys.
      TranslateMessage(&msg);
      // Send the message to the window proc.
      DispatchMessage(&msg);
    }

    // Main game processing goes here.
    GameLoop(); // One frame of game logic occurs here...
  }

  // Return to Windows like this...
  return (static_cast<int>(msg.wParam));
}

How it works…

In this example, we have used the standard Windows API callback. We query the message parameter that is passed and, based on that, we intercept and perform suitable actions. We have used the WM_PAINT message to paint the window for us and the WM_DESTROY message to destroy the current window. To paint the window, we need a handle to the device context, and then we can use BeginPaint and EndPaint appropriately. In the main structure, we need to fill in the Windows structures and specify the current cursor and icons that need to be loaded. Here, we can specify what color brush we are going to use to paint the window. Finally, the size of the window is specified and registered. After that, we need to continuously peek at messages, translate them, and finally dispatch them to the Windows procedure.

Adding artificial intelligence to a game

Adding artificial intelligence to a game may be easy or extremely difficult, based on the level of realism or complexity we are trying to achieve. In this recipe, we will start with the basics of adding artificial intelligence.

Getting ready

To work through this recipe, you will need a machine running Windows and a version of Visual Studio. No other prerequisites are required.

How to do it…

In this recipe, we will see how easy it is to add a basic artificial intelligence to the game:

1. Add a source file called Source.cpp.
2. Add the following code to it:

// Basic AI: keyword identification

#include <iostream>
#include <string>
#include <string.h>

std::string arr[] = { "Hello, what is your name?", "My name is Siri" };

int main()
{
  std::string UserResponse;

  std::cout << "Enter your question? ";
"; std::cin>>UserResponse;   if (UserResponse == "Hi")   { std::cout<<arr[0] <<std::endl; std::cout<<arr[1];   }     int a; std::cin>> a; return 0;   } How it works… In the previous example, we are using a string array to store a response. The idea of the software is to create an intelligent chat bot that can reply to questions asked by users and interact with them as if it were human. Hence the first task was to create an array of responses. The next thing to do is to ask the user for the question. In this example, we are searching for a basic keyword called Hi and, based on that, we are displaying the appropriate answer.  Of course, this is a very basic implementation. Ideally we would have a list of keywords and responses when either of the keywords is triggered. We can even personalize this by asking the user for their name and then appending it to the answer every time. The user may also ask to search for something. That is actually quite an easy thing to do. If we have detected the word that the user is longing to search for correctly, we just need to enter that into the search engine. Whatever result the page displays, we can report it back to the user. We can also use voice commands to enter the questions and give the responses. In this case, we would also need to implement some kind of NLP (Natural Language Processing). After the voice command is correctly identified, all the other processes are exactly the same. Using the const keyword to optimize your code Aconst keyword is used to make data or a pointer constant so that we cannot change the value or address, respectively. There is one more advantage of using the const keyword. This is particularly useful in the object-oriented paradigm. Getting ready For this recipe, you will need a Windows machine and an installed version of Visual Studio. How to do it… In this recipe, we will find out how easy it is to use the const keyword effectively: #include <iostream>   class A { public:   voidCalc()const   { Add(a, b);     //a = 9;       // Not Allowed   } A()   {     a = 10;     b = 10;     } private:   int a, b; void Add(int a, int b)const   {   std::cout<< a + b <<std::endl;   } };   int main() {     A _a;   _a.Calc();   int a; std::cin>> a;   return 0; } How it works… In this example, we are writing a simple application to add two numbers. The first function is a public function. This mean that it is exposed to other classes. Whenever we write public functions, we must ensure that they are not harming any private data of that class. As an example, if the public function was to return the values of the member variables or change the values, then this public function is very risky. Therefore, we must ensure that the function cannot modify any member variables by adding the const keyword at the end of the function. This ensures that the function is not allowed to change any member variables. If we try to assign a different value to the member, we will get a compiler error: errorC3490: 'a' cannot be modified because it is being accessed through a const object. So this makes the code more secure. However, there is another problem. This public function internally calls another private function. What if this private function modifies the values of the member variables? Again, we will be at the same risk. As a result, C++ does not allow us to call that function unless it has the same signature of const at the end of the function. This is to ensure that the function cannot change the values of the member variables. 
Summary
C++ is still the game programming language of choice for many, because it gives game programmers control over the entire architecture, including memory patterns and usage. For detailed game development using C++, you can refer to:
https://www.packtpub.com/game-development/procedural-content-generation-c-game-development
https://www.packtpub.com/game-development/learning-c-creating-games-ue4
Resources for Article:
Further resources on this subject:
Putting the Fun in Functional Python [article]
Playing with Physics [article]
Introducing GameMaker [article]

Managing Network Devices

Packt
07 Jun 2016
18 min read
In this article, Kevin Greene, author of the book Getting Started with Microsoft System Center Operations Manager, explains how network devices play a key role in our IT environments. Without them, we wouldn't have interconnectivity between our servers, clients, and applications, and it goes without saying that their availability and performance should be monitored with OpsMgr. Here, we will discuss the out-of-the-box network monitoring capability of OpsMgr and also learn how to discover and manage network devices using the Simple Network Management Protocol (SNMP) and ICMP.
At the start of the article, we introduce the different vendors, devices, and protocol support available for network monitoring, along with some requirements and considerations to get it up and running smoothly. After discovering some devices, we will demonstrate how to best use the different network monitoring dashboards to deliver purposeful visualizations based on performance and availability. We'll finish the article with a rundown of the network monitoring reports available to you with this feature. Here's what you will learn:
Network monitoring overview
Discovering network devices
Managing network devices
Working with dashboards
Network monitoring reports
(For more resources related to this topic, see here.)
Network monitoring overview
The out-of-the-box network monitoring capability has been around since the release of OpsMgr 2012 and not much has changed since then. You have the ability to perform advanced monitoring of your network devices using SNMP, or basic discovery and availability monitoring using ICMP (Ping). If you use SNMP, you get detailed monitoring of ports, interfaces, hardware, virtual local area networks (VLANs), and even Hot Standby Router Protocol (HSRP) groups. With ICMP, all you get is an indication that the IP address of the network device is responding to Ping requests, with very little information about the underlying components or interfaces.
Although the network monitoring feature of OpsMgr won't have network administrators throwing out the specialist tools they use from the likes of Cisco, for the IT administrator and IT pro it's still very useful when used in the overall context of IT service monitoring. This is because, regardless of the method used to discover and monitor your network devices, if a device has an IP address, you can at least get an overview of its availability in OpsMgr and then visualize it as part of your IT service models, reports, and dashboards.
Multi-vendor support
OpsMgr network monitoring works with any device that supports SNMP and also provides extended monitoring for devices that implement the management information base (MIB) RFC 2863 and MIB-II RFC 1213 standards. Microsoft has published an Excel spreadsheet containing a list of nearly 850 devices from various vendors that are supported for extended monitoring in OpsMgr. If you need a reference for supported vendors, you can download the spreadsheet from the following link:
http://tinyurl.com/opsmgrnetworkdevicelist
This spreadsheet gives us the details of the SNMP Object ID (OID), device type, vendor name, model, and the components that are supported for extended monitoring. If a vendor's network device is supported for extended SNMP monitoring, it will be discovered as a certified device in OpsMgr. A certified device can show monitoring information for components such as processor, memory, fans, and chassis.
A non-certified device discovered using SNMP will be registered as a generic device in OpsMgr. This means that you won't see advanced information on hardware components, but you'll still get interface and availability monitoring, which is more than you'd get from an ICMP monitoring. Multi-device support Some of the network device types that OpsMgr can monitor include switches, firewalls, load balancers, air-conditioning units, and UPS devices, basically anything that supports SNMP or ICMP. It has the flexibility to bring all the fabric components to your datacenter under monitoring; this sets OpsMgr apart from other monitoring solutions. Multi-protocol support Three different versions of SNMP are supported for network device monitoring—SNMP v1, SNMP v2c, and SNMP v3. The first two versions are the most common and require an SNMP community string as a passphrase for the monitoring connection to be completed, whereas the newer SNMP v3 requires a unique username and password to be configured before you can monitor devices that support it. For ICMP, the network devices are discovered and monitored usingInternet Protocol version 4 (IPv4);OpsMgralso provides support for Internet Protocol version 6 (IPv6) when running a recursive discovery on your network. Additional SNMP monitoring options If you find that your network devices are discovered as non-certified (generic) and you need to get more monitoring information from them, consider some of these options: Check with the device vendor to see if they have authored a management pack of their own to light up extra capabilities for their hardware. Author your own SNMP monitoring management pack. Admittedly, this is a big suggestion to make in a beginners guide but if you're willing to give it a go, then System Center MVP Daniele Grandini has put together an excellent series of blog posts that will walk you through this process from start to finish. You can check out the series athttp://tinyurl.com/opsmgrsnmpauthoring. If you've deployed OpsMgr 2016, then the new Network Monitoring Management Pack Generator tool (http://tinyurl.com/opsmgrmpgenerator) will help you author a custom SNMP management pack in no time. This tool (NetMonMPGenerator.exe) is located in the Server directory of the OpsMgr 2016 install location and with it you can create a management pack that monitors network device components, such as Memory, Processors, Fans, Sensors, and Power Supplies. You can download the full user guide for this tool fromhttp://tinyurl.com/mpgeneratorguide. Requirements and considerations There are a number of things that you'll need to consider before you dive in to monitor your network devices, such as resource pool design, firewall rules to be configured, management packs to be deployed, and user role requirements. Resource pools You'll need to create additional resource pools when designing a network monitoring architecture for your OpsMgr environments,to ensure optimal performance and scalability. For example, if you have a large number of network devices to be monitored, it's recommended that you assign a resource pool that includes management or gateway servers that will be specifically responsible for monitoring those devices. In this way, you can control the OpsMgr servers that are to be used to monitor agents and the ones that are exclusively monitoring your network devices. The OpsMgr Sizing Helper tool provides guidance on how to design resource pools for network monitoring. 
Firewall rules The firewall rules between the OpsMgr servers listed in the network monitoring resource pool and the network devices to are being monitored need to be configured to allow bi-directional communication on ports 161 (UDP) and 162 (UDP) in order to support SNMP. Bi-directional ICMP traffic communication is also required to support devices that won't be monitored using SNMP. If the Windows Firewall is configured on your OpsMgr servers, then you will need to make the changes there too, to ensure that network monitoring communication is successful. Management packs All core management packs required for network monitoring are deployed automatically as a part of the original OpsMgr installation and these are listed as follows: SNMP Library Network Device Library Network Discovery Internal Network Management – Core Monitoring Network Management Library Network Management Reports Network Management Templates Windows Server Network Discovery Windows Client Network Discovery The last two additional management packs in this list are needed to discover the network adapters of Windows server and client computers. It's also recommended to deploy the latest versions of core Microsoft operating system management packs to ensure that the network adapters on your agent-managed computers get monitored properly. User roles The user account that you use to create the network discovery rules must be a member of the Operations Manager Administrators user role, which is configurable in the User Roles section of the Administration workspace. Understanding network discovery The first step that you need to take to monitor and manage your network devices is to create a discovery for them. You will need to decide if you will use SNMP, ICMP, or both for discovery and then create a discovery rule to go out and find the network devices that you wish to monitor. The following sections will help you understand the process of network monitoring discovery in OpsMgr. Discovery rules A network discovery rule is created using the Computer and Device Management wizard from the Administration workspace of the OpsMgr console, and only one discovery rule can be assigned to each management server or gateway server. For this very reason, you will need to think about resource pool design and the placement of your management and gateway servers to ensure that they communicate with the network devices that they will be assigned to monitor. Discovery rules can be configured to run automatically on a schedule or manually on-demand when you need to. The advantage for large organizations, of running a discovery rule on a schedule, is that you can ensure that any new network devices that have been brought online can be captured for monitoring with little effort (assuming all security requirements have been met) and any network devices that have been retired will be removed from monitoring automatically. Discovery types There are two types of discovery rules that you can configure – Explicit and Recursive. Here's an explanation of both. Explicit discoveries An explicit network discovery rule will only attempt to discover the network devices that you explicitly specify in the wizard by IP address or FQDN. Once all the prerequisites for discovery have been met and a device has been successfully accessed, monitoring will be enabled for it and any devices that cannot be successfully accessed will be placed in the Network Devices Pending Management view for review. 
An explicit discovery rule can be configured to discover and access devices using SNMP, ICMP, or a combination of the two.
Recursive discoveries
With a recursive discovery, you first explicitly specify one or more network devices and, after they are discovered, OpsMgr performs a scan to discover any other connected network devices using the Address Resolution Protocol (ARP) table, IP address table, or topology MIB of the initially discovered devices. This type of discovery grows the network map and presents all the applicable devices to you for monitoring. Similar to the explicit discovery rule, recursive discovery can also be configured to discover and access devices using SNMP, ICMP, or both. IPv6 addresses can also be identified; however, the initially discovered device must use an IPv4 address. With recursive discovery, you can also create a filter using properties such as the device type, name, and object identifier (OID) to give more control over what is or isn't discovered.
DNS resolution of network devices
When planning your network monitoring designs, pay careful attention to the DNS resolution of your network devices. A common mistake people make when bringing their network devices under monitoring is to use only IP addresses to identify them; as a result, when OpsMgr has finished discovering the devices, they will display within the console as a list of IP addresses instead of descriptive DNS names. When discovering devices, OpsMgr uses a naming algorithm where it attempts DNS resolution from the following sources – the first one in the list to succeed becomes the name of the device:
Loopback IP
sysName
Public IP
Private IP
SNMP Agent IP
To ensure that your network devices are discovered with useful DNS names, you will need to put in some work, either on your corporate DNS servers by creating 'A' records for each network device, or by creating custom entries in the local 'hosts' file on each OpsMgr server that will discover and manage your network devices. If you decide to use custom entries in your local 'hosts' file to support DNS name resolution of your monitored network devices, remember to update the hosts file on each management and gateway server that is a member of the network monitoring resource pools. If you don't do this, there's a possibility that some network devices will only be discovered with their IP address.
Run as accounts
For network device discovery to be successful, a Run As account needs to be configured in OpsMgr with credentials that match the relevant access and security policies of the device to be monitored. For SNMP v1 and SNMP v2c devices, a passphrase in the form of a community string is required, and for SNMP v3, access credentials in the form of a username and password are needed. Read-only or read-write permissions can be configured on network devices to control how much interaction OpsMgr has with them; in nearly all cases, read-only will be sufficient. The community string or access credentials must first be configured on the network device by someone with the relevant permissions to do so. It's useful to discuss this requirement with network administrators before you deploy OpsMgr, as this step has the potential to become a time-consuming task if there are a lot of network devices to be configured.
Run as profiles
During installation, two new network monitoring RunAs profiles are automatically created.
These profiles are used specifically for SNMP discoveries and are defined in the following table: Profile Name Description SNMP Monitoring Account Used for SNMPv1 and SNMPv2 monitoring. SNMPv3 Monitoring Account Used for SNMPv3 monitoring.   The RunAs accounts that you create or specify for network device discovery are automatically assigned to the appropriate SNMP Monitoring Run As profile. Discovery stages When a discovery rule kicks off, it will run through the following three stages: Probing This is the first discovery stage; here, the management server attempts to contact a device using the specified protocol (SNMP, ICMP or both) and uses the methods outlined in the following table: Type Description SNMP Only Successful discovery if an SNMP GET message is processed. ICMP Only Successful discovery if it can Ping the network device. SNMP and ICMP Successful discovery only if both protocols are processed. Processing When the Probing stage is completed, OpsMgr processes all the information returned from the device and maps out its components, such as ports and interfaces, memory, processors, VLAN membership, and HSRP groups. Post-processing At the final post-processing stage, OpsMgr correlates the network device ports to the servers that the ports are connected to. It inserts all relevant items into the Operational database and associates RunAs accounts to each network device. After the three discovery stages are complete, the resource pool that you've specified for network monitoring in the discovery rule configuration will begin to monitor the discovered network devices. Discovering network devices Now that you have an understanding of the discovery process, it's time to monitor some network devices and this section will walk you through the process. Before you begin though, ensure that all the previously discussed requirements are in place and confirm that the IP addresses or DNS names of your network devices are correct. If you're working through the steps in this articlein your lab and you don't have any network devices to monitor, then take a read through Cameron Fuller's blog post on using the free and very useful Xian SNMP Device Simulator tool from Jalasoft. This tool gives you the ability to simulate network devices using SNMP v1, v2c and v3 authentication.http://tinyurl.com/snmpsimulator. Here's what you need to do to begin monitoring your network devices: From the Administration workspace in the OpsMgr console, expand the Network Management view. Right-click on Network Management, then click Discovery Wizard as shown in Figure 6.1(you can also choose to click on the Discovery Wizard… link located above the Wunderbar). Figure 6.1: Opening the Discovery Wizard From the Computer and Device Management Wizard, select the Network Devices option as shown in Figure 6.2, then click Next to continue. Figure 6.2: Choosing the Network Devices wizard At the General Properties dialog box, enter a name and a description for the discovery rule, select the management server or gateway server that will run the discovery andthen choose a resource pool created specifically for network monitoring as shown in Figure 6.3. Click Next to move on. Figure 6.3: Configuring the discovery rule. From the Discovery Method dialog box, select the discovery type you want to use, as shown in Figure 6.4, we'll choose theExplicit discovery option, then click Next to continue. Figure 6.4: Choosing a discovery type. 
In the Discovery Settings dialog box, you can create a new SNMP v1/v2cRun As account or use an existing one. If you use different SNMP community strings on different network devices, then you'll need to create separate Run As accounts for each device. Figure 6.5 shows an example of multiple Run As accounts being selected for a network discovery. Click Next to continue. Figure 6.5: Specifying Run As accounts. From the Devices dialog box you can choose to either hit the Import button to import a text file containing the IP addresses of your network devices (very useful when you have more than a few devices to monitor), or you can click the Add button to specify an individual network device.For this example, we'll click the Add button. If you choose the Import option here, then you can use a simple .txt or .csv file containing the IP addresses of each network device you wish to monitor. Make surethat each IP address is listed in its own separate line in the file. In the Add a Device dialog box, input an IP address or DNS name for your network device, choose which access mode you wish to use (SNMP, ICMP or both), select the SNMP version you wish to use (you can configure an SNMP v3 account at this point if you wish), then leave the Use selected default accounts option enabled, as shown inFigure 6.6 and hit OK. Figure 6.6: Configuring discovery settings. When configuring the Access Mode setting for a device, be aware that if you leave it as the default ICMP and SNMP option, then both access types must succeed before proceeding. This means that if ICMP can't Ping the device or SNMP can't connect, then the discovery fails. This is useful to know in case you have an internal firewall policy that blocks ICMP (Ping) traffic. In most cases, it's best to choose either one or the otherand not both here. In Figure 6.7 you can see some network devices specified and if you want to modify the number of retries and timeout thresholds, you can click the Advanced Discovery Settings button now. When you're happy enough to move on, click Next. Figure 6.7: Configuring discovery settings. In the Schedule Discovery dialog box, choose to run the discovery rule on a schedule, or to just run it manually. Running the discovery rule on a schedule can be useful if you work in an environment where network devices are added and removed on a regular basis or it can also be helpful if you want to minimize discovery traffic during office hours. As shown in Figure 6.8, we'll choose to run our discovery rule manually. Figure 6.8: Choosing when to run the discovery rule. Click Next to move on; at the Summary dialog box, hit the Create button to create your new discovery rule. If you see a warning pop up indicating that you need to distribute the new Run As accounts to the management server, click Yes to do so before clicking on the Close button to close the wizard and run the discovery rule automatically. When the discovery rule is finished processing, you should be able to see the number of network devices you specified show up in the Last Discovered column as shown in Figure 6.9. Figure 6.9: Successful processing of a discovery rule. Network Device Discovery Failure From time to time, it's not uncommon to have a device (or a number of devices) fail to be discovered. This can be a problem if you don't know where to find a list of these failed devices. 
In this instance, the failed device or devices will be listed in the Network Devices Pending Management view from within the Network Devices section of the Administration workspace. Once you've located the failed devices, you can use one of these options to try to rediscover them again: Right-click on the failed device from within the Network Devices Pending Management view and then click Submit Rediscovery. Rerun the discovery rule by right-clicking on the rule and selecting the Run option. Summary In this article, we gave you an overview of the Network Monitoring feature of OpsMgr and discussed what you need to have in place in your environment to ensure successful monitoring of your network devices. We demonstrated how to configure a discovery rule and bring some devices under monitoring and we walked you through using the built-in tasks, dashboards and reports that are specific to this feature. Resources for Article: Further resources on this subject: Getting Started with Force.com [article] Neutron API Basics [article] VM, It Is Not What You Think! [article]

Practical Big Data Exploration with Spark and Python

Anant Asthana
06 Jun 2016
6 min read
The reader of this post should be familiar with basic concepts of Spark, such as the shell and RDDs.
Data sizes have increased, but our exploration tools and techniques have not evolved as fast. Traditional Hadoop MapReduce jobs are cumbersome and time-consuming to develop, and Pig isn't quite as fully featured or easy to work with. Exploration can mean parsing/analyzing raw text documents, analyzing log files, processing tabular data in various formats, and exploring data that may or may not be correctly formatted. This is where a tool like Spark excels. It provides an interactive shell for quick processing, prototyping, exploring, and slicing and dicing data. Spark works with R, Scala, and Python. In conjunction with Jupyter notebooks, we get a clean web interface to write Python, R, or Scala code backed by a Spark cluster. Jupyter notebook is also a great tool for presenting our findings, since we can do inline visualizations and easily share them as a PDF on GitHub or through a web viewer. The power of this setup is that we make Spark do the heavy lifting while still having the flexibility to test code on a small subset of data via the interactive notebooks.
Another powerful capability of Spark is its DataFrames API. After we have cleaned our data (dealt with badly formatted rows that can't be loaded correctly), we can load it as a DataFrame. Once the data is loaded as a DataFrame, we can use Spark SQL to explore it. Since notebooks can be shared, this is also a great way to let the developers do the work of cleaning the data and loading it as a DataFrame; analysts, data scientists, and the like can then use this data for their tasks. DataFrames can also be exported as Hive tables, which are commonly used in Hadoop-based warehouses.
Examples: For this section, we will be using examples that I have uploaded on GitHub. These examples can be found here. In addition to the examples, a Docker container for running them has also been provided. The container runs Spark in pseudo-distributed mode and has Jupyter notebook configured to run Python/PySpark.
The basics: To set this up in your environment, you need a running Spark cluster with Jupyter notebook installed. Jupyter notebook, by default, only has the Python kernel configured; you can download additional kernels to run R and Scala. To run Jupyter notebook with PySpark, use the following command on your cluster:
IPYTHON_OPTS="notebook --pylab inline --notebook-dir=<directory to store notebooks>" MASTER=local[6] ./bin/pyspark
When you start Jupyter notebook in the way we mentioned earlier, it initializes a few critical variables. One of them is the Spark Context (sc), which is used to interact with all Spark-related tasks. The other is sqlContext, the Spark SQL context, which is used to interact with Spark SQL (create DataFrames, run queries, and so on). You need to understand the following:
Log Analysis
In this example, we use a log file from Apache Server. The code for this example can be found here. We load our log file using:
log_file = sc.textFile("../data/log_file.txt")
Spark can load files from HDFS, the local filesystem, and S3 natively. Libraries for other storage formats can be found freely on the Internet, or you could write your own (a blog post for another time). The previous command loads the log file.
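Putting the sampling and parsing steps described next together, a rough sketch might look like the following. This is not the repository's code; the log layout, field positions, and the body of create_schema shown here are assumptions for illustration only:

import shlex
from pyspark.sql import Row

# Peek at a few randomly sampled lines first to understand the layout.
for line in log_file.takeSample(True, 5):
    print(line)

def create_schema(line):
    # shlex keeps quoted values (request string, user agent) together as one field.
    fields = shlex.split(line)
    # Field positions assume a combined Apache access log; adjust them to the real file.
    return Row(host=fields[0], request=fields[5], status=fields[6])

# Drop any empty lines before parsing.
splits = log_file.filter(lambda line: len(line.strip()) > 0)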
We then use Python's native shlex library to split each line into fields and use Spark's map transformation to load them as a Row. An RDD consisting of Rows can easily be registered as a DataFrame. How we arrived at this solution is where data exploration comes in. We use Spark's takeSample method to sample the file and get five rows:
log_file.takeSample(True, 5)
These sample rows are helpful in determining how to parse and load the file. Once we have written our parsing code, we can apply it to the dataset using map to create a new RDD consisting of Rows, and test it on a subset of the data in a similar manner using the take or takeSample methods. The take method reads rows sequentially from the file, so although it is faster, it may not be a good representation of the dataset. The takeSample method, on the other hand, randomly picks sample rows from the file, which gives a better representation. To create the new RDD and register it as a DataFrame, we use the following code:
schema_DF = splits.map(create_schema).toDF()
Once we have created the DataFrame and tested it using take/takeSample to make sure that our loading code is working, we can register it as a table using the following:
sqlCtx.registerDataFrameAsTable(schema_DF, 'logs')
Once it is registered as a table, we can run SQL queries on the log file:
sample = sqlCtx.sql('SELECT * FROM logs LIMIT 10').collect()
Note that the collect() method collects the result into the driver's memory, so this may not be feasible for large datasets; use take/takeSample instead to sample data if your dataset is large. The beauty of using Spark with Jupyter is that all this exploration work takes only a few lines of code. It can be written interactively with all the trial and error we need, the processed data can be easily shared, and running interactive queries on this data is easy. Last but not least, this can easily scale to massive (GB, TB) datasets.
k-means on the Iris dataset
In this example, we use the Iris dataset, which contains measurements of sepal and petal length and width. This is a popular open source dataset used to showcase classification algorithms. Here, we use Spark's k-means algorithm from MLlib, Spark's machine learning library. The code and the output can be found here. We are not going to go into too much detail, since some of the concepts are outside the scope of this blog post. This example showcases how we load the Iris dataset and create a DataFrame with it. We then train a k-means model on this dataset and visualize our classification results. The power of this is that we carried out the somewhat complex task of parsing a dataset, creating a DataFrame, training a machine learning model, and visualizing the data in an interactive and scalable manner.
The repository contains several more examples. Feel free to reach out to me if you have any questions. If you would like to see more posts with practical examples, please let us know.
About the Author
Anant Asthana is a data scientist and principal architect at Pythian, and he can be found on GitHub at anantasty.

Logging and Monitoring

Packt
02 Jun 2016
17 min read
In this article by Hui-Chuan Chloe Lee, Hideto Saito, and Ke-Jou Carol Hsu, the authors of the book, Kubernetes Cookbook, we will cover the recipe Monitoring master and node. (For more resources related to this topic, see here.) Monitoring master and node Here comes a new level of view for your Kubernetes cluster. In this recipe, we are going to talk about monitoring. Through monitoring tool, users could not only know the resource consumption of workers, the nodes, but also the pods. It will help us to have a better efficiency on resource utilization. Getting ready Before we setup our monitoring cluster in Kubernetes system, there are two main prerequisites: One is to update the last version of binary files, which makes sure your cluster has stable and capable functionality The other one is to setup the DNS server A Kubernetes DNS server can reduce some steps and dependency for installing cluster-like pods. In here, it is easier to deploy a monitoring system in Kubernetes with a DNS server. In Kubernetes, how DNS server gives assistance in large-system deployment? The DNS server can support to resolve the name of Kubernetes service for every container. Therefore, while running a pod, we don't have to set specific IP of service for connecting to other pods. Containers in a pod just need to know the service's name. The daemon of node kubelet assign containers the DNS server by modifying the file /etc/resolv.conf. Try to check the file or use the command nslookup for verification after you have installed the DNS server: # kubectl exec <POD_NAME> [-c <CONTAINER_NAME>] -- cat /etc/resolv.conf // Check where the service "kubernetes" served # kubectl exec <POD_NAME> [-c <CONTAINER_NAME>] -- nslookup kubernetes Update Kubernetes to the latest version: 1.2.1 Updating the version of a running Kubernetes system is not such a trouble duty. You can simply follow the following steps. The procedure is similar to both master and node: Since we are going to upgrade every Kubernetes binary file, stop all of the Kubernetes services before you upgrade. For example, service <KUBERNETES_DAEMON> stop. Download the latest tarball file: version 1.2.1: # cd /tmp && wget https://storage.googleapis.com/kubernetes-release/release/v1.2.1/kubernetes.tar.gz Decompress the file at a permanent directory. We are going to use the add-on templates provided in official source files. These templates can help to create both DNS server and monitoring system: // Open the tarball under /opt # tar -xvf /tmp/kubernetes.tar.gz -C /opt/ // Go further decompression for binary files # cd /opt && tar -xvf /opt/kubernetes/server/kubernetes-server-linux-amd64.tar.gz Copy the new files and overwrite the old ones: # cd /opt/kubernetes/server/bin/ // For master, you should copy following files and confirm to overwrite # cp kubectl hypercube kube-apiserver kube-controller-manager kube-scheduler kube-proxy /usr/local/bin // For nodes, copy the below files # cp kubelet kube-proxy /usr/local/bin Finally, you can now start the system services. It is good to verify the version through the command line: # kubectl version Client Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.1", GitCommit:"50809107cd47a1f62da362bccefdd9e6f7076145", GitTreeState:"clean"} Server Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.1", GitCommit:"50809107cd47a1f62da362bccefdd9e6f7076145", GitTreeState:"clean"} As a reminder, you should update both master and node at the same time. 
Setup DNS server As mentioned, we will use the official template to build up the DNS server in our Kubernetes system. Two steps only. First, modify templates and create the resources. Then, we need to restart the kubelet daemon with DNS information. Start the server by template The add-on files of Kubernetes are located at <KUBERNETES_HOME>/cluster/addons/. According to last step, we can access the add-on files for DNS at /opt/kubernetes/cluster/addons/dns, and two template files are going to be modified and executed. Feel free to depend on the following steps: Copy the file from the format .yaml.in to YAML file, and we will edit the copied ones later: # cp skydns-rc.yaml.in skydns-rc.yaml Input variable Substitute value Example {{ pillar['dns_domain'] }} The domain of this cluster k8s.local {{ pillar['dns_replicas'] }} The number of relica for this replication controller 1 {{ pillar['dns_server'] }} The private IP of DNS server. Must also be in the CIDR of cluster 192.168.0.2   # cp skydns-svc.yaml.in skydns-svc.yaml In this two templates, replace the pillar variable, which is covered by double big parentheses with the items in this table. As you know, the default service kubernetes will occupy the first IP in CIDR. That's why we use IP 192.168.0.2 for our DNS server: # kubectl get svc NAME         CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE kubernetes   192.168.0.1   <none>        443/TCP   4d In the template for the replication controller, the file named skydns-rc.yaml, specify master URL in the container kube2sky: # cat skydns-rc.yaml (Ignore above lines) : - name: kube2sky   image: gcr.io/google_containers/kube2sky:1.14   resources:     limits:       cpu: 100m       memory: 200Mi     requests:       cpu: 100m       memory: 50Mi   livenessProbe:     httpGet:       path: /healthz       port: 8080       scheme: HTTP     initialDelaySeconds: 60     timeoutSeconds: 5     successThreshold: 1     failureThreshold: 5   readinessProbe:     httpGet:       path: /readiness       port: 8081       scheme: HTTP     initialDelaySeconds: 30     timeoutSeconds: 5   args:   # command = "/kube2sky"   - --domain=k8s.local   - --kube-master-url=<MASTER_ENDPOINT_URL>:<EXPOSED_PORT> : (Ignore below lines) After you finish the preceding steps for modification, you just start them using the subcommand create: # kubectl create -f skydns-svc.yaml service "kube-dns" created # kubectl create -f skydns-rc.yaml replicationcontroller "kube-dns-v11" created Enable Kubernetes DNS in kubelet Next, we have to access to each node and add DNS information in the daemon kubelet. The tags we used for cluster DNS are --cluster-dns, which assigns the IP of DNS server, and --cluster-domain, which defines the domain of the Kubernetes services: // For init service daemon # cat /etc/init.d/kubernetes-node (Ignore above lines) : # Start daemon. echo $"Starting kubelet: "         daemon $kubelet_prog                 --api_servers=<MASTER_ENDPOINT_URL>:<EXPOSED_PORT>                 --v=2                 --cluster-dns=192.168.0.2                 --cluster-domain=k8s.local                 --address=0.0.0.0                 --enable_server                 --hostname_override=${hostname}                 > ${logfile}-kubelet.log 2>&1 & : (Ignore below lines) // Or, for systemd service # cat /etc/kubernetes/kubelet (Ignore above lines) : # Add your own! KUBELET_ARGS="--cluster-dns=192.168.0.2 --cluster-domain=k8s.local" Now, it is good for you to restart either service kubernetes-node or just kubelet! 
And you can enjoy the cluster with the DNS server. How to do it… In this section, we will work on installing a monitoring system and introducing its dashboard. This monitoring system is based on Heapster (https://github.com/kubernetes/heapster), a resource usage collecting and analyzing tool. Heapster communicates with kubelet to get the resource usage of both machine and container. Along with Heapster, we have influxDB (https://influxdata.com) for storage, and Grafana (http://grafana.org) as the frontend dashboard, which visualizes the status of resources in several user-friendly plots. Install monitoring cluster If you have gone through the preceding section about the prerequisite DNS server, you must be very familiar with deploying the system with official add-on templates. Let's go check the directory cluster-monitoring under <KUBERNETES_HOME>/cluster/addons. There are different environments provided for deploying monitoring cluster. We choose influxdb in this recipe for demonstration: # cd /opt/kubernetes/cluster/addons/cluster-monitoring/influxdb && ls grafana-service.yaml      heapster-service.yaml             influxdb-service.yaml heapster-controller.yaml  influxdb-grafana-controller.yaml Under this directory, you can see three templates for services and two for replication controllers. We will retain most of the service templates as the original ones. Because these templates define the network configurations, it is fine to use the default settings but expose Grafana service: # cat heapster-service.yaml apiVersion: v1 kind: Service metadata:   name: monitoring-grafana   namespace: kube-system   labels:     kubernetes.io/cluster-service: "true"     kubernetes.io/name: "Grafana" spec:   type: NodePort   ports:     - port: 80       nodePort: 30000       targetPort: 3000   selector:     k8s-app: influxGrafana As you can find, we expose Grafana service with port 30000. This revision will let us be able to access the dashboard of monitoring from browser. On the other hand, the replication controller of Heapster and the one combining influxDB and Grafana require more additional editing to meet our Kubernetes system: # cat influxdb-grafana-controller.yaml (Ignored above lines) : - image: gcr.io/google_containers/heapster_grafana:v2.6.0-2           name: grafana           env:           resources:             # keep request = limit to keep this container in guaranteed class             limits:               cpu: 100m               memory: 100Mi             requests:               cpu: 100m               memory: 100Mi           env:             # This variable is required to setup templates in Grafana.             - name: INFLUXDB_SERVICE_URL               value: http://monitoring-influxdb.kube-system:8086             - name: GF_AUTH_BASIC_ENABLED               value: "false"             - name: GF_AUTH_ANONYMOUS_ENABLED               value: "true"             - name: GF_AUTH_ANONYMOUS_ORG_ROLE               value: Admin             - name: GF_SERVER_ROOT_URL               value: / : (Ignored below lines) For the container of Grafana, please change some environment variables. The first one is the URL of influxDB service. Since we set up the DNS server, we don't have to specify the particular IP address. But an extra-postfix domain should be added. It is because the service is created in the namespace kube-system. Without adding this postfix domain, DNS server cannot resolve monitoring-influxdb in the default namespace. Furthermore, the Grafana root URL should be changed to a single slash. 
Instead of the default URL, the root (/) makes Grafana transfer the correct webpage in the current system. In the template of Heapster, we run two Heapster containers in a pod. These two containers use the same image andhave similar settings, but actually, they take to different roles. We just take a look at one of them as an example of modification: # cat heapster-controller.yaml (Ignore above lines) :       containers:         - image: gcr.io/google_containers/heapster:v1.0.2           name: heapster           resources:             limits:               cpu: 100m               memory: 200Mi             requests:               cpu: 100m               memory: 200Mi           command:             - /heapster             - --source=kubernetes:<MASTER_ENDPOINT_URL>:<EXPOSED_PORT>?inClusterConfig=false             - --sink=influxdb:http://monitoring-influxdb.kube-system:8086             - --metric_resolution=60s : (Ignore below lines) At the beginning, remove all double-big-parentheses lines. These lines will cause creation error, since they could not be parsed or considered in the YAML format. Still, there are two input variables that need to be replaced to possible values. Replace {{ metrics_memory }} and {{ eventer_memory }} to 200Mi. The value 200MiB is a guaranteed amount of memory that the container could have. And please change the usage for Kubernetes source. We specify the full access URL and port, and disable ClusterConfig for refraining authentication. Remember to do an adjustment on both the heapster and eventer containers. At last, now you can create these items with simple commands: # kubectl create -f influxdb-service.yaml service "monitoring-influxdb" created # kubectl create -f grafana-service.yaml You have exposed your service on an external port on all nodes in your If you want to expose this service to the external internet, you may need to set up firewall rules for the service port(s) (tcp:30000) to serve traffic. See http://releases.k8s.io/release-1.2/docs/user-guide/services-firewalls.md for more details. service "monitoring-grafana" created # kubectl create -f heapster-service.yaml service "heapster" created # kubectl create -f influxdb-grafana-controller.yaml replicationcontroller "monitoring-influxdb-grafana-v3" created // Because heapster requires the DB server and service to be ready, schedule it as the last one to be created. # kubectl create -f heapster-controller.yaml replicationcontroller "heapster-v1.0.2" created Check your Kubernetes resources at namespace kube-system: # kubectl get svc --namespace=kube-system NAME                  CLUSTER-IP        EXTERNAL-IP   PORT(S)             AGE heapster              192.168.135.85    <none>        80/TCP              12m kube-dns              192.168.0.2       <none>        53/UDP,53/TCP       15h monitoring-grafana    192.168.84.223    nodes         80/TCP              12m monitoring-influxdb   192.168.116.162   <none>        8083/TCP,8086/TCP   13m # kubectl get pod --namespace=kube-system NAME                                   READY     STATUS    RESTARTS   AGE heapster-v1.0.2-r6oc8                  2/2       Running   0          4m kube-dns-v11-k81cm                     4/4       Running   0          15h monitoring-influxdb-grafana-v3-d6pcb   2/2       Running   0          12m Congratulations! Once you have all the pods in a ready state, let's check the monitoring dashboard. Introduce Grafana dashboard At this moment, the Grafana dashboard is available through nodes' endpoints. 
Please make sure whether node's firewall or security group on AWS have opened port 30000 to your local subnet. Take a look at the dashboard by browser. Type <NODE_ENDPOINT>:30000 in your URL searching bar: In the default setting, we have Cluster and Pods in these two dashboards. Cluster board covers nodes' resource utilization, such as CPU, memory, network transaction, and storage. Pods dashboard has similar plots for each pod and you can go watching deep into each container in a pod: As the previous images show, for example, we can observe the memory utilization of individual containers in the pod kube-dns-v11, which is the cluster of the DNS server. The purple lines in the middle just indicate the limitation we set to the container skydns and kube2sky. Create a new metric to monitor pod There are several metrics for monitoring offered by Heapster (https://github.com/kubernetes/heapster/blob/master/docs/storage-schema.md).We are going to show you how to create a customized panel by yourself. Please take the following steps as a reference: Go to the Pods dashboard and click on ADD ROW at the bottom of webpage. A green button will show up on the left-hand side. Choose to add a graph panel: First, give your panel a name. For example, CPU Rate. We would like to create the one showing the rate of CPU utility: Set up the parameters in the query as shown in the following screenshot: FROM: For this parameter input cpu/usage_rate WHERE: For this parameter set type = pod_container AND: Set this parameter with the namespace_name=$namespace, pod_name= $podname value GROUP BY: Enter tag(container_name) for this parameter ALIAS BY: For this parameter input $tag_container_name Good job! You can now save the pod by clicking on the icon at upper bar. Just try to discover more functionality of the Grafana dashboard and the Heapster monitoring tool. You will get more understanding about your system, services, and containers through the information from the monitoring system. Summary This recipe informs you how to monitor your master node and nodes in the Kubernetes system. Kubernetes is a project which keeps moving forward and upgrade at a fast speed. The recommended way for catching up to it is to check out new features on its official website: http://kubernetes.io; also, you can always get new Kubernetes on GitHub: https://github.com/kubernetes/kubernetes/releases. Making your Kubernetes system up to date and learning new features practically is the best method to access the Kubernetes technology continuously. Resources for Article: Further resources on this subject: Setting up a Kubernetes Cluster [article]

Identity and Access-Management Solutions for the IoT

Packt
02 Jun 2016
18 min read
In this article by Drew Van Duren and Brian Russell, the authors of the book Practical Internet of Things Security, we'll look at how establishing a structured identity namespace will significantly help manage the identities of the thousands to millions of devices that will eventually be added to your organization.
(For more resources related to this topic, see here.)
Establishing naming conventions and uniqueness requirements
Uniqueness is a feature that can be randomized or deterministic (for example, algorithmically sequenced); its only requirement is that no other identifier is identical to it. The simplest unique identifier is a counter: each value is assigned once and never repeats. Another is a static value combined with a counter, for example, a device manufacturer ID plus a product line ID plus a counter. In many cases, a random value is used in concert with static and counter fields. Nonrepetition is generally not enough from the manufacturer's perspective; usually, something needs a name that provides some context. To this end, manufacturer-unique fields may be added in a variety of ways particular to the manufacturer or in conformance with an industry convention. Uniqueness may also be fulfilled by using a universally unique identifier (UUID), for which the UUID standard specified in RFC 4122 applies. No matter the mechanism, so long as a device is able to be provisioned, an identifier that is nonrepeating and unique to its manufacturer, use, application, or a hybrid of all of these should be acceptable for use in identity management. Beyond the mechanisms, the only thing to be careful about is that the combination of all possible identifiers within a statically specified ID length should not be exhausted prematurely, if at all possible.
Once a method for assigning uniqueness to your IoT devices is established, the next step is to be able to logically identify the assets within their area of operation in order to support authentication and access-control functions.
Naming a device
Every time you access a restricted computing resource, your identity is checked to ensure that you are authorized to access that specific resource. There are many ways in which this can occur, but the end result of a successful implementation is that someone who does not have the right credentials is not allowed access. Although the process sounds simple, there are a number of difficult challenges that must be overcome when discussing identity and access management for the constrained and numerous devices that comprise the IoT.
One of the first challenges is related to the identity itself. Although identity may seem straightforward to you (your name, for example), that identity must be translated into a piece of information that the computing resource (or access-management system) understands. The identity must also not be duplicated across the information domain. Many computer systems today rely on a username, where each username within a domain is distinct. The username could be something as simple as <lastname_firstname_middleinitial>. In the case of the IoT, understanding what identities, or names, to provision to a device can cause confusion. As discussed, in some systems, devices use unique identifiers such as UUIDs or Electronic Serial Numbers (ESNs). We can see a good illustration by looking at how Amazon's first implementation of its IoT service makes use of IoT device serial numbers.
Amazon IoT includes a thing registry service that allows an administrator to register IoT devices, capturing for each the name of the thing and various attributes of it. The attributes can include data items such as: manufacturer type serial number deployment_date location Note that such attributes can be used in what is called attribute-based access control(ABAC). ABAC access approaches allow access decision policies to be defined not just by the identity of the device but also its properties (attributes). Rich, potentially complex rules can be defined for the needs at hand. The following figure provides a view of the AWS IoT service: Even when identifiers such as UUIDs or ESNs are available for an IoT device, these are generally not sufficient for securing authentication and access-control decisions; an identifier can easily be spoofed without enhancement through cryptographic controls. In these instances, administrators must bind another type of identifier to a device. This binding can be as simple as associating a password with the identifier or, more appropriately, using credentials such as digital certificates. IoT messaging protocols frequently include the ability to transmit a unique identifier. For example, MQTT includes a ClientID field that can transmit a broker-unique client identifier. In the case of MQTT, the ClientID value is used to maintain state within a unique broker-client communication session. Secure bootstrap Nothing is worse for security than an IoT-enabled system or network replete with false identities used in acts of identity theft, loss of private information, spoofing, and general mayhem. However, a difficult task in the identity lifecycle is to establish the initial trust in the device that allows it to bootstrap itself into the system. Among the greatest vulnerabilities to secure identity and access management is insecure bootstrapping. Bootstrapping represents the beginning of the process of provisioning a trusted identity to a device within a given system. Bootstrapping may begin in the manufacturing process (for example, in the foundry manufacturing a chip) and be completed once delivered to the end operator. It may also be completely performed in the hands of the end user or some intermediary (for example, the depot or supplier) once delivered. The most secure bootstrapping methods start in the manufacturing processes and implement discrete security associations throughout the supply chain. They uniquely identify a device through: Unique serial numbers printed on the device. Unique and unalterable identifiers stored and fused in device read-only memory(ROM). Manufacturer-specific cryptographic keys used only through specific lifecycle states to securely handoff the bootstrapping process to follow-on lifecycle states (for example, shipping, distribution, and handoff to an enrollment center). Such keys (frequently delivered outofband) are used for loading subsequent components by specific entities responsible for preparing the device. PKIs are often used to aid in the bootstrap process. Bootstrapping from a PKI perspective should generally involve the following processes: Devices shouldbe securely shipped from the manufacturer (via secureshipping servicescapable of tamperdetection) to a trusted facility or depot. The facility should have robust physical security access controls, record-keeping, and audit processes in addition to highly vetted staff. Device counts and batches should bematched against the shipping manifest. 
Once they have been received, the steps for each device include: Authenticating the device uniquely,using a customer-specific, default manufacturer authenticator (password or key). Installing PKI trust anchors and any intermediate public key certificates (for example, those of the registration authority, enrollment certificate authority, or other roots). Installing minimal network reachability information such that the device knows where to check certificate revocation lists, perform OCSP lookups, or perform other security-related functions. Provisioning the device PKI credentials (public key signed by CA) and private key(s) such that other entities possessing the signing CA keys can trust the new device. A secure bootstrapping process may not be identical to that described in the preceding list but should be one that mitigates the following types of threats and vulnerabilities when provisioning devices: Insider threats designed to introduce new, rogue, or compromised devices (whichshould not be trusted) Duplication (cloning) of devices no matter where in the lifecycle Introduction of public key trust anchors or other key material into a device that should notbe trusted (rogue trust anchors and other keys) Compromising (including replication) of a new IoT device's private keys during key generation or import into the device Gaps in device possession during the supply chain and enrollment processes Protection of the device when re-keying and assigning new identification material needed for normal use (re-bootstrapping as needed) Given the security-critical features of smart chip cards and their use in sensitive financial operations, the smartcard industry adopted rigid enrollment process controls not unlike those described above. Without them, severe attacks would have the potential of crippling the financial industry. Granted, many consumer-level IoT devices are unlikely to have secure bootstrap processes, but over time,the authorsbelieve that this will change, depending on the deployment environment and the stakeholders' appreciation of the threats. The more connected devices become, the more their potential to do harm. In practice, secure bootstrapping processes need to be tailored to the threat environment of the particular IoT device, its capabilities, and the network environment in question. The greater the potential risks, the more strict and thorough the bootstrapping process needs to be. Credential and attribute provisioning Once the foundation for identities within the device is laid, the provisioning of operational credentials and attributes can occur. These are the credentials that will be used within an IoT system for secure communications, authentication, and integrity protections. The authorsstrongly recommend using certificates for authentication and authorization whenever possible. If using certificates, an important and security-relevant consideration is whether to generate the key pairs on the device itself or centrally. Some IoT services allow the central (for example, by a key server) generation of public/private key pairs. While this can be an efficient method of bulk-provisioning thousands of devices with credentials, care should be taken to address potential vulnerabilities the process may expose (that is, the sending of sensitive, private key material through intermediary devices/systems). If centralized generation is used, it should make use of a strongly secured key-management system operated by vetted personnel in secured facilities. 
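As a rough illustration of the key material involved (independent of any particular IoT platform, and assuming a Node.js-based provisioning service purely for the sake of the example), an elliptic-curve key pair could be generated as follows. The curve and encodings are assumptions for the example; in a real deployment, the private key would be generated and held inside a hardened key-management system or, preferably, on the device itself:

import { generateKeyPairSync } from 'crypto';

// Generate a P-256 key pair, PEM-encoded for storage or transport.
const { publicKey, privateKey } = generateKeyPairSync('ec', {
    namedCurve: 'prime256v1',
    publicKeyEncoding: { type: 'spki', format: 'pem' },
    privateKeyEncoding: { type: 'pkcs8', format: 'pem' },
});

// The public key goes into a certificate request for the CA to sign;
// the private key must only ever travel over a strongly protected channel,
// if it has to travel at all.
console.log(publicKey);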
Another means of provisioning certificates is through local generation of the key pairs (directly on the IoT device) followed by transmission of the public key certificate through a certificate-signing request to the PKI. Absent well-secured bootstrapping procedures, additional policy controls will have to be established for the PKI's registration authority(RA) in order to verify the identity of the device being provisioned. In general, the more secure the bootstrapping process, the more automated the provisioning can be. The following is a sequence diagram that depicts an overall registration, enrollment, and provisioning flow for an IoT device: Local access There are times when local access to the device is required for administration purposes. This may require the provisioning of SSH keys or administrative passwords. In the past, organizations frequently made the mistake of sharing administrative passwords to allow easeofaccess to devices. This is not a recommended approach, although implementing a federated access solution for administrators can be daunting. This is especially true when devices are spread across wide geographic distances, such various sensors, gateways, and other unattended devices in the transportation industry. Account monitoring and control After accounts and credentials have been provisioned, accounts must continue to be monitored against defined security policies. It is also important that organizations monitor the strength of the credentials (that is, cryptographic cipher suites and key lengths) provisioned to IoT devices across their infrastructure. It is highly likely that pockets of teams will provision IoT subsystems on their own; therefore, defining, communicating, and monitoring the required security controls to apply to those systems is vital. Another aspect of monitoring relates to tracking the use of accounts and credentials. Assign someone to audit local IoT device administrative credential (passwords and SSH keys) use on a routine basis. Also, strongly consider whether privileged account-management tools can be applied to your IoT deployment. These tools havefeatures such as checking out administrative passwords to aid in audit processes. Account updates Credentials must be rotated on a regular basis; this is true for certificates and keys as well as passwords. Logistical impediments have historically hampered IT organizations' willingness to shorten certificate lifetimes and manage increasing numbers of credentials. There is a tradeoff to consider as short-lived credentials have a reduced attack footprint, yet the process of changing them tends to be expensive and time consuming. Whenever possible, look for automated solutions these processes. Services such as Let's Encrypt (https://letsencrypt.org/) are gaining popularity inhelping improve and simplify certificate-management practices for organizations. Let's Encrypt provides PKI services along with an extremely easy-to-use plugin-based client that supports various platforms. Account suspension Just as with user accounts, do not automatically delete IoT device accounts. Consider maintaining those accounts in a suspended state in case data tied to the accounts is required for forensic analysis at a later time. Account / credential deactivation/ deletion Deleting accounts used by IoT devices and the services they interact with will help combat the ability of an adversary to use those accounts to gain access after the devices have been decommissioned. 
Keys used for encryption (whether network or application) should also be deleted to keep adversaries from decrypting captured data later using those recovered keys. Authentication credentials IoT messaging protocols often support the ability to use different types of credentials for authentication with external services and other IoT devices. This section examines the typical options available for these functions. Passwords Some protocols, such as MQTT, only provide the ability to use a username/password combination for nativeprotocol authentication purposes. Within MQTT, the CONNECT message includes the fields for passing this information to an MQTT broker. In the MQTT version 3.1.1 specification defined by OASIS, you can see these fields within the CONNECT message (http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/os/mqtt-v3.1.1-os.html): Note that there are no protections applied to support the confidentiality of the username/password in transit by the MQTT protocol. Instead, implementers should consider using the Transport Layer Security (TLS) protocol to provide cryptographic protections. There are numerous security considerations related to using a username/password-based approach for IoT devices. Some of these concerns include: Difficulty in managing large numbers of device usernames and passwords Difficulty securing the passwords stored on the devices themselves Difficulty managing passwords throughout the device lifecycle Though not ideal, if you do plan on implementing a username/password system for IoT device authentication, consider taking these precautions: Create policies and procedures torotate passwords at least every 30 days for each device. Better yet, implement a technical control wherein the management interface automatically prompts you when password rotation is needed. Establish controls for monitoring device account activity. Establish controls for privileged accounts that support administrative access to IoT devices. Segregate the password-protected IoT devices into less-trusted networks. Symmetric keys Symmetric key material may also be used to authenticate. Message authentication codes (MACs) are generated using a MAC algorithm (such as HMAC and CMAC) with a shared key and known data (signed by the key). On the receiving side, an entity can prove that the sender possessed the preshared key when its computed MAC is shown to be identical to the received MAC. Unlike a password, symmetric keys do not require the key to be sent between the parties (except ahead of time or agreed on using a key-establishment protocol) at the time of the authentication event. The keys will either need to be established using a public key algorithm, input out of band, or sent to the devices ahead of time, encrypted using key encryption keys (KEKs). Certificates Digital certificates, based on public keys, are the preferred method of providing authentication functionality in the IoT. Although some implementations today may not support the processing capabilities needed to use certificates, Moore's law for computational power and storage is fast changing this. X.509 Certificates come with a highly organized hierarchical naming structure that consists of organizations, organizational units, and distinguished names(DNs) or common names(CNs). Referencing AWS support for provisioning X.509 certificates, we can see that AWS allowsone-click generation of a device certificate. In the following example, we generate a device certificate with a generic IoT device common name and a lifetime of 33 years. 
The one-click generation also (centrally) creates the public/private key pair. If possible, it is recommended that you generate your certificates locally by generating a key pair on the device and uploading a CSR to the AWS IoT service. This enables the customized tailoring of the certificate policy in order to define the hierarchical units (OU, DN, and so on) that are useful for additional authorization processes.

IEEE 1609.2

The IoT is characterized by many use cases involving machine-to-machine communication, and some of them involve communications through the congested wireless spectrum. Take connected vehicles, for instance: an emerging technology wherein your vehicle will possess onboard equipment (OBE) that frequently and automatically alerts other drivers in your vicinity to your car's location in the form of basic safety messages (BSM). The automotive industry, the US Department of Transportation (USDOT), and academia have been developing connected vehicle (CV) technology for many years, and it will make its commercial debut in the 2017 Cadillac. In a few years, it is likely that most new US vehicles will be outfitted with the technology. It will not only enable vehicle-to-vehicle communications but also vehicle-to-infrastructure (V2I) communications to various roadside and backhaul applications. The Dedicated Short Range Communications (DSRC) wireless protocol (based on IEEE 802.11p) is limited to a narrow set of channels in the 5-GHz frequency band. To accommodate so many vehicles and maintain security, it was necessary to secure the communications using cryptography (to reduce malicious spoofing or eavesdropping attacks) and minimize the security overhead within connected vehicles' BSM transmissions. The industry settled on a new, slimmer, and sleeker digital certificate design: IEEE 1609.2. The 1609.2 certificate format is advantageous in that it is approximately half the size of a typical X.509 certificate while still using strong, elliptic curve cryptographic algorithms (ECDSA and ECDH). The certificate is also useful for general machine-to-machine communication through its unique attributes, including explicit application identifier (SSID) and credential holder permission (SSP) fields. These attributes can allow IoT applications to make explicit access-control decisions without having to internally or externally query the credential holder's permissions; they're embedded right in the certificate during the secure, integrated bootstrap and enrollment process with the PKI. The reduced size of these credentials also makes them attractive for other, bandwidth-constrained wireless protocols.

Biometrics

There is work being done in the industry today on new approaches that leverage biometrics for device authentication. The FIDO Alliance (www.fidoalliance.org) has developed specifications that define the use of biometrics both for a passwordless experience and for use as a second authentication factor. Authentication can include a range of flexible biometric types, from fingerprints to voiceprints. Biometric authentication is already being added to some commercial IoT devices (for example, consumer door locks), and there is interesting potential in leveraging biometrics as a second factor of authentication for IoT systems. As an example, voiceprints can be used to enable authentication across a set of distributed IoT devices, such as roadside equipment (RSE) in the transportation sector. This would allow an RSE technician to access the device through a cloud connection to the backend authentication server.
Companies such as Hypr Biometric Security (https://www.hypr.com/) are leading the way toward using this technology to reduce the need for passwords and enable more robust authentication techniques.

New work in authorization for the IoT

Progress toward using tokens with resource-constrained IoT devices has not fully matured; however, there are organizations working on defining the use of protocols such as OAuth 2.0 for the IoT. One such group is the Internet Engineering Task Force (IETF), through the Authentication and Authorization for Constrained Environments (ACE) effort. ACE has specified RFC 7744, Use Cases for Authentication and Authorization in Constrained Environments (https://datatracker.ietf.org/doc/rfc7744/). The RFC use cases are primarily based on IoT devices that employ CoAP as the messaging protocol, and together they clarify the need for a comprehensive IoT authentication and authorization strategy. RFC 7744 provides valuable considerations for the authentication and authorization of IoT devices, including these:

Devices may host several resources, each of which requires its own access-control policy.
A single device may have different access rights for different requesting entities.
Policy decision points must be able to evaluate the context of a transaction. This includes the potential for understanding that a transaction is occurring during an emergency situation.
The ability to dynamically control authorization policies is critical to supporting the dynamic environment of the IoT.

IoT IAM infrastructure

Now that we have addressed many of the enablers of identity and access management, it is important to elaborate on how solutions are realized in infrastructures. This section is primarily devoted to public key infrastructures (PKIs) and their utility in securing IAM deployments for the IoT.

802.1x

802.1x authentication mechanisms can be employed to limit IP-based IoT device access to a network. Note, though, that not all IoT devices rely on the provisioning of an IP address. While it cannot accommodate all IoT device types, implementing 802.1x is a component of a good access-control strategy that addresses many use cases. Enabling 802.1x authentication requires an access device and an authentication server. The access device is typically an access point, and the authentication server can take the form of a RADIUS or authentication, authorization, and accounting (AAA) server.

Summary

This article provided an introduction to the infrastructure components required for provisioning authentication credentials, with a heavy focus on PKI. A look at different types of authentication credentials was given, and new approaches to providing authorization and access control for IoT devices were also discussed.

Resources for Article: Further resources on this subject: Internet of Things with BeagleBone [article] The Internet of Things [article] Internet of Things with Xively [article]


Understanding Patterns and Architectures in TypeScript

Packt
01 Jun 2016
19 min read
In this article by Vilic Vane, author of the book TypeScript Design Patterns, we'll study architecture and patterns that are closely related to the language or its common applications. Many topics in this article are related to asynchronous programming. We'll start from a web architecture for Node.js that's based on Promise. This is a larger topic that has interesting ideas involved, including abstractions of response and permission, as well as error handling tips. Then, we'll talk about how to organize modules with ES module syntax. Due to the limited length of this article, some of the related code is aggressively simplified, and nothing more than the idea itself can be applied practically. (For more resources related to this topic, see here.)

Promise-based web architecture

The most exciting thing about Promise may be the benefits brought to error handling. In a Promise-based architecture, throwing an error could be safe and pleasant. You don't have to explicitly handle errors when chaining asynchronous operations, and this makes it tougher for mistakes to occur. With the growing usage of ES2015-compatible runtimes, Promise is already there out of the box. We actually have plenty of polyfills for Promises (including my ThenFail, written in TypeScript), as the people who write JavaScript are, roughly speaking, the same group of people who like to reinvent wheels. Promises work great with other Promises:

A Promises/A+ compatible implementation should work with other Promises/A+ compatible implementations
Promises do their best in a Promise-based architecture

If you are new to Promise, you may complain about trying Promise with a callback-based project. You may intend to use helpers provided by Promise libraries, such as Promise.all, but it turns out that you have better alternatives, such as the async library. So, the reason that makes you decide to switch should not be these helpers (as there are a lot of them for callbacks). It should be because there's an easier way to handle errors, or because you want to take advantage of the ES async and await features, which are based on Promise.

Promisifying existing modules or libraries

Though Promises do their best with a Promise-based architecture, it is still possible to begin using Promise with a smaller scope by promisifying existing modules or libraries. Taking Node.js style callbacks as an example, this is how we use them:

import * as FS from 'fs';

FS.readFile('some-file.txt', 'utf-8', (error, text) => {
    if (error) {
        console.error(error);
        return;
    }
    console.log('Content:', text);
});

You may expect a promisified version of readFile to look like the following:

FS
    .readFile('some-file.txt', 'utf-8')
    .then(text => {
        console.log('Content:', text);
    })
    .catch(reason => {
        console.error(reason);
    });

Implementing the promisified version of readFile can be as easy as the following:

function readFile(path: string, options: any): Promise<string> {
    return new Promise((resolve, reject) => {
        FS.readFile(path, options, (error, result) => {
            if (error) {
                reject(error);
            } else {
                resolve(result);
            }
        });
    });
}

I am using any here for the parameter options to reduce the size of the demo code, but I would suggest that you do not use any whenever possible in practice. There are libraries that are able to promisify methods automatically. Unfortunately, you may need to write declaration files yourself for the promisified methods if there is no declaration file of the promisified version available.
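Once a promisified wrapper such as the readFile function above exists, it pairs naturally with ES async and await (assuming a runtime or compilation target that supports them); newer versions of Node.js also ship a util.promisify helper that performs this kind of wrapping automatically. The following is only a usage sketch of the wrapper we just defined:

async function printContent(path: string): Promise<void> {
    try {
        // readFile is the promisified wrapper defined above
        let text = await readFile(path, 'utf-8');
        console.log('Content:', text);
    } catch (reason) {
        console.error(reason);
    }
}

printContent('some-file.txt');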
Views and controllers in Express Many of us may have already been working with frameworks such as Express. This is how we render a view or send back JSON data in Express: import * as Path from 'path'; import * as express from 'express';   let app = express();   app.set('engine', 'hbs'); app.set('views', Path.join(__dirname, '../views'));   app.get('/page', (req, res) => {     res.render('page', {         title: 'Hello, Express!',         content: '...'     }); });   app.get('/data', (req, res) => {     res.json({         version: '0.0.0',         items: []     }); });   app.listen(1337); We will usuallyseparate controller from routing, as follows: import { Request, Response } from 'express';   export function page(req: Request, res: Response): void {     res.render('page', {         title: 'Hello, Express!',         content: '...'     }); } Thus, we may have a better idea of existing routes, and we may have controllers managed more easily. Furthermore, automated routing can be introduced so that we don't always need to update routing manually: import * as glob from 'glob';   let controllersDir = Path.join(__dirname, 'controllers');   let controllerPaths = glob.sync('**/*.js', {     cwd: controllersDir });   for (let path of controllerPaths) {     let controller = require(Path.join(controllersDir, path));     let urlPath = path.replace(/\/g, '/').replace(/.js$/, '');       for (let actionName of Object.keys(controller)) {         app.get(             `/${urlPath}/${actionName}`, controller[actionName] );     } } The preceding implementation is certainly too simple to cover daily usage. However, it displays the one rough idea of how automated routing could work: via conventions that are based on file structures. Now, if we are working with asynchronous code that is written in Promises, an action in the controller could be like the following: export function foo(req: Request, res: Response): void {     Promise         .all([             Post.getContent(),             Post.getComments()         ])         .then(([post, comments]) => {             res.render('foo', {                 post,                 comments             });         }); } We use destructuring of an array within a parameter. Promise.all returns a Promise of an array with elements corresponding to values of resolvablesthat are passed in. (A resolvable means a normal value or a Promise-like object that may resolve to a normal value.) However, this is not enough, we need to handle errors properly. Or in some case, the preceding code may fail in silence (which is terrible). In Express, when an error occurs, you should call next (the third argument that is passed into the callback) with the error object, as follows: import { Request, Response, NextFunction } from 'express';   export function foo( req: Request, res: Response, next: NextFunction ): void {     Promise         // ...         .catch(reason => next(reason)); } Now, we are fine with the correctness of this approach, but this is simply not how Promises work. Explicit error handling with callbacks could be eliminated in the scope of controllers, and the easiest way to do this is to return the Promise chain and hand over to code that was previously performing routing logic. 
So, the controller could be written like the following: export function foo(req: Request, res: Response) {     return Promise         .all([             Post.getContent(),             Post.getComments()         ])         .then(([post, comments]) => {             res.render('foo', {                 post,                 comments             });         }); } Or, can we make this even better? Abstraction of response We've already been returning a Promise to tell whether an error occurs. So, for a server error, the Promise actually indicates the result, or in other words, the response of the request. However, why we are still calling res.render()to render the view? The returned Promise object could be an abstraction of the response itself. Think about the following controller again: export class Response {}   export class PageResponse extends Response {     constructor(view: string, data: any) { } }   export function foo(req: Request) {     return Promise         .all([             Post.getContent(),             Post.getComments()         ])         .then(([post, comments]) => {             return new PageResponse('foo', {                 post,                 comments             });         }); } The response object that is returned could vary for a different response output. For example, it could be either a PageResponse like it is in the preceding example, a JSONResponse, a StreamResponse, or even a simple Redirection. As in most of the cases, PageResponse or JSONResponse is applied, and the view of a PageResponse can usually be implied with the controller path and action name.It is useful to have these two responses automatically generated from a plain data object with proper view to render with, as follows: export function foo(req: Request) {     return Promise         .all([             Post.getContent(),             Post.getComments()         ])         .then(([post, comments]) => {             return {                 post,                 comments             };         }); } This is how a Promise-based controller should respond. With this idea in mind, let's update the routing code with an abstraction of responses. Previously, we were passing controller actions directly as Express request handlers. Now, we need to do some wrapping up with the actions by resolving the return value, and applying operations that are based on the resolved result, as follows: If it fulfills and it's an instance of Response, apply it to the resobjectthat is passed in by Express. If it fulfills and it's a plain object, construct a PageResponse or a JSONResponse if no view found and apply it to the resobject. If it rejects, call thenext function using this reason. As seen previously,our code was like the following: app.get(`/${urlPath}/${actionName}`, controller[actionName]); Now, it gets a little bit more lines, as follows: let action = controller[actionName];   app.get(`/${urlPath}/${actionName}`, (req, res, next) => {     Promise         .resolve(action(req))         .then(result => {             if (result instanceof Response) {                 result.applyTo(res);             } else if (existsView(actionName)) {                 new PageResponse(actionName, result).applyTo(res);             } else {                 new JSONResponse(result).applyTo(res);             }         })         .catch(reason => next(reason)); });   However, so far we can only handle GET requests as we hardcoded app.get() in our router implementation. The poor view matching logic can hardly be used in practice either. 
We need to make these actions configurable, and ES decorators could perform a good job here: export default class Controller { @get({     View: 'custom-view-path' })     foo(req: Request) {         return {             title: 'Action foo',             content: 'Content of action foo'         };     } } I'll leave the implementation to you, and feel free to make them awesome. Abstraction of permission Permission plays an important role in a project, especially in systems that have different user groups. For example, a forum. The abstraction of permission should be extendable to satisfy changing requirements, and it should be easy to use as well. Here, we are going to talk about the abstraction of permission in the level of controller actions. Consider the legibility of performing one or more actions a privilege. The permission of a user may consist of several privileges, and usually most of the users at the same level would have the same set of privileges. So, we may have a larger concept, namely groups. The abstraction could either work based on both groups and privileges, or work based on only privileges (groups are now just aliases to sets of privileges): Abstraction that validates based on privileges and groups at the same time is easier to build. You do not need to create a large list of which actions can be performed for a certain group of user, as granular privileges are only required when necessary. Abstraction that validates based on privileges has better control and more flexibility to describe the permission. For example, you can remove a small set of privileges from the permission of a user easily. However, both approaches have similar upper-level abstractions, and they differ mostly on implementations. The general structure of the permission abstractions that we've talked about is like in the following diagram: The participants include the following: Privilege: This describes detailed privilege corresponding to specific actions Group: This defines a set of privileges Permission: This describes what a user is capable of doing, consist of groups that the user belongs to, and the privileges that the user has. Permission descriptor: This describes how the permission of a user works and consists of possible groups and privileges. Expected errors A great concern that was wiped away after using Promises is that we do not need to worry about whether throwing an error in a callback would crash the application most of the time. The error will flow through the Promises chain and if not caught, it will be handled by our router. Errors can be roughly divided as expected errors and unexpected errors. Expected errors are usually caused by incorrect input or foreseeable exceptions, and unexpected errors are usually caused by bugs or other libraries that the project relies on. For expected errors, we usually want to give users a friendly response with readable error messages and codes. So that the user can help themselves searching the error or report to us with useful context. For unexpected errors, we would also want a reasonable response (usually a message described as an unknown error), a detailed server-side log (including real error name, message, stack information, and so on), and even alerts to let the team know as soon as possible. 
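Before moving on to how expected errors are defined, here is one minimal way the permission participants described earlier could be modeled. The names and the simple privilege check are illustrative assumptions rather than the author's implementation, and a real permission descriptor would layer validation rules on top of this:

type Privilege = string; // for example, 'post.create' or 'post.delete'

interface Group {
    name: string;
    privileges: Privilege[];
}

interface Permission {
    groups: Group[];
    privileges: Privilege[]; // granular privileges granted directly
}

// Collect every privilege granted to a user, either directly
// or through membership of a group.
function grantedPrivileges(permission: Permission): Set<Privilege> {
    let granted = new Set<Privilege>(permission.privileges);

    for (let group of permission.groups) {
        for (let privilege of group.privileges) {
            granted.add(privilege);
        }
    }

    return granted;
}

function hasPrivilege(permission: Permission, privilege: Privilege): boolean {
    return grantedPrivileges(permission).has(privilege);
}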
Defining and throwing expected errors The router will need to handle different types of errors, and an easy way to achieve this is to subclass a universal ExpectedError class and throw its instances out, as follows: import ExtendableError from 'extendable-error';   class ExpectedError extends ExtendableError { constructor(     message: string,     public code: number ) {     super(message); } } The extendable-error is a package of mine that handles stack trace and themessage property. You can directly extend Error class as well. Thus, when receiving an expected error, we can safely output the error name and message as part of the response. If this is not an instance of ExpectedError, we can display predefined unknown error messages. Transforming errors Some errors such as errors that are caused by unstable networks or remote services are expected.We may want to catch these errors and throw them out again as expected errors. However, it could be rather trivial to actually do this. A centralized error transforming process can then be applied to reduce the efforts required to manage these errors. The transforming process includes two parts: filtering (or matching) and transforming. These are the approaches to filter errors: Filter by error class: Many third party libraries throws error of certain class. Taking Sequelize (a popular Node.js ORM) as an example, it has DatabaseError, ConnectionError, ValidationError, and so on. By filtering errors by checking whether they are instances of a certain error class, we may easily pick up target errors from the pile. Filter by string or regular expression: Sometimes a library might be throw errors that are instances of theError class itself instead of its subclasses.This makes these errors hard to distinguish from others. In this situation, we can filter these errors by their message with keywords or regular expressions. Filter by scope: It's possible that instances of the same error class with the same error message should result in a different response. One of the reasons may be that the operation throwing a certain error is at a lower-level, but it is being used by upper structures within different scopes. Thus, a scope mark can be added for these errors and make it easier to be filtered. There could be more ways to filter errors, and they are usually able to cooperate as well. By properly applying these filters and transforming errors, we can reduce noises, analyze what's going on within a system,and locate problems faster if they occur. Modularizing project Before ES2015, there are actually a lot of module solutions for JavaScript that work. The most famous two of them might be AMD and CommonJS. AMD is designed for asynchronous module loading, which is mostly applied in browsers. While CommonJSperforms module loading synchronously, and this is the way that the Node.js module system works. To make it work asynchronously, writing an AMD module takes more characters. Due to the popularity of tools, such asbrowserify and webpack, CommonJS becomes popular even for browser projects. Proper granularity of internal modules can help a project keep a healthy structure. Consider project structure like the following: project├─controllers├─core│  │ index.ts│  ││  ├─product│  │   index.ts│  │   order.ts│  │   shipping.ts│  ││  └─user│      index.ts│      account.ts│      statistics.ts│├─helpers├─models├─utils└─views Let's assume that we are writing a controller file that's going to import a module defined by thecore/product/order.ts file. 
Previously, usingCommonJS style'srequire, we would write the following: const Order = require('../core/product/order'); Now, with the new ES import syntax, this would be like the following: import * as Order from '../core/product/order'; Wait, isn't this essentially the same? Sort of. However, you may have noticed several index.ts files that I've put into folders. Now, in the core/product/index.tsfile, we could have the following: import * as Order from './order'; import * as Shipping from './shipping';   export { Order, Shipping } Or, we could also have the following: export * from './order'; export * from './shipping'; What's the difference? The ideal behind these two approaches of re-exporting modules can vary. The first style works better when we treat Order and Shipping as namespaces, under which the identifier names may not be easy to distinguish from one another. With this style, the files are the natural boundaries of building these namespaces. The second style weakens the namespace property of two files, and then uses them as tools to organize objects and classes under the same larger category. A good thingabout using these files as namespaces is that multiple-level re-exporting is fine, while weakening namespaces makes it harder to understand different identifier names as the number of re-exporting levels grows. Summary In this article, we discussed some interesting ideas and an architecture formed by these ideas. Most of these topics focused on limited examples, and did their own jobs.However, we also discussed ideas about putting a whole system together. Resources for Article: Further resources on this subject: Introducing Object Oriented Programmng with TypeScript [article] Writing SOLID JavaScript code with TypeScript [article] Optimizing JavaScript for iOS Hybrid Apps [article]


Classifier Construction

Packt
01 Jun 2016
8 min read
In this article by Pratik Joshi, author of the book Python Machine Learning Cookbook, we will build a simple classifier using supervised learning, and then go onto build a logistic-regression classifier. Building a simple classifier In the field of machine learning, classification refers to the process of using the characteristics of data to separate it into a certain number of classes. A supervised learning classifier builds a model using labeled training data, and then uses this model to classify unknown data. Let's take a look at how to build a simple classifier. (For more resources related to this topic, see here.) How to do it… Before we begin, make sure thatyou have imported thenumpy and matplotlib.pyplot packages. After this, let's create some sample data: X = np.array([[3,1], [2,5], [1,8], [6,4], [5,2], [3,5], [4,7], [4,-1]]) Let's assign some labels to these points: y = [0, 1, 1, 0, 0, 1, 1, 0] As we have only two classes, the list y contains 0s and 1s. In general, if you have N classes, then the values in y will range from 0 to N-1. Let's separate the data into classes that are based on the labels: class_0 = np.array([X[i] for i in range(len(X)) if y[i]==0]) class_1 = np.array([X[i] for i in range(len(X)) if y[i]==1]) To get an idea about our data, let's plot this, as follows: plt.figure() plt.scatter(class_0[:,0], class_0[:,1], color='black', marker='s') plt.scatter(class_1[:,0], class_1[:,1], color='black', marker='x') This is a scatterplot where we use squares and crosses to plot the points. In this context,the marker parameter specifies the shape that you want to use. We usesquares to denote points in class_0 and crosses to denote points in class_1. If you run this code, you will see the following figure: In the preceding two lines, we just use the mapping between X and y to create two lists. If you were asked to inspect the datapoints visually and draw a separating line, what would you do? You would simply draw a line in between them. Let's go ahead and do this: line_x = range(10) line_y = line_x We just created a line with the mathematical equation,y = x. Let's plot this, as follows: plt.figure() plt.scatter(class_0[:,0], class_0[:,1], color='black', marker='s') plt.scatter(class_1[:,0], class_1[:,1], color='black', marker='x') plt.plot(line_x, line_y, color='black', linewidth=3) plt.show() If you run this code, you should see the following figure: There's more… We built a really simple classifier using the following rule: the input point (a, b) belongs to class_0 if a is greater than or equal tob;otherwise, it belongs to class_1. If you inspect the points one by one, you will see that this is true. This is it! You just built a linear classifier that can classify unknown data. It's a linear classifier because the separating line is a straight line. If it's a curve, then it becomes a nonlinear classifier. This formation worked fine because there were a limited number of points, and we could visually inspect them. What if there are thousands of points? How do we generalize this process? Let's discuss this in the next section. Building a logistic regression classifier Despite the word regression being present in the name, logistic regression is actually used for classification purposes. Given a set of datapoints, our goal is to build a model that can draw linear boundaries between our classes. It extracts these boundaries by solving a set of equations derived from the training data. 
Let's see how to do that in Python: We will use the logistic_regression.pyfile that is already provided to you as a reference. Assuming that you have imported the necessary packages, let's create some sample data along with training labels: X = np.array([[4, 7], [3.5, 8], [3.1, 6.2], [0.5, 1], [1, 2], [1.2, 1.9], [6, 2], [5.7, 1.5], [5.4, 2.2]]) y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2]) Here, we assume that we have three classes. Let's initialize the logistic regression classifier: classifier = linear_model.LogisticRegression(solver='liblinear', C=100) There are a number of input parameters that can be specified for the preceding function, but a couple of important ones are solver and C. The solverparameter specifies the type of solver that the algorithm will use to solve the system of equations. The C parameter controls the regularization strength. A lower value indicates higher regularization strength. Let's train the classifier: classifier.fit(X, y) Let's draw datapoints and boundaries: plot_classifier(classifier, X, y) We need to define this function: def plot_classifier(classifier, X, y):     # define ranges to plot the figure     x_min, x_max = min(X[:, 0]) - 1.0, max(X[:, 0]) + 1.0     y_min, y_max = min(X[:, 1]) - 1.0, max(X[:, 1]) + 1.0 The preceding values indicate the range of values that we want to use in our figure. These values usually range from the minimum value to the maximum value present in our data. We add some buffers, such as 1.0 in the preceding lines, for clarity. In order to plot the boundaries, we need to evaluate the function across a grid of points and plot it. Let's go ahead and define the grid: # denotes the step size that will be used in the mesh grid     step_size = 0.01       # define the mesh grid     x_values, y_values = np.meshgrid(np.arange(x_min, x_max, step_size), np.arange(y_min, y_max, step_size)) The x_values and y_valuesvariables contain the grid of points where the function will be evaluated. Let's compute the output of the classifier for all these points: # compute the classifier output     mesh_output = classifier.predict(np.c_[x_values.ravel(), y_values.ravel()])       # reshape the array     mesh_output = mesh_output.reshape(x_values.shape) Let's plot the boundaries using colored regions: # Plot the output using a colored plot     plt.figure()       # choose a color scheme     plt.pcolormesh(x_values, y_values, mesh_output, cmap=plt.cm.Set1) This is basically a 3D plotter that takes the 2D points and the associated values to draw different regions using a color scheme. You can find all the color scheme options athttp://matplotlib.org/examples/color/colormaps_reference.html. Let's overlay the training points on the plot: plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='black', linewidth=2, cmap=plt.cm.Paired)       # specify the boundaries of the figure     plt.xlim(x_values.min(), x_values.max())     plt.ylim(y_values.min(), y_values.max())       # specify the ticks on the X and Y axes     plt.xticks((np.arange(int(min(X[:, 0])-1), int(max(X[:, 0])+1), 1.0)))     plt.yticks((np.arange(int(min(X[:, 1])-1), int(max(X[:, 1])+1), 1.0)))       plt.show() Here, plt.scatter plots the points on the 2D graph. TheX[:, 0] specifies that we should take all the values along axis 0 (X-axis in our case), and X[:, 1] specifies axis 1 (Y-axis). The c=y parameter indicates the color sequence. We use the target labels to map to colors using cmap. We basically want different colors based on the target labels; hence, we use y as the mapping. 
The limits of the display figure are set using plt.xlim and plt.ylim. In order to mark the axes with values, we need to use plt.xticks and plt.yticks. These functions mark the axes with values so that it's easier for us to see where the points are located. In the preceding code, we want the ticks to lie between the minimum and maximum values with a buffer of 1 unit. We also want these ticks to be integers. So, we use theint() function to round off the values. If you run this code, you should see the following output: Let's see how the Cparameter affects our model. The C parameter indicates the penalty for misclassification. If we set this to 1.0, we will get the following figure: If we set C to 10000, we get the following figure: As we increase C, there is a higher penalty for misclassification. Hence, the boundaries get more optimal. Summary We successfully employed supervised learning to build a simple classifier. We subsequently went on to construct a logistic-regression classifier and saw different results of tweaking C—the regularization strength parameter. Resources for Article:   Further resources on this subject: Python Scripting Essentials [article] Web scraping with Python (Part 2) [article] Web Server Development [article]

Understanding UIKit Fundamentals

Packt
01 Jun 2016
9 min read
In this article by Jak Tiano, author of the book Learning Xcode, we're mostly going to be talking about concepts rather than concrete code examples. Since we've been using UIKit throughout the whole book (and we will continue to do so), I'm going to do my best to elaborate on some things we've already seen and give you new information that you can apply to what we do in the future. (For more resources related to this topic, see here) As we've heard a lot about UIKit. We've seen it at the top of our Swift files in the form of import UIKit. We've used many of the UI elements and classes it provides for us. Now, it's time to take an isolated look at the biggest and most important framework in iOS development. Application management Unlike most other frameworks in the iOS SDK, UIKit is deeply integrated into the way your app runs. That's because UIKit is responsible for some of the most essential functionalities of an app. It also manages your application's window and view architecture, which we'll be talking about next. It also drives the main run loop, which basically means that it is executing your program. The UIDevice class In addition to these very important features, UIKit also gives you access to some other useful information about the device the app is currently running on through the UIDevice class. Using online resources and documentation: Since this article is about exploring frameworks, it is a good time to remind you that you can (and should!) always be searching online for anything and everything. For example, if you search for UIDevice, you'll end up on Apple's developer page for the UIDevice class, where you can see even more bits of information that you can pull from it. As we progress, keep in mind that searching the name of a class or framework will usually give you quick access to the full documentation. Here are some code examples of the information you can access: UIDevice.currentDevice().name UIDevice.currentDevice().model UIDevice.currentDevice().orientation UIDevice.currentDevice().batteryLevel UIDevice.currentDevice().systemVersion Some developers have a little bit of fun with this information: for example, Snapchat gives you a special filter to use for photos when your battery is fully charged.Always keep an open mind about what you can do with data you have access to! Views One of the most important responsibilities of UIKit is that it provides views and the view hierarchy architecture. We've talked before about what a view is within the MVC programming paradigm, but here we're referring to the UIView class that acts as the base for (almost) all of our visual content in iOS programming. While it wasn't too important to know about when just getting our feet wet, now is a good time to really dig in a bit and understand what UIViews are and how they work both on their own and together. Let's start from the beginning: a view (UIView) defines a rectangle on your screen that is responsible for output and input, meaning drawing to the screen and receiving touch events.It can also contain other views, known as subviews, which ultimately create a view hierarchy. As a result of this hierarchy, we have to be aware of the coordinate systems involved. Now, let's talk about each of these three functions: drawing, hierarchies, and coordinate systems. Drawing Each UIView is responsible for drawing itself to the screen. In order to optimize drawing performance, the views will usually try to render their content once and then reuse that image content when it doesn't change. 
It can even move and scale content around inside of it without needing to redraw, which can be an expensive operation: An overview of how UIView draws itself to the screen With the system provided views, all of this is handled automatically. However, if you ever need to create your own UIView subclass that uses custom drawing, it's important to know what goes on behind the scenes. To implement custom drawing in a view, you need to implement the drawRect() function in your subclass. When something changes in your view, you need to call the setNeedsDisplay() function, which acts as a marker to let the system know that your view needs to be redrawn. During the next drawing cycle, the code in your drawRect() function will be executed to refresh the content of your view, which will then be cached for performance. A code example of this custom drawing functionality is a bit beyond the scope of this article, but discussing this will hopefully give you a better understanding of how drawing works in addition to giving you a jumping off point should you need to do this in the future. Hierarchies Now, let's discuss view hierarchies. When we would use a view controller in a storyboard, we would drag UI elements onto the view controller. However, what we were actually doing is adding a subview to the base view of the view controller. And in fact, that base view was a subview of the UIWindow, which is also a UIView. So, though, we haven't really acknowledged it, we've already put view hierarchies to work many times. The easiest way to think about what happens in a view hierarchy is that you set one view's parent coordinate system relative to another view. By default, you'd be setting a view's coordinate system to be relative to the base view, which is normally just the whole screen. But you can also set the parent coordinate system to some other view so that when you move or transform the parent view, the children views are moved and transformed along with it. Example of how parenting works with a view hierarchy. It's also important to note that the view hierarchy impacts the draw order of your views. All of a view's subviews will be drawn on top of the parent view, and the subviews will be drawn in the order they were added (the last subview added will be on top). To add a subview through code, you can use the addSubview() function. Here's an example: var view1 = UIView() var view2 = UIView() view1.addSubview(view2) The top-most views will intercept a touch first, and if it doesn't respond, it will pass it down the view hierarchy until a view does respond. Coordinate systems With all of this drawing and parenting, we need to take a minute to look at how the coordinate system works in UIKit for our views.The origin (0,0 point) in UIKit is the top left of the screen, and increases along X to the right, and increases on the Y downward. Each view is placed in this upper-left positioning system relative to its parent view's origin. Be careful! Other frameworks in iOS use different coordinate systems. For example, SpriteKit uses the lower-left corner as the origin. Each view also has its own setof positioning information. This is composed of the view's frame, bounds, and center. The frame rectangle describes the origin and the size of view relative to its parent view's coordinate system. The bounds rectangle describes the origin and the size of the view from its local coordinate system. The center is just the center point of the view relative to the parent view. 
When dealing with so many different coordinate systems, it can seem like a nightmare to compare positions from different views. Luckily, the UIView class provides a simple convertPoint()function to convert points between systems. Try running this little experiment in a playground to see how the point gets converted from one view's coordinate system to the other: import UIKit let view1 = UIView(frame: CGRect(x: 0, y: 0, width: 50, height: 50)) let view2 = UIView(frame: CGRect(x: 10, y: 10, width: 30, height: 30)) view1.addSubview(view2) let pointFrom1 = CGPoint(x: 20, y: 20) let pointFromView2 = view1.convertPoint(pointFrom1, toView: view2) Hopefully, you now have a much better understanding of some of the underlying workings of the view system in UIKit. Documents, displays, printing, and more In this section, I'm going to do my best to introduce you to the many additional features of the UIKit framework. The idea is to give you a better understanding of what is possible with UIKit, and if anything sounds interesting to you, you can go off and explore these features on your own. Documents UIKit has built in support for documents, much like you'd find on a desktop operating system. Using the UIDocument class, UIKit can help you save and load documents in the background in addition to saving them to iCloud. This could be a powerful feature for any app that allows the user to create content that they expect to save and resume working on later. Displays On most new iOS devices, you can connect external screens via HDMI. You can take advantage of these external displays by creating a new instance of the UIWindow class, and associating it with the external display screen. You can then add subviews to that window to create a secondscreen experience for devices like a bigscreen TV. While most consumers don't ever use HDMI-connected external displays, this is a great feature to keep in mind when working on internal applications for corporate or personal use. Printing Using the UIPrintInteractionController, you can set up and send print jobs to AirPrint-enabled printers on the user's network. Before you print, you can also create PDFs by drawing content off screen to make printing easier. And more! There are many more features of UIKit that are just waiting to be explored! To be honest, UIKit seems to be pretty much a dumping ground for any general features that were just a bit too small to deserve their own framework. If you do some digging in Apple's documentation, you'll find all kinds of interesting things you can do with UIKit, such as creating custom keyboards, creating share sheets, and custom cut-copy-paste support. Summary In this article, we looked at the biggest and most important UIKit and learned about some of the most important system processes like the view hierarchy. Resources for Article:   Further resources on this subject: Building Surveys using Xcode [article] Run Xcode Run [article] Tour of Xcode [article]


Webhooks in Slack

Packt
01 Jun 2016
11 min read
In this article by Paul Asjes, the author of the book, Building Slack Bots, we'll have a look at webhooks in Slack. (For more resources related to this topic, see here.) Slack is a great way of communicating at your work environment—it's easy to use, intuitive, and highly extensible. Did you know that you can make Slack do even more for you and your team by developing your own bots? This article will teach you how to implement incoming and outgoing webhooks for Slack, supercharging your Slack team into even greater levels of productivity. The programming language we'll use here is JavaScript; however, webhooks can be programmed with any language capable of HTTP requests. Webhooks First let's talk basics: a webhook is a way of altering or augmenting a web application through HTTP methods. Webhooks allow us to post messages to and from Slack using regular HTTP requests with a JSON payloads. What makes a webhook a bot is its ability to post messages to Slack as if it were a bot user. These webhooks can be divided into incoming and outgoing webhooks, each with their own purposes and uses. Incoming webhooks An example of an incoming webhook is a service that relays information from an external source to a Slack channel without being explicitly requested, such as GitHub Slack integration: The GitHub integration posts messages about repositories we are interested in In the preceding screenshot, we see how a message was sent to Slack after a new branch was made on a repository this team was watching. This data wasn't explicitly requested by a team member but automatically sent to the channel as a result of the incoming webhook. Other popular examples include Jenkins integration, where infrastructure changes can be monitored in Slack (for example, if a server watched by Jenkins goes down, a warning message can be posted immediately to a relevant Slack channel). Let's start with setting up an incoming webhook that sends a simple "Hello world" message: First, navigate to the Custom Integration Slack team page, as shown in the following screenshot (https://my.slack.com/apps/build/custom-integration): The various flavors of custom integrations Select Incoming WebHooks from the list and then select which channel you'd like your webhook app to post messages to: Webhook apps will post to a channel of your choosing Once you've clicked on the Add Incoming WebHooks integration button, you will be presented with this options page, which allows you to customize your integration a little further: Names, descriptions, and icons can be set from this menu Set a customized icon for your integration (for this example, the wave emoji was used) and copy down the webhook URL, which has the following format:https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX This generated URL is unique to your team, meaning that any JSON payloads sent via this URL will only appear in your team's Slack channels. Now, let's throw together a quick test of our incoming webhook in Node. Start a new Node project (remember: you can use npm init to create your package.json file) and install the superagent AJAX library by running the following command in your terminal: npm install superagent –save Create a file named index.js and paste the following JavaScript code within it: const WEBHOOK_URL = [YOUR_WEBHOOK_URL]; const request = require('superagent'); request .post(WEBHOOK_URL) .send({ text: 'Hello! I am an incoming Webhook bot!' 
}) .end((err, res) => { console.log(res); }); Remember to replace [YOUR_WEBHOOK_URL] with your newly generated URL, and then run the program by executing the following command: nodemon index.js Two things should happen now: firstly, a long response should be logged in your terminal, and secondly, you should see a message like the following in the Slack client: The incoming webhook equivalent of "hello world" The res object we logged in our terminal is the response from the AJAX request. Taking the form of a large JavaScript object, it displays information about the HTTP POST request we made to our webhook URL. Looking at the message received in the Slack client, notice how the name and icon are the same ones we set in our integration setup on the team admin site. Remember that the default icon, name, and channel are used if none are provided, so let's see what happens when we change that. Replace your request AJAX call in index.js with the following: request .post(WEBHOOK_URL) .send({ username: "Incoming bot", channel: "#general", icon_emoji: ":+1:", text: 'Hello! I am different from the previous bot!' }) .end((err, res) => { console.log(res); }); Save the file, and nodemon will automatically restart the program. Switch over to the Slack client and you should see a message like the following pop up in your #general channel: New name, icon, and message In place of icon_emoji, you could also use icon_url to link to a specific image of your choosing. If you wish your message to be sent only to one user, you can supply a username as the value for the channel property: channel: "@paul" This will cause the message to be sent from within the Slackbot direct message. The message's icon and username will match either what you configured in the setup or set in the body of the POST request. Finally, let's look at sending links in our integration. Replace the text property with the following and save index.js: text: 'Hello! Here is a fun link: <http://www.github.com|Github is great!>' Slack will automatically parse any links it finds, whether it's in the http://www.example.com or www.example.com formats. By enclosing the URL in angled brackets and using the | character, we can specify what we would like the URL to be shown as: Formatted links are easier to read than long URLs For more information on message formatting, visit https://api.slack.com/docs/formatting. Note that as this is a custom webhook integration, we can change the name, icon, and channel of the integration. If we were to package the integration as a Slack app (an app installable by other teams), then it is not possible to override the default channel, username, and icon set. Incoming webhooks are triggered by external sources; an example would be when a new user signs up to your service or a product is sold. The goal of the incoming webhook is to provide information to your team that is easy to reach and comprehend. The opposite of this would be if you wanted users to get data out of Slack, which can be done via the medium of outgoing webhooks. Outgoing webhooks Outgoing webhooks differ from the incoming variety in that they send data out of Slack and to a service of your choosing, which in turn can respond with a message to the Slack channel. To set up an outgoing webhook, visit the custom integration page of your Slack team's admin page again—https://my.slack.com/apps/build/custom-integration—and this time, select the Outgoing WebHooks option. On the next screen, be sure to select a channel, name, and icon. 
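As an aside before we wire up the outgoing webhook below: because an incoming webhook is nothing more than an HTTP endpoint, the same payloads shown above can be posted from any language, not just Node. The following is a minimal, hedged sketch in Python using the requests library (assumed to be installed with pip install requests); the webhook URL is a placeholder, and the username, channel, icon, and link-formatted text are simply the values from the preceding examples.

import requests  # assumed installed: pip install requests

# Placeholder: substitute the webhook URL generated for your own team
WEBHOOK_URL = "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"

payload = {
    "username": "Incoming bot",   # optional overrides, as in the Node examples
    "channel": "#general",
    "icon_emoji": ":+1:",
    "text": "Hello! Here is a fun link: <http://www.github.com|Github is great!>",
}

# Slack expects the message as a JSON payload in the request body
response = requests.post(WEBHOOK_URL, json=payload)
print(response.status_code, response.text)

A successful post should come back as HTTP 200 with a short ok body. With that aside out of the way, let's return to the outgoing webhook we just started configuring.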
Notice how there is a target URL field to be filled in; we will fill this out shortly. When an outgoing webhook is triggered in Slack, an HTTP POST request is made to the URL (or URLs, as you can specify multiple ones) you provide. So first, we need to build a server that can accept our webhook. In index.js, paste the following code: 'use strict'; const http = require('http'); // create a simple server with node's built in http module http.createServer((req, res) => { res.writeHead(200, {'Content-Type': 'text/plain'}); // get the data embedded in the POST request req.on('data', (chunk) => { // chunk is a buffer, so first convert it to // a string and split it to make it more legible as an array console.log('Body:', chunk.toString().split('&')); }); // create a response let response = JSON.stringify({ text: 'Outgoing webhook received!' }); // send the response to Slack as a message res.end(response); }).listen(8080, '0.0.0.0'); console.log('Server running at http://0.0.0.0:8080/'); Notice how we require the http module despite not installing it with NPM. This is because the http module is a core Node dependency and is automatically included with your installation of Node. In this block of code, we start a simple server on port 8080 and listen for incoming requests. In this example, we set our server to run at 0.0.0.0 rather than localhost. This is important as Slack is sending a request to our server, so it needs to be accessible from the Internet. Setting the IP of your server to 0.0.0.0 tells Node to use your computer's network-assigned IP address. Therefore, if you set the IP of your server to 0.0.0.0, Slack can reach your server by hitting your IP on port 8080 (for example, http://123.456.78.90:8080). If you are having trouble with Slack reaching your server, it is most likely because you are behind a router or firewall. To circumvent this issue, you can use a service such as ngrok (https://ngrok.com/). Alternatively, look at port forwarding settings for your router or firewall. Let's update our outgoing webhook settings accordingly: The outgoing webhook settings, with a destination URL Save your settings and run your Node app; test whether the outgoing webhook works by typing a message into the channel you specified in the webhook's settings. You should then see something like this in Slack: We built a spam bot Well, the good news is that our server is receiving requests and returning a message to send to Slack each time. The issue here is that we skipped over the Trigger Word(s) field in the webhook settings page. Without a trigger word, any message sent to the specified channel will trigger the outgoing webhook. This causes our webhook to be triggered by a message sent by the outgoing webhook in the first place, creating an infinite loop. To fix this, we could do one of two things: Refrain from returning a message to the channel when listening to all the channel's messages. Specify one or more trigger words to ensure we don't spam the channel. Returning a message is optional yet encouraged to ensure a better user experience. Even a confirmation message such as Message received! is better than no message as it confirms to the user that their message was received and is being processed. Let's therefore presume we prefer the second option, and add a trigger word: Trigger words keep our webhooks organized Let's try that again, this time sending a message with the trigger word at the beginning of the message. 
Restart your Node app and send a new message: Our outgoing webhook app now functions a lot like our bots from earlier Great, now switch over to your terminal and see what that message logged: Body: [ 'token=KJcfN8xakBegb5RReelRKJng', 'team_id=T000001', 'team_domain=buildingbots', 'service_id=34210109492', 'channel_id=C0J4E5SG6', 'channel_name=bot-test', 'timestamp=1460684994.000598', 'user_id=U0HKKH1TR', 'user_name=paul', 'text=webhook+hi+bot%21', 'trigger_word=webhook' ] This array contains the body of the HTTP POST request sent by Slack; in it, we have some useful data, such as the user's name, the message sent, and the team ID. We can use this data to customize the response or to perform some validation to make sure the user is authorized to use this webhook. In our response, we simply sent back a Message received string; however, like with incoming webhooks, we can set our own username and icon. The channel cannot be different from the channel specified in the webhook's settings, however. The same restrictions apply when the webhook is not a custom integration. This means that if the webhook was installed as a Slack app for another team, it can only post messages as the username and icon specified in the setup screen. An important thing to note is that webhooks, either incoming or outgoing, can only be set up in public channels. This is predominantly to discourage abuse and uphold privacy, as we've seen that it's simple to set up a webhook that can record all the activity on a channel. Summary In this article, you learned what webhooks are and how you can use them to get data in and out of Slack. You learned how to send messages as a bot user and how to interact with your users in the native Slack client. Resources for Article: Further resources on this subject: Keystone – OpenStack Identity Service[article] A Sample LEMP Stack[article] Implementing Stacks using JavaScript[article]
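As a closing illustration for this article, the outgoing webhook receiver we built in Node could be sketched with nothing but Python's standard library. Treat the following as a rough, hedged equivalent rather than the article's own implementation: the port and the reply text simply mirror the Node example, and no trigger-word or token validation is shown.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

class OutgoingWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Slack sends the outgoing webhook data as a form-encoded body
        length = int(self.headers.get('Content-Length', 0))
        fields = parse_qs(self.rfile.read(length).decode('utf-8'))
        # fields includes token, channel_name, user_name, text, trigger_word, and so on
        print('Body:', fields)

        # Reply with a JSON message that Slack will post back to the channel
        body = json.dumps({'text': 'Outgoing webhook received!'}).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == '__main__':
    # 0.0.0.0 so that Slack can reach the server from the Internet, as discussed above
    HTTPServer(('0.0.0.0', 8080), OutgoingWebhookHandler).serve_forever()

As with the Node version, remember to set one or more trigger words in the webhook settings so that the reply does not re-trigger the webhook and spam the channel.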

Linking Data to Shapes

Packt
01 Jun 2016
7 min read
In this article by David J Parker, the author of the book Mastering Data Visualization with Microsoft Visio Professional 2016, discusses about that Microsoft introduced the current data-linking feature in the Professional edition of Visio Professional 2007. This feature is better than the database add-on that has been around since Visio 4 because it has greater importing capabilities and is part of the core product with its own API. This provides the Visio user with a simple method of surfacing data from a variety of data sources, and it gives the power user (or developer) the ability to create productivity enhancements in code. (For more resources related to this topic, see here.) Once data is imported in Visio, the rows of data can be linked to shapes and then displayed visually, or be used to automatically create hyperlinks. Moreover, if the data is edited outside of Visio, then the data in the Visio shapes can be refreshed, so the shapes reflect the updated data. This can be done in the Visio client, but some data sources can also refresh the data in Visio documents that are displayed in SharePoint web pages. In this way, Visio documents truly become operational intelligence dashboards. Some VBA knowledge will be useful, and the sample data sources are introduced in each section. In this chapter, we shall cover the following topics: The new Quick Import feature Importing data from a variety of sources How to link shapes to rows of data Using code for more linking possibilities A very quick introduction to importing and linking data Visio Professional 2016 added more buttons to the Data ribbon tab, and some new Data Graphics, but the functionality has basically been the same since Visio 2007 Professional. The new additions, as seen in the following screenshot, can make this particular ribbon tab quite wide on the screen. Thank goodness that wide screens have become the norm: The process to create data-refreshable shapes in Visio is simply as follows: Import data as recordsets. Link rows of data to shapes. Make the shapes display the data. Use any hyperlinks that have been created automatically. The Quick Import tool introduced in Visio 2016 Professional attempts to merge the first three steps into one, but it rarely gets it perfectly, and it is only for simple Excel data sources. Therefore, it is necessary to learn how to use the Custom Import feature properly. Knowing when to use the Quick Import tool The Data | External Data | Quick Import button is new in Visio 2016 Professional. It is part of the Visio API, so it cannot be called in code. This is not a great problem because it is only a wrapper for some of the actions that can be done in code anyway. This feature can only use an Excel workbook, but fortunately Visio installs a sample OrgData.xls file in the Visio Content<LCID> folder. The LCID (Location Code Identifier) for US English is 1033, as shown in the following screenshot: The screenshot shows a Visio Professional 2016 32-bit installation is on a Windows 10 64-bit laptop. Therefore, the Office16 applications are installed in the Program Files (x86)root folder. It would just be Program Filesroot if the 64-bit version of Office was installed. It is not possible to install a different bit version of Visio than the rest of the Office applications. There is no root folder in previous versions of Office, but the rest of the path is the same. 
The full path on this laptop is C:Program Files (x86)Microsoft OfficerootOffice16Visio Content1033ORGDATA.XLS, but it is best to copy this file to a folder where it can be edited. It is surprising that the Excel workbook is in the old binary format, but it is a simple process to open and save it in the new Open Packaging Convention file format with an xlsx extension. Importing to shapes without existing Shape Data rows The following example contains three Person shapes from the Work Flow Objects stencil, and each one contains the names of a person’s name, spelt exactly the same as in the key column on the Excel worksheet. It is not case sensitive, and it does not matter whether there are leading or trailing spaces in the text. When the Quick Import button is pressed, a dialog opens up to show the progress of the stages that the wizard feature is going through, as shown in the following screenshot: If the workbook contains more than one table of data, the user is prompted to select the range of cells within the workbook. When the process is complete, each of the Person shapes contains all of the data from the row in the External Data recordset, where the text matches the Name column, as shown in the following screenshot: The linked rows in the External Data window also display a chain icon, and the right-click menu has many actions, such as selecting the Linked Shapes for a row. Conversely, each shape now contains a right-mouse menu action to select the linked row in an External Data recordset. The Quick Import feature also adds some default data graphics to each shape, which will be ignored in this chapter because it is explored in detail in chapter 4, Using the Built-in Data Graphics. Note that the recordset in the External Data window is named Sheet1$A1:H52. This is not perfect, but the user can rename it through the right mouse menu actions of the tab. The Properties dialog, as seen in the following screenshot: The user can also choose what to do if a data link is added to a shape that already has one. A shape can be linked to a single row in multiple recordsets, and a single row can be linked to multiple shapes in a document, or even on the same page. However, a shape cannot be linked to more than one row in the same recordset. Importing to shapes with existing Shape Data rows The Person shape from the Resources stencil has been used in the following example, and as earlier, each shape has the name text. However, in this case, there are some existing Shape Data rows: When the Quick Import feature is run, the data is linked to each shape where the text matches the Name column value. This feature has unfortunately created a problem this time because the Phone Number, E-mail Alias, and Manager Shape Data rows have remained empty, but the superfluous Telephone, E-mail, and Reports_To have been added. The solution is to edit the column headers in the worksheet to match the existing Shape Data row labels, as shown in the following screenshot: Then, when Quick Import is used again, the column headers will match the Shape Data row names, and the data will be automatically cached into the correct places, as shown in the following screenshot: Using the Custom Import feature The user has more control using the Custom Import button on the Data | External Data ribbon tab. This button was called Link Data to Shapes in the previous versions of Visio. 
In either case, the action opens the Data Selector dialog, as shown in the following screenshot: Each of these data sources will be explained in this chapter, along with the two data sources that are not available in the UI (namely XML files and SQL Server Stored Procedures). Summary This article has gone through the many different sources for importing data in Visio and has shown how each can be done. Resources for Article: Further resources on this subject: Overview of Process Management in Microsoft Visio 2013[article] Data Visualization[article] Data visualization[article]

Holistic View on Spark

Packt
31 May 2016
19 min read
In this article by Alex Liu, author of the book Apache Spark Machine Learning Blueprints, the author talks about a new stage of utilizing Apache Spark-based systems to turn data into insights. (For more resources related to this topic, see here.) According to research done by Gartner and others, many organizations lost a huge amount of value only due to the lack of a holistic view of their business. In this article, we will review the machine learning methods and the processes of obtaining a holistic view of business. Then, we will discuss how Apache Spark fits in to make the related computing easy and fast, and at the same time, with one real-life example, illustrate this process of developing holistic views from data using Apache Spark computing, step by step as follows: Spark for a holistic view Methods for a holistic view Feature preparation Model estimation Model evaluation Results Explanation Deployment Spark for holistic view Spark is very suitable for machine-learning projects such as ours to obtain a holistic view of business as it enables us to process huge amounts of data fast, and it enables us to code complicated computation easily. In this section, we will first describe a real business case and then describe how to prepare the Spark computing for our project. The use case The company IFS sells and distributes thousands of IT products and has a lot of data on marketing, training, team management, promotion, and products. The company wants to understand how various kinds of actions, such as that in marketing and training, affect sales teams’ success. In other words, IFS is interested in finding out how much impact marketing, training, or promotions have generated separately. In the past, IFS has done a lot of analytical work, but all of it was completed by individual departments on soloed data sets. That is, they have analytical results about how marketing affects sales from using marketing data alone, and how training affects sales from analyzing training data alone. When the decision makers collected all the results together and prepared to make use of them, they found that some of the results were contradicting with each other. For example, when they added all the effects together, the total impacts were beyond their intuitively imaged. This is a typical problem that every organization is facing. A soloed approach with soloed data will produce an incomplete view, and also an often biased view, or even conflicting views. To solve this problem, analytical teams need to take a holistic view of all the company data, and gather all the data into one place, and then utilize new machine learning approaches to gain a holistic view of business. To do so, companies also need to care for the following: The completeness of causes Advanced analytics to account for the complexity of relationships Computing the complexity related to subgroups and a big number of products or services For this example, we have eight datasets that include one dataset for marketing with 48 features, one dataset for training with 56 features, and one dataset for team administration with 73 features, with the following table as a complete summary: Category Number of Features Team 73 Marketing 48 Training 56 Staffing 103 Product 77 Promotion 43 Total 400 In this company, researchers understood that pooling all the data sets together and building a complete model was the solution, but they were not able to achieve it for several reasons. 
Besides organizational issues inside the corporation, tech capability to store all the data, to process all the data quickly with the right methods, and to present all the results in the right ways with reasonable speed were other challenges. At the same time, the company has more than 100 products to offer for which data was pooled together to study impacts of company interventions. That is, calculated impacts are average impacts, but variations among products are too large to ignore. If we need to assess impacts for each product, parallel computing is preferred and needs to be implemented at good speed. Without utilizing a good computing platform such as Apache Spark meeting the requirements that were just described is a big challenge for this company. In the sections that follow, we will use modern machine learning on top of Apache Spark to attack this business use case and help the company to gain a holistic view of their business. In order to help readers learn machine learning on Spark effectively, discussions in the following sections are all based on work about this real business use case that was just described. But, we left some details out to protect the company’s privacy and also to keep everything brief. Distributed computing For our project, parallel computing is needed for which we should set up clusters and worker notes. Then, we can use the driver program and cluster manager to manage the computing that has to be done in each worker node. As an example, let's assume that we choose to work within Databricks’ environment: The users can go to the main menu, as shown in the preceding screenshot, click Clusters. A Window will open for users to name the cluster, select a version of Spark, and then specify number of workers. Once the clusters are created, we can go to the main menu, click the down arrow on the right-hand side of Tables. We then choose Create Tables to import our datasets that were cleaned and prepared. For the data source, the options include S3, DBFS, JDBC, and File (for local fields). Our data has been separated into two subsets, one to train and one to test each product, as we need to train a few models per product. In Apache Spark, we need to direct workers to complete computation on each note. We will use scheduler to get Notebook computation completed on Databricks, and collect the results back, which will be discussed in the Model Estimation section. Fast and easy computing One of the most important advantages of utilizing Apache Spark is to make coding easy for which several approaches are available. Here for this project, we will focus our effort on the notebook approach, and specifically, we will use the R notebooks to develop and organize codes. At the same time, with an effort to illustrate the Spark technology more thoroughly, we will also use MLlib directly to code some of our needed algorithms as MLlib has been seamlessly integrated with Spark. In the Databricks’ environment, setting up notebooks will take the following steps: As shown in the preceding screenshot, users can go to the Databricks main menu, click the down arrow on the right-hand side of Workspace, and choose Create -> Notebook to create a new notebook. A table will pop up for users to name the notebook and also select a language (R, Python, Scala, or SQL). In order to make our work repeatable and also easy to understand, we will adopt a workflow approach that is consistent with the RM4Es framework. 
We will also adopt Spark’s ML Pipeline tools to represent our workflows whenever possible. Specifically, for the training data set, we need to estimate models, evaluate models, then maybe to re-estimate the models again before we can finalize our models. So, we need to use Spark’s Transformer, Estimator, and Evaluator to organize an ML pipeline for this project. In practice, we can also organize these workflows within the R notebook environment. For more information about pipeline programming, please go to http://spark.apache.org/docs/latest/ml-guide.html#example-pipeline.  & http://spark.apache.org/docs/latest/ml-guide.html Once our computing platform is set up and our framework is cleared, everything becomes clear too. In the following sections, we will move forward step by step. We will use our RM4Es framework and related processes to identity equations or methods and then prepare features first. The second step is to complete model estimations, the third is to evaluate models, and the fourth is to explain our results. Finally, we will deploy the models. Methods for a holistic view In this section, we need to select our analytical methods or models (equations), which is to complete a task of mapping our business use case to machine learning methods. For our use case of assessing impacts of various factors on sales team success, there are many suitable models for us to use. As an exercise, we will select: regression models, structural equation models, and decision trees, mainly for their easiness to interpret as well as their implementation on Spark. Once we finalize our decision for analytical methods or models, we will need to prepare the dependent variable and also prepare to code, which we will discuss one by one. Regression modeling To get ready for regression modeling on Spark, there are three issues that you have to take care of, as follows: Linear regression or logistic regression: Regression is the most mature and also most widely-used model to represent the impacts of various factors on one dependent variable. Whether to use linear regression or logistic regression depends on whether the relationship is linear or not. We are not sure about this, so we will use adopt both and then compare their results to decide on which one to deploy. Preparing the dependent variable: In order to use logistic regression, we need to recode the target variable or dependent variable (the sales team success variable now with a rating from 0 to 100) to be 0 versus 1 by separating it with the medium value. Preparing coding: In MLlib, we can use the following codes for regression modeling as we will use Spark MLlib’s Linear Regression with Stochastic Gradient Descent (LinearRegressionWithSGD): val numIterations = 90 val model = LinearRegressionWithSGD.train(TrainingData, numIterations) For logistic regression, we use the following codes: val model = new LogisticRegressionWithSGD() .setNumClasses(2) .run(training) For more about using MLlib for regression modeling, please go to: http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-least-squares-lasso-and-ridge-regression In R, we can use the lm function for linear regression, and the glm function for logistic regression with family=binomial(). 
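For readers who prefer Python to Scala or R, the same two regression models can be sketched against PySpark's MLlib API (the RDD-based spark.mllib package). This is a hedged outline only: it assumes an existing SparkContext and that the training data has already been converted into RDDs of LabeledPoint, one labeled with the continuous rating s1 and one with the binary flag s2; the variable names are illustrative and not from the book.

from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD
from pyspark.mllib.classification import LogisticRegressionWithSGD

# trainingS1: RDD of LabeledPoint labeled with the 0-100 rating s1
# trainingS2: RDD of LabeledPoint labeled with the 0/1 success flag s2

# Linear regression, mirroring LinearRegressionWithSGD.train in the Scala snippet
numIterations = 90
linearModel = LinearRegressionWithSGD.train(trainingS1, iterations=numIterations)

# Logistic regression for the binary success flag
logisticModel = LogisticRegressionWithSGD.train(trainingS2, iterations=100)

# Scoring a single observation (features is a list or MLlib Vector of the selected features)
# predictedRating = linearModel.predict(features)
# predictedSuccess = logisticModel.predict(features)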
SEM aproach To get ready for Structural Equation Modeling (SEM) on Spark, there are also three issues that we need to take care of as follows: SEM introduction specification: SEM may be considered as an extension of regression modeling, as it consists of several linear equations that are similar to regression equations. But, this method estimates all the equations at the same time regarding their internal relations, so it is less biased than regression modeling. SEM consists of both structural modeling and latent variable modeling, but for us, we will only use structural modeling. Preparing dependent variable: We can just use the sales team success scale (rating of 0 to 100) as our target variable here. Preparing coding: We will adopt the R notebook within the Databricks environment, for which we should use the R package SEM. There are also other SEM packages, such as lavaan, that are available to use, but for this project, we will use the SEM package for its easiness to learn. Loading SEM package into the R notebook, we will use install.packages("sem", repos="http://R-Forge.R-project.org"). Then, we need to perform the R code of library(sem). After that, we need to use the specify.model() function to write some codes to specify models into our R notebook, for which the following codes are needed: mod.no1 <- specifyModel() s1 <- x1, gam31 s1 <- x2, gam32 Decision trees To get ready for the decision tree modeling on Spark, there are also three issues that we need to take care of as follows: Decision tree selection: Decision tree aims to model classifying cases, which is about classifying elements into success or not success for our use case. It is also one of the most mature and widely-used methods. For this exercise, we will only use the simple linear decision tree, and we will not venture into any more complicated trees, such as random forest. Prepare the dependent variable: To use the decision tree model here, we will separate the sales team rating into two categories of SUCCESS or NOT as we did for logistic regression. Prepare coding: For MLlib, we can use the following codes: val numClasses = 2 val categoricalFeaturesInfo = Map[Int, Int]() val impurity = "gini" val maxDepth = 6 val maxBins = 32 val model = DecisionTree.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins)   For more information about using MLlib for decision tree, please go to: http://spark.apache.org/docs/latest/mllib-decision-tree.html As for the R notebook on Spark, we need to use an R package of rpart, and then use the rpart functions for all the calculation. For rpart, we need to specify the classifier and also all the features that have to be used. Model estimation Once feature sets get finalized, what follows is to estimate the parameters of the selected models. We can use either MLlib or R here to do this, and we need to arrange distributed computing. To simplify this, we can utilize the Databricks’ Job feature. Specifically, within the Databricks environment, we can go to Jobs, then create jobs, as shown in the following screenshot: Then, users can select what notebooks to run, specify clusters, and then schedule jobs. Once scheduled, users can also monitor the running notebooks, and then collect results back. In Section II, we prepared some codes for each of the three models that were selected. Now, we need to modify them with the final set of features selected in the last section, so to create our final notebooks. 
In other words, we have one dependent variable prepared, and 17 features selected out from our PCA and feature selection work. So, we need to insert all of them into the code that was developed in Section II to finalize our notebook. Then, we will use the Spark Job feature to get these notebooks implemented in a distributed way.

MLlib implementation

First, we need to prepare our data with the s1 dependent variable for linear regression, and the s2 dependent variable for logistic regression or decision tree. Then, we need to add the selected 17 features into them to form datasets that are ready for our use. For linear regression, we will use the following code:

val numIterations = 90
val model = LinearRegressionWithSGD.train(TrainingData, numIterations)

For logistic regression, we will use the following code:

val model = new LogisticRegressionWithSGD()
  .setNumClasses(2)

For the Decision tree, we will use the following code:

val model = DecisionTree.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins)

R notebooks implementation

For better comparison, it is a good idea to write linear regression and SEM into the same R notebook and also write logistic regression and the Decision tree into the same R notebook. Then, the main task left here is to schedule the estimation for each worker and then collect the results, using the previously mentioned Job feature in the Databricks environment. The code for linear regression and SEM is as follows:

lm.est1 <- lm(s1 ~ T1+T2+M1+ M2+ M3+ Tr1+ Tr2+ Tr3+ S1+ S2+ P1+ P2+ P3+ P4+ Pr1+ Pr2+ Pr3)
mod.no1 <- specifyModel()
s1 <- x1, gam31
s1 <- x2, gam32

The code for logistic regression and the Decision tree is as follows:

logit.est1 <- glm(s2~ T1+T2+M1+ M2+ M3+ Tr1+ Tr2+ Tr3+ S1+ S2+ P1+ P2+ P3+ P4+ Pr1+ Pr2+ Pr3,family=binomial())
dt.est1 <- rpart(s2~ T1+T2+M1+ M2+ M3+ Tr1+ Tr2+ Tr3+ S1+ S2+ P1+ P2+ P3+ P4+ Pr1+ Pr2+ Pr3, method="class")

After we get all the models estimated for each product, for simplicity, we will focus on one product to complete our discussion on model evaluation and model deployment.

Model evaluation

In the previous section, we completed our model estimation task. Now, it is time for us to evaluate the estimated models to see whether they meet our model quality criteria, so that we can either move to our next stage of results explanation or go back to some previous stages to refine our models. To perform our model evaluation, in this section, we will focus our effort on utilizing RMSE (root-mean-square error) and ROC curves (receiver operating characteristic) to assess whether our models are a good fit. To calculate RMSEs and ROC curves, we need to use our test data rather than the training data that was used to estimate our models.

Quick evaluations

Many packages already include some algorithms for users to assess models quickly. For example, both MLlib and R have algorithms to return a confusion matrix for logistic regression models, and they even get false positive numbers calculated. Specifically, MLlib has the confusionMatrix and numFalseNegatives() functions for us to use, and even some algorithms to calculate MSE quickly, as follows:

MSE = valuesAndPreds.map(lambda (v, p): (v - p)**2).mean()
print("Mean Squared Error = " + str(MSE))

Also, R has a confusion.matrix function for us to use. In R, there are also many tools to produce quick graphical plots that can be used to gain a quick evaluation of models.
For example, we can perform plots of predicted versus actual values, and also residuals on predicted values. Intuitively, the methods of comparing predicted versus actual values are the easiest to understand and give us a quick model evaluation. The following table is a calculated confusion matrix for one of the company products, which shows a reasonable fit of our model.   Predicted as Success Predicted as NOT Actual Success 83% 17% Actual Not 9% 91% RMSE In MLlib, we can use the following codes to calculate RMSE: val valuesAndPreds = test.map { point => val prediction = new_model.predict(point.features) val r = (point.label, prediction) r } val residuals = valuesAndPreds.map {case (v, p) => math.pow((v - p), 2)} val MSE = residuals.mean(); val RMSE = math.pow(MSE, 0.5) Besides the above, MLlib also has some functions in the RegressionMetrics and RankingMetrics classes for us to use for RMSE calculation. In R, we can compute RMSE as follows: RMSE <- sqrt(mean((y-y_pred)^2)) Before this, we need to obtain the predicted values with the following commands: > # build a model > RMSElinreg <- lm(s1 ~ . ,data= data1) > #score the model > score <- predict(RMSElinreg, data2) After we have obtained RMSE values for all the estimated models, we will compare them to evaluate the linear regression model versus the logistic regression model versus the Decision tree model. For our case, linear regression models turned out to be almost the best. Then, we also compare RMSE values across products, and send back some product models back for refinement. For another example of obtaining RMSE, please go to http://www.cakesolutions.net/teamblogs/spark-mllib-linear-regression-example-and-vocabulary. ROC curves As an example, we calculate ROC curves to assess our logistic models. In MLlib, we can use the MLlib function of metrics.areaUnderROC() to calculate ROC once we apply our estimated model to our test data and get labels for testing cases. For more on using MLlib to obtain ROC, please go to: http://web.cs.ucla.edu/~mtgarip/linear.html In R, using package pROC, we can perform the following to calculate and plot ROC curves: mylogit <- glm(s2 ~ ., family = "binomial") summary(mylogit) prob=predict(mylogit,type=c("response")) testdata1$prob=prob library(pROC) g <- roc(s2 ~ prob, data = testdata1) plot(g) As discussed, once ROC curves get calculated, we can use them to compare our logistic models against Decision tree models, or compare models cross products. In our case, logistic models perform better than Decision tree models. Results explanation Once we pass our model evaluation and decide to select the estimated model as our final model, we need to interpret results to the company executives and also their technicians. Next, we discuss some commonly-used ways of interpreting our results, one using tables and another using graphs, with our focus on impacts assessments. Some users may prefer to interpret our results in terms of ROIs, for which cost and benefits data is needed. Once we have the cost and benefit data, our results here can be easily expanded to cover the ROI issues. Also, some optimization may need to be applied for real decision making. Impacts assessments As discussed in Section 1, the main purpose of this project is to gain a holistic view of the sales team success. For example, the company wishes to understand the impact of marketing on sales success in comparison to training and other factors. 
As we have our linear regression model estimated, one easy way of comparing impacts is to summarize the variance explained by each feature group, as shown in the following table.

Table for impact assessment:

Feature group    Variance explained (%)
Team             8.5
Marketing        7.6
Training         5.7
Staffing         12.9
Product          8.9
Promotion        14.6
Total            58.2

The following figure is another example of using graphs to display the results that were discussed.

Summary

In this article, we went through a step-by-step process from data to a holistic view of businesses. From this, we processed a large amount of data on Spark and then built a model to produce a holistic view of the sales team success for the company IFS. Specifically, we first selected models per business needs after we prepared Spark computing and loaded in preprocessed data. Secondly, we estimated model coefficients. Third, we evaluated the estimated models. Then, we finally interpreted analytical results. This process is similar to the process of working with small data. But in dealing with big data, we will need parallel computing, for which Apache Spark is utilized. During this process, Apache Spark makes things easy and fast. After this article, readers will have gained a full understanding of how Apache Spark can be utilized to make our work easier and faster in obtaining a holistic view of businesses. At the same time, readers should become familiar with the RM4Es modeling processes of processing large amounts of data and developing predictive models, and they should especially become capable of producing their own holistic view of businesses.

Resources for Article: Further resources on this subject: Getting Started with Apache Hadoop and Apache Spark [article] Getting Started with Apache Spark DataFrames [article] Sabermetrics with Apache Spark [article]
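As a short appendix to the model evaluation step described above, the RMSE calculation shown in Scala can also be written against the Python MLlib API. The sketch below is hedged: it assumes test is an RDD of LabeledPoint and new_model is one of the regression models trained earlier; the names are illustrative only.

import math

# Pair each held-out label with the model's prediction
valuesAndPreds = test.map(lambda point: (point.label, new_model.predict(point.features)))

# Mean squared error over the test RDD, then its square root
MSE = valuesAndPreds.map(lambda vp: (vp[0] - vp[1]) ** 2).mean()
RMSE = math.sqrt(MSE)
print("Root Mean Squared Error = " + str(RMSE))

Comparing this value across the linear regression, logistic regression, and decision tree models, and across products, reproduces the comparison described in the Model evaluation section.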

Splunk's Input Methods and Data Feeds

Packt
30 May 2016
13 min read
This article being crafted by Ashish Kumar Yadav has been picked from Advanced Splunk book. This book helps you to get in touch with a great data science tool named Splunk. The big data world is an ever expanding forte and it is easy to get lost in the enormousness of machine data available at your bay. The Advanced Splunk book will definitely provide you with the necessary resources and the trail to get you at the other end of the machine data. While the book emphasizes on Splunk, it also discusses its close association with Python language and tools like R and Tableau that are needed for better analytics and visualization purpose. (For more resources related to this topic, see here.) Splunk supports numerous ways to ingest data on its server. Any data generated from a human-readable machine from various sources can be uploaded using data input methods such as files, directories, TCP/UDP scripts can be indexed on the Splunk Enterprise server and analytics and insights can be derived from them. Data sources Uploading data on Splunk is one of the most important parts of analytics and visualizations of data. If data is not properly parsed, timestamped, or broken into events, then it can be difficult to analyze and get proper insight on the data. Splunk can be used to analyze and visualize data ranging from various domains, such as IT security, networking, mobile devices, telecom infrastructure, media and entertainment devices, storage devices, and many more. The machine generated data from different sources can be of different formats and types, and hence, it is very important to parse data in the best format to get the required insight from it. Splunk supports machine-generated data of various types and structures, and the following screenshot shows the common types of data that comes with an inbuilt support in Splunk Enterprise. The most important point of these sources is that if the data source is from the following list, then the preconfigured settings and configurations already stored in Splunk Enterprise are applied. This helps in getting the data parsed in the best and most suitable formats of events and timestamps to enable faster searching, analytics, and better visualization. The following screenshot enlists common data sources supported by Splunk Enterprise: Structured data Machine-generated data is generally structured, and in some cases, it can be semistructured. Some of the types of structured data are EXtensible Markup Language (XML), JavaScript Object Notation (JSON), comma-separated values (CSV), tab-separated values (TSV), and pipe-separated values (PSV). Any format of structured data can be uploaded on Splunk. However, if the data is from any of the preceding formats, then predefined settings and configuration can be applied directly by choosing the respective source type while uploading the data or by configuring it in the inputs.conf file. The preconfigured settings for any of the preceding structured data is very generic. Many times, it happens that the machine logs are customized structured logs; in that case, additional settings will be required to parse the data. For example, there are various types of XML. We have listed two types here. In the first type, there is the <note> tag at the start and </note> at the end, and in between, there are parameters are their values. In the second type, there are two levels of hierarchies. XML has the <library> tag along with the <book> tag. Between the <book> and </book> tags, we have parameters and their values. 
The first type is as follows: <note> <to>Jack</to> <from>Micheal</from> <heading>Test XML Format</heading> <body>This is one of the format of XML!</body> </note> The second type is shown in the following code snippet: <Library> <book category="Technical"> <title lang="en">Splunk Basic</title> <author>Jack Thomas</author> <year>2007</year> <price>520.00</price> </book> <book category="Story"> <title lang="en">Jungle Book</title> <author>Rudyard Kiplin</author> <year>1984</year> <price>50.50</price> </book> </Library > Similarly, there can be many types of customized XML scripts generated by machines. To parse different types of structured data, Splunk Enterprise comes with inbuilt settings and configuration defined for the source it comes from. Let's say, for example, that the data received from a web server's logs are also structured logs and it can be in either a JSON, CSV, or simple text format. So, depending on the specific sources, Splunk tries to make the job of the user easier by providing the best settings and configuration for many common sources of data. Some of the most common sources of data are data from web servers, databases, operation systems, network security, and various other applications and services. Web and cloud services The most commonly used web servers are Apache and Microsoft IIS. All Linux-based web services are hosted on Apache servers, and all Windows-based web services on IIS. The logs generated from Linux web servers are simple plain text files, whereas the log files of Microsoft IIS can be in a W3C-extended log file format or it can be stored in a database in the ODBC log file format as well. Cloud services such as Amazon AWS, S3, and Microsoft Azure can be directly connected and configured according to the forwarded data on Splunk Enterprise. The Splunk app store has many technology add-ons that can be used to create data inputs to send data from cloud services to Splunk Enterprise. So, when uploading log files from web services, such as Apache, Splunk provides a preconfigured source type that parses data in the best format for it to be available for visualization. Suppose that the user wants to upload apache error logs on the Splunk server, and then the user chooses apache_error from the Web category of Source type, as shown in the following screenshot: On choosing this option, the following set of configuration is applied on the data to be uploaded: The event break is configured to be on the regular expression pattern ^[ The events in the log files will be broken into a single event on occurrence of [ at every start of a line (^) The timestamp is to be identified in the [%A %B %d %T %Y] format, where: %A is the day of week; for example, Monday %B is the month; for example, January %d is the day of the month; for example, 1 %T is the time that has to be in the %H : %M : %S format %Y is the year; for example, 2016 Various other settings such as maxDist that allows the amount of variance of logs can vary from the one specified in the source type and other settings such as category, descriptions, and others. Any new settings required as per our needs can be added using the New Settings option available in the section below Settings. After making the changes, either the settings can be saved as a new source type or the existing source type can be updated with the new settings. IT operations and network security Splunk Enterprise has many applications on the Splunk app store that specifically target IT operations and network security. 
Splunk is a widely accepted tool for intrusion detection, network and information security, fraud and theft detection, and user behaviour analytics and compliance. A Splunk Enterprise application provides inbuilt support for the Cisco Adaptive Security Appliance (ASA) firewall, Cisco SYSLOG, Call Detail Records (CDR) logs, and one of the most popular intrusion detection application, Snort. The Splunk app store has many technology add-ons to get data from various security devices such as firewall, routers, DMZ, and others. The app store also has the Splunk application that shows graphical insights and analytics over the data uploaded from various IT and security devices. Databases The Splunk Enterprise application has inbuilt support for databases such as MySQL, Oracle Syslog, and IBM DB2. Apart from this, there are technology add-ons on the Splunk app store to fetch data from the Oracle database and the MySQL database. These technology add-ons can be used to fetch, parse, and upload data from the respective database to the Splunk Enterprise server. There can be various types of data available from one source; let's take MySQL as an example. There can be error log data, query logging data, MySQL server health and status log data, or MySQL data stored in the form of databases and tables. This concludes that there can be a huge variety of data generated from the same source. Hence, Splunk provides support for all types of data generated from a source. We have inbuilt configuration for MySQL error logs, MySQL slow queries, and MySQL database logs that have been already defined for easier input configuration of data generated from respective sources. Application and operating system data The Splunk input source type has inbuilt configuration available for Linux dmesg, syslog, security logs, and various other logs available from the Linux operating system. Apart from the Linux OS, Splunk also provides configuration settings for data input of logs from Windows and iOS systems. It also provides default settings for Log4j-based logging for Java, PHP, and .NET enterprise applications. Splunk also supports lots of other applications' data such as Ruby on Rails, Catalina, WebSphere, and others. Splunk Enterprise provides predefined configuration for various applications, databases, OSes, and cloud and virtual environments to enrich the respective data with better parsing and breaking into events, thus deriving at better insight from the available data. The applications' source whose settings are not available in Splunk Enterprise can alternatively have apps or add-ons on the app store. Data input methods Splunk Enterprise supports data input through numerous methods. Data can be sent on Splunk via files and directories, TCP, UDP, scripts or using universal forwarders. Files and directories Splunk Enterprise provides an easy interface to the uploaded data via files and directories. Files can be directly uploaded from the Splunk web interface manually or it can be configured to monitor the file for changes in content, and the new data will be uploaded on Splunk whenever it is written in the file. Splunk can also be configured to upload multiple files by either uploading all the files in one shot or the directory can be monitored for any new files, and the data will get indexed on Splunk whenever it arrives in the directory. Any data format from any sources that are in a human-readable format, that is, no propriety tools are needed to read the data, can be uploaded on Splunk. 
Splunk Enterprise even supports uploading in a compressed file format such as (.zip and .tar.gz), which has multiple log files in a compressed format. Network sources Splunk supports both TCP and UDP to get data on Splunk from network sources. It can monitor any network port for incoming data and then can index it on Splunk. Generally, in case of data from network sources, it is recommended that you use a Universal forwarder to send data on Splunk, as Universal forwarder buffers the data in case of any issues on the Splunk server to avoid data loss. Windows data Splunk Enterprise provides direct configuration to access data from a Windows system. It supports both local as well as remote collections of various types and sources from a Windows system. Splunk has predefined input methods and settings to parse event log, performance monitoring report, registry information, hosts, networks and print monitoring of a local as well as remote Windows system. So, data from different sources of different formats can be sent to Splunk using various input methods as per the requirement and suitability of the data and source. New data inputs can also be created using Splunk apps or technology add-ons available on the Splunk app store. Adding data to Splunk—new interfaces Splunk Enterprises introduced new interfaces to accept data that is compatible with constrained resources and lightweight devices for Internet of Things. Splunk Enterprise version 6.3 supports HTTP Event Collector and REST and JSON APIs for data collection on Splunk. HTTP Event Collector is a very useful interface that can be used to send data without using any forwarder from your existing application to the Splunk Enterprise server. HTTP APIs are available in .NET, Java, Python, and almost all the programming languages. So, forwarding data from your existing application that is based on a specific programming language becomes a cake walk. Let's take an example, say, you are a developer of an Android application, and you want to know what all features the user uses that are the pain areas or problem-causing screens. You also want to know the usage pattern of your application. So, in the code of your Android application, you can use REST APIs to forward the logging data on the Splunk Enterprise server. The only important point to note here is that the data needs to be sent in a JSON payload envelope. The advantage of using HTTP Event Collector is that without using any third-party tools or any configuration, the data can be sent on Splunk and we can easily derive insights, analytics, and visualizations from it. HTTP Event Collector and configuration HTTP Event Collector can be used when you configure it from the Splunk Web console, and the event data from HTTP can be indexed in Splunk using the REST API. HTTP Event Collector HTTP Event Collector (EC) provides an API with an endpoint that can be used to send log data from applications into Splunk Enterprise. Splunk HTTP Event Collector supports both HTTP and HTTPS for secure connections. The following are the features of HTTP Event Collector, which make's adding data on Splunk Enterprise easier: It is very lightweight is terms of memory and resource usage, and thus can be used in resources constrained to lightweight devices as well. Events can be sent directly from anywhere such as web servers, mobile devices, and IoT without any need of configuration or installation of forwarders. 
It is a token-based JSON API that doesn't require you to save user credentials in the code or in the application settings. The authentication is handled by tokens used in the API. It is easy to configure EC from the Splunk Web console, enable HTTP EC, and define the token. After this, you are ready to accept data on Splunk Enterprise. It supports both HTTP and HTTPS, and hence it is very secure. It supports GZIP compression and batch processing. HTTP EC is highly scalable as it can be used in a distributed environment as well as with a load balancer to crunch and index millions of events per second. Summary In this article, we walked through various data input methods along with various data sources supported by Splunk. We also looked at HTTP Event Collector, which is a new feature added in Splunk 6.3 for data collection via REST to encourage the usage of Splunk for IoT. The data sources and input methods for Splunk are unlike any generic tool and the HTTP Event Collector is the added advantage compare to other data analytics tools. Resources for Article: Further resources on this subject: The Splunk Interface [article] The Splunk Web Framework [article] Introducing Splunk [article]
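To round off the HTTP Event Collector discussion, here is a minimal, hedged sketch of sending a single event to HEC from Python with the requests library. The host name and token are placeholders, and the default collector port 8088 with the /services/collector endpoint is assumed to be enabled as described above; adjust both to match your own deployment.

import json
import requests  # any HTTP client capable of a POST request will do

# Placeholders: use your own Splunk host and the token generated for the collector
HEC_URL = "https://splunk.example.com:8088/services/collector"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

# The event data travels as a JSON payload envelope, as noted earlier
payload = {
    "event": {"action": "user_signup", "plan": "trial"},
    "sourcetype": "_json",
}

response = requests.post(
    HEC_URL,
    headers={"Authorization": "Splunk " + HEC_TOKEN},
    data=json.dumps(payload),
    verify=False,  # acceptable only for a test instance with a self-signed certificate
)
print(response.status_code, response.text)

A response with "text": "Success" and "code": 0 means the collector accepted the event; the token-based Authorization header is what removes the need to store user credentials in the application.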