Planning Your CRM Implementation

Packt
08 Jan 2016
13 min read
In this article by Joseph Murray and Brian Shaughnessy, authors of the book Using CiviCRM, Second Edition, you will learn to plan your implementation of CiviCRM so that your project has the best chance of achieving greater organizational success. In this article, we will do the following:

- Identify potential barriers to success and learn how to overcome them
- Select an appropriate development methodology

Barriers to success

Constituent Relationship Management (CRM) initiatives can be difficult. They require change, often impacting processes and workflows that are at the heart of your staff's daily responsibilities. They force you to rethink how you are managing your existing data and decide what new data you want to begin collecting. They may require you to restructure external relationships, even as you rebuild internal processes and tools.

Externally, the experience of your constituents, as they interact with your organization, may need to change so that it provides more value and fewer barriers to involvement. Internally, business processes and supporting technological systems may need to change in order to break down departmental operations' silos, increase efficiencies, and enable more effective targeting, improved responsiveness, and new initiatives.

The success of a CRM project often depends on changing the behavior and attitude of individuals across the organization and on replacing, changing, and/or integrating many IT systems used across the organization. To realize success as you manage the potential culture changes, you may need to ask staff members to take on tasks and responsibilities that do not directly benefit them or the department managed by their supervisor, but that provide value to the organization by what they enable others in the organization to accomplish. As a result, it is often very challenging to align the interests of the staff and organizational units with the organization's broader interest in improved constituent relations, as promised by the CRM strategy. This is why an executive sponsor, such as the executive director, is so important. They must help staff see beyond their immediate scope of responsibility and buy into the larger goals of the organization.

On the technical side, CRM projects for mid- and large-sized organizations typically involve replacing or integrating other systems. Configuring and customizing a new software system, migrating data into it, testing and deploying it, and training staff members to use it can be a challenge at the best of times. Doing it for multiple systems and more users multiplies the challenge. Since a CRM initiative generally involves integrating separate systems, you must be prepared to face the potential complexity of working with disparate data schemas that require transformation and cleanup for interoperability, and of keeping middleware in sync with changes in multiple independent software packages.

Unfortunately, these challenges may lead to project failure if they are not identified and addressed. Common causes of failure include:

- Lack of executive-level sponsorship, resulting in improperly resolved turf wars.
- IT-led initiatives, which have a greater tendency to focus on cost efficiency. This focus will generally not result in better constituent relations oriented toward achieving the organization's mission.
  An IT-led approach, particularly one where users and usability experts are not on the project team, may also lead to poor user adoption if the system is not adapted to users' needs, or if the users are poorly trained.
- No customer data integration approach, resulting in a failure to overcome the data silos problem, no consolidated view of constituents, poorer targeting, and an inability to realize enterprise-level benefits.
- Lack of buy-in, leading to a lack of use of the new CRM system and continued use of the old processes and systems it was meant to supplant.
- Lack of training and follow-up, causing staff anxiety and opposition. This may cause non-use or misuse of the system, resulting in poor data handling and mix-ups in the way constituents are treated.
- Not customizing enough to actually meet the requirements of the organization in the areas of:
  - Data integration
  - Business processes
  - User experiences
- Over-customizing, which causes:
  - The costs of the system to escalate, especially through future upgrades
  - The best practices incorporated in the base functionality to be overridden in some cases
  - The user forms to become overly complex
  - The user experiences to become off-putting
- No strategy for dealing with the technical challenges associated with developing, extending, and/or integrating the CRM software system, leading to:
  - Cost overruns
  - Poorly designed and built software
  - Poor user experiences
  - Incomplete or invalid data

This does not mean that project failure is inevitable or common. These clearly identifiable causes of failure can be overcome through effective project planning.

Perfection is the enemy of the good

CRM systems and their functional components, such as fundraising, ticket sales, communication, event registration, membership management, and case management, are essential to the core operations of most nonprofits. Since they are so central to the daily operations of the organization, there is often a legitimate fear of project failure when considering changes. This fear can easily create a perfectionist mentality, where the project team attempts to overcompensate by creating too much oversight, contingency planning, and project discovery time in an effort to avoid missing any potentially useful feature that could be integrated into the project. While planning is good, perfectionism may not be, as perfection is often the enemy of the good.

CRM implementations often risk erring on the side of what is known (tongue-in-cheek) as the MIT approach. The MIT approach believes in, and attempts to design, construct, and deploy, the right thing right from the start. Its big-brain approach to problem solving leads to correctness, completeness, and consistency in the design. It values simplicity in the user interface over simplicity in the implementation design.

The other end of the spectrum is captured by aphorisms like "less is more," "KISS" (keep it simple, stupid), and "worse is better." This alternate view willingly accepts deviations from correctness, completeness, and consistency in design in favor of general simplicity, or simplicity of implementation over simplicity of user interface. The reason that such counter-intuitive approaches to developing solutions have become respected and popular is the problems and failures that can result from trying to do it all perfectly from the start.

Neither end of the spectrum is healthy. Handcuffing the project to an unattainable standard of perfection or oversimplifying in order to artificially reduce complexity will both lead to project failure.
Value and success are generally found somewhere in between. As the project manager, it will be your responsibility to set the tone, determine priorities, and plan the implementation and development process. One rule that may help achieve balance and move the project forward is "release early, release often." This is commonly embraced in the open source community, where collaboration is essential to success. This motto:

- Captures the intent of catching errors earlier
- Allows users to realize value from the system sooner
- Allows users to better imagine and articulate what the software should do through ongoing use and interaction with a working system early in the process
- Creates a healthy cyclical process where end users provide rapid feedback into the development process, and where those ideas are considered, implemented, and released on a regular basis

There is no perfect antidote to these two extremes, only an awareness of the tendency for projects (and stakeholders) to lean in one of the two directions, and the realization that both extremes should be avoided.

Development methodologies

Whatever approach your organization decides to take for developing and implementing its CRM strategy, it's usually good to have an agreed-upon process and methodology. Your process defines the steps to be taken as you implement the project. Your methodology defines the rules for the process, that is, the methods to be used throughout the course of the project. The spirit of this problem-solving approach can be seen in the traditional Waterfall development model and in the contrasting iterative and incremental development model.

Projects naturally change and evolve over time. You may find that you embrace one of these methodologies for the initial implementation and then migrate to a different method, or a mixed method, for maintenance and future development work. By no means should you feel restricted by the definitions provided; rather, adjust the principles to meet your changing needs throughout the course of the project. That being said, it's important that your team understands the project rules at a given point in time so that the project management principles are respected.

The conventional Waterfall development methodology

The traditional Waterfall method of software development is sometimes thought of as "big design up front." It employs a sequential approach to development, moving from needs analysis and requirements, to architectural and user experience design, detailed design, implementation, integration, testing, deployment, and maintenance. The output of each step or phase flows downward, like water, to the next step in the process.

The Waterfall model tends to be more formal and more planned, includes more documentation, and often has a stronger division of labor. This methodology benefits from having clear, linear, and progressive development steps in which each phase builds upon the previous one. However, it can suffer from inflexibility if used too rigidly. For example, if during the verification and quality assurance phase you discover a significant functionality gap resulting from incorrect (or changed) specification requirements, it may be difficult to interject those new needs into the process. The "release early, release often" principle mentioned earlier can help overcome that inflexibility.
If the overall process is kept tight and the development window short, you can justify delaying new functionality or corrective specifications until the next release. If you do not embrace the "release early, release often" principle, either because it is not supported by the project team or because the scope of the project does not permit it, you should still anticipate the need for flexibility and build it into your methodology. The overriding principle is to define the rules of the game early, so that your project team knows what options are available at each stage of development.

Iterative development methodology

Iterative development models depart from this structure by breaking the work up into chunks that can be developed and delivered separately. The Waterfall process is used in each phase or segment of the project, but the overall project structure is not necessarily held to the same rigid process. As one moves farther away from the Waterfall approach, there is a greater emphasis on evaluating incrementally delivered pieces of the solution and incorporating feedback on what has already been developed into the planning of future work.

This methodology seeks to take what is good in the traditional Waterfall approach (structure, clearly defined linear steps, and a strong development, quality assurance, and rollout process) and improve it through shorter development cycles centered on smaller segments of the overall project. Perhaps the biggest challenge in this model is the project management role, as it may result in many moving pieces that must be tightly coordinated in order to release the final working product. An iterative development method can also feel a little like a dog chasing its tail: you keep getting work done, but never feel like you're getting anywhere. It may be necessary to limit how many iterations take place before you pause the process, take a step back, and consider where you are in the project's bigger picture, before you set new goals and begin the cyclical process again.

Agile development methodology

Agile development methodologies are an effective derivative of the iterative development model that moves one step further away from the Waterfall model. They are characterized by requirements and solutions evolving together, which requires work teams to be drawn from all the relevant parts of the organization. These teams organize themselves to work in rapid one- to four-week iteration cycles. Agile centers on time-based release cycles, and in this way it differs from the other methodologies discussed, which are oriented more toward functionality-based releases.

A typical Agile implementation features short daily Scrum status meetings, a product backlog containing features or user stories for each iteration, and a sprint backlog containing revisable and re-prioritizable work items for the team during the current iteration. A deliberate effort is usually made to ensure that the sprint backlog is long enough that the lowest-priority items will not be dealt with before the end of the iteration. Although they can be put onto the list of work items that may or may not be selected for the next iteration, the idea is that the client or product owner should, at some point, decide that it's not worth investing more resources in the "nice to have, but not really necessary" items.
Having those low-priority backlog items is equally important for maximizing developer efficiency. If developers are able to work through the higher-priority issues faster than originally expected, the backlog items give them a chance to chip away at the "nice to have" features while keeping within the time-based release cycle (and your budget constraints).

As one might expect, this methodology relies heavily on effective prioritization. Since software releases and development cycles adhere to rigid timeframes, only high-priority issues or features are actively addressed at a given point in time; the remaining incomplete issues fall lower on the list and are subject to reassignment in the next cycle.

While an Agile development model may seem attractive (and rightly so), there are a few things to realize before embracing it:

- The process of reviewing, prioritizing, and managing the issue list does take time and effort. Each cycle will require the team to evaluate the status of issues and reprioritize for the next release.
- Central to the success of this model is a rigid allegiance to time-based releases and a ruthless determination to prevent feature creep. Yes, you will have releases that are delayed a week, or must-have features and bug fixes that enter the issue queue late in the process. However, the exceptions must not become the rule, or you lose your agility.

Summary

This article lists common barriers to the success of CRM initiatives that arise because of people issues or technical issues. These include problems getting systems and tools that support disparate business functions to provide integrated functionality. We advocate a pragmatic approach to implementing a CRM strategy for your organization. We encourage the adoption of an approach, and associated processes and methodologies, that works for your organization: wide ranges in the level of structure, formality, and planning all work in different organizations.

Further resources on this subject:

- Getting Dynamics CRM 2015 Data into Power BI
- Customization in Microsoft Dynamics CRM
- Getting Started with Microsoft Dynamics CRM 2013 Marketing

The Project Structure

Packt
08 Jan 2016
14 min read
In this article written by Nathanael Anderson, author of the book Getting Started with NativeScript, we will see how to navigate through your new project and its full structure. We will explain each of the files that are automatically created and where you create your own files. Then, we will proceed to gain a deeper understanding of some of the basic components of your application, and we will finally see how to change screens. In this article, we will cover the following topics:

- An overview of the project directory
- The root directory
- The application components

Project directory overview

Running the nativescript create crossCommunicator command creates a nice structure of files and folders for us to explore. First, we will do a high-level overview of the different folders and their purposes and touch on the important files in those folders. Then, we will finish the overview by going into depth about the App directory, which is where you will spend pretty much your entire time when developing an application.

The root directory

In your root folder, you will see only a couple of directories. The package.json file will look something like this:

{
  "nativescript": {
    "id": "org.nativescript.crossCommunicator",
    "tns-android": {
      "version": "1.5.0"
    },
    "tns-ios": {
      "version": "1.5.0"
    }
  },
  "dependencies": {
    "tns-core-modules": "1.5.0"
  }
}

This is the NativeScript master project configuration file for your entire application. It outlines the basic information and all platform requirements of the project. It is also modified by the nativescript tool when you add or remove plugins or platforms. So, in the preceding package.json file, you can see that I have installed the Android (tns-android) and iOS (tns-ios) platforms using the nativescript platform add command, and they are both currently at version 1.5.0. The tns-core-modules dependency was added by the nativescript command when we created the project; it contains the core modules.

Changing the app ID

Now, if you want this to be your company's name instead of the default ID of org.nativescript.yourProjectName, there are two ways to set the app ID. The first way is to set it when you create the project; executing a nativescript create myProjName --appid com.myCompany.myProjName command will automatically set the ID value. If you forget to run the create command with the --appid option, you can change it here. However, any time you change the option here, you will also have to remove all the installed platforms and then re-add the platforms you are using. This must be done because each platform, when added, uses this configuration ID while building the platform folders and all of the needed platform files.
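For illustration, editing the id field by hand with a hypothetical company ID (com.myCompany) would leave the root package.json looking roughly like this; remember that the installed platforms must then be removed and re-added:

{
  "nativescript": {
    "id": "com.myCompany.crossCommunicator",
    "tns-android": {
      "version": "1.5.0"
    },
    "tns-ios": {
      "version": "1.5.0"
    }
  },
  "dependencies": {
    "tns-core-modules": "1.5.0"
  }
}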
The package.json file

This file contains basic information about the current template that is installed. When you create a new application via the nativescript create command, by default it copies everything from a template project called tns-template-hello-world. In the future, NativeScript will allow you to choose the template, but currently this is the only template that is available and working. This template contains the folders we discussed earlier and all the files that we will discuss further. The package.json file comes from this template, and it basically tells you about the template and the specific version that is installed. Let's take a look at the following code snippet:

{
  "name": "tns-template-hello-world",
  "main": "app.js",
  "version": "1.5.0",
  ... more json documentation fields...
}

Feel free to modify the package.json file so that it matches your project's name and details. This file currently does not serve much purpose beyond template documentation and the link to the main application file.

License

The License file that is present in the App folder is the license that the tns-template-hello-world template is under. In your case, your app will probably be distributed under a different license. You can either update this file to match your application's license or delete it.

App.js

Awesome! We have finally made it to the first of the JavaScript files that make up the code that we will change to make this our application. This file is the bootstrap file of the entire application, and this is where the fun begins. In the preceding package.json file, you can see that it references this file (app.js) under the main key. This key in the package.json file is what NativeScript uses to make this file the entry point for the entire application. Looking at this file, we can see that it currently has four simple lines of code:

var application = require("application");
application.mainModule = "main-page";
application.cssFile = "./app.css";
application.start();

The file seems simple enough, so let's work through the code as we see it. The first line loads the NativeScript common application component class; the application component wraps the initialization and life cycle of the entire application. The require() function is what we use to reference another file in your project. It will look in your current directory by default, and then it will use some additional logic to look in a couple of places in the tns_core_modules folder to check whether it can find the name as a common component.
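As a minimal sketch of this lookup (the ./helpers file and its shout function are hypothetical and not part of the template), a file of your own is referenced with a relative path, while a common component such as the application module is found by name alone:

// app/helpers.js - a hypothetical module of our own
exports.shout = function (text) {
    return text.toUpperCase() + "!";
};

// in another file inside the app folder
var helpers = require("./helpers");       // "./" - resolved relative to the current file
var application = require("application"); // no "./" - found among the common components
console.log(helpers.shout("hello"));      // prints "HELLO!"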
Now that we have the application component loaded, the next lines configure what the application class does when your program starts and runs. The second line tells it which file is the main page. The third line tells the application which CSS file is the main CSS file for the entire application. And finally, we tell the application component to actually start. Now, in a sense, the application has already started running; we are already running our JavaScript code. This start() function actually starts the code that manages the application life cycle and the application events, loads and applies your main CSS file, and finally loads your main page files. If you want to add any additional code to this file, you will need to put it before the application.start() command. On some platforms, nothing below the application.start() command will be run in the app.js file.

The main-page.js file

The JavaScript portion of the page is probably where you will spend the majority of your time developing, as it contains all of the logic of your page. To create a page, you typically have a JavaScript file, as you can do everything in JavaScript, and this is where all the logic for the page resides. In our application, the main page currently has only six lines of code:

var vmModule = require("./main-view-model");
function pageLoaded(args) {
  var page = args.object;
  page.bindingContext = vmModule.mainViewModel;
}
exports.pageLoaded = pageLoaded;

The first line loads the main-view-model.js file. If you haven't guessed yet, in our new application this is used as the model file for this page. We will check out the optional model file in a few minutes, after we are done exploring the rest of the main-page files. An app page does not have to have a model; they are totally optional. Furthermore, you can actually combine your model JavaScript into the page's JavaScript file. Some people find it easier to keep their model separate, so when Telerik designed this example, they built this application using the MVVM pattern, which uses a separate view model file. For more information on MVVM, you can take a look at the Wikipedia entry at https://en.wikipedia.org/wiki/Model_View_ViewModel.

This file also has a function called pageLoaded, which is what sets the model object as the model for this page. The third line assigns the page variable to the page component object that is passed as part of the event handler. The fourth line assigns the model to the current page's bindingContext attribute. Then, we export the pageLoaded function as a function called pageLoaded. Using exports and module.exports is the way we publish something to other files that use require() to load it. Each file is its own independent black box; nothing that is not exported can be seen by any of the other files. Using exports, you can create the interface of your code to the rest of the world. This is part of the CommonJS standard, and you can read more about it at the NodeJS site.

The main-page.xml file

The final file of our application folder is also named main-page; it is the page layout file. As you can see, the main-page.xml layout consists of seven simple lines of XML code, which actually do quite a lot:

<Page loaded="pageLoaded">
  <StackLayout>
    <Label text="Tap the button" cssClass="title"/>
    <Button text="TAP" tap="{{ tapAction }}" />
    <Label text="{{ message }}" cssClass="message" textWrap="true"/>
  </StackLayout>
</Page>

Each of the XML layout files is a simplified way to load the visual components that you want on your page; in this case, it is what produces the simple tap-counter screen of our app.

The main-view-model.js file

The final file in our tour of the App folder is the model file. This file has about 30 lines of code. By looking at the first couple of lines, you might have figured out that this file was transpiled from TypeScript. Since this file actually has a lot of boilerplate and unneeded code from the TypeScript conversion, we will rewrite the code in plain JavaScript to help you easily understand what each of the parts is used for. This rewrite will be as close to the original as I can make it. So, without further ado, here is the original transpiled code to compare our new code with:

var observable = require("data/observable");
var HelloWorldModel = (function (_super) {
  __extends(HelloWorldModel, _super);
  function HelloWorldModel() {
    _super.call(this);
    this.counter = 42;
    this.set("message", this.counter + " taps left");
  }
  HelloWorldModel.prototype.tapAction = function () {
    this.counter--;
    if (this.counter <= 0) {
      this.set("message", "Hoorraaay! You unlocked the NativeScript clicker achievement!");
    }
    else {
      this.set("message", this.counter + " taps left");
    }
  };
  return HelloWorldModel;
})(observable.Observable);
exports.HelloWorldModel = HelloWorldModel;
exports.mainViewModel = new HelloWorldModel();

The rewrite of the main-view-model.js file

The rewrite of the main-view-model.js file is very straightforward. The first thing we need for a working model is to require the Observable class, which is the primary class that handles data binding events in NativeScript. We then create a new instance of the Observable class named mainViewModel. Next, we assign the two default values to the mainViewModel instance. Then, we create the same tapAction() function, which is the code that is executed each time the user taps the button. Finally, we export the mainViewModel model we created so that it is available to any other files that require this file. This is what the new JavaScript version looks like:

// Require the Observable class and create a new Model from it
var Observable = require("data/observable").Observable;
var mainViewModel = new Observable();

// Set up our default values
mainViewModel.counter = 42;
mainViewModel.set("message", mainViewModel.counter + " taps left");

// Set up the function that runs when a tap is detected.
mainViewModel.tapAction = function() {
  this.counter--;
  if (this.counter <= 0) {
    this.set("message", "Hoorraaay! You unlocked the NativeScript clicker achievement!");
  } else {
    this.set("message", this.counter + " taps left");
  }
};

// Export our already instantiated model class as the variable name that main-page.js is expecting.
exports.mainViewModel = mainViewModel;

The set() command is the only thing that is not totally self-explanatory or explained in this code. What is probably fairly obvious is that this command sets the specified variable to the specified value. However, what is not obvious is that when a value is set on an instance of the Observable class, it will automatically send a change event to anyone who has asked to be notified of any changes to that specific variable. If you recall, in the main-page.xml file, the <Label text="{{ message }}" ...> line will automatically register the label component as a listener for all change events on the message variable when the layout system creates the label. Now, every time the message variable is changed, the text on this label changes.

The application component

If you recall, earlier in the article we discussed the app.js file. It basically contains only the code to set up the properties of your application, and then it finally starts the application component. So, you have probably guessed that this is the primary component for your entire application life cycle. Part of what this component provides us is access to all the application-wide events. Frequently in an app, you will want to know when your app is no longer the foreground application or when it finally returns to being the foreground application. To get this information, you can attach code to two of the events that it provides, like this:

application.on("suspend", function(event) {
  console.log("Hey, I was suspended – I thought I was your favorite app!");
});
application.on("resume", function(event) {
  console.log("Awesome, we are back!");
});

Some of the other events that you can watch from the application component are launch, exit, lowMemory, and uncaughtError. These events allow you to handle different application-wide issues that your application might need to know about.
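As a minimal sketch, and assuming the uncaughtError event argument exposes the thrown error as args.error (check the application module documentation for your NativeScript version), handlers for two of these events follow the same pattern:

application.on("uncaughtError", function(args) {
    // Log unhandled exceptions before the app goes down.
    // (Assumes the event argument carries the thrown error as args.error.)
    console.log("Uncaught error: " + args.error);
});

application.on("lowMemory", function(args) {
    // Release caches or other recreatable data here.
    console.log("Low memory warning received");
});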
Creating settings.js

In our application, we will need a settings page, so we will create the framework for our application's settings page now. We will just get our feet a little wet and explore how to build it purely in JavaScript. As you can see, the following code is fairly straightforward. First, we require all the components that we will be using: Frame, Page, StackLayout, Button, and finally, the Label component. Then, we have to export a createPage function, which is what NativeScript will run to generate the page if you do not have an XML layout file to go along with the page's JavaScript file. At the beginning of our createPage function, we create each of the four components that we will need. Then, we assign some values and properties to make them have some sort of visual capability that we will be able to see. Next, we create the parent-child relationships and add our label and button to the layout component, and then we assign that layout to the Page component. Finally, we return the page component:

// Add our requires for the components we need on our page
var frame = require("ui/frame");
var Page = require("ui/page").Page;
var StackLayout = require("ui/layouts/stack-layout").StackLayout;
var Label = require("ui/label").Label;
var Button = require("ui/button").Button;

// Create our required function which NativeScript uses
// to build the page.
exports.createPage = function() {
  // Create our components for this page
  var page = new Page();
  var layout = new StackLayout();
  var welcomeLabel = new Label();
  var backButton = new Button();

  // Assign our page title
  page.actionBar.title = "Settings";

  // Set up our welcome label
  welcomeLabel.text = "You are now in Settings!";
  welcomeLabel.cssClass = "message";

  // Set up our Go Back button
  backButton.text = "Go Back";
  backButton.on("tap", function () {
    frame.topmost().goBack();
  });

  // Add our layout items to our StackLayout
  layout.addChild(welcomeLabel);
  layout.addChild(backButton);

  // Assign our layout to the page.
  page.content = layout;

  // Return our created page
  return page;
};

One thing I did want to mention here is that if you are creating a page totally programmatically, without the use of a declarative XML file, the createPage function must return the page component; the frame component expects to receive a Page component.

Summary

We have covered a large amount of foundational information in this article. We covered which files are used for your application and where to find and make changes to the project control files. In addition to all this, we also covered several foundational components, such as the Application, Frame, and Page components.

Further resources on this subject:

- Overview of TDD
- Understanding outside-in
- Understanding TDD

Getting started with Ember.js – Part 1

Daniel Ochoa
08 Jan 2016
9 min read
Ember.js is a fantastic framework for developers and designers alike for building ambitious web applications. As touted by its website, Ember.js is built for productivity. Designed with the developer in mind, its friendly APIs help you get your job done fast. It also makes all the trivial choices for you. By taking care of certain architectural choices, it lets you concentrate on your application instead of reinventing the wheel or focusing on already solved problems. With Ember.js you will be empowered to rapidly prototype applications with more features for less code.

Although Ember.js follows the MVC (Model-View-Controller) design pattern, it has been slowly moving to a more component-centric approach to building web applications. In this part 1 of 2 blog posts, I'll be talking about how to quickly get started with Ember.js. I'll go into detail on how to set up your development environment from beginning to end so you can immediately start building an Ember.js app with ember-cli, Ember's build tool. Ember-cli provides an asset pipeline to handle your assets. It minifies and concatenates your JavaScript; it gives you a strong conventional project structure and a powerful addon system for extensions. In part two, I'll guide you through setting up a very basic todo-like Ember.js application to get your feet wet with actual Ember.js development.

Setup

The first thing you need is node.js. Follow the guide on how to install node and npm from the npmjs.com website. Npm stands for Node Package Manager and it's the most popular package manager out there for Node. Once you have set up Node and npm, you can install ember-cli with the following command on your terminal:

npm install -g ember-cli

You can verify whether you have correctly installed ember-cli by running the following command:

ember -v

If you see the output of the different versions you have for ember-cli, node, npm, and your OS, it means everything is correctly set up and you are ready to start building an Ember.js application.

In order to get more acquainted with ember-cli, you can run ember -h to see a list of useful ember-cli commands. Ember-cli gives you an easy way to install addons (packages created by the community to quickly set up functionality, so instead of creating a specific feature you need, someone may have already made a package for it; see http://emberobserver.com/). You can also generate a new project with ember init <app_name>, run the tests of your app with ember test, and scaffold project files with ember generate. These are just a few of the many useful commands ember-cli gives you by default. You can learn more specific subcommands for any given command by running ember <command_name> -help.

Now that you know what ember-cli is useful for, it's time to move on to building a fun example application.

Building an Ember.js app

The application we will be building is an Ember.js application that will fetch a few posts from www.reddit.com/r/funny. It will display a list of these posts with some information about them, such as title, author, and date. The purpose of this example application is to show you how easy it is to build an Ember.js application that fetches data from a remote API and displays it. It will also show you how to leverage one of the most powerful features of Ember: its router. Now that you are more acquainted with ember-cli, let's create the skeleton of our application.
There's no need to worry about which packages and features from Ember we will need, and we don't even have to think about which local server to run in order to display our work in progress. First things first, run the following command on your terminal:

ember new ember-r-funny

We are running the ember new command with the argument ember-r-funny, which is what we are naming our app. Feel free to change the name to anything you'd like. From here, you'll see a list of files being created. After it finishes, you'll have a directory with the app name created in the current directory where you are working. If you go into this directory and inspect the files, you'll see quite a few directories and files. For now, don't pay too much attention to these files except for the directory called app. This is where you'll mostly be working.

On your terminal, if you go to the base path of your project (just inside ember-r-funny/) and run ember server, ember-cli will run a local server for you to see your app. If you now go in your browser to http://localhost:4200, you will see your newly created application, which is just a blank page with the wording "Welcome to Ember". If you go into app/templates/application.hbs, change the <h1> tag, and save the file, you'll notice that your browser automatically refreshes the page.

One thing to note before we continue is that ember-cli projects allow you to use the ES6 JavaScript syntax. ES6 is a significant update to the language; although current browsers do not fully support it, ember-cli will compile your project to browser-readable ES5. For a more in-depth explanation of ES6, visit https://hacks.mozilla.org/2015/04/es6-in-depth-an-introduction/.

Creating your first resource

One of the strong points of Ember is the router. The router is responsible for displaying templates, loading data, and otherwise setting up application state. The next thing we need to do is to set up a route to display the /r/funny posts from reddit. Run the following command to create our base route:

ember generate route index

This will generate an index route. In Ember-speak, the index route is the base or lowest-level route of the app. Now go to app/routes/index.js and make sure the route looks like the following:

import Ember from 'ember';

export default Ember.Route.extend({
  beforeModel() {
    this.transitionTo('posts');
  }
});

This is telling the app that whenever a user lands on our base URL '/', it should transition to 'posts'. Next, run the following command to generate our posts resource:

ember generate resource posts

If you open the app/router.js file, you'll see that this.route('posts') was added. Change this to this.resource('posts') instead (since we want to deal with a resource and not a route). It should look like the following:

import Ember from 'ember';
import config from './config/environment';

var Router = Ember.Router.extend({
  location: config.locationType
});

Router.map(function() {
  this.resource('posts');
});

export default Router;

In this router.js file, we've created a 'posts' resource. So far, we've told the app to take the user to '/posts' whenever the user goes to our app. Next, we'll make sure to set up what data and templates to display when the user lands on '/posts'. Thanks to the generator, we now have a route file for posts under app/routes/posts.js.
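Before we edit it, the freshly generated file is essentially an empty route class, roughly like this (the exact generator output can vary slightly between ember-cli versions):

import Ember from 'ember';

export default Ember.Route.extend({
});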
Open this file and make sure that it looks like the following:

import Ember from 'ember';

export default Ember.Route.extend({
  model() {
    return Ember.$.getJSON('https://www.reddit.com/r/funny.json?jsonp=?&limit=10').then(result => {
      return Ember.A(result.data.children).mapBy('data');
    });
  }
});

Ember does the asynchronous fetching of data in the model hook of our posts route. Routes are where we fetch data so we can consume it, and in this case we are hitting reddit/r/funny and fetching the latest 10 posts. Once that data is returned, we filter out the unnecessary properties from the response so we can return an array with our 10 reddit post entries, through the use of a handy function provided by Ember called mapBy. One important thing to note is that you always need to return something from the model hook, be it an array, an object, or a Promise (which is what we are doing in this case).

Now that we have our route wired up to fetch the information, let's open the app/templates/posts.hbs file, remove the current contents, and add the following:

<ul>
  {{#each model as |post|}}
    <li>{{post.title}}</li>
  {{/each}}
</ul>

This is HTML mixed with the Handlebars syntax. Handlebars is the templating engine Ember.js uses to display your dynamic content. What this .hbs template is doing is looping through our model and displaying the title property for each object inside the model array. If you haven't noticed yet, Ember.js is smart enough to know when the data has returned from the server and then display it, so we don't need to handle any callback functions as far as the model hook is concerned.

At this point, it may be normal to see some deprecation warnings in the console, but if you see an error with the words 'refused to load the script https://www.reddit.com/r/funny.json..', you need to add the following key and value to the config/environment.js file inside the ENV object:

contentSecurityPolicy: {
  'script-src': "'self' https://www.reddit.com"
},

By default, ember-cli will prevent you from doing external requests and fetching external resources from different domain names. This is a security feature, so we need to whitelist the reddit domain name so we can make requests against it. At this point, if you go to localhost:4200, you should be redirected to the /posts route and you should see a plain list of the reddit post titles.

Congratulations, you've just created a simple Ember.js app that displays the titles of some reddit posts inside an HTML list element. So far we've added a few lines of code here and there and we already have most of what we need. In Part 2 of this blog, we will set up a more detailed view for each of our reddit posts.

About the author

Daniel Ochoa is a senior software engineer at Frog with a passion for crafting beautiful web and mobile experiences. His current interests are Node.js, Ember.js, Ruby on Rails, iOS development with Swift, and the Haskell language. He can be found on Twitter @DanyOchoaOzz.

Distributed resource scheduler

Packt
07 Jan 2016
14 min read
In this article written by Christian Stankowic, author of the book vSphere High Performance Essentials, we will look at the Distributed Resource Scheduler. In cluster setups, Distributed Resource Scheduler (DRS) can assist you with automatically balancing CPU and storage load (Storage DRS). DRS monitors the ESXi hosts in a cluster and migrates the running VMs using vMotion, primarily to ensure that all the VMs get the resources they need and, secondarily, to balance the cluster. In addition to this, Storage DRS monitors the shared storage for information about latency and capacity consumption. When Storage DRS recognizes the potential to optimize storage resources, it will make use of Storage vMotion to balance the load. We will cover Storage DRS in detail later.

Working of DRS

DRS primarily uses two metrics to determine the cluster balance:

- Active host CPU: This includes the usage (CPU task time in ms) and ready (wait time in ms for VMs to get scheduled on physical cores) metrics.
- Active host memory: This describes the amount of memory pages that are predicted to have changed in the last 20 seconds. A math-sampling algorithm calculates this amount; however, it is quite inaccurate.

Active host memory is often used for resource capacity purposes. Be careful with using this value as an indicator, as it only describes how aggressively a workload changes its memory. Depending on your application architecture, it may not measure how much memory a particular VM really needs. Think about applications that allocate a lot of memory for the purpose of caching; using the active host memory metric for capacity planning might lead to inappropriate settings.

The migration threshold controls DRS's aggressiveness and defines how much a cluster can be imbalanced. Refer to the following table for a detailed explanation:

DRS level          | Priorities | Effect
Most conservative  | 1          | Only affinity/anti-affinity constraints are applied
More conservative  | 1-2        | This will also apply recommendations addressing significant improvements
Balanced (default) | 1-3        | Recommendations that, at least, promise good improvements are applied
More aggressive    | 1-4        | DRS applies recommendations that only promise a moderate improvement
Most aggressive    | 1-5        | Recommendations only addressing smaller improvements are also applied

Apart from the migration threshold, two other metrics, Target Host Load Standard Deviation (THLSD) and Current Host Load Standard Deviation (CHLSD), are calculated. THLSD defines how much a cluster node's load can differ from others while the cluster is still considered balanced. The migration threshold and the particular ESXi host's active CPU and memory values heavily influence this metric. CHLSD measures whether the cluster is currently balanced. If this value differs from the THLSD, the cluster is imbalanced and DRS will calculate recommendations in order to balance it.

In addition to this, DRS also calculates the vMotion overhead that is needed for the migration. If a migration's overhead is deemed higher than its benefit, vMotion will not be executed. DRS also evaluates the migration recommendations multiple times in order to avoid ping-pong migrations.

By default, once enabled, DRS is polled every five minutes (300 seconds). Depending on your landscape, it might be necessary to change this behavior. To do so, you need to alter the vpxd.cfg configuration file on the vCenter Server machine.
Search for the following lines and alter the period (in seconds):

<config>
  <drm>
    <pollPeriodSec>
      300
    </pollPeriodSec>
  </drm>
</config>

Refer to the following table for the configuration file location, depending on your vCenter implementation:

vCenter Server type      | File location
vCenter Server Appliance | /etc/vmware-vpx/vpxd.cfg
vCenter Server (Windows) | C:\ProgramData\VMware\VMware VirtualCenter\vpxd.cfg

Checklist – performance tuning

There are a couple of things to consider when optimizing DRS for high-performance setups:

- Make sure to use hosts with homogeneous CPU and memory configurations. Having different nodes will make DRS less effective.
- Use at least a 1 Gbps network connection for vMotion. For better performance, it is recommended to use 10 Gbps instead.
- For virtual machines, it is common procedure not to oversize them. Only configure as many CPU and memory resources as the workload needs. Migrating workloads with unneeded resources takes more time.
- Make sure not to exceed the ESXi host and cluster limits that are mentioned in the VMware vSphere Configuration Maximums document. For vSphere 5.5, refer to https://www.vmware.com/pdf/vsphere5/r55/vsphere-55-configuration-maximums.pdf. For vSphere 6.0, refer to https://www.vmware.com/pdf/vsphere6/r60/vsphere-60-configuration-maximums.pdf.

Configure DRS

To configure DRS for your cluster, proceed with the following steps:

1. Select your cluster from the inventory tab and click Manage and Settings.
2. Under Services, select vSphere DRS.
3. Click Edit.
4. Select whether DRS should act in the Partially Automated or Fully Automated mode. In partially automated mode, DRS will place VMs on appropriate hosts once they are powered on; however, it will not migrate running workloads. In fully automated mode, DRS will also migrate running workloads in order to balance the cluster load. The Manual mode only gives you recommendations, and the administrator can select which recommendations to apply. To create resource pools at the cluster level, you will need to have at least the manual mode enabled.
5. Select the DRS aggressiveness. Refer to the preceding table for a short explanation. Using more aggressive DRS levels is only recommended when you have homogeneous CPU and memory setups!

When creating VMware support calls regarding DRS issues, a DRS dump file called drmdump is important. This file contains various metrics that DRS uses to calculate the possible migration benefits. On the vCenter Server Appliance, this file is located in /var/log/vmware/vpx/drmdump/clusterName. On the Windows variant, the file is located in %ALLUSERSPROFILE%\VMware\VMware VirtualCenter\Logs\drmdump\clusterName.

VMware also offers an online tool called VM Resource and Availability Service (http://hasimulator.vmware.com), which tells you which VMs can be restarted during ESXi host failures. It requires you to upload this metric file in order to give you the results. This can be helpful when simulating failure scenarios.

Enhanced vMotion Compatibility

Enhanced vMotion Compatibility (EVC) enables your cluster to migrate workloads between ESXi hosts with different processor generations. Unfortunately, it is not possible to migrate workloads between Intel-based and AMD-based servers; EVC only enables migrations across different Intel or AMD CPU generations. Once enabled, all the ESXi hosts are configured to provide the same set of CPU functions.
In other words, the functions of newer CPU generations are disabled to match those of the older ESXi hosts in the cluster, in order to create a common baseline.

Configuring EVC

To enable EVC, perform the following steps:

1. Select the affected cluster from the inventory tab.
2. Click on Manage, Settings, VMware EVC, and Edit.
3. Choose Enable EVC for AMD Hosts or Enable EVC for Intel Hosts.
4. Select the appropriate CPU generation for the cluster (the oldest).
5. Make sure that Compatibility acknowledges your configuration.
6. Save the changes.

As mixing older hosts into high-performance clusters is not recommended, you should also avoid using EVC.

To sum it up, keep the following in mind when planning the use of DRS:

- Enable DRS if you plan to have automatic load balancing; this is highly recommended for high-performance setups.
- Adjust the DRS aggressiveness level to match your requirements. Too aggressive a migration threshold may result in too many migrations, so play with this setting to find the best value for you.
- Make sure to have a separate vMotion network. Using the same logical network components as for the VM traffic is not recommended and might result in poor workload performance.
- Don't overload ESXi hosts; spare some CPU resources for vMotion processes in order to avoid performance bottlenecks during migrations.
- In high-performance setups, mixing various CPU and memory configurations is not recommended; try not to use EVC.
- Keep license constraints in mind when configuring DRS. Some software products might require additional licenses if they run on multiple servers. We will focus on this later.

Affinity and anti-affinity rules

Sometimes, it is necessary to separate workloads or stick them together. To name an example, think about classical multi-tier applications consisting of the following:

- Frontend layer
- Database layer
- Backend layer

One possibility would be to separate the particular VMs onto multiple ESXi hosts to increase resilience: if a single ESXi host serving all the workloads crashes, all application components are affected by this fault. On the other hand, moving all the participating application VMs to one single ESXi host can result in higher performance, as network traffic does not need to leave the ESXi host. However, there are more use cases for affinity and anti-affinity rules, such as the following:

- Dividing production, development, and test workloads. For example, it would be possible to separate production from development and test workloads. This is a common procedure that many application vendors require.
- Licensing reasons (for example, a license bound to a USB dongle, per-core licensing, software assurance denying vMotion, and so on).
- Application interoperability incompatibility (for example, applications that need to run on separate hosts).

As VMware vSphere has no knowledge of the license conditions of the workloads running virtualized, it is very important to check your software vendors' license agreements. You, as the virtual infrastructure administrator, are responsible for ensuring that your software is fully licensed. Some software vendors require special licenses when running virtualized or on multiple hosts.

There are two kinds of affinity/anti-affinity rules: VM-Host (the relationship between VMs and ESXi hosts) and VM-VM (the intra-relationship between particular VMs). Each rule consists of at least one VM and host DRS group. These groups also contain at least one entry.
Every rule has a designation, where the administrator can choose between must and should. Implementing a rule with the should designation results in a preference for hosts satisfying all the configured rules; if no applicable host is found, the VM is put on another host in order to ensure that the workload is at least running. If the must designation is selected, a VM only runs on hosts that satisfy the configured rules; if no applicable host is found, the VM cannot be moved or started. This configuration approach is strict and requires extensive testing in order to avoid unplanned effects.

DRS rules are combined rather than ranked. Therefore, if multiple rules are defined for a particular VM/host or VM/VM combination, the power-on process is only granted if all the rules apply to the requested action. If two rules conflict for a particular VM/host or VM/VM combination, the first rule is chosen and the other rule is automatically disabled. The use of must rules, in particular, should be evaluated very carefully, as HA might not restart some workloads if these rules cannot be followed in case of a host crash.

Configuring affinity/anti-affinity rules

In this example, we will have a look at two use cases that affinity/anti-affinity rules can apply to.

Example 1: VM-VM relationship

This example consists of two VMs serving a two-tier application: db001 (database VM) and web001 (frontend VM). It is advisable to have both VMs running on the same physical host in order to reduce the networking hops needed to connect the frontend server to its database. To configure the VM-VM affinity rule, proceed with the following steps:

1. Select your cluster from the inventory tab and click Manage and VM/Host Rules underneath Configuration.
2. Click Add.
3. Enter a readable rule name (for example, db001-web001-bundle) and select Enable rule.
4. Select the Keep Virtual Machines Together type and select the affected VMs.
5. Click OK to save the rule.

When migrating one of the virtual machines using vMotion, the other VM will also be migrated.

Example 2: VM-Host relationship

In this example, a VM (vcsa) is pinned to a particular ESXi host of a two-node cluster designated for production workloads. To configure the VM-Host affinity rule, proceed with the following steps:

1. Select your cluster from the inventory tab and click Manage and VM/Host Groups underneath Configuration.
2. Click Add.
3. Enter a group name for the VM; make sure to select the VM Group type. Also, click Add to add the affected VM.
4. Click Add once again.
5. Enter a group name for the ESXi host; make sure to select the Host Group type. Then, click Add to add the ESXi host.
6. Select VM/Host Rules underneath Configuration and click Add.
7. Enter a readable rule name (for example, vcsa-to-esxi02) and select Enable rule.
8. Select the Virtual Machines to Hosts type and select the previously created VM and host groups.
9. Make sure to select Must run on hosts in group or Should run on hosts in group before clicking OK.

Migrating the virtual machine to another host will fail with an error message if Must run on hosts in group was selected earlier.

Keep the following in mind when designing affinity and anti-affinity rules:

- Enable DRS.
- Double-check your software vendors' licensing agreements.
- Make sure to test your affinity/anti-affinity rules by simulating vMotion processes. Also, simulate host failures by using maintenance mode to ensure that your rules are working as expected.
- Note that the created rules also apply to HA and DPM.
- KISS – keep it simple, stupid: try to avoid using too many rules for one VM/host combination.

Distributed power management

High-performance setups are often the opposite of efficient, green infrastructures; however, high-performing virtual infrastructure setups can be efficient as well. Distributed Power Management (DPM) can help you reduce the power costs and consumption of your virtual infrastructure. It is part of DRS and monitors the CPU and memory usage of all workloads running in the cluster. If it is possible to run all VMs on fewer hosts, DPM will put one or more ESXi hosts into standby mode (they will be powered off) after migrating their VMs using vMotion.

DPM tries to keep the CPU and memory usage of all cluster nodes between 45% and 81% by default. If this range is exceeded, hosts will be powered on or off. Setting two advanced parameters can change this behavior, as follows:

- DemandCapacityRatioTarget: The utilization target for the ESXi hosts (default: 63%)
- DemandCapacityRatioToleranceHost: The utilization range around the target utilization (default: 18%)

The range is calculated as (DemandCapacityRatioTarget - DemandCapacityRatioToleranceHost) to (DemandCapacityRatioTarget + DemandCapacityRatioToleranceHost). With the default values, this gives (63% - 18%) to (63% + 18%), that is, 45% to 81%.

To control a server's power state, DPM makes use of these three protocols in the following order:

1. Intelligent Platform Management Interface (IPMI)
2. Hewlett Packard Integrated Lights-Out (HP iLO)
3. Wake-on-LAN (WoL)

To enable IPMI/HP iLO management, you will need to configure the Baseboard Management Controller (BMC) IP address and other access information. To configure them, follow the given steps:

1. Log in to vSphere Web Client and select the host that you want to configure for power management.
2. Click on Configuration and select the Power Management tab.
3. Select Properties and enter an IP address, MAC address, username, and password for the server's BMC. Note that entering hostnames will not work, as shown in the following:

To enable DPM for a cluster, perform the following steps:

1. Select the cluster from the inventory tab and select Manage.
2. From the Services tab, select vSphere DRS and click Edit.
3. Expand the Power Management tab and select Manual or Automatic. Also, select the threshold that DPM will use to make power decisions. The higher the value, the faster DPM will put ESXi hosts into standby mode, as follows:

It is also possible to disable DPM for a particular host (for example, the strongest host in your cluster). To do so, select the cluster, then select Manage and Host Options. Check the host and click Edit. Make sure to select Disabled for the Power Management option.

Consider the following when planning to utilize DPM:

- Make sure your servers have a supported BMC, such as HP iLO or IPMI.
- Evaluate the right DPM threshold. Also, keep your servers' boot time (including firmware initialization) in mind and test your configuration before running it in production.
- Keep in mind that DPM also uses active memory and CPU usage for its decisions. Booting VMs might claim all of their memory without actively using much of it. If hosts are powered down while plenty of VMs are booting, this might result in extensive swapping.

Summary

In this article, you learned how to implement affinity and anti-affinity rules. You also learned how to save power while still meeting your workload requirements.
Understanding the ELF specimen

Packt
07 Jan 2016
21 min read
In this article by Ryan O'Neill, author of the book Learning Linux Binary Analysis, ELF will be discussed. In order to reverse-engineer Linux binaries, we must understand the binary format itself. ELF has become the standard binary format for UNIX and UNIX-Flavor OS's. Binary formats such as ELF are not generally a quick study, and to learn ELF requires some degree of application of the different components that you learn as you go. Programming things that perform tasks such as binary parsing will require learning some ELF, and in the act of programming such things, you will in-turn learn ELF better and more proficiently as you go along. ELF is often thought to be a dry and complicated topic to learn, and if one were to simply read through the ELF specs without applying them through the spirit of creativity, then indeed it would be. ELF is really quite an incredible composition of computer science at work, with program layout, program loading, dynamic linking, symbol table lookups, and many other tightly orchestrated components. (For more resources related to this topic, see here.) ELF section headers Now that we've looked at what program headers are, it is time to look at section headers. I really want to point out here the distinction between the two; I often hear people calling sections "segments" and vice versa. A section is not a segment. Segments are necessary for program execution, and within segments are contained different types of code and data which are separated within sections and these sections always exist, and usually they are addressable through something called section-headers. Section-headers are what make sections accessible, but if the section-headers are stripped (Missing from the binary), it doesn't mean that the sections are not there. Sections are just data or code. This data or code is organized across the binary in different sections. The sections themselves exist within the boundaries of the text and data segment. Each section contains either code or data of some type. The data could range from program data, such as global variables, or dynamic linking information that is necessary for the linker. Now, as mentioned earlier, every ELF object has sections, but not all ELF objects have section headers. Usually this is because the executable has been tampered with (Such as the section headers having been stripped so that debugging is harder). All of GNU's binutils, such as objcopy, objdump, and other tools such as gdb, rely on the section-headers to locate symbol information that is stored in the sections specific to containing symbol data. Without section-headers, tools such as gdb and objdump are nearly useless. Section-headers are convenient to have for granular inspection over what parts or sections of an ELF object we are viewing. In fact, section-headers make reverse engineering a lot easier, since they provide us with the ability to use certain tools that require them. If, for instance, the section-header table is stripped, then we can't access a section such as .dynsym, which contains imported/exported symbols describing function names and offsets/addresses. Even if a section-header table has been stripped from an executable, a moderate reverse engineer can actually reconstruct a section-header table (and even part of a symbol table) by getting information from certain program headers, since these will always exist in a program or shared library. 
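The claim about stripped section headers is easy to check programmatically. The following is a minimal sketch, not taken from the book's code, that reads only the ELF file header and reports whether the section-header table is still present. It assumes a 32-bit ELF target (matching the structures used throughout this article) and relies only on the standard <elf.h> definitions; the file name shdr_check.c and the output wording are arbitrary choices.

/* shdr_check.c - a minimal sketch: report whether an ELF binary still
 * carries a section-header table. 32-bit ELF is assumed throughout.
 * Build with: gcc shdr_check.c -o shdr_check */
#include <elf.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <elf-file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0) {
        perror("open/fstat");
        return 1;
    }
    uint8_t *mem = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    Elf32_Ehdr *ehdr = (Elf32_Ehdr *)mem;
    if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0) {
        fprintf(stderr, "%s is not an ELF file\n", argv[1]);
        return 1;
    }
    /* Both fields are typically zero when the section-header table is gone */
    if (ehdr->e_shoff == 0 || ehdr->e_shnum == 0)
        printf("%s: section-header table is stripped\n", argv[1]);
    else
        printf("%s: %u section headers at offset 0x%x (shstrtab index %u)\n",
               argv[1], (unsigned)ehdr->e_shnum, (unsigned)ehdr->e_shoff,
               (unsigned)ehdr->e_shstrndx);
    munmap(mem, st.st_size);
    close(fd);
    return 0;
}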
We discussed the dynamic segment earlier and the different DT_TAGs that contain information about the symbol table and relocation entries. This is what a 32 bit ELF section-header looks like: typedef struct {     uint32_t   sh_name; // offset into shdr string table for shdr name     uint32_t   sh_type; // shdr type I.E SHT_PROGBITS     uint32_t   sh_flags; // shdr flags I.E SHT_WRITE|SHT_ALLOC     Elf32_Addr sh_addr;  // address of where section begins     Elf32_Off  sh_offset; // offset of shdr from beginning of file     uint32_t   sh_size;   // size that section takes up on disk     uint32_t   sh_link;   // points to another section     uint32_t   sh_info;   // interpretation depends on section type     uint32_t   sh_addralign; // alignment for address of section     uint32_t   sh_entsize;  // size of each certain entries that may be in section } Elf32_Shdr; Let's take a look at some of the most important section types, once again allowing room to study the ELF(5) man pages and the official ELF specification for more detailed information about the sections. .text The .text section is a code section that contains program code instructions. In an executable program where there are also phdr, this section would be within the range of the text segment. Because it contains program code, it is of the section type SHT_PROGBITS. .rodata The rodata section contains read-only data, such as strings, from a line of C code: printf("Hello World!n"); These are stored in this section. This section is read-only, and therefore must exist in a read-only segment of an executable. So, you would find .rodata within the range of the text segment (Not the data segment). Because this section is read-only, it is of the type SHT_PROGBITS. .plt The Procedure linkage table (PLT) contains code that is necessary for the dynamic linker to call functions that are imported from shared libraries. It resides in the text segment and contains code, so it is marked as type SHT_PROGBITS. .data The data section, not to be confused with the data segment, will exist within the data segment and contain data such as initialized global variables. It contains program variable data, so it is marked as SHT_PROGBITS. .bss The bss section contains uninitialized global data as part of the data segment, and therefore takes up no space on the disk other than 4 bytes, which represents the section itself. The data is initialized to zero at program-load time, and the data can be assigned values during program execution. The bss section is marked as SHT_NOBITS, since it contains no actual data. .got The Global offset table (GOT) section contains the global offset table. This works together with the PLT to provide access to imported shared library functions, and is modified by the dynamic linker at runtime. This section has to do with program execution and is therefore marked as SHT_PROGBITS. .dynsym The dynsym section contains dynamic symbol information imported from shared libraries. It is contained within the text segment and is marked as type SHT_DYNSYM. .dynstr The dynstr section contains the string table for dynamic symbols; this has the name of each symbol in a series of null terminated strings. .rel.* Relocation sections contain information about how the parts of an ELF object or process image need to be fixed up or modified at linking or runtime. .hash The hash section, sometimes called .gnu.hash, contains a hash table for symbol lookup. 
The following hash algorithm is used for symbol name lookups in Linux ELF:

uint32_t dl_new_hash (const char *s)
{
        uint32_t h = 5381;

        /* iterate over the NUL-terminated symbol name */
        for (unsigned char c = *s; c != '\0'; c = *++s)
                h = h * 33 + c;

        return h;
}

.symtab

The symtab section contains symbol information of the type ElfN_Sym. The symtab section is marked as type SHT_SYMTAB, as it contains symbol information.

.strtab

This section contains the symbol string table that is referenced by the st_name entries within the ElfN_Sym structs of .symtab, and is marked as type SHT_STRTAB since it contains a string table.

.shstrtab

The shstrtab section contains the section header string table, which is a set of null-terminated strings containing the names of each section, such as .text, .data, and so on. The ELF file header entry called e_shstrndx holds the index of the .shstrtab section header within the section-header table. This section is marked as SHT_STRTAB since it contains a string table.

.ctors and .dtors

The .ctors (constructors) and .dtors (destructors) sections contain code for initialization and finalization, which is executed before and after the actual main() body of the program code, respectively. The constructor function attribute (applied with gcc's __attribute__((constructor))) is often used by hackers and virus writers to implement a function that performs an anti-debugging trick, such as calling PTRACE_TRACEME, so that the process traces itself and no debuggers can attach themselves to it. This way, the anti-debugging mechanism gets executed before the program enters main().

There are many other section names and types, but we have covered most of the primary ones found in a dynamically linked executable. One can now visualize how an executable is laid out with both phdrs and shdrs:

ELF Relocations

From the ELF(5) man pages:

Relocation is the process of connecting symbolic references with symbolic definitions. Relocatable files must have information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process's program image. Relocation entries are these data.

The process of relocation relies on symbols, which is why we covered symbols first. An example of relocation might be a couple of relocatable objects (ET_REL) being linked together to create an executable: obj1.o wants to call a function, foo(), located in obj2.o. Both obj1.o and obj2.o are being linked to create a fully working executable; they are currently position independent code (PIC), but once relocated to form an executable, they will no longer be position independent, since symbolic references will be resolved into symbolic definitions. The term "relocated" means exactly that: a piece of code or data is being relocated from a simple offset in an object file to some memory address location in an executable, and anything that references that relocated code or data must also be adjusted.
Let's take a quick look at a 32 bit relocation entry: typedef struct {     Elf32_Addr r_offset;     uint32_t   r_info; } Elf32_Rel; And some relocation entries require an addend: typedef struct {     Elf32_Addr r_offset;     uint32_t   r_info;     int32_t    r_addend; } Elf32_Rela; Following is the description of the preceding snippet: r_offset: This points to the location (offset or address) that requires the relocation action (which is going to be some type of modification) r_info: This gives both the symbol table index with respect to which the relocation must be made, and the type of relocation to apply r_addend: This specifies a constant addend used to compute the value stored in the relocatable field Let's take a look at the source code for obj1.o: _start() {   foo(); } We see that it calls the function foo(), however foo() is not located within the source code or the compiled object file, so there will be a relocation entry necessary for symbolic reference: ryan@alchemy:~$ objdump -d obj1.o obj1.o:     file format elf32-i386 Disassembly of section .text: 00000000 <func>:    0:  55                     push   %ebp    1:  89 e5                  mov    %esp,%ebp    3:  83 ec 08               sub    $0x8,%esp    6:  e8 fc ff ff ff         call   7 <func+0x7>    b:  c9                     leave     c:  c3                     ret   As we can see, the call to foo() is highlighted and simply calls to nowhere; 7 is the offset of itself. So, when obj1.o, which calls foo() (located in obj2.o), is linked with obj2.o to make an executable, a relocation entry is there to point at offset 7, which is the data that needs to be modified, changing it to the offset of the actual function, foo(), once the linker knows its location in the executable during link time: ryan@alchemy:~$ readelf -r obj1.o Relocation section '.rel.text' at offset 0x394 contains 1 entries:  Offset     Info    Type            Sym.Value  Sym. Name 00000007  00000902 R_386_PC32        00000000   foo As we can see, a relocation field at offset 7 is specified by the relocation entry's r_offset field. R_386_PC32 is the relocation type; to understand all of these types, read the ELF specs as we will only be covering some. Each relocation type requires a different computation on the relocation target being modified. R_386_PC32 says to modify the target with S + A – P. 
The following list explains all these terms: S is the value of the symbol whose index resides in the relocation entry A is the addend found in the relocation entry P is the place (section offset or address) where the storage unit is being relocated (computed using r_offset) If we look at the final output of our executable after compiling obj1.o and obj2.o, as shown in the following code snippet: ryan@alchemy:~$ gcc -nostdlib obj1.o obj2.o -o relocated ryan@alchemy:~$ objdump -d relocated test:     file format elf32-i386 Disassembly of section .text: 080480d8 <func>:  80480d8:  55                     push   %ebp  80480d9:  89 e5                  mov    %esp,%ebp  80480db:  83 ec 08               sub    $0x8,%esp  80480de:  e8 05 00 00 00         call   80480e8 <foo>  80480e3:  c9                     leave   80480e4:  c3                     ret     80480e5:  90                     nop  80480e6:  90                     nop  80480e7:  90                     nop 080480e8 <foo>:  80480e8:  55                     push   %ebp  80480e9:  89 e5                  mov    %esp,%ebp  80480eb:  5d                     pop    %ebp  80480ec:  c3                     ret We can see that the call instruction (the relocation target) at 0x80480de has been modified with the 32 bit offset value of 5, which points to foo(). The value 5 is the result of the R386_PC_32 relocation action: S + A – P: 0x80480e8 + 0xfffffffc – 0x80480df = 5 0xfffffffc is the same as -4 if a signed integer, so the calculation can also be seen as: 0x80480e8 + (0x80480df + sizeof(uint32_t)) To calculate an offset into a virtual address, use the following computation: address_of_call + offset + 5 (Where 5 is the length of the call instruction) Which in this case is 0x80480de + 5 + 5 = 0x80480e8. An address may also be computed into an offset with the following computation: address – address_of_call – 4 (Where 4 is the length of a call instruction – 1) Relocatable code injection based binary patching Relocatable code injection is a technique that hackers, virus writers, or anyone who wants to modify the code in a binary may utilize as a way to sort of re-link a binary after it has already been compiled. That is, you can inject an object file into an executable, update the executables symbol table, and perform the necessary relocations on the injected object code so that it becomes a part of the executable. A complicated virus might use this rather than just appending code at the end of an executable or finding existing padding. This technique requires extending the text segment to create enough padding room to load the object file. The real trick though is handling the relocations and applying them properly. I designed a custom reverse engineering tool for ELF that is named Quenya. Quenya has many features and capabilities, and one of them is to inject object code into an executable. Why do this? Well, one reason would be to inject a malicious function into an executable, and then hijack a legitimate function and replace it with the malicious one. From a security point of view, one could do hot-patching and apply a legitimate patch to a binary rather than doing something malicious. Let's pretend we are an attacker and we want to infect a program that calls puts() to print "Hello World", and our goal is to hijack puts() so that it calls evil_puts(). 
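To tie the readelf output and the S + A - P formula together, here is a small verification sketch that is not part of the book's code. It uses the standard ELF32_R_SYM and ELF32_R_TYPE macros from <elf.h> and simply replays the numbers from the example above; the variable names S, A, and P mirror the terms just described.

/* reloc_math.c - replay the relocation arithmetic from the example.
 * The constants come from the readelf and objdump listings above. */
#include <elf.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* The Info column for the .rel.text entry of obj1.o was 00000902 */
    uint32_t r_info = 0x00000902;
    printf("symbol table index: %u\n", (unsigned)ELF32_R_SYM(r_info));  /* 9 */
    printf("relocation type:    %u\n", (unsigned)ELF32_R_TYPE(r_info)); /* 2 == R_386_PC32 */

    /* S + A - P for the call to foo() in the final executable */
    int64_t S = 0x80480e8;              /* address of the symbol foo            */
    int64_t A = (int32_t)0xfffffffc;    /* implicit addend stored at the target */
    int64_t P = 0x80480df;              /* place being relocated                */
    printf("S + A - P = %lld\n", (long long)(S + A - P));              /* 5 */
    return 0;
}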
First, we would need to write a quick PIC object that can write a string to standard output: #include <sys/syscall.h> int _write (int fd, void *buf, int count) {   long ret;     __asm__ __volatile__ ("pushl %%ebxnt"                         "movl %%esi,%%ebxnt"                         "int $0x80nt" "popl %%ebx":"=a" (ret)                         :"0" (SYS_write), "S" ((long) fd),                         "c" ((long) buf), "d" ((long) count));   if (ret >= 0) {     return (int) ret;   }   return -1; } int evil_puts(void) {         _write(1, "HAHA puts() has been hijacked!n", 31); } Now, we compile evil_puts.c into evil_puts.o, and inject it into our program, hello_world: ryan@alchemy:~/quenya$ ./hello_world Hello World This program calls the following: puts(“Hello Worldn”); We now use Quenya to inject and relocate our evil_puts.o file into hello_world: [Quenya v0.1@alchemy] reloc evil_puts.o hello_world 0x08048624  addr: 0x8048612 0x080485c4 _write addr: 0x804861e 0x080485c4  addr: 0x804868f 0x080485c4  addr: 0x80486b7 Injection/Relocation succeeded As we can see, the function write() from our evil_puts.o has been relocated and assigned an address at 0x804861e in the executable, hello_world. The next command, hijack, overwrites the global offset table entry for puts() with the address of evil_puts(): [Quenya v0.1@alchemy] hijack binary hello_world evil_puts puts Attempting to hijack function: puts Modifying GOT entry for puts Succesfully hijacked function: puts Commiting changes into executable file [Quenya v0.1@alchemy] quit And Whammi! ryan@alchemy:~/quenya$ ./hello_world HAHA puts() has been hijacked! We have successfully relocated an object file into an executable and modified the executable's control flow so that it executes the code that we injected. If we use readelf -s on hello_world, we can actually now see a symbol called evil_puts(). For the readers interest, I have included a small snippet of code that contains the ELF relocation mechanics in Quenya; it may be a little bit obscure without knowledge of the rest of the code base, but it is also somewhat straightforward if you've paid attention to what we learned about relocations. 
It is just a snippet and does not show any of the other important aspects such as modifying the executables symbol table: case SHT_RELA: for (j = 0; j < obj.shdr[i].sh_size / sizeof(Elf32_Rela); j++, rela++) {   rela = (Elf32_Rela *)(obj.mem + obj.shdr[i].sh_offset);       /* symbol table */                            symtab = (Elf32_Sym *)obj.section[obj.shdr[i].sh_link];               /* symbol we are applying relocation to */       symbol = &symtab[ELF32_R_SYM(rela->r_info)];        /* section to modify */       TargetSection = &obj.shdr[obj.shdr[i].sh_info];       TargetIndex = obj.shdr[i].sh_info;        /* target location */       TargetAddr = TargetSection->sh_addr + rela->r_offset;              /* pointer to relocation target */       RelocPtr = (Elf32_Addr *)(obj.section[TargetIndex] + rela->r_offset);              /* relocation value */       RelVal = symbol->st_value;       RelVal += obj.shdr[symbol->st_shndx].sh_addr;       switch (ELF32_R_TYPE(rela->r_info))       {         /* R_386_PC32      2    word32  S + A - P */         case R_386_PC32:               *RelocPtr += RelVal;                   *RelocPtr += rela->r_addend;                   *RelocPtr -= TargetAddr;                   break;              /* R_386_32        1    word32  S + A */            case R_386_32:                *RelocPtr += RelVal;                   *RelocPtr += rela->r_addend;                   break;       }  } As shown in the preceding code, the relocation target that RelocPtr points to is modified according to the relocation action requested by the relocation type (such as R_386_32). Although relocatable code binary injection is a good example of the idea behind relocations, it is not a perfect example of how a linker actually performs it with multiple object files. Nevertheless, it still retains the general idea and application of a relocation action. Later on, we will talk about shared library (ET_DYN) injection, which brings us now to the topic of dynamic linking. Summary In this article we discussed different types of ELF section headers and ELF relocations. Resources for Article:   Further resources on this subject: Advanced User Management [article] Creating Code Snippets [article] Advanced Wireless Sniffing [article]
Parallelization using Reducers

Packt
06 Jan 2016
18 min read
In this article by Akhil Wali, the author of the book Mastering Clojure, we will study this particular abstraction of collections and how it is quite orthogonal to viewing collections as sequences. Sequences and laziness are a great way of handling collections, and the Clojure standard library provides several functions to handle and manipulate sequences. However, abstracting a collection as a sequence has an unfortunate consequence: any computation that is performed over all the elements of a sequence is inherently sequential. All standard sequence functions create a new collection that is similar to the collection they are passed. Interestingly, performing a computation over a collection without creating a similar collection, even as an intermediary result, is quite useful. For example, it is often required to reduce a given collection to a single value through a series of transformations in an iterative manner. This sort of computation does not necessarily require the intermediary results of each transformation to be saved.

(For more resources related to this topic, see here.)

A consequence of iteratively computing values from a collection is that we cannot parallelize it in a straightforward way. Modern map-reduce frameworks handle this kind of computation by pipelining the elements of a collection through several transformations in parallel and finally reducing the results into a single result. Of course, the result could be a new collection as well. A drawback is that this methodology produces concrete collections as intermediate results of each transformation, which is rather wasteful. For example, if we want to filter values from a collection, a map-reduce strategy would require creating empty collections to represent the values that are left out of the reduction step that produces the final result. This incurs unnecessary memory allocation and also creates additional work for the reduction step that produces the final result. Hence, there is scope for optimizing these kinds of computations.

This brings us to the notion of treating computations over collections as reducers to attain better performance. Of course, this doesn't mean that reducers are a replacement for sequences. Sequences and laziness are great for abstracting computations that create and manipulate collections, while reducers are specialized high-performance abstractions of collections in which a collection needs to be piped through several transformations and combined to produce the final result. Reducers achieve a performance gain in the following ways:

- Reducing the amount of memory allocated to produce the desired result
- Parallelizing the process of reducing a collection into a single result, which could be an entirely new collection

The clojure.core.reducers namespace provides several functions for processing collections using reducers. Let's now examine how reducers are implemented and also study a few examples that demonstrate how reducers can be used.

Using reduce to transform collections

Sequences and functions that operate on sequences preserve the sequential ordering between the constituent elements of a collection. Lazy sequences avoid unnecessary realization of elements in a collection until they are required for a computation, but the realization of these values is still performed in a sequential manner. However, this characteristic of sequential ordering may not be desirable for all computations performed over a collection.
For example, it's not possible to map a function over a vector and then lazily realize values in the resulting collection by random access, since the map function converts the collection that it is supplied into a sequence. Also, functions such as map and filter are lazy but still sequential by nature. Consider a unary function as shown in Example 3.1 that we intend to map it over a given vector. The function must compute a value from the one it is supplied, and also perform a side effect so that we can observe its application over the elements in a collection: Example 3.1. A simple unary function (defn square-with-side-effect [x]   (do     (println (str "Side-effect: " x))     (* x x))) The square-with-side-effect function defined here simply returns the square of a number x using the * function. This function also prints the value of x using a println form whenever it is called. Suppose this function is mapped over a given vector. The resulting collection will have to be realized completely if a computation has to be performed over it, even if all the elements from the resulting vector are not required. This can be demonstrated as follows: user> (def mapped (map square-with-side-effect [0 1 2 3 4 5])) #'user/mapped user> (reduce + (take 3 mapped)) Side-effect: 0 Side-effect: 1 Side-effect: 2 Side-effect: 3 Side-effect: 4 Side-effect: 5 5 As previously shown, the mapped variable contains the result of mapping the square-with-side-effect function over a vector. If we try to sum the first three values in the resulting collection using the reduce, take, and + functions, all the values in the [0 1 2 3 4 5] vector are printed as a side effect. This means that the square-with-side-effect function was applied to all the elements in the initial vector, despite the fact that only the first three elements were actually required by the reduce form. Of course, this can be solved by using the seq function to convert the vector to a sequence before we map the square-with-side-effect function over it. But then, we lose the ability to efficiently access elements in a random order in the resulting collection. To dive deeper into why this actually happens, you first need to understand how the standard map function is actually implemented. A simplified definition of the map function is shown here: Example 3.2. A simplified definition of the map function (defn map [f coll]   (cons (f (first coll))         (lazy-seq (map f (rest coll))))) The definition of map in Example 3.2 is a simplified and rather incomplete one, as it doesn't check for an empty collection and cannot be used over multiple collections. That aside, this definition of map does indeed apply a function f to all the elements in a coll collection. This is implemented using a composition of the cons, first, rest, and lazy-seq forms. The implementation can be interpreted as, "apply the f function to the first element in the coll collection, and then map f over the rest of the collection in a lazy manner." An interesting consequence of this implementation is that the map function has the following characteristics: The ordering among the elements in the coll collection is preserved. This computation is performed recursively. The lazy-seq form is used to perform the computation in a lazy manner. The use of the first and rest forms indicates that coll must be a sequence, and the cons form will also produce a result that is a sequence. Hence, the map function accepts a sequence and builds a new one. 
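As the author points out, the definition in Example 3.2 is incomplete because it never checks for an empty collection. A slightly more complete sketch, still simplified and nowhere near the real clojure.core/map (which also handles multiple collections and chunking), might look like the following; it is renamed my-map here to avoid shadowing the core function:

;; A still-simplified map that terminates when the collection is empty.
(defn my-map [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s))
            (my-map f (rest s))))))

(my-map inc [1 2 3]) ;=> (2 3 4)
(my-map inc [])      ;=> ()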
Another interesting characteristic about lazy sequences is that they are realized in chunks. This means that a lazy sequence is realized in chunks of 32 elements, each as an optimization, when the values in the sequence are actually required. Sequences that behave this way are termed as chunked sequences. Of course, not all sequences are chunked, and we can check whether a given sequence is chunked using the chunked-seq? predicate. The range function returns a chunked sequence, shown as follows: user> (first (map #(do (print !) %) (range 70))) !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 0 user> (nth (map #(do (print !) %) (range 70)) 32) !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 32 Both the statements in the output shown previously select a single element from a sequence returned by the map function. The function passed to the map function in both the above statements prints the ! character and returns the value supplied to it. In the first statement, the first 32 elements of the resulting sequence are realized, even though only the first element is required. Similarly, the second statement is observed to realize the first 64 elements of the resulting sequence when the element at the 32nd position is obtained using the nth function. Chunked sequences have been an integral part of Clojure since version 1.1. However, none of the properties of the sequences described above are needed to transform a given collection into a result that is not a sequence. If we were to handle such computations efficiently, we cannot build on functions that return sequences, such as map and filter. Incidentally, the reduce function does not necessarily produce a sequence. It also has a couple of other interesting properties: The reduce function actually lets the collection it is passed to define how it is computed over or reduced. Thus, reduce is collection independent. Also, the reduce function is versatile enough to build a single value or an entirely new collection as well. For example, using reduce with the * or + function will create a single-valued result, while using it with the cons or concat function can create a new collection as the result. Thus, reduce can build anything. A collection is said to be reducible if it defines how it can be reduced to a single result. The binary function that is used by the reduce function along with a collection is also termed as a reducing function. A reducing function requires two arguments: one to represent the result of the computation so far, and another to represent an input value that has to be combined into the result. Several reducing functions can be composed into one, which effectively changes how the reduce function processes a given collection. This composition is done using reducers, which can be thought of as a shortened version of the term reducing function transformers. The use of sequences and laziness can be compared to the use of reducers to perform a given computation by Rich Hickey's infamous pie-maker analogy. Suppose a pie-maker has been supplied a bag of apples with an intent to reduce the apples to a pie. There are a couple of transformations needed to perform this task. First, the stickers on all the apples have to be removed; as in, we map a function to "take the sticker off" the apples in the collection. Also, all the rotten apples will have to be removed, which is analogous to using the filter function to remove elements from a collection. Instead of performing this work herself, the pie-maker delegates it to her assistant. 
The assistant could first take the stickers off all the apples, thus producing a new collection, and then take out the rotten apples to produce another new collection, which illustrates the use of lazy sequences. But then, the assistant would be doing unnecessary work by removing the stickers off of the rotten apples, which will have to be discarded later anyway. On the other hand, the assistant could delay this work until the actual reduction of the processed apples into a pie is performed. Once the work actually needs to be performed, the assistant will compose the two tasks of mapping and filtering the collection of apples, thus avoiding any unnecessary work. This case depicts the use of reducers for composing and transforming the tasks needed to effectively reduce a collection of apples to a pie. By using reducers, we create a recipe of tasks to reduce a collection of apples to a pie and delay all processing until the final reduction, instead of dealing with collections of apples as intermediary results of each task: The following namespaces must be included in your namespace declaration for the upcoming examples. (ns my-namespace   (:require [clojure.core.reducers :as r])) The clojure.core.reducers namespace requires Java 6 with the jsr166y.jar or Java 7+ for fork/join support.  Let's now briefly explore how reducers are actually implemented. Functions that operate on sequences use the clojure.lang.ISeq interface to abstract the behavior of a collection. In the case of reducers, the common interface that we must build upon is that of a reducing function. As we mentioned earlier, a reducing function is a two-arity function in which the first argument is the result produced so far and the second argument is the current input, which has to be combined with the first argument. The process of performing a computation over a collection and producing a result can be generalized into three distinct cases. They can be described as follows: A new collection with the same number of elements as that of the collection it is supplied needs to be produced. This one-to-one case is analogous to using the map function. The computation shrinks the supplied collection by removing elements from it. This can be done using the filter function. The computation could also be expansive, in which case it produces a new collection that contains an increased number of elements. This is like what the mapcat function does. These cases depict the different ways by which a collection can be transformed into the desired result. Any computation, or reduction, over a collection can be thought of as an arbitrary sequence of such transformations. These transformations are represented by transformers, which are functions that transform a reducing function. They can be implemented as shown here in Example 3.3: Example 3.3. Transformers (defn mapping [f]   (fn [rf]     (fn [result input]       (rf result (f input)))))    (defn filtering [p?]   (fn [rf]     (fn [result input]       (if (p? input)         (rf result input)         result))))   (defn mapcatting [f]   (fn [rf]     (fn [result input]       (reduce rf result (f input))))) The mapping, filtering, and mapcatting functions in the above example Example 3.3 represent the core logic of the map, filter, and mapcat functions respectively. All of these functions are transformers that take a single argument and return a new function. 
The returned function transforms a supplied reducing function, represented by rf, and returns a new reducing function, created using this expression: (fn [result input] ... ). Functions returned by the mapping, filtering, and mapcatting functions are termed as reducing function transformers. The mapping function applies the f function to the current input, represented by the input variable. The value returned by the f function is then combined with the accumulated result, represented by result, using the reducing function rf. This transformer is a frighteningly pure abstraction of the standard map function that applies an f function over a collection. The mapping function makes no assumptions about the structure of the collection it is supplied, or how the values returned by the f function are combined to produce the final result. Similarly, the filtering function uses a predicate, p?, to check whether the current input of the rf reducing function must combined into the final result, represented by result. If the predicate is not true, then the reducing function will simply return the result value without any modification. The mapcatting function uses the reduce function to combine the value result with the result of the (f input) expression. In this transformer, we can assume that the f function will return a new collection and the rf reducing function will somehow combine two collections into a new one. One of the foundations of the reducers library is the CollReduce protocol defined in the clojure.core.protocols namespace. This protocol defines the behavior of a collection when it is passed as an argument to the reduce function, and it is declared as shown below Example 3.4: Example 3.4. The CollReduce protocol (defprotocol CollReduce   (coll-reduce [coll rf init])) The clojure.core.reducers namespace defines a reducer function that creates a reducible collection by dynamically extending the CollReduce protocol, as shown in this code Example 3.5: The reducer function (defn reducer   ([coll xf]    (reify      CollReduce      (coll-reduce [_ rf init]        (coll-reduce coll (xf rf) init))))) The reducer function combines a collection (coll) and a reducing function transformer (xf), which is returned by the mapping, filtering, and mapcatting functions, to produce a new reducible collection. When reduce is invoked on a reducible collection, it will ultimately ask the collection to reduce itself using the reducing function returned by the (xf rf) expression. Using this mechanism, several reducing functions can be composed into a computation that has to be performed over a given collection. Also, the reducer function needs to be defined only once, and the actual implementation of coll-reduce is provided by the collection supplied to the reducer function. Now, we can redefine the reduce function to simply invoke the coll-reduce function implemented by a given collection, as shown herein Example 3.6: Example 3.6. Redefining the reduce function (defn reduce   ([rf coll]    (reduce rf (rf) coll))   ([rf init coll]    (coll-reduce coll rf init))) As shown in the above code Example 3.6, the reduce function delegates the job of reducing a collection to the collection itself using the coll-reduce function. Also, the reduce function will use the rf reducing function to supply the init argument when it is not specified. An interesting consequence of this definition of reduce is that the rf function must produce an identity value when it is supplied no arguments. 
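Before looking at how the standard reduce function cooperates with CollReduce, it can help to apply one of the transformers from Example 3.3 by hand. The following REPL sketch assumes the mapping and filtering functions shown earlier are in scope; it is only meant to illustrate that a transformer turns one reducing function into another:

;; (mapping inc) transforms the reducing function + into one that
;; increments each input before combining it with the result so far.
(reduce ((mapping inc) +) 0 [1 2 3 4])
;=> 14, the same as (reduce + [2 3 4 5])

;; The same idea with filtering and conj builds a collection instead
;; of a single number.
(reduce ((filtering even?) conj) [] [1 2 3 4 5 6])
;=> [2 4 6]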
The standard reduce function even uses the CollReduce protocol to delegate the job of reducing a collection to the collection itself, but it will also fall back on the default definition of reduce if the supplied collection does not implement the CollReduce protocol. Since Clojure 1.4, the reduce function allows a collection to define how it is reduced using the clojure.core.CollReduce protocol. Clojure 1.5 introduced the clojure.core.reducers namespace, which extends the use of this protocol. All the standard Clojure collections, namely lists, vectors, sets, and maps, implement the CollReduce protocol. The reducer function can be used to build a sequence of transformations to be applied on a collection when it is passed as an argument to the reduce function. This can be demonstrated as follows: user> (r/reduce + 0 (r/reducer [1 2 3 4] (mapping inc))) 14 user> (reduce + 0 (r/reducer [1 2 3 4] (mapping inc))) 14 In this output, the mapping function is used with the inc function to create a reducing function transformer that increments all the elements in a given collection. This transformer is then combined with a vector using the reducer function to produce a reducible collection. The call to reduce in both of the above statements is transformed into the (reduce + [2 3 4 5]) expression, thus producing the result as 14. We can now redefine the map, filter, and mapcat functions using the reducer function, as shown belowin Example 3.7: Redefining the map, filter and mapcat functions using the reducer form (defn map [f coll]   (reducer coll (mapping f)))   (defn filter [p? coll]   (reducer coll (filtering p?)))   (defn mapcat [f coll]   (reducer coll (mapcatting f))) As shown in Example 3.7, the map, filter, and mapcat functions are now simply compositions of the reducer form with the mapping, filtering, and mapcatting transformers respectively. The definitions of CollReduce, reducer, reduce, map, filter, and mapcat are simplified versions of their actual definitions in the clojure.core.reducers namespace. The definitions of the map, filter, and mapcat functions shown in Example 3.7 have the same shape as the standard versions of these functions, shown as follows: user> (r/reduce + (r/map inc [1 2 3 4])) 14 user> (r/reduce + (r/filter even? [1 2 3 4])) 6 user> (r/reduce + (r/mapcat range [1 2 3 4])) 10 Hence, the map, filter, and mapcat functions from the clojure.core.reducers namespace can be used in the same way as the standard versions of these functions. The reducers library also provides a take function that can be used as a replacement for the standard take function. We can use this function to reduce the number of calls to the square-with-side-effect function (from Example 3.1) when it is mapped over a given vector, as shown below: user> (def mapped (r/map square-with-side-effect [0 1 2 3 4 5])) #'user/mapped user> (reduce + (r/take 3 mapped)) Side-effect: 0 Side-effect: 1 Side-effect: 2 Side-effect: 3 5 Thus, using the map and take functions from the clojure.core.reducers namespace as shown above avoids the application of the square-with-side-effect function over all five elements in the [0 1 2 3 4 5] vector, as only the first three are required. The reducers library also provides variants of the standard take-while, drop, flatten, and remove functions which are based on reducers. Effectively, functions based on reducers will require a lesser number of allocations than sequence-based functions, thus leading to an improvement in performance. 
For example, consider the process and process-with-reducer functions, shown here: Example 3.8. Functions to process a collection of numbers using sequences and reducers (defn process [nums]   (reduce + (map inc (map inc (map inc nums))))) (defn process-with-reducer [nums]   (reduce + (r/map inc (r/map inc (r/map inc nums))))) This process function in Example 3.8 applies the inc function over a collection of numbers represented by nums using the map function. The process-with-reducer function performs the same action but uses the reducer variant of the map function. The process-with-reducer function will take a lesser amount of time to produce its result from a large vector compared to the process function, as shown here: user> (def nums (vec (range 1000000))) #'user/nums user> (time (process nums)) "Elapsed time: 471.217086 msecs" 500002500000 user> (time (process-with-reducer nums)) "Elapsed time: 356.767024 msecs" 500002500000 The process-with-reducer function gets a slight performance boost as it requires a lesser number of memory allocations than the process function. The performance of this computation can be improved by a greater scale if we can somehow parallelize it. Summary In this article, we explored the clojure.core.reducers library in detail. We took a look at how reducers are implemented and also how we can use reducers to handle large collections of data in an efficient manner. Resources for Article:   Further resources on this subject: Getting Acquainted with Storm [article] Working with Incanter Datasets [article] Application Performance [article]
Building a Command-line Tool

Packt
06 Jan 2016
15 min read
In this article Eduardo Díaz, author of the book Clojure for Java Developers, we will learn how to build a command line tool in Clojure. We'll also look to some new features in the Clojure world; we'll be discussing core.async and transducers, which are the new ways to write asynchronous programs. core.async is a very exciting method to create thousands of light threads, along with the capability to manage them. Transducers are a way to separate computation from the source of data; you can use transducers with the data flowing between the light threads or use them with a vector of data. (For more resources related to this topic, see here.) The requirements First, let's take the time to understand the requirements fully. Let's try to summarize our requirement in a single statement: We need to know all the places in a set of pages that use the same CSS selector. This seems to be a well-defined problem, but let's not forget to specify some things in order to make the best possible decisions. We want to be able to use more than one browser. We want to look for the same CSS selector in almost all the pages. It will be better to have an image of where the elements are used, instead of a text report. We want it to be a command-line app. We need to count the number of elements that a CSS selector can match in a single page and in all of them; this might help us get a sense of the classes we can change freely and the ones we should never touch. It should be written in Clojure! How can we solve the previous requirements? Java and Clojure already have a wide variety of libraries that we can use. Let's have a look at a couple of them, which we can use in this example. Automating the browser The biggest issue seems to be finding a simple way to build a cross browser. Even automating a single browser sounds like a complex task; how could we automate different browsers? You probably have heard about selenium, a library that enables us to automate a browser. It is normally used to test, but it also lets us take screenshots to lookup for certain elements and allows us to run custom JavaScript on the browser and in turn its architecture allows it to run on different browsers. It does seem like a great fit. In the modern world, you can use selenium for almost any language you want; however, it is written in Java and you can expect a first class support if you are running in the JVM. We are using Clojure and we can expect better integration with the Clojure language; for this particular project we will rely on clj-webdriver (https://github.com/semperos/clj-webdriver). It is an open source project that features an idiomatic Clojure API for selenium, called Taxi. You can find the documentation for Taxi at https://github.com/semperos/clj-webdriver/wiki/Introduction%3A-Taxi Parsing the command-line parameters We want to build a command-line app and if we want to do it in the best possible way, it is important to think of our users. Command-line users are comfortable with using their apps in a standard way. One of the most used and best-known standards to pass arguments to command-line apps is the standard GNU. There are libraries in most of the languages that help you parse command-line arguments and Clojure is no exception. Let's use the tools.cli library (https://github.com/clojure/tools.cli). The tools.cli library is a parser for the command-line arguments, it adheres to the GNU standard and makes it very easy to create a command-line interface that is safe and familiar to use. 
Some very interesting features that tools.cli gives you are: Instant creation of a help message that shows each option available Custom parsing of options (you can give your function to the parser, so you'll get the exact value you want) Custom validation of options. (you can give a function to the parser so it validates what you are passing to the command line) It works with Clojure script, so you can create your own Node.js with Clojure script and use tools.cli to parse the arguments! The README file in GitHub is very helpful and it can be updated to the latest version. Therefore, I recommend that you have a look to understand all the possibilities of this awesome library. We have everything we need in order to build our app, let's write it now. Getting to know the check-css Project This project has the following three main namespaces: Core Browser Page Let's check the responsibilities of each of these namespaces. The check-css.browser namespace (ns check-css.browser   (:require [clj-webdriver.taxi :as taxi]             [clojure.string :as s]             [clojure.java.io :as io]))   (defn exec-site-fn [urls f & {:keys [driver]                               :or {driver {:browser :chrome}}}]   (taxi/with-driver driver     (doseq [url urls]       (taxi/to url)       (f url)))) This is very simple code, it includes the function exec-site-fn that receives a list of urls and the optional configuration of your driver. If you don't specify a driver, it will be Chrome by default. Taxi includes a macro with-driver, which allows you to execute a procedure with a single browser in a sequential manner. We get the following benefits from this: We need the resources (memory, cpu) for one browser We don't need to coordinate parallel execution We don't need to think about closing the resources correctly (in this case, the browser) So this function just executes something for some urls using a single browser, we can think of it as just a helper function. The check-css.page namespace (ns check-css.page   (:require [clj-webdriver.taxi :as taxi]             [clojure.string :as s]             [clojure.java.io :as io]))   (defn execute-script-fn [base-path js-src selector url]   (let [path-url (s/replace url #"/" "_")         path (str base-path "/" path-url ".png")]     (taxi/execute-script (s/replace js-src #"#selector#  " selector))     (println (str "Checking site " url))     (taxi/take-screenshot :file path))) This is again a helper function, which does two things: It executes some JavaScript that you pass along (changing references to #selector to the passed selector). It takes a screenshot. How can we use this to our advantage? It is very easy to use JavaScript to mark the elements you are interested in. You can use this script as shown: var els = document.querySelectorAll('#selector#); for (var i = 0; i < els.length; i++) {     els[i].style.border = '2px solid red'; } Therefore, we just need to use everything together for this to work and we'll see that in the check-css.core namespace. 
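Before moving on to the core namespace, it is worth noting that exec-site-fn can be exercised on its own. The following is a hedged usage sketch; the URLs, the check-css.playground namespace, and the anonymous callback are placeholders invented for illustration and are not part of the project:

(ns check-css.playground
  (:require [check-css.browser :as b]))

;; Visit two pages with a single Firefox instance and run a trivial
;; callback for each URL; the callback here only prints the URL.
(b/exec-site-fn ["http://example.com" "http://example.org"]
                (fn [url] (println "Visited" url))
                :driver {:browser :firefox})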
The check-css.core namespace (ns check-css.core …) (def cli-options   [["-s" "--selector SELECTOR" "CSS Selector"]    ["-p" "--path PATH" "The base folder for images"]    ["-b" "--browser BROWSER" "Browser"     :default :chrome     :parse-fn keyword]    ["-h" "--help"]]) (defn find-css-usages [browser selector output-path urls]   (let [js-src (-> (io/resource "script.js") slurp)         apply-script-fn (partial p/execute-script-fn                                  output-path                                  js-src                                  selector)]     (doseq [url urls]       (b/exec-site-fn urls apply-script-fn                       :driver {:browser browser})))) (defn -main [& args]   (let [{:keys [options arguments summary]} (parse-opts args cli-options)         {:keys [browser selector path help]} options         urls arguments]     (if-not help       (find-css-usages browser selector path urls)       (exit 0 summary)))) This code looks very simple; here we can see the usage of tools.cli and the function that takes everything together, find-css-usages. This function: It reads the JavaScript file from the classpath It creates a function that only receives the url, so it is compatible with the function f in exec-site-fn. This is all that is needed to execute our program. Now we can do the following from the command line: # lein uberjar # java -jar target/uberjar/check-css-0.1.0-SNAPSHOT-standalone.jar -p . -s "input" -b chrome http://www.google.com http://www.facebook.com It creates a couple of screenshots of Google and Facebook, pointing out the elements that are inputs. Granted, we can do something more interesting with our app, but for now, let's focus on the code. There are a couple of things we want to do to this code. The first thing is that we want to have some sort of statistical record of how many elements were found, not just the screenshots. The second important thing has to do with an opportunity to learn about core.async and what's coming up next in the Clojure world. Core.async Core.async is yet another way of programming concurrently, it uses the idea of lightweight threads and channels it to communicate between them. Why lightweight threads? The lightweight threads are used in languages like go and erlang. They pride in being able to run thousands of threads in a single process. What is the difference between the lightweight threads and traditional threads? The traditional threads need to reserve memory and this also takes some time. If you want to create a couple thousand threads, you will be using a noticeable amount of memory for each thread and asking the kernel to do that also takes time. What difference do lightweight threads make? To have a couple hundred lightweight threads you only need to create a couple of threads, there is no need to reserve memory. The lightweight threads are merely a software idea. This can be achieved with most languages and Clojure adds first class support (without changing the language, this is part of the lisp power) using core.async! Let's have a look of how it works. There are two concepts that you need to keep in mind: Goblocks: They are the lightweight threads. Channels: The channels are a way to communicate between various goblocks, you can think of them as queues. Goblocks can publish a message to the channel and other goblocks can take a message from them. 
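To make the cli-options vector a little more concrete, here is a hedged sketch of what tools.cli's parse-opts returns for a typical invocation. The returned map contains :options, :arguments, :summary, and :errors; the :summary string shown below is abbreviated and only indicative:

(require '[clojure.tools.cli :refer [parse-opts]])

(parse-opts ["-s" "input" "-p" "." "http://www.google.com"] cli-options)
;; => {:options   {:selector "input", :path ".", :browser :chrome}
;;     :arguments ["http://www.google.com"]
;;     :summary   "  -s, --selector SELECTOR  CSS Selector ..."
;;     :errors    nil}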
Just as there are integration patterns for queues, there are integration patterns for channels, where you will find concepts similar to broadcasting, filtering, and mapping. Now, let's play a little with each of them so you can understand how to use them in our program.

Goblocks

You will find goblocks in the clojure.core.async namespace. Goblocks are extremely easy to use: you need the go macro and you will do something similar to this:

(ns test
  (:require [clojure.core.async :refer [go]]))

(go
  (println "Running in a goblock!"))

They are similar to threads; you just need to remember that you can create goblocks freely. There can be thousands of running goblocks in a single JVM.

Channels

You can actually use anything you like to communicate between goblocks, but it is recommended that you use channels. Channels have two main operations, namely putting and getting. Let's see how to do it:

(ns test
  (:require [clojure.core.async :refer [go chan >! <!]]))

(let [c (chan)]
  (go (println (str "The data in the channel is " (<! c))))
  (go (>! c 6)))

That's it! It looks pretty simple. As you can see, there are three main functions that we are using with channels:

chan: This function creates a channel; channels can store messages in a buffer, and if you want that functionality you should just pass the size of the buffer to the chan function. If no size is specified, the channel is unbuffered, so a put will not complete until another goblock takes the value.
>!: The put function must be used within a goblock; it receives a channel and the value you want to publish to it. If the channel's buffer is already full, it will park until something is consumed from the channel.
<!: The take function must be used within a goblock; it receives the channel you are taking from. If nothing has been published to the channel yet, it will park until there's data available.

There are lots of other functions that you can use with channels; for now, let's add two related functions that you will probably use soon:

>!!: The blocking put; it works exactly the same as the put function, except it can be used from anywhere. Remember, if a channel cannot take more data, this function will block the entire thread from which it runs.
<!!: The blocking take; it works exactly like the take function, except you can use it from anywhere, not just inside goblocks. Just keep in mind that this blocks the thread where it runs until there's data available.

If you look into the core.async API docs (http://clojure.github.io/core.async/) you will find a fair amount of functions. Some of them give you functionality similar to what you get with queues; let's look at the broadcast function:

(ns test
  (:require [clojure.core.async.lab :refer [broadcast]]
            [clojure.core.async :refer [chan <! >!! go-loop]]))

(let [c1 (chan 5)
      c2 (chan 5)
      bc (broadcast c1 c2)]
  (go-loop []
    (println "Getting from the first channel" (<! c1))
    (recur))
  (go-loop []
    (println "Getting from the second channel" (<! c2))
    (recur))
  (>!! bc 5)
  (>!! bc 9))

With this you can now publish to several channels at the same time, which is helpful for subscribing multiple processes to a single source of events, with a great amount of separation of concerns. If you take a good look, you will also find familiar functions there: map, filter, and reduce. Depending on the version of core.async, some of these functions may not be there anymore. Why are these functions there?
Those functions are for modifying collections of data, right? The reason is that there has been a good amount of effort towards using channels as higher-level abstractions. The idea is to see channels as collections of events; if you think of them that way, it's easy to see that you can create a new channel by mapping every element of an old channel, or you can create a new channel by filtering away some elements. In recent versions of Clojure, the abstraction has become even more noticeable with transducers.

Transducers

Transducers are a way to separate computations from their input source; simply put, they are a way to apply a sequence of steps to a sequence or a channel. Let's look at an example for a sequence:

(let [odd-counts (comp (map count)
                       (filter odd?))
      vs [[1 2 3 4 5 6]
          [:a :c :d :e]
          [:test]]]
  (sequence odd-counts vs))

comp feels similar to the threading macros: it composes functions and stores the steps of the computation. The interesting part is that we can use this same odd-counts transformation with a channel, as shown here (note that the values are put onto the same channel that carries the transducer):

(let [odd-counts (comp (map count)
                       (filter odd?))
      output (chan 5 odd-counts)]
  (go-loop []
    (let [x (<! output)]
      (println x))
    (recur))
  (>!! output [1 2 3 4 5 6])
  (>!! output [:a :c :d :e])
  (>!! output [:test]))

This is quite interesting, and now you can use this to understand how to improve the code of the check-css program. The main thing we'll gain (besides learning core.async and how to use transducers) is visibility and separation of concerns; if we have channels publishing events, then it becomes extremely simple to add new subscribers without changing anything.

Summary

In this article, we learned how to build a command-line tool, the requirements we needed to build it, and how we can use it in different projects.

Resources for Article:

Further resources on this subject:
Big Data [article]
Implementing a Reusable Mini-firewall Using Transducers [article]
Developing a JavaFX Application for iOS [article]

Raster Calculations

Packt
06 Jan 2016
8 min read
In this article by Alexander Bruy, author of the book QGIS 2 Cookbook, we will see some of the most common operations related to Digital Elevation Models (DEM).

(For more resources related to this topic, see here.)

Calculating a hillshade layer

A hillshade layer is commonly used to enhance the appearance of a map, and it can be computed from a DEM. This recipe shows how to compute it.

Getting ready

Open the dem_to_prepare.tif layer. This layer contains a DEM in the EPSG:4326 CRS and elevation data in feet. These characteristics are unsuitable for running most terrain analysis algorithms, so we will modify this layer to get one that is suitable.

How to do it...

In the Processing toolbox, find the Hillshade algorithm and double-click on it to open it, as shown in the following screenshot:
Select the DEM in the Input layer field. Leave the rest of the parameters with their default values.
Click on Run to run the algorithm. The hillshade layer will be added to the QGIS project, as shown in the following screenshot:

How it works...

As in the case of the slope, the algorithm is part of the GDAL library. You will see that the parameters are quite similar to the slope case. This is because the slope is used to compute the hillshade layer. Based on the slope and the aspect of the terrain in each cell, and using the position of the sun defined by the Azimuth and Altitude fields, the algorithm computes the illumination that the cell will receive. You can try changing the values of these parameters to alter the appearance of the layer.

There's more...

As in the case of slope, there are alternative options to compute the hillshade. The SAGA one in the Processing toolbox has a feature that is worth mentioning. The SAGA hillshade algorithm contains a field named method. This field is used to select the method used to compute the hillshade values. The last method available, Raytracing, differs from the other ones in that it models the real behavior of light, performing an analysis that is not local but instead uses the full information of the DEM. This renders more precise hillshade layers, but the processing time can be notably longer.

Enhancing your map view with a hillshade layer

You can combine the hillshade layer with your other layers to enhance their appearance. Since you have used a DEM to compute the hillshade layer, it should already be in your QGIS project along with the hillshade itself. However, it will be covered by the hillshade, since newly produced layers are added on top. Move it to the top of the layer list, so you can see the DEM (and not the hillshade layer), and style it to something like the following screenshot:

In the Properties dialog of the layer, move to the Transparency section and set the Global transparency value to 50%, as shown in the following screenshot:

Now you should see the hillshade layer through the DEM, and the combination of both of them will look like the following screenshot:

Analyzing hydrology

A common analysis from a DEM is to compute hydrological elements, such as the channel network or the set of watersheds. This recipe shows the steps to follow to do it.

Getting ready

Open the DEM that we prepared in the previous recipe.

How to do it...

In the Processing toolbox, find the Fill sinks algorithm and double-click on it to open it:
Select the DEM in the DEM field and run the algorithm. It will generate a new filtered DEM layer. From now on, we will use this DEM in the recipe, and not the original one.
Open the Catchment Area algorithm and select the filtered DEM in the Elevation field:
Run the algorithm. It will generate a Catchment Area layer.
Open the Channel network algorithm and fill it in, as shown in the following screenshot:
Run the algorithm. It will extract the channel network from the DEM, based on the catchment area, and generate it as both a raster and a vector layer.
Open the Watershed basins algorithm and fill it in, as shown in the following screenshot:
Run the algorithm. It will generate a raster layer with the watersheds calculated from the DEM and the channel network. Each watershed is a hydrological unit that represents the area that flows into a junction defined by the channel network:

How it works...

Starting from the DEM, the steps described previously follow a typical workflow for hydrological analysis:

First, the sinks are removed. This is a required preparation whenever you plan to do hydrological analysis. The DEM might contain sinks, where a flow direction cannot be computed, which is a problem when modeling the movement of water across those cells. Removing them solves this problem.
The catchment area is computed from the DEM. The values in the catchment area layer represent the area upstream of each cell, that is, the total area in which, if water is dropped, it will eventually reach the cell. Cells with high values of catchment area will likely contain a river, whereas cells with lower values will have overland flow.
By setting a threshold on the catchment area values, we can separate the river cells (those above the threshold) from the remaining ones and extract the channel network.
Finally, we compute the watersheds associated with each junction in the channel network extracted in the last step.

There's more...

The key parameter in the preceding workflow is the catchment area threshold. If a larger threshold is used, fewer cells will be considered as river cells, and the resulting channel network will be sparser. Because the watersheds are computed based on the channel network, this will also result in a lower number of watersheds. You can experiment with different values of the catchment area threshold. Here, you can see the results for thresholds equal to 1,000,000 and 50,000,000.

The following screenshot shows the result for a threshold equal to 1,000,000:

The following screenshot shows the result for a threshold equal to 50,000,000:

Note that in the latter case, with a higher threshold value, there is only one single watershed in the resulting layer. The threshold values are expressed in the units of the catchment area, which, because the cell size is assumed to be in meters, are square meters.

Calculating a topographic index

Because the topography defines and influences most of the processes that take place in a given terrain, the DEM can be used to extract many different parameters that give us information about those processes. This recipe shows how to calculate a popular one named the Topographic Wetness Index, which estimates soil wetness based on the topography.

Getting ready

Open the DEM that we prepared in the Calculating a hillshade layer recipe.

How to do it...

Calculate a slope layer using the Slope, Aspect, and Curvature algorithm from the Processing toolbox.
Calculate a catchment area layer using the Catchment Area algorithm from the Processing toolbox. Note that you must use a sinkless DEM, such as the one that we generated in the previous recipe with the Fill sinks algorithm.
Open the Topographic Wetness Index algorithm from the Processing toolbox and fill it in, as shown in the following screenshot:
Run the algorithm. It will create a layer with the Topographic Wetness Index, indicating the soil wetness in each cell:

How it works...

The index combines slope and catchment area, two parameters that influence soil wetness. If the catchment area value is high, it means more water will flow into the cell, thus increasing its soil wetness. A low value of slope will have a similar effect, because the water that flows into the cell will not flow out of it quickly. The algorithm expects the slope to be expressed in radians. That's the reason why the Slope, Aspect, and Curvature algorithm has to be used: it produces its slope output in radians. The Slope algorithm that you will also find, which is based on the GDAL library, creates a slope layer with values expressed in degrees. You can use that layer if you convert its units by using the raster calculator.

There's more...

Other indices based on the same input layers can be found in different algorithms in the Processing toolbox. The Stream Power Index and the LS factor algorithms use the slope and catchment area as inputs as well and can be related to potential erosion.

Summary

In this article, we saw how the hillshade layer and a topographic index work, along with their calculation techniques. We also saw how to analyze hydrology.

Resources for Article:

Further resources on this subject:
Identifying the Best Places [article]
Style Management in QGIS [article]
Geolocating photos on the map [article]

Forensics Recovery

Packt
05 Jan 2016
6 min read
In this article by Bhanu Birani and Mayank Birani, the authors of the book iOS Forensics Cookbook, we discuss forensics recovery and why it is important: in some investigation cases, there is a need to decrypt information from iOS devices, which usually store their data in an encrypted form. In this article, we will focus on various tools and scripts that can be used to read the data from the devices under investigation. We are going to cover the following topics:

DFU and Recovery mode
Extracting iTunes backup

(For more resources related to this topic, see here.)

DFU and Recovery Mode

In this section, we'll cover both the DFU mode and the Recovery mode separately.

DFU mode

In this section, we will see how to launch the DFU mode, but before that, let's see what DFU means. DFU stands for Device Firmware Upgrade, which means this mode is used specifically during iOS upgrades. This is a mode in which the device can be connected to iTunes without loading the iBoot boot loader. Your device screen will be completely black in DFU mode because neither the boot loader nor the operating system is loaded. DFU bypasses iBoot so that you can downgrade your device.

How to do it...

We need to follow these steps in order to launch a device in DFU mode:

Turn off your device.
Connect your device to the computer.
Press your Home button and the Power button together for 10 seconds.
Now, release the Power button and keep holding the Home button till your computer detects the device that is connected.
After some time, iTunes should detect your device. Make sure that your phone does not show any Restore logo on the device; if it does, then you are in Recovery mode, not in DFU.
Once your DFU operations are done, you can hold the Power and Home buttons till you see the Apple logo in order to return the device to normal functioning.

This is the easiest way to recover a device from a faulty backup file.

Recovery mode

In this section, you will learn about the Recovery mode of our iOS devices. To dive deep into the Recovery mode, we first need to understand a few basics, such as which boot loader is used by iOS devices, how the boot takes place, and so on. We will explore all such concepts in order to simplify the understanding of the Recovery mode. All iOS devices use the iBoot boot loader in order to load the operating system. The iBoot state, which is used for recovery and restore purposes, is called Recovery mode. iOS cannot be downgraded in this state as iBoot is loaded. iBoot also prevents any custom firmware from being flashed onto the device unless it has been jailbroken, that is, "pwned".

How to do it...

The following are the detailed steps to launch the Recovery mode on any iOS device:

You need to turn off your iOS device in order to launch the Recovery mode.
Disconnect all the cables from the device and remove it from the dock if it is connected.
Now, while holding the Home button, connect your iOS device to the computer using the cable.
Hold the Home button till you see the Connect to iTunes screen. Once you see the screen, you have entered the Recovery mode.
Now you will receive a popup on your Mac saying "iTunes has detected your iDevice in recovery mode". Now you can use iTunes to restore the device in the Recovery mode. Make sure your data is backed up, because the recovery will restore the device to Factory Settings. You can later restore from the backup as well.
Once your Recovery mode operations are complete, you will need to escape from the Recovery mode.
To escape, just press the Power button and the Home button concurrently for 10-12 seconds.

Extracting iTunes backup

Extracting the logical information from the iTunes backup is crucial for a forensics investigation. There is a full stack of tools available for extracting data from the iTunes backup. They come in a wide variety, ranging from open source to paid tools. Some of these forensic tools are Oxygen Forensics Suite, Access Data MPE+, EnCase, iBackup Bot, DiskAid, and so on. The well-known open source tools are iPhone Backup Analyzer and iPhone Analyzer. In this section, we are going to learn how to use the iPhone backup extractor tool.

How to do it...

The iPhone backup extractor is an open source forensic tool that can extract information from device backups. However, there is one constraint: the backup should have been created with iTunes 10 or later. Follow these steps to extract data from an iTunes backup:

Download the iPhone backup extractor from http://supercrazyawesome.com/.
Make sure that all your iTunes backups are located in this directory: ~/Library/Application Support/MobileSync/Backup. In case you don't have the required backup at this location, you can also copy and paste it there.
The application will show a prompt after it is launched. The prompt should look similar to the following screenshot:
Now tap on the Read Backups button to read the backups available at ~/Library/Application Support/MobileSync/Backup.
Now, you can choose any option as shown here:
This tool also allows you to extract data for an individual application and enables you to read the iOS file system backup.
Now, you can select the file you want to extract. Once the file is selected, click on Extract. You will get a popup asking for the destination directory. This complete process should look similar to the following screenshot:

There are various other tools similar to this; iPhone Backup Browser is one of them, where you can view the decrypted data stored in your backup files. This tool supports only the Windows operating system as of now. You can download this software from http://code.google.com/p/iphonebackupbrowser/.

Summary

In this article, we covered how to launch the DFU and the Recovery modes. We also learned to extract the logical information from the iTunes backup using the iPhone backup extractor tool.

Resources for Article:

Further resources on this subject:
Signing up to be an iOS developer [article]
Exploring Swift [article]
Introduction to GameMaker: Studio [article]

NSX Core Components

Packt
05 Jan 2016
16 min read
In this article by Ranjit Singh Thakurratan, the author of the book, Learning VMware NSX, we have discussed some of the core components of NSX. The article begins with a brief introduction of the NSX core components followed by a detailed discussion of these core components. We will go over three different control planes and see how each of the NSX core components fit in this architecture. Next, we will cover the VXLAN architecture and the transport zones that allow us to create and extend overlay networks across multiple clusters. We will also look at NSX Edge and the distributed firewall in greater detail and take a look at the newest NSX feature of multi-vCenter or cross-vCenterNSX deployment. By the end of this article, you will have a thorough understanding of the NSX core components and also their functional inter-dependencies. In this article, we will cover the following topics: An introduction to the NSX core components NSX Manager NSX Controller clusters VXLAN architecture overview Transport zones NSX Edge Distributed firewall Cross-vCenterNSX (For more resources related to this topic, see here.) An introduction to the NSX core components The foundational core components of NSX are divided across three different planes. The core components of a NSX deployment consist of a NSX Manager, Controller clusters, and hypervisor kernel modules. Each of these are crucial for your NSX deployment; however, they are decoupled to a certain extent to allow resiliency during the failure of multiple components. For example if your controller clusters fail, your virtual machines will still be able to communicate with each other without any network disruption. You have to ensure that the NSX components are always deployed in a clustered environment so that they are protected by vSphere HA. The high-level architecture of NSX primarily describes three different planes wherein each of the core components fit in. They are the Management plane, the Control plane, and the Data plane. The following figure represents how the three planes are interlinked with each other. The management plane is how an end user interacts with NSX as a centralized access point, while the data plane consists of north-south or east-west traffic. Let's look at some of the important components in the preceding figure: Management plane: The management plane primarily consists of NSX Manager. NSX Manager is a centralized network management component and primarily allows a single management point. It also provides the REST API that a user can use to perform all the NSX functions and actions. During the deployment phase, the management plane is established when the NSX appliance is deployed and configured. This management plane directly interacts with the control plane and also with the data plane. The NSX Manager is then managed via the vSphere web client and CLI. The NSX Manager is configured to interact with vSphere and ESXi, and once configured, all of the NSX components are then configured and managed via the vSphere web GUI. Control plane: The control plane consists of the NSX Controller that manages the state of virtual networks. NSX Controllers also enable overlay networks (VXLAN) that are multicast-free and make it easier to create new VXLAN networks without having to enable multicast functionality on physical switches. The controllers also keep track of all the information about the virtual machines, hosts, and VXLAN networks and can perform ARP suppression as well. 
No data passes through the control plane, and a loss of controllers does not affect network functionality between virtual machines. Overlay networks and VXLANs can be used interchangeably; they both represent L2 over L3 virtual networks.
Data plane: The NSX data plane primarily consists of the NSX logical switch. The NSX logical switch is a part of the vSphere distributed switch and is created when a VXLAN network is created. The logical switch and other NSX services, such as logical routing and the logical firewall, are enabled at the hypervisor kernel level after the installation of the hypervisor kernel modules (VIBs). This logical switch is the key to enabling overlay networks that are able to encapsulate and send traffic over existing physical networks. It also allows gateway devices that provide L2 bridging between virtual and physical workloads. The data plane receives its updates from the control plane, as hypervisors maintain local virtual machine and VXLAN (logical switch) mapping tables as well. A loss of the data plane will cause a loss of the overlay (VXLAN) network, as virtual machines that are part of an NSX logical switch will not be able to send and receive data.

NSX Manager

NSX Manager, once deployed and configured, can deploy Controller cluster appliances and prepare the ESXi hosts, which involves installing various vSphere installation bundles (VIBs) that enable network virtualization features such as VXLAN, logical switching, the logical firewall, and logical routing. NSX Manager can also deploy and configure Edge gateway appliances and their services. The NSX version as of this writing is 6.2, which only supports 1:1 vCenter connectivity. NSX Manager is deployed as a single virtual machine and relies on VMware's HA functionality to ensure its availability. There is no NSX Manager clustering available as of this writing. It is important to note that a loss of NSX Manager will lead to a loss of management and API access, but does not disrupt virtual machine connectivity. Finally, the NSX Manager's configuration UI allows an administrator to collect log bundles and also to back up the NSX configuration.

NSX Controller clusters

NSX Controller provides control plane functionality to distribute logical routing and VXLAN network information to the underlying hypervisors. Controllers are deployed as virtual appliances, and they should be deployed in the same vCenter to which NSX Manager is connected. In a production environment, it is recommended to deploy a minimum of three controllers. For better availability and scalability, we need to ensure that DRS anti-affinity rules are configured to deploy the Controllers on separate ESXi hosts. Traffic between the control plane and the management and data planes is secured by certificate-based authentication. It is important to note that controller nodes employ a scale-out mechanism, where each controller node uses a slicing mechanism that divides the workload equally across all the nodes. This renders all the controller nodes Active at all times. If one controller node fails, then the other nodes are reassigned the tasks that were owned by the failed node to ensure operational status. The VMware NSX Controller uses a Paxos-based algorithm within the NSX Controller cluster. The Controller removes the dependency on multicast routing/PIM in the physical network. It also suppresses broadcast traffic in VXLAN networks. NSX version 6.2 only supports three controller nodes.

VXLAN architecture overview

One of the most important functions of NSX is enabling virtual networks.
These virtual networks or overlay networks have become very popular due to the fact that they can leverage existing network infrastructure without the need to modify it in any way. The decoupling of logical networks from the physical infrastructure allows users to scale rapidly. Overlay networks, or VXLAN, were developed by a host of vendors that includes Arista, Cisco, Citrix, Red Hat, and Broadcom. Due to this joint effort in developing its architecture, the VXLAN standard can be implemented by multiple vendors.

VXLAN is a layer 2 over layer 3 tunneling protocol that allows logical network segments to extend over routable networks. This is achieved by encapsulating the Ethernet frame with additional UDP, IP, and VXLAN headers. Consequently, this increases the size of the packet by 50 bytes. Hence, VMware recommends increasing the MTU size to a minimum of 1600 bytes for all the interfaces in the physical infrastructure and any associated vSwitches.

When a virtual machine generates traffic meant for another virtual machine on the same virtual network, the hosts on which these source and destination virtual machines run are called VXLAN Tunnel End Points (VTEPs). VTEPs are configured as separate VMkernel interfaces on the hosts. The outer IP header block in the VXLAN frame contains the source and the destination IP addresses, which identify the source and the destination hypervisors. When a packet leaves the source virtual machine, it is encapsulated at the source hypervisor and sent to the target hypervisor. The target hypervisor, upon receiving this packet, decapsulates the Ethernet frame and forwards it to the destination virtual machine. Once the ESXi host is prepared from NSX Manager, we need to configure the VTEPs. NSX supports multiple VXLAN vmknics per host for uplink load balancing features. In addition to this, Guest VLAN tagging is also supported.

A sample packet flow

We face a challenging situation when a virtual machine generates Broadcast, Unknown Unicast, or Multicast (BUM) traffic meant for another virtual machine on the same virtual network (VNI) on a different host. Control plane modes play a crucial role in optimizing the VXLAN traffic, depending on the mode selected for the Logical Switch/Transport Scope:

Unicast
Hybrid
Multicast

By default, a Logical Switch inherits its replication mode from the transport zone. However, we can set this on a per-Logical-Switch basis. A Segment ID is needed for the Multicast and Hybrid modes. The following is a representation of the VXLAN-encapsulated packet showing the VXLAN headers:

As indicated in the preceding figure, the outer IP header identifies the source and the destination VTEPs. The VXLAN header also has the Virtual Network Identifier (VNI), which is a 24-bit unique network identifier. This allows the scaling of virtual networks beyond the 4094 VLAN limitation placed by the physical switches. Two virtual machines that are a part of the same virtual network will have the same virtual network identifier, similar to how two machines on the same VLAN share the same VLAN ID.

Transport zones

A group of ESXi hosts that are able to communicate with one another over the physical network by means of VTEPs are said to be in the same transport zone. A transport zone defines the extension of a logical switch across multiple ESXi clusters that span multiple virtual distributed switches. A typical environment has more than one virtual distributed switch that spans multiple hosts.
A transport zone enables a logical switch to extend across multiple virtual distributed switches, and any ESXi host that is a part of this transport zone can have virtual machines as a part of that logical network. A logical switch is always created as part of a transport zone and ESXi hosts can participate in them. The following is a figure that shows a transport zone that defines the extension of a logical switch across multiple virtual distributed switches: NSX Edge Services Gateway The NSX Edge Services Gateway (ESG) offers a feature rich set of services that include NAT, routing, firewall, load balancing, L2/L3 VPN, and DHCP/DNS relay. NSX API allows each of these services to be deployed, configured, and consumed on-demand. The ESG is deployed as a virtual machine from NSX Manager that is accessed using the vSphere web client. Four different form factors are offered for differently-sized environments. It is important that you factor in enough resources for the appropriate ESG when building your environment. The ESG can be deployed in different sizes. The following are the available size options for an ESG appliance: X-Large: The X-large form factor is suitable for high performance firewall, load balancer, and routing or a combination of multiple services. When an X-large form factor is selected, the ESG will be deployed with six vCPUs and 8GB of RAM. Quad-Large: The Quad-large form factor is ideal for a high performance firewall. It will be deployed with four vCPUs and 1GB of RAM. Large: The large form factor is suitable for medium performance routing and firewall. It is recommended that, in production, you start with the large form factor. The large ESG is deployed with two vCPUs and 1GB of RAM. Compact: The compact form factor is suitable for DHCP and DNS replay functions. It is deployed with one vCPU and 512MB of RAM. Once deployed, a form factor can be upgraded by using the API or the UI. The upgrade action will incur an outage. Edge gateway services can also be deployed in an Active/Standby mode to ensure high availability and resiliency. A heartbeat network between the Edge appliances ensures state replication and uptime. If the active gateway goes down and the "declared dead time" passes, the standby Edge appliance takes over. The default declared dead time is 15 seconds and can be reduced to 6 seconds. Let's look at some of the Edge services as follows: Network Address Translation: The NSX Edge supports both source and destination NAT and NAT is allowed for all traffic flowing through the Edge appliance. If the Edge appliance supports more than 100 virtual machines, it is recommended that a Quad instance be deployed to allow high performance translation. Routing: The NSX Edge allows centralized routing that allows the logical networks deployed in the NSX domain to be routed to the external physical network. The Edge supports multiple routing protocols including OSPF, iBGP, and eBGP. The Edge also supports static routing. Load balancing: The NSX Edge also offers a load balancing functionality that allows the load balancing of traffic between the virtual machines. The load balancer supports different balancing mechanisms including IP Hash, least connections, URI-based, and round robin. Firewall: NSX Edge provides a stateful firewall functionality that is ideal for north-south traffic flowing between the physical and the virtual workloads behind the Edge gateway. 
The Edge firewall can be deployed alongside the hypervisor kernel-based distributed firewall that is primarily used to enforce security policies between workloads in the same logical network. L2/L3VPN: The Edge also provides L2 and L3 VPNs that makes it possible to extend L2 domains between two sites. An IPSEC site-to-site connectivity between two NSX Edges or other VPN termination devices can also be set up. DHCP/DNS relay: NSX Edge also offers DHCP and DNS relay functions that allows you to offload these services to the Edge gateway. Edge only supports DNS relay functionality and can forward any DNS requests to the DNS server. The Edge gateway can be configured as a DHCP server to provide and manage IP addresses, default gateway, DNS servers and, search domain information for workloads connected to the logical networks. Distributed firewall NSX provides L2-L4stateful firewall services by means of a distributed firewall that runs in the ESXi hypervisor kernel. Because the firewall is a function of the ESXi kernel, it provides massive throughput and performs at a near line rate. When the ESXi host is initially prepared by NSX, the distributed firewall service is installed in the kernel by deploying the kernel VIB – VMware Internetworking Service insertion platform or VSIP. VSIP is responsible for monitoring and enforcing security policies on all the traffic flowing through the data plane. The distributed firewall (DFW) throughput and performance scales horizontally as more ESXi hosts are added. DFW instances are associated to each vNIC, and every vNIC requires one DFW instance. A virtual machine with 2 vNICs has two DFW instances associated with it, each monitoring its own vNIC and applying security policies to it. DFW is ideally deployed to protect virtual-to-virtual or virtual-to-physical traffic. This makes DFW very effective in protecting east-west traffic between workloads that are a part of the same logical network. DFW policies can also be used to restrict traffic between virtual machines and external networks because it is applied at the vNIC of the virtual machine. Any virtual machine that does not require firewall protection can be added to the exclusion list. A diagrammatic representation is shown as follows: DFW fully supports vMotion and the rules applied to a virtual machine always follow the virtual machine. This means any manual or automated vMotion triggered by DRS does not cause any disruption in its protection status. The VSIP kernel module also adds spoofguard and traffic redirection functionalities as well. The spoofguard function maintains a VM name and IP address mapping table and prevents against IP spoofing. Spoofguard is disabled by default and needs to be manually enabled per logical switch or virtual distributed switch port group. Traffic redirection allows traffic to be redirected to a third-party appliance that can do enhanced monitoring, if needed. This allows third-party vendors to be interfaced with DFW directly and offer custom services as needed. Cross-vCenterNSX With NSX 6.2, VMware introduced an interesting feature that allows you to manage multiple vCenterNSX environments using a primary NSX Manager. This allows to have easy management and also enables lots of new functionalities including extending networks and other features such as distributed logical routing. Cross-vCenterNSX deployment also allows centralized management and eases disaster recovery architectures. 
In a cross-vCenter deployment, multiple vCenters are all paired with their own NSX Manager per vCenter. One NSX Manager is assigned as the primary while other NSX Managers become secondary. This primary NSX Manager can now deploy a universal controller cluster that provides the control plane. Unlike a standalone vCenter-NSX deployment, secondary NSX Managers do not deploy their own controller clusters. The primary NSX Manager also creates objects whose scope is universal. This means that these objects extend to all the secondary NSX Managers. These universal objects are synchronized across all the secondary NSX Managers and can be edited and changed by the primary NSX Manager only. This does not prevent you from creating local objects on each of the NSX Managers. Similar to local NSX objects, a primary NSX Manager can create global objects such as universal transport zones, universal logical switches, universal distributed routers, universal firewall rules, and universal security objects. There can be only one universal transport zone in a cross-vCenterNSX environment. After it is created, it is synchronized across all the secondary NSX Managers. When a logical switch is created inside a universal transport zone, it becomes a universal logical switch that spans layer 2 network across all the vCenters. All traffic is routed using the universal logical router, and any traffic that needs to be routed between a universal logical switch and a logical switch (local scope) requires an ESG. Summary We began the article with a brief introduction of the NSX core components and looked at the management, control, and the data plane. We then discussed NSX Manager and the NSX Controller clusters. This was followed by a VXLAN architecture overview discussion, where we looked at the VXLAN packet. We then discussed transport zones and NSX Edge gateway services. We ended the article with NSX Distributed firewall services and also an overview of Cross-vCenterNSX deployment. Resources for Article: Further resources on this subject: vRealize Automation and the Deconstruction of Components [article] Monitoring and Troubleshooting Networking [article] Managing Pools for Desktops [article]

Flexible Layouts with Swift and UIStackView

Milton Moura
04 Jan 2016
12 min read
In this post we will build a Sign In and Password Recovery form with a single flexible layout, using Swift and the UIStackView class, which has been available since the release of the iOS 9 SDK. By taking advantage of UIStackView's properties, we will dynamically adapt to the device's orientation and show / hide different form components with animations. The source code for this post can be found in this GitHub repository.

Auto Layout

Auto Layout has become a requirement for any application that wants to adhere to modern best practices of iOS development. When introduced in iOS 6, it was optional and full visual support in Interface Builder just wasn't there. With the release of iOS 8 and the introduction of Size Classes, the tools and the API improved, but you could still dodge and avoid Auto Layout. But now, we are at a point where, in order to fully support all device sizes and split-screen multitasking on the iPad, you must embrace it and design your applications with a flexible UI in mind.

The problem with Auto Layout

Auto Layout basically works as a linear equation solver, taking all of the constraints defined in your views and subviews and calculating the correct sizes and positioning for them. One disadvantage of this approach is that you are obligated to define, typically, between 2 and 6 constraints for each control you add to your view. With different constraint sets for different size classes, the total number of constraints increases considerably, and the complexity of managing them increases as well.

Enter the Stack View

In order to reduce this complexity, the iOS 9 SDK introduced UIStackView, an interface control that serves the single purpose of laying out collections of views. A UIStackView will dynamically adapt its containing views' layout to the device's current orientation, screen size, and other changes in its views. You should keep the following stack view properties in mind:

The views contained in a stack view can be arranged either Vertically or Horizontally, in the order they were added to the arrangedSubviews array.
You can embed stack views within each other, recursively.
The containing views are laid out according to the stack view's distribution and alignment types. These attributes specify how the view collection is laid out across the span of the stack view (distribution) and how to align all subviews within the stack view's container (alignment).
Most properties are animatable, and inserting / deleting / hiding / showing views within an animation block will also be animated.
Even though you can use a stack view within a UIScrollView, don't try to replicate the behaviour of a UITableView or UICollectionView, as you'll soon regret it.

Apple recommends that you use UIStackView for all cases, as it will seriously reduce constraint overhead. Just be sure to judiciously use compression and content hugging priorities to solve possible layout ambiguities.

A Flexible Sign In / Recover Form

The sample application we'll build features a simple Sign In form, with the option for recovering a forgotten password, all in a single screen. When tapping on the "Forgot your password?" button, the form will change, hiding the password text field and showing the new call-to-action buttons and message labels. By canceling the password recovery action, these new controls will be hidden once again and the form will return to its initial state.

1. Creating the form

This is what the form will look like when we're done.
Let's start by creating a new iOS > Single View Application template. Then, we add a new UIStackView to the ViewController and add some constraints for positioning it within its parent view. Since we want a full screen width vertical form, we set its axis to .Vertical, the alignment to .Fill and the distribution to .FillProportionally, so that individual views within the stack view can grow bigger or smaller, according to their content.    class ViewController : UIViewController    {        let formStackView = UIStackView()        ...        override func viewDidLoad() {            super.viewDidLoad()                       // Initialize the top-level form stack view            formStackView.axis = .Vertical            formStackView.alignment = .Fill            formStackView.distribution = .FillProportionally            formStackView.spacing = 8            formStackView.translatesAutoresizingMaskIntoConstraints = false                       view.addSubview(formStackView)                       // Anchor it to the parent view            view.addConstraints(                NSLayoutConstraint.constraintsWithVisualFormat("H:|-20-[formStackView]-20-|", options: [.AlignAllRight,.AlignAllLeft], metrics: nil, views: ["formStackView": formStackView])            )            view.addConstraints(                NSLayoutConstraint.constraintsWithVisualFormat("V:|-20-[formStackView]-8-|", options: [.AlignAllTop,.AlignAllBottom], metrics: nil, views: ["formStackView": formStackView])            )            ...        }        ...    } Next, we'll add all the fields and buttons that make up our form. We'll only present a couple of them here as the rest of the code is boilerplate. In order to refrain UIStackView from growing the height of our inputs and buttons as needed to fill vertical space, we add height constraints to set the maximum value for their vertical size.    class ViewController : UIViewController    {        ...        var passwordField: UITextField!        var signInButton: UIButton!        var signInLabel: UILabel!        var forgotButton: UIButton!        var backToSignIn: UIButton!        var recoverLabel: UILabel!        var recoverButton: UIButton!        ...               override func viewDidLoad() {            ...                       
// Add the email field            let emailField = UITextField()            emailField.translatesAutoresizingMaskIntoConstraints = false            emailField.borderStyle = .RoundedRect            emailField.placeholder = "Email Address"            formStackView.addArrangedSubview(emailField)                       // Make sure we have a height constraint, so it doesn't change according to the stackview auto-layout            emailField.addConstraints(                NSLayoutConstraint.constraintsWithVisualFormat("V:[emailField(<=30)]", options: [.AlignAllTop, .AlignAllBottom], metrics: nil, views: ["emailField": emailField])             )                       // Add the password field            passwordField = UITextField()            passwordField.translatesAutoresizingMaskIntoConstraints = false            passwordField.borderStyle = .RoundedRect            passwordField.placeholder = "Password"            formStackView.addArrangedSubview(passwordField)                       // Make sure we have a height constraint, so it doesn't change according to the stackview auto-layout            passwordField.addConstraints(                 NSLayoutConstraint.constraintsWithVisualFormat("V:[passwordField(<=30)]", options: .AlignAllCenterY, metrics: nil, views: ["passwordField": passwordField])            )            ...        }        ...    } 2. Animating by showing / hiding specific views By taking advantage of the previously mentioned properties of UIStackView, we can transition from the Sign In form to the Password Recovery form by showing and hiding specific field and buttons. We do this by setting the hidden property within a UIView.animateWithDuration block.    class ViewController : UIViewController    {        ...        // Callback target for the Forgot my password button, animates old and new controls in / out        func forgotTapped(sender: AnyObject) {            UIView.animateWithDuration(0.2) { [weak self] () -> Void in                self?.signInButton.hidden = true                self?.signInLabel.hidden = true                self?.forgotButton.hidden = true                self?.passwordField.hidden = true                self?.recoverButton.hidden = false                self?.recoverLabel.hidden = false                self?.backToSignIn.hidden = false            }        }               // Callback target for the Back to Sign In button, animates old and new controls in / out        func backToSignInTapped(sender: AnyObject) {            UIView.animateWithDuration(0.2) { [weak self] () -> Void in                self?.signInButton.hidden = false                self?.signInLabel.hidden = false                self?.forgotButton.hidden = false                self?.passwordField.hidden = false                self?.recoverButton.hidden = true                self?.recoverLabel.hidden = true                self?.backToSignIn.hidden = true            }        }        ...    } 3. Handling different Size Classes Because we have many vertical input fields and buttons, space can become an issue when presenting in a compact vertical size, like the iPhone in landscape. To overcome this, we add a stack view to the header section of the form and change its axis orientation between Vertical and Horizontal, according to the current active size class.    override func viewDidLoad() {        ...        
// Initialize the header stack view, that will change orientation type according to the current size class        headerStackView.axis = .Vertical        headerStackView.alignment = .Fill        headerStackView.distribution = .Fill        headerStackView.spacing = 8        headerStackView.translatesAutoresizingMaskIntoConstraints = false        ...    }       // If we are presenting in a Compact Vertical Size Class, let's change the header stack view axis orientation    override func willTransitionToTraitCollection(newCollection: UITraitCollection, withTransitionCoordinator coordinator: UIViewControllerTransitionCoordinator) {        if newCollection.verticalSizeClass == .Compact {            headerStackView.axis = .Horizontal        } else {            headerStackView.axis = .Vertical        }    } 4. The flexible form layout So, with a couple of UIStackViews, we've built a flexible form only by defining a few height constraints for our input fields and buttons, with all the remaining constraints magically managed by the stack views. Here is the end result: Conclusion We have included in the sample source code a view controller with this same example but designed with Interface Builder. There, you can clearly see that we have less than 10 constraints, on a layout that could easily have up to 40-50 constraints if we had not used UIStackView. Stack Views are here to stay and you should use them now if you are targeting iOS 9 and above. About the author Milton Moura (@mgcm) is a freelance iOS developer based in Portugal. He has worked professionally in several industries, from aviation to telecommunications and energy and is now fully dedicated to creating amazing applications using Apple technologies. With a passion for design and user interaction, he is also very interested in new approaches to software development. You can find out more at http://defaultbreak.com

Assessment Planning

Packt
04 Jan 2016
12 min read
In this article by Kevin Cardwell the author of the book Advanced Penetration Testing for Highly-Secured Environments - Second Edition, discusses the test environment and how we have selected the chosen platform. We will discuss the following: Introduction to advanced penetration testing How to successfully scope your testing (For more resources related to this topic, see here.) Introduction to advanced penetration testing Penetration testing is necessary to determine the true attack footprint of your environment. It may often be confused with vulnerability assessment and thus it is important that the differences should be fully explained to your clients. Vulnerability assessments Vulnerability assessments are necessary for discovering potential vulnerabilities throughout the environment. There are many tools available that automate this process so that even an inexperienced security professional or administrator can effectively determine the security posture of their environment. Depending on scope, additional manual testing may also be required. Full exploitation of systems and services is not generally in scope for a normal vulnerability assessment engagement. Systems are typically enumerated and evaluated for vulnerabilities, and testing can often be done with or without authentication. Most vulnerability management and scanning solutions provide actionable reports that detail mitigation strategies such as applying missing patches, or correcting insecure system configurations. Penetration testing Penetration testing expands upon vulnerability assessment efforts by introducing exploitation into the mix The risk of accidentally causing an unintentional denial of service or other outage is moderately higher when conducting a penetration test than it is when conducting vulnerability assessments. To an extent, this can be mitigated by proper planning, and a solid understanding of the technologies involved during the testing process. Thus, it is important that the penetration tester continually updates and refines the necessary skills. Penetration testing allows the business to understand if the mitigation strategies employed are actually working as expected; it essentially takes the guesswork out of the equation. The penetration tester will be expected to emulate the actions that an attacker would attempt and will be challenged with proving that they were able to compromise the critical systems targeted. The most successful penetration tests result in the penetration tester being able to prove without a doubt that the vulnerabilities that are found will lead to a significant loss of revenue unless properly addressed. Think of the impact that you would have if you could prove to the client that practically anyone in the world has easy access to their most confidential information! Penetration testing requires a higher skill level than is needed for vulnerability analysis. This generally means that the price of a penetration test will be much higher than that of a vulnerability analysis. If you are unable to penetrate the network you will be ensuring your clientele that their systems are secure to the best of your knowledge. If you want to be able to sleep soundly at night, I recommend that you go above and beyond in verifying the security of your clients. Advanced penetration testing Some environments will be more secured than others. 
You will be faced with environments that use: Effective patch management procedures Managed system configuration hardening policies Multi-layered DMZ's Centralized security log management Host-based security controls Network intrusion detection or prevention systems Wireless intrusion detection or prevention systems Web application intrusion detection or prevention systems Effective use of these controls increases the difficulty level of a penetration test significantly. Clients need to have complete confidence that these security mechanisms and procedures are able to protect the integrity, confidentiality, and availability of their systems. They also need to understand that at times the reason an attacker is able to compromise a system is due to configuration errors, or poorly designed IT architecture. Note that there is no such thing as a panacea in security. As penetration testers, it is our duty to look at all angles of the problem and make the client aware of anything that allows an attacker to adversely affect their business. Advanced penetration testing goes above and beyond standard penetration testing by taking advantage of the latest security research and exploitation methods available. The goal should be to prove that sensitive data and systems are protected even from a targeted attack, and if that is not the case, to ensure that the client is provided with the proper instruction on what needs to be changed to make it so. A penetration test is a snapshot of the current security posture. Penetration testing should be performed on a continual basis. Many exploitation methods are poorly documented, frequently hard to use, and require hands-on experience to effectively and efficiently execute. At DefCon 19 Bruce "Grymoire" Barnett provided an excellent presentation on "Deceptive Hacking". In this presentation, he discussed how hackers use many of the very same techniques used by magicians. This is exactly the tenacity that penetration testers must assume as well. Only through dedication, effort, practice, and the willingness to explore unknown areas will penetration testers be able to mimic the targeted attack types that a malicious hacker would attempt in the wild. Often times you will be required to work on these penetration tests as part of a team and will need to know how to use the tools that are available to make this process more endurable and efficient. This is yet another challenge presented to today's pentesters. Working in a silo is just not an option when your scope restricts you to a very limited testing period. In some situations, companies may use non-standard methods of securing their data, which makes your job even more difficult. The complexity of their security systems working in tandem with each other may actually be the weakest link in their security strategy. The likelihood of finding exploitable vulnerabilities is directly proportional to the complexity of the environment being tested. Before testing begins Before we commence with testing, there are requirements that must be taken into consideration. You will need to determine the proper scoping of the test, timeframes and restrictions, the type of testing (Whitebox, Blackbox), and how to deal with third-party equipment and IP space. Determining scope Before you can accurately determine the scope of the test, you will need to gather as much information as possible. It is critical that the following is fully understood prior to starting testing procedures: Who has the authority to authorize testing? 
- What is the purpose of the test?
- What is the proposed timeframe for the testing? Are there any restrictions as to when the testing can be performed?
- Does your customer understand the difference between a vulnerability assessment and a penetration test?
- Will you be conducting this test with, or without, the cooperation of the IT Security Operations Team? Are you testing their effectiveness?
- Is social engineering permitted? How about denial-of-service attacks?
- Are you able to test physical security measures used to secure servers, critical data storage, or anything else that requires physical access? For example, lock picking, impersonating an employee to gain entry into a building, or just generally walking into areas that the average unaffiliated person should not have access to.
- Are you allowed to see the network documentation or to be informed of the network architecture prior to testing to speed things along? (Not necessarily recommended, as this may instill doubt about the value of your findings. Most businesses do not expect this to be easy information to determine on your own.)
- What are the IP ranges that you are allowed to test against? There are laws against scanning and testing systems without proper permissions. Be extremely diligent when ensuring that these devices and ranges actually belong to your client, or you may be in danger of facing legal ramifications. A small scripted check, like the sketch after this list, is a cheap way to enforce this before any packets are sent.
- What are the physical locations of the company? This is more valuable to you as a tester if social engineering is permitted, because it ensures that you are at the sanctioned buildings when testing. If time permits, you should let your clients know if you were able to access any of this information publicly, in case they were under the impression that their locations were secret or difficult to find.
- What to do if there is a problem or if the initial goal of the test has been reached? Will you continue to test to find more entries, or is the testing over? This part is critical and ties into the question of why the customer wants a penetration test in the first place.
- Are there legal implications that you need to be aware of, such as systems that are in different countries, and so on? Not all countries have the same laws when it comes to penetration testing.
- Will additional permission be required once a vulnerability has been exploited? This is important when performing tests on segmented networks. The client may not be aware that you can use internal systems as pivot points to delve deeper within their network.
- How are databases to be handled? Are you allowed to add records, users, and so on?

This listing is not all-inclusive and you may need to add items to the list depending on the requirements of your clients. Much of this data can be gathered directly from the client, but some will have to be handled by your team. If there are legal concerns, it is recommended that you seek legal counsel to ensure you fully understand the implications of your testing. It is better to have too much information than not enough once the time comes to begin testing. In any case, you should always verify for yourself that the information you have been given is accurate. You do not want to find out that the systems you have been accessing do not actually fall under the authority of the client!

It is of utmost importance to gain proper authorization in writing before accessing any of your client's systems. Failure to do so may result in legal action and possibly jail. Use proper judgment!
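To make the IP-range point concrete, here is a minimal Python sketch (not from the book) of the kind of pre-flight check a team can run before any scanning starts: it verifies that every candidate target falls inside the CIDR ranges authorized in the signed rules of engagement. The ranges and hosts shown are made-up placeholders, and only the standard-library ipaddress module is used.

import ipaddress

# Ranges explicitly authorized in the signed rules of engagement (placeholders).
AUTHORIZED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.128/25"),
]

def in_scope(host):
    """Return True only if the host falls inside an authorized range."""
    addr = ipaddress.ip_address(host)
    return any(addr in net for net in AUTHORIZED_RANGES)

if __name__ == "__main__":
    candidates = ["203.0.113.42", "198.51.100.200", "192.0.2.10"]
    for host in candidates:
        status = "in scope" if in_scope(host) else "OUT OF SCOPE - do not touch"
        print(host + ": " + status)

A check like this does not replace written authorization; it simply stops a typo in a target list from turning into an unauthorized scan.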
You should also consider that errors and omissions insurance is a necessity when performing penetration testing.

Setting limits: nothing lasts forever

Setting proper limitations is essential if you want to be successful at performing penetration testing. Your clients need to understand the full ramifications involved, and should be made aware of any residual costs incurred if additional services beyond those listed within the contract are needed. Be sure to set defined start and end dates for your services. Clearly define the rules of engagement, and include the IP ranges, buildings, hours, and so on that may need to be tested. If it is not in your rules of engagement documentation, it should not be tested. Meetings should be predefined prior to the start of testing, and the customer should know exactly what your deliverables will be.

Rules of engagement documentation

Every penetration test will need to start with a rules of engagement document that all involved parties must have. This document should, at a minimum, cover the following items:

- Proper permissions by appropriate personnel.
- Begin and end dates for your testing.
- The type of testing that will be performed.
- Limitations of testing. What type of testing is permitted? DDoS? Full penetration? Social engineering? These questions need to be addressed in detail. Can intrusive tests as well as unobtrusive testing be performed? Does your client expect cleanup to be performed afterwards, or is this a staging environment that will be completely rebuilt after testing has been completed?
- IP ranges and physical locations to be tested.
- How will the report be transmitted at the end of the test? (Use secure means of transmission!)
- Which tools will be used during the test? Do not limit yourself to only one specific tool; it may be beneficial to provide a list of the primary toolset to avoid confusion in the future. For example, we will use the tools found in the most recent edition of the Kali suite.
- How any illegal data that is found during testing will be handled: law enforcement should be contacted prior to the client. Please be sure to fully understand the laws in this regard before conducting your test.
- How sensitive information will be handled: you should not be downloading sensitive customer information; there are other methods of proving that the client's data is not secured. This is especially important when regulated data is a concern.
- Important contact information for both your team and for the key employees of the company you are testing.
- An agreement of what you will do to ensure the customer's system information does not remain on unsecured laptops and desktops used during testing. Will you need to properly scrub your machine after this testing? What do you plan to do with the information you gathered? Is it to be kept somewhere for future testing? Make sure this has been addressed before you start testing, not after.

The rules of engagement should contain all the details that are needed to determine the scope of the assessment. Any questions should have been answered prior to drafting your rules of engagement to ensure there are no misunderstandings once the time comes to test.

Your team members need to keep a copy of this signed document on their person at all times when performing the test. Imagine you have been hired to assess the security posture of a client's wireless network and you are stealthily creeping along the parking lot on private property with your gigantic directional Wi-Fi antenna and a laptop.
If someone witnesses you in this act, they will probably be concerned and call the authorities. You will need to have something on you that documents you have a legitimate reason to be there. This is one time where having the contact information of the business leaders that hired you will come in extremely handy!

Summary

In this article, we focused on all that is necessary to prepare and plan for a successful penetration test. We discussed the differences between penetration testing and vulnerability assessments. The steps involved with proper scoping were detailed, as were the necessary steps to ensure all information has been gathered prior to testing. One thing to remember is that proper scoping and planning is just as important as ensuring you test against the latest and greatest vulnerabilities.

Resources for Article:

Further resources on this subject:
- Penetration Testing [article]
- Penetration Testing and Setup [article]
- BackTrack 4: Security with Penetration Testing Methodology [article]
Remote Sensing and Histogram

Packt
04 Jan 2016
10 min read
In this article by Joel Lawhead, the author of Learning Geospatial Analysis with Python - Second Edition, we will discuss remote sensing. This field grows more exciting every day as more satellites are launched and the distribution of data becomes easier. The high availability of satellite and aerial images, as well as the interesting new types of sensors that are being launched each year, is changing the role that remote sensing plays in understanding our world. In this field, Python is quite capable. In remote sensing, we step through each pixel in an image and perform a type of query or mathematical process. An image can be thought of as a large numerical array, and in remote sensing these arrays can be as large as tens of megabytes to several gigabytes. While Python is fast for an interpreted language, only C-based libraries can provide the speed that is needed to loop through the arrays at a tolerable rate.

(For more resources related to this topic, see here.)

In this article, whenever possible, we'll use the Python Imaging Library (PIL) for image processing and NumPy, which provides multidimensional array mathematics. While written in C for speed, these libraries are designed for Python and provide a Python API. In this article, we'll start with basic image manipulation and build on each exercise all the way to automatic change detection. Here are the topics that we'll cover:

- Swapping image bands
- Creating image histograms

Swapping image bands

Our eyes can only see colors in the visible spectrum as combinations of red, green, and blue (RGB). Airborne and spaceborne sensors can collect wavelengths of energy that are outside the visible spectrum. In order to view this data, we move images representing different wavelengths of light reflectance in and out of the RGB channels in order to make color images. These images often end up as bizarre and alien color combinations that can make visual analysis difficult. An example of a typical satellite image is seen in the following Landsat 7 satellite scene near NASA's Stennis Space Center in Mississippi along the Gulf of Mexico, which is a leading space center for remote sensing and geospatial analysis in general:

Most of the vegetation appears red and water appears almost black. This image is one type of false color image, meaning that the color of the image is not based on RGB light. However, we can change the order of the bands or swap certain bands in order to create another type of false color image that looks more like the world we are used to seeing. In order to do so, you first need to download this image as a ZIP file from the following location: http://git.io/vqs41

We need to install the GDAL library with Python bindings. The Geospatial Data Abstraction Library (GDAL) includes a module called gdalnumeric that loads and saves remotely sensed images to and from NumPy arrays for easy manipulation. GDAL itself is a data access library and does not provide much in the name of processing. Therefore, in this article, we will rely heavily on NumPy to actually change images.

In this example, we'll load the image into a NumPy array using gdalnumeric and then we'll immediately save it in a new .tiff file. However, upon saving, we'll use NumPy's advanced array slicing feature to change the order of the bands. Images in NumPy are multidimensional arrays in the order of band, height, and width. Therefore, an image with three bands will be an array of length three, containing an array for each band, the height, and width of the image.
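If the (band, height, width) layout is unfamiliar, the following tiny NumPy experiment may help. It is purely illustrative (the pixel values are invented and have nothing to do with the Landsat scene), but it demonstrates the same fancy-indexing trick, arr[[1, 0, 2], :], that the listing below uses to reorder bands:

import numpy as np

# A fake 3-band, 2x2 "image": shape is (bands, rows, columns).
arr = np.array([
    [[10, 11], [12, 13]],   # band 0
    [[20, 21], [22, 23]],   # band 1
    [[30, 31], [32, 33]],   # band 2
])
print(arr.shape)            # (3, 2, 2)

# Fancy indexing on the first axis reorders the bands in one step.
swapped = arr[[1, 0, 2], :]
print(swapped[0, 0, 0])     # 20 -- the old band 1 is now first

Because the bands sit on the first axis, reordering them is just a matter of indexing that axis with the band order you want.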
It's important to note that NumPy references the array locations as y,x (row, column) instead of the usual column and row format that we work with in spreadsheets and other software:

from osgeo import gdalnumeric

src = "FalseColor.tif"
arr = gdalnumeric.LoadFile(src)
gdalnumeric.SaveArray(arr[[1, 0, 2], :], "swap.tif", format="GTiff", prototype=src)

Also, in the SaveArray() method, the last argument is called prototype. This argument lets you specify another image for GDAL from which we can copy spatial reference information and some other image parameters. Without this argument, we'd end up with an image without georeferencing information, which cannot be used in a geographic information system (GIS). In this case, we specified our input image file name, as the images are identical except for the band order.

The result of this example produces the swap.tif image, which is a much more visually appealing image with green vegetation and blue water:

There's only one problem with this image. It's a little dark and difficult to see. Let's see if we can figure out why.

Creating histograms

A histogram shows the statistical frequency of data distribution in a dataset. In the case of remote sensing, the dataset is an image, and the data distribution is the frequency of the pixels in the range of 0 to 255, which is the range of the 8-bit numbers used to store image information on computers. In an RGB image, color is represented as a three-digit tuple with (0,0,0) being black and (255,255,255) being white. We can graph the histogram of an image with the frequency of each value along the y axis and the range of possible pixel values (0 to 255) along the x axis.

We can use the Turtle graphics engine that is included with Python to create a simple GIS, and we can also use it to easily graph histograms. Histograms are usually a one-off product, which makes a quick script like this example great. Also, histograms are typically displayed as a bar graph with the width of the bars representing the size of the grouped data bins. However, in an image, each bin is only one value, so we'll create a line graph. We'll use the histogram function in this example and create a red, green, and blue line for each band. The graphing portion of this example also defaults to scaling the y axis values to the maximum RGB frequency that is found in the image. Technically, the y axis represents the maximum frequency, that is, the number of pixels in the image, which would be the case if the image was of one color. We'll use the turtle module again; however, this example could easily be converted to any graphical output module.
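As a brief aside before the full plotting script, the histogram computation by itself can be done very compactly with plain NumPy. This is only an illustrative sketch, not the book's code; it assumes the swap.tif file produced above and simply prints each band's most common pixel value instead of drawing anything:

import numpy as np
from osgeo import gdalnumeric

arr = gdalnumeric.LoadFile("swap.tif")   # shape: (bands, rows, columns)
for band_index, band in enumerate(arr):
    # bincount over the flattened 8-bit band gives the frequency of each value 0-255.
    counts = np.bincount(band.ravel(), minlength=256)
    print("band", band_index, "peak value:", counts.argmax(),
          "count:", counts.max())

If the peak values are all well below 125, that already hints at why the image looks dark.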
Let's take a look at our swap.tif image:

from osgeo import gdalnumeric
import turtle as t

def histogram(a, bins=list(range(0, 256))):
    fa = a.flat
    n = gdalnumeric.numpy.searchsorted(gdalnumeric.numpy.sort(fa), bins)
    n = gdalnumeric.numpy.concatenate([n, [len(fa)]])
    hist = n[1:] - n[:-1]
    return hist

def draw_histogram(hist, scale=True):
    t.color("black")
    axes = ((-355, -200), (355, -200), (-355, -200), (-355, 250))
    t.up()
    for p in axes:
        t.goto(p)
        t.down()
    t.up()
    t.goto(0, -250)
    t.write("VALUE", font=("Arial, ", 12, "bold"))
    t.up()
    t.goto(-400, 280)
    t.write("FREQUENCY", font=("Arial, ", 12, "bold"))
    x = -355
    y = -200
    t.up()
    for i in range(1, 11):
        x = x + 65
        t.goto(x, y)
        t.down()
        t.goto(x, y - 10)
        t.up()
        t.goto(x, y - 25)
        t.write("{}".format((i * 25)), align="center")
    x = -355
    y = -200
    t.up()
    pixels = sum(hist[0])
    if scale:
        max = 0
        for h in hist:
            hmax = h.max()
            if hmax > max:
                max = hmax
        pixels = max
    label = pixels / 10
    for i in range(1, 11):
        y = y + 45
        t.goto(x, y)
        t.down()
        t.goto(x - 10, y)
        t.up()
        t.goto(x - 15, y - 6)
        t.write("{}".format((i * label)), align="right")
    x_ratio = 709.0 / 256
    y_ratio = 450.0 / pixels
    colors = ["red", "green", "blue"]
    for j in range(len(hist)):
        h = hist[j]
        x = -354
        y = -199
        t.up()
        t.goto(x, y)
        t.down()
        t.color(colors[j])
        for i in range(256):
            x = i * x_ratio
            y = h[i] * y_ratio
            x = x - (709 / 2)
            y = y + -199
            t.goto((x, y))

im = "swap.tif"
histograms = []
arr = gdalnumeric.LoadFile(im)
for b in arr:
    histograms.append(histogram(b))
draw_histogram(histograms)
t.pen(shown=False)
t.done()

Here's what the histogram for swap.tif looks like after running the example:

As you can see, all three bands are grouped closely towards the left-hand side of the graph and all have values that are less than 125. As these values approach zero, the image becomes darker, which is not surprising. Just for fun, let's run the script again, and when we call the draw_histogram() function, we'll add the scale=False option to get an idea of the size of the image and provide an absolute scale. Therefore, we take the following line:

draw_histogram(histograms)

Change it to the following:

draw_histogram(histograms, scale=False)

This change will produce the following histogram graph:

As you can see, it's harder to see the details of the value distribution. However, this absolute scale approach is useful if you are comparing multiple histograms from different products that are produced from the same source image. Now that we understand the basics of looking at an image statistically using histograms, how do we make our image brighter?

Performing a histogram stretch

A histogram stretch operation does exactly what the name suggests. It distributes the pixel values across the whole scale. By doing so, we have more values at the higher-intensity level and the image becomes brighter. Therefore, in this example, we'll use our histogram function; however, we'll add another function called stretch() that takes an image array, creates the histogram, and then spreads out the range of values for each band.
We'll run these functions on swap.tif and save the result in an image called stretched.tif:

import gdalnumeric
import operator
from functools import reduce

def histogram(a, bins=list(range(0, 256))):
    fa = a.flat
    n = gdalnumeric.numpy.searchsorted(gdalnumeric.numpy.sort(fa), bins)
    n = gdalnumeric.numpy.concatenate([n, [len(fa)]])
    hist = n[1:] - n[:-1]
    return hist

def stretch(a):
    hist = histogram(a)
    lut = []
    for b in range(0, len(hist), 256):
        step = reduce(operator.add, hist[b:b+256]) / 255
        n = 0
        for i in range(256):
            lut.append(n / step)
            n = n + hist[i+b]
    gdalnumeric.numpy.take(lut, a, out=a)
    return a

src = "swap.tif"
arr = gdalnumeric.LoadFile(src)
stretched = stretch(arr)
gdalnumeric.SaveArray(arr, "stretched.tif", format="GTiff", prototype=src)

The stretch algorithm will produce the following image. Look how much brighter and more visually appealing it is:

We can run our turtle graphics histogram script on stretched.tif by changing the filename in the im variable to stretched.tif:

im = "stretched.tif"

This run will give us the following histogram:

As you can see, all three bands are distributed evenly now. Their relative distribution to each other is the same; however, in the image, they are now spread across the spectrum.

Summary

In this article, we covered the foundations of remote sensing, including band swapping and histograms. The authors of GDAL have a set of Python examples covering some advanced topics that may be of interest, available at https://svn.osgeo.org/gdal/trunk/gdal/swig/python/samples/.

Resources for Article:

Further resources on this subject:
- Python Libraries for Geospatial Development [article]
- Python Libraries [article]
- Learning R for Geospatial Analysis [article]
Different IR Algorithms

Packt
04 Jan 2016
23 min read
In this article written by Sudipta Mukherjee, the author of the book F# for Machine Learning, we look at information retrieval. Information overload is almost a passé term; however, it is still valid. Information retrieval is a big arena and most of it is far from being solved. That being said, we have come a long way, and the results produced by some of the state-of-the-art information retrieval algorithms are really impressive. You may not know that you are using information retrieval, but whenever you search for some documents on your PC or on the internet, you are actually using the product of some information retrieval algorithm in the background. So, as the metaphor goes, finding the needle (read: information/insight) in a haystack (read: your data archive on your PC or on the web) is the key to successful business. Broadly, documents are matched in one of two ways:

- Distance based: Two documents are matched based on their proximity, calculated by one of several distance metrics on the vector representation of the documents
- Set based: Two documents are matched based on their proximity, calculated by one of several set-based or fuzzy-set-based metrics on the bag of words (BoW) model of the documents

(For more resources related to this topic, see here.)

What are some interesting things that you can do? You will learn how the same algorithm can find similar biscuits and identify the author of digital documents from the words authors use. You will also learn how IR distance metrics can be used to group color images.

Information retrieval using tf-idf

Whenever you type some search term in your "Windows" search box, some documents appear matching your search term. There is a common, well-known, easy-to-implement algorithm that makes it possible to rank the documents based on the search term. Basically, the algorithm allows the developers to assign some kind of score to each document in the result set. That score can be seen as a score of confidence that the system has on how much the user would like that result. The score that this algorithm attaches to each document is a product of two different scores. The first one is called term frequency (tf) and the other one is called inverse document frequency (idf). Their product is referred to as tf-idf or "term frequency inverse document frequency".

Tf is the number of times some term occurs in a given document. Idf is based on the ratio between the total number of documents scanned and the number of documents in which a given search term is found. However, this ratio is not used as is; the log of this ratio is used as idf. If N is the total number of documents and the search term is found in n of them, then idf = log10(N / n), and the tf-idf score of a term for a document is simply tf * idf.

The following is the code that demonstrates how to find the tf-idf score for a given search term; in this case, "example". The sentences are fabricated so that the word "example" occurs the desired number of times in document 2; in this case, sentence2. In the code, docs denotes the set of all the documents and sentence2 denotes the second document.

let sentence1 = "this is a a sample"
let sentence2 = "this is another another example example example"
let word = "example"
let numberOfDocs = 2.
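// tf: count how many times the search word occurs in each document.
// idf: log10 of (total number of documents / number of documents containing the word).
// The tf-idf score of each document is then tf * idf.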
let tf1 = sentence1.Split ' ' |> Array.filter (fun t -> t = word) |> Array.length
let tf2 = sentence2.Split ' ' |> Array.filter (fun t -> t = word) |> Array.length
let docs = [|sentence1;sentence2|]
let foundIn = docs |> Array.map (fun t -> t.Split ' '
                                          |> Array.filter (fun z -> z = word))
                   |> Array.filter (fun m -> m |> Array.length <> 0)
                   |> Array.length
let idf = Operators.log10 (numberOfDocs / float foundIn)
let pr1 = float tf1 * idf
let pr2 = float tf2 * idf
printfn "%f %f" pr1 pr2

This produces the following output:

0.0 0.903090

This means that the second document is more closely related to the word "example" than the first one. (For document 2, tf = 3 and idf = log10(2/1), which is roughly 0.301, so the score is about 0.903; for document 1, tf = 0, so the score is 0.) In this case, this is one of the extreme cases where the word doesn't appear at all in one document and appears three times in the other. However, with the same word occurring multiple times in both documents, you will get different scores for each document. You can think of these scores as confidence scores for the association between the word and the document. The higher the score, the higher the confidence that the document has something related to that word.

Measures of similarity

In the following section, you will create a framework for finding several distance measures. A distance between two probability distribution functions (pdfs) is an important way to know how close two entities are. One way to generate a pdf from a histogram is to normalize the histogram.

Generating a PDF from a histogram

A histogram holds the number of times a value occurred. For example, a text can be represented as a histogram where the histogram values represent the number of times a word appears in the text. For gray images, it can be the number of times each gray level appears in the image. In the following section, you will build a few modules that hold several distance metrics. A distance metric is a measure of how similar two objects are. It can also be called a measure of proximity. The following metrics treat the two pdfs being compared as P and Q, and refer to their ith elements.

Let's say we have a histogram denoted by h, and there are N elements in the histogram. Then a rough estimate of the probability of the ith value is the ith histogram count divided by N. The following function does this transformation from histogram to pdf:

let toPdf (histogram:int list)=
    histogram |> List.map (fun t -> float t / float histogram.Length)

There are a couple of important assumptions made. First, we assume that the number of bins is equal to the number of elements, so the histogram and the pdf will have the same number of elements. That's not exactly correct in a mathematical sense: all the elements of a pdf should add up to 1. The following implementation of histToPdf guarantees that, but it's not a good choice as it is not normalization, so resist the temptation to use this one:

let histToPdf (histogram:int list)=
    let sum = histogram |> List.sum
    histogram |> List.map (fun t -> float t / float sum)

Generating a histogram from a list of elements is simple. Following is the function that takes a list and returns the histogram. F# has the function already built in; it is called countBy.

let listToHist (l : int list)=
    l |> List.countBy (fun t -> t)

Next is an example of how, using these two functions, a list of integers can be transformed into a pdf.
The following method takes a list of integers and returns the probability distribution associated: let listToPdf (aList : int list)=         aList |> List.countBy (fun t -> t)               |> List.map snd               |> toPdf Here is how you can use it: let list = [1;1;1;2;3;4;5;1;2] let pdf = list |> listToPdf I have captured the following in the F# interactive and got the following histogram from the preceding list: val it : (int * int) list = [(1, 4); (2, 2); (3, 1); (4, 1); (5, 1)] If you just project the second element for this histogram and store it in an int list, then you can represent the histogram in a list. So for this example, the histogram can be represented as: let hist = [4;2;1;1;1] Distance metrics are classified among several families based on their structural similarity. The sections following show how to work on these metrics using F# and uses histograms as int list as input.  and  denote the vectors that represent the entity being compared. For example, for the document retrieval system, these numbers might indicate the number of times a given word occurred in each of the documents that are being compared.  denotes the i element of the vector represented by  and  denotes the ith element of the vector represented by . Some literature call these vectors the profile vectors. Minkowski family As you can see, when two pdfs are almost the same then this family of distance metrics tends to be zero, and when they are further apart, they lead to positive infinity. So if the distance metric between two pdfs is close to zero then we can conclude that they are similar and if the distance is more, then we can conclude otherwise. All these formulae of these metrics are special cases of what's known as Minkowski distance. Euclidean distance The following code implements Euclidean distance. //Euclidean distance let euclidean (p:int list)(q:int list) =       List.zip p q |> List.sumBy (fun t -> float (fst t - snd t) ** 2.) City block distance The following code implements City block distance. let cityBlock (p:int list)(q:int list) =     List.zip p q |> List.sumBy (fun t -> float (fst t  - snd t)) Chebyshev distance The following code implements Chebyshev distance. let chebyshev(p:int list)(q:int list) =         List.zip p q |> List.map( fun t -> abs (fst t - snd t)) |> List.max L1 family This family of distances relies on normalization to keep the values within limit. All these metrics are of the form A/B where A is primarily a measure of proximity between the two pdfs P and Q. Most of the time A is calculated based on the absolute distance. For example, the numerator of the Sørensen distance is the City Block distance while the bottom is a normalization component that is obtained by adding each element of the two participating pdfs. Sørensen The following code implements Sørensendistance. let sorensen  (p:int list)(q:int list) =         let zipped = List.zip (p |> toPdf ) (q|>toPdf)         let numerator = zipped  |> List.sumBy (fun t -> float (fst t - snd t))         let denominator = zipped |> List.sumBy (fun t -> float (fst t + snd t))numerator / denominator Gower distance The following code implements Gowerdistance. There could be division by zero if the collection q is empty. let gower(p:int list)(q:int list)=         //I love this. 
Free flowing fluid conversion         //rather than cramping abs and fst t - snd t in a         //single line      let numerator = List.zip (p|>toPdf) (q |> toPdf)                          |> List.map (fun t -> fst t - snd t)                          |> List.map (fun z -> abs z)                          |> List.map float                          |> List.sum                          |> float      let denominator = float p.Length      numerator / denominator Soergel The following code implements Soergeldistance. let soergel (p:int list)(q:int list) =         let zipped = List.zip (p|>toPdf) (q |> toPdf)         let numerator =  zipped |> List.sumBy(fun t -> abs( fst t - snd t))         let denominator = zipped |> List.sumBy(fun t -> max (fst t ) (snd t))         float numerator / float denominator kulczynski d The following code implements Kulczynski ddistance. let kulczynski_d (p:int list)(q:int list) =         let zipped = List.zip (p|>toPdf) (q |> toPdf)         let numerator =  zipped |> List.sumBy(fun t -> abs( fst t - snd t))         let denominator = zipped |> List.sumBy(fun t -> min (fst t ) (snd t))         float numerator / float denominator kulczynski s The following code implements Kulczynski sdistance. let kulczynski_s (p:int list)(q:int list) = / kulczynski_d p q Canberra distance The following code implements Canberradistance. let canberra (p:int list)(q:int list) =     let zipped = List.zip (p|>toPdf) (q |> toPdf)     let numerator =  zipped |> List.sumBy(fun t -> abs( fst t - snd t))     let denominator = zipped |> List.sumBy(fun t -> fst t + snd t)     float numerator / float denominator Intersection family This family of distances tries to find the overlap between two participating pdfs. Intersection The following code implements Intersectiondistance. let intersection(p:int list ) (q: int list) =         List.zip (p|>toPdf) (q |> toPdf)             |> List.map (fun t -> min (fst t) (snd t)) |> List.sum Wave Hedges The following code implements Wave Hedgesdistance. let waveHedges (p:int list)(q:int list)=         List.zip (p|>toPdf) (q |> toPdf)             |> List.sumBy ( fun t -> 1.0 - float( min (fst t) (snd t))                                      / float (max (fst t) (snd t))) Czekanowski distance The following code implements Czekanowski distance. let czekanowski(p:int list)(q:int list) =         let zipped  = List.zip (p|>toPdf) (q |> toPdf)         let numerator = 2. * float (zipped |> List.sumBy (fun t -> min (fst t) (snd t)))         let denominator =  zipped |> List.sumBy (fun t -> fst t + snd t)         numerator / float denominator Motyka The following code implements Motyka distance.  let motyka(p:int list)(q:int list)=        let zipped = List.zip (p|>toPdf) (q |> toPdf)        let numerator = zipped |> List.sumBy (fun t -> min (fst t) (snd t))        let denominator = zipped |> List.sumBy (fun t -> fst t + snd t)        float numerator / float denominator Ruzicka The following code implements Ruzicka distance. let ruzicka (p:int list) (q:int list) =         let zipped = List.zip (p|>toPdf) (q |> toPdf)         let numerator = zipped |> List.sumBy (fun t -> min (fst t) (snd t))         let denominator = zipped |> List.sumBy (fun t -> max (fst t) (snd t))         float numerator / float denominator Inner Product family Distances belonging to this family are calculated by some product of pairwise elements from both the participating pdfs. Then this product is normalized with a value also calculated from the pairwise elements. 
Innerproduct The following code implements Inner-productdistance. let innerProduct(p:int list)(q:int list)=         List.zip (p|>toPdf) (q |> toPdf)             |> List.sumBy (fun t -> fst t * snd t) Harmonic mean The following code implements Harmonicdistance. let harmonicMean(p:int list)(q:int list)=        2. * (List.zip (p|>toPdf) (q |> toPdf)           |> List.sumBy (fun t -> ( fst t * snd t )/(fst t + snd t))) Cosine similarity The following code implements Cosine Similarity distance measure. let cosineSimilarity(p:int list)(q:int list)=     let zipped = List.zip p q     let prod  (x,y) = float x *  float y     let numerator = zipped |> List.sumBy prod     let denominator  =  sqrt ( p|> List.map sqr |> List.sum |> float) *                         sqrt ( q|> List.map sqr |> List.sum |> float)     numerator / denominator Kumar Hassebrook The following code implements Kumar Hassebrook distance measure.   let kumarHassebrook (p:int list) (q:int list) =         let sqr x = x * x         let zipped = List.zip (p|>toPdf) (q |> toPdf)         let numerator = zipped |> List.sumBy prod         let denominator =  (p |> List.sumBy sqr) +                            (q |> List.sumBy sqr) - numerator           numerator / denominator Dice coefficient The following code implements Dice coefficient.   let dicePoint(p:int list)(q:int list)=             let zipped = List.zip (p|>toPdf) (q |> toPdf)         let numerator = zipped |> List.sumBy (fun t -> fst t * snd t)         let denominator  =  (p |> List.sumBy sqr)  +                             (q |> List.sumBy sqr)           float numerator / float denominator Fidelity family or squared-chord family This family of distances uses square root as an instrument to keep the distance within limit. Sometimes other functions, such as log, are also used. Fidelity The following code implements FidelityDistance measure.   let fidelity(p:int list)(q:int list)=               List.zip (p|>toPdf) (q |> toPdf)|> List.map  prod                                         |> List.map sqrt                                         |> List.sum Bhattacharya The following code implements Bhattacharya distance measure. let bhattacharya(p:int list)(q:int list)=         -log (fidelity p q) Hellinger The following code implements Hellingerdistance measure. let hellinger(p:int list)(q:int list)=             let product = List.zip (p|>toPdf) (q |> toPdf)                          |> List.map prod |> List.sumBy sqrt        let right = 1. -  product        2. * right |> abs//taking this off will result in NaN                   |> float                   |> sqrt Matusita The following code implements Matusita distance measure. let matusita(p:int list)(q:int list)=         let value =2. - 2. *( List.zip (p|>toPdf) (q |> toPdf)                             |> List.map prod                             |> List.sumBy sqrt)         value |> abs |> sqrt Squared Chord The following code implements Squared Chord distance measure. let squarredChord(p:int list)(q:int list)=         List.zip (p|>toPdf) (q |> toPdf)             |> List.sumBy (fun t -> sqrt (fst t ) - sqrt (snd t)) Squared L2 family This is almost the same as the L1 family, just that it got rid of the expensive square root operation and relies on the squares instead. However, that should not be an issue. Sometimes the squares can be quite large so a normalization scheme is provided by dividing the result of the squared sum with another square sum, as done in "Divergence". 
Squared Euclidean The following code implements Squared Euclidean distance measure. For most purpose this can be used instead of Euclidean distance as it is computationally cheap and performs as well. let squaredEuclidean (p:int list)(q:int list)=        List.zip (p|>toPdf) (q |> toPdf)            |> List.sumBy (fun t-> float (fst t - snd t) ** 2.0) Squared Chi The following code implements Squared Chi distance measure. let squaredChi(p:int list)(q:int list)=         List.zip (p|>toPdf) (q |> toPdf)            |> List.sumBy (fun t -> (fst t - snd t ) ** 2.0 / (fst t + snd t)) Pearson's Chi The following code implements Pearson’s Chidistance measure. let pearsonsChi(p:int list)(q:int list)=         List.zip (p|>toPdf) (q |> toPdf)            |> List.sumBy (fun t -> (fst t - snd t ) ** 2.0 / snd t) Neyman's Chi The following code implements Neyman’s Chi distance measure. let neymanChi(p:int list)(q:int list)=         List.zip (p|>toPdf) (q |> toPdf)            |> List.sumBy (fun t -> (fst t - snd t ) ** 2.0 / fst t) Probabilistic Symmetric Chi The following code implements Probabilistic Symmetric Chidistance measure. let probabilisticSymmetricChi(p:int list)(q:int list)=        2.0 * squaredChi p q Divergence The following code implements Divergence measure. This metric is useful when the elements of the collections have elements in different order of magnitude. This normalization will make the distance properly adjusted for several kinds of usages. let divergence(p:int list)(q:int list)=         List.zip (p|>toPdf) (q |> toPdf)             |> List.sumBy (fun t -> (fst t - snd t) ** 2. / (fst t + snd t) ** 2.) Clark The following code implements Clark’s distance measure. let clark(p:int list)(q:int list)=               sqrt( List.zip (p|>toPdf) (q |> toPdf)         |> List.map (fun t ->  float( abs( fst t - snd t))                                / (float (fst t + snd t)))           |> List.sumBy (fun t -> t * t)) Additive Symmetric Chi The following code implements Additive Symmetric Chi distance measure. let additiveSymmetricChi(p:int list)(q:int list)=         List.zip (p|>toPdf) (q |> toPdf)             |> List.sumBy (fun t -> (fst t - snd t ) ** 2. * (fst t + snd t)/ (fst t * snd t))  Summary Congratulations!! you learnt how different similarity measures work and when to use which one, to find the closest match. Edmund Burke said that, "It's the nature of every greatness not to be exact", and I can't agree more. Most of the time the users aren't really sure what they are looking for. So providing a binary answer of yes or no, or found or not found is not that useful. Striking the middle ground by attaching a confidence score to each result is the key. The techniques that you learnt will prove to be useful when we deal with recommender system and anomaly detection, because both these fields rely heavily on the IR techniques. Resources for Article:   Further resources on this subject: Learning Option Pricing [article] Working with Windows Phone Controls [article] Simplifying Parallelism Complexity in C# [article]
Types and providers

Packt
31 Dec 2015
12 min read
In this article by Thomas Uphill, the author of Mastering Puppet Second Edition, we will look at custom types. Puppet separates the implementation of a type into the type definition and any one of the many providers for that type. For instance, the package type in Puppet has multiple providers depending on the platform in use (apt, yum, rpm, gem, and others). Early on in Puppet development, there were only a few core types defined. Since then, the core types have expanded to the point where anything that I feel should be a type is already defined by core Puppet. The LVM module creates a type for defining logical volumes, and the concat module creates types for defining file fragments. The firewall module creates a type for defining firewall rules. Each of these types represents something on the system with the following properties:

- Unique
- Searchable
- Atomic
- Destroyable
- Creatable

(For more resources related to this topic, see here.)

When creating a new type, you have to make sure your new type has these properties. The resource defined by the type has to be unique, which is why the file type uses the path to a file as the naming variable (namevar). A system may have files with the same name (not unique), but it cannot have more than one file with an identical path. As an example, the ldap configuration file for openldap is /etc/openldap/ldap.conf, while the ldap configuration file for the name services library is /etc/ldap.conf. If you used the filename, then they would both be the same resource. Resources must be unique.

By atomic, I mean it is indivisible; it cannot be made of smaller components. For instance, the firewall module creates a type for single iptables rules. Creating a type for the tables (INPUT, OUTPUT, FORWARD) within iptables wouldn't be atomic; each table is made up of multiple smaller parts, the rules.

Your type has to be searchable so that Puppet can determine the state of the thing you are modifying. A mechanism has to exist to know what the current state of the thing in question is. The last two properties are equally important: Puppet must be able to remove the thing, destroy it, and likewise, Puppet must be able to create the thing anew. Given these criteria, there are several modules that define new types, with some examples including types that manage:

- Git repositories
- Apache virtual hosts
- LDAP entries
- Network routes
- Gem modules
- Perl CPAN modules
- Databases
- Drupal multisites

Creating a new type

As an example, we will create a gem type for managing Ruby gems installed for a user. Ruby gems are packages for Ruby that are installed on the system and can be queried like packages. Installing gems with Puppet can already be done using the gem, pe_gem, or pe_puppetserver_gem providers for the package type. To create a custom type requires some knowledge of Ruby. In this example, we assume the reader is fairly literate in Ruby. We start by defining our type in the lib/puppet/type directory of our module. We'll do this in our example module, modules/example/lib/puppet/type/gem.rb.
The file will contain the newtype method and a single property for our type, version as shown in the following code: Puppet::Type.newtype(:gem) do   ensurable   newparam(:name, :namevar => true) do     desc 'The name of the gem'   end   newproperty(:version) do     desc 'version of the gem'     validate do |value|       fail("Invalid gem version #{value}") unless value =~ /^[0-9]         +[0-9A-Za-z.-]+$/     end   end end The ensurable keyword creates the ensure property for our new type, allowing the type to be either present or absent. The only thing we require of the version is that it starts with a number and only contain numbers, letters, periods, or dashes. A more thorough regular expression here could save you time later, such as checking that the version ends with a number or letter. Now we need to start making our provider. The name of the provider is the name of the command used to manipulate the type. For packages, the providers are named things like yum, apt, and dpkg. In our case we'll be using the gem command to manage gems, which makes our path seem a little redundant. Our provider will live at modules/example/lib/puppet/provider/gem/gem.rb. We'll start our provider with a description of the provider and the commands it will use as shown in the following code: Puppet::Type.type(:gem).provide :gem do   desc "Manages gems using gem" Then we'll define a method to list all the gems installed on the system as shown in the following code, which defines the self.instances method: def self.instances   gems = []   command = 'gem list -l'     begin       stdin, stdout, stderr = Open3.popen3(command)       for line in stdout.readlines         (name,version) = line.split(' ')         gem = {}         gem[:provider] = self.name         gem[:name] = name         gem[:ensure] = :present         gem[:version] = version.tr('()','')         gems << new(gem)       end     rescue       raise Puppet::Error, "Failed to list gems using '#         {command}'"     end     gems   end This method runs gem list -l and then parses the output looking for lines such as gemname (version). The output from the gem command is written to the variable stdout. We then use readlines on stdout to create an array which we iterate over with a for loop. Within the for loop we split the lines of output based on a space character into the gem name and version. The version will be wrapped in parenthesis at this point, we use the tr (translate) method to remove the parentheses. We create a local hash of these values and then append the hash to the gems hash. The gems hash is returned and then Puppet knows all about the gems installed on the system. Puppet needs two more methods at this point, a method to determine if a gem exists (is installed), and if it does exist, which version is installed. We already populated the ensure parameter, so as to use that to define our exists method as follows: def exists?   
@property_hash[:ensure] == :present end To determine the version of an installed gem, we can use the property_hash variable as follows: def version   @property_hash[:version] || :absent end To test this, add the module to a node and pluginsync the module over to the node as follows: [root@client ~]# puppet plugin download Notice: /File[/opt/puppetlabs/puppet/cache/lib/puppet/provider/gem]/   ensure: created Notice: /File[/opt/puppetlabs/puppet/cache/lib/puppet/provider/gem/   gem.rb]/ensure: defined content as   '{md5}4379c3d0bd6c696fc9f9593a984926d3' Notice: /File[/opt/puppetlabs/puppet/cache/lib/puppet/provider/gem/   gem.rb.orig]/ensure: defined content as   '{md5}c6024c240262f4097c0361ca53c7bab0' Notice: /File[/opt/puppetlabs/puppet/cache/lib/puppet/type/gem.rb]/   ensure: defined content as '{md5}48749efcd33ce06b401d5c008d10166c' Downloaded these plugins: /opt/puppetlabs/puppet/cache/lib/puppet/provider/gem, /opt/puppetlabs/puppet/cache/lib/puppet/provider/gem/gem.rb, /opt/puppetlabs/puppet/cache/lib/puppet/provider/gem/gem.rb.orig, /opt/puppetlabs/puppet/cache/lib/puppet/type/gem.rb This will install our type/gem.rb and provider/gem/gem.rb files into /opt/puppetlabs/puppet/cache/lib/puppet on the node. After that, we are free to run puppet resource on our new type to list the available gems as shown in the following code: [root@client ~]# puppet resource gem gem { 'bigdecimal':   ensure  => 'present',   version => '1.2.0', } gem { 'bropages':   ensure  => 'present',   version => '0.1.0', } gem{ 'commander':   ensure  => 'present',   version => '4.1.5', } gem { 'highline':   ensure  => 'present',   version => '1.6.20', } … Now, if we want to manage gems, we'll need to create and destroy them, and we'll need to provide methods for those operations. If we try at this point, Puppet will fail, as we can see from the following output: [root@client ~]# puppet resource gem bropages pugem { 'bropages':   ensure => 'present',   version => '0.1.0', } [root@client ~]# puppet resource gem bropages ensure=absent gem { 'bropages':   ensure => 'absent', } [root@client ~]# puppet resource gem bropages ensure=absent gem { 'bropages':   ensure => 'absent', } When we run puppet resource, there is no destroy method, so puppet returns that it the gem was removed but doesn't actually do anything. To get Puppet to actually remove the gem, we'll need a method to destroy (remove) gems, gem uninstall should do the trick, as shown in the following code: def destroy   g = @resource[:version] ? [@resource[:name], '--version',     @resource[:version]] : @resource[:name]   command = "gem uninstall #{g} -q -x"     begin       system command     rescue       raise Puppet::Error, "Failed to remove #{@resource[:name]}         '#{command}'"     end     @property_hash.clear   end Using the ternary operator, we either run gem uninstall name -q -x if no version is defined, or gem uninstall name --version version -q -x if a version is defined. We finish by calling @property_hash.clear to remove the gem from the property_hash since the gem is now removed. Now we need to let Puppet know about the state of the bropages gem using our instances method we defined earlier, we'll need to write a new method to prefetch all the available gems. 
This is done with self.prefetch, as shown in the following code:  def self.prefetch(resources)     gems = instances     resources.keys.each do |name|       if provider = gems.find{ |gem| gem.name == name }         resources[name].provider = provider       end     end   end We can see this in action using puppet resource as shown in the following output: [root@client ~]# puppet resource gem bropages ensure=absent Removing bro Successfully uninstalled bropages-0.1.0 Notice: /Gem[bropages]/ensure: removed gem { 'bropages':   ensure => 'absent', } Almost there, now we want to add bropages back, we'll need a create method, as shown in the following code:  def create     g = @resource[:version] ? [@resource[:name], '--version',       @resource[:version]] : @resource[:name]     command = "gem install #{g} -q"     begin       system command       @property_hash[:ensure] = :present     rescue       raise Puppet::Error, "Failed to install #{@resource[:name]}         '#{command}'"     end   end Now when we run puppet resource to create the gem, we see the installation, as shown in the following output: [root@client ~]# puppet resource gem bropages ensure=present Successfully installed bropages-0.1.0 Parsing documentation for bropages-0.1.0 Installing ri documentation for bropages-0.1.0 1 gem installed Notice: /Gem[bropages]/ensure: created gem { 'bropages':   ensure => 'present', } Nearly done now, we need to handle versions. If we want to install a specific version of the gem, we'll need to define methods to deal with versions.  def version=(value)     command = "gem install #{@resource[:name]} --version       #{@resource[:version]}"     begin       system command       @property_hash[:version] = value     rescue       raise Puppet::Error, "Failed to install gem         #{resource[:name]} using #{command}"     end   end Now, we can tell Puppet to install a specific version of the gem and have the correct results as shown in the following output: [root@client ~]# puppet resource gem bropages version='0.0.9' Fetching: highline-1.7.8.gem (100%) Successfully installed highline-1.7.8 Fetching: bropages-0.0.9.gem (100%) Successfully installed bropages-0.0.9 Parsing documentation for highline-1.7.8 Installing ri documentation for highline-1.7.8 Parsing documentation for bropages-0.0.9 Installing ri documentation for bropages-0.0.9 2 gems installed Notice: /Gem[bropages]/version: version changed '0.1.0' to '0.0.9' gem { 'bropages':   ensure  => 'present',   version => '0.0.9', } This is where our choice of gem as an example breaks down as gem provides for multiple versions of a gem to be installed. Our gem provider, however, works well enough for use at this point. We can specify the gem type in our manifests and have gems installed or removed from the node. This type and provider is only an example; the gem provider for the package type provides the same features in a standard way. When considering creating a new type and provider, search the puppet forge for existing modules first. Summary When the defined types are not enough, you can extend Puppet with custom types and providers written in Ruby. The details of writing providers are best learned by reading the already written providers and referring to the documentation on the Puppet Labs website. Resources for Article: Further resources on this subject: Modules and Templates [article] My First Puppet Module [article] Installing Software and Updates [article]