Sewable LEDs in Clothing

Packt
14 Aug 2017
16 min read
In this article by Jonathan Witts, author of the book Wearable Projects with Raspberry Pi Zero, we will use sewable LEDs and conductive thread to transform an item of clothing into a sparkling piece of wearable tech, controlled by a Pi Zero hidden in the clothing. We will incorporate a Pi Zero and battery into a hidden pocket in the garment and connect our sewable LEDs to the Pi's GPIO pins so that we can write Python code to control the order and timing of the LEDs.

To deactivate the software running automatically, connect to your Pi Zero over SSH and issue the following command:

    sudo systemctl disable scrollBadge.service

Once that command completes, you can shut down your Pi Zero by pressing your off-switch for three seconds. Now let's look at what we are going to cover in this article.

What We Will Cover

In this article we will cover the following topics:

- What we will need to complete the project in this article
- How we will modify our item of clothing
- Writing a Python program to control the electronics in our modified garment
- Making our program run automatically

Let's jump straight in and look at what parts we will need to complete the project.

Bill of parts

We will make use of the following things in this project:

- A Pi Zero W
- An official Pi Zero case
- A portable battery pack
- An item of clothing to modify, e.g. a top or t-shirt
- 10 sewable LEDs
- Conductive thread
- Some fabric the same color as the clothing
- Thread the same color as the clothing
- A sewing needle
- Pins
- 6 metal poppers
- Some black and yellow cable
- Solder and a soldering iron

Modifying our item of clothing

So let's take a look at what we need to do to modify our item of clothing to accommodate our Pi Zero, battery pack, and sewable LEDs. We will start by creating a hidden pocket for the Pi Zero and the batteries, followed by how we will sew our LEDs into the top and design our sewable circuit.
We will then solve the problem of connecting our conductive thread back to the GPIO holes on our Pi Zero.

Our hidden pocket

You will need a piece of fabric large enough to house your Pi Zero case and battery pack alongside each other, with enough spare to hem the pocket all the way round. If you have access to a sewing machine, this section of the project will be much quicker; otherwise you will need to do the stitching by hand. The piece of fabric I am using is 18 x 22 cm, allowing for a 1 cm hem all around.

Fold and pin the 1 cm hem, then either stitch by hand or run it through a sewing machine to secure your hem. When you have finished, remove the pins. Turn your garment inside out and decide where you are going to position your hidden pocket. As I am using a t-shirt for my garment, I am going to position my pocket just inside the bottom of the garment, wrapping around the front and back so that it sits across the wearer's hip. Pin your pocket in place and then stitch it along the bottom, left, and right sides, leaving the top open. Make sure that you stitch this nice and firmly, as it has to hold the weight of your battery pack. The picture below shows my pocket after being stitched in place. When you have finished, you can remove the pins and turn your garment the correct way round again.

Adding our sewable LEDs

We are now going to plan our circuit of sewable LEDs and add them to the garment. I am going to run a line of 10 LEDs around the bottom of my top. These will be wired up in pairs so that we can control a pair of LEDs at any one time with our Pi Zero. You can position your LEDs however you want, but it is important that the circuits of conductive thread do not cross one another. Turn your garment inside out again and mark with a washable pen where you want your LEDs to be sewn.
As the fabric of my garment is quite light, I am going to stitch the LEDs inside and let them shine through the material. If your material is heavier, you will have to make a small cut where each LED will sit and then button-hole the cut so the LED can push through.

Start with the LED furthest from your hidden pocket. Once you have the LED in position, take a length of your conductive thread and over-sew the negative hole of the LED to your garment, keeping your stitches as small and as neat as possible to minimize how much they can be seen from the front of the garment. You now need to stitch small running stitches from the first LED to the second; make sure that you use the same piece of conductive thread and do not cut it! When you reach the position of the second LED, again over-sew its negative hole, ensuring that the LED is face down so that it shows through the fabric. As I am stitching my LEDs quite close to the hem of my t-shirt, I have made use of the hem to run the conductive thread in when connecting the negative holes of the LEDs, as shown in the following image.

Continue connecting each negative point of your LEDs with the single length of conductive thread. Your final LED will be the one closest to your hidden pocket; continue with a running stitch until you are on the hidden pocket itself. Now take the male half of one of your metal poppers and over-sew it in place through one of its holes. You can now cut the conductive thread, as we have completed the common ground circuit for all the LEDs. Stitch the three remaining holes of the popper with standard thread, as shown in the picture.

When cutting the conductive thread, be sure to leave a very short end. If two pieces of thread were to touch while we were powering the LEDs, we could cause a short circuit.

We now need to sew our positive connections to the garment. Start with a new length of conductive thread and attach it to the LED second closest to your hidden pocket.
Again, over-sew it to the fabric, keeping your stitches as small as possible so they are not very visible from the front of the garment. Now that you have secured this LED, sew a small running stitch to the positive connection of the LED closest to the hidden pocket. After securing that LED, stitch a running stitch so that it stops alongside the popper you previously secured to your pocket, and this time attach a female half of a metal popper in the same way as before, as shown in the picture.

Secure your remaining eight LEDs in the same fashion, working in pairs away from the pocket, so that you are left with one male metal popper and five female metal poppers in a line on your hidden pocket. Ensure that the six different conductive threads do not cross at any point, that the six poppers do not touch, and that you have the positive and negative connections the right way round! The picture below shows the completed circuit stitched into the t-shirt, terminating at the poppers on the pocket.

Connecting our Pi Zero

Now that we have our electrical circuit, we need to find a way to attach our Pi Zero to each pair of LEDs and to the common ground we have sewn into our garment. We are going to make use of the poppers we stitched onto our hidden pocket. You have probably noticed that the only piece of conductive thread we attached a male popper to was the common ground thread for our LEDs. This is so that, when we construct our method of attaching the Pi Zero GPIO pins to the conductive thread, it will be impossible to connect the positive and negative threads the wrong way round! Another reason for using the poppers to attach our Pi Zero to the conductive thread is that the LEDs and thread I am using are both rated as OK for hand washing; your Pi Zero is not!

Take your remaining female popper and solder a length of black cable to it; about two and a half times the height of your hidden pocket should do the job.
You can feed the cable through one of the holes in the popper, as shown in the picture, to ensure you get a good connection. For the five remaining male poppers, solder the same length of yellow cable to each popper. The picture below shows two of my soldered poppers.

Now connect all of your poppers to their other halves on your garment and carefully bend the cables so that they all run in the same direction, up towards the top of your hidden pocket. Trim all the cables to the same length and then mark the top and bottom of each yellow cable with a permanent marker so that you know which cable is attached to which pair of LEDs. I am marking the bottom yellow cable as number 1 and the top as number 5. We can now cut a length of heat shrink and cover the loose lengths of cable, leaving about 4 cm free to strip, tin, and solder onto the Pi. You can now heat the heat shrink with a hair dryer to shrink it around your cables.

We are now going to stitch together a small piece of fabric to attach the poppers to. We want a piece of fabric large enough to sew all six poppers to and to stitch the uncovered cables down to. This will be used to detach the Pi Zero from our garment when we need to wash it, or just to remove the Pi Zero for another project. To strengthen the fabric, I am doubling it over and hemming one long and one short side to make a pocket; this can then be turned inside out and the remaining short side stitched over. You now need to position the poppers on your piece of fabric so that they are aligned with the poppers you have sewn into your garment. Once you are happy with their placement, stitch them to the piece of fabric using standard thread, ensuring that they are really firmly attached. If you like, you can also put a few stitches over each cable to ensure it stays in place. Using a fine permanent marker, number both ends of the five yellow cables 1 through 5 so that you can identify each cable.
Now push all six cables through about 12 cm of heat shrink and apply some heat from a hair dryer until it shrinks around your cables. You can then strip and tin the ends of the six cables ready to solder them to your Pi Zero. Insert the black cable into the ground hole below GPIO 11 on your Pi Zero, then insert the five yellow cables, sequentially 1 through 5, into the GPIO holes 11, 9, 10, 7, and 8 as shown in the diagram, again from the rear of the Pi Zero. When you are happy that all the cables are in the correct place, solder them to your Pi Zero and clip off any extra length from the front of the Pi Zero with wire snips. You should now be able to connect your Pi Zero to your LEDs by pressing all six poppers together.

To ensure that the wearer's body does not cause a short circuit with the conductive thread on the inside of the garment, you may want to take another piece of fabric and stitch it over all of the conductive thread lines. I would recommend doing this after you have tested all your LEDs with the program in the next section. We have now carried out all the modifications needed to our garment, so let's move on to writing our Python program to control our LEDs.

Writing Our Python Program

To start with, we will write a simple, short piece of Python just to check that all ten of our LEDs are working and that we know which GPIO pin controls which pair of LEDs. Power on your Pi Zero and connect to it via SSH.

Testing Our LEDs

To check that our LEDs are all correctly wired up and that we can control them using Python, we will write this short program to test them.
First move into our project directory by typing:

    cd WearableTech

Now make a new directory for this article by typing:

    mkdir Chapter3

Now move into our new directory:

    cd Chapter3

Next we create our test program, by typing:

    nano testLED.py

Then we enter the following code into Nano:

    #!/usr/bin/python3
    from gpiozero import LED
    from time import sleep

    pair1 = LED(11)
    pair2 = LED(9)
    pair3 = LED(10)
    pair4 = LED(7)
    pair5 = LED(8)

    for i in range(4):
        pair1.on()
        sleep(2)
        pair1.off()
        pair2.on()
        sleep(2)
        pair2.off()
        pair3.on()
        sleep(2)
        pair3.off()
        pair4.on()
        sleep(2)
        pair4.off()
        pair5.on()
        sleep(2)
        pair5.off()

Press Ctrl + O followed by Enter to save the file, and then Ctrl + X to exit Nano. We can then run our file by typing:

    python3 ./testLED.py

All being well, we should see each pair of LEDs light up for two seconds in turn, and the whole loop should repeat four times.

Our final LED program

We will now write the Python program that will control the LEDs in our t-shirt. This will be the program we configure to run automatically when we power up our Pi Zero. So let's begin.
Create a new file by typing:

    nano tShirtLED.py

Now type the following Python program into Nano:

    #!/usr/bin/python3
    from gpiozero import LEDBoard
    from time import sleep
    from random import randint

    leds = LEDBoard(11, 9, 10, 7, 8)

    while True:
        for i in range(5):
            wait = randint(5, 10) / 10
            leds.on()
            sleep(wait)
            leds.off()
            sleep(wait)
        for i in range(5):
            wait = randint(5, 10) / 10
            leds.value = (1, 0, 1, 0, 1)
            sleep(wait)
            leds.value = (0, 1, 0, 1, 0)
            sleep(wait)
        for i in range(5):
            wait = randint(1, 5) / 10
            leds.value = (1, 0, 0, 0, 0)
            sleep(wait)
            leds.value = (1, 1, 0, 0, 0)
            sleep(wait)
            leds.value = (1, 1, 1, 0, 0)
            sleep(wait)
            leds.value = (1, 1, 1, 1, 0)
            sleep(wait)
            leds.value = (1, 1, 1, 1, 1)
            sleep(wait)
            leds.value = (1, 1, 1, 1, 0)
            sleep(wait)
            leds.value = (1, 1, 1, 0, 0)
            sleep(wait)
            leds.value = (1, 1, 0, 0, 0)
            sleep(wait)
            leds.value = (1, 0, 0, 0, 0)
            sleep(wait)

Now save your file by pressing Ctrl + O followed by Enter, then exit Nano by pressing Ctrl + X. Test that your program is working correctly by typing:

    python3 ./tShirtLED.py

If any errors are displayed, go back and check your program in Nano. Once your program has gone through the three different display patterns, press Ctrl + C to stop it running.

We have introduced a number of new things in this program. First, we imported a new GPIO Zero library item called LEDBoard. LEDBoard lets us define a list of GPIO pins with LEDs attached and perform actions on the whole list, rather than having to operate on the LEDs one at a time. It also lets us assign a value to the LEDBoard object indicating whether to turn each individual member of the board on or off. We also imported randint from the random library; randint returns a random integer between the start and stop values we pass it. We then define three different loop patterns and set each of them inside a for loop that repeats five times.
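Before moving on, it is worth noticing that the nine leds.value tuples in the third loop follow a simple "fill up, then drain" shape. Here is a small sketch that generates those frames programmatically (the function name chase_frames is my own, not from the book, and gpiozero is not needed to run it):

```python
def chase_frames(n=5):
    """Yield on/off tuples for an n-LED 'fill up, then drain' chase.

    The frame count ramps 1..n and back down to 1, reproducing the
    nine hand-written leds.value tuples in the third display loop.
    """
    counts = list(range(1, n + 1)) + list(range(n - 1, 0, -1))
    for k in counts:
        # Light the first k LEDs, leave the rest off.
        yield tuple(1 if i < k else 0 for i in range(n))

frames = list(chase_frames(5))
```

Each generated tuple could be assigned to leds.value in turn inside the sleep loop, replacing the hand-written sequence.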
Making our program start automatically

We now need to make our tShirtLED.py program run automatically when we switch our Pi Zero on.

First we must make the Python program we just wrote executable:

    chmod +x ./tShirtLED.py

Now we will create our service definition file:

    sudo nano /lib/systemd/system/tShirtLED.service

Now type the definition into it:

    [Unit]
    Description=tShirt LED Service
    After=multi-user.target

    [Service]
    Type=idle
    ExecStart=/home/pi/WearableTech/Chapter3/tShirtLED.py

    [Install]
    WantedBy=multi-user.target

Save and exit Nano by typing Ctrl + O followed by Enter and then Ctrl + X. Now change the file permissions, reload the systemd daemon, and enable our service by typing:

    sudo chmod 644 /lib/systemd/system/tShirtLED.service
    sudo systemctl daemon-reload
    sudo systemctl enable tShirtLED.service

Now we need to test whether this is working, so reboot your Pi by typing sudo reboot. When your Pi Zero restarts, you should see your LED pattern start to display automatically. Once you are happy that it is all working correctly, press and hold your power-off button for three seconds to shut your Pi Zero down. You can now turn your garment the right way round and install the Pi Zero and battery pack into your hidden pocket. As soon as you plug your battery pack into your Pi Zero, your LEDs will start to display their patterns, and you can safely turn it all off using your power-off button.

Summary

In this article we looked at making use of stitchable electronics and how we could combine them with our Pi Zero. We made our first stitchable circuit, found a way to connect our Pi Zero to this circuit, and controlled the electronic devices using the GPIO Zero Python library.
Writing Modules

Packt
14 Aug 2017
15 min read
In this article by David Mark Clements, the author of the book Node.js Cookbook, we will cover the following points to introduce you to working with Node.js modules:

- Node's module system
- Initializing a module
- Writing a module
- Tooling around modules
- Publishing modules
- Setting up a private module repository
- Best practices

In idiomatic Node, the module is the fundamental unit of logic. Any typical application or system consists of generic code and application code. As a best practice, generic shareable code should be held in discrete modules, which can be composed together at the application level with minimal amounts of domain-specific logic. In this article, we'll learn how Node's module system works, how to create modules for various scenarios, and how we can reuse and share our code.

Scaffolding a module

Let's begin our exploration by setting up a typical file and directory structure for a Node module. At the same time, we'll learn how to automatically generate a package.json file (we refer to this throughout as initializing a folder as a package) and how to configure npm (Node's package-managing tool) with some defaults, which can then be used as part of the package generation process. In this recipe, we'll create the initial scaffolding for a full Node module.

Getting ready

Installing Node

If we don't already have Node installed, we can go to https://nodejs.org to pick up the latest version for our operating system. If Node is on our system, then so is the npm executable; npm is the default package manager for Node. It's useful for creating, managing, installing, and publishing modules.

Before we run any commands, let's tweak the npm configuration a little:

    npm config set init.author.name "<name here>"

This will speed up module creation and ensure that each package we create has a consistent author name, thus avoiding typos and variations of our name.

npm stands for...
Contrary to popular belief, npm is not an acronym for Node Package Manager; in fact, it stands for "npm is Not An Acronym", which is why it's not called NINAA.

How to do it…

Let's say we want to create a module that converts HSL (hue, saturation, luminosity) values into a hex-based RGB representation, such as would be used in CSS (for example, #fb4a45). The name hsl-to-hex seems good, so let's make a new folder for our module and cd into it:

    mkdir hsl-to-hex
    cd hsl-to-hex

Every Node module must have a package.json file, which holds metadata about the module. Instead of manually creating a package.json file, we can simply execute the following command in our newly created module folder:

    npm init

This will ask a series of questions. We can hit enter for every question without supplying an answer. Note how the default module name corresponds to the current working directory, and the default author is the init.author.name value we set earlier. Upon completion, we should have a package.json file that looks something like the following:

    {
      "name": "hsl-to-hex",
      "version": "1.0.0",
      "description": "",
      "main": "index.js",
      "scripts": {
        "test": "echo \"Error: no test specified\" && exit 1"
      },
      "author": "David Mark Clements",
      "license": "MIT"
    }

How it works…

When Node is installed on our system, npm comes bundled with it. The npm executable is written in JavaScript and runs on Node. The npm config command can be used to permanently alter settings. In our case, we changed the init.author.name setting so that npm init would reference it for the default during a module's initialization. We can list all the current configuration settings with npm config ls.

Config docs

Refer to https://docs.npmjs.com/misc/config for all possible npm configuration settings.

When we run npm init, the answers to prompts are stored in an object, serialized as JSON, and then saved to a newly created package.json file in the current directory.
There's more…

Let's find out some more ways to automatically manage the content of the package.json file via the npm command.

Reinitializing

Sometimes additional metadata becomes available after we've created a module. A typical scenario arises when we initialize our module as a git repository and add a remote endpoint after creating the module.

Git and GitHub

If we've not used the git tool and GitHub before, we can refer to http://help.github.com to get started. If we don't have a GitHub account, we can head to http://github.com to get a free account.

To demonstrate, let's create a GitHub repository for our module. Head to GitHub and click on the plus symbol in the top-right, then select New repository. Specify the name as hsl-to-hex and click on Create Repository. Back in the Terminal, inside our module folder, we can now run this:

    echo -e "node_modules\n*.log" > .gitignore
    git init
    git add .
    git commit -m '1st'
    git remote add origin http://github.com/<username>/hsl-to-hex
    git push -u origin master

Now here comes the magic part; let's initialize again (simply press enter for every question):

    npm init

This time the Git remote we just added was detected and became the default answer for the git repository question. Accepting this default answer meant that the repository, bugs, and homepage fields were added to package.json. A repository field in package.json is an important addition when it comes to publishing open source modules, since it will be rendered as a link on the module's information page at http://npmjs.com. A repository link enables potential users to peruse the code prior to installation. Modules that can't be viewed before use are far less likely to be considered viable.

Versioning

The npm tool supplies other functionality to help with module creation and management workflow. For instance, the npm version command allows us to manage our module's version number according to SemVer semantics.
SemVer

SemVer is a versioning standard. A version consists of three numbers separated by dots, for example, 2.4.16. The position of a number denotes specific information about the version in comparison to other versions. The three positions are known as MAJOR.MINOR.PATCH. The PATCH number is increased when changes have been made that don't break existing functionality or add any new functionality; for instance, a bug fix is considered a patch. The MINOR number should be increased when new backward-compatible functionality is added, for instance the addition of a method. The MAJOR number increases when backward-incompatible changes are made. Refer to http://semver.org/ for more information.

If we were to fix a bug, we would want to increase the PATCH number. We can either manually edit the version field in package.json, setting it to 1.0.1, or we can execute the following:

    npm version patch

This will increase the version field in one command. Additionally, if our module is a Git repository, it will add a commit based on the version (in our case, v1.0.1), which we can then immediately push. When we ran the command, npm output the new version number. However, we can double-check the version number of our module without opening package.json:

    npm version

This will output something similar to the following:

    { 'hsl-to-hex': '1.0.1',
      npm: '2.14.17',
      ares: '1.10.1-DEV',
      http_parser: '2.6.2',
      icu: '56.1',
      modules: '47',
      node: '5.7.0',
      openssl: '1.0.2f',
      uv: '1.8.0',
      v8: '4.6.85.31',
      zlib: '1.2.8' }

The first field is our module along with its version number. If we added new backward-compatible functionality, we can run this:

    npm version minor

Now our version is 1.1.0. Finally, we can run the following for a major version bump:

    npm version major

This sets our module's version to 2.0.0. Since we're just experimenting and didn't make any changes, we should set our version back to 1.0.0.
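As an aside, the MAJOR.MINOR.PATCH bump rules described above can be sketched as a tiny helper. This is a Python illustration of the bump logic only (the function name bump is my own; npm's actual implementation is unrelated):

```python
def bump(version, part):
    """Return a SemVer string bumped according to the rules above.

    'patch' increments PATCH; 'minor' increments MINOR and resets
    PATCH; 'major' increments MAJOR and resets MINOR and PATCH.
    """
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "major":
        return f"{major + 1}.0.0"
    raise ValueError(f"unknown part: {part!r}")
```

The transitions match the recipe's walkthrough: 1.0.0 becomes 1.0.1 after a patch, 1.1.0 after a minor, and 2.0.0 after a major bump.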
We can do this via the npm command as well:

    npm version 1.0.0

See also

Refer to the following recipes: Writing module code; Publishing a module.

Installing dependencies

In most cases, it's wisest to compose a module out of other modules. In this recipe, we will install a dependency.

Getting ready

For this recipe, all we need is a Command Prompt open in the hsl-to-hex folder from the Scaffolding a module recipe.

How to do it…

Our hsl-to-hex module can be implemented in two steps:

1. Convert the hue degrees, saturation percentage, and luminosity percentage to corresponding red, green, and blue numbers between 0 and 255.
2. Convert the RGB values to hex.

Before we tear into writing an HSL-to-RGB algorithm, we should check whether this problem has already been solved. The easiest way to check is to head to http://npmjs.com and perform a search. Oh, look! Somebody already solved this. After some research, we decide that the hsl-to-rgb-for-reals module is the best fit. Ensuring that we are in the hsl-to-hex folder, we can now install our dependency with the following:

    npm install --save hsl-to-rgb-for-reals

Now let's take a look at the bottom of package.json:

    tail package.json # linux/osx
    type package.json # windows

Tail output should give us this:

      "bugs": {
        "url": "https://github.com/davidmarkclements/hsl-to-hex/issues"
      },
      "homepage": "https://github.com/davidmarkclements/hsl-to-hex#readme",
      "description": "",
      "dependencies": {
        "hsl-to-rgb-for-reals": "^1.1.0"
      }
    }

We can see that the dependency we installed has been added to a dependencies object in the package.json file.

How it works…

The top two results of the npm search are hsl-to-rgb and hsl-to-rgb-for-reals. The first result is unusable because the author of the package forgot to export it and is unresponsive to fixing it. The hsl-to-rgb-for-reals module is a fixed version of hsl-to-rgb. This situation serves to illustrate the nature of the npm ecosystem.
On the one hand, there are over 200,000 modules and counting; on the other, many of these modules are of low value. Nevertheless, the system is also self-healing, in that if a module is broken and not fixed by the original maintainer, a second developer often assumes responsibility and publishes a fixed version of the module.

When we run npm install in a folder with a package.json file, a node_modules folder is created (if it doesn't already exist). Then the package is downloaded from the npm registry and saved into a subdirectory of node_modules (for example, node_modules/hsl-to-rgb-for-reals).

npm 2 vs npm 3

Our installed module doesn't have any dependencies of its own. However, if it did, the sub-dependencies would be installed differently depending on whether we're using version 2 or version 3 of npm. Essentially, npm 2 installs dependencies in a tree structure, for instance node_modules/dep/node_modules/sub-dep-of-dep/node_modules/sub-dep-of-sub-dep. Conversely, npm 3 follows a maximally flat strategy, where sub-dependencies are installed in the top-level node_modules folder when possible, for example node_modules/dep, node_modules/sub-dep-of-dep, and node_modules/sub-dep-of-sub-dep. This results in fewer downloads and less disk space usage; npm 3 resorts to a tree structure in cases where there are two versions of a sub-dependency, which is why it's called a maximally flat strategy. Typically, if we've installed Node 4 or above, we'll be using npm version 3.

There's more…

Let's explore development dependencies, creating module management scripts, and installing global modules without requiring root access.

Installing development dependencies

We usually need some tooling to assist with the development and maintenance of a module or application. The ecosystem is full of programming support modules, from linting to testing to browser bundling to transpilation. In general, we don't want consumers of our module to download dependencies they don't need.
Similarly, if we're deploying a system built in Node, we don't want to burden the continuous integration and deployment processes with superfluous, pointless work. So, we separate our dependencies into production and development categories. When we use npm install --save <dep>, we're installing a production module. To install a development dependency, we use --save-dev. Let's go ahead and install a linter.

JavaScript Standard Style

standard is a JavaScript linter that enforces an unconfigurable ruleset. The premise of this approach is that we should stop using up precious time bikeshedding about syntax.

All the code in this article uses the standard linter, so we'll install that:

    npm install --save-dev standard

If the absence of semicolons is abhorrent, we can choose to install semistandard instead of standard at this point. The lint rules match those of standard, with the obvious exception of requiring semicolons. Further, any code written using standard can be reformatted to semistandard using the semistandard-format command-line tool; simply run npm -g i semistandard-format to get started with it.

Now, let's take a look at the package.json file:

    {
      "name": "hsl-to-hex",
      "version": "1.0.0",
      "main": "index.js",
      "scripts": {
        "test": "echo \"Error: no test specified\" && exit 1"
      },
      "author": "David Mark Clements",
      "license": "MIT",
      "repository": {
        "type": "git",
        "url": "git+ssh://git@github.com/davidmarkclements/hsl-to-hex.git"
      },
      "bugs": {
        "url": "https://github.com/davidmarkclements/hsl-to-hex/issues"
      },
      "homepage": "https://github.com/davidmarkclements/hsl-to-hex#readme",
      "description": "",
      "dependencies": {
        "hsl-to-rgb-for-reals": "^1.1.0"
      },
      "devDependencies": {
        "standard": "^6.0.8"
      }
    }

We now have a devDependencies field alongside the dependencies field.
When our module is installed as a sub-dependency of another package, only the hsl-to-rgb-for-reals module will be installed, while the standard module will be ignored, since it's irrelevant to our module's actual implementation. If this package.json file represented a production system, we could run the install step with the --production flag, as shown:

    npm install --production

Alternatively, this can be set in the production environment with the following command:

    npm config set production true

Currently, we can run our linter using the executable installed in the node_modules/.bin folder. Consider this example:

    ./node_modules/.bin/standard

This is ugly and not at all ideal. Refer to Using npm run scripts for a more elegant approach.

Using npm run scripts

Our package.json file currently has a scripts property that looks like this:

    "scripts": {
      "test": "echo \"Error: no test specified\" && exit 1"
    },

Let's edit the package.json file and add another field, called lint, as follows:

    "scripts": {
      "test": "echo \"Error: no test specified\" && exit 1",
      "lint": "standard"
    },

Now, as long as we have standard installed as a development dependency of our module (refer to Installing development dependencies), we can run the following command to run a lint check on our code:

    npm run-script lint

This can be shortened to the following:

    npm run lint

When we run an npm script, the current directory's node_modules/.bin folder is added to the execution context's PATH environment variable. This means that even if we don't have the standard executable in our usual system PATH, we can reference it in an npm script as if it were in our PATH. Some consider lint checks to be a precursor to tests. Let's alter the scripts.test field, as illustrated:

    "scripts": {
      "test": "npm run lint",
      "lint": "standard"
    },

Chaining commands

Later, we can append other commands to the test script using the double ampersand (&&) to run a chain of checks. For instance, "test": "npm run lint && tap test".
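The PATH manipulation behind npm run can be illustrated with a short sketch. This is Python, purely illustrative (the function name npm_run_path is my own, and this is not how npm itself is implemented): the project's node_modules/.bin directory goes at the front of PATH, so locally installed executables like standard resolve before system-wide ones.

```python
import os

def npm_run_path(project_dir, env_path):
    """Sketch of the PATH an npm script sees: node_modules/.bin
    from the project directory is placed ahead of the existing PATH."""
    bin_dir = os.path.join(project_dir, "node_modules", ".bin")
    return bin_dir + os.pathsep + env_path

# Example: a script run inside /home/dev/hsl-to-hex would find
# ./node_modules/.bin/standard before any system-wide `standard`.
example = npm_run_path("/home/dev/hsl-to-hex", "/usr/local/bin:/usr/bin")
```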
Now, let's run the test script:

npm run test

Since the test script is special, we can simply run this:

npm test

Eliminating the need for sudo

The npm executable can install both local and global modules. Global modules are mostly installed to allow command-line utilities to be used system-wide. On OS X and Linux, the default npm setup requires sudo access to install a module. For example, the following will fail on a typical OS X or Linux system with the default npm setup:

npm -g install cute-stack # <-- oh oh needs sudo

This is unsuitable for several reasons. Forgetting to use sudo becomes frustrating; we're trusting npm with root access, and accidentally using sudo for a local install causes permission problems (particularly with the npm local cache). The prefix setting stores the location for globally installed modules; we can view this with the following:

npm config get prefix

Usually, the output will be /usr/local. To avoid the use of sudo, all we have to do is set ownership permissions on any subfolders in /usr/local used by npm:

sudo chown -R $(whoami) $(npm config get prefix)/{lib/node_modules,bin,share}

Now we can install global modules without root access:

npm -g install cute-stack # <-- now works without sudo

If changing ownership of system folders isn't feasible, we can use a second approach, which involves changing the prefix setting to a folder in our home path:

mkdir ~/npm-global
npm config set prefix ~/npm-global

We'll also need to set our PATH:

export PATH=$PATH:~/npm-global/bin
source ~/.profile

The source command essentially refreshes the Terminal environment to reflect the changes we've made.

See also

Scaffolding a module
Writing module code
Publishing a module
Packt
14 Aug 2017
10 min read

Introduction to the Latest Social Media Landscape and Importance

In this article by Siddhartha Chatterjee and Michal Krystyanczuk, the authors of the book Python Social Media Analytics, we start with a question to you: Have you seen the movie The Social Network? If you have not, it could be a good idea to see it before you read this. If you have, you may have seen the success story of Mark Zuckerberg and his company Facebook. This was possible due to the power of the platform in connecting, enabling, sharing, and impacting the lives of almost two billion people on this planet. The earliest social networks existed as far back as 1995, such as Yahoo (GeoCities), theglobe.com, and tripod.com. These platforms were mainly there to facilitate interaction among people through chat rooms. It was only at the end of the 90s that user profiles became the in thing in social networking platforms, allowing information about people to be discoverable and, therefore, providing a choice to make friends or not. Those embracing this new methodology were Makeoutclub, Friendster, SixDegrees.com, and so on. MySpace, LinkedIn, and Orkut were created thereafter, and social networks were on the verge of becoming mainstream. However, the biggest impact happened with the creation of Facebook in 2004; a total game changer for people's lives, business, and the world. The sophistication and ease of use of the platform made it into a mainstream medium for individuals and companies to advertise and sell their ideas and products. Hence, we are in the age of social media, which has changed the way the world functions. Over the last few years, there have been new entrants in social media with essentially different interaction models compared to Facebook, LinkedIn, or Twitter. These are Pinterest, Instagram, Tinder, and others. An interesting example is Pinterest, which, unlike Facebook, is not centered around people but around interests and/or topics. It's essentially able to structure people based on their interests around these topics.
The CEO of Pinterest describes it as a catalog of ideas. Forums, which are not considered regular social networks like Facebook or Twitter, are also very important social platforms. Unlike on Twitter or Facebook, forum users are often anonymous in nature, which enables them to have in-depth conversations with communities. Other non-typical social networks are video sharing platforms, such as YouTube and Dailymotion. They are non-typical because they are centered around user-generated content, and the social nature is generated by the sharing of this content on various social networks and also by the discussion it generates around the user commentaries. Social media is gradually changing from being platform-centric to being about experiences and features. In the future, we'll see more and more traditional content providers and services becoming social in nature through sharing and conversations. The term social media today includes not just social networks but every service that's social in nature with a wide audience.

Delving into Social Data

The data acquired from social media is called social data. Social data exists in many forms. Types of social media data can be information around the users of social networks, like name, city, interests, and so on. These types of data, which are numeric or quantifiable, are known as structured data. However, since social media are platforms for expression, a lot of the data is in the form of texts, images, videos, and such. These sources are rich in information, but not as straightforward to analyze as the structured data described earlier. These types of data are known as unstructured data. The process of applying rigorous methods to make sense of social data is called social data analytics. We will go into great depth in social data analytics to demonstrate how we can extract valuable sense and information from these really interesting sources of social data.
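To make the structured versus unstructured distinction concrete, here is a minimal sketch (the post data is hypothetical, not from the book): the numeric fields can be aggregated directly, while the free text has to be processed, here with a simple hashtag extraction, before it becomes analyzable.

```python
import re

# A hypothetical social media post: a mix of structured and unstructured data.
post = {
    "user": "traveller_jane",   # structured: directly queryable
    "likes": 42,                # structured: numeric, ready for aggregation
    "text": "Loving the beaches in #Goa this winter! #travel",  # unstructured
}

# Structured fields need no preparation...
total_likes = post["likes"]

# ...whereas unstructured text must be processed first,
# for example by extracting hashtags with a regular expression.
hashtags = re.findall(r"#(\w+)", post["text"])

print(total_likes)   # 42
print(hashtags)      # ['Goa', 'travel']
```

Real analyses apply far richer processing to the text (language detection, sentiment, topic extraction), but the shape of the work is the same: turn unstructured content into structured variables.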
Since there are almost no restrictions on social media, there are a lot of meaningless accounts, content, and interactions. So, the data coming out of these streams is quite noisy and polluted. Hence, a lot of effort is required to separate the information from the noise. Once the data is cleaned and we are focused on the most important and interesting aspects, we then require various statistical and algorithmic methods to make sense of the filtered data and draw meaningful conclusions.

Understanding the process

Once you are familiar with the topic of social media data, let us proceed to the next phase. The first step is to understand the process involved in the exploitation of data present on social networks. A proper execution of the process, with attention to small details, is the key to good results. In many computer science domains, a small error in code will lead to a visible or at least correctable dysfunction, but in data science, it will produce entirely wrong results, which in turn will lead to incorrect conclusions. The very first step of data analysis is always problem definition. Understanding the problem is crucial for choosing the right data sources and the methods of analysis. It also helps to realize what kind of information and conclusions we can infer from the data and what is impossible to derive. This part is very often underestimated, while it is key to successful data analysis. Any question that we try to answer in a data science project has to be very precise. Some people tend to ask very generic questions, such as I want to find trends on Twitter. This is not a correct problem definition, and an analysis based on such a statement can fail to find relevant trends. With a naïve analysis, we can get repeating Twitter ads and content generated by bots. Moreover, it raises more questions than it answers. In order to approach the problem correctly, we have to ask in the first step: what is a trend? what is an interesting trend for us?
and what is the time scope? Once we answer these questions, we can break up the problem into multiple subproblems: I'm looking for the most frequent consumer reactions about my brand on Twitter in English over the last week, and I want to know if they were positive or negative. Such a problem definition will lead to a relevant, valuable analysis with insightful conclusions. The next part of the process consists of getting the right data according to the defined problem. Many social media platforms allow users to collect a lot of information in an automated way via APIs (Application Programming Interfaces), which is the easiest way to complete the task. Once the data is stored in a database, we perform the cleaning. This step requires a precise understanding of the project's goals. In many cases, it will involve very basic tasks such as duplicate removal, for example, retweets on Twitter, or more sophisticated ones such as spam detection to remove irrelevant comments, language detection to perform linguistic analysis, or other statistical or machine learning approaches that can help to produce a clean dataset. When the data is ready to be analyzed, we have to choose what kind of analysis to perform and structure the data accordingly. If our goal is to understand the sense of the conversations, then it only requires a simple list of verbatims (textual data), but if we aim to perform analysis on different variables, like number of likes, dates, number of shares, and so on, the data should be combined in a structure such as a data frame, where each row corresponds to an observation and each column to a variable. The choice of the analysis method depends on the objectives of the study and the type of data. It may require a statistical or machine learning approach, or a specific approach to time series. Different approaches will be explained using examples of Facebook, Twitter, YouTube, GitHub, Pinterest, and forum data. Once the analysis is done, it's time to infer conclusions.
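As a small illustration of the cleaning and structuring steps just described, here is a hedged Python sketch; the sample posts are made up, and a real project would acquire them through an API and likely load the cleaned rows into a data frame rather than a plain list of dictionaries.

```python
# Hypothetical raw social posts; real data would come from an API.
raw_posts = [
    {"text": "Great phone, love the camera", "likes": 12},
    {"text": "RT Great phone, love the camera", "likes": 3},   # retweet-style duplicate
    {"text": "Battery life is disappointing", "likes": 7},
    {"text": "Great phone, love the camera", "likes": 12},     # exact duplicate
]

def clean(posts):
    """Strip retweet markers and drop duplicate verbatims."""
    seen = set()
    cleaned = []
    for post in posts:
        text = post["text"]
        if text.startswith("RT "):
            text = text[3:]          # remove the retweet marker
        text = text.strip()
        if text not in seen:
            seen.add(text)
            # Each row is one observation; each key is one variable.
            cleaned.append({"text": text, "likes": post["likes"]})
    return cleaned

rows = clean(raw_posts)
print(len(rows))  # 2 unique verbatims remain
```

The same row-per-observation shape carries straight over to a pandas DataFrame when quantitative analysis on likes, shares, or dates is needed.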
We can derive conclusions based on the outputs from the models, but one of the most useful tools is visualization techniques. Data and output can be presented in many different ways, starting from charts, plots, and diagrams, through more complex 2D charts, to multidimensional visualizations.

Project planning

Analysis of content on social media can get very confusing due to the difficulty of working on a large amount of data and also trying to make sense out of it. For this reason, it's extremely important to ask the right questions in the beginning to get the right answers. Even though this is an exploratory approach, and getting exact answers may be difficult, the right questions allow you to define the scope, process, and time. The main questions that we will be working on are the following: What does Google post on Facebook? How do people react to Google posts? (likes, shares, and comments) What do Google's audience say about Google and its ecosystem? What are the emotions expressed by Google's audience? With the preceding questions in mind, we will proceed to the next steps.

Scope and process

The analysis will consist of analyzing the feed of posts and comments on the official Facebook page of Google. The process of information extraction is organized in a data flow. It starts with data extraction from the API, followed by data preprocessing and wrangling, and then a series of different analyses. The analysis becomes actionable only after the last step of results interpretation. In order to retrieve the above information, we need to do the following: Extract all the posts of Google permitted by the Facebook API. Extract the metadata for each post: timestamp, number of likes, number of shares, number of comments.
Extract the user comments under each post and their metadata. Process the posts to retrieve the most common keywords, bi-grams, and hashtags. Process the user comments using the Alchemy API to retrieve the emotions. Analyse the above information to derive conclusions.

Data type

The main part of information extraction comes from an analysis of textual data (posts and comments). However, in order to add quantitative and temporal dimensions, we process numbers (likes, shares) and dates (date of creation).

Summary

The avalanche of social network data is a result of the communication platforms developed over the last two decades. These are the platforms that evolved from chat rooms to personal information sharing and, finally, social and professional networks. Among many, Facebook, Twitter, Instagram, Pinterest, and LinkedIn have emerged as the modern-day social media. These platforms collectively have a reach of a billion or more individuals across the world sharing their activities and interactions with each other. The sharing of data by these media through APIs and other technologies has given rise to a new field called social media analytics. This has multiple applications, such as marketing, personalized recommendations, research, and societal studies. Modern data science techniques such as machine learning and text mining are widely used for these applications. Python is one of the most widely used programming languages for these techniques. However, manipulating the unstructured data from social networks requires a lot of precise processing and preparation before coming to the most interesting bits.
Packt
14 Aug 2017
12 min read

The Cloud and the DevOps Revolution

Cloud and DevOps are two of the most important trends to emerge in technology. The reasons are clear: it's all about the amount of data that needs to be processed and managed in the applications and websites we use every day. The amount of data being processed and handled is huge. Every day, over a billion people visit Facebook; every hour, 18,000 hours of video are uploaded to YouTube; every second, Google processes 40,000 search queries. Being able to handle such a staggering scale isn't easy. Through the use of Amazon Web Services (AWS), you will be able to build out the key components needed to succeed at minimum cost and effort. This is an extract from Effective DevOps on AWS.

Thinking in terms of cloud and not infrastructure

The day I discovered that noise can damage hard drives: December 2011, sometime between Christmas and New Year's Eve, I started to receive dozens of alerts from our monitoring system. Apparently, we had just lost connectivity to our European datacenter in Luxembourg. I rushed into the network operating center (NOC) hoping that it was only a small glitch in our monitoring system, maybe just a joke; after all, with so much redundancy, how could everything go offline? Unfortunately, when I got into the room, the big monitoring monitors were all red, not a good sign. This was just the beginning of a very long nightmare. An electrician working in our datacenter had mistakenly triggered the fire alarm; within seconds, the fire suppression system set off and released its argonite on top of our server racks. Unfortunately, this kind of fire suppression system makes so much noise when it releases its gas that the sound wave instantly killed hundreds and hundreds of hard drives, effectively shutting down our only European facility. It took months for us to be back on our feet. Where is the cloud when you need it! As Charles Phillips said it best: "Friends don't let friends build a datacenter."
Deploying your own hardware versus in the cloud

It wasn't long ago that tech companies small and large had to have a proper technical operations organization able to build out infrastructures. The process went a little bit like this: Fly to the location you want to put your infrastructure in, and go tour the different datacenters and their facilities. Look at the floor considerations, power considerations, HVAC, fire prevention systems, physical security, and so on. Shop for an internet provider; ultimately, even though you are talking about servers and a lot more bandwidth, the process is the same: you want to get internet connectivity for your servers. Once that's done, it's time to get your hardware. Make the right decisions, because you are probably going to spend a big portion of your company's money on buying servers, switches, routers, firewalls, storage, UPS (for when you have a power outage), KVM, network cables, the labeler (dear to every system administrator's heart), and a bunch of spare parts: hard drives, RAID controllers, memory, power cables, you name it. At that point, once the hardware is bought and shipped to the datacenter location, you can rack everything, wire all the servers, and power everything. Your network team can then kick in and start establishing connectivity to the new datacenter using various links, configuring the edge routers, switches, top-of-the-rack switches, KVM, and firewalls (sometimes). Your storage team is next and is going to provide the much-needed NAS or SAN; next comes your sysops team, who will image the servers, sometimes upgrade the BIOS, configure hardware RAID, and finally put an OS on those servers. Not only is this a full-time job for a big team, it also takes lots of time and money to even get there. Getting new servers up and running with AWS will take us minutes. In fact, more than just providing a server within minutes, we will soon see how to deploy and run a service in minutes and just when you need it.
Cost analysis

From a cost standpoint, deploying in a cloud infrastructure such as AWS usually ends up being a lot cheaper than buying your own hardware. If you want to deploy your own hardware, you have to pay upfront for all the hardware (servers, network equipment) and sometimes license software as well. In a cloud environment, you pay as you go. You can add and remove servers in no time. Also, if you take advantage of PaaS and SaaS applications, you usually end up saving even more money by lowering your operating costs, as you don't need as much staff to administrate your database, storage, and so on. Most cloud providers, AWS included, also offer tiered pricing and volume discounts. As your service gets bigger and bigger, you end up paying less for each unit of storage, bandwidth, and so on.

Just-in-time infrastructure

As we just saw, when deploying in the cloud, you pay as you go. Most cloud companies use that to their advantage to scale their infrastructure up and down as the traffic to their sites changes. This ability to add and remove new servers and services in no time and on demand is one of the main differentiators of an effective cloud infrastructure. Here is a diagram from a presentation from 2015 that shows the annual traffic going to https://www.amazon.com/ (the online store): © 2016, Amazon Web Services, Inc. or its affiliates. All rights reserved. As you can see, with the holidays, the end of the year is a busy time for https://www.amazon.com/; their traffic triples. If they were hosting their service in an "old-fashioned" way, they would have only 24% of their infrastructure used on average every year, but thanks to being able to scale dynamically, they are able to provision only what they really need. © 2016, Amazon Web Services, Inc. or its affiliates. All rights reserved. Here at Medium, we also see on a very regular basis the benefits of having fast auto scaling capabilities.
Very often, stories will go viral, and the amount of traffic going to Medium can drastically change. On January 21st, 2015, to our surprise, the White House posted the transcript of the State of the Union minutes before President Obama started his speech: https://medium.com/@WhiteHouse/ As you can see in the following graph, thanks to being in the cloud and having auto scaling capabilities, our platform was able to absorb the 5x instant spike of traffic that the announcement created by doubling the number of servers our front service uses. Later, as the traffic started to naturally drain, we automatically removed some hosts from our fleet.

The different layers of building a cloud

Cloud computing is often broken up into three different types of service: Infrastructure as a Service (IaaS): It is the fundamental block on top of which everything cloud is built. It is usually a computing resource in a virtualized environment. It offers a combination of processing power, memory, storage, and network. The most common IaaS entities you will find are virtual machines (VMs), network equipment like load balancers or virtual Ethernet interfaces, and storage like block devices. This layer is very close to the hardware and gives you the full flexibility that you would get deploying your software outside of a cloud. If you have any physical datacenter knowledge, it will mostly also apply to this layer. Platform as a Service (PaaS): It is where things start to get really interesting with the cloud. When building an application, you will likely need a certain number of common components, such as a data store, a queue, and so on. The PaaS layer provides a number of ready-to-use applications to help you build your own services without worrying about administrating and operating those third-party services, such as a database server. Software as a Service (SaaS): It is the icing on the cake.
Similarly to the PaaS layer, you get access to managed services, but this time those services are complete solutions dedicated to certain purposes, such as management or monitoring tools. When building an application, relying on those services makes a big difference when compared to more traditional environments outside of a cloud. Another key element to succeeding when deploying or migrating to a new infrastructure is to adopt a DevOps mindset.

Deploying in AWS

AWS is at the forefront of the cloud providers. Launched in 2006 with SQS and EC2, Amazon quickly became the biggest IaaS provider. They have the biggest infrastructure and the biggest ecosystem, and they constantly add new features and release new services. In 2015, they passed the cap of 1 million active customers. Over the last few years, they managed to change people's mindset about the cloud, and now deploying new services to the cloud is the new normal. Using the AWS managed tools and services is a drastic way to improve your productivity and keep your team lean. Amazon is continually listening to its customers' feedback and looking at market trends; therefore, as the DevOps movement started to get established, Amazon released a number of new services tailored toward implementing some of the DevOps best practices. We will also see how those services synergize with the DevOps culture.

How to take advantage of the AWS ecosystem

When you talk to application architects, there are usually two trains of thought. The first one is to stay as platform agnostic as possible. The idea behind that is that if you aren't happy with AWS anymore, you can easily switch cloud providers or even build your own private cloud. The second train of thought is the complete opposite; the idea is that you are going to stick to AWS no matter what. It feels a bit extreme to think of it that way, but the reward is worth the risk, and more and more companies agree with that. That's also where I stand.
When you build a product nowadays, the scarcity is always time and people. If you can outsource what is not your core business to a company that provides a similar service or technology with support expertise, and that you can just pay for on a SaaS model, then do so. If, like me, you agree that using managed services is the way to go, then being a cloud architect is like playing with Lego. With Lego, you have lots of pieces of different shapes, sizes, and colors, and you assemble them to build your own MOC (My Own Creation). Amazon services are like those Lego pieces. If you can picture your final product, then you can explore the different services and start combining them to build the supporting stack needed to quickly and efficiently build your product. Of course, in this case, the "if" is a big if, and unlike with Lego, understanding what each piece can do is a lot less visual and colorful.

How AWS synergizes with the DevOps culture

Having a DevOps culture is about rethinking how engineering teams work together by breaking out of those developer and operations silos and bringing in a new set of tools to implement some best practices. AWS helps accomplish that in many different ways: For some developers, the world of operations can be scary and confusing, but if you want better cooperation between engineers, it is important to expose every aspect of running a service to the entire engineering organization. As an operations engineer, you can't have a gatekeeper mentality toward developers; instead, it's better to make them comfortable accessing production and working on the different components of the platform. A good way to get started with that is the AWS console. While a bit overwhelming, it is still a much better experience for people not familiar with this world to navigate that web interface than referring to constantly out-of-date documentation, or using SSH and random exploration to discover the topology and configuration of the service.
Of course, as your expertise grows, your application becomes more and more complex, and the need to operate it faster increases, the web interface starts to show some weaknesses. To get around that issue, AWS provides a very DevOps-friendly alternative: an API. Accessible through a command-line tool and a number of SDKs (which include Java, JavaScript, Python, .NET, PHP, Ruby, Go, and C++), the SDKs let you administrate and use the managed services. Finally, AWS offers a number of DevOps tools. AWS has a source control service similar to GitHub called CodeCommit. For automation, in addition to allowing you to control everything via SDKs, AWS provides the ability to create templates of your infrastructure via CloudFormation, but also a configuration management system called OpsWorks. It also knows how to scale fleets of servers up and down using Auto Scaling groups. For continuous delivery, AWS provides a service called CodePipeline, and for continuous deployment, a service called CodeDeploy. With regards to measuring everything, we will rely on CloudWatch and later Elasticsearch/Kibana to visualize metrics and logs. Finally, we will see how to use Docker via ECS, which will let us create containers to improve the server density (we will be able to reduce VM consumption, as we will be able to colocate services together in one VM while still keeping fairly good isolation), improve the developer environment, as we will now be able to run something closer to the production environment, and improve testing time, as starting containers is a lot faster than starting virtual machines.
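To give a flavor of the CloudFormation templating and Auto Scaling groups mentioned above, here is a minimal, hypothetical template fragment; the resource names and AMI ID are placeholders, not taken from the book.

```yaml
# Hypothetical CloudFormation fragment: a web fleet that can scale
# between 2 and 10 instances.
Resources:
  WebLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: ami-0123456789abcdef0   # placeholder AMI ID
      InstanceType: t2.micro
  WebAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      LaunchConfigurationName: !Ref WebLaunchConfig
      MinSize: "2"
      MaxSize: "10"
      AvailabilityZones: !GetAZs ""
```

Deploying such a template through the CloudFormation console, CLI, or SDKs creates and manages both resources as a single versionable stack, which is what makes infrastructure-as-code so complementary to a DevOps workflow.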
Packt
10 Aug 2017
14 min read

Understanding SAP Analytics Cloud

In this article, Riaz Ahmed, the author of the book Learning SAP Analytics Cloud, provides an overview of this unique cloud-based business intelligence platform. You will learn about the following SAP Analytics Cloud segments: models and data sources, visualization, collaboration, presentation, and administration and security.

What is SAP Analytics Cloud?

SAP Analytics Cloud is a new-generation cloud-based application, which helps you explore your data, perform visualization and analysis, create financial plans, and produce predictive forecasts. It is a one-stop-shop solution to cope with your analytic needs and comprises business intelligence, planning, predictive analytics, and governance and risk. The application is built on the SAP Cloud Platform and delivers great performance and scalability. In addition to the on-premise SAP HANA, SAP BW, and S/4HANA sources, you can work with data from a wide range of non-SAP sources, including Google Drive, Salesforce, SQL Server, Concur, and CSV, to name a few. SAP Analytics Cloud allows you to make secure connections to these cloud and on-premise data sources.

Anatomy of SAP Analytics Cloud

The following figure depicts the anatomy of SAP Analytics Cloud:

Data sources and models

Before commencing your analytical tasks in SAP Analytics Cloud, you need to create models. Models are the basis for all of your analysis in SAP Analytics Cloud to evaluate the performance of your organization. A model is a high-level design that exposes the analytic requirements of end users. You can create planning and analytics models based on cloud or on-premise data sources. Analytics models are simpler and more flexible, while planning models are full-featured models in which business analysts and finance professionals can quickly and easily build connected models to analyze data and then collaborate with each other to attain better business performance.
Preconfigured with dimensions for time and categories, planning models support multi-currency and security features at both the model and dimension levels. After creating these models, you can share them with other users in your organization. Before sharing, you can set up model access privileges for users according to their level of authorization and can also enable data auditing. With the help of SAP Analytics Cloud's analytical capabilities, users can discover hidden traits in the data and can predict likely outcomes. It equips them with the ability to uncover potential risks and hidden opportunities. To determine what content to include in your model, you must first identify the columns from the source data on which users need to query. The columns you need in your model reside in some sort of data source. SAP Analytics Cloud supports three types of data sources: files (such as CSV or Excel files) that usually reside on your computer, live data connections from a connected remote system, and cloud apps. In addition to the files on your computer, you can use on-premise data sources such as SAP Business Warehouse, SAP ERP, SAP Universe, SQL databases, and more to acquire data for your models. In the cloud, you can get data from apps like Concur, Google Drive, SAP Business ByDesign, SAP Hybris Cloud, OData services, and SuccessFactors. The following figure depicts these data sources. The cloud app data sources you can use with SAP Analytics Cloud are displayed above the firewall mark, while those in your local network are shown under the firewall. As you can see in this figure, there are over twenty data sources currently supported by SAP Analytics Cloud. The methods of connecting to these data sources also vary from each other.

Create a direct live connection to SAP HANA

You can connect to an on-premise SAP HANA system to use live data in SAP Analytics Cloud.
Live data means that you can get up-to-the-minute data when you open a story in SAP Analytics Cloud. In this case, any changes made to the data in the source system are reflected immediately. Usually, there are two ways to establish a connection to a data source: use the Connection option from the main menu, or specify the data source during the process of creating a model. However, live data connections must be established via the Connection menu option prior to creating the corresponding model.

Connect remote systems to import data

In addition to creating live connections, you can also create connections that allow you to import data into SAP Analytics Cloud. In these types of connections that you make to access remote systems, data is imported (copied) to SAP Analytics Cloud. Any changes users make in the source data do not affect the imported data. To establish connections with these remote systems, you need to install some additional components. For example, you must install the SAP HANA Cloud Connector to access SAP Business Planning and Consolidation (BPC) for NetWeaver. Similarly, the SAP Analytics Cloud Agent should be installed for SAP Business Warehouse (BW), SQL Server, SAP ERP, and others.

Connect to a cloud app to import data

In addition to creating a live connection to create a model on live data and importing data from remote systems, you can set up connections to acquire data from cloud apps, such as Google Drive, SuccessFactors, OData, Concur, Fieldglass, Google BigQuery, and more.

Refreshing imported data

SAP Analytics Cloud allows you to refresh your imported data. With this option, you can reimport the data on demand to get the latest values. You can perform this refresh operation either manually, or create an import schedule to refresh the data at a specific date and time or on a recurring basis.
Visualization Once you have created a model and set up appropriate security for it, you can create stories in which the underlying model data can be explored and visualized with the help of different types of charts, geo maps, and tables. There is a wide range of charts you can add to your story pages to address different scenarios. You can create multiple pages in your story to present your model data using charts, geo maps, tables, text, shapes, and images. On your story pages, you can link dimensions to present data from multiple sources. Adding reference lines and thresholds, applying filters, and drilling down into data can be done on the fly. The ability to interactively drag and drop page objects is the most useful feature of this application. Charts: SAP Analytics Cloud comes with a variety of charts to present your analysis according to your specific needs. You can add multiple types of charts to a single story page. Geo Map: The models that include latitude and longitude information can be used in stories to visualize data in geo maps. By adding multiple layers of different types of data in geo maps, you can show different geographic features and points of interest, enabling you to perform sophisticated geographic analysis. Table: A table is a spreadsheet-like object that can be used to view and analyze text data. You can add this object to either canvas or grid pages in stories. Static and Dynamic Text: You can add static and dynamic text to your story pages. Static text is normally used to display page titles, while dynamic text automatically updates page headings based on the values from the source input control or filter. Images and Shapes: You can add images (such as your company logo) to your story page by uploading them from your computer. In addition to images, you can also add shapes such as lines, squares, or circles to your page. 
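Behind each of these chart types is an aggregation of the model's measures over one or more dimensions. As a tool-agnostic sketch of the kind of grouping a bar chart performs (the Region members and sales figures below are made up purely for illustration):

```python
# Hypothetical model rows: (Region dimension member, Sales measure)
rows = [
    ("EMEA", 1200), ("APAC", 800), ("EMEA", 300),
    ("Americas", 950), ("APAC", 400),
]

# A bar chart over the Region dimension sums the measure per member:
totals = {}
for region, sales in rows:
    totals[region] = totals.get(region, 0) + sales

print(totals)  # {'EMEA': 1500, 'APAC': 1200, 'Americas': 950}
```

Drilling down simply repeats the same aggregation over a finer dimension (for example, Region and then Country), while a filter removes rows before the totals are computed.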
Collaboration Collaboration, alert, and notification features of SAP Analytics Cloud keep business users in touch with each other while executing their tasks. During the process of creating models and stories, you need input from other people. For example, you might ask a colleague to update a model by providing first quarter sales data, or request Sheela to enter her comments on one of your story pages. In SAP Analytics Cloud, these interactions come under the collaboration features of the application. Using these features, users can discuss business content and share information, which consequently smooths the decision-making process. Here is a list of available collaboration features in the application that allow group members to discuss stories and other business content. Create a workflow using events and tasks: The events and tasks features in SAP Analytics Cloud are the two major sources that help you collaborate with other group members and manage your planning and analytic activities. After creating an event and assigning tasks to relevant group members, you can monitor the task progress in the Events interface. Here is the workflow to utilize these two features: Create events based on categories and processes within categories Create a task, assign it to users, and set a due date for its submission Monitor the task progress Commenting on a chart's data point: Using this feature, you can add annotations or additional information to individual data points in a chart. Commenting on a story page: In addition to adding comments to an individual chart, you have the option to add comments on an entire story page to provide some vital information to other users. When you add a comment, other users can see and reply to it. Produce a story as a PDF: You can save your story in a PDF file to share it with other users and for offline access. You can save all story pages or a specific page as a PDF file. 
Sharing a story with colleagues: Once you complete a story, you can share it with other members in your organization. You are provided with three options (Public, Teams, and Private) when you save a story. When you save your story in the Public folder, it can be accessed by anyone. The Teams option lets you select specific teams to share your story with. In the Private option, you have to manually select users with whom you want to share your story. Collaborate via discussions: You can collaborate with colleagues using the discussions feature of SAP Analytics Cloud. The discussions feature enables you to connect with other members in real time. Sharing files and other objects: The Files option under the Browse menu allows you to access the SAP Analytics Cloud repository, where the stories you created and the files you uploaded are stored. After accessing the Files page, you can share its objects with other users. On the Files page, you can manage files and folders, upload files, share files, and change share settings. Presentation In the past, board meetings were held in which the participants used to deliver their reports through their laptops and a bunch of papers. With different versions of reality, it was very difficult for decision makers to arrive at a good decision. With the advent of SAP Digital Boardroom, board meetings have been revolutionized. It is a next-generation presentation platform, which helps you visualize your data and plan ahead in a real-time environment. It runs on multiple screens simultaneously, displaying information from different sources. Due to this capability, more people can work together using a single dataset, which consequently creates one version of reality to make the best decision. SAP Digital Boardroom is changing the landscape of board meetings. In addition to supporting traditional presentation methods, it goes beyond the corporate boardroom and allows remote members to join and participate in the meeting online. 
These remote participants can actively engage in the meeting and can work with live data using their own devices. SAP Digital Boardroom is a visualization and presentation platform that enormously assists in the decision-making process. It transforms executive meetings by replacing static and stale presentations with interactive discussions based on live data, which allows executives to make fact-based decisions to drive their business. Here are the main benefits of SAP Digital Boardroom: Collaborate with others in remote locations and on other devices in an interactive meeting room. Answer ad hoc questions on the fly. Visualize, recognize, experiment, and decide by jumping on and off script at any point. Find answers to the questions that matter to you by exploring directly on live data and focusing on relevant aspects by drilling into details. Discover opportunities or reveal hidden threats. Simulate various decisions and project the results. Weigh and share the pros and cons of your findings. The Digital Boardroom is interactive, so you can retrieve real-time data, make changes to the schedule, and even run through what-if scenarios. It presents a live picture of your organization across three interlinked touch screens to make faster, better executive decisions. There are two aspects of Digital Boardroom with which you can share existing stories with executives and decision-makers to reveal business performance. First, you have to design the agenda for your boardroom presentation by adding meeting information and agenda items, and linking stories as pages in a navigation structure. Once you have created a digital boardroom agenda, you can schedule a meeting to discuss it. In the Digital Boardroom interface, you can organize a meeting in which members can interact with live data during a boardroom presentation. Administration and security An application that is accessed by multiple users is useless without a proper system administration module. 
The existence of this module ensures that things are under control and secured. It also covers the upkeep, configuration, and reliable operation of the application. This module is usually assigned to a person called the system administrator, whose responsibility is to watch the uptime, performance, resources, and security of the application. Being a multi-user application, SAP Analytics Cloud also comes with this vital module, which allows a system administrator to take care of the following segments: Creating users and setting their passwords Importing users from another data source Exporting user profiles for other apps Deleting unwanted user accounts from the system Creating roles and assigning them to users Setting permissions for roles Forming teams Setting security for models Monitoring users' activities Monitoring data changes Monitoring system performance System deployment via export and import Signing up for the trial version If you want to get your feet wet by exploring this exciting cloud application, then sign up for a free 30-day trial version. Note that the trial version doesn't allow you to access all the features of SAP Analytics Cloud. For example, you cannot create a planning model in the trial version, nor can you access its security and administration features. Execute the following steps to get access to the free SAP Analytics Cloud trial: Put the following URL in your browser's address bar and press Enter: http://discover.sapanalytics.cloud/trialrequest-auto/ Enter and confirm your business e-mail address in the relevant boxes. Select No from the Is your company an SAP Partner? list. Click the Submit button. After a short while, you will get an e-mail with a link to connect to SAP Analytics Cloud. Click the Activate Account button in the e-mail. This will open the Activate Your Account page, where you will have to set a strong password. 
The password must be at least 8 characters long and should also include uppercase and lowercase letters, numbers, and symbols. After entering and confirming your password, click the Save button to complete the activation process. A confirmation page appears, telling you that your account has been successfully activated. Click Continue. You will be taken to the SAP Analytics Cloud site. The e-mail you receive carries a link under the SAP Analytics Cloud System section that you can use to access the application any time. Your username (your e-mail address) is also mentioned in the same e-mail, along with a log on button to access the application. Summary SAP Analytics Cloud is a next-generation cloud-based analytic application, which provides an end-to-end cloud analytics experience. SAP Analytics Cloud can help transform how you discover, plan, predict, collaborate, visualize, and extend, all in one solution. In addition to on-premise data sources, you can fetch data from a variety of other cloud apps, and even from Excel and text files, to build your data models and then create stories based on these models. The ultimate purpose of this amazing and easy-to-use application is to enable you to make the right decision. SAP Analytics Cloud is more than visualization of data: it is insight turned into action; it is the realization of success. Resources for Article: Further resources on this subject: Working with User Defined Values in SAP Business One [article] Understanding Text Search and Hierarchies in SAP HANA [article] Meeting SAP Lumira [article]

Starting Out

Packt
10 Aug 2017
21 min read
In this article by Chris Simmonds, author of the book Mastering Embedded Linux Programming – Second Edition, you are about to begin working on your next project, and this time it is going to be running Linux. What should you think about before you put finger to keyboard? Let's begin with a high-level look at embedded Linux and see why it is popular, what the implications of open source licenses are, and what kind of hardware you will need to run Linux. (For more resources related to this topic, see here.) Linux first became a viable choice for embedded devices around 1999. That was when Axis (https://www.axis.com) released their first Linux-powered network camera and TiVo (https://business.tivo.com/) their first Digital Video Recorder (DVR). Since 1999, Linux has become ever more popular, to the point that today it is the operating system of choice for many classes of product. At the time of writing, in 2017, there are about two billion devices running Linux. That includes a large number of smartphones running Android, which uses a Linux kernel, and hundreds of millions of set-top-boxes, smart TVs, and Wi-Fi routers, not to mention a very diverse range of devices such as vehicle diagnostics, weighing scales, industrial devices, and medical monitoring units that ship in smaller volumes. So, why does your TV run Linux? At first glance, the function of a TV is simple: it has to display a stream of video on a screen. Why is a complex Unix-like operating system like Linux necessary? The simple answer is Moore's Law: Gordon Moore, co-founder of Intel, observed in 1965 that the density of components on a chip would double approximately every two years. That applies to the devices that we design and use in our everyday lives just as much as it does to desktops, laptops, and servers. At the heart of most embedded devices is a highly integrated chip that contains one or more processor cores and interfaces with main memory, mass storage, and peripherals of many types. 
This is referred to as a System on Chip, or SoC, and SoCs are increasing in complexity in accordance with Moore's Law. A typical SoC has a technical reference manual that stretches to thousands of pages. Your TV is not simply displaying a video stream as the old analog sets used to do. The stream is digital, possibly encrypted, and it needs processing to create an image. Your TV is (or soon will be) connected to the Internet. It can receive content from smartphones, tablets, and home media servers. It can be (or soon will be) used to play games. And so on and so on. You need a full operating system to manage this degree of complexity. Here are some points that drive the adoption of Linux: Linux has the necessary functionality. It has a good scheduler, a good network stack, support for USB, Wi-Fi, Bluetooth, many kinds of storage media, good support for multimedia devices, and so on. It ticks all the boxes. Linux has been ported to a wide range of processor architectures, including some that are very commonly found in SoC designs—ARM, MIPS, x86, and PowerPC. Linux is open source, so you have the freedom to get the source code and modify it to meet your needs. You, or someone working on your behalf, can create a board support package for your particular SoC board or device. You can add protocols, features, and technologies that may be missing from the mainline source code. You can remove features that you don't need to reduce memory and storage requirements. Linux is flexible. Linux has an active community; in the case of the Linux kernel, very active. There is a new release of the kernel every 8 to 10 weeks, and each release contains code from more than 1,000 developers. An active community means that Linux is up to date and supports current hardware, protocols, and standards. Open source licenses guarantee that you have access to the source code. There is no vendor tie-in. For these reasons, Linux is an ideal choice for complex devices. 
But there are a few caveats I should mention here. Complexity makes it harder to understand. Coupled with the fast-moving development process and the decentralized structures of open source, you have to put some effort into learning how to use it and to keep on re-learning as it changes. Selecting the right operating system Is Linux suitable for your project? Linux works well where the problem being solved justifies the complexity. It is especially good where connectivity, robustness, and complex user interfaces are required. However, it cannot solve every problem, so here are some things to consider before you jump in: Is your hardware up to the job? Compared to a traditional real-time operating system (RTOS) such as VxWorks, Linux requires a lot more resources. It needs at least a 32-bit processor and lots more memory. I will go into more detail in the section on typical hardware requirements. Do you have the right skill set? The early parts of a project, such as board bring-up, require detailed knowledge of Linux and how it relates to your hardware. Likewise, when debugging and tuning your application, you will need to be able to interpret the results. If you don't have the skills in-house, you may want to outsource some of the work. Is your system real-time? Linux can handle many real-time activities so long as you pay attention to certain details. Consider these points carefully. Probably the best indicator of success is to look around for similar products that run Linux and see how they have done it; follow best practice. The players Where does open source software come from? Who writes it? In particular, how does this relate to the key components of embedded development—the toolchain, bootloader, kernel, and basic utilities found in the root filesystem? The main players are: The open source community: This, after all, is the engine that generates the software you are going to be using. 
The community is a loose alliance of developers, many of whom are funded in some way, perhaps by a not-for-profit organization, an academic institution, or a commercial company. They work together to further the aims of the various projects. There are many of them—some small, some large. CPU architects: These are the organizations that design the CPUs we use. The important ones here are ARM/Linaro (ARM-based SoCs), Intel (x86 and x86_64), Imagination Technologies (MIPS), and IBM (PowerPC). They implement or, at the very least, influence support for the basic CPU architecture. SoC vendors (Atmel, Broadcom, Intel, Qualcomm, TI, and many others): They take the kernel and toolchain from the CPU architects and modify them to support their chips. They also create reference boards: designs that are used by the next level down to create development boards and working products. Board vendors and OEMs: These people take the reference designs from SoC vendors and build them into specific products, for instance, set-top-boxes or cameras, or create more general purpose development boards, such as those from Advantech and Kontron. An important category is the cheap development boards such as BeagleBoard/BeagleBone and Raspberry Pi that have created their own ecosystems of software and hardware add-ons. These form a chain, with your project usually at the end, which means that you do not have a free choice of components. You cannot simply take the latest kernel from https://www.kernel.org/, except in a few rare cases, because it does not have support for the chip or board that you are using. This is an ongoing problem with embedded development. Ideally, the developers at each link in the chain would push their changes upstream, but they don't. It is not uncommon to find a kernel which has many thousands of patches that are not merged. 
In addition, SoC vendors tend to actively develop open source components only for their latest chips, meaning that support for any chip more than a couple of years old will be frozen and not receive any updates. The consequence is that most embedded designs are based on old versions of software. They do not receive security fixes, performance enhancements, or features that are in newer versions. Problems such as Heartbleed (a bug in the OpenSSL libraries) and ShellShock (a bug in the bash shell) go unfixed. What can you do about it? First, ask questions of your vendors: what is their update policy, how often do they revise kernel versions, what is the current kernel version, what was the one before that, and what is their policy for merging changes upstream? Some vendors are making great strides in this way. You should prefer their chips. Secondly, you can take steps to make yourself more self-sufficient. The article explains the dependencies in more detail and shows you where you can help yourself. Don't just take the package offered to you by the SoC or board vendor and use it blindly without considering the alternatives. The four elements of embedded Linux Every project begins by obtaining, customizing, and deploying these four elements: the toolchain, the bootloader, the kernel, and the root filesystem: Toolchain: The compiler and other tools needed to create code for your target device. Everything else depends on the toolchain. Bootloader: The program that initializes the board and loads the Linux kernel. Kernel: This is the heart of the system, managing system resources and interfacing with hardware. Root filesystem: Contains the libraries and programs that are run once the kernel has completed its initialization. Of course, there is also a fifth element, not mentioned here. 
That is the collection of programs specific to your embedded application which make the device do whatever it is supposed to do, be it weigh groceries, display movies, control a robot, or fly a drone. Typically, you will be offered some or all of these elements as a package when you buy your SoC or board. But, for the reasons mentioned in the preceding paragraph, they may not be the best choices for you. Open source The components of embedded Linux are open source, so now is a good time to consider what that means, why open source works the way it does, and how this affects the often proprietary embedded device you will be creating from it. Licenses When talking about open source, the word free is often used. People new to the subject often take it to mean nothing to pay, and open source software licenses do indeed guarantee that you can use the software to develop and deploy systems for no charge. However, the more important meaning here is freedom, since you are free to obtain the source code, modify it in any way you see fit, and redeploy it in other systems. These licenses give you this right. Compare that with shareware licenses, which allow you to copy the binaries for no cost but do not give you the source code, or other licenses that allow you to use the software for free under certain circumstances, for example, for personal use but not commercial. These are not open source. I will provide the following comments in the interest of helping you understand the implications of working with open source licenses, but I would like to point out that I am an engineer and not a lawyer. What follows is my understanding of the licenses and the way they are interpreted. Open source licenses fall broadly into two categories: the copyleft licenses, such as the General Public License (GPL), and the permissive licenses, such as those from the Berkeley Software Distribution (BSD) and others. 
The permissive licenses say, in essence, that you may modify the source code and use it in systems of your own choosing so long as you do not modify the terms of the license in any way. In other words, with that one restriction, you can do with it what you want, including building it into possibly proprietary systems. The GPL licenses are similar, but have clauses which compel you to pass the rights to obtain and modify the software on to your end users. In other words, you share your source code. One option is to make it completely public by putting it onto a public server. Another is to offer it only to your end users by means of a written offer to provide the code when requested. The GPL goes further to say that you cannot incorporate GPL code into proprietary programs. Any attempt to do so would make the GPL apply to the whole. In other words, you cannot combine GPL and proprietary code in one program. So, what about libraries? If they are licensed with the GPL, any program linked with them becomes GPL also. However, most libraries are licensed under the Lesser General Public License (LGPL). If this is the case, you are allowed to link with them from a proprietary program. All of the preceding description relates specifically to GPL v2 and LGPL v2.1. I should mention the latest versions: GPL v3 and LGPL v3. These are controversial, and I will admit that I don't fully understand the implications. However, the intention is to ensure that the GPL v3 and LGPL v3 components in any system can be replaced by the end user, which is in the spirit of open source software for everyone. It does pose some problems though. Some Linux devices are used to gain access to information according to a subscription level or another restriction, and replacing critical parts of the software may compromise that. Set-top-boxes fit into this category. There are also issues with security. If the owner of a device has access to the system code, then so might an unwelcome intruder. 
Often the defense is to have kernel images that are signed by an authority, the vendor, so that unauthorized updates are not possible. Is that an infringement of my right to modify my device? Opinions differ. The TiVo set-top-box is an important part of this debate. It uses a Linux kernel, which is licensed under GPL v2. TiVo have released the source code of their version of the kernel and so comply with the license. TiVo also has a bootloader that will only load a kernel binary that is signed by them. Consequently, you can build a modified kernel for a TiVo box, but you cannot load it on the hardware. The Free Software Foundation (FSF) takes the position that this is not in the spirit of open source software and refers to this procedure as Tivoization. The GPL v3 and LGPL v3 were written to explicitly prevent this happening. Some projects, the Linux kernel in particular, have been reluctant to adopt the version three licenses because of the restrictions they would place on device manufacturers. Hardware for embedded Linux If you are designing or selecting hardware for an embedded Linux project, what do you look out for? Firstly, a CPU architecture that is supported by the kernel—unless you plan to add a new architecture yourself, of course! Looking at the source code for Linux 4.9, there are 31 architectures, each represented by a sub-directory in the arch/ directory. They are all 32- or 64-bit architectures, most with a memory management unit (MMU), but some without. The ones most often found in embedded devices are ARM, MIPS, PowerPC, and x86, each in 32- and 64-bit variants, and all of which have memory management units. A processor that doesn't have an MMU can run a subset of Linux known as microcontroller Linux, or uClinux. These processor architectures include ARC, Blackfin, MicroBlaze, and Nios. I will mention uClinux from time to time, but I will not go into detail because it is a rather specialized topic. Secondly, you will need a reasonable amount of RAM. 
16 MiB is a good minimum, although it is quite possible to run Linux using half that. It is even possible to run Linux with 4 MiB if you are prepared to go to the trouble of optimizing every part of the system. It may even be possible to get lower, but there comes a point at which it is no longer Linux. Thirdly, there is non-volatile storage, usually flash memory. 8 MiB is enough for a simple device such as a webcam or a simple router. As with RAM, you can create a workable Linux system with less storage if you really want to, but the lower you go, the harder it becomes. Linux has extensive support for flash storage devices, including raw NOR and NAND flash chips, and managed flash in the form of SD cards, eMMC chips, USB flash memory, and so on. Fourthly, a debug port is very useful, most commonly an RS-232 serial port. It does not have to be fitted on production boards, but makes board bring-up, debugging, and development much easier. Fifthly, you need some means of loading software when starting from scratch. A few years ago, boards would have been fitted with a Joint Test Action Group (JTAG) interface for this purpose, but modern SoCs have the ability to load boot code directly from removable media, especially SD and micro SD cards, or serial interfaces such as RS-232 or USB. In addition to these basics, there are interfaces to the specific bits of hardware your device needs to get its job done. Mainline Linux comes with open source drivers for many thousands of different devices, and there are drivers (of variable quality) from the SoC manufacturer and from the OEMs of third-party chips that may be included in the design, but remember my comments on the commitment and ability of some manufacturers. As a developer of embedded devices, you will find that you spend quite a lot of time evaluating and adapting third-party code, if you have it, or liaising with the manufacturer if you don't. 
Finally, you will have to write the device support for interfaces that are unique to the device, or find someone to do it for you. Hardware The worked examples are intended to be generic, but to make them relevant and easy to follow, I have had to choose specific hardware. I have chosen two exemplar devices: the BeagleBone Black and QEMU. The first is a widely-available and cheap development board which can be used in serious embedded hardware. The second is a machine emulator that can be used to create a range of systems that are typical of embedded hardware. It was tempting to use QEMU exclusively, but, like all emulations, it is not quite the same as the real thing. Using a BeagleBone Black, you have the satisfaction of interacting with real hardware and seeing real LEDs flash. I could have selected a board that is more up-to-date than the BeagleBone Black, which is several years old now, but I believe that its popularity gives it a degree of longevity and it means that it will continue to be available for some years yet. In any case, I encourage you to try out as many of the examples as you can, using either of these two platforms, or indeed any embedded hardware you may have to hand. The BeagleBone Black The BeagleBone and the later BeagleBone Black are open hardware designs for a small, credit card sized development board produced by CircuitCo LLC. The main repository of information is at https://beagleboard.org/. 
The main points of the specifications are:

TI AM335x 1 GHz ARM® Cortex-A8 Sitara SoC
512 MiB DDR3 RAM
2 or 4 GiB 8-bit eMMC on-board flash storage
Serial port for debug and development
MicroSD connector, which can be used as the boot device
Mini USB OTG client/host port that can also be used to power the board
Full size USB 2.0 host port
10/100 Ethernet port
HDMI for video and audio output

In addition, there are two 46-pin expansion headers for which there is a great variety of daughter boards, known as capes, which allow you to adapt the board to do many different things. However, you do not need to fit any capes in the examples. In addition to the board itself, you will need:

A mini USB to full-size USB cable (supplied with the board) to provide power, unless you have the last item on this list.
An RS-232 cable that can interface with the 6-pin 3.3V TTL level signals provided by the board. The BeagleBoard website has links to compatible cables.
A microSD card and a means of writing to it from your development PC or laptop, which will be needed to load software onto the board.
An Ethernet cable, as some of the examples require network connectivity.
Optional, but recommended, a 5V power supply capable of delivering 1 A or more.
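Writing a bootable image to the microSD card is commonly done with dd on the development PC. In the sketch below, /dev/sdX is a placeholder for your card's device node (check with lsblk first; dd overwrites whatever is on the target device), and the runnable part copies to an ordinary file instead, which exercises the same invocation safely:

```shell
# Identify the card first; /dev/sdX below is a placeholder for your device:
#   lsblk
# Then write the image (DESTRUCTIVE on the chosen device):
#   sudo dd if=bbb-image.img of=/dev/sdX bs=1M status=progress && sync

# The same invocation demonstrated safely against a plain file:
dd if=/dev/urandom of=bbb-image.img bs=1K count=64 2>/dev/null  # stand-in image
dd if=bbb-image.img of=card.img bs=1M 2>/dev/null               # the "write" step
cmp bbb-image.img card.img && echo "copy verified"
```

The trailing sync in the real command matters: it flushes buffered writes so the card can be removed safely once dd returns.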
Here is a concrete example:

$ qemu-system-arm -machine vexpress-a9 -m 256M \
  -drive file=rootfs.ext4,sd \
  -kernel zImage -dtb vexpress-v2p-ca9.dtb \
  -append "console=ttyAMA0,115200 root=/dev/mmcblk0" \
  -serial stdio \
  -net nic,model=lan9118 -net tap,ifname=tap0

The options used in the preceding command line are:
-machine vexpress-a9: Creates an emulation of an ARM Versatile Express development board with a Cortex A-9 processor
-m 256M: Populates it with 256 MiB of RAM
-drive file=rootfs.ext4,sd: Connects the SD interface to the local file rootfs.ext4 (which contains a filesystem image)
-kernel zImage: Loads the Linux kernel from the local file named zImage
-dtb vexpress-v2p-ca9.dtb: Loads the device tree from the local file vexpress-v2p-ca9.dtb
-append "...": Supplies this string as the kernel command line
-serial stdio: Connects the serial port to the terminal that launched QEMU, usually so that you can log on to the emulated machine via the serial console
-net nic,model=lan9118: Creates a network interface
-net tap,ifname=tap0: Connects the network interface to the virtual network interface tap0

To configure the host side of the network, you need the tunctl command from the User Mode Linux (UML) project; on Debian and Ubuntu, the package is named uml-utilities:

$ sudo tunctl -u $(whoami) -t tap0

This creates a network interface named tap0 which is connected to the network controller in the emulated QEMU machine. You configure tap0 in exactly the same way as any other interface. I will be using Versatile Express for most of my examples, but it should be easy to use a different machine or architecture. Software I have used only open source software, both for the development tools and the target operating system and applications. I assume that you will be using Linux on your development system.
I tested all the host commands using Ubuntu 14.04 and so there is a slight bias towards that particular version, but any modern Linux distribution is likely to work just fine. Summary Embedded hardware will continue to get more complex, following the trajectory set by Moore's Law. Linux has the power and the flexibility to make use of hardware in an efficient way. Linux is just one component of open source software out of the many that you need to create a working product. The fact that the code is freely available means that people and organizations at many different levels can contribute. However, the sheer variety of embedded platforms and the fast pace of development lead to isolated pools of software which are not shared as efficiently as they should be. In many cases, you will become dependent on this software, especially the Linux kernel that is provided by your SoC or Board vendor, and to a lesser extent, the toolchain. Some SoC manufacturers are getting better at pushing their changes upstream and the maintenance of these changes is getting easier. Fortunately, there are some powerful tools that can help you create and maintain the software for your device. For example, Buildroot is ideal for small systems and the Yocto Project for larger ones. Before I describe these build tools, I will describe the four elements of embedded Linux, which you can apply to all embedded Linux projects, however they are created. Resources for Article: Further resources on this subject: Programming with Linux [article] Embedded Linux and Its Elements [article] Revisiting Linux Network Basics [article]

Creating the First Python Script

Packt
09 Aug 2017
27 min read
In this article by Silas Toms, the author of the book ArcPy and ArcGIS - Second Edition, we will demonstrate how to use ModelBuilder, which ArcGIS professionals are already familiar with, to model their first analysis and then export it out as a script. With the Python environment configured to fit our needs, we can now create and execute ArcPy scripts. To ease into the creation of Python scripts, this article will use ArcGIS ModelBuilder to model a simple analysis, and export it as a Python script. ModelBuilder is very useful for creating Python scripts. It has an operational and a visual component, and all models can be outputted as Python scripts, where they can be further customized. In this article, we will cover the following topics: Modeling a simple analysis using ModelBuilder Exporting the model out to a Python script Windows file paths versus Pythonic file paths String formatting methods (For more resources related to this topic, see here.) Prerequisites The following are the prerequisites for this article: ArcGIS 10x and Python 2.7, with arcpy available as a module. For this article, the accompanying data and scripts should be downloaded from Packt Publishing's website. The completed scripts are available for comparison purposes, and the data will be used for this article's analysis. To run the code and test code examples, use your favorite IDE or open the IDLE (Python GUI) program from the Start Menu/ArcGIS/Python2.7 folder after installing ArcGIS for Desktop. Use the built-in "interpreter" or code entry interface, indicated by the triple chevron >>> and a blinking cursor. ModelBuilder ArcGIS has been in development since the 1970s. Since that time, it has included a variety of programming languages and tools to help GIS users automate analysis and map production.
These include the Avenue scripting language in the ArcGIS 3x series, and the ARC Macro Language (AML) in the ARCInfo Workstation days, as well as VBScript up until ArcGIS 10x, when Python was introduced. Another useful tool introduced in ArcGIS 9x was ModelBuilder, a visual programming environment used for both modeling analysis and creating tools that can be used repeatedly with different input feature classes. A useful feature of ModelBuilder is an export function, which allows modelers to create Python scripts directly from a model. This makes it easier to compare how parameters in a ModelBuilder tool are accepted as compared to how a Python script calls the same tool and supplies its parameters, and how generated feature classes are named and placed within the file structure. ModelBuilder is a helpful tool on its own, and its Python export functionality makes it easy for a GIS analyst to generate and customize ArcPy scripts. Creating a model and exporting to Python This article and the associated scripts depend on the downloadable SanFrancisco.gdb geodatabase, available from Packt. SanFrancisco.gdb contains data downloaded from https://datasf.org/ and the US Census' American Factfinder website at https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml. All census and geographic data included in the geodatabase is from the 2010 census. The data is contained within a feature dataset called SanFrancisco. The data in this feature dataset is in NAD 83 California State Plane Zone 3, and the linear unit of measure is the US foot. This corresponds to SRID 2227 in the European Petroleum Survey Group (EPSG) format. The analysis, which we will create with the model and eventually export to Python for further refinement, will use bus stops along a specific line in San Francisco. These bus stops will be buffered to create a representative region around each bus stop.
The buffered areas will then be intersected with census blocks to find out how many people live within each representative region around the bus stops. Modeling the Select and Buffer tools Using ModelBuilder, we will model the basic bus stop analysis. Once it has been modeled, it will be exported as an automatically generated Python script. Follow these steps to begin the analysis: Open up ArcCatalog, and create a folder connection to the folder containing SanFrancisco.gdb. I have put the geodatabase in a C drive folder called "Projects" for a resulting file path of C:\Projects\SanFrancisco.gdb. Right-click on the geodatabase, and add a new toolbox called Chapter2Tools. Right-click on the geodatabase; select New, and then Feature Dataset, from the menu. A dialogue will appear that asks for a name; call it Chapter2Results, and push Next. It will ask for a spatial reference system; enter 2227 into the search bar, and push the magnifying glass icon. This will locate the correct spatial reference system: NAD 1983 StatePlane California III FIPS 0403 Feet. Don't select a vertical reference system, as we are not doing any Z value analysis. Push Next, select the default tolerances, and push Finish. Next, open ModelBuilder using the ModelBuilder icon or by right-clicking on the Toolbox, and create a new Model. Save the model in the Chapter2Tools toolbox as Chapter2Model1. Drag in the Bus_Stops feature class and the Select tool from the Analysis/Extract toolset in ArcToolbox. Open up the Select tool, and name the output feature class Inbound71. Make sure that the feature class is written to the Chapter2Results feature dataset. Open up the Expression SQL Query Builder, and create the following SQL expression: NAME = '71 IB' AND BUS_SIGNAG = 'Ferry Plaza'. The next step is to add a Buffer Tool from the Analysis/Proximity toolset. The Buffer tool will be used to create buffers around each bus stop.
The buffered bus stops allow us to intersect with census data in the form of census blocks, creating the representative regions around each bus stop. Connect the output of the Select tool (Inbound71) to the Buffer tool. Open up the Buffer tool, add 400 to the Distance field, and change the units to Feet. Leave the rest of the options blank. Click on OK, and return to the model: Adding in the Intersect tool Now that we have selected the bus line of interest, and buffered the stops to create representative regions, we will need to intersect the regions with the census blocks to find the population of each representative region. This can be done as follows: First, add the CensusBlocks2010 feature class from the SanFrancisco feature dataset to the model. Next, add in the Intersect tool located in the Analysis/Overlay toolset in the ArcToolbox. While we could use a Spatial Join to achieve a similar result, I have used the Intersect tool to capture the area of intersection for use later in the model and script. At this point, our model should look like this: Tallying the analysis results After we have created this simple analysis, the next step is to determine the results for each bus stop. Finding the number of people that live in census blocks touched by the 400-foot buffer of each bus stop involves examining each row of data in the final feature class, and selecting rows that correspond to the bus stop. Once these are selected, a sum of the selected rows would be calculated either using the Field Calculator or the Summarize tool. All of these methods will work, and yet none are perfect. They take too long, and worse, are not repeatable automatically if an assumption in the model is adjusted (if the buffer is adjusted from 400 feet to 500 feet, for instance). This is where the traditional uses of ModelBuilder begin to fail analysts.
It should be easy to instruct the model to select all rows associated with each bus stop, and then generate a summed population figure for each bus stop's representative region. It would be even better to have the model create a spreadsheet to contain the final results of the analysis. It's time to use Python to take this analysis to the next level. Exporting the model and adjusting the script While modeling analysis in ModelBuilder has its drawbacks, there is one fantastic option built into ModelBuilder: the ability to create a model, and then export the model to Python. Along with the ArcGIS Help Documentation, it is the best way to discover the correct Python syntax to use when writing ArcPy scripts. Create a folder that can hold the exported scripts next to the SanFrancisco geodatabase (for example, C:\Projects\Scripts). This will hold both the exported scripts that ArcGIS automatically generates, and the versions that we will build from those generated scripts. Now, perform the following steps: Open up the model called Chapter2Model1. Click on the Model menu in the upper-left side of the screen. Select Export from the menu. Select To Python Script. Save the script as Chapter2Model1.py. Note that there is also the option to export the model as a graphic. Creating a graphic of the model is a good way to share what the model is doing with other analysts without the need to share the model and the data, and can also be useful when sharing Python scripts as well. The Automatically generated script Open the automatically generated script in an IDE.
It should look like this:

# -*- coding: utf-8 -*-
# ---------------------------------------------------------------------------
# Chapter2Model1.py
# Created on: 2017-01-26 04:26:31.00000
#   (generated by ArcGIS/ModelBuilder)
# Description:
# ---------------------------------------------------------------------------

# Import arcpy module
import arcpy

# Local variables:
Bus_Stops = "C:\\Projects\\SanFrancisco.gdb\\SanFrancisco\\Bus_Stops"
Inbound71 = "C:\\Projects\\SanFrancisco.gdb\\Chapter2Results\\Inbound71"
Inbound71_400ft_buffer = "C:\\Projects\\SanFrancisco.gdb\\Chapter2Results\\Inbound71_400ft_Buffer"
CensusBlocks2010 = "C:\\Projects\\SanFrancisco.gdb\\SanFrancisco\\CensusBlocks2010"
Intersect71Census = "C:\\Projects\\SanFrancisco.gdb\\Chapter2Results\\Intersect71Census"

# Process: Select
arcpy.Select_analysis(Bus_Stops, Inbound71, "NAME = '71 IB' AND BUS_SIGNAG = 'Ferry Plaza'")

# Process: Buffer
arcpy.Buffer_analysis(Inbound71, Inbound71_400ft_buffer, "400 Feet", "FULL", "ROUND", "NONE", "")

# Process: Intersect
arcpy.Intersect_analysis("C:\\Projects\\SanFrancisco.gdb\\Chapter2Results\\Inbound71_400ft_Buffer #;C:\\Projects\\SanFrancisco.gdb\\SanFrancisco\\CensusBlocks2010 #", Intersect71Census, "ALL", "", "INPUT")

Let's examine this script line by line. The first line is preceded by a pound sign ("#"), which again means that this line is a comment; however, unlike an ordinary comment, it is not simply ignored by the Python interpreter: it is used to help Python interpret the encoding of the script, as described here: http://legacy.python.org/dev/peps/pep-0263. The second commented line and the third line are included for decorative purposes. The next four lines, all commented, are used for providing readers information about the script: what it is called and when it was created, along with a description, which is pulled from the model's properties. Another decorative line is included to visually separate out the informative header from the body of the script.
While the commented information section is nice to include in a script for other users of the script, it is not necessary. The body of the script, or the executable portion of the script, starts with the import arcpy line. Import statements are, by convention, included at the top of the body of the script. In this instance, the only module that is being imported is ArcPy. ModelBuilder's export function creates not only an executable script, but also comments each section to help mark the different sections of the script. The comments let the user know where the variables are located, and where the ArcToolbox tools are being executed. After the import statements come the variables. In this case, the variables represent the file paths to the input and output feature classes. The variable names are derived from the names of the feature classes (the base names of the file paths). The file paths are assigned to the variables using the assignment operator ("="), and the parts of the file paths are separated by two backslashes. File paths in Python To store and retrieve data, it is important to understand how file paths are used in Python as compared to how they are represented in Windows. In Python, file paths are strings, and strings in Python have special characters used to represent tabs ("\t"), newlines ("\n"), or carriage returns ("\r"), among many others. These special characters all incorporate single backslashes, making it very hard to create a file path that uses single backslashes. File paths in Windows Explorer all use single backslashes. Windows Explorer: C:\Projects\SanFrancisco.gdb\Chapter2Results\Intersect71Census Python was developed within the Linux environment, where file paths have forward slashes. There are a number of methods used to avoid this issue. The first is using file paths with forward slashes.
The Python interpreter will understand file paths with forward slashes, as seen in this code: Python version: "C:/Projects/SanFrancisco.gdb/Chapter2Results/Intersect71Census" Within a Python script, the Python file path with the forward slashes will definitely work, while the Windows Explorer version might cause the script to throw an exception, as Python strings can have special characters like the newline character "\n", or the tab "\t", that will cause the string file path to be read incorrectly by the Python interpreter. Another method used to avoid the issue with special characters is the one employed by ModelBuilder when it automatically creates the Python scripts from a model. In this case, the backslashes are "escaped" using a second backslash. The preceding script uses this second method to produce the following results: Python escaped version: "C:\\Projects\\SanFrancisco.gdb\\Chapter2Results\\Intersect71Census" The third method, which I use when copying file paths from ArcCatalog or Windows Explorer into scripts, is to create what is known as a "raw" string. This is the same as a regular string, but it includes an "r" before the string begins. This "r" alerts the Python interpreter that the following string does not contain any special characters or escape characters. Here is an example of how it is used: Python raw string: r"C:\Projects\SanFrancisco.gdb\SanFrancisco\Bus_Stops" Using raw strings makes it easier to grab a file path from Windows Explorer, and add it to a string inside a script. It also makes it easier to avoid accidentally forgetting to include a set of double backslashes in a file path, which happens all the time and is the cause of many script bugs. String manipulation There are three major methods for inserting variables into strings. Each has different advantages and disadvantages of a technical nature. It's good to know about all three, as they have uses beyond our needs here, so let's review them.
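Before moving on to those methods, the path-spelling rules described above can be verified in a short, standalone sketch. The path used here is made up for illustration; it is not the article's geodatabase:

```python
# "\t" inside an ordinary string literal becomes a tab character,
# silently corrupting a path such as C:\temp
bad = "C:\temp"    # 6 characters: 'C', ':', TAB, 'e', 'm', 'p'
raw = r"C:\temp"   # 7 characters: the backslash survives

# Three safe spellings of a Windows path (hypothetical path):
forward = "C:/Projects/Data"     # forward slashes
escaped = "C:\\Projects\\Data"   # escaped backslashes
raw_str = r"C:\Projects\Data"    # raw string

# The escaped and raw spellings produce the identical string in memory
same = (escaped == raw_str)
```

Printing `bad` would show a literal tab in the middle of the path, which is exactly the kind of bug the escaped and raw forms avoid.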
String manipulation method 1: string addition String addition seems like an odd concept at first, as it would not seem possible to "add" strings together, unlike integers or floats, which are numbers. However, within Python and other programming languages, this is a normal step. Using the plus sign ("+"), strings are "added" together to make longer strings, or to allow variables to be added into the middle of existing strings. Here are some examples of this process:

>>> aString = "This is a string"
>>> bString = " and this is another string"
>>> cString = aString + bString
>>> cString

The output is as follows:

'This is a string and this is another string'

Two or more strings can be "added" together, and the result can be assigned to a third variable for use later in the script. This process can be useful for data processing and formatting. Another similar offshoot of string addition is string multiplication, where strings are multiplied by an integer to produce repeating versions of the string, like this:

>>> "string" * 3
'stringstringstring'

String manipulation method 2: string formatting #1 The second method of string manipulation, known as string formatting, involves adding placeholders into the string, which accept specific kinds of data. This means that these special strings can accept other strings as well as integers and float values. These placeholders use the modulo "%" and a key letter to indicate the type of data to expect. Strings are represented using %s, floats using %f, and integers using %d. The floats can also be adjusted to limit the digits included by adding a modifying number after the modulo. If there is more than one placeholder in a string, the values are passed to the string in a tuple. This method has become less popular, since the third method discussed next was introduced in Python 2.6, but it is still valuable to know, as many older scripts use it.
Here is an example of this method:

>>> origString = "This string has as a placeholder %s"
>>> newString = origString % "and this text was added"
>>> print newString

The output is as follows:

This string has as a placeholder and this text was added

Here is an example when using a float placeholder:

>>> floatString1 = "This string has a float here: %f"
>>> newString = floatString1 % 1.0
>>> print newString

The output is as follows:

This string has a float here: 1.000000

Here is another example when using a float placeholder:

>>> floatString2 = "This string has a float here: %.1f"
>>> newString2 = floatString2 % 1.0
>>> print newString2

The output is as follows:

This string has a float here: 1.0

Here is an example using an integer placeholder:

>>> intString = "Here is an integer: %d"
>>> newString = intString % 1
>>> print newString

The output is as follows:

Here is an integer: 1

String manipulation method 3: string formatting #2 The final method is known as string formatting. It is similar to the first string formatting method, with the added benefit of not requiring a specific data type of placeholder. The placeholders, or tokens as they are also known, only require that their values are supplied in order. The format function is built into strings; by adding .format to the string, and passing in parameters, the string accepts the values, as seen in the following example:

>>> formatString = "This string has 3 tokens: {0}, {1}, {2}"
>>> newString = formatString.format("String", 2.5, 4)
>>> print newString
This string has 3 tokens: String, 2.5, 4

The tokens don't have to be in order within the string, and can even be repeated by adding a token wherever it is needed within the template. The order of the values applied to the template is derived from the parameters supplied to the .format function, which passes the values to the string.
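The out-of-order and repeated tokens described above, and the type flexibility of .format() compared with the modulo placeholders, can be sketched as follows (standalone examples, not from the original article):

```python
# Tokens are matched to .format() arguments by index, so they can
# appear out of order and can repeat within the template
template = "{1} then {0}, and {0} again"
result = template.format("A", "B")    # "B then A, and A again"

# .format() accepts any type for any token, whereas a %d placeholder
# raises a TypeError if it is handed a string instead of a number
mixed = "{0}, {1}, {2}".format("String", 2.5, 4)
try:
    "%d" % "not a number"
    modulo_raised = False
except TypeError:
    modulo_raised = True
```

Because the tokens are resolved by index rather than by declared type, refactoring a template rarely requires touching the values passed to .format().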
The third method has become my go-to method for string manipulation because of the ability to add the values repeatedly, and because it makes it possible to avoid supplying the wrong type of data to a specific placeholder, unlike the second method. The ArcPy tools After the import statements and the variable definitions, the next section of the script is where the analysis is executed. The same tools that we created in the model--the Select, Buffer, and Intersect tools--are included in this section. The same parameters that we supplied in the model are also included here: the inputs and outputs, plus the SQL statement in the Select tool, and the buffer distance in the Buffer tool. The tool parameters are supplied to the tools in the script in the same order as they appear in the tool interfaces in the model. Here is the Select tool in the script:

arcpy.Select_analysis(Bus_Stops, Inbound71, "NAME = '71 IB' AND BUS_SIGNAG = 'Ferry Plaza'")

It works like this: the arcpy module has a "method", or tool, called Select_analysis. This method, when called, requires three parameters: the input feature class (or shapefile), the output feature class, and the SQL statement. In this example, the input is represented by the variable Bus_Stops, and the output feature class is represented by the variable Inbound71, both of which are defined in the variable section. The SQL statement is included as the third parameter. Note that it could also be represented by a variable if the variable was defined prior to this line; the SQL statement, as a string, could be assigned to a variable, and the variable could replace the SQL statement as the third parameter.
Here is an example of parameter replacement using a variable:

sqlStatement = "NAME = '71 IB' AND BUS_SIGNAG = 'Ferry Plaza'"
arcpy.Select_analysis(Bus_Stops, Inbound71, sqlStatement)

While ModelBuilder is good for assigning input and output feature classes to variables, it does not assign variables to every portion of the parameters. This will be an important thing to correct when we adjust and build our own scripts. The Buffer tool accepts a similar set of parameters as the Select tool. There is an input feature class represented by a variable, an output feature class variable, and the distance that we provided (400 feet in this case), along with a series of parameters that were supplied by default. Note that the parameters rely on keywords, and these keywords can be adjusted within the text of the script to adjust the resulting buffer output. For instance, "Feet" could be adjusted to "Meters", and the buffer would be much larger. Check the help section of the tool to understand better how the other parameters will affect the buffer, and to find the keyword arguments that are accepted by the Buffer tool in ArcPy. Also, as noted earlier, all of the parameters could be assigned to variables, which can save time if the same parameters are used repeatedly throughout a script. Sometimes, the supplied parameter is merely an empty string, as in this case here with the last parameter:

arcpy.Buffer_analysis(Inbound71, Inbound71_400ft_buffer, "400 Feet", "FULL", "ROUND", "NONE", "")

The empty string for the last parameter, which, in this case, signifies that there is no dissolve field for this buffer, is found quite frequently within ArcPy. It could also be represented by two single quotes, but ModelBuilder has been built to use double quotes to encase strings. The Intersect tool The last tool, the Intersect tool, uses a different method to represent the files that need to be intersected together when the tool is executed.
Because the tool accepts multiple files in the input section (meaning, there is no limit to the number of files that can be intersected together in one operation), it stores all of the file paths within one string. This string can be manipulated using one of the string manipulation methods discussed earlier, or it can be reorganized as a Python list that contains the file paths (or variables representing file paths), passed as the first parameter in any order. The Intersect tool will find the intersection of all of the inputs. Adjusting the script Now is the time to take the automatically generated script, and adjust it to fit our needs. We want the script to both produce the output data, and to have it analyze the data and tally the results into a spreadsheet. This spreadsheet will hold an averaged population value for each bus stop. The average will be derived from each census block that the buffered representative region surrounding the stops intersected. Save the original script as "Chapter2Model1Modified.py". Adding the CSV module to the script For this script, we will use the csv module, a useful module for creating Comma-Separated Value spreadsheets. Its simple syntax will make it a useful tool for creating script outputs. ArcGIS for Desktop also installs the xlrd and xlwt modules, used to read or generate Excel spreadsheets respectively, when it is installed. These modules are also great for data analysis output. After the import arcpy line, add import csv. This will allow us to use the csv module for creating the spreadsheet.

# Import arcpy module
import arcpy
import csv

The next adjustment is made to the Intersect tool. Notice that the two paths included in the input string are also defined as variables in the variable section.
Remove the file paths from the input strings, and replace them with a list containing the variable names of the input datasets, as follows:

# Process: Intersect
arcpy.Intersect_analysis([Inbound71_400ft_buffer, CensusBlocks2010], Intersect71Census, "ALL", "", "INPUT")

Accessing the data: using a cursor Now that the script is in place to generate the raw data we need, we need a way to access the data held in the output feature class from the Intersect tool. This access will allow us to aggregate the rows of data representing each bus stop. We also need a data container to hold the aggregated data in memory before it is written to the spreadsheet. To accomplish the second part, we will use a Python dictionary. To accomplish the first part, we will use a method built into the ArcPy module: the Data Access SearchCursor. The Python dictionary will be added after the Intersect tool. A dictionary in Python is created using curly brackets {}. Add the following line to the script, below the analysis section:

dataDictionary = {}

This script will use the bus stop IDs as keys for the dictionary. The values will be lists, which will hold all of the population values associated with each busStopID. Add the following lines to generate a Search Cursor:

with arcpy.da.SearchCursor(Intersect71Census, ["STOPID", "POP10"]) as cursor:
    for row in cursor:
        busStopID = row[0]
        pop10 = row[1]
        if busStopID not in dataDictionary.keys():
            dataDictionary[busStopID] = [pop10]
        else:
            dataDictionary[busStopID].append(pop10)

This iteration combines a few ideas in Python and ArcPy. The with...as statement is used to create a variable (cursor), which represents the arcpy.da.SearchCursor object. It could also be written like this:

cursor = arcpy.da.SearchCursor(Intersect71Census, ["STOPID", "POP10"])

The advantage of the with...as structure is that the cursor object is erased from memory when the iteration is completed, which eliminates locks on the feature classes being evaluated.
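The resource-release behavior of with...as can be illustrated without ArcPy by a minimal, hypothetical context manager; nothing here is part of the arcpy API:

```python
# A tiny context manager that records when it has been released,
# mimicking how the SearchCursor drops its lock when the block ends
class TrackedResource(object):
    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.closed = True   # runs automatically on leaving the block
        return False         # do not suppress exceptions

with TrackedResource() as resource:
    open_inside = resource.closed   # still False inside the block

open_after = resource.closed        # True: __exit__ ran on the way out
```

The __exit__ method runs even if the block raises an exception, which is why the with...as form is safer than creating the cursor directly and hoping it gets cleaned up.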
The arcpy.da.SearchCursor function requires an input feature class, and a list of fields to be returned. Optionally, an SQL statement can limit the number of rows returned. The next line, for row in cursor:, is the iteration through the data. It is not a normal Pythonic iteration, a distinction that will have ramifications in certain instances. For instance, one cannot pass index parameters to the cursor object to only evaluate specific rows within the cursor object, as one can do with a list.  When using a Search Cursor, each row of data is returned as a tuple, which cannot be modified. The data can be accessed using indexes. The if...else condition allows the data to be sorted. As noted earlier, the bus stop ID, which is the first member of the data included in the tuple, will be used as a key. The conditional evaluates if the bus stop ID is included in the dictionary's existing keys (which are contained in a list, and accessed using the dictionary.keys() method). If it is not, it is added to the keys, and assigned a value that is a list that contains (at first) one piece of data, the population value contained in that row. If it does exist in the keys, the list is appended with the next population value associated with that bus stop ID. With this code, we have now sorted each census block population according to the bus stop with which it is associated. Next we need to add code to create the spreadsheet. This code will use the same with...as structure, and will generate an average population value by using two built-in Python functions: sum, which creates a sum from a list of numbers, and len, which will get the length of a list, tuple, or string. 
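Because the grouping and averaging logic is plain Python, it can be rehearsed with invented (busStopID, POP10) tuples standing in for the cursor rows. The stop IDs and populations below are made up, and the sketch uses Python 3 division, which returns a float; the article's Python 2 script would truncate the same division to an integer unless one operand is converted with float():

```python
import csv
import io

# Made-up rows standing in for the SearchCursor output
rows = [(1001, 100), (1001, 200), (1002, 50), (1002, 150), (1002, 100)]

# Group the population values by bus stop ID, as in the script
dataDictionary = {}
for busStopID, pop10 in rows:
    if busStopID not in dataDictionary.keys():
        dataDictionary[busStopID] = [pop10]
    else:
        dataDictionary[busStopID].append(pop10)

# Average each stop's list with sum() and len(), writing one CSV row
# per stop to an in-memory buffer instead of a file on disk
output = io.StringIO()
csvwriter = csv.writer(output)
for busStopID in sorted(dataDictionary.keys()):
    popList = dataDictionary[busStopID]
    averagePop = sum(popList) / len(popList)
    csvwriter.writerow([busStopID, averagePop])
```

Swapping the invented rows for the real cursor output, and the StringIO buffer for a file opened with open(), gives exactly the structure used in the script that follows.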
with open(r'C:\Projects\Averages.csv', 'wb') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',')
    for busStopID in dataDictionary.keys():
        popList = dataDictionary[busStopID]
        averagePop = sum(popList) / len(popList)
        data = [busStopID, averagePop]
        csvwriter.writerow(data)

The population list is retrieved from the dictionary using the busStopID key, and the average is then computed and assigned to the variable averagePop. The two pieces of data, the busStopID and the averagePop variable, are then added to a list. This list is supplied to a csvwriter object, which knows how to accept the data and write it out to a file located at the file path supplied to the built-in Python function open, used to create simple files. The script is complete, although it is nice to add one more line to the end to give us visual confirmation that the script has run:

print "Data Analysis Complete"

This last line will create an output indicating that the script has run. Once it is done, go to the location of the output CSV file and open it using Excel or Notepad, and see the results of the analysis. Our first script is complete! Exceptions and tracebacks During the process of writing and testing scripts, there will be errors that cause the code to break and throw exceptions. In Python, these are reported as a "traceback", which shows the last few lines of code executed before an exception occurred. To best understand the message, read it from the last line up. It will tell you the type of exception that occurred, and above that will be the code that failed, with a line number that should allow you to find and fix the code. It's not perfect, but it works. Overwriting files One common issue is that ArcGIS for Desktop does not allow you to overwrite files without turning on an environment variable. To avoid this issue, you can add a line after the import statements that will make overwriting files possible. Be aware that the original data will be unrecoverable once it is overwritten.
It uses the env module to access the ArcGIS environment:

import arcpy
arcpy.env.overwriteOutput = True

The final script

Here is how the script should look in the end:

# Chapter2Model1Modified.py
# Import arcpy module
import arcpy
import csv

# Local variables:
Bus_Stops = r"C:\Projects\SanFrancisco.gdb\SanFrancisco\Bus_Stops"
CensusBlocks2010 = r"C:\Projects\SanFrancisco.gdb\SanFrancisco\CensusBlocks2010"
Inbound71 = r"C:\Projects\SanFrancisco.gdb\Chapter2Results\Inbound71"
Inbound71_400ft_buffer = r"C:\Projects\SanFrancisco.gdb\Chapter2Results\Inbound71_400ft_buffer"
Intersect71Census = r"C:\Projects\SanFrancisco.gdb\Chapter2Results\Intersect71Census"

# Process: Select
arcpy.Select_analysis(Bus_Stops, Inbound71, "NAME = '71 IB' AND BUS_SIGNAG = 'Ferry Plaza'")

# Process: Buffer
arcpy.Buffer_analysis(Inbound71, Inbound71_400ft_buffer, "400 Feet", "FULL", "ROUND", "NONE", "")

# Process: Intersect
arcpy.Intersect_analysis([Inbound71_400ft_buffer, CensusBlocks2010], Intersect71Census, "ALL", "", "INPUT")

dataDictionary = {}
with arcpy.da.SearchCursor(Intersect71Census, ["STOPID", "POP10"]) as cursor:
    for row in cursor:
        busStopID = row[0]
        pop10 = row[1]
        if busStopID not in dataDictionary.keys():
            dataDictionary[busStopID] = [pop10]
        else:
            dataDictionary[busStopID].append(pop10)

with open(r'C:\Projects\Averages.csv', 'wb') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',')
    for busStopID in dataDictionary.keys():
        popList = dataDictionary[busStopID]
        averagePop = sum(popList)/len(popList)
        data = [busStopID, averagePop]
        csvwriter.writerow(data)

print "Data Analysis Complete"

Summary

In this article, you learned how to craft a model of an analysis and export it out to a script. In particular, you learned how to use ModelBuilder to create an analysis and export it out as a script, and how to adjust the script to be more "Pythonic". After explaining the auto-generated script, we adjusted it to include a results analysis and summation, which was output to a CSV file.
We also briefly touched on the use of Search Cursors. Also, we saw how built-in modules such as the csv module can be used along with ArcPy to capture analysis output in formatted spreadsheets.

Resources for Article: Further resources on this subject: Using the ArcPy Data Access Module with Feature Classes and Tables [article] Measuring Geographic Distributions with ArcGIS Tool [article] Learning to Create and Edit Data in ArcGIS [article]

Packt
09 Aug 2017
3 min read

Games and Exercises

In this article by Shishira Bhat and Ravi Wray, authors of the book Learn Java in 7 days, we will study the following concepts:

Making an object the return type for a method
Making an object the parameter for a method

(For more resources related to this topic, see here.)

Let's start this article by revisiting the reference variables and custom data types:

In the preceding program, p is a variable of datatype Pen. Yes! Pen is a class, but it is also a datatype, a custom datatype. The p variable stores the address of the Pen object, which is in heap memory. The p variable is a reference that refers to a Pen object. Now, let's get more comfortable by understanding and working with examples.

How to return an Object from a method?

In this section, let's understand return types. In the following code, methods return the inbuilt data types (int and String), and the reason is explained after each method, as follows:

int add ()
{
    int res = (20+50);
    return res;
}

The add method returns the res (70) variable, which is of the int type. Hence, the return type must be int:

String sendsms ()
{
    String msg = "hello";
    return msg;
}

The sendsms method returns a variable by the name of msg, which is of the String type. Hence, the return type is String. The data type of the returning value and the return type must be the same. In the following code snippet, the return type of the givePen method is not an inbuilt data type. However, the return type is a class (Pen). Let's understand the following code:

The givePen () method returns a variable (reference variable) by the name of p, which is of the Pen type. Hence, the return type is Pen:

In the preceding program, tk is a variable of the Ticket type. The method returns tk; hence, the return type of the method is Ticket.

A method accepting an object (parameter)

After seeing how a method can return an object/reference, let's understand how a method can take an object/reference as the input, that is, a parameter.
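Both ideas, returning an object from a method and accepting one as a parameter, can be sketched together in a single self-contained example. The Pen, Shop, and Kid classes below are minimal stand-ins for the classes pictured in the book's figures:

```java
class Pen {
    String color;

    Pen(String color) {
        this.color = color;
    }
}

class Shop {
    // The return type is Pen, so the method must return a Pen reference.
    static Pen givePen() {
        Pen p = new Pen("blue");
        return p;
    }
}

class Kid {
    // The parameter type is Pen, so the argument must be a Pen object.
    void write(Pen p) {
        System.out.println("Writing with a " + p.color + " pen");
    }
}

public class Main {
    public static void main(String[] args) {
        Pen p = Shop.givePen();          // an object returned from a method
        new Kid().write(p);              // an object passed as an argument
        new Kid().write(new Pen("red")); // a new object created at the call site
    }
}
```

Running main prints "Writing with a blue pen" and then "Writing with a red pen", showing both directions of object flow at once.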
We already understood that if a method takes parameter(s), then we need to pass argument(s).

Example

In the preceding program, the method takes two parameters, i and k. So, while calling/invoking the method, we need to pass two arguments, which are 20.5 and 15. The parameter type and the argument type must be the same. Remember that when a class is the datatype, then an object is the data. Consider the following example with respect to a non-primitive/class data type and the object as its data:

In the preceding code, the Kid class has the eat method, which takes ch as a parameter of the Chocolate type, that is, the data type of ch is Chocolate, which is a class. When a class is the data type, then an object of that class is the actual data or argument. Hence, new Chocolate() is passed as an argument to the eat method. Let's see one more example:

The drink method takes wtr as a parameter of the type Water, which is a class/non-primitive type; hence, the argument must be an object of the Water class.

Summary

In this article we learned what to return when a class is the return type for a method, and what to pass as an argument for a method when a class is a parameter for the method.

Resources for Article: Further resources on this subject: Saying Hello to Java EE [article] Getting Started with Sorting Algorithms in Java [article] Debugging Java Programs using JDB [article]

Packt
08 Aug 2017
13 min read

Developing Games Using AI

In this article by Micael DaGraça, the author of the book Practical Game AI Programming, we will cover the following points to introduce you to AI programming for games:

A brief history of and solutions to game AI
Enemy AI in video games
From simple to smart and human-like AI
Visual and Audio Awareness

A brief history of and solutions to game AI

To better understand how to overcome the problems that game developers currently face, we need to dig a little into the history of video game development and look at the problems and solutions that were so important at the time. Some of them were so avant-garde that they actually changed the history of video game design itself, and we still use the same methods today to create unique and enjoyable games.

One of the first milestones always worth mentioning when talking about game AI is the chess computer programmed to compete against humans. Chess was the perfect game to start experimenting with artificial intelligence, because it requires a lot of thought and planning ahead, something a computer couldn't do at the time because human features seemed necessary to successfully play and win the game. So the first step was to make the computer able to process the game rules and think for itself, in order to make a good judgment about the next move it should make to reach the final goal: winning by checkmate. The problem is that chess has many possibilities, so even if the computer had a perfect strategy to beat the game, it was necessary to recalculate that strategy, adapting it, changing it, or even creating a new one every time something went wrong with the first strategy. Humans can play differently every time, which makes it a huge task for programmers to input all the possibility data into the computer to win the game.
So writing out every possibility that could exist wasn't a viable solution, and because of that programmers needed to think about the problem again. Eventually they came across a better solution: making the computer decide for itself every turn, choosing the most plausible option for each turn; that way the computer could adapt to any possibility of the game. Yet this involved another problem: the computer would only think in the short term, not creating any plans to defeat the human in future moves, so it was easy to play against it, but at least we started to have something going on. It would take decades until someone defined the term "Artificial Intelligence" by solving the first problem many researchers had faced when trying to create a computer capable of defeating a human player. Arthur Samuel is the person responsible for creating a computer that could learn for itself and memorize all the possible combinations. That way, no human intervention was necessary, and the computer could actually think on its own; that was a huge step, one that is still impressive even by today's standards.

Enemy AI in video games

Now let's move to the video game industry and analyze how the first enemies and game obstacles were programmed. Was it that different from what we are doing now? Let's find out. Single-player games with AI enemies started to appear in the 70's, and soon some games started to elevate the quality and expectations of what defines a video game AI. Some of those examples were released for arcade machines, like Speed Race from Taito (a racing video game) and Qwak (duck hunting using a light gun) or Pursuit (aircraft fighter), both from Atari. Other notable examples are the text-based games released for the first personal computers, like Hunt the Wumpus and Star Trek, which also had enemies.
What made those games so enjoyable was precisely the AI enemies, which didn't react like any before them because they mixed random elements with the traditional stored patterns, making them unpredictable and a unique experience every time you played the game. That was only possible due to the incorporation of microprocessors, which expanded the capabilities of a programmer at that time. Space Invaders brought movement patterns, Galaxian improved on them and added more variety, making the AI even more complex, and Pac-Man later brought movement patterns to the maze genre. The influence of the AI design in Pac-Man is just as significant as the influence of the game itself. This classic arcade game makes the player believe that the enemies in the game are chasing him, but not in a crude manner. The ghosts chase (or evade) the player in different ways, as if they had individual personalities. This gives people the illusion that they are actually playing against four or five individual ghosts rather than copies of the same computer enemy. After that, Karate Champ introduced the first AI fighting character, Dragon Quest introduced a tactical system for the RPG genre, and over the years the list of games that explored artificial intelligence and used it to create unique game concepts kept expanding. All of that came from a single question: how can we make a computer capable of beating a human at a game? All the games mentioned above belong to different genres and are unique in their style, but all of them used the same method for their AI, called the Finite State Machine. Here the programmer inputs all the behaviors necessary for the computer to challenge the player, just like the first computer that played chess.
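A Finite State Machine of the kind described here can be sketched in a few lines. The states and transition rules below are invented for illustration, loosely inspired by ghost-like enemy behavior, and are not taken from any real game:

```python
# A minimal Finite State Machine for a ghost-like enemy.
# States and transition rules are illustrative, not from any real game.
TRANSITIONS = {
    ("patrol", "player_seen"):   "chase",
    ("chase", "player_lost"):    "patrol",
    ("chase", "power_pellet"):   "flee",
    ("flee",  "pellet_expired"): "chase",
}

def step(state, event):
    """Return the next state, or stay in the current state if no rule matches."""
    return TRANSITIONS.get((state, event), state)

state = "patrol"
for event in ["player_seen", "power_pellet", "pellet_expired", "player_lost"]:
    state = step(state, event)
    print(event, "->", state)
```

The whole behavior lives in the transition table: the programmer decides in advance exactly how the character reacts to each event, which is both the strength and the limitation of the approach discussed in this article.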
The programmer defines exactly how the computer should behave on different occasions in order to move, avoid, attack, or perform any other behavior that challenges the player, and that method is used even in the latest big-budget games of today.

From simple to smart and human-like AI

Programmers face many challenges while developing an AI character, but one of the greatest is adapting the AI's movement and behavior in relation to what the player is currently doing, or will do in future actions. The difficulty exists because the AI is programmed with predetermined states, using probability or possibility maps in order to adapt its movement and behavior according to the player. This technique can become very complex if the programmer extends the possibilities of the AI's decisions, just like the chess machine that holds all the possible situations that may occur in the game. It's a huge task for the programmer, because it's necessary to determine what the player can do and how the AI will react to each action of the player, and that takes a lot of CPU power. To overcome that challenge, programmers started to mix possibility maps with probabilities and other techniques that let the AI decide for itself how it should react according to the player's actions. These factors are important to consider while developing an AI that elevates the game's quality, as we are about to discover. Games kept evolving and players got even more demanding, not only about visual quality but also about the capabilities of AI enemies and allied characters. To deliver new games that took player expectations into consideration, programmers started to write even more states for each character: creating new possibilities, more engaging enemies, important allied characters, more things for the player to do, and a lot more features that helped redefine different genres and create new ones.
Of course, this was also possible because the technology kept improving, allowing developers to explore artificial intelligence in video games even further. A great example worth mentioning is Metal Gear Solid. The game brought a new genre to the video game industry by implementing stealth elements instead of the popular straightforward action and shooting, but those elements couldn't be fully explored as Hideo Kojima intended because of the hardware limitations at the time. Jumping forward from the 3rd to the 5th generation of consoles, Konami and Hideo Kojima presented the same title, but this time with a lot more interactions, possibilities, and behaviors from the AI elements of the game, making it so successful and important in video game history that it's easy to see its influence in a large number of games that came after Metal Gear Solid.

Metal Gear Solid - Sony PlayStation 1

Visual and Audio Awareness

The game in the above screenshot implemented visual and audio awareness for the enemy. This feature established the genre that we know today as the stealth game. The game uses pathfinding and Finite State Machines, features that date from the beginning of the video game industry, but in order to create something new the developers also added features such as interaction with the environment, navigation behavior, visual/audio awareness, and AI interaction. These were things that didn't exist at the time but that are widely used today, even in different game genres such as sports, racing, fighting, or FPS games. After that huge step for game design, developers still faced other problems; or should I say, these new possibilities brought even more problems, because the result was not perfect. The AI still didn't react like a real person, and many other elements needed to be implemented, not only in stealth games but in all other genres, and one genre in particular needed to improve its AI to make the game feel realistic.
We are talking about sports games, especially those that tried to simulate real-world team behaviors, such as basketball or football. If we think about it, the interaction with the player is not the only thing we need to care about; we left chess behind a long time ago, where it was 1 vs 1. Now we want more, and, watching other games get realistic AI behaviors, sports fanatics started to ask for those same features in their favorite games. After all, those games were based on real-world events, and for that reason the AI should react as realistically as possible. At this point, developers and game designers started to take into consideration the AI's interaction with itself, and just like the enemies in Pac-Man, the player should get the impression that each character in the game thinks for itself and reacts differently from the others. If we analyze it closely, the AI present in a sports game is structured much like that of an FPS or RTS game, using different animation states, general movements, interactions, individual decisions, and finally tactical and collective decisions. So it shouldn't be a surprise that sports games could reach the same level of realism as the other genres that greatly evolved in terms of AI development. However, there were a few problems that only sports games had at the time, chief among them how to make so many characters on the same screen react differently while working together to achieve the same objective. With this problem in mind, developers started to improve the individual behaviors of each character, not only for the AI playing against the player, but also for the AI playing alongside the player. Once again, Finite State Machines formed a crucial part of the artificial intelligence, but the special touch that helped create a realistic approach in the sports genre was the anticipation and awareness used in stealth games.
The computer needed to calculate what the player was doing and where the ball was going, and coordinate all of that while giving the impression of a team mindset working towards the same plan. Combining the new features used in the stealth genre with a vast number of characters on the same screen, it was possible to innovate the sports genre by creating the sports-simulation type of game that has gained so much popularity over the years. This helps us understand that we can use almost the same methods for any type of game, even if the games look completely different; the core principles that we saw in the computer that played chess are still valuable in a sports game released 30 years later. Let's move on to our last example, which also has great value in terms of how an AI character should behave to feel more realistic: the game F.E.A.R., developed by Monolith Productions. What made this game so special in terms of artificial intelligence was the dialogue between the enemy characters. While it wasn't an improvement from a technical point of view, it was definitely something that helped showcase all of the development work that was put into the characters' AI, and this is crucial because, if the AI doesn't say it, it didn't happen. This is an important factor to take into consideration while creating a realistic AI character: giving the illusion that it's real, the impression that the computer reacts like a human; humans interact, so the AI should do the same. Not only does the dialogue help to create a human-like atmosphere, it also helps to surface all of the development put into the character that the player otherwise wouldn't notice was there. When the AI detects the player for the first time, it shouts that it found him; when the AI loses sight of the player, it also expresses that.
When a squad of AIs is trying to find the player or ambush him, they talk about it, leaving the player imagining that the enemy is really capable of thinking and planning against him. Why is this so important? Because if the characters were only numbers and mathematical equations, they would react that way, without any human features, just math. To make an AI look more human, it's necessary to put mistakes, errors, and dialogue into the character's AI, just to distract the player from the fact that he's playing against a machine. The history of video game artificial intelligence is still far from perfect, and it's possible that it will take decades to improve even a little on what we have achieved from the early 50's until the present day, so don't be afraid of exploring what you are about to learn. Combine, change, or delete some of the things to find different results, because great games did it in the past and had a lot of success with it.

Summary

In this article we learned about the impact of AI in video game history: how everything started from a simple idea to have a computer compete against humans in traditional games, and how that naturally evolved into the world of video games. We also learned about the challenges and difficulties that have been present since day one, and how, coincidentally, programmers kept facing and still face the same problems.

Packt
21 Jul 2017
8 min read

Creating and Calling Subroutines

In this article by James Kent Lewis, author of the book Linux Shell Scripting Bootcamp, we will learn how to create and call subroutines in a script. The topics covered in this article are as follows:

Show subroutines that take parameters
Mention return codes again and how they work in scripts
How to use subroutines

First, let's start with a selection of simple but powerful scripts. These are mainly shown to give the reader an idea of just what can be done quickly with a script.

(For more resources related to this topic, see here.)

Clearing the screen

The tput clear terminal command can be used to clear the current command-line session. You could type tput clear all the time, but wouldn't just cls be nicer? Here's a simple script that clears the current screen:

Chapter 4 - Script 1

#!/bin/sh
#
# 5/9/2017
#
tput clear

Notice that this was so simple that I didn't even bother to include a Usage message or return code. Remember, to make this a command on your system, do this:

cd $HOME/bin
Create/edit a file named cls
Copy and paste the preceding code into this file
Save the file
Run chmod 755 cls

You can now type cls from any terminal (under that user) and your screen will clear. Try it.

File redirection

At this point, we need to go over file redirection. This is the ability to have the output from a command or script be copied into a file instead of going to the screen. This is done using the redirection operator, which is really just the greater-than sign. These commands were run on my system and here is the output:

Command piping

Now, let's look at command piping, which is the ability to run a command and have its output serve as the input to another command. Suppose a program or script named loop1 is running on your system and you want to know its PID. You could run the ps auxw command to a file, and then grep the file for loop1. Alternatively, you could do it in one step using a pipe as follows:

Pretty cool, right?
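The pipe just described can be sketched in one line; loop1 is the hypothetical program name from the text, and the second pipeline is a stand-in you can run on any system:

```shell
# Find the PID of a process named loop1 in one step, no temporary file.
# (grep -v grep filters out the grep process itself.)
ps auxw | grep loop1 | grep -v grep | awk '{print $2}'

# The same idea with commands that work anywhere:
# split a line into words, then count them (prints 3).
echo "one two three" | tr ' ' '\n' | wc -l
```

Each `|` hands the previous command's standard output to the next command's standard input, which is exactly the behavior the redirection operator provides for files.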
This is a very powerful feature in a Linux system and is used extensively. We will be seeing a lot more of this soon. The next section shows another very short script using some command piping. This clears the screen and then shows only the first 10 lines from dmesg.

Chapter 4 - Script 2

#!/bin/sh
#
# 5/9/2017
#
tput clear
dmesg | head

The next section shows file redirection.

Chapter 4 - Script 3

#!/bin/sh
#
# 5/9/2017
#
FN=/tmp/dmesg.txt
dmesg > $FN
echo "File $FN created."

Try it on your system! This shows how easy it is to create a script to perform commands that you would normally type on the command line. Also, notice the use of the FN variable. If you want to use a different filename later, you only have to make the change in one place.

Subroutines

Now, let's really get into subroutines. To do this, we will use more of the tput commands:

tput cup <row> <col>        # puts the cursor at row, col
tput cup 0 0                # puts the cursor at the upper left hand side of the terminal
tput cup $LINES $COLUMNS    # puts the cursor at the bottom right hand side
tput clear                  # clears the terminal screen
tput smso                   # bolds the text that follows
tput rmso                   # un-bolds the text that follows

The script is shown in the next section. This was mainly written to show the concept of subroutines; however, it can also be used as a guide on writing interactive tools.
Chapter 4 - Script 4

#!/bin/sh
#
# 5/9/2017
#
echo "script4 - Linux Scripting Book"

# Subroutines
cls()
{
 tput clear
 return 0
}

home()
{
 tput cup 0 0
 return 0
}

end()
{
 let x=$COLUMNS-1
 tput cup $LINES $x
 echo -n "X"          # no newline or else will scroll
}

bold()
{
 tput smso
}

unbold()
{
 tput rmso
}

underline()
{
 tput smul
}

normalline()
{
 tput rmul
}

# Code starts here
rc=0                  # return code
if [ $# -ne 1 ] ; then
 echo "Usage: script4 parameter"
 echo "Where parameter can be: "
 echo " home      - put an X at the home position"
 echo " cls       - clear the terminal screen"
 echo " end       - put an X at the last screen position"
 echo " bold      - bold the following output"
 echo " underline - underline the following output"
 exit 255
fi

parm=$1               # main parameter 1

if [ "$parm" = "home" ] ; then
 echo "Calling subroutine home."
 home
 echo -n "X"
elif [ "$parm" = "cls" ] ; then
 cls
elif [ "$parm" = "end" ] ; then
 echo "Calling subroutine end."
 end
elif [ "$parm" = "bold" ] ; then
 echo "Calling subroutine bold."
 bold
 echo "After calling subroutine bold."
 unbold
 echo "After calling subroutine unbold."
elif [ "$parm" = "underline" ] ; then
 echo "Calling subroutine underline."
 underline
 echo "After subroutine underline."
 normalline
 echo "After subroutine normalline."
else
 echo "Unknown parameter: $parm"
 rc=1
fi
exit $rc

The following is the output:

Try this on your system. If you run it with the home parameter, it might look a little strange to you. The code puts a capital X at the home position (0,0), and this causes the prompt to print one character over. Nothing is wrong here; it just looks a little weird. Don't worry if this still doesn't make sense to you, just go ahead and look at Script 5.

Using parameters

Okay, let's add some routines to this script to show how to use parameters with a subroutine. In order to make the output look better, the cls routine is called first to clear the screen, which is shown in the next section.
Chapter 4 - Script 5

#!/bin/sh
#
# 5/9/2017
#
echo "script5 - Linux Scripting Book"

# Subroutines
cls()
{
 tput clear
 return 0
}

home()
{
 tput cup 0 0
 return 0
}

end()
{
 let x=$COLUMNS-1
 tput cup $LINES $x
 echo -n "X"          # no newline or else will scroll
}

bold()
{
 tput smso
}

unbold()
{
 tput rmso
}

underline()
{
 tput smul
}

normalline()
{
 tput rmul
}

move()                # move cursor to row, col
{
 tput cup $1 $2
}

movestr()             # move cursor to row, col
{
 tput cup $1 $2
 echo $3
}

# Code starts here
cls                   # clear the screen to make the output look better
rc=0                  # return code
if [ $# -ne 1 ] ; then
 echo "Usage: script5 parameter"
 echo "Where parameter can be: "
 echo " home      - put an X at the home position"
 echo " cls       - clear the terminal screen"
 echo " end       - put an X at the last screen position"
 echo " bold      - bold the following output"
 echo " underline - underline the following output"
 echo " move      - move cursor to row,col"
 echo " movestr   - move cursor to row,col and output string"
 exit 255
fi

parm=$1               # main parameter 1

if [ "$parm" = "home" ] ; then
 home
 echo -n "X"
elif [ "$parm" = "cls" ] ; then
 cls
elif [ "$parm" = "end" ] ; then
 move 0 0
 echo "Calling subroutine end."
 end
elif [ "$parm" = "bold" ] ; then
 echo "Calling subroutine bold."
 bold
 echo "After calling subroutine bold."
 unbold
 echo "After calling subroutine unbold."
elif [ "$parm" = "underline" ] ; then
 echo "Calling subroutine underline."
 underline
 echo "After subroutine underline."
 normalline
 echo "After subroutine normalline."
elif [ "$parm" = "move" ] ; then
 move 10 20
 echo "This line started at row 10 col 20"
elif [ "$parm" = "movestr" ] ; then
 movestr 30 40 "This line started at 30 40"
else
 echo "Unknown parameter: $parm"
 rc=1
fi
exit $rc

Since this script only has two extra functions, you can just run them. This will be shown one command at a time as follows:

guest1 $ script5
guest1 $ script5 move
guest1 $ script5 movestr

Since we are now placing the cursor at a specific location, the output should make more sense to you.
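The way move and movestr receive $1, $2, and $3 can be tried without tput at all. The function below is a made-up example that copies its positional parameters into named variables before using them:

```shell
# A subroutine that takes two parameters.
# Copying $1 and $2 into named variables keeps them from
# getting mixed up with the main script's own $1 and $2.
greet()
{
 name=$1              # parameter 1
 times=$2             # parameter 2
 i=0
 while [ $i -lt $times ] ; do
  echo "Hello, $name"
  i=$((i+1))
 done
 return 0
}

greet World 2         # the arguments become $1 and $2 inside the function
echo "greet returned $?"
```

Running this prints "Hello, World" twice and then "greet returned 0", showing that return codes work inside subroutines just as they do for whole scripts.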
Notice how the command-line prompt reappears where the last cursor position was. You probably noticed that the parameters to a subroutine work just like those of a script. Parameter 1 is $1, parameter 2 is $2, and so on. This is good and bad: good because you don't have to learn anything radically different, but bad in that it is very easy to get the $1, $2, and other variables mixed up if you are not careful. A possible solution, and the one I use, is to assign the $1, $2, and so on variables in the main script to variables with good, meaningful names. For example, in these example scripts, I set parm1 equal to $1 (parm1=$1), and so on.

Summary

We started with some very simple scripts and then proceeded to show some simple subroutines. We then showed some subroutines that take parameters. Return codes were mentioned again to show how they work in subroutines. We included several scripts to show the concepts, and also included a special bonus script at no extra charge.

Resources for Article: Further resources on this subject: Linux Shell Scripting [article] Linux Shell Scripting – various recipes to help you [article] GLSL 4.0: Using Subroutines to Select Shader Functionality [article]
Packt
21 Jul 2017
13 min read

Windows Drive Acquisition

In this article, by Oleg Skulkin and Scar de Courcier, authors of Windows Forensics Cookbook, we will cover drive acquisition in E01 format with FTK Imager, drive acquisition in RAW format with DC3DD, and mounting forensic images with Arsenal Image Mounter.

(For more resources related to this topic, see here.)

Before you can begin analysing evidence from a source, it first needs to be imaged. This describes a forensic process in which an exact copy of a drive is taken. This is an important step, especially if evidence needs to be taken to court, because forensic investigators must be able to demonstrate that they have not altered the evidence in any way.

The term forensic image can refer to either a physical or a logical image. Physical images are precise replicas of the drive they reference, whereas a logical image is a copy of a certain volume within that drive. In general, logical images show what the machine's user will have seen and dealt with, whereas physical images give a more comprehensive overview of how the device works at a lower level.

A hash value is generated to verify the authenticity of the acquired image. Hash values are essentially cryptographic digital fingerprints which show whether a particular item is an exact copy of another. Altering even the smallest bit of data will generate a completely new hash value, thus demonstrating that the two items are not the same. When a forensic investigator images a drive, they should generate a hash value for both the original drive and the acquired image. Some pieces of forensic software will do this for you.

There are a number of tools available for imaging hard drives, some of which are free and open source. However, the most popular way for forensic analysts to image hard drives is by using one of the more well-known forensic software vendors' solutions.
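The hash-comparison idea can be illustrated with a few lines of Python using the standard library's hashlib module; the "drive contents" below are made-up byte strings, not real evidence:

```python
import hashlib

def md5_of(data):
    """Return the MD5 hex digest of a bytes object, as imaging tools do per drive/image."""
    return hashlib.md5(data).hexdigest()

original = b"contents of the suspect drive"
image = b"contents of the suspect drive"      # a faithful bit-for-bit copy
tampered = b"contents of the suspect drivE"   # a single changed byte

print(md5_of(original) == md5_of(image))      # True: image verified
print(md5_of(original) == md5_of(tampered))   # False: evidence was altered
```

Even a one-byte change produces a completely different digest, which is exactly the property verification tools like FTK Imager rely on when they report whether the source and image hashes matched.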
This is because it is imperative to be able to explain how the image was acquired and to demonstrate its integrity, especially if you are working on a case that will be taken to court. Once you have your image, you will then be able to analyse the digital evidence from a device without directly interfering with the device itself. In this chapter, we will be looking at various tools that can help you to image a Windows drive, and taking you through the process of acquisition.

Drive acquisition in E01 format with FTK Imager

FTK Imager is an imaging and data preview tool by AccessData, which allows an examiner not only to create forensic images in different formats, including RAW, SMART, E01, and AFF, but also to preview data sources in a forensically sound manner. In the first recipe of this article, we will show you how to create a forensic image of a hard drive from a Windows system in E01 format. E01, or EnCase's Evidence File, is a standard format for forensic images in law enforcement. Such images consist of a header with case info, including acquisition date and time, examiner's name, acquisition notes, and an optional password; a bit-by-bit copy of the acquired drive, made up of data blocks, each verified with its own CRC (Cyclic Redundancy Check); and a footer with an MD5 hash for the bitstream.

Getting ready

First of all, let's download FTK Imager from the AccessData website. To do it, go to the SOLUTIONS tab, and then to Product Downloads. Now choose DIGITAL FORENSICS, and then FTK Imager. At the time of this writing, the most up-to-date version is 3.4.3, so click the green DOWNLOAD PAGE button on the right. OK, now you should be at the download page. Click on the DOWNLOAD NOW button and fill in the form; after this you'll get the download link at the email address you provided. The installation process is quite straightforward; all you need to do is click Next a few times, so we won't cover it in the recipe.

How to do it...
There are two ways of initiating the drive imaging process: Using the Create Disk Image button from the Toolbar, as shown in the following figure: Create Disk Image button on the Toolbar Using the Create Disk Image... option from the File menu, as shown in the following figure: Create Disk Image... option in the File Menu You can choose either option. The first window you see is Select Source. Here you have five options: Physical Drive: This allows you to choose a physical drive as the source, with all partitions and unallocated space Logical Drive: This allows you to choose a logical drive as the source, for example, the E: drive Image File: This allows you to choose an image file as the source, for example, if you need to convert your forensic image from one format to another Contents of a Folder: This allows you to choose a folder as the source; note that no deleted files or unallocated space will be included Fernico Device: This allows you to restore images from multiple CDs/DVDs Of course, we want to image the whole drive to be able to work with deleted data and unallocated space, so: Let's choose the Physical Drive option. The evidence source mustn't be altered in any way, so make sure you are using a hardware write blocker; you can use one from Tableau, for example. These devices allow acquisition of drive contents without creating the possibility of modifying the data.  FTK Imager Select Source window Click Next and you'll see the next window, Select Drive. Now you should choose the source drive from the drop-down menu; in our case it's \\.\PHYSICALDRIVE2. FTK Imager Select Drive window Once the source drive is chosen, click Finish. The next window is Create Image. We'll get back to this window soon, but for now, just click Add... It's time to choose the destination image type. As we decided to create our image in EnCase's Evidence File format, let's choose E01. FTK Imager Select Image Type window Click Next and you'll see the Evidence Item Information window. 
Here we have five fields to fill in: Case Number, Evidence Number, Unique Description, Examiner and Notes. All fields are optional. FTK Imager Evidence Item Information window Whether you fill in the fields or not, click Next. Now choose the image destination. You can use the Browse button for this. You should also fill in the image filename. If you want your forensic image to be split, choose a fragment size (in megabytes). The E01 format supports compression, so if you want to reduce the image size, you can use this feature; as you can see in the following figure, we have chosen compression level 6. And if you want the data in the image to be secured, use the AD Encryption feature. AD Encryption is whole image encryption, so not only is the raw data encrypted, but also any metadata. Each segment or file of the image is encrypted with a randomly generated image key using AES-256. FTK Imager Select Image Destination window We are almost done. Click Finish and you'll see the Create Image window again. Now, look at the three options at the bottom of the window. The verification process is very important, so make sure the Verify images after they are created option is ticked; it helps you to be sure that the source and the image are identical. The Precalculate Progress Statistics option is also very useful: it will show you the estimated time remaining during the imaging process. The last option will create directory listings of all files in the image for you, but of course, this takes time, so use it only if you need it.  FTK Imager Create Image window All you need to do now is click Start. Great, the imaging process has started! When the image is created, the verification process starts. Finally, you'll get the Drive/Image Verify Results window, like the one in the following figure: FTK Imager Drive/Image Verify Results window As you can see, in our case the source and the image are identical: both hashes matched. 
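The chunked hashing that underpins this verification step is easy to reproduce independently of any particular tool. The sketch below is a minimal illustration, not part of FTK Imager; the image path in the usage comment is hypothetical. Note that for a RAW image the file hash equals the drive hash, whereas hashing an E01 file checks only the container segment's integrity, since the drive hash is stored inside the format:

```python
import hashlib

def image_digests(path, chunk_size=1024 * 1024):
    """Compute MD5 and SHA1 of a (possibly very large) image file in
    fixed-size chunks, so the whole image never has to fit in memory."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()

# Hypothetical usage against an acquired RAW image:
#   md5_hex, sha1_hex = image_digests(r"X:\147-2017.dd")
```

Comparing the values computed this way against those recorded at acquisition time demonstrates that the copy has not been altered.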
In the folder with the image, you will also find an info file with valuable information such as drive model, serial number, source data size, sector count, MD5 and SHA1 checksums, and so on. How it works... FTK Imager uses the physical drive of your choice as the source and creates a bit-by-bit image of it in EnCase's Evidence File format. During the verification process, the MD5 and SHA1 hashes of the image and the source are compared. See more FTK Imager download page: http://accessdata.com/product-download/digital-forensics/ftk-imager-version-3.4.3 FTK Imager User Guide: https://ad-pdf.s3.amazonaws.com/Imager/3_4_3/FTKImager_UG.pdf Drive acquisition in RAW format with DC3DD DC3DD is a patched (by Jesse Kornblum) version of the classic GNU DD utility with some computer forensics features, for example, on-the-fly hashing with a number of algorithms, such as MD5, SHA-1, SHA-256, and SHA-512, and showing the progress of the acquisition process. Getting ready You can find a compiled standalone 64-bit version of DC3DD for Windows at SourceForge. Just download the ZIP or 7z archive, unpack it, and you are ready to go. How to do it... Open the Windows Command Prompt and change directory (you can use the cd command to do it) to the one with dc3dd.exe, and type the following command: dc3dd.exe if=\\.\PHYSICALDRIVE2 of=X:\147-2017.dd hash=sha256 log=X:\147-2017.log Press Enter and the acquisition process will start. Of course, your command will be a bit different, so let's find out what each part of it means: if: It stands for input file; DD is originally a Linux utility, and in Linux everything is a file. As you can see in our command, we put physical drive 2 here (this is the drive we wanted to image, but in your case it can be another drive, depending on the number of drives connected to your workstation). 
of: It stands for output file; here you should type the destination of your image which, as you remember, is in RAW format. In our case it's the X: drive and the 147-2017.dd file. hash: As already mentioned, DC3DD supports four hashing algorithms: MD5, SHA-1, SHA-256, and SHA-512. We chose SHA-256, but you can choose the one you like. log: Here you should type the destination for the logs; you will find the image version, image hash, and so on in this file after the acquisition is completed. How it works... DC3DD creates a bit-by-bit image of the source drive in RAW format, so the size of the image will be the same as the source, and calculates the image hash using the algorithm of the examiner's choice, in our case SHA-256. See also DC3DD download page: https://sourceforge.net/projects/dc3dd/files/dc3dd/7.2%20-%20Windows/ Mounting forensic images with Arsenal Image Mounter Arsenal Image Mounter is an open source tool developed by Arsenal Recon. It can help a digital forensic examiner to mount a forensic image or virtual machine disk in Windows. It supports both E01 (and Ex01) and RAW forensic images, so you can use it with any of the images we created in the previous recipes. It's very important to note that Arsenal Image Mounter mounts the contents of disk images as complete disks. The tool supports all the file systems you can find on Windows drives: NTFS, ReFS, FAT32 and exFAT. Also, it has temporary write support for images, which is a very useful feature if, for example, you want to boot the system from the image you are examining. Getting ready Go to the Arsenal Image Mounter web page at the Arsenal Recon website and click on the Download button to download the ZIP archive. At the time of this writing the latest version of the tool is 2.0.010, so in our case the archive has the name Arsenal_Image_Mounter_v2.0.010.0_x64.zip. Extract it to a location of your choice and you are ready to go; no installation is needed. How to do it... 
There are two ways to choose an image for mounting in Arsenal Image Mounter: You can use the File menu and choose Mount image. Use the Mount image button as shown in the following figure:  Arsenal Image Mounter main window When you choose the Mount image option from the File menu or click on the Mount image button, an Open window will pop up; here you should choose an image for mounting. The next window you will see is Mount options, like the one in the following figure:  Arsenal Image Mounter Mount options window As you can see, there are a few options here: Read only: If you choose this option, the image is mounted in read-only mode, so no write operations are allowed (do you still remember that you mustn't alter the evidence in any way? Of course, it's already an image, not the original drive, but nevertheless). Fake disk signatures: If an all-zero disk signature is found on the image, Arsenal Image Mounter reports a random disk signature to Windows, so it's mounted properly. Write temporary: If you choose this option, the image is mounted in read-write mode, but all modifications are written not to the original image file, but to a temporary differential file. Write original: Again, this option mounts the image in read-write mode, but this time the original image file will be modified. Sector size: This option allows you to choose the sector size. Create "removable" disk device: This option emulates the attachment of a USB thumb drive.   Choose the options you think you need and click OK. We decided to mount our image as read only. Now you can see a hard drive icon on the main window of the tool: the image is mounted. If you mounted only one image and want to unmount it, select the image and click on the Remove selected button. If you have a few mounted images and want to unmount all of them, click on the Remove all button. How it works... Arsenal Image Mounter mounts forensic images or virtual machine disks as complete disks in read-only or read-write mode. 
Later, a digital forensics examiner can access their contents even with Windows Explorer. See also Arsenal Image Mounter page at Arsenal Recon website: https://arsenalrecon.com/apps/image-mounter/ Summary In this article, the authors have explained the process and importance of drive acquisition using imaging tools such as FTK Imager and DC3DD, which are available alongside well-known forensic software vendors' solutions. Drive acquisition, being the first step in the analysis of digital evidence, needs to be carried out with the utmost care, which in turn will make the analysis process smooth. Resources for Article: Further resources on this subject: Forensics Recovery [article] Digital and Mobile Forensics [article] Mobile Forensics and Its Challanges [article]
Packt
21 Jul 2017
16 min read

Getting Started with Android Things

In this article, we will cover the following topics: Internet of things overview Internet of Things components Android Things overview Things support library Android Things board compatibility How to install Android Things on Raspberry Pi Creating the first Android Things project (For more resources related to this topic, see here.) Internet of things overview Internet of things, or briefly IoT, is one of the most promising trends in technology. According to many analysts, Internet of things could be the most disruptive technology of the upcoming decade. It will have a huge impact on our lives and it promises to modify our habits. IoT is, and will be in the future, a pervasive technology that will span its effects across many sectors: Industry Healthcare Transportation Manufacturing Agriculture Retail Smart cities All these areas will benefit from using IoT. Before diving into IoT projects, it is important to know what Internet of things means. There are several definitions of Internet of things, addressing different aspects and considering different areas of application. Anyway, it is important to underline that IoT is much more than a network of smartphones, tablets, and PCs connected to each other. Briefly, IoT is an ecosystem where objects are interconnected and, at the same time, connect to the internet. Internet of things includes every object that can potentially connect to the internet and exchange data and information. These objects are always connected, anytime, anywhere, and they exchange data. The concept of connected objects is not new and it has been developed over the years. The level of circuit miniaturization and the increasing power of CPUs with lower consumption make it possible to imagine a future where there are millions of "things" that talk to each other. The first time that Internet of things was officially recognized was in 2005. 
The International Telecommunication Union (ITU), in a report titled The Internet of Things (https://www.itu.int/osg/spu/publications/internetofthings/InternetofThings_summary.pdf), gave the first definition: "A new dimension has been added to the world of information and communication technologies (ICTs): from anytime, any place connectivity for anyone, we will now have connectivity for anything…. Connections will multiply and create an entirely new dynamic network of networks – an Internet of Things" In other words, IoT is a network of smart objects (or things) that can receive and send data and that we can control remotely. Internet of Things components There are several elements that contribute to creating the IoT ecosystem, and it is important to know clearly the role they play in order to have a clear picture of IoT. This will be useful to better understand the projects we will build using Android Things. The basic building block of IoT is a smart object. It is a device that connects to the internet and is capable of exchanging data. It can be a simple sensor that measures a quantity such as pressure or temperature, or a complex system. Extending this concept, our oven, our coffee machine, and our washing machine are examples of smart objects once they connect to the internet. All of these smart objects contribute to developing the Internet of things network. Anyway, household appliances are not the only examples of smart objects: cars, buildings, actuators, and so on can be included in the list. We can reference these objects, once connected, using a unique identifier and start talking to them. At the low level, these devices exchange data using a network layer. The most important and best-known protocols at the base of Internet of things are: Wi-Fi Bluetooth Zigbee Cellular Network NB-IoT LoRa From an application point of view, there are several application protocols widely used in the internet of things. 
Some protocols derive from different contexts (such as the web), while others are IoT-specific. To name a few, we can mention: HTTP MQTT CoAP AMQP REST XMPP STOMP By now, these could be just names or empty boxes, but throughout this article we will explore how to use these protocols with Android Things. Prototyping boards play an important role in Internet of things and they help to grow the number of connected objects. Using prototyping boards, we can experiment with IoT projects, and in this article we will explore how to build and test IoT projects using boards compatible with Android Things. As you may already know, there are several prototyping boards available on the market, each one having specific features. Just to name a few of them, we can list: Arduino (in different flavors) Raspberry Pi (in different flavors) Intel Edison ESP8266 NXP We will focus our attention on Raspberry Pi 3 and Intel Edison because Android Things officially supports these two boards. We will also use other development boards so that you can understand how to integrate them. Android Things overview Android Things is the new operating system developed by Google to build IoT projects. It helps you to develop professional applications using trusted platforms and Android. Yes, Android, because Android Things is a modified version of Android and we can reuse our Android knowledge to implement smart Internet of things projects. This OS has great potential because Android developers can smoothly move to IoT and start developing and building projects in a few days. Before diving into Android Things, it is important to have an overview. Android Things OS has the layer structure shown in the following diagram: This structure is slightly different from Android OS because it is much more compact, so that apps for Android Things have fewer layers beneath them and are closer to drivers and peripherals than normal Android apps. 
Even if Android Things derives from Android, some APIs available in Android are not supported in Android Things. We will now briefly describe the similarities and the differences. Let us start with content providers, widely used in Android but not present in the Android Things SDK; therefore, we should pay attention to this when we develop an Android Things app. To have more information about the content providers that are not supported, please refer to the official Android Things website https://developer.android.com/things/sdk/index.html. Moreover, like a normal Android app, an Android Things app can have a User Interface (UI), even if this is optional and depends on the type of application we are developing. A user can interact with the UI to trigger events, as happens in an Android app. From this point of view, as we will see later, the process of developing a UI is the same as that used in Android. This is an interesting feature because we can develop an IoT UI easily and quickly, reusing our Android knowledge. It is worth noting that Android Things fits perfectly into the Google services ecosystem. Almost all cloud services implemented by Google are available in Android Things, with some exceptions. Android Things does not support Google services strictly connected to the mobile world and those that require user input or authentication. Do not forget that the user interface for an Android Things app is optional. To have a detailed list of Google services available in Android Things, refer to the official page (https://developer.android.com/things/sdk/index.html). An important aspect of Android is permission management. An Android app runs in a sandbox with limited access to resources. When an app needs to access a specific resource outside the sandbox, it has to request the permission. In an Android app, this happens in the Manifest.xml file. This is still true in Android Things, and all the permissions requested by the app are granted at installation time. 
Android 6 (API level 23) introduced a new way to request a permission. An app can request a permission not only at installation time (using the Manifest.xml file), but at runtime too. Android Things does not support this new feature, so we have to request all the permissions in the Manifest file. The last thing to notice concerns notifications. As we will see later, the Android Things UI does not support the notification status bar, so we cannot trigger notifications from our Android Things apps. To make things simpler, you should remember that all the services related to the user interface, or that require a user interface to accomplish their task, are not guaranteed to work in Android Things. Things support library The Things support library is the new library developed by Google to handle the communication with peripherals and drivers. This is a completely new library, not present in the Android SDK, and it is one of the most important features of Android Things. It exposes a set of Java interfaces and classes (APIs) that we can use to connect and exchange data with external devices such as sensors, actuators, and so on. This library hides the inner communication details, supporting several industry standard protocols such as: GPIO I2C PWM SPI UART Moreover, this library exposes a set of APIs to create and register new device drivers called user drivers. These drivers are custom components deployed with the Android Things app that extend the Android Things framework. In other words, they are custom libraries that enable an app to communicate with other device types not supported by Android Things natively. This article will guide you, step by step, through building real-life projects using Android Things. You will explore the new Android Things APIs and how to use them. In the next sections, you will learn how to install Android Things on Raspberry Pi 3 and Intel Edison. Android Things board compatibility Android Things is a new operating system specifically built for IoT. 
At the time of writing, Android Things supported four different development boards: Raspberry Pi 3 Model B Intel Edison NXP Pico i.MX6UL Intel Joule 570x In the near future, more boards will be added to the list. Google has already announced that it will support this new board: NXP Argon i.MX6UL This article will focus on using the first two boards: Raspberry Pi 3 and Intel Edison. Anyway, you can develop for any of the compatible boards. This is the power of Android Things: it abstracts the underlying hardware, providing a common way to interact with peripherals and devices. The paradigm that made Java famous, Write Once and Run Anywhere (WORA), applies to Android Things too. This is a winning feature of Android Things because we can develop an Android Things app without worrying about the underlying board. Anyway, when we develop an IoT app there are some minor aspects we should consider so that our app will be portable to other compatible boards. How to install Android Things on Raspberry Pi Raspberry Pi 3 is the latest board developed by the Raspberry Pi Foundation. It is an upgrade of the Raspberry Pi 2 Model B and, compared to its predecessor, it has some great features: Quad-core ARMv8 CPU at 1.2GHz Wireless LAN 802.11n Bluetooth 4.0 The screenshot below shows a Raspberry Pi 3 Model B: In this section, you will learn how to install Android Things on Raspberry Pi 3 using a Windows PC or Mac OS X. Before starting the installation process you must have: A Raspberry Pi 3 Model B An SD card of at least 8GB A USB cable to connect the Raspberry to your PC An HDMI cable to connect the Raspberry to a TV/monitor (optional) If you do not have an HDMI cable you can use a screen mirroring tool. This is useful to see the result of the installation process and when we develop the Android Things UIs. The installation steps differ depending on whether you are using Windows, OS X or Linux. 
How to install Android Things using Windows To begin with, we will cover how to install Android Things on Raspberry Pi 3 using a Windows PC: Download the Android Things image from this link: https://developer.android.com/things/preview/download.html. Select the right image; in this case, you have to choose the Raspberry Pi image. Accept the license and wait until the download is completed. Once the download is complete, extract the zip file. To install the image on the SD card, there is a great application called Win32 Disk Imager that works perfectly. It is free and you can download it from SourceForge using this link: https://sourceforge.net/projects/win32diskimager/. At the time of writing, the application version is 0.9.5. After you have downloaded it, run the installation executable as Administrator. Now you are ready to burn the image onto the SD card. Insert the SD card into your PC. Select the image you unzipped at step 3 and be sure to select the right disk name (your SD card). At the end click on Write. You are done! The image is installed on the SD card and we can now start the Raspberry Pi. How to install Android Things using OS X If you have a Mac running OS X, the steps to install Android Things are slightly different. There are several options to flash this OS to the SD card; you will learn the fastest and easiest one. These are the steps to follow: Format your SD card using FAT32. Insert your SD card into your Mac and run Disk Utility. You should see something like this: Download the Android Things OS image using this link: https://developer.android.com/things/preview/download.html. Unzip the file you have downloaded. Insert the SD card into your Mac. Now it is time to copy the image to the SD card. Open a terminal window and write: sudo dd bs=1m if=path_of_your_image.img of=/dev/rdiskn where path_of_your_image is the path to the file with the img extension that you downloaded earlier. 
In order to find out the rdiskn, you have to select Preferences and then System Report. The result is shown in the picture below: The BSD name is the disk name we are looking for. In this case, we have to write: sudo dd bs=1m if=path_of_your_image.img of=/dev/rdisk1 That's all. You have to wait until the image is copied onto the SD card. Do not forget that the copying process could take a while, so be patient! Creating the first Android Things project Considering that Android Things derives from Android, the development process and the app structure are the same as those we use in a common Android app. For this reason, the development tool to use for Android Things is Android Studio. If you have already used Android Studio in the past, reading this paragraph you will discover the main differences between an Android Things app and an Android app. Otherwise, if you are new to Android development, this paragraph will guide you step by step to create your first Android Things app. Android Studio is the official development environment for developing Android Things apps; therefore, before starting, it is necessary to have it installed. If not, go to https://developer.android.com/studio/index.html, then download and install it. The development environment must adhere to these prerequisites: SDK tools version 24 or higher Update the SDK with Android 7 (API level 24) Android Studio 2.2 or higher If your environment does not meet the previous conditions, you have to update your Android Studio using the Update manager. Now there are two alternatives for starting a new project: Cloning a template project from GitHub and importing it into Android Studio Creating a new Android project in Android Studio To better understand the main differences between Android and Android Things, you should follow option number 2, at least the first time. 
Cloning the template project This is the fastest path because with a few steps you are ready to develop an Android Things app: go to https://github.com/androidthings/new-project-template and clone the repository. Open a terminal and write: git clone https://github.com/androidthings/new-project-template.git Now you have to import the cloned project into Android Studio. Create the project manually This path is quite a bit longer than the previous option, but it is useful for getting to know the main differences between these two worlds. Create a new Android project. Do not forget to set the Minimum SDK to API Level 24: By now, you should have created a project with an empty activity. Confirm and create the new project. There are some steps you have to follow before your Android app project turns into an Android Things app project: Open the Gradle scripts folder, modify build.gradle (app-level) and replace the dependency directive with the following lines: dependencies { provided 'com.google.android.things:androidthings:0.2-devpreview' } Open the res folder and remove all the files under it except strings.xml. Open Manifest.xml and remove the android:theme attribute in the application tag. In the Manifest.xml, add the following line inside the application tag: <uses-library android:name="com.google.android.things"/> In the layout folder, open all the layout files created automatically and remove the references to values. In the Activity created by default (MainActivity.java), remove this line: import android.support.v7.app.AppCompatActivity; Replace AppCompatActivity with Activity. Under the java folder, remove all the folders except the one with your package name. That's all. You have now transformed an Android app project into an Android Things app project. Compiling the code, you will have no errors. Next time, you can simply clone the repository holding the project template and start coding. 
Differences between Android and Android Things As you can see, an Android Things project is very similar to an Android project. We still have Activities, layouts, Gradle files, and so on. At the same time, there are some differences: Android Things does not use multiple layouts to support different screen sizes, so when we develop an Android Things app we create only one layout Android Things does not support themes and styles Android support libraries are not available in Android Things Summary In this article we have covered the Internet of things overview and components, an Android Things overview, the Things support library, and Android Things board compatibility, and we have also learned how to create our first Android Things project. Resources for Article: Further resources on this subject: Saying Hello to Unity and Android [article] Getting started with Android Development [article] Android Game Development with Unity3D [article]
Packt
21 Jul 2017
11 min read

Vulnerability Assessment

"Finding a risk is learning, Ability to identify risk exposure is a skill and exploiting it is merely a choice" In this article by Vijay Kumar Velu, the author of the book Mastering Kali Linux for Advanced Penetration Testing - Second Edition, we will learn about vulnerability assessment. The goal of passive and active reconnaissance is to identify the exploitable target and vulnerability assessment is to find the security flaws that are most likely to support the tester's or attacker's objective (denial of service, theft, or modification of data). The vulnerability assessment during the exploit phase of the kill chain focuses on creating the access to achieve the objective—mapping of the vulnerabilities to line up the exploits to maintain the persistent access to the target. Thousands of exploitable vulnerabilities have been identified, and most are associated with at least one proof-of-concept code or technique to allow the system to be compromised. Nevertheless, the underlying principles that govern success are the same across networks, operating systems, and applications. In this article, you will learn: Using online and local vulnerability resources Vulnerability scanning with nmap Vulnerability nomenclature Vulnerability scanning employs automated processes and applications to identify vulnerabilities in a network, system, operating system, or application that may be exploitable. When performed correctly, a vulnerability scan delivers an inventory of devices (both authorized and rogue devices); known vulnerabilities that have been actively scanned for, and usually a confirmation of how compliant the devices are with various policies and regulations. Unfortunately, vulnerability scans are loud—they deliver multiple packets that are easily detected by most network controls and make stealth almost impossible to achieve. 
They also suffer from the following additional limitations: For the most part, vulnerability scanners are signature based—they can only detect known vulnerabilities, and only if there is an existing recognition signature that the scanner can apply to the target. To a penetration tester, the most effective scanners are open source, as they allow the tester to rapidly modify code to detect new vulnerabilities. Scanners produce large volumes of output, frequently containing false-positive results that can lead a tester astray; in particular, networks with different operating systems can produce false-positives at a rate as high as 70 percent. Scanners may have a negative impact on the network—they can create network latency or cause the failure of some devices (refer to the Network Scanning Watch List at www.digininja.org for devices known to fail as a result of vulnerability testing). In certain jurisdictions, scanning is considered hacking, and may constitute an illegal act. There are multiple commercial and open source products that perform vulnerability scans. Local and online vulnerability databases Together, passive and active reconnaissance identify the attack surface of the target, that is, the total number of points that can be assessed for vulnerabilities. A server with just an operating system installed can only be exploited if there are vulnerabilities in that particular operating system; however, the number of potential vulnerabilities increases with each application that is installed. Penetration testers and attackers must find the particular exploits that will compromise known and suspected vulnerabilities. The first place to start the search is at vendor sites; most hardware and application vendors release information about vulnerabilities when they release patches and upgrades. If an exploit for a particular weakness is known, most vendors will highlight this to their customers. 
Although their intent is to allow customers to test for the presence of the vulnerability themselves, attackers and penetration testers will take advantage of this information as well. Other online sites that collect, analyze, and share information about vulnerabilities are as follows:

- The National Vulnerability Database, which consolidates all public vulnerability data released by the US Government, available at http://web.nvd.nist.gov/view/vuln/search
- Secunia, available at http://secunia.com/community/
- The Open Source Vulnerability Database Project (OSVDB), available at http://www.osvdb.org/search/advsearch
- Packetstorm Security, available at http://packetstormsecurity.com/
- SecurityFocus, available at http://www.securityfocus.com/vulnerabilities
- Inj3ct0r, available at http://1337day.com/
- The Exploit Database maintained by Offensive Security, available at http://www.db-exploit.com

The Exploit Database is also copied locally to Kali, and it can be found in the /usr/share/exploitdb directory. Before using it, make sure that it has been updated using the following commands:

cd /usr/share/exploitdb
wget http://www.exploit-db.com/archive.tar.bz2
tar -xvjf archive.tar.bz2
rm archive.tar.bz2

To search the local copy of the Exploit Database, open a Terminal window and enter searchsploit and the desired search term(s) at the Command Prompt. This will invoke a script that searches a database file (.csv) containing a list of all exploits. The search will return a description of known vulnerabilities as well as the path to a relevant exploit. The exploit can be extracted, compiled, and run against specific vulnerabilities. Take a look at the following screenshot, which shows the description of the vulnerabilities:

The search script scans each line in the CSV file from left to right, so the order of the search terms is important: a search for Oracle 10g will return several exploits, but 10g Oracle will not return any.
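The left-to-right behavior of the search script can be reproduced in a few lines of Python. The two-row sample below is invented (the real database is the .csv file under /usr/share/exploitdb), but it demonstrates why Oracle 10g and 10g Oracle give different results:

```python
import csv
import io

# Invented two-row sample shaped like the Exploit Database .csv file;
# the real file lives under /usr/share/exploitdb.
SAMPLE = """id,file,description
1,platforms/windows/remote/76.c,Microsoft RPC DCOM Remote Exploit
2,platforms/linux/remote/99.pl,Oracle 10g Remote Overflow
"""

def search_exploits(terms, csv_text):
    """Return descriptions containing every term, matched left to right."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        desc = row["description"].lower()
        pos = 0
        for term in terms:
            pos = desc.find(term.lower(), pos)
            if pos == -1:
                break  # a term was missing (or appeared out of order)
        else:
            hits.append(row["description"])
    return hits

print(search_exploits(["oracle", "10g"], SAMPLE))  # in-order terms match
print(search_exploits(["10g", "oracle"], SAMPLE))  # reversed order finds nothing
```

Because each term is only searched for to the right of the previous match, reversing the terms makes the second lookup fail, which mirrors the searchsploit behavior described above.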
Also, the script is case sensitive; although you are instructed to use lower case characters in the search term, searches for the same term that differ only in case or spacing can return different results, with one variant of a bulletproof FTP search returning seven hits while the others return none. More effective searches of the CSV file can be conducted using the grep command or a search tool such as KWrite (apt-get install kwrite). A search of the local database may identify several possible exploits with a description and a path listing; however, these will have to be customized to your environment and then compiled prior to use. Copy the exploit to the /tmp directory (the given path does not take into account that the /windows/remote directory resides in the /platforms directory). Exploits presented as scripts, such as Perl, Ruby, and PHP, are relatively easy to implement. For example, if the target is a Microsoft IIS 6.0 server that may be vulnerable to a WebDAV remote authentication bypass, copy the exploit to the root directory and then execute it as a standard Perl script, as shown in the following screenshot: Many of the exploits are available as source code that must be compiled before use. For example, a search for RPC-specific vulnerabilities identifies several possible exploits. An excerpt is shown in the following screenshot: The RPC DCOM vulnerability identified as 76.c is known from practice to be relatively stable, so we will use it as an example. To compile this exploit, copy it from the storage directory to the /tmp directory.
In that location, compile it using GCC with the following command:

root@kali:~# gcc 76.c -o 76.exe

This will use the GNU Compiler Collection to compile 76.c to a file with the output (-o) name of 76.exe, as shown in the following screenshot: When you invoke the application against the target, you must call the executable (now stored in the /tmp directory) using a relative path as follows:

root@kali:~# ./76.exe

The source code for this exploit is well documented, and the required parameters are made clear at execution, as shown in the following screenshot: Unfortunately, not all exploits from the Exploit Database and other public sources compile as readily as 76.c. There are several issues that make the use of such exploits problematic, even dangerous, for penetration testers:

- Deliberate errors or incomplete source code are commonly encountered, as experienced developers attempt to keep exploits away from inexperienced users, especially beginners who are trying to compromise systems without knowing the risks that go with their actions.
- Exploits are not always sufficiently documented; after all, there is no standard that governs the creation and use of code intended to compromise a data system. As a result, they can be difficult to use, particularly for testers who lack expertise in application development.
- Inconsistent behaviors due to changing environments (new patches applied to the target system and language variations in the target application) may require significant alterations to the source code; again, this may require a skilled developer.
- There is always the risk of freely available code containing malicious functionality. A penetration tester may think that they are conducting a proof of concept (POC) exercise and be unaware that the exploit has also created a backdoor in the application being tested that could be used by the developer.
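The copy, compile, and run sequence described above can be wrapped in a small helper. This is only a sketch (the file names are placeholders, and nothing is compiled unless gcc is present and you call compile_exploit yourself); it simply assembles the same gcc invocation shown earlier:

```python
import shutil
import subprocess

def gcc_command(source, output):
    """Assemble the compile command shown above, e.g. gcc 76.c -o 76.exe."""
    return ["gcc", source, "-o", output]

def compile_exploit(source, output):
    """Compile an exploit source file with GCC; raises if gcc is missing."""
    if shutil.which("gcc") is None:
        raise RuntimeError("gcc not found on PATH")
    subprocess.run(gcc_command(source, output), check=True)

# Only the command construction is exercised here; nothing is compiled.
print(gcc_command("76.c", "76.exe"))
```

Passing the argument list to subprocess.run (rather than a shell string) avoids shell quoting problems when exploit file names contain unusual characters.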
To ensure consistent results and create a community of coders who follow consistent practices, several exploit frameworks have been developed. The most popular exploitation framework is the Metasploit framework.

Vulnerability scanning with nmap

There are no security-focused operating system distributions without nmap. So far we have discussed how to utilize nmap during active reconnaissance, but attackers don't just use nmap to find open ports and services; they also engage nmap to perform vulnerability assessment. As of 10 March 2017, the latest version of nmap is 7.40, and it ships with 500+ NSE (nmap scripting engine) scripts, as shown in the following screenshot: Penetration testers utilize nmap's most powerful and flexible features, which allow them to write their own scripts and automate them to ease exploitation. Primarily, the NSE was developed for the following reasons:

- Network discovery: The primary purpose for which attackers utilize nmap.
- More sophisticated version detection of a service: There are thousands of services, with multiple version details for the same service.
- Vulnerability detection: To automatically identify vulnerabilities across a vast network range; note, however, that nmap cannot be a full vulnerability scanner in itself.
- Backdoor detection: Some of the scripts are written to identify the patterns of worm infections on a network; this makes it easier for attackers to narrow down their focus and take over the machine remotely.
- Vulnerability exploitation: Attackers can also potentially utilize nmap to perform exploitation in combination with other tools, such as Metasploit, or write a custom reverse shell and combine it with nmap's exploitation capability.
Before using nmap to perform the vulnerability scan, penetration testers must update the nmap script database to see whether any new scripts have been added, so that they don't miss vulnerability identification:

nmap --script-updatedb

Use the following to run all the scripts against the target host:

nmap -T4 -A -sV -v3 -d -oA Targetoutput --script all --script-args vulns.showall target.com

Introduction to LUA scripting

LUA is a lightweight, embeddable scripting language built on top of the C programming language; it was created in Brazil in 1993 and is still actively developed. It is a powerful and fast programming language mostly used in gaming applications and image processing. The complete source code, manual, plus binaries for some platforms do not exceed 1.44 MB (less than a floppy disk). Some of the security tools developed in LUA are nmap, Wireshark, and Snort 3.0. One of the reasons LUA was chosen as the scripting language in information security is its compactness, its freedom from buffer overflow and format string vulnerabilities, and the fact that it can be interpreted. LUA can be installed directly on Kali Linux by issuing the apt-get install lua5.1 command in the Terminal. The following code extract is a sample script that reads a file and prints the first line:

#!/usr/bin/lua
local file = io.open("/etc/passwd", "r")
contents = file:read()
file:close()
print (contents)

LUA is similar to other scripting languages such as Bash and Perl. The preceding script should produce the output shown in the following screenshot:

Customizing NSE scripts

In order to achieve maximum effectiveness, customizing scripts helps penetration testers find the right vulnerabilities within the given span of time. (Attackers, however, do not have a time limit.)
The following code extract is a LUA NSE script to identify a specific file location, which we will search for across the entire subnet using nmap:

local http = require 'http'

description = [[
This is my custom discovery on the network
]]

categories = {"safe","discovery"}

function portrule(host, port)
  return port.number == 80
end

function action(host, port)
  local response
  response = http.get(host, port, "/test.txt")
  if response.status and response.status ~= 404 then
    return "successful"
  end
end

Save the file in the /usr/share/nmap/scripts/ folder. Your script is now ready to be tested, as shown in the following screenshot; you should be able to run your own NSE script without any problems: To fully understand the preceding NSE script, here is a description of what is in the code:

- local http = require 'http': Calls the right library from LUA; this line loads the HTTP script module and makes it available as a local variable.
- description: Where testers/researchers can enter the description of the script.
- categories: This typically has two variables, one of which declares whether the script is safe or intrusive.
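The portrule/action split above can be prototyped in plain Python before committing to LUA. In the sketch below the HTTP layer is stubbed out (FakeResponse is invented for illustration, and no requests are sent), so only the decision logic of the script is exercised:

```python
class FakeResponse:
    """Invented stand-in for the response table that http.get returns in NSE."""
    def __init__(self, status):
        self.status = status

def portrule(port_number):
    # Mirrors the LUA rule: return port.number == 80
    return port_number == 80

def action(response):
    # Mirrors: if response.status and response.status ~= 404 then "successful"
    if response.status and response.status != 404:
        return "successful"
    return None

print(portrule(80), action(FakeResponse(200)))   # script would run and report
print(portrule(443), action(FakeResponse(404)))  # wrong port, file not found
```

Testing the rule and the action separately like this makes it easy to confirm that the script only fires on port 80 and only reports when the file actually exists.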
Up and Running with pandas

Packt
18 Jul 2017
15 min read
In this article by Michael Heydt, author of the book Learning Pandas - Second Edition, we will cover how to install pandas and start using its basic functionality. The content is provided as IPython and Jupyter notebooks, and hence we will also take a quick look at using both of those tools. We will utilize the Anaconda Scientific Python distribution from Continuum. Anaconda is a popular Python distribution with both free and paid components. Anaconda provides cross-platform support, including Windows, Mac, and Linux. The base distribution of Anaconda installs pandas, IPython, and Jupyter Notebook, thereby making it almost trivial to get started. (For more resources related to this topic, see here.)

IPython and Jupyter Notebook

So far we have executed Python from the command line/terminal. This is the default Read-Eval-Print-Loop (REPL) that comes with Python. Let's take a brief look at both.

IPython

IPython is an alternate shell for interactively working with Python. It provides several enhancements to the default REPL provided by Python. If you want to learn about IPython in more detail, check out the documentation at https://ipython.org/ipython-doc/3/interactive/tutorial.html. To start IPython, simply execute the ipython command from the command line/terminal. When started, you will see something like the following: The input prompt shows In [1]:. Each time you enter a statement in the IPython REPL, the number in the prompt will increase. Likewise, output from any particular entry you make will be prefaced with Out [x]:, where x matches the number of the corresponding In [x]:. The following demonstrates: This numbering of in and out statements will be important to the examples, as all examples are prefaced with In [x]: and Out [x]: so that you can follow along. Note that these numberings are purely sequential.
If you are following along with the code in the text and encounter errors in input or enter additional statements, the numbering may get out of sequence (it can be reset by exiting and restarting IPython). Please use the numbers purely as reference.

Jupyter Notebook

Jupyter Notebook is the evolution of IPython Notebook. It is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and markdown. The original IPython Notebook was constrained to Python as the only language. Jupyter Notebook has evolved to allow many programming languages to be used, including Python, R, Julia, Scala, and F#. If you want to take a deeper look at Jupyter Notebook, head to http://jupyter.org/, where you will be presented with a page similar to the following: Jupyter Notebook can be downloaded and used independently of Python. Anaconda installs it by default. To start a Jupyter Notebook, issue the following command at the command line/terminal:

$ jupyter notebook

To demonstrate, let's look at how to run the example code that comes with the text. Download the code from the Packt website and unzip the file to a directory of your choosing. In the directory, you will see the following contents: Now issue the jupyter notebook command. You should see something similar to the following: A browser page will open, displaying the Jupyter Notebook homepage at http://localhost:8888/tree. This will be a directory listing: Clicking on a .ipynb link opens a notebook page like the following: The notebook that is displayed is HTML that was generated by Jupyter and IPython. It consists of a number of cells that can be one of four types: code, markdown, raw nbconvert, or heading. Jupyter runs an IPython kernel for each notebook. Cells that contain Python code are executed within that kernel, and the results are added to the notebook as HTML.
Double-clicking on any of the cells will make the cell editable. When done editing the contents of a cell, press Shift + Enter, at which point Jupyter/IPython will evaluate the contents and display the results. If you want to learn more about the notebook format that underlies the pages, see https://ipython.org/ipython-doc/3/notebook/nbformat.html. The toolbar at the top of a notebook gives you a number of abilities to manipulate the notebook. These include adding, removing, and moving cells up and down in the notebook. Also available are commands to run cells, rerun cells, and restart the underlying IPython kernel. To create a new notebook, go to the File > New Notebook > Python 3 menu item. A new notebook page will be created in a new browser tab. Its name will be Untitled: The notebook consists of a single code cell that is ready to have Python entered. Enter 1+1 in the cell and press Shift + Enter to execute. The cell has been executed and the result shown as Out [2]:. Jupyter also opened a new cell for you to enter more code or markdown. Jupyter Notebook automatically saves your changes every minute, but it's still a good idea to save manually every once in a while. One final point before we look at a little bit of pandas: code in the text will be in the format of command-line IPython. As an example, the cell we just created in our notebook will be shown as follows:

In [1]: 1+1
Out [1]: 2

Introducing the pandas Series and DataFrame

Let's jump into using some pandas with a brief introduction to pandas' two main data structures, the Series and the DataFrame. We will examine the following:

- Importing pandas into your application
- Creating and manipulating a pandas Series
- Creating and manipulating a pandas DataFrame
- Loading data from a file into a DataFrame

The pandas Series

The pandas Series is the base data structure of pandas.
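The notebook cells that follow use Series, DataFrame, and the pd alias without showing the imports. A setup cell along these lines is assumed (the aliases are the usual pandas convention, not shown in the original):

```python
# Conventional imports assumed by the notebook cells in this article;
# the original cells use Series, DataFrame, and pd without showing them.
import pandas as pd
from pandas import Series, DataFrame

s = Series([1, 2, 3, 4])
print(s.sum())  # 10
```

With these names in scope, every cell below can be run verbatim in a fresh notebook.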
A Series is similar to a NumPy array, but it differs by having an index, which allows for much richer lookup of items instead of just a zero-based array index value. The following creates a Series from a Python list:

In [2]: # create a four item Series
        s = Series([1, 2, 3, 4])
        s
Out [2]:
0    1
1    2
2    3
3    4
dtype: int64

The output consists of two columns of information. The first is the index, and the second is the data in the Series. Each row of the output represents the index label (in the first column) and then the value associated with that label. Because this Series was created without specifying an index (something we will do next), pandas automatically creates an integer index with labels starting at 0 and increasing by one for each data item. The values of a Series object can then be accessed using the [] operator, passing the label for the value you require. The following gets the value for the label 1:

In [3]: s[1]
Out [3]: 2

This looks very much like normal array access in many programming languages. But as we will see, the index does not have to start at 0, nor increment by one, and it can be of many other data types than just an integer. This ability to associate flexible indexes with data is one of the great superpowers of pandas. Multiple items can be retrieved by specifying their labels in a Python list. The following retrieves the values at labels 1 and 3:

In [4]: # return a Series with the rows with labels 1 and 3
        s[[1, 3]]
Out [4]:
1    2
3    4
dtype: int64

A Series object can be created with a user-defined index by using the index parameter and specifying the index labels. The following creates a Series with the same values but with an index consisting of string values:

In [5]: # create a series using an explicit index
        s = pd.Series([1, 2, 3, 4],
                      index=['a', 'b', 'c', 'd'])
        s
Out [5]:
a    1
b    2
c    3
d    4
dtype: int64

Data in the Series object can now be accessed by those alphanumeric index labels.
The following retrieves the values at index labels 'a' and 'd':

In [6]: # look up items in the series having index 'a' and 'd'
        s[['a', 'd']]
Out [6]:
a    1
d    4
dtype: int64

It is still possible to refer to the elements of this Series object by their numerical 0-based position:

In [7]: # passing a list of integers to a Series that has
        # non-integer index labels will look up based upon
        # 0-based index like an array
        s[[1, 2]]
Out [7]:
b    2
c    3
dtype: int64

We can examine the index of a Series using the .index property:

In [8]: # get only the index of the Series
        s.index
Out [8]: Index(['a', 'b', 'c', 'd'], dtype='object')

The index is itself actually a pandas object, and this output shows us the values of the index and the data type used for the index. In this case, note that the type of the data in the index (referred to as the dtype) is object and not string. A common usage of a Series in pandas is to represent a time series that associates date/time index labels with values. The following demonstrates by creating a date range using the pandas function pd.date_range():

In [9]: # create a Series whose index is a series of dates
        # between the two specified dates (inclusive)
        dates = pd.date_range('2016-04-01', '2016-04-06')
        dates
Out [9]: DatetimeIndex(['2016-04-01', '2016-04-02', '2016-04-03',
                        '2016-04-04', '2016-04-05', '2016-04-06'],
                       dtype='datetime64[ns]', freq='D')

This has created a special index in pandas referred to as a DatetimeIndex, which is a specialized type of pandas index that is optimized to index data with dates and times. Now let's create a Series using this index.
The data values represent high temperatures on those specific days:

In [10]: # create a Series with values (representing temperatures)
         # for each date in the index
         temps1 = Series([80, 82, 85, 90, 83, 87],
                         index=dates)
         temps1
Out [10]:
2016-04-01    80
2016-04-02    82
2016-04-03    85
2016-04-04    90
2016-04-05    83
2016-04-06    87
Freq: D, dtype: int64

This type of Series with a DatetimeIndex is referred to as a time series. We can look up a temperature on a specific date by using the date as a string:

In [11]: temps1['2016-04-04']
Out [11]: 90

Two Series objects can be combined with an arithmetic operation. The following code creates a second Series and calculates the difference in temperature between the two:

In [12]: # create a second series of values using the same index
         temps2 = Series([70, 75, 69, 83, 79, 77],
                         index=dates)
         # the following aligns the two by their index values
         # and calculates the difference at those matching labels
         temp_diffs = temps1 - temps2
         temp_diffs
Out [12]:
2016-04-01    10
2016-04-02     7
2016-04-03    16
2016-04-04     7
2016-04-05     4
2016-04-06    10
Freq: D, dtype: int64

The result of an arithmetic operation (+, -, /, *, ...) on two Series objects returns another Series object. Since the index is not integer-based, we can also look up values by 0-based position:

In [13]: # and also possible by integer position as if the
         # series was an array
         temp_diffs[2]
Out [13]: 16

Finally, pandas provides many descriptive statistical methods. As an example, the following returns the mean of the temperature differences:

In [14]: # calculate the mean of the values in the Series
         temp_diffs.mean()
Out [14]: 9.0

The pandas DataFrame

A pandas Series can only have a single value associated with each index label. To have multiple values per index label we can use a DataFrame. A DataFrame represents one or more Series objects aligned by index label.
Each Series will be a column in the DataFrame, and each column can have an associated name. In a way, a DataFrame is analogous to a relational database table in that it contains one or more columns of data of heterogeneous types (but a single type for all items in each respective column). The following creates a DataFrame object with two columns, using the temperature Series objects:

In [15]: # create a DataFrame from the two series objects temps1
         # and temps2 and give them column names
         temps_df = DataFrame(
             {'Missoula': temps1,
              'Philadelphia': temps2})
         temps_df
Out [15]:
            Missoula  Philadelphia
2016-04-01        80            70
2016-04-02        82            75
2016-04-03        85            69
2016-04-04        90            83
2016-04-05        83            79
2016-04-06        87            77

The resulting DataFrame has two columns named Missoula and Philadelphia. These columns are new Series objects contained within the DataFrame, with the values copied from the original Series objects. Columns in a DataFrame object can be accessed using an array indexer [] with the name of the column or a list of column names. The following code retrieves the Missoula column:

In [16]: # get the column with the name Missoula
         temps_df['Missoula']
Out [16]:
2016-04-01    80
2016-04-02    82
2016-04-03    85
2016-04-04    90
2016-04-05    83
2016-04-06    87
Freq: D, Name: Missoula, dtype: int64

And the following code retrieves the Philadelphia column:

In [17]: # likewise we can get just the Philadelphia column
         temps_df['Philadelphia']
Out [17]:
2016-04-01    70
2016-04-02    75
2016-04-03    69
2016-04-04    83
2016-04-05    79
2016-04-06    77
Freq: D, Name: Philadelphia, dtype: int64

A Python list of column names can also be used to return multiple columns:

In [18]: # return both columns in a different order
         temps_df[['Philadelphia', 'Missoula']]
Out [18]:
            Philadelphia  Missoula
2016-04-01            70        80
2016-04-02            75        82
2016-04-03            69        85
2016-04-04            83        90
2016-04-05            79        83
2016-04-06            77        87

There is a subtle difference in a DataFrame object as compared to a Series object.
Passing a list to the [] operator of a DataFrame retrieves the specified columns, whereas a Series would return rows. If the name of a column does not have spaces, it can be accessed using property-style syntax:

In [19]: # retrieve the Missoula column through property syntax
         temps_df.Missoula
Out [19]:
2016-04-01    80
2016-04-02    82
2016-04-03    85
2016-04-04    90
2016-04-05    83
2016-04-06    87
Freq: D, Name: Missoula, dtype: int64

Arithmetic operations between columns within a DataFrame are identical in operation to those on multiple Series. To demonstrate, the following code calculates the difference between temperatures using property notation:

In [20]: # calculate the temperature difference between the two
         # cities
         temps_df.Missoula - temps_df.Philadelphia
Out [20]:
2016-04-01    10
2016-04-02     7
2016-04-03    16
2016-04-04     7
2016-04-05     4
2016-04-06    10
Freq: D, dtype: int64

A new column can be added to a DataFrame simply by assigning another Series to a column using the array indexer [] notation. The following adds a new column in the DataFrame with the temperature differences:

In [21]: # add a column to temps_df which contains the difference
         # in temps
         temps_df['Difference'] = temp_diffs
         temps_df
Out [21]:
            Missoula  Philadelphia  Difference
2016-04-01        80            70          10
2016-04-02        82            75           7
2016-04-03        85            69          16
2016-04-04        90            83           7
2016-04-05        83            79           4
2016-04-06        87            77          10

The names of the columns in a DataFrame are accessible via the .columns property:

In [22]: # get the columns, which is also an Index object
         temps_df.columns
Out [22]: Index(['Missoula', 'Philadelphia', 'Difference'], dtype='object')

The DataFrame (and Series) objects can be sliced to retrieve specific rows.
The following slices the second through fourth rows of temperature difference values:

In [23]: # slice the temp differences column for the rows at
         # location 1 through 4 (as though it is an array)
         temps_df.Difference[1:4]
Out [23]:
2016-04-02     7
2016-04-03    16
2016-04-04     7
Freq: D, Name: Difference, dtype: int64

Entire rows from a DataFrame can be retrieved using the .loc and .iloc properties. .loc ensures that the lookup is by index label, whereas .iloc uses the 0-based position. The following retrieves the second row of the DataFrame:

In [24]: # get the row at array position 1
         temps_df.iloc[1]
Out [24]:
Missoula        82
Philadelphia    75
Difference       7
Name: 2016-04-02 00:00:00, dtype: int64

Notice that this result has converted the row into a Series, with the column names of the DataFrame pivoted into the index labels of the resulting Series. The following shows the resulting index of the result:

In [25]: # the names of the columns have become the index
         # they have been 'pivoted'
         temps_df.iloc[1].index
Out [25]: Index(['Missoula', 'Philadelphia', 'Difference'], dtype='object')

Rows can be explicitly accessed via index label using the .loc property. The following code retrieves a row by the index label:

In [26]: # retrieve row by index label using .loc
         temps_df.loc['2016-04-05']
Out [26]:
Missoula        83
Philadelphia    79
Difference       4
Name: 2016-04-05 00:00:00, dtype: int64

Specific rows in a DataFrame object can be selected using a list of integer positions. The following selects the values from the Difference column in rows at integer locations 1, 3, and 5:

In [27]: # get the values in the Differences column in rows 1, 3
         # and 5 using 0-based location
         temps_df.iloc[[1, 3, 5]].Difference
Out [27]:
2016-04-02     7
2016-04-04     7
2016-04-06    10
Freq: 2D, Name: Difference, dtype: int64

Rows of a DataFrame can be selected based upon a logical expression that is applied to the data in each row.
The following shows which values in the Missoula column are greater than 82 degrees:

In [28]: # which values in the Missoula column are > 82?
         temps_df.Missoula > 82
Out [28]:
2016-04-01    False
2016-04-02    False
2016-04-03     True
2016-04-04     True
2016-04-05     True
2016-04-06     True
Freq: D, Name: Missoula, dtype: bool

The results from an expression can then be applied to the [] operator of a DataFrame (and a Series), which results in only the rows where the expression evaluated to True being returned:

In [29]: # return the rows where the temps for Missoula > 82
         temps_df[temps_df.Missoula > 82]
Out [29]:
            Missoula  Philadelphia  Difference
2016-04-03        85            69          16
2016-04-04        90            83           7
2016-04-05        83            79           4
2016-04-06        87            77          10

This technique is referred to as Boolean selection in pandas terminology, and it will form the basis of selecting rows based upon values in specific columns (like a SQL query using a WHERE clause, but, as we will see, much more powerful).

Visualization

We will dive into visualization in quite some depth in Chapter 14, Visualization, but prior to then we will occasionally perform a quick visualization of data in pandas. Creating a visualization of data is quite simple with pandas; all that needs to be done is to call the .plot() method. The following demonstrates by plotting the Close value of the stock data:

In [40]: df[['Close']].plot();

Summary

In this article, we took an introductory look at the pandas Series and DataFrame objects, demonstrating some of their fundamental capabilities. This exposition showed you how to perform a few basic operations that you can use to get up and running with pandas prior to diving in and learning all the details.
Parallelize It

Packt
18 Jul 2017
15 min read
In this article by Elliot Forbes, the author of the book Learning Concurrency in Python, we will explain concurrency and parallelism thoroughly and cover the necessary CPU knowledge related to them. Concurrency and parallelism are two concepts that are commonly confused. The reality, though, is that they are quite different, and if you designed software to be concurrent when you actually needed parallel execution, you could be seriously limiting your software's true performance potential. Because of this, it's vital to know exactly what the two concepts mean so that you can understand the differences. Knowing these differences puts you at a distinct advantage when it comes to designing your own high-performance software in Python. In this article we'll be covering the following topics:

- What is concurrency, and what are the major bottlenecks that impact our applications?
- What is parallelism, and how does it differ from concurrency?

(For more resources related to this topic, see here.)

Understanding concurrency

Concurrency is essentially the practice of doing multiple things at the same time, but not specifically in parallel. It can help us to improve the perceived performance of our applications, and it can also improve the speed at which our applications run. The best way to think of how concurrency works is to imagine one person working on multiple tasks and quickly switching between them. Imagine this one person is working concurrently on a program while also dealing with support requests. They would focus primarily on the writing of their program and quickly context switch to fixing a bug or dealing with a support issue should there be one. Once they complete the support task, they can context switch back to writing their program. However, in computing there are typically two performance bottlenecks that we have to watch out for and guard against when writing our programs.
It's important to know the differences between the two bottlenecks, because applying concurrency to a CPU-based bottleneck can actually decrease performance rather than increase it, and applying parallelism to a task that really requires a concurrent solution can incur the same performance hits.

Properties of concurrent systems

All concurrent systems share a similar set of properties, which can be defined as follows:

- Multiple actors: These represent the different processes and threads all trying to actively make progress on their own tasks. We could have multiple processes that contain multiple threads all trying to run at the same time.
- Shared resources: These represent the memory, the disk, and the other resources that the actors must utilize in order to perform what they need to do.
- Rules: All concurrent systems must follow a strict set of rules that define when actors can and can't acquire locks, access memory, modify state, and so on. These rules are vital for concurrent systems to work; otherwise, our programs would tear themselves apart.

Input/Output bottlenecks

Input/Output bottlenecks, or I/O bottlenecks for short, are bottlenecks where your computer spends more time waiting on various inputs and outputs than it does processing the information. You'll typically find this type of bottleneck when you are working with an I/O-heavy application. We could take your standard web browser as an example of a heavy I/O application. In a browser, we typically spend significantly longer waiting for network requests to finish (for things like style sheets, scripts, or HTML pages) than we do rendering the content on the screen. If the rate at which data is requested is slower than the rate at which it is consumed, then you have yourself an I/O bottleneck.
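The dominance of waiting can be made concrete with a toy script in which time.sleep stands in for network latency. The figures are illustrative only, but they show why overlapping the waits (a concurrent approach) pays off for I/O bound work:

```python
import threading
import time

def fake_request(delay=0.1):
    """Stands in for a network request: all the time is spent waiting."""
    time.sleep(delay)

# Sequential: the waits add up, roughly 4 x 0.1 seconds
t0 = time.time()
for _ in range(4):
    fake_request()
sequential = time.time() - t0

# Concurrent: the waits overlap, roughly 0.1 seconds in total
t0 = time.time()
threads = [threading.Thread(target=fake_request) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
concurrent = time.time() - t0

print("sequential: {:.2f}s concurrent: {:.2f}s".format(sequential, concurrent))
```

Because the threads spend almost all of their time asleep (waiting), running them concurrently costs nearly nothing extra, which is exactly the situation concurrency is designed for.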
One of the main ways to improve the speed of these applications is typically to either improve the speed of the underlying I/O by buying more expensive and faster hardware, or to improve the way in which we handle these I/O requests.

A great example of a program bound by I/O bottlenecks is a web crawler. The main purpose of a web crawler is to traverse the web and index web pages so that they can be taken into consideration when Google runs its search ranking algorithm to decide the top 10 results for a given keyword. We'll start by creating a very simple script that just requests a page and times how long that request takes:

import urllib.request
import time

t0 = time.time()
req = urllib.request.urlopen('http://www.example.com')
pageHtml = req.read()
t1 = time.time()
print("Total Time To Fetch Page: {} Seconds".format(t1-t0))

If we break down this code: first we import the two necessary modules, urllib.request and time. We then record the starting time, request the web page example.com, record the ending time, and print out the time difference.

Now say we wanted to add a bit of complexity and follow any links to other pages so that we could index them in the future. We could use a library such as BeautifulSoup to make our lives a little easier:

import urllib.request
import time
from bs4 import BeautifulSoup

t0 = time.time()
req = urllib.request.urlopen('http://www.example.com')
t1 = time.time()
print("Total Time To Fetch Page: {} Seconds".format(t1-t0))

soup = BeautifulSoup(req.read(), "html.parser")
for link in soup.find_all('a'):
    print(link.get('href'))

t2 = time.time()
print("Total Execution Time: {} Seconds".format(t2-t0))

When I execute the above program, the fetch and total execution times are printed to my terminal. You'll notice from this output that the time to fetch the page is over a quarter of a second.
Now imagine we wanted to run our web crawler against a million different web pages; our total execution time would be roughly a million times longer. The real cause of this enormous execution time is purely the I/O bottleneck we face in our program: we spend a massive amount of time waiting on our network requests, and only a fraction of that time parsing the retrieved page for further links to crawl.

Understanding parallelism

Parallelism is the art of executing two or more actions simultaneously, as opposed to concurrency, in which you make progress on two or more things at the same time. This is an important distinction: in order to achieve true parallelism, we need multiple processors on which to run our code at the same time. A good analogy for parallel processing is a queue for coffee. If you had two queues of 20 people all waiting to use a single coffee machine so that they can get through the rest of the day, that would be an example of concurrency. Now if you were to introduce a second coffee machine into the mix, that would be an example of things happening in parallel. This is exactly how parallel processing works: each coffee machine represents one processing core, able to make progress on tasks simultaneously.

A real-life example that highlights the true power of parallel processing is your computer's graphics card. These cards tend to have hundreds, if not thousands, of individual processing cores that work independently and can compute things at the same time. The reason we are able to run high-end PC games at such smooth frame rates is that we've been able to put so many parallel cores onto these cards.

CPU bound bottleneck

A CPU-bound bottleneck is typically the inverse of an I/O-bound bottleneck. This bottleneck is typically found in applications that do a lot of heavy number crunching, or any other task that is computationally expensive.
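Before moving on, it is worth seeing how concurrency attacks the crawler's I/O bottleneck: because most of the time is spent waiting, the waits can be overlapped. A sketch of my own using ThreadPoolExecutor, with time.sleep standing in for the quarter-second network fetch measured above (the page names and latency are assumptions for illustration, not real requests):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(page):
    # Simulate a network request taking about a quarter of a second,
    # roughly the single-page fetch time measured earlier
    time.sleep(0.25)
    return "html for {}".format(page)

pages = ["page-{}".format(i) for i in range(6)]

t0 = time.time()
serial = [fetch(p) for p in pages]             # six waits, back to back
t1 = time.time()

with ThreadPoolExecutor(max_workers=6) as pool:
    overlapped = list(pool.map(fetch, pages))  # all six waits overlap
t2 = time.time()

print("Serial: {:.2f}s, Overlapped: {:.2f}s".format(t1 - t0, t2 - t1))
```

The serial version takes roughly six times the single-fetch latency, while the overlapped version takes roughly one latency in total, because the threads spend their time waiting concurrently rather than one after another.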
These are programs whose execution rate is bound by the speed of the CPU; if you put a faster CPU in your machine, you should see a direct increase in the speed of these programs. If the rate at which you are processing data far outweighs the rate at which you are requesting data, then you have a CPU-bound bottleneck.

How do they work on a CPU?

Understanding the differences outlined in the previous section between concurrency and parallelism is essential, but it's also very important to understand more about the systems that your software will be running on. An appreciation of the different architecture styles, as well as the low-level mechanics, helps you make the most informed decisions in your software design.

Single core CPUs

Single-core processors only ever execute one thread at any given time, as that is all they are capable of. However, in order to ensure that our applications don't hang and become unresponsive, these processors rapidly switch between multiple threads of execution many thousands of times per second. This switching between threads is what is called a "context switch", and it involves storing all the necessary information for a thread at a specific point in time and then restoring it at a different point further down the line. This mechanism of constantly saving and restoring threads allows us to make progress on quite a number of threads within a given second, and it appears as if the computer is doing multiple things at once. It is in fact doing only one thing at any given time, but doing it at such speed that it's imperceptible to users of the machine.

When writing multithreaded applications in Python, it is important to note that these context switches are computationally quite expensive. There is, unfortunately, no way to get around this, and much of the design of modern operating systems is about optimizing these context switches so that we don't feel the pain quite as much.
Advantages of single-core CPUs:

They do not require any complex communication protocols between multiple cores
Single-core CPUs require less power, which typically makes them better suited to IoT devices

Disadvantages:

They are limited in speed, and larger applications will cause them to struggle and potentially freeze
Heat dissipation issues place a hard limit on how fast a single-core CPU can go

Clock rate

One of the key limitations of a single-core application running on a machine is the clock speed of the CPU. When we talk about clock rate, we are essentially talking about how many clock cycles a CPU can execute every second. For many years, manufacturers were able to keep pace with Moore's law, which was essentially an observation that the number of transistors one was able to place on a piece of silicon doubled roughly every two years. This doubling of transistors paved the way for exponential gains in single-CPU clock rates, and CPUs went from the low MHz to the 4-5 GHz clock speeds we see on Intel's i7 6700k processor. But with transistors getting as small as a few nanometers across, this is inevitably coming to an end. We've started to hit the boundaries of physics, and unfortunately, if we go any smaller, we'll start to be hit by the effects of quantum tunneling. Due to these physical limitations, we need to look at other methods to improve the speeds at which we are able to compute things. This is where Martelli's model of scalability comes into play.

Martelli model of scalability

Alex Martelli, the author of Python Cookbook, came up with a model of scalability, which Raymond Hettinger discussed in his brilliant hour-long talk "Thinking about Concurrency", given at PyCon Russia 2016.
This model represents three different types of problems and programs:

1 core: single-threaded and single-process programs
2-8 cores: multithreaded and multiprocess programs
9+ cores: distributed computing

The first category, single-core and single-threaded, is able to handle a growing number of problems due to the constant improvements in the speed of single-core CPUs, and as a result the second category is being rendered more and more obsolete. We will eventually hit a limit on the speed at which a 2-8 core system can run, and as a result we'll have to start looking at other methods, such as multiple-CPU systems or even distributed computing. If your problem is worth solving quickly and requires a lot of power, the sensible approach is the distributed computing category: spin up multiple machines and multiple instances of your program in order to tackle your problems in a truly parallel manner. Large enterprise systems that handle hundreds of millions of requests are the main inhabitants of this category. You'll typically find that these enterprise systems are deployed on tens, if not hundreds, of high-performance, incredibly powerful servers in various locations across the world.

Time-Sharing - the task scheduler

One of the most important parts of the operating system is the task scheduler. It acts as the maestro of the orchestra and directs everything with impeccable precision and incredible timing and discipline. This maestro has only one real goal: to ensure that every task has a chance to run through to completion. The when and where of a task's execution, however, are non-deterministic. That is to say, if we gave a task scheduler two identical competing processes, one after the other, there is no guarantee that the first process would complete first. This non-deterministic nature is what makes concurrent programming so challenging.
An excellent example that highlights this non-deterministic behavior is the following code:

import threading
import time
import random

counter = 1

def workerA():
    global counter
    while counter < 1000:
        counter += 1
        print("Worker A is incrementing counter to {}".format(counter))
        sleepTime = random.randint(0,1)
        time.sleep(sleepTime)

def workerB():
    global counter
    while counter > -1000:
        counter -= 1
        print("Worker B is decrementing counter to {}".format(counter))
        sleepTime = random.randint(0,1)
        time.sleep(sleepTime)

def main():
    t0 = time.time()
    thread1 = threading.Thread(target=workerA)
    thread2 = threading.Thread(target=workerB)
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
    t1 = time.time()
    print("Execution Time {}".format(t1-t0))

if __name__ == '__main__':
    main()

Here we have two competing threads in Python, each trying to accomplish its own goal of either decrementing the counter to -1,000 or, conversely, incrementing it to 1,000. On a single-core processor there is the possibility that worker A manages to complete its task before worker B has a chance to execute, and the same can be said for worker B. However, there is a third possibility: that the task scheduler continues to switch between worker A and worker B for an indefinite number of times and neither ever completes. The above code incidentally also shows one of the dangers of multiple threads accessing shared resources without any form of synchronization. There is no accurate way to determine what will happen to our counter, and as such our program could be considered unreliable.

Multi-core processors

We've now got some idea of how single-core processors work, so it's time to take a look at multicore processors. Multicore processors contain multiple independent processing units, or "cores". Each core contains everything it needs in order to execute a sequence of stored instructions.
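One way to make behavior like this deterministic is to serialize access to the shared counter. Here is a minimal variation of my own on the example above (not code from the book), assuming each worker performs a fixed number of updates under a threading.Lock:

```python
import threading

counter = 0
lock = threading.Lock()

def workerA():
    global counter
    for _ in range(1000):
        with lock:       # only one thread may update counter at a time
            counter += 1

def workerB():
    global counter
    for _ in range(1000):
        with lock:
            counter -= 1

threadA = threading.Thread(target=workerA)
threadB = threading.Thread(target=workerB)
threadA.start()
threadB.start()
threadA.join()
threadB.join()
print(counter)  # the 1,000 increments and 1,000 decrements cancel out: 0
```

The scheduler is still free to interleave the two workers however it likes, but because every read-modify-write of the counter happens inside the lock, the final result is always the same.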
These cores each follow their own cycle:

Fetch: This step involves fetching instructions from program memory. It is dictated by a program counter (PC), which identifies the location of the next instruction to execute.
Decode: The core converts the instruction that it has just fetched into a series of signals that trigger various other parts of the CPU.
Execute: Finally, we perform the execute step. This is where we run the instruction that we have just fetched and decoded, and typically the results of this execution are stored in a CPU register.

Having multiple cores offers us the advantage of being able to work independently on multiple fetch -> decode -> execute cycles. This style of architecture essentially enables us to create higher-performance programs that leverage parallel execution.

Advantages of multicore processors:

We are no longer bound by the same performance limitations as a single-core processor
Applications that are able to take advantage of multiple cores will tend to run faster if well designed

Disadvantages of multicore processors:

They require more power than your typical single-core processor
Cross-core communication is no simple feat; there are multiple different ways of achieving it

Summary

In this article we covered a multitude of topics, including the differences between concurrency and parallelism. We also looked at how they both leverage the CPU in different ways.

Resources for Article:

Further resources on this subject:

Python Data Science Up and Running [article]
Putting the Fun in Functional Python [article]
Basics of Python for Absolute Beginners [article]