How-To Tutorials

Tangled Web? Not At All!

Packt
22 Jun 2017
20 min read
In this article by Clif Flynt, the author of the book Linux Shell Scripting Cookbook - Third Edition, we look at a collection of shell-scripting recipes that talk to services on the Internet. This article is intended to help readers understand how to interact with the Web using shell scripts to automate tasks such as collecting and parsing data from web pages. This is discussed using POST and GET requests to web pages, and writing clients for web services. (For more resources related to this topic, see here.)

In this article, we will cover the following recipes:

Downloading a web page as plain text
Parsing data from a website
Image crawler and downloader
Web photo album generator
Twitter command-line client
Tracking changes to a website
Posting to a web page and reading the response
Downloading a video from the Internet

The Web has become the face of technology and the central access point for data processing. The primary interface to the web is a browser that's designed for interactive use. That's great for searching and reading articles on the web, but you can also do a lot to automate your interactions with shell scripts. For instance, instead of checking a website daily to see if your favorite blogger has added a new blog, you can automate the check and be informed when there's new information. Similarly, Twitter is the current hot technology for getting up-to-the-minute information. But if I subscribe to my local newspaper's Twitter account because I want the local news, Twitter will send me all news, including high-school sports that I don't care about. With a shell script, I can grab the tweets and customize my filters to match my desires, not rely on their filters.

Downloading a web page as plain text

Web pages are simply text with HTML tags, JavaScript, and CSS. The HTML tags define the content of a web page, which we can parse for specific content, and Bash scripts can do that parsing. An HTML file can be viewed in a web browser to see it properly formatted. Parsing a text document is simpler than parsing HTML data because we aren't required to strip off the HTML tags. Lynx is a command-line web browser that can download a web page as plain text.

Getting ready

Lynx is not installed in all distributions, but is available via the package manager.

# yum install lynx

or

apt-get install lynx

How to do it...

Let's download the webpage view, in ASCII character representation, into a text file by using the -dump flag with the lynx command:

$ lynx URL -dump > webpage_as_text.txt

This command lists all the hyperlinks (<a href="link">) separately under a heading References, as the footer of the text output. This lets us parse links separately with regular expressions. For example:

$ lynx -dump http://google.com > plain_text_page.txt

You can view the plain-text version by using the cat command:

$ cat plain_text_page.txt
Search [1]Images [2]Maps [3]Play [4]YouTube [5]News [6]Gmail [7]Drive [8]More »
[9]Web History | [10]Settings | [11]Sign in
[12]St. Patrick's Day 2017
_______________________________________________________
Google Search I'm Feeling Lucky [13]Advanced search [14]Language tools
[15]Advertising Programs [16]Business Solutions [17]+Google [18]About Google
© 2017 - [19]Privacy - [20]Terms
References
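Since every hyperlink ends up under the References heading, a short pipeline can pull the URLs back out of the dump. This is a minimal sketch rather than part of the recipe, and it assumes the dump ends with a References block as shown above:

$ lynx -dump http://google.com | sed -n '/^References/,$p' | grep -Eo 'https?://[^ ]+' > links.txt
$ wc -l links.txt    # how many links did the page contain?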
Parsing data from a website

The lynx, sed, and awk commands can be used to mine data from websites.

How to do it...

Let's go through the commands used to parse details of actresses from the website:

$ lynx -dump -nolist http://www.johntorres.net/BoxOfficefemaleList.html | \
  grep -o "Rank-.*" | \
  sed -e 's/ *Rank-\([0-9]*\) *\(.*\)/\1\t\2/' | \
  sort -nk 1 > actresslist.txt

The output is:

# Only 3 entries shown. All others omitted due to space limits
1 Keira Knightley
2 Natalie Portman
3 Monica Bellucci

How it works...

Lynx is a command-line web browser—it can dump a text version of a website as we would see in a web browser, instead of returning the raw HTML as wget or cURL do. This saves the step of removing HTML tags. The -nolist option shows the links without numbers. Parsing and formatting the lines that contain Rank is done with sed:

sed -e 's/ *Rank-\([0-9]*\) *\(.*\)/\1\t\2/'

These lines are then sorted according to the ranks.

See also

The Downloading a web page as plain text recipe in this article explains the lynx command.

Image crawler and downloader

Image crawlers download all the images that appear in a web page. Instead of going through the HTML page by hand to pick the images, we can use a script to identify the images and download them automatically.

How to do it...

This Bash script will identify and download the images from a web page:

#!/bin/bash
#Desc: Images downloader
#Filename: img_downloader.sh

if [ $# -ne 3 ]; then
  echo "Usage: $0 URL -d DIRECTORY"
  exit -1
fi

while [ -n "$1" ]
do
  case $1 in
  -d) shift; directory=$1; shift ;;
   *) url=${url:-$1}; shift;;
  esac
done

mkdir -p $directory;
baseurl=$(echo $url | egrep -o "https?://[a-z.-]+")

echo Downloading $url
curl -s $url | egrep -o "<img src=[^>]*>" | \
  sed 's/<img src="\([^"]*\).*/\1/g' | \
  sed "s,^/,$baseurl/," > /tmp/$$.list

cd $directory;

while read filename;
do
  echo Downloading $filename
  curl -s -O "$filename" --silent
done < /tmp/$$.list

An example usage is:

$ ./img_downloader.sh http://www.flickr.com/search/?q=linux -d images

How it works...

The image downloader script reads an HTML page, strips out all tags except <img>, parses src="URL" from the <img> tags, and downloads them to the specified directory. This script accepts a web page URL and the destination directory as command-line arguments. The [ $# -ne 3 ] statement checks whether the total number of arguments to the script is three; otherwise it exits and prints a usage example. Otherwise, this code parses the URL and destination directory:

while [ -n "$1" ]
do
  case $1 in
  -d) shift; directory=$1; shift ;;
   *) url=${url:-$1}; shift;;
  esac
done

The while loop runs until all the arguments are processed. The shift command shifts arguments to the left so that $1 will take the next argument's value; that is, $2, and so on. Hence, we can evaluate all arguments through $1 itself. The case statement checks the first argument ($1). If that matches -d, the next argument must be a directory name, so the arguments are shifted and the directory name is saved. If the argument is any other string it is a URL. The advantage of parsing arguments in this way is that we can place the -d argument anywhere in the command line:

$ ./img_downloader.sh -d DIR URL

Or:

$ ./img_downloader.sh URL -d DIR

The egrep -o "<img src=[^>]*>" command will print only the matching strings, which are the <img> tags including their attributes. The phrase [^>]* matches all the characters except the closing >, that is, <img src="image.jpg">. sed 's/<img src="\([^"]*\).*/\1/g' extracts the URL from the string src="url". There are two types of image source paths—relative and absolute. Absolute paths contain full URLs that start with http:// or https://.
Relative URLs start with / or with the image name itself. An example of an absolute URL is http://example.com/image.jpg. An example of a relative URL is /image.jpg. For relative URLs, the starting / should be replaced with the base URL to transform it to http://example.com/image.jpg. The script initializes the baseurl by extracting it from the initial url with the command:

baseurl=$(echo $url | egrep -o "https?://[a-z.-]+")

The output of the previously described sed command is piped into another sed command to replace a leading / with the baseurl, and the results are saved in a file named for the script's PID: /tmp/$$.list.

sed "s,^/,$baseurl/," > /tmp/$$.list

The final while loop iterates through each line of the list and uses curl to download the images. The --silent argument is used with curl to avoid extra progress messages from being printed on the screen.
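When the goal is simply to grab every image linked from a single page, wget's recursive mode offers a shorter alternative to the script above. This is only a sketch, not part of the recipe: -A keeps just the listed extensions, -P sets the output directory, and -H (span hosts) would be needed if the images live on a different domain than the page itself.

$ wget -nd -r -l1 -A jpg,jpeg,png,gif -P images URL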
Web photo album generator

Web developers frequently create photo albums of full sized and thumbnail images. When a thumbnail is clicked, a large version of the picture is displayed. This requires resizing and placing many images. These actions can be automated with a simple bash script. The script creates thumbnails, places them in a thumbs directory, and generates the code fragment for the <img> tags automatically.

Getting ready

This script uses a for loop to iterate over every image in the current directory. The usual Bash utilities such as cat and convert (from the Image Magick package) are used. These will generate an HTML album, using all the images, in index.html.

How to do it...

This Bash script will generate an HTML album page:

#!/bin/bash
#Filename: generate_album.sh
#Description: Create a photo album using images in current directory

echo "Creating album.."
mkdir -p thumbs

cat <<EOF1 > index.html
<html>
<head>
<style>
body { width:470px; margin:auto; border: 1px dashed grey; padding:10px; }
img { margin:5px; border: 1px solid black; }
</style>
</head>
<body>
<center><h1> #Album title </h1></center>
<p>
EOF1

for img in *.jpg;
do
  convert "$img" -resize "100x" "thumbs/$img"
  echo "<a href=\"$img\">" >> index.html
  echo "<img src=\"thumbs/$img\" title=\"$img\" /></a>" >> index.html
done

cat <<EOF2 >> index.html
</p>
</body>
</html>
EOF2

echo Album generated to index.html

Run the script as follows:

$ ./generate_album.sh
Creating album..
Album generated to index.html

How it works...

The initial part of the script is used to write the header part of the HTML page. The following script redirects all the contents up to EOF1 to index.html:

cat <<EOF1 > index.html
contents...
EOF1

The header includes the HTML and CSS styling. for img in *.jpg; iterates over the file names and evaluates the body of the loop. convert "$img" -resize "100x" "thumbs/$img" creates images of 100 px width as thumbnails. The following statements generate the required <a> and <img> tags and append them to index.html:

echo "<a href=\"$img\">" >> index.html
echo "<img src=\"thumbs/$img\" title=\"$img\" /></a>" >> index.html

Finally, the footer HTML tags are appended with cat as done in the first part of the script.

Twitter command-line client

Twitter is the hottest micro-blogging platform, as well as the latest buzz of the online social media now. We can use the Twitter API to read tweets on our timeline from the command line! Let's see how to do it.

Getting ready

Recently, Twitter stopped allowing people to log in by using plain HTTP Authentication, so we must use OAuth to authenticate ourselves. Perform the following steps:

Download the bash-oauth library from https://github.com/livibetter/bash-oauth/archive/master.zip, and unzip it to any directory.
Go to that directory and then inside the subdirectory bash-oauth-master, run make install-all as root.
Go to https://apps.twitter.com/ and register a new app. This will make it possible to use OAuth.
After registering the new app, go to your app's settings and change Access type to Read and Write.
Now, go to the Details section of the app and note two things—Consumer Key and Consumer Secret, so that you can substitute these in the script we are going to write.

Great, now let's write the script that uses this.

How to do it...

This Bash script uses the OAuth library to read tweets or send your own updates.

#!/bin/bash
#Filename: twitter.sh
#Description: Basic twitter client

oauth_consumer_key=YOUR_CONSUMER_KEY
oauth_consumer_secret=YOUR_CONSUMER_SECRET
config_file=~/.$oauth_consumer_key-$oauth_consumer_secret-rc

if [[ "$1" != "read" ]] && [[ "$1" != "tweet" ]]; then
  echo -e "Usage: $0 tweet status_message\n OR\n $0 read\n"
  exit -1;
fi

#source /usr/local/bin/TwitterOAuth.sh
source bash-oauth-master/TwitterOAuth.sh
TO_init

if [ ! -e $config_file ]; then
  TO_access_token_helper
  if (( $? == 0 )); then
    echo oauth_token=${TO_ret[0]} > $config_file
    echo oauth_token_secret=${TO_ret[1]} >> $config_file
  fi
fi

source $config_file

if [[ "$1" = "read" ]]; then
  TO_statuses_home_timeline '' 'YOUR_TWEET_NAME' '10'
  echo $TO_ret | sed 's/,"/\n/g' | sed 's/":/~/' | \
    awk -F~ '{} {if ($1 == "text") {txt=$2;} else if ($1 == "screen_name") printf("From: %s\n Tweet: %s\n\n", $2, txt);} {}' | \
    tr '"' ' '
elif [[ "$1" = "tweet" ]]; then
  shift
  TO_statuses_update '' "$@"
  echo 'Tweeted :)'
fi

Run the script as follows:

$ ./twitter.sh read
Please go to the following link to get the PIN: https://api.twitter.com/oauth/authorize?oauth_token=LONG_TOKEN_STRING
PIN: PIN_FROM_WEBSITE
Now you can create, edit and present Slides offline. - by A Googler
$ ./twitter.sh tweet "I am reading Packt Shell Scripting Cookbook"
Tweeted :)
$ ./twitter.sh read | head -2
From: Clif Flynt
Tweet: I am reading Packt Shell Scripting Cookbook

How it works...

First of all, we use the source command to include the TwitterOAuth.sh library, so we can use its functions to access Twitter. The TO_init function initializes the library. Every app needs to get an OAuth token and token secret the first time it is used. If these are not present, we use the library function TO_access_token_helper to acquire them. Once we have the tokens, we save them to a config file so we can simply source it the next time the script is run. The library function TO_statuses_home_timeline fetches the tweets from Twitter.
This data is returned as a single long string in JSON format, which starts like this:

[{"created_at":"Thu Nov 10 14:45:20 +0000 2016","id":7...9,"id_str":"7...9","text":"Dining...

Each tweet starts with the created_at tag and includes a text and a screen_name tag. The script will extract the text and screen name data and display only those fields. The script assigns the long string to the variable TO_ret. The JSON format uses quoted strings for the key and may or may not quote the value. The key/value pairs are separated by commas, and the key and value are separated by a colon :. The first sed replaces each ," character set with a newline, making each key/value a separate line. These lines are piped to another sed command to replace each occurrence of ": with a tilde ~, which creates a line like

screen_name~"Clif_Flynt"

The final awk script reads each line. The -F~ option splits the line into fields at the tilde, so $1 is the key and $2 is the value. The if command checks for text or screen_name. The text is first in the tweet, but it's easier to read if we report the sender first, so the script saves a text return until it sees a screen_name, then prints the current value of $2 and the saved value of the text. The TO_statuses_update library function generates a tweet. The empty first parameter defines our message as being in the default format, and the message is a part of the second parameter.

Tracking changes to a website

Tracking website changes is useful to both web developers and users. Checking a website manually is impractical, but a change tracking script can be run at regular intervals. When a change occurs, it generates a notification.

Getting ready

Tracking changes in terms of Bash scripting means fetching websites at different times and taking the difference by using the diff command. We can use curl and diff to do this.

How to do it...

This bash script combines different commands, to track changes in a webpage:

#!/bin/bash
#Filename: track_changes.sh
#Desc: Script to track changes to webpage

if [ $# -ne 1 ]; then
  echo -e "Usage: $0 URL\n"
  exit 1;
fi

first_time=0
# Not first time
if [ ! -e "last.html" ]; then
  first_time=1
  # Set it is first time run
fi

curl --silent $1 -o recent.html

if [ $first_time -ne 1 ]; then
  changes=$(diff -u last.html recent.html)
  if [ -n "$changes" ]; then
    echo -e "Changes:\n"
    echo "$changes"
  else
    echo -e "\nWebsite has no changes"
  fi
else
  echo "[First run] Archiving.."
fi

cp recent.html last.html

Let's look at the output of the track_changes.sh script on a website you control. First we'll see the output when a web page is unchanged, and then after making changes. Note that you should change MyWebSite.org to your website name.

First, run the following command:

$ ./track_changes.sh http://www.MyWebSite.org
[First run] Archiving..

Second, run the command again.

$ ./track_changes.sh http://www.MyWebSite.org
Website has no changes

Third, run the following command after making changes to the web page:

$ ./track_changes.sh http://www.MyWebSite.org
Changes:

--- last.html 2010-08-01 07:29:15.000000000 +0200
+++ recent.html 2010-08-01 07:29:43.000000000 +0200
@@ -1,3 +1,4 @@
+added line :)
 data
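To get the regular checks the recipe calls for, the script can be scheduled with cron. The crontab entry below is a sketch rather than part of the recipe; the script path and the use of mail for notification are assumptions about your own setup:

$ crontab -e
# check the site once an hour and mail the report
0 * * * * $HOME/bin/track_changes.sh http://www.MyWebSite.org | mail -s "website change report" you@example.com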
How it works...

The script checks whether the script is running for the first time by using [ ! -e "last.html" ];. If last.html doesn't exist, it means that it is the first time and the webpage must be downloaded and saved as last.html. If it is not the first time, it downloads the new copy recent.html and checks the difference with the diff utility. Any changes will be displayed as diff output. Finally, recent.html is copied to last.html.

Note that changing the website you're checking will generate a huge diff file the first time you examine it. If you need to track multiple pages, you can create a folder for each website you intend to watch.

Posting to a web page and reading the response

POST and GET are two types of requests in HTTP to send information to, or retrieve information from, a website. In a GET request, we send parameters (name-value pairs) through the webpage URL itself. The POST command places the key/value pairs in the message body instead of the URL. POST is commonly used when submitting long forms or to conceal the information submitted from a casual glance.

Getting ready

For this recipe, we will use the sample guestbook website included in the tclhttpd package. You can download tclhttpd from http://sourceforge.net/projects/tclhttpd and then run it on your local system to create a local webserver. The guestbook page requests a name and URL which it adds to a guestbook to show who has visited a site when the user clicks the Add me to your guestbook button. This process can be automated with a single curl (or wget) command.

How to do it...

Download the tclhttpd package and cd to the bin folder. Start the tclhttpd daemon with this command:

tclsh httpd.tcl

The format to POST and read the HTML response from a generic website resembles this:

$ curl URL -d "postvar=postdata2&postvar2=postdata2"

Consider the following example:

$ curl http://127.0.0.1:8015/guestbook/newguest.html -d "name=Clif&url=www.noucorp.com&http=www.noucorp.com"

curl prints a response page like this:

<HTML>
<Head>
<title>Guestbook Registration Confirmed</title>
</Head>
<Body BGCOLOR=white TEXT=black>
<a href="www.noucorp.com">www.noucorp.com</a>
<DL>
<DT>Name
<DD>Clif
<DT>URL
<DD>
</DL>
www.noucorp.com
</Body>

-d is the argument used for posting. The string argument for -d is similar to the GET request semantics. var=value pairs are to be delimited by &. You can POST the data using wget by using --post-data "string". For example:

$ wget http://127.0.0.1:8015/guestbook/newguest.cgi --post-data "name=Clif&url=www.noucorp.com&http=www.noucorp.com" -O output.html

Use the same format as cURL for name-value pairs. The text in output.html is the same as that returned by the cURL command. The string to the post arguments (for example, to -d or --post-data) should always be given in quotes. If quotes are not used, & is interpreted by the shell to indicate that this should be a background process.

How it works...

If you look at the website source (use the View Source option from the web browser), you will see an HTML form defined, similar to the following code:

<form action="newguest.cgi" method="post">
<ul>
<li> Name: <input type="text" name="name" size="40">
<li> Url: <input type="text" name="url" size="40">
<input type="submit">
</ul>
</form>

Here, newguest.cgi is the target URL. When the user enters the details and clicks on the Submit button, the name and url inputs are sent to newguest.cgi as a POST request, and the response page is returned to the browser.
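One detail the recipe does not cover: if a form value contains spaces or other special characters, curl can encode it for you with --data-urlencode, which behaves like -d but URL-encodes the value. A small sketch against the same local guestbook server:

$ curl http://127.0.0.1:8015/guestbook/newguest.html --data-urlencode "name=Clif Flynt" --data-urlencode "url=www.noucorp.com"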
Downloading a video from the internet

There are many reasons for downloading a video. If you are on a metered service, you might want to download videos during off-hours when the rates are cheaper. You might want to watch videos where the bandwidth doesn't support streaming, or you might just want to make certain that you always have that video of cute cats to show your friends.

Getting ready

One program for downloading videos is youtube-dl. This is not included in most distributions and the repositories may not be up to date, so it's best to go to the youtube-dl main site: http://yt-dl.org

You'll find links and information on that page for downloading and installing youtube-dl.

How to do it…

Using youtube-dl is easy. Open your browser and find a video you like. Then copy/paste that URL to the youtube-dl command line.

youtube-dl https://www.youtube.com/watch?v=AJrsl3fHQ74

While youtube-dl is downloading the file it will generate a status line on your terminal.

How it works…

The youtube-dl program works by sending a GET message to the server, just as a browser would do. It masquerades as a browser so that YouTube or other video providers will deliver the video as if the device were streaming. The --list-formats (-F) option will list the formats in which a video is available, and the --format (-f) option will specify which format to download. This is useful if you want to download a higher-resolution video than your internet connection can reliably stream.

Summary

In this article we learned how to download and parse website data, send data to forms, and automate website-usage tasks and similar activities. We can automate many activities that we perform interactively through a browser with a few lines of scripting.

Resources for Article:

Further resources on this subject:

Linux Shell Scripting – various recipes to help you [article]
Linux Shell Script: Tips and Tricks [article]
Linux Shell Script: Monitoring Activities [article]

Setting up Intel Edison

Packt
21 Jun 2017
8 min read
In this article by Avirup Basu, the author of the book Intel Edison Projects, we will be covering the following topics: Setting up the Intel Edison Setting up the developer environment (For more resources related to this topic, see here.) In every Internet of Things(IoT) or robotics project, we have a controller that is the brain of the entire system. Similarly we have Intel Edison. The Intel Edison computing module comes in two different packages. One of which is a mini breakout board the other of which is an Arduino compatible board. One can use the board in its native state as well but in that case the person has to fabricate his/hers own expansion board. The Edison is basically a size of a SD card. Due to its tiny size, it's perfect for wearable devices. However it's capabilities makes it suitable for IoT application and above all, the powerful processing capability makes it suitable for robotics application. However we don't simply use the device in this state. We hook up the board with an expansion board. The expansion board provides the user with enough flexibility and compatibility for interfacing with other units. The Edison has an operating system that is running the entire system. It runs a Linux image. Thus, to setup your device, you initially need to configure your device both at the hardware and at software level. Initial hardware setup We'll concentrate on the Edison package that comes with an Arduino expansion board. Initially you will get two different pieces: The Intel® Edison board The Arduino expansion board The following given is the architecture of the device: Architecture of Intel Edison. Picture Credits: https://software.intel.com/en-us/ We need to hook these two pieces up in a single unit. Place the Edison board on top of the expansion board such that the GPIO interfaces meet at a single point. Gently push the Edison against the expansion board. You will get a click sound. Use the screws that comes with the package to tighten the set up. Once, this is done, we'll now setup the device both at hardware level and software level to be used further. Following are the steps we'll cover in details: Downloading necessary software packages Connecting your Intel® Edison to your PC Flashing your device with the Linux image Connecting to a Wi-Fi network SSH-ing your Intel® Edison device Downloading necessary software packages To move forward with the development on this platform, we need to download and install a couple of software which includes the drivers and the IDEs. Following is the list of the software along with the links that are required: Intel® Platform Flash Tool Lite (https://01.org/android-ia/downloads/intel-platform-flash-tool-lite) PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html) Intel XDK for IoT (https://software.intel.com/en-us/intel-xdk) Arduino IDE (https://www.arduino.cc/en/Main/Software) FileZilla FTP client (https://filezilla-project.org/download.php) Notepad ++ or any other editor (https://notepad-plus-plus.org/download/v7.3.html) Drivers and miscellaneous downloads Latest Yocto* Poky image Windows standalone driver for Intel Edison FTDI drivers (http://www.ftdichip.com/Drivers/VCP.htm) The 1st and the 2nd packages can be downloaded from (https://software.intel.com/en-us/iot/hardware/edison/downloads) Plugging in your device After all the software and drivers installation, we'll now connect the device to a PC. You need two Micro-B USB Cables(s) to connect your device to the PC. 
You can also use a 9V power adapter and a single Micro-B USB Cable, but for now we will not use the power adapter: Different sections of Arduino expansion board of Intel Edison A small switch exists between the USB port and the OTG port. This switch must be towards the OTG port because we're going to power the device from the OTG port and not through the DC power port. Once it is connected to your PC, open your device manager and expands the ports section. If all installations of drivers were successful, then you must see two ports: Intel Edison virtual com port USB serial port Flashing your device Once your device is successfully detected an installed, you need to flash your device with the Linux image. For this we'll use the flash tool provided by Intel: Open the flash lite tool and connect your device to the PC: Intel phone flash lite tool Once the flash tool is opened, click on Browse... and browse to the .zip file of the Linux image you have downloaded. After you click on OK, the tool will automatically unzip the file. Next, click on Start to flash: Intel® Phone flash lite tool – stage 1 You will be asked to disconnect and reconnect your device. Do as the tool says and the board should start flashing. It may take some time before the flashing is completed. You are requested not to tamper with the device during the process. Once the flashing is completed, we'll now configure the device: Intel® Phone flash lite tool – complete Configuring the device After flashing is successfully we'll now configure the device. We're going to use the PuTTY console for the configuration. PuTTY is an SSH and telnet client, developed originally by Simon Tatham for the Windows platform. We're going to use the serial section here. Before opening PuTTY console: Open up the device manager and note the port number for USB serial port. This will be used in your PuTTY console: Ports for Intel® Edison in PuTTY Next select Serialon PuTTY console and enter the port number. Use a baud rate of 115200. Press Open to open the window for communicating with the device: PuTTY console – login screen Once you are in the console of PuTTY, then you can execute commands to configure your Edison. Following is the set of tasks we'll do in the console to configure the device: Provide your device a name Provide root password (SSH your device) Connect your device to Wi-Fi Initially when in the console, you will be asked to login. Type in root and press Enter. Once entered you will see root@edison which means that you are in the root directory: PuTTY console – login success Now, we are in the Linux Terminal of the device. Firstly, we'll enter the following command for setup: configure_edison –setup Press Enter after entering the command and the entire configuration will be somewhat straightforward: PuTTY console – set password Firstly, you will be asked to set a password. Type in a password and press Enter. You need to type in your password again for confirmation. Next, we'll set up a name for the device: PuTTY console – set name Give a name for your device. Please note that this is not the login name for your device. It's just an alias for your device. Also the name should be at-least 5 characters long. Once you entered the name, it will ask for confirmation press y to confirm. Then it will ask you to setup Wi-Fi. Again select y to continue. It's not mandatory to setup Wi-Fi, but it's recommended. 
We need the Wi-Fi for file transfer, downloading packages, and so on:

PuTTY console – set Wi-Fi

Once the scanning is completed, we'll get a list of available networks. Select the number corresponding to your network and press Enter. In this case it is 5, which corresponds to avirup171, which is my Wi-Fi network. Enter the network credentials. After you do that, your device will get connected to the Wi-Fi. You should get an IP after your device is connected:

PuTTY console – set Wi-Fi - 2

After a successful connection you should get this screen. Make sure your PC is connected to the same network. Open up the browser on your PC, and enter the IP address shown in the console. You should get a screen similar to this:

Wi-Fi setup – completed

Now we are done with the initial setup. However, Wi-Fi setup normally doesn't happen in one go. Sometimes your device doesn't get connected to the Wi-Fi, and sometimes you cannot reach the page shown before. In those cases you need to start wpa_cli to manually configure the Wi-Fi. Refer to the following link for the details: http://www.intel.com/content/www/us/en/support/boards-and-kits/000006202.html

Summary

In this article, we have covered the initial setup of the Intel Edison and connecting it to the network. We have also covered how to transfer files to the Edison and vice versa.

Resources for Article:

Further resources on this subject:

Getting Started with Intel Galileo [article]
Creating Basic Artificial Intelligence [article]
Using IntelliTrace to Diagnose Problems with a Hosted Service [article]

CORS in Node.js

Packt
20 Jun 2017
14 min read
In this article by Randall Goya, and Rajesh Gunasundaram the author of the book CORS Essentials, Node.js is a cross-platform JavaScript runtime environment that executes JavaScript code at server side. This enables to have a unified language across the web application development. JavaScript becomes the unified language that runs both on client side and server side. (For more resources related to this topic, see here.) In this article we will learn about: Node.js is a JavaScript platform for developing server-side web applications. Node.js can provide the web server for other frameworks including Express.js, AngularJS, Backbone,js, Ember.js and others. Some other JavaScript frameworks such as ReactJS, Ember.js and Socket.IO may also use Node.js as the web server. Isomorphic JavaScript can add server-side functionality for client-side frameworks. JavaScript frameworks are evolving rapidly. This article reviews some of the current techniques, and syntax specific for some frameworks. Make sure to check the documentation for the project to discover the latest techniques. Understanding CORS concepts, you may create your own solution, because JavaScript is a loosely structured language. All the examples are based on the fundamentals of CORS, with allowed origin(s), methods, and headers such as Content-Type, or preflight, that may be required according to the CORS specification. JavaScript frameworks are very popular JavaScript is sometimes called the lingua franca of the Internet, because it is cross-platform and supported by many devices. It is also a loosely-structured language, which makes it possible to craft solutions for many types of applications. Sometimes an entire application is built in JavaScript. Frequently JavaScript provides a client-side front-end for applications built with Symfony, Content Management Systems such as Drupal, and other back-end frameworks. Node.js is server-side JavaScript and provides a web server as an alternative to Apache, IIS, Nginx and other traditional web servers. Introduction to Node.js Node.js is an open-source and cross-platform library that enables in developing server-side web applications. Applications will be written using JavaScript in Node.js can run on many operating systems, including OS X, Microsoft Windows, Linux, and many others. Node.js provides a non-blocking I/O and an event-driven architecture designed to optimize an application's performance and scalability for real-time web applications. The biggest difference between PHP and Node.js is that PHP is a blocking language, where commands execute only after the previous command has completed, while Node.js is a non-blocking language where commands execute in parallel, and use callbacks to signal completion. Node.js can move files, payloads from services, and data asynchronously, without waiting for some command to complete, which improves performance. Most JS frameworks that work with Node.js use the concept of routes to manage pages and other parts of the application. Each route may have its own set of configurations. For example, CORS may be enabled only for a specific page or route. Node.js loads modules for extending functionality via the npm package manager. The developer selects which packages to load with npm, which reduces bloat. The developer community creates a large number of npm packages created for specific functions. JXcore is a fork of Node.js targeting mobile devices and IoTs (Internet of Things devices). 
JXcore can use both Google V8 and Mozilla SpiderMonkey as its JavaScript engine. JXcore can run Node applications on iOS devices using Mozilla SpiderMonkey. MEAN is a popular JavaScript software stack with MongoDB (a NoSQL database), Express.js and AngularJS, all of which run on a Node.js server. JavaScript frameworks that work with Node.js Node.js provides a server for other popular JS frameworks, including AngularJS, Express.js. Backbone.js, Socket.IO, and Connect.js. ReactJS was designed to run in the client browser, but it is often combined with a Node.js server. As we shall see in the following descriptions, these frameworks are not necessarily exclusive, and are often combined in applications. Express.js is a Node.js server framework Express.js is a Node.js web application server framework, designed for building single-page, multi-page, and hybrid web applications. It is considered the "standard" server framework for Node.js. The package is installed with the command npm install express –save. AngularJS extends static HTML with dynamic views HTML was designed for static content, not for dynamic views. AngularJS extends HTML syntax with custom tag attributes. It provides model–view–controller (MVC) and model–view–viewmodel (MVVM) architectures in a front-end client-side framework.  AngularJS is often combined with a Node.js server and other JS frameworks. AngularJS runs client-side and Express.js runs on the server, therefore Express.js is considered more secure for functions such as validating user input, which can be tampered client-side. AngularJS applications can use the Express.js framework to connect to databases, for example in the MEAN stack. Connect.js provides middleware for Node.js requests Connect.js is a JavaScript framework providing middleware to handle requests in Node.js applications. Connect.js provides middleware to handle Express.js and cookie sessions, to provide parsers for the HTML body and cookies, and to create vhosts (virtual hosts) and error handlers, and to override methods. Backbone.js often uses a Node.js server Backbone.js is a JavaScript framework with a RESTful JSON interface and is based on the model–view–presenter (MVP) application design. It is designed for developing single-page web applications, and for keeping various parts of web applications (for example, multiple clients and the server) synchronized. Backbone depends on Underscore.js, plus jQuery for use of all the available fetures. Backbone often uses a Node.js server, for example to connect to data storage. ReactJS handles user interfaces ReactJS is a JavaScript library for creating user interfaces while addressing challenges encountered in developing single-page applications where data changes over time. React handles the user interface in model–view–controller (MVC) architecture. ReactJS typically runs client-side and can be combined with AngularJS. Although ReactJS was designed to run client-side, it can also be used server-side in conjunction with Node.js. PayPal and Netflix leverage the server-side rendering of ReactJS known as Isomorphic ReactJS. There are React-based add-ons that take care of the server-side parts of a web application. Socket.IO uses WebSockets for realtime event-driven applications Socket.IO is a JavaScript library for event-driven web applications using the WebSocket protocol ,with realtime, bi-directional communication between web clients and servers. It has two parts: a client-side library that runs in the browser, and a server-side library for Node.js. 
Although it can be used as simply a wrapper for WebSocket, it provides many more features, including broadcasting to multiple sockets, storing data associated with each client, and asynchronous I/O. Socket.IO provides better security than WebSocket alone, since allowed domains must be specified for its server. Ember.js can use Node.js Ember is another popular JavaScript framework with routing that uses Moustache templates. It can run on a Node.js server, or also with Express.js. Ember can also be combined with Rack, a component of Ruby On Rails (ROR). Ember Data is a library for  modeling data in Ember.js applications. CORS in Express.js The following code adds the Access-Control-Allow-Origin and Access-Control-Allow-Headers headers globally to all requests on all routes in an Express.js application. A route is a path in the Express.js application, for example /user for a user page. app.all sets the configuration for all routes in the application. Specific HTTP requests such as GET or POST are handled by app.get and app.post. app.all('*', function(req, res, next) { res.header("Access-Control-Allow-Origin", "*"); res.header("Access-Control-Allow-Headers", "X-Requested-With"); next(); }); app.get('/', function(req, res, next) { // Handle GET for this route }); app.post('/', function(req, res, next) { // Handle the POST for this route }); For better security, consider limiting the allowed origin to a single domain, or adding some additional code to validate or limit the domain(s) that are allowed. Also, consider limiting sending the headers only for routes that require CORS by replacing app.all with a more specific route and method. The following code only sends the CORS headers on a GET request on the route/user, and only allows the request from http://www.localdomain.com. app.get('/user', function(req, res, next) { res.header("Access-Control-Allow-Origin", "http://www.localdomain.com"); res.header("Access-Control-Allow-Headers", "X-Requested-With"); next(); }); Since this is JavaScript code, you may dynamically manage the values of routes, methods, and domains via variables, instead of hard-coding the values. CORS npm for Express.js using Connect.js middleware Connect.js provides middleware to handle requests in Express.js. You can use Node Package Manager (npm) to install a package that enables CORS in Express.js with Connect.js: npm install cors The package offers flexible options, which should be familiar from the CORS specification, including using credentials and preflight. It provides dynamic ways to validate an origin domain using a function or a regular expression, and handler functions to process preflight. Configuration options for CORS npm origin: Configures the Access-Control-Allow-Origin CORS header with a string containing the full URL and protocol making the request, for example http://localdomain.com. Possible values for origin: Default value TRUE uses req.header('Origin') to determine the origin and CORS is enabled. When set to FALSE CORS is disabled. It can be set to a function with the request origin as the first parameter and a callback function as the second parameter. It can be a regular expression, for example /localdomain.com$/, or an array of regular expressions and/or strings to match. methods: Sets the Access-Control-Allow-Methods CORS header. Possible values for methods: A comma-delimited string of HTTP methods, for example GET, POST An array of HTTP methods, for example ['GET', 'PUT', 'POST'] allowedHeaders: Sets the Access-Control-Allow-Headers CORS header. 
Possible values for allowedHeaders: A comma-delimited string of  allowed headers, for example "Content-Type, Authorization'' An array of allowed headers, for example ['Content-Type', 'Authorization'] If unspecified, it defaults to the value specified in the request's Access-Control-Request-Headers header exposedHeaders: Sets the Access-Control-Expose-Headers header. Possible values for exposedHeaders: A comma-delimited string of exposed headers, for example 'Content-Range, X-Content-Range' An array of exposed headers, for example ['Content-Range', 'X-Content-Range'] If unspecified, no custom headers are exposed credentials: Sets the Access-Control-Allow-Credentials CORS header. Possible values for credentials: TRUE—passes the header for preflight FALSE or unspecified—omit the header, no preflight maxAge: Sets the Access-Control-Allow-Max-Age header. Possible values for maxAge An integer value in milliseconds for TTL to cache the request If unspecified, the request is not cached preflightContinue: Passes the CORS preflight response to the next handler. The default configuration without setting any values allows all origins and methods without preflight. Keep in mind that complex CORS requests other than GET, HEAD, POST will fail without preflight, so make sure you enable preflight in the configuration when using them. Without setting any values, the configuration defaults to: { "origin": "*", "methods": "GET,HEAD,PUT,PATCH,POST,DELETE", "preflightContinue": false } Code examples for CORS npm These examples demonstrate the flexibility of CORS npm for specific configurations. Note that the express and cors packages are always required. Enable CORS globally for all origins and all routes The simplest implementation of CORS npm enables CORS for all origins and all requests. The following example enables CORS for an arbitrary route " /product/:id" for a GET request by telling the entire app to use CORS for all routes: var express = require('express') , cors = require('cors') , app = express(); app.use(cors()); // this tells the app to use CORS for all re-quests and all routes app.get('/product/:id', function(req, res, next){ res.json({msg: 'CORS is enabled for all origins'}); }); app.listen(80, function(){ console.log('CORS is enabled on the web server listening on port 80'); }); Allow CORS for dynamic origins for a specific route The following example uses corsOptions to check if the domain making the request is in the whitelisted array with a callback function, which returns null if it doesn't find a match. This CORS option is passed to the route "product/:id" which is the only route that has CORS enabled. The allowed origins can be dynamic by changing the value of the variable "whitelist." 
var express = require('express')
  , cors = require('cors')
  , app = express();

// define the whitelisted domains and set the CORS options to check them
var whitelist = ['http://localdomain.com', 'http://localdomain-other.com'];
var corsOptions = {
  origin: function(origin, callback){
    var originWhitelisted = whitelist.indexOf(origin) !== -1;
    callback(null, originWhitelisted);
  }
};

// add the CORS options to a specific route /product/:id for a GET request
app.get('/product/:id', cors(corsOptions), function(req, res, next){
  res.json({msg: 'A whitelisted domain matches and CORS is enabled for route product/:id'});
});

// log that CORS is enabled on the server
app.listen(80, function(){
  console.log('CORS is enabled on the web server listening on port 80');
});

You may set different CORS options for specific routes, or sets of routes, by defining the options assigned to unique variable names, for example "corsUserOptions." Pass the specific configuration variable to each route that requires that set of options.

Enabling CORS preflight

CORS requests that use an HTTP method other than GET, HEAD, POST (for example DELETE), or that use custom headers, are considered complex and require a preflight request before proceeding with the CORS requests. Enable preflight by adding an OPTIONS handler for the route:

var express = require('express')
  , cors = require('cors')
  , app = express();

// add the OPTIONS handler
app.options('/products/:id', cors()); // options is added to the route /products/:id

// use the OPTIONS handler for the DELETE method on the route /products/:id
app.del('/products/:id', cors(), function(req, res, next){
  res.json({msg: 'CORS is enabled with preflight on the route /products/:id for the DELETE method for all origins!'});
});

app.listen(80, function(){
  console.log('CORS is enabled on the web server listening on port 80');
});

You can enable preflight globally on all routes with the wildcard:

app.options('*', cors());
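A quick way to confirm that preflight is being answered is to send the OPTIONS request by hand. This curl call is a sketch (it assumes the preflight-enabled server above is running locally on port 80); the response headers printed by -i should include the Access-Control-Allow-* values:

$ curl -i -X OPTIONS http://localhost/products/1 \
    -H "Origin: http://localdomain.com" \
    -H "Access-Control-Request-Method: DELETE"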
Configuring CORS asynchronously

One of the reasons to use Node.js frameworks is to take advantage of their asynchronous abilities, handling multiple tasks at the same time. Here we use a callback function corsDelegateOptions and add it to the cors parameter passed to the route /products/:id. The callback function can handle multiple requests asynchronously.

var express = require('express')
  , cors = require('cors')
  , app = express();

// define the allowed origins stored in a variable
var whitelist = ['http://example1.com', 'http://example2.com'];

// create the callback function
var corsDelegateOptions = function(req, callback){
  var corsOptions;
  if(whitelist.indexOf(req.header('Origin')) !== -1){
    corsOptions = { origin: true };  // the requested origin in the CORS response matches and is allowed
  }else{
    corsOptions = { origin: false }; // the requested origin in the CORS response doesn't match, and CORS is disabled for this request
  }
  callback(null, corsOptions); // callback expects two parameters: error and options
};

// add the callback function to the cors parameter for the route /products/:id for a GET request
app.get('/products/:id', cors(corsDelegateOptions), function(req, res, next){
  res.json({msg: 'A whitelisted domain matches and CORS is enabled for route product/:id'});
});

app.listen(80, function(){
  console.log('CORS is enabled on the web server listening on port 80');
});

Summary

We have learned the important aspects of applying CORS in Node.js. Let us have a quick recap of what we have learnt:

Node.js provides a web server built with JavaScript, and can be combined with many other JS frameworks as the application server.
Although some frameworks have specific syntax for implementing CORS, they all follow the CORS specification by specifying allowed origin(s) and method(s). More robust frameworks allow custom headers such as Content-Type, and preflight when required for complex CORS requests.
JavaScript frameworks may depend on the jQuery XHR object, which must be configured properly to allow Cross-Origin requests.
JavaScript frameworks are evolving rapidly. The examples here may become outdated. Always refer to the project documentation for up-to-date information.

With knowledge of the CORS specification, you may create your own techniques using JavaScript based on these examples, depending on the specific needs of your application.

https://en.wikipedia.org/wiki/Node.js

Resources for Article:

Further resources on this subject:

An Introduction to Node.js Design Patterns [article]
Five common questions for .NET/Java developers learning JavaScript and Node.js [article]
API with MongoDB and Node.js [article]

Grouping Sets in Advanced SQL

Packt
20 Jun 2017
6 min read
In this article by Hans-Jürgen Schönig, the author of the book Mastering PostgreSQL 9.6, we will learn about advanced SQL.

Introducing grouping sets

Every advanced user of SQL should be familiar with GROUP BY and HAVING clauses. But are you also aware of CUBE, ROLLUP, and GROUPING SETS? If not, this article might be worth reading for you.

Loading some sample data

To make this article a pleasant experience for you, I have compiled some sample data, which has been taken from the BP energy report at http://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy.html. Here is the data structure, which will be used:

test=# CREATE TABLE t_oil (
    region text,
    country text,
    year int,
    production int,
    consumption int
);
CREATE TABLE

The test data can be downloaded from our website using curl directly:

test=# COPY t_oil FROM PROGRAM 'curl www.cybertec.at/secret/oil_ext.txt';
COPY 644

On some operating systems, curl is not there by default or has not been installed, so downloading the file beforehand might be the easier option for many people. Altogether there is data for 14 nations between 1965 and 2010, which are in two regions of the world:

test=# SELECT region, avg(production) FROM t_oil GROUP BY region;
    region     |          avg
---------------+---------------------
 Middle East   | 1992.6036866359447005
 North America | 4541.3623188405797101
(2 rows)

Applying grouping sets

The GROUP BY clause will turn many rows into one row per group. However, if you do reporting in real life, you might also be interested in the overall average. One additional line might be needed. Here is how this can be achieved:

test=# SELECT region, avg(production) FROM t_oil GROUP BY ROLLUP (region);
    region     |          avg
---------------+-----------------------
 Middle East   | 1992.6036866359447005
 North America | 4541.3623188405797101
               | 2607.5139860139860140
(3 rows)

The ROLLUP clause will inject an additional line, which will contain the overall average. If you do reporting, it is highly likely that a summary line will be needed. Instead of running two queries, PostgreSQL can provide the data running just a single query. Of course this kind of operation can also be used if you are grouping by more than just one column:

test=# SELECT region, country, avg(production)
       FROM t_oil
       WHERE country IN ('USA', 'Canada', 'Iran', 'Oman')
       GROUP BY ROLLUP (region, country);
    region     | country |          avg
---------------+---------+-----------------------
 Middle East   | Iran    | 3631.6956521739130435
 Middle East   | Oman    |  586.4545454545454545
 Middle East   |         | 2142.9111111111111111
 North America | Canada  | 2123.2173913043478261
 North America | USA     | 9141.3478260869565217
 North America |         | 5632.2826086956521739
               |         | 3906.7692307692307692
(7 rows)

In this example, PostgreSQL will inject three lines into the result set. One line will be injected for Middle East and one for North America. On top of that we will get a line for the overall averages. If you are building a web application, the current result is ideal because you can easily build a GUI to drill into the result set by filtering out the NULL values.
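If the underlying data could itself contain NULL values, filtering on NULL alone cannot distinguish a genuine group from an injected summary row. The GROUPING() function (available since PostgreSQL 9.5, not covered in this article) flags the injected rows explicitly; here is a sketch run through psql from the shell:

$ psql test -c "SELECT region, country, GROUPING(region, country) AS lvl, avg(production) FROM t_oil GROUP BY ROLLUP (region, country)"

lvl is 0 for ordinary rows and greater than 0 for the rows that ROLLUP added.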
The ROLLUP clause is nice in case you instantly want to display a result. I always used it to display final results to end users. However, if you are doing reporting, you might want to pre-calculate more data to ensure more flexibility. The CUBE keyword is what you might have been looking for:

test=# SELECT region, country, avg(production)
       FROM t_oil
       WHERE country IN ('USA', 'Canada', 'Iran', 'Oman')
       GROUP BY CUBE (region, country);
    region     | country |          avg
---------------+---------+-----------------------
 Middle East   | Iran    | 3631.6956521739130435
 Middle East   | Oman    |  586.4545454545454545
 Middle East   |         | 2142.9111111111111111
 North America | Canada  | 2123.2173913043478261
 North America | USA     | 9141.3478260869565217
 North America |         | 5632.2826086956521739
               |         | 3906.7692307692307692
               | Canada  | 2123.2173913043478261
               | Iran    | 3631.6956521739130435
               | Oman    |  586.4545454545454545
               | USA     | 9141.3478260869565217
(11 rows)

Note that even more rows have been added to the result. The CUBE clause will create the same data as: GROUP BY region, country + GROUP BY region + GROUP BY country + the overall average. So the whole idea is to extract many results and various levels of aggregation at once. The resulting cube contains all possible combinations of groups.

ROLLUP and CUBE are really just convenience features on top of GROUPING SETS. With the GROUPING SETS clause you can explicitly list the aggregates you want:

test=# SELECT region, country, avg(production)
       FROM t_oil
       WHERE country IN ('USA', 'Canada', 'Iran', 'Oman')
       GROUP BY GROUPING SETS ( (), region, country);
    region     | country |          avg
---------------+---------+-----------------------
 Middle East   |         | 2142.9111111111111111
 North America |         | 5632.2826086956521739
               |         | 3906.7692307692307692
               | Canada  | 2123.2173913043478261
               | Iran    | 3631.6956521739130435
               | Oman    |  586.4545454545454545
               | USA     | 9141.3478260869565217
(7 rows)

In this example I went for three grouping sets: the overall average, GROUP BY region, and GROUP BY country. In case you want region and country combined, use (region, country).

Investigating performance

Grouping sets are a powerful feature which helps to reduce the number of expensive queries. Internally, PostgreSQL will basically turn to traditional GroupAggregates to make things work. A GroupAggregate node requires sorted data, so be prepared that PostgreSQL might do a lot of temporary sorting:

test=# explain SELECT region, country, avg(production)
       FROM t_oil
       WHERE country IN ('USA', 'Canada', 'Iran', 'Oman')
       GROUP BY GROUPING SETS ( (), region, country);
                           QUERY PLAN
---------------------------------------------------------------
 GroupAggregate  (cost=22.58..32.69 rows=34 width=52)
   Group Key: region
   Group Key: ()
   Sort Key: country
     Group Key: country
   ->  Sort  (cost=22.58..23.04 rows=184 width=24)
         Sort Key: region
         ->  Seq Scan on t_oil  (cost=0.00..15.66 rows=184 width=24)
               Filter: (country = ANY ('{USA,Canada,Iran,Oman}'::text[]))
(9 rows)

Hash aggregates are only supported for normal GROUP BY clauses involving no grouping sets. According to the developer of grouping sets (Atri Sharma), adding support for hashes is not worth the effort, so it seems PostgreSQL already has an efficient implementation even if the optimizer has fewer choices than it has with normal GROUP BY statements.

Combining grouping sets with the FILTER clause

In real world applications, grouping sets can often be combined with so-called FILTER clauses. The idea behind FILTER is to be able to run partial aggregates.
Here is an example:

test=# SELECT region,
              avg(production) AS all,
              avg(production) FILTER (WHERE year < 1990) AS old,
              avg(production) FILTER (WHERE year >= 1990) AS new
       FROM t_oil
       GROUP BY ROLLUP (region);
    region     |      all       |      old       |      new
---------------+----------------+----------------+----------------
 Middle East   | 1992.603686635 | 1747.325892857 | 2254.233333333
 North America | 4541.362318840 | 4471.653333333 | 4624.349206349
               | 2607.513986013 | 2430.685618729 | 2801.183150183
(3 rows)

The idea here is that not all columns will use the same data for aggregation. The FILTER clauses allow you to selectively pass data to those aggregates. In my example, the second aggregate will only consider data before 1990, while the third aggregate will take care of more recent data. If it is possible to move conditions to a WHERE clause, it is always more desirable, as less data has to be fetched from the table. FILTER is only useful if the data left by the WHERE clause is not needed by each aggregate. FILTER works for all kinds of aggregates and offers a simple way to pivot your data.

Summary

We have learned about the advanced features provided by SQL. On top of simple aggregates, PostgreSQL provides grouping sets to create custom aggregates.

Resources for Article:

Further resources on this subject:

PostgreSQL in Action [article]
PostgreSQL as an Extensible RDBMS [article]
Recovery in PostgreSQL 9 [article]

article-image-analyzing-social-networks-facebook
Packt
20 Jun 2017
15 min read
Save for later

Analyzing Social Networks with Facebook

Packt
20 Jun 2017
15 min read
In this article by Raghav Bali, Dipanjan Sarkar and Tushar Sharma, the authors of the book Learning Social Media Analytics with R, we got a good flavor of the various aspects related to the most popular social micro-blogging platform, Twitter. In this article, we will look more closely at the most popular social networking platform, Facebook. With more than 1.8 billion monthly active users, over 18 billion dollars annual revenue and record breaking acquisitions for popular products including Oculus, WhatsApp and Instagram have truly made Facebook the core of the social media network today. (For more resources related to this topic, see here.) Before we put Facebook data under the microscope, let us briefly look at Facebook’s interesting origins! Like many popular products, businesses and organizations, Facebook too had a humble beginning. Originally starting off as Mark Zuckerberg’s brainchild in 2004, it was initially known as “Thefacebook” located at thefacebook.com, which was branded as an online social network, connecting university and college students. While this social network was only open to Harvard students in the beginning, it soon expanded within a month by including students from other popular universities. In 2005, the domain facebook.com was finally purchased and “Facebook” extended its membership to employees of companies and organizations for the first time. Finally in 2006, Facebook was finally opened to everyone above 13 years of age and having a valid email address. The following snapshot shows us how the look and feel of the Facebook platform has evolved over the years! Facebook’s evolving look over time While Facebook has a primary website, also known as a web application, it has also launched mobile applications for the major operating systems on handheld devices. In short, Facebook is not just a social network website but an entire platform including a huge social network of connected people and organizations through friends, followers and pages. We will leverage Facebook’s social “Graph API” to access actual Facebook data to perform various analyses. Users, brands, business, news channels, media houses, retail stores and many more are using Facebook actively on a daily basis for producing and consuming content. This generates vast amount of data and a substantial amount of this is available to users through its APIs.  From a social media analytics perspective, this is really exciting because this treasure trove of data with easy to access APIs and powerful open source libraries from R, gives us enormous potential and opportunity to get valuable information from analyzing this data in various ways. We will follow a structured path in this article and cover the following major topics sequentially to ensure that you do not get overwhelmed with too much content at once. Accessing Facebook data Analyzing your personal social network Analyzing an English football social network Analyzing English football clubs’ brand page engagements We will use libraries like Rfacebook, igraph and ggplot2 to retrieve, analyze and visualize data from Facebook. All the following sections of the book assume that you have a Facebook account which is necessary to access data from the APIs and analyze it. In case you do not have an account, do not despair. You can use the data and code files for this article to follow along with the hands-on examples to gain a better understanding of the concepts of social network and engagement analysis.    
Accessing Facebook data You will find a lot of content in several books and on the web about various techniques to access and retrieve data from Facebook. There are several official ways of doing this which include using the Facebook Graph API either directly through low level HTTP based calls or indirectly through higher level abstract interfaces belonging to libraries like Rfacebook. Some alternate ways of retrieving Facebook data would be to use registered applications on Facebook like Netvizz or the GetNet application built by Lada Adamic, used in her very popular “Social Network Analysis” course (Unfortunately http://snacourse.com/getnet is not working since Facebook completely changed its API access permissions and privacy settings). Unofficial ways include techniques like web scraping and crawling to extract data. Do note though that Facebook considers this to be a violation of its terms and conditions of accessing data and you should try and avoid crawling Facebook for data especially if you plan to use it for commercial purposes. In this section, we will take a closer look at the Graph API and the Rfacebook package in R. The main focus will be on how you can extract data from Facebook using both of them. Understanding the Graph API To start using the Graph API, you would need to have an account on Facebook to be able to use the API. You can access the API in various ways. You can create an application on Facebook by going to https://developers.facebook.com/apps/ and then create a long-lived OAuth access token using the fbOAuth(…)function from the Rfacebook package. This enables R to make calls to the Graph API and you can also store this token on the disk and load it for future use. An easier way is to create a short-lived token which would let you access the API data for about two hours by going to the Facebook Graph API Explorer page which is available at https://developers.facebook.com/tools/explorer and get a temporary access token from there. The following snapshot depicts how to get an access token for the Graph API from Facebook. Facebook’s Graph API explorer On clicking “Get User Access Token” in the above snapshot, it will present a list of checkboxes with various permissions which you might need for accessing data including user data permissions, events, groups and pages and other miscellaneous permissions. You can select the ones you need and click on the “Get Access Token” button in the prompt. This will generate a new access token the field depicted in the above snapshot and you can directly copy and use it to retrieve data in R. Before going into that, we will take a closer look at the Graph API explorer which directly allows you to access the API from your web browser itself and helps if you want to do some quick exploratory analysis. A part of it is depicted in the above snapshot. The current version of the API when writing this book is v2.8 which you can see in the snapshot beside the GET resource call. Interestingly, the Graph API is so named because Facebook by itself can be considered as a huge social graph where all the information can be classified into the following three categories. Nodes: These are basically users, pages, photos and so on. Nodes indicate a focal point of interest which is connected to other points. Edges: These connect various nodes together forming the core social graph and these connections are based on various relations like friends, followers and so on. 
Fields: These are specific attributes or properties about nodes, an example would be a user’s address, birthday, name and so on. Like we mentioned before, the API is HTTP based and you can make HTTPGET requests to nodes or edges and all requests are passed to graph.facebook.com to get data. Each node usually has a specific identifier and you can use it for querying information about a node as depicted in the following snippet. GET graph.facebook.com /{node-id} You can also use edge names in addition to the identifier to get information about the edges of the node. The following snippet depicts how you can do the same. GET graph.facebook.com /{node-id}/{edge-name} The following snapshot shows us how we can get information about our own profile. Querying your details in the Graph API explorer Now suppose, I wanted to retrieve information about a Facebook page,“Premier League” which represents the top tier competition in English Football using its identifier and also take a look at its liked pages. I can do the same using the following request. Querying information about a Facebook Page using the Graph API explorer Thus from the above figure, you can clearly see the node identifier, page name and likes for the page, “Premier League”. It must be clear by now that all API responses are returned in the very popular JSON format which is easy to parse and format as needed for analysis. Besides this, there also used to be another way of querying the social graph in Facebook, which was known as FQL or Facebook Query Language, an SQL like interface for querying and retrieving data. Unfortunately, Facebook seems to have deprecated its use and hence covering it would be out of our present scope. Now that you have a firm grasp on the syntax of the Graph API and have also seen a few examples of how to retrieve data from Facebook, we will take a closer look at the Rfacebook package. Understanding Rfacebook Since we will be accessing and analyzing data from Facebook using R, it makes sense to have some robust mechanism to directly query Facebook and retrieve data instead of going to the browser every time like we did in the earlier section. Fortunately, there is an excellent package in R called Rfacebook which has been developed by Pablo Barberá. You can either install it from CRAN or get its most updated version from GitHub. The following snippet depicts how you can do the same. Remember you might need to install the devtools package if you don’t have it already, to download and install the latest version of the Rfacebook package from GitHub. install.packages("Rfacebook") # install from CRAN # install from GitHub library(devtools) install_github("pablobarbera/Rfacebook/Rfacebook") Once you install the package, you can load up the package using load(Rfacebook) and start using it to retrieve data from Facebook by using the access token you generated earlier. The following snippet shows us how you can access your own details like we had mentioned in the previous section, but this time by using R. > token = 'XXXXXX' > me <- getUsers("me", token=token) > me$name [1] "Dipanjan Sarkar" > me$id [1] "1026544" The beauty of this package is that you directly get the results in curated and neatly formatted data frames and you do not need to spend extra time trying to parse the raw JSON response objects from the Graph API. The package is well documented and has high level functions for accessing personal profile data on Facebook as well as page and group level data points. 
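For instance, a hedged sketch of pulling recent posts from a public page with one of these higher-level helpers might look like the following (the page name and the number of posts are placeholders, and the snippet assumes the getPage() function from Rfacebook together with the access token created earlier):

library(Rfacebook)
# retrieve the 50 most recent posts from a public Facebook page
page_posts <- getPage("PremierLeague", token = token, n = 50)
# inspect the post text, creation time and engagement counts
head(page_posts[c("message", "created_time", "likes_count", "comments_count")])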
We will now take a quick look at Netvizz a Facebook application, which can also be used to extract data easily from Facebook. Understanding Netvizz The Netvizz application was developed by Bernhard Rieder and is a tool which can be used to extract data from Facebook pages, groups, get statistics about links and also extract social networks from Facebook pages based on liked pages from each connected page in the network. You can access Netvizz at https://apps.facebook.com/netvizz/ and on registering the application on your profile, you will be able to see the following screen. The Netvizz application interface From the above app snapshot, you can see that there are various links based on the type of operation you want to execute to extract data. Feel free to play around with this tool and we will be using its “page like network” capability later on in one of our analyses in a future section. Data Access Challenges There are several challenges with regards to accessing data from Facebook. Some of the major issues and caveats have been mentioned in the following points: Facebook will keep evolving and updating its data access APIs and this can and will lead to changes and deprecation of older APIs and access patterns just like FQL was deprecated. Scope of data available keeps changing with time and evolving of Facebook’s API and privacy settings. For instance we can no longer get details of all our friends from the API any longer. Libraries and Tools built on top of the API can tend to break with changes to Facebook’s APIs and this has happened before with Rfacebook as well as Netvizz. Besides this, Lada Adamic’s GetNet application has stopped working permanently ever since Facebook changed the way apps are created and the permissions they require. You can get more information about it here http://thepoliticsofsystems.net/2015/01/the-end-of-netvizz/ Thus what was used in the book today for data retrieval might not be working completely tomorrow if there are any changes in the APIs though it is expected it will be working fine for at least the next couple of years. However to prevent any hindrance on analyzing Facebook data, we have provided the datasets we used in most of our analyses except personal networks so that you can still follow along with each example and use-case. Personal names have been anonymized wherever possible to protect their privacy. Now that we have a good idea about Facebook’s Graph API and how to access data, let’s analyze some social networks! Analyzing your personal social network Like we had mentioned before, Facebook by itself is a massive social graph, connecting billions of users, brands and organization. Consider your own Facebook account if you have one. You will have several friends which are your immediate connections, they in turn will be having their own set of friends including you and you might be friends with some of them and so on. Thus you and your friends form the nodes of the network and edges determine the connections. In this section we will analyze a small network of you and your immediate friends and also look at how we can extract and analyze some properties from the network. Before we jump into our analysis, we will start by loading the necessary packages needed which are mentioned in the following snippet and storing the Facebook Graph API access token in a variable. 
library(Rfacebook) library(gridExtra) library(dplyr) # get the Graph API access token token = ‘XXXXXXXXXXX’ You can refer to the file fb_personal_network_analysis.R for code snippets used in the examples depicted in this section. Basic descriptive statistics In this section, we will try to get some basic information and descriptive statistics on the same from our personal social network on Facebook. To start with let us look at some details of our own profile on Facebook using the following code. # get my personal information me <- getUsers("me", token=token, private_info = TRUE) > View(me[c('name', 'id', 'gender', 'birthday')]) This shows us a few fields from the data frame containing our personal details retrieved from Facebook. We use the View function which basically invokes a spreadsheet-style data viewer on R objects like data frames. Now, let us get information about our friends in our personal network. Do note that Facebook currently only lets you access information about those friends who have allowed access to the Graph API and hence you may not be able to get information pertaining to all friends in your friend list. We have anonymized their names below for privacy reasons. anonymous_names <- c('Johnny Juel', 'Houston Tancredi',..., 'Julius Henrichs', 'Yong Sprayberry') # getting friends information friends <- getFriends(token, simplify=TRUE) friends$name <- anonymous_names # view top few rows > View(head(friends)) This gives us a peek at some people from our list of friends which we just retrieved from Facebook. Let’s now analyze some descriptive statistics based on personal information regarding our friends like where they are from, their gender and so on. # get personal information friends_info <- getUsers(friends$id, token, private_info = TRUE) # get the gender of your friends >View(table(friends_info$gender)) This gives us the gender of my friends, looks like more male friends have authorized access to the Graph API in my network! # get the location of your friends >View(table(friends_info$location)) This depicts the location of my friends (wherever available) in the following data frame. # get relationship status of your friends > View(table(friends_info$relationship_status)) From the statistics in the following table I can see that a lot of my friends have gotten married over the past couple of years. Boy that does make me feel old! Suppose I want to look at the relationship status of my friends grouped by gender, we can do the same using the following snippet. # get relationship status of friends grouped by gender View(table(friends_info$relationship_status, friends_info$gender)) The following table gives us the desired results and you can see the distribution of friends by their gender and relationship status. Summary This article has been proven very beneficial to know some basic analytics of social networks with the help of R. Moreover, you will also get to know the information regarding the packages that R use. Resources for Article: Further resources on this subject: How to integrate social media with your WordPress website [article] Social Media Insight Using Naive Bayes [article] Social Media in Magento [article]
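Since ggplot2 is one of the libraries we said we would use for visualization, a small, hedged sketch of charting this distribution could look like the following (it assumes the friends_info data frame built above and simply drops friends whose relationship status is not available):

library(ggplot2)
# bar chart of relationship status, split by gender
plot_data <- subset(friends_info, !is.na(relationship_status))
ggplot(plot_data, aes(x = relationship_status, fill = gender)) +
  geom_bar(position = "dodge") +
  labs(x = "Relationship status", y = "Number of friends") +
  theme_minimal()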

article-image-monitoring-logging-and-troubleshooting
Packt
20 Jun 2017
6 min read
Save for later

Monitoring, Logging, and Troubleshooting

Packt
20 Jun 2017
6 min read
In this article by Gigi Sayfan, the author of the book Mastering Kubernetes, we will learn how to monitor Kubernetes with Heapster. (For more resources related to this topic, see here.)
Monitoring Kubernetes with Heapster
Heapster is a Kubernetes project that provides a robust monitoring solution for Kubernetes clusters. It runs as a pod (of course), so it can be managed by Kubernetes itself. Heapster supports Kubernetes and CoreOS clusters. It has a very modular and flexible design. Heapster collects both operational metrics and events from every node in the cluster, stores them in a persistent backend (with a well-defined schema) and allows visualization and programmatic access. Heapster can be configured to use different backends (or sinks, in Heapster's parlance) and their corresponding visualization frontends. The most common combination is InfluxDB as the backend and Grafana as the frontend. The Google Cloud platform integrates Heapster with the Google monitoring service. There are many other, less common backends, such as the following:
Log
InfluxDB
Google Cloud monitoring
Google Cloud logging
Hawkular-Metrics (metrics only)
OpenTSDB
Monasca (metrics only)
Kafka (metrics only)
Riemann (metrics only)
Elasticsearch
You can use multiple backends by specifying sinks on the command line:
--sink=log --sink=influxdb:http://monitoring-influxdb:80/
cAdvisor
cAdvisor is part of the kubelet, which runs on every node. It collects information about the CPU/cores usage, memory, network, and file systems of each container. It provides a basic UI on port 4194 but, most importantly for Heapster, it provides all this information through the kubelet. Heapster records the information collected by cAdvisor on each node and stores it in its backend for analysis and visualization. The cAdvisor UI is useful if you want to quickly verify that a particular node is set up correctly, for example, while creating a new cluster when Heapster is not hooked up yet.
InfluxDB backend
InfluxDB is a modern and robust distributed time-series database. It is very well suited to, and broadly used for, centralized metrics and logging. It is also the preferred Heapster backend (outside the Google Cloud platform). The only caveat is that InfluxDB clustering and high availability are part of the enterprise offering.
The storage schema
The InfluxDB storage schema defines the information that Heapster stores in InfluxDB and makes available for querying and graphing later. The metrics are divided into multiple categories, called measurements. You can treat and query each metric separately, or you can query a whole category as one measurement and receive the individual metrics as fields. The naming convention is <category>/<metrics name> (except for uptime, which has a single metric). If you have a SQL background, you can think of measurements as tables. Metrics are stored per container.
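To make the measurement idea concrete, here is a hedged sketch of what a query against the k8s database could look like once Heapster is writing to InfluxDB (the value field and the pod_name tag follow the default Heapster sink schema described in this section; adjust the names if your deployment differs):

SELECT mean(value)
FROM "cpu/usage_rate"
WHERE time > now() - 1h
GROUP BY time(5m), pod_name

This would return the average CPU usage per pod in five-minute buckets over the last hour. The tags available for this kind of filtering and grouping are listed next.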
Each metric is labeled with the following information: pod_id – Unique ID of a pod pod_name – User-provided name of a pod pod_namespace – The namespace of a pod container_base_image – Base image for the container container_name – User-provided name of the container or full cgroup name for system containers host_id – Cloud-provider-specified or user-specified Identifier of a node hostname – Hostname where the container ran labels – Comma-separated list of user-provided labels; format is key:value’ namespace_id – UID of the namespace of a pod resource_id – A unique identifier used to differentiate multiple metrics of the same type, for example, FS partitions under filesystem/usage Here are all the metrics grouped by category. As you can see, it is quite extensive. CPU cpu/limit – CPU hard limit in millicores cpu/node_capacity – CPU capacity of a node cpu/node_allocatable – CPU allocatable of a node cpu/node_reservation – Share of CPU that is reserved on the node allocatable cpu/node_utilization – CPU utilization as a share of node allocatable cpu/request – CPU request (the guaranteed amount of resources) in millicores cpu/usage – Cumulative CPU usage on all cores cpu/usage_rate – CPU usage on all cores in millicores File system filesystem/usage – Total number of bytes consumed on a filesystem filesystem/limit – The total size of the filesystem in bytes filesystem/available – The number of available bytes remaining in the filesystem Memory memory/limit – Memory hard limit in bytes memory/major_page_faults – Number of major page faults memory/major_page_faults_rate – Number of major page faults per second memory/node_capacity – Memory capacity of a node memory/node_allocatable – Memory allocatable of a node memory/node_reservation – Share of memory that is reserved on the node allocatable memory/node_utilization – Memory utilization as a share of memory allocatable memory/page_faults – Number of page faults memory/page_faults_rate – Number of page faults per second memory/request – Memory request (the guaranteed amount of resources) in bytes memory/usage – Total memory usage memory/working_set – Total working set usage; working set is the memory being used and not easily dropped by the kernel Network network/rx – Cumulative number of bytes received over the network network/rx_errors – Cumulative number of errors while receiving over the network network/rx_errors_rate – Number of errors per second while receiving over the network network/rx_rate – Number of bytes received over the network per second network/tx – Cumulative number of bytes sent over the network network/tx_errors – Cumulative number of errors while sending over the network network/tx_errors_rate – Number of errors while sending over the network network/tx_rate – Number of bytes sent over the network per second Uptime uptime – Number of milliseconds since the container was started You can work with InfluxDB directly if you’re familiar with it. You can either connect to it using its own API or use its web interface. Type the following command to find its port: k describe service monitoring-influxdb --namespace=kube-system | grep NodePort Type: NodePort NodePort: http 32699/TCP NodePort: api 30020/TCP Now you can browse to the InfluxDB web interface using the HTTP port. You’ll need to configure it to point to the API port. The username and password are root and root by default: Once you’re setup you can select what database to use (see top-right corner). The Kubernetes database is called k8s. 
You can now query the metrics using the InfluxDB query language.
Grafana visualization
Grafana runs in its own container and serves a sophisticated dashboard that works well with InfluxDB as a data source. To locate the port, type the following command:
k describe service monitoring-grafana --namespace=kube-system | grep NodePort
Type: NodePort
NodePort: <unset> 30763/TCP
Now you can access the Grafana web interface on that port. The first thing you need to do is set up the data source to point to the InfluxDB backend:
Make sure to test the connection and then go explore the various options in the dashboards. There are several default dashboards, but you should be able to customize them to your preferences. Grafana is designed to let you adapt it to your needs.
Summary
In this article, we have learned how to monitor Kubernetes with Heapster.
Resources for Article:
Further resources on this subject:
The Microsoft Azure Stack Architecture [article]
Building A Recommendation System with Azure [article]
Setting up a Kubernetes Cluster [article]
article-image-introduction-nfrs
Packt
20 Jun 2017
14 min read
Save for later

Introduction to NFRs

Packt
20 Jun 2017
14 min read
In this article by Sameer Paradkar, the author of the book Mastering Non-Functional Requirements, we will learn the non-functional requirements are those aspects of the IT system that, while not directly affect the business functionality of the application but have a profound impact on the efficiency and effectiveness of business systems for end users as well as the people responsible for supporting the program. The definition of these requirements is an essential factor in developing a total customer solution that delivers business goals. Non-functional requirements are used primarily to drive the operational aspects of the architecture, in other words, to address major operational and technical areas of the system to ensure the robustness and ruggedness of the application. Benchmark or Proof-of-Concept can be used to verify if the implementation meets these requirements or indicate if a corrective action is necessary. Ideally, a series of tests should be planned that maps to the development schedule and grows in complexity. The topics that are covered in this article are as follows: Definition of NFRs NFR KPIs and metrics (For more resources related to this topic, see here.) Introducing NFR The following pointers state the definition of NFR: To define requirements and constraints on the IT system As a basis for cost estimates and early system sizing To assess the viability of the proposed IT system NFRs are an important determining factor of the architecture and design of the operational models As an guideline to design phase to meet NFRs such as performance, scalability, availability The NFRs foreach of the domains e.g. scalability, availability and so on,must be understood to facilitate the design and development of the target operating model. These include the servers, networks, and platforms including the application runtime environments. These are critical for the execution of benchmark tests. They also affect the design of technical and application components. End users have expectations about the effectiveness of the application. These characteristics include ease of software use, speed, reliability, and recoverability when unexpected conditions arise. The NFRs define these aspects of the IT system. The non-functional requirements should be defined precisely and involves quantifying them. NFRs should provide measurements the application must meet. For example, the maximum number of time allowed to execute a process, the number of hours in a day an application must be available, the maximum size of a database on disk, and the number of concurrent users supported are typical NFRs the software must implement. Figure 1: Key Non-Functional Requirements There are many kinds of non-functional requirements, including: Performance Performance is the responsiveness of the application to perform specific actions in a given time span. Performance is scored in terms of throughput or latency. Latency is the time taken by the application to respond to an event. Throughput is the number of events scored in a given time interval. An application’s performance can directly impact its scalability. Enhancing application’s performance often enhances scalability by reducing contention for shared resources. Performance attributes specify the timing characteristics of the application. Certain features are more time-sensitive than others; the NFRs should identify such software tasks that have constraints on their performance. 
Response time relates to the time needed to complete specific business processes, batch or interactive, within the target business system. The system must be designed to fulfil the agreed upon response time requirements, while supporting the defined workload mapped against the given static baseline, on a system platform that does not exceed the stated utilization. The following attributes are: Throughput: The ability of the system to execute a given number of transactions within a given unit of time Response times: The distribution of time which the system takes to respond to the request Scalability Scalability is the ability to handle an increase in the work load without impacting the performance, or the ability to quickly expand the architecture. Itis the ability to expand the architecture to accommodate more users, more processes, more transactions, additional systems and services as the business requirements change and the systems evolve to meet the future business demands. This permits existing systems to be extended without replacing them. Thisdirectly affects the architecture and the selection of software components and hardware. The solution must allow the hardware and the deployed software services and components to be scaled horizontally as well as vertically. Horizontal scaling involves replicating the same functionality across additional nodes vertical scaling involves the same functionality across bigger and more powerful nodes. Scalability definitions measure volumes of users and data the system should support. There are two key techniques for improving both vertical and horizontal scalability. Vertical Scaling is also known as scaling up and includes adding more resources such as memory, CPUand hard disk to a system. Horizontal scaling is also know as scaling out and includes adding more nodes to a cluster forwork load sharing. The following attributes are: Throughput: Number of maximum transactions your system needs to handle. E.g., thousand a day or A million Storage: Amount  of data you going to need to store Growth requirements: Data growth in the next 3-5 years Availability Availability is the time frame in which the system functions normally and without failures. Availability is measured as the percentage of total application downtime over a defined time period. Availability is affected by failures, exceptions, infrastructure issues, malicious attacks, and maintenance and upgrades. It is the uptime or the amount of time the system is operational and available for use. This is specified because some systems are architected with expected downtime for activities like database upgrades and backups. Availability also conveys the number of hours or days per week or weeks per year the application will be available to its end customers, as well as how rapidly it can recover from faults. Since the architecture establishes software, hardware, and networking entities, this requirement extends to all of them. Hardware availability, recoverability, and reliability definitions measure system up-time. For example, it is specified in terms of mean time between failures or “MTBF”. The following attributes are: Availability: Application availability considering the weekends, holidays and maintenance times and failures. Locations of operation: Geographic location, Connection requirements and the restrictions of the network prevail. Offline Requirement: Time available for offline operations including batch processing & system maintenance. 
Length of time between failures
Recoverability: Time required by the system to resume operation in the event of a failure
Resilience: The reliability characteristics of the system and its sub-components
Capacity
This non-functional requirement defines the ways in which the system is expected to scale up by increasing capacity, adding hardware or adding machines, based on business objectives. Capacity is about delivering enough functionality for the end users. A web service asked to provide 1,000 requests per second when the server is only capable of 100 requests a second may not succeed. While this sounds like an availability issue, it occurs because the server is unable to handle the requisite capacity. A single node may not be able to provide enough capacity, and one may need to deploy multiple nodes with a similar configuration to meet organizational capacity requirements. The capability to identify a failing node and restart its workload on another machine or VM is also a non-functional requirement. The following attributes are (a short worked sizing example follows this list):
Throughput: The number of peak transactions the system needs to handle
Storage: Volume of data the system can persist to disk at run time; relates to memory and disk
Year-on-year growth requirements (users, processing and storage) and e-channel growth projections
Different types of transactions or activities supported
For each type of transaction, volumes on an hourly, daily, weekly and monthly basis
Whether volumes are significantly higher during specific times of the day (for example, at lunch), week, month or year
Expected transaction volume growth and the additional volumes you will be able to handle
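As a quick worked example of how such capacity figures translate into a concrete throughput target (all the volumes here are purely hypothetical): suppose the system must handle 1,000,000 transactions per day and roughly 80% of that traffic arrives within a 4-hour peak window. The peak load is then about 800,000 transactions / (4 x 3,600 seconds) ≈ 56 transactions per second on average, and sizing the platform with additional headroom above that figure (for example, two to three times the average peak) is a common way to absorb short bursts within the window.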
These changes may impact any of the components, services, functionality, or interfaces in the application landscape while modifying to fix errors, or to meet changing business requirements. This is also a degree of time it takes to restore the system to its normal state following a failure or fault. Improving maintainability can improve the availability and reduce the run-time defects. Application’s maintainability is dependent on the overall quality attributes. It is critical as a large chunk of the IT budget is spent on maintenance of systems. The more maintainable a system is the lower the total cost of ownership. The following attributes are: Conformance to design standards, coding standards, best practices, reference architectures, and frameworks. Flexibility: The degree to which the system is intended to support change Release support: The way in which the system supports the introduction of initial release, phased rollouts and future releases Manageability Manageability is the ease with which the administrators can manage the application, through useful instrumentation exposed for monitoring. It is the ability of the system or the group of the system to provide key information to the operations and support team to be able to debug, analyze and understand the root cause of failures. It deals with compliance/governance with the domain frameworks and polices. The key is to design the application that is easy to manage, by exposing useful instrumentation for monitoring systems and for understanding the cause of failures. The following attributes are: System must maintain total traceability of transactions Businessobjectsand database fields are part of auditing User and transactional timestamps. File characteristics include size before, size after and structure Getting events and alerts as thresholds (for example, memory, storage, processor) are breached Remotely manage applications and create new virtual instances at the click of a button Rich graphical dashboard for all key applications metrics and KPI Reliability Reliability is the ability of the application to maintain its integrity and veracity over a time span and also in the event of faults or exceptions. It is measured as the probability that the software will not fail and that it will continue functioning for a defined time interval. It alsospecifies the ability of the system to maintain its performance over a time span. Unreliable software is prone to failures anda few processes may be more sensitive to failure than others, because such processes may not be able to recover from a fault or exceptions. The following attributes are: The characteristic of a system to perform its functions under stated conditions for a specificperiod of time. Mean Time To Recovery: Time is available to get the system back up online. Mean Time Between Failures – Acceptable threshold for downtime Data integrity is also known as referential integrity in database tables and interfaces Application Integrity and Information Integrity: during transactions Fault trapping (I/O): Handling failures and recovery Extensibility Extensibility is the ability of a system to cater to future changes through flexible architecture, design or implementation. Extensible applications have excellent endurance, which prevents the expensive processes of procuring large inflexible applications and retiring them due to changes in business needs. Extensibility enables organizations to take advantage of opportunities and respond to risks. 
While there is a significant difference extensibility is often tangled with modifiability quality. Modifiability means that is possible to change the software whereas extensibility means that change has been planned and will be effortless. Adaptability is at times erroneously leveraged with extensibility. However, adaptability deals with how the user interactions with the system are managed and governed. Extensibilityallows a system, people, technology, information, and processes all working together to achieve following objectives: The following attributes are: Handle new information types Manage new or changed business entities Consume or provide new feeds Recovery In the event of a natural calamity for example, flood or hurricane, the entire facility where the application is hosted may become inoperable or inaccessible. Business-critical applications should have a strategy to recover from such disasters within a reasonable amount of time frame. The solution implementing various processes must be integrated with the existing enterprise disaster recovery plan. The processes must be analysed to understand the criticality of each process to the business, the impact of loss to the business in case of non-availability of the process. Based on this analysis, appropriate disaster procedures must be developed, and plans should be outlined. As part of disaster recovery, electronic backups of data and procedures must be maintained at the recovery location and be retrievable within the appropriate time frames for system function restoration. In the case of high criticality, real-time mirroring to a mirror site should be deployed. The following attributes are: Recoveryprocess: Recovery Time Objectives(RTO) / Recovery Point Objectives(RPO) Restore time: Time required switching to the secondary site when the primary fails RPO/Backup time: Time it takes to back your data Backup frequencies: Frequency of backing-up the transaction data, configuration data and code Interoperability Interoperability is the ability to exchange information and communicate with internal and external applications and systems. Interoperable systems make it easier to exchange information both internally and externally. The data formats, transport protocols and interfaces are the key attributes for architecting interoperable systems. Standardization of data formats, transport protocols and interfaces are the key aspect to be considered when architecting interoperable system. Interoperability is achieved through: Publishing and describing interfaces Describing the syntax used to communicate Describing the semantics of information it produces and consumes Leveraging open standards to communicate with external systems Loosely coupled with external systems The following attributes are: Compatibility with shared applications: Other system it needs to integrate Compatibility with 3rd party applications: Other systems it has to live with amicably Compatibility with various OS: Different OS compatibility Compatibility on different platforms: Hardware platforms it needs to work on Usability Usability measures characteristics such as consistency and aesthetics in the user interface. Consistency is the constant use of mechanisms employed in the user interface while Aesthetics refers to the artistic, visual quality of the user interface. It is the ease at which the users operate the system and make productive use of it. 
Usability is discussed with relation to the system interfaces, but it can just as well be applied to any tool, device, or rich system. This addresses the factors that establish the ability of the software to be understood, used, and learned by its intended users. The application interfaces must be designed with end users in mind so that they are intuitive to use, are localized, provide access for differently abled users, and provide an excellent overall user experience. The following attributes are: Look and feel standards: Layout and flow, screen element density, keyboard shortcuts, UI metaphors, colors. Localization/Internationalization requirements: Keyboards, paper sizes, languages, spellings, and so on Summary It explains he introduction of NFRs and why NFRs are a critical for building software systems. The article also explained various KPI for each of the key of NFRs i.e. scalability, availability, reliability and do on.  Resources for Article: Further resources on this subject: Software Documentation with Trac [article] The Software Task Management Tool - Rake [article] Installing Software and Updates [article]

article-image-understanding-basics-rxjava
Packt
20 Jun 2017
15 min read
Save for later

Understanding the Basics of RxJava

Packt
20 Jun 2017
15 min read
In this article, by Tadas Subonis author of the book Reactive Android Programming, will go through the core basics of RxJava so that we can fully understand what it is, what are the core elements, and how they work. Before that, let's take a step back and briefly discuss how RxJava is different from other approaches. RxJava is about reacting to results. It might be an item that originated from some source. It can also be an error. RxJava provides a framework to handle these items in a reactive way and to create complicated manipulation and handling schemes in a very easy-to-use interface. Things like waiting for an arrival of an item before transforming it become very easy with RxJava. To achieve all this, RxJava provides some basic primitives: Observables: A source of data Subscriptions: An activated handle to the Observable that receives data Schedulers: A means to define where (on which Thread) the data is processed First of all, we will cover Observables--the source of all the data and the core structure/class that we will be working with. We will explore how are they related to Disposables (Subscriptions). Furthermore, the life cycle and hook points of an Observable will be described, so we will actually know what's happening when an item travels through an Observable and what are the different stages that we can tap into. Finally, we will briefly introduce Flowable--a big brother of Observable that lets you handle big amounts of data with high rates of publishing. To summarize, we will cover these aspects: What is an Observable? What are Disposables (formerly Subscriptions)? How items travel through the Observable? What is backpressure and how we can use it with Flowable? Let's dive into it! (For more resources related to this topic, see here.) Observables Everything starts with an Observable. It's a source of data that you can observe for emitted data (hence the name). In almost all cases, you will be working with the Observable class. It is possible to (and we will!) combine different Observables into one Observable. Basically, it is a universal interface to tap into data streams in a reactive way. There are lots of different ways of how one can create Observables. The simplest way is to use the .just() method like we did before: Observable.just("First item", "Second item"); It is usually a perfect way to glue non-Rx-like parts of the code to Rx compatible flow. When an Observable is created, it is not usually defined when it will start emitting data. If it was created using simple tools such as.just(), it won't start emitting data until there is a subscription to the observable. How do you create a subscription? It's done by calling .subscribe() : Observable.just("First item", "Second item") .subscribe(); Usually (but not always), the observable be activated the moment somebody subscribes to it. So, if a new Observable was just created, it won't magically start sending data "somewhere". Hot and Cold Observables Quite often, in the literature and documentation terms, Hot and Cold Observables can be found. Cold Observable is the most common Observable type. For example, it can be created with the following code: Observable.just("First item", "Second item") .subscribe(); Cold Observable means that the items won't be emitted by the Observable until there is a Subscriber. This means that before the .subscribe() is called, no items will be produced and thus none of the items that are intended to be omitted will be missed, everything will be processed. 
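A minimal sketch of this cold behaviour (the log tag and the items are purely illustrative) could look like this:

// nothing is emitted yet - the Observable is only defined here
Observable<String> coldObservable = Observable.just("First item", "Second item");

// emission starts only at this point, once a Subscriber appears
coldObservable.subscribe(item -> Log.d("APP", "received: " + item));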
Hot Observable is an Observable that will begin producing (emitting) items internally as soon as it is created. The status updates are produced constantly and it doesn't matter if there is something that is ready to receive them (like Subscription). If there were no subscriptions to the Observable, it means that the updates will be lost. Disposables A disposable (previously called Subscription in RxJava 1.0) is a tool that can be used to control the life cycle of an Observable. If the stream of data that the Observable is producing is boundless, it means that it will stay active forever. It might not be a problem for a server-side application, but it can cause some serious trouble on Android. Usually, this is the common source of memory leaks. Obtaining a reference to a disposable is pretty simple: Disposable disposable = Observable.just("First item", "Second item") .subscribe(); Disposable is a very simple interface. It has only two methods: dispose() and isDisposed() .  dispose() can be used to cancel the existing Disposable (Subscription). This will stop the call of .subscribe()to receive any further items from Observable, and the Observable itself will be cleaned up. isDisposed() has a pretty straightforward function--it checks whether the subscription is still active. However, it is not used very often in regular code as the subscriptions are usually unsubscribed and forgotten. The disposed subscriptions (Disposables) cannot be re-enabled. They can only be created anew. Finally, Disposables can be grouped using CompositeDisposable like this: Disposable disposable = new CompositeDisposable( Observable.just("First item", "Second item").subscribe(), Observable.just("1", "2").subscribe(), Observable.just("One", "Two").subscribe() ); It's useful in the cases when there are many Observables that should be canceled at the same time, for example, an Activity being destroyed. Schedulers As described in the documentation, a scheduler is something that can schedule a unit of work to be executed now or later. In practice, it means that Schedulers control where the code will actually be executed and usually that means selecting some kind of specific thread. Most often, Subscribers are used to executing long-running tasks on some background thread so that it wouldn't block the main computation or UI thread. This is especially relevant on Android when all long-running tasks must not be executed on MainThread. Schedulers can be set with a simple .subscribeOn() call: Observable.just("First item", "Second item") .subscribeOn(Schedulers.io()) .subscribe(); There are only a few main Schedulers that are commonly used: Schedulers.io() Schedulers.computation() Schedulers.newThread() AndroidSchedulers.mainThread() The AndroidSchedulers.mainThread() is only used on Android systems. Scheduling examples Let's explore how schedulers work by checking out a few examples. 
Let's run the following code: Observable.just("First item", "Second item") .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e)) .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e)); The output will be as follows: on-next:main:First item subscribe:main:First item on-next:main:Second item subscribe:main:Second item Now let's try changing the code to as shown: Observable.just("First item", "Second item") .subscribeOn(Schedulers.io()) .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e)) .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e)); Now, the output should look like this: on-next:RxCachedThreadScheduler-1:First item subscribe:RxCachedThreadScheduler-1:First item on-next:RxCachedThreadScheduler-1:Second item subscribe:RxCachedThreadScheduler-1:Second item We can see how the code was executed on the main thread in the first case and on a new thread in the next. Android requires that all UI modifications should be done on the main thread. So, how can we execute a long-running process in the background but process the result on the main thread? That can be done with .observeOn() method: Observable.just("First item", "Second item") .subscribeOn(Schedulers.io()) .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e)) .observeOn(AndroidSchedulers.mainThread()) .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e)); The output will be as illustrated: on-next:RxCachedThreadScheduler-1:First item on-next:RxCachedThreadScheduler-1:Second item subscribe:main:First item subscribe:main:Second item You will note that the items in the doOnNext block were executed on the "RxThread", and the subscribe block items were executed on the main thread. Investigating the Flow of Observable The logging inside the steps of an Observable is a very powerful tool when one wants to understand how they work. If you are in doubt at any point as to what's happening, add logging and experiment. A few quick iterations with logs will definitely help you understand what's going on under the hood. Let's use this technique to analyze a full flow of an Observable. We will start off with this script: private void log(String stage, String item) { Log.d("APP", stage + ":" + Thread.currentThread().getName() + ":" + item); } private void log(String stage) { Log.d("APP", stage + ":" + Thread.currentThread().getName()); } Observable.just("One", "Two") .subscribeOn(Schedulers.io()) .doOnDispose(() -> log("doOnDispose")) .doOnComplete(() -> log("doOnComplete")) .doOnNext(e -> log("doOnNext", e)) .doOnEach(e -> log("doOnEach")) .doOnSubscribe((e) -> log("doOnSubscribe")) .doOnTerminate(() -> log("doOnTerminate")) .doFinally(() -> log("doFinally")) .observeOn(AndroidSchedulers.mainThread()) .subscribe(e -> log("subscribe", e)); It can be seen that it has lots of additional and unfamiliar steps (more about this later). They represent different stages during the processing of an Observable. So, what's the output of the preceding script?: doOnSubscribe:main doOnNext:RxCachedThreadScheduler-1:One doOnEach:RxCachedThreadScheduler-1 doOnNext:RxCachedThreadScheduler-1:Two doOnEach:RxCachedThreadScheduler-1 doOnComplete:RxCachedThreadScheduler-1 doOnEach:RxCachedThreadScheduler-1 doOnTerminate:RxCachedThreadScheduler-1 doFinally:RxCachedThreadScheduler-1 subscribe:main:One subscribe:main:Two doOnDispose:main Let's go through some of the steps. 
First of all, by calling .subscribe() the doOnSubscribe block was executed. This started the emission of items from the Observable as we can see on the doOnNext and doOnEach lines. Finally, the stream finished at termination life cycle was activated--the doOnComplete, doOnTerminate and doOnFinally. Also, the reader will note that the doOnDispose block was called on the main thread along with the subscribe block. The flow will be a little different if .subscribeOn() and .observeOn() calls won't be there: doOnSubscribe:main doOnNext:main:One doOnEach:main subscribe:main:One doOnNext:main:Two doOnEach:main subscribe:main:Two doOnComplete:main doOnEach:main doOnTerminate:main doOnDispose:main doFinally:main You will readily note that now, the doFinally block was executed after doOnDispose while in the former setup, doOnDispose was the last. This happens due to the way Android Looper schedulers code blocks for execution and the fact that we used two different threads in the first case. The takeaway here is that whenever you are unsure of what is going on, start logging actions (and the thread they are running on) to see what's actually happening. Flowable Flowable can be regarded as a special type of Observable (but internally it isn't). It has almost the same method signature like the Observable as well. The difference is that Flowable allows you to process items that emitted faster from the source than some of the following steps can handle. It might sound confusing, so let's analyze an example. Assume that you have a source that can emit a million items per second. However, the next step uses those items to do a network request. We know, for sure, that we cannot do more than 50 requests per second: That poses a problem. What will we do after 60 seconds? There will be 60 million items in the queue waiting to be processed. The items are accumulating at a rate of 1 million items per second between the first and the second steps because the second step processes them at a much slower rate. Clearly, the problem here is that the available memory will be exhausted and the programming will fail with an OutOfMemory (OOM) exception. For example, this script will cause an excessive memory usage because the processing step just won't be able to keep up with the pace the items are emitted at. PublishSubject<Integer> observable = PublishSubject.create(); observable .observeOn(Schedulers.computation()) .subscribe(v -> log("s", v.toString()), this::log); for (int i = 0; i < 1000000; i++) { observable.onNext(i); } private void log(Throwable throwable) { Log.e("APP", "Error", throwable); } By converting this to a Flowable, we can start controlling this behavior: observable.toFlowable(BackpressureStrategy.MISSING) .observeOn(Schedulers.computation()) .subscribe(v -> log("s", v.toString()), this::log); Since we have chosen not to specify how we want to handle items that cannot be processed (it's called Backpressuring), it will throw a MissingBackpressureException. However, if the number of items was 100 instead of a million, it would have been just fine as it wouldn't hit the internal buffer of Flowable. By default, the size of the Flowable queue (buffer) is 128. There are a few Backpressure strategies that will define how the excessive amount of items should be handled. Drop Items Dropping means that if the downstream processing steps cannot keep up with the pace of the source Observable, just drop the data that cannot be handled. 
This can only be used in the cases when losing data is okay, and you care more about the values that were emitted in the beginning. There are a few ways in which items can be dropped. The first one is just to specify Backpressure strategy, like this: observable.toFlowable(BackpressureStrategy.DROP) Alternatively, it will be like this: observable.toFlowable(BackpressureStrategy.MISSING) .onBackpressureDrop() A similar way to do that would be to call .sample(). It will emit items only periodically, and it will take only the last value that's available (while BackpressureStrategy.DROP drops it instantly unless it is free to push it down the stream). All the other values between "ticks" will be dropped: observable.toFlowable(BackpressureStrategy.MISSING) .sample(10, TimeUnit.MILLISECONDS) .observeOn(Schedulers.computation()) .subscribe(v -> log("s", v.toString()), this::log); Preserve Latest Item Preserving the last items means that if the downstream cannot cope with the items that are being sent to them, stop emitting values and wait until they become available. While waiting, keep dropping all the values except the last one that arrived and when the downstream becomes available to send the last message that's currently stored. Like with Dropping, the "Latest" strategy can be specified while creating an Observable: observable.toFlowable(BackpressureStrategy.LATEST) Alternatively, by calling .onBackpressure(): observable.toFlowable(BackpressureStrategy.MISSING) .onBackpressureLatest() Finally, a method, .debounce(), can periodically take the last value at specific intervals: observable.toFlowable(BackpressureStrategy.MISSING) .debounce(10, TimeUnit.MILLISECONDS) Buffering It's usually a poor way to handle different paces of items being emitted and consumed as it often just delays the problem. However, this can work just fine if there is just a temporal slowdown in one of the consumers. In this case, the items emitted will be stored until later processing and when the slowdown is over, the consumers will catch up. If the consumers cannot catch up, at some point the buffer will run out and we can see a very similar behavior to the original Observable with memory running out. Enabling buffers is, again, pretty straightforward by calling the following: observable.toFlowable(BackpressureStrategy.BUFFER) or observable.toFlowable(BackpressureStrategy.MISSING) .onBackpressureBuffer() If there is a need to specify a particular value for the buffer, one can use .buffer(): observable.toFlowable(BackpressureStrategy.MISSING) .buffer(10) Completable, Single, and Maybe Types Besides the types of Observable and Flowable, there are three more types that RxJava provides: Completable: It represents an action without a result that will be completed in the future Single: It's just like Observable (or Flowable) that returns a single item instead of a stream Maybe: It stands for an action that can complete (or fail) without returning any value (like Completable) but can also return an item like Single However, all these are used quite rarely. Let's take a quick look at the examples. Completable Since Completable can basically process just two types of actions--onComplete and onError--we will cover it very briefly. Completable has many static factory methods available to create it but, most often, it will just be found as a return value in some other libraries. 
For example, the Completable can be created by calling the following: Completable completable = Completable.fromAction(() -> { log("Let's do something"); }); Then, it is to be subscribed with the following: completable.subscribe(() -> { log("Finished"); }, throwable -> { log(throwable); }); Single Single provides a way to represent an Observable that will return just a single item (thus the name). You might ask, why it is worth having it at all? These types are useful to tell the developers about the specific behavior that they should expect. To create a Single, one can use this example: Single.just("One item") The Single and the Subscription to it can be created with the following: Single.just("One item") .subscribe((item) -> { log(item); }, (throwable) -> { log(throwable); }); Make a note that this differs from Completable in that the first argument to the .subscribe() action now expects to receive an item as a result. Maybe Finally, the Maybe type is very similar to the Single type, but the item might not be returned to the subscriber in the end. The Maybe type can be created in a very similar fashion as before: Maybe.empty(); or like Maybe.just("Item"); However, the .subscribe() can be called with arguments dedicated to handling onSuccess (for received items), onError (to handle errors), and onComplete (to do a final action after the item is handled): Maybe.just("Item") .subscribe( s -> log("success: " + s), throwable -> log("error"), () -> log("onComplete") ); Summary In this article, we covered the most essentials parts of RxJava. Resources for Article: Further resources on this subject: The Art of Android Development Using Android Studio [article] Drawing and Drawables in Android Canvas [article] Optimizing Games for Android [article]


Scraping a Web Page

Packt
20 Jun 2017
11 min read
In this article by Katharine Jarmul, author of the book Python Web Scraping - Second Edition, we look at an example: suppose I have a shop selling shoes and want to keep track of my competitor's prices. I could go to my competitor's website each day and compare each shoe's price with my own; however, this will take a lot of time and will not scale well if I sell thousands of shoes or need to check price changes frequently. Or maybe I just want to buy a shoe when it's on sale. I could come back and check the shoe website each day until I get lucky, but the shoe I want might not be on sale for months. These repetitive manual processes could instead be replaced with an automated solution using the web scraping techniques covered in this book.

In an ideal world, web scraping wouldn't be necessary and each website would provide an API to share the data in a structured format. Indeed, some websites do provide APIs, but they typically restrict the data that is available and how frequently it can be accessed. Additionally, a website developer might change, remove, or restrict the backend API. In short, we cannot rely on APIs to access the online data we may want and therefore, we need to learn about web scraping techniques.

(For more resources related to this topic, see here.)

Three approaches to scrape a web page

Now that we understand the structure of this web page, we will investigate three different approaches to scraping its data: first with regular expressions, then with the popular BeautifulSoup module, and finally with the powerful lxml module.

Regular expressions

If you are unfamiliar with regular expressions or need a reminder, there is a thorough overview available at https://docs.python.org/3/howto/regex.html. Even if you use regular expressions (or regex) with another programming language, I recommend stepping through it for a refresher on regex with Python. To scrape the country area using regular expressions, we will first try matching the contents of the <td> element, as follows:

>>> import re
>>> from advanced_link_crawler import download
>>> url = 'http://example.webscraping.com/view/UnitedKingdom-239'
>>> html = download(url)
>>> re.findall(r'<td class="w2p_fw">(.*?)</td>', html)
['<img src="/places/static/images/flags/gb.png" />', '244,820 square kilometres',
'62,348,447', 'GB', 'United Kingdom', 'London', 'EU', '.uk', 'GBP', 'Pound', '44',
'@# #@@|@## #@@|@@# #@@|@@## #@@|@#@ #@@|@@#@ #@@|GIR0AA',
'^(([A-Z]\d{2}[A-Z]{2})|([A-Z]\d{3}[A-Z]{2})|([A-Z]{2}\d{2} [A-Z]{2})|([A-Z]{2}\d{3}[A-Z]{2})|([A-Z]\d[A-Z]\d[A-Z]{2})|([A-Z]{2}\d[A-Z]\d[A-Z]{2})|(GIR0AA))$',
'en-GB,cy-GB,gd', 'IE ']

This result shows that the <td class="w2p_fw"> tag is used for multiple country attributes. If we simply wanted to scrape the country area, we can select the second matching element, as follows:

>>> re.findall('<td class="w2p_fw">(.*?)</td>', html)[1]
'244,820 square kilometres'

This solution works but could easily fail if the web page is updated. Consider if this table is changed and the area is no longer in the second matching element. If we just need to scrape the data now, future changes can be ignored. However, if we want to rescrape this data at some point, we want our solution to be as robust against layout changes as possible. To make this regular expression more specific, we can include the parent <tr> element, which has an ID, so it ought to be unique:

>>> re.findall('<tr id="places_area__row">.*?<td class="w2p_fw">(.*?)</td>', html)
['244,820 square kilometres']

This iteration is better; however, there are many other ways the web page could be updated in a way that still breaks the regular expression.
For example, double quotation marks might be changed to single, extra spaces could be added between the tags, or the area_label could be changed. Here is an improved version to try and support these various possibilities:

>>> re.findall('''<tr id="places_area__row">.*?<td\s*class=["']w2p_fw["']>(.*?)</td>''', html)
['244,820 square kilometres']

This regular expression is more future-proof but is difficult to construct, and quite unreadable. Also, there are still plenty of other minor layout changes that would break it, such as if a title attribute was added to the <td> tag or if the tr or td elements changed their CSS classes or IDs. From this example, it is clear that regular expressions provide a quick way to scrape data but are too brittle and easily break when a web page is updated. Fortunately, there are better data extraction solutions, such as Beautiful Soup.

Beautiful Soup

Beautiful Soup is a popular library that parses a web page and provides a convenient interface to navigate content. If you do not already have this module, the latest version can be installed using this command:

pip install beautifulsoup4

The first step with Beautiful Soup is to parse the downloaded HTML into a soup document. Many web pages do not contain perfectly valid HTML and Beautiful Soup needs to correct improper open and close tags. For example, consider this simple web page containing a list with missing attribute quotes and closing tags:

<ul class=country>
 <li>Area
 <li>Population
</ul>

If the Population item is interpreted as a child of the Area item instead of the list, we could get unexpected results when scraping. Let us see how Beautiful Soup handles this:

>>> from bs4 import BeautifulSoup
>>> broken_html = '<ul class=country><li>Area<li>Population</ul>'
>>> # parse the HTML
>>> soup = BeautifulSoup(broken_html, 'html.parser')
>>> fixed_html = soup.prettify()
>>> print(fixed_html)
<ul class="country">
 <li>
  Area
  <li>
   Population
  </li>
 </li>
</ul>

We can see that using the default html.parser did not result in properly parsed HTML. We can see from the previous snippet that it has used nested li elements, which might make it difficult to navigate. Luckily, there are more options for parsers. We can install LXML or we can also use html5lib. To install html5lib, simply use pip:

pip install html5lib

Now, we can repeat this code, changing only the parser like so:

>>> soup = BeautifulSoup(broken_html, 'html5lib')
>>> fixed_html = soup.prettify()
>>> print(fixed_html)
<html>
 <head>
 </head>
 <body>
  <ul class="country">
   <li>
    Area
   </li>
   <li>
    Population
   </li>
  </ul>
 </body>
</html>

Here, BeautifulSoup using html5lib was able to correctly interpret the missing attribute quotes and closing tags, as well as add the <html> and <body> tags to form a complete HTML document. You should see similar results if you used lxml. Now, we can navigate to the elements we want using the find() and find_all() methods:

>>> ul = soup.find('ul', attrs={'class':'country'})
>>> ul.find('li') # returns just the first match
<li>Area</li>
>>> ul.find_all('li') # returns all matches
[<li>Area</li>, <li>Population</li>]

For a full list of available methods and parameters, the official documentation is available at http://www.crummy.com/software/BeautifulSoup/bs4/doc/.
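As a quick aside, recent versions of Beautiful Soup can also be queried with CSS selectors through the select() and select_one() methods, which some readers may find more familiar than find() and find_all(). A minimal sketch, reusing the soup object parsed with html5lib above:

>>> # 'ul.country > li' selects <li> elements that are direct children of a <ul> with class "country"
>>> soup.select('ul.country > li')
[<li>Area</li>, <li>Population</li>]
>>> soup.select_one('ul.country > li')
<li>Area</li>

The examples that follow stick to find() and find_all(), and we will return to CSS selectors when we look at lxml below.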
Now, using these techniques, here is a full example to extract the country area from our example website: >>> from bs4 import BeautifulSoup >>> url = 'http://example.webscraping.com/places/view/United-Kingdom-239' >>> html = download(url) >>> soup = BeautifulSoup(html) >>> # locate the area row >>> tr = soup.find(attrs={'id':'places_area__row'}) >>> td = tr.find(attrs={'class':'w2p_fw'}) # locate the data element >>> area = td.text # extract the text from the data element >>> print(area) 244,820 square kilometres This code is more verbose than regular expressions but easier to construct and understand. Also, we no longer need to worry about problems in minor layout changes, such as extra whitespace or tag attributes. We also know if the page contains broken HTML that BeautifulSoup can help clean the page and allow us to extract data from very broken website code. Lxml Lxml is a Python library built on top of the libxml2 XML parsing library written in C, which helps make it faster than Beautiful Soup but also harder to install on some computers, specifically Windows. The latest installation instructions are available at http://lxml.de/installation.html. If you run into difficulties installing the library on your own, you can also use Anaconda to do so:  https://anaconda.org/anaconda/lxml. If you are unfamiliar with Anaconda, it is a package and environment manager primarily focused on open data science packages built by the folks at Continuum Analytics. You can download and install Anaconda by following their setup instructions here: https://www.continuum.io/downloads. Note that using the Anaconda quick install will set your PYTHON_PATH to the Conda installation of Python. As with Beautiful Soup, the first step when using lxml is parsing the potentially invalid HTML into a consistent format. Here is an example of parsing the same broken HTML: >>> from lxml.html import fromstring, tostring >>> broken_html = '<ul class=country><li>Area<li>Population</ul>' >>> tree = fromstring(broken_html) # parse the HTML >>> fixed_html = tostring(tree, pretty_print=True) >>> print(fixed_html) <ul class="country"> <li>Area</li> <li>Population</li> </ul> As with BeautifulSoup, lxml was able to correctly parse the missing attribute quotes and closing tags, although it did not add the <html> and <body> tags. These are not requirements for standard XML and so are unnecessary for lxml to insert. After parsing the input, lxml has a number of different options to select elements, such as XPath selectors and a find() method similar to Beautiful Soup. Instead, we will use CSS selectors here, because they are more compact and can be reused later when parsing dynamic content. Some readers will already be familiar with them from their experience with jQuery selectors or use in front-end web application development. We will compare performance of these selectors with XPath. To use CSS selectors, you might need to install the cssselect library like so: pip install cssselect Now we can use the lxml CSS selectors to extract the area data from the example page: >>> tree = fromstring(html) >>> td = tree.cssselect('tr#places_area__row > td.w2p_fw')[0] >>> area = td.text_content() >>> print(area) 244,820 square kilometres By using the cssselect method on our tree, we can utilize CSS syntax to select a table row element with the places_area__row ID, and then the child table data tag with the w2p_fw class. 
Since cssselect returns a list, we then index the first result and call the text_content method, which will iterate over all child elements and return concatenated text of each element. In this case, we only have one element, but this functionality is useful to know for more complex extraction examples. Summary We have walked through a variety of ways to scrape data from a web page. Regular expressions can be useful for a one-off scrape or to avoid the overhead of parsing the entire web page, and BeautifulSoup provides a high-level interface while avoiding any difficult dependencies. However, in general, lxml will be the best choice because of its speed and extensive functionality, so we will use it in future examples. Resources for Article: Further resources on this subject: Web scraping with Python (Part 2) [article] Scraping the Web with Python - Quick Start [article] Scraping the Data [article]


Manipulating functions in functional programming

Packt
20 Jun 2017
6 min read
In this article by Wisnu Anggoro, author of the book Learning C++ Functional Programming, you will learn to apply functional programming techniques to C++ to build highly modular, testable, and reusable code. In this article, you will learn the following topics: Applying a first-class function in all functions Passing a function as other functions parameter Assigning a function to a variable Storing a function in the container (For more resources related to this topic, see here.) Applying a first-class function in all functions There's nothing special with the first-class function since it's a normal class. We can treat the first-class function like any other data type. However, in the language that supports the first-class function, we can perform the following tasks without invoking the compiler recursively: Passing a function as other function parameters Assigning functions to a variable Storing functions in collections Fortunately, C++ can be used to solve the preceding tasks. We will discuss it in depth in the following topics. Passing a function as other functions parameter Let's start passing a function as the function parameter. We will choose one of four functions and invoke the function from its main function. The code will look as follows: /* first-class-1.cpp */ #include <functional> #include <iostream> using namespace std; typedef function<int(int, int)> FuncType; int addition(int x, int y) { return x + y; } int subtraction(int x, int y) { return x - y; } int multiplication(int x, int y) { return x * y; } int division(int x, int y) { return x / y; } void PassingFunc(FuncType fn, int x, int y) { cout << "Result = " << fn(x, y) << endl; } int main() { int i, a, b; FuncType func; cout << "Select mode:" << endl; cout << "1. Addition" << endl; cout << "2. Subtraction" << endl; cout << "3. Multiplication" << endl; cout << "4. Division" << endl; cout << "Choice: "; cin >> i; cout << "a = "; cin >> a; cout << "b = "; cin >> b; switch(i) { case 1: PassingFunc(addition, a, b); break; case 2: PassingFunc(subtraction, a, b); break; case 3: PassingFunc(multiplication, a, b); break; case 4: PassingFunc(division, a, b); break; } return 0; } From the preceding code, we can see that we have four functions, and we want the user to choose one from them, and then run it. In the switch statement, we will invoke one of the four functions based on the choice of the user. We will pass the selected function to PassingFunc(), as we can see in the following code snippet: case 1: PassingFunc(addition, a, b); break; case 2: PassingFunc(subtraction, a, b); break; case 3: PassingFunc(multiplication, a, b); break; case 4: PassingFunc(division, a, b); break; The result we get on the screen should look like the following screenshot: The preceding screenshot shows that we selected the Subtraction mode and gave 8 to a and 3 to b. As we expected, the code gives us 5 as a result. Assigning a function to variable We can also assign a function to the variable so that we can call the function by calling the variable. We will refactor first-class-1.cpp, and it will be as follows: /* first-class-2.cpp */ #include <functional> #include <iostream> using namespace std; // Adding the addition, subtraction, multiplication, and // division function as we've got in first-class-1.cpp int main() { int i, a, b; function<int(int, int)> func; cout << "Select mode:" << endl; cout << "1. Addition" << endl; cout << "2. Subtraction" << endl; cout << "3. Multiplication" << endl; cout << "4. 
Division" << endl; cout << "Choice: "; cin >> i; cout << "a = "; cin >> a; cout << "b = "; cin >> b; switch(i) { case 1: func = addition; break; case 2: func = subtraction; break; case 3: func = multiplication; break; case 4: func = division; break; } cout << "Result = " << func(a, b) << endl; return 0; } We will now assign the four functions based on the user choice. We will store the selected function in func variable inside the switch statement, as follows: case 1: func = addition; break; case 2: func = subtraction; break; case 3: func = multiplication; break; case 4: func = division; break; After the func variable is assigned with the user's choice, the code will just call the variable like calling the function as follows: cout << "Result = " << func(a, b) << endl; Moreover, we will obtain the same output on the console if we run the code. Storing a function in the container Now, let's save the function to the container. Here, we will use the vector as the container. The code is as follows: /* first-class-3.cpp */ #include <vector> #include <functional> #include <iostream> using namespace std; typedef function<int(int, int)> FuncType; // Adding the addition, subtraction, multiplication, and // division function as we've got in first-class-1.cpp int main() { vector<FuncType> functions; functions.push_back(addition); functions.push_back(subtraction); functions.push_back(multiplication); functions.push_back(division); int i, a, b; function<int(int, int)> func; cout << "Select mode:" << endl; cout << "1. Addition" << endl; cout << "2. Subtraction" << endl; cout << "3. Multiplication" << endl; cout << "4. Division" << endl; cout << "Choice: "; cin >> i; cout << "a = "; cin >> a; cout << "b = "; cin >> b; cout << "Result = " << functions.at(i - 1)(a, b) << endl; return 0; } From the preceding code, we can see that we created a new vector named functions, then stored four different functions to it. Same with our two previous code samples, we ask the user to select the mode as well. However, now the code becomes simpler since we don't need to add the switch statement as we can select the function directly by selecting the vector index, as we can see in the following line of code: cout << "Result = " << functions.at(i - 1)(a, b) << endl; However, since the vector is a zero-based index, we have to adjust the index with the menu choice. The result will be the same with our two previous code samples. Summary In this article, we discussed that there are some techniques to manipulate a function to produce the various purpose on it. Since we can implement the first-class function in C++ language, we can pass a function as other functions parameter. We can treat a function as a data object so we can assign it to a variable and store it in the container. Resources for Article: Further resources on this subject: Introduction to the Functional Programming [article] Functional Programming in C# [article] Putting the Function in Functional Programming [article]

What are Microservices?

Packt
20 Jun 2017
12 min read
In this article written by Gaurav Kumar Aroraa, Lalit Kale, Kanwar Manish, authors of the book Building Microservices with .NET Core, we will start with a brief introduction. Then, we will define its predecessors: monolithic architecture and service-oriented architecture (SOA). After this, we will see how microservices fare against both SOA and the monolithic architecture. We will then compare the advantages and disadvantages of each one of these architectural styles. This will enable us to identify the right scenario for these styles. We will understand the problems that arise from having a layered monolithic architecture. We will discuss the solutions available to these problems in the monolithic world. At the end, we will be able to break down a monolithic application into a microservice architecture. We will cover the following topics in this article: Origin of microservices Discussing microservices (For more resources related to this topic, see here.) Origin of microservices The term microservices was used for the first time in mid-2011 at a workshop of software architects. In March 2012, James Lewis presented some of his ideas about microservices. By the end of 2013, various groups from the IT industry started having discussions on microservices, and by 2014, it had become popular enough to be considered a serious contender for large enterprises. There is no official introduction available for microservices. The understanding of the term is purely based on the use cases and discussions held in the past. We will discuss this in detail, but before that, let's check out the definition of microservices as per Wikipedia (https://en.wikipedia.org/wiki/Microservices), which sums it up as: Microservices is a specialization of and implementation approach for SOA used to build flexible, independently deployable software systems. In 2014, James Lewis and Martin Fowler came together and provided a few real-world examples and presented microservices (refer to http://martinfowler.com/microservices/) in their own words and further detailed it as follows: The microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies. It is very important that you see all the attributes James and Martin defined here. They defined it as an architectural style that developers could utilize to develop a single application with the business logic spread across a bunch of small services, each having their own persistent storage functionality. Also, note its attributes: it can be independently deployable, can run in its own process, is a lightweight communication mechanism, and can be written in different programming languages. We want to emphasize this specific definition since it is the crux of the whole concept. And as we move along, it will come together by the time we finish this book. Discussing microservices Until now, we have gone through a few definitions of microservices; now, let's discuss microservices in detail. In short, a microservice architecture removes most of the drawbacks of SOA architectures.  
Slicing your application into a number of services is neither SOA nor microservices. However, combining service design and best practices from the SOA world along with a few emerging practices, such as isolated deployment, semantic versioning, providing lightweight services, and service discovery in polyglot programming, is microservices. We implement microservices to satisfy business features and implement them with reduced time to market and greater flexibility. Before we move on to understand the architecture, let's discuss the two important architectures that have led to its existence: The monolithic architecture style SOA Most of us would be aware of the scenario where during the life cycle of an enterprise application development, a suitable architectural style is decided. Then, at various stages, the initial pattern is further improved and adapted with changes that cater to various challenges, such as deployment complexity, large code base, and scalability issues. This is exactly how the monolithic architecture style evolved into SOA, further leading up to microservices. Monolithic architecture The monolithic architectural style is a traditional architecture type and has been widely used in the industry. The term "monolithic" is not new and is borrowed from the Unix world. In Unix, most of the commands exist as a standalone program whose functionality is not dependent on any other program. As seen in the succeeding image, we can have different components in the application such as: User interface: This handles all of the user interaction while responding with HTML or JSON or any other preferred data interchange format (in the case of web services). Business logic: All the business rules applied to the input being received in the form of user input, events, and database exist here. Database access: This houses the complete functionality for accessing the database for the purpose of querying and persisting objects. A widely accepted rule is that it is utilized through business modules and never directly through user-facing components. Software built using this architecture is self-contained. We can imagine a single .NET assembly that contains various components, as described in the following image: As the software is self-contained here, its components are interconnected and interdependent. Even a simple code change in one of the modules may break a major functionality in other modules. This would result in a scenario where we'd need to test the whole application. With the business depending critically on its enterprise application frameworks, this amount of time could prove to be very critical. Having all the components tightly coupled poses another challenge: whenever we execute or compile such software, all the components should be available or the build will fail; refer to the preceding image that represents a monolithic architecture and is a self-contained or a single .NET assembly project. However, monolithic architectures might also have multiple assemblies. This means that even though a business layer (assembly, data access layer assembly, and so on) is separated, at run time, all of them will come together and run as one process.  A user interface depends on other components' direct sale and inventory in a manner similar to all other components that depend upon each other. In this scenario, we will not be able to execute this project in the absence of any one of these components. 
The process of upgrading any one of these components will be more complex as we may have to consider other components that require code changes too. This results in more development time than required for the actual change. Deploying such an application will become another challenge. During deployment, we will have to make sure that each and every component is deployed properly; otherwise, we may end up facing a lot of issues in our production environments. If we develop an application using the monolithic architecture style, as discussed previously, we might face the following challenges: Large code base: This is a scenario where the code lines outnumber the comments by a great margin. As components are interconnected, we will have to bear with a repetitive code base. Too many business modules: This is in regard to modules within the same system. Code base complexity: This results in a higher chance of code breaking due to the fix required in other modules or services. Complex code deployment: You may come across minor changes that would require whole system deployment. One module failure affecting the whole system: This is in regard to modules that depend on each other. Scalability: This is required for the entire system and not just the modules in it. Intermodule dependency: This is due to tight coupling. Spiraling development time: This is due to code complexity and interdependency. Inability to easily adapt to a new technology: In this case, the entire system would need to be upgraded. As discussed earlier, if we want to reduce development time, ease of deployment, and improve maintainability of software for enterprise applications, we should avoid the traditional or monolithic architecture. Service-oriented architecture In the previous section, we discussed the monolithic architecture and its limitations. We also discussed why it does not fit into our enterprise application requirements. To overcome these issues, we should go with some modular approach where we can separate the components such that they should come out of the self-contained or single .NET assembly. The main difference between SOA & monolithic is not one or multiple assembly. But as the service in SOA runs as separate process, SOA scales better compared to monolithic. Let's discuss the modular architecture, that is, SOA. This is a famous architectural style using which the enterprise applications are designed with a collection of services as its base. These services may be RESTful or ASMX Web services. To understand SOA in more detail, let's discuss "service" first. What is service? Service, in this case, is an essential concept of SOA. It can be a piece of code, program, or software that provides some functionality to other system components. This piece of code can interact directly with the database or indirectly through another service. Furthermore, it can be consumed by clients directly, where the client may either be a website, desktop app, mobile app, or any other device app. Refer to the following diagram: Service refers to a type of functionality exposed for consumption by other systems (generally referred to as clients/client applications). As mentioned earlier, it can be represented by a piece of code, program, or software. Such services are exposed over the HTTP transport protocol as a general practice. However, the HTTP protocol is not a limiting factor, and a protocol can be picked as deemed fit for the scenario. 
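To make the idea of an HTTP-exposed service more concrete, here is a minimal sketch of what such a service could look like in ASP.NET Core, the stack used in this book. The controller name, route, namespace, and sample data below are purely illustrative and are not taken from the book's sample application:

using Microsoft.AspNetCore.Mvc;

namespace DirectSelling.Api
{
    // A minimal HTTP-exposed service: web, desktop, and mobile clients call this
    // endpoint, and the controller delegates to a data store or to another service.
    [Route("api/[controller]")]
    public class ProductsController : Controller
    {
        // GET api/products/42 - returns a single product by its identifier
        [HttpGet("{id}")]
        public IActionResult Get(int id)
        {
            // In a real service this would query a repository or database.
            var product = new { Id = id, Name = "Sample shoe", Price = 49.99m };
            return Ok(product);
        }
    }
}

Because the contract is just HTTP and JSON, any client can consume this endpoint without knowing how it is implemented, which is exactly the kind of reuse discussed next.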
In the following image, Service – direct selling is directly interacting with Database, and three different clients, namely Web, Desktop, and Mobile, are consuming the service. On the other hand, we have clients consuming Service – partner selling, which is interacting with Service – channel partners for database access. A product selling service is a set of services that interacts with client applications and provides database access directly or through another service, in this case, Service – Channel partner.  In the case of Service – direct selling, shown in the preceding example, it is providing some functionality to a Web Store, a desktop application, and a mobile application. This service is further interacting with the database for various tasks, namely fetching data, persisting data, and so on. Normally, services interact with other systems via some communication channel, generally the HTTP protocol. These services may or may not be deployed on the same or single servers. In the preceding image, we have projected an SOA example scenario. There are many fine points to note here, so let's get started. Firstly, our services can be spread across different physical machines. Here, Service-direct selling is hosted on two separate machines. It is a possible scenario that instead of the entire business functionality, only a part of it will reside on Server 1 and the remaining on Server 2. Similarly, Service – partner selling appears to be having the same arrangement on Server 3 and Server 4. However, it doesn't stop Service – channel partners being hosted as a complete set on both the servers: Server 5 and Server 6. A system that uses a service or multiple services in a fashion mentioned in the preceding figure is called an SOA. We will discuss SOA in detail in the following sections. Let's recall the monolithic architecture. In this case, we did not use it because it restricts code reusability; it is a self-contained assembly, and all the components are interconnected and interdependent. For deployment, in this case, we will have to deploy our complete project after we select the SOA (refer to preceding image and subsequent discussion). Now, because of the use of this architectural style, we have the benefit of code reusability and easy deployment. Let's examine this in the wake of the preceding figure: Reusability: Multiple clients can consume the service. The service can also be simultaneously consumed by other services. For example, OrderService is consumed by web and mobile clients. Now, OrderService can also be used by the Reporting Dashboard UI. Stateless: Services do not persist any state between requests from the client, that is, the service doesn't know, nor care, that the subsequent request has come from the client that has/hasn't made the previous request. Contract-based: Interfaces make it technology-agnostic on both sides of implementation and consumption. It also serves to make it immune to the code updates in the underlying functionality. Scalability: A system can be scaled up; SOA can be individually clustered with appropriate load balancing. Upgradation: It is very easy to roll out new functionalities or introduce new versions of the existing functionality. The system doesn't stop you from keeping multiple versions of the same business functionality. Summary In this article, we discussed what the microservice architectural style is in detail, its history, and how it differs from its predecessors: monolithic and SOA. 
We further defined the various challenges that monolithic faces when dealing with large systems. Scalability and reusability are some definite advantages that SOA provides over monolithic. We also discussed the limitations of the monolithic architecture, including scaling problems, by implementing a real-life monolithic application. The microservice architecture style resolves all these issues by reducing code interdependency and isolating the dataset size that any one of the microservices works upon. We utilized dependency injection and database refactoring for this. We further explored automation, CI, and deployment. These easily allow the development team to let the business sponsor choose what industry trends to respond to first. This results in cost benefits, better business response, timely technology adoption, effective scaling, and removal of human dependency. Resources for Article: Further resources on this subject: Microservices and Service Oriented Architecture [article] Breaking into Microservices Architecture [article] Microservices – Brave New World [article]


Basics of Python for Absolute Beginners

Packt
19 Jun 2017
5 min read
In this article by Bhaskar Das and Mohit Raj, authors of the book, Learn Python in 7 days, we will learn basics of Python. The Python language had a humble beginning in the late 1980s when a Dutchman, Guido Von Rossum, started working on a fun project that would be a successor to the ABC language with better exception handling and capability to interface with OS Amoeba at Centrum Wiskunde and Informatica. It first appeared in 1991. Python 2.0 was released in the year 2000 and Python 3.0 was released in the year 2008. The language was named Python after the famous British television comedy show Monty Python's Flying Circus, which was one of the favorite television programmes of Guido. Here, we will see why Python has suddenly influenced our lives, various applications that use Python, and Python's implementations. In this article, you will be learning the basic installation steps required to perform on different platforms (that is Windows, Linux and Mac), about environment variables, setting up environment variables, file formats, Python interactive shell, basic syntaxes, and, finally, printing out the formatted output. (For more resources related to this topic, see here.) Why Python? Now you might be suddenly bogged with the question, why Python? According to the Institute of Electrical and Electronics Engineers (IEEE) 2016 ranking, Python ranked third after C and Java. As per Indeed.com's data of 2016, Python job market search ranked fifth. Clearly, all the data points to the ever-rising demand in the job market for Python. It's a cool language if you want to learn it just for fun. Also, you will adore the language if you want to build your career around Python. At the school level, many schools have started including Python programming for kids. With new technologies taking the market by surprise, Python has been playing a dominant role. Whether it's cloud platform, mobile app development, BigData, IoT with Raspberry Pi, or the new Blockchain technology, Python is being seen as a niche language platform to develop and deliver scalable and robust applications. Some key features of the language are: Python programs can run on any platform, you can carry code created in a Windows machine and run it on Mac or Linux Python has a large inbuilt library with prebuilt and portable functionality, known as the standard library Python is an expressive language Python is free and open source Python code is about one third of the size of equivalent C++ and Java code. Python can be both dynamically and strongly typed In dynamically typed, the type of a variable is interpreted at runtime, which means that there is no need to define the type (int, float) of a variable in Python Python applications One of the most famous platform where Python is extensively used is YouTube. Other places where you will find Python being extensively used are special effects in Hollywood movies, drug evolution and discovery, traffic control systems, ERP systems, cloud hosting, e-commerce platform, CRM systems, and whichever field you can think of. Versions At the time of writing this book, the two main versions of the Python programming language available in the market were Python 2.x and Python 3.x. The stable releases at the time of writing this book were Python 2.7.13 and Python 3.6.0. Implementations of Python Major implementations include CPython, Jython, IronPython, MicroPython and PyPy. 
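Before we move on to installation, here is a tiny throwaway snippet illustrating the dynamic typing mentioned above: we never declare a type, and the same name can refer to values of different types at different times:

# No type declarations: the interpreter determines the type at runtime
value = 42
print(type(value))    # <type 'int'> on Python 2, <class 'int'> on Python 3

value = "forty two"   # the same name now refers to a string
print(type(value))    # <type 'str'> on Python 2, <class 'str'> on Python 3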
Installation

Here, we will walk through the installation of Python on three different OS platforms, namely Windows, Linux, and Mac OS. Let's begin with the Windows platform.

Installation on Windows platform

Python 2.x can be downloaded from https://www.python.org/downloads. The installer is simple and easy to install. Follow these steps to install the setup:

1. Once you click on the setup installer, you will get a small window on your desktop screen as shown. Click on Next.
2. Provide a suitable installation folder to install Python. If you don't provide an installation folder, the installer will automatically create one for you, as shown in the screenshot. Click on Next.
3. After the completion of Step 2, you will get a window to customize Python, as shown in the following screenshot. Note that the Add python.exe to Path option has been marked. Select this option to add Python to the system path variable. Click on Next.
4. Finally, click Finish to complete the installation.

Summary

So far, we did a walkthrough of the beginning and brief history of Python. We looked at the various implementations and flavors of Python. You also learned about installing Python on Windows OS. Hope this article has incited enough interest in Python and serves as your first step in the kingdom of Python, with enormous possibilities!

Resources for Article: Further resources on this subject: Layout Management for Python GUI [article] Putting the Fun in Functional Python [article] Basics of Jupyter Notebook and Python [article]


Understanding the Basics of Gulp

Packt
19 Jun 2017
15 min read
In this article written by Travis Maynard, author of the book Getting Started with Gulp - Second Edition, we will take a look at the basics of gulp and how it works. Understanding some of the basic principles and philosophies behind the tool, it's plugin system will assist you as you begin writing your own gulpfiles. We'll start by taking a look at the engine behind gulp and then follow up by breaking down the inner workings of gulp itself. By the end of this article, you will be prepared to begin writing your own gulpfile. (For more resources related to this topic, see here.) Installing node.js and npm As you learned in the introduction, node.js and npm are the engines that work behind the scenes that allow us to operate gulp and keep track of any plugins we decide to use. Downloading and installing node.js For Mac and Windows, the installation is quite simple. All you need to do is navigate over to http://nodejs.org and click on the big green install button. Once the installer has finished downloading, run the application and it will install both node.js and npm. For Linux, there are a couple more steps, but don't worry; with your newly acquired command-line skills, it should be relatively simple. To install node.js and npm on Linux, you'll need to run the following three commands in Terminal: sudo add-apt-repository ppa:chris-lea/node.js sudo apt-get update sudo apt-get install nodejs The details of these commands are outside the scope of this book, but just for reference, they add a repository to the list of available packages, update the total list of packages, and then install the application from the repository we added. Verify the installation To confirm that our installation was successful, try the following command in your command line: node -v If node.js is successfully installed, node -v will output a version number on the next line of your command line. Now, let's do the same with npm: npm -v Like before, if your installation was successful, npm -v should output the version number of npm on the next line. The versions displayed in this screenshot reflect the latest Long Term Support (LTS) release currently available as of this writing. This may differ from the version that you have installed depending on when you're reading this. It's always suggested that you use the latest LTS release when possible. The -v  command is a common flag used by most command-line applications to quickly display their version number. This is very useful to debug version issues while using command-line applications. Creating a package.json file Having npm in our workflow will make installing packages incredibly easy; however, we should look ahead and establish a way to keep track of all the packages (or dependencies) that we use in our projects. Keeping track of dependencies is very important to keep your workflow consistent across development environments. Node.js uses a file named package.json to store information about your project, and npm uses this same file to manage all of the package dependencies your project requires to run properly. In any project using gulp, it is always a great practice to create this file ahead of time so that you can easily populate your dependency list as you are installing packages or plugins. 
To create the package.json file, we will need to run npm's built in init action using the following command: npm init Now, using the preceding command, the terminal will show the following output: Your command line will prompt you several times asking for basic information about the project, such as the project name, author, and the version number. You can accept the defaults for these fields by simply pressing the Enter key at each prompt. Most of this information is used primarily on the npm website if a developer decides to publish a node.js package. For our purposes, we will just use it to initialize the file so that we can properly add our dependencies as we move forward. The screenshot for the preceding command is as follows: Installing gulp With npm installed and our package.json file created, we are now ready to begin installing node.js packages. The first and most important package we will install is none other than gulp itself. Locating gulp Locating and gathering information about node.js packages is very simple, thanks to the npm registry. The npm registry is a companion website that keeps track of all the published node.js modules, including gulp and gulp plugins. You can find this registry at http://npmjs.org. Take a moment to visit the npm registry and do a quick search for gulp. The listing page for each node.js module will give you detailed information on each project, including the author, version number, and dependencies. Additionally, it also features a small snippet of command-line code that you can use to install the package along with readme information that will outline basic usage of the package and other useful information. Installing gulp locally Before we install gulp, make sure you are in your project's root directory, gulp-book, using the cd and ls commands you learned earlier. If you ever need to brush up on any of the standard commands, feel free to take a moment to step back and review as we progress through the book. To install packages with npm, we will follow a similar pattern to the ones we've used previously. Since we will be covering both versions 3.x and 4.x in this book, we'll demonstrate installing both: For installing gulp 3.x, you can use the following: npm install --save-dev gulp For installing gulp 4.x, you can use the following: npm install --save-dev gulpjs/gulp#4.0 This command is quite different from the 3.x command because this command is installing the latest development release directly from GitHub. Since the 4.x version is still being actively developed, this is the only way to install it at the time of writing this book. Once released, you will be able to run the previous command to without installing from GitHub. Upon executing the command, it will result in output similar to the following: To break this down, let's examine each piece of this command to better understand how npm works: npm: This is the application we are running install: This is the action that we want the program to run. In this case, we are instructing npm to install something in our local folder --save-dev: This is a flag that tells npm to add this module to the dev dependencies list in our package.json file gulp: This is the package we would like to install Additionally, npm has a –-save flag that saves the module to the list of dependencies instead of devDependencies. These dependency lists are used to separate the modules that a project depends on to run, and the modules a project depends on when in development. 
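To see what the --save-dev flag actually records, it helps to peek inside package.json after the install finishes. The snippet below is only an illustration; the version number will be whatever npm resolves for you at install time:

{
  "name": "gulp-book",
  "version": "1.0.0",
  "devDependencies": {
    "gulp": "^3.9.1"
  }
}

Had we used --save instead, the same entry would have been written to a dependencies section, marking the package as something the project needs at run time rather than only during development.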
Since we are using gulp to assist us in development, we will always use the --save-dev flag throughout the book. So, this command will use npm to contact the npm registry, and it will install gulp to our local gulp-book directory. After using this command, you will note that a new folder has been created that is named node_modules. It is where node.js and npm store all of the installed packages and dependencies of your project. Take a look at the following screenshot: Installing gulp-cli globally For many of the packages that we install, this will be all that is needed. With gulp, we must install a companion module gulp-cli globally so that we can use the gulp command from anywhere in our filesystem. To install gulp-cli globally, use the following command: npm install -g gulp-cli In this command, not much has changed compared to the original command where we installed the gulp package locally. We've only added a -g flag to the command, which instructs npm to install the package globally. On Windows, your console window should be opened under an administrator account in order to install an npm package globally. At first, this can be a little confusing, and for many packages it won't apply. Similar build systems actually separate these usages into two different packages that must be installed separately; once that is installed globally for command-line use and another installed locally in your project. Gulp was created so that both of these usages could be combined into a single package, and, based on where you install it, it could operate in different ways. Anatomy of a gulpfile Before we can begin writing tasks, we should take a look at the anatomy and structure of a gulpfile. Examining the code of a gulpfile will allow us to better understand what is happening as we run our tasks. Gulp started with four main methods:.task(), .src(), .watch(), and .dest(). The release of version 4.x introduced additional methods such as: .series() and .parallel(). In addition to the gulp API methods, each task will also make use of the node.js .pipe() method. This small list of methods is all that is needed to understand how to begin writing basic tasks. They each represent a specific purpose and will act as the building blocks of our gulpfile. The task() method The .task() method is the basic wrapper for which we create our tasks. Its syntax is .task(string, function). It takes two arguments—string value representing the name of the task and a function that will contain the code you wish to execute upon running that task. The src() method The .src() method is our input or how we gain access to the source files that we plan on modifying. It accepts either a single glob string or an array of glob strings as an argument. Globs are a pattern that we can use to make our paths more dynamic. When using globs, we can match an entire set of files with a single string using wildcard characters as opposed to listing them all separately. The syntax is for this method is .src(string || array).  The watch() method The .watch() method is used to specifically look for changes in our files. This will allow us to keep gulp running as we code so that we don't need to rerun gulp any time we need to process our tasks. This syntax is different between the 3.x and 4.x version. For version 3.x the syntax is—.watch(string || array, array) with the first argument being our paths/globs to watch and the second argument being the array of task names that need to be run when those files change. 
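For instance, with gulp 3.x, a watch call that re-runs our scripts task whenever a JavaScript file changes might look like the following sketch (the glob and task name simply mirror the examples used later in this article):

// gulp 3.x style: the second argument is an array of task names to run on change
gulp.watch('js/*.js', ['scripts']);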
For version 4.x the syntax has changed a bit to allow for two new methods that provide more explicit control of the order in which tasks are executed. When using 4.x instead of passing in an array as the second argument, we will use either the .series() or .parallel() method like so—.watch(string || array, gulp.series() || gulp.parallel()). The dest() method The dest() method is used to set the output destination of your processed file. Most often, this will be used to output our data into a build or dist folder that will be either shared as a library or accessed by your application. The syntax for this method is .dest(string). The pipe() method The .pipe() method will allow us to pipe together smaller single-purpose plugins or applications into a pipechain. This is what gives us full control of the order in which we would need to process our files. The syntax for this method is .pipe(function). The parallel() and series() methods The parallel and series methods were added in version 4.x as a way to easily control whether your tasks are run together all at once or in a sequence one after the other. This is important if one of your tasks requires that other tasks complete before it can be ran successfully. When using these methods the arguments will be the string names of your tasks separated by a comma. The syntax for these methods is—.series(tasks) and .parallel(tasks); Understanding these methods will take you far, as these are the core elements of building your tasks. Next, we will need to put these methods together and explain how they all interact with one another to create a gulp task. Including modules/plugins When writing a gulpfile, you will always start by including the modules or plugins you are going to use in your tasks. These can be both gulp plugins or node.js modules, based on what your needs are. Gulp plugins are small node.js applications built for use inside of gulp to provide a single-purpose action that can be chained together to create complex operations for your data. Node.js modules serve a broader purpose and can be used with gulp or independently. Next, we can open our gulpfile.js file and add the following code: // Load Node Modules/Plugins var gulp = require('gulp'); var concat = require('gulp-concat'); var uglify = require('gulp-uglify'); The gulpfile.js file will look as shown in the following screenshot: In this code, we have included gulp and two gulp plugins: gulp-concat and gulp-uglify. As you can now see, including a plugin into your gulpfile is quite easy. After we install each module or plugin using npm, you simply use node.js' require() function and pass it in the name of the module. You then assign it to a new variable so that you can use it throughout your gulpfile. This is node.js' way of handling modularity, and because a gulpfile is essentially a small node.js application, it adopts this practice as well. Writing a task All tasks in gulp share a common structure. Having reviewed the five methods at the beginning of this section, you will already be familiar with most of it. Some tasks might end up being larger than others, but they still follow the same pattern. To better illustrate how they work, let's examine a bare skeleton of a task. This skeleton is the basic bone structure of each task we will be creating. Studying this structure will make it incredibly simple to understand how parts of gulp work together to create a task. 
A sample task looks as follows:

gulp.task(name, function() {
    return gulp.src(path)
        .pipe(plugin)
        .pipe(plugin)
        .pipe(gulp.dest(path));
});

In the first line, we use the new gulp variable that we created a moment ago and access the .task() method. This creates a new task in our gulpfile. As you learned earlier, the task method accepts two arguments: a task name as a string and a callback function that will contain the actions we wish to run when this task is executed. Inside the callback function, we reference the gulp variable once more and then use the .src() method to provide the input to our task. As you learned earlier, the source method accepts a path or an array of paths to the files that we wish to process. Next, we have a series of three .pipe() methods. In each of these pipe methods, we specify which plugin we would like to use. This grouping of pipes is what we call our pipechain. The data that we have provided gulp with in our source method will flow through our pipechain, to be modified by each piped plugin that it passes through. The order of the pipe methods is entirely up to you. This gives you a great deal of control over how and when your data is modified. You may have noticed that the final pipe is a bit different. At the end of our pipechain, we have to tell gulp to move our modified file somewhere. This is where the .dest() method comes into play. As we mentioned earlier, the destination method accepts a path that sets the destination of the processed file as it reaches the end of our pipechain. If .src() is our input, then .dest() is our output.

Reflection

To wrap up, take a moment to look at a finished gulpfile and reflect on the information that we just covered. This is the completed gulpfile that we will be creating from scratch, so don't worry if you still feel lost. This is just an opportunity to recognize the patterns and syntaxes that we have been studying so far. We will begin creating this file step by step. Notice that, because this gulpfile targets gulp 4.x, the watch task wraps each task name in gulp.series(), as described earlier:

// Load Node Modules/Plugins
var gulp = require('gulp');
var concat = require('gulp-concat');
var uglify = require('gulp-uglify');

// Process Styles
gulp.task('styles', function() {
    return gulp.src('css/*.css')
        .pipe(concat('all.css'))
        .pipe(gulp.dest('dist/'));
});

// Process Scripts
gulp.task('scripts', function() {
    return gulp.src('js/*.js')
        .pipe(concat('all.js'))
        .pipe(uglify())
        .pipe(gulp.dest('dist/'));
});

// Watch Files For Changes
gulp.task('watch', function() {
    gulp.watch('css/*.css', gulp.series('styles'));
    gulp.watch('js/*.js', gulp.series('scripts'));
});

// Default Task
gulp.task('default', gulp.parallel('styles', 'scripts', 'watch'));

The gulpfile.js file will look as follows:

Summary

In this article, you installed node.js and learned the basics of how to use npm, and understood how and why to install gulp both locally and globally. We also covered some of the core differences between the 3.x and 4.x versions of gulp and how they will affect your gulpfiles as we move forward. To wrap up the article, we took a small glimpse into the anatomy of a gulpfile to prepare us for writing our own gulpfiles from scratch.

Resources for Article:

Further resources on this subject:

Performing Task with Gulp [article]
Making a Web Server in Node.js [article]
Developing Node.js Web Applications [article]

Overview of Important Concepts of Microsoft Dynamics NAV 2016

Packt
19 Jun 2017
15 min read
In this article by Rabindra Sah, author of the book Mastering Microsoft Dynamics NAV 2016, we will cover the important concepts of Microsoft Dynamics NAV 2016. (For more resources related to this topic, see here.)

Version control

Version control systems are third-party systems that track changes to the files and folders of a system. In this section, we will be discussing two popular version control systems for Microsoft Dynamics NAV. Let's take a look at the web services architecture in Dynamics NAV. Microsoft Dynamics NAV 2016 uses two types of web services, SOAP web services and OData web services:

Object type    OData    SOAP
Page           Yes      Yes
Query          Yes      No
Codeunit       No       Yes

The main difference between the two in Microsoft Dynamics NAV is that with SOAP web services, you can publish and reuse business logic, and with OData web services, you can exchange data with external applications. In a dataset, you can make certain changes in order to improve the performance of the report. This is not applicable to all reports; it depends on the nature of the problem, and you should spend time analyzing the problem before fixing it. The following are some of the basic considerations that you should keep in mind when dealing with datasets in NAV reports:

Try to avoid the creation of dataset lines if possible; create variables instead
Try to reduce the number of rows and columns
Apply filters to the request page
For slow reports with a long runtime, use the job queue to generate the report on the server
Use text constants for the caption, if needed
Avoid Include Caption for columns that have no need for captions

Technical upgrade

A technical upgrade is the least used upgrade process, applied when you are making one version upgrade at a time, that is, from Version 2009 to Version 2013 or from Version 2013 to Version 2015. So, when you are planning to jump multiple versions at the same time, a technical upgrade might not be the best option to choose. It can be efficient when there are minimal changes in the source database objects, that is, fewer customizations. It can also be considered an efficient choice when the business requirement from the product is still the same or has only minor changes.

Upgrading estimates

In this section, we are going to look at the core components that are responsible for the estimates of the upgrade process. The components to be considered while estimating the upgrade process are as follows:

Code upgrade
Object transformation
Data upgrade
Testing and implementation

Code upgrade

The best method to estimate the code upgrade is to use a file compare tool. It helps us with file and folder comparison, version control, conflicts and resolution, automatic intelligent merging, in-place editing of files, tracking changes, and code analysis. You can also design your own compare tool if you want: take two versions of the same object, for example, two versions of the Customer table, open them in Notepad, check line by line whether there is any difference, and then log the lines that differ. You can achieve this via C# or any programming language. This should run for each object in the two versions of the NAV system and provide you with statistics on the amount of change. This can be really handy when it comes to estimating the code changes. You can also do it manually if the number of objects is small.
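As a rough illustration of that idea, here is a minimal Node.js sketch that compares two exported versions of the same object text file line by line and logs the differences. The file names are placeholders, and a real tool would loop over every exported object and use a proper diff algorithm rather than a purely positional comparison:

// compare-objects.js - log lines that differ between two exports of the same object
var fs = require('fs');

var oldLines = fs.readFileSync('Customer_2013.txt', 'utf8').split('\n');
var newLines = fs.readFileSync('Customer_2016.txt', 'utf8').split('\n');

var changed = 0;
var total = Math.max(oldLines.length, newLines.length);

for (var i = 0; i < total; i++) {
  if (oldLines[i] !== newLines[i]) {
    changed++;
    console.log('Line ' + (i + 1) + ' differs');
  }
}

console.log(changed + ' of ' + total + ' lines differ');

Running something like this over every object gives a quick, if approximate, count of how much code has changed between versions, which feeds directly into the estimate.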
It is recommended to use the Microsoft Dynamics Sure Step methodology while carrying out any upgrade project. Dynamics Sure Step is a full life cycle methodology designed to provide discipline and best practices for upgrading, migrating, configuring, and deploying a Microsoft Dynamics NAV solution.

Object transformation

We must take a close look at objects that are not directly upgraded. For example, if your source database reports are in the classic version or an early RTC version, it might not be feasible to transform them into the latest reports because of the huge technological gap between them. In these cases, you must be very careful while estimating these upgrades. For example, TransHeader and TransFooter and other groupings that are present in classic reports are hard to map directly onto Dynamics NAV 2016 reports. We might have to develop our own logic to achieve these grouping values, which might take some additional time. So, always treat this section as a customization instead of an upgrade. Mostly, Microsoft partner vendors keep this section separate and, in most cases, separate resources are assigned to it in parallel work environments. Reports can also have Word layouts, which should also be considered during the estimates.

Data upgrade

We perform a number of distinct steps while upgrading data. You must consider the time for each section in the data upgrade process in order to correctly estimate the time for it. The first thing that we do is a trial data upgrade. This allows us to analyze different aspects: how long it takes, whether the data upgrade process works or not, and the results of the trial upgrade itself, which we can test. We might need to repeat the trial a number of times before it works. Then, we can do a preproduction data upgrade, because since the moment we started our analysis and development, the production data might have changed; a preproduction run also gives us a closer estimate of the time window that will be available when we do the real implementation. Acceptance testing is also a very important phase. Once you have done the data upgrade, you need the end users or key users to confirm that the data has been converted correctly. Then you are ready to perform the live data upgrade. All of these phases in the data upgrade require time, and the amount of time will also depend on the size of the database and the version that you are starting from. This gives you an overview of the different pillars that are important for estimating how much time it might take to prepare and analyze the upgrade project.

Software as a Service

Software as a Service is a cloud services delivery model, which offers an on-demand online software subscription. The latest SaaS release from Microsoft is Dynamics 365 (previously known as Project Madeira). The following image illustrates the SaaS taxonomy.
Here you can clearly see different services, such as Salesforce, NetSuite, and QuickBooks, which are distributed as SaaS:

Software as a Service taxonomy

Understanding the PowerShell cmdlets

We can categorize the PowerShell commands into five major categories of use:

Commands for server administrators
Commands for implementers for company management
Commands for administrators for upgrades
Commands for administrators for security
Commands for developers

Commands for server administrators

The first category contains commands that can be used for administrative operations such as create, save, remove, get, import, export, and set, as given in the following list:

Dismount-NAVTenant
Export-NAVApplication
Export-NAVServerLicenseInformation
Get-NAVApplication
Get-NAVServerConfiguration
Get-NAVServerInstance
Get-NAVServerSession
Get-NAVTenant
Get-NAVWebServerInstance
Import-NAVServerLicense
Mount-NAVApplication
Mount-NAVTenant
New-NAVServerConfiguration
New-NAVServerInstance
New-NAVWebServerInstance
Remove-NAVApplication
Remove-NAVServerInstance
Remove-NAVServerSession
Save-NAVTenantConfiguration
Set-NAVServerConfiguration
Set-NAVServerInstance
Set-NAVWebServerInstanceConfiguration
Sync-NAVTenant

We can set up web server instances, change configurations, and create a multitenant environment; a multitenant environment can only be created using PowerShell.

Commands for implementers for company management

The second category of commands can be used by implementers, in particular for operations related to installation and configuration of the system. The following are a few examples of this category of commands:

Copy-NAVCompany
Get-NAVCompany
New-NAVCompany
Remove-NAVCompany
Rename-NAVCompany

Commands for administrators for upgrades

The third category is a special category for administrators, related to upgrade operations:

Get-NAVDataUpgrade
Resume-NAVDataUpgrade
Start-NAVDataUpgrade
Stop-NAVDataUpgrade

This category of commands can be useful along with the upgrade toolkit.

Commands for administrators for security

This is one of the most important categories, related to the backend of the system. The commands in this category grant accessibility and permissions to administrators. I strongly recommend these make-life-easy commands if you are working on security operations. Commands in this category include the following:

Get-NAVServerUser
Get-NAVServerUserPermissionSet
New-NAVServerPermission
New-NAVServerPermissionSet
New-NAVServerUser
New-NAVServerUserPermissionSet
Remove-NAVServerPermission
Remove-NAVServerPermissionSet
Remove-NAVServerUser
Remove-NAVServerUserPermissionSet
Set-NAVServerPermission
Set-NAVServerPermissionSet
Set-NAVServerUser

These commands are used mainly to add users and to manage permission sets.

Commands for developers

The last, but not the least, treasure of commands is dedicated to developers, and contains some of my most-used commands. It covers a wide range of commands and should be part of your daily work life.
This set of commands includes the following:

Compare-NAVApplicationObject
Export-NAVApplicationObjectLanguage
Get-NAVApplicationObjectProperty
Get-NAVWebService
Import-NAVApplicationObjectLanguage
Invoke-NAVCodeunit
Join-NAVApplicationObjectFile
Join-NAVApplicationObjectLanguageFile
Merge-NAVApplicationObject
New-NAVWebService
Remove-NAVApplicationObjectLanguage
Remove-NAVWebService
Set-NAVApplicationObjectProperty
Split-NAVApplicationObjectFile
Split-NAVApplicationObjectLanguageFile
Test-NAVApplicationObjectLanguage
Update-NAVApplicationObject

Microsoft Dynamics NAV 2016 Posting preview

In Microsoft Dynamics NAV 2016, you can review the entries to be created before you post a document or journal. This is made possible by the introduction of a new feature called Preview Posting, which enables you to preview the impact of a posting against each ledger associated with a document. In every document and journal that can be posted, you can click on Preview Posting to review the different types of entries that will be created when you post the document or journal.

Workflow

Workflow enables you to model modern business processes in line with best practices or industry-standard practice, for example, ensuring that a customer credit limit has been independently verified and that the requirement of two approvers for a payment process has been met. Workflow has these three main capabilities:

Approvals
Notifications
Automation

Workflow basically has three components, that is, Event, Condition, and Response. Event defines the event in the system that triggers the workflow step, On Condition specifies the condition that the event must satisfy, and the Then Response is the action that is taken when that condition is met. This is shown in the following screenshot:

Exception handling

Exception handling is a new concept in Microsoft Dynamics NAV. It was imported from .NET, and is now gaining popularity among C/AL programmers because of its effectiveness. Like C#, for exception handling, we use the Try function. The Try functions are new additions to the function library that enable you to handle errors occurring in the application at runtime. Here we are not dealing with compile-time issues. For example, the message Error Returned: Divisible by Zero Error is always a critical error, and should be handled so that it can be avoided. This also stops the system from entering an unsafe state. Like C# and other rich programming languages, the Try functions in C/AL provide easy-to-understand error messages, which can also be dynamic and directly generated by the system. This feature helps us preplan those errors and present better, more descriptive errors to the users. You can use the Try functions to catch errors/exceptions that are thrown by Microsoft Dynamics NAV or exceptions that are thrown during .NET Framework interoperability operations. The Try function is in many ways similar to the conditional Codeunit.Run function, except for the following points:

The database records that are changed because of the Try function cannot be rolled back
The Try function calls do not need to be committed to the database

Visual Basic programming

Visual Basic (VB) is an event-driven programming language. It is also an Integrated Development Environment (IDE). If you are familiar with the BASIC programming language, then it will be easy to understand Visual Basic, since it is derived from BASIC.
I will try to provide the basics about this language here, since it is the least discussed topic in the NAV community, but it is essential for all report designers and developers to understand. We do not need to understand each and every detail of the VB programming language, but understanding the syntax and structure will help us understand the code that we are going to use in the RDLC report. An example of VB code can be written as follows:

Public Function BlankZero(ByVal Value As Decimal)
    If Value = 0 Then
        Return ""
    End If
    Return Value
End Function

The preceding function, BlankZero, returns an empty string when the parameter is zero and otherwise returns the value unchanged. It is one of the simplest functions that can be found in the code section of an RDLC report. Unlike C/AL, we do not need to end each code line with a semicolon (;).

Writing your own Test unit

Writing your own Test unit is very important, not just to test your code but also to give you an eagle's-eye view of how your code is actually interacting with the system. It gives your coding a meaning, and allows others to understand and relate to your development. Writing a unit test basically involves four steps, as shown in the following diagram:

Certificates

A certificate is nothing but a token that binds an identity to a cryptographic key. Microsoft Management Console (MMC) is a presentation service for management applications in the Microsoft Windows environment. It is a part of the independent software vendor (ISV) extensible service, that is, it provides a common integrated environment for snap-ins provided by Microsoft and third-party software vendors.

Certificate authority

A certification authority (CA) is an entity that issues certificates. If all certificates have a common issuer, then the issuer's public key can be distributed out of band:

In the preceding diagram, the certificate server is the third party, which has a secure relationship with both the parties that want to communicate. The CA is connected to both parties through a secure channel. User B sends a copy of his public key to the CA. Then the CA encrypts the public key of User B using a different key. Two files are created as a result: the first is an encrypted package, which is nothing but the certificate, and the second is the digital signature of the certificate server. The certificate server returns the certificate to User B. Now User A asks User B for its certificate. User B sends a copy of its certificate, containing its public key, to User A. This is again done using a secure communication channel. User A decrypts the certificate using the key obtained from the certificate server, and extracts the public key of User B. User A also checks the digital signature of the certificate server to ensure that the certificate is authentic. Here, whatever data is encrypted using the public key of User B can only be decrypted using the private key of User B, which is only present with User B and not with any intruder on the Internet. So only User B can decrypt and read the content sent by User A. Once the keys are transferred, User A can communicate with User B. In case User B wants to send data to User A, User B would need the public key of User A, which will again be granted by the CA. Certificates are issued to a principal. The issuance policy specifies the principals to which the CA will issue certificates. The certification authority does not need to be online to check the validity of the certificate. It can sit on a server in a locked room. It is only consulted when a principal needs a certificate.
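The signature check at the heart of that exchange can be illustrated with a small Node.js sketch, using Node's built-in crypto module on a reasonably recent Node.js version. This is only a toy example, not anything NAV-specific: the "CA" signs some certificate data with its private key, and anyone holding the CA's public key can verify that signature.

// toy-ca-signature.js - illustrate signing and verifying with a key pair
var crypto = require('crypto');

// Stand-in for the CA's key pair
var keys = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });

// Stand-in for the certificate contents (an identity bound to a public key)
var certificateData = Buffer.from('CN=User B; public key: ...');

// The CA signs the certificate data with its private key
var signature = crypto.sign('sha256', certificateData, keys.privateKey);

// User A verifies the signature with the CA's public key
var authentic = crypto.verify('sha256', certificateData, keys.publicKey, signature);
console.log('Certificate signature valid:', authentic);

If any byte of the certificate data changes after signing, the verification returns false, which is exactly how User A detects a tampered certificate.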
Certificates are a way of associating an identity with a public key and distinguished name. Authentication policy for CA The authentication policy defines the way principals prove their identities. Each CA has its own requirements constrained by contractual requirements such as with Primary Certification Authority (PCA): PCA issues certificates to CA CA issues certificates to individuals and organizations All rely on non-electronic proofs of identity, such as biometrics (fingerprints), documents (drivers license or passport), or personal knowledge. A specific authentication policy can be determined by checking the policy of the CA that signed the certificate. Kinds of certificates There are at least four kinds of certificates, which are as follows: Site certificates (for example, www.msdn.microsoft.com). Personal certificates (used if the server wants to authenticate the client). You can install a personal certificate in your browser. Software vendor certificates (used when software is installed). Often, when you run a program, a dialog box appears warning that The publisher could not be verified. Are you sure you want to run this software? This is caused either because the software does not have a software vendor certificate, or because you do not trust the CA who signed the software vendor certificate. Anonymous certificates, (used by a whistle blower to indicate that the same person sent a sequence of messages, but doesn't know who that person is). Other types of certificates Certificates can also be based on a principal's association with an organization (such as Microsoft (MSDN)), where the principal lives, or the role played in an organization (such as the comptroller). Summary In this article we covered the important concepts in Dynamics Nav 2016 such as version control, dataset considerations, technical upgrade, Software as a Service, certificates, and so on. Resources for Article: Further resources on this subject: Introduction to Microsoft Dynamics NAV 2016 [article] Customization in Microsoft Dynamics CRM [article] Exploring Microsoft Dynamics NAV – An Introduction [article]

Introduction to Cyber Extortion

Packt
19 Jun 2017
21 min read
In this article, Dhanya Thakkar, the author of the book Preventing Digital Extortion, explains how we often make the mistake of relying on the past for predicting the future, and nowhere is this more relevant than in the sphere of the Internet and smart technology. People, processes, data, and things are tightly and increasingly connected, creating new, intelligent networks unlike anything else we have seen before. The growth is exponential and the consequences are far reaching for individuals, and progressively so for businesses. We are creating the Internet of Things and the Internet of Everything. (For more resources related to this topic, see here.)

It has become unimaginable to run a business without using the Internet. It is not only an essential tool for current products and services, but an unfathomable well for innovation and fresh commercial breakthroughs. The transformative revolution is spilling into the public sector, affecting companies as vanguards and diffusing to consumers, who are in a feedback loop with suppliers, constantly obtaining and demanding new goods. Advanced technologies that apply not only to machine-to-machine communication but also to smart sensors generate complex networks to which theoretically anything that can carry a sensor can be connected. Cloud computing and cloud-based applications provide immense yet affordable storage capacity for people and organizations and facilitate the spread of data in more ways than one. Given the Internet's nature, the physical boundaries of business become blurred, and virtual data protection must incorporate a new characteristic of security: encryption. In the middle of the storm of the IoT, major opportunities arise, and equally so, unprecedented risks lurk. People often think that what they put on the Internet is protected and closed information. It is hardly so. Sending an e-mail is not like sending a letter in a closed envelope. It is more like sending a postcard, where anyone who gets their hands on it can read what's written on it. Along with people who want to utilize the Internet as an open business platform, there are people who want to find ways of circumventing legal practices and misusing the wealth of data on computer networks by unlawfully gaining financial profits, assets, or authority that can be monetized. Being connected is now critical. As cyberspace is growing, so are attempts to violate vulnerable information, which are gaining global scale. This newly discovered business dynamic is under persistent threat from criminals. Cyberspace, cybercrime, and cybersecurity are perceptibly being found in the same sentence.

Cybercrime – underdefined and underregulated

A massive problem encouraging the perseverance and evolution of cybercrime is the lack of an adequate unanimous definition and the underregulation on a national, regional, and global level. Nothing is criminal unless stipulated by the law. Global law enforcement agencies, academia, and state policies have studied the constant development of the phenomenon since its first appearance in 1989, in the shape of the AIDS Trojan virus transferred from an infected floppy disk. Regardless of the bizarre beginnings, there is nothing entertaining about cybercrime. It is serious. It is dangerous. Significant efforts are made to define cybercrime on a conceptual level in academic research and in national and regional cybersecurity strategies. Still, as the nature of the phenomenon evolves, so must the definition.
Research reports are still at a descriptive level, and underreporting is a major issue. On the other hand, businesses are more exposed due to ignorance of the fact that modern-day criminals increasingly rely on the Internet to enhance their criminal operations. Case in point: Aaushi Shah and Srinidhi Ravi from the Asian School of Cyber Laws have created a cybercrime list by compiling a set of 74 distinctive and creatively named actions emerging in the last three decades that can be interpreted as cybercrime. These actions target anything from e-mails to smartphones, personal computers, and business intranets: piggybacking, joe jobs, and easter eggs may sound like cartoons, but their true nature resembles a crime thriller.

The concept of cybercrime

Cyberspace is a giant community made out of connected computer users and data on a global level. As a concept, cybercrime involves any criminal act dealing with computers and networks, including traditional crimes in which the illegal activities are committed through the use of a computer and the Internet. As businesses become more open and widespread, the boundary between data freedom and restriction becomes more porous. Countless e-shopping transactions are made, hospitals keep records of patient histories, students pass exams, and around-the-clock payments are increasingly processed online. It is no wonder that criminals are relentlessly invading cyberspace, trying to find a crack to slip through. There are no recognizable border controls on the Internet, but a business that wants to evade harm needs to understand cybercrime's nature and apply means to restrict access to certain information. Instead of identifying it as a single phenomenon, Majid Jar proposes a common denominator approach for all ICT-related criminal activities. In his book Cybercrime and Society, Jar refers to Thomas and Loader's working concept of cybercrime as follows:

"Computer-mediated activities which are either illegal or considered illicit by certain parties and which can be conducted through global electronic network."

Jar elaborates on the important distinction in this definition by emphasizing the difference between crime and deviance. Criminal activities are explicitly prohibited by formal regulations and bear sanctions, while deviances breach informal social norms. This is a key point to keep in mind. It encompasses the evolving definition of cybercrime, which keeps transforming as resourceful criminals constantly think of new ways to gain illegal advantages. Law enforcement agencies on a global level make an essential distinction between two subcategories of cybercrime:

Advanced cybercrime or high-tech crime
Cyber-enabled crime

The first subcategory, according to Interpol, includes newly emerged sophisticated attacks against computer hardware and software. The second category, on the other hand, contains traditional crimes in modern clothes, for example, crimes against children, such as exposing children to illegal content; financial crimes, such as payment card fraud, money laundering, and counterfeiting currency and security documents; social engineering fraud; and even terrorism. We are far beyond the limited impact of the 1989 cybercrime embryo. Intricate networks are created daily. They present new criminal opportunities, causing greater damage to businesses and individuals, and require a global response.
Cybercrime is conceptualized as a service embracing a commercial component. Cybercriminals work as businessmen who look to sell a product or a service to the highest bidder.

Critical attributes of cybercrime

An abridged version of the cybercrime concept provides answers to three vital questions:

Where are criminal activities committed and what technologies are used?
What is the reason behind the violation?
Who is the perpetrator of the activities?

Where and how – realm

Cybercrime can be an online, digitally committed, traditional offense. Even if the component of an online, digital, or virtual existence were not included in its nature, it would still have been considered crime in the traditional, real-world sense of the word. In this sense, as the nature of cybercrime advances, so must the spearheads of law enforcement rely on laws written for the non-digital world to solve problems encountered online. Otherwise, the combat becomes stagnant and futile.

Why – motivation

The prefix "cyber" sometimes creates additional misperception when applied to the digital world. It is critical to differentiate cybercrime from other malevolent acts in the digital world by considering the reasoning behind the action. This is not only imperative for clarification purposes, but also for extending the definition of cybercrime over time to include previously indeterminate activities. Offenders commit a wide range of dishonest acts for selfish motives such as monetary gain, popularity, or gratification. When the intent behind the behavior is misinterpreted, confusion may arise, and actions that should not have been classified as cybercrime could be charged with criminal prosecution.

Who – the criminal deed component

The action must be attributed to a perpetrator. Depending on the source, certain threats can be confined to the criminal domain only or expanded to endanger potentially larger targets, representing an attack on national security or a terrorist attack. Undoubtedly, the concept of cybercrime needs additional refinement, and a comprehensive global definition is in progress. Along with global cybercrime initiatives, national regulators are continually working on implementing laws, policies, and strategies to exemplify cybercrime behaviors and thus strengthen combating efforts.

Types of common cyber threats

In their endeavors to raise cybercrime awareness, the United Kingdom's National Crime Agency (NCA) divided common and popular cybercrime activities by affiliating them with the target under threat. While both individuals and organizations are targets of cybercriminals, it is the business-consumer networks that suffer irreparable damages due to the magnitude of harmful actions.

Cybercrime targeting consumers

Phishing: Illegitimate e-mails are sent to the receiver to collect security information and personal details
Webcam manager: A gross violation in which criminals take over a person's webcam
File hijacker: Criminals hijack files and hold them "hostage" until the victim pays the demanded ransom
Keylogging: Criminals gain the means to record the text behind the keys you press on your keyboard
Screenshot manager: Enables criminals to take screenshots of an individual's computer screen
Ad clicker: Annoying but dangerous ad clickers direct a victim's computer to click on a specific harmful link

Cybercrime targeting businesses

Hacking

Hacking is basically unauthorized access to computer data.
Hackers inject specialist software with which they try to take administrative control of a computerized network or system. If the attack is successful, the stolen data can be sold on the dark web and compromise people's integrity and safety by intruding on and abusing the privacy of products as well as sensitive personal and business information. Hacking is particularly dangerous when it compromises the operation of systems that manage physical infrastructure, for example, public transportation.

Distributed denial of service (DDoS) attacks

When an online service is targeted by a DDoS attack, the communication links overflow with data from messages sent simultaneously by botnets. Botnets are networks of controlled computers that stop legitimate access to online services for users. The system is unable to provide normal access as it cannot handle the huge volume of incoming traffic.

Cybercrime in relation to overall computer crime

Many moons have passed since 2001, when the first international treaty that targeted Internet and computer crime—the Budapest Convention on Cybercrime—was adopted. The Convention's intention was to harmonize national laws, improve investigative techniques, and increase cooperation among nations. It was drafted with the active participation of the Council of Europe's observer states Canada, Japan, South Africa, and the United States, and drawn up by the Council of Europe in Strasbourg, France. Brazil and Russia, on the other hand, refused to sign the document on the basis of not being involved in the Convention's preparation. In Understanding Cybercrime: A Guide to Developing Countries (Gercke, 2011), Marco Gercke makes an excellent final point:

"Not all computer-related crimes come under the scope of cybercrime. Cybercrime is a narrower notion than all computer-related crime because it has to include a computer network. On the other hand, computer-related crime in general can also affect stand-alone computer systems."

Although progress has been made, consensus over the definition of cybercrime is not final. Given this history, a fluid and evolving approach must be applied to working and legal interpretations. In the end, international noncompliance must be overcome to establish a common and safe ground to tackle persistent threats.

Cybercrime localized – what is the risk in your region?

Europol's heat map for the period between 2014 and 2015 reports on the geographical distribution of cybercrime on the basis of the United Nations geoscheme. The data in the report encompassed cyber-dependent crime and cyber-enabled fraud, but it did not include investigations into online child sexual abuse.

North and South America

Due to its overwhelming presence, it is not a great surprise that the North American region occupies several lead positions concerning cybercrime, both in terms of enabling malicious content and in providing residency to victims who contribute to the global cybercrime numbers. The United States hosted between 20% and nearly 40% of the total world's command-and-control servers during 2014. Additionally, the US currently hosts over 45% of the world's phishing domains and is in the pack of world-leading spam producers. Between 16% and 20% of all global bots are located in the United States, while almost a third of point-of-sale malware and over 40% of all ransomware incidents were detected there.
Twenty EU member states have initiated criminal procedures in which the parties under suspicion were located in the United States. In addition, over 70 percent of the countries located in the Single European Payment Area have been subject to losses from skimmed payment cards because of the distinct way in which the US, under certain circumstances, processes card payments without chip-and-PIN technology. There are instances of cybercrime in South America, but the scope of participation by the southern continent is way smaller than that of its northern neighbor, both in industry reporting and in criminal investigations. Ecuador, Guatemala, Bolivia, Peru, and Brazil are constantly rated high on the malware infection scale, and the situation is not changing, while Argentina and Colombia remain among the top 10 spammer countries. Brazil has a critical role in point-of-sale malware, ATM malware, and skimming devices. Europe The key aspect making Europe a region with excellent cybercrime potential is the fast, modern, and reliable ICT infrastructure. According to The Internet Organised Crime Threat Assessment (IOCTA) 2015, Cybercriminals abuse Western European countries to host malicious content and launch attacks inside and outside the continent. EU countries host approximately 13 percent of the global malicious URLs, out of which Netherlands is the leading country, while Germany, the U.K., and Portugal come second, third, and fourth respectively. Germany, the U.K., the Netherlands, France, and Russia are important hosts for bot C&C infrastructure and phishing domains, while Italy, Germany, the Netherlands, Russia, and Spain are among the top sources of global spam. Scandinavian countries and Finland are famous for having the lowest malware infection rates. France, Germany, Italy, and to some extent the U.K. have the highest malware infection rates and the highest proportion of bots found within the EU. However, the findings are presumably the result of the high population of the aforementioned EU countries. A half of the EU member states identified criminal infrastructure or suspects in the Netherlands, Germany, Russia, or the United Kingdom. One third of the European law enforcement agencies confirmed connections to Austria, Belgium, Bulgaria, the Czech Republic, France, Hungary, Italy, Latvia, Poland, Romania, Spain, or Ukraine. Asia China is the United States' counterpart in Asia in terms of the top position concerning reported threats to Internet security. Fifty percent of the EU member states' investigations on cybercrime include offenders based in China. Moreover, certain authorities quote China as the source of one third of all global network attacks. In the company of India and South Korea, China is third among the top-10 countries hosting botnet C&C infrastructure, and it has one of the highest global malware infection rates. India, Indonesia, Malaysia, Taiwan, and Japan host serious bot numbers, too. Japan takes on a significant part both as a source country and as a victim of cybercrime. Apart from being an abundant spam source, Japan is included in the top three Asian countries where EU law enforcement agencies have identified cybercriminals. On the other hand, Japan, along with South Korea and the Philippines, is the most popular country in the East and Southeast region of Asia where organized crime groups run sextortion campaigns. Vietnam, India, and China are the top Asian countries featuring spamming sources. 
Alternatively, China and Hong Kong are the most prominent locations for hosting phishing domains. From another point of view, the country code top-level domains (ccTLDs) for Thailand and Pakistan are commonly used in phishing attacks. In this region, most SEPA members reported losses from the use of skimmed cards. In fact, five (Indonesia, Philippines, South Korea, Vietnam, and Malaysia) out of the top six countries are from this region. Africa Africa remains renowned for combined and sophisticated cybercrime practices. Data from the Europol heat map report indicates that the African region holds a ransomware-as-a-service presence equivalent to the one of the European black market. Cybercriminals from Africa make profits from the same products. Nigeria is on the list of the top 10 countries compiled by the EU law enforcement agents featuring identified cybercrime perpetrators and related infrastructure. In addition, four out of the top five top-level domains used for phishing are of African origin: .cf, .za, .ga, and .ml. Australia and Oceania Australia has two critical cybercrime claims on a global level: First, the country is present in several top-10 charts in the cybersecurity industry, including bot populations, ransomware detection, and network attack originators. Second, the country-code top-level domain for the Palau Islands in Micronesia is massively used by Chinese attackers as the TLD with the second highest proportion of domains used for phishing. Cybercrime in numbers Experts agree that the past couple of years have seen digital extortion flourishing. In 2015 and 2016, cybercrime reached epic proportions. Although there is agreement about the serious rise of the threat, putting each ransomware aspect into numbers is a complex issue. Underreporting is not an issue only in academic research but also in practical case scenarios. The threat to businesses around the world is growing, because businesses keep it quiet. The scope of extortion is obscured because companies avoid reporting and pay the ransom in order to settle the issue in a conducive way. As far as this goes for corporations, it is even more relevant for public enterprises or organizations that provide a public service of any kind. Government bodies, hospitals, transportation companies, and educational institutions are increasingly targeted with digital extortion. Cybercriminals estimate that these targets are likely to pay in order to protect drops in reputation and to enable uninterrupted execution of public services. When CEOs and CIOs keep their mouths shut, relying on reported cybercrime numbers can be a tricky question. The real picture is not only what is visible in the media or via professional networking, but also what remains hidden and is dealt with discreetly by the security experts. In the second quarter of 2015, Intel Security reported an increase in ransomware attacks by 58%. Just in the first 3 months of 2016, cybercriminals amassed $209 million from digital extortion. By making businesses and authorities pay the relatively small average ransom amount of $10,000 per incident, extortionists turn out to make smart business moves. Companies are not shaken to the core by this amount. Furthermore, they choose to pay and get back to business as usual, thus eliminating further financial damages that may arise due to being out of business and losing customers. Extortionists understand the nature of ransom payment and what it means for businesses and institutions. As sound entrepreneurs, they know their market. 
Instead of setting unreasonable skyrocketing prices that may cause major panic and draw severe law enforcement action, they keep it low profile. In this way, they maintain the dark business in flow, moving from one victim to the next and evading legal measures. A peculiar perspective – Cybercrime in absolute and normalized numbers “To get an accurate picture of the security of cyberspace, cybercrime statistics need to be expressed as a proportion of the growing size of the Internet similar to the routine practice of expressing crime as a proportion of a population, i.e., 15 murders per 1,000 people per year.” This statement by Eric Jardine from the Global Commission on Internet Governance (Jardine, 2015) launched a new perspective of cybercrime statistics, one that accounts for the changing nature and size of cyberspace. The approach assumes that viewing cybercrime findings isolated from the rest of the changes in cyberspace provides a distorted view of reality. The report aimed at normalizing crime statistics and thus avoiding negative, realistic cybercrime scenarios that emerge when drawing conclusions from the limited reliability of absolute numbers. In general, there are three ways in which absolute numbers can be misinterpreted: Absolute numbers can negatively distort the real picture, while normalized numbers show whether the situation is getting better Both numbers can show that things are getting better, but normalized numbers will show that the situation is improving more quickly Both numbers can indicate that things are deteriorating, but normalized numbers will indicate that the situation is deteriorating at a slower rate than absolute numbers Additionally, the GCIG (Global Commission on Internet Governance) report includes some excellent reasoning about the nature of empirical research undertaken in the age of the Internet. While almost everyone and anything is connected to the network and data can be easily collected, most of the information is fragmented across numerous private parties. Normally, this entangles the clarity of the findings of cybercrime presence in the digital world. When data is borrowed from multiple resources and missing slots are modified with hypothetical numbers, the end result can be skewed. Keeping in mind this observation, it is crucial to emphasize that the GCIG report measured the size of cyberspace by accounting for eight key aspects: The number of active mobile broadband subscriptions The number of smartphones sold to end users The number of domains and websites The volume of total data flow The volume of mobile data flow The annual number of Google searches The Internet’s contribution to GDP It has been illustrated several times during this introduction that as cyberspace grows, so does cybercrime. To fight the menace, businesses and individuals enhance security measures and put more money into their security budgets. A recent CIGI-Ipsos (Centre for International Governance Innovation - Ipsos) survey collected data from 23,376 Internet users in 24 countries, including Australia, Brazil, Canada, China, Egypt, France, Germany, Great Britain, Hong Kong, India, Indonesia, Italy, Japan, Kenya, Mexico, Nigeria, Pakistan, Poland, South Africa, South Korea, Sweden, Tunisia, Turkey, and the United States. Survey results showed that 64% of users were more concerned about their online privacy compared to the previous year, whereas 78% were concerned about having their banking credentials hacked. 
Additionally, 77% of users were worried about cyber criminals stealing private images and messages. These perceptions led to behavioral changes: 43% of users started avoiding certain sites and applications, some 39% regularly updated passwords, while about 10% used the Internet less (CIGI-Ipsos, 2014).

GCIC report results are indicative of a heterogeneous cyber security picture. Although many cybersecurity aspects are deteriorating over time, there are some that are staying constant, and a surprising number are actually improving. Jardine compares cyberspace security to trends in crime rates in a specific country, operationalizing cyber attacks via 13 measures presented in the following table, as seen in Table 2 of Summary Statistics for the Security of Cyberspace (E. Jardine, GCIC Report, p. 6):

Measure                          Minimum       Maximum          Mean           Standard Deviation
New Vulnerabilities              4,814         6,787            5,749          781.880
Malicious Web Domains            29,927        74,000           53,317         13,769.99
Zero-day Vulnerabilities         8             24               14.85714       6.336
New Browser Vulnerabilities      232           891              513            240.570
Mobile Vulnerabilities           115           416              217.35         120.85
Botnets                          1,900,000     9,437,536        4,485,843      2,724,254
Web-based Attacks                23,680,646    1,432,660,467    907,597,833    702,817,362
Average per Capita Cost          188           214              202.5          8.893818078
Organizational Cost              5,403,644     7,240,000        6,233,941      753,057
Detection and Escalation Costs   264,280       455,304          372,272        83,331
Response Costs                   1,294,702     1,738,761        1,511,804      152,502.2526
Lost Business Costs              3,010,000     4,592,214        3,827,732      782,084
Victim Notification Costs        497,758       565,020          565,020        30,342

While reading the table results, an essential argument must be kept in mind. Statistics for cybercrime costs are not available worldwide. The author worked with the assumption that data about US costs of cybercrime indicate costs on a global level. For obvious reasons, however, this assumption may not be true, and many countries will have had significantly lower costs than the US. To mitigate the assumption's flaws, the author provides comparative levels of those measures. The organizational cost of data breaches in 2013 in the United States was a little less than six million US dollars, while the average number on the global level, which was drawn from the Ponemon Institute's annual Cost of Data Breach Study (from 2011, 2013, and 2014, via Jardine, p. 7), measured the overall cost of data breaches, including the US ones, as US$2,282,095. The conclusion is that US numbers will distort global cost findings by expanding the real costs and will work against the paper's suggestion, which is that normalized numbers paint a rosier picture than the one provided by absolute numbers.

Summary

In this article, we have covered the birth and concept of cybercrime and the challenges law enforcement, academia, and security professionals face when combating its threatening behavior. We also explored the impact of cybercrime by numbers on varied geographical regions, industries, and devices.

Resources for Article:

Further resources on this subject:

Interactive Crime Map Using Flask [article]
Web Scraping with Python [article]