Big Data Analytics with Hadoop 3


Product type: Book
Published: May 2018
Publisher: Packt
ISBN-13: 9781788628846
Pages: 482
Edition: 1st Edition
Author: Sridhar Alla

Table of Contents (18 chapters)

  • Title Page
  • Copyright and Credits
  • Packt Upsell
  • Contributors
  • Preface
  • Introduction to Hadoop
  • Overview of Big Data Analytics
  • Big Data Processing with MapReduce
  • Scientific Computing and Big Data Analysis with Python and Hadoop
  • Statistical Big Data Computing with R and Hadoop
  • Batch Analytics with Apache Spark
  • Real-Time Analytics with Apache Spark
  • Batch Analytics with Apache Flink
  • Stream Processing with Apache Flink
  • Visualizing Big Data
  • Introduction to Cloud Computing
  • Using Amazon Web Services
  • Index

Chapter 10. Visualizing Big Data

This chapter explores one of the most important activities in big data processing and analysis: creating effective visualizations of data and insights. Most people absorb information presented graphically more readily than information presented as text or raw numbers. During analysis, you constantly need to make sense of the data and decide how to use and interpret it. This is much easier when you can see the data visually than when you have to read it from tables, columns, or text files. Once you have generated insights with any of the approaches covered so far (Python, R, Spark, Flink, Hive, MapReduce, and so on), anyone trying to make sense of those insights will want to understand them in the context of the data. That, too, calls for a pictorial representation.

In a nutshell, the following topics will be covered throughout this chapter:

  • Introduction
  • Tableau
  • Chart types
  • Using Python
  • Using R
  • Data visualization...

Introduction


Data visualization is one of the most valuable ways to make sense of big data and thus make it useful to more people. The best visualization for a dataset depends heavily on the use case. Graphs and charts are visual representations of data. They summarize and present data in a form that most people find easier to comprehend, and they reveal its main features or characteristics. Beyond presenting the numerical findings of a study, they also show the shape and patterns of the data, which are critical for analysis and decision making. Keep the following key considerations in mind when developing data visualizations:

  • What type of graphical representation to use for which type of data
  • How to design a visualization approach that allows interactive features
  • How to search and modify datasets graphically
  • How to differentiate between data and the resultant insights
  • How to develop a visualization...
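The first consideration, matching the type of graphical representation to the type of data, can be sketched as a simple rule of thumb. The following Python function is a hypothetical helper; `suggest_chart` is not part of any library. It looks only at a column's pandas dtype. Time-ordered values suggest a line chart, numeric values suggest a histogram, and categorical or text values suggest a bar chart of counts. Real choices also depend on the question being asked, so treat this as a starting point rather than a rule.

```python
import pandas as pd
from pandas.api.types import (
    is_bool_dtype,
    is_datetime64_any_dtype,
    is_numeric_dtype,
)


def suggest_chart(series: pd.Series) -> str:
    """Suggest a chart type for one column from its dtype (rule of thumb only)."""
    if is_datetime64_any_dtype(series):
        return "line"       # time-ordered values: show the trend over time
    if is_bool_dtype(series):
        return "bar"        # two categories: compare their counts
    if is_numeric_dtype(series):
        return "histogram"  # continuous numbers: show the distribution
    return "bar"            # categories or text: compare counts per category


# Hypothetical sample columns; "Date" is not taken from the book's dataset
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2010-12-01", "2010-12-02"]),
    "Quantity": [6, 8],
    "Description": ["ITEM A", "ITEM B"],
})
for name in sample.columns:
    print(name, "->", suggest_chart(sample[name]))
```

The boolean check runs before the numeric check because pandas treats booleans as numeric.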

Tableau


In this section, we will set up Tableau, which is a very popular visualization tool. For this, we can simply download a trial version of Tableau and install it on our local machine. You can find Tableau at https://www.tableau.com/.

The following screenshot shows the download link for Tableau:

Once you've installed the trial version (or if you already have a licensed copy available), you are ready to go through some basic visualization exercises.

The following screenshot shows Tableau at launch, listing the various data sources you can start with:

Let's start by opening the file OnlineRetail.csv. The following is a screenshot of the blank worksheet:

Drag Quantity onto the Columns shelf to see a bar chart with a single bar, as shown in the following screenshot:

Then drag Description onto the Rows shelf to see a bar chart showing the quantity for each item, as follows:

You can apply filters to eliminate the negative quantity values, as shown in this screenshot:

You will see the range of values...
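For comparison, the Tableau steps above can be approximated in pandas. These steps are Quantity on Columns, Description on Rows, and negative quantities filtered out. This is a minimal sketch under two assumptions: Tableau applies its default SUM aggregation to Quantity, and the filter drops individual rows with negative values. `quantity_by_item` is a hypothetical helper. The four rows are made-up sample data that reuse the column names of OnlineRetail.csv; they do not come from the real file.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for scripts; remove this line in a notebook
import pandas as pd


def quantity_by_item(df: pd.DataFrame) -> pd.Series:
    """Total Quantity per Description, ignoring rows with negative quantities."""
    non_negative = df[df["Quantity"] >= 0]  # rough equivalent of the Tableau filter
    return non_negative.groupby("Description")["Quantity"].sum().sort_values()


# Made-up rows that only mimic the column names of OnlineRetail.csv
retail = pd.DataFrame({
    "Description": ["ITEM A", "ITEM B", "ITEM A", "ITEM B"],
    "Quantity": [6, 3, 4, -2],
})

totals = quantity_by_item(retail)
ax = totals.plot(kind="barh", title="Quantity per item")  # one horizontal bar per item
ax.set_xlabel("Quantity")
# In a notebook the chart renders automatically; in a script, save it with
# ax.figure.savefig("quantity_by_item.png")
```

Sorting the totals puts the largest item at the top of the horizontal bar chart. This ordering is a presentation choice and does not reproduce Tableau's default order.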

Chart types


A chart can take a wide variety of forms; however, charts share common features that give them their ability to convey meaning from data. Typically, the data in a chart is represented graphically, since humans can generally infer meaning from pictures more quickly than from text. Text is usually used only to annotate the data.

One of the most important uses of text in a graph is the title. A graph's title usually appears above the main graphic and provides a succinct description of what the data in the graph refers to. Dimensions in the data are often displayed on axes. If a horizontal and a vertical axis are used, they are usually referred to as the x axis and y axis respectively. Each axis will have a scale, denoted by periodic graduations and usually accompanied by numerical or categorical indications. Each axis will typically also have a label displayed outside or beside it, briefly describing the dimension represented. If the scale is numerical, the label...
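The chart elements described above each correspond to a matplotlib call: title, axis labels, axis scale with its graduations and categorical indications, and text annotations. The following sketch uses made-up monthly figures purely to show where each element comes from. The data, labels, and the "peak" annotation are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for scripts; remove this line in a notebook
import matplotlib.pyplot as plt

# Made-up figures, used only to show where each chart element comes from
months = ["Jan", "Feb", "Mar", "Apr"]
units_sold = [120, 95, 140, 130]
positions = list(range(len(months)))

fig, ax = plt.subplots()
ax.bar(positions, units_sold)
ax.set_title("Units sold per month")  # title: short description above the graphic
ax.set_xticks(positions)
ax.set_xticklabels(months)            # categorical indications on the x axis
ax.set_xlabel("Month")                # x axis label: the dimension shown
ax.set_yticks([0, 50, 100, 150])      # y axis scale: periodic graduations
ax.set_ylabel("Units sold")           # y axis label: the numerical dimension
ax.annotate("peak", xy=(2, 140), xytext=(2, 147), ha="center")  # text annotating one data point
```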

Using Python to visualize data


Python provides extensive capabilities for analyzing big data as well as for plotting and visualizing it.

Note

Analyzing and visualizing big data using Python is covered in Chapter 4, Scientific Computing and Big Data Analysis with Python and Hadoop.

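Here is an example that uses pandas to plot a single column:

import pandas as pd
df = pd.read_csv('OnlineRetail.csv')  # assumed: the same file used in the Tableau section
d8 = df[['Quantity']][0:100]          # first 100 rows of a single column
d8.plot()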

Here, only the first 100 rows are selected to keep the graph uncluttered and the example easy to follow.

The result is the following line plot:

To plot multiple columns, select them together:

# Select two numeric columns and keep the first 100 rows
d8 = df[['Quantity', 'UnitPrice']][0:100]
d8.plot()

Note that plot() draws only numeric columns such as Quantity and UnitPrice. Qualitative (text) columns such as Description are skipped.
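The following sketch checks which columns plot() actually draws. It builds a small DataFrame of made-up values under the same column names and then inspects the lines on the returned Axes. It assumes a recent pandas with matplotlib installed. The line plot should contain one line per numeric column and none for Description.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for scripts; remove this line in a notebook
import pandas as pd

# Made-up rows that only mimic the column names of OnlineRetail.csv
d8 = pd.DataFrame({
    "Description": ["ITEM A", "ITEM B", "ITEM C"],
    "Quantity": [6, 8, 2],
    "UnitPrice": [2.55, 3.39, 7.65],
})

ax = d8.plot()  # default kind is "line"; only numeric columns become lines
plotted = [line.get_label() for line in ax.get_lines()]
print(plotted)
```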

Using R to visualize data


R provides extensive capabilities for analyzing big data as well as for plotting and visualizing it.

Note

Analyzing and visualizing big data using R is covered in Chapter 5, Statistical Big Data Computing with R and Hadoop.

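R can just as easily plot a column of your choice:

df <- read.csv("OnlineRetail.csv")  # assumed: the same file used in the Tableau section
plot(df$UnitPrice)                  # UnitPrice values plotted against their row index
plot(d1, type="b")                  # d1: assumed defined earlier (not shown here); "b" draws points and lines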
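In R's plot(), type="b" draws both points and connecting lines. For readers comparing this with the Python section, a rough matplotlib counterpart is a line with markers, sketched below with hypothetical UnitPrice values. There are two differences. R numbers the x axis from 1 when it is given a single vector, whereas matplotlib starts at 0. R's "b" style also leaves small gaps where the lines meet the points.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for scripts; remove this line in a notebook
import matplotlib.pyplot as plt

unit_prices = [2.55, 3.39, 2.75, 7.65]  # hypothetical UnitPrice values

fig, ax = plt.subplots()
# marker="o" draws each point and linestyle="-" connects them,
# roughly what plot(x, type="b") does in R
(line,) = ax.plot(unit_prices, marker="o", linestyle="-")
ax.set_xlabel("Index (starts at 0, unlike R's 1)")
ax.set_ylabel("UnitPrice")
```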

Big data visualization tools


A quick survey of the big data tools marketplace reveals the presence of big names, including Microsoft, SAP, IBM, and SAS. But there are plenty of specialist software vendors offering leading big data visualization tools, and these include Tableau, Qlik, and TIBCO. Leading data visualization products include those offered by the following:

Summary


In this chapter, we discussed the power of visualization and the concepts behind good visualization practice. In the next chapter, we will look at the power of cloud computing and how it is changing the landscape of big data and big data analytics.
