Effective Business Intelligence with QuickSight

By Rajesh Nadipalli
  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. A Quick Start to QuickSight

About this book

Amazon QuickSight is the next-generation Business Intelligence (BI) cloud service that can help you build interactive visualizations on top of various data sources hosted on Amazon Cloud Infrastructure. QuickSight delivers responsive insights into big data and enables organizations to quickly democratize data visualizations and scale to hundreds of users at a fraction of the cost when compared to traditional BI tools.

This book begins with an introduction to Amazon QuickSight, feature differentiators from traditional BI tools, and how it fits in the overall AWS big data ecosystem. With practical examples, you will find tips and techniques to load your data to AWS, prepare it, and finally visualize it using QuickSight. You will learn how to build interactive charts, reports, dashboards, and stories using QuickSight and share with others using just your browser and mobile app.

The book also provides a blueprint to build a real-life big data project on top of AWS Data Lake Solution and demonstrates how to build a modern data lake on the cloud with governance, data catalog, and analysis. It reviews the current product shortcomings, features in the roadmap, and how to provide feedback to AWS.

Grow your profits, improve your products, and beat your competitors.

Publication date:
March 2017
Publisher
Packt
Pages
262
ISBN
9781786466365

 

Chapter 1.  A Quick Start to QuickSight

We are in an era where data drives business and is also growing exponentially. The organizations that can empower their business with easy-to-use, fast, and real-time data will have the competitive edge over their peers. Amazon QuickSight is the next generation Business Intelligence (BI) application that can help you build interactive visualizations on top of various data sources hosted on Amazon Cloud Infrastructure. QuickSight delivers fast and responsive insights on big data and enables organizations to quickly democratize data visualizations and scale to hundreds of users at a fraction of the cost when compared to traditional BI tools.

In this introduction chapter, we will get an overview of Amazon QuickSight and its differences compared to traditional BI tools. We will cover the following topics:

  • Era of big data

  • Current BI landscape

  • Rise of cloud powered BI

  • Overview of QuickSight

  • How is this different than other BI tools?

  • High level architecture of a BI solution on QuickSight

  • How does one get started with QuickSight?

  • Performing your first analysis in under 60 seconds

 

Era of big data


We live in a digital era where data is generated everywhere, from smart connected devices to social media. In 2014, every second over 5,700 tweets were sent and 800 links were shared using Facebook, and the digital universe expanded by about 1.7 MB per minute for every person on earth (source: IDC 2014 report). This amount of data sharing and storing is unprecedented and is contributing to what is known as big data. In 2013, about 4.4 ZB were created and in 2020 the forecast is 44 ZB, which is 44 trillion GB (source: http://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm).

Figure 1.1: Data growth

IDC predicts that organizations that are able to analyze this big data and derive actionable insights will see an additional $430 billion in productivity benefits over their peers (source: IDC FutureScape: Worldwide Big Data and Analytics 2016 Predictions, https://www.cloudera.com/content/dam/www/static/documents/analyst-reports/idc-futurescape.pdf).

Let's look at some real use cases that have benefited from big data:

  • IT systems in all major banks are constantly monitoring fraudulent activities and alerting customers within milliseconds. These systems apply complex business rules and analyze historical data, geography, type of vendor, and other parameters based on the customer to get accurate results and protect millions of customers across the globe.

  • Commercial drones are transforming agriculture by analyzing real-time aerial images and identifying the problem areas. These drones are cheaper and more efficient than satellite imagery, as they fly under the clouds and can take images anytime. They identify irrigation issues related to water, pests, or fungal infections, which thereby increases the crop productivity and quality. These drones are equipped with technology to capture high quality images every second and transfer them to a cloud-hosted big data system for further processing (you can refer to http://www.technologyreview.com/featuredstory/526491/agricultural-drones/ for more information).

  • Almost all shopping websites are using recommendation engines to improve customer experience like Amazon, Netflix, and Pandaro. These engines are sophisticated systems that perform big data analysis on historical buying preferences of the customer, ratings from social media, and associations rules from other similar customer purchases. This is now feasible due to advances in big data storage, compute, and in-memory analytics making these systems more intelligent and effective. You can refer to this article for more information at http://www.sas.com/en_us/insights/articles/big-data/recommendation-systems.html.

This unprecedented growth of data has resulted in need for faster insights, quicker co-relations, and the need to democratize data and analysis.  Let's next look at the current BI landscape, key features they provide, and also their limitations.

 

Current BI landscape


Over the last 20 years, several vendors have built products to satisfy BI needs for organizations and this space is pretty crowded with big vendors and niche players that include Tableau desktop, Oracle OBIEE, SAP Business Objects, IBM Cognos, Qlick, and DOMO. Let's review the key features provided by the BI vendors.

Key features provided by BI tools

Most of the features offered by BI vendors fit into one of these categories:

  • Interactive reports: It helps users to analyze data and support decision-making using slice-and-dice, drill down, trends, averages, percentile, and support ad hoc queries with ability to export data/charts.

  • Executive dashboards: It presents data to business leaders in easy-to-understand Key Performance Indicators (KPI) and summary charts/tables. These dashboards are typically refreshed at regular intervals (weekly, monthly, or quarterly) based on the business need.

  • Integrations: Integrates with various data sources to report stats on them and then publish results to websites, e-mails, and social media.

  • Metadata management: It tracks the relationships between datasets, calculations, and hierarchies.

 

Typical process to build visualizations


Let's review the process for creating insights using traditional BI tools like Oracle OBIEE, SAP Business Objects, and IBM Cognos. At a high level, building a BI dashboard involves the following:

  • Ingestion framework to collect data from source systems. These systems are typically files and relational databases.

  • Standardize, clean, and build facts, dimensions, and aggregates based on key performance indicators requested by business.

  • Build BI logical data models; typically star or snowflakes based on various dashboard needs.

  • Build reports and dashboards on the web.

  • Publish and share results with data analysts and business stakeholders.

The preceding data flow is shown in the following diagram and is primarily built by IT with regular consultation with data stewards and dashboard consumers:

Figure 1.2: Process flow for traditional BI tools

Key issues with traditional BI tools

The traditional BI tools have primarily following issues that organizations are facing:

  • BI software is expensive. In a study done by Amazon, a three year Total Cost of Ownership (TCO) is between $150 to $250 per user per month (source: AWS Summit Series 2016, Chicago at https://aws.amazon.com/summits/chicago/).

  • It requires a large IT team to acquire data, model data, build reports, publish and repeat the entire process. A typical BI initiative will require at least 6 months before a production rollout of the dashboard (source: AWS Summit Series 2016, Chicago at https://aws.amazon.com/summits/chicago/).

  • They do not work well with unstructured, NoSQL, and streaming data sources. The old BI tools often require ETL teams to build aggregate data in relational form to report.

  • They do not scale well as data grows, which is required for big data analytics.

  • They do not work well with cloud-hosted data sources like Amazon S3, RDS, and other cloud sources.

 

Rise of cloud BI services


There is an explosion of data being generated by cloud applications like Salesforce.com, Adobe Analytics, and HubSpot. In addition, enterprises are increasingly migrating their data to cloud with services like Dropbox, DocuSign, Google, AWS S3, and AWS RDS. 

While enterprises have adopted Software as a Service (SaaS) for applications; the move to BI as a service has been slower. This trend is about to change and IDC predicts that by 2020, 50% of business analytics software will incorporate predictive analytics based on cloud platforms. 

The definition of cloud BI would vary based on whom you talk to. First you have the traditional enterprise BI vendors; who have ability to host their software on a cloud server and call it a cloud service. Then you have the next generation BI vendors that have designed their software from ground up to use the elastic cloud services of Microsoft, AWS, and Google and offer real cloud BI service. AWS QuickSight falls under the second category of cloud BI vendors. Here are some of the popular cloud BI vendors Microsoft Power BI, Looker, Chartio, and Bime.

There are several operational and financial drivers that work in favor of cloud  BI service, the key ones being:

  • Speed of implementation and deployment: Infrastructure and software are available immediately provisioned exactly as required for the BI service

  • Elasticity: Leveraging the compute power that can be scaled up or down on-demand based on business needs

  • Lower total cost of ownership: It significantly reduces the infrastructure and related operational expenses

  • Improved connectivity: For enterprises that already have moved to cloud for hosting their applications, the cloud BI is evolution and is the next step in their journey to be more cloud native

Next we will look into what makes AWS QuickSight stand out when compared to the others.

 

Overview of QuickSight


The Amazon web services team has been in the forefront of providing real solutions that are designed for massive scale; their S3 storage is the most popular cloud storage service and it is used by companies like Netflix, Nasdaq, Airbnb, and Redfin to name just a few. QuickSight is a new cloud BI service specifically designed to address the need for every day analytics and reporting. QuickSight allows enterprises to get started in minutes, access data from multiple sources, build interactive visuals, get answers fast, and tell a story with data.

While QuickSight has competitors, I believe it is poised to become the leader in cloud powered BI due to the deep connectivity to AWS services, next generation distributed architecture, with an in-memory caching layer on the worlds most scalable infrastructure. Additionally, AWS has made it extremely affordable that competitors will find it hard to match. Let's review in detail how QuickSight differs from other BI tools.

 

How is QuickSight different to other BI tools?


Amazon QuickSight was built to address pain points of the traditional BI tools and provides IT and business teams with a fast, cloud-powered BI service at one-tenth the cost of traditional BI software.

Here are the key features that make QuickSight the next generation cloud-powered BI tool:

  • QuickSight empowers data analysts to build their reports quickly by pointing them to any data source without the need of a large IT team that traditionally has to build metadata in a BI tool before data analysts can use them.

  • QuickSight pricing starts at $9 per user per month and is a complete managed service, which eliminates the need for software install and maintenance.

  • QuickSight has deep integration within AWS data sources including RDS, DynamoDB, Kinesis, S3, Athena, and Redshift. It also supports user file uploads in Excel, CSV, TSV, and can also connect to Salesforce Cloud and on-premise databases. This list will continue to grow.

  • QuickSight has smart visualizations that infer data type and provide suggestions for best possible visualizations. Additionally, QuickSight also suggests relationships between datasets.

  • QuickSight has easy to use browser-based interface and does not require any desktop software to build metadata and/or reports.

  • QuickSight has a special distributed and intelligent caching layer that provides blazing fast performance and response times.

  • QuickSight allows users to share analysis, read only dashboards, and story board with peers.

  • QuickSight has planned integration with a number of partner BI tools like Tibco, Domo, and Qlikview so that enterprises can leverage current investment, but still benefit from the caching and deep connectivity that QuickSight provides.

The following diagram shows the simplified flow from data to visuals using QuickSight:

Figure 1.3: QuickSight simplified process

It has the following four steps, most of which can be done by a power user without the need for personal IT:

  1. Source data from various sources including files, RDBMS on the AWS data platform.

  2. Import data to QuickSight caching layer (SPICE).

  3. Prepare/edit data if required for analysis.

  4. Visualize and publish to dashboards.

 

High level BI solution architecture with QuickSight


Let's take a deeper look into how the complete solution architecture will look with QuickSight. The following diagram shows the high level architecture that takes data from various sources and presents as insights using QuickSight:

Figure 1.4: QuickSight architecture

Let's review this architecture starting from the bottom and going to the top:

  • Data Sources: QuickSight can handle many data sources including files already in S3, files from your laptop in Excel files, standard log files, relational databases including RDS, Aurora, Redshift, DynamoDB, NoSQL databases, and on-premise databases. You can also connect to SaaS like Salesforce directly and report it on QuickSight.

  • Caching layer: Superfast, Parallel, In-memory, Calculation Engine (SPICE) is an in-memory columnar database with a SQL-like interface that provides quick responses to the queries made by the visualization layer. SPICE has APIs and interfaces planned to integrate with partner products like TIBCO, Tableau, and DOMO.

  • Visualization: QuickSight comes with intuitive visualizations with autographs based on automatic data type detection, native mobile user experience, and ability to integrate third-party visualization tools.

Next, let's look at how easy it is to get started and build a dashboard using QuickSight.

 

Getting started with QuickSight


In this section, we will review how to get started with QuickSight and build a real dashboard on data.

Registering for QuickSight

You have two options for signing up to QuickSight. If you are new to AWS, then sign up for Amazon QuickSight with a new AWS account to create an Amazon QuickSight account. If you have an existing AWS account and want to use IAM user credentials to create an Amazon QuickSight account, sign up for Amazon QuickSight with an existing account. Let's review each of these options in detail.

Signing up to QuickSight with a new AWS account

Let's review the steps to sign up to QuickSight with a new AWS account:

  1. Go to http://www.quicksight.aws and choose TRY IT FOR FREE.

  2. You will see a Sign In or Create an AWS Account page. Choose I am a new user and then follow the instructions to create a new account. This step involves receiving a phone call from the system.

  3. After you are done signing up for an AWS account, you are taken to the Amazon QuickSight sign up page, where you specify an account name, an e-mail address for notifications about Amazon QuickSight, and a home AWS region. Here are a few guidelines for completing this form:

    • For account name, type a unique name for your team, for example, YourCompanyName-Marketing-Analytics. Account names can only contain characters (A-Z, or a-z), digits (0-9), and dashes (-).

    • For e-mail address, provide one where Amazon QuickSight should send service and usage notifications.

    • For the region, select the region closest to your physical location, and the same region where you have the majority of your other AWS resources (like Amazon RDS instances).

  4. Click Continue.

  5. Next, you will be redirected to Grant Amazon QuickSight read-only access to AWS resources page, accept the default selections to allow QuickSight to read from Amazon Redshift clusters, Amazon RDS instances, Amazon S3 buckets, and IAM entities you create under your AWS account.

  6. Finally, click on Finish to complete your registration to Amazon QuickSight.

Note

You can next proceed to the Building your first analysis in under 60 seconds section.

Signing up to QuickSight with an existing AWS account

If you already have an AWS account, follow this section to create an Amazon QuickSight account that is connected to your AWS account. You can use your AWS credentials to login or use IAM user credentials.

To use IAM, the user must have a permissions policy attached that includes the following statements:

{
  "Statement": [
    {
      "Action": [
        "iam:ListPolicyVersions",
        "iam:ListAccountAliases",
        "iam:AttachRolePolicy",
        "iam:GetPolicy",
        "iam:GetPolicyVersion",
        "iam:CreateRole",
        "iam:CreatePolicy",
        "iam:CreatePolicyVersion",
        "iam:DeletePolicyVersion",
        "iam:GetRole",
        "iam:ListAttachedRolePolicies",
        "iam:ListRoles",
        "ds:CheckAlias",
        "ds:CreateIdentityPoolDirectory",
        "ds:CreateAlias",
        "ds:AuthorizeApplication",
        "ds:DescribeDirectories",
        "ds:UnauthorizeApplication",
        "ds:DeleteDirectory",
        "quicksight:Subscribe",
        "quicksight:Unsubscribe",
        "s3:ListAllMyBuckets"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ]
    }
  ],
  "Version": "2012-10-17"
}

Let's review the steps to sign up to QuickSight with an existing AWS account:

  1. Go to http://www.quicksight.aws and choose TRY IT FOR FREE.

  2. You will see a Sign In or Create an AWS Account page. Choose I am returning user and enter your username and password.

  3. After you have signed up to your AWS account, you are taken to the Amazon QuickSight sign up page, where you specify an account name, an e-mail address for notifications about Amazon QuickSight, and a home AWS region. Here are a few guidelines for completing this form:

    • For account name, type a unique name for your team, for example, YourCompanyName-Marketing-Analytics. Account names can only contain characters (A-Z, or a-z), digits (0-9), and dashes (-).

    • For e-mail address, provide one where Amazon QuickSight should send service and usage notifications.

    • For the region, select the region closest to your physical location, and the same region where you have the majority of your other AWS resources (like Amazon RDS instances).

  4. Click Continue.

  5. Next, you will be redirected to Grant Amazon QuickSight read-only access to AWS resources page where the following is recommended:

    • Select the option for Amazon Redshift to allow QuickSight to auto-discover Redshift clusters associated with your AWS account.

    • Select the option for Amazon RDS to allow QuickSight to auto-discover RDS instances associated with your AWS account.

    • Leave IAM selected to allow QuickSight to get a list of IAM users associated with your AWS account. This will enable you to invite these users to access this Amazon QuickSight account.

  6. Finally, click on Finish to complete your registration to Amazon QuickSight.

Note

You can now proceed to the, Building your first analysis under 60 seconds section.

 

Building your first analysis under 60 seconds


With QuickSight, it is really easy to build your analysis with minimal effort. Let's do a test drive of QuickSight and build our first analysis using data from the US Department of Education that provides information about college tuition across all the states in the USA.

Downloading data

The dataset is available from the following public URL: https://catalog.data.gov/dataset/college-scorecard. Click on the Download icon as shown in the following screenshot to download raw data:

Figure 1.5: College scores raw data

Preparing data

The dataset has several files, one per each calendar year; for this demo, we will use the file MERGED2013_PP.csv. To simplify the analysis, I have selected a subset of the columns, changed all NULL to blank, changed PrivacySuppressed to blank, and uploaded this file to the following GitHub location: https://github.com/rnadipalli/quicksight/blob/master/sampledata/MERGED2013_PP.csv.

QuickSight navigation

Once you have registered and started QuickSight you will see the home page. Let's review the key navigation icons that you need to get used to, as shown in the next screenshot:

  • The QuickSight icon in the top-left is a quick way to get back to the home page

  • To upload new data, click on Manage data in the top-right

  • To create a new analysis, click on New analysis on the left-hand side below the QuickSight logo

  • To manage your account settings, click on the person icon in the top-right corner

    Figure 1.6: QuickSight navigation

Loading data to QuickSight

Let's explore the steps to load our data to QuickSight:

  1. From the QuickSight home page, click on Manage data icon.

  2. Next, click on the New data set icon and you will see Create a Data Set page with several options, as shown in the following screenshot. Select the Upload a file option and upload the MERGED2013_PP.csv file from your local desktop to AWS QuickSight:

    Figure 1.7: Uploading a CSV file

  3. After you have successfully uploaded the MERGED2013_PP.csv file, you will see a confirmation screen from QuickSight, as shown in the following screenshot. Click on the Next button to accept the defaults:

    Figure 1.8: Confirm upload file settings

  4. After the confirmation page, QuickSight imports the data to SPICE and provides quick access to visualization of the data, as shown in the following screenshot. Click on the Visualize button and then proceed to the next section:

    Figure 1.9: Data source details

Starting your visualizations

Now you are ready to start visualizing data using the built-in charts in QuickSight. Let's see how to create our first useful chart, which is also demonstrated in the next screenshot. Follow the steps to create a chart showing the average tuition fees by state:

  1. First select the horizontal bar chart from the Visual types.

  2. Next select the STABBR as the Y axis and TUITFTE as the Value field.

  3. Next in the Field wells option, change the Aggregate type of the value from the default Sum to Average.

The visualization is complete, as shown in the following screenshot, and now you can explore the chart and get more insights from the data:

Figure 1.10: Bar chart

Building multiple visualizations

QuickSight additionally supports line graphs, area line charts, scatter plots, heat maps, pie graphs, tree maps, and pivot tables. You can next add another visual for the same dataset pretty easily and get further understanding of the data. In the following steps, we will see how to build a pie chart that shows the sum of in-state tuition by city:

  1. Click on the + to add a new visual.

  2. Select the pie chart from the Visual types.

  3. Next select the CITY as the Group/Color and TUTIONFEE_IN as the Value field.

  4. Notice the default aggregation for value is automatically set to Sum.

The visualization is complete, as shown in the following screenshot, and now you can explore the chart and get more insights from the data:

Figure 1.11: Pie chart

 

Summary


We live and breathe data with every action in our life, and organizations that have the ability to analyze relevant data in a timely manner will save an additional $430 billion by 2020 over their peers. While there is a need, the traditional BI platforms are slow, expensive, and are not designed to handle the volume, variety, and velocity needs that organizations face today.

Amazon QuickSight is the innovative and next-generation, cloud-hosted BI platform that addresses short falls of traditional BI systems and is at an extremely competitive price from $9 per user per month. QuickSight can source data from various sources including relational databases, files, streaming, and NoSQL databases. QuickSight also comes with an in-memory caching layer that can cache and calculate aggregates on the fly. With QuickSight, data analysts are truly empowered and can build intuitive reports in 60 seconds without any significant setup by IT. In the next chapter, we will look into the details of onboarding various data sources that are supported by QuickSight.

About the Author

  • Rajesh Nadipalli

    Rajesh Nadipalli is currently Director, Professional Services and Support at Zaloni, an award-wining provider of enterprise data lake management solutions that enables global clients to innovate and leverage big data for business impact. Rajesh leads Hadoop-based technical proof-of-concepts, strategy, solution architectures, and post-sales product support for his clients. His clientele includes AIG, NBCU, Verizon, Du, American Express, Netapp, Dell-EMC, United Health Group, and Cisco. In his previous role as the director of product management, he was leading the product strategy, roadmap, and feature definitions for Zaloni's Hadoop data management platform.

    Throughout his 20 plus years in IT, Rajesh has had a passion for data and held various roles as big data architect, solutions architect, database administrator (DBA), business intelligence architect, and Etldeveloper. He believes in using technology as a strategic advantage for his clients by improving productivity, performance, and real-time insight to relevant data.

    Rajesh is also the author of HDInsight Essentials, by Packt publishing, which takes you through the journey of building a modern data lake architecture using HDInsight, a Hadoop-based service that allows you to successfully manage high volume and velocity data in Azure Cloud.

    He is a regular blogger and his articles are published in Zaloni blog, Datafloq, and Dzone sites.

    He holds a MBA from North Carolina State University and a BS in EE from University of Mumbai, India.

    Browse publications by this author