Preparation Analysis of Data Source

Exclusive offer: get 50% off this eBook here
PowerPivot for Advanced Reporting and Dashboards

PowerPivot for Advanced Reporting and Dashboards — Save 50%

Leverage the capabilities of the Microsoft Excel PowerPivot add-on to create advanced BI solutions with this book and ebook

$23.99    $12.00
by Robert Bosco J | November 2013 | Open Source

In this article by Robert Bosco J, author of PowerPivot and Advanced Reporting and Dashboards, we will learn to explore ways to prepare data for analysis. Different types of data sources that can be imported into the PowerPivot interface have been described. A general overview of the ETL process has also been given.

The following topics will be covered in this article:

  • List of data sources
  • Purpose of import data from a variety of sources
  • General overview of ETL

(For more resources related to this topic, see here.)

List of data sources

Here, the wide varieties of data sources that are supported in the PowerPivot interface are given in brief. The vital part is to install providers such as OLE DB and ODBC that support the existing data source, because when installing the PowerPivot add-in, it will not install the provider too and some providers might already be installed with other applications.

For instance, if there is a SQL Server installed in the system, the OLE DB provider will be installed with the SQL Server, so that later on it wouldn't be necessary to install OLE DB while adding data sources by using the SQL Server as a data source. Hence, make sure to verify the provider before it is added as a data source.

Perhaps the provider is required only for relational database data sources.

Relational database

By using RDBMS, you can import tables and views into the PowerPivot workbook. The following is a list of various data sources:

  • Microsoft SQL Server
  • Microsoft SQL Azure
  • Microsoft SQL Server Parallel Data Warehouse
  • Microsoft Access
  • Oracle
  • Teradata
  • Sybase
  • Informix
  • IBM DB2
  • Other providers (OLE DB / ODBC)

Multidimensional sources

Multidimensional data sources can only be added from Microsoft Analysis Services (SSAS).

Data feeds

The three types of data feeds are as follows:

  • SQL Server Reporting Service (SSRS)
  • Azure DataMarket dataset
  • Other feeds such as atom service documents and single data feed

Text files

The two types of text files are given as follows:

  • Excel file
  • Text file

Purpose of importing data from a variety of sources

In order to make a decision about a particular subject area, you should analyze all the required data that is relevant to the subject area. If the data is stored in a variety of data sources, importing the data from different data sources has to be done. If all the data is only in one data source, only the data needs to be imported from the required tables for the subject and then various types of analysis can be done.

The reason why users need to import data from different data sources is that they would then have an ample amount of data when they need to make any analysis. Another generic reason would be to cross-analyze data from different business systems such as Customer Relationship Management (CRM) and Campaign Management System (CMS). Data sourcing from only one source wouldn't be as sophisticated as an analysis done from different data sources as the amount of data from which the analysis was done for multisourced data is more detailed than the data from only a single source. It also might reveal conflicts between multiple data sources.

In real time, usually in the e-commerce industry, blogs, and forum websites wouldn't ask for more details about customers at the time of registration, because the time consumed for long registrations would discourage the user, leading to cancellation the of the order.

For instance, the customer table that would be stored in the database of an e-commerce industry would contain the following attributes:

Customer

FirstName

LastName

E-mail

BirthDate

Zip Code

Gender

However, this kind of industry needs to know their customers more in order to increase their sales. Since the industry only saves a few attributes about the customer during registration, it is difficult to track down the customers and it is even more difficult to advertise according to individual customers. Therefore, in order to find some relevant data about the customers, the e-commerce industries try to make another data source using the Internet or other types of sources.

For instance, by using the Postcode given by the customer during registration, the user can get Country | State | City from various websites and then use the information obtained to make a table format either in Excel or CSV, as follows:

Location

Postcode

City

State

Country

So, finally the user would have two sources—one source is from the user's RDBMS database and the other source is from the Excel file created later—both of these can be used in order to make a customer analysis based on their location.

General overview of ETL

The main goal is to facilitate the development of data migration applications by applying data transformations.

Extraction Transform Load (ETL) comprises of the first step in a data warehouse process. It is the most important and the hardest part, because it determines the quality of the data warehouse and the scope of the analyses the users will be able to build upon it.

Let us discuss in detail what ETL is.

The first substep is extraction. As the users would want to regroup all the information a company has, they will have to collect the data from various heterogeneous sources (operational data such as databases and/or external data such as CSV files) and various models (relational, geographical, web pages, and so on).

The difficulty starts here. As data sources are heterogeneous, formats are usually different, and the users would have different types of data models for similar information (different names and/or formats) and different keys for the same objects. These are some of the main problems.

The aim of the transformation task is to correct such problems, as much as possible.

Let us now see what a transformation task is.

The users would need to transform heterogeneous data of different sources into homogeneous data. Here are some examples of what they can do:

  • Extraction of the data sources
  • Identification of relevant data sources
  • Filtering of non-admissible data
  • Modification of formats or values

Summary

This article shows the users how to prepare data for analysis. We also covered different types of data sources that can be imported into the PowerPivot interface. Some information about the provider and a general overview of the ETL process has also been given. Users now know how the ETL process works in the PowerPivot interface; also, users were shown an introduction and the advantages of DAX.

Resources for Article:


Further resources on this subject:


PowerPivot for Advanced Reporting and Dashboards Leverage the capabilities of the Microsoft Excel PowerPivot add-on to create advanced BI solutions with this book and ebook
Published: November 2013
eBook Price: $23.99
Book Price: $39.99
See more
Select your format and quantity:

About the Author :


Robert Bosco J

Robert Bosco J is a BI Reports Developer and holds an international Master's degree in Business Intelligence from Université Tours François Rabelais, Blois, France, along with an internship as a BI Analyst at Yomii (a French e-commerce company for design products) in Paris, France. He has a Bachelor of Engineering degree in Computer Science from the Oxford Engineering college, Tiruchirappalli, India. Robert also worked as a Data Processing Executive at Meritgroup Ltd., India, before pursuing his Master's.

Books From Packt


 Instant Creating Data Models with PowerPivot How-to [Instant]
Instant Creating Data Models with PowerPivot How-to [Instant]

 Building Dashboards with Microsoft Dynamics GP 2013 and Excel 2013
Building Dashboards with Microsoft Dynamics GP 2013 and Excel 2013

 Excel 2010 Financials Cookbook
Excel 2010 Financials Cookbook

 Excel Programming with VBA Starter [Instant]
Excel Programming with VBA Starter [Instant]

 Microsoft Dynamics CRM 2011: Dashboards Cookbook
Microsoft Dynamics CRM 2011: Dashboards Cookbook

 Sage ACT! 2011 Dashboard and Report Cookbook
Sage ACT! 2011 Dashboard and Report Cookbook

Instant HubSpot Dashboard Customization [Instant]
Instant HubSpot Dashboard Customization [Instant]

 IBM Cognos Business Intelligence 10.1 Dashboarding Cookbook
IBM Cognos Business Intelligence 10.1 Dashboarding Cookbook


No votes yet

Post new comment

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
2
h
L
R
k
M
Enter the code without spaces and pay attention to upper/lower case.
Code Download and Errata
Packt Anytime, Anywhere
Register Books
Print Upgrades
eBook Downloads
Video Support
Contact Us
Awards Voting Nominations Previous Winners
Judges Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software
Resources
Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software