Microsoft Power BI Complete Reference

By Devin Knight , Brian Knight , Mitchell Pearson and 2 more

About this book

Microsoft Power BI Complete Reference Guide gets you started with business intelligence by showing you how to install the Power BI toolset, design effective data models, and build basic dashboards and visualizations that make your data come to life.

In this Learning Path, you will learn to create powerful interactive reports by visualizing your data and learn visualization styles, tips and tricks to bring your data to life. You will be able to administer your organization's Power BI environment to create and share dashboards. You will also be able to streamline deployment by implementing security and regular data refreshes.

Next, you will delve deeper into the nuances of Power BI and handling projects. You will get acquainted with planning a Power BI project, development, and distribution of content, and deployment. You will learn to connect and extract data from various sources to create robust datasets, reports, and dashboards. Additionally, you will learn how to format reports and apply custom visuals, animation and analytics to further refine your data.

By the end of this Learning Path, you will have learned to implement the various Power BI tools, such as the on-premises gateway, along with staging and securely distributing content via apps.

This Learning Path includes content from the following Packt products:

  • Microsoft Power BI Quick Start Guide by Devin Knight et al. 
  • Mastering Microsoft Power BI by Brett Powell
Publication date: December 2018
Publisher: Packt
Pages: 794
ISBN: 9781789950045

 

Chapter 1. Getting Started with Importing Data Options

Power BI may very well be one of the most aptly named tools ever developed by Microsoft, giving analysts and developers a powerful business intelligence and analytics playground while still packaging it in a surprisingly lightweight application. Using Microsoft Power BI, the processes of data discovery, data modeling, data visualization, and sharing are made elegantly simple using a single product. These processes are so commonplace when developing Power BI solutions that this book has adopted sections that follow this pattern. However, from your perspective, the really exciting thing may be that development problems that would previously take you weeks to solve in a corporate BI solution can now be accomplished in only hours. 

Power BI is a Software as a Service (SaaS) offering in the Azure cloud, and, as such, the Microsoft product team follows a strategy of cloud first as they develop and add new features to the product. However, this does not mean that Power BI is only available in the cloud. Microsoft presents two options for sharing your results with others. The first, most often-utilized method is the cloud-hosted Power BI Service, which is available to users for a low monthly subscription fee. The second option is the on-premises Power BI Report Server, which can be obtained through either your SQL Server licensing with Software Assurance or a subscription level known as Power BI Premium. Both solutions require a development tool called Power BI Desktop, which is available for free, and is where you must start to design your solutions.

 

Using the Power BI Desktop application enables you to define your data discovery and data preparation steps, organize your data model, and design engaging data visualizations on your reports. In this first chapter, the development environment will be introduced, and the data discovery process will be explored in depth. The topics detailed in this chapter include the following:

  • Getting started
  • Importing data
  • DirectQuery
  • Live Connection
 

Getting started


The Power BI Desktop is available for free and can be found via a direct download link at Power BI (https://powerbi.microsoft.com/), or by installing it as an app from the Windows Store. There are several benefits to using the Windows Store Power BI app, including automatic updates, no requirement for admin privileges, and an easier planned IT roll-out of Power BI.

Note

If you are using the on-premises Power BI Report Server for your deployment strategy, then you must download a different Power BI Desktop, which is available by clicking the advanced download options at https://powerbi.microsoft.com/en-us/report-server/. A separate install is required because updates are released more often to Power BI in the cloud. This book will be written primarily under the assumption that the reader is using the cloud-hosted Power BI Service as their deployment strategy. 

Once you download, install, and launch the Power BI Desktop, you will likely be welcomed by the Start screen, which is designed to help new users find their way. Close this start screen so we can review some of the most commonly used features of the application:

Power BI Desktop

Following the numbered figures, let's learn the names and purposes of some of the most important features in the Power BI Desktop:

  • Get Data: Used for selecting and configuring data sources. 
  • Edit Queries: Launches the Power Query Editor, which is used for applying data transformations to incoming data.
  • Report View: The report canvas used for designing data visualizations. This is the default view open when the Power BI Desktop is launched. 
  • Data View: Provides a view of the data in your model. This looks similar to a typical Excel spreadsheet, but it is read-only.
  • Relationship View: Primarily used when your data model has multiple tables and relationships need to be defined between them.

 

 

 

Importing data


Power BI is best known for its impressive data visualizations and dashboard capabilities. However, before you can begin building reports, you first need to connect to the necessary data sources. Within the Power BI Desktop, a developer has more than 80 unique data connectors to choose from, including traditional file types, database engines, big data solutions, cloud sources, data stored on a web page, and other SaaS providers. This book will not cover all of the available connectors, but it will highlight some of the most popular.

When establishing a connection to a data source, you may be presented with one of three different options on how your data should be treated: Import, DirectQuery, or Live Connection. This section will focus specifically on the Import option.

Choosing to import data, which is the most common option and the default behavior, means that Power BI will physically extract rows of data from the selected source and store them in an in-memory storage engine within Power BI. The Power BI Desktop uses a special method for storing data, known as xVelocity, which is an in-memory technology that not only increases the performance of your query results but can also highly compress the amount of space taken up by your Power BI solution. In some cases, the compression that takes place can even lower the disk space required to as little as one-tenth of the original data source size. The xVelocity engine uses a local unseen instance of SQL Server Analysis Services (SSAS) to provide these in-memory capabilities.
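To give a feel for why a column of repeated values can shrink so much in a columnar in-memory store, here is a minimal Python sketch of dictionary encoding. This is a general technique used by columnar engines and is shown only as an illustration; it is not a description of how xVelocity is actually implemented, and the function names and sample column are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer index into a list of distinct values."""
    dictionary = []   # distinct values, in order of first appearance
    positions = {}    # value -> index in the dictionary
    encoded = []      # one small integer per original row
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        encoded.append(positions[value])
    return dictionary, encoded


def dictionary_decode(dictionary, encoded):
    """Rebuild the original column from the dictionary and the integer indexes."""
    return [dictionary[index] for index in encoded]


# Hypothetical column with many repeated values (illustration only).
country_column = [
    "United States", "Canada", "United States",
    "United States", "Canada", "Germany",
]

dictionary, encoded = dictionary_encode(country_column)
restored = dictionary_decode(dictionary, encoded)
```

Each long string is stored once in the dictionary, while the rows hold only small integers, so the saving grows as values repeat more often.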

There are consequences to using the import option within Power BI that you should also consider. These consequences will be discussed later in this chapter, but as you read on, consider the following:

  • How does data that has been imported into Power BI get updated?
  • What if I need a dashboard to show near real-time analytics?
  • How much data can really be imported into an in-memory storage system?

Excel as a source

Believe it or not, Excel continues to be the most popular application in the world and as such, you should expect that at some point you will be using it as a data source:

  1. To get started, open the Power BI Desktop and close the start-up screen if it automatically appears.
  2. Under the Home ribbon, you will find the Get Data button, which you already learned is used for selecting and configuring data sources. Selecting the down arrow next to the button will show you the most common connectors, but selecting the center of the button will launch the full list of all available connectors. Regardless of which way you select the button, you will find Excel at the top of both lists.
  3. Navigate to and open the file called AdventureWorksDW.xlsx from the book resources. This will launch the Navigator dialog, which is used for selecting the objects in the Excel workbook you desire to take data from:
  4. In this example, you see six separate spreadsheets you can choose from. Clicking once on the spreadsheet name will give you a preview of the data it stores, while clicking the checkbox next to the name will include it as part of the data import. For this example, select the checkboxes next to all of the available objects, then notice the options available in the bottom right.
  5. Selecting Load will immediately take the data from the selected spreadsheets and import them as separate tables in your Power BI data model. Choosing Edit will launch an entirely new window called the Power Query Editor that allows you to apply business rules or transforms to your data prior to importing it. You will learn much more about the Power Query Editor in Chapter 2, Data Transformation Strategies, so for now simply select Load to end this example.

Another topic you will learn more about in Chapter 6, Using a Cloud Deployment with the Power BI Service, is the concept of data refreshes. This is important because, when you import data into Power BI, that data remains static until another refresh is initiated. This refresh can either be initiated manually or set on a schedule. A scheduled refresh also requires the installation of a Data Gateway, the application in charge of securely pushing data into the Power BI Service. Feel free to skip to Chapter 6, Using a Cloud Deployment with the Power BI Service, if configuring a data refresh is a subject you need to know now.

SQL Server as a source

Another common source designed for relational databases is Microsoft SQL Server:

  1. To connect to SQL Server, select the Get Data button again, but this time choose SQL Server. Here, you must provide the server, but the database is optional and can be selected later:
  2. For the first time, you are asked to choose the type of Data Connectivity mode you would like. As mentioned previously, Import is the default mode, but you can optionally select DirectQuery. DirectQuery will be discussed in greater detail later in this chapter. Expanding the Advanced options provides a way to insert a SQL statement that may be used as your source. For the following example, the server is the only property populated before clicking OK:
  3. Next, you will be prompted to provide the credentials you are using to connect to the database server you provided on the previous screen.
  4. Click Connect after providing the proper credentials to launch the same Navigator dialog that you may remember from when you connected to Excel. Here, you will select the tables, views, or functions within your SQL Server database that you desire to import into your Power BI solution. Once again, the final step in this dialog allows you to choose to either Load or Edit the results.

Web as a source

One pleasant surprise to many Power BI Developers is the availability of a web connector. Using this connection type allows you to source data from files that are stored on a website or even data that has been embedded into an HTML table on the web page. Using this type of connector can often be helpful when you would like to supplement your internal corporate data sources with information that can be publicly found on the internet.

For this example, imagine you are working for a major automobile manufacturer in the United States. You have already designed a Power BI solution using data internally available within your organization that shows historical patterns in sales trends. However, you would like to determine whether there are any correlations in periods of historically higher fuel prices and lower automobile sales. Fortunately, you found that the United States Department of Labor publicly posts historical average consumer prices of many commonly purchased items, including fuel prices.

  1. Now that you understand the scenario, select the Get Data button within the Power BI Desktop and choose Web as your source. You will then be prompted to provide the URL where the data can be found. In this example, the data can be found by searching on the website Data.Gov (https://www.data.gov/) or, to save you some time, use the direct link: https://download.bls.gov/pub/time.series/ap/ap.data.2.Gasoline. Once you provide the URL, click OK:
  2. Next, you will likely be prompted with an Access Web Content dialog box. This is important when you are using a data source that requires a login to access. Since this data source does not require a login to find the data, you can simply select anonymous access, which is the default, and then click Connect:

Notice on the next screen that the Power BI Desktop recognizes the URL provided as a tab-delimited file that can now easily be added to any existing data model you have designed.
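Outside of Power BI, the same idea of reading a tab-delimited text file into rows and columns can be sketched in a few lines of Python. The sample rows below are made up for illustration, and the column names (series_id, year, period, value, footnote_codes) are an assumption about the file's layout rather than something shown in this chapter; the function name is hypothetical as well.

```python
import csv
import io

# Made-up sample in the assumed layout of the file: a header row followed by
# tab-separated fields, some of them padded with spaces.
SAMPLE = (
    "series_id\tyear\tperiod\tvalue\tfootnote_codes\n"
    "SERIES_A        \t2017\tM01\t       2.351\t\n"
    "SERIES_A        \t2017\tM02\t       2.299\t\n"
)


def parse_tab_delimited(text):
    """Parse tab-delimited text into a list of dictionaries, one per row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        # Strip the padding around both the column names and the values.
        rows.append({key.strip(): (value or "").strip() for key, value in record.items()})
    return rows


rows = parse_tab_delimited(SAMPLE)
prices = [float(row["value"]) for row in rows]
```

To try this against the real file, the text would first have to be downloaded from the URL above; the sketch deliberately works on an inline string so it runs without network access.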

 

DirectQuery


Many of you have likely been trying to envision how you may implement these data imports in your environment. You may ask yourself questions such as the following:

  • If data imported into Power BI uses an in-memory technology, did my company provide me a machine that has enough memory to handle this? 
  • Am I really going to import my source table with tens of billions of rows into memory?
  • How do I handle a requirement of displaying results in real time from the source?

These are all excellent questions that would have many negative answers if the only way to connect to your data was by importing your source into Power BI. Fortunately, there is another way. Using DirectQuery, Power BI allows you to connect directly to a data source so that no data is imported or copied into the Power BI Desktop.

Why is this a good thing? Consider the questions that were asked at the beginning of this section. Since no data is imported to the Power BI Desktop, that means it is less important how powerful your personal laptop is because all query results are now processed on the source server instead of your laptop. It also means that there is no need to refresh the results in Power BI because any reports you design are always pointing to a live version of the data source. That's a huge benefit!
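The difference between the two modes can be sketched in Python, using an in-memory SQLite database as a stand-in for the source server. This is only an analogy under stated assumptions: the table, the data, and the function names are hypothetical, and Power BI's own engine works very differently under the hood.

```python
import sqlite3

# In-memory SQLite database standing in for the source server (illustration only).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100.0), ("West", 250.0)],
)


def import_rows(connection):
    """Import-style: copy every row out of the source into local memory."""
    return connection.execute("SELECT region, amount FROM sales").fetchall()


def total_from_import(imported_rows):
    """Aggregate the local copy; the source is not contacted."""
    return sum(amount for _, amount in imported_rows)


def total_from_direct_query(connection):
    """DirectQuery-style: send the aggregation to the source and keep only the result."""
    return connection.execute("SELECT SUM(amount) FROM sales").fetchone()[0]


imported = import_rows(source)

# A new sale arrives at the source after the import has taken place.
source.execute("INSERT INTO sales VALUES (?, ?)", ("East", 50.0))

stale_total = total_from_import(imported)     # still based on the old copy
live_total = total_from_direct_query(source)  # computed by the source, sees the new row

imported = import_rows(source)                # a "refresh" copies the rows again
refreshed_total = total_from_import(imported)
```

The imported copy keeps answering with the old total until it is refreshed, while the query sent to the source always reflects the current data and leaves the heavy lifting to the source.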

Enabling this feature can be done by simply selecting DirectQuery during the configuration of a data source. The following screenshot shows a connection to an SQL Server database with the DirectQuery option selected:

Earlier in this chapter, the Data Gateway application was mentioned as a requirement to schedule data refreshes for sources that used the import option. This same application is also needed with DirectQuery if your data is an on-premises source. Even though there is no scheduled data refresh, the Data Gateway is still required to push on-premises data into the cloud. Again, this will be discussed in more depth in Chapter 6, Using a Cloud Deployment with the Power BI Service.

Limitations

So, if DirectQuery is so great, why not choose it every time? Well, with every great feature you will also find limitations. The first glaring limitation is that not all data sources support DirectQuery. As of the time this book was written, the following data sources support DirectQuery in Power BI:

  • Amazon Redshift
  • Azure HDInsight Spark 
  • Azure SQL Database
  • Azure SQL Data Warehouse
  • Google BigQuery 
  • IBM Netezza 
  • Impala (Version 2.x)
  • Oracle Database (Version 12 and above)
  • SAP Business Warehouse Application Server
  • SAP Business Warehouse Message Server
  • SAP HANA
  • Snowflake
  • Spark  (Version 0.9 and above)
  • SQL Server
  • Teradata Database
  • Vertica 

Depending on the data source you choose, there is a chance of slower query performance when using DirectQuery compared to the default data import option. Keep in mind that when the import option is selected it leverages a highly sophisticated in-memory storage engine. When selecting DirectQuery, performance will depend on the source type you have chosen from the list above.

Another limitation worth noting is that not all Power BI features are supported when you choose DirectQuery. For example, depending on the selected source, some of the Power Query Editor features are disabled and could result in the following message: This step results in a query that is not supported in DirectQuery mode. Another example is that some DAX functions are unavailable when using DirectQuery. For instance, several Time Intelligence functions, such as TotalYTD, generate an error when used with DirectQuery.

The reason for this limitation is that DirectQuery automatically attempts to convert DAX functions such as this one to a query in the data source's native language. So, if the source of this solution was SQL Server, then Power BI would attempt to convert this DAX function into a comparable T-SQL script. Once Power BI realizes the DAX function used is not compatible with the source, the error is generated.

Note

You can turn on functions that DirectQuery blocks by going to File | Options and settings | Options | DirectQuery | Allow restricted measures in DirectQuery Mode. When this option is selected, any DAX expressions that are valid for a measure can be used. However, you should know that selecting this can result in very slow query performance when these blocked functions are used.

 

 

Live Connection


The basic concept of Live Connection is very similar to that of DirectQuery. Just like DirectQuery, when you use a Live Connection no data is actually imported into Power BI. Instead, your solution points directly to the underlying data source and leverages Power BI Desktop simply as a data visualization tool. So, if these two things are so similar, then why give them different names? The answer is that even though the basic concept is the same, DirectQuery and Live Connection differ greatly in practice.

One difference that should quickly be noticeable is the query performance experience. It was mentioned in the last section that DirectQuery can often have poor performance depending on the data source type. With Live Connection, you generally will not have any performance problem because it is only supported by the following types of data sources:

  • SQL Server Analysis Services Tabular
  • SQL Server Analysis Services Multidimensional
  • Power BI Service

Performance does not suffer with these data sources because they either use the same xVelocity engine that Power BI does, or another high-performance storage engine. To set up your own Live Connection to one of these sources, you can choose the SQL Server Analysis Services database from the list of sources after selecting Get Data. Here, you can specify that the connection should be live:

Note

If a dataset is configured for a Live Connection or DirectQuery, then you can expect automatic refreshes to occur approximately every hour or when interaction with the data occurs. You can manually adjust the refresh frequency in the Scheduled cache refresh option in the Power BI service.

Limitations

So far, this sounds great! You have now learned that you can connect directly to your data sources, without importing data into your model, and you won't have significant performance consequences. Of course, these benefits don't come without giving something up, so what are the limitations of a Live Connection? 

What you will encounter with Live Connections are limitations that are generally a result of the fact that Analysis Services is an Enterprise BI tool. Thus, if you are going to connect to it, then it has probably already gone through significant data cleansing and modeling by your IT team. 

Modeling capabilities such as defining relationships are not available because these would be designed in an Analysis Services Model. Also, the Power Query Editor is not available at all against a Live Connection source. While at times this may be frustrating, it does make sense that it works this way because any of the changes you may desire to make with relationships or in the query editor should be done in Analysis Services, not Power BI.

 

Which should I choose?


Now that you have learned about the three different ways to connect to your data, you're left to wonder which option is best for you. It's fair to say that the choice you make will really depend on the requirements of each individual project you have. To summarize, some of the considerations that were mentioned in this chapter are listed in the following table:

Consideration | Import Data | DirectQuery | Live Connection
Best performance | X | | X
Best design experience | X | |
Best for keeping data up-to-date | | X | X
Data sources availability | X | |
Most scalable | | X | X

 

 

Some of these items to consider may be more important than others to you. So, to make this more personal, try using the Decision Matrix file that is included with this book. In this file, you can rank (from 1 to 10) the importance of each of these considerations to help give you some guidance on which option is best for you.
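The ranking idea can also be sketched in a few lines of Python. The contents of the Decision Matrix file are not shown in this chapter, so the scoring rule below (add each consideration's rank to every option marked for it) is an assumption, as are the placement of the marks, the example ranks, and the function name.

```python
OPTIONS = ["Import Data", "DirectQuery", "Live Connection"]

# Which options are marked for each consideration, as read from the table
# above (the placement of the marks is an assumption based on this chapter).
MARKS = {
    "Best performance": ["Import Data", "Live Connection"],
    "Best design experience": ["Import Data"],
    "Best for keeping data up-to-date": ["DirectQuery", "Live Connection"],
    "Data sources availability": ["Import Data"],
    "Most scalable": ["DirectQuery", "Live Connection"],
}


def score_options(importance):
    """Add each consideration's rank (1 to 10) to every option marked for it."""
    scores = {option: 0 for option in OPTIONS}
    for consideration, rank in importance.items():
        for option in MARKS[consideration]:
            scores[option] += rank
    return scores


# Hypothetical ranks for one project (illustration only).
example_importance = {
    "Best performance": 8,
    "Best design experience": 9,
    "Best for keeping data up-to-date": 3,
    "Data sources availability": 7,
    "Most scalable": 2,
}

scores = score_options(example_importance)
best_option = max(scores, key=scores.get)
```

With these example ranks the highest score goes to Import Data; a project that ranked up-to-date data and scalability highest would instead tip toward DirectQuery or Live Connection.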

Since the Data Import option presents the most available features, going forward, this book primarily uses this option. In Chapter 2, Data Transformation Strategies, you will learn how to implement data transformation strategies to ensure all the necessary business rules are applied to your data.

 

Summary


Power BI provides users a variety of methods for connecting to data sources with natively built-in data connectors. The connector you choose for your solution will depend on where your data is located. Once you connect to a data source, you can decide on the type of query mode that best suits your needs. Some connectors allow for zero latency in your results with the options of DirectQuery or Live Connection. In this chapter, you learned about the benefits and disadvantages of each query mode, and you were given a method for weighting these options using a decision matrix. In the next chapter, you will learn more about how data transformations may be applied to your data import process so that incoming data will be properly cleansed.

About the Authors

  • Devin Knight

    Devin Knight is a Microsoft Data Platform MVP and the Training Director at Pragmatic Works. At Pragmatic Works, Devin determines which courses are created, delivered, and updated for customers, including 10+ Power BI courses. This is the seventh SQL Server and Business Intelligence book he has authored. Devin often speaks at conferences like PASS Summit, PASS Business Analytics Conference, SQL Saturdays, and Code Camps. He is also a contributing member to several PASS Virtual Chapters. Making his home in Jacksonville, FL, Devin is the Vice President of the local Power BI User Group and SQL Server User Group (JSSUG). His personal blog can be found at Devin Knight's website.

  • Brian Knight

    Brian Knight is the owner and founder of Pragmatic Works, and is a serial entrepreneur, starting up other companies. Brian is a contributing columnist at several technical magazines. He is the author of 16 technical books. Brian has spoken at conferences like PASS Summit, SQL Connections, TechEd, SQLSaturdays, and Code Camps. He has received a number of awards from the State of Florida, governor, and press, including the Business Ambassador Award (Governor) and Top CEO (Jacksonville Magazine). His blog can be found at Pragmatic Works website.

  • Mitchell Pearson

    Mitchell Pearson has worked for Pragmatic Works for six years as a Business Intelligence Consultant and Training Content Manager. Mitchell has experience developing enterprise-level BI solutions using the full suite of products offered by Microsoft (SSRS, SSIS, SSAS, and Power BI). Mitchell is very active in the community, presenting at local user groups, SQL Saturday events, and PASS virtual chapters, and giving free webinars for Pragmatic Works. Mitchell can also be found blogging at Mitchellsql website. Mitchell is also the president of the local Power BI User Group in Jacksonville, Florida. In his spare time, Mitchell spends time with his wife and three kids. For fun, Mitchell enjoys playing tabletop games with friends.

  • Manuel Quintana

    Manuel Quintana is a Training Content Manager at Pragmatic Works. Previously, he was a senior manager working in the hotel industry. He joined the Pragmatic Works team in 2014 with no knowledge of the Business Intelligence space but now speaks at SQL Saturdays and SQL Server User Groups locally and virtually. He also teaches various BI technologies to many different Fortune 500 companies on behalf of Pragmatic Works. Since 2014 he has called Jacksonville home, and before that Orlando, but he was born on the island of Puerto Rico and loves to go back and visit his family. When he isn't working on creating new content for Pragmatic Works, you can probably find him playing board games or watching competitive soccer matches.

  • Brett Powell

    Brett Powell is the owner of Frontline Analytics, a data and analytics consulting firm and Microsoft Power BI partner. He has worked with Power BI technologies since they were first introduced with the Power Pivot add-in for Excel 2010 and has contributed to the design and delivery of Microsoft BI solutions across retail, manufacturing, finance, and professional services. He is also the author of Microsoft Power BI Cookbook and a regular speaker at Microsoft technology events such as the Power BI World Tour and the Data & BI Summit. He regularly shares technical tips and examples on his blog, Insight Quest, and is a co-organizer of the Boston BI User Group.

