Over the years most businesses have spent considerable amount of time, money, and effort in building databases, reporting systems, and Business Intelligence (BI) systems. IT often thinks that they are providing the necessary information to the business users for them to make the right decisions.
However, when I meet the users they tell me a different story. Most often they say that they do not have the information they need to do their job. Or they have to spend a lot of time getting the relevant information. Many users state that they spend more time getting access to the data than understanding the information.
This divide between IT and business is very common, it causes a lot of frustration and can cost a lot of money, which is a real issue for companies that needs to be solved for them to be profitable in the future. Research shows that by 2015 companies that build a good information management system will be 20 percent more profitable compared to their peers.
You can read the entire research publication from http://download.microsoft.com/download/7/B/8/7B8AC938-2928-4B65-B1B3-0B523DDFCDC7/Big%20Data%20Gartner%20information_management_in_the_21st%20Century.pdf.
So how can an organization avoid the pitfalls in BI systems and create an effective way of working with information? This chapter will cover the following topics concerning it:
Common user requirements related to BI
Understanding how these requirements can be solved by Analysis Services
An introduction to self-service reporting
In many cases, companies that struggle with information delivery do not have a dedicated reporting system or data warehouse. Instead the users have access only to the operational reports provided by each line of business application. This is extremely troublesome for the users that want to compare information from different systems.
As an example, think of a sales person that wants to have a report that shows the sales pipeline, from the Customer Relationship Management (CRM) system together with the actual sales figures from the Enterprise Resource Planning (ERP) system. Without a common reporting system the users have to combine the information themselves with whatever tools are available to them.
Most often this tool is Microsoft Excel. While Microsoft Excel is an application that can be used to effectively display information to the users, it is not the best system for data integration. To perform the steps of extracting, transforming, and loading data, from the source system, the users have to write tedious formulas and macros to clean data, before they can start comparing the numbers and taking actual decisions based on the information.
Lack of a dedicated reporting system can also cause trouble with the performance of the Online Transaction Processing (OLTP) system. When I worked in the SQL Server support group at Microsoft, we often had customers contacting us on performance issues that they had due to the users running the heavy reports directly on the production system.
To solve this problem, many companies invest in a dedicated reporting system or a data warehouse. The purpose of this system is to contain a database customized for reporting, where the data can be transformed and combined once and for all from all source systems. The data warehouse also serves another purpose and that is to serve as the storage of historic data.
Many companies that have invested in a common reporting database or data warehouse still require a person with IT skills to create a report. The main reason for this is that the organizations that have invested in a reporting system have had the expert users define the requirements for the system. Expert users will have totally different requirements than the majority of the users in the organization and an expert tool is often very hard to learn.
An expert tool that is too hard for the normal users will put a strain on the IT department that will have to produce all the reports. This will result in the end users waiting for their reports for weeks and even months.
One large corporation that I worked with had invested millions of dollars in a reporting solution, but to get a new report the users had to wait between nine and 12 months, before they got the report in their hand. Imagine the frustration and the grief that waiting this long before getting the right information causes the end users.
To many users, BI means simple reports with only the ability to filter data in a limited way.
While simple reports such as the one in the preceding screenshot can provide valuable information, it does not give the users the possibility to examine the data in detail. The users cannot slice-and-dice the information and they cannot drill down to the details, if the aggregated level that the report shows is insufficient for decision making.
If a user would like to have these capabilities, they would need to export the information into a tool that enables them to easily do so. In general, this means that the users bring the information into Excel to be able to pivot the information and add their own measures. This often results in a situation where there are thousands of Excel spreadsheets floating around in the organization, all with their own data, and with different formulas calculating the same measures.
When analyzing data, the data itself is the most important thing. But if you cannot understand the values, the data is of no benefit to you. Many users find that it is easier to understand information, if it is presented in a way that they can consume efficiently.
This means different things to different users, if you are a CEO, you probably want to consume aggregated information in a dashboard such as the one you can see in the following screenshot:
On the other hand, if you are a controller, you want to see the numbers on a very detailed level that would enable you to analyze the information. A controller needs to be able to find the root cause, which in most cases includes analyzing information on a transaction level.
A sales representative probably does not want to analyze the information. Instead, he or she would like to have a pre-canned report filtered on customers and time to see what goods the customers have bought in the past, and maybe some suggested products that could be recommended to the customers.
What the companies need is a way for the end users to access information in a user-friendly interface, where they can create their own analytical reports. Analytical reporting gives the user the ability to see trends, look at information on an aggregated level, and drill down to the detailed information with a single-click.
In most cases this will involve building a data warehouse of some kind, especially if you are going to reuse the information in several reports. The reason for creating a data warehouse is mainly the ability to combine different sources into one infrastructure once. If you build reports that do the integration and cleaning of the data in the reporting layer, then you will end up doing the same tasks of data modification in every report. This is both tedious and could cause unwanted errors as the developer would have to repeat all the integration efforts in all the reports that need to access the data. If you do it in the data warehouse you can create an ETL program that will move the data, and prepare it for the reports once, and all the reports can access this data. A data warehouse is also beneficial from many other angles. With a data warehouse, you have the ability to offload the burden of running the reports from the transactional system, a system that is built mainly for high transaction rates at high speed, and not for providing summarized data in a report to the users.
From a report authoring perspective a data warehouse is also easier to work with. Consider the simple static report shown in the first screenshot. This report is built against a data warehouse that has been modeled using dimensional modeling. This means that the query used in the report is very simple compared to getting the information from a transactional system. In this case, the query is a join between six tables containing all the information that is available about dates, products, sales territories, and sales.
select f.SalesOrderNumber, s.EnglishProductSubcategoryName, SUM(f.OrderQuantity) as OrderQuantity, SUM(f.SalesAmount) as SalesAmount, SUM(f.TaxAmt) as TaxAmt from FactInternetSales f join DimProduct p on f.ProductKey=p.ProductKey join DimProductSubcategory s on p.ProductSubcategoryKey = s.ProductSubcategoryKey join DimProductCategory c on s.ProductCategoryKey = c.ProductCategoryKey join DimDate d on f.OrderDateKey = d.DateKey join DimSalesTerritory t on f.SalesTerritoryKey = t.SalesTerritoryKey where c.EnglishProductCategoryName = @ProductCategory and d.CalendarYear = @Year and d.EnglishMonthName = @MonthName and t.SalesTerritoryCountry = @Country group by f.SalesOrderNumber, s.EnglishProductSubcategoryName
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
The preceding query is included for illustrative purposes. As you can see it is very simple to write for someone that is well versed in Transact-SQL.
Compare this to getting all the information from the operational system necessary to produce this report, and all the information stored in the six tables. It would be a daunting task. Even though the sample database for AdventureWorks is very simple, we still need to query a lot of tables to get to the information. The following figure shows the necessary tables from the OLTP system you would need to query, to get the information available in the six tables in the data warehouse.
Now imagine creating the same query against a real system, it could easily be hundreds of tables involved to extract the data that are stored in a simple data model used for sales reporting. As you can see clearly now, working against a model that has been optimized for reporting is much simpler when creating the reports. More information on how to structure a reporting database using dimensional modeling will be provided in Chapter 3, Creating Your First Multidimensional Cube.
Even with a well-structured data warehouse, many users would struggle with writing the select query driving the report shown earlier. The users, in general, do not know SQL. They typically do not understand the database schema since the table and column names usually consists of abbreviations that can be cryptic to the casual user.
What if a user would like to change the report, so that it would show data in a matrix with the ability to drill down to lower levels? Then they most probably would need to contact IT. IT would need to rewrite the query and change the entire report layout, causing a delay between the need of the data and the availability.
What is needed is a tool that enables the users to work with the business attributes instead of the tables and columns, with simple understandable objects instead of a complex database engine. Fortunately for us SQL Server contains this functionality; it is just for us database professionals to learn how to bring these capabilities to the business.
That is what this book is all about, creating a flexible reporting solution allowing the end users to create their own reports. I have assumed that you as the reader have knowledge of databases and are well-versed with your data. What you will learn in this book is, how to use a component of SQL Server 2012 called SQL Server Analysis Services to create a cube or semantic model, exposing data in the simple business attributes allowing the users to use different tools to create their own ad hoc reports.
Think of the cube as a PivotTable spreadsheet in Microsoft Excel. From the users perspective, they have full flexibility when analyzing the data. You can drag-and-drop whichever column you want to, into either the rows, columns, or filter boxes.
The PivotTable spreadsheet also summarizes the information depending on the different attributes added to the PivotTable spreadsheet. The same capabilities are provided through the semantic model or the cube. When you are using the semantic model the data is not stored locally within the PivotTable spreadsheet, as it is when you are using the normal PivotTable functionality in Microsoft Excel. This means that you are not limited to the number of rows that Microsoft Excel is able to handle.
Since the semantic model sits in a layer between the database and the end user reporting tool, you have the ability to rename fields, add calculations, and enhance your data. It also means that whenever new data is available in the database and you have processed your semantic model, then all the reports accessing the model will be updated.
The semantic model is available in SQL Server Analysis Services. It has been part of the SQL Server package since Version 7.0 and has had major revisions in the SQL Server 2005, 2008 R2, and 2012 versions. This book will focus on how to create semantic models or cubes through practical step-by-step instructions.
SQL Server Analysis Services is an application that allows you to create a semantic model that can be used to analyze very large amounts of data with great speed. The models can either be user created, or created and maintained by IT.
If the user wants to create it, they can do so, by using a component in Microsoft Excel 2010 and upwards called PowerPivot. If you run Microsoft Excel 2013, it is included in the installed product, and you just need to enable it. In Microsoft Excel 2010, you have to download it as a separate add-in that you either can find on the Microsoft homepage or on the site called http://www.powerpivot.com. PowerPivot creates and uses a client-side semantic model that runs in the context of the Microsoft Excel process; you can only use Microsoft Excel as a way of analyzing the data. If you just would like to run a user created model, you do not need SQL Server at all, you just need Microsoft Excel. On the other hand, if you would like to maintain user created models centrally then you need, both SQL Server 2012 and SharePoint.
Instead, if you would like IT to create and maintain a central semantic model, then IT need to install SQL Server Analysis Services. IT will, in most cases, not use Microsoft Excel to create the semantic models. Instead, IT will use Visual Studio as their tool. Visual Studio is much more suitable for IT compared to Microsoft Excel. Not only will they use it to create and maintain SQL Server Analysis Services semantic models, they will also use it for other database related tasks. It is a tool that can be connected to a source control system allowing several developers to work on the same project.
The semantic models that they create from Visual Studio will run on a server that several clients can connect to simultaneously. The benefit of running a server-side model is that they can use the computational power of the server, this means that you can access more data. It also means that you can use a variety of tools to display the information.
This book will focus on IT created models, but the information described in Chapter 9, In-memory, the Future, can be used to learn how to create models in PowerPivot. Both approaches enable users to do their own self-service reporting. In the case where PowerPivot is used they have complete freedom; but they also need the necessary knowledge to extract the data from the source systems and build the model themselves.
The users are, in this case, limited to the data that is available in the predefined model, but on the other hand, it is much simpler to do their own reporting. This is something that can be seen in the preceding figure that shows Microsoft Excel 2013 connected to a semantic model.
SQL Server Analysis Services is available in the Standard edition with limited functionality, and in the BI and Enterprise edition with full functionality. For smaller departmental solutions the Standard edition can be used, but in many cases you will find that you need either the BI or the Enterprise edition of SQL Server. If you would like to create in-memory models as described in Chapter 9, In-memory, the Future, then you definitely cannot run the Standard edition of the software since this functionality is not available in the Standard edition of SQL Server.
In this chapter, you learned about the requirements that most organizations have when it comes to an information management platform. You were introduced to SQL Server Analysis Services that provides the capabilities needed to create a self-service platform that can serve as the central place for all the information handling. SQL Server Analysis Services allows users to work with the data in the form of business entities, instead of through accessing a databases schema. It allows users to use easy to learn query tools such as Microsoft Excel to analyze the large amounts of data with subsecond response times. The users can easily create different kinds of reports and dashboards with the semantic model as the data source.
In this book, you will learn how to create a semantic model in SQL Server Analysis Services following easy steps. The next chapter, will focus on the steps necessary to get you started with your development. This includes learning about the different semantic models that are available, and how to install the system.