
Business Intelligence Career Master Plan

By Eduardo Chavez, Danny Moncada
About this book
Navigating the challenging path of a business intelligence career requires you to consider your expertise, interest, and skills. Business Intelligence Career Master Plan explores key skills like data modeling, visualization, warehousing, organizational structures, technology stacks, coursework, certifications, and interview advice, enabling you to make informed decisions about your BI journey. You’ll start by assessing the different roles in BI and matching your skills and career with the tech stack. You’ll then learn to build taxonomy and a data story using visualization types. Additionally, you’ll explore the fundamentals of programming, frontend development, backend development, software development lifecycle, and project management, giving you a broad view of the end-to-end BI process. With the help of the author’s expert advice, you’ll be able to identify what subjects and areas of study are crucial and would add significant value to your skillset. By the end of this book, you’ll be well-equipped to make an informed decision on which of the myriad paths to choose in your business intelligence journey based on your skillset and interests.
Publication date: August 2023
Publisher: Packt
Pages: 284
ISBN: 9781801077958

 

How to Become Proficient in Analyzing Data

At this point, you might be working for an organization or are about to start a project, and you are wondering what to do first and how to keep improving your skills. Like everything in life, we have to take it step by step. This chapter will focus on a methodical approach to navigating the ocean of data. Remember that it is easy to get lost in the different analysis techniques and methodologies, but it is imperative that you develop your own discipline and rhythm. In this book, we will give you some advice and examples on how to tackle a data project, the minimum sequential steps you can take, and a routine that, if followed, will serve as the basis for your future projects.

This is not a final recipe, nor is it supposed to be the norm, but it will help you create, as quickly as possible, a picture of the roadmap your data journey will take you through, thus helping you navigate it. These are minimal, common-sense steps that will spark your creativity. Most of your work is not quantifiable and does not follow an algorithm; it takes detours, branches out and comes back to the original point, and wanders off onto different paths. This approach is meant to give a sense of order to a chaotic enterprise.

 

Building a taxonomy of your data sources

The first and most important step is to construct a comprehensive taxonomy of your data sources. Without delay, dedicate yourself to gathering all the relevant data sources, and sketch out a well-structured diagram that depicts the interdependencies and relationships in the data flow across the various repositories it passes through. By creating a clear visual representation of this data ecosystem, you will gain valuable insight into the origins, transformations, and destinations of your data, enabling a more organized and systematic approach to data management.

Building a taxonomy of your data sources involves organizing your data into a hierarchical structure that reflects its relationships and dependencies (a minimal sketch of such a structure follows this list):

  • Identify the scope and purpose of your taxonomy: Determine the purpose of the taxonomy and what it will be used for. Decide on the level of detail and granularity you need for your taxonomy.
  • Collect information about your data sources: Gather information about all your data sources, including databases, files, applications, and systems. Identify the attributes that describe each data source, such as type, format, subject matter, and frequency of updates.
  • Categorize your data sources: Identify the categories and subcategories that your data sources belong to. Create a hierarchy that reflects the relationships between the categories and subcategories.
  • Define the relationships between the categories: Determine how the categories and subcategories are related to each other. For example, a database of customer information might be related to a sales system and a marketing system.
  • Create a classification scheme: Develop a set of rules and guidelines to classify data sources. This will ensure that new data sources can be easily categorized and added to the taxonomy.
  • Test and refine the taxonomy: Test the taxonomy with a small sample of data sources to ensure that it reflects the flow as expected. Refine the taxonomy as needed to ensure that it accurately reflects the relationships between your data sources. Find leaks and adjust for possible branches that data hangs from.
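
A taxonomy like this can start as nothing more than a diagram or a spreadsheet, but here is a minimal sketch of how it could be captured as a table. The data_sources catalog and all of its column names are hypothetical; the parent_id column carries the hierarchy, and the classification scheme becomes the rules you follow when filling in columns such as source_type and subject_matter:

    -- Hypothetical catalog table for the data-source taxonomy.
    -- Each source points to the source it comes from (NULL = top level).
    CREATE TABLE data_sources (
        source_id      INTEGER PRIMARY KEY,
        source_name    VARCHAR(100) NOT NULL,
        source_type    VARCHAR(50),   -- database, file, application, system
        data_format    VARCHAR(50),   -- table, CSV, API, and so on
        subject_matter VARCHAR(100),  -- sales, customers, finance
        refresh_freq   VARCHAR(50),   -- real-time, hourly, daily, monthly
        parent_id      INTEGER REFERENCES data_sources (source_id)
    );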

Technically speaking, this translates into finding logs and monitoring data pipelines. However, this might be complicated, as you may not have access to these resources. In a worst-case scenario, your data source is a simple file that some data engineer lands in your inbox. So what would we do next in this scenario?

  • Ask where it comes from.
  • Ask with what frequency it gets generated.
  • Ask how long it takes to generate.
  • Find out what other processes or data sources are involved in its creation.
  • Find out whether the source is larger or smaller than what you received. Is it an aggregate?
  • Start mapping, and keep requesting information until you can map out and reach other levels in your taxonomy.

At the end of this exercise, you will have a rough sequence that is missing many steps and levels, but it works like a candle: the more you navigate, the more your inquiries light up the way. The whole map of data sources is often convoluted, designed to satisfy specific situations rather than as a holistic solution that prioritizes data flow efficiency. There are many reasons why this happens, but that is what makes this exercise even more important: it will encourage you to ask more questions. Having the whole map in mind (as shown in the following figure) will help you find the paths where data flows and fill the gaps in your knowledge.

Figure 2.1 – A typical data infrastructure showing the unknown paths data takes

Gather all the facts at your disposal and, like a detective, put them in sequence. Once you hit a wall, like a rat in a maze, find your way around by asking the teams involved in those unknown systems. In the scenario portrayed in the preceding diagram, this is what we know so far as a new data analyst at Big Corp X (we will record these facts right after the list):

  • You now have a data.csv file
  • It comes from database ABC as an aggregate table called agg_data
  • This table pulls from a data lake called Big Data Lake
  • This data lake also gets data from a transactional system called Big Enterprise System
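
Recorded in the hypothetical data_sources catalog sketched earlier, these facts become four rows. The NULLs are exactly the gaps your next questions should fill, and a recursive query (PostgreSQL-style syntax, as an assumption about your engine) walks the lineage of the file back to its origins:

    -- What we know so far at Big Corp X (IDs are arbitrary; NULLs are open questions).
    INSERT INTO data_sources VALUES (1, 'Big Enterprise System', 'system',    NULL,    NULL, NULL, NULL);
    INSERT INTO data_sources VALUES (2, 'Big Data Lake',         'data lake', NULL,    NULL, NULL, 1);
    INSERT INTO data_sources VALUES (3, 'agg_data',              'table',     'table', NULL, NULL, 2);
    INSERT INTO data_sources VALUES (4, 'data.csv',              'file',      'CSV',   NULL, NULL, 3);

    -- Walk the lineage of the file back to its upstream sources.
    WITH RECURSIVE lineage AS (
        SELECT source_id, source_name, parent_id, 1 AS step
        FROM data_sources
        WHERE source_name = 'data.csv'
        UNION ALL
        SELECT d.source_id, d.source_name, d.parent_id, l.step + 1
        FROM data_sources d
        JOIN lineage l ON d.source_id = l.parent_id
    )
    SELECT step, source_name FROM lineage ORDER BY step;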

Building a taxonomy of your data sources is important in a data project for several reasons, as listed here:

  • Improved data management: A taxonomy can help you better organize and manage your data sources. It can help you identify data redundancies, data quality issues, and gaps in data coverage. By having a clear understanding of your data sources, you can make more informed decisions about how to collect, store, and analyze your data.
  • Enhanced data analysis: With a taxonomy in place, you can more easily perform data analysis across multiple data sources. A taxonomy can help you identify relationships and dependencies between data sources and select the most appropriate data sources for your analysis.
  • Facilitate collaboration: A taxonomy can help facilitate collaboration among team members. By having a standardized way of organizing and labeling data sources, team members can more easily share and communicate about data. This can lead to more efficient and effective data analysis and decision-making.
  • Better decision-making: By having a taxonomy in place, you can more easily identify patterns and trends in your data and make more informed decisions based on this information. A taxonomy can help you identify which data sources are most relevant to a particular business question and can help you ensure that you use the most accurate and complete data available.

Building a taxonomy of your data sources is important in a data project because it can improve data management, enhance data analysis, facilitate collaboration, and lead to better decision-making. At the end of this exercise, you will have a clear idea of where to go and who to ask for any data-related questions. It is not about knowing everything in terms of data but, instead, knowing where you need to get information from in order to make your analysis better.

Here are the steps to build an effective data model of your data sources (a sketch of steps 4 to 6 follows the list):

  1. Define your business needs: What are you trying to accomplish with your data model? What data do you need to collect and store? What questions do you need to answer?
  2. Identify your data sources: Where is your data coming from? What format is it in?
  3. Understand your data: What are the different types of data you have? What are the relationships between the data?
  4. Design your data model: Create a diagram that shows how your data is related.
  5. Implement your data model: Create the tables and columns in your database.
  6. Test your data model: Make sure that you can access and use your data.
  7. Maintain your data model: As your data changes, you need to update your data model accordingly.
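
As a minimal sketch of steps 4 to 6, assuming a simple sales subject area with hypothetical table and column names, the design might look like a small star schema, and the test is simply a query that answers a basic business question:

    -- Hypothetical star-schema sketch: one fact table and two dimensions.
    CREATE TABLE dim_department (
        department_id   INTEGER PRIMARY KEY,
        department_name VARCHAR(100),
        region          VARCHAR(50)
    );

    CREATE TABLE dim_employee (
        employee_id   INTEGER PRIMARY KEY,
        employee_name VARCHAR(100),
        department_id INTEGER REFERENCES dim_department (department_id)
    );

    CREATE TABLE fact_sales (
        sale_id       INTEGER PRIMARY KEY,
        employee_id   INTEGER REFERENCES dim_employee (employee_id),
        department_id INTEGER REFERENCES dim_department (department_id),
        sale_date     DATE,
        amount        DECIMAL(12, 2)
    );

    -- Step 6: test that the model answers a basic business question.
    SELECT d.department_name, SUM(f.amount) AS sales_amount
    FROM fact_sales f
    JOIN dim_department d ON d.department_id = f.department_id
    GROUP BY d.department_name;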

Now that you have created a map and have an idea of where data is located at every step of the pipeline, it is time for you to explore it. Doing so manually would be a waste of energy when we have modern BI tools that excel beyond our capabilities. The question here is not only how to pick the best one but also a more general one – which features you should use to allow a thorough exploration of the data.

 

How to use a BI tool to explore data

There are many BI tools and programming languages that will help you visualize and explore data in many ways. In Chapter 5, we will discuss some of the tools out there in the market; in this section, we will discuss how to use them. Regardless of the tool, there are data manipulation techniques that every data tool should support as a minimum.

As we define each of these techniques, we will discuss the tool-less alternative and weigh the cost-benefit of using a BI tool to perform it. There are two basic concepts we need to understand before moving on – metrics and attributes. These can be unveiled by asking two questions: What do we want to measure? And how do we want to describe such a measurement?

For example, if we measure sales by region, what are we measuring? Sales. What describes such measurements? Regions. This basic statement, although simplistic, is the basis of every analysis and exploration. Many complex analyses will derive from asking such questions. With additions, filters, and other types of calculations, you can make a more robust report.
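
In SQL terms, the distinction is concrete: the metric is the aggregated expression, and the attribute is the column you group by. Here is a minimal sketch reusing the sales_table from the aggregation examples later in this chapter (the sale_date filter column is an assumption):

    -- Metric: what we measure.  Attribute: how we describe the measurement.
    SELECT region,                       -- attribute (dimension)
           SUM(amount) AS total_sales,   -- metric
           AVG(amount) AS avg_sale       -- a derived metric added to the same report
    FROM sales_table
    WHERE sale_date >= DATE '2023-01-01' -- a filter narrowing the analysis
    GROUP BY region;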

Another basic term we need to understand is granularity. Granularity in Business Intelligence (BI) refers to the level of detail or the degree of aggregation of the data that is analyzed. It refers to the size of the individual units of data that are examined.

For example, if you analyze sales data, the granularity can be at the level of individual transactions or at a higher level of aggregation, such as monthly, quarterly, or annual sales. The level of granularity can have a significant impact on the insights that can be derived from the data. Granularity may be accompanied by a temporal dimension. It is important to note the difference: a record repeating over time does not mean the granularity changes when the analysis is made using different measures of time (e.g., year, month, or day).

Here’s a mental exercise on granularity.

If we have a dataset that describes employees' performance, each record could represent how many sales we had in a department by month. Each row has a sales amount, a department ID, and the number of employees.

Granularity level: department

Figure 2.2 – A table depicting the granularity at the department level by month. In this table, granularity shows when a row is unique, and there are no two rows with the same department ID in a given month

The temporal unit of measurement here is the month, so we can expect departments to repeat, or not, every month. Different modeling techniques may repeat a department with zero sales in a given month, but this depends entirely on the design preferences. Zero sales could mean no sales, but on some occasions, it may represent sales that add up to zero for many reasons (discounts, gifts, promotions, or returns).

Now, if we want to analyze at a more detailed level, we could change the level of granularity to the employee; every record would then have the following: Employee ID, Department ID, Sales Amount, and Date.

Granularity level: employee

Figure 2.3 – A table depicting granularity at the employee level. In this table, granularity shows when a row is unique. There are no two rows with the same employee ID on a given date, and sales are added up and stored daily

The temporal unit now is Date, a day in the life of a salesperson. Again, many experts in dimensional modeling may offer different ways to represent this; some may record each individual sale during the day, and some would store a daily summary of sales by employee.
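
Here is a minimal sketch of the two grains discussed above, with hypothetical table and column names. The final query shows that the finer employee-by-day grain can always be rolled up to the coarser department-by-month grain (DATE_TRUNC is PostgreSQL-style syntax):

    -- Department-by-month grain: one row per department per month (Figure 2.2).
    CREATE TABLE sales_by_department_month (
        department_id  INTEGER,
        sales_month    DATE,            -- first day of the month
        employee_count INTEGER,
        sales_amount   DECIMAL(12, 2),
        PRIMARY KEY (department_id, sales_month)
    );

    -- Employee-by-day grain: one row per employee per day (Figure 2.3).
    CREATE TABLE sales_by_employee_day (
        employee_id   INTEGER,
        department_id INTEGER,
        sale_date     DATE,
        sales_amount  DECIMAL(12, 2),
        PRIMARY KEY (employee_id, sale_date)
    );

    -- The finer grain can always be rolled up to the coarser one.
    SELECT department_id,
           DATE_TRUNC('month', sale_date) AS sales_month,
           COUNT(DISTINCT employee_id)    AS employee_count,  -- employees with sales that month
           SUM(sales_amount)              AS sales_amount
    FROM sales_by_employee_day
    GROUP BY department_id, DATE_TRUNC('month', sale_date);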

Can we go one level further down? Certainly – let’s imagine now records storing sales of every product by every employee on a given date. This is exactly the kind of analysis we could find in master-detail modeling. This is where the header of the sale may have a summary of what was sold and the detail level contains each individual product, itemized with the proper measurements of such sale – for example, quantity, stock-keeping unit (SKU), and unit of measurement.

In BI, master-detail modeling is a data modeling technique used to represent the hierarchical relationships between data entities. It involves creating a relationship between two tables or datasets, where one table is the master and the other is the detail.

The master table contains the main data elements, while the detail table contains the related data elements that are associated with the main data elements. For example, in a sales analysis, the master table may contain information about sales transactions, while the detail table may contain information about the products sold in each transaction.

The master-detail relationship is typically created by defining a primary key in the master table and a foreign key in the detail table. The foreign key is used to link the detailed data to the corresponding master data.

Master-detail relationships are often used in reporting and analysis to drill down from summary information to more detailed information. For example, a report may show total sales by product category, with the ability to drill down to see the sales by individual products within each category.

Master-detail relationships are also used in data visualization tools to create interactive dashboards and reports. By using the master-detail relationship, a user can interactively explore the data, filter it, and drill down to view more detailed information as needed.

Master-detail relationships are an important data modeling technique in BI, allowing for the flexible and powerful analysis and reporting of hierarchical data structures.
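
Here is a minimal sketch of a master-detail pair, using hypothetical sales_header and sales_detail tables; the foreign key on the detail table is what links each itemized product back to its sale:

    -- Master table: one row per sale (the header, or summary level).
    CREATE TABLE sales_header (
        sale_id      INTEGER PRIMARY KEY,
        employee_id  INTEGER,
        sale_date    DATE,
        total_amount DECIMAL(12, 2)
    );

    -- Detail table: one row per product sold within a sale.
    CREATE TABLE sales_detail (
        sale_id  INTEGER REFERENCES sales_header (sale_id),  -- foreign key to the master
        line_no  INTEGER,
        sku      VARCHAR(30),
        quantity INTEGER,
        unit     VARCHAR(20),
        amount   DECIMAL(12, 2),
        PRIMARY KEY (sale_id, line_no)
    );

    -- Drill from the header summary down to the itemized products of one sale.
    SELECT h.sale_id, h.sale_date, d.sku, d.quantity, d.amount
    FROM sales_header h
    JOIN sales_detail d ON d.sale_id = h.sale_id
    WHERE h.sale_id = 1001;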

As you can see, granularity can go up to aggregate detailed data, but it can also go down to levels that you can find deep inside an online transaction processing (OLTP) system when your customers require it. A high level of granularity means that data is analyzed at a more detailed level, which can provide more specific insights but may require more time and effort to analyze. On the other hand, a lower level of granularity means that the data is analyzed at a more summarized level, which can provide a broader view of the data but may miss out on important details.

The choice of granularity depends on the specific business problem and the goals of the analysis. Generally, the level of granularity should be chosen based on the level of detail required to support the business decisions that need to be made. Data modelers and architects may decide to specify a fine, highly detailed granularity even though reports and dashboards are shown at the department, organization, or even company level. Part of their job involves building future-proof data structures; hence, they may find it advantageous to define a fine granularity level so that other analyses become supportable when the business requires it. However, modeling techniques are out of the scope of this book.

By understanding metrics, attributes or dimensions, and granularity, we can extrapolate to other concepts that follow these three, as they represent the lowest level in terms of “data hierarchy.” Now, we go up, and aggregations are at the next level. In BI, aggregations refer to the process of summarizing or grouping data into higher-level units, such as totals, averages, counts, or percentages. The purpose of aggregation is to make data more manageable, understandable, and actionable.

Aggregations are used to reduce the volume of data to be analyzed so that it can be processed more efficiently and effectively. By summarizing data into higher-level units, it becomes easier to identify patterns, trends, and outliers in the data.

For example, in a sales analysis, aggregating data by month or by product category can help identify which products are selling well and which ones are not. Aggregations can also be used to compare performance over time or across different regions or customer segments.

Aggregations can be performed at different levels of detail, depending on the business needs and the data available. Aggregations can be pre-calculated and stored in a data warehouse, or they can be calculated on the fly using BI tools and technologies.

A BI tool should be able to create aggregations with little to no performance issues, as a good data model is based on the premise that users can take advantage of the aggregation engine behind a BI tool. An example of aggregations in different technologies is shown here:

For SQL, you can have aggregation at the following levels:

  • Aggregation at a departmental level:

    SELECT region, organization, department, SUM(amount) AS sales_amount
    FROM sales_table
    GROUP BY region, organization, department;

  • Aggregation at an organization level:

    SELECT region, organization, SUM(amount) AS sales_amount
    FROM sales_table
    GROUP BY region, organization;

  • Aggregation at a regional level:

    SELECT region, SUM(amount) AS sales_amount
    FROM sales_table
    GROUP BY region;

Consider doing the same operation in Excel, as shown in the following screenshot. You can use the interface to aggregate at different levels and obtain the same result:

Figure 2.4 – Pivot table functionality, as shown in an Excel spreadsheet

If you want to make large and complex datasets more manageable and meaningful, thus enabling better decision-making based on actionable insights, aggregations are the BI technique that will achieve it.

When picking the right tool, you have to make sure you can perform these basic activities:

  • Creating metrics or calculations
  • Analyzing them by dimensions
  • Exploring granularity
  • Aggregating data to different levels

Now that we know we can create a report with aggregations, calculations, and attributes at a certain granularity, the report itself should offer some level of interaction with the user. This interaction allows the user to navigate the data more easily and, at the same time, gets every metric to recalculate for the different scenarios required. We call this drilling down and rolling up.

In BI, drilling down and rolling up are techniques used to navigate through hierarchical data structures and analyze data at different levels of detail.

Drilling down refers to the process of moving from a higher-level summary of data to a lower-level detail. For example, starting with a report that shows total sales by region, drilling down would involve clicking on a specific region to see the sales by country, and then clicking on a specific country to see the sales by city.

Rolling up, on the other hand, refers to the process of moving from a lower-level detail to a higher-level summary. For example, starting with a report that shows sales by city, rolling up would involve aggregating the data to show the sales by region, and then by country.

Drilling down and rolling up are often used together to analyze data at multiple levels of detail. By drilling down to lower levels of detail, analysts can gain insights into the factors that drive overall trends. By rolling up to higher levels of summary, analysts can identify patterns and trends across different regions or segments.

Drilling down and rolling up can be performed manually by analysts using BI tools, or they can be automated through the use of drill-down and roll-up functionality in reporting and analysis tools.

Overall, drilling down and rolling up are important techniques in BI that enable analysts to explore data at different levels of detail, gaining insights that can inform decision-making and drive business performance.
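
In SQL terms, rolling up simply means grouping by fewer columns of the hierarchy. Here is a minimal sketch, again assuming a hypothetical sales_table that carries region, country, and city columns; the ROLLUP extension, supported by most major engines, produces every summary level in one pass:

    -- Drill-down level: city detail.
    SELECT region, country, city, SUM(amount) AS sales_amount
    FROM sales_table
    GROUP BY region, country, city;

    -- Rolled up one level: country summary.
    SELECT region, country, SUM(amount) AS sales_amount
    FROM sales_table
    GROUP BY region, country;

    -- ROLLUP produces city, country, region, and grand-total rows in a single query.
    SELECT region, country, city, SUM(amount) AS sales_amount
    FROM sales_table
    GROUP BY ROLLUP (region, country, city);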

A successful tool should allow us to navigate these data structures up (the aggregation levels) and down (the granularity) across different dimensions, recalculating metrics as we go, helped by modeling techniques. One modeling technique a BI tool should let us create is the hierarchy. In BI, a hierarchy is a way of organizing data elements into a logical structure that reflects their relationships and dependencies. Hierarchies represent complex data relationships in a simplified and intuitive way, making it easier for users to navigate and analyze the data.

A hierarchy consists of a series of levels, with each level representing a different category or dimension of data. For example, in a sales analysis, a hierarchy may be defined as follows:

  • Level 1: Year
  • Level 2: Quarter
  • Level 3: Month
  • Level 4: Week
  • Level 5: Day

Each level in the hierarchy contains a set of members, which represent the values for that level. For example, the members for the month level might include January, February, and March.

Hierarchies can be used to organize data for reporting and analysis and to facilitate drilling down and rolling up through different levels of detail. For example, a user might start by looking at total sales for the year, and then drill down to see the sales by quarter, month, week, and day.

Hierarchies can also be used to define relationships between different data elements. For example, a hierarchy might be defined that relates products to product categories, which in turn are related to product departments.

Hierarchies are an important concept in BI, enabling users to navigate and analyze complex data structures intuitively and efficiently.
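
Here is a minimal sketch of the time hierarchy above as a date dimension, with one column per level; dim_date and its column names are hypothetical, and the query reuses the fact_sales table sketched earlier:

    -- A date dimension exposing each level of the time hierarchy as a column.
    CREATE TABLE dim_date (
        date_key     DATE PRIMARY KEY,
        year_number  INTEGER,      -- Level 1: Year
        quarter_name VARCHAR(2),   -- Level 2: Quarter (Q1-Q4)
        month_name   VARCHAR(10),  -- Level 3: Month (January, February, ...)
        week_number  INTEGER,      -- Level 4: Week
        day_name     VARCHAR(10)   -- Level 5: Day (Monday, Tuesday, ...)
    );

    -- Drilling down from year to quarter just means grouping by one more level.
    SELECT d.year_number, d.quarter_name, SUM(f.amount) AS sales_amount
    FROM fact_sales f
    JOIN dim_date d ON d.date_key = f.sale_date
    GROUP BY d.year_number, d.quarter_name
    ORDER BY d.year_number, d.quarter_name;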

Having learned about hierarchies, dimensions, calculations, aggregations, and the act of drilling down and rolling up through granularity levels, we can now conclude what, at a minimum, we should be able to do with a BI tool. The act of putting all of these together in an ad hoc report is called slicing and dicing. In BI, slice and dice is a technique used to analyze data by selecting a subset of data (slicing) and then examining it from different perspectives (dicing). It allows users to break data down into smaller, more manageable parts and analyze them from different angles to gain deeper insights.

Slicing involves selecting a subset of data based on a specific dimension or category. For example, slicing by time might involve selecting data for a specific month, quarter, or year. Slicing by location might involve selecting data for a specific region, country, or city.

Dicing involves examining the sliced data from different perspectives or dimensions. For example, dicing by product might involve analyzing sales data by product category, brand, or SKU. Dicing by customer might involve analyzing sales data by demographic, loyalty level, or purchase history.

Together, slicing and dicing allow users to drill down into specific areas of interest and then analyze them in more detail from different perspectives. For example, a user might start by slicing the data by time to look at sales for a specific quarter, and then dice the data by product to look at sales by category, brand, and SKU.

Slice and dice functionality is often built into BI tools and software, allowing users to easily select and analyze data based on different dimensions and categories. It enables users to quickly identify trends, patterns, and outliers in data and make informed decisions, based on the insights gained from the analysis.
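
Here is a minimal SQL sketch of the idea, assuming a hypothetical sales_table with the columns named below: the WHERE clause is the slice (one quarter of data) and the GROUP BY columns are the dice (the perspectives we break it down by):

    -- Slice: restrict the data to a single quarter.
    -- Dice: break that subset down by product category and customer segment.
    SELECT product_category,
           customer_segment,
           SUM(amount) AS sales_amount
    FROM sales_table
    WHERE sale_date BETWEEN DATE '2023-01-01' AND DATE '2023-03-31'
    GROUP BY product_category, customer_segment
    ORDER BY sales_amount DESC;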

This is it – these are the foundations of any robust analysis or chart. Trend analysis, forecasting, time series, bar charts, pie charts, scatter plots, correlations, and so on – it all derives from being able to perform these basic operations on top of your dataset. If you are in search of a good BI tool, checking that it supports these activities will serve you well along your BI career path: they mark not the end of your roadmap but will instead spark new ideas and help you understand your data gaps and needs.

 

Understanding your data needs

In retrospect, every single task or operation we perform on top of our data is part of an iterative process that sends us down the path of understanding what our data needs are. While analyzing sales, let's imagine that you find out that an organization, department, or region draws an imperfect picture, and the numbers don't match the official financial revenue systems. A storm is coming, and it is going to make the data ocean wild and turbulent… the numbers won't match.

We could start a drama here and show you the many meetings it took, the teams that collaborated, the back and forth, directors contacting directors, developers messaging developers, databases being queried, and hours and hours of investigation, but instead, let's fast-forward months into the future. It is now known that the ERP system was patched with a customized interface that allows the sales department in Japan to make cross-department sales. Yes, there is now new knowledge of the business process that sheds light on the root cause, and a complex calculation has been implemented to allocate sales percentages to departments whenever a given department participated in a demo of a product for a different department.

The nightmare is finally over; tables called sales_ratio, sales, foreign_rate, and many more are now part of the equation. You have to put all of them together in order to come up with an accurate calculation of sales. This is your job – keep a full account of your data needs and the gaps you have in order to make your analysis more complete. This is an iterative and sometimes recursive operation that you need to perform every day when assessing your data needs:

  1. You find your data, and you perform an analysis.
  2. You test and find out that something is missing.
  3. You find the stakeholders.
  4. You ask.
  5. You gather.
  6. You perform the analysis again.
  7. You test again.

We can actually see these steps and organize them sequentially, resulting in better project management. If you visualize them, then you can plan better and specify deadlines that adjust according to the complexity of each step. Such a flow should be close to the following:

Figure 2.5 – A process flow to follow when trying to understand your customer’s data needs

It is hard to emphasize this enough: adhering to established guidelines is remarkably important when engaging in the inherently subjective and creative exercise of analyzing and exploring data. While this may appear contradictory, following a structured approach based on established principles adds objectivity to the process. By employing standardized methods and techniques, you can ensure a more consistent and unbiased analysis, allowing meaningful insights to emerge from the data. Ultimately, by playing by the book, you build a solid foundation for your data exploration endeavors, enabling a more rigorous and reliable interpretation of the information at hand:

  • Define the business problem: Start by identifying the business problem or question that needs to be answered. This will help to determine what data is required and how it should be analyzed.
  • Identify the stakeholders: Identify the stakeholders who will use the data and the insights generated from the analysis. This will help to understand their specific data needs and preferences.
  • Conduct requirements gathering: Conduct a requirements gathering process to collect information about the data needed for analysis. This process may involve interviews, surveys, focus groups, or other methods of gathering information from stakeholders. If this is difficult, create mock-ups rather than waiting for your customers: find similar reports made by other analysts in the company, investigate common reports in the industry, study the domain (sales, if that is the subject of your investigation), and use common sense when creating basic reports. Then go back and forth with the stakeholders, iterating on improved versions of these mock-ups in the hope that they spark some creativity in their minds. Any feedback will help you improve your prototypes. Prototyping is key; we’ll discuss this further in Chapter 4.

If you don’t have any specific business requirements for a dashboard, you can still prototype a mock-up by following these steps:

  1. Define the purpose of the dashboard: Even without specific business requirements, you can define the general purpose of the dashboard. For example, is it intended to provide an overview of key metrics, or to allow users to drill down into specific data?
  2. Identify the target audience: Consider who the dashboard is intended for and what their needs might be. For example, do they need to see real-time data or historical trends? What level of detail is required?
  3. Choose the right visualization types: Select the visualization types that are best suited for the purpose and audience of the dashboard. For example, use pie charts to show proportions, bar charts to show comparisons, or maps to show geographic data.
  4. Create a wireframe: Use a tool such as Balsamiq or Sketch to create a wireframe of the dashboard. A wireframe is a basic mock-up that shows the layout and content of the dashboard without going into too much detail.
  5. Refine the design: Once you have a basic wireframe, you can start to refine the design by adding more detail, choosing colors and fonts, and adding real data where possible. You may also want to get feedback from stakeholders to help refine the design further.

By following these steps, you can prototype a mock-up dashboard even without specific business requirements. While the dashboard may not be fully optimized for the needs of the business, it can still provide a starting point for further development and refinement as requirements become clearer. The following are examples of actions you can take to refine and formalize data prototypes:

  • Identify the data sources: Identify the data sources that contain the required data. This may include internal data sources such as databases or spreadsheets, as well as external data sources such as third-party data providers.
  • Assess the quality of the data: Assess the quality of the data to ensure that it is accurate, complete, and relevant to the analysis. This may involve data cleansing, data validation, or other data quality assurance processes (a sketch of such checks follows this list).
  • Develop a data model: Develop a data model that defines the relationships between the different data elements. This will help to ensure that the data is organized in a way that is suitable for analysis.
  • Create reports and dashboards: Create reports and dashboards that visualize data in a way that is easy to understand and analyze. This may involve creating charts, graphs, tables, or other visualizations.
  • Test and refine the analysis: Test the analysis and refine it based on feedback from stakeholders. This may involve making changes to the data model, modifying the reports and dashboards, or adjusting the analysis approach.

By following these steps, an analyst can understand the data needs of a business and develop a BI solution that meets those needs. This will help to ensure that the analysis is accurate, relevant, and actionable and that it provides insights that drive business performance.
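
As an example of the data quality step mentioned above, a couple of checks such as the following (a sketch, with hypothetical column names on the sales_table used earlier) can flag missing values and duplicate keys before any modeling starts:

    -- Rows with missing critical values.
    SELECT COUNT(*) AS rows_missing_values
    FROM sales_table
    WHERE amount IS NULL OR region IS NULL;

    -- Duplicate business keys that would break a one-row-per-sale assumption.
    SELECT sale_id, COUNT(*) AS occurrences
    FROM sales_table
    GROUP BY sale_id
    HAVING COUNT(*) > 1;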

This process is quite subjective, and depending on your company, the output could be different. By setting your expectations correctly and continuously improving the mapping of your data architecture, you will become proficient at identifying new data needs in your organization.

 

Summary

In this chapter, you gained valuable insights and practical advice on how to enhance your proficiency in data analysis. We emphasized the significance of thoroughly mapping out each data source and adopting a systematic process to gain a deeper understanding of your organization’s data requirements. It is important to acknowledge that while this chapter serves as a valuable resource, it is not an exhaustive guide. Rather, it aims to inspire you to explore new avenues and bridge any knowledge gaps you may have in your data architecture. It is your responsibility to adapt and refine these techniques to gather the necessary information effectively. As you accomplish this, your path forward becomes clearer – having identified your sources and obtained the required data, it is now time to derive meaning from it and effectively present it to your intended audience. Preparing to share your data story necessitates honing your skills in data visualization and analysis, empowering you to effectively communicate insights derived from your data.

About the Authors
  • Eduardo Chavez

    Eduardo Chavez, a Google Professional with over 18 years of industry experience, hails from Mazatlan, Mexico. He has held multiple data certifications from Oracle, Microsoft, Google, and AWS since 2010. With a Bachelor's degree in Information Systems and three Master's degrees, in IT, Business Administration, and Business Analytics, Eduardo specializes in SQL, semantic layers, and data modeling. He has worked for prominent private companies such as Accenture, Oracle, and Google, as well as the University of Minnesota in the public sector. Eduardo is known for advocating a rapid development cycle, emphasizing late-binding data warehousing, rapid prototyping, and a top-down approach.

  • Danny Moncada

    Danny Moncada is a seasoned IT professional with 15 years of experience, excelling in project implementations related to database administration, data analysis, and data engineering. He has contributed his expertise to renowned companies such as Cigna Healthcare, Indeed, and IOTAS, as well as the University of Minnesota in the non-profit sector. In 2020, Danny completed his M.S. in Business Analytics at the Carlson School of Management, where he was recognized as the most helpful student by his peers. He has achieved various certifications, including Python Programming from NYU and the Google Cloud Platform and IBM Data Engineering paths from Coursera. Danny's specialties lie in data visualization, data analysis, and data warehousing.
