
You're reading from Cracking the Data Science Interview

Product type: Book
Published in: Feb 2024
Publisher: Packt
ISBN-13: 9781805120506
Edition: 1st Edition
Authors (2):
Leondra R. Gonzalez

Leondra R. Gonzalez is a data scientist at Microsoft and Chief Data Officer for tech startup CulTRUE, with 10 years of experience in tech, entertainment, and advertising. During her academic career, she has completed educational opportunities with Google, Amazon, NBC, and AT&T.

Aaren Stubberfield

Aaren Stubberfield is a senior data scientist for Microsoft's digital advertising business and the author of three popular courses on Datacamp. He graduated with an MS in Predictive Analytics and has over 10 years of experience in various data science and analytical roles focused on finding insights for business-related questions.


Visualizing Data and Data Storytelling

Data visualization is the process of creating images, charts, and other visual representations of data in order to reveal and understand its underlying trends and patterns. These skills are essential for data scientists who want to tell compelling data stories. For example, a marketing analyst may examine online customer behavior to identify purchasing patterns such as seasonal trends, product preferences, or demographic correlations. These patterns can be used to craft targeted marketing campaigns or develop personalized recommendations, enhancing the customer experience. Alternatively, an analyst may analyze historical financial time series data to identify patterns in market trends, stock performance, or economic indicators. By recognizing these patterns, they can make informed predictions about future market behavior, guide investment decisions, and develop risk management strategies.

In this chapter, you will delve into the world of data visualization...

Understanding data visualization

As data scientists, we sometimes feel like we are explorers navigating the wild frontiers of massive datasets, hunting for insightful patterns and significant relationships. Yet, the real value of our journey lies in the capacity to translate these discoveries into stories that influence decisions, inspire action, and propel innovation. This is where the art of data visualization and storytelling comes into play.

Data visualization is a powerful tool that goes beyond simply showcasing statistics or trends – it breathes life into data, transforming numbers and variables into visual narratives that capture attention, evoke emotion, and provoke thought. It is a translation process, converting the abstract language of data into an intuitive, visual dialect that people can understand and engage with. More than mere graphics, well-crafted data visualizations can tell compelling stories.

The power of visualization lies in its appropriateness to the data...

Surveying tools of the trade

An array of visualization tools is available, catering to a variety of needs, skill sets, and use cases. This section discusses several popular data visualization tools, including Power BI, Tableau, R’s Shiny, and Python libraries such as Matplotlib and Seaborn, with guidance on when to use one over another. The goal is not exhaustive coverage but general knowledge that prepares you to explain, in a technical interview, when to choose a particular tool.

Power BI

Power BI is a business intelligence tool developed by Microsoft. It offers interactive visualizations with an interface simple enough for end users to create reports and dashboards.

When to use it: Power BI is very effective when you are working with many complex data sources that require considerable data wrangling or modeling. It’s an excellent choice for businesses seeking to create interactive, user-friendly dashboards or for integrating...

Developing dashboards, reports, and KPIs

In some technical interviews, you are given a take-home technical task to complete, and this might include data visualization. In the previous section, we touched on some common dashboarding tools a data scientist might use. In this section, we will delve deeper into some best practices for your dashboards, reports, and KPIs.

As a data scientist, you’re not only tasked with uncovering insights from data but also communicating these insights effectively. This often involves creating dashboards, reports, and KPIs. While the aesthetics of your visuals are important, clarity, accuracy, and usability should always take precedence. The following are some best practices to help you create effective dashboards and reports:

  • Prioritize clarity and simplicity: Avoid cluttered or overly complex visualizations. Keep your dashboards and reports simple and intuitive. Stick to one primary message per chart and limit the number of visualizations...
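
As one way to picture the "clarity and simplicity" guideline, here is a minimal Matplotlib sketch of a single-message chart with the non-essential decoration removed. The sign-up numbers, labels, and the choice to start the y-axis at zero are assumptions made for illustration; they are not taken from the book.

```python
import matplotlib.pyplot as plt

# Hypothetical KPI data, made up purely for illustration
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
signups = [120, 135, 150, 170, 165, 190]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(months, signups, marker="o")

# One primary message per chart: name it in the title
ax.set_title("Monthly sign-ups, Jan to Jun")
ax.set_ylabel("Sign-ups")

# Remove visual elements that carry no information
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.grid(False)

# Start the y-axis at zero so the trend is not visually exaggerated
ax.set_ylim(bottom=0)
```

Call `plt.show()` or `fig.savefig(...)` to view the result. Whether a zero baseline is appropriate depends on the metric, so treat that line as a judgment call rather than a rule.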

Developing charts and graphs

While there are many tools for creating different data visuals, we will review a few basic visualizations, including bar charts, scatter plots, and histograms in Python. Two standard libraries for creating data visualizations in Python are Matplotlib and Seaborn.

In this section, we will discuss the different chart types and how to make them in Matplotlib and Seaborn.
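
As a quick preview of two of the chart types named above, the following sketch draws a histogram with Matplotlib and a scatter plot with Seaborn side by side. The data is synthetic and the variable names (`ad_spend`, `revenue`) are hypothetical, chosen only to make the axes readable.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Synthetic data, made up purely for illustration
rng = np.random.default_rng(seed=0)
ad_spend = rng.uniform(1, 10, size=200)              # hypothetical feature
revenue = 3 * ad_spend + rng.normal(0, 2, size=200)  # hypothetical target

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(12, 5))

# Histogram (Matplotlib): distribution of a single numeric variable
ax_hist.hist(revenue, bins=20, edgecolor="white")
ax_hist.set_xlabel("Revenue")
ax_hist.set_ylabel("Count")
ax_hist.set_title("Histogram: distribution of revenue")

# Scatter plot (Seaborn): relationship between two numeric variables
sns.scatterplot(x=ad_spend, y=revenue, ax=ax_scatter)
ax_scatter.set_xlabel("Ad spend")
ax_scatter.set_ylabel("Revenue")
ax_scatter.set_title("Scatter plot: ad spend vs. revenue")

fig.tight_layout()
```

Seaborn draws onto ordinary Matplotlib axes, which is why the same `set_xlabel` and `set_title` calls work for both panels.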

Bar chart – Matplotlib

Matplotlib is a foundational library for visualizations in Python. Here’s a basic example of how you might create a bar chart with Matplotlib:

import matplotlib.pyplot as plt
# Categories and their associated values
categories = ['Category1', 'Category2', 'Category3', 'Category4']
values = [50, 60, 70, 80]
plt.figure(figsize=(8,6)) # Create a new figure with a specific size (width, height)
plt.bar(categories, values) # Create a bar chart
# Labels for x-axis, y-axis and the plot
plt.xlabel('Categories...
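
The listing above is cut off in this excerpt. For reference, a self-contained sketch of the same kind of bar chart might look like the following. It reuses the categories and values shown above, but the axis labels and title are assumptions for illustration, not the authors' original text, and it uses Matplotlib's object-oriented interface (`fig, ax`) rather than the `plt.*` calls in the excerpt.

```python
import matplotlib.pyplot as plt

# Categories and their associated values (same data as the excerpt above)
categories = ['Category1', 'Category2', 'Category3', 'Category4']
values = [50, 60, 70, 80]

# Create a figure and axes; figsize is (width, height) in inches
fig, ax = plt.subplots(figsize=(8, 6))

# Draw one bar per category
bars = ax.bar(categories, values)

# Axis labels and title: illustrative wording, not the authors' originals
ax.set_xlabel('Categories')
ax.set_ylabel('Values')
ax.set_title('Values by category')
```

Call `plt.show()` to display the chart interactively, or `fig.savefig('bar_chart.png')` to write it to a file.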

Applying scenario-based storytelling

One of the most important aspects of a data scientist’s role is to translate complex datasets into a narrative that people who aren’t data scientists can understand. The ability to present your findings clearly and compellingly is a crucial skill for a data scientist. This section provides a framework for structuring your data story effectively:

  • Begin with your end: Before crunching numbers, clarify your goal. What is the key message you want to communicate? What action do you want your audience to take? A clear objective will guide your analysis, influence your choice of visualizations, and ensure your story resonates with your audience.
  • Know your audience: Understanding your audience’s needs, interests, and level of knowledge will help you present your data meaningfully. Tailor your story to fit your audience – the detail, complexity, and visualizations you use should vary depending on who you’re speaking to.
  • ...
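
To make the idea of starting from the key message concrete, here is one possible sketch in Matplotlib: the title states the takeaway and a single highlighted bar directs the eye to it. The channel names and conversion rates are hypothetical, and this is only one of many ways to apply the framework.

```python
import matplotlib.pyplot as plt

# Hypothetical figures, made up purely for illustration
channels = ["Email", "Search", "Social", "Referral"]
conversion_rate = [2.1, 3.4, 1.2, 5.8]  # percent

# Pick the category the story is about (here: the best performer)
best_rate = max(conversion_rate)
focus = channels[conversion_rate.index(best_rate)]

# Highlight the focus bar and mute the others
colors = ["tab:blue" if c == focus else "lightgray" for c in channels]

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(channels, conversion_rate, color=colors)

# The title states the takeaway instead of merely describing the axes
ax.set_title(f"{focus} converts best: {best_rate}% of visitors")
ax.set_ylabel("Conversion rate (%)")

# Label each bar directly so the audience does not have to read off the axis
for bar in bars:
    ax.annotate(
        f"{bar.get_height():.1f}%",
        (bar.get_x() + bar.get_width() / 2, bar.get_height()),
        ha="center",
        va="bottom",
    )
```

For a technical audience you might keep all bars in color and add more detail; for an executive audience the muted version with a one-line takeaway is often enough.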

Summary

In the first half of this chapter, we established the critical role of data visualization and storytelling in the field of data science. Beginning with an overview of why data visualization is crucial, we delved into a framework for choosing the right visualization based on the data types involved and the goal of the communication. We explored a variety of data visualization types, such as bar charts, pie charts, histograms, scatter plots, and box plots, discussing their use cases, creation processes, and tips for enhancing their storytelling power. Additionally, we analyzed various visualization tools, including Power BI, Tableau, R’s Shiny, and Python’s Matplotlib and Seaborn, providing insights into their advantages, limitations, and ideal use cases.

The latter part of this chapter focused on the practical aspects of data visualization and storytelling. We covered the best practices for creating effective dashboards, reports, and KPIs, emphasizing clean, uncluttered visuals...
