You're reading from Data Wrangling with R

Product type: Book
Published in: Feb 2023
Publisher: Packt
ISBN-13: 9781803235400
Edition: 1st Edition
Author (1)
Gustavo R Santos

Gustavo R Santos has worked in the technology industry for 13 years, improving processes, analyzing datasets, and creating dashboards. Since 2020, he has been working as a data scientist in the retail industry, wrangling, analyzing, visualizing, and modeling data with modern tools such as R, Python, and Databricks. Gustavo also gives occasional lectures on data science concepts at an online school. He has a background in marketing, is certified as a data scientist by the Data Science Academy Brazil, and is pursuing a specialist MBA in Data Science at the University of São Paulo.

Other Data Visualization Options

Data visualization is an important part of data science. There are a lot of resources available, as we have seen throughout Part 3 of the book. In this last chapter of Part 3, we will go over two extra visualization options, R graphics in Microsoft Power BI and word clouds, across the following topics:

  • Plotting graphics in Microsoft Power BI using R
  • Preparing data for plotting
  • Creating word clouds in RStudio

Technical requirements

All the code can be found in the book’s GitHub repository: https://github.com/PacktPublishing/Data-Wrangling-with-R/tree/main/Part3/Chapter12.

The following libraries must be loaded in RStudio for this chapter:

library(tidyverse)
library(wordcloud2)
library(officer)
library(tidytext)

Plotting graphics in Microsoft Power BI using R

Many business intelligence (BI) tools, such as Microsoft Power BI, have been developed and launched in the last few years. Microsoft’s tool appeared in 2014; it currently offers its own built-in graphics and also integrates with programming languages such as R and Python.

Working with BI tools is very practical. Most visualizations do not require coding; instead, you can just drag and drop variables to create graphics. However, like any other tool, BI tools have pros and cons. For example, if we want to create a more customized graphic or add a type of graphic that is not available in Power BI, such as a histogram, we have to look for an alternative way to create that visualization. For that reason, we will learn how to plot a ggplot2 graphic in Power BI, which gives us the flexibility to go well beyond the standard graphics provided by the BI tool.

Let’s see how to plot histograms...
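As a rough sketch of what a ggplot2 histogram for a Power BI R script visual might look like: Power BI passes the fields added to the visual to the script as a data frame named dataset. The column name sales and the simulated values below are assumptions made only so the code also runs outside Power BI; they are not taken from the book.

```r
library(ggplot2)

# Outside Power BI there is no `dataset` object, so a stand-in is built here.
# The column `sales` is hypothetical. Inside an R script visual this block is
# skipped, because Power BI supplies `dataset` from the fields in the visual.
if (!exists("dataset")) {
  set.seed(42)
  dataset <- data.frame(sales = rnorm(500, mean = 100, sd = 15))
}

# Histogram of a single numeric variable
p <- ggplot(dataset, aes(x = sales)) +
  geom_histogram(bins = 30, fill = "royalblue", color = "white") +
  labs(title = "Distribution of sales", x = "Sales", y = "Count") +
  theme_minimal()

# The R visual renders whatever is drawn on the graphics device
print(p)
```

In the R script editor of the visual, only the library call, the plot definition, and the print() call would be needed, with sales replaced by a real column of the report.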

Preparing data for plotting

There are many kinds of graphics: univariate, bivariate with one numeric and one categorical variable, bivariate with two numeric variables, and others. Each kind expects different input data, so the data scientist must munge the data into a specific format before plotting it. One example studied in this book was data that was not in tidy format and required transformations before it could be plotted.

In this section, we will learn how to prepare a text to be plotted as a word cloud, a graphical representation of the most frequent words that appear in a text. The more frequently a word occurs, the bigger it appears in the plot, which provides a sense of the content of the text.

A text is a combination of words, but it does not have rows and columns of data. Instead, it is a whole piece. So, prior to plotting the word cloud, it is necessary to transform the...
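One common way to turn a piece of text into rows and columns is to split it into one word per row and count the words. The following is a minimal sketch of that idea using tidytext, not necessarily the exact steps used in the book. The sample text in raw_text is hypothetical, and the columns are named word and freq because wordcloud2() reads the first two columns of its input as the word and its frequency.

```r
library(dplyr)
library(tidytext)

# Hypothetical sample text; in practice this would be the document to summarize
raw_text <- tibble(
  text = c(
    "Data wrangling prepares data for plotting.",
    "Plotting data makes the data easier to read."
  )
)

word_freq <- raw_text %>%
  # One lowercase word per row, punctuation stripped
  unnest_tokens(output = word, input = text) %>%
  # Drop very common words (such as "the") listed in tidytext's stop_words
  anti_join(stop_words, by = "word") %>%
  # Count each remaining word, most frequent first
  count(word, name = "freq", sort = TRUE)

word_freq
```

Removing stop words is a design choice: without it, words such as "the" and "to" would dominate the counts and hide the terms that describe the content.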

Creating word clouds in RStudio

A word cloud is very useful when we want to visualize the content of a text quickly. The more times a word is repeated within the text, the bigger it is displayed in the word cloud, giving us a sense of what to expect if we read the text. It works as a kind of visual summary.

Once we have a dataset with words and their frequencies, plotting a word cloud takes just one line of code. The function has the same name as the library, wordcloud2(), and it takes as inputs the dataset, a color palette or a vector of colors, and the size of the words:

# Generating WordCloud
wordcloud2(data=word_freq, color="random-dark", size=1)

The result after running this code is printed as follows.

Figure 12.10 – Word cloud generated with the content from Chapter 10 of this book

Just to refresh our minds, Chapter 10 is an introduction to the ggplot2 library. It brings the concepts of the grammar of graphics and introduces...
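As a self-contained variant of the wordcloud2() call shown earlier, the sketch below builds a small data frame by hand and passes a vector of colors instead of a palette keyword. The words, their frequencies, and the colors are made up for illustration and are not the counts behind Figure 12.10.

```r
library(wordcloud2)

# Hypothetical word frequencies: first column is the word, second the count
demo_freq <- data.frame(
  word = c("ggplot2", "graphics", "data", "layers", "geometry", "theme"),
  freq = c(40, 25, 22, 12, 9, 5)
)

# One color per word, recycled from a short vector of color names
word_colors <- rep_len(c("steelblue", "darkorange", "seagreen"), nrow(demo_freq))

# size scales all words; larger values produce larger text
wc <- wordcloud2(data = demo_freq, color = word_colors, size = 0.8)

# Printing the object displays the cloud in the RStudio Viewer pane
wc
```

The result is an HTML widget, so it can be shown in the Viewer, embedded in an R Markdown document, or stored in a variable as done here.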

Summary

In this brief chapter, we covered some additional options for visualization. We began by showing how to integrate ggplot2 graphics into Microsoft Power BI, enhancing the capabilities of the tool. Next, we learned in practice how to prepare data for plotting, with the creation of a word cloud as the final goal. At the end of the chapter, we learned how to plot a word cloud and how to interpret it.

Exercises

  1. What programming languages can integrate with Power BI?
  2. What is the benefit of plotting R graphics in Power BI?
  3. What should you keep in mind when preparing data for plotting?
  4. List the library name and function to create a word cloud.
  5. What are the two variables needed in the input dataset for a word cloud?

Further reading
