Reader small image

You're reading from  Hands-On Machine Learning with Microsoft Excel 2019

Product typeBook
Published inApr 2019
PublisherPackt
ISBN-139781789345377
Edition1st Edition
Tools
Right arrow
Author (1)
Julio Cesar Rodriguez Martino
Julio Cesar Rodriguez Martino
author image
Julio Cesar Rodriguez Martino

Julio Cesar Rodriguez Martino is a machine learning (ML) and artificial intelligence (AI) platform architect, focusing on applying the latest techniques and models in these fields to optimize, automate, and improve the work of tax and accounting consultants. The main tool used in this practice is the MS Office platform, which Azure services complement perfectly by adding intelligence to the different tasks. Julio's background is in experimental physics, where he learned and applied advanced statistical and data analysis methods. He also teaches university courses and provides in-company training on machine learning and analytics, and has a lot of experience leading data science teams.
Read more about Julio Cesar Rodriguez Martino

Right arrow

Importing Data into Excel from Different Data Sources

Serious data analysis and machine learning cannot be done by using hand-typed data. Data sources come in different flavors and sizes, and Excel can handle many of them. This chapter deals with how to import data from different sources, which is the first step of any analysis.

Get & Transform (known as Power Query in Excel versions before 2016) is a powerful tool that you can use to load data from different sources and transform it. These transformations are necessary so that you have a clean data sample that you can then use to train and test any machine learning model.

If you are running Excel 2010 SP1 or Excel 2013, you need to download and install Power Query. Refer link https://www.microsoft.com/en-us/download/details.aspx?id=39379 for instructions on how to install it.

In this chapter, we will cover the following topics...

Technical requirements

Importing data from a text file

The more commonly used data text file is comma-separated values (CSV). As the name suggests, values are written to the file in rows and, for each row, a comma separates the values belonging to each column. Open a new workbook and follow these steps:

  1. Click on Data.
  2. Navigate to Get Data | From File | From Text/CSV:
  1. Navigate to the file's location and open the homes.csv file.
  1. A window will pop up, showing you a preview of the file's contents, as shown in the following screenshot:

We can see that Excel correctly identifies the different columns by using the right delimiter (comma). It also tries to detect the data types automatically. There is a small problem, though. This file is not pure CSV, and has an extra line at the beginning, showing us where it was downloaded from originally. This is good for giving credit to the original...

Importing data from another Excel workbook

Why would we import data from an Excel workbook if we can just open it? The main reason is to take advantage of the transformations we can do using Get & Transform. We will show this by using a file containing real data from the Titanic passengers, which is often used to test machine learning classification models and predict whether a given passenger survived the tragedy or not.

Let's follow some simple steps to load and transform the data. While in a new workbook, follow these steps:

  1. Click on Data.
  2. Navigate to Get Data | From File | From Workbook, as shown in the following screenshot:

The preview window is slightly different to the one we see when opening a CSV file, as you can see in the following screenshot:

In the left part of the preview, you can see a folder named titanic.xls, which represents the file, and below that...

Importing data from a web page

If a web page contains data in tabular form, then Excel is able to import these tables automatically. As an example, we will import a table from the Wikipedia page about Excel. Here's how we will go about it:

  1. Click on Data and then navigate to Get Data | From Other Sources | From Web, as shown in the following screenshot:
  1. In the dialog that pops up, we type in the URL of the mentioned web page and click OK, as shown in the following screenshot:
  1. Now, we will be able to see a list of available tables. Pick the one showing Excel's release history. A table preview is shown to the right, as shown in the following screenshot:
  1. As we did before, we can just load the data or edit it, transforming the values according to our needs. The final result is an Excel table.
We also have the choice of changing to web view and seeing the target...

Importing data from Facebook

It is possible to import data directly from a Facebook profile or page. If we want to, for example, register the dates and times when a new post appears, we could follow these steps:

  1. Click on Data.
  2. Navigate to Get Data | From Online Services | From Facebook, as shown in the following screenshot:

  1. You need to specify the user or page name, which is the last part of the page URL. As an example, I will use my own Facebook page, which is the default option. The first time you connect, you will have to log in with your username and password (to connect to Facebook from Excel, you need a Facebook profile). The following dialog will appear:
  1. We will choose to get information about Posts. After clicking OK, we get the following screen (I will only show it partially, to protect my privacy):
  1. As we can see, the date and times of the posts are concatenated...

Importing data from a JSON file

JSON is a standard format for sharing data, since it uses text fields that can be read by a human being. It is used by most web applications for data input and output.

In our example, we will use the Azure Text Analytics API. Given a sentence, this service can identify the text sentiment and the language and extract keywords, among other things.

The input sentence is I had a wonderful trip to Seattle and enjoyed seeing the Space Needle!. The API correctly identifies the language as English, extracts the main keywords, and tells us that the sentiment is positive (assigning a value larger than 0.5). All of this information is given in JSON format on the right-hand side of the window, and is available in the file we are going to import. The following screenshot has been extracted from the Azure Text Analytics demonstration page:

To load the input...

Importing data from a database

There are many different databases available, and Excel can connect to most of them. The connection procedure is similar for all of them. We are going to use one as an example: the MS SQL Server Express database, which is free and can be downloaded on any computer. It has some limitations, but it is extremely useful for learning and testing with small amounts of data. Assuming there is a local database in your computer, perform the following steps to connect the database:

  1. Click on Data.
  2. Navigate to Get Data | From Database | From SQL Server Database, as shown in the following screenshot:
  1. The pop-up dialog will request the name of the database server. In this case, it is the local computer name and the SQL Express Server. Optionally, we can add the database name, but if we leave it blank, we will be able to see a list of all databases in the server...

Summary

In this chapter, we described different methods of inputting information into an Excel spreadsheet, going beyond what can be done by hand-typing data. A variety of file types, web data sources, and databases can be analyzed from within Excel by using Power Query and Query Editor to extract, transform, and load data. I encourage you to explore other data sources, since the loading procedure is very similar.

So far, we have seen some very simple data transformations being applied to the data before it was loaded. In the next chapter, we will discuss more advanced techniques for cleansing data.

Questions

  1. Which characters can be used as separators in a text file?
  2. Why is it so important to pre-process the data before loading it?
  3. What is the difference between opening an Excel file and importing it?
  4. What type of information can be imported from a web page?
  5. JSON is one of the structured formats that's used to exchange information. What other formats exist?
  6. Do some online research to understand the advantages of using databases over individual data files.
lock icon
The rest of the chapter is locked
You have been reading a chapter from
Hands-On Machine Learning with Microsoft Excel 2019
Published in: Apr 2019Publisher: PacktISBN-13: 9781789345377
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Julio Cesar Rodriguez Martino

Julio Cesar Rodriguez Martino is a machine learning (ML) and artificial intelligence (AI) platform architect, focusing on applying the latest techniques and models in these fields to optimize, automate, and improve the work of tax and accounting consultants. The main tool used in this practice is the MS Office platform, which Azure services complement perfectly by adding intelligence to the different tasks. Julio's background is in experimental physics, where he learned and applied advanced statistical and data analysis methods. He also teaches university courses and provides in-company training on machine learning and analytics, and has a lot of experience leading data science teams.
Read more about Julio Cesar Rodriguez Martino