You're reading from Hands-On Machine Learning with Microsoft Excel 2019

Product type: Book
Published in: Apr 2019
Publisher: Packt
ISBN-13: 9781789345377
Edition: 1st Edition
Author (1)
Julio Cesar Rodriguez Martino

Julio Cesar Rodriguez Martino is a machine learning (ML) and artificial intelligence (AI) platform architect, focusing on applying the latest techniques and models in these fields to optimize, automate, and improve the work of tax and accounting consultants. The main tool used in this practice is the MS Office platform, which Azure services complement perfectly by adding intelligence to the different tasks. Julio's background is in experimental physics, where he learned and applied advanced statistical and data analysis methods. He also teaches university courses and provides in-company training on machine learning and analytics, and has a lot of experience leading data science teams.

Assessment

Chapter 1, Implementing Machine Learning Algorithms

  1. In classical programming, the code developed and run in the computer is a step-by-step set of instructions telling the computer what to do and how to handle different options. Machine learning is about showing the computer examples of data to either teach it what to do by example, or to let it learn information that is hidden in the data.
  2. Machine learning models can be either regression models (if the target variable is numerical and continuous) or classification models (if the target variable is categorical or discrete).
  3. Models that learn by example, training on labeled data, are called supervised machine learning models. In comparison, those that find information in the unlabeled data are called unsupervised machine learning models.
  4. The following are the main steps that are needed when creating and using a machine learning model:
    1. Obtaining...

Chapter 2, Hands-On Examples of Machine Learning Models

  1. Encoding prepares categorical features in order to feed them into a machine learning model and does not assume any prior correlation between the encoded values.
  2. By setting a limit to the length of the tree or by defining a minimum entropy value.
  3. Temperature_hot is equally split; two values end in Train_outside = yes, and two values end in Train_outside = no. This represents the maximum entropy value, where there is no clear information about what to do if the temperature is hot.
  4. The following IF statements would be considered when deciding whether or not to train outside:
    • If outlook is Sunny and it's not windy, then train outside.
    • If outlook is Sunny and it's windy, then don't train outside.
    • If outlook is Overcast, then train outside.
    • If outlook is Rainy and Humidity is high, then don't train outside...
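
The maximum-entropy case described in answer 3 can be checked numerically. The following is a minimal sketch in Python, assuming the usual Shannon entropy in bits; the `entropy` helper and the sample label lists are hypothetical and are not taken from the book's workbook.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    proportions = [count / total for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in proportions)

# Hypothetical rows where Temperature_hot is true: two "yes" and two "no"
hot_rows = ["yes", "yes", "no", "no"]
print(entropy(hot_rows))   # 1.0, the maximum for two classes

# A node where every row has the same outcome carries no uncertainty
pure_rows = ["yes", "yes", "yes", "yes"]
print(entropy(pure_rows))  # 0.0
```

An even split gives one full bit of entropy, which is why the hot-temperature branch alone gives no guidance on whether to train outside.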

Chapter 3, Importing Data into Excel from Different Data Sources

  1. Any character that is not confused with the file contents.
  2. The outcome of a machine learning model will be affected by missing or incorrect data entries, and the correct format should also be used.
  3. Importing an Excel file will open the Power Query interface in order to preprocess the data.
  4. Data that is in a tabular form.
  5. An exhaustive list can be found at https://gist.github.com/gelisam/13d04ac5a54b577b2492785c1084281f.
  6. An example can be found at https://stackoverflow.com/questions/38120895/database-vs-file-system-storage.

Chapter 4, Data Cleansing and Preliminary Data Analysis

  1. Instead of building the decision tree manually, it would be interesting to study in depth the example built in Azure Machine Learning Studio, which is shown in Chapter 10, Azure and Excel - Machine Learning in the Cloud.
  2. Some examples are cabin and fare, pclass and fare, and home.dest and fare.
  3. Missing values could be replaced by the mean value of the variable.
  4. Any imbalance in the dataset is referred to as bias. This will affect the results of any machine learning model, since the model will see more examples of a given class or show a tendency toward a particular target value.
  5. You can, for example, try to see some correlations between variables using scatter plots.
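
The mean replacement mentioned in answer 3 can be sketched outside Excel as follows. This is only an illustration: the `impute_with_mean` helper and the sample values are hypothetical, and missing entries are assumed to be represented as `None`.

```python
from statistics import mean

def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical numerical column with two missing entries
column = [22.0, None, 30.0, None, 38.0]
print(impute_with_mean(column))  # [22.0, 30.0, 30.0, 30.0, 38.0]
```

Mean replacement keeps the column average unchanged but reduces its variance, so it is worth comparing the results against a version of the data where incomplete rows are simply dropped.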

Chapter 5, Correlations and the Importance of Variables

  1. You can, for example, build a diagram with the categorical values on the x axis and the numerical values on the y axis; any correlation would be clear from this diagram.
  2. It should be easy for the reader to build diagrams and understand the relationship between variables.
  3. No. It means that when one variable increases, the other decreases.
  4. This formatting was used in Chapter 6, Data Mining Models in Excel Hands-On Examples.
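  5. We calculated the sum of squared errors (SSE) by summing ([@mpg]-[@prediction])^2 over all rows. The other sum we need is the total sum of squares (SST), obtained by summing ([@mpg]-AVERAGE([mpg]))^2 over all rows, where the average is taken over the observed mpg values. Then, we calculate R2 = 1-SSE/SST.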
  6. You can try using an exponential function (EXP()) or another function with a similar shape. The R2 value will probably still be far from 1, since the dispersion in the data is very high.
...
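
The R2 calculation from answer 5 can be sketched in Python as follows, using the standard definition in which SST is measured around the mean of the observed values. The `r_squared` helper and the sample numbers are hypothetical and do not come from the book's dataset.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: R2 = 1 - SSE / SST."""
    mean_observed = sum(observed) / len(observed)
    # SSE: squared differences between observed and predicted values
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # SST: squared differences between observed values and their mean
    sst = sum((y - mean_observed) ** 2 for y in observed)
    return 1 - sse / sst

# Hypothetical mpg values and model predictions
mpg = [18.0, 22.0, 26.0, 30.0]
prediction = [19.0, 21.0, 27.0, 29.0]
print(r_squared(mpg, prediction))  # approximately 0.95 (SSE = 4, SST = 80)
```

A value close to 1 means the model explains most of the variation in the observed values; a value near 0 means it does little better than always predicting the mean.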

Chapter 6, Data Mining Models in Excel Hands-On Examples

  1. Use the previous knowledge of the business to discard these associations.
  2. Not necessarily. These types of analysis are usually dependent on the business domain and even on the particular place where we perform them. This means that some results can be generalized, but, often, not all of them.
  3. It means that there is no customer that started buying products by the time indicated in the column and that kept buying after the period of time shown in the row.
  4. There are no customers that old (in terms of time spent as customers).
  5. For example, focusing on those that stop buying and aiming ad campaigns at them.

Chapter 7, Implementing Time Series

  1. By setting increasing(TravelDate) to the moving average values in the calculation and following the same steps.
  2. If the seasonality is too different from the real value in the data, then the prediction will be less accurate. If we increase the confidence interval, then the error will also increase.
  3. Using the COVARIANCE.P function in Excel.
  4. After applying the logarithm, the time series diagram shows that the trend is still ascending, but the standard deviation looks flat and no longer depends on time.
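
For reference, Excel's COVARIANCE.P, mentioned in answer 3, returns the population covariance, that is, the average product of deviations from the means, divided by the number of points. A minimal Python sketch of the same quantity follows; the `covariance_p` helper and the sample series are hypothetical.

```python
def covariance_p(xs, ys):
    """Population covariance: mean of the products of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Hypothetical paired series of equal length
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(covariance_p(xs, ys))  # 2.5
```

Dividing by n rather than n - 1 is what distinguishes the population version from the sample version (COVARIANCE.S in Excel).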

Chapter 8, Visualizing Data in Diagrams, Histograms, and Maps

  1. It is very difficult to distinguish the different pie slices.
  2. Multiple line charts.
  3. You can get data from https://openaddresses.io/ and follow the instructions in this article: https://www.roguegeographer.com/create-your-own-maps-using-excel-3d-maps/.
  4. It is possible to do it and get a result, but the accuracy will be poor. The result of an election depends mostly on external factors that are not taken into account by the data, and not so much on the historical results of past elections.

Chapter 9, Artificial Neural Networks

  1. The result will depend on the artificial neural network training. You can follow the step-by-step instructions in the Evaluating models subsection in Chapter 1, Implementing Machine Learning Algorithms.
  2. The dataset is unbalanced and that will affect the results.

Chapter 10, Azure and Excel - Machine Learning in the Cloud

  1. Cost, speed, global scale, productivity, performance, and security.
  2. Cloud computing is useful for many different applications and, in fact, can replace everything that was built on-premises, from databases to visualizations.
  3. Web services are applications hosted on the internet, which can communicate with other applications through predefined protocols and data formats. The advantage of using web services is that they are easy to share and are independent from the operating system and programming language used.
  4. Azure Machine Learning Studio needs the input data format, and this is taken from the input data module.
  5. The training flow is used to train the model and then save it. The same model is then used in a separate flow for prediction, without the need to retrain the model every time it is used.
...
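
The separation between a training flow and a prediction flow described in answer 5 can be illustrated with a small local analogy. This sketch does not use Azure Machine Learning Studio or any Azure API; it assumes a simple least-squares line as the model and a JSON file as the saved artifact, and all function names and the file name are hypothetical.

```python
import json
import os
import tempfile

def train(xs, ys):
    """Fit a least-squares line and return its parameters as a dict."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def save_model(model, path):
    """Persist the trained parameters so they can be reused later."""
    with open(path, "w") as handle:
        json.dump(model, handle)

def load_model(path):
    """Read previously saved parameters back from disk."""
    with open(path) as handle:
        return json.load(handle)

def predict(model, x):
    """Apply the saved parameters to a new input."""
    return model["slope"] * x + model["intercept"]

# Training flow: run once, then save the result
model_path = os.path.join(tempfile.gettempdir(), "example_model.json")
save_model(train([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]), model_path)

# Prediction flow: load the saved model and use it without retraining
saved = load_model(model_path)
print(predict(saved, 4.0))  # 9.0
```

The prediction flow only needs the saved parameters, so it can run as often as required without touching the training data again.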

Chapter 11, The Future of Machine Learning

  1. Model training and testing are replaced by data mining, which works by trying to get useful information from the data.
  2. New data is included continuously into the data flow, and the full cycle must be fulfilled before feeding it into a machine learning model.
  3. A hyperparameter value is set before starting the learning process and defines some characteristics of the model (for example, the number of cycles in an artificial neural network training model).
  4. The following steps can be performed automatically by AutoML:
    • Data preprocessing
    • Feature engineering
    • Model selection
    • Optimization of the model hyperparameters
    • Analysis of the model results
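
To make the notion of a hyperparameter in answer 3 concrete, the sketch below trains a simple linear model by gradient descent. The number of cycles (`epochs`) and the `learning_rate` are fixed before training starts, while the weight and bias are learned from the data. This is an illustrative stand-in for a neural network, not the book's model, and all names and values are hypothetical.

```python
def train_linear(xs, ys, epochs, learning_rate):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # learned parameters, updated during training
    n = len(xs)
    for _ in range(epochs):  # epochs is a hyperparameter, chosen beforehand
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def mean_squared_error(xs, ys, w, b):
    """Average squared difference between predictions and targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data generated by y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Same data and learning rate, different number of training cycles
for epochs in (5, 500):
    w, b = train_linear(xs, ys, epochs=epochs, learning_rate=0.05)
    print(epochs, mean_squared_error(xs, ys, w, b))
```

Comparing the two printed errors shows how a value chosen before training changes the quality of the resulting model, which is the kind of choice that AutoML tools search over automatically.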