Preface

Machine learning is being adopted rapidly in healthcare because it can produce accurate and stable predictive models. Machine learning algorithms can work with structured, unstructured, and semi-structured data. This book presents approaches and methodologies for building powerful healthcare analytics solutions.

This book implements key machine learning algorithms and their use cases using a range of libraries from the Python ecosystem. We will build five end-to-end projects to evaluate how well artificial intelligence applications perform on simple and complex healthcare analytics tasks. Each project will help you explore better ways to extract insights from healthcare data and handle it efficiently. We will use machine learning to detect cancer in a set of patients using the SVM and KNN models. We will also create a deep neural network in Keras to predict the onset of diabetes on a large dataset of patients, and learn how to predict heart disease using neural networks.

By the end of this book, you will have learned how to address long-standing challenges in the healthcare domain, provide specialized solutions to deal with them, and carry out a range of cognitive tasks.

Who this book is for

If you are a data scientist, machine learning engineer, or a healthcare professional who wants to implement machine learning algorithms to build smart AI applications, then this is a book for you.

Basic knowledge of Python or any programming language is expected to get the most from this book.

What this book covers

Chapter 1, Breast Cancer Detection, will show you how to import data from the UCI repository. In this chapter, we will name the columns (or features) and put them into a pandas DataFrame. We will learn how to preprocess our data and remove the ID column. We will also explore the data so that we know more about it. We will also see how to create histograms (so that we can understand the distributions of the different features) and a scatterplot matrix (so that we can look for linear relationships between the variables). We will learn how to implement some testing parameters, build a KNN classifier and an SVC, and compare their results using a classification report. Finally, we will build our own cell and explore what it would take to actually get a malignant or benign classification.
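The comparison described for Chapter 1 can be sketched roughly as follows. This is a minimal illustration rather than the chapter's own code: it assumes scikit-learn's bundled breast cancer dataset (load_breast_cancer) as a stand-in for the file imported from the UCI repository, and the split size, random seed, and number of neighbors are arbitrary choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Assumption: the dataset bundled with scikit-learn stands in for the UCI file.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the samples for testing (split size and seed are arbitrary).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=8, stratify=y
)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVC": SVC(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    scores[name] = accuracy_score(y_test, predictions)
    print(name, "accuracy:", round(scores[name], 3))
    # In this bundled dataset, label 0 is malignant and label 1 is benign.
    print(classification_report(
        y_test, predictions, target_names=["malignant", "benign"]
    ))
```

Printing both reports side by side makes it easy to compare precision and recall per class, not only overall accuracy.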

Chapter 2, Diabetes Onset Detection, covers the building of a deep neural network in Keras. We will explore the optimal hyperparameters using the scikit-learn grid search. We will also learn how to optimize a network by tuning the hyperparameters. In this chapter, we will explore how to apply the network to predict the onset of diabetes in a huge dataset of patients.
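To give a feel for the grid search idea, here is a small sketch that uses scikit-learn only. It is not the Keras workflow from the chapter: MLPClassifier is swapped in for the Keras network so that the example stays self-contained, the data is synthetic (make_classification), and the parameter grid is a hypothetical one chosen purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a patient table with 8 numeric features.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)

# Scale the inputs, then train a small neural network (not the Keras model).
pipeline = make_pipeline(
    StandardScaler(),
    MLPClassifier(max_iter=500, random_state=0),
)

# Hypothetical grid: two network shapes and two learning rates.
param_grid = {
    "mlpclassifier__hidden_layer_sizes": [(8,), (8, 4)],
    "mlpclassifier__learning_rate_init": [0.001, 0.01],
}

# Each combination is scored with 3-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))
```

The same pattern of defining a grid, cross-validating every combination, and reading off best_params_ carries over when the estimator is a wrapped Keras model.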

Chapter 3, DNA Classification, will show how to predict whether a DNA sequence from E. coli bacteria is a promoter or a non-promoter, with 96% accuracy. We will look at how to import data from a repository and how to convert textual inputs to numerical data. We will then learn to build and train classification algorithms and compare and contrast them using the classification report.
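One common way to convert a textual DNA sequence into numerical data is to one-hot encode each nucleotide. The sketch below assumes lowercase a/c/g/t input and a hypothetical helper named one_hot_encode; the chapter itself may use a different encoding route.

```python
import numpy as np

NUCLEOTIDES = "acgt"


def one_hot_encode(sequence):
    """Return a flat 0/1 vector with four positions per nucleotide."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = np.zeros((len(sequence), len(NUCLEOTIDES)), dtype=int)
    for position, base in enumerate(sequence.lower()):
        # Set the column that matches this nucleotide to 1.
        encoded[position, index[base]] = 1
    return encoded.flatten()


features = one_hot_encode("tact")
print(features)
```

Each nucleotide becomes four columns, so a sequence of length n yields a feature vector of length 4n that a classifier can consume directly.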

Chapter 4, Diagnosing Coronary Artery Disease, will show how to use sklearn and Keras, how to import data from the UCI repository using the pandas read_csv function, and how to preprocess that data. We will then learn how to describe the data and print out histograms so we know what we're working with, followed by executing a train/test split with the train_test_split function from sklearn's model_selection module.

Furthermore, we will learn how to convert class labels to one-hot encoded vectors for categorical classification and how to define simple neural networks using Keras. We will look at activation functions, such as softmax, for categorical classification with the categorical_crossentropy loss. We will also look at how we fit our model to our training data for both categorical and binary problems. Finally, we will look at how to produce a classification report and an accuracy score for our results.
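To make the pieces named above more concrete, the following NumPy sketch shows one-hot encoding of class labels, a softmax activation, and the categorical cross-entropy loss computed by hand. It is an illustration of the underlying arithmetic only, not Keras code; the labels and logits are made-up values, and the helper names are hypothetical.

```python
import numpy as np


def to_one_hot(labels, num_classes):
    """Map integer class labels to one-hot rows."""
    return np.eye(num_classes)[labels]


def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def categorical_crossentropy(y_true, y_prob, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    return float(-np.mean(np.sum(y_true * np.log(y_prob + eps), axis=1)))


# Made-up example: three samples, three classes.
labels = np.array([0, 2, 1])
logits = np.array([
    [2.0, 0.5, 0.1],
    [0.2, 0.3, 3.0],
    [0.0, 0.0, 0.0],
])

y_true = to_one_hot(labels, num_classes=3)
y_prob = softmax(logits)
loss = categorical_crossentropy(y_true, y_prob)

print(y_true)
print(y_prob.round(3))
print("loss:", round(loss, 3))
```

The third sample has identical scores for every class, so softmax assigns each class a probability of one third, which contributes a comparatively large term to the loss.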

Chapter 5, Autism Screening with Machine Learning, will show how to predict autism in patients with approximately 90% accuracy. We will also learn how to deal with categorical data; many health applications involve categorical data, and one way to handle it is by using one-hot encoded vectors. Furthermore, we will learn how to reduce overfitting using dropout regularization.
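The two ideas in this chapter summary can be sketched in a few lines. The example below assumes a tiny, made-up table of screening records (the column names are hypothetical) and uses pandas get_dummies for one-hot encoding. The dropout function is a plain NumPy illustration of inverted dropout, not the Keras Dropout layer itself.

```python
import numpy as np
import pandas as pd

# Made-up screening records; column names are hypothetical.
records = pd.DataFrame({
    "age": [4, 7, 5],
    "gender": ["m", "f", "m"],
    "jaundice": ["yes", "no", "no"],
})

# Replace each categorical column with one indicator column per category.
encoded = pd.get_dummies(records, columns=["gender", "jaundice"])
print(encoded)


def dropout(activations, rate, rng):
    """Inverted dropout: zero a fraction of units and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
hidden = np.ones((2, 4))
dropped = dropout(hidden, rate=0.25, rng=rng)
print(dropped)
```

Rescaling the surviving units by 1 / keep_prob keeps the expected activation unchanged, which is why dropout can simply be switched off at prediction time.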

To get the most out of this book

This book will help you to build real-world machine learning solutions across the healthcare vertical using NumPy, pandas, matplotlib, scikit-learn, and other libraries. You need no prior machine learning experience before exploring this book. You will become well versed in how machine learning is implemented to evaluate the efficiency of AI apps, and in how to carry out simple-to-complex healthcare analytics tasks. The book is an entry point packed with practical examples for carrying out a range of cognitive tasks. By the end of this book, you will have learned how to address long-standing challenges in the healthcare domain and produce solutions for dealing with them.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packt.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads and Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Machine-Learning-for-Healthcare-Analytics-Projects. In case there's an update to the code, it will be updated on the existing GitHub repository. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/9781789536591_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "We will then rename that file autism_detection."

A block of code is set as follows:

import sys
import pandas as pd
import sklearn
import keras
print('Python: {}'.format(sys.version))
print('Pandas: {}'.format(pd.__version__))
print('Sklearn: {}'.format(sklearn.__version__))
print('Keras: {}'.format(keras.__version__))

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

jupyter lab

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "If we go into Files, we will see all the files that we have in the directory, as shown in the following screenshot."

Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.