You're reading from Neural Network Projects with Python

Product type: Book

Published in: Feb 2019

Reading level: Beginner

Publisher: Packt

ISBN-13: 9781789138900

Edition: 1st Edition

Languages: Python

Concepts: Neural Networks

Author: James Loy

Predicting Diabetes with Multilayer Perceptrons

In the first chapter, we went through the inner workings of a neural network, how to build our own neural network using Python libraries such as Keras, as well as the end-to-end machine learning workflow. In this chapter, we will apply what we have learned to build a multilayer perceptron (MLP) that can predict whether a patient is at risk of diabetes. This marks the first neural network project that we will build from scratch.

In this chapter, we will cover the following topics:

Understanding the problem that we're trying to tackle—diabetes mellitus
How AI is being used in healthcare today, and how AI will continue to transform healthcare
An in-depth analysis of the diabetes mellitus dataset, including data visualization using Python
Understanding MLPs, and the model architecture that we will use
A step-by-step guide...

Technical requirements

The key Python libraries required for this chapter are as follows:

matplotlib 3.0.2
pandas 0.23.4
Keras 2.2.4
NumPy 1.15.2
seaborn 0.9.0
scikit-learn 0.20.2
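
One way to see which of these libraries are already installed, and at which versions, is to query the package metadata. The following is a minimal sketch that uses only the standard library (Python 3.8 or later); the helper name installed_versions is hypothetical, and the distribution names are assumed to match the names used on PyPI.

```python
from importlib import metadata

# Versions listed in this chapter's technical requirements
REQUIRED = {
    "matplotlib": "3.0.2",
    "pandas": "0.23.4",
    "Keras": "2.2.4",
    "numpy": "1.15.2",
    "seaborn": "0.9.0",
    "scikit-learn": "0.20.2",
}


def installed_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# Print the installed version next to the version used in the chapter
for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: installed={version}, chapter used={REQUIRED[name]}")
```

Newer releases of these libraries may behave differently from the versions listed, so a mismatch is worth noting but is not necessarily a problem.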

To download the dataset required for this project, please refer to the instructions at https://raw.githubusercontent.com/PacktPublishing/Neural-Network-Projects-with-Python/master/Chapter02/how_to_download_the_dataset.txt.

The code for this chapter can be found in the GitHub repository for the book at https://github.com/PacktPublishing/Neural-Network-Projects-with-Python.

To download the code onto your computer, run the following git clone command:

$ git clone https://github.com/PacktPublishing/Neural-Network-Projects-with-Python.git

After the process is complete, there will be a folder titled Neural-Network-Projects-with-Python. Enter the folder by running this command:

$ cd Neural-Network...

Diabetes – understanding the problem

Diabetes is a chronic medical condition that is associated with elevated blood sugar levels in the body. Diabetes often leads to cardiovascular disease, stroke, kidney damage, and long-term damage to the limbs and eyes.

It is estimated that there are 415 million people in the world suffering from diabetes, with up to 5 million deaths every year attributed to diabetes-related complications. In the United States, diabetes is estimated to be the seventh leading cause of death. Clearly, diabetes is a cause for concern for the wellbeing of modern society.

Diabetes can be divided into two subtypes: type 1 and type 2. Type 1 diabetes results from the body's inability to produce sufficient insulin. Type 1 diabetes is relatively rare compared to type 2 diabetes, and it only accounts for approximately 5% of diabetes....

AI in healthcare

Beyond predicting diabetes using machine learning, the field of healthcare in general is ripe for disruption by AI. According to a study by Accenture, the market for AI in healthcare is set for explosive growth, with an estimated compound annual growth rate of 40% through 2021. This growth is driven by a proliferation of AI and tech companies in healthcare.

Apple's chief executive officer, Tim Cook, believes that Apple can make significant contributions in healthcare. Apple's vision for disrupting healthcare can be exemplified by its developments in wearable technology. In 2018, Apple announced a new generation of smartwatches with active monitoring of cardiovascular health. Apple's smartwatches can now conduct electrocardiography in real time, and even warn you when your heart rate becomes abnormal, which is an early sign of cardiovascular...

The diabetes mellitus dataset

The dataset that we will be using for this project is the Pima Indians Diabetes dataset, provided by the National Institute of Diabetes and Digestive and Kidney Diseases (and hosted by Kaggle).

The Pima Indians are a group of Native Americans living in Arizona, and they are a highly studied group of people due to their genetic predisposition to diabetes. It is believed that the Pima Indians carry a gene that allows them to survive long periods of starvation. This thrifty gene allowed the Pima Indians to store in their bodies whatever glucose and carbohydrates they ate, which was advantageous in an environment where famines were common.

However, as society modernized and the Pima Indians began to change their diet to one of processed food, the rate of type 2 diabetes among them began to increase as well. Today, the incidence...

Exploratory data analysis

Let's dive into the dataset to understand the kind of data we are working with. We import the dataset into pandas:

import pandas as pd

df = pd.read_csv('diabetes.csv')

Let's take a quick look at the first five rows of the dataset by calling the df.head() method:

print(df.head())

We get the following output:

It looks like there are nine columns in the dataset, which are as follows:

Pregnancies: Number of previous pregnancies
Glucose: Plasma glucose concentration
BloodPressure: Diastolic blood pressure
SkinThickness: Skin fold thickness measured from the triceps
Insulin: Blood serum insulin concentration
BMI: Body mass index
DiabetesPedigreeFunction: A summarized score that indicates the genetic predisposition of the patient for diabetes, as extrapolated from the patient's family record for diabetes
Age: Age in years
Outcome...

Data preprocessing

In the previous section, Exploratory data analysis, we discovered that certain columns contain 0 values, which indicate missing values. We also saw that the variables have different scales, which can negatively impact model performance. In this section, we will perform data preprocessing to handle these issues.
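
To make the first issue concrete, the following sketch counts the 0 values per column and marks them as NaN so that pandas treats them as missing. The sample rows and the choice of columns in ZERO_AS_MISSING are assumptions made for illustration; they are not taken from the dataset or from the book's code.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; not real records from diabetes.csv
df_demo = pd.DataFrame({
    "Glucose": [110, 0, 150, 95],
    "BloodPressure": [70, 64, 0, 80],
    "BMI": [31.2, 27.5, 24.1, 0.0],
    "Pregnancies": [3, 1, 0, 2],
})

# Assumption: a reading of 0 is physically implausible in these columns,
# whereas 0 pregnancies is a legitimate value and is left untouched.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "BMI"]

# Count the zeros in the suspect columns before converting them
zero_counts = (df_demo[ZERO_AS_MISSING] == 0).sum()
print(zero_counts)

# Replace 0 with NaN so that isnull() can detect the missing readings
df_clean = df_demo.copy()
df_clean[ZERO_AS_MISSING] = df_clean[ZERO_AS_MISSING].replace(0, np.nan)
print(df_clean.isnull().sum())
```

Before the replacement, isnull() reports nothing for this frame even though three readings are effectively missing, which is why a check for NaN values alone can be misleading.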

Handling missing values

First, let's call the isnull() method to check whether there are any missing values in the dataset:

print(df.isnull().any())

We'll see the following output:

It seems like there are no missing values in the dataset, but are we sure? Let's get a statistical summary of the dataset to investigate further:

print(df.describe())

The output is as...

MLPs

Now that we have completed exploratory data analysis and data preprocessing, let's turn our attention towards designing the neural network architecture. In this project, we will be using MLPs.

An MLP is a class of feedforward neural network, and it distinguishes itself from the single-layer perceptron that we discussed in Chapter 1, Machine Learning and Neural Networks 101, by having at least one hidden layer, with each layer activated by a non-linear activation function. This multilayer architecture and non-linear activation allow MLPs to produce non-linear decision boundaries, which is crucial in multi-dimensional real-world datasets such as the Pima Indians Diabetes dataset.
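
The role of the hidden layers and the non-linear activation can be sketched as a plain NumPy forward pass. This is an illustration under stated assumptions rather than the chapter's Keras model: the ReLU and sigmoid activations, the single output node, the random weights, and the helper name mlp_forward are all assumed here.

```python
import numpy as np


def relu(z):
    """Non-linear activation used in the hidden layers (assumed)."""
    return np.maximum(0.0, z)


def sigmoid(z):
    """Squashes the output into (0, 1) so it can be read as a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def mlp_forward(x, params):
    """Forward pass: two ReLU hidden layers followed by a sigmoid output."""
    h1 = relu(x @ params["W1"] + params["b1"])
    h2 = relu(h1 @ params["W2"] + params["b2"])
    return sigmoid(h2 @ params["W3"] + params["b3"])


rng = np.random.default_rng(0)
sizes = [8, 32, 16, 1]  # inputs, hidden 1, hidden 2, output (output size assumed)
params = {}
for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:]), start=1):
    params[f"W{i}"] = rng.normal(scale=0.1, size=(n_in, n_out))
    params[f"b{i}"] = np.zeros(n_out)

x = rng.normal(size=(5, 8))  # 5 hypothetical patients, 8 features each
probs = mlp_forward(x, params)
print(probs.shape)
```

If the relu calls were removed, the three matrix products would collapse into a single linear map, and the network could only produce a linear decision boundary.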

Model architecture

...

Model building in Python using Keras

We're finally ready to build and train our MLP in Keras.

Model building

As we mentioned in Chapter 1, Machine Learning and Neural Networks 101, the Sequential() class in Keras allows us to construct a neural network like Lego, stacking layers on top of one another.

Let's create a new instance of the Sequential() class:

from keras.models import Sequential

model = Sequential()

Next, let's stack our first hidden layer. The first hidden layer will have 32 nodes, and the input dimensions will be 8 (because there are 8 columns in X_train). Notice that for the very first hidden layer, we need to indicate the input dimensions. Subsequently, Keras will take care of the size compatibility of other hidden...
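
As a quick arithmetic check on these layer sizes, the number of trainable parameters in a fully connected layer is the number of weights (inputs times nodes) plus one bias per node. The sketch below assumes 8 inputs, hidden layers of 32 and 16 nodes, and a single output node; the output size and the helper name dense_params are assumptions.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units


layer_sizes = [8, 32, 16, 1]  # inputs, hidden 1, hidden 2, output (assumed)
counts = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]

print(counts)       # [288, 528, 17]
print(sum(counts))  # 833
```

Only the first layer needs the input dimension spelled out, because each later layer's input size is simply the previous layer's node count.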

Results analysis

Having successfully trained our MLP, let's evaluate our model based on the testing accuracy, confusion matrix, and receiver operating characteristic (ROC) curve.
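
For orientation, a confusion matrix simply counts the four possible combinations of true and predicted labels. The sketch below computes these counts with NumPy on made-up labels and probabilities; the values and the 0.5 decision threshold are assumptions for illustration, not results from the model.

```python
import numpy as np

# Hypothetical ground truth and predicted probabilities (not model output)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3])

# Turn probabilities into class labels with an assumed threshold of 0.5
y_pred = (y_prob >= 0.5).astype(int)

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives

accuracy = (tp + tn) / len(y_true)
print(tp, tn, fp, fn)  # 3 3 1 1
print(accuracy)        # 0.75
```

Repeating this count over a range of thresholds, and plotting the true positive rate against the false positive rate, is what produces a ROC curve.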

Testing accuracy

We can evaluate our model on the training set and testing set using the evaluate() function:

scores = model.evaluate(X_train, y_train)
print("Training Accuracy: %.2f%%\n" % (scores[1]*100))

scores = model.evaluate(X_test, y_test)
print("Testing Accuracy: %.2f%%\n" % (scores[1]*100))

We get the following result:

The accuracy is 91.85% and 78.57% on the training set and testing set, respectively. The difference in accuracy between the training and testing set isn't surprising since the model was trained on...

Summary

In this chapter, we have designed and implemented an MLP that is capable of predicting the onset of diabetes with ~80% accuracy.

We first performed exploratory data analysis where we looked at the distribution of each variable, as well as the relationship between each variable and the target variable. We then performed data preprocessing to remove missing data and we also standardized our data such that each variable has a mean of 0 with unit standard deviation. Finally, we split our original data randomly into a training set, a validation set, and a testing set.
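
As a reminder of what standardization and a three-way split look like in practice, here is a minimal NumPy sketch. The random feature matrix, the 60/20/20 proportions, and the variable names are assumptions for illustration and are not the chapter's actual code.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical feature matrix: 100 samples, 8 features on very different scales
X = rng.normal(loc=[3, 120, 70, 20, 80, 32, 0.5, 33],
               scale=[3, 30, 12, 10, 100, 7, 0.3, 11],
               size=(100, 8))
y = rng.integers(0, 2, size=100)

# Standardize each column to a mean of 0 and unit standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Shuffle the row indices, then cut them into 60% / 20% / 20% (assumed split)
idx = rng.permutation(len(X_std))
n_train, n_val = 60, 20
train_idx, val_idx, test_idx = np.split(idx, [n_train, n_train + n_val])

X_train, y_train = X_std[train_idx], y[train_idx]
X_val, y_val = X_std[val_idx], y[val_idx]
X_test, y_test = X_std[test_idx], y[test_idx]

print(X_train.shape, X_val.shape, X_test.shape)
```

A common alternative is to compute the mean and standard deviation on the training portion only and reuse them for the other two sets, so that no information from the held-out data influences the scaling.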

We then looked at the architecture of the MLP that we used, which consists of 2 hidden layers, with 32 nodes in the first hidden layer and 16 nodes in the second hidden layer. We then implemented this MLP in Keras using the sequential model, which allows us to stack layers on one another. We then trained our MLP...

Questions

How do we plot a histogram of each variable in a pandas DataFrame, and why are histograms useful?

We can plot a histogram by calling the df.hist() method of the pandas DataFrame class. A histogram provides an accurate representation of the distribution of our numerical data.

How do we check for missing values (NaN values) in a pandas DataFrame?

We can call the df.isnull().any() function to easily check whether there are any null values in each column of the dataset.

Besides NaN values, what other kinds of missing values could appear in a dataset?

Missing values can also appear in the form of 0 values. Missing values are often recorded as 0 in a dataset due to certain issues during data collection—perhaps the equipment was faulty, or there are other issues hindering data collection.

Why is it crucial to remove missing values in a dataset before...


Author (1)

James Loy

James Loy has more than five years of expert experience in data science in the finance and healthcare industries. He has worked with the largest bank in Singapore to drive innovation and improve customer loyalty through predictive analytics. He also has experience in the healthcare sector, where he applied data analytics to improve decision-making in hospitals. He has a master's degree in computer science from Georgia Tech, with a specialization in machine learning. His research interests include deep learning and applied machine learning, as well as developing computer-vision-based AI agents for automation in industry. He writes on Towards Data Science, a popular machine learning website with more than 3 million views per month.