Machine Learning for Healthcare Analytics Projects


Product type Book
Published in Oct 2018
Publisher Packt
ISBN-13 9781789536591
Pages 134 pages
Edition 1st Edition

DNA Classification

In this chapter, we will explore the world of bioinformatics. We will use Markov models, k-nearest neighbors, support vector machines, and other common classifiers to classify short E. coli DNA sequences. For this project, we will use a dataset from the UCI Machine Learning Repository that contains 106 DNA sequences, each 57 nucleotides long. You will learn how to import data from the UCI repository, convert text input to numerical data, build and train classification algorithms, and compare and contrast the resulting classifiers.

We will cover the following topics:

  • Classifying DNA sequences
  • Data preprocessing
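To make the idea of converting text input to numerical data concrete, here is a minimal sketch of one common approach, one-hot encoding, in which each nucleotide becomes four 0/1 indicator values. The helper name `one_hot_encode` and the example sequence are hypothetical; the encoding steps the chapter itself uses are not shown in this excerpt and may differ.

```python
# Hypothetical helper for illustration; not taken from the chapter's own code.
NUCLEOTIDES = "acgt"


def one_hot_encode(sequence):
    """Return a flat list of 0/1 values, four per nucleotide (order: a, c, g, t)."""
    encoded = []
    for base in sequence.lower():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected character: {base!r}")
        # Exactly one of the four indicator values is 1 for each nucleotide.
        encoded.extend(1 if base == candidate else 0 for candidate in NUCLEOTIDES)
    return encoded


example = "tactag"  # made-up sequence, not a row from the dataset
features = one_hot_encode(example)
print(len(features))  # 6 nucleotides x 4 indicators = 24
```

Under this scheme, a sequence of 57 nucleotides would yield 57 x 4 = 228 numeric features, which a classifier can consume directly.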

Classifying DNA sequences

Let's classify DNA sequences by performing the following steps:

  1. Start a new Python Jupyter Notebook and name it DNA Classification. As always, one of the first steps is to import the libraries and modules that we need and print their versions. This confirms that we are all working with the same setup and that every module imports without errors. If an import fails, we can go back and install the missing package.

Take a look at the following code snippet:

import sys
import numpy
import sklearn
import pandas

print('Python: {}'.format(sys.version))
print('Numpy: {}'.format(numpy.__version__))
print('Sklearn: {}'.format(sklearn.__version__))
print('Pandas: {}'.format(pandas.__version__))

The preceding code snippet will import all of the necessary libraries, and will indicate which...
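If one of the packages is missing, a plain import statement stops the notebook cell at the first failure. As a variation on the version check, the following sketch reports every missing package in one pass. The function name `check_modules` is hypothetical; the only library call it relies on is the standard library's `importlib.import_module`.

```python
import importlib

# Third-party modules imported in the snippet above.
REQUIRED = ["numpy", "sklearn", "pandas"]


def check_modules(names):
    """Map each module name to its version string, or None if the import fails."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Not every module defines __version__, so fall back to a placeholder.
            found[name] = getattr(module, "__version__", "unknown")
    return found


for name, version in check_modules(REQUIRED).items():
    status = version if version is not None else "MISSING (install it and re-run)"
    print(f"{name}: {status}")
```

Any package reported as missing can then be installed before continuing with the rest of the notebook.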

Summary

In this chapter, we were able to predict, with 96% accuracy, whether a short sequence of E. coli DNA was a promoter or a non-promoter. We looked at how to import data from a repository and how to convert textual input to numerical data. We then built and trained classification algorithms, and compared and contrasted them using the classification report.
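As a reminder of what a classification report measures, the sketch below computes precision, recall, and F1 score for a single class in plain Python. These are the same per-class quantities that scikit-learn's `classification_report` prints. The function name `per_class_metrics` and the labels are made up for illustration; they are not results from the chapter.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up labels for illustration only.
y_true = ["promoter", "promoter", "non-promoter", "non-promoter"]
y_pred = ["promoter", "non-promoter", "non-promoter", "non-promoter"]

for label in ("promoter", "non-promoter"):
    precision, recall, f1 = per_class_metrics(y_true, y_pred, label)
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Comparing these figures across classifiers gives a fuller picture than accuracy alone, because a model can be accurate overall while still missing most examples of one class.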

In the next chapter, we will learn about diagnosing coronary artery disease.
