You're reading from Bioinformatics with Python Cookbook. - Second Edition

Product type: Book
Published in: Nov 2018
Reading Level: Intermediate
Publisher: Packt
ISBN-13: 9781789344691
Edition: 2nd Edition
Author (1)
Tiago Antao

Tiago Antao is a bioinformatician currently working in the field of genomics. A former computer scientist, Tiago moved into computational biology with an MSc in Bioinformatics from the Faculty of Sciences at the University of Porto (Portugal) and a PhD on the spread of drug-resistant malaria from the Liverpool School of Tropical Medicine (UK). After his PhD, Tiago worked with human datasets at the University of Cambridge (UK) and with mosquito whole genome sequencing data at the University of Oxford (UK), before helping to set up the bioinformatics infrastructure at the University of Montana. He currently works as a data engineer in the biotechnology field in Boston, MA. He is one of the co-authors of Biopython, a major bioinformatics package written in Python.

Preparing a dataset for phylogenetic analysis


In this recipe, we will use DendroPy to download and prepare the dataset for our analysis. The dataset contains complete genomes of the Ebola virus.

Getting ready

We will download complete genomes from GenBank; these genomes were collected from various Ebola outbreaks, including several from the 2014 outbreak. Note that several virus species of the genus Ebolavirus cause the Ebola virus disease. The species involved in the 2014 outbreak (EBOV, formally known as the Zaire Ebola virus) is the most common; four others are also available in sequenced form. You can read more at https://en.wikipedia.org/wiki/Ebolavirus.
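To make the idea of downloading genomes from GenBank concrete, here is a minimal sketch that only builds a request URL for NCBI's E-utilities efetch endpoint using the Python standard library. This is not the recipe's own approach, which relies on DendroPy; the function name `build_efetch_url` and the accession strings are hypothetical placeholders, and you would need to substitute real GenBank accessions before fetching anything.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for retrieving records
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions, rettype="fasta"):
    """Return an efetch URL for a list of nucleotide accessions.

    Hypothetical helper for illustration; it performs no network access.
    """
    if not accessions:
        raise ValueError("at least one accession is required")
    params = {
        "db": "nuccore",              # NCBI nucleotide database
        "id": ",".join(accessions),   # comma-separated accession list
        "rettype": rettype,           # record format, for example FASTA
        "retmode": "text",            # plain-text response
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


# Placeholder accessions (assumptions, NOT real GenBank identifiers)
example_accessions = ["ACCESSION_A", "ACCESSION_B"]
example_url = build_efetch_url(example_accessions)
print(example_url)
```

Passing the resulting URL to `urllib.request.urlopen` would retrieve the records as text; keeping URL construction separate from the actual download makes the request easy to inspect before any network traffic occurs.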

If you have already gone through the previous chapters, you may panic looking at the potential data sizes involved here; this is not a problem at all because these are genomes of viruses that are around 19 kbp in size...
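A rough back-of-the-envelope calculation shows why the data size is not a concern. The sketch below assumes the approximate genome length of 19 kbp from the text; the number of genomes, the FASTA line width, and the header size are illustrative assumptions, not values taken from the recipe.

```python
GENOME_SIZE_BP = 19_000  # approximate Ebola genome length, from the text


def estimate_fasta_bytes(n_genomes, genome_size_bp=GENOME_SIZE_BP,
                         line_width=70, header_bytes=80):
    """Estimate the uncompressed size of a FASTA file in bytes.

    Assumes one byte per base, one newline per sequence line, and a
    fixed-size header line per genome (all illustrative assumptions).
    """
    seq_lines = -(-genome_size_bp // line_width)  # ceiling division
    per_genome = header_bytes + genome_size_bp + seq_lines
    return n_genomes * per_genome


n_genomes = 100  # hypothetical count, chosen only for illustration
estimated = estimate_fasta_bytes(n_genomes)
print(f"{n_genomes} genomes: about {estimated / 1e6:.2f} MB uncompressed")
```

Even with a hundred complete genomes, the estimate stays in the low megabyte range, far below the sizes typical of whole-genome sequencing datasets.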
