Preparing a dataset for phylogenetic analysis
In this recipe, we will download and prepare the dataset used for our analysis. The dataset contains complete genomes of the Ebola virus. We will use DendroPy (https://jeetsukumaran.github.io/DendroPy/), a Python library for phylogenetic computing, to download and prepare the data. DendroPy can read and write phylogenetic data in popular formats such as Newick, NEXUS, and PHYLIP, and it can also generate and compare phylogenetic trees. Here, we will first use DendroPy to download and format several Ebola genomes. We will then create FASTA files, which will be used throughout the recipes in this chapter to examine the phylogenetic relationships between the different species of Ebola. Along the way, we will learn about DendroPy's DnaCharacterMatrix class, a container class for storing and manipulating sequences. Next, we’ll see how to extract a subset of genes from the alignment and...