Using the Sequence Read Archive
We often want to retrieve raw data, such as FASTQ files, whether for testing purposes, to obtain data for a particular organism, or to work with data from publicly available experiments. The Sequence Read Archive (SRA) from the NCBI provides a huge collection of sequencing data from numerous studies, including DNA, RNA, and metagenomic data from multiple types of sequencing platforms.
Getting ready
You will want to make sure that the SRA tools and fasterq-dump are installed and in your PATH. We briefly covered this in Chapter 5, Alignment and Variant Calling Tools for Sequence Manipulation. If you have not already performed this installation, please refer back to that recipe and install the SRA Toolkit now.
If fasterq-dump is not in your PATH, you may have trouble with the code in this recipe. To make sure the SRA Toolkit and fasterq-dump are in your PATH, you can do the following:
echo 'export PATH=$PATH:~/Software/sratoolkit.3.1.1-mac-x86_64/bin' >> ~/.zshrc
source ~/.zshrc...
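Once the PATH has been updated, it can be worth confirming that the shell actually finds fasterq-dump before moving on. The following is a minimal sketch that uses the standard command -v shell builtin; the check_tool function name is a hypothetical helper introduced here for illustration and is not part of the SRA Toolkit.

```shell
# check_tool is a hypothetical helper (not part of the SRA Toolkit).
# It reports whether a command can be found through the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        # Print the resolved location so you can see which install is used
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found in PATH" >&2
        return 1
    fi
}

# Verify that fasterq-dump is reachable; print a hint if it is not
check_tool fasterq-dump || echo "Add the SRA Toolkit bin directory to your PATH and try again."
```

If the tool is reported as not found, check that the directory you appended to PATH matches the location and version of your own SRA Toolkit download.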