Introducing Galaxy
In this recipe, we’ll learn about Galaxy, a popular public workflow system that even non-technical users can work with. But before we dive into Galaxy, let’s go over some basic workflow-system concepts and terminology that we’ll be using throughout this chapter.
A workflow is typically defined as a series of tasks, which are jobs that need to be run. When one task has to finish before another can begin, we call this a dependency. In many cases, a task can be parallelized to save time. For example, we may want to break a FASTA file into many pieces (the scatter step), annotate each piece independently, and then combine the results (the gather step). This pattern is typically called scatter-gather.
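To make the scatter-gather idea concrete, here is a minimal sketch in plain Python. It is only an illustration, not how Galaxy or any other workflow system implements the pattern. The function names (`parse_fasta`, `scatter`, `annotate`, `gather`) are hypothetical, the "annotation" is a stand-in that simply computes GC content, and a thread pool stands in for the job scheduler that a real workflow engine would provide.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def scatter(records, n_pieces):
    """Split the records into at most n_pieces non-empty pieces."""
    pieces = [records[i::n_pieces] for i in range(n_pieces)]
    return [piece for piece in pieces if piece]


def annotate(piece):
    """Hypothetical stand-in for a real annotation step: GC content per record."""
    result = {}
    for header, seq in piece:
        gc = sum(seq.upper().count(base) for base in "GC")
        result[header] = gc / len(seq) if seq else 0.0
    return result


def gather(partial_results):
    """Merge the per-piece results into a single dictionary."""
    merged = {}
    for partial in partial_results:
        merged.update(partial)
    return merged


def scatter_gather(fasta_text, n_pieces=2):
    records = parse_fasta(fasta_text)
    pieces = scatter(records, n_pieces)        # scatter
    with ThreadPoolExecutor() as pool:         # independent tasks run concurrently
        partial_results = list(pool.map(annotate, pieces))
    return gather(partial_results)             # gather depends on every annotate task


# Tiny made-up input, used only for illustration.
EXAMPLE_FASTA = """>seq1
ACGT
>seq2
GGCC
>seq3
ATAT
"""

if __name__ == "__main__":
    print(scatter_gather(EXAMPLE_FASTA))
```

Note that the gather step cannot start until every annotate task has finished, which is exactly the kind of dependency described above. In a workflow system, we would declare these steps and their dependencies, and the engine would take care of scheduling them.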
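We typically represent these dependencies in a workflow as a directed acyclic graph (DAG), in which each node is a task and each edge points from a task to one that depends on it. Because the graph has no cycles, there is always a valid order in which to run the tasks. Take a look at this illustrative DAG: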
Figure 15.1 – Example of a directed acyclic graph in bioinformatics
Let’s imagine that we want...