Workflow Systems
Pipelines, also known as workflows, are one of the core technologies used in bioinformatics. Almost all bioinformatics tasks require more than one step, and in many cases certain steps must be parallelized and their results gathered together. These sets of ordered tasks are referred to as bioinformatics pipelines or workflows, and the software that defines and runs them is a workflow system. In bioinformatics, you can find three main types of workflow system:
- Frameworks such as Galaxy (https://usegalaxy.org) or DNAnexus (https://www.dnanexus.com/), which are geared toward end users.
- Programmatic workflow systems, which are driven through a code interface and, while generic, often originate from the bioinformatics or machine learning space. Two examples are Snakemake (https://snakemake.readthedocs.io/) and Nextflow (https://www.nextflow.io/).
- Fully generic workflow systems such as Apache Airflow (https://airflow.apache.org/), which take a less domain-specific approach to workflow management. These are especially...