Writing a simple bioinformatics file parser using AI
Manipulating sequences is one of the most basic tasks in bioinformatics. It covers file formatting and file type conversion, compression, and operations on the biological sequences themselves, such as reverse complementation or translation of DNA into protein. We’ll be looking at this in Chapter 5, Recipe 2, “Tools for Sequence Manipulation”. But first we’ll take a quick look at how AI is impacting bioinformatics and coding.
In the past several years, large language models (LLMs) have taken the AI world by storm. These models use machine learning to model relationships between words in text. They have become very powerful and can, among other things, write code and unit tests for you. In bioinformatics, LLMs are being used to create new proteins and even design entire genomes. For example, ProGen can generate functional proteins of many types (Madani et al., “Large Language Models generate functional protein sequences across...