Primer: The envelope of sequence bioinformatics in 2022
Institut Pasteur & CNRS
This mysterious title is meant to suggest the immense realm of possibilities in sequence bioinformatics today. There is an explosion of data, and computational biology research is struggling to keep pace despite having made great advances. This primer provides a high-level survey of sequence alignment, genome assembly, and large-scale sequence search. I cover the underlying aspects of the algorithms, their strengths and weaknesses, and their capacity to scale to the petabyte regime. I also provide a mini-primer on sequence assembly, as this is another significant (and underrated) source of sensitivity limits. Taken together, the different components of the primer lay the groundwork for understanding state-of-the-art algorithms for virus discovery at the petabyte scale.