Max Planck Institute for Evolutionary Biology, Plön
"Genome Comparison using Suffix Arrays"
When comparing two or more genomes, the first step is usually to align the sequences so that the number of differences is minimized. However, computing alignments is often the rate-limiting step in genome analysis. We have therefore been working on replacing the alignment with a suffix array as the central data structure. A suffix array is an alphabetical ordering of all suffixes of a text. As in a book index, look-up times in a suffix array are essentially independent of the text length. During the past 15 years, great progress has been made in the computation of suffix arrays. In my presentation I survey some of this progress, explain how suffix arrays work in practice, and demonstrate their application in genome analysis software we have developed over the last few years.
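
To make the definition concrete, here is a minimal Python sketch of a suffix array and a look-up on it. It is an illustration only, not the software mentioned in the abstract: the function names and the toy sequence are assumptions, the construction is the naive sort-all-suffixes approach rather than any of the fast algorithms surveyed in the talk, and the look-up is a plain binary search, whose cost grows logarithmically with the text length.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text`, sorted alphabetically.

    Naive construction for illustration: each comparison may scan a whole
    suffix, so this is far slower than the specialised algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return all start positions of `pattern` in `text`, using suffix array `sa`."""
    m = len(pattern)

    # Binary search for the first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Binary search for the first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with the pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    genome = "ACGTACGA"  # hypothetical toy sequence
    sa = build_suffix_array(genome)
    print(sa)
    print(find_occurrences(genome, sa, "ACG"))
```

Because the suffixes are sorted, all suffixes that begin with a given query occupy one contiguous block of the array, which is why two binary searches are enough to find every occurrence.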