Multiple sequence alignments
(Agarwal; Kolodny)

Multiple sequence alignment is one of the central problems in computational biology and has been widely studied. For a given set of k sequences, each of length at most n, the best known algorithm can find a globally optimal solution in time n to the k-th power. The problem is known to be NP-complete and a few approximation algorithms are known that guarantee approximation ratios of 2 minus a constant times 1/k in the worst case. Several practical heuristics with various advantages and disadvantages have been proposed, many of which are based on progressive application of a pairwise sequence-alignment algorithm. These approaches tend to produce a locally optimal solution.

We are investigating an approach that works well when each pair of sequences are either very similar to each other or quite different and if they are similar then their optimal alignment has only a few gaps. These assumptions hold for typical protein sequences, and the lower bound constructions do not extend under these assumptions. Using geometric techniques we are developing a compact representation of the dynamic programming table, which can be computed efficiently. The efficiency of the algorithm will depend on how well the pairs of sequences align, and one can achieve tradeoff between efficiency and quality of the solution.