Protein-protein docking
(Agarwal, Edelsbrunner, Rudolph; Bespamyatnikh, Choi, Guria, Wang)

Protein-protein interactions are at the same time among the most important and the least understood biochemical phenomena. We take a geometric approach and search for configurations with maximum fit or complementarity. To establish a baseline for this approach, we implemented an exhaustive search of all possible configurations in the six-dimensional space of rigid motions (translations and rotations in three-dimensional space). The fit is measured by a score function that counts the number of atom pairs in close vicinity of each other. We established that we find the right solution in virtually every trial if
(i) the problem is a re-docking experiment with very few bumps,
(ii) we sample the space of rigid motions fine enough,

The second requirement causes computational problems because fine sampling a six-dimensional space means considering billions of configurations.

Building upon the success of the exhaustive search method, we began developing a more efficient algorithm based on biased samplings and local improvement. The biased sample of the space of rigid motions is computed from a hierarchical collection of critical points of a Morse function defined on the protein surface. The function is designed to reflect the shape by assuming extreme values at protrusions and pockets of the topography. The hierarchy is built based on a numerical estimate (the topological persistence) of the height of protrusions and the depth of pockets. We will experiment with a variety of functions, including ones that measure energy potentials and with combinations of different functions.

Given the hierarchy of critical points, we have developed an algorithm that generates a sparse set of configurations covering the high scoring region of the space of rigid motions. Our preliminary experiments in two dimensions suggest that the new method is significantly more economical in sampling than the exhaustive search and the traditional geometric hashing methods, while covering the high scoring regions reasonably well. In the second phase of the algorithm, we use an iterative technique to find a locally optimal solution in the neighborhood of each configuration returned in the first phase. At each step, we keep one protein fixed and find a rigid motion of the other protein that improves the score function, if there exists one. We then move the second protein along this computed trajectory until it almost collides with the first protein. Our initial experiments confirmed the potential of this approach.