Protein-protein docking
(Edelsbrunner; Agarwal, Chien, Choi, Headd, Mian, Rudolph, Wang)

We have so far focused on the bound protein docking, in which one asks for determining a motion that positions one rigid protein relative to the other into the correct docked configuration. Implementation of a fast and accurate bound docking algorithm will help develop methods that add the higher dimensionality of conformational changes seen in real docking problems.

Given proteins A and B, we use a score function that approximates the van der Waals interaction by counting the pairs of atoms that are close to each other. Our goal is to find a placement for B under rigid motions that minimizes the score, while keeping the number of collisions, the pairs of intersecting atoms between A and B, below a prespecified threshold. Our algorithm works in two stages. The first stage, called the candidate generation, identifies cavities and protrusions on the surface of each protein and generates a set of candidate placements for B by matching these features. We use combinatorial Morse theory to identify cavities and protrusions on the surface of a protein. Using the features returned by the algorithm on each surface of proteins A and B, we compute a set of placements of B at which these features match sufficiently well. We rank these placements using our score function and return a fixed number of the highest ranked placements. The second stage, called the refinement stage, chooses a subset of the top ranked candidate placements and refines each of them using an iterative local-improvement algorithm, so that the score is locally maximized and the number of collisions is low.

We measure the success of our algorithm by testing it on known structures of protein complexes and by measuring the rms distance between the native configuration of B and the configuration of B computed by the algorithm. We have tested the two-stage approach on a collection of 17 protein complexes. At the end of the first stage, our algorithm returns 100 configurations for B. We then run the local improvement algorithm on each configuration and rerank the set of output by the improved scores. For all of the 17 protein complexes, the configuration with highest score after the second stage has a low rms distance. In particular, for 10 out of the 17 cases, the rms distance of the top-scored configurations is less than 1.0A. For five of the remaining cases, it is below 1.5A, and for the last two, it is around 2.5A. For all protein complexes, including the last two cases, the local improvement algorithm significantly reduces the number of collisions and increases the score of near-native configurations.