| Most methods for protein structure prediction produce
a large set of decoy structures, which are then scored in an attempt to
find the one that is nearest the native structure. An ideal scoring function
would rank the native structure above all decoys. The fitness of a scoring
function is often measured by its Z-score, which is the number of standard
deviations that the native structure's score lies above the average score
of an entire set of decoy structures. One scoring method that considers
packing interactions is the Delaunay-based four-body statistical potential.
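The Z-score defined above can be computed in a few lines. This is a minimal sketch, not code from this work. It assumes, as the text does, that higher scores are better. It also uses the population standard deviation of the decoy scores; the sample standard deviation is an equally common choice. The name `z_score` is illustrative.

```python
from statistics import mean, pstdev
from typing import Sequence


def z_score(native_score: float, decoy_scores: Sequence[float]) -> float:
    """Number of standard deviations the native score lies above the decoy mean.

    Assumes higher scores are better and uses the population standard
    deviation of the decoy scores (an assumption; some studies use stdev).
    """
    if len(decoy_scores) < 2:
        raise ValueError("need at least two decoy scores")
    sigma = pstdev(decoy_scores)
    if sigma == 0:
        raise ValueError("decoy scores have zero spread")
    return (native_score - mean(decoy_scores)) / sigma


# Made-up scores for illustration only; these are not data from this work.
print(z_score(10.0, [2.0, 4.0, 6.0, 8.0]))
```

A larger value means the native structure stands out more clearly from the decoy ensemble.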
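In 3D, the Delaunay tessellation of a point set divides its convex hull into
tetrahedra whose circumspheres contain no other points of the set; its edges,
triangles and tetrahedra thus give a geometric criterion for which points are
neighbors. Tropsha's lab at UNC represents each residue in a protein by a single
point, usually the Cα atom or the side-chain centroid, computes the Delaunay
tessellation of these points, and then characterizes the tetrahedra by their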
amino acid content and their primary-structure topology. They examine
the statistics of tetrahedra in a large set of known structures and score
each kind of tetrahedron by how far its observed frequency deviates from
the frequency expected at random, effectively answering the question of
which amino acids strongly prefer to be neighbors. Tropsha's lab seeks to expand this
potential to capture more of the information in the training set by dividing
tetrahedra into finer categories, but this would require more training
structures than are available at high resolution and low mutual sequence
identity. We investigated an alternative formulation of the
Delaunay statistical potential based on triangles. This enabled us to
add a further geometric description of buriedness using the existing
training set. Used in tandem with the existing four-body potential, this
three-body potential improves the Z-score for decoy discrimination by 10%.
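The scoring steps summarized above can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation. All function names below are hypothetical.

The sketch makes four assumptions:

* **Input.** Tetrahedra are given as quadruples of residue indices. In practice these could come from the `simplices` attribute of `scipy.spatial.Delaunay` built on Cα coordinates.
* **Scoring.** Each simplex composition is scored with a common log-odds form: the log of its observed frequency divided by the frequency expected from individual residue frequencies. The expected frequency includes a combinatorial factor for repeated residue types.
* **Topology.** `chain_adjacent_pairs` is a simplified stand-in for the primary-structure topology classes.
* **Buriedness.** The text does not specify the descriptor, so `triangle_buriedness` uses a hypothetical proxy. A triangle shared by two tetrahedra is labeled buried; a triangle on the convex hull is labeled surface.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Sequence, Tuple

Simplex = Tuple[int, ...]
Composition = Tuple[str, ...]


def composition(simplex: Sequence[int], sequence: str) -> Composition:
    """Amino acid content of a simplex as a sorted tuple (vertex order ignored)."""
    return tuple(sorted(sequence[i] for i in simplex))


def chain_adjacent_pairs(simplex: Sequence[int]) -> int:
    """Hypothetical topology feature: vertex pairs that are consecutive in the chain."""
    return sum(1 for i, j in combinations(simplex, 2) if abs(i - j) == 1)


def log_odds_scores(
    compositions: Iterable[Composition], residue_freq: Dict[str, float]
) -> Dict[Composition, float]:
    """Score each observed composition by log(f_obs / p_exp).

    f_obs is the composition's share of all simplices passed in. p_exp is
    C * prod(a_i), where a_i is the overall frequency of residue type i and
    C = k! / prod(t_i!) counts distinct orderings of a k-residue composition.
    Works for tetrahedra (k = 4) and triangles (k = 3) alike.
    """
    counts = Counter(compositions)
    total = sum(counts.values())
    scores = {}
    for comp, n in counts.items():
        multiplicity = math.factorial(len(comp))
        for t in Counter(comp).values():
            multiplicity //= math.factorial(t)
        p_exp = multiplicity * math.prod(residue_freq[aa] for aa in comp)
        scores[comp] = math.log((n / total) / p_exp)
    return scores


def triangle_buriedness(tetrahedra: Iterable[Simplex]) -> Dict[Tuple[int, int, int], str]:
    """Hypothetical buriedness proxy for each Delaunay triangle.

    In a Delaunay tessellation an interior triangle is shared by exactly two
    tetrahedra, while a triangle on the convex hull belongs to only one.
    """
    incidence = Counter(
        tuple(sorted(face)) for tet in tetrahedra for face in combinations(tet, 3)
    )
    return {face: ("buried" if n == 2 else "surface") for face, n in incidence.items()}


def structure_score(
    tetrahedra: Iterable[Simplex], sequence: str, scores: Dict[Composition, float]
) -> float:
    """Sum of four-body scores; unseen compositions contribute 0 (an assumption)."""
    return sum(scores.get(composition(t, sequence), 0.0) for t in tetrahedra)


# Toy, hypothetical data: two tetrahedra sharing the face (1, 2, 3).
tets = [(0, 1, 2, 3), (1, 2, 3, 4)]
seq = "ABABA"
freqs = {"A": 0.5, "B": 0.5}
four_body = log_odds_scores([composition(t, seq) for t in tets], freqs)
print(four_body, structure_score(tets, seq, four_body))

# Three-body scores computed separately within the "buried" class of triangles.
labels = triangle_buriedness(tets)
buried = [composition(f, seq) for f, lab in labels.items() if lab == "buried"]
print(log_odds_scores(buried, freqs))
```

Scoring triangle compositions separately within each buriedness class is one way to add geometric categories without subdividing tetrahedra further. Each triangle involves only three residues, so each category should be populated from fewer training structures than a finer tetrahedron category would need.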
|