TY - JOUR
T1 - Composite motifs integrating multiple protein structures increase sensitivity for function prediction.
AU - Chen, Brian Y.
AU - Bryant, Drew H.
AU - Cruess, Amanda E.
AU - Bylund, Joseph H.
AU - Fofanov, Viacheslav Y.
AU - Kristensen, David M.
AU - Kimmel, Marek
AU - Lichtarge, Olivier
AU - Kavraki, Lydia E.
PY - 2007
Y1 - 2007
AB - The study of disease often hinges on the biological function of proteins, but determining protein function is a difficult experimental process. To minimize duplicated effort, algorithms for function prediction seek characteristics indicative of possible protein function. One approach is to identify substructural matches of geometric and chemical similarity between motifs representing known active sites and target protein structures with unknown function. In earlier work, statistically significant matches of certain effective motifs have identified functionally related active sites. Effective motifs must be carefully designed to maintain similarity to functionally related sites (sensitivity) and avoid incidental similarities to functionally unrelated protein geometry (specificity). Existing motif design techniques use the geometry of a single protein structure. Poor selection of this structure can limit motif effectiveness if the selected functional site lacks similarity to functionally related sites. To address this problem, this paper presents composite motifs, which combine structures of functionally related active sites to potentially increase sensitivity. Our experimentation compares the effectiveness of composite motifs with simple motifs designed from single protein structures. On six distinct families of functionally related proteins, leave-one-out testing showed that composite motifs had sensitivity comparable to the most sensitive of all simple motifs and specificity comparable to the average simple motif. On our data set, we observed that composite motifs simultaneously capture variations in active site conformation, diminish the problem of selecting motif structures, and enable the fusion of protein structures from diverse data sources.
UR - http://www.scopus.com/inward/record.url?scp=38449083700&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38449083700&partnerID=8YFLogxK
DO - 10.1142/9781860948732_0035
M3 - Article
C2 - 17951837
SN - 1752-7791
VL - 6
SP - 343
EP - 355
JO - Computational systems bioinformatics / Life Sciences Society. Computational Systems Bioinformatics Conference
JF - Computational systems bioinformatics / Life Sciences Society. Computational Systems Bioinformatics Conference
ER -