Abstract
An algorithm is presented to compute a multiple structure alignment for a set of proteins and to generate a consensus (pseudo) protein for the set. The algorithm is a heuristic in that it computes an approximation to the optimal multiple structure alignment that minimizes the sum of the pairwise distances between the protein structures. The algorithm chooses an input protein as the initial consensus and computes a correspondence between the protein structures (which are represented as sets of unit vectors) using an approach analogous to the center-star method for multiple sequence alignment. From this correspondence, a set of rotation matrices (optimal for the given correspondence) is derived to align the structures and derive the new consensus. The process is iterated until the sum of pairwise distances converges. The computation of the optimal rotations is itself an iterative process that both makes use of the current consensus and generates simultaneously a new one. This approach is based on an interesting result that allows the sum of all pairwise distances to be represented compactly as distances to the consensus. Experimental results on several protein families are presented, showing that the algorithm converges quite rapidly.
Original language | English (US) |
---|---|
Pages (from-to) | 986-1000 |
Number of pages | 15 |
Journal | Journal of Computational Biology |
Volume | 11 |
Issue number | 5 |
DOIs | |
State | Published - 2004 |
Keywords
- Center-star method
- Consensus protein
- Correspondence
- Dynamic programming
- Frobenius norm
- Rotation matrix
- Singular value decomposition
- Structure alignment
ASJC Scopus subject areas
- Modeling and Simulation
- Molecular Biology
- Genetics
- Computational Mathematics
- Computational Theory and Mathematics