TY - GEN
T1 - Ensemble multiple sequence alignment via advising
AU - DeBlasio, Dan
AU - Kececioglu, John
N1 - Funding Information: We thank the reviewers for their helpful comments. This work was supported by NSF Grant IIS-1217886 to J.K., and a PhD fellowship to D.D. from NSF Grant DGE-0654435. Publisher Copyright: Copyright 2015 ACM.
PY - 2015/9/9
Y1 - 2015/9/9
N2 - The multiple sequence alignments computed by an aligner for different settings of its parameters, as well as the alignments computed by different aligners using their default settings, can differ markedly in accuracy. Parameter advising is the task of choosing a parameter setting for an aligner to maximize the accuracy of the resulting alignment. We extend parameter advising to aligner advising, which in contrast chooses among a set of aligners to maximize accuracy. In the context of aligner advising, default advising selects from a set of aligners that are using their default settings, while general advising selects both the aligner and its parameter setting. In this paper, we apply aligner advising for the first time to create a true ensemble aligner. Through cross-validation experiments on benchmark protein sequence alignments, we show that parameter advising boosts an aligner's accuracy beyond its default setting for virtually all of the standard aligners currently used in practice. Furthermore, aligner advising with a collection of aligners further improves upon parameter advising with any single aligner, though, surprisingly, default advising actually outperforms general advising on testing data because it overfits less to the training data. The new ensemble aligner that results from aligner advising is significantly more accurate than the best single default aligner, especially on hard-to-align sequences. This demonstrates how to construct a more accurate ensemble aligner out of a collection of individual aligners.
KW - Accuracy estimation
KW - Aligner advising
KW - Ensemble methods
KW - Multiple sequence alignment
KW - Parameter advising
UR - http://www.scopus.com/inward/record.url?scp=84963556566&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84963556566&partnerID=8YFLogxK
U2 - 10.1145/2808719.2808766
DO - 10.1145/2808719.2808766
M3 - Conference contribution
T3 - BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
SP - 452
EP - 461
BT - BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
PB - Association for Computing Machinery, Inc
T2 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2015
Y2 - 9 September 2015 through 12 September 2015
ER -