TY - GEN
T1 - Applications of Machine Learning Techniques in Genetic Circuit Design
AU - Zhu, Jiajie
AU - Zhang, Qi
AU - Forouraghi, Babak
AU - Wang, Xiao
N1 - Funding Information: The biological experiments in this study (X. Wang) were financially supported by a grant from the National Institutes of Health (R01-GM131405). Publisher Copyright: © 2021 ACM.
PY - 2021/2/26
Y1 - 2021/2/26
N2 - Construction of mathematical models to investigate genetic circuit design is a powerful technique in synthetic biology with real-world applications in biomanufacturing and biosensing. The challenge of building such models is to accurately discover the flow of information in simple as well as complex biological systems. However, building synthetic biological models is often a time-consuming process with relatively low prediction accuracy for highly complex genetic circuits. The primary goal of this study was to investigate the utility of various machine learning (ML) techniques to accurately construct mathematical models for predicting gene expression in genetic circuit designs. Specifically, classification and regression models were built using Random Forest (RF), Support Vector Machines (SVM), and Artificial Neural Networks (ANN). The regression models built with RF and ANN yielded R² scores of 0.97 and 0.95, respectively, compared to the best score of 0.63 obtained in an earlier study. Furthermore, a classifier model was built using the green fluorescent protein (GFP) measurements obtained from the experiments conducted in this work. Biologists use GFP as an indicator of gene expression, enabling easy measurement of its protein level in living cells. The measured GFP values were predicted with 100% accuracy by both RF and ANN classifier models while identifying various synthetic gene circuit patterns. The paper also highlights the importance of the relevant data preparation techniques to ensure high accuracy is obtained by the utilized ML models.
AB - Construction of mathematical models to investigate genetic circuit design is a powerful technique in synthetic biology with real-world applications in biomanufacturing and biosensing. The challenge of building such models is to accurately discover the flow of information in simple as well as complex biological systems. However, building synthetic biological models is often a time-consuming process with relatively low prediction accuracy for highly complex genetic circuits. The primary goal of this study was to investigate the utility of various machine learning (ML) techniques to accurately construct mathematical models for predicting gene expression in genetic circuit designs. Specifically, classification and regression models were built using Random Forest (RF), Support Vector Machines (SVM), and Artificial Neural Networks (ANN). The regression models built with RF and ANN yielded R² scores of 0.97 and 0.95, respectively, compared to the best score of 0.63 obtained in an earlier study. Furthermore, a classifier model was built using the green fluorescent protein (GFP) measurements obtained from the experiments conducted in this work. Biologists use GFP as an indicator of gene expression, enabling easy measurement of its protein level in living cells. The measured GFP values were predicted with 100% accuracy by both RF and ANN classifier models while identifying various synthetic gene circuit patterns. The paper also highlights the importance of the relevant data preparation techniques to ensure high accuracy is obtained by the utilized ML models.
KW - Synthetic gene circuit design
KW - gene expression
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85109212522&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85109212522&partnerID=8YFLogxK
U2 - 10.1145/3457682.3457683
DO - 10.1145/3457682.3457683
M3 - Conference contribution
T3 - ACM International Conference Proceeding Series
SP - 1
EP - 7
BT - 2021 13th International Conference on Machine Learning and Computing, ICMLC 2021
PB - Association for Computing Machinery
T2 - 2021 13th International Conference on Machine Learning and Computing, ICMLC 2021
Y2 - 26 February 2021 through 1 March 2021
ER -