TY - GEN
T1 - Linking biological databases semantically for knowledge discovery
AU - Ram, Sudha
AU - Zhang, Kunpeng
AU - Wei, Wei
N1 - Funding Information: This research is supported in part by research grants #EF0735191 and #IIS0455993 from the National Science Foundation, USA.
PY - 2008
Y1 - 2008
AB - Many important life sciences questions are aimed at studying the relationships and interactions between biological functions/processes and biological entities such as genes. The answers may be found by examining diverse types of biological/genomic databases. Finding these answers, however, requires accessing and retrieving data from diverse biological data sources. More importantly, sophisticated knowledge discovery processes involve traversing large numbers of inherent links among various data sources. Currently, the links among data are either implemented as hyperlinks without explicitly indicating their meanings and labels, or hidden in a seemingly simple text format. Consequently, biologists spend numerous hours identifying potentially useful links and following each lead manually, which is time-consuming and error-prone. Our research is aimed at constructing semantic relationships among all biological entities. We have designed a semantic model to categorize and formally define the links. By incorporating ontologies such as the Gene Ontology or the Sequence Ontology, we propose techniques to analyze the links embedded within and among data records, to explicitly label their semantics, and to facilitate link traversal, querying, and data sharing. Users may then ask complicated and ad hoc questions and even design their own workflows to support their knowledge discovery processes. In addition, we have performed an empirical analysis to demonstrate that our method can not only improve the efficiency of querying multiple databases, but also yield more useful information.
KW - Conceptual modeling
KW - Ontology
KW - Semantics
UR - http://www.scopus.com/inward/record.url?scp=70350686526&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70350686526&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-87991-6_4
DO - 10.1007/978-3-540-87991-6_4
M3 - Conference contribution
SN - 3540879900
SN - 9783540879909
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 22
EP - 32
BT - Advances in Conceptual Modeling - Challenges and Opportunities - ER 2008 Workshops CMLSA, ECDM, FP-UML, M2AS, RIGiM, SeCoGIS, WISM, Proceedings
PB - Springer-Verlag
T2 - 27th International Conference on Conceptual Modeling, ER 2008 Workshops: CMLSA, ECDM, FP-UML, M2AS, RIGiM, SeCoGIS, WISM
Y2 - 20 October 2008 through 23 October 2008
ER -