TY - JOUR
T1 - Toward a Multi-Representational Approach to Prediction and Understanding, in Support of Discovery in Hydrology
AU - De la Fuente, Luis A.
AU - Gupta, Hoshin V.
AU - Condon, Laura E.
N1 - Funding Information: This publication is the product of research done by De la Fuente (2021) to satisfy the requirements for obtaining a Master of Science degree in Hydrology while being funded by the Agencia Nacional de Investigacion y Desarrollo (Chile) through “Beca de Magíster en el Extranjero, Becas Chile en Áreas Prioritarias, Convocatoria 2018.” Gupta acknowledges partial support from the Australian Research Council (ARC) through the Centre of Excellence for Climate Extremes Grant CE170100023. Condon acknowledges partial support from NSF Early Career Award Grant 1945195. Publisher Copyright: © 2022. The Authors.
PY - 2023/1
Y1 - 2023/1
N2 - Key to model development is the selection of an appropriate representational system, including both the representation of what is observed (the data), and the formal mathematical structure used to construct the input-state-output mapping. These choices are critical, because they completely determine the questions we can ask, the nature of the analyses and inferences we can perform, and the answers we can obtain. Accordingly, a representation that is suitable for one kind of investigation might be limited in its ability to support some other kind. Arguably, how different representational approaches affect what we can learn from data is poorly understood. This paper explores three representational strategies as vehicles for understanding how catchment-scale hydrological processes vary across hydro-geo-climatologically diverse Chile. Specifically, we test a lumped water-balance model (GR4J), a data-based dynamical systems model (LSTM), and a data-based regression tree model (Random Forest). Insights were obtained regarding system memory encoded in data, spatial transferability by use of surrogate attributes, and informational deficiencies of the data set that limit our ability to learn an adequate input-output relationship. As expected, each approach exhibits specific strengths, with LSTM providing the best characterization of dynamics, GR4J being the most robust under informationally deficient conditions, and the Random Forest regression-tree method being most supportive of interpretation. Overall, the contrasting nature of the three approaches suggests the value of adopting a multi-representational framework to more fully extract information from the data and, by doing so, find information that better facilitates the goals of robust prediction and improved understanding, ultimately supporting enhanced scientific discovery.
AB - Key to model development is the selection of an appropriate representational system, including both the representation of what is observed (the data), and the formal mathematical structure used to construct the input-state-output mapping. These choices are critical, because they completely determine the questions we can ask, the nature of the analyses and inferences we can perform, and the answers we can obtain. Accordingly, a representation that is suitable for one kind of investigation might be limited in its ability to support some other kind. Arguably, how different representational approaches affect what we can learn from data is poorly understood. This paper explores three representational strategies as vehicles for understanding how catchment-scale hydrological processes vary across hydro-geo-climatologically diverse Chile. Specifically, we test a lumped water-balance model (GR4J), a data-based dynamical systems model (LSTM), and a data-based regression tree model (Random Forest). Insights were obtained regarding system memory encoded in data, spatial transferability by use of surrogate attributes, and informational deficiencies of the data set that limit our ability to learn an adequate input-output relationship. As expected, each approach exhibits specific strengths, with LSTM providing the best characterization of dynamics, GR4J being the most robust under informationally deficient conditions, and the Random Forest regression-tree method being most supportive of interpretation. Overall, the contrasting nature of the three approaches suggests the value of adopting a multi-representational framework to more fully extract information from the data and, by doing so, find information that better facilitates the goals of robust prediction and improved understanding, ultimately supporting enhanced scientific discovery.
KW - GR4J
KW - LSTM
KW - Random Forest
KW - catchments
KW - conceptual model
KW - discovery
KW - hydro-geo-climatology
KW - hydrological processes
KW - lumped water balance model
KW - machine learning
KW - representation
KW - understanding
UR - http://www.scopus.com/inward/record.url?scp=85147089935&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147089935&partnerID=8YFLogxK
U2 - 10.1029/2021WR031548
DO - 10.1029/2021WR031548
M3 - Article
SN - 0043-1397
VL - 59
JO - Water Resources Research
JF - Water Resources Research
IS - 1
M1 - e2021WR031548
ER -