Physical process-based hydrological models are widely adopted to simulate water quantity or quality. One of the most commonly used hydrological models is the Soil and Water Assessment Tool (SWAT). A SWAT model for a large watershed can contain tens of thousands of Hydrologic Response Units (HRUs), which demands considerable computational resources. One way to speed up applications of the SWAT model is to leverage machine learning techniques to identify the features that are crucial for the prediction task, i.e., feature selection. However, the majority of feature selection techniques rely on correlations or some form of score metric (e.g., mutual information). Since correlation does not imply causation, it is important to identify the causal features in order to improve prediction accuracy while enhancing the interpretability of machine learning models. The SWAT model, however, uses multiple data inputs and features that typically vary across space (HRUs) but may or may not vary over time, which makes it difficult to apply causal discovery models directly to infer the causal relations. Furthermore, because no ground-truth causal graph exists for the SWAT model, it is difficult to assess the validity of the learned causal relations. To overcome these problems, we propose a novel framework that first infers the causal relations at the daily scale of the SWAT data using causal discovery algorithms. Then, it uses a community detection module to group similar features together for better interpretability. Finally, it identifies the stable causal relations that appear most often across all the timesteps and leverages them for the prediction of water quantity. Using only the causal features to predict the target variable can lead to high accuracy because it removes the reliance on spurious correlations. Furthermore, we conduct extensive experiments to validate the effectiveness of the proposed framework, along with a real-world case study to evaluate whether the selected features are interpretable.
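
The final step described in the abstract, keeping the causal relations that recur most often across timesteps, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the causal discovery algorithm, the stability criterion, or the feature names, so the `min_fraction` threshold, the function names, and the example features (`precip`, `soil_moisture`, `temp`, `flow`) are all assumptions made for illustration. The sketch assumes that some causal discovery algorithm has already produced one set of directed edges per day.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect); hypothetical representation


def stable_edges(daily_graphs: Iterable[Set[Edge]],
                 min_fraction: float = 0.8) -> Dict[Edge, float]:
    """Keep edges present in at least `min_fraction` of the daily graphs.

    `min_fraction` is an assumed stability threshold, not a value from the paper.
    Returns a mapping from each retained edge to its observed frequency.
    """
    graphs = list(daily_graphs)
    if not graphs:
        return {}
    n = len(graphs)
    # Each daily graph is a set, so an edge is counted at most once per day.
    counts = Counter(edge for graph in graphs for edge in graph)
    return {edge: c / n for edge, c in counts.items() if c / n >= min_fraction}


def causal_features(edges: Iterable[Edge], target: str) -> List[str]:
    """Return the sorted direct causes of `target` among the given edges."""
    return sorted({cause for cause, effect in edges if effect == target})


# Made-up daily graphs standing in for the output of a causal discovery run.
daily = [
    {("precip", "flow"), ("soil_moisture", "flow")},
    {("precip", "flow"), ("temp", "flow")},
    {("precip", "flow"), ("soil_moisture", "flow")},
    {("precip", "flow"), ("soil_moisture", "flow"), ("temp", "soil_moisture")},
]

stable = stable_edges(daily, min_fraction=0.75)
selected = causal_features(stable, target="flow")
print(selected)
```

In this toy input, `temp -> flow` appears on only one of four days and is dropped, while the two edges that recur on at least three days are kept as candidate predictors of the target. A frequency threshold is only one possible way to define stability; the paper may use a different criterion.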

Original language: English (US)
Title of host publication: Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
Editors: Shusaku Tsumoto, Yukio Ohsawa, Lei Chen, Dirk Van den Poel, Xiaohua Hu, Yoichi Motomura, Takuya Takagi, Lingfei Wu, Ying Xie, Akihiro Abe, Vijay Raghavan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 10
ISBN (Electronic): 9781665480451
State: Published - 2022
Event: 2022 IEEE International Conference on Big Data, Big Data 2022 - Osaka, Japan
Duration: Dec 17 2022 to Dec 20 2022



Keywords

  • SWAT models
  • causal discovery
  • feature selection
  • hydrological systems
  • neural networks

ASJC Scopus subject areas

  • Modeling and Simulation
  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Control and Optimization


Title: Causal Discovery for Feature Selection in Physical Process-Based Hydrological Systems
