TY - JOUR
T1 - A self-supervised learning-based approach to clustering multivariate time-series data with missing values (SLAC-Time)
T2 - An application to TBI phenotyping
AU - Ghaderi, Hamid
AU - Foreman, Brandon
AU - Nayebi, Amin
AU - Tipirneni, Sindhu
AU - Reddy, Chandan K.
AU - Subbian, Vignesh
N1 - Publisher Copyright: © 2023 Elsevier Inc.
PY - 2023/7
Y1 - 2023/7
N2 - Self-supervised learning approaches provide a promising direction for clustering multivariate time-series data. However, real-world time-series data often include missing values, and the existing approaches require imputing missing values before clustering, which may cause extensive computations and noise and result in invalid interpretations. To address these challenges, we present a Self-supervised Learning-based Approach to Clustering multivariate Time-series data with missing values (SLAC-Time). SLAC-Time is a Transformer-based clustering method that uses time-series forecasting as a proxy task for leveraging unlabeled data and learning more robust time-series representations. This method jointly learns the neural network parameters and the cluster assignments of the learned representations. It iteratively clusters the learned representations with the K-means method and then utilizes the subsequent cluster assignments as pseudo-labels to update the model parameters. To evaluate our proposed approach, we applied it to clustering and phenotyping Traumatic Brain Injury (TBI) patients in the Transforming Research and Clinical Knowledge in Traumatic Brain Injury (TRACK-TBI) study. Clinical data associated with TBI patients are often measured over time and represented as time-series variables characterized by missing values and irregular time intervals. Our experiments demonstrate that SLAC-Time outperforms the baseline K-means clustering algorithm in terms of silhouette coefficient, Calinski-Harabasz index, Dunn index, and Davies-Bouldin index. We identified three TBI phenotypes that are distinct from one another in terms of clinically significant variables as well as clinical outcomes, including the Extended Glasgow Outcome Scale (GOSE) score, Intensive Care Unit (ICU) length of stay, and mortality rate. The experiments show that the TBI phenotypes identified by SLAC-Time can be potentially used for developing targeted clinical trials and therapeutic strategies.
KW - Clustering
KW - Multivariate time-series data
KW - Self-supervised learning
KW - Transformer
KW - Traumatic brain injury
UR - http://www.scopus.com/inward/record.url?scp=85160613776&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85160613776&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2023.104401
DO - 10.1016/j.jbi.2023.104401
M3 - Article
C2 - 37225066
SN - 1532-0464
VL - 143
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
M1 - 104401
ER -