Abstract
With improvements in data-acquisition technology, big data streams that involve continuous, high-dimensional, high-volume observations appear frequently in modern applications and pose significant challenges for statistical process control. In this article we consider the problem of online monitoring of a class of big data streams in which each data stream is associated with a spatial location. Our goal is to quickly detect shifts in such big data streams when only partial information can be observed at each time point and the out-of-control variables are clustered in a small, unknown region. To achieve this goal, we propose a novel spatial-adaptive sampling and monitoring (SASAM) procedure that leverages the spatial information of the data streams for quick change detection. Specifically, the proposed sampling strategy adaptively integrates two seemingly contradictory ideas: (1) random sampling that quickly searches for possible out-of-control variables; and (2) directional sampling that focuses on highly suspicious out-of-control variables that may cluster in a small region. Simulation and real case studies show that the proposed method significantly outperforms the existing sampling strategy, which does not take the spatial information of the data streams into consideration.
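The interplay between random and directional sampling can be illustrated with a toy sketch. This is not the SASAM procedure itself, whose details the abstract does not give. It assumes streams laid out on a one-dimensional line, a one-sided CUSUM statistic per stream that is updated only when the stream is observed, and a simple max-statistic alarm rule. The function names (`choose_streams`, `update_cusum`, `monitor`) and the parameters `budget`, `suspicion`, `radius`, `k`, and `threshold` are all hypothetical.

```python
import random


def choose_streams(stats, budget, suspicion=1.0, radius=1, rng=random):
    """Pick at most `budget` stream indices to observe at the next time step."""
    p = len(stats)
    # Directional part: rank suspicious streams, most suspicious first.
    hot = sorted((i for i in range(p) if stats[i] > suspicion),
                 key=lambda i: -stats[i])
    chosen = []
    for i in hot:
        # Take the suspicious stream first, then its spatial neighbours
        # (assumed 1-D layout: neighbours are indices within `radius`).
        nearby = [j for j in range(max(0, i - radius), min(p, i + radius + 1))
                  if j != i]
        for j in [i] + nearby:
            if j not in chosen and len(chosen) < budget:
                chosen.append(j)
    # Random part: spend the remaining budget exploring the other streams.
    rest = [i for i in range(p) if i not in chosen]
    chosen += rng.sample(rest, min(budget - len(chosen), len(rest)))
    return chosen


def update_cusum(stats, observations, k=0.5):
    """One-sided CUSUM update for observed streams; unobserved ones are unchanged."""
    for i, x in observations.items():
        stats[i] = max(0.0, stats[i] + x - k)
    return stats


def monitor(data, budget, threshold, k=0.5, seed=0):
    """Return (alarm_time, stats); alarm_time is None if no alarm is raised."""
    rng = random.Random(seed)
    stats = [0.0] * len(data[0])
    for t, row in enumerate(data):
        observed = choose_streams(stats, budget, rng=rng)
        update_cusum(stats, {i: row[i] for i in observed}, k)
        # Assumed alarm rule: the largest local statistic crosses the threshold.
        if max(stats) > threshold:
            return t, stats
    return None, stats
```

While no stream looks suspicious, the whole budget goes to random exploration. Once a statistic exceeds `suspicion`, part of the budget is redirected to that stream and its neighbours, which is where clustered out-of-control variables would be expected.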
Original language | English (US) |
---|---|
Pages (from-to) | 329-343 |
Number of pages | 15 |
Journal | Journal of Quality Technology |
Volume | 50 |
Issue number | 4 |
State | Published - 2018 |
Externally published | Yes |
Keywords
- Big data streams
- Cumulative-sum statistics
- High-dimensional and high-frequency data
- Partial information
- Scalable monitoring schemes
- Statistical process control
ASJC Scopus subject areas
- Safety, Risk, Reliability and Quality
- Strategy and Management
- Management Science and Operations Research
- Industrial and Manufacturing Engineering