Privacy-Preserving Redaction of Diagnosis Data through Source Code Analysis

Lixi Zhou, Lei Yu, Jia Zou, Hong Min

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Protecting sensitive information in diagnostic data, such as logs, is a critical concern in industrial software diagnosis and debugging. Although many tools have been developed to automatically redact logs by identifying and removing sensitive information, they have severe limitations that can cause over-redaction and loss of critical diagnostic information (false positives), disclosure of sensitive information (false negatives), or both. To address this problem, we argue for a source code analysis approach to log redaction. To identify a log message containing sensitive information, our method locates the corresponding log statement in the source code using logger code augmentation, and checks whether the log statement outputs data from sensitive sources by using the data flow graph built from the source code. Appropriate redaction rules are then applied, depending on the sensitivity of the data sources, to protect private information in the logs. We conducted an experimental evaluation and comparison with popular baselines. The results demonstrate that our approach can significantly improve the precision of sensitive-information detection and reduce both false positives and false negatives.
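The pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that an augmented logger emits a statement ID followed by tab-separated arguments, and that an offline data flow analysis has already labelled each argument of each log statement. The statement IDs, labels, templates, and rule names below are all hypothetical.

```python
import hashlib

# Assumption: result of an offline data flow analysis, mapping each augmented
# log statement ID to the sensitivity label of each argument, in order.
STATEMENT_SENSITIVITY = {
    "S1": ["public", "pii"],  # e.g. "request %s from user %s"
    "S2": ["secret"],         # e.g. "token=%s"
}

# Assumption: message templates recovered from the source code.
STATEMENT_TEMPLATES = {
    "S1": "request {} from user {}",
    "S2": "token={}",
}


def _mask(value: str) -> str:
    """Remove the value entirely."""
    return "<REDACTED>"


def _pseudonymize(value: str) -> str:
    """Replace the value with a short, stable hash so lines stay correlatable."""
    return "h:" + hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]


# Hypothetical redaction rules, chosen by sensitivity label.
RULES = {
    "public": lambda value: value,
    "pii": _pseudonymize,
    "secret": _mask,
}


def redact_line(line: str) -> str:
    """Redact one augmented log line of the form 'ID<TAB>arg1<TAB>arg2...'."""
    stmt_id, *args = line.rstrip("\n").split("\t")
    labels = STATEMENT_SENSITIVITY.get(stmt_id)
    if labels is None or len(labels) != len(args):
        # Unknown statement or mismatched arguments: fail closed.
        return f"{stmt_id}: <REDACTED>"
    cleaned = [RULES[label](arg) for label, arg in zip(labels, args)]
    return STATEMENT_TEMPLATES[stmt_id].format(*cleaned)


if __name__ == "__main__":
    print(redact_line("S1\treq-42\talice@example.com"))
    print(redact_line("S2\tabc123"))
```

Keying the rules on the statement rather than on the text of the message is what lets non-sensitive arguments such as the request ID pass through untouched, while values from sensitive sources are redacted even if they do not match any recognizable pattern.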

Original language: English (US)
Title of host publication: Scientific and Statistical Database Management - 35th International Conference, SSDBM 2023 - Proceedings
Editors: Robert Schuler, Carl Kesselman, Kyle Chard, Alejandro Bugacov
Publisher: Association for Computing Machinery
ISBN (Electronic): 9798400707469
State: Published - Jul 10 2023
Event: 35th International Conference on Scientific and Statistical Database Management, SSDBM 2023 - Los Angeles, United States
Duration: Jul 10 2023 to Jul 12 2023

Publication series

Name: ACM International Conference Proceeding Series


Conference: 35th International Conference on Scientific and Statistical Database Management, SSDBM 2023
Country/Territory: United States
City: Los Angeles

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

