Optimization and Comparison of Coordinate- and Metric-Based Indexes on GPUs for Distance Similarity Searches

Michael Gowanlock, Benoit Gallet, Brian Donnelly

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations

Abstract

The distance similarity search (DSS) is a fundamental operation for large-scale data analytics, as it is used to find all points that are within a search distance of a query point. Given that new scientific instruments are generating a tremendous amount of data, it is critical that these searches are highly efficient. Recently, GPU algorithms have been proposed to parallelize the DSS. While most work shows that GPU algorithms largely outperform parallel CPU algorithms, there is no single GPU algorithm that outperforms all other state-of-the-art approaches; therefore, it is not clear which algorithm should be selected based on a dataset/workload. We compare two GPU DSS algorithms: one that indexes directly on the data coordinates, and one that indexes using the distances between data points to a set of reference points. A counterintuitive finding is that the data dimensionality is not a good indicator of which algorithm should be used on a given dataset. We also find that the intrinsic dimensionality (ID) which quantifies structure in the data can be used to parameter tune the algorithms to improve performance over the baselines reported in prior work. Lastly, we find that combining the data dimensionality and ID can be used to select between the best performing GPU algorithm on a dataset.

Original languageEnglish (US)
Title of host publicationComputational Science – ICCS 2023 - 23rd International Conference, Proceedings
EditorsJiří Mikyška, Clélia de Mulatier, Valeria V. Krzhizhanovskaya, Peter M.A. Sloot, Maciej Paszynski, Jack J. Dongarra
PublisherSpringer Science and Business Media Deutschland GmbH
Pages357-364
Number of pages8
ISBN (Print)9783031360206
DOIs
StatePublished - 2023
Event23rd International Conference on Computational Science, ICCS 2023 - Prague, Czech Republic
Duration: Jul 3 2023Jul 5 2023

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume14074 LNCS

Conference

Conference23rd International Conference on Computational Science, ICCS 2023
Country/TerritoryCzech Republic
CityPrague
Period7/3/237/5/23

Keywords

  • Distance Similarity Search
  • GPGPU
  • Metric-based Index

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

Fingerprint

Dive into the research topics of 'Optimization and Comparison of Coordinate- and Metric-Based Indexes on GPUs for Distance Similarity Searches'. Together they form a unique fingerprint.

Cite this