Accelerating the yinyang k-means algorithm using the GPU

Colin Taylor, Michael Gowanlock

Research output: Chapter in Book/Report/Conference proceedingConference contribution

7 Scopus citations

Abstract

The k-means clustering algorithm is widely employed for unsupervised learning. The algorithm takes as input a multidimensional dataset of points and number of clusters/centroids, k, where each point is assigned to one of the clusters. For exact k-means clustering, the algorithm must compute the same result as Lloyd's algorithm, which is well-known to be computationally expensive due to the large number of distance comparisons between each point and the k centroids. Several algorithms have been proposed for k-means clustering that avoid distance calculations but produce an exact result. However, these algorithms have all been designed for execution using the CPU, and no published works have examined using the GPU to accelerate k-means while simultaneously avoiding distance calculations. This paper examines the state-of-the-art Yinyang algorithm that avoids distance calculations as executed on the GPU. Since Lloyd's algorithm is well-suited to a GPU execution, it is not clear whether the Yinyang algorithm will obtain significant performance gains on GPU hardware. In this context, this paper: (i) proposes the first GPU-accelerated Yinyang algorithm in the literature; (ii) advances several optimizations to GPU kernels; (iii) contrasts and evaluates different degrees of distance calculation pruning; and, (iv) compares the performance of our GPU-accelerated Yinyang algorithm to four reference implementations. Our GPU algorithm achieves a speedup over the multi-core CPU Yinyang algorithm of up to 8× on real-world datasets.

Original languageEnglish (US)
Title of host publicationProceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021
PublisherIEEE Computer Society
Pages1835-1840
Number of pages6
ISBN (Electronic)9781728191843
DOIs
StatePublished - Apr 2021
Externally publishedYes
Event37th IEEE International Conference on Data Engineering, ICDE 2021 - Virtual, Chania, Greece
Duration: Apr 19 2021Apr 22 2021

Publication series

NameProceedings - International Conference on Data Engineering
Volume2021-April

Conference

Conference37th IEEE International Conference on Data Engineering, ICDE 2021
Country/TerritoryGreece
CityVirtual, Chania
Period4/19/214/22/21

Keywords

  • GPGPU
  • K-Means
  • Lloyd's Algorithm
  • Yinyang Algorithm

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Fingerprint

Dive into the research topics of 'Accelerating the yinyang k-means algorithm using the GPU'. Together they form a unique fingerprint.

Cite this