ARD-KMeans++ (Adaptive Radius Density K-Means++)

ARD-KMeans++ is an experimental clustering initialization strategy designed to explore how local density information can improve centroid placement in K-Means clustering.

The algorithm extends the standard K-Means++ approach by refining each distance-selected centroid using neighborhood density analysis before finalizing its position.

Instead of accepting the raw candidate produced by K-Means++ distance-weighted sampling, ARD-KMeans++ relocates that candidate to a nearby dense region, using an adaptive radius derived from k-nearest-neighbor distances.

Motivation

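K-Means++ improves clustering stability by spreading out the initial centroids. However, because it samples each new candidate with probability proportional to its squared distance from the nearest chosen centroid, it can still place centroids in sparse or tail regions, especially in datasets with uneven density or elongated shapes.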

ARD-KMeans++ was created to study whether incorporating local density information during initialization can reduce sensitivity to outliers and produce more representative initial centroids.

Design Overview

The algorithm follows the standard K-Means++ distance-based selection to identify a candidate centroid.
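For reference, the K-Means++ rule draws each new candidate x with probability proportional to D(x)², the squared distance from x to its nearest already-chosen centroid (the first centroid is drawn uniformly). Below is a minimal pure-Python sketch of that sampling step. The function names are illustrative and are not taken from the project's code.

```python
import math
import random

def squared_dist_to_nearest(point, centroids):
    """Squared Euclidean distance from point to its closest already-chosen centroid."""
    return min(math.dist(point, c) ** 2 for c in centroids)

def kmeanspp_candidate(points, centroids, rng):
    """Draw one candidate index with probability proportional to D(x)^2."""
    weights = [squared_dist_to_nearest(p, centroids) for p in points]
    if sum(weights) == 0.0:
        # Every point coincides with a chosen centroid: fall back to a uniform draw.
        return rng.randrange(len(points))
    return rng.choices(range(len(points)), weights=weights, k=1)[0]

# Usage: first centroid uniformly at random, the next one via the D^2 rule.
rng = random.Random(0)
pts = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
chosen = [pts[rng.randrange(len(pts))]]
chosen.append(pts[kmeanspp_candidate(pts, chosen, rng)])
```

Points that already coincide with a centroid get zero weight, so they cannot be picked again. Distant points, including isolated outliers, get the largest weights. That is the behavior ARD-KMeans++ sets out to correct.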

For each candidate, an adaptive radius is computed using k-nearest neighbor distances. All points within this radius form a local neighborhood.
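The description does not specify exactly how the radius is derived from k-nearest-neighbor distances. The sketch below assumes one simple rule: the radius is the distance from the candidate to its k-th nearest neighbor. Both `adaptive_radius` and `neighborhood` are hypothetical helpers written for illustration.

```python
import math

def adaptive_radius(points, idx, k):
    """Hypothetical rule: distance from points[idx] to its k-th nearest other point."""
    dists = sorted(math.dist(points[idx], p) for j, p in enumerate(points) if j != idx)
    # Clamp k so a small dataset still yields a radius (requires at least 2 points).
    return dists[min(k, len(dists)) - 1]

def neighborhood(points, idx, radius):
    """Indices of all points within radius of points[idx], the candidate included."""
    return [j for j, p in enumerate(points) if math.dist(points[idx], p) <= radius]
```

Under this rule, an isolated candidate gets a larger radius, so its neighborhood can reach into the nearest dense region. Other rules, such as a mean over several neighbor distances, would also match the description.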

The densest point within this neighborhood, defined as the point with the highest local neighbor count, is selected as the final centroid.

Strengths

ARD-KMeans++ reduces the likelihood of initializing centroids in sparse regions and improves stability in datasets with density imbalance or long tails.

Because only the initialization step changes, the method stays compatible with the standard K-Means assignment and update steps while adding minimal conceptual complexity.

Limitations

The density refinement step increases initialization time due to neighborhood and distance computations.

Improvements over standard K-Means++ are dataset-dependent and are most noticeable in non-uniform or irregular cluster distributions.

Implementation

The full clustering pipeline, including initialization, assignment, centroid updates, and convergence checks, is implemented from scratch in Python without external machine learning libraries.
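Since only the initialization differs from standard K-Means, the remaining stages can follow ordinary Lloyd iterations. Below is a minimal standard-library sketch of assignment, centroid update, and a centroid-shift convergence check. The names `lloyd`, `max_iter`, and `tol` are hypothetical, and keeping the previous centroid when a cluster goes empty is one policy among several.

```python
import math

def lloyd(points, centroids, max_iter=100, tol=1e-6):
    """Standard K-Means (Lloyd) iterations starting from the given centroids.

    Returns (centroids, labels); labels come from the last assignment step.
    Stops early once no centroid moves more than tol.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: mean of assigned points; an empty cluster keeps its old centroid.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(m[d] for m in members) / len(members)
                                           for d in range(len(old))))
            else:
                new_centroids.append(old)
        # Convergence check: largest distance any centroid moved this iteration.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels
```

Any initializer that returns a list of points, whether K-Means++ or a density-refined variant, can be passed as the `centroids` argument. This keeps the comparison between initialization strategies isolated from the rest of the pipeline.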

The project includes empirical benchmarking against standard K-Means++ and visualization of centroid initialization behavior.
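The metric used in the benchmarks is not named here. A common choice when comparing initializations is inertia, the sum of squared distances from each point to its nearest centroid, summarized over several random seeds. The helpers below sketch that approach under this assumption, and their names are hypothetical.

```python
import math
import statistics

def inertia(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)

def summarize_runs(points, centroid_sets):
    """Mean and population standard deviation of inertia over several runs (e.g. one per seed)."""
    scores = [inertia(points, cs) for cs in centroid_sets]
    return statistics.mean(scores), statistics.pstdev(scores)
```

One way to use this would be to run both initializers over the same seeds and compare the summaries at two points: right after initialization and after the Lloyd iterations. This helps separate the effect of initial placement from the final clustering quality.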

Current Status

ARD-KMeans++ is an experimental research-oriented project. Future work includes alternative density definitions, adaptive neighborhood strategies, and integration with larger-scale datasets.

Tech Stack

Python, Numerical Computing, Distance Geometry, Clustering Algorithms
