## K-means & correlation distance

Here's a useful observation related to the use of K-means together with the Pearson correlation distance (© Alex).

The standard K-means update step, which sets each cluster center to the mean of the points assigned to it, is technically not appropriate for the correlation distance $d(x, y) = 1 - \mathrm{corr}(x, y)$. The proper step is to take the sum of the *normalized* points as the new cluster center (with each $x_i$ first centered on the mean of its own coordinates, as Pearson correlation requires):

$c = \sum_{i} \frac{x_i}{|x_i|}$

However, you get an equivalent result if you just use standard k-means with the *Euclidean* metric on the normalized dataset.
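The reason this works is a simple identity: once two points are centered on their own means and scaled to unit norm, their squared Euclidean distance equals twice their correlation distance. Here is a minimal NumPy sketch that checks this numerically. The helper name `normalize_rows` and the random test vectors are my own illustrative assumptions, not part of the original observation.

```python
import numpy as np

def normalize_rows(X):
    """Center each row on its own mean, then scale it to unit Euclidean norm."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / norms  # assumes no row is constant (norm > 0)

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 10))  # two arbitrary 10-dimensional points

# Correlation distance d(x, y) = 1 - corr(x, y) on the raw points.
corr_distance = 1.0 - np.corrcoef(x, y)[0, 1]

# Squared Euclidean distance between the normalized points.
xn, yn = normalize_rows(np.vstack([x, y]))
squared_euclidean = np.sum((xn - yn) ** 2)

identity_holds = bool(np.isclose(squared_euclidean, 2.0 * corr_distance))
print(identity_holds)
```

Since the two distances differ only by a constant factor and a square, they rank neighbors in exactly the same order on the normalized data.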

In short: standard k-means is *only* meant to be used with the Euclidean metric, so *don't* use correlation distance with k-means. If you want correlation to be the measure of similarity, just normalize the points before clustering.
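To make the recipe concrete, here is a rough sketch of "normalize, then run plain Euclidean k-means". Everything in it is an assumption chosen for illustration: the toy data (two waveform shapes with random scale and offset), the helper names, and the deterministic farthest-first initialization, which I use only to keep the example reproducible. Any standard k-means implementation could take the place of the hand-written one.

```python
import numpy as np

def normalize_rows(X):
    """Center each row on its own mean, then scale it to unit Euclidean norm."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=1, keepdims=True)
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's algorithm with the Euclidean metric."""
    # Farthest-first initialization (an illustrative, deterministic choice).
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: ordinary mean; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy data: 15 noisy copies of each shape, with random scale and offset,
# so rows with the same shape are highly correlated but far apart in raw space.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
rows = []
for shape in (np.sin(t), np.cos(t)):
    for _ in range(15):
        scale = rng.uniform(1.0, 5.0)
        offset = rng.uniform(-10.0, 10.0)
        rows.append(scale * shape + offset + rng.normal(scale=0.05, size=t.size))
X = np.array(rows)

# Normalize first, then cluster with the ordinary Euclidean k-means.
labels, centers = kmeans(normalize_rows(X), k=2)
print(labels)
```

The point of the design is that the clustering code itself never mentions correlation: all of the "correlation-ness" lives in the preprocessing step.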

PS: The above statements are not blind rules of thumb; they can be substantiated by some straightforward maths, which I leave for you to work out if you wish.