Hi Paul,
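ODM reports a dispersion metric per cluster. This is the average distance of the members of a cluster to the centroid. You can compute a global SSE by multiplying each cluster's dispersion by the number of rows in that cluster and summing the products. (Strictly, the total is a sum of squared errors only if the dispersion is measured as a squared distance; otherwise it is a sum of distances.)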
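As a rough illustration of that arithmetic, here is a minimal Python sketch. The function name and the sample numbers are made up for the example; in practice you would plug in the dispersion and row count that ODM reports for each cluster.

```python
def global_sse(dispersions, row_counts):
    """Sum (dispersion * row count) over all clusters.

    dispersions: average member-to-centroid distance per cluster
    row_counts:  number of rows assigned to each cluster
    """
    if len(dispersions) != len(row_counts):
        raise ValueError("need one row count per dispersion value")
    return sum(d * n for d, n in zip(dispersions, row_counts))


# Hypothetical per-cluster values, for illustration only.
dispersions = [2.0, 0.5, 1.5]
row_counts = [10, 40, 20]
total = global_sse(dispersions, row_counts)
print(total)
```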
That said, as you increase the number of clusters, the SSE on the training data should go down monotonically due to overfitting, so it isn't really useful for determining an optimal number of clusters. It is better to score a held-aside data set and compute the SSE there.
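A minimal sketch of the held-aside computation, done outside the database: it assumes you have exported the centroids and the held-aside rows as numeric tuples, and that plain squared Euclidean distance is an acceptable stand-in for whatever distance the model was built with. The names and data below are hypothetical.

```python
def held_out_sse(points, centroids):
    """Assign each point to its nearest centroid and sum the squared distances."""
    total = 0.0
    for p in points:
        # Squared Euclidean distance to the closest centroid.
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
        )
    return total


# Hypothetical centroids and held-aside rows, for illustration only.
centroids = [(0.0, 0.0), (10.0, 10.0)]
held_out = [(1.0, 0.0), (0.0, 2.0), (9.0, 10.0)]
sse = held_out_sse(held_out, centroids)
print(sse)
```

Repeating this for models built with different numbers of clusters gives one held-aside SSE per k to compare.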
There are a number of methods for finding an optimal number of clusters in k-Means. Some are based on finding an elbow in a chosen metric; others are information-theoretic and balance the number of free parameters against the goodness of fit.
ODM doesn't explicitly support them but it shouldn't be hard to compute them on the side.
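To give an idea of what computing them on the side could look like, here is a rough Python sketch of both flavours. It is only one possible reading of each idea: the elbow is taken as the k with the largest second difference in SSE (assuming evenly spaced values of k), and the penalized score is a simplified BIC-style formula that assumes spherical clusters with a shared variance. The function names and the SSE values are invented for the example.

```python
import math


def elbow_k(sse_by_k):
    """Return the k where the SSE curve bends most sharply.

    The bend is measured by the second difference of SSE over
    consecutive k values, so the first and last k cannot be chosen.
    """
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = sse_by_k[prev_k] - 2 * sse_by_k[k] + sse_by_k[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


def bic_style_score(sse, n_rows, k, n_dims):
    """Simplified BIC-style score: fit term plus a penalty on k * n_dims parameters.

    Lower is better. This is an approximation, not an exact BIC.
    """
    return n_rows * math.log(sse / n_rows) + k * n_dims * math.log(n_rows)


# Hypothetical SSE per number of clusters, for illustration only.
sse_by_k = {1: 1000.0, 2: 600.0, 3: 200.0, 4: 190.0, 5: 185.0}
n_rows, n_dims = 100, 2

elbow = elbow_k(sse_by_k)
best_by_score = min(
    sse_by_k, key=lambda k: bic_style_score(sse_by_k[k], n_rows, k, n_dims)
)
print(elbow, best_by_score)
```

With real data the two criteria will not always agree, so it is worth looking at the curve itself as well.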
I hope this helps,
Thanks a lot for the explanation :)