2 Replies Latest reply: May 14, 2012 5:53 AM by 920802

    Sum of squared error (SSE) for clustering


      How can I see or calculate the SSE (sum of squared error) for clustering in ODM?
      I want to compare clustering results across a few runs with different numbers of clusters, using the KM algorithm.
      This measure seems to be a standard for such comparisons, but I didn't find it in ODM.

      Or maybe there is another type of measure in ODM that helps to pick the optimal number of clusters for a dataset?

      Thanks in advance,
        • 1. Re: Sum of squared error (SSE) for clustering
          Hi Paul,

          ODM reports a dispersion metric per cluster. This is the average distance of the members of a cluster to the centroid. You can compute a global SSE by multiplying each cluster's dispersion by the number of rows in that cluster and summing the products.
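          The multiply-and-sum step can be sketched outside the database as follows. This is a minimal sketch, not an ODM API: the function name and the per-cluster values are hypothetical, and it assumes the reported dispersion is an average of squared distances. If it is an average of plain distances, the result is a sum of distances rather than a true SSE.

```python
def global_sse(clusters):
    """Sum dispersion * row_count over all clusters.

    clusters: iterable of (dispersion, row_count) pairs, one per cluster.
    Assumption: dispersion is the average squared distance to the centroid.
    """
    return sum(dispersion * row_count for dispersion, row_count in clusters)


# Hypothetical values, as they might be read from the cluster details.
clusters = [(2.0, 100), (1.5, 40), (3.0, 10)]
total = global_sse(clusters)
print(total)
```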
          That said, as you increase the number of clusters, your SSE on the training data should go down monotonically due to overfitting, so it isn't really useful for determining an optimal number of clusters. It is better to score a held-aside dataset and compute your SSE there.
          There are a number of methods to find an optimal number of clusters in k-Means: some are based on finding an elbow in a chosen metric, others are information-theoretic and balance the number of free parameters against the goodness of fit.
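          A held-aside SSE can be computed by assigning each held-aside row to its nearest centroid and summing the squared distances. The sketch below assumes the centroids have been exported as plain numeric tuples and uses squared Euclidean distance; the function names are hypothetical, not part of ODM.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def held_out_sse(points, centroids):
    """Sum, over held-aside points, of the squared distance to the nearest centroid."""
    return sum(
        min(squared_distance(p, c) for c in centroids)
        for p in points
    )


# Hypothetical centroids from a trained model and a small held-aside sample.
centroids = [(0.0, 0.0), (10.0, 10.0)]
held_aside = [(1.0, 0.0), (9.0, 10.0), (0.0, 2.0)]
print(held_out_sse(held_aside, centroids))
```

          Repeating this for each candidate number of clusters gives one held-aside SSE per run to compare.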
          ODM doesn't explicitly support them but it shouldn't be hard to compute them on the side.
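          As a rough illustration of computing these on the side, the sketch below shows one simple elbow heuristic (largest second difference of the SSE curve) and one BIC-style score. Both are assumptions chosen for illustration, not methods ODM provides: the elbow rule assumes evenly spaced values of k, and the score uses the Gaussian-error form of BIC with k * d counted as the free parameters. All names and SSE values are hypothetical.

```python
import math


def elbow_k(sse_by_k):
    """Return the k where the SSE curve bends most (largest second difference).

    sse_by_k: dict mapping number of clusters -> SSE; needs at least 3 entries.
    """
    ks = sorted(sse_by_k)
    best_k, best_curvature = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        curvature = sse_by_k[prev_k] - 2 * sse_by_k[k] + sse_by_k[next_k]
        if curvature > best_curvature:
            best_k, best_curvature = k, curvature
    return best_k


def bic_like(sse, n_rows, n_clusters, n_dims):
    """BIC-style score (lower is better): fit term plus a parameter-count penalty."""
    fit = n_rows * math.log(sse / n_rows)
    penalty = n_clusters * n_dims * math.log(n_rows)
    return fit + penalty


# Hypothetical SSE values for runs with 1..5 clusters.
sse_by_k = {1: 1000.0, 2: 600.0, 3: 200.0, 4: 180.0, 5: 170.0}
print(elbow_k(sse_by_k))
```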

          I hope this helps,
          • 2. Re: Sum of squared error (SSE) for clustering
            Thanks a lot for the explanation :)