2 Replies Latest reply: May 14, 2012 3:53 AM by 920802

Sum of squared error (SSE) for clustering

920802 Newbie
Hello,


How can I see or calculate the SSE (sum of squared errors) for clustering in ODM?
I want to compare the clustering results of a few runs with different numbers of clusters, using the k-Means (KM) algorithm.
This measure seems to be a standard for such comparisons, but I didn't find it in ODM.

Or is there perhaps another measure in ODM that helps to pick the optimal number of clusters for a dataset?


Thanks in advance,
Paul.
  • 1. Re: Sum of squared error (SSE) for clustering
    400983 Newbie
    Hi Paul,

    ODM reports a dispersion metric per cluster. This is the average distance of the members of a cluster to the centroid. You can compute a global SSE by multiplying each cluster's dispersion by the number of rows in that cluster and summing the products over all clusters.
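
    As a rough illustration of that calculation, here is a minimal Python sketch. It assumes you have already pulled each cluster's dispersion and row count out of the model into plain numbers (the sample values are made up), and it assumes the dispersion is the average squared distance to the centroid. If it is an average of plain distances, the result is a sum of distances rather than a true SSE.

```python
def global_sse(clusters):
    """Sum dispersion * row_count over all clusters.

    `clusters` is an iterable of (dispersion, row_count) pairs.
    Assumption: dispersion is the mean squared distance to the centroid.
    """
    return sum(dispersion * count for dispersion, count in clusters)


# Hypothetical per-cluster values, for illustration only.
clusters = [(2.0, 10), (0.5, 40), (1.5, 20)]
total = global_sse(clusters)
print(total)
```
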
    That said, as you increase the number of clusters, the SSE on the training data should decrease monotonically due to overfitting, so by itself it isn't really useful for determining an optimal number of clusters. It is better to score a held-aside dataset and compute the SSE there.
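
    A held-aside SSE could be computed outside ODM along these lines. This is only a sketch: it assumes numeric attributes, Euclidean distance, and that the centroids have been exported as tuples. The function name and sample data are hypothetical.

```python
def held_out_sse(points, centroids):
    """Assign each point to its nearest centroid and sum the squared distances."""
    total = 0.0
    for point in points:
        # Squared Euclidean distance from this point to every centroid.
        squared = [
            sum((a - b) ** 2 for a, b in zip(point, centroid))
            for centroid in centroids
        ]
        total += min(squared)
    return total


# Hypothetical centroids from a trained model and a few held-aside rows.
centroids = [(0.0, 0.0), (10.0, 10.0)]
held_out = [(1.0, 0.0), (9.0, 10.0), (0.0, 2.0)]
print(held_out_sse(held_out, centroids))
```
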
    There are a number of methods for finding an optimal number of clusters in k-Means: some are based on finding an elbow in a chosen metric, others are information-theoretic and balance the number of free parameters against the goodness of fit.
    ODM doesn't explicitly support them, but it shouldn't be hard to compute them on the side.
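
    To give an idea of what computing this on the side might look like, here is one simple elbow heuristic in Python. It is just one of several possible definitions of an elbow, not something ODM provides: it picks the k where the SSE curve bends most sharply, measured by the discrete second difference. It assumes the SSE was recorded for consecutive values of k; the numbers below are invented.

```python
def elbow_k(sse_by_k):
    """Return the k with the largest second difference of the SSE curve.

    `sse_by_k` maps k -> SSE. Assumption: the keys are consecutive integers,
    so the second difference is a fair measure of curvature.
    """
    ks = sorted(sse_by_k)
    best_k, best_curvature = None, float("-inf")
    # Slide over (k-1, k, k+1); the first and last k cannot be an elbow.
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        curvature = sse_by_k[prev_k] - 2 * sse_by_k[k] + sse_by_k[next_k]
        if curvature > best_curvature:
            best_k, best_curvature = k, curvature
    return best_k


# Hypothetical held-aside SSE values for k = 1..5.
sse_by_k = {1: 100.0, 2: 60.0, 3: 20.0, 4: 17.0, 5: 15.0}
print(elbow_k(sse_by_k))
```

    With fewer than three values of k the function returns None, since no interior point exists to test.
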

    I hope this helps,
    Boriana
  • 2. Re: Sum of squared error (SSE) for clustering
    920802 Newbie
    Thanks a lot for the explanation :)
