2 Replies Latest reply: Apr 9, 2012 6:50 PM by 400983

Clustering

920802 Newbie
Hi,

I'm exploring clustering in ODM and have two questions about the clustering algorithms.

1. What does "number of clusters" mean for each of the two algorithms?
For OC it seems to be a maximum possible value, while for KM it seems to be the exact number of clusters to fit.
How can I predict the "proper" number of clusters for my data (especially for KM)?

2. Where can I find more information about the algorithm parameters?
I'm not asking what they are, but what the consequences are of changing their default values
(e.g. what is the consequence of changing the convergence tolerance for KM from 0.1 to 0.2)?


Best regards,
Paul.
  • 1. Re: Clustering
    Mark Kelly Oracle ACE
    Hi Paul,

    I can answer the first question just by referring to the documentation:

    1. What does "number of clusters" mean for each of the two algorithms?
    For OC it seems to be a maximum possible value, while for KM it seems to be the exact number of clusters to fit.
    How can I predict the "proper" number of clusters for my data (especially for KM)?

    Answer:
    CLUS_NUM_CLUSTERS is the maximum number of leaf clusters generated by a clustering algorithm.
    (Oracle Data Mining clustering algorithms are hierarchical.)
    Enhanced k-Means usually produces the exact number of clusters specified by CLUS_NUM_CLUSTERS, unless there are fewer distinct data points.

    O-Cluster may produce fewer clusters than the number specified by CLUS_NUM_CLUSTERS, depending on the data.
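To make the idea of a maximum number of leaf clusters concrete, here is a minimal sketch of top-down splitting on one-dimensional data. It is only an illustration and not Oracle's implementation: the function name `split_into_leaves`, the "split the widest leaf at its mean" rule, and the toy data are all assumptions made for this example. It shows why a hierarchical method can end up with fewer leaves than requested when there are too few distinct data points.

```python
def split_into_leaves(values, max_leaves):
    """Top-down splitting of 1-D values into at most max_leaves leaf clusters.

    Hypothetical illustration only; not the ODM k-Means or O-Cluster algorithm.
    """
    leaves = [sorted(values)]
    while len(leaves) < max_leaves:
        # Only leaves holding at least two distinct values can be split further.
        splittable = [c for c in leaves if c[0] != c[-1]]
        if not splittable:
            break
        # Split the leaf with the widest value range at its mean.
        target = max(splittable, key=lambda c: c[-1] - c[0])
        mean = sum(target) / len(target)
        left = [v for v in target if v <= mean]
        right = [v for v in target if v > mean]
        leaves.remove(target)
        leaves.extend([left, right])
    return leaves


data = [1, 1, 1, 5, 5, 9]                 # six rows, but only three distinct values
print(len(split_into_leaves(data, 2)))    # the requested maximum is reached
print(len(split_into_leaves(data, 10)))   # capped by the number of distinct values
```

With this toy data, asking for 2 leaves gives 2, while asking for 10 gives only 3, because no leaf with a single distinct value can be split again.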

    2. Where can I find more information about the algorithm parameters?
    I'm not asking what they are, but what the consequences are of changing their default values
    (e.g. what is the consequence of changing the convergence tolerance for KM from 0.1 to 0.2)?

    Answer: The documentation does not always get into the details of the impact of changing a specific setting.
    I will see if I can get some further information from the developer.
    Thanks, Mark

    Virtual API book:

    http://www.oracle.com/pls/db112/vbook_subject?subject=dma
  • 2. Re: Clustering
    400983 Newbie
    Changing the convergence tolerance affects the quality of the solution. If you increase the tolerance too much, the build may stop before the clustering has really converged, and the model will be suboptimal. Decreasing the tolerance improves model quality, but it also slows down the model build.
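The trade-off can be seen in a small, self-contained sketch of plain k-means on one-dimensional data. This is not the ODM enhanced k-Means code, and how ODM measures convergence internally is not stated in this thread; the stopping rule used here (stop once the largest centroid movement in a pass falls below `tol`), the helper names `kmeans_1d` and `sse`, and the toy data are assumptions made for illustration.

```python
def kmeans_1d(points, init_centroids, tol, max_iter=100):
    """Plain Lloyd-style k-means on 1-D data.

    Assumed stopping rule: stop when the largest centroid shift in one
    pass is below tol. Returns the final centroids and the passes used.
    """
    centroids = list(init_centroids)
    iterations = 0
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        groups = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: abs(x - centroids[j]))
            groups[nearest].append(x)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        new_centroids = [sum(g) / len(g) if g else c
                         for g, c in zip(groups, centroids)]
        shift = max(abs(n - c) for n, c in zip(new_centroids, centroids))
        centroids = new_centroids
        iterations += 1
        if shift < tol:
            break
    return centroids, iterations


def sse(points, centroids):
    """Within-cluster sum of squared distances (lower means a tighter fit)."""
    return sum(min((x - c) ** 2 for c in centroids) for x in points)


points = [i / 20 for i in range(21)]      # evenly spaced values in [0, 1]
for tol in (0.2, 0.1, 0.001):
    centroids, iterations = kmeans_1d(points, [0.0, 0.05], tol)
    print(tol, iterations, round(sse(points, centroids), 4))
```

On this toy data the looser tolerances stop after fewer passes and leave a higher sum of squared distances, while the tightest tolerance runs more passes and ends with the lowest error. The absolute numbers mean nothing for a real ODM model; only the direction of the effect is the point.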
