2 Replies Latest reply: Apr 9, 2012 8:50 PM by 400983

    Clustering

    920802
      Hi,

      I'm exploring clustering in ODM.

      I have two questions about the clustering algorithms.

      1. What does "number of clusters" mean for each of the two algorithms?
      For OC it seems to be the maximum possible value, while for KM it seems to be the exact value to fit.
      How can I estimate the "proper" number of clusters for my data (especially for KM)?

      2. Where can I find more information about the coefficients of the algorithms?
      I am not asking what they are, but what the consequences of changing their default values are
      (e.g. what would be the consequence of changing the convergence tolerance for KM from 0.1 to 0.2)?


      Best regards,
      Paul.
        • 1. Re: Clustering
          Mark Kelly-Oracle
          Hi Paul,

          I can answer the first question just by referring to the doc:

          1. What does "number of clusters" mean for each of the two algorithms?
          For OC it seems to be the maximum possible value, while for KM it seems to be the exact value to fit.
          How can I estimate the "proper" number of clusters for my data (especially for KM)?

          Answer:
          CLUS_NUM_CLUSTERS is the maximum number of leaf clusters generated by a clustering algorithm
          (Oracle Data Mining clustering algorithms are hierarchical).
          Enhanced k-Means usually produces the exact number of clusters specified by CLUS_NUM_CLUSTERS, unless there are fewer distinct data points.

          O-Cluster may produce fewer clusters than the number specified by CLUS_NUM_CLUSTERS, depending on the data.
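
The "unless there are fewer distinct data points" case can be illustrated with a minimal sketch. This is a plain Lloyd's k-means in Python, not Oracle's Enhanced k-Means, and the function name `kmeans_labels` and its deterministic seeding are assumptions made only for this example.

```python
from math import dist


def kmeans_labels(points, requested_k, max_iter=100):
    """Plain Lloyd's k-means (illustrative only); returns one cluster index per point."""
    distinct = sorted(set(points))
    # There cannot be more non-empty clusters than distinct points.
    k = min(requested_k, len(distinct))
    centroids = distinct[:k]  # deterministic seeding, chosen for the example only
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


# Four rows, but only three distinct points.
data = [(1.0, 1.0), (1.0, 1.0), (2.0, 2.0), (9.0, 9.0)]
print(len(set(kmeans_labels(data, 5))))  # clusters found when 5 are requested
print(len(set(kmeans_labels(data, 2))))  # clusters found when 2 are requested
```

Requesting five clusters on this data yields three, because only three distinct points exist; requesting two yields two.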

          2. Where can I find more information about the coefficients of the algorithms?
          I am not asking what they are, but what the consequences of changing their default values are
          (e.g. what would be the consequence of changing the convergence tolerance for KM from 0.1 to 0.2)?

          Answer: The documentation does not always get into the details of the impact of changing a specific setting.
          I will see if I can get some further information from the developer.
          Thanks, Mark

          virtual api book:

          http://www.oracle.com/pls/db112/vbook_subject?subject=dma
          • 2. Re: Clustering
            400983
            Changing the convergence tolerance affects the quality of the solution. If you increase the tolerance too much, the algorithm may stop before the clustering has converged, and the model will be suboptimal. Decreasing the tolerance improves model quality, but it also slows down the model build.
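
The trade-off described above can be sketched with a small 1-D k-means in Python. This is not the ODM implementation: the function `kmeans_1d`, the stopping rule (stop once no centroid moves by more than `tol`), and the tolerance values are all assumptions chosen for illustration, and Oracle's tolerance may be defined on a different scale.

```python
def kmeans_1d(points, init_centroids, tol, max_iter=100):
    """Lloyd's k-means on 1-D data; stops once no centroid moves more than tol."""
    centroids = list(init_centroids)
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = [
            sum(m) / len(m) if m else old for m, old in zip(clusters, centroids)
        ]
        shift = max(abs(a - b) for a, b in zip(updated, centroids))
        centroids = updated
        if shift <= tol:  # assumed convergence test
            break
    # Within-cluster sum of squared errors: lower means a tighter fit.
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, iterations, sse


data = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
for tol in (0.7, 0.01):  # loose vs. tight tolerance
    centroids, iterations, sse = kmeans_1d(data, [0.0, 0.1], tol)
    print(tol, iterations, round(sse, 4))
```

On this toy data the loose tolerance stops after a single pass with a clearly higher error, while the tight tolerance takes more passes and reaches the two obvious groups.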