6 Replies. Latest reply: May 15, 2013 12:41 AM by Brendan

DBMS_PREDICTIVE_ANALYTICS and DBMS_DATAMINING.ATTRIBUTE_IMPORTANCE

Brendan Oracle ACE Director
What is the difference between using the EXPLAIN procedure in DBMS_PREDICTIVE_ANALYTICS and using DBMS_DATA_MINING.CREATE_MODEL with the mining function set to DBMS_DATA_MINING.ATTRIBUTE_IMPORTANCE?

Is there a difference or do they do the same thing?
  • 1. Re: DBMS_PREDICTIVE_ANALYTICS and DBMS_DATAMINING.ATTRIBUTE_IMPORTANCE
    56160 Newbie
    They both use the same algorithm (and underlying code).
    EXPLAIN adds some pre-processing to handle input attributes using the DATE data type and attributes with unstructured text (in 12c).
    EXPLAIN also adds post-processing to normalize the attribute importance values so that they range from 0 to 1.

    -Peter
  • 2. Re: DBMS_PREDICTIVE_ANALYTICS and DBMS_DATAMINING.ATTRIBUTE_IMPORTANCE
    Brendan Oracle ACE Director
    But attribute importance in the GUI and in DBMS_DATA_MINING also gives values in the range 0 to 1.
    It used to give values from -1 to +1.

    Is there a setting to get the -1 to +1 range, or are all negative values now set to zero?
  • 3. Re: DBMS_PREDICTIVE_ANALYTICS and DBMS_DATAMINING.ATTRIBUTE_IMPORTANCE
    Mark Kelly Oracle ACE
    Hi Brendan,
    ODMr no longer actually creates an AI model.
    Attribute Importance is generated in the Column Filter node using an ODM function, so no model is persisted.
    The range is 0 to 1.
    When you refer to an older behavior, can you state which versions of the database and of ODMr you are comparing against?
    Thanks, Mark
  • 4. Re: DBMS_PREDICTIVE_ANALYTICS and DBMS_DATAMINING.ATTRIBUTE_IMPORTANCE
    Brendan Oracle ACE Director
    Hi Mark
    It was in the classic version. An earlier version of the 11.2 documentation had it in 9-2 and 9-3. I can email you the doc.
    Brendan
  • 5. Re: DBMS_PREDICTIVE_ANALYTICS and DBMS_DATAMINING.ATTRIBUTE_IMPORTANCE
    Mark Kelly Oracle ACE
    Hi Brendan,
    So you are correct, there is a change in how AI scales the results.
    Here is the explanation from the algorithm developer to clarify the intent.
    Thanks, Mark

    The raw score for Attribute Importance is a simple two-part code MDL measure. It views a model as an attempt to reduce communication costs, measured in transmission bits. The cost is the sum of the cost of transmitting the model and the cost of transmitting the data, using the model to compress the data. This gives a way of comparing a set of different models, in particular a model consisting of the target probability conditioned on a binned set of attribute values versus the prior. The benefit is measured using idealized codes, whose costs are the entropies (sums of -p log p). The best possible code has a cost within one bit of the entropy. The benefit is equal to the reduction in communication cost when the attribute model is chosen relative to the prior model. It is not a good thing if that reduction is negative. In that respect, the measure differs from correlation, where the sign is a direction and the magnitude a strength. Negative values represent uninteresting attributes, so these were set to a benefit of 0.
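As an illustration of the description above, here is a minimal Python sketch of a two-part-code benefit for one binned attribute. It is not Oracle's implementation: the function names (`entropy_bits`, `two_part_cost`, `raw_mdl_benefit`, `clipped_benefit`) are hypothetical, and the model cost of 0.5 * log2(N) bits per free parameter is an assumed textbook approximation, since the thread does not say how the model cost is computed.

```python
import math
from collections import Counter


def entropy_bits(counts):
    """Shannon entropy, in bits, of a distribution given as class counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def two_part_cost(groups, n_rows, n_classes):
    """Model cost plus data cost, in bits, for a list of per-group class Counters."""
    # ASSUMPTION: each free parameter costs 0.5 * log2(n_rows) bits to transmit.
    model_cost = len(groups) * (n_classes - 1) * 0.5 * math.log2(n_rows)
    # Idealized code: each row in a group costs that group's entropy in bits.
    data_cost = sum(sum(g.values()) * entropy_bits(g.values()) for g in groups)
    return model_cost + data_cost


def raw_mdl_benefit(bins, targets):
    """Bits saved by coding the target with the attribute model instead of the prior."""
    n_rows = len(targets)
    n_classes = len(set(targets))
    prior = [Counter(targets)]          # one group: the unconditioned target
    by_bin = {}
    for b, t in zip(bins, targets):     # target counts conditioned on each bin
        by_bin.setdefault(b, Counter())[t] += 1
    attribute_model = list(by_bin.values())
    return (two_part_cost(prior, n_rows, n_classes)
            - two_part_cost(attribute_model, n_rows, n_classes))


def clipped_benefit(bins, targets):
    """Negative benefits mark uninteresting attributes, so they are set to 0."""
    return max(0.0, raw_mdl_benefit(bins, targets))


targets = [0] * 50 + [1] * 50
informative = list(targets)             # bin identifies the target exactly
noise = [i % 2 for i in range(100)]     # bin is unrelated to the target

print(raw_mdl_benefit(informative, targets))
print(raw_mdl_benefit(noise, targets))
print(clipped_benefit(noise, targets))
```

Under these assumptions the informative attribute saves roughly 96.7 bits, while the noise attribute comes out at about -3.3 bits, because it pays for a larger model without compressing the data any better. That negative value is what gets set to 0.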

    The problem with the raw measure is that the range of values depends on the problem. The higher the entropy of the target (prior model), the greater the scale of the raw values. This makes the raw values difficult for users to interpret. To simplify, we rescaled the values. The rescaled value is the per-row benefit.
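To make the rescaling concrete, the following Python sketch divides a clipped raw benefit (in bits) by the number of rows. The function name `per_row_benefit` and the sample numbers are hypothetical, and any further normalization the product may apply is not covered here.

```python
import math


def per_row_benefit(raw_benefit_bits, n_rows):
    """Rescale a raw MDL benefit (total bits saved) to bits saved per row."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    # Negative raw benefits are treated as 0, as described in the thread.
    return max(0.0, raw_benefit_bits) / n_rows


# Illustrative numbers only: the same attribute quality on a small and a large table.
small = per_row_benefit(96.0, 100)
large = per_row_benefit(9600.0, 10_000)
useless = per_row_benefit(-3.0, 100)

print(small, large, useless)
```

The raw scores differ by a factor of 100 purely because of table size, but the per-row values agree, which is what makes the rescaled number easier to compare across problems.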
