They both use the same algorithm (and underlying code).
EXPLAIN adds some pre-processing to handle input attributes using the DATE data type and attributes with unstructured text (in 12c).
EXPLAIN also adds post-processing to normalize the attribute importance values so that they range from 0 to 1.
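The post-processing step can be pictured with a small sketch. This is a hypothetical illustration, not Oracle's actual implementation: the attribute names, scores, and the max-based scaling are all assumptions made for the example; the exact scaling EXPLAIN applies may differ.

```python
def normalize_importance(raw_scores):
    """Illustrative normalization: clip negative raw scores to 0 (they mark
    uninteresting attributes), then scale so values fall in the 0-1 range."""
    clipped = {attr: max(score, 0.0) for attr, score in raw_scores.items()}
    top = max(clipped.values())
    if top == 0.0:
        return clipped  # no attribute has any benefit; everything stays 0
    return {attr: score / top for attr, score in clipped.items()}

# Hypothetical raw scores for three attributes
raw = {"AGE": 0.5, "INCOME": 0.25, "ZIP": -0.05}
print(normalize_importance(raw))  # {'AGE': 1.0, 'INCOME': 0.5, 'ZIP': 0.0}
```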
But attribute importance in the GUI and in DBMS_DATA_MINING also gives values in the 0-1 range.
It used to give values from -1 to +1.
Is there a setting to get the -1 to +1 range, or are all negative values set to zero?
ODMr no longer actually creates an AI (Attribute Importance) model.
Attribute Importance is generated in the Column Filter node using an ODM function, so no model is persisted.
The range is 0 to 1.
When you refer to an older behavior, can you state which versions of the database you are comparing and which version of ODMr?
It was in the classic version. An earlier version of the 11.2 documentation had it in 9-2 and 9-3. I can email you the doc.
So you are correct, there is a change in how AI scales the results.
Here is the explanation from the algorithm developer to clarify the intent.
The raw score for Attribute Importance is a simple two-part-code MDL measure. It views a model as an attempt to reduce communication costs, measured in transmission bits. The cost is the sum of the cost of transmitting the model and the cost of transmitting the data using the model to compress it. This gives a way of comparing a set of different models; in particular, a model consisting of the target probability conditioned on a binned set of attribute values, versus the prior.

The benefit is measured using idealized codes, the entropies (p log p); the best possible code has a cost within a bit of the entropy. The benefit is the reduction in communication cost when the attribute model is chosen relative to the prior model. A negative reduction is not a good thing. In that respect, the measure differs from correlation, where the sign is a direction and the magnitude a strength. Negative values represent uninteresting attributes, so these were set to a benefit of 0.
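The two-part MDL idea above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Oracle's code: the model-transmission cost is omitted for brevity, the attribute is assumed already binned, and the function names are invented for the example.

```python
import math
from collections import Counter

def entropy_bits(labels):
    """Shannon entropy in bits: the per-row cost of an ideal code for labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def raw_benefit(attr_values, target):
    """Reduction in transmission cost when the target is coded conditionally
    on a (binned) attribute instead of with the prior distribution.
    A negative result means the attribute model costs more than it saves."""
    n = len(target)
    prior_cost = n * entropy_bits(target)       # cost under the prior model
    bins = {}
    for v, t in zip(attr_values, target):
        bins.setdefault(v, []).append(t)
    # cost of coding the target within each attribute bin
    cond_cost = sum(len(rows) * entropy_bits(rows) for rows in bins.values())
    # NOTE: a full two-part code also charges for transmitting the model
    # itself; that term is deliberately left out of this sketch.
    return prior_cost - cond_cost

attr = ["a", "a", "b", "b"]      # attribute perfectly predicts the target
target = [0, 0, 1, 1]
print(raw_benefit(attr, target))  # 4.0 bits saved over the prior
```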
The problem with the raw measure is that its range depends on the problem: the higher the entropy of the target (the prior model), the greater the scale of the raw values. This makes them difficult for users to interpret, so we rescaled them. The rescaled score is the per-row benefit.