
OJDM: table not deleted when removing test metrics

766176 Member Posts: 4
edited Mar 13, 2013 10:42AM in Machine Learning
Hi friends,

We use both the PL/SQL and Java Data Mining APIs for a mining project. When generating test metrics with OJDM and finally removing them via the Connection#removeObject(..) operation, I noticed that one table remains in the user's tablespace, named with a 'DM$T' prefix followed by a random number. I am able to delete it manually, although it is clear to me that this should not be done, since it is a mining object implicitly created during metrics generation.

The table schema is defined as follows:

(1) for classification metrics (for example 'DM$T1081836780'):
ID NUMBER,
PREDICTION NUMBER,
PROBABILITY NUMBER,
COST NUMBER;

(2) for regression metrics (for example 'DM$T531094'):
ID NUMBER,
PREDICTION NUMBER.

I assume that 'ID' comes from the column name defined as the case id (in our case this is 'ID'). The table seems to be created and used temporarily during metrics generation. My question: is it a known bug in the current API version that this table is not deleted when the metrics are removed, or should it be left in place because some internal metrics depend on it?

In our project we will have to generate hundreds of such metrics (even if only temporarily), which would result in a huge number of needless tables.
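
For reference, the leftovers are at least easy to spot in the owning schema (assuming they all share the 'DM$T' prefix, as observed above; other mining objects may create 'DM$' tables as well, so check before dropping anything):

SELECT table_name
FROM   user_tables
WHERE  table_name LIKE 'DM$T%';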

Thanks for your help!

Edited by: meister_e on 08.03.2013 01:38

Answers

  • jdadashev-Oracle Member Posts: 57 Employee
    Hello,

    The leftover table looks like a bug, but unfortunately the JDM API is no longer supported by Oracle, so we will not be able to provide a fix.
    To make sure that it actually is a bug, you can paste a short use case with the JDM code you are using, and I can try to confirm it.

    The preferred solution, however, is to use the DBMS_DATA_MINING API.
    Below is a simple example which demonstrates what JDM does internally:

    Thanks,
    Jim

    -- create build data
    create table sample_data as
    SELECT a.CUST_ID, a.CUST_GENDER,
           2003 - a.CUST_YEAR_OF_BIRTH AGE,
           a.CUST_MARITAL_STATUS, c.COUNTRY_NAME, a.CUST_INCOME_LEVEL,
           b.EDUCATION, b.OCCUPATION, b.HOUSEHOLD_SIZE, b.YRS_RESIDENCE,
           b.AFFINITY_CARD, b.BULK_PACK_DISKETTES, b.FLAT_PANEL_MONITOR,
           b.HOME_THEATER_PACKAGE, b.BOOKKEEPING_APPLICATION,
           b.PRINTER_SUPPLIES, b.Y_BOX_GAMES, b.OS_DOC_SET_KANJI, b.COMMENTS
    FROM   sh.customers a,
           sh.supplementary_demographics b,
           sh.countries c
    WHERE  a.CUST_ID = b.CUST_ID
    AND    a.COUNTRY_ID = c.COUNTRY_ID
    AND    a.CUST_ID BETWEEN 100001 AND 104500;

    -- create apply data
    create table sample_apply_data as
    SELECT a.CUST_ID, a.CUST_GENDER,
           2003 - a.CUST_YEAR_OF_BIRTH AGE,
           a.CUST_MARITAL_STATUS, c.COUNTRY_NAME, a.CUST_INCOME_LEVEL,
           b.EDUCATION, b.OCCUPATION, b.HOUSEHOLD_SIZE, b.YRS_RESIDENCE,
           b.AFFINITY_CARD, b.BULK_PACK_DISKETTES, b.FLAT_PANEL_MONITOR,
           b.HOME_THEATER_PACKAGE, b.BOOKKEEPING_APPLICATION,
           b.PRINTER_SUPPLIES, b.Y_BOX_GAMES, b.OS_DOC_SET_KANJI, b.COMMENTS
    FROM   sh.customers a,
           sh.supplementary_demographics b,
           sh.countries c
    WHERE  a.CUST_ID = b.CUST_ID
    AND    a.COUNTRY_ID = c.COUNTRY_ID
    AND    a.CUST_ID BETWEEN 100001 AND 104500;
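    -- note: this apply set selects the same rows as the build set; for a
    -- meaningful test one would normally use a hold-out range distinct
    -- from the build data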

    -- cleanup from any previous run
    BEGIN
      DBMS_DATA_MINING.DROP_MODEL('DT_SH_Clas_sample');
    EXCEPTION WHEN OTHERS THEN NULL;
    END;
    /
    BEGIN
      EXECUTE IMMEDIATE 'DROP TABLE dt_sh_sample_settings';
    EXCEPTION WHEN OTHERS THEN NULL;
    END;
    /
    BEGIN
      EXECUTE IMMEDIATE 'DROP TABLE dt_sh_sample_cost';
    EXCEPTION WHEN OTHERS THEN NULL;
    END;
    /
    BEGIN
      EXECUTE IMMEDIATE 'DROP TABLE dt_apply_results';
    EXCEPTION WHEN OTHERS THEN NULL;
    END;
    /

    -- create settings table
    CREATE TABLE dt_sh_sample_settings (
      setting_name  VARCHAR2(30),
      setting_value VARCHAR2(4000));

    -- create cost matrix
    CREATE TABLE dt_sh_sample_cost (
      actual_target_value    NUMBER,
      predicted_target_value NUMBER,
      cost                   NUMBER);
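    -- a false negative (actual 1 predicted as 0) is made eight times as
    -- costly as a false positive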
    INSERT INTO dt_sh_sample_cost VALUES (0,0,0);
    INSERT INTO dt_sh_sample_cost VALUES (0,1,1);
    INSERT INTO dt_sh_sample_cost VALUES (1,0,8);
    INSERT INTO dt_sh_sample_cost VALUES (1,1,0);

    BEGIN
      -- populate settings table
      INSERT INTO dt_sh_sample_settings VALUES
        (dbms_data_mining.algo_name, dbms_data_mining.algo_decision_tree);
      INSERT INTO dt_sh_sample_settings VALUES
        (dbms_data_mining.clas_cost_table_name, 'dt_sh_sample_cost');
    END;
    /

    -- Build a Decision Tree model
    BEGIN
      DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => 'DT_SH_Clas_sample',
        mining_function     => dbms_data_mining.classification,
        data_table_name     => 'sample_data',
        case_id_column_name => 'cust_id',
        target_column_name  => 'affinity_card',
        settings_table_name => 'dt_sh_sample_settings');
    END;
    /

    -- List customers (ordered by their id) along with the likelihood and cost
    -- of using or rejecting the affinity card. (Note: while this example has a
    -- binary target, such a query is also useful in multi-class
    -- classification - Low, Med, High for example.)
    create table dt_apply_results as
    SELECT T.cust_id, S.prediction, S.probability, S.cost
    FROM   (SELECT cust_id,
                   PREDICTION_SET(dt_sh_clas_sample COST MODEL USING *) pset
            FROM   sample_apply_data) T,
           TABLE(T.pset) S
    ORDER BY cust_id, S.prediction;
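
    Incidentally, this apply-results table is presumably the same kind of intermediate table that JDM materialises internally (the 'DM$T...' table observed above) before it computes the metrics. With DBMS_DATA_MINING that step is explicit and the table is ours to clean up. A minimal sketch of the metrics step, assuming the objects created above (the result table name dt_confusion_matrix is illustrative; run with SET SERVEROUTPUT ON):

    -- compute test metrics from the apply results (roughly what a JDM
    -- ClassificationTestTask does internally)
    DECLARE
      v_accuracy NUMBER;
    BEGIN
      DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX(
        accuracy                    => v_accuracy,
        apply_result_table_name     => 'dt_apply_results',
        target_table_name           => 'sample_apply_data',
        case_id_column_name         => 'cust_id',
        target_column_name          => 'affinity_card',
        confusion_matrix_table_name => 'dt_confusion_matrix');
      DBMS_OUTPUT.PUT_LINE('accuracy: ' || ROUND(v_accuracy, 4));
    END;
    /

    -- once the metrics have been read, the intermediate table can simply be
    -- dropped, which is exactly the step JDM appears to skip:
    -- DROP TABLE dt_apply_results;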
  • 766176
    766176 Member Posts: 4
    edited Mar 13, 2013 9:49AM
    Thanks jdadashev,

    We know that the JDM API is built on top of the core DBMS_DATA_MINING package and that it is not as widely used as the core functions. But what is the alternative for Java EE applications that want to integrate ODM? The JSR standard is old, of course, but at least it is a standard, and most tools on the market are proprietary. I can only imagine a JDBC solution that encapsulates the DBMS calls, but then we are again away from any standardised approach. It would be interesting to know whether future releases will merely drop the old (current) JDM implementation, or whether there will be no Java data mining integration at all.

    Btw, this is what we do when the problem mentioned above occurs (e.g. for classification):
    // retrieve data mining connection ...
    Connection conn = connectionFactory.getConnection();
    
    // retrieve model
    ClassificationModel model = (ClassificationModel) conn.retrieveObject("DM_MODEL", NamedObject.model);
    
    // retrieve dataset to be tested
    PhysicalDataSet dataset = (PhysicalDataSet) conn.retrieveObject("DM_TEST_DATASET", NamedObject.physicalDataSet);
    
    // create/execute test task
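    // (createFactory is a local helper; presumably it wraps
    // javax.datamining.resource.Connection#getFactory)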
    ClassificationTestTaskFactory factory = createFactory(conn, ClassificationTestTask.class.getName(), ClassificationTestTaskFactory.class);
    ClassificationTestTask task = factory.create(dataset.getName(), model.getName(), "DM_METRICS");
    conn.execute(task, Long.MAX_VALUE);
    
    // retrieve generated metrics
    ClassificationTestMetrics metrics = (ClassificationTestMetrics) conn.retrieveObject("DM_METRICS", NamedObject.testMetrics);
    // examine metrics ..
    
    // delete metrics
    conn.removeObject("DM_METRICS", NamedObject.testMetrics);
  • jdadashev-Oracle Member Posts: 57 Employee
    Hello,

    Since Oracle JDM will no longer be available in DB 12.1, JDBC is the only option for Java EE applications to integrate ODM.
    For long-running operations, the process can be invoked either via the Oracle DB Scheduler or the Oracle Fusion Scheduler, depending on which framework you prefer; see the sketch below.
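
    For instance, a long-running model build could be submitted as a one-off background job; a Java EE component would send such a block through a plain JDBC CallableStatement. A minimal sketch using DBMS_SCHEDULER and the tables from the example above (the job name BUILD_DT_MODEL_JOB is illustrative):

    BEGIN
      DBMS_SCHEDULER.CREATE_JOB(
        job_name   => 'BUILD_DT_MODEL_JOB',
        job_type   => 'PLSQL_BLOCK',
        job_action => q'[BEGIN
                           DBMS_DATA_MINING.CREATE_MODEL(
                             model_name          => 'DT_SH_Clas_sample',
                             mining_function     => dbms_data_mining.classification,
                             data_table_name     => 'sample_data',
                             case_id_column_name => 'cust_id',
                             target_column_name  => 'affinity_card',
                             settings_table_name => 'dt_sh_sample_settings');
                         END;]',
        enabled    => TRUE);  -- runs once, right away; auto-drops when done
    END;
    /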

    Thanks
    jdadashev-Oracle
  • 766176
    766176 Member Posts: 4
    Mmh, not good news, but thank you for this info.
This discussion has been closed.