2 Replies. Latest reply: Apr 23, 2008 1:51 PM by 265939

    What are the advantages of Oracle Data Mining vs Microsoft data Mining tech


      Can anyone specify the advantages of using Oracle Data Mining vs. Microsoft's data mining techniques?
        • 1. Re: What are the advantages of Oracle Data Mining vs Microsoft data Mining
          Hasnur Ramadhan
          I cannot tell you the advantages of Oracle Data Mining over Microsoft's, but I will share what I know.

          Oracle provides: Generalized Linear Models (logistic and multiple regression), Decision Trees, Naive Bayes, Support Vector Machine (SVM), Minimum Description Length (MDL), Apriori, K-Means Clustering, O-Cluster, Non-negative Matrix Factorization (NMF), Text Mining (requires Oracle Text)
          Microsoft provides: Decision Trees, Naive Bayes, Microsoft Clustering, Microsoft Time Series (modified Decision Trees), Neural Network, Microsoft Association, Linear & Logistic Regression, Text Mining (part of Integration Services/ETL).

          Oracle provides Java Data Mining API (JSR-73 API) and Data Mining PL/SQL packages
          Microsoft provides OLE DB for Data Mining and DMX (Data Mining eXtensions)

          Development tools:
          Oracle: JDeveloper, SQL Developer, Data Miner
          Microsoft: Visual Studio

          User tools:
          Oracle: Data Miner, MS Excel
          Microsoft: MS Excel (the plug-in is more advanced than Oracle's)

          Oracle: In-Database Data Mining
          Microsoft: Separate process from DB process (MS Analysis Services)

          Any corrections are welcome.
          • 2. Re: What are the advantages of Oracle Data Mining vs Microsoft data Mining

            The feature breakdown you gave is pretty good. The key differences follow from the last entry in your list: in-database mining versus a separate process. ODM is part of the Oracle RDBMS, and most of its code is part of the database kernel. This has a number of performance and scalability benefits, especially during scoring. Here is a list of some of the benefits that come out of this:

            - No data movement between the database and an external server. This has a major impact on performance and security.
            - Database security. ODM relies on schema-level security for models, and 11g also introduces privileges specific to models.
            - Most ODM models can be loaded in the shared cursor cache. This avoids the need to load the model multiple times for different users. A single model can serve multiple users, and the model load time is bypassed altogether after the first invocation. Think about a call center scenario where many customers are scored simultaneously.
            - Out-of-the-box support for parallel queries and RAC. Nothing special is required for ODM to take advantage of these features.
            - Scalable and flexible scoring. SQL Server scores models with a special syntax called a prediction join, which is the result of treating models as tables. This restricts how models can be used in SQL queries. ODM scores models as single-row functions, so models are viewed as special transformations. This makes scoring fast and flexible. For example, ODM can use models in the SELECT, GROUP BY, ORDER BY, and WHERE parts of a query. It is also possible to use a model score as an argument to another model score, so results are piped from one model to the next. Doing the same with SQL Server would be very complex and would require multiple prediction joins. For examples of scoring, take a look at http://oracledmt.blogspot.com/2006/05/sql-of-analytics-1-data-mining.html
            - Model scoring functions can be used in derived objects such as functional indexes and materialized views. This gives very fast precomputed results that can be used immediately upon request, with no need for re-scoring.
            - Automatic data preparation (ADP) and embedded data preparation (EDP) in 11g. ODM automatically processes the data in the way that suits each algorithm (ADP), but it also allows experts to embed their own transformations into models (EDP), which simplifies deployment and allows for compact queries. For example, automatic data preparation knows how to handle sparse data, data that is missing at random, outliers, and missing values, to name a few of its capabilities.
            - ODM has some cutting-edge functionality that until recently no other vendor had: SVM and non-negative matrix factorization (NMF).
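
The "models as single-row functions" point above can be illustrated with a small runnable sketch. This is not ODM or Oracle SQL: it uses Python's built-in sqlite3 module as a stand-in, and the two "models" (`churn_score` and `retention_offer`), the `customers` table, and all of its columns are hypothetical names invented for the illustration. In ODM the corresponding role is played by built-in SQL scoring functions such as PREDICTION. The sketch only shows why a scalar scoring function composes freely with SELECT, WHERE, ORDER BY, and GROUP BY, and how one score can feed another.

```python
import sqlite3


def churn_score(tenure_months, support_calls):
    """Hypothetical stand-in for a trained model: returns a score in [0, 1]."""
    raw = 0.1 * support_calls - 0.02 * tenure_months + 0.5
    return max(0.0, min(1.0, raw))


def retention_offer(score, monthly_spend):
    """Hypothetical second model that takes the first model's score as input."""
    return "discount" if score > 0.6 and monthly_spend >= 50 else "none"


conn = sqlite3.connect(":memory:")

# Register both "models" as scalar (single-row) SQL functions.
conn.create_function("churn_score", 2, churn_score)
conn.create_function("retention_offer", 2, retention_offer)

conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER, tenure_months INTEGER, support_calls INTEGER, monthly_spend REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, 48, 0, 80.0),
        (2, 3, 4, 60.0),
        (3, 12, 2, 30.0),
        (4, 1, 5, 20.0),
    ],
)

# The score appears in SELECT, WHERE, and ORDER BY of an ordinary query.
high_risk = conn.execute(
    """
    SELECT id, churn_score(tenure_months, support_calls) AS score
    FROM customers
    WHERE churn_score(tenure_months, support_calls) > 0.5
    ORDER BY score DESC
    """
).fetchall()

# One model's score is passed as an argument to another, then grouped on.
offers = conn.execute(
    """
    SELECT retention_offer(
               churn_score(tenure_months, support_calls), monthly_spend
           ) AS offer,
           COUNT(*)
    FROM customers
    GROUP BY offer
    ORDER BY offer
    """
).fetchall()

print(high_risk)
print(offers)
```

Because each score is just a value computed per row, no special join syntax is needed to filter, sort, or aggregate on it. With a model-as-table design, the nested case would, as noted above, take more than one prediction join.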