1 reply. Latest reply on Aug 21, 2012 2:30 PM by Mark Kelly-Oracle

    Importing a small dataset for model creation

      Hello all,
      I am trying to recreate the standard toy weather problem, which predicts whether or not to play tennis (see the dataset below).
      I imported the data using an external table and then inserted it into an internal table.
      However, when I create a basic two-node workflow (a data source feeding a classification model set to Naive Bayes), no probabilities seem to be generated for the attributes. When I use a decision tree instead, it creates only a single leaf node!
      Is there some limit on the dataset size, or am I missing some settings for the classifier?

      outlook, temp, humidity, windy, play
      sunny, 85, 85, FALSE, no
      sunny, 80, 90, TRUE, no
      overcast, 83, 86, FALSE, yes
      rainy, 70, 96, FALSE, yes
      rainy, 68, 80, FALSE, yes
      rainy, 65, 70, TRUE, no
      overcast, 64, 65, TRUE, yes
      sunny, 72, 95, FALSE, no
      sunny, 69, 70, FALSE, yes
      rainy, 75, 80, FALSE, yes
      sunny, 75, 70, TRUE, yes
      overcast, 72, 90, TRUE, yes
      overcast, 81, 75, FALSE, yes
      rainy, 71, 91, TRUE, no
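For reference, the per-attribute probabilities a Naive Bayes model should produce on this data can be worked out by hand. The sketch below is a minimal from-scratch illustration in plain Python, not Oracle Data Mining's implementation. It assumes unsmoothed relative frequencies and uses only the two categorical attributes, outlook and windy; temp and humidity are left out because they would first need binning. The function names `fit_naive_bayes` and `posterior` are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Toy weather data from the post: (outlook, temp, humidity, windy, play)
ROWS = [
    ("sunny", 85, 85, "FALSE", "no"),
    ("sunny", 80, 90, "TRUE", "no"),
    ("overcast", 83, 86, "FALSE", "yes"),
    ("rainy", 70, 96, "FALSE", "yes"),
    ("rainy", 68, 80, "FALSE", "yes"),
    ("rainy", 65, 70, "TRUE", "no"),
    ("overcast", 64, 65, "TRUE", "yes"),
    ("sunny", 72, 95, "FALSE", "no"),
    ("sunny", 69, 70, "FALSE", "yes"),
    ("rainy", 75, 80, "FALSE", "yes"),
    ("sunny", 75, 70, "TRUE", "yes"),
    ("overcast", 72, 90, "TRUE", "yes"),
    ("overcast", 81, 75, "FALSE", "yes"),
    ("rainy", 71, 91, "TRUE", "no"),
]
OUTLOOK, WINDY, PLAY = 0, 3, 4  # column positions used below


def fit_naive_bayes(rows, attrs, target=PLAY):
    """Return class priors P(c) and conditionals P(value | c) per attribute.

    Plain relative frequencies, no smoothing: a value never seen with a
    class simply has no entry (probability 0)."""
    class_counts = Counter(r[target] for r in rows)
    priors = {c: Fraction(n, len(rows)) for c, n in class_counts.items()}
    cond = {}
    for a in attrs:
        counts = defaultdict(Counter)  # class -> Counter of attribute values
        for r in rows:
            counts[r[target]][r[a]] += 1
        cond[a] = {
            c: {v: Fraction(n, class_counts[c]) for v, n in counts[c].items()}
            for c in class_counts
        }
    return priors, cond


def posterior(priors, cond, example):
    """Normalized P(c | example) for a dict {attribute_index: value}."""
    scores = {}
    for c, p in priors.items():
        for a, v in example.items():
            p *= cond[a][c].get(v, Fraction(0))
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


priors, cond = fit_naive_bayes(ROWS, [OUTLOOK, WINDY])
print(priors)  # yes: 9/14, no: 5/14
print(cond[OUTLOOK])
print(posterior(priors, cond, {OUTLOOK: "sunny", WINDY: "TRUE"}))  # no: 27/37, yes: 10/37
```

If a trained model shows nothing beyond the 9/14 versus 5/14 class split, it is effectively ignoring the attributes.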
        1. Re: Importing a small dataset for model creation
          Mark Kelly-Oracle
          Hi Nir,
          I used SQL Developer's Import Data feature to import the data as a CSV file.
          It is a pretty nice feature, located in the database navigator.
          You access it by right-clicking the Tables folder and selecting Import Data.
          The NB (Naive Bayes) and DT (Decision Tree) models are essentially defaulting to the class priors, because the data carries too little information.
          The GLM and SVM models do better and achieve better-than-average results.
          Check the test metrics output to see these comparisons.
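A single-leaf tree is what a tree learner produces when its stopping rules forbid splitting the root. Tree learners typically refuse to split a node holding fewer than some minimum number of rows. The sketch below assumes such a rule through a hypothetical `min_rows_to_split` parameter. Oracle Data Mining has a comparable setting, `TREE_TERM_MINREC_SPLIT`, but check its documented default for your release rather than relying on the value of 20 used here. The sketch evaluates only the root node, using Gini impurity on the categorical attributes outlook and windy.

```python
from collections import Counter

# (outlook, windy, play) columns of the toy weather data from the post
ROWS = [
    ("sunny", "FALSE", "no"), ("sunny", "TRUE", "no"),
    ("overcast", "FALSE", "yes"), ("rainy", "FALSE", "yes"),
    ("rainy", "FALSE", "yes"), ("rainy", "TRUE", "no"),
    ("overcast", "TRUE", "yes"), ("sunny", "FALSE", "no"),
    ("sunny", "FALSE", "yes"), ("rainy", "FALSE", "yes"),
    ("sunny", "TRUE", "yes"), ("overcast", "TRUE", "yes"),
    ("overcast", "FALSE", "yes"), ("rainy", "TRUE", "no"),
]
ATTRS = ["outlook", "windy"]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1 - sum((k / n) ** 2 for k in Counter(labels).values())


def grow_root(rows, attrs, min_rows_to_split):
    """Decide what happens at the root node of a tree.

    Returns ("leaf", majority_class, confidence) when the node is too small
    to split, else ("split", best_attribute, weighted_gini)."""
    labels = [r[-1] for r in rows]
    majority, count = Counter(labels).most_common(1)[0]
    if len(rows) < min_rows_to_split:
        # Too few rows: the whole tree is one leaf predicting the majority class
        return ("leaf", majority, count / len(rows))
    best = None
    for i, name in enumerate(attrs):
        groups = {}
        for r in rows:
            groups.setdefault(r[i], []).append(r[-1])
        impurity = sum(len(g) / len(rows) * gini(g) for g in groups.values())
        if best is None or impurity < best[2]:
            best = ("split", name, impurity)
    return best


print(grow_root(ROWS, ATTRS, min_rows_to_split=20))  # a single "yes" leaf, confidence 9/14
print(grow_root(ROWS, ATTRS, min_rows_to_split=2))   # splits on outlook
```

With 14 rows, or fewer once some are held out for testing, a threshold of this kind is never reached, so the tree stops at the prior majority class.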
          To really test out the mining capabilities, you should use a larger, more informative dataset.
          When you have very few rows, you can change the test settings to use all the data for both build and test.
          That helps only slightly with toy data.
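Counting rows shows how much the test settings matter for a table this small. The sketch below is a hypothetical helper, not SQL Developer's code. It holds out a random fraction of rows for testing (40% is an illustrative value, not necessarily the tool's default), or, when the fraction is 0, reuses every row for both build and test. Metrics measured on the same rows used for building are optimistic, which is one reason this helps only slightly.

```python
import random


def build_test_split(rows, test_fraction, seed=0):
    """Return (build_rows, test_rows).

    test_fraction == 0 means "use all the data for both build and test"."""
    if test_fraction == 0:
        return list(rows), list(rows)
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(14))  # stand-ins for the 14 weather rows
build, test = build_test_split(rows, test_fraction=0.4)
print(len(build), len(test))  # 8 6
build, test = build_test_split(rows, test_fraction=0)
print(len(build), len(test))  # 14 14
```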
          Thanks, Mark