I am trying to set up the standard toy weather data problem, which aims to predict whether or not to play tennis (see dataset below).
I imported this using an external table and then inserted it into an internal table.
However, when I create a basic two-node workflow (data source -> classification model set to Naive Bayes), no probabilities seem to be generated for the attributes. When I use a decision tree, it creates just one single leaf node!
Is there a minimum dataset size? Or am I missing some settings for the classifier?
outlook, temp, humidity, windy, play
sunny, 85, 85, FALSE, no
sunny, 80, 90, TRUE, no
overcast, 83, 86, FALSE, yes
rainy, 70, 96, FALSE, yes
rainy, 68, 80, FALSE, yes
rainy, 65, 70, TRUE, no
overcast, 64, 65, TRUE, yes
sunny, 72, 95, FALSE, no
sunny, 69, 70, FALSE, yes
rainy, 75, 80, FALSE, yes
sunny, 75, 70, TRUE, yes
overcast, 72, 90, TRUE, yes
overcast, 81, 75, FALSE, yes
rainy, 71, 91, TRUE, no
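As a quick sanity check outside Oracle Data Miner, the same 14 rows can be fed to scikit-learn (this is purely illustrative and an assumption on my part; it is not part of the Oracle workflow). A fully grown decision tree does split this data, which suggests the single-leaf result comes from the tool's settings or data preparation rather than a hard size limit:

```python
# Hypothetical sanity check with scikit-learn (not the Oracle Data Miner
# workflow itself): fit Naive Bayes and a decision tree on the 14-row set.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rows = [
    ("sunny", 85, 85, 0, "no"),     ("sunny", 80, 90, 1, "no"),
    ("overcast", 83, 86, 0, "yes"), ("rainy", 70, 96, 0, "yes"),
    ("rainy", 68, 80, 0, "yes"),    ("rainy", 65, 70, 1, "no"),
    ("overcast", 64, 65, 1, "yes"), ("sunny", 72, 95, 0, "no"),
    ("sunny", 69, 70, 0, "yes"),    ("rainy", 75, 80, 0, "yes"),
    ("sunny", 75, 70, 1, "yes"),    ("overcast", 72, 90, 1, "yes"),
    ("overcast", 81, 75, 0, "yes"), ("rainy", 71, 91, 1, "no"),
]
outlook = {"sunny": 0, "overcast": 1, "rainy": 2}  # simple integer coding
X = np.array([[outlook[o], t, h, w] for o, t, h, w, _ in rows], dtype=float)
y = np.array([label for *_, label in rows])

nb = GaussianNB().fit(X, y)
dt = DecisionTreeClassifier(random_state=0).fit(X, y)
print("rows:", len(X))                     # 14
print("tree leaves:", dt.get_n_leaves())   # more than one leaf on this data
print("NB probabilities, first row:", nb.predict_proba(X[:1]))
```

So the dataset itself can support a multi-leaf tree and per-class probabilities; the question is what the Oracle workflow is doing differently with so few rows.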
I used SQL Developer's Import Data feature to import the data as a CSV file.
It is a pretty nice feature located in the database navigator.
You access it by right-clicking the Tables folder and selecting Import Data.
The NB and DT models are essentially defaulting to the class priors, given how little information there is in the data.
The GLM and SVM models do better, achieving above-baseline results.
See the test metrics output for these comparisons.
To really exercise the mining capabilities you should use richer data.
When you have very few rows, you can change the test settings to use all of the data for both build and test.
That helps only slightly with toy data.
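To see why splitting so few rows hurts, consider what a 60/40 build/test split leaves you with (scikit-learn is used here only for illustration; the exact split ratio and mechanics in Oracle Data Miner may differ):

```python
# Illustrative only: a 60/40 split of 14 rows leaves very little to build on.
import numpy as np
from sklearn.model_selection import train_test_split

idx = np.arange(14)  # stand-in for the 14 toy weather rows
build, test = train_test_split(idx, test_size=0.4, random_state=0)
print(len(build), len(test))  # 8 rows to build the model, 6 to test it
```

Using all 14 rows for both build and test avoids the tiny build set, but the resulting resubstitution metrics will be optimistic, which is why it only helps a little.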