1 Reply Latest reply on Dec 17, 2012 6:53 PM by Mark Kelly-Oracle

    Data Mining on a "Weblog" file


      I'm using Oracle Data Miner 3.0.04. I've been trying to use its data mining tools (e.g. clustering) to cluster weblog data. The weblog file contains web access information from a busy web server for one day, and consists of the following 12 attributes:

      1. log_date date
      2. log_time time
      3. c_ip character (15)
      4. cs_username character (1)
      5. sc_method character (4)
      6. cs_uri_stem character (56)
      7. cs_uri_query character (255)
      8. sc_status integer
      9. cs_host character (19)
      10. cs_user_agent character (157)
      11. cs_cookie character (143)
      12. cs_referrer character (192)

      Since the log covers a single day, the dates are all identical; only the times differ. To prepare the data, I first cleaned it by removing image and audio records. I then used SQL queries to aggregate cs_uri_stem, grouping by c_ip. That didn't work: I kept getting the error "character string buffer too small". I couldn't use any data type other than VARCHAR2, since the other types are not accepted. So instead I used cs_uri_stem as-is, in a new table with a sequence id, and used that for modelling in order to understand user behaviour. In the Clust Build node I tried a K-Means cluster, but I kept getting the error,
      Build failed due to ORA-20114: Invalid training data for model build.
      The data in the final table looks like this:
      1 /uk/letters/letters.asp

      I'm not really sure what steps I should take to mine these attributes, and I've never worked with the ODMiner tools before. The main web mining task is to develop an intelligent recommendation model based on the queries recorded in the web log. I would really appreciate any advice or recommendations, because right now I'm just going around in circles without a clue.
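      To illustrate, the failing aggregation step was roughly of this shape. This is a simplified sketch only: the table name `weblog` is illustrative, and my actual query may have used a different aggregation approach, so the exact error text could differ.

```sql
-- Sketch of the aggregation: collect the URI stems visited by each
-- client IP into one delimited string. With a string-typed aggregate,
-- the result for a busy IP easily exceeds the 4000-byte VARCHAR2
-- limit, which is what triggers the buffer error.
SELECT c_ip,
       LISTAGG(cs_uri_stem, ' ') WITHIN GROUP (ORDER BY log_time) AS pages
FROM   weblog
GROUP  BY c_ip;
```
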
        • 1. Re: Data Mining on a "Weblog" file
          Mark Kelly-Oracle
          Being new to mining, you have really set off on an ambitious mining project :)

          A couple of technical pointers:

          *1) Version of Data Miner being Used*
          You are using the original Data Miner release.
          I would download the latest SQL Developer release, which contains the current Data Miner client and repository installation:
          SQL Developer 3.2.2 RTM Version Build MAIN-09.87
          Drop the old repository and start with this latest one, assuming you are just getting started and have no significant mining workflows created.
          You can always export the workflows to disk if you want to import them into the new repository.
          Alternatively you can migrate the older repository, but I would avoid that unless you really need to, as it requires Data Miner to hold on to some older repository definitions.

          *2) Handling of text*
          It seems your primary source of data for the clustering process will be the cs_uri_query.
          You might find better results processing it as text data rather than as categorical data.
          You can use the Build Text node to transform cs_uri_query into a nested column that contains text tokens.
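          For example, the input for that is just a table with a unique case id plus the text column; something of this shape (column names are from your post; the `weblog` and `weblog_cases` table names are illustrative):

```sql
-- Illustrative case table: one row per request, with a unique case id
-- and the raw query string. The Build Text node can then tokenize
-- cs_uri_query into a nested text column suitable for clustering.
CREATE TABLE weblog_cases AS
SELECT ROWNUM       AS case_id,
       c_ip,
       cs_uri_query
FROM   weblog
WHERE  cs_uri_query IS NOT NULL;
```
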

          *3) Methodology definition*
          This is probably your biggest challenge, really.
          What is the overall methodology to produce the desired result?
          You stated your objective is: develop an intelligent recommendation model based on queries recorded in the web log.
          Once you create clusters from this data, what are your next steps?
          What type of recommendation do you want to generate?

          Thanks, Mark