6 Replies Latest reply: May 15, 2013 10:46 AM by Mark Kelly

    How does ODM handle dimensionality?

    713616
      When we pass a large number of values (1,000) through an aggregator and then into a classification model (all 3), does ODM do anything to automagically deal with the problem of high dimensionality and sparsity?
        • 1. Re: How does ODM handle dimensionality?
          Mark Kelly
          Hi,
          Yes, each algorithm provides automatic data preparation (ADP). The actual data preparation performed is algorithm-specific.
          The intent is to simplify the user's responsibilities by removing standard data preparation tasks.
          However, the user can override this behavior on a per-column basis by turning off ADP for that column.
          The user can also embed their own column-specific transformations into the model if they wish.
          Thanks, Mark
          • 2. Re: How does ODM handle dimensionality?
            713616
            Thanks Mark! Could you expand on the ADP for GLM? Our target is a Y/N variable indicating a preterm birth so I assume the GLM is choosing a logistic regression. In one model, the input variables are diagnosis codes and there could be 1000. However, a typical patient would only have ~20, so this creates a great deal of sparsity. How does ADP handle this? Does it eliminate variables (diagnoses) that don't occur for any patient, or ones that occur below a certain threshold?
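            To make the question concrete, here is a minimal Python sketch of the two quantities being asked about: how sparse the patient-by-diagnosis data is, and which codes would survive a minimum-occurrence threshold. It only illustrates the question and is not a description of what ODM or ADP does internally. The helper names (`sparsity`, `frequent_codes`), the code labels, and the toy data are all hypothetical.

```python
from collections import Counter


def sparsity(cases, n_codes):
    """Fraction of empty cells in the implied patients x codes 0/1 matrix."""
    filled = sum(len(codes) for codes in cases.values())
    return 1.0 - filled / (len(cases) * n_codes)


def frequent_codes(cases, min_patients):
    """Codes recorded for at least `min_patients` distinct patients."""
    counts = Counter(code for codes in cases.values() for code in codes)
    return {code for code, n in counts.items() if n >= min_patients}


# Hypothetical toy data: patient id -> set of diagnosis codes.
cases = {
    "p1": {"D001", "D002", "D003"},
    "p2": {"D001", "D004"},
    "p3": {"D001", "D002"},
}

# Codes seen for at least 2 patients (assumed threshold, for illustration only).
kept = frequent_codes(cases, min_patients=2)
print(sorted(kept))

# Sparsity if the code vocabulary has 1000 entries, as in the question.
print(round(sparsity(cases, n_codes=1000), 4))
```

            With 1000 possible codes and about 20 per patient, the same calculation gives a sparsity of roughly 98%.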
            • 3. Re: How does ODM handle dimensionality?
              Mark Kelly
              Hi,
              Check out the following link that describes the ADP approach used by ODM algorithms.
              Note that the details of how this is done are internal to the implementation.
              Thanks, Mark
              http://docs.oracle.com/cd/E11882_01/datamine.112/e16808/xform_data.htm#BGBHAIBF
              • 4. Re: How does ODM handle dimensionality?
                713616
                Thanks again Mark. Good article, but it states that "the handling of nested data, sparsity, and missing values is standard across algorithms and occurs independently of ADP." It's important for us to be able to explain what ODM does to address sparsity. The article also states that ADP is turned off by default. Where do we turn on ADP for the classifier?
                • 5. Re: How does ODM handle dimensionality?
                  Mark Kelly
                  Hi,
                  Here are some additional links that provide info on handling missing values, sparsity and nested tables.

                  http://docs.oracle.com/cd/E11882_01/datamine.112/e12218/xform_casetbl.htm

                  Thanks, Mark
                  • 6. Re: How does ODM handle dimensionality?
                    Mark Kelly
                    Forgot to pass on the link for how to turn ADP on or off:
                    http://docs.oracle.com/cd/E11882_01/appdev.112/e25788/d_datmin.htm#i1069305

                    The UI allows this to be done as well.

                    Mark