How does ODM handle dimensionality?

713616 Member Posts: 7
edited May 15, 2013 11:46AM in Machine Learning
When we pass a large number of values (1,000) through an aggregator and then into a classification model (all 3), does ODM do anything to automagically deal with the problem of high dimensionality and sparsity?


  • Hi,
    Yes, each algorithm provides automatic data preparation (ADP). The actual data preparation performed is algorithm-specific.
    The intent is to simplify the user's responsibilities by removing standard data preparation tasks.
    However, the user can override this behavior on a per-column basis by turning off ADP for that column.
    The user can also embed their own column-specific transformations into the model if they wish.
    Thanks, Mark
  • 713616 Member Posts: 7
    Thanks Mark! Could you expand on the ADP for GLM? Our target is a Y/N variable indicating a preterm birth so I assume the GLM is choosing a logistic regression. In one model, the input variables are diagnosis codes and there could be 1000. However, a typical patient would only have ~20, so this creates a great deal of sparsity. How does ADP handle this? Does it eliminate variables (diagnoses) that don't occur for any patient, or ones that occur below a certain threshold?
  • Hi,
    Check out the following link that describes the ADP approach used by ODM algorithms.
    Note that the details of how this is done are internal to the implementation.
    Thanks, Mark
  • 713616 Member Posts: 7
    Thanks again Mark. Good article, but it states that "the handling of nested data, sparsity, and missing values is standard across algorithms and occurs independently of ADP." It's important for us to be able to explain what ODM does to address sparsity. The article also states that ADP is turned off by default. Where do we turn on ADP for the classifier?
  • Hi,
    Here are some additional links that provide info on handling missing values, sparsity and nested tables.

    Thanks, Mark
  • Forgot to pass on the link for how to turn off/on ADP

    The UI allows this to be done as well.
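
For readers who want a sparsity rule they can explain themselves, here is a minimal sketch of the threshold idea raised in the question above: drop diagnosis codes that occur in fewer than a chosen number of patients before the data reaches the model. This is not a description of what ODM or ADP does internally (the thread says those details are internal). The function name `filter_rare_codes`, the `min_patients` parameter, and the sample patient data are all hypothetical and used only for illustration.

```python
from collections import Counter


def filter_rare_codes(patient_codes, min_patients=2):
    """Drop diagnosis codes that appear in fewer than `min_patients` patients.

    `patient_codes` maps a patient id to the list of codes recorded for that
    patient. This is a sparse layout: absent codes are simply not listed.
    Hypothetical helper for illustration; not part of ODM.
    """
    # Count each code at most once per patient.
    counts = Counter(
        code for codes in patient_codes.values() for code in set(codes)
    )
    # Codes that meet the prevalence threshold.
    keep = {code for code, n in counts.items() if n >= min_patients}
    # Keep the sparse layout, restricted to the surviving codes.
    filtered = {
        pid: sorted(set(codes) & keep) for pid, codes in patient_codes.items()
    }
    return filtered, keep


# Made-up sample data: three patients, four distinct codes.
patients = {
    "p1": ["O60.1", "E11.9", "I10"],
    "p2": ["O60.1", "I10"],
    "p3": ["Z34.0"],
}

filtered, kept = filter_rare_codes(patients, min_patients=2)
print(sorted(kept))
print(filtered)
```

Doing the filtering up front like this means the threshold is something you chose and can document, whatever the tool does afterwards.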

This discussion has been closed.