Yes, each algorithm provides automatic data preparation (ADP). The actual data preparation performed is algorithm-specific.
The intent is to simplify the user's responsibilities by removing standard data preparation tasks.
However, the user can override this behavior on a per-column basis by turning off ADP for that column.
The user can also embed their own column-specific transformations into the model if they wish.
Thanks Mark! Could you expand on the ADP for GLM? Our target is a Y/N variable indicating a preterm birth so I assume the GLM is choosing a logistic regression. In one model, the input variables are diagnosis codes and there could be 1000. However, a typical patient would only have ~20, so this creates a great deal of sparsity. How does ADP handle this? Does it eliminate variables (diagnoses) that don't occur for any patient, or ones that occur below a certain threshold?
Check out the following link that describes the ADP approach used by ODM algorithms.
Note, the details of how this is done are an internal implementation.
Thanks again Mark. Good article, but it states that "the handling of nested data, sparsity, and missing values is standard across algorithms and occurs independently of ADP." It's important for us to be able to explain what ODM does to address sparsity. The article also states that ADP is turned off by default. Where do we turn on ADP for the classifier?
Here are some additional links that provide info on handling missing values, sparsity and nested tables.
Forgot to pass on the link for how to turn ADP on or off.
The UI allows this to be done as well.
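For anyone doing this through the PL/SQL API rather than the UI, here is a minimal sketch of turning ADP on for a GLM classifier through the model settings table. The table and column names (glm_settings, births_train, patient_id, preterm) and the model name are hypothetical placeholders for the preterm-birth case discussed above; only the DBMS_DATA_MINING constants and the CREATE_MODEL call come from the documented package, so check them against the documentation for your database release.

```sql
-- Hypothetical settings table; the name glm_settings is an assumption.
CREATE TABLE glm_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

BEGIN
  -- Choose GLM as the algorithm (logistic regression for a classification target).
  INSERT INTO glm_settings (setting_name, setting_value)
  VALUES (DBMS_DATA_MINING.ALGO_NAME,
          DBMS_DATA_MINING.ALGO_GENERALIZED_LINEAR_MODEL);

  -- Turn automatic data preparation on; use PREP_AUTO_OFF to disable it.
  INSERT INTO glm_settings (setting_name, setting_value)
  VALUES (DBMS_DATA_MINING.PREP_AUTO,
          DBMS_DATA_MINING.PREP_AUTO_ON);

  COMMIT;

  -- Build the model. births_train, patient_id and preterm are
  -- hypothetical names standing in for the training data.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'PRETERM_GLM',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'BIRTHS_TRAIN',
    case_id_column_name => 'PATIENT_ID',
    target_column_name  => 'PRETERM',
    settings_table_name => 'GLM_SETTINGS');
END;
/
```

This only switches ADP on or off for the model as a whole. It does not show what ADP does internally, which, as noted earlier in the thread, is an internal implementation detail.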