--keep_low_variance_outcomes
Switch
--keep_low_variance_outcomes or --keep_low_variance
Description
Keep any outcomes, controls or interactions that have low variance.
Argument and Default Value
By default, DLATK calculates the variance of every outcome, control, and interaction variable and removes any variable whose variance is less than the threshold. Use this flag to turn that filtering off.
The default threshold is 0 and is set via a variable in dlaConstants.py:
DEF_LOW_VARIANCE_THRESHOLD = 0.0
It can also be changed via the OutcomeGetter and OutcomeAnalyzer instance variables:
OutcomeGetter(..., low_variance_thresh=foo, ...)
OutcomeAnalyzer(..., low_variance_thresh=foo, ...)
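The filtering described above can be illustrated with a minimal sketch. This is not DLATK's implementation: drop_low_variance and its keep_low_variance parameter are hypothetical names invented for the illustration, and the sketch assumes a variable is dropped when its population variance is at or below the threshold (the text above only says "less than", so the exact comparison DLATK uses may differ).

```python
from statistics import pvariance

# Mirrors the constant named in dlaConstants.py.
DEF_LOW_VARIANCE_THRESHOLD = 0.0


def drop_low_variance(columns, threshold=DEF_LOW_VARIANCE_THRESHOLD,
                      keep_low_variance=False):
    """Return the variables that survive the variance filter.

    Hypothetical helper, not part of DLATK. `columns` maps a variable
    name to its list of numeric values.
    """
    if keep_low_variance:
        # Filtering switched off: every variable is kept.
        return dict(columns)
    # Assumption: a variable is dropped when its variance is at or below
    # the threshold, so a constant column is removed at the default of 0.
    return {name: values for name, values in columns.items()
            if pvariance(values) > threshold}


# After a filter such as --where "gender = 0", gender is constant.
columns = {"gender": [0, 0, 0, 0], "age": [23, 31, 45, 19]}
print(list(drop_low_variance(columns)))
print(list(drop_low_variance(columns, keep_low_variance=True)))
```

Under these assumptions the first call keeps only age, while the second keeps both variables, which is the behavior the switch is meant to provide.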
Other Switches
Required Switches:
Optional Switches:
Some correlation command: --correlate, --logistic_reg, etc.
Some regression command: --nfold_test_regression, --predict_regression, --test_regression, etc.
Some classification command: --nfold_test_classifiers, --predict_classifiers, --test_classifiers, etc.
Example Commands
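These are two toy examples that consider only males: the first correlates language features with gender, and the second predicts gender from 1grams. Because the --where clause leaves a single gender value, the outcome has zero variance and would be removed without --keep_low_variance. You probably don't want to do this in practice.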
# run DLA over only males
dlatkInterface.py -d dla_tutorial -t msgs -c user_id --outcome_table blog_outcomes \
--outcomes gender -f 'feat$1gram$msgs$user_id$16to16' --correlate --where "gender = 0" --keep_low_variance
# use 1grams to predict the gender of only males via 10-fold cross validation
dlatkInterface.py -d dla_tutorial -t msgs -c user_id --outcome_table blog_outcomes \
--outcomes gender -f 'feat$1gram$msgs$user_id$16to16' --combo_test_regression \
--folds 10 --where "gender = 0" --keep_low_variance