--train_regression
Switch
--train_regression
Description
Trains a regression model using the features given.
Argument and Default Value
None
Details
This switch causes the infrastructure to train a machine learning model that predicts the outcome(s) given by --outcomes from the features in the feature tables passed to -f (you can list multiple feature tables there). Features are loaded into memory, filtered/clustered by the feature selection step (see below), standardized over the groups (unless --no_standardize is used), and then fed into the regression model. It is usually useful to combine this switch with --save_model; when you do, put the order of the features into the name of the saved model, because that order is not yet stored in the model itself.
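The standardize-then-regress flow described above can be illustrated with plain scikit-learn. This is a minimal sketch, not the infrastructure's own code: the synthetic data, the `Ridge` estimator, and the step names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a group-by-feature matrix and one outcome
# (assumption: real runs load these from the feature and outcome tables).
rng = np.random.RandomState(0)
X = rng.rand(50, 5)  # 50 groups, 5 features
true_weights = np.array([2.0, 0.0, -1.0, 0.0, 0.5])
y = X @ true_weights + 0.01 * rng.randn(50)

# Standardize each feature across groups, then fit a regression model.
model = Pipeline([
    ("standardize", StandardScaler()),
    ("regress", Ridge(alpha=1.0)),
])
model.fit(X, y)

predictions = model.predict(X)
r_squared = model.score(X, y)  # in-sample R^2, for illustration only
```

Skipping the `standardize` step in this sketch corresponds roughly to what --no_standardize asks for.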
Feature Selection
To avoid overfitting, there are a couple of feature selection steps that one can apply. Most of our feature selection is done using the Scikit-Learn package. To use it, (un)comment the lines below:
# feature selection:
featureSelectionString = None
Every feature selector string creates an object when evaluated, and that object needs to provide the following two functions:
fit(X, y)
transform(X)
If you are putting a lot of features into the model, it's good to use the pipeline feature selection:
featureSelectionString = 'Pipeline([("1_mean_value_filter", OccurrenceThreshold(threshold=(X.shape[0]/100.0))), \
    ("2_univariate_select", SelectFwe(f_regression, alpha=70.0)), \
    ("3_rpca", RandomizedPCA(n_components=.4/len(self.featureGetters), random_state=42, \
        whiten=False, iterated_power=3, max_components=X.shape[0]/max(1.5, len(self.featureGetters))))])'
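For readers who want to try a comparable pipeline outside the infrastructure, here is a hedged sketch using only current scikit-learn classes. It is not the pipeline above: `OccurrenceThreshold` is omitted because it is not a scikit-learn class, `RandomizedPCA` is replaced by `PCA(svd_solver="randomized")` (recent scikit-learn releases no longer ship `RandomizedPCA`), and the data, `alpha`, and `n_components` values are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectFwe, f_regression
from sklearn.pipeline import Pipeline

# Synthetic data: 200 groups, 30 features, 3 of which drive the outcome.
rng = np.random.RandomState(42)
X = rng.rand(200, 30)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 2.5 * X[:, 2] + 0.05 * rng.randn(200)

selector = Pipeline([
    # Keep features whose univariate F-test passes a family-wise error bound.
    ("1_univariate_select", SelectFwe(f_regression, alpha=0.05)),
    # Reduce the surviving features with randomized PCA.
    ("2_pca", PCA(n_components=2, svd_solver="randomized", random_state=42)),
])

# The pipeline exposes the same fit/transform interface as a single selector.
selector.fit(X, y)
X_reduced = selector.transform(X)
```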
If there aren't many features, you can choose not to use any feature selection. Talk to a CS PostDoc about this :)
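The fit(X, y) / transform(X) requirement on feature selectors can be sketched as follows. `MinOccurrenceSelector` is a hypothetical class written for this example (it is not part of the infrastructure or of scikit-learn), and the `eval` call only illustrates the idea of building a selector from a string; the actual mechanism may differ.

```python
import numpy as np


class MinOccurrenceSelector:
    """Hypothetical selector: keep columns that are nonzero in enough rows."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.keep_ = None

    def fit(self, X, y=None):
        # Count, per column, how many rows have a nonzero value.
        counts = np.count_nonzero(X, axis=0)
        self.keep_ = counts >= self.threshold
        return self

    def transform(self, X):
        # Drop the columns that fell below the threshold during fit.
        return X[:, self.keep_]


X = np.array([[1.0, 0.0, 3.0],
              [2.0, 0.0, 0.0],
              [4.0, 5.0, 0.0],
              [1.0, 0.0, 0.0]])
y = np.array([0.1, 0.2, 0.3, 0.4])

# A selector string is evaluated into an object, which is then fit and applied.
selector_string = "MinOccurrenceSelector(threshold=X.shape[0] / 2.0)"
selector = eval(selector_string)
X_selected = selector.fit(X, y).transform(X)
```

Only the first column is nonzero in at least half of the rows, so it is the only one that survives.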
Model selection
See below for choosing the model. Once the model is chosen, you should tweak its parameters by commenting in/out the appropriate line in regressionPredictor.py:
# Model Parameters
cvParams = {...
You can choose your model with --model, which accepts one of the following: linear, ridge, ridgecv, ridgefirstpasscv, ridgehighcv, ridgelowcv, rpcridgecv, lasso, lassocv, elasticnet, elasticnetcv, lars, lassolars, lassolarscv, svr, sgdregressor, extratrees, par
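The idea of picking a model by name and tuning it over a per-model parameter grid can be sketched with scikit-learn's `GridSearchCV`. The `MODELS` and `CV_PARAMS` dictionaries, the `train` helper, and all grid values below are hypothetical; they are not the contents of cvParams in regressionPredictor.py and do not claim to show how each --model name is implemented.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV

# Hypothetical name-to-estimator mapping for two of the model names.
MODELS = {"ridge": Ridge, "lasso": Lasso}

# Hypothetical per-model grids; edit these to tweak the search.
CV_PARAMS = {
    "ridge": [{"alpha": [0.1, 1.0, 10.0, 100.0]}],
    "lasso": [{"alpha": [0.001, 0.01, 0.1]}],
}


def train(model_name, X, y):
    """Cross-validate the named model over its grid and return the search."""
    estimator = MODELS[model_name]()
    search = GridSearchCV(estimator, CV_PARAMS[model_name], cv=3)
    search.fit(X, y)
    return search


rng = np.random.RandomState(1)
X = rng.rand(60, 4)
y = X @ np.array([1.5, -2.0, 0.0, 0.7]) + 0.05 * rng.randn(60)

search = train("ridge", X, y)
best_alpha = search.best_params_["alpha"]
```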
Other Switches
Required Switches: -d, -g, -t, -f, --outcome_table, --outcomes
Optional Switches: --group_freq_thresh, --model, --save_model, --picklefile, --no_standardize, --sparse, --regression_to_lexicon, etc.