Output Formats
Summary of output types available in DLATK. Most of the following commands need the --output_name flag which specifies the output directory.
In the csv-type outputs the DLATK namespace will be dumped as the first line. This allows you to revisit results and know your exact settings.
CSV
Correlations
dlatkInterface.py -d dla_tutorial -t msgs -c user_id --correlate --csv --outcome_table blog_outcomes --outcomes age gender is_student is_education is_technology --outcome_with_outcome_only --output ~/correlations
feature,age,p,N,CI_l,CI_u,freq,gender,p,N,CI_l,CI_u,freq,is_education,p,N,CI_l,CI_u,freq,is_student,p,N,CI_l,CI_u,freq,is_technology,p,N,CI_l,CI_u,freq
outcome_age,1.0,0.0,1000,1.0,1.0,1000,0.013089769277,0.679288049345,1000,-0.048943029258,0.0750219732934,1000,0.172756193994,9.62582613937e-08,1000,0.111962191762,0.232261824573,1000,-0.45911352125,6.87199798082e-53,1000,-0.50668539378,-0.408754348795,1000,0.117238226305,0.000337894917181,1000,0.0556496028558,0.177938062241,1000
outcome_gender,0.013089769277,0.679288049345,1000,-0.048943029258,0.0750219732934,1000,1.0,0.0,1000,1.0,1.0,1000,-0.161206890368,4.95925394952e-07,1000,-0.220991447954,-0.100215331046,1000,-0.0221719208099,0.483710711985,1000,-0.0840494768068,0.0398759714239,1000,0.065154589658,0.0492504223304,1000,0.00317432871342,0.126636171693,1000
outcome_is_education,0.172756193994,6.41721742624e-08,1000,0.111962191762,0.232261824573,1000,-0.161206890368,7.43888092428e-07,1000,-0.220991447954,-0.100215331046,1000,1.0,0.0,1000,1.0,1.0,1000,-0.14424303335,7.7633413628e-06,1000,-0.204408282607,-0.0829920718038,1000,-0.0537304778134,0.0894683992084,1000,-0.115339374305,0.00829021870801,1000
outcome_is_student,-0.45911352125,6.87199798082e-53,1000,-0.50668539378,-0.408754348795,1000,-0.0221719208099,0.604638389981,1000,-0.0840494768068,0.0398759714239,1000,-0.14424303335,5.8225060221e-06,1000,-0.204408282607,-0.0829920718038,1000,1.0,0.0,1000,1.0,1.0,1000,-0.141292966419,1.82280081643e-05,1000,-0.20152088984,-0.0800006243256,1000
outcome_is_technology,0.117238226305,0.000253421187886,1000,0.0556496028558,0.177938062241,1000,0.065154589658,0.0656672297739,1000,0.00317432871342,0.126636171693,1000,-0.0537304778134,0.0894683992084,1000,-0.115339374305,0.00829021870801,1000,-0.141292966419,9.11400408216e-06,1000,-0.20152088984,-0.0800006243256,1000,1.0,0.0,1000,1.0,1.0,1000
The --sort flag creates a second table where each column is sort by descending effect size and appends the table to the bottom of the csv:
SORTED:
rank,age,r,p,N,CI_l,CI_u,freq,gender,r,p,N,CI_l,CI_u,freq,is_education,r,p,N,CI_l,CI_u,freq,is_student,r,p,N,CI_l,CI_u,freq,is_technology,r,p,N,CI_l,CI_u,freq
1,outcome_age,1.0,0.0,1000,1.0,1.0,1000,outcome_gender,1.0,0.0,1000,1.0,1.0,1000,outcome_is_education,1.0,0.0,1000,1.0,1.0,1000,outcome_is_student,1.0,0.0,1000,1.0,1.0,1000,outcome_is_technology,1.0,0.0,1000,1.0,1.0,1000
2,outcome_is_education,0.172756193994,6.41721742624e-08,1000,0.111962191762,0.232261824573,1000,outcome_is_technology,0.065154589658,0.0656672297739,1000,0.00317432871342,0.126636171693,1000,outcome_age,0.172756193994,9.62582613937e-08,1000,0.111962191762,0.232261824573,1000,outcome_gender,-0.0221719208099,0.483710711985,1000,-0.0840494768068,0.0398759714239,1000,outcome_age,0.117238226305,0.000337894917181,1000,0.0556496028558,0.177938062241,1000
3,outcome_is_technology,0.117238226305,0.000253421187886,1000,0.0556496028558,0.177938062241,1000,outcome_age,0.013089769277,0.679288049345,1000,-0.048943029258,0.0750219732934,1000,outcome_is_technology,-0.0537304778134,0.0894683992084,1000,-0.115339374305,0.00829021870801,1000,outcome_is_technology,-0.141292966419,9.11400408216e-06,1000,-0.20152088984,-0.0800006243256,1000,outcome_gender,0.065154589658,0.0492504223304,1000,0.00317432871342,0.126636171693,1000
4,outcome_gender,0.013089769277,0.679288049345,1000,-0.048943029258,0.0750219732934,1000,outcome_is_student,-0.0221719208099,0.604638389981,1000,-0.0840494768068,0.0398759714239,1000,outcome_is_student,-0.14424303335,5.8225060221e-06,1000,-0.204408282607,-0.0829920718038,1000,outcome_is_education,-0.14424303335,7.7633413628e-06,1000,-0.204408282607,-0.0829920718038,1000,outcome_is_education,-0.0537304778134,0.0894683992084,1000,-0.115339374305,0.00829021870801,1000
5,outcome_is_student,-0.45911352125,6.87199798082e-53,1000,-0.50668539378,-0.408754348795,1000,outcome_is_education,-0.161206890368,7.43888092428e-07,1000,-0.220991447954,-0.100215331046,1000,outcome_gender,-0.161206890368,4.95925394952e-07,1000,-0.220991447954,-0.100215331046,1000,outcome_age,-0.45911352125,6.87199798082e-53,1000,-0.50668539378,-0.408754348795,1000,outcome_is_student,-0.141292966419,1.82280081643e-05,1000,-0.20152088984,-0.0800006243256,1000
Predictions
Output prediction values to a csv.
dlatkInterface.py -d dla_tutorial -t msgs -c user_id -f 'feat$cat_met_a30_2000_cp_w$msgs$user_id$16to16' --combo_test_regression --outcome_table blog_outcomes --outcomes age --output ~/prediction.values --prediction_csv
The above command produces the file prediction.values.predicted_data.csv:
Id,age__withLanguage
2863105,25.678967347
3155970,25.0352946253
671748,27.6127094761
4184069,24.1537464257
4016135,25.6544622564
3485704,22.359848659
3321866,18.403504235
...
Probabilities
Output classification probabilities to a csv.
../fwinterface/fwflag_probability_csv:
dlatkInterface.py -d dla_tutorial -t msgs -c user_id -f 'feat$cat_met_a30_2000_cp_w$msgs$user_id$16to16' --combo_test_classif --outcome_table blog_outcomes --outcomes is_student is_education is_technology --output ~/probability.values --probability_csv
The above command produces the file probability.values.prediction_probabilities.csv:
Id,is_education__withLanguage,is_student__withLanguage,is_technology__withLanguage
2863105,-0.995385541164,-0.606670069561,-1.04686722815
3155970,-1.02199555614,-0.364230235902,-0.945148184099
671748,-1.03019054224,-0.605114549867,-0.95579279429
4184069,-1.04002299128,-0.673432284751,-1.26935520215
4016135,-0.972723923762,-0.65093645124,-0.860030650986
3485704,-0.957358049868,-0.334117399851,-1.43404698847
...
Prediction / Classification Metrics
Output prediction or classification performance metrics to a csv delimited by |.
dlatkInterface.py -d dla_tutorial -t msgs -c user_id -f 'feat$cat_met_a30_2000_cp_w$msgs$user_id$16to16' --combo_test_regression --outcome_table blog_outcomes --outcomes age --output ~/regression.metrics --csv
The above command produces the file regression.metrics.variance_data.csv:
row_id|outcome|model_controls|mae|mae_folds|mse|mse_folds|N|num_features|R|r|R2|R2_folds|r_folds|r_p|r_p_folds|rho|rho_p|se_mae_folds|se_mse_folds|se_R2_folds|se_r_folds|se_r_p_folds|se_train_mean_mae_folds|test_size|train_mean_mae|train_mean_mae_folds|train_size|{model_desc}|{modelFS_desc}|w/ lang.
1|age|()1|4.83561091756|4.83367105025|42.8168508661|42.7795793244|978|2000|0.617025365643|0.631052186337|0.380720301847|0.382388363028|0.634404850072|9.32342564224e-110|7.83089815326e-20|0.681655428515|1.40167196195e-134|0.175281125172|3.03944440616|0.0193760958564|0.0165877776996|6.98930451373e-20|0.123031255372|198|3.33229129841|6.45703007082|780|RidgeCV(alphas=array([ 1.00000e+00, 1.00000e-02, 1.00000e-04, 1.00000e+02, 1.00000e+04, 1.00000e+06]), cv=None, fit_intercept=True, gcv_mode=None, normalize=False, scoring=None, store_cv_values=False)|None|1
HTML
Correlation Matrix
dlatkInterface.py -d dla_tutorial -t msgs -c user_id --correlate --rmatrix --outcome_table blog_outcomes --outcomes age gender is_student is_education is_technology --outcome_with_outcome_only --output ~/correlations
Here is HTML output from the above command.
Word Clouds
1 to 3 Gram Clouds
--tagcloud: Produces data for making Wordle tag clouds, saved in text files
--make_wordclouds: Creates pngs from the text file output
dlatkInterface.py -d dla_tutorial -t msgs -c user_id -f 'feat$1gram$msgs$user_id$16to16' --tagcloud --make_wordclouds --outcome_table blog_outcomes --outcomes age gender is_student is_education is_technology --output ~/1to3grams
creates the following files:
1to3grams_tagcloud_wordclouds/age_pos.png
1to3grams_tagcloud_wordclouds/age_neg.png
1to3grams_tagcloud_wordclouds/gender_1_pos.png
1to3grams_tagcloud_wordclouds/gender_1_neg.png
1to3grams_tagcloud_wordclouds/is_student_pos.png
1to3grams_tagcloud_wordclouds/is_student_neg.png
1to3grams_tagcloud_wordclouds/is_education_pos.png
1to3grams_tagcloud_wordclouds/is_education_neg.png
1to3grams_tagcloud_wordclouds/is_technology_pos.png
Topic Clouds
--topic_tagcloud: Produces data for making topic Wordles, saved in text files
--make_topic_wordclouds: Creates pngs from the text file output
dlatkInterface.py -d dla_tutorial -t msgs -c user_id -f 'feat$cat_met_a30_2000_cp_w$msgs$user_id$16to16' --topic_tagcloud --make_topic_wordclouds --outcome_table blog_outcomes --outcomes age gender is_student is_education is_technology --output ~/fbtopics --topic_lexicon met_a30_2000_freq_t50ll
creates the following directories where the topic clouds are printed:
fbtopics_topic_tagcloud_wordclouds/age
fbtopics_topic_tagcloud_wordclouds/gender
fbtopics_topic_tagcloud_wordclouds/is_student
fbtopics_topic_tagcloud_wordclouds/is_education
fbtopics_topic_tagcloud_wordclouds/is_technology
Print Word Clouds For A Topic Lexicon
Create a word clound png for every topic in a given topic lexicon.
../fwinterface/fwflag_make_all_topic_wordclouds
dlatkInterface.py --topic_lexicon met_a30_2000_freq_t50ll --output all_met_a30_2000_tagclouds --make_all_topic_wordclouds
Ini files
See the Using INI Files tutorial for more info.
Pickles
Save a predictive model to a pickle file with:
dlatkInterface.py -d dla_tutorial -t msgs -c user_id -f 'feat$1gram$msgs$user_id$16to16' --outcome_table blog_outcomes --outcomes age --train_regression --save_model --picklefile age.pickle
Load a predictive model to a pickle file with:
# Loads the regression model in age.pickle, and uses the features to predict the ages of the users in
# blog_outcomes, and compares the predicted ages to the actual ages in the table.
dlatkInterface.py -d dla_tutorial -t msgs -c user_id -f 'feat$1gram$msgs$user_id$16to16' --outcome_table blog_outcomes --outcomes age --train_regression --load_model --picklefile age.pickle