--correlate
Switch
--correlate
Description
Correlates features with the given outcomes and prints the r values to standard output.
Argument and Default Value
None
Details
This is one of the flags that trigger the correlation code (like --rmatrix or --tagcloud). If neither --outcome_controls nor --outcome_interaction is specified, a Pearson correlation is computed between every feature's group_norm and each outcome. See the specific pages for the types of analyses performed when there are control or interaction variables.
Every p-value is Bonferroni-corrected by default, unless --no_correction is specified.
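As a rough illustration of what a Bonferroni correction does, the sketch below multiplies each raw p-value by the number of tests and caps the result at 1.0. The function name and the p-values are hypothetical, and which set of comparisons DLATK counts as "the number of tests" is not stated on this page, so treat this as a sketch of the general technique rather than DLATK's own code.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values for three features against one outcome.
raw = [0.001, 0.02, 0.5]
corrected = bonferroni(raw)
print(corrected)
```

With three tests, a raw p-value of 0.5 is capped at 1.0, while 0.001 stays well below the usual 0.05 threshold after correction.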
NOTE: the group columns must have the same type in the message table, the feature table and the outcome table!
In pseudo-code, the following happens:
for feat in all_features:
    for outcome in outcomes:
        # x: column vector of group_norms for the given feature
        # y: column vector of outcome values, aligned to x
        (r, p) = pearsonr(x, y)
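The loop above can be sketched as runnable Python. This is a minimal, standard-library-only illustration under stated assumptions: the dictionaries `group_norms` and `outcomes`, their contents, and the helper `pearson_r` are all hypothetical and are not DLATK internals. The helper computes only r; in practice `scipy.stats.pearsonr` returns both r and the p-value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: group_norms per feature and outcome values,
# both keyed by the group id (for example user_id).
group_norms = {
    "happy": {1: 0.10, 2: 0.20, 3: 0.30, 4: 0.40},
    "sad":   {1: 0.40, 2: 0.30, 3: 0.20, 4: 0.10},
}
outcomes = {"age": {1: 20, 2: 30, 3: 40, 4: 50}}

results = {}
for feat, norms in group_norms.items():
    for outcome, values in outcomes.items():
        # Align x and y on the groups present in both tables.
        groups = sorted(norms.keys() & values.keys())
        x = [norms[g] for g in groups]
        y = [values[g] for g in groups]
        results[(feat, outcome)] = pearson_r(x, y)

print(results)
```

Aligning x and y on the shared group ids is what the "aligned to x" comment in the pseudo-code refers to: a group missing from either table is left out of that correlation.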
--correlate prints the following tuples to stdout:
("feature", (pearson-r, p-value, (confidence interval left, confidence interval right), number of groups/sample size, total count of "feature"))
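For readers who post-process this output, the nested tuple can be unpacked in one assignment. The values below are made up for illustration and are not real DLATK results; the variable names are likewise hypothetical.

```python
# Made-up values, shaped like the tuple described above.
row = ("happy", (0.12, 0.003, (0.05, 0.19), 1000, 4521))

# Nested unpacking mirrors the nesting of the printed tuple.
feature, (r, p, (ci_left, ci_right), n_groups, feat_count) = row

print(f"{feature}: r={r:.2f}, p={p:.3g}, "
      f"CI=[{ci_left}, {ci_right}], N={n_groups}, count={feat_count}")
```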
Other Switches
Required Switches:
Optional Switches:
--p_correction [METHOD]
--bootstrapp (as of 2015-07-24 only implemented with AUC)
Example Commands
# Correlates LIWC lexical features with age and gender for every user in masterstats_andy_r10k
dlatkInterface.py -d fb20 -t messages_en -c user_id --outcome_table masterstats_andy_r10k --outcomes age gender -f 'feat$cat_LIWC2007$messages_en$user_id$16to16' --correlate