--correlate

Switch

--correlate

Description

Correlates features with the given outcomes and prints the correlation coefficients (r values) to standard output.

Argument and Default Value

None

Details

This is one of the flags that trigger the correlation code (like --rmatrix and --tagcloud). If neither --outcome_controls nor --outcome_interaction is specified, a Pearson correlation is computed between every feature's group_norm and each outcome. See the pages for those switches for the types of analyses performed when there are control or interaction variables.

By default, every p-value is Bonferroni corrected unless --no_correction is specified.
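
As a rough illustration of what a Bonferroni correction does, the sketch below multiplies each raw p-value by the number of tests and caps the result at 1.0. The helper name bonferroni and the sample p-values are made up for this example, and treating the length of the list as the number of tests is an assumption; how the tool itself counts tests is not stated here.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1.0.

    Assumption: the number of tests equals len(p_values).
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three features against one outcome
raw = [0.001, 0.02, 0.5]
corrected = bonferroni(raw)
```

With three tests, a raw p-value of 0.02 no longer passes a 0.05 threshold after correction, which is why --no_correction can change which features appear significant.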

NOTE - group columns must match in type between the message table, feature table and outcome table!

The computation is equivalent to the following pseudo-code:

for feat in all_features:
    for outcome in outcomes:
        # x: column vector of group_norms for the given feature
        # y: column vector of outcome values, aligned to x
        (r, p) = pearsonr(x, y)
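
The loop above can be made concrete with a small self-contained sketch. It is not the tool's own code: pearson_r is a hand-written helper, the dictionaries group_norms and outcomes hold invented toy data, and the p-value is left out to keep the example dependency-free (scipy.stats.pearsonr returns both r and a p-value).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: one group_norm per group for each feature,
# and one outcome value per group, listed in the same group order.
group_norms = {
    "happy": [0.1, 0.2, 0.3, 0.4],
    "sad": [0.4, 0.3, 0.2, 0.1],
}
outcomes = {"age": [20.0, 30.0, 40.0, 50.0]}

# One correlation per (feature, outcome) pair, as in the pseudo-code
results = {}
for feat, x in group_norms.items():
    for outcome, y in outcomes.items():
        results[(feat, outcome)] = pearson_r(x, y)
```

The sketch assumes x and y are already aligned by group; in practice that alignment is why the group columns must match across tables.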

--correlate prints the following tuples to stdout:

("feature", (pearson-r, p-value, (confidence interval left, confidence interval right), number of groups/sample size, total count of "feature"))
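
If the output is captured and loaded into Python, a tuple of this shape can be unpacked in one step. The values below are invented purely to show the nesting and are not real results.

```python
# Hypothetical row shaped like the tuple described above
row = ("happy", (0.12, 0.003, (0.04, 0.20), 1000, 52341))

# Nested unpacking mirrors the nesting of the printed tuple
feature, (r, p, (ci_low, ci_high), n_groups, total_count) = row

summary = f"{feature}: r={r:.2f}, p={p:.3f}, CI=[{ci_low:.2f}, {ci_high:.2f}], N={n_groups}"
```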

Other Switches

Required Switches:

Optional Switches:

--group_freq_thresh
--outcome_controls
--outcome_interaction
--rmatrix
--no_correction
--p_correction [METHOD]
--AUC
--spearman
--IDP
--bootstrapp (as of 2015-07-24 only implemented with AUC)
--csv
--sort
--tagcloud
--make_wordclouds
--topic_tagcloud
--whitelist
--blacklist

Example Commands

# Correlates LIWC lexical features with age and gender for every user in masterstats_andy_r10k
dlatkInterface.py -d fb20 -t messages_en -c user_id --outcome_table masterstats_andy_r10k --outcomes age gender -f 'feat$cat_LIWC2007$messages_en$user_id$16to16' --correlate