Packaged Datasets
All datasets are available on our GitHub page, on the World Well-Being Project site, and through the pip install.
Note: some lexica and datasets are distributed under more restrictive licenses than DLATK. Please review each license before use.
Language Data
Lexica
Age and Gender Lexica
Our data-driven age and gender lexica were generated from about 97,000 Facebook, Blogger and Twitter users.
MySQL: permaLexicon.dd_emnlp14_ageGender
PERMA Lexicon
Our lexicon for predicting well-being as measured by the PERMA scales.
MySQL: permaLexicon.dd_permaV3
Spanish PERMA Lexicon
Our lexicon to measure PERMA in Spanish, derived from Spanish tweets annotated with PERMA.
MySQL: permaLexicon.dd_sperma_v2
Other Lexica
Prospection Lexicon (Temporal Orientation):
MySQL: permaLexicon.dd_PaPreFut
Affect and Intensity Lexicon:
MySQL: permaLexicon.dd_intAff
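As a rough illustration of how a weighted lexicon like those listed above might be applied to text, the sketch below scores a tokenized message by summing each term's weight times its relative frequency. It assumes each lexicon table can be read as (term, category, weight) rows; the in-memory dictionary, the category names, and the function name `lexicon_scores` are hypothetical and are not part of DLATK.

```python
from collections import Counter


def lexicon_scores(tokens, lexicon):
    """Score a token list against a weighted lexicon.

    `lexicon` maps category -> {term: weight}; this layout is an
    assumption about how the lexicon rows have been loaded.
    Returns {category: sum(weight * relative frequency of term)}.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    scores = {}
    for category, weights in lexicon.items():
        score = 0.0
        if total > 0:
            for term, weight in weights.items():
                # Relative frequency of the term within this message.
                score += weight * counts.get(term, 0) / total
        scores[category] = score
    return scores


# Toy lexicon with made-up weights, for illustration only.
toy_lexicon = {
    "POS": {"happy": 2.0, "great": 1.0},
    "NEG": {"sad": 1.5},
}
toy_scores = lexicon_scores(["happy", "happy", "sad", "day"], toy_lexicon)
```

In practice the rows would come from the MySQL tables named above rather than a hand-written dictionary, and the actual feature extraction is handled by DLATK itself.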
LDA Topics
2000 Facebook Topics
Top 20 words per topic: [.csv] [Excel file]
MySQL: permaLexicon.met_a30_2000_cp and permaLexicon.met_a30_2000_freq_t50ll
All words: [.csv]
Conditional probabilities: [.csv] (sparse matrix format)
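One way to use the conditional probabilities is to estimate how much a message draws on each topic: sum, over the words in the message, the probability of the topic given the word times the word's relative frequency. The sketch below assumes the sparse file has one (term, topic, probability) triple per row with no header; that column order, the sample values, and the function names are assumptions for illustration, so check them against the downloaded file.

```python
import csv
from collections import Counter, defaultdict


def load_sparse_probs(lines):
    """Read (term, topic, probability) rows into {term: {topic: prob}}.

    The three-column, header-less layout is an assumption.
    """
    probs = defaultdict(dict)
    for term, topic, prob in csv.reader(lines):
        probs[term][topic] = float(prob)
    return probs


def topic_usage(tokens, probs):
    """Return {topic: sum over terms of p(topic | term) * p(term | message)}."""
    counts = Counter(tokens)
    total = sum(counts.values())
    usage = defaultdict(float)
    if total == 0:
        return {}
    for term, n in counts.items():
        # Terms missing from the table contribute nothing.
        for topic, p in probs.get(term, {}).items():
            usage[topic] += p * n / total
    return dict(usage)


# Made-up rows standing in for the real file contents.
sample_rows = ["beach,12,0.5", "beach,40,0.25", "sun,12,0.75"]
sample_probs = load_sparse_probs(sample_rows)
sample_usage = topic_usage(["beach", "sun", "sun", "the"], sample_probs)
```

To run this on the real data, pass an open file handle for the downloaded .csv to `load_sparse_probs` in place of `sample_rows`.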