Differential Language Analysis ToolKit

DLATK is an end-to-end human text analysis package, particularly suited to social media and social scientific applications. It is written in Python 3 and developed by the World Well-Being Project at the University of Pennsylvania and Stony Brook University. It contains:

feature extraction
part-of-speech tagging
correlation
prediction and classification
mediation
dimensionality reduction and clustering
wordcloud visualization
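
To make the first and third items above concrete, the following is a minimal, standard-library sketch of the general idea behind differential language analysis: extract relative word frequencies per group, then correlate each word with an outcome. It does not use or reproduce DLATK's API. The function names (extract_unigrams, pearson_r, correlate_features) and the toy data are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def extract_unigrams(text):
    """Lowercase the text and return relative frequencies of its word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlate_features(messages, outcomes):
    """Correlate each unigram's relative frequency with the outcome across groups."""
    features = {group: extract_unigrams(text) for group, text in messages.items()}
    groups = sorted(features)
    vocab = set().union(*(f.keys() for f in features.values()))
    y = [outcomes[g] for g in groups]
    return {
        word: pearson_r([features[g].get(word, 0.0) for g in groups], y)
        for word in vocab
    }


# Hypothetical toy data: one message per user and a made-up outcome score.
messages = {
    "user1": "happy happy day",
    "user2": "happy sad day",
    "user3": "sad sad day",
}
outcomes = {"user1": 3.0, "user2": 2.0, "user3": 1.0}

correlations = correlate_features(messages, outcomes)
```

In this toy data, "happy" correlates positively with the outcome, "sad" negatively, and "day" (used equally by every user) gets a correlation of zero. A real analysis would also need significance testing and multiple-comparison correction, which this sketch omits.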

DLATK can utilize:

HuggingFace for transformer language models
Mallet for creating LDA topics
Stanford Parser
CMU's TweetNLP
pandas for DataFrame output

Getting Started

Citations

If you use DLATK in your work, please cite the following paper:

@InProceedings{DLATKemnlp2017,
  author =  "Schwartz, H. Andrew
      and Giorgi, Salvatore
      and Sap, Maarten
      and Crutchley, Patrick
      and Eichstaedt, Johannes
      and Ungar, Lyle",
  title =   "DLATK: Differential Language Analysis ToolKit",
  booktitle =  "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
  year =    "2017",
  publisher =  "Association for Computational Linguistics",
  pages =   "55--60",
  location =   "Copenhagen, Denmark",
  url =  "http://aclweb.org/anthology/D17-2010"
}

More Information

DLATK is licensed under the GNU General Public License v3 (GPLv3).