Differential Language Analysis ToolKit

DLATK is an end-to-end human text analysis package, specifically suited for social media and social scientific applications. It is written in Python 3 and developed by the World Well-Being Project at the University of Pennsylvania and Stony Brook University. It contains:

  • feature extraction

  • part-of-speech tagging

  • correlation

  • prediction and classification

  • mediation

  • dimensionality reduction and clustering

  • wordcloud visualization
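To make feature extraction and correlation more concrete, here is a minimal standard-library sketch of the general idea behind differential language analysis. It extracts relative-frequency 1-gram features per group (for example, per user) and then correlates each feature with a group-level outcome. It does not use DLATK's own interface. The names `extract_1grams`, `pearson`, and `correlate` are hypothetical, and the regex tokenizer and plain Pearson r are simplifying assumptions.

```python
import re
from collections import Counter
from math import sqrt

def extract_1grams(texts_by_group):
    """Hypothetical helper: return {group: {word: relative frequency}}."""
    features = {}
    for group, texts in texts_by_group.items():
        # Simplistic tokenizer (assumption): lowercase letter/apostrophe runs.
        counts = Counter(tok for text in texts
                         for tok in re.findall(r"[a-z']+", text.lower()))
        total = sum(counts.values())
        # Normalize raw counts by the group's total token count.
        features[group] = {w: c / total for w, c in counts.items()} if total else {}
    return features

def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def correlate(features, outcomes):
    """Hypothetical helper: correlate every word's frequency with the outcome."""
    groups = sorted(outcomes)
    vocab = sorted({w for g in groups for w in features.get(g, {})})
    ys = [outcomes[g] for g in groups]
    # A word absent from a group's text gets a relative frequency of 0.
    return {w: pearson([features.get(g, {}).get(w, 0.0) for g in groups], ys)
            for w in vocab}

if __name__ == "__main__":
    texts = {"u1": ["I love this, so happy"],
             "u2": ["happy happy day"],
             "u3": ["sad and tired today"]}
    outcomes = {"u1": 7.0, "u2": 9.0, "u3": 2.0}
    r = correlate(extract_1grams(texts), outcomes)
    for word, value in sorted(r.items(), key=lambda kv: -kv[1]):
        print(f"{word:>8} {value:+.3f}")
```

With only three groups, these correlations are purely illustrative. A real analysis would use many more groups and would typically add controls and a correction for testing many words at once.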

DLATK can utilize:

Getting Started


If you use DLATK in your work, please cite the following paper:

  author =  "Schwartz, H. Andrew
      and Giorgi, Salvatore
      and Sap, Maarten
      and Crutchley, Patrick
      and Eichstaedt, Johannes
      and Ungar, Lyle",
  title =   "DLATK: Differential Language Analysis ToolKit",
  booktitle =  "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
  year =    "2017",
  publisher =  "Association for Computational Linguistics",
  pages =   "55--60",
  location =   "Copenhagen, Denmark",
  url =  "http://aclweb.org/anthology/D17-2010"

More Information

DLATK is licensed under a GNU General Public License v3 (GPLv3).