Differential Language Analysis ToolKit
DLATK is an end-to-end human text analysis package, specifically suited for social media and social scientific applications. It is written in Python 3 and developed by the World Well-Being Project at the University of Pennsylvania and Stony Brook University. It contains:
- feature extraction 
- part-of-speech tagging 
- correlation 
- prediction and classification 
- mediation 
- dimensionality reduction and clustering 
- wordcloud visualization 
DLATK can utilize:
- HuggingFace for transformer language models 
- Mallet for creating LDA topics 
- pandas DataFrame output
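
To make the feature extraction and correlation steps listed above more concrete, here is a minimal sketch of the general idea in plain Python: compute each word's relative frequency per message, then correlate those frequencies with a numeric outcome. This is not DLATK's API or implementation. The function names (`relative_frequencies`, `pearson`, `correlate_words`) and the toy messages and outcome scores are all hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def relative_frequencies(text):
    """Return each token's share of all tokens in the text (1-gram features)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()} if total else {}


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlate_words(messages, outcomes):
    """Correlate every word's relative frequency with the outcome values."""
    feats = [relative_frequencies(m) for m in messages]
    vocab = set().union(*feats)  # union over the dictionaries' keys
    return {
        word: pearson([f.get(word, 0.0) for f in feats], outcomes)
        for word in vocab
    }


# Hypothetical toy data: one message and one outcome score per author.
messages = ["happy happy day", "sad day", "happy sad", "sad sad day"]
outcomes = [3.0, 1.0, 2.0, 0.0]

scores = correlate_words(messages, outcomes)
for word, r in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}: {r:+.2f}")
```

A real analysis would also need significance testing, multiple-comparison correction, and control variables, which this sketch leaves out.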
Getting Started
Citations
If you use DLATK in your work, please cite the following paper:
@InProceedings{DLATKemnlp2017,
  author =  "Schwartz, H. Andrew
      and Giorgi, Salvatore
      and Sap, Maarten
      and Crutchley, Patrick
      and Eichstaedt, Johannes
      and Ungar, Lyle",
  title =   "DLATK: Differential Language Analysis ToolKit",
  booktitle =  "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
  year =    "2017",
  publisher =  "Association for Computational Linguistics",
  pages =   "55--60",
  location =   "Copenhagen, Denmark",
  url =  "http://aclweb.org/anthology/D17-2010"
}
More Information
DLATK is licensed under the GNU General Public License v3 (GPLv3).