Differential Language Analysis ToolKit
DLATK is an end-to-end human text analysis package, specifically suited for social media and social scientific applications. It is written in Python 3 and developed by the World Well-Being Project at the University of Pennsylvania and Stony Brook University. It contains:
feature extraction
part-of-speech tagging
correlation
prediction and classification
mediation
dimensionality reduction and clustering
wordcloud visualization
DLATK can utilize:
HuggingFace for transformer language models
Mallet for creating LDA topics
pandas dataframe output
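To make the feature extraction and correlation steps listed above more concrete, here is a minimal, toolkit-independent sketch of the general idea behind differential language analysis: turn each text into relative word frequencies, then correlate each word's frequency with an outcome. It does not use DLATK's API. The function names (`extract_unigram_features`, `pearson`, `correlate_features`) and the toy data are hypothetical and chosen only for illustration, and real analyses would typically add proper tokenization, controls, and significance testing.

```python
from collections import Counter
from math import sqrt


def extract_unigram_features(text):
    """Return relative frequencies of lowercase, whitespace-separated tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()} if total else {}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlate_features(texts, outcomes):
    """Correlate each word's relative frequency with the outcome across texts."""
    features = [extract_unigram_features(t) for t in texts]
    vocab = sorted(set().union(*features))
    return {
        word: pearson([f.get(word, 0.0) for f in features], outcomes)
        for word in vocab
    }


# Hypothetical toy data: one text and one outcome score per person.
texts = ["happy happy day", "sad day", "happy sad day day"]
outcomes = [3.0, 1.0, 2.0]

correlations = correlate_features(texts, outcomes)
for word, r in sorted(correlations.items(), key=lambda item: item[1]):
    print(f"{word}: {r:+.2f}")
```

Words with the most positive or most negative correlations are the ones that distinguish high from low outcome scores in this toy setting.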
Getting Started
Citations
If you use DLATK in your work, please cite the following paper:
@InProceedings{DLATKemnlp2017,
  author =    "Schwartz, H. Andrew
               and Giorgi, Salvatore
               and Sap, Maarten
               and Crutchley, Patrick
               and Eichstaedt, Johannes
               and Ungar, Lyle",
  title =     "DLATK: Differential Language Analysis ToolKit",
  booktitle = "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
  year =      "2017",
  publisher = "Association for Computational Linguistics",
  pages =     "55--60",
  location =  "Copenhagen, Denmark",
  url =       "http://aclweb.org/anthology/D17-2010"
}
More Information
DLATK is licensed under the GNU General Public License v3 (GPLv3).