Installation
DLATK is highly dependent on MySQL. If you do not have MySQL installed and running please skip Recommended Install and follow the steps in the Full Install section.
Please see Install FAQs for common install issues.
Recommended Install
The easiest way to install dlatk is through Docker (follow the Installing DLATK with Docker tutorial) or through pip (with sudo privileges):
sudo pip install dlatk
If you do not have sudo then you can use the --user flag
pip install --user dlatk
This will install all of the required Python dependencies listed below. See Full List of Dependencies.
Note: some features in DLATK require other non-Python packages (besides MySQL). The Docker installation will install everything whereas pip will not. Please see notes below on installing other dependencies.
Full Install
Setup
Before installing DLATK you need to install the necessary system requirements (MySQL being the most important). The next steps will walk you through how to do this on a machine running Ubuntu or OSX.
Linux
Warning
This will install MySQL on your computer.
Install the required Ubuntu libraries. The requirements.sys can be found on the DLATK GitHub page. The r-base
package might be difficult to install and can be removed from requirements.sys
if needed though this will limit some minor functionality.
wget https://github.com/dlatk/dlatk/blob/public/install/requirements.sys
xargs apt-get install < requirements.sys
DLATK has been tested on Ubuntu 14.04.
OSX (with brew)
Warning
This will install MySQL on your computer.
Install dependencies with brew.
brew install python mysql
DLATK has been tested on OSX 10.11.
With the system requirements out of the way you can now install the Python code via Pip, Anaconda or GitHub:
Install (pip)
Install the Python 3 version via pip:
pip install dlatk
To install the Python 2.7 version use:
pip install "dlatk < 1.0"
Install (Anaconda)
Run the following in a Python 3.5 conda env:
conda install -c wwbp dlatk
Install (GitHub)
Run the following:
git clone https://github.com/dlatk/dlatk.git
cd dlatk
python setup.py install
Install Other Dependencies
Load NLTK corpus
Load NLTK data from the command line:
python -c "import nltk; nltk.download('wordnet')"
Install Stanford Parser
Download the zip file from http://nlp.stanford.edu/software/lex-parser.shtml.
Extract into
../dlatk/Tools/StanfordParser/
.Move
../dlatk/Tools/StanfordParser/oneline.sh
into the folder you extracted:../dlatk/Tools/StanfordParser/stanford-parser-full*/
.
Install Tweet NLP v0.3 (ark-tweet-nlp-0.3)
Download the tgz file (for version 0.3) from http://www.cs.cmu.edu/~ark/TweetNLP/.
Extract this file into
../dlatk/Tools/TwitterTagger/
.
Python Modules (optional)
You can install the optional python dependencies with
pip install image jsonrpclib-pelix langid rpy2 simplejson textstat wordcloud
Standard DLATK functions can be run without these modules.
Install the IBM Wordcloud jar file (optional)
The IBM wordcloud module is our default. To install this you must sign up for a IBM DeveloperWorks account and download ibm-word-cloud.jar. Place this file into ../dlatk/lib/
.
If you are unable to install this jar then you can use the python wordcloud module:
pip install wordcloud
Change
wordcloud_algorithm='ibm'
in ../dlatk/lib/wordcloud.py towordcloud_algorithm='amueller'
.
Note: You must install either the IBM Wordcloud jar or the Python wordcloud module to print wordclouds.
Mallet (optional)
Mallet can be used with DLATK to create LDA topics (see the DLATK LDA Interface tutorial). Directions on downloading and installing can be found here.
Full List of Dependencies
Python
matplotlib (>=1.3.1)
nltk (>=3.1)
pandas (>=0.17.1)
python-dateutil (>=2.5.0)
scikit-learn (>=0.17.1)
SQLAlchemy (>=0.9.9)
statsmodels (>=0.6.1)
Other
Python (optional)
image
jsonrpclib-pelix (>=0.2.8)
langid (>=1.1.4)
rpy2 (2.6.0)
simplejson (>=3.3.1)
textstat (>=0.6.1)
wordcloud (>=1.1.3)
Other (optional)
IBM Wordcloud (for wordcloud visualization)
Mallet (for creating LDA topics)
Python version support
DLATK is available for Python 2.7 and 3.5, with the 3.5 version being the official release. The 2.7 version is fully functional (as of v0.6.1) but will not be maintained and also does not contain some of the newer features available in v1.0.
To install the Python 2.7 version run:
pip install "dlatk < 1.0"
Getting Started
Command Line Interface
DLATK is run using dlatkInterface.py which is added to /usr/local/bin during the installation process.
MySQL Configuration
Any calls to dlatkInterface.py will open MySQL. We assume any table with text data has the following columns:
message: text data
message_id: unique numeric identifier for each message
All lexicon tables are assumed to be in a database called dlatk_lexica (a sample database with this name is distributed with the release). To change this you must edit dlaConstants.py: DEF_LEXICON_DB = 'dlatk_lexica'
Sample Datasets
DLATK comes packaged with two sample databases: dla_tutorial and dlatk_lexica. See Packaged Datasets for more information on the databases. To install them use the following:
mysql -u username -p < /path/to/dlatk/data/dla_tutorial.sql
mysql -u username -p < /path/to/dlatk/data/dlatk_lexica.sql
The path to DLATK can be found using the following:
python -c "import dlatk; print(dlatk.__file__)"
Warning
If the above databases already exist then the above commands will add tables to the them.
Next Steps
Try the Differential Language Analysis (DLA) Tutorial once you have everything running.
Install Issues
See Install FAQs for more info.