Installation

DLATK depends heavily on MySQL. If you do not have MySQL installed and running, skip Recommended Install and follow the steps in the Full Install section.

Please see Install FAQs for common install issues.

Recommended Install

The easiest way to install DLATK is through Docker (follow the Installing DLATK with Docker tutorial) or through pip (with sudo privileges):

sudo pip install dlatk

If you do not have sudo privileges, use the --user flag:

pip install --user dlatk

This installs all of the required Python dependencies (see Full List of Dependencies).

Note: some features in DLATK require other non-Python packages (besides MySQL). The Docker installation will install everything whereas pip will not. Please see notes below on installing other dependencies.

Full Install

Setup

Before installing DLATK you need to install the necessary system requirements (MySQL being the most important). The next steps will walk you through how to do this on a machine running Ubuntu or OSX.

Linux

Warning

This will install MySQL on your computer.

Install the required Ubuntu libraries, which are listed in the requirements.sys file on the DLATK GitHub page. The r-base package can be difficult to install; you can remove it from requirements.sys if needed, though this will limit some minor functionality.

wget https://raw.githubusercontent.com/dlatk/dlatk/public/install/requirements.sys
xargs sudo apt-get install < requirements.sys

DLATK has been tested on Ubuntu 14.04.

OSX (with brew)

Warning

This will install MySQL on your computer.

Install dependencies with brew.

brew install python mysql

DLATK has been tested on OSX 10.11.
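
Before moving on, it can help to confirm that the system tools are visible on your PATH. The following is a minimal sketch and not part of DLATK: the helper name find_missing_tools is hypothetical, and the binary names checked ("mysql" for the MySQL command-line client, "python3" for the interpreter) are assumptions that may differ on your system.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Binary names are assumptions; adjust them to match your installation.
    missing = find_missing_tools(["mysql", "python3"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All system tools found.")
```

Note that this only checks that the client binaries exist. It does not verify that the MySQL server is actually running.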

With the system requirements out of the way, you can now install the Python code via pip, Anaconda, or GitHub:

Install (pip)

Install the Python 3 version via pip:

pip install dlatk

To install the Python 2.7 version use:

pip install "dlatk < 1.0"

Install (Anaconda)

Run the following in a Python 3.5 conda env:

conda install -c wwbp dlatk

Install (GitHub)

Run the following:

git clone https://github.com/dlatk/dlatk.git
cd dlatk
python setup.py install

Install Other Dependencies

Load NLTK corpus

Load NLTK data from the command line:

python -c "import nltk; nltk.download('wordnet')"

Install Stanford Parser

Download the zip file from http://nlp.stanford.edu/software/lex-parser.shtml.
Extract into ../dlatk/Tools/StanfordParser/.
Move ../dlatk/Tools/StanfordParser/oneline.sh into the folder you extracted: ../dlatk/Tools/StanfordParser/stanford-parser-full*/.
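
The three steps above can also be scripted. This is a rough sketch under stated assumptions, not a DLATK utility: the function name install_stanford_parser is hypothetical, and it assumes the downloaded archive unpacks into exactly one stanford-parser-full* folder and that oneline.sh is already present in the StanfordParser directory.

```python
import glob
import os
import shutil
import zipfile


def install_stanford_parser(zip_path, parser_dir):
    """Extract the parser zip into parser_dir and move oneline.sh into it.

    Returns the path of the extracted stanford-parser-full* folder.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(parser_dir)

    # Keep only directories, in case the zip itself sits in parser_dir.
    pattern = os.path.join(parser_dir, "stanford-parser-full*")
    folders = [p for p in glob.glob(pattern) if os.path.isdir(p)]
    if len(folders) != 1:
        raise RuntimeError(
            "expected one stanford-parser-full* folder, found %d" % len(folders)
        )

    # Moving a file onto an existing directory places it inside that directory.
    shutil.move(os.path.join(parser_dir, "oneline.sh"), folders[0])
    return folders[0]
```

You would call it with the path of the zip file you downloaded and the ../dlatk/Tools/StanfordParser/ directory of your installation.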

Install Tweet NLP v0.3 (ark-tweet-nlp-0.3)

Download the tgz file (for version 0.3) from http://www.cs.cmu.edu/~ark/TweetNLP/.
Extract this file into ../dlatk/Tools/TwitterTagger/.

Python Modules (optional)

You can install the optional Python dependencies with:

pip install image jsonrpclib-pelix langid rpy2 simplejson textstat wordcloud

Standard DLATK functions can be run without these modules.

Install the IBM Wordcloud jar file (optional)

The IBM wordcloud module is our default. To install it you must sign up for an IBM DeveloperWorks account and download ibm-word-cloud.jar. Place this file into ../dlatk/lib/.

If you are unable to install this jar, you can use the Python wordcloud module instead:

pip install wordcloud
Change wordcloud_algorithm='ibm' in ../dlatk/lib/wordcloud.py to wordcloud_algorithm='amueller'.

Note: You must install either the IBM Wordcloud jar or the Python wordcloud module to print wordclouds.

Mallet (optional)

Mallet can be used with DLATK to create LDA topics (see the DLATK LDA Interface tutorial). Directions on downloading and installing can be found on the Mallet website.

Full List of Dependencies

Python

matplotlib (>=1.3.1)
mysqlclient
nltk (>=3.1)
numpy
pandas (>=0.17.1)
python-dateutil (>=2.5.0)
scikit-learn (>=0.17.1)
scipy
SQLAlchemy (>=0.9.9)
statsmodels (>=0.6.1)
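
To see which of the required packages above are visible to your interpreter, something like the following may help. It is only a sketch: the helper missing_packages is hypothetical, it checks that each module can be located but does not check the minimum versions, and the mapping from package name to import name reflects how these packages are commonly imported (for example, mysqlclient is imported as MySQLdb).

```python
import importlib.util

# Package name (as installed with pip) -> module name used in `import`.
REQUIRED_MODULES = {
    "matplotlib": "matplotlib",
    "mysqlclient": "MySQLdb",
    "nltk": "nltk",
    "numpy": "numpy",
    "pandas": "pandas",
    "python-dateutil": "dateutil",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "SQLAlchemy": "sqlalchemy",
    "statsmodels": "statsmodels",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be located."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```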

Other

Python (optional)

image
jsonrpclib-pelix (>=0.2.8)
langid (>=1.1.4)
rpy2 (2.6.0)
simplejson (>=3.3.1)
textstat (>=0.6.1)
wordcloud (>=1.1.3)

Other (optional)

IBM Wordcloud (for wordcloud visualization)
Mallet (for creating LDA topics)

Python version support

DLATK is available for Python 2.7 and 3.5; the 3.5 version is the official release. The 2.7 version is fully functional (as of v0.6.1) but will not be maintained, and it lacks some of the newer features available in v1.0.

To install the Python 2.7 version run:

pip install "dlatk < 1.0"
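
If you script your setup across several machines, the choice between the two pip commands can be made from the running interpreter's version. This is an illustrative sketch only; the function name dlatk_requirement is hypothetical and not part of DLATK.

```python
import sys


def dlatk_requirement(version_info=sys.version_info):
    """Return the pip requirement string matching the given Python version."""
    if version_info[0] >= 3:
        return "dlatk"
    # Releases below 1.0 are the ones that support Python 2.7.
    return "dlatk < 1.0"


if __name__ == "__main__":
    print('pip install "%s"' % dlatk_requirement())
```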

Getting Started

Command Line Interface

DLATK is run using dlatkInterface.py, which is added to /usr/local/bin during the installation process.

MySQL Configuration

Any call to dlatkInterface.py opens a connection to MySQL. We assume any table with text data has the following columns:

message: text data
message_id: unique numeric identifier for each message
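
The column requirement above can be checked programmatically before running DLATK. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a MySQL server; against MySQL you would query the table's columns through your MySQL client instead. The function missing_columns, the table name msgs, and the extra user_id column are all hypothetical.

```python
import sqlite3

REQUIRED_COLUMNS = {"message", "message_id"}


def missing_columns(conn, table):
    """Return the required columns that `table` lacks (SQLite stand-in)."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    rows = conn.execute("PRAGMA table_info(%s)" % table).fetchall()
    present = {row[1] for row in rows}
    return sorted(REQUIRED_COLUMNS - present)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE msgs (message_id INTEGER PRIMARY KEY, message TEXT, user_id INTEGER)"
    )
    print(missing_columns(conn, "msgs"))  # prints []
```

Extra columns such as user_id do not affect the check; only the two required columns are tested.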

All lexicon tables are assumed to be in a database called dlatk_lexica (a sample database with this name is distributed with the release). To use a different database, edit the following line in dlaConstants.py: DEF_LEXICON_DB = 'dlatk_lexica'

Sample Datasets

DLATK comes packaged with two sample databases: dla_tutorial and dlatk_lexica. See Packaged Datasets for more information on the databases. To install them use the following:

mysql -u username -p < /path/to/dlatk/data/dla_tutorial.sql
mysql -u username -p < /path/to/dlatk/data/dlatk_lexica.sql

The path to DLATK can be found using the following:

python -c "import dlatk; print(dlatk.__file__)"
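
Note that dlatk.__file__ points at a file inside the package rather than at the package folder itself. As a small sketch, the full paths to the two sample .sql files could be built like this, assuming the data folder sits directly inside the installed dlatk package folder, as the /path/to/dlatk/data/ paths above suggest. The helper name sample_sql_paths is hypothetical.

```python
import os

SAMPLE_SQL_FILES = ("dla_tutorial.sql", "dlatk_lexica.sql")


def sample_sql_paths(package_file):
    """Build the sample .sql paths from the value of dlatk.__file__."""
    # __file__ names a file in the package, so take the folder that holds it.
    package_dir = os.path.dirname(os.path.abspath(package_file))
    return [os.path.join(package_dir, "data", name) for name in SAMPLE_SQL_FILES]


if __name__ == "__main__":
    try:
        import dlatk
    except ImportError:
        print("dlatk is not installed in this environment")
    else:
        for path in sample_sql_paths(dlatk.__file__):
            print(path)
```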

Warning

If the above databases already exist, the above commands will add tables to them.

Next Steps

Try the Differential Language Analysis (DLA) Tutorial once you have everything running.

Install Issues

See Install FAQs for more info.