Installing DLATK with Docker
Run from Docker Hub
Step 1: Install Docker
Installing Docker is straightforward. Visit the official Docker installation page and follow the instructions for your operating system.
After you’ve installed Docker, open the terminal and type the following to verify the installation:
> docker info
You should see something like:
> docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
...
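If `docker info` fails, it helps to know whether the CLI is missing or the daemon is simply not running. The following is a minimal sketch of such a check; the function name `check_docker` and its messages are illustrative, not part of Docker or DLATK.

```shell
# Report whether the docker CLI is on PATH and whether the daemon answers.
# check_docker is a hypothetical helper name used only for this example.
check_docker() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker: not found on PATH"
        return 1
    fi
    if ! docker info >/dev/null 2>&1; then
        echo "docker: CLI found, but the daemon is not reachable"
        return 2
    fi
    echo "docker: OK"
}

check_docker || true
```

A non-zero return value distinguishes the two failure cases, so the function can also be used as a guard at the top of a setup script.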
Step 2: Install MySQL
You can pull the official MySQL image from Docker Hub. Starting a MySQL instance is simple:
> docker run --name some-mysql --env MYSQL_ROOT_PASSWORD=my-secret-pw --detach mysql:tag
where some-mysql is the name you want to assign to your container, my-secret-pw is the password to be set for the MySQL root user, and tag specifies the MySQL version you want. We've tested with MySQL v5.5:
> docker run --name mysql_v5 --env MYSQL_ROOT_PASSWORD=my-secret-pw --detach mysql:5.5
and we can confirm the installation with:
> docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
mysql 5.5 a8a59477268d 7 weeks ago 445MB
> docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
552954e73844 mysql "docker-entrypoint..." 5 minutes ago Up 5 minutes 0.0.0.0:3306->3306/tcp mysql_v5
Note that this is running on port 3306. Here is the command broken down:
run: Run a command in a new container.
--name: Assign a name to the container. If you don’t specify this, Docker will generate a random name.
--env: Set environment variables
--detach: Run container in background and print container ID
mysql:5.5: The image name and tag as listed on the Docker Hub page. The general form is “username/image_name:tag”, for example “severalnines/mysql:5.6”. In this case we specified “mysql:5.5”: there is no username because mysql is an official image, the image name is “mysql” and the tag is “5.5”. If the tag is omitted, Docker uses latest by default. If the image does not exist locally, Docker first pulls it from Docker Hub to the host and then runs the container.
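The `docker ps` listing above shows port 3306 mapped to the host (`0.0.0.0:3306->3306/tcp`). Docker typically creates such a mapping only when the container is started with `--publish`, so if your listing shows no host port, a variant like the following may be what you want. This is a sketch, not the command used to produce the output above; `start_mysql` and its default arguments are hypothetical.

```shell
# Start a MySQL container with its port published on the host.
# Usage: start_mysql [container-name] [tag] [host-port]
# The defaults mirror the names used in this guide; the helper itself is illustrative.
start_mysql() {
    local name="${1:-mysql_v5}"
    local tag="${2:-5.5}"
    local host_port="${3:-3306}"
    docker run --name "$name" \
        --env MYSQL_ROOT_PASSWORD="${MYSQL_ROOT_PASSWORD:-my-secret-pw}" \
        --publish "${host_port}:3306" \
        --detach "mysql:${tag}"
}
```

Calling `start_mysql mysql_v5 5.5 3307` would publish the container's port 3306 on host port 3307, which is useful when 3306 is already taken by a local MySQL server.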
You can see which IP the MySQL container is running on via:
> docker inspect mysql_v5 | grep IPAddress
"SecondaryIPAddresses": null,
"IPAddress": "172.17.0.2",
"IPAddress": "172.17.0.2",
The IP address and the port can both be used to configure a graphical SQL client such as HeidiSQL, MySQL Workbench or Sequel Pro.
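When scripting a connection, it can be handy to extract just the address from the `docker inspect` output shown above. A small sketch, assuming the output contains lines of the form `"IPAddress": "..."`; `first_ip` is a hypothetical helper name.

```shell
# Read `docker inspect` output on stdin and print the first IPAddress value.
# Lines such as "SecondaryIPAddresses" are skipped because the pattern
# requires the exact quoted key "IPAddress".
first_ip() {
    grep -m1 '"IPAddress"' | sed -E 's/.*"IPAddress": *"([^"]*)".*/\1/'
}

# Example (requires the mysql_v5 container from above):
#   ip="$(docker inspect mysql_v5 | first_ip)"
#   mysql --host "$ip" --port 3306 --user root --password
```

Docker can also print the address directly with `docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' mysql_v5`. Note that container addresses are generally reachable from the host only on Linux; on Docker Desktop you would normally connect through a published port instead.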
To open this MySQL instance, run the following and enter the root password we set earlier (my-secret-pw) when prompted:
> docker exec -it mysql_v5 bash
root@d6ed6aa86c31:/# mysql -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 13
Server version: 8.0.11 MySQL Community Server - GPL
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
4 rows in set (0.00 sec)
mysql>
Step 3: Link MySQL and DLATK
Here we run DLATK and link it to MySQL. We pull DLATK from its official repository on Docker Hub:
> docker run -it --rm --name dlatk_docker --link mysql_v5:mysql dlatk/dlatk bash
which should give you a new prompt. Here we can open MySQL as follows:
root@70032e45f971:/# mysql -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.60 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
+--------------------+
3 rows in set (0.00 sec)
MySQL [(none)]> exit
Bye
Next we will upload the sample data packaged with DLATK into MySQL, noting that we have access to the DLATK install path via $DLATK_DIR:
root@70032e45f971:/# echo $DLATK_DIR
/usr/local/lib/python3.6/site-packages/dlatk
root@70032e45f971:/# mysql < $DLATK_DIR/data/dla_tutorial.sql
root@70032e45f971:/# mysql < $DLATK_DIR/data/permaLexicon.sql
root@70032e45f971:/# mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.5.60 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| dla_tutorial |
| mysql |
| performance_schema |
| permaLexicon |
+--------------------+
5 rows in set (0.00 sec)
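To check the import without opening an interactive session, the tables of both databases can be listed from the shell. This sketch assumes it runs inside the DLATK container, where the `mysql` client connects without further options as in the transcript above; `show_imported_tables` is a hypothetical helper name.

```shell
# List the tables in each database created by the sample-data import.
# Assumes the mysql client can connect with no extra options, as above.
show_imported_tables() {
    for db in dla_tutorial permaLexicon; do
        echo "== ${db} =="
        mysql --execute "SHOW TABLES" "$db"
    done
}
```

Running `show_imported_tables` prints one header per database followed by its tables; an error for either database suggests the corresponding `.sql` file was not loaded.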
Going back to the prompt we can run DLATK through the interface script dlatkInterface.py:
root@70032e45f971:/# dlatkInterface.py -h
Note that the image also includes Mallet, Stanford Parser and Tweet NLP, with Mallet added to your path:
root@0f8f18074713:/# mallet
Unrecognized command:
Mallet 2.0 commands:
import-dir load the contents of a directory into mallet instances (one per file)
import-file load a single file into mallet instances (one per line)
import-svmlight load SVMLight format data files into Mallet instances
info get information about Mallet instances
train-classifier train a classifier from Mallet data files
classify-dir classify data from a single file with a saved classifier
classify-file classify the contents of a directory with a saved classifier
classify-svmlight classify data from a single file in SVMLight format
train-topics train a topic model from Mallet data files
infer-topics use a trained topic model to infer topics for new documents
evaluate-topics estimate the probability of new documents under a trained model
prune remove features based on frequency or information gain
split divide data into testing, training, and validation portions
bulk-load for big input files, efficiently prune vocabulary and import docs
Include --help with any option for more information
Build Image from Dockerfile
This is more advanced and probably not needed for most use cases. First we download the Dockerfile from GitHub. If you have git installed you can run:
> git clone https://github.com/dlatk/dlatk-docker.git && cd dlatk-docker
To build the image we run:
> docker build -t dlatk-docker .
Here is the command broken down:
build: Build an image from a Dockerfile
-t: Alias for --tag. Names the image, optionally with a tag, in the 'name:tag' format. Since we are not specifying a tag, Docker tags the image as latest.
You will see the following output:
Sending build context to Docker daemon 84.48kB
Step 1/15 : FROM python:3.6-stretch
stretch: Pulling from library/python
cc1a78bfd46b: Downloading [=============================================> ] 40.86MB/45.32MB
d2c05365ee2a: Download complete
231cb0e216d3: Download complete
3d2aa70286b8: Downloading [===================================> ] 35.08MB/50.06MB
e80dfb6a4adf: Downloading [=======> ] 31.16MB/213.2MB
....
At the end you should see:
Removing intermediate container c4776548e966
Successfully built dc2005cd24a6
Successfully tagged dlatk-docker:latest
and we can confirm the installation with:
> docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
dlatk-docker latest 10eea3e0202a About a minute ago 2.56GB
python 3.6-stretch d330010a503a 3 days ago 912MB
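The locally built image can presumably be used in place of dlatk/dlatk from Step 3. The sketch below wraps the build and run commands so the tag is explicit; it assumes the local image behaves like the Docker Hub one, and the helper names (`image_ref`, `build_dlatk`, `run_dlatk`) and the example tag `1.0` are hypothetical.

```shell
# Compose "name:tag"; an empty tag falls back to "latest", matching Docker's default.
image_ref() {
    printf '%s:%s\n' "$1" "${2:-latest}"
}

# Build the Dockerfile in the current directory as dlatk-docker[:tag].
# Usage: build_dlatk [tag]
build_dlatk() {
    docker build --tag "$(image_ref dlatk-docker "$1")" .
}

# Open an interactive shell in the built image, linked to the MySQL container.
# Usage: run_dlatk [tag] [mysql-container]
run_dlatk() {
    docker run -it --rm --name dlatk_docker \
        --link "${2:-mysql_v5}:mysql" \
        "$(image_ref dlatk-docker "$1")" bash
}
```

For example, `build_dlatk 1.0` followed by `run_dlatk 1.0` builds and starts `dlatk-docker:1.0`, while calling both without arguments uses `dlatk-docker:latest` as in the listing above.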
Acknowledgment
The Dockerfile was originally written by Michael Becker at Penn Medicine.