--add_tweettok
Switch
--add_tweettok
Description
Use Carnegie Mellon University's TweetNLP tokenizer to create a tokenized version of the message table.
Argument and Default Value
None
Details
This creates a table named TABLE_tweettok (where TABLE is the message table given by -t) in the database given by -d. In the new table, the message column holds each message as a list of tokens instead of the raw text.
Example on one message
Original message:
"@antijokeapple: What do you call a Bee who is having a bad hair day? A Frisbee." Hahah.
Tokenized message:
["\"", "@antijokeapple", ":", "What", "do", "you", "call", "a", "Bee", "who", "is", "having", "a", "bad", "hair", "day", "?", "A", "Frisbee", ".", "\"", "Hahah", "."]
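The tokenized message above has the shape of a JSON array of strings. As a minimal sketch, assuming the message column stores exactly that text, the tokens can be recovered in Python with the standard json module. The variable names here are illustrative and are not part of DLATK.

```python
import json

# Tokenized message copied verbatim from the example above.
# Assumption: the message column stores this text as a JSON array of strings.
stored = r'["\"", "@antijokeapple", ":", "What", "do", "you", "call", "a", "Bee", "who", "is", "having", "a", "bad", "hair", "day", "?", "A", "Frisbee", ".", "\"", "Hahah", "."]'

# Decode the stored string into a Python list of tokens.
tokens = json.loads(stored)

print(len(tokens))   # number of tokens in the message
print(tokens[1])     # the @-mention is kept as a single token
print(tokens[-2:])   # trailing word and its punctuation are separate tokens
```

Note that the quotation marks, the colon after the @-mention, and the sentence-final punctuation each come out as their own token.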
Other Switches
Required Switches: -d, -t, -c
Example Commands
# creates the table msgs_tweettok
./dlatkInterface.py -d dla_tutorial -t msgs -c message_id --add_tweettok
mysql> select message from msgs_tweettok limit 1;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| message |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ["can", "you", "believe", "it", "??", "my", "mom", "wouln't", "let", "me", "go", "out", "on", "my", "b'day", "...", "i", "was", "really", "really", "mad", "at", "her", ".", "still", "am", ".", "but", "i", "got", "more", "presents", "from", "my", "friends", "this", "year", ".", "so", "thats", "great", "."] |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+