--print_tokenized_lines
Switch
--print_tokenized_lines
Description
Prints tokenized version of messages to lines.
Argument and Default Value
You must supply an output file name.
Details
Looks for the table TABLENAME_tok, where TABLENAME is specified by -t. Each line of the output file contains the message id, lanugage, and tokens. Example:
# Sample message from tokenized input table: # ["is", "worth", "it", "just", "follow", "your", "heart", "its", "never", "wrong", ":", "-", "rrb", "-"] # Output line: # 128675651556356096 en is worth it just follow your heart its never wrong : - rrb -
Other Switches
Required Switches: -d, -t Optional Switches: --feat_whitelist Example Commands ================ .. code:doc:fwflag_block:: python
# General command python fwInterface.py -d DATABASE -t TABLE --print_tokenized_lines OUTPUTFILE_NAME
# Example command # searches for the table twt_20mil_tok # outputs the file twt_20mil.txt python fwInterface.py -d twitterGH -t twt_20mil --print_tokenized_lines twt_20mil.txt