--lex_interface

Switch

--lex_interface

Description

Override the argument parser in dlatkInterface and send all remaining arguments to lexInterface. lexInterface is often used to upload CSV files to MySQL during the LDA process. See the DLATK LDA Interface tutorial for more details.
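To make the hand-off concrete, here is a minimal sketch of how a switch like this could forward a command line to a second parser. It is an illustration only, not DLATK's actual implementation: the functions `build_lex_parser` and `dispatch` are hypothetical names, and the stand-in parser declares just three of the lexInterface flags listed under Details.

```python
import argparse


def build_lex_parser():
    """Hypothetical stand-in for the lexInterface parser (three flags only)."""
    parser = argparse.ArgumentParser(prog="lexInterface.py")
    parser.add_argument("--topic_csv", "--weighted_file",
                        dest="topic_csv", action="store_true")
    parser.add_argument("--topicfile", default=None)
    parser.add_argument("-c", "--create", default=None)
    return parser


def dispatch(argv):
    """Route argv to the lexicon parser when --lex_interface is present.

    Returns ("lex", parsed_namespace) if the switch is found; otherwise
    returns ("dlatk", argv) so the main parser can handle it unchanged.
    """
    if "--lex_interface" in argv:
        # Drop the switch itself and forward everything else untouched.
        forwarded = [arg for arg in argv if arg != "--lex_interface"]
        return "lex", build_lex_parser().parse_args(forwarded)
    return "dlatk", argv


# Assumed example arguments; the file path is a placeholder.
mode, args = dispatch([
    "--lex_interface", "--topic_csv",
    "--topicfile=/tmp/topics.csv",
    "-c", "msgs_lda_cp",
])
print(mode, args.topic_csv, args.topicfile, args.create)
```

The practical consequence of this pattern is that, once the switch is given, only lexInterface flags are meaningful on the command line; flags that belong to the main dlatkInterface parser would be rejected by the second parser in this sketch.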

Details

The full list of available flags in lexInterface:

python lexInterface.py -h

usage: lexInterface.py [-h] [-f FILENAME] [-g GFILE] [--sparsefile SPARSEFILE]
                       [--weightedsparsefile WEIGHTEDSPARSEFILE]
                       [--dicfile DICFILE] [--topicfile TOPICFILE]
                       [--topic_csv] [--filter] [-n NAME] [-c CREATE] [-p]
                       [--print_weighted] [--pprint] [-w WHERE] [-u UNION]
                       [-i INTERSECT] [--super_topic SUPERTOPIC] [-r]
                       [--depol] [--ungroup] [--compare COMPARE]
                       [--annotate_senses SENSE_ANNOTATED_LEX]
                       [--topic_threshold TOPICTHRESHOLD] [-a] [-l]
                       [--corpus_examples] [--corpus_samples] [-e] [-d DB]
                       [-t TABLE] [--lexicondb DB] [--corpus_term_field FIELD]
                       [--corpus_message_field FIELD]
                       [--corpus_messageid_field FIELD] [--min_word_freq NUM]
                       [--lexicon_category CATEGORY] [--num_rand_messages NUM]

On Features Class.

optional arguments:
  -h, --help            show this help message and exit

:

  -f FILENAME, --file FILENAME
                        Lexicon Filename (default: None)
  -g GFILE, --gfile GFILE
                        Lexicon Filename in google format (default: None)
  --sparsefile SPARSEFILE
                        Lexicon Filename in sparse format (default: None)
  --weightedsparsefile WEIGHTEDSPARSEFILE
                        Lexicon Filename in weighted sparse format (default:
                        None)
  --dicfile DICFILE     Lexicon Filename in dic (LIWC) format (default: None)
  --topicfile TOPICFILE
                        Lexicon Filename in topic format (default: None)
  --topic_csv, --weighted_file
                        tells interface to use the topic csv format to make a
                        weighted lexicon (default: False)
  --filter              Allows lexicon filtering if True (default: False)
  -n NAME, --name NAME  Existing Lexicon Table Name (will load) (default:
                        None)
  -c CREATE, --create CREATE
                        Create a new lexicon table (must supply new lexicon
                        name, and either -f, -g or -n) (default: None)
  -p, --print           print lexicon to stdout (default csv format) (default:
                        False)
  --print_weighted      print lexicon to stdout (weighted csv format)
                        (default: False)
  --pprint              print lexicon to stdout as pprint output (default:
                        False)
  -w WHERE, --where WHERE
                        where phrase to add to sql query (default: None)
  -u UNION, --union UNION
                        Unions two tables and uses the result as myLexicon
                        (default: None)
  -i INTERSECT, --intersect INTERSECT
                        Intersects two tables and uses the result as myLexicon
                        (default: None)
  --super_topic SUPERTOPIC
                        Maps the current lexicon with a super topic mapping
                        lexicon to make a super_topic (default: None)
  -r, --randomize       Randomizes the categories of terms (default: False)
  --depol               Depolarize the categories (removes +/-) (default:
                        False)
  --ungroup             places each word in its own category (default: False)
  --compare COMPARE     Unions two tables and uses the result as myLexicon
                        (default: None)
  --annotate_senses SENSE_ANNOTATED_LEX
                        Asks the user to annotate senses of words and creates
                        a new lexicon with senses (new lexicon name is the
                        parameter) (default: None)
  --topic_threshold TOPICTHRESHOLD
                        sets the threshold to use for a csv topicfile
                        (default: None)
  -a, --add_terms       Adds terms from the loaded lexicon to a given corpus
                        (options below) (default: False)
  -l, --corpus_lexicon  Load a lexicon based on finding words in a given
                        corpus (BETA) (options below) (default: False)
  --corpus_examples     Find example instances of words in the given corpus
                        (using rlike; equal number for all words) (default:
                        False)
  --corpus_samples      Find sample of matches for lexicon. (default: False)
  -e, --expand_lexicon  Expands the lexicon to more terms. (default: False)

Terms OR Corpus Lexicon Options:

  -d DB, --corpus_db DB
                        Corpus database to use [default: dla_tutorial]
  -t TABLE, --corpus_table TABLE
                        Corpus table to use [default: msgs]
  --lexicondb DB        The database which stores all lexicons. (default:
                        dlatk_lexica)
  --corpus_term_field FIELD
                        field of the corpus table that contains terms (lexicon
                        table always uses 'term') [default: term]
  --corpus_message_field FIELD
                        field of the corpus table that contains the actual
                        message [default: message]
  --corpus_messageid_field FIELD
                        field of the table that contains message ids (set to
                        '' to not use group by [default: message_id]
  --min_word_freq NUM   minimum number of instances to include in lexicon (-l
                        option) [default: 1000]
  --lexicon_category CATEGORY
                        category in lexicon to get random samples from
                        (default: None)
  --num_rand_messages NUM
                        number of random messages to select when getting
                        samples from lexicon category (default: 100)

Example Commands

Upload the topic-given-word probability distributions generated during LDA. This creates a table named msgs_lda_cp in the dlatk_lexica database.

dlatkInterface.py --lex_interface --topic_csv  \
--topicfile=/home/user/lda_tutorial/msgs_lda_tok_lda.lda_topics.topicGivenWord.csv  \
-c msgs_lda_cp