--tf_idf
Switch
--tf_idf
Description
Creates new feature table where the group_norm is the tf-idf score. Each group_id is seen as a document for calculating tf-idf.
Argument and Default Value
None
Details
The -f flag should be an ngram table.
Resulting value refers to value in ngram table. Group_norm refers to tf:doc:fwflag_idf score.
Other Switches
Required:
Example Commands
./dlatkInterface.py -d dla_tutorial -t msgs -c user_id -f 'feat$1gram$msgs$user_id$16to16' --tf_idf
mysql> select * from feat$tf_idf_1gram$msgs$user_id order limit 5;;
+---------+----------+-----------+-------+--------------------+
| id | group_id | feat | value | group_norm |
+---------+----------+-----------+-------+--------------------+
| 307349 | 2033616 | delivered | 1 | 0.0000878334772103 |
| 278647 | 4144593 | crap | 6 | 0.000998442620366 |
| 1043863 | 3482840 | story | 2 | 0.000334689956064 |
| 1150911 | 2876677 | uh | 2 | 0.000141436336165 |
| 283547 | 3711805 | crosses | 2 | 0.000827587016091 |
+---------+----------+-----------+-------+--------------------+