Reduces a feature space to clusters.

Argument and Default Value

If --n_components is not specified, the default number of clusters (or components) is 24, where applicable.


Use --model to select one of the following clustering or dimensionality-reduction algorithms:

  • NMF - (Non-Negative Matrix Factorization) Finds two non-negative matrices whose product approximates the non-negative input data; fit by projected gradient.

  • PCA - (Principal Component Analysis) Linear dimensionality reduction using Singular Value Decomposition of the data, keeping only the most significant singular vectors to project the data to a lower-dimensional space.

  • SPARSEPCA - (Sparse Principal Components Analysis) Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty.

  • LDA - (Linear Discriminant Analysis) A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.

  • KMEANS - (K-Means clustering) Partitions the data into k clusters, assigning each sample to the cluster whose mean (centroid) is nearest.

  • DBSCAN - (Density-Based Spatial Clustering of Applications with Noise) Finds core samples of high density and expands clusters from them. Good for data that contains clusters of similar density.

  • SPECTRAL - Applies clustering to a projection of the normalized Laplacian. In practice, spectral clustering is very useful when the structure of the individual clusters is highly non-convex, or more generally when a measure of the center and spread of a cluster is not a suitable description of the complete cluster; for instance, when clusters are nested circles on the 2D plane.

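  • GMM - (Gaussian Mixture Model) A probabilistic model that assumes the data are generated from a mixture of a finite number of Gaussian distributions with unknown parameters.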
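
The PCA entry above describes an SVD-based projection. As a rough illustration only, the NumPy sketch below (not DLATK's own code; `pca_project` is a hypothetical helper and the data are random) centers a feature matrix, takes its thin SVD, and keeps the top singular vectors. Its `n_components` argument plays the role that --n_components plays on the command line, although a toy value of 3 is used here instead of the default 24.

```python
import numpy as np

def pca_project(X, n_components):
    """Hypothetical helper: project rows of X onto the top principal axes via SVD."""
    X = np.asarray(X, dtype=float)
    # Center each feature (column); PCA is defined on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # Thin SVD: X_centered = U @ diag(S) @ Vt, singular values sorted in descending order.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # Keep only the most significant right singular vectors (the principal axes).
    components = Vt[:n_components]
    # Project the centered data into the lower-dimensional space.
    return X_centered @ components.T, components

# Synthetic stand-in for a feature table: 50 groups x 10 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
Z, components = pca_project(X, n_components=3)
print(Z.shape)  # (50, 3)
```

Because the singular values come back sorted in descending order, slicing `Vt[:n_components]` is what "keeping only the most significant singular vectors" amounts to.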

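For the KMEANS option, a minimal NumPy sketch of Lloyd's algorithm (the classic k-means iteration) is shown below. It is illustrative only: `kmeans` is a hypothetical helper, not part of DLATK, and initializing from k random rows is a simplification (scikit-learn, for example, defaults to k-means++ seeding). The data are two synthetic, well-separated blobs.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: alternate nearest-centroid assignment and centroid update."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct rows at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid: shape (n, k).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if its cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two synthetic blobs centered near (0, 0) and (5, 5).
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(20, 2))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(20, 2))
X = np.vstack([blob_a, blob_b])
labels, centroids = kmeans(X, k=2)
```

Here `k` is the number of clusters, the quantity that --n_components sets for the command-line tool.
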
Other Switches

Required Switches:

  • -d, -g, -t


  • --model: one of nmf, pca, sparsepca, lda, kmeans, dbscan, spectral, or gmm

Optional Switches:

  • --n_components (defaults to 24 when not specified)

  • --group_freq_thresh

Example Commands

# General syntax
dlatkInterface.py -d <DATABASE> -t <TABLE> -c <> -f <FEATURE_TABLE> --fit_reducer --model <MODEL_NAME>

# Example command
dlatkInterface.py -d primals -t primals_new -c dp_id -f 'feat$1to3gram$primals_new$dp_id$16to1$0_0001' --fit_reducer --model spectral --group_freq_thresh 100