| Function | Description |
|---|---|
| big_tokenize_transform | String tokenization and transformation for big data sets |
| bytes_converter | Convert the size of a text file from bytes to KB, MB or GB |
| cluster_frequency | Frequencies of an existing cluster object | 
| cosine_distance | Cosine distance of two character strings (each string consists of more than one word) |
| COS_TEXT | Cosine similarity for text documents | 
| Count_Rows | Number of rows of a file | 
| dense_2sparse | Convert a dense matrix to a sparse matrix |
| dice_distance | Dice similarity of words using n-grams |
| dims_of_word_vecs | Dimensions of a word-vectors file |
| Doc2Vec | Conversion of text documents to word-vector representation features (Doc2Vec) |
| JACCARD_DICE | Jaccard or Dice similarity for text documents | 
| levenshtein_distance | Levenshtein distance of two words |
| load_sparse_binary | Load a sparse matrix in binary format |
| matrix_sparsity | Sparsity percentage of a sparse matrix |
| read_characters | Read a specific number of characters from a text file |
| read_rows | Read a specific number of rows from a text file |
| save_sparse_binary | Save a sparse matrix in binary format |
| select_predictors | Exclude highly correlated predictors | 
| sparse_Means | rowMeans and colMeans for a sparse matrix |
| sparse_Sums | rowSums and colSums for a sparse matrix |
| sparse_term_matrix | Term matrices and statistics (document-term matrix, term-document matrix) |
| TEXT_DOC_DISSIM | Dissimilarity calculation of text documents | 
| text_file_parser | Text file parser |
| text_intersect | Intersection of words or letters in tokenized text |
| tokenize_transform_text | String tokenization and transformation (character string or path to a file) |
| tokenize_transform_vec_docs | String tokenization and transformation (vector of documents) |
| token_stats | Token statistics |
| utf_locale | UTF locale for the available languages |
| vocabulary_parser | Vocabulary counts for small or medium-sized files (XML and other formats) |
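
The string metrics listed above (`levenshtein_distance`, `dice_distance`, `cosine_distance`) follow standard textbook definitions. The Python sketch below illustrates those definitions only. It is not textTinyR's R/C++ implementation, and several details are assumptions rather than documented package behaviour:

- the n-gram size (bigrams by default here);
- the absence of n-gram padding;
- whitespace tokenization for the cosine case;
- the convention of returning 1.0 when a string has no tokens.

```python
from collections import Counter
import math


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions turning a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def char_ngrams(word: str, n: int = 2) -> set:
    """Set of character n-grams of a word (assumption: no padding)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice_distance(a: str, b: str, n: int = 2) -> float:
    """1 - Dice coefficient of the character n-gram sets of two words."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 0.0  # assumption: two words with no n-grams count as identical
    return 1.0 - 2.0 * len(ga & gb) / (len(ga) + len(gb))


def cosine_distance(s1: str, s2: str) -> float:
    """1 - cosine similarity of whitespace-token count vectors."""
    c1, c2 = Counter(s1.split()), Counter(s2.split())
    dot = sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    if norm == 0:
        return 1.0  # assumption: undefined similarity treated as maximal distance
    return 1.0 - dot / norm


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))             # 3
    print(dice_distance("night", "nacht"))              # 0.75
    print(cosine_distance("the cat sat", "the cat ran"))  # about 0.333
```

For example, "night" and "nacht" share only the bigram "ht" out of four bigrams each. Their Dice coefficient is therefore 2·1/(4+4) = 0.25, a distance of 0.75. Check the package's own help pages for its parameter defaults, because its results may differ from this sketch.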