This Analytic provides a semantic matching score for words in the dataset, using either Latent Semantic Analysis (LSA) or Term Frequency-Inverse Document Frequency (TFIDF). The results are pruned at a given percentile of matches: a percentile of 0.99 returns only the highest-scoring 1% of semantically useful terms across the dataset. In practice, this gives you the other terms that closely match your search within the dataset. For example, if you searched for "Romney", "Mitt" should appear high on the list, possibly alongside more surprising terms, which you can use to gain a sense of the topics these tweets cover.

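
To make the TFIDF option more concrete, here is a minimal sketch of one common TF-IDF weighting, summed per term across a small set of tokenized tweets. The function name `tfidf_scores`, the sample tweets, and the exact formula (relative term frequency times the natural log of inverse document frequency) are assumptions for illustration; the formula this Analytic actually uses is not stated here.

```python
import math
from collections import Counter

def tfidf_scores(documents):
    """Return {term: TF-IDF weight summed over all documents}.

    `documents` is a list of token lists. Hypothetical helper, not the
    Analytic's own implementation.
    """
    n_docs = len(documents)

    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = Counter()
    for tokens in documents:
        if not tokens:
            continue  # an empty document contributes nothing
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)                   # relative term frequency
            idf = math.log(n_docs / doc_freq[term])    # rarer terms weigh more
            scores[term] += tf * idf
    return dict(scores)

# Made-up sample data; no stop words are removed, matching the Analytic.
tweets = [
    "mitt romney speaks in ohio".split(),
    "romney and mitt on the economy".split(),
    "the weather in ohio".split(),
]
scores = tfidf_scores(tweets)
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}\t{score:.3f}")
```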
| Name | User Modifiable? | Position | Kind | Description | Possible Values | Options |
|---|---|---|---|---|---|---|
| percentile | Yes | 0 | enum | The percentile at which the semantic analyzer algorithm cuts off results. The value, written as a decimal, is the portion of results that are omitted. A value of 0.9, for instance, is the 90th percentile, which means that only the top 10% of semantically matched words are shown in the results. Stop words are not removed, so that the analytic remains entirely transparent. | '0.99', '0.95', '0.9', '0.85', '0.80', '0.75', '0.7', '0.6', '0.5', '0.4', '0.3', '0.25', '0.2', '0.1', '0.0' | Remove |
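
The effect of the `percentile` parameter can be sketched as a simple cutoff over scored terms. This is an illustration under stated assumptions, not the Analytic's implementation: `prune_by_percentile` and the sample scores are hypothetical, and rounding the kept count up (so at least one term survives) is a choice made here, since the rounding rule is not documented.

```python
import math
from fractions import Fraction

def prune_by_percentile(scores, percentile):
    """Return the (term, score) pairs in the top (1 - percentile) share.

    `scores` maps term -> score. `percentile` is a value such as "0.9"
    or 0.9. Hypothetical helper for illustration only.
    """
    # Parse via str so that 0.99 becomes exactly 99/100, avoiding float drift.
    p = Fraction(str(percentile))
    if not 0 <= p < 1:
        raise ValueError("percentile must be in [0, 1)")

    # Highest scores first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

    # Assumption: round the kept count up.
    keep = math.ceil(len(ranked) * (1 - p))
    return ranked[:keep]

# Made-up scores for terms related to a search for "Romney".
sample = {"mitt": 0.92, "ohio": 0.41, "economy": 0.38, "the": 0.05, "in": 0.03}
print(prune_by_percentile(sample, "0.8"))  # keeps only ("mitt", 0.92)
print(prune_by_percentile(sample, "0.0"))  # keeps all five terms
```

With a percentile of 0.0 nothing is omitted, which is why the lowest allowed value returns every term, stop words included.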