Using advanced text analytics algorithms, the Popular Terms analysis determines the terms most commonly used in the body of emails and ranks them in descending order, from most popular to least popular.
In order for a term to be considered "popular," it must reach a minimum threshold of mentions across multiple emails in the time period selected.
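As a rough illustration of the threshold idea described above, the following minimal sketch counts mentions per term, keeps only terms that meet a minimum number of mentions and appear in more than one email, and ranks the survivors in descending order. The function name `popular_terms` and the threshold values (`min_mentions=3`, `min_emails=2`) are assumptions made for this example; the actual thresholds used by the product are not stated here.

```python
from collections import Counter

def popular_terms(emails, min_mentions=3, min_emails=2):
    """Rank terms by total mentions, keeping only those that meet both thresholds.

    `emails` is a list of term lists, one per email. The threshold defaults
    are hypothetical values chosen for illustration only.
    """
    mentions = Counter()      # total mentions of each term
    email_counts = Counter()  # number of distinct emails containing each term
    for terms in emails:
        mentions.update(terms)
        email_counts.update(set(terms))

    qualifying = [
        (term, count)
        for term, count in mentions.items()
        if count >= min_mentions and email_counts[term] >= min_emails
    ]
    # Most popular first; ties are broken alphabetically for a stable order.
    return sorted(qualifying, key=lambda pair: (-pair[1], pair[0]))

emails = [
    ["budget", "meeting", "budget"],
    ["budget", "deadline", "budget"],
    ["meeting", "agenda", "meeting"],
    ["lunch", "lunch", "lunch"],  # many mentions, but all in a single email
]
ranked = popular_terms(emails)
```

In this sample, "lunch" is mentioned three times but only in one email, so it does not qualify under the assumed thresholds.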
Additionally, several advanced text analytics principles are employed to ensure that the terms are relevant and interesting, including, but not limited to:
- Zoning: Isolation of only the original authored content for analysis. This includes identifying and removing other parts of an email body such as the salutation and closing, auto-signature, forwarded message content, legal disclaimers, etc.
- Term Frequency (TF): Measures how frequently a term occurs in a document, normalized by document length. At this stage, all terms are considered equally important.
- Inverse Document Frequency (IDF): Measures how important a term is by taking into account how often it is mentioned within a document as well as across a corpus of documents. Assumes that the more frequently a term is used within a document or across multiple documents, the less important it is.
- POS Filtering: Removes terms classified as parts of speech that are typically considered unimportant. Currently, only singular and plural nouns are included in the analysis.
- Stop Words Filtering: Removes additional terms that are not relevant to the analysis and have not already been removed by other algorithms, for instance, abbreviations.
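To make the TF, IDF, and stop-word steps above more concrete, here is a minimal sketch using the common textbook formulation of TF-IDF, in which IDF is the logarithm of the total number of documents divided by the number of documents containing the term. This is not necessarily the exact weighting used by the Popular Terms analysis. The stop-word list and all function names are hypothetical, and zoning and POS filtering are assumed to have already been applied to the input.

```python
import math

# Hypothetical stop-word list for illustration; the real list is not published here.
STOP_WORDS = {"fyi", "asap"}

def tf(term, doc):
    """Term frequency: occurrences of the term, normalized by document length."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """Textbook inverse document frequency: log(N / number of docs containing term)."""
    containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / containing)

def tf_idf_scores(docs):
    """Return one {term: score} dict per non-empty document after stop-word removal."""
    cleaned = [[t for t in doc if t not in STOP_WORDS] for doc in docs]
    cleaned = [doc for doc in cleaned if doc]  # drop documents left empty
    return [
        {term: tf(term, doc) * idf(term, cleaned) for term in set(doc)}
        for doc in cleaned
    ]

docs = [
    ["invoice", "payment", "fyi"],
    ["invoice", "meeting"],
    ["invoice", "invoice", "contract", "asap"],
]
scores = tf_idf_scores(docs)
```

In this sample, "invoice" appears in every document, so its IDF (and therefore its score) is zero, while a term confined to a single document, such as "payment", receives a positive score.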
Please note: We do not currently analyze the content of attachments in our text analytics algorithms. Text analytics are restricted to the email body's original content as described above in Zoning.