Linguistic Relevance
By applying linguistic principles, FAST ESP can find information that does not literally match the terms of a traditional keyword query. The result is improved relevancy, recall and precision. The linguistic search tools may be tuned to specific languages or industry-specific vocabularies.
FAST ESP offers a number of capabilities to bolster linguistic analysis and relevance:
- Language Detection and Encoding. The language of the content is used to select language-specific dictionaries and algorithms during document and query processing. FAST ESP automatically recognizes about 80 different languages.
- Lemmatization. All inflections, or different grammatical forms, of a search term are considered, including irregular inflections. For example, the search term “car” matches both “car” and “cars.”
- Synonyms and Thesaurus Framework. Users may find relevant content by searching across terms with a similar meaning. Synonyms may be language-specific or domain-specific, or may cover spelling variations.
- Tokenization and Character Normalization. Tokenization involves the detection of word delimiters in the text (white space and other symbols). For Asian languages, this includes a complex semantic analysis. Character normalization enables search across variations in accent and case.
- Phrase Detection and Spell Check. Phrase detection recognizes and groups idioms (e.g., “home run” or “Christmas tree”) for improved precision. Spell Check compares query terms and phrases against a dictionary and corrects those it does not recognize.
Source: FAST ESP Brochure 2007
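Several of the capabilities listed above (character normalization, tokenization, lemmatization, synonym expansion and spell check) can be illustrated with a minimal Python sketch. This is not FAST ESP code and does not reflect its API or algorithms: every function name, the toy dictionaries and the similarity cutoff are assumptions chosen for illustration, and the tokenizer only handles languages that delimit words with white space or punctuation.

```python
import re
import unicodedata
from difflib import get_close_matches

# Hypothetical toy resources; a real engine would load large,
# language-specific dictionaries instead.
LEMMAS = {"cars": "car", "mice": "mouse", "went": "go", "going": "go"}
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}
VOCABULARY = ["car", "automobile", "mouse", "go", "search", "engine"]


def normalize(text: str) -> str:
    """Fold case and strip accents so that 'Café' and 'cafe' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text: str) -> list[str]:
    """Normalize the text, then return its runs of word characters."""
    return re.findall(r"\w+", normalize(text))


def lemmatize(token: str) -> str:
    """Map an inflected form to its base form; unknown tokens pass through."""
    return LEMMAS.get(token, token)


def correct(token: str) -> str:
    """Return the closest vocabulary entry, or the token itself if none is close."""
    if token in VOCABULARY:
        return token
    candidates = get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
    return candidates[0] if candidates else token


def expand_query(query: str) -> set[str]:
    """Turn a raw query into a set of lemmas plus their synonyms."""
    terms: set[str] = set()
    for token in tokenize(query):
        lemma = correct(lemmatize(token))
        terms.add(lemma)
        terms |= SYNONYMS.get(lemma, set())
    return terms


def matches(query: str, document: str) -> bool:
    """True if any expanded query term occurs among the document's lemmas."""
    doc_terms = {lemmatize(token) for token in tokenize(document)}
    return bool(expand_query(query) & doc_terms)


if __name__ == "__main__":
    print(matches("car", "Two CARS parked outside"))   # inflection and case
    print(matches("automobile", "a red car"))          # synonym
    print(matches("serch", "search engine"))           # misspelled query
```

In this sketch the query and the document pass through the same normalization and lemmatization steps, so that they are compared in the same form; synonym expansion and spelling correction are applied on the query side only. That split is a design choice of the example, not a statement about how FAST ESP divides the work.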
