Content Capture and Refinement
Search applications are typically integrated with content repositories, content management systems, large software applications, and other complex information systems. This integration allows content to be gathered and prepared for the search system flexibly and efficiently. Content is the external data, drawn from many structured, unstructured, and rich media sources, that is fed into a search engine. Before it is stored, the content is refined for optimal retrieval.
The core of FAST ESP’s document navigation, mining, and discovery capabilities is entity identification and extraction. Entities may include proper names, dates, prices, company names, and other data elements. In addition, FAST ESP tags documents and other information with metadata (simple attribute/value pairs), which improves search precision because results can be sorted by their meta-attributes.
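To make the idea of entity extraction feeding attribute/value metadata concrete, here is a minimal sketch in Python. It is not FAST ESP's API: the Document class, the regular-expression "extractor" for dates and prices, and the attribute names dates, prices, and first_date are all hypothetical illustrations. A production entity extractor relies on much richer linguistic models and dictionaries than two regular expressions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns: ISO dates (YYYY-MM-DD) and dollar prices such as $12 or $19.99.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
PRICE_RE = re.compile(r"\$\d+(?:\.\d{2})?")


@dataclass
class Document:
    doc_id: str
    text: str
    # Metadata stored as simple attribute/value pairs.
    metadata: dict = field(default_factory=dict)


def extract_entities(doc: Document) -> Document:
    """Tag the document with the dates and prices found in its text."""
    dates = DATE_RE.findall(doc.text)
    doc.metadata["dates"] = dates
    doc.metadata["prices"] = PRICE_RE.findall(doc.text)
    # A scalar attribute that results can be sorted by; ISO dates sort correctly as strings.
    doc.metadata["first_date"] = dates[0] if dates else ""
    return doc


def sort_by_attribute(docs, attribute, default=""):
    """Order result documents by the value of one meta-attribute."""
    return sorted(docs, key=lambda d: d.metadata.get(attribute, default))


if __name__ == "__main__":
    a = extract_entities(Document("a", "Price $5.00 on 2007-05-02"))
    b = extract_entities(Document("b", "Released 2006-11-20 at $12"))
    for d in sort_by_attribute([a, b], "first_date"):
        print(d.doc_id, d.metadata)
```

Sorting on a single scalar attribute keeps the ordering step trivial; the extraction step is where the real work, and the real precision gains, would come from.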
Greater search relevance comes from the tool’s document analysis and processing capabilities, which include custom-developed pre-processors for document conversion. Document format conversion, language detection, natural language query support, and automatic content classification are all at the user’s fingertips. FAST ESP accepts more than 370 data formats from disparate sources within the enterprise and provides a single interface for searching and alerting.
With FAST ESP, connectors extract content from the source, map it to a compatible format, and submit it for indexing, including metadata, access control lists, and binary or text files. The access control information is used to enforce the security constraints defined in the source system.
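The connector flow described above (extract, map, submit, then enforce source-system access control) can be sketched in a few lines of Python. This is a hypothetical illustration, not FAST ESP's connector framework: SourceItem, IndexDocument, map_item, Index, and run_connector are invented names, and the ACL model (a set of group names checked at query time) is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceItem:
    """A record as it exists in a hypothetical source system."""
    path: str
    body: bytes
    owner: str
    allowed_groups: frozenset


@dataclass
class IndexDocument:
    """Hypothetical normalized form submitted for indexing."""
    doc_id: str
    text: str
    metadata: dict
    acl: frozenset


def map_item(item: SourceItem) -> IndexDocument:
    # Map source-specific fields into the common format; bytes are decoded as UTF-8.
    return IndexDocument(
        doc_id=item.path,
        text=item.body.decode("utf-8", errors="replace"),
        metadata={"owner": item.owner},
        acl=item.allowed_groups,
    )


class Index:
    """Toy in-memory index that keeps each document's ACL alongside its text."""

    def __init__(self):
        self.docs = {}

    def submit(self, doc: IndexDocument) -> None:
        self.docs[doc.doc_id] = doc

    def search(self, term, user_groups):
        # Return only matches whose ACL shares at least one group with the user.
        groups = frozenset(user_groups)
        return sorted(
            d.doc_id
            for d in self.docs.values()
            if term.lower() in d.text.lower() and d.acl & groups
        )


def run_connector(items, index: Index) -> None:
    # Extract each item, map it to the index format, and submit it.
    for item in items:
        index.submit(map_item(item))
```

For example, after indexing an HR-only file and a public file that both contain "review", a user in only the "everyone" group would see just the public file. Carrying the ACL into the index, rather than re-querying the source at search time, is one common design choice; it trades freshness of permissions for query speed.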
When coupled with knowledge about the data and users of the search service, linguistic features of advanced search engines can greatly improve precision and recall.
Source: FAST ESP Brochure 2007
