Information Extraction
 

Our Information Extraction (IE) technology complements our PowerAnswer Question Answering (QA) approach. While PowerAnswer provides an open-domain solution for direct, relevant answers from heterogeneous document collections, IE is ideally suited for customers interested in retrieving highly relevant information on domain-specific topics. We do not rely on keyword or simple pattern matching; instead, we provide deep syntactic and semantic understanding of domain information. Our IE technology, implemented in the Cicero Information Extraction system, extracts and formats information from virtually any domain of interest. Typical applications range from market analysis (for example, which stocks changed and how) to intelligence applications, such as monitoring bombing events in the Middle East.

The Information Extraction technology behind Cicero was developed by our team of renowned researchers. Technical details about our technology were published in May 2002 in the proceedings of the "Human Language Technology" conference. The complete paper is available here.

Our prominent position in Information Extraction was recently recognized with significant research contracts for further technology development. Information Extraction is a complex task that encompasses many Natural Language Processing techniques, such as syntactic parsing, entity and event coreference, and information fusion. Our distinctive approaches to these tasks have raised the accuracy of the Cicero IE system well into the 80% range, well ahead of competing systems, whose accuracy rates are in the low 60% range. Language Computer Corporation has developed an open-domain IE framework, which includes open-domain parsing, coreference, and named-entity recognition. Fast customization to different domains is guaranteed by our proprietary grammar development environment, which includes a grammar specification language and a run-time grammar execution environment that address the ambiguities of natural language.