Luxid® Annotation Factory

Luxid® Annotation Factory provides a robust and scalable natural language processing pipeline that supports fast and reliable annotation services.

 

 

Overview

Based on UIMA, Luxid® Annotation Factory's distributed architecture automatically balances the workload across all available processing units and implements native fail-over capabilities that sustain the continuous processing required in mission-critical applications.
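As an illustration of the general idea only, load balancing with fail-over can be sketched in a few lines of Python. This is not the product's actual scheduler: the `dispatch` function, the `WorkerDown` exception and the node names are hypothetical, and the "least-loaded worker first" policy is an assumption chosen for simplicity.

```python
class WorkerDown(Exception):
    """Raised by a (hypothetical) processing unit that cannot serve a request."""


def dispatch(documents, workers, process):
    """Send each document to the least-loaded available worker.

    If a worker fails, it is taken out of rotation and the same document
    is retried on the next candidate, so no document is lost.
    """
    load = {w: 0 for w in workers}
    available = set(workers)
    results = {}
    for doc in documents:
        while True:
            if not available:
                raise RuntimeError("no processing unit left")
            # Ties are broken by name so the assignment is deterministic.
            worker = min(available, key=lambda w: (load[w], w))
            try:
                results[doc] = (worker, process(worker, doc))
                load[worker] += 1
                break
            except WorkerDown:
                available.discard(worker)  # fail over to another unit
    return results


def fake_process(worker, doc):
    # Assumption for the demo: node-b is permanently unavailable.
    if worker == "node-b":
        raise WorkerDown(worker)
    return f"annotated:{doc}"


assignments = dispatch(["d1", "d2", "d3", "d4"],
                       ["node-a", "node-b", "node-c"],
                       fake_process)
```

In this sketch the failed node simply drops out and its share of the work is spread over the remaining nodes; a production system would typically also re-admit nodes once they recover.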

At the core: Skill Cartridges®

At the core of this distributed pipeline, specialized extraction modules called Skill Cartridges® flexibly support information extraction needs across multiple domains and applications. Depending on the context, Skill Cartridges® may focus on entities of general interest such as companies, people, locations, or dates, or on more domain-specific ones such as proteins or genes in biology. Beyond entities, Skill Cartridges® can also extract more structured information in the form of the relations that link these entities (for example, a merger between two companies or a chemical reaction between two compounds), including the role played by each entity in the relation, as well as attributes or other contextual information mentioned in the document.
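To make the distinction between entities, relations, roles and attributes concrete, here is a minimal Python sketch of how such an extraction result could be represented. The class names, field names, role labels and the sample sentence are all hypothetical; they do not reflect the actual output schema of Skill Cartridges®.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    text: str         # surface form found in the document
    entity_type: str  # e.g. "Company", "Person", "Protein"
    start: int        # character offsets into the source text
    end: int


@dataclass
class Relation:
    relation_type: str                               # e.g. "Merger"
    roles: dict                                      # role name -> Entity
    attributes: dict = field(default_factory=dict)   # contextual details


def find_entity(text, surface, entity_type):
    """Locate a known surface form and wrap it as an Entity (demo helper)."""
    start = text.index(surface)
    return Entity(surface, entity_type, start, start + len(surface))


SENTENCE = "Acme Corp announced a merger with Globex in 2012."

acme = find_entity(SENTENCE, "Acme Corp", "Company")
globex = find_entity(SENTENCE, "Globex", "Company")

# The relation links the two entities, names the role each one plays,
# and carries the date mentioned in the sentence as an attribute.
merger = Relation(
    relation_type="Merger",
    roles={"company_a": acme, "company_b": globex},
    attributes={"date": "2012"},
)
```

Keeping character offsets on each entity lets a consumer map every annotation back to the exact span of the source document.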

 

A broad range of extraction techniques

To optimize its performance across use cases, Luxid® Annotation Factory provides a market-leading part-of-speech tagging layer supporting 20 languages and embeds a wide range of information extraction techniques, including morpho-syntactic reasoning, statistics, thesaurus- and taxonomy-based extraction, machine learning, and rule-based extraction. Luxid® Annotation Factory can also perform corpus-level operations such as Categorization (the classification of documents into predefined categories) or Clustering (the grouping of similar documents into dynamically created clusters). Adding even more flexibility to these native capabilities, Annotation Factory also supports the integration of third-party UIMA-compliant annotators in annotation plans.
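Of the techniques listed above, thesaurus-based extraction is the simplest to illustrate. The Python sketch below matches a small, made-up term list against a text and returns typed annotations with offsets. The thesaurus content and the `extract_terms` function are hypothetical, and real thesaurus-based extraction usually adds synonym handling, normalization and disambiguation that are left out here.

```python
import re

# Hypothetical mini-thesaurus: surface form -> entity type.
THESAURUS = {
    "BRCA1": "Gene",
    "p53": "Protein",
    "insulin": "Protein",
}


def extract_terms(text, thesaurus):
    """Return (term, type, start, end) for every thesaurus term in the text.

    Longer terms are tried first so that a long entry is not shadowed
    by a shorter one that happens to be its prefix.
    """
    terms = sorted(thesaurus, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return [
        (m.group(1), thesaurus[m.group(1)], m.start(), m.end())
        for m in pattern.finditer(text)
    ]


TEXT = "Mutations in BRCA1 alter p53 signalling."
annotations = extract_terms(TEXT, THESAURUS)
```

Matching here is case-sensitive and exact, which keeps the sketch short; it finds `BRCA1` and `p53` in the sample text and ignores `insulin`, which does not occur.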

 

Deployment and Integrations

A key component of Luxid® Annotation Factory, the Workflow Engine is in charge of the end-to-end management of complex workflows that can involve multiple cascaded Skill Cartridges® as well as distributed multi-CPU and multi-node processing. To streamline platform deployment and support the robust operation of such workflows, the Workflow Engine also implements advanced fault-tolerance mechanisms and centralized log management. A web-based administration interface gives users control over all platform parameters and centralized access to the platform's logs. The platform is furthermore JMX compliant and integrates easily into common monitoring consoles.

 

While the core of its pipeline is natively based on XML, Luxid® Annotation Factory supports more than 200 file formats on the ingress side and provides a flexible range of output options, including XML and RDF. To ease its deployment within the major content management systems, a growing range of off-the-shelf integrations is available, in particular for MarkLogic, Microsoft SharePoint 2010, EMC Documentum, Alfresco Enterprise, and Nuxeo. The Web Services offered by Luxid® Content Enrichment Platform facilitate its integration within other content management systems, applications and information management workflows.
