UIMA Tools for the S4 Text Analytics Services

Back in September we created a GATE plugin for the S4 text analytics services, so that language engineers and developers can easily integrate them into GATE-based text analytics pipelines and applications. We are now happy to announce that we have also developed add-ons that make it easy to integrate the S4 text analytics services into applications built on another very popular language engineering platform: the Unstructured Information Management Architecture (UIMA).

The UIMA Platform

UIMA provides a specification, architecture and an open source implementation for designing, integrating and deploying components related to various text analytics and search tasks.

Developers and language engineers can easily plug in and run their components (called annotators) on the UIMA runtime environment as part of more complex text analytics applications (called analysis engines). Annotators analyse the input text documents and produce annotations, which in turn can serve as inputs to other components in the same analysis engine. UIMA provides the standard data structures and interfaces that components use to create annotations and exchange them with each other.
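
The hand-off between annotators described above can be illustrated with a toy model in plain Java. This is a minimal sketch, not the UIMA API: `ToyCas`, `ToyAnnotation`, `ToyAnnotator`, `TokenAnnotator`, `CapitalizedAnnotator` and `ToyAnalysisEngine` are hypothetical stand-ins for UIMA's CAS, annotation types, annotator interface and analysis engine. The second annotator only reads the `Token` annotations produced by the first, which is the kind of exchange UIMA standardises.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of the UIMA data flow; these types are hypothetical, not the UIMA API.

// An annotation marks the span [begin, end) of the document with a type name.
record ToyAnnotation(String type, int begin, int end) {}

// Stand-in for UIMA's CAS: the document text plus every annotation added so far.
final class ToyCas {
    final String text;
    final List<ToyAnnotation> annotations = new ArrayList<>();

    ToyCas(String text) { this.text = text; }

    // Returns a new list with the annotations of one type, in insertion order.
    List<ToyAnnotation> select(String type) {
        return annotations.stream().filter(a -> a.type().equals(type)).toList();
    }
}

// An annotator reads the shared document and adds its own annotations to it.
interface ToyAnnotator {
    void process(ToyCas cas);
}

// Upstream annotator: marks every whitespace-separated token.
final class TokenAnnotator implements ToyAnnotator {
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    @Override
    public void process(ToyCas cas) {
        Matcher m = TOKEN.matcher(cas.text);
        while (m.find()) {
            cas.annotations.add(new ToyAnnotation("Token", m.start(), m.end()));
        }
    }
}

// Downstream annotator: consumes the Token annotations and marks capitalised ones.
final class CapitalizedAnnotator implements ToyAnnotator {
    @Override
    public void process(ToyCas cas) {
        // select() returns a separate list, so adding to cas.annotations here is safe.
        for (ToyAnnotation token : cas.select("Token")) {
            if (Character.isUpperCase(cas.text.charAt(token.begin()))) {
                cas.annotations.add(new ToyAnnotation("Capitalized", token.begin(), token.end()));
            }
        }
    }
}

// Stand-in for an analysis engine: runs its annotators in order over one document.
final class ToyAnalysisEngine {
    private final List<ToyAnnotator> annotators;

    ToyAnalysisEngine(List<ToyAnnotator> annotators) { this.annotators = List.copyOf(annotators); }

    ToyCas process(String text) {
        ToyCas cas = new ToyCas(text);
        for (ToyAnnotator annotator : annotators) {
            annotator.process(cas);
        }
        return cas;
    }
}
```

Running the engine over "Marin works at Ontotext" yields four `Token` annotations and two `Capitalized` ones. Reversing the order of the two annotators would leave `Capitalized` empty, because the downstream component depends on annotations created upstream.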

Using S4 Text Analytics with UIMA

In order to make it easy for developers to integrate S4 analytics services into UIMA applications, we have developed a UIMA Annotator for S4 as well as a uimaFIT SDK. Both tools are open source and available from the S4 GitHub.

The UIMA Annotator for S4 provides a standard Annotator component that can be used standalone or integrated into more complex applications (analysis engines). The Annotator can also be used with the various GUI tools on the UIMA platform, such as the Document Analyser and the CAS Visual Debugger.

To test the S4 Annotator, follow these steps:

  1. Download the PEAR package for the Annotator and install it with the PEAR installer
  2. Configure the annotator by specifying the endpoints of the S4 text analytics services you want to use, and your personal S4 API key pair
  3. Load the Annotator into one of the GUI tools (CAS Visual Debugger or Document Analyser)
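
Step 2 requires an endpoint and an API key pair. The following is a hypothetical sketch of how a client could validate those settings and attach them to an HTTP request using only `java.net.http`; it is not the S4 Annotator's code. `S4Settings` and its field names are invented, the endpoint URL in the usage note is a placeholder, and sending the key pair as HTTP Basic credentials (`key:secret`) with a plain-text body is an assumption to check against the S4 API documentation.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical settings holder; the field names are illustrative only and are
// not the configuration parameter names of the S4 UIMA Annotator.
record S4Settings(URI endpoint, String apiKey, String keySecret) {

    // Compact constructor: reject incomplete configuration early.
    // Design choice: only https endpoints, so the key pair is never sent in clear text.
    S4Settings {
        if (endpoint == null || !"https".equals(endpoint.getScheme())) {
            throw new IllegalArgumentException("an https endpoint is required");
        }
        if (isBlank(apiKey) || isBlank(keySecret)) {
            throw new IllegalArgumentException("both parts of the API key pair are required");
        }
    }

    private static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }

    // ASSUMPTION: the key pair is sent as HTTP Basic credentials ("key:secret").
    String authorizationHeader() {
        String pair = apiKey + ":" + keySecret;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    // Builds, but does not send, a POST of the document text to the endpoint.
    // ASSUMPTION: plain-text request body and JSON response.
    HttpRequest buildRequest(String documentText) {
        return HttpRequest.newBuilder(endpoint)
                .header("Authorization", authorizationHeader())
                .header("Content-Type", "text/plain; charset=UTF-8")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(documentText, StandardCharsets.UTF_8))
                .build();
    }
}
```

For example, `new S4Settings(URI.create("https://example.org/s4/endpoint"), "key", "secret")` produces the header `Basic a2V5OnNlY3JldA==`; the request could then be sent with `java.net.http.HttpClient`.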

Figure 1: Using the S4 Annotator with the UIMA Document Analyser

uimaFIT is a library that makes it easy to programmatically configure, test and run UIMA text analytics components (annotators or complex analysis engines). We have also created a uimaFIT SDK for the S4 text analytics services, so that they can be easily integrated into, and executed as part of, complex UIMA applications. The SDK can be installed with Maven.

Writing code with the uimaFIT SDK is also easy:

  1. Provide the various parameters for the S4 uimaFIT annotator (S4 service endpoint, API key pair)
  2. Configure input/output directories
  3. Configure and instantiate an analysis engine
  4. Finally, run the analysis engine with the specified parameters and input data
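
The four steps above can be mirrored in a stdlib-only Java driver. This is a hypothetical sketch, not the uimaFIT or S4 SDK API: `S4BatchDriver` and the parameter names are invented, and `createStubEngine` returns a placeholder that only counts tokens instead of calling an S4 endpoint. In a real program, step 3 would instantiate the SDK's analysis engine (for example through uimaFIT's `AnalysisEngineFactory`) and step 4 would run it over the input documents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical batch driver mirroring the four steps; not the S4 SDK or uimaFIT API.
final class S4BatchDriver {

    // Step 1: gather the parameters (hypothetical names).
    static Map<String, String> parameters(String endpoint, String apiKey, String keySecret) {
        return Map.of("endpoint", endpoint, "apiKey", apiKey, "keySecret", keySecret);
    }

    // Step 3 (stub): "instantiate" an engine from the parameters. A real engine would
    // send each document to the endpoint with the key pair; this stub counts tokens.
    static Function<String, String> createStubEngine(Map<String, String> params) {
        for (String key : List.of("endpoint", "apiKey", "keySecret")) {
            if (!params.containsKey(key)) {
                throw new IllegalArgumentException("missing parameter: " + key);
            }
        }
        return text -> "tokens=" + (text.isBlank() ? 0 : text.trim().split("\\s+").length);
    }

    // Steps 2 and 4: run the engine over every .txt file in inputDir and write one
    // "<name>.out" file per input into outputDir. Returns the written paths in
    // input-name order.
    static List<Path> run(Function<String, String> engine, Path inputDir, Path outputDir)
            throws IOException {
        Files.createDirectories(outputDir);
        List<Path> inputs;
        try (Stream<Path> files = Files.list(inputDir)) {
            inputs = files.filter(p -> p.getFileName().toString().endsWith(".txt")).sorted().toList();
        }
        List<Path> written = new ArrayList<>();
        for (Path in : inputs) {
            String result = engine.apply(Files.readString(in));
            Path out = outputDir.resolve(in.getFileName() + ".out");
            Files.writeString(out, result);
            written.add(out);
        }
        return written;
    }
}
```

Keeping the engine behind a `Function<String, String>` lets the driver run offline; replacing the stub with an S4-backed engine leaves steps 2 and 4 unchanged.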

You can also see some sample code using the uimaFIT SDK for S4.

Next Steps

The UIMA tools for S4 provide an easy way for developers and language engineers to integrate the various S4 text analytics services into UIMA text analytics applications. If you haven’t done so already, register for an S4 developer account, download the UIMA Annotator and the uimaFIT SDK for S4, and start using the S4 text analytics services right away!

Marin Dimitrov

CTO at Ontotext
As the technological captain of Ontotext, he leads the company on the right tech route and secures our spot on the map of the world. His sharp mind can explain complex things in simple terms, which makes him an invaluable resource in semantics. Marin is a frequent speaker at semantic conferences, open data meetups and other technology-related events.