Getting Started with S4, The Self-Service Semantic Suite

Getting started with S4, The Self-Service Semantic Suite is as easy as registering for a developer account and accessing RESTful services for text analysis and linked data querying.

Here’s how S4 developers can get started with The Self-Service Semantic Suite. This post provides practical information on the following topics:

  • Registering a developer account and generating API keys
  • RESTful services & free tier quotas
  • Practical examples of using S4 for text analytics and Linked Data querying

Registering a Developer Account

The first requirement for using the S4 services is to register a personal developer account with the S4 Management Console at http://s4.ontotext.com

Once logged in, the developer should create an API Key pair via the “API Key Management” section of the Console. Any access to the S4 REST services requires a private API key pair, which consists of two randomly generated parts: the “key ID” and “key secret”. Each developer can generate an unlimited number of API key pairs and each API key pair can be enabled/disabled or deleted as necessary. Once the API key pair has been generated, the developer’s applications can access any RESTful services running on the S4 platform.

Note: Store the API key pair (ID and secret) in a secure location. A lost API key secret cannot be recovered; in that case, delete the existing key pair and generate a new one.
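
One way to follow this advice is to keep the key pair out of scripts altogether and read it from the environment. The sketch below is only an illustration: it reuses the variable names API_KEY and KEY_SECRET from the examples in this post, and the helper name require_s4_credentials is hypothetical.

```shell
# Hypothetical helper (not part of S4): succeed only when both parts of the
# API key pair are available in the environment.
require_s4_credentials() {
    if [ -z "${API_KEY:-}" ] || [ -z "${KEY_SECRET:-}" ]; then
        echo "API_KEY and KEY_SECRET must both be set" >&2
        return 1
    fi
}
```

A script can then run `require_s4_credentials || exit 1` before making any request, with the two values exported from a file that only the developer can read.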

S4 RESTful Services

All S4 RESTful services use transport-level encryption via SSL; unencrypted HTTP access is not supported. All S4 services also support gzip compression of the service response for better performance (using this feature is recommended but optional). As explained in the previous section, all S4 API requests require a valid API key pair, which is used for HTTP basic authentication.
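
HTTP basic authentication means that the key ID and key secret are joined with a colon, Base64-encoded and sent in an Authorization header. The sketch below shows that computation. Both function names are hypothetical, and it assumes the common base64 utility is installed. The --compressed option asks curl to request a compressed response and decompress it transparently.

```shell
# Hypothetical helper: build the HTTP basic authentication header from an
# API key ID ($1) and key secret ($2).
basic_auth_header() {
    # tr removes the line breaks that some base64 implementations insert
    printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# Hypothetical wrapper: call curl with the key pair from API_KEY/KEY_SECRET
# and with response compression enabled. Extra arguments go to curl as-is.
s4_curl() {
    curl --compressed -H "$(basic_auth_header "$API_KEY" "$KEY_SECRET")" "$@"
}
```

In practice curl's `-u "$API_KEY:$KEY_SECRET"` option, or credentials embedded in the endpoint URL, produces the same header.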

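The S4 RESTful services can be accessed by applications via service-specific endpoints. The two endpoints used in the examples in this post are:

  • https://text.s4.ontotext.com/v1/news (News analytics service)
  • https://lod.s4.ontotext.com/v1/FactForge/sparql (LOD SPARQL service)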

At present S4 provides a free usage tier with the following limits:

  • 250 MB of text processed monthly (via the text analytics services)
  • 5,000 SPARQL queries monthly (via the LOD SPARQL service)

If the free quota is completely exhausted, subsequent S4 service requests will fail with HTTP error 429 “Too Many Requests”.
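
A client can detect an exhausted quota by looking at the HTTP status code of the response. The following sketch classifies a status code; the function name and its return values are illustrative choices, not part of S4.

```shell
# Hypothetical helper: classify the HTTP status code ($1) of an S4 response.
# Returns 0 for success (2xx), 2 when the quota is exhausted (429),
# and 1 for any other status.
check_s4_status() {
    case "$1" in
        2??) return 0 ;;
        429) echo "S4 quota exhausted (HTTP 429 Too Many Requests)" >&2
             return 2 ;;
        *)   echo "S4 request failed with HTTP status $1" >&2
             return 1 ;;
    esac
}
```

curl can report the status code through its `-w '%{http_code}'` option, so a script can capture that value and pass it to check_s4_status before processing the response body.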

We will demonstrate S4 services in two simple scenarios:

  • Annotate a news document via the News analytics service
  • Send a simple SPARQL query to the Linked Data service

Text Analytics

The HTTP headers for the text analytics RESTful service requests provide options for specifying various parameters of the service request and response, but the only mandatory request parameter is Content-Type:

  • Content-Type (required) specifying the MIME type of the request. It should be set to “application/json”

Note: Developers may refer to the documentation of the text analytics services for more details on additional optional parameters for the RESTful services.

The JSON request format provides means for specifying various input parameters such as the document content (or the URL from which the document can be retrieved), the document type (plain text, HTML, Twitter JSON, etc.), as well as means for filtering the results returned by the text analytics service (e.g. limiting results only to specific types of annotations such as entities, relations, organizations, etc.). Developers may refer to the documentation of the text analytics services for more details.

In our example we will use a very simple request that points to a remote HTML document:

{
  "documentUrl" : "http://www.theguardian.com/world/2014/aug/21/ferguson-eric-holder-says-justice-will-be-upheld-in-michael-brown-case" ,
  "documentType" : "text/html"
}
We are now ready to send a simple RESTful request to the S4 text analytics services:
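# API key pair generated in the S4 Management Console
API_KEY=...
KEY_SECRET=...
# News analytics endpoint, with the key pair embedded for HTTP basic authentication
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news"

# Remote document to analyse and its MIME type
CONTENT_URL="http://www.theguardian.com/world/2014/aug/21/ferguson-eric-holder-says-justice-will-be-upheld-in-michael-brown-case"
CONTENT_TYPE="text/html"
# JSON request document sent to S4
JSON_REQUEST="{\"documentUrl\" : \"$CONTENT_URL\", \"documentType\" : \"$CONTENT_TYPE\"}"

# POST the request; quoting the endpoint guards against word splitting
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" -d "$JSON_REQUEST" "$SERVICE_ENDPOINT"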
Let’s go step-by-step through the sample code above:
  1. We specify the API Key and secret – as already explained all S4 requests need a valid API key and secret pair
  2. We specify the S4 RESTful service to be used – in this case the “News” analytics service. As part of the endpoint URL we also provide the API key pair
  3. We have chosen to analyse an HTML document hosted on TheGuardian.com
  4. We construct the JSON request document for S4, consisting of the content URL and “text/html” as the document type
  5. We make a RESTful request to the S4 service via the command line tool curl, providing the JSON request document (from step 4), the S4 service endpoint (from step 2) and we specify in the HTTP header that this HTTP request type is “application/json” (note that this is different from the document content type, which was “text/html”)
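
Building the JSON request by string interpolation, as in step 4, breaks if a value contains a double quote or a backslash. A minimal sketch of a safer variant follows. The function names are hypothetical, and the escaping covers only those two characters (not control characters), so a dedicated JSON tool is preferable for arbitrary input.

```shell
# Hypothetical helper: escape backslashes and double quotes so that a value
# can be placed inside a JSON string literal. Control characters are NOT handled.
json_escape() {
    printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: print an S4 request document for a remote document
# URL ($1) and its document type ($2).
build_s4_request() {
    printf '{"documentUrl" : "%s", "documentType" : "%s"}\n' \
        "$(json_escape "$1")" "$(json_escape "$2")"
}
```

With these helpers the request from the example could be built as `JSON_REQUEST=$(build_s4_request "$CONTENT_URL" "$CONTENT_TYPE")`.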

The result of the service invocation is another JSON document (the structure is described on the Text Analytics page) which contains annotations and their offsets for various entities found in text:

  • Organization (“Guardian News”, “ABC News”, “FBI”, “St. Ann Police Force”, etc.)
  • Location (“Ferguson”)
  • Person (“Michael Brown”, “James Foley”)
  • quotations
  • relations (“police officer”, “the county prosecutor”, etc.)

Linked Data Querying

The HTTP headers for the LOD RESTful service requests provide options for specifying various parameters of the service request and response:

  • Content-Type (required), which must be set to “application/x-www-form-urlencoded”
  • Accept (required), which specifies the required output format, for example “application/sparql-results+xml”, “application/sparql-results+json”, “text/rdf+n3” or others

Note: Developers may refer to the documentation of the LOD services for more details on the parameters for the RESTful service.

The request itself has one mandatory parameter:

  • query (required) specifying a valid SPARQL query for the LOD service

Let’s execute the following simple SPARQL query, listing the names of all European countries:

# Countries in Europe
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX geo-ont: <http://www.geonames.org/ontology#>
 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
SELECT DISTINCT ?name
WHERE {
    ?country rdf:type dbp-ont:Country ;
             skos:prefLabel ?name ;
             geo-ont:parentFeature dbpedia:Europe .
} ORDER BY ?name
We will use a simple command line tool like curl to execute the request to the LOD service:
# API key pair generated in the S4 Management Console
API_KEY=...
KEY_SECRET=...
# LOD SPARQL endpoint, with the key pair embedded for HTTP basic authentication
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@lod.s4.ontotext.com/v1/FactForge/sparql"

# URL-encoded form of the SPARQL query above
SPARQL_QUERY="PREFIX+dbp-ont%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0D%0APREFIX+geo-ont%3A+%3Chttp%3A%2F%2Fwww.geonames.org%2Fontology%23%3E%0D%0A%0D%0APREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0D%0APREFIX+skos%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23%3E%0D%0APREFIX+dbpedia%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2F%3E%0D%0ASELECT+DISTINCT+%3Fname%0D%0AWHERE+%7B%0D%0A++++%3Fcountry+rdf%3Atype+dbp-ont%3ACountry+%3B%0D%0A+++++++++++++skos%3AprefLabel+%3Fname+%3B%0D%0A+++++++++++++geo-ont%3AparentFeature+dbpedia%3AEurope+.%0D%0A%7D+ORDER+BY+%3Fname"

# POST the query as a form parameter and ask for JSON results
curl -X POST -H "Content-Type: application/x-www-form-urlencoded" -H "Accept: application/sparql-results+json" -d "query=$SPARQL_QUERY" "$SERVICE_ENDPOINT"
Let’s go step-by-step through the sample code above:
  1. We specify the API Key and secret – as already explained all S4 requests need a valid API key and secret pair
  2. We specify the S4 RESTful service to be used – in this case the LOD service SPARQL endpoint. As part of the service endpoint URL we also provide the API key pair
  3. We have chosen to execute a simple SPARQL query listing the names of all European countries
  4. We make a RESTful request to the S4 service via the command line tool curl, providing the query (from step 3) and the S4 service endpoint (from step 2). In the HTTP headers we specify that 1) the HTTP request type is “application/x-www-form-urlencoded” and 2) the response format should be JSON (“application/sparql-results+json”)
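
The query in the example was URL-encoded ahead of time. As a rough sketch, the encoding can also be done in the shell. The helper name urlencode is hypothetical, and for simplicity it percent-encodes every byte, which is valid but more verbose than necessary.

```shell
# Hypothetical helper: percent-encode every byte of $1.
# od prints the bytes as hex pairs, tr strips the whitespace,
# and sed puts a "%" in front of each pair.
urlencode() {
    printf '%s' "$1" | od -An -v -tx1 | tr -d ' \t\n' | sed 's/\(..\)/%\1/g'
}
```

curl can also do the encoding itself: passing `--data-urlencode "query=$SPARQL_QUERY"` instead of `-d` lets SPARQL_QUERY hold the plain, unencoded query text.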

The result of the service invocation is a JSON document that looks like:

{
    "head": {
        "vars": [ "name" ]
    },
    "results": {
        "bindings": [
            {
                "name": { "type": "literal", "xml:lang": "en", "value": "Albania" }
            },
            {
                "name": { "type": "literal", "xml:lang": "en", "value": "Andorra" }
            },
            {
                "name": { "type": "literal", "xml:lang": "en", "value": "Austria" }
            },
            {
                "name": { "type": "literal", "xml:lang": "en", "value": "Belarus" }
            },
...
...
            {
                "name": { "type": "literal", "xml:lang": "en", "value": "Vatican City" }
            }
        ]
    }
}
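
To pull just the country names out of such a response, a rough sketch using text matching is shown below. The helper name is hypothetical, and it assumes a grep that supports the -o option (GNU and BSD grep do). Because this is pattern matching rather than real JSON parsing, it also assumes that values contain no escaped double quotes; a JSON-aware tool is the more robust choice.

```shell
# Hypothetical helper: read a SPARQL JSON result on standard input and print
# the content of every "value" field, one per line.
# This is plain text matching, not JSON parsing.
extract_values() {
    grep -o '"value" *: *"[^"]*"' | sed 's/^"value" *: *"//; s/"$//'
}
```

For example, appending `| extract_values` to the curl command above would print one country name per line.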

Final Words

We have demonstrated the use of two S4 services:

  • News analytics, for annotating a simple HTML document from the TheGuardian.com website
  • Linked Data querying, with a simple SPARQL query listing the names of all European countries

For our examples we used a very simple command line tool like curl, but S4 provides a simple Java API as well as sample code for C#, Java, Groovy and Python. In the near future we will provide more complex scenarios of analyzing social media content with S4.

If you haven’t done so already – register and start using S4 right away!

Marin Dimitrov

CTO at Ontotext
As the technological captain of Ontotext, he is leading the company on the right tech route and reserving our spot on the map of the world. His sharp mind can explain complex things in a simple way, making him an invaluable resource in semantics. Marin is a frequent speaker on semantic conferences and open data meetups at various technology related events.