Using Semantic Technology to Identify Fraud
The International Center for Asset Recovery (ICAR), part of the Basel Institute on Governance, fights financial crime and helps countries recover stolen assets, with an emphasis on asset tracking and international cooperation.
The goal was to create a prototype asset tracking system that could expedite the recovery of stolen national assets that have been distributed internationally.
The theft of national assets by dictators and former dictators (so-called kleptocrats) is a major problem worldwide. It is especially damaging in developing countries, where it robs national economies of much-needed financial resources. For example, General Sani Abacha, who was de facto President of Nigeria from 1993 to 1998, is reported to have siphoned a total of 3 billion GBP out of the country’s coffers.
ICAR assists the Financial Intelligence Units (FIUs) of developed and developing countries with financial investigations, the fight against corruption and money laundering, asset tracing and recovery, and mutual legal assistance. It provides capacity building, technical assistance and training, and access to critical intelligence leads from public-domain sources.
The Solution: Asset Recovery Intelligence System (ARIS) Prototype
The ARIS project was facilitated with the cooperation of Dow Jones, World Check and the Egmont Group (the international organization of the FIUs of 104 countries).
The ARIS Platform helps financial investigators, analysts and FIUs with tracking stolen assets. ARIS is a web-based secure service that provides the user with the asset-related profile of a named legal entity.
The conceptual architecture of ARIS is shown below. The data sources used are described later in this document.
Products and Techniques
Ontotext used the following products and techniques to accomplish the task:
- KIM: platform for semantic annotation and multi-paradigm search over documents, data, and knowledge.
- OWLIM: semantic repository (Knowledge Base) allowing efficient storage of semantic data (RDF triples or facts), efficient inference and query answering. Numerous independent benchmarks show that OWLIM is one of the most scalable and efficient semantic repositories worldwide.
- NER: Named Entity Recognition: recognize people, companies, roles, etc. in free text
- IdR: Identity Resolution: recognize variant spellings (e.g. Sani Abacha, S.Abacha or General Abacha) as referring to the same entity
- CR: Coreference Resolution: recognize expressions (e.g. pronouns) that refer to an entity mentioned elsewhere in the text
- REX: Relation Extraction: recognize and infer relations between entities. Uses a grammar-based approach, and “weak counterparts” for some relations (this allows bidirectional navigation between relations). Below is an example of navigating relations to find hidden connections between two people:
- IE: Information Extraction: semantic annotation techniques to extract information from unstructured text such as web pages, Word and PDF documents. Examples of the techniques used:
  - Gazetteers (e.g. lists of common person forenames)
  - Rules such as: a capitalized word followed by “Inc.” is likely to be a company; the text “son of” followed by a named entity indicates a family relationship
  - Machine learning: a set of documents annotated with occurrences of named entities, and optionally with lower-level annotations such as POS, Token and Lookup, is used to “train” Machine Learning components so that they classify new occurrences correctly
- IR: Information Retrieval: finding documents related to a semantic query and the entities mentioned in the query (even if they are not explicitly mentioned in the document)
- NLP: Natural Language Processing: used to extract taxonomy and context information by matching keywords and establishing a taxonomy hierarchy. Stemming or lemmatization is used to match keyword variations.
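Two of the techniques listed above, rule- and gazetteer-based extraction and identity resolution, can be illustrated with a minimal Python sketch. It is not KIM's implementation: the gazetteer contents, the title list, the regular expressions, and the function names `annotate`, `parse_name` and `same_entity` are all assumptions made for illustration, and the sample people and company are placeholders.

```python
import re

# Hypothetical gazetteer and title list; real ones would be far larger.
FORENAMES = {"John", "Richard", "Jane"}
TITLES = {"general", "gen", "mr", "mrs", "dr", "president"}

# One or more capitalized words, e.g. "John Doe".
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
# Rule: capitalized word(s) followed by "Inc." is likely a company.
COMPANY_RULE = re.compile(rf"\b({NAME}) Inc\.")
# Rule: "X, son of Y" indicates a family relationship.
SON_OF_RULE = re.compile(rf"\b({NAME}),? son of ({NAME})")
# Candidate person: two capitalized words; kept only if the first is a known forename.
PERSON_RULE = re.compile(r"\b([A-Z][a-z]+) ([A-Z][a-z]+)\b")


def annotate(text):
    """Return (people, companies, relations) found by the gazetteer and rules."""
    people = {m.group(0) for m in PERSON_RULE.finditer(text)
              if m.group(1) in FORENAMES}
    companies = {m.group(1) for m in COMPANY_RULE.finditer(text)}
    relations = {(m.group(1), "sonOf", m.group(2))
                 for m in SON_OF_RULE.finditer(text)}
    return people, companies, relations


def parse_name(name):
    """Split a name variant into (forename initial or None, surname)."""
    tokens = [t for t in re.split(r"[.\s]+", name.strip()) if t]
    tokens = [t for t in tokens if t.lower() not in TITLES]
    surname = tokens[-1].lower()
    initial = tokens[0][0].lower() if len(tokens) > 1 else None
    return initial, surname


def same_entity(a, b):
    """Treat two variants as one entity if surnames match and initials do not conflict."""
    initial_a, surname_a = parse_name(a)
    initial_b, surname_b = parse_name(b)
    return surname_a == surname_b and (
        initial_a is None or initial_b is None or initial_a == initial_b)


people, companies, relations = annotate(
    "John Doe, son of Richard Doe, is a director of Acme Inc.")
print(people, companies, relations)
print(same_entity("Sani Abacha", "S.Abacha"), same_entity("General Abacha", "Sani Abacha"))
```

Rules this simple over-generate (a sentence-initial capitalized word would be absorbed into a company name) and the surname-only match is ambiguous for relatives who share a family name, which is one reason a production system would combine such rules with trained components and background knowledge.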
Data Sources, Entities, Relations
ARIS uses various data sources:
- Domain-specific documents, such as Suspicious Activity Reports (SARs) from a bank or financial institution to the FIUs
- Google search
- News feeds from world-wide press agencies
- Financial and risk-related information from external sources such as:
  - Dow Jones Watchlist data feed
  - Dow Jones Factiva news feed
  - WorldCheck Politically Exposed Persons (PEP) data feed
- Background knowledge from Ontotext’s World Knowledge Base, which includes information about people, companies and relations
KIM is used to extract named entities, relationships, factual knowledge such as dates and financial amounts, and taxonomy information (context-driven index) from free text. All extracted knowledge is annotated with semantic information, cleaned up, referenced and interlinked. The knowledge is then stored in Ontotext’s OWLIM semantic repository, which performs fast inference and answers semantic queries.
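The idea of storing facts as triples, inferring additional facts, and answering pattern queries can be sketched in plain Python. This is only an illustration of the concept, not the OWLIM API: the predicate names, the `INVERSE` and `SYMMETRIC` tables, and the functions `infer` and `query` are hypothetical, and the people are placeholders. Reading “weak counterparts” as inverse relations is also an assumption.

```python
# Hypothetical predicate tables; an ontology would declare these properties.
INVERSE = {"hasLawyer": "lawyerOf", "sonOf": "parentOf", "owns": "ownedBy"}
SYMMETRIC = {"colleagueOf", "friendOf"}


def infer(triples):
    """Return the input triples plus their inverse and symmetric counterparts."""
    closed = set(triples)
    for s, p, o in triples:
        if p in INVERSE:
            closed.add((o, INVERSE[p], s))
        if p in SYMMETRIC:
            closed.add((o, p, s))
    return closed


def query(triples, s=None, p=None, o=None):
    """Match triples against a (subject, predicate, object) pattern; None is a wildcard."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


facts = {
    ("Person A", "hasLawyer", "Person B"),
    ("Person B", "colleagueOf", "Person C"),
}
kb = infer(facts)
# Everything known about Person B, including facts that were never stated directly.
print(query(kb, s="Person B"))
```

Because the inferred counterparts are materialized, a query that starts from either end of a relation finds it, which is what makes navigation in both directions possible.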
A small part of the ARIS ontology is shown on the right. To give an idea of the variety of recognized entities and relations, we list below examples of only two types: assets and relations between people.
Assets:
- businesses including an airline
- house in Haiti
- bank account
- BHS Bank, bank
- luxury penthouse in Cape Town, South Africa, which has three en-suite bedrooms, two lounges, a designer kitchen, an entertainment room, five plasma screen televisions and a sound system worth half a million Rands
- property known as Chelsea Hotel Abuja
- Santolina Investment Corporation’s account with National Westminster Bank
- company’s account with the National Westminster Bank with A/C No. 1234567 Sort Code 15-00-25
Relations between people:
- Name a business partner of Name
- Name the right hand man of Name
- Name an acquaintance of Name
- Name a friend of Name
- Name a colleague of Name
- Name’s colleague Name
- Name has Driver Name
- Name has Lawyer Name
- Name has Accountant Name
- Name has Butler Name
- Name has Secretary Name
- Name has Spokesman Name
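Relation templates of the kind listed above can be turned into structured facts with simple patterns. The following Python sketch assumes two regular expressions and a hypothetical `extract_relations` function; it covers the “Name has Role Name” and “Name a/an/the ... of Name” forms only, uses placeholder names, and is not the grammar ARIS actually uses.

```python
import re

NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
ROLES = "Driver|Lawyer|Accountant|Butler|Secretary|Spokesman"
PHRASES = "business partner|right hand man|acquaintance|friend|colleague"

# "X has Lawyer Y"
HAS_ROLE = re.compile(rf"\b({NAME}) has ({ROLES}) ({NAME})")
# "X, a business partner of Y" (the comma is optional)
ROLE_OF = re.compile(rf"\b({NAME}),? (?:a|an|the) ({PHRASES}) of ({NAME})")


def label(phrase):
    """Turn 'business partner' into the predicate name 'businessPartnerOf'."""
    words = phrase.split()
    return words[0] + "".join(w.capitalize() for w in words[1:]) + "Of"


def extract_relations(text):
    """Return (subject, predicate, object) triples for the two supported templates."""
    triples = []
    for m in HAS_ROLE.finditer(text):
        triples.append((m.group(1), "has" + m.group(2), m.group(3)))
    for m in ROLE_OF.finditer(text):
        triples.append((m.group(1), label(m.group(2)), m.group(3)))
    return triples


text = "John Doe has Lawyer Jane Roe. Jane Roe, a business partner of Sam Poe, left."
for triple in extract_relations(text):
    print(triple)
```

The possessive form (“Name’s colleague Name”) is left out for brevity; it would need a third pattern.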
Central to ARIS’ financial intelligence capabilities are the relations between entities mentioned in the source databases and various public documents. A conceptual model of relations and mentions is shown here:
By exploiting all available data sources, ARIS discovers hidden relations between people, companies, accounts, transfers, etc. An example of the relations graph for Sani Abacha is shown below:
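Independently of the figure, the notion of a hidden relation can be sketched as a path search over the relation graph: two entities with no direct link may still be connected through intermediaries. The Python sketch below uses breadth-first search over placeholder facts. The function name `find_connection`, the predicate names, and the choice to treat every relation as navigable in both directions are assumptions, not a description of how ARIS computes its graphs.

```python
from collections import deque


def find_connection(triples, start, goal):
    """Breadth-first search for the shortest chain of relations linking two entities."""
    # Build an adjacency list in which each relation can be followed both ways.
    graph = {}
    for s, p, o in triples:
        graph.setdefault(s, []).append((p, o))
        graph.setdefault(o, []).append((f"inverse of {p}", s))

    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [relation, neighbour]))
    return None  # no chain of relations links the two entities


facts = [
    ("Person A", "hasLawyer", "Person B"),
    ("Person B", "directorOf", "Company X"),
    ("Person C", "owns", "Company X"),
]
# Person A and Person C share no direct relation, but a chain connects them.
print(find_connection(facts, "Person A", "Person C"))
```

Breadth-first search returns a chain with the fewest hops, which tends to be the most useful lead for an investigator; longer chains are usually weaker evidence of a connection.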
ARIS summarizes collected and inferred facts and found keywords about an entity in the “Entity Profile”. Below is the profile for one Au Man Long of Macau:
The complete set of facts, including the Relations Graph, is presented in the “Entity Factsheet”, as shown below.
Information on corrupt practices is available in country A.
Information on financial transactions is available in country B.
Very often, the transactions are made not in the name of a corrupt official but on his or her behalf by friends, family or close business associates.
How can this information be made to flow so that the dots can be connected?
The Solution: ARIS.
Notifies FIUs of the names of individuals, the proceeds of corruption that may be linked to them, and corruption cases reported in regional news outlets and other online databases.
Provides FIUs with on-demand profiles of individuals, including information like the person’s network, roles, activities, etc.