Text analysis is about parsing texts in order to extract machine-readable facts from them. The purpose of text analysis is to create sets of structured data out of heaps of unstructured, heterogeneous documents.
The process can be thought of as slicing and dicing documents into easy-to-manage and integrate data pieces.
The text is first split into sentences; then the important concepts and entities (i.e. the proper nouns) are identified through dictionary word lists.
For example, through text analysis the sentence "Rome was the centre of the Roman Empire and there were over 400 000 km of Roman roads connecting the provinces to Rome" is divided into small chunks, which are then classified. This is done by algorithms that first parse the textual content and then extract salient facts about pre-specified types of events, people, things, entities or relationships.
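The dictionary-based step described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular product works: the `GAZETTEER` word list, its type labels ("City", "Empire") and the helper names `split_sentences` and `annotate` are all assumptions made for the example, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical dictionary word list (gazetteer): surface form -> entity type.
# Both the entries and the type labels are illustrative assumptions.
GAZETTEER = {
    "Roman Empire": "Empire",
    "Rome": "City",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def annotate(sentence, gazetteer=GAZETTEER):
    """Return (surface form, entity type) for each dictionary match, in text order."""
    # Longer entries come first in the alternation so multi-word names win.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(1), gazetteer[m.group(1)]) for m in pattern.finditer(sentence)]


text = (
    "Rome was the centre of the Roman Empire and there were over 400 000 km "
    "of Roman roads connecting the provinces to Rome."
)
for sentence in split_sentences(text):
    print(annotate(sentence))
# prints [('Rome', 'City'), ('Roman Empire', 'Empire'), ('Rome', 'City')]
```

Note that "Roman roads" produces no match, because "Roman" on its own is not in the word list. A real system would typically use much larger dictionaries and add disambiguation on top of plain lookup.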
Often the purpose of text analysis is semantic annotation, whose overarching goal is to make operations on textual sources easy to automate.
Text Analysis vs. Text Analytics
You will often find text analysis used interchangeably with text analytics. And while to the untrained mind these might sound like synonyms, from the point of view of practice and experience, there is a subtle difference worth mentioning.
Text analysis is the term describing the very process of computational analysis of texts.
Text analytics involves a set of techniques and approaches towards bringing textual content to a point where it is represented as data and then mined for insights/trends/patterns.
In other words, text analysis helps translate a text into the language of data. Once text analysis has “prepared” the content, text analytics kicks in to help make sense of the data.
To get back to the sentence about Rome, text analysis is what you do in order to transform the sentence into data and be able to present to computers what this text is about: Rome, the Roman Empire. Then, once presented in the universal language of data, this sentence can easily enter many analytical processes, text analytics included. With text analytics, you will be able to derive a conclusion about the percentage of texts that mention Rome in the context of the Roman Empire, and not in the context of vacations in Europe, for instance.
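To make the distinction concrete, here is a minimal sketch of the analytics step, assuming text analysis has already reduced each document to the set of entities it mentions. The function name `share_with_context` and the sample documents are hypothetical and serve only to illustrate the percentage described above.

```python
def share_with_context(docs, entity, context):
    """Percentage of documents mentioning `entity` that also mention `context`."""
    mentioning = [d for d in docs if entity in d]
    if not mentioning:
        return 0.0
    together = [d for d in mentioning if context in d]
    return 100.0 * len(together) / len(mentioning)


# Hypothetical output of text analysis: one set of extracted entities per document.
docs = [
    {"Rome", "Roman Empire"},
    {"Rome", "Roman Empire", "Roman roads"},
    {"Rome", "Europe", "vacation"},
    {"Rome", "Roman Empire", "provinces"},
    {"Paris", "vacation"},
]

print(share_with_context(docs, "Rome", "Roman Empire"))
# prints 75.0
```

Four of the five sample documents mention Rome, and three of those four also mention the Roman Empire, hence 75%.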
How Can Text Analysis Help You?
Companies use text analysis to set the stage for a data-driven approach to managing content. The moment textual sources are sliced into easy-to-automate data pieces, a whole new set of opportunities opens for processes like decision making, product development, marketing optimization, business intelligence and more.
In a business context, analyzing texts to capture data from them supports the broader tasks of:
- content management;
- semantic search;
- content recommendation;
- regulatory compliance.
When turned into data, textual sources can be further used for deriving valuable information, discovering patterns, automatically managing, using and reusing content, searching beyond keywords and more.
Using text analysis is one of the first steps in many data-driven approaches, as the process extracts machine-readable facts from large bodies of text and allows these facts to be entered automatically into a database or a spreadsheet. The database or spreadsheet is then used to analyze the data for trends, to produce a natural language summary, or for indexing purposes in Information Retrieval applications.
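As a rough illustration of that hand-off, the sketch below loads a few extracted facts into an in-memory SQLite database using Python's standard `sqlite3` module and runs a simple count per entity. The `facts` table layout, the sample rows and the helper names `load_facts` and `mentions_per_entity` are assumptions made for this example, not a prescribed schema.

```python
import sqlite3

# Hypothetical facts produced by text analysis: (document id, entity, entity type).
FACTS = [
    ("doc1", "Rome", "City"),
    ("doc1", "Roman Empire", "Empire"),
    ("doc2", "Rome", "City"),
]


def load_facts(facts):
    """Create an in-memory database and insert the extracted facts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (doc_id TEXT, entity TEXT, entity_type TEXT)")
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", facts)
    conn.commit()
    return conn


def mentions_per_entity(conn):
    """Count mentions per entity, most frequent first (ties broken by name)."""
    cursor = conn.execute(
        "SELECT entity, COUNT(*) FROM facts "
        "GROUP BY entity ORDER BY COUNT(*) DESC, entity"
    )
    return cursor.fetchall()


conn = load_facts(FACTS)
print(mentions_per_entity(conn))
# prints [('Rome', 2), ('Roman Empire', 1)]
```

Once the facts sit in a table like this, trend analysis or index building becomes an ordinary query rather than a text-processing task.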
For an in-depth view of text analytics and its applications, download our white paper Text Analytics for Enterprise Use: Why Interweave Semantic Data Into Texts.
Or you can listen to our webinar recording Efficient Practices for Large Scale Text Mining Process to see how text analysis can help in the context of your enterprise needs.