Text mining

GIB has been working on the integration of structured and unstructured data sources using text mining techniques. Structured sources, such as relational or object-oriented databases, are organised according to a logical or conceptual schema. Unstructured sources, usually collections of textual documents, have no such schema, which makes them difficult to integrate with structured ones.

To address this problem, a four-stage method based on text mining and natural language processing techniques was developed. It automatically extracts a logical schema that describes an unstructured source. The method is implemented in a software tool called "OntoAnnotator".
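
To give a rough idea of what deriving a schema from plain text can look like, the following minimal Python sketch proposes candidate concepts and relations from a small document collection. It is not the OntoAnnotator method and does not reproduce its four stages. It assumes a much simpler technique: frequent terms become candidate concepts, and concepts that repeatedly occur in the same sentence become candidate relations. The stop-word list, the function names, the min_freq threshold and the sample documents are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list for the illustration.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "are", "and", "to", "with", "by", "for"}


def split_sentences(text):
    """Split text into sentences at whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lower-case a sentence and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOPWORDS]


def extract_schema(documents, min_freq=2):
    """Return candidate concepts and relations for a document collection.

    Assumption: a term seen at least min_freq times is a candidate concept,
    and two concepts sharing at least min_freq sentences are related.
    """
    sentences = [tokenize(s) for doc in documents for s in split_sentences(doc)]

    # Candidate concepts: terms that are frequent across the collection.
    term_counts = Counter(t for sent in sentences for t in sent)
    concepts = sorted(t for t, n in term_counts.items() if n >= min_freq)
    concept_set = set(concepts)

    # Candidate relations: pairs of concepts that co-occur in a sentence.
    pair_counts = Counter()
    for sent in sentences:
        present = sorted(set(sent) & concept_set)
        pair_counts.update(combinations(present, 2))
    relations = sorted(pair for pair, n in pair_counts.items() if n >= min_freq)

    return {"concepts": concepts, "relations": relations}


if __name__ == "__main__":
    # Hypothetical sample documents, not taken from any real collection.
    docs = [
        "Tamoxifen treats breast tumor. The tumor is located in the breast.",
        "Tamoxifen is a drug. The drug treats the tumor.",
    ]
    print(extract_schema(docs))
```

A realistic pipeline would rely on natural language processing steps such as part-of-speech tagging and domain vocabularies rather than raw term counts, so that, for example, verbs are not proposed as concepts.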

Below is a logical schema extracted from a text collection in the cancer domain:

