Text mining

GIB has been working on the integration of structured and unstructured data sources using text mining techniques. Structured sources, such as relational or object-oriented databases, are described by a logical or conceptual schema. Unstructured sources, typically collections of textual documents, have no such schema, which makes them difficult to integrate with structured ones.

To address this problem, a four-stage method was created, based on text mining and natural language processing techniques, that automatically extracts a logical schema describing an unstructured source. The method is implemented in a supporting software tool called "OntoAnnotator".
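
The four stages are not detailed here, so the following is only a toy sketch of the general idea and not OntoAnnotator's actual method. It assumes that candidate concepts are terms appearing in several sentences and that candidate relationships are pairs of concepts co-occurring in the same sentence. The function names, the stop-word list, and the `min_count` threshold are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "are", "with", "and", "by", "to", "for"}


def split_sentences(text):
    """Split text into sentences on '.', '!' or '?' (a naive rule)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOP_WORDS]


def extract_schema(documents, min_count=2):
    """Return (concepts, relations) for a list of document strings.

    concepts:  terms that occur in at least min_count sentences
    relations: pairs of concepts that co-occur in at least min_count sentences
    """
    term_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        for sentence in split_sentences(doc):
            # Sorting makes each pair appear in one canonical order.
            terms = sorted(set(tokenize(sentence)))
            term_counts.update(terms)
            pair_counts.update(combinations(terms, 2))
    concepts = {t for t, n in term_counts.items() if n >= min_count}
    relations = {
        pair
        for pair, n in pair_counts.items()
        if n >= min_count and pair[0] in concepts and pair[1] in concepts
    }
    return concepts, relations


docs = [
    "Tumor growth is treated with chemotherapy. The tumor responds to chemotherapy.",
    "Chemotherapy targets the tumor. Radiation is an alternative.",
]
concepts, relations = extract_schema(docs)
print(sorted(concepts))
print(sorted(relations))
```

A real system would rely on proper linguistic analysis (for example part-of-speech tagging and domain vocabularies) rather than raw co-occurrence counts; the sketch only shows how a set of concepts and relationships can be derived from text that has no schema of its own.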

Below is a logical schema extracted from a collection of texts in the cancer domain:

[Figure: logical schema extracted from a text collection in the cancer domain]

References:

  1. M. García-Remesal, V. Maojo, J. Crespo, H. Billhardt (2007), Logical Schema Acquisition from Text-Based Sources for Structured and Non-Structured Biomedical Sources Integration, Proceedings of the American Medical Informatics Association 2007 Annual Symposium, Chicago, USA, 10-14.
  2. M. García-Remesal, P. Gil, V. Maojo, H. Billhardt, J. Crespo (2008), SAT & ZB: Novel Tools to Acquire and Browse Conceptual Schemas from Public Online Databases for Biomedical Applications, Lecture Notes in Computer Science 4801, pp. 65-70. ISSN 0302-9743. Springer, Germany.