Text Mining & Natural Language Processing

Data is usually structured within a logical or conceptual schema, such as that of a relational or object-oriented database. Many textual documents, however, are unstructured: they follow no such schema. To address this problem, the Biomedical Informatics Group has created a four-stage method based on text mining and natural language processing techniques. The tool that supports this method is called «OntoAnnotator», and it automatically extracts a logical schema that describes an unstructured source.

Over the last decade, the GIB has been involved in a large number of text mining and information extraction/retrieval projects. We have been active in accessing and extracting knowledge from various unstructured sources and from the biomedical literature available in PubMed. Bringing together structured and text-based sources is an exciting challenge for biomedical informaticians, since the most relevant biomedical sources belong to one or the other of these categories. Unfortunately, the methods provided by state-of-the-art database integration tools cannot be reused to bridge structured and non-structured (text-based) sources, since all of them require each individual source to be equipped with a logical schema. To address this issue, we have developed several approaches based on text mining techniques that automatically create a logical schema for non-structured sources. As described in other sections, we have applied text mining techniques across a large number of areas.
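To make the idea of deriving a logical schema from text more concrete, here is a toy Python sketch. It is not the OntoAnnotator method or any of the GIB approaches, whose stages are not described here. It assumes, for illustration only, that each document contains "label: value" fragments separated by semicolons, and the names `infer_type` and `infer_schema` are hypothetical. Real biomedical text is far less regular and would need proper natural language processing.

```python
from collections import defaultdict


def infer_type(value: str) -> str:
    """Guess a primitive type for one extracted value (illustrative heuristic)."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "text"


def infer_schema(documents: list[str]) -> dict[str, str]:
    """Collect 'label: value' fragments and map each label to an inferred type.

    Assumption: fragments are separated by ';' and use ':' between label and value.
    """
    observed = defaultdict(set)
    for doc in documents:
        for fragment in doc.split(";"):
            label, sep, value = fragment.partition(":")
            if not sep:
                continue  # no 'label: value' structure in this fragment
            field = label.strip().lower().replace(" ", "_")
            observed[field].add(infer_type(value.strip()))

    schema = {}
    for field, types in observed.items():
        if types == {"integer"}:
            schema[field] = "integer"
        elif types <= {"integer", "float"}:
            schema[field] = "float"  # mixed numeric values widen to float
        else:
            schema[field] = "text"
    return schema


notes = [
    "age: 54; diagnosis: melanoma; tumor size: 2.3",
    "age: 61; diagnosis: glioma; tumor size: 4",
]
schema = infer_schema(notes)
```

The resulting dictionary plays the role of a very small logical schema (field names with types) that a database integration tool could, in principle, consume alongside the schemas of structured sources.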

Related Projects

OntoMineBase
