Data is usually structured within a logical/conceptual schema, such as that of a relational or object-oriented database. Unfortunately, some textual documents are unstructured, since they do not follow a logical pattern. To resolve this problem, the Biomedical Informatics Group (GIB) has created a four-stage method using Text Mining and Natural Language Processing techniques. The tool that supports this method is called «OntoAnnotator», and it allows the automatic extraction of a logical schema that describes an unstructured source.
Over the last decade, the GIB has been involved in a large number of text mining and information extraction/retrieval projects. We have been active in accessing and extracting knowledge from various unstructured sources, including the biomedical literature available in PubMed. Bringing together structured and text-based sources is an exciting challenge for biomedical informaticians, since the most relevant biomedical sources belong to one of these two categories. Unfortunately, the methods provided by state-of-the-art database integration tools cannot be reused to bridge structured and non-structured (text-based) sources, since all of them require each individual source to be equipped with a logical schema. To address this issue, we created various approaches based on text mining techniques that automatically create a logical schema for non-structured sources. As seen in other sections, we have used text mining techniques widely across a large number of areas.
The main objective of the project is the creation of centres of excellence to promote health research, education and practice in Africa. The creation of these centres will be based on four main pillars: e-learning, knowledge sharing, "know-how" and information technologies.