The University of Manchester


Exploring Semi-Automated Ontology Learning for Biological Ontologies

Robert Stevens1, Simon Jupp1, Jaclyn Bibby1, Johanna Volker2 and David Shotton3
1 BioHealth Informatics Group, School of Computer Science, University of Manchester, Manchester, UK
2 Institute AIFB, University of Karlsruhe, Germany
3 Department of Zoology, University of Oxford, South Parks Road, Oxford, UK

This webpage has all the supplementary material and experimental results for the 'Evaluating an ontology learning tool in Biology' paper. The Text2Onto application can be downloaded here.

  1. Cell biology corpus including provenance
  2. Definitive list
  3. JAPE Patterns
  4. Ontology files
  5. Protégé Plugin - T2O promoter


Cell biology corpus including provenance

A zipped file containing the raw text extracted from PubMed Central and Google Scholar, using terms from the Cell Type Ontology (CTO), is available here.


For each term in the CTO, a document was retrieved from either PubMed Central or Google Scholar. This folder contains a mapping from each term to the ID of the document from which the text was extracted.


Definitive List

The flat list of terms extracted under each condition using Text2Onto (all lowercase, with white space replaced by _).
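The normalisation described above (lowercase, white space replaced by _) can be sketched in Python as follows. This is only an illustration and not the code used to produce the list: the function name normalise_term is hypothetical, and trimming leading and trailing white space and collapsing a run of white space into a single underscore are assumptions.

```python
import re


def normalise_term(term: str) -> str:
    """Lowercase a term and replace white space with underscores.

    Assumptions (not stated on this page): surrounding white space is
    trimmed, and a run of white space becomes a single underscore.
    """
    return re.sub(r"\s+", "_", term.strip().lower())


if __name__ == "__main__":
    print(normalise_term("Epithelial Cell"))
```

Applying the same function to terms from another source, such as the CTO term names, would make the two lists directly comparable as plain strings.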


JAPE patterns

We wrote two JAPE files for term extraction and relationship extraction: bio_entities.jape and bio_relations.jape.
We also generated a list of words to ignore when searching for cell types. This list was added to the GATE directory as a Gazetteer, which was then used to ignore certain types of cells: ignore_list.txt.
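The effect of an ignore list can be illustrated outside GATE with a short Python sketch. In our pipeline the filtering is done by the Gazetteer together with the JAPE patterns, so the code below is a stand-in for the idea and not the actual mechanism. The function names, the one-entry-per-line file format, and the example words and phrases are all hypothetical.

```python
def load_ignore_words(lines):
    """Build a set of lowercase ignore words, assuming one entry per line."""
    return {line.strip().lower() for line in lines if line.strip()}


def filter_candidates(candidates, ignore_words):
    """Keep only candidate phrases that contain no ignored word."""
    kept = []
    for phrase in candidates:
        tokens = phrase.lower().split()
        if not any(token in ignore_words for token in tokens):
            kept.append(phrase)
    return kept


if __name__ == "__main__":
    # Hypothetical entries; these are not taken from ignore_list.txt.
    ignore_words = load_ignore_words(["fuel", "prison", ""])
    candidates = ["stem cell", "fuel cell", "Prison cell", "epithelial cell"]
    print(filter_candidates(candidates, ignore_words))
```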


Ontology files

These are the ontologies generated by Text2Onto. They have been promoted using our Protégé plugin and classified using the Pellet DL reasoner to remove redundancy. Each OWL file is represented in the RDF/XML syntax. We recommend using Protégé 4 to view them; however, you can also view them directly in a web browser using the Manchester Ontology Browser by following the link for each file.

The cell type ontology is available here; the OBO flat file can also be loaded into Protégé 4. We converted the OBO version of the cell type ontology into an OWL RDF/XML representation using the GONG tool. The classified Cell Type Ontology was used to perform the gold standard evaluation; it is available here (browse). (Note: this version is rendered with the term names as identifiers rather than the conventional IDs.)


Protégé Plugin

We implemented a promoter that converts the Text2Onto meta-ontology into an ontology in its own right. The program has various switches to control the promotion process. It is available as a Protégé 4 plugin, which can be downloaded here.