Simple Bio Upper Ontology

Alan Rector, Robert Stevens, Jeremy Rogers
and the CO-ODE and BioHealth Informatics Teams
School of Computer Science
University of Manchester
Manchester M13 9PL, England

January 2006

I An experimental simple upper ontology for biomedicine

The notes below describe the purpose of this OWL implementation of a simple upper ontology for biomedicine. All of this is work in progress and is intended to stimulate discussion. Many users will already be in the relevant discussion groups; otherwise, mail me at rector@cs.man.ac.uk. Please put "BIO-ONT:" in the subject line.

II General Considerations

We factor ontology implementations into two parts:

The upper ontology or top ontology - which consists of the basic abstract categories and the major relations that link them (properties in OWL).
The domain ontology, which contains all the domain-specific concepts. It is often useful to separate the domain ontology into:

Top domain ontology - the major domain categories that hook directly to the upper ontology - e.g. in biology "Cell", "Organism", "Body Part", "Organ", "Tissue" etc. The Top domain ontology usually includes further basic relations/properties and constraints on their use, and may further constrain the use of the relations from the upper ontology.
The domain ontology proper - the bulk of the entities to be represented - e.g. "Blood cell", "Mouse", "Limb", "Liver", "Liver parenchyma", etc.

We recommend implementing domain ontologies around a skeleton of disjoint trees of primitives, using the methods for normalisation described in "Modularisation of domain ontologies implemented in description logics and related formalisms including OWL".
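One way to check that a primitive skeleton really forms disjoint trees is sketched below in Python. It is a minimal sketch, assuming the skeleton is held as a plain map from each primitive to its asserted primitive parents and the disjointness axioms as a set of sibling pairs; the class names and both helper functions are hypothetical and are not taken from the ontology files. A primitive with more than one asserted primitive parent breaks the tree, and sibling primitives without a disjointness axiom break the disjointness.

```python
from collections import defaultdict

# Hypothetical primitive skeleton: primitive -> asserted primitive parents.
PRIMITIVE_PARENTS: dict[str, list[str]] = {
    "Body_part": [],
    "Organ": ["Body_part"],
    "Tissue": ["Body_part"],
    "Liver": ["Organ"],
    "Liver_parenchyma": ["Tissue"],
}

# Hypothetical disjointness axioms between sibling primitives.
DISJOINT_PAIRS: set[frozenset[str]] = {frozenset({"Organ", "Tissue"})}


def multiple_primitive_parents(parents: dict[str, list[str]]) -> list[str]:
    """Primitives with more than one primitive parent, i.e. not a tree."""
    return sorted(c for c, ps in parents.items() if len(ps) > 1)


def undeclared_sibling_pairs(
    parents: dict[str, list[str]], disjoint: set[frozenset[str]]
) -> list[tuple[str, str, str]]:
    """(parent, a, b) for sibling primitives lacking a disjointness axiom."""
    children: dict[str, list[str]] = defaultdict(list)
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
    missing = []
    for parent, kids in children.items():
        kids = sorted(kids)
        for i, a in enumerate(kids):
            for b in kids[i + 1:]:
                if frozenset({a, b}) not in disjoint:
                    missing.append((parent, a, b))
    return missing


print(multiple_primitive_parents(PRIMITIVE_PARENTS))                # []
print(undeclared_sibling_pairs(PRIMITIVE_PARENTS, DISJOINT_PAIRS))  # []
```

In practice the parent map would be extracted from the OWL files; here it is written by hand for illustration.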

We recommend implementing upper ontologies as a series of dichotomies following a twenty questions model as shown below. The purpose of these demonstrations is to provide a simple upper ontology suitable for biomedicine and show how the Top domain ontology for biomedicine links to it.

III Background - an engineering approach to upper ontologies

The simple-bio-upper-ontology is described below. This section gives some motivation. If you just want to download the ontology, you may skip this part and go straight to section IV, the implementation and downloads.

Upper ontologies are different from domain ontologies. A great deal of time and effort can be expended on upper ontologies, and some even doubt their effectiveness. We strongly advise that an upper ontology is NOT the place to start developing an ontology: work out what you need in your domain first. However, the distinctions in upper ontologies are important to all but the simplest ontologies. Using a suitable upper ontology can cut the time and effort needed to build an ontology and help avoid simple mistakes. If you plan to share an ontology, basing it on a sound upper ontology also makes it more likely to be reusable.

This upper ontology is intended to serve three purposes:

It provides a set of basic relations and the classes to express the constraints on their use
It provides a starting point and helps with basic distinctions
It provides a common vocabulary and point of attachment for the top domain ontology.

This upper ontology is meant to be lightweight and easy to use. It is tuned to biology: the biology module adds specific biological concepts, and the advanced relations module likewise contains properties motivated by biological examples. Five basic principles have governed its construction:

No distinction without a difference - all classes in the upper ontology should be motivated by constraints and inferences that can be made from them; in particular, most are either the domain or range of some property (relation)
The twenty questions approach - membership in classes should be determined by a series of simple, intuitive questions
Deferred commitment - decisions (ontological commitments) should be deferred whenever possible until they are actually required. Otherwise known as "Do only what you have to".
Copy rather than invent - where possible, we have used names and notions from others, particularly DOLCE.
Implementable in OWL-DL - the entire ontology is to be implemented in OWL-DL and subject to inference using standard reasoners. Inevitably this means that some constraints cannot be expressed and some notions are under-specified.
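The first principle can be turned into a simple lint check. The Python sketch below uses hypothetical class and property names (not those of the ontology) and reports classes that are neither the domain nor the range of any property. Because the principle says only that most classes are, the output is a list to review rather than a list of errors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    name: str
    domain: str  # class the property's subject must belong to
    range: str   # class the property's object must belong to


# Hypothetical upper-level classes and properties, for illustration only.
CLASSES = {"Process", "Independent_continuant", "Quality", "Region"}
PROPERTIES = [
    Property("participates_in", "Independent_continuant", "Process"),
    Property("has_quality", "Independent_continuant", "Quality"),
]


def unmotivated_classes(classes: set[str], properties: list[Property]) -> list[str]:
    """Classes that are neither the domain nor the range of any property."""
    used = {p.domain for p in properties} | {p.range for p in properties}
    return sorted(classes - used)


print(unmotivated_classes(CLASSES, PROPERTIES))  # ['Region']
```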

Because the upper ontology is implemented as a series of dichotomies, a series of yes-no questions as to which branch of the dichotomy to follow should suffice to place each item in the top domain ontology. Once these are placed, the rest should follow.
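The placement procedure amounts to walking a binary decision tree. The Python sketch below is illustrative only: the two questions and the leaf category names are hypothetical stand-ins, loosely echoing the module names used later in this document, and the real questions are in the ontology's annotations. Each inner node holds one yes/no question; each leaf names an upper-ontology category.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Question:
    text: str
    yes: Node
    no: Node


# A leaf is the name of an upper-ontology category.
Node = Union[Question, str]

# Hypothetical dichotomies for illustration only.
TREE: Node = Question(
    "Does it happen or unfold in time?",
    yes="Process",
    no=Question(
        "Can it exist on its own, without inhering in something else?",
        yes="Self_standing_entity",
        no="Refining_entity",
    ),
)


def place(node: Node, answer: Callable[[str], bool]) -> tuple[str, list[str]]:
    """Ask one yes/no question per dichotomy until a leaf category is reached."""
    asked: list[str] = []
    while isinstance(node, Question):
        asked.append(node.text)
        node = node.yes if answer(node.text) else node.no
    return node, asked


# Usage: the answers a user might give for "Liver".
liver = {
    "Does it happen or unfold in time?": False,
    "Can it exist on its own, without inhering in something else?": True,
}
category, asked = place(TREE, liver.__getitem__)
print(category)  # Self_standing_entity
```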

In principle, it ought to be possible to determine the location of any item in the top domain ontology by a game of 'twenty questions'. Often determining the top ontology categories for classes is sufficient to identify which properties can hold between them. When several properties are possible, as in the relations between processes and things or in the different ways in which one thing can be part of another, a second game of 'twenty questions' should be sufficient to determine which to use.
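The second game of 'twenty questions' for choosing a property can be sketched the same way: look up the candidate properties for a pair of top categories, and ask a follow-up question only when more than one remains. Everything named in the Python sketch below (categories, properties and the follow-up question) is hypothetical and chosen for illustration.

```python
from typing import Callable, Optional

# Hypothetical table: (domain category, range category) -> candidate properties.
CANDIDATES: dict[tuple[str, str], list[str]] = {
    ("Self_standing_entity", "Process"): ["participates_in", "is_agent_of"],
    ("Self_standing_entity", "Self_standing_entity"): ["is_part_of"],
    ("Self_standing_entity", "Refining_entity"): ["has_quality"],
}

# Hypothetical follow-up for each ambiguous candidate list:
# (question, property if yes, property if no).
FOLLOW_UP: dict[tuple[str, ...], tuple[str, str, str]] = {
    ("participates_in", "is_agent_of"): (
        "Does the thing initiate or control the process?",
        "is_agent_of",
        "participates_in",
    ),
}


def choose_property(
    domain: str, range_: str, answer: Callable[[str], bool]
) -> Optional[str]:
    """Pick a property from the categories, asking a question only if needed."""
    options = CANDIDATES.get((domain, range_), [])
    if not options:
        return None
    if len(options) == 1:
        return options[0]
    question, if_yes, if_no = FOLLOW_UP[tuple(options)]
    return if_yes if answer(question) else if_no


print(choose_property("Self_standing_entity", "Self_standing_entity", lambda q: True))
# is_part_of
```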

The methodology is described loosely in the papers listed in section V below and in the annotations of the Simple Bio Top Ontology which follows. A set of teaching slides relating to earlier versions of the ontology is also available, as is a series of papers describing the methodology. More detailed discussions are in preparation.

Comments and suggestions are welcome to rector@cs.man.ac.uk. Please put "BIO-ONT:" in the subject line. Formulating good questions is especially difficult, so suggestions for better questions are particularly welcome.

IV Simple Bio Upper Ontology - OWL implementation and downloads

The OWL models are intended to be self-documenting, with extensive comments on most classes, including definitions and the key questions. Unfortunately, it is not possible to control the order of presentation of the classes, so the dichotomies that form the background have to be recognised from the names. Otherwise, we hope that the intentions are clear from the comments. If not, it is a good topic for discussion.

The factored ontology is somewhat harder to handle but easier to extend smoothly.
The factored ontology can be downloaded as a zip file from http://www.cs.man.ac.uk/~rector/ontologies/simple-top-bio/simple-top-bio-factored.zip

The modules all load with Protege-OWL 3.2 beta build 304 (for downloading and installation instructions see section VI below). In order to make sense of them, they must be classified. We commonly use FaCT++ or Racer 1.7, but Pellet should also work.

There is a 'boot' ontology which simply includes all the other ontologies. Alternatively, any of the ontologies should load individually. In general it is better to load from the OWL file. Once the ontology has loaded and the imports have been resolved for your machine, save the file. Subsequent loads can be from the .pprj file.

Unresolved imports - it is not an error! All these ontologies require an ontology repository to be established in the folder into which they have been unzipped. When the rather nasty pop-up appears saying "unresolved import", click "Add Repository", choose "Local folder", and navigate to the folder where you unzipped the OWL files, normally the same folder from which you opened the main OWL file. (Yes, we'll make this smoother Real Soon Now.)
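What the local-folder repository has to do is match each imported URI against the ontology URIs declared by the files in the folder. The Python sketch below does that check offline with the standard library's xml.etree.ElementTree. It assumes RDF/XML files with a .owl extension, each declaring its URI in the rdf:about attribute of owl:Ontology, and it compares URIs as exact strings; it is not how Protege implements repositories, only a way to see in advance which imports would remain unresolved.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"


def ontology_header(path: Path) -> tuple[Optional[str], list[str]]:
    """Return (declared ontology URI, imported URIs) for one RDF/XML file."""
    root = ET.parse(path).getroot()
    onto = root.find(f"{{{OWL}}}Ontology")  # owl:Ontology directly under rdf:RDF
    if onto is None:
        return None, []
    imports = [e.get(f"{{{RDF}}}resource") for e in onto.findall(f"{{{OWL}}}imports")]
    return onto.get(f"{{{RDF}}}about"), [u for u in imports if u]


def unresolved_imports(folder: str) -> list[tuple[str, str]]:
    """(file name, URI) for every import that no file in the folder declares."""
    headers = {p: ontology_header(p) for p in sorted(Path(folder).glob("*.owl"))}
    repository = {uri for uri, _ in headers.values() if uri}
    return [
        (p.name, uri)
        for p, (_, imports) in headers.items()
        for uri in imports
        if uri not in repository
    ]
```

An empty result suggests that every import can be satisfied from that folder; anything listed points to a missing file or a mismatched URI.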

The ontology is factored into a series of modules starting from 'very-top'.

very-top - A few very general categories
    top-self-standing - The heart of the real top ontology
        additiona-self-standing - Some additional things that may be more controversial
            refining-entities-and-properties - the key modifiers
                quantities - a very basic ontology of quantities, sufficient for demonstration only
                basic-substances - the basic notions of substances, including water, for demonstrations
                vertebrate-gross-anatomy - a very top bit of gross anatomy, almost compliant with the FMA
                cells - very basic notions of cells, to provide Red Blood Cell for demonstrations with collectives
                normality - the GALEN model of normal and nonNormal, pathological and non-pathological
                sequences - a demonstration of using lists in OWL for sequences
                collectives - a demonstration of collectives and mixtures
                    basic-body-substances - a demonstration of the use of mixtures and collectives for blood in OWL
                samples - samples and experiments as shown in the PSB poster (missing from this first release)
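The load order implied by the module list can be computed with a topological sort. The Python sketch below assumes, as an illustration, that each module imports only the module it is indented under and that the 'boot' ontology imports every module; samples is left out because it is missing from this release. It uses graphlib.TopologicalSorter from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter

# Assumption: each module imports the module it is indented under above.
IMPORTS: dict[str, set[str]] = {
    "very-top": set(),
    "top-self-standing": {"very-top"},
    "additiona-self-standing": {"top-self-standing"},
    "refining-entities-and-properties": {"additiona-self-standing"},
    "quantities": {"refining-entities-and-properties"},
    "basic-substances": {"refining-entities-and-properties"},
    "vertebrate-gross-anatomy": {"refining-entities-and-properties"},
    "cells": {"refining-entities-and-properties"},
    "normality": {"refining-entities-and-properties"},
    "sequences": {"refining-entities-and-properties"},
    "collectives": {"refining-entities-and-properties"},
    "basic-body-substances": {"collectives"},
}
# The 'boot' ontology simply includes all the others.
IMPORTS["boot"] = set(IMPORTS)


def load_order(target: str) -> list[str]:
    """Modules needed by `target`, each listed after everything it imports."""
    needed: set[str] = set()
    stack = [target]
    while stack:
        module = stack.pop()
        if module not in needed:
            needed.add(module)
            stack.extend(IMPORTS.get(module, set()))
    graph = {m: IMPORTS.get(m, set()) & needed for m in needed}
    return list(TopologicalSorter(graph).static_order())


print(load_order("basic-body-substances"))
```

The same closure also shows which files must travel together if a single module is copied elsewhere.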

BEWARE. The imports mechanism is still a bit fragile. If you edit anything, be sure you have made a copy of the entire folder first. DO NOT change the names in any file that is imported by another file. OWL uses only names (URIs, actually) as its references between modules. Change one, and you may break a reference, after which the importing file will not load in Protege. (More robust behaviour when things aren't found, and an implementation that transparently uses anonymous IDs and names as labels, are under development.)

An older, less well commented version of the ontology is available as Unfactored Mini-top-bio-with-demo-entities.

V Background and Papers describing the ontology

This ontology began as a development of the reconstruction of the GALEN upper ontology and an attempt to reconcile it with Guarino and Welty's DOLCE and the Digital Anatomist Foundational Model of Anatomy (FMA), taking into account Smith et al's BFO. It also includes the standard GALEN scheme for "Normal/NonNormal" and "Pathological/NonPathological" and an implementation of the notion of "Collectives" described at the Rome ontologies meeting (PowerPoint slides and a draft paper are available; the journal version is about to appear and will be linked as soon as it does).

The paper "Patterns, Properties and Minimizing Commitment: Reconstruction of the GALEN Upper Ontology in OWL", Alan L Rector and Jeremy Rogers, Core Ontologies Workshop (CORONT) in conjunction with the European Knowledge Acquisition Workshop (EKAW-2004), Northampton, UK, is located at galen-top-reconstructed-rector-rogers.pdf. The slides from the presentation, with a good example of the "twenty questions" approach to placing top domain entities, are at EKAW-GALEN-Upper-Ont-Reconstructed.ppt. The OWL model of the ontology is intended to be self-documenting, with extensive comments on the critical sections, including the critical questions.

The notion of using a Twenty Questions approach was suggested by Robert Stevens; it resulted in a short poster and has proved popular with students. A brief summary is given in the PSB poster.

VI Downloading tools to view the ontology

Protege can be obtained from http://protege.stanford.edu. Be sure to download the latest beta, currently 3.2 build 304. Imports do not work in older versions! You are also advised to download a copy of the Manchester Syntax Editor, Unit Testing Framework, Debugger, List Wizard, and OwlDoc from http://www.co-ode.org.

You will also need a copy of a classifier. You must set the port in OWL --> Preferences --> Reasoner --> URL to the port of the classifier you have installed. Racer listens on 8080, the default; FaCT++ listens on 3490. (These can be changed as you wish; see the documentation for each reasoner.)
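Before classifying, it can help to confirm that the reasoner is actually listening on the port you configured. The Python sketch below uses only the standard socket module; the port numbers are the defaults given above, and the http://localhost:PORT form of the reasoner URL is an assumption about how the preference is usually filled in, not something taken from the Protege documentation.

```python
import socket

# Default ports given above; adjust if you reconfigure a reasoner.
REASONER_PORTS = {"Racer": 8080, "FaCT++": 3490}


def reasoner_url(name: str, host: str = "localhost") -> str:
    """URL for OWL --> Preferences --> Reasoner --> URL (assumed form)."""
    return f"http://{host}:{REASONER_PORTS[name]}"


def is_listening(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


for name in REASONER_PORTS:
    status = "listening" if is_listening(REASONER_PORTS[name]) else "not reachable"
    print(f"{name}: {reasoner_url(name)} ({status})")
```

A "listening" result only shows that some process holds the port; it does not confirm that the process is the reasoner you expect.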

FaCT++ is available from http://owl.man.ac.uk/factplusplus/.
Racer is available from http://www.fh-wedel.de/%7Emo/racer/
Pellet is available from http://www.mindswap.org/2003/pellet/

For OwlViz you will need a recent copy of the GraphViz tools available from http://www.research.att.com/sw/tools/graphviz/

This file

This file should be located at http://www.cs.man.ac.uk/~rector/ontologies/sample-top-bio/ and/or at http://www.co-ode.org/ontologies/simple-top-bio/

Acknowledgements

Many ideas in this ontology come from Guarino and Welty's DOLCE and from Smith et al's BFO. However, they should not be held responsible for anything here. The work is based on experience in developing the GALEN ontology, which was largely constructed by Jeremy Rogers but involved contributions from many members of a large team. The tools and teaching experience come from the CO-ODE team, where many further resources can be found. The Protege-OWL tools have been developed in a collaboration between Stanford Medical Informatics and CO-ODE. Protege itself was developed at Stanford and has a long history and an extensive user group - see protege.stanford.edu.