Next: Bibliography Up: Ontology-based Knowledge Representation for Previous: Tools for Ontology Development


Discussion

This briefing has introduced the need for, and use of, ontologies within the bioinformatics community. The need arises from having to cope with the size and complexity of biological knowledge and data. Ontologies enable knowledge to be used within systems for communication, specification and other processing tasks (see Section 3).

Several bio-ontologies are already in use within the community. Those reviewed in Section 4 demonstrate a wide range of scopes and granularities. Most have in common some core features of molecular biology, such as Gene, Protein and related biologicalFunction and BiologicalProcess, but differ widely in both the content and articulation of their knowledge. This is primarily due to the wide range of tasks to which the ontologies are put. Both RiboWeb and EcoCyc use part of their ontology to define the structure and content of their databases, but as the databases cover subjects as different as ribosomal subunit structure and E. coli metabolism, the ontologies are necessarily different too. Even the common areas, such as macromolecule, differ widely between ontologies, without any of the ontologies being incorrect.

Bio-ontologies are currently being used for communication of knowledge, as well as for database schema definition, query formulation and annotation. As the use of conceptual annotation grows, we can expect to see a concomitant change in database retrieval, which will become much more precise and complete than is currently possible with natural-language annotations. Annotation by ontologies should also allow the relationships describing the functions, processes and components of retrieved entries to be explored with ease.

There are a number of open issues to be addressed in the use of ontology within the bioinformatics community:

Knowledge based reasoning
This briefing started with a description of how biology research is often driven by the use of knowledge, especially by determination of function by sequence similarity. Of the ontologies described, only RiboWeb approaches this kind of use. It can be expected that the use of ontology to assist in analysis will grow further. This will be made easier by the conceptual annotation of the primary databases: a collection of similar sequences returned by a search could be clustered within an ontology of protein function and features. Such clustering should help with the analysis of similarity search results and with other bioinformatics analyses.
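The clustering idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy is-a hierarchy `IS_A`, the hit list and the function names are hypothetical and are not taken from RiboWeb or any other ontology discussed here. Each search hit is grouped under the ancestor of its annotated term at a chosen depth below the root.

```python
from collections import defaultdict

# Hypothetical toy ontology: child term -> parent term (is-a links).
IS_A = {
    "protein kinase": "transferase",
    "hexokinase": "transferase",
    "transferase": "enzyme",
    "serine protease": "hydrolase",
    "hydrolase": "enzyme",
    "enzyme": "protein function",
    "transcription factor": "protein function",
}


def ancestors(term):
    """Return the term followed by all of its ancestors, most specific first."""
    chain = [term]
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def cluster_hits(hits, level):
    """Group (sequence_id, annotated_term) pairs by their ancestor at `level`.

    `level` counts down from the root: 0 is the root, 1 its children, and so on.
    A hit annotated above the requested level is grouped under its own term.
    """
    clusters = defaultdict(list)
    for seq_id, term in hits:
        chain = ancestors(term)[::-1]  # root first
        key = chain[min(level, len(chain) - 1)]
        clusters[key].append(seq_id)
    return dict(clusters)


# Made-up similarity search hits, each carrying a conceptual annotation.
hits = [
    ("seq1", "protein kinase"),
    ("seq2", "hexokinase"),
    ("seq3", "serine protease"),
    ("seq4", "transcription factor"),
]
clusters = cluster_hits(hits, level=2)
```

Choosing a smaller `level` gives coarser clusters (all three enzymes fall together at level 1), so the same annotated result set can be explored at different granularities.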

Re-use vs Specific
Currently there is little re-use of bio-ontologies. This is partly because of the diversity of their representational forms, the varying explicitness of their semantics and the range of applications they address. OIL moves us closer to a common representational language. As the number of bio-ontologies increases, it will be interesting to see whether there is a growth in the re-use of ontology. The use of ontology in annotation could drive this process, as could its use in analysis. An open issue in ontology re-use is the evolution of the source ontology once it has been re-used in another ontology. If the original ontology changes, should the changes be reflected where it is re-used, and how would this evolution be managed?
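One possible first step towards managing such evolution is simply to detect it. The sketch below is not a method proposed in this briefing; it assumes, hypothetically, that the re-using ontology keeps a snapshot of each re-used term and its parent, and compares that snapshot with the current state of the source. All names and data are invented for illustration.

```python
def diff_reused_terms(snapshot, current):
    """Compare a snapshot of re-used terms with the source ontology's current state.

    Both arguments map a term name to its parent term (hypothetical format).
    Returns the terms added to, removed from, or re-parented in the source.
    """
    snap_terms, cur_terms = set(snapshot), set(current)
    return {
        "added": sorted(cur_terms - snap_terms),
        "removed": sorted(snap_terms - cur_terms),
        "changed": sorted(
            t for t in snap_terms & cur_terms if snapshot[t] != current[t]
        ),
    }


# Snapshot taken when the terms were re-used (made-up data).
snapshot = {"Protein": "Macromolecule", "Enzyme": "Protein", "Gene": "Macromolecule"}
# Later state of the source ontology (made-up data).
current = {"Protein": "Macromolecule", "Enzyme": "Catalyst", "RNA": "Macromolecule"}

report = diff_reused_terms(snapshot, current)
```

Such a report only identifies where the two ontologies have diverged; whether each change should be propagated to the re-using ontology remains a decision for its developers.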

Tools and Libraries
The frame-based Protégé ontology development tool [42] is currently being adapted to represent ontologies in OIL, so that we can build and deliver frame-based ontologies whilst gaining from the reasoning services offered by a DL. Such reasoning support may be less important for small, local ontologies designed by one expert, but becomes important for large, collaboratively developed ontologies that are intended to be re-used and shared. Libraries of ontologies, such as those held by WebOnto and Ontolingua, must be developed if re-use is to be promoted.

Methodologies for constructing ontologies
Building an ontology, as described in Section 5, is a high-cost process. The reality is that the construction of ontologies is an art rather than a science. Methodologies (supported by tools) are essential to: help the developer spot a concept; modularise the ontology; avoid problems such as over-elaboration (when should I stop elaborating the ontology?); ensure relevance (when is a concept relevant for an application?); and verify the ontology for its fitness for purpose and its re-usability (if any).

If the application genuinely needs an ontology and that ontology will be long-lived, then the investment may well be worthwhile. As with many technologies, in a discipline such as bioinformatics it is the community effort that is important in making the use of that technology productive.

Acknowledgements: Robert Stevens is supported by a grant from the BBSRC/EPSRC under the bioinformatics initiative (34/BIO12090); Sean Bechhofer is supported by a grant from the EPSRC under the DIM initiative (GR/M/75426).


Robert Stevens 2001-07-19