

Knowledge representation languages

For an ontology to be used within an application, it must be specified, that is, delivered using some concrete representation. This is the encoding step described above. A variety of languages can be used to represent conceptual models, with varying characteristics in terms of expressiveness, ease of use and computational complexity. The field of knowledge representation (KR) has, of course, long been a focal point of research in the Artificial Intelligence community [29]; here we simply outline some of the KR languages which have been used for ontologies in bioinformatics (see Table 1).

Major considerations in the choice of representation are the expressivity of the encoding language, the rigour of an encoding and the semantics of a language.

Languages currently used for specifying bio-ontologies fall into three kinds: vocabularies defined using natural language; object-based knowledge representation languages such as frames and UML; and languages based on predicates expressed in logic, such as Description Logics.

Vocabularies support the creation of purely hand-crafted ontologies with simple tree-like inheritance structures. The Gene Ontology, for example, has a hierarchical structure which is asserted - the position of each concept and its relation with others in the ontology is completely determined by the modeller or ontologist. Each entry or concept in the GO has a name, an identifier and other optional pieces of information such as synonyms, references to external databases and so on.

Although a vocabulary provides great flexibility, the lack of any formal structure in the representation can lead to difficulties with maintenance or preserving consistency, and there are usually no formally defined semantics. The single inheritance provided by a tree structure (each concept has only one parent in the is-a hierarchy) can also prove limiting. Maintaining multiple inheritance hierarchies by hand, however, is an arduous task: the hand-crafting of single inheritance hierarchies is a difficult enough exercise.
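The kind of hand-asserted vocabulary entry described above can be sketched in a few lines of Python. This is only an illustration: the field names, the `Term` class and the `EX:` identifiers are hypothetical and are not taken from the Gene Ontology or its file formats. Each term carries a name, an identifier, optional synonyms and cross-references, and a single asserted is-a parent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """One entry in a controlled vocabulary (field names are illustrative)."""
    identifier: str
    name: str
    parent: Optional[str] = None  # single asserted is-a parent (tree structure)
    synonyms: list[str] = field(default_factory=list)
    xrefs: list[str] = field(default_factory=list)  # links to external databases


def ancestors(vocab: dict[str, Term], identifier: str) -> list[str]:
    """Follow the asserted is-a links from a term up to the root."""
    path = []
    current = vocab[identifier].parent
    while current is not None:
        path.append(current)
        current = vocab[current].parent
    return path


# Hypothetical identifiers; the hierarchy is exactly what the modeller typed in.
vocab = {
    "EX:0001": Term("EX:0001", "function"),
    "EX:0002": Term("EX:0002", "catalytic activity", parent="EX:0001"),
    "EX:0003": Term("EX:0003", "kinase activity", parent="EX:0002",
                    synonyms=["phosphotransferase activity"]),
}

print(ancestors(vocab, "EX:0003"))  # ['EX:0002', 'EX:0001']
```

Nothing in this structure stops a modeller from asserting an inconsistent or misplaced parent, which is the maintenance problem noted above.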

A frame-based system provides greater structure. Frame-based systems are based around the notion of frames or classes which represent collections of instances (the concepts of the ontology). Each frame has an associated collection of slots or attributes which can be filled by values or other frames. In particular, frames can have a kind-of slot which allows the assertion of a frame taxonomy. This hierarchy can then be used for inheritance of slots, allowing a sparse representation. As well as frames representing concepts, a frame-based representation may also contain instance frames, which represent particular instances.
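Slot inheritance along the kind-of hierarchy can be sketched as follows. The `Frame` class, the slot names and the example frames are assumptions made for illustration, not the API of any particular frame system, and instance frames are left out for brevity.

```python
class Frame:
    """A class frame with local slots and an optional kind-of parent."""

    def __init__(self, name, kind_of=None, **slots):
        self.name = name
        self.kind_of = kind_of    # parent frame in the asserted taxonomy
        self.slots = dict(slots)  # only the slots stated on this frame

    def get(self, slot):
        """Look up a slot locally, then on each kind-of ancestor in turn."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.kind_of
        raise KeyError(slot)


# Illustrative frames: each one states only what is new at its level.
macromolecule = Frame("Macromolecule", polymer=True)
protein = Frame("Protein", kind_of=macromolecule, monomer="amino acid")
enzyme = Frame("Enzyme", kind_of=protein)

print(enzyme.get("monomer"))  # amino acid (inherited from Protein)
print(enzyme.slots)           # {} (nothing stored on Enzyme itself)
```

The empty slot dictionary on `Enzyme` is the sparse representation mentioned above: inherited values are stored once, on the frame that introduces them.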

Frame-based systems have been used extensively in the KR world, particularly for applications in natural language processing. The best-known frame system is Ontolingua [31]. Both EcoCyc and RiboWeb use a frame representation. EcoCyc has a frame, amongst others, called 'Gene', representing the concept Gene. This frame has slots describing relationships to other concepts, such as Polypeptide product, gene name, synonyms and so on. Frames are popular because frame-based modelling is similar to object-based modelling and is intuitive for many users.

The semantics of frame systems are defined by the OKBC standard [32], although the standard is a little unclear in places. For example, it is not always clear how to interpret an assertion that a slot is filled with a particular value. Does this mean that all instances of the frame must have this particular attribute taking this value? Or does the value represent possible fillers for the slot for each instance? For example, we might want to say that the frame Gene has a slot saying 'all genes must have a GeneName', but it is only a possibility that Genes 'have a Polypeptide Product' (some, after all, produce tRNAs).
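One way to see the ambiguity is to make the two readings explicit. The sketch below is not OKBC; `SlotSpec`, the `required` flag, the slot names and the example instances are all hypothetical. A slot marked as required must be filled on every instance, while an optional slot only may be filled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotSpec:
    name: str
    required: bool  # True: every instance must fill it; False: it may be filled


# The Gene example from the text, with the intended reading stated explicitly.
GENE_SLOTS = [
    SlotSpec("gene_name", required=True),
    SlotSpec("polypeptide_product", required=False),
]


def missing_required(instance: dict, specs: list) -> list:
    """Return the names of required slots that the instance leaves unfilled."""
    return [s.name for s in specs if s.required and s.name not in instance]


# Hypothetical instances: a tRNA gene has a name but no polypeptide product.
trna_gene = {"gene_name": "exampleA"}
unnamed_gene = {"polypeptide_product": "exampleProteinB"}

print(missing_required(trna_gene, GENE_SLOTS))     # []
print(missing_required(unnamed_gene, GENE_SLOTS))  # ['gene_name']
```

Without some marker of this kind, a tool cannot tell whether the tRNA gene is a valid instance or an incomplete one.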

An alternative to frames is logic, notably Description Logics (DLs) [33,34]. DLs describe knowledge in terms of concepts and relations that are used to automatically derive classification taxonomies. A major characteristic of a DL is that concepts are defined in terms of descriptions using other roles and concepts. For instance, in the TaO, the concept Enzyme was not simply asserted by the ontologist. Instead, a composite concept was made from Protein and Reaction, joined with the relation `catalyses' - to make the concept Protein which catalyses Reaction. Thus someone viewing the ontology can see a definition for the concept Enzyme and the DL reasoner can automatically classify Enzyme as a kind of Protein. In this way, the model is built up from small pieces in a descriptive way, rather than through the assertion of hierarchies. The DL supplies a number of reasoning services which allow the construction of classification hierarchies and the checking of consistency of these descriptions. These reasoning services can then be made available to applications that wish to make use of the knowledge represented in the ontology [35].
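The Enzyme example can be sketched with a toy classifier. This is a deliberately minimal structural check over conjunctions of atomic names and existential restrictions, written for illustration only; it is not how FaCT or any real DL reasoner works, and the `Metalloenzyme`-style description and the `has_cofactor` role are hypothetical.

```python
# A concept is a set of conjuncts. A conjunct is either an atomic name (str)
# or an existential restriction written as ("some", role, filler).
DEFINITIONS = {
    # Enzyme is defined, not asserted: Protein which catalyses Reaction.
    "Enzyme": frozenset({"Protein", ("some", "catalyses", "Reaction")}),
}


def expand(concept):
    """Replace defined names by their definitions, recursively."""
    out = set()
    for conjunct in concept:
        if isinstance(conjunct, str) and conjunct in DEFINITIONS:
            out |= expand(DEFINITIONS[conjunct])
        else:
            out.add(conjunct)
    return frozenset(out)


def subsumes(general, specific):
    """True if every conjunct of `general` also holds for `specific`."""
    return expand(general) <= expand(specific)


def subsumers(description, named):
    """List the named concepts that the description falls under."""
    return sorted(n for n in named if subsumes({n}, description))


# A new description built from small pieces; no parent is asserted for it.
description = {
    "Protein",
    ("some", "catalyses", "Reaction"),
    ("some", "has_cofactor", "MetalIon"),
}

print(subsumes({"Protein"}, {"Enzyme"}))                         # True
print(subsumers(description, ["Protein", "Enzyme", "Reaction"]))  # ['Enzyme', 'Protein']
```

The new description is placed under Enzyme, and Enzyme under Protein, purely from the definitions; a real reasoner performs the same kind of inference for far more expressive languages and also checks the descriptions for consistency.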

Frames generally provide quite a rich set of language constructs but impose very restrictive constraints on how they can be combined or used to define a class. They only support the definition of primitive concepts, and the kind-of taxonomy must be hand-crafted. Description Logics have a more limited set of language constructs, but allow primitives to be combined to create defined concepts (as described in Section 2). The taxonomy for these defined concepts is automatically established by the logic reasoning system of the Description Logic.

The drawback is that as languages become more expressive, the computational complexity of reasoning increases. Recent results [36], however, show that efficient and practical implementations of expressive languages are feasible, despite their theoretical complexity. The TaO is represented using one such DL formalism. Early implementations of TaO made use of the DL GRAIL [37]; the TaO is now represented using FaCT [36], one of the new breed of DL implementations.

As DLs have clear semantics, it is possible to use all of the knowledge encapsulated in the ontology to reason about whether it is consistent and complete. This is not possible with simple representations such as GO, where the only relationship available for exploitation is the is-a hierarchy. On the other hand, many DL implementations do not support reasoning over instances.

In fact Description Logics and frames are not that far apart - DLs are a logical reformulation of frames. The OIL (Ontology Inference Layer) knowledge interchange language unifies both into one language, defined using RDF [30]. This turns out to have the simplicity of frames combined with the reasoning services of a DL.


Robert Stevens 2001-07-19