
What is an Ontology?

Ontology is the study of what kinds of things exist - what entities or `things' there are in the universe [3]. The computer science view of ontology is somewhat narrower: an ontology is a working model of entities and interactions, either generic (e.g. the Cyc ontology [4]) or specific to some particular domain of knowledge or practice, such as molecular biology or bioinformatics. The following definition is given in [5]:

`An ontology may take a variety of forms, but necessarily it will include a vocabulary of terms, and some specification of their meaning. This includes definitions and an indication of how concepts are inter-related which collectively impose a structure on the domain and constrain the possible interpretations of terms.'

Gruber defines an ontology as `the specification of conceptualisations, used to help programs and humans share knowledge' [6]. The conceptualisation is the couching of knowledge about the world in terms of entities (things, the relationships they hold and the constraints between them). The specification is the representation of this conceptualisation in a concrete form. One step in this specification is the encoding of the conceptualisation in a knowledge representation language. The goal is to create an agreed-upon vocabulary and semantic structure for exchanging information about a domain. The specification or encoding of an ontology will be explored in Section 5.

The main components of an ontology are concepts, relations, instances and axioms. A concept represents a set or class of entities or `things' within a domain. Protein is a concept within the domain of molecular biology. Concepts fall into two kinds:

- Primitive concepts are those which have only necessary conditions (in terms of their properties) for membership of the class. For example, a globular protein is a kind of protein with a hydrophobic core, so all globular proteins must have a hydrophobic core, but there could be other things that have a hydrophobic core that are not globular proteins.
- Defined concepts are those whose description is both necessary and sufficient for a thing to be a member of the class. For example, eukaryotic cells are kinds of cells that have a nucleus. Not only does every eukaryotic cell have a nucleus, every nucleus-containing cell is eukaryotic.
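
The difference between the two kinds of concept can be sketched in code. This is a minimal illustration, not part of any ontology language: the `Thing` class, its boolean properties and both function names are assumptions made up for the example. A defined concept gives a test that decides membership; a primitive concept gives a test that can only rule membership out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    """Hypothetical entity described by a few boolean properties."""
    is_cell: bool = False
    has_nucleus: bool = False
    is_protein: bool = False
    has_hydrophobic_core: bool = False


def is_eukaryotic_cell(thing: Thing) -> bool:
    # Defined concept: the conditions are necessary AND sufficient,
    # so membership can be decided from the properties alone.
    return thing.is_cell and thing.has_nucleus


def may_be_globular_protein(thing: Thing) -> bool:
    # Primitive concept: the conditions are only necessary.
    # False rules membership out; True does not establish it.
    return thing.is_protein and thing.has_hydrophobic_core
```

A `True` result from `may_be_globular_protein` means only that the thing has not been excluded; membership of a primitive concept still has to be asserted explicitly.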

Relations describe the interactions between concepts or a concept's properties. Relations also fall into two broad kinds:

- Taxonomies that organise concepts into sub-concept/super-concept tree structures. The most common forms of these are:
  - Specialisation relationships, commonly known as the `is a kind of' relationship. For example, an Enzyme is a kind of Protein, which in turn is a kind of Macromolecule.
  - Partitive relationships describe concepts that are part of other concepts - Protein hasComponent ModificationSite.
- Associative relationships that relate concepts across tree structures. Commonly found examples include the following:
  - Nominative relationships describe the names of concepts - Protein hasAccessionNumber AccessionNumber (in the context of bioinformatics) and Gene hasName GeneName.
  - Locative relationships describe the location of one concept with respect to another - Chromosome hasSubcellularLocation Nucleus.
  - Associative relationships represent, for example, the functions a concept has, the processes it is involved in, and other properties of the concept - Protein hasFunction Receptor, Protein isAssociatedWithProcess Transcription and Protein hasOrganismClassification Species.
  - Many other types of relationships exist, such as `causative' relationships, that are described in [7,8].
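
One simple way to picture these relations is as a set of (subject, relation, object) triples. The sketch below uses the example statements from the list above; the triple representation, the relation name `isKindOf` and the helper functions are assumptions chosen for illustration, not a prescribed encoding.

```python
# Each statement is a (subject, relation, object) triple.
TRIPLES = {
    ("Enzyme", "isKindOf", "Protein"),                    # specialisation
    ("Protein", "isKindOf", "Macromolecule"),             # specialisation
    ("Protein", "hasComponent", "ModificationSite"),      # partitive
    ("Protein", "hasAccessionNumber", "AccessionNumber"), # nominative
    ("Chromosome", "hasSubcellularLocation", "Nucleus"),  # locative
}


def objects(subject, relation):
    """Concepts directly linked from `subject` by `relation`."""
    return {o for s, r, o in TRIPLES if s == subject and r == relation}


def superconcepts(concept):
    """All concepts reachable by following isKindOf links upwards."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for parent in objects(current, "isKindOf"):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found
```

Here `superconcepts("Enzyme")` walks up the specialisation tree to `Protein` and `Macromolecule`, while `objects("Chromosome", "hasSubcellularLocation")` follows a single associative link across trees.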

The relations, like concepts, can be organised into taxonomies. For example, hasName can be subdivided into hasGeneName, hasProteinName and hasDiseaseName. Relations also have properties that capture further knowledge about the relationships between concepts. These include, but are not restricted to:

- whether a relationship must hold universally on a concept. For example, when describing a protein database, we might want to say that Protein hasAccessionNumber AccessionNumber holds universally, i.e., for all proteins.
- whether a relationship can optionally hold on a concept. For example, Enzyme hasCofactor Cofactor describes only the possibility that an enzyme has a cofactor, as not all enzymes do.
- whether the concept a relationship links to is restricted to certain kinds of concepts. For example, Protein hasFunction Receptor restricts the hasFunction relation to link only to concepts that are kinds of receptors. Protein hasFunction says that Protein has a function but does not restrict what kind of concept the function might be.
- the cardinality of the relationship. For example, a particular AccessionNumber is the accession number of only one Protein, but one Chromosome may have many Genes.
- whether the relationship is transitive. For example, if Protein isAssociatedWithProcess Transcription and Transcription isAssociatedWithProcess GeneExpression, then Protein isAssociatedWithProcess GeneExpression. The taxonomy relations always have this property.
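
The properties listed above can be recorded alongside each relation and then checked or applied mechanically. The following is a rough sketch under stated assumptions: `RelationSpec`, its field names and the two helper functions are hypothetical, and only cardinality and transitivity are actually exercised.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RelationSpec:
    """Hypothetical record of the properties a relation may carry."""
    name: str
    required: bool = False                  # must hold for every member
    range_concept: Optional[str] = None     # what the relation may link to
    max_subjects_per_object: Optional[int] = None  # cardinality limit
    transitive: bool = False


def transitive_closure(pairs):
    """Add every (a, c) implied by chaining (a, b) and (b, c)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def cardinality_violations(spec, pairs):
    """Objects linked from more subjects than the spec allows."""
    if spec.max_subjects_per_object is None:
        return set()
    subjects_by_object = {}
    for subject, obj in pairs:
        subjects_by_object.setdefault(obj, set()).add(subject)
    limit = spec.max_subjects_per_object
    return {o for o, subs in subjects_by_object.items() if len(subs) > limit}


has_accession = RelationSpec(
    "hasAccessionNumber", required=True,
    range_concept="AccessionNumber", max_subjects_per_object=1,
)
associated = RelationSpec("isAssociatedWithProcess", transitive=True)
```

Applying `transitive_closure` to the two isAssociatedWithProcess statements from the text yields the third, inferred statement. `cardinality_violations(has_accession, ...)` reports any accession number shared by more than one protein.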

Once this conceptualisation has been made concrete (see Section 5) an ontology has been produced.

Instances are the `things' represented by a concept - a human cytochrome C is an instance of the concept Protein. Strictly speaking, an ontology should not contain any instances, because it is supposed to be a conceptualisation of the domain. The combination of an ontology with associated instances is what is known as a knowledge base. However, deciding whether something is a concept or an instance is difficult, and often depends on the application [9]. For example, Atom is a concept and `potassium' is an instance of that concept. It could be argued that Potassium is a concept representing the different instances of potassium and its isotopes etc. This is a well-known and open question in knowledge management research.

Finally, axioms are used to constrain values for classes or instances. In this sense the properties of relations are kinds of axioms. Axioms also, however, include more general rules, such as: nucleic acids shorter than 20 residues are oligonucleotides.
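
A general rule of this kind can be read as a small classification function. The sketch below encodes only the oligonucleotide rule quoted above; the function name, the concept labels and the use of a plain string as the sequence are assumptions for illustration.

```python
# Threshold taken from the rule in the text: shorter than 20 residues.
OLIGONUCLEOTIDE_MAX_RESIDUES = 20


def concepts_for_nucleic_acid(sequence: str) -> set:
    """Return the concepts a nucleic acid sequence falls under."""
    concepts = {"NucleicAcid"}
    # The axiom only adds a classification for short sequences;
    # it makes no claim about longer ones.
    if len(sequence) < OLIGONUCLEOTIDE_MAX_RESIDUES:
        concepts.add("Oligonucleotide")
    return concepts
```

A four-residue sequence such as `"ACGT"` is classified as both `NucleicAcid` and `Oligonucleotide`, whereas a sequence of exactly 20 residues is classified as `NucleicAcid` only.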


Robert Stevens 2001-07-19