Next: Applications and Types of
Up: Ontology-based Knowledge Representation for
Previous: Introduction
What is an Ontology?
In philosophy, ontology is the study of what kinds of things exist: what
entities or `things' there are in the universe [3].
The computer science view of ontology is somewhat narrower: an ontology is
a working model of entities and interactions, either generic (e.g. the Cyc ontology [4]) or restricted to some particular domain of
knowledge or practice, such as molecular biology or bioinformatics. The following definition is given in [5]:
`An ontology may take a variety of forms, but necessarily it will include a vocabulary
of terms, and some specification of their meaning. This includes definitions and an
indication of how concepts are inter-related which collectively impose a structure on
the domain and constrain the possible interpretations of terms.'
Gruber defines an ontology as `the specification of conceptualisations, used to help
programs and humans share knowledge' [6]. The
conceptualisation is the couching of knowledge about the world in terms
of entities (things, the relationships they hold and the constraints between
them). The specification is the representation of this conceptualisation
in a concrete form. One step in this specification is the encoding of the conceptualisation in a knowledge representation language.
The goal is to create an agreed-upon vocabulary and semantic structure for exchanging
information about a domain. The specification or encoding of
an ontology will be explored in Section 5.
The main components of an ontology are concepts, relations, instances and axioms.
A concept represents a set or class of entities or `things' within a domain.
Protein is a concept within the domain of molecular biology. Concepts fall into
two kinds:
- primitive concepts are those which only have necessary
conditions (in terms of their properties) for membership of the
class. For example, a globular protein is a kind of
protein with a hydrophobic core, so all globular proteins must have
a hydrophobic core, but there could be other things that have a
hydrophobic core that are not globular proteins.
- defined concepts are those whose description is both
necessary and sufficient for a thing to be a member of the class.
For example, eukaryotic cells are kinds of cells that have a
nucleus. Not only does every eukaryotic cell have a nucleus, but every
nucleus-containing cell is also eukaryotic.
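The difference between the two kinds of concept can be sketched in a few lines of Python. This is only an illustration: the dictionary keys (`is_cell`, `has_nucleus`, `is_protein`, `has_hydrophobic_core`, `asserted_concepts`) and both function names are assumptions made for the example, not part of any ontology language. A defined concept lets membership be inferred from the description alone; a primitive concept only lets non-membership be inferred, so membership has to be asserted.

```python
from typing import Optional


def classify_eukaryotic_cell(entity: dict) -> bool:
    """Defined concept: the conditions are necessary AND sufficient,
    so membership follows directly from the description."""
    return bool(entity.get("is_cell", False) and entity.get("has_nucleus", False))


def classify_globular_protein(entity: dict) -> Optional[bool]:
    """Primitive concept: the conditions are only necessary.

    Returns False when a necessary condition fails, True when membership
    has been explicitly asserted, and None (unknown) otherwise.
    """
    necessary = entity.get("is_protein", False) and entity.get(
        "has_hydrophobic_core", False
    )
    if not necessary:
        return False
    if "GlobularProtein" in entity.get("asserted_concepts", ()):
        return True
    return None  # conditions met, but they are not sufficient


# Hypothetical entities, described only by the properties used above.
cell = {"is_cell": True, "has_nucleus": True}
unknown_protein = {"is_protein": True, "has_hydrophobic_core": True}

print(classify_eukaryotic_cell(cell))              # inferred member
print(classify_globular_protein(unknown_protein))  # cannot be inferred
```

Returning `None` rather than `True` in the primitive case is the point of the sketch: meeting the necessary conditions does not by itself establish membership.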
Relations describe the interactions between concepts or a concept's properties. Relations also fall into two
broad kinds:
- Taxonomies that organise concepts into sub-/super-concept tree structures.
The most common forms of these are:
  - Specialisation relationships, commonly known as the `is a kind of' relationship.
  For example, an Enzyme is a kind of Protein,
  which in turn is a kind of Macromolecule.
  - Partitive relationships, which describe concepts that are part of other concepts,
  for example Protein hasComponent ModificationSite.
- Associative relationships that relate concepts across tree structures.
Commonly found examples include the following:
  - Nominative relationships describe the names of concepts, for example Protein hasAccessionNumber AccessionNumber (in the context of bioinformatics) and Gene hasName
  GeneName.
  - Locative relationships describe the location of one concept with respect
  to another, for example Chromosome hasSubcellularLocation Nucleus.
  - Associative relationships that represent, for example, the functions and processes a concept has
  or is involved in, and other properties of the concept: Protein hasFunction Receptor, Protein
  isAssociatedWithProcess Transcription and Protein hasOrganismClassification Species.
  - Many other types of relationships exist, such as `causative' relationships, that are
  described in [7,8].
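As a rough illustration of the two broad kinds of relation, the examples above can be written down as subject-relation-target triples. The `Relation` tuple, the relation name `isKindOf` (standing in for `is a kind of'), the `TAXONOMIC` set and the helper functions are all assumptions made for this sketch; the text does not prescribe any particular encoding.

```python
from typing import List, NamedTuple


class Relation(NamedTuple):
    subject: str
    name: str
    target: str


# Relation names treated as taxonomic in this sketch (an assumption).
TAXONOMIC = {"isKindOf", "hasComponent"}

RELATIONS = [
    Relation("Enzyme", "isKindOf", "Protein"),
    Relation("Protein", "isKindOf", "Macromolecule"),
    Relation("Protein", "hasComponent", "ModificationSite"),
    Relation("Protein", "hasAccessionNumber", "AccessionNumber"),
    Relation("Gene", "hasName", "GeneName"),
    Relation("Chromosome", "hasSubcellularLocation", "Nucleus"),
    Relation("Protein", "hasFunction", "Receptor"),
]


def taxonomic(relations: List[Relation]) -> List[Relation]:
    """Relations that build the sub-/super-concept tree structures."""
    return [r for r in relations if r.name in TAXONOMIC]


def associative(relations: List[Relation]) -> List[Relation]:
    """Relations that link concepts across the tree structures."""
    return [r for r in relations if r.name not in TAXONOMIC]


def super_concepts(concept: str, relations: List[Relation]) -> List[str]:
    """Follow isKindOf links upwards, assuming one parent per concept."""
    parents = {r.subject: r.target for r in relations if r.name == "isKindOf"}
    chain: List[str] = []
    while concept in parents and parents[concept] not in chain:
        concept = parents[concept]
        chain.append(concept)
    return chain


print(super_concepts("Enzyme", RELATIONS))  # ['Protein', 'Macromolecule']
```

The single-parent assumption in `super_concepts` matches the tree structures described above; a taxonomy that allows several super-concepts per concept would need a graph traversal instead.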
The relations, like concepts, can be organised into taxonomies. For example,
hasName can be subdivided into hasGeneName, hasProteinName and hasDiseaseName. Relations also have properties that capture further knowledge about the relationships between
concepts. These include, but are not restricted to:
- whether it is universally necessary that a relationship must hold on a concept.
For example, when describing a protein database, we might want to say that Protein
hasAccessionNumber AccessionNumber holds universally, i.e., for all proteins.
- whether a relationship can optionally hold on a concept. For example,
we might want to say that Enzyme hasCofactor Cofactor
only describes the possibility that enzymes have a cofactor, as not all enzymes
have a cofactor.
- whether the concept a relationship links to is restricted to certain kinds of
concepts. For example, Protein hasFunction Receptor restricts the hasFunction
relation to only link to concepts that are kinds of receptors. Protein hasFunction
says that Protein has a function but does not restrict as to what kind of concept
the function might be.
- the cardinality of the relationship. For example, a particular AccessionNumber
is the accession number of only one Protein, but one Chromosome
may have many Genes.
- whether the relationship is transitive. For example, if Protein
isAssociatedWithProcess Transcription and Transcription isAssociatedWithProcess GeneExpression,
then Protein isAssociatedWithProcess GeneExpression. The taxonomy relations always have this property.
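Two of the relation properties listed above, transitivity and cardinality, lend themselves to a small worked sketch. The function names and the accession identifiers (`ACC1`, `ACC2`) and protein names are invented for illustration; a relation is represented simply as a set of (subject, target) pairs.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def transitive_closure(pairs: Iterable[Pair]) -> Set[Pair]:
    """All pairs implied when the relation is treated as transitive."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def targets_with_several_subjects(pairs: Iterable[Pair]) -> Set[str]:
    """Targets linked to more than one subject.

    For hasAccessionNumber this reports accession numbers that break the
    'only one Protein' cardinality constraint.
    """
    subjects = defaultdict(set)
    for subject, target in pairs:
        subjects[target].add(subject)
    return {target for target, found in subjects.items() if len(found) > 1}


associated = {
    ("Protein", "Transcription"),
    ("Transcription", "GeneExpression"),
}
print(("Protein", "GeneExpression") in transitive_closure(associated))  # True

# Hypothetical data: ACC1 is wrongly shared by two proteins.
accessions = [("ProteinA", "ACC1"), ("ProteinB", "ACC1"), ("ProteinC", "ACC2")]
print(targets_with_several_subjects(accessions))  # {'ACC1'}
```

The closure loop is deliberately naive and repeats until no new pair appears; it is adequate for small examples but not intended for large relations.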
Once this conceptualisation has been made concrete (see Section 5)
an ontology has been produced.
Instances are the `things' represented by a concept - a human cytochrome C is an
instance of the concept Protein. Strictly speaking, an ontology should not contain any instances, because it is supposed to be a conceptualisation of the domain. The combination of an ontology with associated instances is what is known as a knowledge base.
However, deciding whether something is a
concept or an instance is difficult, and often depends on the application [9]. For
example, Atom is a concept and `potassium' is an instance of that
concept. It could equally be argued that Potassium is a concept representing
the different instances of potassium and its isotopes. This is a well-known and open question in knowledge management research.
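The distinction between an ontology and a knowledge base, and the potassium example, can be sketched as follows. The `Ontology` and `KnowledgeBase` classes and their methods are hypothetical names chosen for this illustration, and the choice of isotopes as instances in the second view is one possible reading of the argument above, not the only one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Ontology:
    """Concepts only: the conceptualisation of the domain."""
    concepts: Set[str] = field(default_factory=set)


@dataclass
class KnowledgeBase:
    """An ontology together with instances of its concepts."""
    ontology: Ontology
    instances: Dict[str, str] = field(default_factory=dict)  # instance -> concept

    def add_instance(self, name: str, concept: str) -> None:
        if concept not in self.ontology.concepts:
            raise ValueError(f"unknown concept: {concept}")
        self.instances[name] = concept

    def instances_of(self, concept: str) -> List[str]:
        return sorted(n for n, c in self.instances.items() if c == concept)


# View 1: Atom is the concept, potassium an instance of it.
atom_view = KnowledgeBase(Ontology({"Atom"}))
atom_view.add_instance("potassium", "Atom")

# View 2: Potassium is itself a concept, with isotopes as its instances.
isotope_view = KnowledgeBase(Ontology({"Atom", "Potassium"}))
isotope_view.add_instance("potassium-39", "Potassium")
isotope_view.add_instance("potassium-40", "Potassium")

print(atom_view.instances_of("Atom"))
print(isotope_view.instances_of("Potassium"))
```

Both views are internally consistent; which one is appropriate depends on what the application needs to distinguish.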
Finally, axioms are used to constrain values for classes or instances. In this sense the
properties of relations are kinds of axioms. Axioms also, however, include more general rules, such as
nucleic acids shorter than 20 residues are oligonucleotides.
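A general rule of this kind can be sketched as a simple classification function. The function name, the constant name and the returned concept labels are assumptions for the example; only the threshold of 20 residues comes from the rule stated above.

```python
# Threshold taken from the rule in the text; the names are illustrative.
OLIGONUCLEOTIDE_MAX_RESIDUES = 20


def classify_nucleic_acid(residue_count: int) -> str:
    """Apply the axiom: nucleic acids shorter than 20 residues
    are oligonucleotides."""
    if residue_count < 1:
        raise ValueError("a nucleic acid must have at least one residue")
    if residue_count < OLIGONUCLEOTIDE_MAX_RESIDUES:
        return "Oligonucleotide"
    return "NucleicAcid"


print(classify_nucleic_acid(12))   # Oligonucleotide
print(classify_nucleic_acid(250))  # NucleicAcid
```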
Robert Stevens
2001-07-19