The RASHdb Metamodel

RASHdb is a database of the schema of bioinformatics resources, the resources they model, the documentation of those resources, and the relationships between the schema of different resources. A database schema is a model of the data contained within a database. So, a schema for SWISS-PROT models the data contained within the SWISS-PROT database. The data held within RASHdb are the objects that make up a resource's schema: its entities, attributes, attribute domains or types, and the relationships between entities. Thus, the RASHdb schema is a model of database schema or models -- a metamodel, or model of models.

The RASHdb schema describes four categories of information:

  1. The RASH Container captures information about the model being created: the creator, the date of creation, the version of the schema and so on. It is metadata about the modelling task being described. The RASH container is based upon the Dublin Core model. It will allow questions such as "find me schema created by Robert Stevens" or "find me models created since 11 Feb 2002".
  2. The RASH resource schema describes the name of the resource being modelled, the version numbers of the resource, its owners, its locations and the documentation that describes it. The objects in this schema will allow questions such as "what resources does RASHdb describe?", "who owns resource x?", "what documentation describes schema x?" and "where is schema x to be found?". This schema also describes fragments of the resource and of its documentation, so that one can ask "which schema objects describe fragment x of resource y?" (for example, find the schema objects that model the OS line of SWISS-PROT).
  3. The RASH Schema schema is the heart of the RASH metamodel: it describes the content and structure of database schema. In brief, a schema is composed of many schema objects, each of which has a name, and these objects come in several kinds. A schema contains entities, and entities have properties. Properties are either attributes or relationships. Attributes have a type or domain, which may be a standard type, a glossary or a derived value. A relationship property describes the relationship between entities, which may be unary, binary or n-ary, and records the type of relationship (kind of, partitive, associative and so on). All properties record cardinality and other constraints upon the property.

    Many questions may be asked about the schema or models of resources in RASHdb. A selection might be: "find me the entities described in SWISS-PROT version 40", "recover the schema objects for SWISS-PROT version 39", "what are the attributes of the species entity in SWISS-PROT?", "is there a relationship between sequence feature and literature citation in SWISS-PROT version 40?" and "show me the documentation and the portion of the original resource that describe the sequence entity in version 40 of SWISS-PROT". Many more questions are possible.

  4. The last part of the RASH metamodel is the intention schema. This allows other objects within the metamodel to be labelled with their intention. The names of objects, resources and so on may be a distraction from their intention or purpose within the schema. For example, SWISS-PROT and PIR contain much the same information, but PIR states its intention as being to highlight the evolutionary relationships between sequences in the databank, an intention not highlighted by SWISS-PROT. Similarly, both databanks' entries have a keyword field, and surface appearance would lead one to believe that these fields have the same intention. Close examination of the documentation reveals that in SWISS-PROT the keyword field is intended to summarise the contents of the annotation, whereas in PIR it is intended as a catch-all field for annotation that does not fit in a more specialised field.
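
The structure of the RASH Schema schema in item 3, together with one piece of container metadata from item 1, can be sketched as a handful of Python data classes. This is a minimal, hypothetical illustration, not RASHdb's actual implementation: the class, field and function names (Schema, Entity, Attribute, Relationship, Cardinality, created_since) are assumptions, attribute domains are reduced to plain labels, and the SWISS-PROT fragment is a made-up toy rather than a real model of that databank.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional, Union


class RelationshipKind(Enum):
    # Kinds of relationship named in the text; the real list is open-ended.
    KIND_OF = "kind of"
    PARTITIVE = "partitive"
    ASSOCIATIVE = "associative"


@dataclass
class Cardinality:
    minimum: int = 0
    maximum: Optional[int] = None  # None stands for "many"


@dataclass
class Attribute:
    name: str
    domain: str  # standard type, glossary or derived value, simplified to a label
    cardinality: Cardinality = field(default_factory=Cardinality)


@dataclass
class Relationship:
    name: str
    kind: RelationshipKind
    entities: List[str]  # one entity name (unary), two (binary) or more (n-ary)
    cardinality: Cardinality = field(default_factory=Cardinality)


# A property is either an attribute or a relationship.
Property = Union[Attribute, Relationship]


@dataclass
class Entity:
    name: str
    properties: List[Property] = field(default_factory=list)

    def attributes(self) -> List[Attribute]:
        return [p for p in self.properties if isinstance(p, Attribute)]


@dataclass
class Schema:
    resource: str
    version: str
    created: date  # container (Dublin Core-style) metadata, reduced to one field
    entities: List[Entity] = field(default_factory=list)

    def entity(self, name: str) -> Optional[Entity]:
        return next((e for e in self.entities if e.name == name), None)

    def related(self, a: str, b: str) -> bool:
        """True if some relationship involves both entity a and entity b."""
        return any(
            isinstance(p, Relationship) and a in p.entities and b in p.entities
            for e in self.entities
            for p in e.properties
        )


def created_since(schemas: List[Schema], day: date) -> List[Schema]:
    """Container-style query: schema created on or after `day`."""
    return [s for s in schemas if s.created >= day]


# Hypothetical toy fragment; entity names, attributes and date are illustrative only.
sp40 = Schema("SWISS-PROT", "40", date(2002, 3, 1), [
    Entity("species", [Attribute("scientific name", "string"),
                       Attribute("taxonomy", "glossary")]),
    Entity("sequence feature", [
        Attribute("start", "integer", Cardinality(1, 1)),
        Relationship("cited in", RelationshipKind.ASSOCIATIVE,
                     ["sequence feature", "literature citation"]),
    ]),
    Entity("literature citation", [Attribute("title", "string")]),
])

print([a.name for a in sp40.entity("species").attributes()])
# → ['scientific name', 'taxonomy']
print(sp40.related("sequence feature", "literature citation"))  # → True
print(len(created_since([sp40], date(2002, 2, 11))))  # → 1
```

Making each property either an Attribute or a Relationship mirrors the statement that properties are one or the other. A fuller sketch would give attribute domains (standard types, glossaries, derived values) their own objects instead of strings.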

RASH Controlled Vocabularies

The intention of objects in RASHdb is recorded with a structured controlled vocabulary. Terms within the vocabulary are arranged in a taxonomy of "is a kind of" links. At present, no other relationships between terms are used, but this controlled vocabulary may develop into a full-blown ontology in the future. The terminology is divided into two broad domains: biological intention and data intention. Biological intention describes the content of individual objects and their biological purpose. For instance, a databank might have an overall genomic or proteomic purpose but contain many other kinds of biological information. Data intention captures the content and purpose of RASHdb objects from a data perspective: what the object is used for and the type of data it contains (e.g., units).
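
An "is a kind of" taxonomy like this can be queried by walking upwards from a term to everything that subsumes it. The Python sketch below is hypothetical: only the two broad domains come from the text, the finer terms and the object labels are invented for illustration (the keyword labels paraphrase the SWISS-PROT and PIR example in the metamodel section), and a term is allowed several parents in case the taxonomy is not a strict tree.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical vocabulary: term -> its direct "is a kind of" parents.
# Only the two broad domains come from the RASH text; finer terms are invented.
IS_A: Dict[str, Set[str]] = {
    "biological intention": set(),
    "data intention": set(),
    "genomic": {"biological intention"},
    "proteomic": {"biological intention"},
    "evolutionary relationship": {"biological intention"},
    "summarise annotation": {"data intention"},
    "catch-all annotation": {"data intention"},
    "units": {"data intention"},
}

# Hypothetical intention labels: (resource, object) -> vocabulary term.
LABELS: Dict[Tuple[str, str], str] = {
    ("SWISS-PROT", "keyword"): "summarise annotation",
    ("PIR", "keyword"): "catch-all annotation",
    ("PIR", "databank"): "evolutionary relationship",
}


def ancestors(term: str) -> Set[str]:
    """Every term reachable from `term` by following "is a kind of" links upwards."""
    seen: Set[str] = set()
    stack: List[str] = list(IS_A.get(term, set()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, set()))
    return seen


def is_kind_of(term: str, general: str) -> bool:
    """True if `term` is `general` itself or is subsumed by it."""
    return term == general or general in ancestors(term)


def objects_with_intention(general: str) -> List[Tuple[str, str]]:
    """Labelled objects whose intention falls under `general`, sorted."""
    return sorted(obj for obj, term in LABELS.items() if is_kind_of(term, general))


# Same field name, different intention...
print(LABELS[("SWISS-PROT", "keyword")] == LABELS[("PIR", "keyword")])  # → False
# ...yet both keyword fields are kinds of data intention.
print(objects_with_intention("data intention"))
# → [('PIR', 'keyword'), ('SWISS-PROT', 'keyword')]
```

Because intention is recorded as a vocabulary term rather than read off a name, the two keyword fields are told apart, while a query at a more general level of the taxonomy still groups them together.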
