RASHdb: A Metadata Database

Metadata is usually defined as data about data. For a database, the metadata is the database's schema. The schema describes a view of the data held within the database: the entities, the attributes (and their types) that make up those entities, and the relationships between the entities.

Database metadata can also include such information as:

The RASH database, RASHdb, is a database of bioinformatics resource schemas and the related metadata described above. The information stored in RASHdb can be used to answer the queries in the BioCompass.

RASHdb itself needs a schema that describes the elements within a database schema and the other information described above. So, if metadata is a model of the data in a database, then metadata that itself describes metadata is a metamodel. There are three levels to this metamodel:

  1. The data of the resources -- For example, protein sequences and their annotations;
  2. A database schema, which is a model of the data -- A model of protein sequences and their annotations (the data that appears in level one);
  3. The RASHdb database schema, which is a metamodel that describes the models of data used in databases. Its data is the data that appears in level two.
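
The three levels above can be pictured with a small sketch. This is only an illustration, not RASHdb's actual representation: the record, the dictionary layout and the `conforms` helper are all hypothetical names invented for the example.

```python
# Level 1: data held in a resource (a made-up protein record).
record = {"accession": "X00001", "sequence": "MKTAYIAK", "annotation": "example kinase"}

# Level 2: a schema, i.e. a model of that data (attribute name -> type).
schema = {"entity": "Protein",
          "attributes": {"accession": str, "sequence": str, "annotation": str}}

# Level 3: a metamodel, i.e. a model of what any schema may contain.
metamodel = {"entity": str, "attributes": dict}


def conforms(instance, model):
    """True if instance has exactly the model's keys, each of the expected type."""
    return (instance.keys() == model.keys()
            and all(isinstance(instance[key], typ) for key, typ in model.items()))


data_ok = conforms(record, schema["attributes"])  # level 1 checked against level 2
schema_ok = conforms(schema, metamodel)           # level 2 checked against level 3
```

The same check is applied twice: the schema plays the role of the model for the data, and then the role of the data for the metamodel.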

RASHdb describes the elements of a database schema: entities and their properties (attributes and relationships), the domains of attributes, and the facets of relationships. It also describes the provenance of those elements -- where in the original resource each conceptualisation arose and where it is described in the documentation. The RASHdb schema also contains information on the resource location, version, dates and creator of the schema.
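
As a rough sketch of how these schema elements and their provenance might be held together, the following uses Python dataclasses. The class and field names are assumptions made for illustration (in particular, `cardinality` is just one plausible example of a relationship facet); they are not taken from the RASHdb EXPRESS schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Provenance:
    source_location: str    # where in the original resource the concept arose
    documentation_ref: str  # where it is described in the documentation


@dataclass
class Attribute:
    name: str
    domain: str             # the attribute's domain, e.g. "STRING"
    provenance: Provenance


@dataclass
class Relationship:
    name: str
    target: str             # name of the related entity
    cardinality: str        # assumed example of a relationship facet
    provenance: Provenance


@dataclass
class Entity:
    name: str
    provenance: Provenance
    attributes: List[Attribute] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


# Usage: a made-up entity recovered from a flat-file resource.
prov = Provenance("flat-file line type 'SQ'", "user manual, section 3")
protein = Entity("Protein", prov, attributes=[Attribute("sequence", "STRING", prov)])
```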

The RASHdb metamodel is a schema represented in the information modelling language EXPRESS. This is an object-flavoured language that can capture the elements of data representation in relational and object-oriented databases, as well as the schemas implicit in flat-file format databanks. It is the recommended representational form for the RASH process and for use in the BioCompass. The RASHdb schema is itself represented in EXPRESS and automatically transformed into the data definition language statements that create RASHdb. The schema has five principal components:

  1. The RASH model of database schemas (an EER diagram, for IE5 only). This part of the RASH metamodel describes the elements that can be present in a schema and the relationships between those schema elements. An EXPRESS schema called rash_model is available for this part of the metamodel.
  2. Metadata about schemas entered during the RASH process -- creator, date, location, etc. An EXPRESS schema called rash_container describes this part of the RASH metamodel.
  3. A schema describing a resource's management (version, documentation, etc.). These are the materials needed to describe, explain, find, etc. the resource described in the schema. An EXPRESS schema called rash_management describes this part of the RASH metamodel.
  4. A schema describing the intention of metamodel elements. An EXPRESS schema called rash_intention describes this part of the RASH metamodel. This is a small schema that provides a place for holding controlled vocabulary items describing the data or biological intention of resources, schema elements, documentation, etc.
  5. A schema describing interschema relationships. An EXPRESS schema called rash_inter_schema describes this part of the RASH metamodel. It describes the heterogeneity (if any) existing between schema objects, according to the classification of Won Kim. A natural-language description of the heterogeneity can be held, together with information on how to resolve the conflict.
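
The automatic transformation from an EXPRESS description to data definition language statements, mentioned before the list, can be sketched very roughly as follows. This is not the tool used to build RASHdb: the type mapping, the `entity_to_ddl` function and the example entity are assumptions chosen to show the general idea for simple attribute types only.

```python
# Assumed mapping from EXPRESS simple types to SQL column types.
TYPE_MAP = {"STRING": "TEXT", "INTEGER": "INTEGER", "REAL": "REAL", "BOOLEAN": "BOOLEAN"}


def entity_to_ddl(name, attributes):
    """Render one entity, given as (attribute, EXPRESS type) pairs, as CREATE TABLE."""
    columns = ",\n".join(f"    {attr} {TYPE_MAP[typ]}" for attr, typ in attributes)
    return f"CREATE TABLE {name} (\n{columns}\n);"


# Hypothetical entity with two simple attributes.
ddl = entity_to_ddl("schema_entity", [("name", "STRING"), ("version", "INTEGER")])
print(ddl)
```

A real transformation would also have to handle relationships, aggregate types and constraints, which this sketch leaves out.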

Once RASHdb has been populated with schema and resource data, it can be used by the BioCompass to answer resource queries and to act as a guide to bioinformatics resources.
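
To give a feel for the kind of resource query this makes possible, here is a minimal sketch using an in-memory SQLite database. The table layout, the resource names and the `resources_holding` function are invented for the example and do not reflect the actual RASHdb tables or the BioCompass interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource (id INTEGER PRIMARY KEY, name TEXT, location TEXT);
CREATE TABLE schema_entity (resource_id INTEGER REFERENCES resource(id),
                            name TEXT, intention TEXT);
INSERT INTO resource VALUES (1, 'ExampleSeqDB', 'http://example.org/seqdb'),
                            (2, 'ExampleStructDB', 'http://example.org/structdb');
INSERT INTO schema_entity VALUES (1, 'Protein', 'protein sequence'),
                                 (1, 'Citation', 'literature reference'),
                                 (2, 'Fold', 'protein structure');
""")


def resources_holding(intention):
    """Names of resources whose schema has an entity with the given intention."""
    rows = conn.execute(
        "SELECT DISTINCT r.name FROM resource r "
        "JOIN schema_entity e ON e.resource_id = r.id "
        "WHERE e.intention = ? ORDER BY r.name", (intention,))
    return [name for (name,) in rows]


hits = resources_holding("protein sequence")
```

The query is answered entirely from metadata: no protein sequence is stored here, only the fact that a resource's schema contains an entity intended to hold one.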