RASH, which started in February 2000, aims to address the problem of semantic heterogeneity in bioinformatics resources. The biological community is a distributed one, with a culture of sharing information. This network of information services forms a loose federation of autonomous, distributed, heterogeneous data repositories. These repositories are typically not databases but proprietary file structures with associated search facilities and analysis tools. Often the schemas (or metadata) of these repositories are either implicit or not easily available, so it is difficult to determine exactly what conceptual information a given data record captures.
Many resources also overlap in content but differ considerably in the view they take of that content. This can make it difficult to integrate resources in sensible and structured ways. Moreover, the metadata of the sources changes frequently. A biologist may therefore encounter problems in using these resources, such as:
Many bioinformaticians spend considerable time repeatedly integrating these diverse resources. Database heterogeneity is therefore a major problem in bioinformatics. There are broadly two types of heterogeneity problems:
Creating the common global view of the sources that integration and interoperation require means making semantic heterogeneities explicit and proposing reconciliations for them. Rather than individuals and groups within the community each ploughing the same reconciliation furrow, the community needs access to information on semantic heterogeneity, together with a systematic and replicable way of identifying and reconciling it, so as to produce a shareable resource that covers a wide range of the available sources.
This project has two major objectives:
The work required to achieve these objectives falls into three major areas: the development of reconciliation methodologies, the construction of source and unifying schemas, and the development of a software tool for managing and sharing this information.
As the RASH process is developed, details of the process and its supporting case studies will be made available via this site.