Hammer and McCleod describe semantic heterogeneity as follows: "By this [semantic heterogeneity] we mean variations in the manner in which data is specified and structured in different components. Semantic heterogeneity is a natural consequence of the independent creation and evolution of autonomous databases which are tailored to the requirements of the application system they serve." Won Kim expands on this description of semantic heterogeneity in databases: "A schema contains a semantic description of the information in a given database. It is possible to define equivalent schemas in as many ways as there are data models. Further, the same (or similar) information can be represented in many ways in the same data model. Given such inter- and intra-model variability, it is indeed a formidable task to integrate many schemas into a homogeneous schema."
A schema is the organisation or structure of a database. Different people will structure or organise the same data or system in different ways. In addition to this human element, the modelling languages used to represent the modeller's view of the data differ in their capabilities. An ER schema, for instance, has neither inheritance nor collection types in its modelling repertoire, whereas object-oriented modelling languages provide both. This leads to further differences between schemata. Hammer and McCleod say that, in the database context, this heterogeneity refers to differences in the meaning and use of data that make it difficult to identify the various relationships that exist between similar or related objects in the schemas of different component databases. The RASH process seeks to identify and resolve these differences.
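To make this inter-model variability concrete, the following Python sketch models the same hypothetical protein record twice: once with the inheritance and collection-valued attribute that an object-oriented language allows, and once as the flat rows an ER schema would need. The class names, field names and accession value are illustrative assumptions, not taken from any real schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# Object-oriented view (hypothetical): inheritance plus a collection type.
@dataclass
class Sequence:
    accession: str
    residues: str


@dataclass
class Protein(Sequence):  # inheritance: every Protein is a Sequence
    keywords: List[str] = field(default_factory=list)  # collection-valued attribute


# ER view (hypothetical): no inheritance and no collections, so the inherited
# fields are copied into one relation and the keyword list becomes a second
# relation linked back by accession.
def to_er_rows(p: Protein) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    protein_row = {"accession": p.accession, "residues": p.residues}
    keyword_rows = [{"accession": p.accession, "keyword": k} for k in p.keywords]
    return protein_row, keyword_rows


if __name__ == "__main__":
    record = Protein("X00001", "MKTAYIAK", ["loop", "signal"])
    protein_row, keyword_rows = to_er_rows(record)
    print(protein_row)
    print(keyword_rows)
```

Neither representation is wrong; an integration process has to recognise that the keyword relation on the ER side and the list-valued attribute on the object side describe the same information.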
Hammer and McCleod enumerate a set of causes of semantic heterogeneity that corresponds to Kim's:
Semantic heterogeneity, then, concerns how data is represented in structural, organisational terms within a database. It captures, for example, whether some data is represented as an entity in one schema but only as an attribute of an entity in another. It extends to the data types used (integer, real or string, for example), the units used (centimetres or inches) and the precision (two or four decimal places; a mark or a grade in an exam). It does not, however, extend to the instances stored within the schemata of different databases: it does not cover the fact that SWISS-PROT entry P21598 is equivalent (not identical) to PIR entry S13142. Separate techniques are required to resolve such instance conflicts.
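The attribute-level conflicts above (data type, units and precision) can be resolved mechanically once the correspondence between two attributes is known. The Python sketch below assumes a hypothetical `AttributeSpec` description of each attribute and converts a value via a common unit; the spec format and the conversion table are illustrative assumptions. It deliberately does nothing about instance conflicts such as P21598 versus S13142, which lie outside semantic heterogeneity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeSpec:
    """Hypothetical description of how one schema stores an attribute."""
    name: str
    type_: type    # type the schema stores, e.g. int, float or str
    unit: str      # key into CM_PER_UNIT
    decimals: int  # precision the schema keeps


# Conversion factors to a common unit (centimetres); 1 inch = 2.54 cm exactly.
CM_PER_UNIT = {"cm": 1.0, "in": 2.54}


def convert(value, src: AttributeSpec, dst: AttributeSpec):
    """Re-express a value stored under `src` in the representation of `dst`."""
    number = float(value)                   # type conflict: e.g. str -> float
    in_cm = number * CM_PER_UNIT[src.unit]  # unit conflict: go via a common unit
    target = in_cm / CM_PER_UNIT[dst.unit]
    target = round(target, dst.decimals)    # precision conflict
    return dst.type_(target)


if __name__ == "__main__":
    inches_as_text = AttributeSpec("length", str, "in", 0)
    cm_two_places = AttributeSpec("length", float, "cm", 2)
    print(convert("10", inches_as_text, cm_two_places))
```

Python's `round` sends exact halfway cases to the even neighbour, so a real integration would also need to agree on a rounding rule; that choice is itself a small instance of the precision conflict.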
A greyer area arises when considering whether the SWISS-PROT keyword `loop' in this entry is the same as the PIR keyword `p-loop', or whether the SWISS-PROT feature key `DISULFID' is equivalent to the term `disulfide' in PIR. It is easier to reconcile two large collections of keywords as an exercise separate from reconciling the data organisation. Sometimes a value in one database corresponds to an attribute or entity in another; for example, some of the feature key values in SWISS-PROT map onto record names in the PIR feature table. If such a value is treated as an attribute, these cases can be classified according to their type of semantic heterogeneity. The boundary between data and metadata is somewhat blurred and can depend on perspective. Tackling this problem is a feature of the RASH schema comparison process.
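The two tasks distinguished above, reconciling vocabularies and handling values that become structure, can be sketched separately in Python. Below, hypothetical correspondence tables hold only the candidate equivalences mentioned in the text (they are not a verified SWISS-PROT/PIR mapping), and a hypothetical `pivot_features` function shows a feature key that is a value on one side becoming a name under which data is stored on the other. Treating the translated term as a PIR record name, and the simplified `(key, start, end)` feature tuples, are assumptions rather than either database's actual format.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Candidate equivalences discussed in the text; a real mapping would be much
# larger and would need curation. Keywords and feature keys are kept apart.
KEYWORD_MAP: Dict[str, str] = {"loop": "p-loop"}
FEATURE_KEY_MAP: Dict[str, str] = {"DISULFID": "disulfide"}


def translate(term: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the PIR term for a SWISS-PROT term, or None if no mapping exists."""
    return mapping.get(term)


def pivot_features(
    features: Iterable[Tuple[str, int, int]],
) -> Dict[str, List[Tuple[int, int]]]:
    """Group (key, start, end) features under translated record names.

    On the SWISS-PROT side the key is data (a value in each row); on the
    PIR side it becomes metadata (the name under which spans are stored).
    Keys without a known translation are skipped for manual review.
    """
    table: Dict[str, List[Tuple[int, int]]] = {}
    for key, start, end in features:
        name = translate(key, FEATURE_KEY_MAP)
        if name is None:
            continue
        table.setdefault(name, []).append((start, end))
    return table


if __name__ == "__main__":
    # Hypothetical positions, for illustration only.
    rows = [("DISULFID", 10, 20), ("DISULFID", 30, 45), ("HELIX", 5, 9)]
    print(pivot_features(rows))
    print(translate("loop", KEYWORD_MAP))
```

Keeping the vocabulary tables separate from the pivot mirrors the point that keyword reconciliation is best treated as its own exercise.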
This classification of the semantic heterogeneities that exist in database schemas is taken from Won Kim, who adapted it from an earlier, purely relational form to one that accommodates an object view. Here, the word entity is used for class, entity and table alike; similarly, attribute is used for field, attribute and method alike. The classification is as follows: