This document suggests a formalisation (using BNF) of the OBO Flat File Format Specification, version 1.2, and a semantics for (part of) the language defined via a translation to the OWL DL abstract syntax (which is a decidable fragment of FOL corresponding to the Description Logic SHOIN).
1 Introduction
This document suggests a formalisation (using BNF) of the OBO Flat
File Format Specification, version 1.2 (which we will refer to
from now on as OBO), and a semantics for (part of) the language
defined via a translation to the OWL DL abstract syntax (which is a
decidable fragment of FOL corresponding to the Description Logic
SHOIN). The objectives are:
- To identify and help to resolve any ambiguities in OBO.
- To clarify/define the semantics of OBO via a mapping to OWL DL.
2 OBO Syntax
This section suggests a BNF style grammar for OBO. This helps
to make the structure a little more precise and allows any ambiguities
and/or misunderstandings to be clarified. It is also allows the grammar
to be automatically checked (e.g., for ambiguities) and a parser to be
generated.
The grammar is presented in the standard BNF notation. Nonterminal
symbols are denoted in bold (e.g., stanza), terminal symbols are written in
single quotes (e.g. 'Term', zero or more
instances of a symbol is denoted with curly braces (e.g., { stanza }), alternative productions are denoted with the
vertical bar (e.g., term-stanza | typedef-stanza), and zero ore one instances
of a symbol are denoted with square brackets (e.g., [ 'is_anonymous: true' ]).
Several simplifications have been made in the following grammar:
details relating to XML syntax have been ignored, and details of some
of the annotation tags are omitted. Angle brackets are used to denote
parts of the specification which are nonterminal, but which are not
further defined here (e.g., < saved-by
>).
The OBO specification did not seem to be completely clear with
respect to some details of syntax and semantics. Some of these have been resolved by communications from the OBO developers, in particular Chris Mungall:
- An OBO file may include zero stanzas (e.g., a file of standard idspaces in obo format).
- In OBO, name is purely documentary (like
RDF labels), and the id is what is used
to refer to a term in other stanzas (although it is strongly encouraged,
and may even be enforced,
that names uniquely identify a class, relation or instance within
an ontology or OBO namespace).
- The sets of Typedef (relationship) IDs, Term IDs and
Instance IDs must be pairwise disjoint.
- The minimum and maximum number of occurrences of each tag type in a given stanza has now been explicated. For example, an OBO Typedef stanza may contain at most one of each of the following tags: range, domain, inverse, transover and comment.
- OBO may adopt an OWL style import semantics in order to avoid potential problems with the existing "append document at parse
time" semantics in case of cyclical
imports statements.
- The not_necessary and inverse_necessary modifiers are likely to be deprecated, with their meaning being encoded in the semantics of relevant relations. Future versions of OBO may allow existential and universally quantified relationships to be distinguished.
- The relationship tag may be banned in Typedef stanzas as its meaning there is not clear.
A few points remain to be clarified:
- The derived modifier could lead to
some strange situations. E.g., there could be two facts such that
either can be derived, but not both. I isn't clear what the status of an
OBO ontology would be if derived modifiers are "inconsistent". (This will be clarified in future versions of OBO.)
- Is there any restriction on mixing of datatypes and classes,
e.g., in intersection and union?
- Are cardinality tags meant to signify a qualified or unqualified restriction?
- Are OBO built-in objects semantically meaningful? They are
potentially problematical (higher order) and don't seem obviously
useful.
- Do the "false" cases of the relation qualifiers (is_symmetric etc.) signify "not necessarily
true" or "necessarily false"?
- What is meant by "If a particular
relationship tag has a "true" trailing qualifier then the inverse
applies at the term level."?
2.1 OBO File Structure
An OBO file consists of a header followed by zero or more stanzas.
OBO-Doc := header
{ stanza }
2.2 OBO Header
The header consists of a number of tag-value pairs, most of which
we will ignore for the time being. Many of these (e.g., <remark>
could clearly be treated as
annotations; others (e.g., <default-namespace>
correspond to parts of
an XML document preamble.
header :=
<format-version>
[ <data-version> ]
[ <date> ]
[ <saved-by> ]
[ <auto-generated-by> ]
[ <subsetdef> ]
{ import }
{ <synonymtypedef> }
{ <idspace> }
[ <default-relationship-id> ]
{ <idmapping> }
[ <remark> ]
import := 'import:' <URL>
stanza := term-stanza | typedef-stanza | instance-stanza
2.3 Term Stanzas
Term stanzas introduce and define the meaning of terms (AKA
concepts, classes and unary predicates).
term-stanza :=
'[Term]'
termid-TVP
'name:'<string>
[ <namespace> ]
{ <alt_id> }
[ <def> ]
[ <comment> ]
{ <subset> }
{ <synonym> }
{ <xref> }
{ isa-TVP }
{ intersection-TVP }
{ union-TVP }
{ disjoint-TVP }
{ relationship-TVP }
[ <is_obsolete> ]
[ <replaced_by> ]
{ <consider> }
termid-TVP :=
'id:' term-id
[ 'is_anonymous: true' ]
term-id := <string>
isa-TVP :=
'is_a:' term-id
[ 'namespace=' <namespace-id> ]
[ 'derived=true' | 'derived=false' ]
intersection-TVP :=
'intersection_of:' termOrRestr
[ 'namespace=' <namespace-id> ]
termOrRestr := term-id | restriction
restriction := relationship-id term-id
relationship-id := <string>
union-TVP :=
'union_of:' termOrRestr
[ 'namespace=' <namespace-id> ]
disjoint-TVP :=
'disjoint_from:' term-id
[ 'namespace=' <namespace-id> ]
[ 'derived=true' | 'derived=false' ]
relationship-TVP :=
'relationship:' restriction
[ 'not_necessary=true' | 'not_necessary=false' ]
[ 'inverse_necessary=true' | 'inverse_necessary=false' ]
[ 'cardinality=' <non-neg-int> ]
[ 'maxCardinality=' <non-neg-int> ]
[ 'minCardinality=' <non-neg-int> ]
2.4 Typedef Stanzas
Typedef stanzas introduce and define the meaning of
relations (AKA roles, properties and binary predicates).
typedef-stanza :=
'[Typedef]'
typedef-TVP
'name:'<string>
[ <namespace> ]
{ <alt_id> }
[ <def> ]
[ <comment> ]
{ <subset> }
{ <synonym> }
{ <xref> }
[ domain-TVP ]
[ range-TVP ]
{ meta-property-TVP }
{ r-isa-TVP }
[ inverse-TVP ]
[ transover-TVP ]
{ relationship-TVP }
[ 'is_metadata_tag:true' | 'is_metadata_tag:false' ]
[ <is_obsolete> ]
[ <replaced_by> ]
{ <consider> }
typedefid-TVP :=
'id:' relationship-id
[ 'is_anonymous: true' ]
domain-TVP := 'domain:' termOrReserved
termOrReserved := term-id | <reserved-id>
range-TVP := 'range:' termOrReserved
meta-property-TVP :=
'is_anti_symmetric:true' | 'is_anti_symmetric:false' |
'is_cyclic:true' | 'is_cyclic:false' |
'is_reflexive:true' | 'is_reflexive:false' |
'is_symmetric:true' | 'is_symmetric:false' |
'is_transitive:true' | 'is_transitive:false'
r-isa-TVP := 'is_a:' relationship-id [ isa-mlist ]
isa-mlist := '{' isa-modifier { ',' isa-modifier } '}'
isa-modifier := namespace-mod | derived-mod
namespace-mod := 'namespace=' namespace-id
derived-mod := 'derived=true' | 'derived=false'
inverse-TVP := 'inverse:' relationship-id
transover-TVP := 'transitive_over:' relationship-id
2.5 Instance Stanzas
Instance stanzas introduce and define the meaning of
instances (AKA individuals, individual names, constants).
instance-stanza :=
'[Instance]'
instanceid-TVP
'name:'<string>
[ <namespace>]
{ <alt_id> }
[ <comment> ]
{ <synonym> }
{ <xref> }
'instance_of:' term-id { 'instance_of:' term-id }
{ p-obj-value-TVP | p-data-value-TVP }
[ <is_obsolete> ]
[ <replaced_by> ]
{ <consider> }
instanceid-TVP :=
'id:' instance-id
[ 'is_anonymous: true' ]
p-obj-value-TVP := 'property_value:' relationship-id instance-id
p-data-value-TVP := 'property_value:' relationship-id '"' <string> '"' <XML-Schema-datatype>
3 OBO Semantics
This section defines the semantics of (a large subset of) OBO via
mappings to OWL DL and OWL 1.1. The mappings could also be used to specify a
translation procedure and/or an interface to OWL tools (such as OWL
reasoners).
3.1 Mapping to OWL DL
A number of simplifying assumptions are made in order to improve
readability:
- Syntax related declarations (namespace, etc.) and "annotations" (comment, etc.) are largely ignored.
- OBO identifiers/names are translated "as is".
- Only cardinality relationship modifiers are considered, and these are treaded as unqualified (although they could be translated into qualified cardinality restrictions in OWL 1.1).
- Reflexive, antisymmetric and transitive_over relation qualifying tags are treated as annotations in OWL (although they could be translated into the relevant property axioms in OWL 1.1).
- The cyclic relation qualifying tag is treated as an annotation in OWL.
- The "false" cases of relation qualifying tags are ignored (although they could be translated into OWL "facts" that assert counter-examples, if this is deemed to be appropriate).
- It is assumed that relations will be translated into OWL datatype properties if either the Typedef stanza specifies a range that is an XML datatype or if some super relation was translated into a datatype property.
- Relationships in Typedef stanzas are ignored due to uncertainty as to the intended semantics.
The translation is defined using a translation function T which translates (a fragment of) OBO into OWL
DL. The definition of T is often recursive,
but it will eventually "ground out" in (a fragment of) OWL DL.
Translation to OWL
OBO syntax (fragment) - S |
Translation - T(S) |
header stanza_1 ... stanza_n |
T(header) T(stanza_1) ... T(stanza_n) |
| |
[Term]
id:term-id
name:name-string
isa:isa-term_1
:
isa:isa-term_i
intersection_of:int-termOrRestr_1
:
intersection_of:int-termOrRestr_j
union-of:union-termOrRestr_1
:
union-of:union-termOrRestr_k
disjoint_from:disjoint-term_1
:
disjoint_from:disjoint-term_m
relationship-TVP_1
:
relationship-TVP_n |
Class(term-id partial
annotation(label name-string)
isa-term_1
:
isa-term_i
EquivalentClasses(term-id
intersectionOf(T(intersection_of:int-termOrRestr_1)
... T(intersection_of:int-termOrRestr_j)))
EquivalentClasses(term-id
unionOf(T(union-of:union-termOrRestr_1)
...T(union-of:union-termOrRestr_k)))
DisjointClasses(term-id disjoint-term_1)
:
DisjointClasses(term-id disjoint-term_n)
SubClassOf(term-id T(relationship-TVP_1))
:
SubClassOf(term-id T(relationship-TVP_n))
|
intersection_of:term-id | term-id |
intersection_of:relationship-id term-id |
restriction(relationship-id someValuesFrom(term-id)) |
union_of:term-id |
term-id |
union_of:relationship-id term-id |
restriction(relationship-id someValuesFrom(term-id)) |
relationship: relationship-id term-id |
restriction(relationship-id someValuesFrom(term-id)) |
relationship: relationship-id term-id
cardinality=card |
restriction(relationship-id cardinality(card)) |
relationship: relationship-id term-id
maxCardinality=max |
restriction(relationship-id maxCardinality(card)) |
relationship: relationship-id term-id
minCardinality=min |
restriction(relationship-id minCardinality(card)) |
| |
[Typedef]
id:relationship-id
name:name-string
domain-TVP
range-TVP
meta-property-TVP_1 ... meta-property-TVP_k
r-isa-TVP_1 ... r-isa-TVP_m
inverse-TVP
transover-TVP
|
ObjectProperty(relationship-id annotation(label name-string)
T(transover-TVP)
T(r-isa-TVP_1) ... T(r-isa-TVP_m)
T(meta-property-TVP_1) ... T(meta-property-TVP_k)
T(inverse-TVP)
T(domain-TVP)
T(range-TVP)
|
transitive_over:relationship-id |
annotation(transitive_over:relationship-id) |
r-isa:relationship-id |
super(relationship-id) |
is_anti_symmetric:true |
annotation(is_anti_symmetric:true) |
is_anti_symmetric:false |
annotation(is_anti_symmetric:false) |
is_cyclic:true |
annotation(is_cyclic:true) |
is_cyclic:false |
annotation(is_cyclic:false) |
is_reflexive:true |
annotation(is_reflexive:true) |
is_reflexive:false |
annotation(is_reflexive:false) |
is_symmetric:true |
Symmetric |
is_symmetric:false |
annotation(is_symmetric:false) |
is_transitive:true |
Transitive |
is_transitive:false |
annotation(is_transitive:false) |
inverse:relationship-id |
inverseOf(relationship-id) |
domain:term-id |
domain(term-id) |
range:term-id |
range(term-id) |
| |
[Instance]
id:instance-id
name:name-string
instance_of_1 ... instance_of_i
p-obj-value-TVP_1 ... p-obj-value-TVP_j
p-data-value-TVP_1 ... p-data-value-TVP_k |
Individual(instance-id annotation(label name-string)
T(instance_of_1) ... T(instance_of_i)
T(p-obj-value-TVP_1) ... T(p-obj-value-TVP_j)
T(p-data-value-TVP_1) ... T(p-data-value-TVP_k))
|
instance_of:term-id |
type(term-id) |
property_value:relationship-id instance-id |
value(relationship-id instance-id) |
property_value:relationship-id "val" datatype-id |
value(relationship-id val^^datatype-id) |
3.2 Mapping to OWL 1.1
This section defines the semantics of (an even larger subset of) OBO via a
mapping to OWL 1.1.
In addition to the simplifying assumptions mentioned above, it should be noted that:
- Annotations are added using declarations, because they annotate entities (classes, properties and individuals) rather than the axioms used to make assertions about entities (OWL 1.1 allows for both kinds of annotation).
- Qualified cardinality restrictions are handled correctly in OWL 1.1.
- According to the translation, OBO relations are translated as OWL object properties; it is assumed that, in practice, OBO relations would be translated into OWL datatype properties if either the Typedef stanza specifies a range that is an XML datatype or if some super relation was translated into a datatype property.
- Reflexive, antisymmetric and transitive_over relation qualifying tags are handled correctly in OWL 1.1.
Translation to OWL 1.1
OBO syntax (fragment) - S |
Translation - T(S) |
header stanza_1 ... stanza_n |
T(header) T(stanza_1) ... T(stanza_n) |
| |
[Term]
id:term-id
name:name-string
isa:isa-term_1
:
isa:isa-term_i
intersection_of:int-termOrRestr_1
:
intersection_of:int-termOrRestr_j
union-of:union-termOrRestr_1
:
union-of:union-termOrRestr_k
disjoint_from:disjoint-term_1
:
disjoint_from:disjoint-term_m
relationship-TVP_1
:
relationship-TVP_n |
Declaration(OWLClass(term-id))
EntityAnnotation(OWLClass(term-id) Label(name-string))
SubClassOf(term-id isa-term_1)
:
SubClassOf(term-id isa-term_i)
EquivalentClasses(term-id
ObjectIntersectionOf(T(intersection_of:int-termOrRestr_1)
... T(intersection_of:int-termOrRestr_j)))
EquivalentClasses(term-id
ObjectUnionOf(T(union-of:union-termOrRestr_1)
... T(union-of:union-termOrRestr_k)))
DisjointClasses(term-id disjoint-term_1)
:
DisjointClasses(term-id disjoint-term_m)
SubClassOf(term-id T(relationship-TVP_1))
:
SubClassOf(term-id T(relationship-TVP_n))
|
intersection_of:term-id | term-id |
intersection_of:relationship-id term-id |
ObjectSomeValuesFrom(relationship-id term-id) |
union_of:term-id |
term-id |
union_of:relationship-id term-id |
ObjectSomeValuesFrom(relationship-id term-id) |
relationship:relationship-id term-id |
ObjectSomeValuesFrom(relationship-id term-id) |
relationship:relationship-id term-id
cardinality=card |
ObjectExactCardinality(card relationship-id term-id) |
relationship:relationship-id term-id
maxCardinality=max |
ObjectMaxCardinality(max relationship-id term-id) |
relationship:relationship-id term-id
minCardinality=min |
ObjectMinCardinality(min relationship-id term-id) |
| |
[Typedef]
id:relationship-id
name:name-string
domain:domain-id
range:range-id
meta-property-TVP_1
:
meta-property-TVP_k
r-isa:isa-id_1
:
r-isa:isa-id_m
inverse:inv-id
transitive_over:tr-over-id
|
Declaration(ObjectProperty(relationship-id))
EntityAnnotation(ObjectProperty(relationship-id) Label(name-string))
ObjectPropertyDomain(relationship-id domain-id)
ObjectPropertyRange(relationship-id range-id)
T(relationship-id meta-property-TVP_1)
:
T(relationship-id meta-property-TVP_k)
SubObjectPropertyOf(relationship-id isa-id_1)
:
SubObjectPropertyOf(relationship-id isa-id_m)
InverseObjectProperties(relationship-id inv-id)
SubObjectPropertyOf(
SubObjectPropertyChain(relationship-id tr-over-id)
relationship-id)
|
relationship-id is_anti_symmetric:true |
AntisymmetricObjectProperty(relationship-id) |
relationship-id is_anti_symmetric:false |
Declaration(ObjectProperty(relationship-id)
comment(is_anti_symmetric:false)) |
relationship-id is_cyclic:true |
Declaration(ObjectProperty(relationship-id)
comment(is_cyclic:true)) |
relationship-id is_cyclic:false |
Declaration(ObjectProperty(relationship-id)
comment(is_cyclic:false)) |
relationship-id is_reflexive:true |
ReflexiveObjectProperty(relationship-id) |
relationship-id is_reflexive:false |
Declaration(ObjectProperty(relationship-id)
comment(is_reflexive:false)) |
relationship-id is_symmetric:true |
SymmetricObjectProperty(relationship-id) |
relationship-id is_symmetric:false |
Declaration(ObjectProperty(relationship-id)
comment(is_symmetric:false)) |
relationship-id is_transitive:true |
TransitiveObjectProperty(relationship-id) |
relationship-id is_transitive:false |
Declaration(ObjectProperty(relationship-id)
comment(is_transitive:false)) |
| |
[Instance]
id:instance-id
name:name-string
instance_of_1
:
instance_of_i
property_value:obj-r_1 instance-id_1
:
property_value:obj-r_j instance-id_j
property_value:data-r_1 "val_1" data-t_1
:
property_value:data-r_k "val_k" data-t_k |
Declaration(Individual(instance-id))
EntitiyAnnotation(Individual(instance-id) Label(name-string)))
ClassAssertion(instance-id instance_of_1)
:
ClassAssertion(instance-id instance_of_i)
ObjectPropertyAssertion(obj-r_1 instance-id instance-id_1)
:
ObjectPropertyAssertion(obj-r_j instance-id instance-id_j)
DataPropertyAssertion(data-r_1 instance-id val_1^^data-t_1)
:
DataPropertyAssertion(data-r_k instance-id val_k^^data-t_k)
|