This document suggests a formalisation (using BNF) of the OBO Flat File Format Specification, version 1.2, and a semantics for (part of) the language defined via a translation to the OWL DL abstract syntax (which is a decidable fragment of FOL corresponding to the Description Logic SHOIN).
This is an editor's draft, for comment by the community.
Comments should be sent to obo-format@lists.sourceforge.net (archive) to ensure wide visibility.
This document suggests a formalisation (using BNF) of the OBO Flat File Format Specification, version 1.2 (which we will refer to from now on as OBO), and a semantics for (part of) the language defined via a translation to the OWL DL abstract syntax (which is a decidable fragment of FOL corresponding to the Description Logic SHOIN). The objectives are:
This section suggests a BNF-style grammar for OBO. This makes the structure more precise and allows ambiguities and misunderstandings to be clarified. It also allows the grammar to be checked automatically (e.g., for ambiguities) and a parser to be generated.
The grammar is presented in standard BNF notation. Nonterminal symbols are denoted in bold (e.g., stanza); terminal symbols are written in single quotes (e.g., 'Term'); zero or more instances of a symbol are denoted with curly braces (e.g., { stanza }); alternative productions are denoted with the vertical bar (e.g., term-stanza | typedef-stanza); and zero or one instances of a symbol are denoted with square brackets (e.g., [ 'is_anonymous: true' ]).
Several simplifications have been made in the following grammar: details relating to XML syntax have been ignored, and details of some of the annotation tags are omitted. Angle brackets are used to denote parts of the specification which are nonterminal, but which are not further defined here (e.g., <saved-by>).
The OBO specification did not seem to be completely clear with respect to some details of syntax and semantics. Some of these have been resolved by communications from the OBO developers, in particular Chris Mungall:
An OBO file consists of a header followed by zero or more stanzas.
OBO-Doc := header { stanza }
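The top-level production can be illustrated with a minimal Python sketch that separates the header lines from the stanzas. The function name split_obo and the returned data layout are hypothetical, chosen only for illustration; the sketch assumes one tag-value pair per line and simply skips blank lines.

```python
import re

# A stanza starts with one of the three bracketed keywords from the grammar.
STANZA_HEADER = re.compile(r"^\[(Term|Typedef|Instance)\]$")


def split_obo(text):
    """Split OBO text into (header_lines, stanzas).

    Each stanza is a pair (kind, lines), where kind is 'Term', 'Typedef'
    or 'Instance' and lines holds the stanza's tag-value lines.
    """
    header, stanzas = [], []
    current = None  # lines of the stanza being read; None while in the header
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no meaning
        match = STANZA_HEADER.match(line)
        if match:
            current = []
            stanzas.append((match.group(1), current))
        elif current is None:
            header.append(line)
        else:
            current.append(line)
    return header, stanzas
```

Everything before the first bracketed keyword is treated as the header, which mirrors the shape header { stanza } without validating the individual tags.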
The header consists of a number of tag-value pairs, most of which we will ignore for the time being. Many of these (e.g., <remark>) could clearly be treated as annotations; others (e.g., <default-namespace>) correspond to parts of an XML document preamble.
header :=
<format-version>
[ <data-version> ]
[ <date> ]
[ <saved-by> ]
[ <auto-generated-by> ]
[ <subsetdef> ]
{ import }
{ <synonymtypedef> }
{ <idspace> }
[ <default-relationship-id> ]
{ <idmapping> }
[ <remark> ]
import := 'import:' <URL>
stanza := term-stanza | typedef-stanza | instance-stanza
Term stanzas introduce and define the meaning of terms (AKA concepts, classes and unary predicates).
term-stanza :=
'[Term]'
termid-TVP
'name:' <string>
[ <namespace> ]
{ <alt_id> }
[ <def> ]
[ <comment> ]
{ <subset> }
{ <synonym> }
{ <xref> }
{ isa-TVP }
{ intersection-TVP }
{ union-TVP }
{ disjoint-TVP }
{ relationship-TVP }
[ <is_obsolete> ]
[ <replaced_by> ]
{ <consider> }
termid-TVP :=
'id:' term-id
[ 'is_anonymous: true' ]
term-id := <string>
isa-TVP :=
'is_a:' term-id
[ 'namespace=' <namespace-id> ]
[ 'derived=true' | 'derived=false' ]
intersection-TVP :=
'intersection_of:' termOrRestr
[ 'namespace=' <namespace-id> ]
termOrRestr := term-id | restriction
restriction := relationship-id term-id
relationship-id := <string>
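As an illustration of the termOrRestr production, the following Python sketch classifies the value of an intersection_of or union_of tag. The function name parse_term_or_restr and the tuple layout are hypothetical. The sketch assumes that identifiers contain no whitespace (the grammar only says <string>) and that any trailing modifiers have already been removed.

```python
def parse_term_or_restr(value):
    """Classify a tag value as a bare term-id or a restriction.

    Returns ('term', term_id) or ('restriction', relationship_id, term_id).
    """
    tokens = value.split()
    if len(tokens) == 1:
        # termOrRestr := term-id
        return ("term", tokens[0])
    if len(tokens) == 2:
        # restriction := relationship-id term-id
        return ("restriction", tokens[0], tokens[1])
    raise ValueError(f"not a term-id or restriction: {value!r}")
```

Under these assumptions a single token is read as a term-id and a pair of tokens as a restriction; any other shape is rejected.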
union-TVP :=
'union_of:' termOrRestr
[ 'namespace=' <namespace-id> ]
disjoint-TVP :=
'disjoint_from:' term-id
[ 'namespace=' <namespace-id> ]
[ 'derived=true' | 'derived=false' ]
relationship-TVP :=
'relationship:' restriction
[ 'not_necessary=true' | 'not_necessary=false' ]
[ 'inverse_necessary=true' | 'inverse_necessary=false' ]
[ 'cardinality=' <non-neg-int> ]
[ 'maxCardinality=' <non-neg-int> ]
[ 'minCardinality=' <non-neg-int> ]
Typedef stanzas introduce and define the meaning of relations (AKA roles, properties and binary predicates).
typedef-stanza :=
'[Typedef]'
typedefid-TVP
'name:' <string>
[ <namespace> ]
{ <alt_id> }
[ <def> ]
[ <comment> ]
{ <subset> }
{ <synonym> }
{ <xref> }
[ domain-TVP ]
[ range-TVP ]
{ meta-property-TVP }
{ r-isa-TVP }
{ r-intersection-TVP }
[ inverse-TVP ]
[ transover-TVP ]
{ relationship-TVP }
[ 'is_metadata_tag:true' | 'is_metadata_tag:false' ]
[ <is_obsolete> ]
[ <replaced_by> ]
{ <consider> }
typedefid-TVP :=
'id:' relationship-id
[ 'is_anonymous: true' ]
domain-TVP := 'domain:' termOrReserved
termOrReserved := term-id | <reserved-id>
range-TVP := 'range:' termOrReserved
meta-property-TVP :=
'is_anti_symmetric:true' | 'is_anti_symmetric:false' |
'is_cyclic:true' | 'is_cyclic:false' |
'is_reflexive:true' | 'is_reflexive:false' |
'is_symmetric:true' | 'is_symmetric:false' |
'is_transitive:true' | 'is_transitive:false'
r-isa-TVP := 'is_a:' relationship-id [ isa-mlist ]
isa-mlist := '{' isa-modifier { ',' isa-modifier } '}'
isa-modifier := namespace-mod | derived-mod
namespace-mod := 'namespace=' <namespace-id>
derived-mod := 'derived=true' | 'derived=false'
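The isa-mlist production can be sketched in Python as follows. The function name parse_isa_mlist and the dictionary result are hypothetical; the sketch assumes that namespace identifiers contain neither commas nor braces.

```python
def parse_isa_mlist(text):
    """Parse a modifier list such as '{namespace=ns1, derived=true}'.

    Returns a dict with optional keys 'namespace' (str) and 'derived' (bool).
    """
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError(f"modifier list must be enclosed in braces: {text!r}")
    modifiers = {}
    # isa-mlist := '{' isa-modifier { ',' isa-modifier } '}'
    for item in text[1:-1].split(","):
        key, sep, value = item.strip().partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        if key == "namespace":
            modifiers["namespace"] = value
        elif key == "derived" and value in ("true", "false"):
            modifiers["derived"] = value == "true"
        else:
            raise ValueError(f"unknown modifier: {item!r}")
    return modifiers
```

An empty list '{}' is rejected, since the production requires at least one isa-modifier between the braces.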
r-intersection-TVP :=
'intersection_of:' relationship-id
[ 'namespace=' <namespace-id> ]
inverse-TVP := 'inverse:' relationship-id
transover-TVP := 'transitive_over:' relationship-id
Instance stanzas introduce and define the meaning of instances (AKA individuals, individual names, constants).
instance-stanza :=
'[Instance]'
instanceid-TVP
'name:' <string>
[ <namespace> ]
{ <alt_id> }
[ <comment> ]
{ <synonym> }
{ <xref> }
'instance_of:' term-id { 'instance_of:' term-id }
{ p-obj-value-TVP | p-data-value-TVP }
[ <is_obsolete> ]
[ <replaced_by> ]
{ <consider> }
instanceid-TVP :=
'id:' instance-id
[ 'is_anonymous: true' ]
p-obj-value-TVP := 'property_value:' relationship-id instance-id
p-data-value-TVP := 'property_value:' relationship-id '"' <string> '"' <XML-Schema-datatype>
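The two property_value productions differ only in what follows the relationship-id: an instance-id, or a quoted string followed by a datatype. A minimal Python sketch of that distinction is given below. The function name parse_property_value, the tuple layout, and the identifiers in the usage example are hypothetical; the sketch takes the text after the 'property_value:' tag and assumes that identifiers contain no whitespace.

```python
import re

# relationship-id '"' <string> '"' <XML-Schema-datatype>
DATA_VALUE = re.compile(r'^(\S+)\s+"(.*)"\s+(\S+)$')
# relationship-id instance-id
OBJ_VALUE = re.compile(r"^(\S+)\s+(\S+)$")


def parse_property_value(value):
    """Return ('data', rel, string, datatype) or ('object', rel, instance_id)."""
    value = value.strip()
    match = DATA_VALUE.match(value)
    if match:
        return ("data", match.group(1), match.group(2), match.group(3))
    match = OBJ_VALUE.match(value)
    if match:
        return ("object", match.group(1), match.group(2))
    raise ValueError(f"not a property value: {value!r}")


# Usage with made-up identifiers:
print(parse_property_value('has_age "42" xsd:integer'))
print(parse_property_value("married_to john"))
```

The data-value pattern is tried first because it is the more specific of the two.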
This section defines the semantics of (a large subset of) OBO via a mapping to OWL DL. The mapping could also be used to specify a translation procedure and/or an interface to OWL tools (such as OWL reasoners).
A number of simplifying assumptions are made in order to improve readability:
The translation is defined using a translation function T which translates (a fragment of) OBO into OWL DL. The definition of T is often recursive, but it will eventually "ground out" in (a fragment of) OWL DL.
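The recursive shape of such a function can be sketched in Python. This is an illustration only: the tuple encoding of OBO constructs is hypothetical, and the choice of someValuesFrom for restrictions is an assumption made for the example, not a rule taken from this section. The output strings follow the OWL abstract syntax forms intersectionOf(...) and restriction(... someValuesFrom(...)).

```python
def T(node):
    """Translate a small, tuple-encoded OBO fragment into OWL abstract syntax.

    Supported encodings (hypothetical):
      ('term', term_id)
      ('restriction', relationship_id, term_id)
      ('intersection', [node, ...])
    """
    kind = node[0]
    if kind == "term":
        # Base case: the recursion "grounds out" in a class identifier.
        return node[1]
    if kind == "restriction":
        _, relationship_id, term_id = node
        # Assumption: a restriction is read existentially.
        filler = T(("term", term_id))
        return f"restriction({relationship_id} someValuesFrom({filler}))"
    if kind == "intersection":
        # Recursive case: translate each operand, then combine the results.
        return "intersectionOf(" + " ".join(T(child) for child in node[1]) + ")"
    raise ValueError(f"unsupported construct: {kind!r}")
```

Each recursive call works on a strictly smaller construct, so the translation terminates once only identifiers remain.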