OBO Flat File Format Syntax and Semantics
and Mapping to OWL Web Ontology Language

Editor's Draft of 04 March 2007

This version:
http://www.cs.man.ac.uk/~horrocks/obo/syntax.html
Latest version:
http://purl.obolibrary.org/obo/oboformat/spec.html
Previous version:
http://www.cs.man.ac.uk/~horrocks/obo/syntax-20070126.html
Author:
http://www.cs.man.ac.uk/~horrocks, University of Manchester

Abstract

This document suggests a formalisation (using BNF) of the OBO Flat File Format Specification, version 1.2, and a semantics for (part of) the language defined via a translation to the OWL DL abstract syntax (which is a decidable fragment of FOL corresponding to the Description Logic SHOIN).

Status of this Document

This is an editor's draft, for comment by the community.

Comments should be sent to obo-format@lists.sourceforge.net (archive) to ensure wide visibility.


Table of Contents


1 Introduction

This document suggests a formalisation (using BNF) of the OBO Flat File Format Specification, version 1.2 (which we will refer to from now on as OBO), and a semantics for (part of) the language defined via a translation to the OWL DL abstract syntax (which is a decidable fragment of FOL corresponding to the Description Logic SHOIN). The objectives are:

2 OBO Syntax

This section suggests a BNF style grammar for OBO. This helps to make the structure a little more precise and allows any ambiguities and/or misunderstandings to be clarified. It is also allows the grammar to be automatically checked (e.g., for ambiguities) and a parser to be generated.

The grammar is presented in the standard BNF notation. Nonterminal symbols are denoted in bold (e.g., stanza), terminal symbols are written in single quotes (e.g. 'Term', zero or more instances of a symbol is denoted with curly braces (e.g., { stanza }), alternative productions are denoted with the vertical bar (e.g., term-stanza | typedef-stanza), and zero ore one instances of a symbol are denoted with square brackets (e.g., [ 'is_anonymous: true' ]).

Several simplifications have been made in the following grammar: details relating to XML syntax have been ignored, and details of some of the annotation tags are omitted. Angle brackets are used to denote parts of the specification which are nonterminal, but which are not further defined here (e.g., < saved-by >).

The OBO specification did not seem to be completely clear with respect to some details of syntax and semantics. Some of these have been resolved by communications from the OBO developers, in particular Chris Mungall:

A few points remain to be clarified:

2.1 OBO File Structure

An OBO file consists of a header followed by zero or more stanzas.

OBO-Doc := header { stanza }

2.2 OBO Header

The header consists of a number of tag-value pairs, most of which we will ignore for the time being. Many of these (e.g., <remark> could clearly be treated as annotations; others (e.g., <default-namespace> correspond to parts of an XML document preamble.

header :=
   <format-version>
   [ <data-version> ]
   [ <date> ]
   [ <saved-by> ]
   [ <auto-generated-by> ]
   [ <subsetdef> ]
   { import }
   { <synonymtypedef> }
   { <idspace> }
   [ <default-relationship-id> ]
   { <idmapping> }
   [ <remark> ]
import := 'import:' <URL>
stanza := term-stanza | typedef-stanza | instance-stanza

2.3 Term Stanzas

Term stanzas introduce and define the meaning of terms (AKA concepts, classes and unary predicates).

term-stanza :=
   '[Term]'
   termid-TVP
   'name:'<string>
   [ <namespace> ]
   { <alt_id> }
   [ <def> ]
   [ <comment> ]
   { <subset> }
   { <synonym> }
   { <xref> }
   { isa-TVP }
   { intersection-TVP }
   { union-TVP }
   { disjoint-TVP }
   { relationship-TVP }
   [ <is_obsolete> ]
   [ <replaced_by> ]
   { <consider> }

termid-TVP :=
   'id:' term-id
   [ 'is_anonymous: true' ]
term-id := <string>

isa-TVP :=
   'is_a:' term-id
   [ 'namespace=' <namespace-id> ]
   [ 'derived=true' | 'derived=false' ]

intersection-TVP :=
   'intersection_of:' termOrRestr
   [ 'namespace=' <namespace-id> ]
termOrRestr := term-id | restriction
restriction := relationship-id term-id
relationship-id := <string>

union-TVP :=
   'union_of:' termOrRestr
   [ 'namespace=' <namespace-id> ]

disjoint-TVP :=
   'disjoint_from:' term-id
   [ 'namespace=' <namespace-id> ]
   [ 'derived=true' | 'derived=false' ]

relationship-TVP :=
   'relationship:' restriction
   [ 'not_necessary=true' | 'not_necessary=false' ]
   [ 'inverse_necessary=true' | 'inverse_necessary=false' ]
   [ 'cardinality=' <non-neg-int> ]
   [ 'maxCardinality=' <non-neg-int> ]
   [ 'minCardinality=' <non-neg-int> ]

2.4 Typedef Stanzas

Typedef stanzas introduce and define the meaning of relations (AKA roles, properties and binary predicates).

typedef-stanza :=
   '[Typedef]'
   typedef-TVP
   'name:'<string>
   [ <namespace> ]
   { <alt_id> }
   [ <def> ]
   [ <comment> ]
   { <subset> }
   { <synonym> }
   { <xref> }
   [ domain-TVP ]
   [ range-TVP ]
   { meta-property-TVP }
   { r-isa-TVP }
   [ inverse-TVP ]
   [ transover-TVP ]
   { relationship-TVP }
   [ 'is_metadata_tag:true' | 'is_metadata_tag:false' ]
   [ <is_obsolete> ]
   [ <replaced_by> ]
   { <consider> }

typedefid-TVP :=
   'id:' relationship-id
   [ 'is_anonymous: true' ]

domain-TVP := 'domain:' termOrReserved
termOrReserved := term-id | <reserved-id>

range-TVP := 'range:' termOrReserved

meta-property-TVP :=
   'is_anti_symmetric:true' | 'is_anti_symmetric:false' |
   'is_cyclic:true' | 'is_cyclic:false' |
   'is_reflexive:true' | 'is_reflexive:false' |
   'is_symmetric:true' | 'is_symmetric:false' |
   'is_transitive:true' | 'is_transitive:false'

r-isa-TVP := 'is_a:' relationship-id [ isa-mlist ]
isa-mlist := '{' isa-modifier { ',' isa-modifier } '}'
isa-modifier := namespace-mod | derived-mod
namespace-mod := 'namespace=' namespace-id
derived-mod := 'derived=true' | 'derived=false'

inverse-TVP := 'inverse:' relationship-id

transover-TVP := 'transitive_over:' relationship-id

2.5 Instance Stanzas

Instance stanzas introduce and define the meaning of instances (AKA individuals, individual names, constants).

instance-stanza :=
   '[Instance]'
   instanceid-TVP
   'name:'<string>
   [ <namespace>]
   { <alt_id> }
   [ <comment> ]
   { <synonym> }
   { <xref> }
   'instance_of:' term-id { 'instance_of:' term-id }
   { p-obj-value-TVP | p-data-value-TVP }
   [ <is_obsolete> ]
   [ <replaced_by> ]
   { <consider> }

instanceid-TVP :=
   'id:' instance-id
   [ 'is_anonymous: true' ]

p-obj-value-TVP := 'property_value:' relationship-id instance-id

p-data-value-TVP := 'property_value:' relationship-id '"' <string> '"' <XML-Schema-datatype>

3 OBO Semantics

This section defines the semantics of (a large subset of) OBO via mappings to OWL DL and OWL 1.1. The mappings could also be used to specify a translation procedure and/or an interface to OWL tools (such as OWL reasoners).

3.1 Mapping to OWL DL

A number of simplifying assumptions are made in order to improve readability:

The translation is defined using a translation function T which translates (a fragment of) OBO into OWL DL. The definition of T is often recursive, but it will eventually "ground out" in (a fragment of) OWL DL.

Translation to OWL
OBO syntax (fragment) - S Translation - T(S)
header stanza_1 ... stanza_n T(header) T(stanza_1) ... T(stanza_n)
[Term]
      id:term-id
      name:name-string
      isa:isa-term_1
             :
      isa:isa-term_i
      intersection_of:int-termOrRestr_1
             :
      intersection_of:int-termOrRestr_j
      union-of:union-termOrRestr_1
             :
      union-of:union-termOrRestr_k
      disjoint_from:disjoint-term_1
             :
      disjoint_from:disjoint-term_m
      relationship-TVP_1
             :
      relationship-TVP_n
Class(term-id partial
      annotation(label name-string)

      isa-term_1
             :
      isa-term_i
EquivalentClasses(term-id
      intersectionOf(T(intersection_of:int-termOrRestr_1)
             ... T(intersection_of:int-termOrRestr_j)))
EquivalentClasses(term-id
      unionOf(T(union-of:union-termOrRestr_1)
             ...T(union-of:union-termOrRestr_k)))
DisjointClasses(term-id disjoint-term_1)
      :
DisjointClasses(term-id disjoint-term_n)
SubClassOf(term-id T(relationship-TVP_1))
      :
SubClassOf(term-id T(relationship-TVP_n))
intersection_of:term-idterm-id
intersection_of:relationship-id term-id restriction(relationship-id someValuesFrom(term-id))
union_of:term-id term-id
union_of:relationship-id term-id restriction(relationship-id someValuesFrom(term-id))
relationship: relationship-id term-id restriction(relationship-id someValuesFrom(term-id))
relationship: relationship-id term-id
      cardinality=card
restriction(relationship-id cardinality(card))
relationship: relationship-id term-id
      maxCardinality=max
restriction(relationship-id maxCardinality(card))
relationship: relationship-id term-id
      minCardinality=min
restriction(relationship-id minCardinality(card))
[Typedef]
      id:relationship-id
      name:name-string
      domain-TVP
      range-TVP
      meta-property-TVP_1 ... meta-property-TVP_k
      r-isa-TVP_1 ... r-isa-TVP_m
      inverse-TVP
      transover-TVP
ObjectProperty(relationship-id annotation(label name-string)
      T(transover-TVP)
      T(r-isa-TVP_1) ... T(r-isa-TVP_m)
      T(meta-property-TVP_1) ... T(meta-property-TVP_k)
      T(inverse-TVP)
      T(domain-TVP)
      T(range-TVP)
transitive_over:relationship-id annotation(transitive_over:relationship-id)
r-isa:relationship-id super(relationship-id)
is_anti_symmetric:true annotation(is_anti_symmetric:true)
is_anti_symmetric:false annotation(is_anti_symmetric:false)
is_cyclic:true annotation(is_cyclic:true)
is_cyclic:false annotation(is_cyclic:false)
is_reflexive:true annotation(is_reflexive:true)
is_reflexive:false annotation(is_reflexive:false)
is_symmetric:true Symmetric
is_symmetric:false annotation(is_symmetric:false)
is_transitive:true Transitive
is_transitive:false annotation(is_transitive:false)
inverse:relationship-id inverseOf(relationship-id)
domain:term-id domain(term-id)
range:term-id range(term-id)
[Instance]
      id:instance-id
      name:name-string
      instance_of_1 ... instance_of_i
      p-obj-value-TVP_1 ... p-obj-value-TVP_j
      p-data-value-TVP_1 ... p-data-value-TVP_k
Individual(instance-id annotation(label name-string)
      T(instance_of_1) ... T(instance_of_i)
      T(p-obj-value-TVP_1) ... T(p-obj-value-TVP_j)
      T(p-data-value-TVP_1) ... T(p-data-value-TVP_k))
instance_of:term-id type(term-id)
property_value:relationship-id instance-id value(relationship-id instance-id)
property_value:relationship-id "val" datatype-id value(relationship-id val^^datatype-id)

3.2 Mapping to OWL 1.1

This section defines the semantics of (an even larger subset of) OBO via a mapping to OWL 1.1.

In addition to the simplifying assumptions mentioned above, it should be noted that:

Translation to OWL 1.1
OBO syntax (fragment) - S Translation - T(S)
header stanza_1 ... stanza_n T(header) T(stanza_1) ... T(stanza_n)
[Term]
      id:term-id
      name:name-string
      isa:isa-term_1
             :
      isa:isa-term_i
      intersection_of:int-termOrRestr_1
             :
      intersection_of:int-termOrRestr_j
      union-of:union-termOrRestr_1
             :
      union-of:union-termOrRestr_k
      disjoint_from:disjoint-term_1
             :
      disjoint_from:disjoint-term_m
      relationship-TVP_1
             :
      relationship-TVP_n
Declaration(OWLClass(term-id))

EntityAnnotation(OWLClass(term-id) Label(name-string))
SubClassOf(term-id isa-term_1)
      :
SubClassOf(term-id isa-term_i)
EquivalentClasses(term-id
      ObjectIntersectionOf(T(intersection_of:int-termOrRestr_1)
             ... T(intersection_of:int-termOrRestr_j)))
EquivalentClasses(term-id
      ObjectUnionOf(T(union-of:union-termOrRestr_1)
             ... T(union-of:union-termOrRestr_k)))
DisjointClasses(term-id disjoint-term_1)
      :
DisjointClasses(term-id disjoint-term_m)
SubClassOf(term-id T(relationship-TVP_1))
      :
SubClassOf(term-id T(relationship-TVP_n))
intersection_of:term-idterm-id
intersection_of:relationship-id term-id ObjectSomeValuesFrom(relationship-id term-id)
union_of:term-id term-id
union_of:relationship-id term-id ObjectSomeValuesFrom(relationship-id term-id)
relationship:relationship-id term-id ObjectSomeValuesFrom(relationship-id term-id)
relationship:relationship-id term-id
      cardinality=card
ObjectExactCardinality(card relationship-id term-id)
relationship:relationship-id term-id
      maxCardinality=max
ObjectMaxCardinality(max relationship-id term-id)
relationship:relationship-id term-id
      minCardinality=min
ObjectMinCardinality(min relationship-id term-id)
[Typedef]
      id:relationship-id
      name:name-string
      domain:domain-id
      range:range-id
      meta-property-TVP_1
             :
      meta-property-TVP_k
      r-isa:isa-id_1
             :
      r-isa:isa-id_m
      inverse:inv-id
      transitive_over:tr-over-id
Declaration(ObjectProperty(relationship-id))

EntityAnnotation(ObjectProperty(relationship-id) Label(name-string))
ObjectPropertyDomain(relationship-id domain-id)
ObjectPropertyRange(relationship-id range-id)
T(relationship-id meta-property-TVP_1)
      :
T(relationship-id meta-property-TVP_k)
SubObjectPropertyOf(relationship-id isa-id_1)
      :
SubObjectPropertyOf(relationship-id isa-id_m)
InverseObjectProperties(relationship-id inv-id)
SubObjectPropertyOf(
      SubObjectPropertyChain(relationship-id tr-over-id)
      relationship-id)
relationship-id is_anti_symmetric:true AntisymmetricObjectProperty(relationship-id)
relationship-id is_anti_symmetric:false Declaration(ObjectProperty(relationship-id)
      comment(is_anti_symmetric:false))
relationship-id is_cyclic:true Declaration(ObjectProperty(relationship-id)
      comment(is_cyclic:true))
relationship-id is_cyclic:false Declaration(ObjectProperty(relationship-id)
      comment(is_cyclic:false))
relationship-id is_reflexive:true ReflexiveObjectProperty(relationship-id)
relationship-id is_reflexive:false Declaration(ObjectProperty(relationship-id)
      comment(is_reflexive:false))
relationship-id is_symmetric:true SymmetricObjectProperty(relationship-id)
relationship-id is_symmetric:false Declaration(ObjectProperty(relationship-id)
      comment(is_symmetric:false))
relationship-id is_transitive:true TransitiveObjectProperty(relationship-id)
relationship-id is_transitive:false Declaration(ObjectProperty(relationship-id)
      comment(is_transitive:false))
[Instance]
      id:instance-id
      name:name-string
      instance_of_1
             :
      instance_of_i
      property_value:obj-r_1 instance-id_1
             :
      property_value:obj-r_j instance-id_j
      property_value:data-r_1 "val_1" data-t_1
             :
      property_value:data-r_k "val_k" data-t_k
Declaration(Individual(instance-id))

EntitiyAnnotation(Individual(instance-id) Label(name-string)))
ClassAssertion(instance-id instance_of_1)
      :
ClassAssertion(instance-id instance_of_i)
ObjectPropertyAssertion(obj-r_1 instance-id instance-id_1)
      :
ObjectPropertyAssertion(obj-r_j instance-id instance-id_j)
DataPropertyAssertion(data-r_1 instance-id val_1^^data-t_1)
      :
DataPropertyAssertion(data-r_k instance-id val_k^^data-t_k)