OBO Flat File Format Syntax and Semantics
and Mapping to OWL Web Ontology Language

Editor's Draft of 03 November 2006

This version:
http://www.cs.man.ac.uk/~horrocks/obo/syntax.html
Latest version:
http://www.cs.man.ac.uk/~horrocks/obo/syntax.html
Previous version:
http://www.cs.man.ac.uk/~horrocks/obo/syntax-20061103.html
Author:
http://www.cs.man.ac.uk/~horrocks, University of Manchester

Abstract

This document suggests a formalisation (using BNF) of the OBO Flat File Format Specification, version 1.2, and a semantics for (part of) the language defined via a translation to the OWL DL abstract syntax (which is a decidable fragment of FOL corresponding to the Description Logic SHOIN).

Status of this Document

This is an editor's draft, for comment by the community.

Comments should be sent to obo-format@lists.sourceforge.net to ensure wide visibility.


1 Introduction

This document suggests a formalisation (using BNF) of the OBO Flat File Format Specification, version 1.2 (which we will refer to from now on as OBO), and a semantics for (part of) the language defined via a translation to the OWL DL abstract syntax (which is a decidable fragment of FOL corresponding to the Description Logic SHOIN). The objectives are:

2 OBO Syntax

This section suggests a BNF-style grammar for OBO. This makes the structure more precise and allows ambiguities and misunderstandings to be clarified. It also allows the grammar to be checked automatically (e.g., for ambiguities) and a parser to be generated.

The grammar is presented in the standard BNF notation. Nonterminal symbols are denoted in bold (e.g., stanza), terminal symbols are written in single quotes (e.g., 'Term'), zero or more instances of a symbol are denoted with curly braces (e.g., { stanza }), alternative productions are denoted with the vertical bar (e.g., term-stanza | typedef-stanza), and zero or one instances of a symbol are denoted with square brackets (e.g., [ 'is_anonymous: true' ]).

Several simplifications have been made in the following grammar: details relating to XML syntax have been ignored, and details of some of the annotation tags are omitted. Angle brackets are used to denote parts of the specification which are nonterminal, but which are not further defined here (e.g., < saved-by >).

The OBO specification did not seem to be completely clear with respect to some details of syntax and semantics. Some of these have been resolved by communications from the OBO developers, in particular Chris Mungall:

A few points remain to be clarified:

2.1 OBO File Structure

An OBO file consists of a header followed by zero or more stanzas.

OBO-Doc := header { stanza }
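
As an illustration of this top-level structure, the following Python sketch splits the text of an OBO file into header lines and a list of stanzas. It assumes (the grammar above does not say so) that a stanza starts at a line consisting only of a bracketed name such as [Term], and that blank lines act purely as separators. The function name split_obo and the example identifiers are hypothetical.

```python
import re

# Assumption: a stanza begins with a line of the form "[Name]".
STANZA_START = re.compile(r"^\[(\w+)\]\s*$")

def split_obo(text):
    """Return (header_lines, stanzas); each stanza is (kind, lines)."""
    header, stanzas, current = [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are treated as separators only
        match = STANZA_START.match(line)
        if match:
            current = (match.group(1), [])
            stanzas.append(current)
        elif current is None:
            header.append(line)  # everything before the first stanza
        else:
            current[1].append(line)
    return header, stanzas

example = """format-version: 1.2
import: http://example.org/other.obo

[Term]
id: EX:0000001
name: example term

[Typedef]
id: part_of
name: part of
"""
header, stanzas = split_obo(example)
```

Keeping the stanza kind alongside its lines makes it straightforward to dispatch later to a term, typedef or instance handler.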

2.2 OBO Header

The header consists of a number of tag-value pairs, most of which we will ignore for the time being. Many of these (e.g., <remark>) could clearly be treated as annotations; others (e.g., <default-namespace>) correspond to parts of an XML document preamble.

header :=
   <format-version>
   [ <data-version> ]
   [ <date> ]
   [ <saved-by> ]
   [ <auto-generated-by> ]
   [ <subsetdef> ]
   { import }
   { <synonymtypedef> }
   { <idspace> }
   [ <default-relationship-id> ]
   { <idmapping> }
   [ <remark> ]
import := 'import:' <URL>
stanza := term-stanza | typedef-stanza | instance-stanza
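
To make the tag-value structure of the header concrete, here is a small Python sketch that collects header lines into a mapping from tag to values and then reads off the import URLs. Splitting each line at its first colon is an assumption of this sketch, and the name parse_header and the example values are hypothetical.

```python
from collections import defaultdict

def parse_header(lines):
    """Map each header tag to the list of values given for it."""
    tags = defaultdict(list)
    for line in lines:
        # Assumption: the tag ends at the first colon; the value may
        # itself contain colons (e.g. a URL), so split only once.
        tag, _, value = line.partition(":")
        tags[tag.strip()].append(value.strip())
    return dict(tags)

header = parse_header([
    "format-version: 1.2",
    "saved-by: someone",
    "import: http://example.org/a.obo",
    "import: http://example.org/b.obo",
])
imports = header.get("import", [])
```

Values are kept as lists because several header tags, such as import, may be repeated.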

2.3 Term Stanzas

Term stanzas introduce and define the meaning of terms (AKA concepts, classes and unary predicates).

term-stanza :=
   '[Term]'
   termid-TVP
   'name:' <string>
   [ <namespace> ]
   { <alt_id> }
   [ <def> ]
   [ <comment> ]
   { <subset> }
   { <synonym> }
   { <xref> }
   { isa-TVP }
   { intersection-TVP }
   { union-TVP }
   { disjoint-TVP }
   { relationship-TVP }
   [ <is_obsolete> ]
   [ <replaced_by> ]
   { <consider> }

termid-TVP :=
   'id:' term-id
   [ 'is_anonymous: true' ]
term-id := <string>

isa-TVP :=
   'is_a:' term-id
   [ 'namespace=' <namespace-id> ]
   [ 'derived=true' | 'derived=false' ]

intersection-TVP :=
   'intersection_of:' termOrRestr
   [ 'namespace=' <namespace-id> ]
termOrRestr := term-id | restriction
restriction := relationship-id term-id
relationship-id := <string>

union-TVP :=
   'union_of:' termOrRestr
   [ 'namespace=' <namespace-id> ]

disjoint-TVP :=
   'disjoint_from:' term-id
   [ 'namespace=' <namespace-id> ]
   [ 'derived=true' | 'derived=false' ]

relationship-TVP :=
   'relationship:' restriction
   [ 'not_necessary=true' | 'not_necessary=false' ]
   [ 'inverse_necessary=true' | 'inverse_necessary=false' ]
   [ 'cardinality=' <non-neg-int> ]
   [ 'maxCardinality=' <non-neg-int> ]
   [ 'minCardinality=' <non-neg-int> ]
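
The termOrRestr production used by intersection_of and union_of can be illustrated with a short Python sketch that decides whether a tag value is a plain term-id or a restriction (a relationship-id followed by a term-id). It assumes that identifiers contain no whitespace, which the grammar does not guarantee since term-id is only defined as a string. The function name parse_term_or_restr is hypothetical.

```python
def parse_term_or_restr(value):
    """Classify the value of an intersection_of or union_of tag.

    termOrRestr := term-id | restriction
    restriction := relationship-id term-id
    """
    # Assumption: identifiers are whitespace-free tokens.
    parts = value.split()
    if len(parts) == 1:
        return ("term", parts[0])
    if len(parts) == 2:
        return ("restriction", parts[0], parts[1])
    raise ValueError(f"not a termOrRestr: {value!r}")

plain = parse_term_or_restr("EX:0000002")
restr = parse_term_or_restr("part_of EX:0000003")
```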

2.4 Typedef Stanzas

Typedef stanzas introduce and define the meaning of relations (AKA roles, properties and binary predicates).

typedef-stanza :=
   '[Typedef]'
   typedefid-TVP
   'name:' <string>
   [ <namespace> ]
   { <alt_id> }
   [ <def> ]
   [ <comment> ]
   { <subset> }
   { <synonym> }
   { <xref> }
   [ domain-TVP ]
   [ range-TVP ]
   { meta-property-TVP }
   { r-isa-TVP }
   { r-intersection-TVP }
   [ inverse-TVP ]
   [ transover-TVP ]
   { relationship-TVP }
   [ 'is_metadata_tag:true' | 'is_metadata_tag:false' ]
   [ <is_obsolete> ]
   [ <replaced_by> ]
   { <consider> }

typedefid-TVP :=
   'id:' relationship-id
   [ 'is_anonymous: true' ]

domain-TVP := 'domain:' termOrReserved
termOrReserved := term-id | <reserved-id>

range-TVP := 'range:' termOrReserved

meta-property-TVP :=
   'is_anti_symmetric:true' | 'is_anti_symmetric:false' |
   'is_cyclic:true' | 'is_cyclic:false' |
   'is_reflexive:true' | 'is_reflexive:false' |
   'is_symmetric:true' | 'is_symmetric:false' |
   'is_transitive:true' | 'is_transitive:false'

r-isa-TVP := 'is_a:' relationship-id [ isa-mlist ]
isa-mlist := '{' isa-modifier { ',' isa-modifier } '}'
isa-modifier := namespace-mod | derived-mod
namespace-mod := 'namespace=' <namespace-id>
derived-mod := 'derived=true' | 'derived=false'

r-intersection-TVP :=
   'intersection_of:' relationship-id
   [ 'namespace=' <namespace-id> ]

inverse-TVP := 'inverse:' relationship-id

transover-TVP := 'transitive_over:' relationship-id
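
The r-isa-TVP production, with its optional brace-delimited modifier list, can be sketched in Python as follows. The regular expression is one possible reading of the grammar above, assuming whitespace-free identifiers and optional spaces around the separators. The name parse_r_isa and the example namespace are hypothetical.

```python
import re

# r-isa-TVP := 'is_a:' relationship-id [ '{' modifier { ',' modifier } '}' ]
R_ISA = re.compile(r"^is_a:\s*(?P<id>[^\s{]+)\s*(?:\{(?P<mods>[^}]*)\})?\s*$")
ALLOWED = {"namespace", "derived"}

def parse_r_isa(line):
    """Return (relationship-id, modifiers) for an r-isa-TVP line."""
    match = R_ISA.match(line)
    if match is None:
        raise ValueError(f"not an r-isa-TVP: {line!r}")
    modifiers = {}
    if match.group("mods") is not None:
        for item in match.group("mods").split(","):
            key, sep, value = item.strip().partition("=")
            if not sep or key not in ALLOWED:
                raise ValueError(f"bad modifier: {item!r}")
            if key == "derived" and value not in ("true", "false"):
                raise ValueError(f"bad derived value: {value!r}")
            modifiers[key] = value
    return match.group("id"), modifiers

with_mods = parse_r_isa("is_a: part_of {namespace=anatomy, derived=true}")
without_mods = parse_r_isa("is_a: part_of")
```

An empty pair of braces is rejected, matching the grammar's requirement that a modifier list contain at least one modifier.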

2.5 Instance Stanzas

Instance stanzas introduce and define the meaning of instances (AKA individuals, individual names, constants).

instance-stanza :=
   '[Instance]'
   instanceid-TVP
   'name:' <string>
   [ <namespace> ]
   { <alt_id> }
   [ <comment> ]
   { <synonym> }
   { <xref> }
   'instance_of:' term-id { 'instance_of:' term-id }
   { p-obj-value-TVP | p-data-value-TVP }
   [ <is_obsolete> ]
   [ <replaced_by> ]
   { <consider> }

instanceid-TVP :=
   'id:' instance-id
   [ 'is_anonymous: true' ]

p-obj-value-TVP := 'property_value:' relationship-id instance-id

p-data-value-TVP := 'property_value:' relationship-id '"' <string> '"' <XML-Schema-datatype>
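
The two property_value forms differ only in what follows the relationship-id: an instance-id, or a quoted string plus a datatype. A minimal Python sketch of telling them apart is given below. It assumes whitespace-free identifiers and a quoted string with no embedded double quotes. The function name parse_property_value and the example identifiers are hypothetical.

```python
import re

# p-data-value-TVP: relationship-id, quoted string, datatype
DATA_VALUE = re.compile(r'^property_value:\s*(\S+)\s+"([^"]*)"\s+(\S+)\s*$')
# p-obj-value-TVP: relationship-id, instance-id
OBJ_VALUE = re.compile(r"^property_value:\s*(\S+)\s+(\S+)\s*$")

def parse_property_value(line):
    """Classify a property_value line as a data or an object value."""
    match = DATA_VALUE.match(line)
    if match:
        rel, value, datatype = match.groups()
        return ("data", rel, value, datatype)
    match = OBJ_VALUE.match(line)
    if match:
        rel, target = match.groups()
        return ("object", rel, target)
    raise ValueError(f"not a property_value line: {line!r}")

data = parse_property_value('property_value: has_age "42" xsd:integer')
obj = parse_property_value("property_value: married_to EX:jane")
```

The data form is tried first because its quoted string is the only syntactic feature that distinguishes the two productions.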

3 OBO Semantics

This section defines the semantics of (a large subset of) OBO via a mapping to OWL DL. The mapping could also be used to specify a translation procedure and/or an interface to OWL tools (such as OWL reasoners).

A number of simplifying assumptions are made in order to improve readability:

The translation is defined using a translation function T which translates (a fragment of) OBO into OWL DL. The definition of T is often recursive, but it will eventually "ground out" in (a fragment of) OWL DL.
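
The recursive character of T can be sketched in Python for a small part of the language, using the rules for [Term], is_a and relationship given in the transformation table in this section. The translation of a stanza calls the translation of each tag-value pair, which in turn grounds out in OWL abstract syntax. The function names t_term and t_relationship are hypothetical, the output is plain strings rather than a parsed OWL structure, and emitting the label as a quoted literal is an assumption of this sketch (the table writes name-string without quotes).

```python
def t_relationship(rel_id, term_id, modifiers=()):
    """T(relationship-TVP): a restriction in OWL abstract syntax."""
    if not modifiers:
        return f"restriction({rel_id} someValuesFrom({term_id}))"
    # Each modifier is a (name, value) pair such as ("minCardinality", 2);
    # T maps 'name=value' to 'name(value)'. As in the tabulated rule,
    # the term-id does not appear when modifiers are present.
    parts = " ".join(f"{name}({value})" for name, value in modifiers)
    return f"restriction({rel_id} {parts})"

def t_term(term_id, name, is_a=(), relationships=()):
    """T([Term] stanza), restricted to is_a and relationship tags."""
    # Assumption: the label is written as a quoted literal.
    body = [f'annotation(label "{name}")']
    # T(is_a: term-id) is just term-id, so the recursion ends there.
    body += list(is_a)
    body += [t_relationship(*rel) for rel in relationships]
    return f"Class({term_id} partial " + " ".join(body) + ")"

axiom = t_term(
    "EX:0000001",
    "nucleus",
    is_a=["EX:0000002"],
    relationships=[("part_of", "EX:0000003")],
)
bounded = t_relationship("has_part", "EX:0000004", [("minCardinality", 2)])
```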

Transformation to OWL

Each entry below pairs an OBO syntax fragment S with its transformation T(S).

S:    header stanza_1 ... stanza_n
T(S): T(header) T(stanza_1) ... T(stanza_n)

S:    [Term]
         id:term-id
         name:name-string
         isa-TVP_1 ... isa-TVP_i
         intersection-TVP_1 ... intersection-TVP_j
         union-TVP_1 ... union-TVP_k
         disjoint-TVP_1 ... disjoint-TVP_m
         relationship-TVP_1 ... relationship-TVP_n
T(S): Class(term-id partial annotation(label name-string)
         T(isa-TVP_1) ... T(isa-TVP_i)
         T(relationship-TVP_1) ... T(relationship-TVP_n))
      EquivalentClasses(term-id
         intersectionOf(T(intersection-TVP_1)
                ... T(intersection-TVP_j)))
      EquivalentClasses(term-id
         unionOf(T(union-TVP_1) ... T(union-TVP_k)))
      DisjointClasses(term-id T(disjoint-TVP_1))
      ...
      DisjointClasses(term-id T(disjoint-TVP_m))

S:    isa:term-id
T(S): term-id

S:    relationship: relationship-id term-id
T(S): restriction(relationship-id someValuesFrom(term-id))

S:    relationship: relationship-id term-id
         rel-modifier_1 ... rel-modifier_n
T(S): restriction(relationship-id
         T(rel-modifier_1)
         ... T(rel-modifier_n))

S:    cardinality=card
T(S): cardinality(card)

S:    maxCardinality=max
T(S): maxCardinality(max)

S:    minCardinality=min
T(S): minCardinality(min)

S:    intersection_of:term-id
T(S): term-id

S:    intersection_of:relationship-id term-id
T(S): restriction(relationship-id someValuesFrom(term-id))

S:    union_of:term-id
T(S): term-id

S:    union_of:relationship-id term-id
T(S): restriction(relationship-id someValuesFrom(term-id))

S:    disjoint_from:term-id
T(S): term-id

S:    [Typedef]
         id:relationship-id
         name:name-string
         domain-TVP_1 ... domain-TVP_i
         range-TVP_1 ... range-TVP_j
         meta-property-TVP_1 ... meta-property-TVP_k
         r-isa-TVP_1 ... r-isa-TVP_m
         r-intersection-TVP_1 ... r-intersection-TVP_n
         inverse-TVP_1 ... inverse-TVP_v
         transover-TVP_1 ... transover-TVP_w
T(S): ObjectProperty(relationship-id annotation(label name-string)
         T(transover-TVP_1) ... T(transover-TVP_w)
         T(r-isa-TVP_1) ... T(r-isa-TVP_m)
         T(r-intersection-TVP_1) ... T(r-intersection-TVP_n)
         T(meta-property-TVP_1) ... T(meta-property-TVP_k)
         T(inverse-TVP_1) ... T(inverse-TVP_v)
         T(domain-TVP_1) ... T(domain-TVP_i)
         T(range-TVP_1) ... T(range-TVP_j))

S:    transitive_over:relationship-id
T(S): annotation(transitive_over:relationship-id)

S:    r-isa:relationship-id
T(S): super(relationship-id)

S:    r-intersection:relationship-id
T(S): super(relationship-id)

S:    is_anti_symmetric:true
T(S): annotation(is_anti_symmetric:true)

S:    is_anti_symmetric:false
T(S): annotation(is_anti_symmetric:false)

S:    is_cyclic:true
T(S): annotation(is_cyclic:true)

S:    is_cyclic:false
T(S): annotation(is_cyclic:false)

S:    is_reflexive:true
T(S): annotation(is_reflexive:true)

S:    is_reflexive:false
T(S): annotation(is_reflexive:false)

S:    is_symmetric:true
T(S): Symmetric

S:    is_symmetric:false
T(S): annotation(is_symmetric:false)

S:    is_transitive:true
T(S): Transitive

S:    is_transitive:false
T(S): annotation(is_transitive:false)

S:    inverse:relationship-id
T(S): inverseOf(relationship-id)

S:    domain:term-id
T(S): domain(term-id)

S:    range:term-id
T(S): range(term-id)

S:    [Instance]
         id:instance-id
         name:name-string
         instance_of_1 ... instance_of_i
         p-obj-value-TVP_1 ... p-obj-value-TVP_j
         p-data-value-TVP_1 ... p-data-value-TVP_k
T(S): Individual(instance-id annotation(label name-string)
         T(instance_of_1) ... T(instance_of_i)
         T(p-obj-value-TVP_1) ... T(p-obj-value-TVP_j)
         T(p-data-value-TVP_1) ... T(p-data-value-TVP_k))

S:    instance_of:term-id
T(S): type(term-id)

S:    property_value:relationship-id instance-id
T(S): value(relationship-id instance-id)

S:    property_value:relationship-id "val" datatype-id
T(S): value(relationship-id val^^datatype-id)
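
The [Typedef] rules can be sketched in the same spirit. The Python fragment below covers the label, is_a, meta-property, inverse, domain and range rules, and omits transitive_over and intersection_of. Of the meta-properties, only is_symmetric:true and is_transitive:true map to OWL property characteristics; the rest are carried over as annotations, following the table. The function names t_meta_property and t_typedef are hypothetical, and quoting the label is an assumption of this sketch.

```python
# Meta-property tags with a direct OWL counterpart when set to true;
# per the table, every other case is kept as an annotation.
OWL_FLAGS = {"is_symmetric": "Symmetric", "is_transitive": "Transitive"}

def t_meta_property(tag, value):
    """T(meta-property-TVP) for a tag name and a boolean value."""
    if value and tag in OWL_FLAGS:
        return OWL_FLAGS[tag]
    return f"annotation({tag}:{'true' if value else 'false'})"

def t_typedef(rel_id, name, meta=(), is_a=(), inverse=None,
              domain=None, range_=None):
    """T([Typedef] stanza) for a subset of the tags, as a string."""
    # Assumption: the label is written as a quoted literal.
    parts = [f'annotation(label "{name}")']
    parts += [f"super({sup})" for sup in is_a]
    parts += [t_meta_property(tag, value) for tag, value in meta]
    if inverse is not None:
        parts.append(f"inverseOf({inverse})")
    if domain is not None:
        parts.append(f"domain({domain})")
    if range_ is not None:
        parts.append(f"range({range_})")
    return f"ObjectProperty({rel_id} " + " ".join(parts) + ")"

axiom = t_typedef(
    "part_of",
    "part of",
    meta=[("is_transitive", True), ("is_reflexive", True)],
    domain="EX:0000001",
)
```

Keeping the OWL-expressible flags in a single lookup table makes it easy to see which meta-properties affect reasoning and which are preserved only as annotations.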