The Manchester Computational Flora Project

This is a resource page for the Computational Flora Project in the School of Computer Science, The University of Manchester, UK.

It contains published and draft papers and experimental data gathered from botanical sources.

The Semantics of Botanical Texts

The project aims at the automated and semi-automated processing of floras: botanical texts that describe plants, usually for identification purposes. As texts, floras have several special features: (1) they are highly structured, using both plant classifications and plant morphology to organise the descriptions; (2) there are multiple 'parallel' texts, i.e. many different accounts of the same species are available; (3) they use a mix of specialised terminology, standard (if reduced) natural language, and numerical expressions; and (4) they describe the values of continuous quantities and how those values vary, covering such characters as flower colour, plant height, leaf shape, stem cross-section and leaf surface texture.

These features make floras especially amenable to computational processing, but exploiting them requires a suitable semantics and structural representation of the natural-language descriptions. Developing these is the essence of the Manchester Computational Flora Project.
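As an illustration of what a structural representation of a quantitative character might look like, here is a minimal Python sketch; it is not the project's own method. It assumes that a description states a length either as a single value or as a range joined by a hyphen or en dash, in mm, cm or m, and it normalises the first such expression to an interval in millimetres. The `Measurement` class and `parse_measurement` function are hypothetical names introduced only for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    """Hypothetical structured record for a quantitative character."""
    low: float
    high: float
    unit: str


# A single value or a range ("3-5 cm"; hyphen or U+2013 en dash), then a unit.
_RANGE = re.compile(
    r"(?P<low>\d+(?:\.\d+)?)\s*"
    r"(?:[-\u2013]\s*(?P<high>\d+(?:\.\d+)?))?\s*"
    r"(?P<unit>mm|cm|m)\b"
)

# Conversion factors to millimetres (assumed set of units).
_TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}


def parse_measurement(text: str) -> Optional[Measurement]:
    """Return the first length or length range in `text`, normalised to mm."""
    match = _RANGE.search(text)
    if match is None:
        return None
    factor = _TO_MM[match.group("unit")]
    low = float(match.group("low")) * factor
    high_text = match.group("high")
    # A single value is represented as a degenerate interval [v, v].
    high = float(high_text) * factor if high_text else low
    return Measurement(low, high, "mm")
```

For example, `parse_measurement("Leaves 3-5 cm long")` returns `Measurement(low=30.0, high=50.0, unit='mm')`. Keeping the variation as an explicit interval, rather than a single number, is what would let ranges from different accounts of the same species be compared or combined.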

Here is a collection of papers and notes on the semantic analysis and automated processing of floras:

Figure: the meaning of 'blue', shown as a hue-saturation plot of several blue flowers.
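A plot like this places each flower colour as a point in hue-saturation space, so that a colour term such as 'blue' can be read as a region of that space rather than a single value. The sketch below, which assumes colours are available as 8-bit RGB triples, uses Python's standard `colorsys` module to compute the two coordinates. The function name `hue_saturation` is hypothetical, and the example colour (the CSS colour `cornflowerblue`, RGB 100, 149, 237) is illustrative only; it is not taken from the data behind the plot.

```python
import colorsys


def hue_saturation(rgb: tuple[int, int, int]) -> tuple[float, float]:
    """Map an 8-bit RGB triple to (hue in degrees, saturation in [0, 1])."""
    # colorsys works on floats in [0, 1], so rescale each 0-255 channel.
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, _value = colorsys.rgb_to_hsv(r, g, b)
    # colorsys reports hue as a fraction of a full turn; convert to degrees.
    return h * 360.0, s
```

`hue_saturation((100, 149, 237))` gives a hue of about 218.5 degrees and a saturation of about 0.58. Brightness (the HSV value component) is discarded here, matching the two axes of the plot.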

The MultiFlora Project

Intelligent processing of natural language text information is essential if the vast legacy of taxonomic data is to be made accessible. MultiFlora aims to provide proof of concept that Information Extraction (IE) can be improved by the analysis of multiple parallel texts, and that, applied to botanical taxon descriptions, it has the potential to be a useful tool in biodiversity informatics. See the Project Website for further details.
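The recall argument can be illustrated with a small sketch: if character-value pairs are extracted separately from several parallel descriptions of the same taxon, merging them means a character missed in one account can still be supplied by another, and recording the sources shows where accounts agree. This is a hypothetical Python illustration, not the MultiFlora extraction pipeline; the function name `merge_parallel`, the source names and the values in the test are invented for the example.

```python
from collections import defaultdict


def merge_parallel(
    extractions: dict[str, dict[str, str]],
) -> dict[str, dict[str, list[str]]]:
    """Merge per-source {character: value} extractions.

    Returns {character: {value: [sources reporting that value]}}.
    """
    merged: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for source, characters in extractions.items():
        for character, value in characters.items():
            merged[character][value].append(source)
    # Convert the nested defaultdicts to plain dicts for a clean result.
    return {character: dict(values) for character, values in merged.items()}
```

A character reported by only one source survives the merge (the gain in recall), while a value listed under several sources indicates agreement between the parallel texts; conflicting values for the same character appear side by side for later resolution.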


Members of the group include:

Part of the research is in association with members of the Natural History Museum, London:
  • Robert Huxley
  • David Sutton


  1. Lydon SJ, Wood MM, Huxley R, Sutton D. 2003. Data patterns in multiple botanical descriptions: implications for automatic processing of legacy data. Systematics and Biodiversity 1: 151-157.
  2. Wood MM, Lydon SJ, Tablan V, Maynard D, Cunningham H. 2003. Using parallel texts to improve recall in IE. In: Proceedings of Recent Advances in Natural Language Processing (RANLP-2003), Borovetz, Bulgaria, pp. 505-512.
  3. Wood M, Wang S. 2004. Motivation for "ontology" in parallel-text information extraction. In: Proceedings of the ECAI-2004 Workshop on Ontology Learning and Population (ECAI-OLP), Valencia, Spain. Poster.
  4. Wood M, Lydon S, Tablan V, Maynard D, Cunningham H. 2004. Populating a database from parallel texts using ontology-based information extraction. In: Meziane F, Métais E (eds), Proceedings of Natural Language Processing and Information Systems, 9th International Conference on Applications of Natural Languages to Information Systems, pp. 254-264. Springer.
  5. Wang S, Pan JZ. 2005. Ontology-based representation and query of colour descriptions from botanical documents. In: OTM Conferences (2), pp. 1279-1295.
  6. Wang S, Pan JZ. 2006. Integrating and querying parallel leaf shape descriptions. In: Proceedings of the 5th International Semantic Web Conference (ISWC 2006). To appear.
  7. Wang S, Rydeheard DE, Pan JZ. 2008. The semantic processing of continuous quantities for discrete terms in ontologies. Journal of Logic and Computation 18(3): 341-359.


Here are some useful links:

All material copyright © The University of Manchester.