Computational Biology: Botany - The Flora Project

This is a resource page for The Computational Flora Project in the School of Computer Science, The University of Manchester, UK.

It contains published and draft papers and experimental data gathered from botanical sources.

The Semantics of Botanical Texts

The project aims at the automated and semi-automated processing of floras: botanical texts that describe plants, usually for identification purposes. As texts, floras have several special features: (1) they are highly structured, using both plant classifications and plant morphology to organise the descriptions; (2) there are multiple 'parallel' texts, i.e. many different accounts of the same species are available; (3) they use a mix of specialised terminology, standard (if reduced) natural language, and numerical expressions; and (4) they describe values of continuous quantities and the variation of those values, covering such things as flower colour, plant height, leaf shape, stem cross-section, and leaf surface texture.

These features make floras especially amenable to computational processing, but such processing requires a suitable semantics and structural representation of the natural language descriptions. Providing these is the essence of the Manchester Computational Flora Project.

Here is a collection of papers and notes on the semantic analysis and automated processing of floras:

S. Wang, D. E. Rydeheard, and J. Z. Pan. The Semantic Processing of Continuous Quantities for Discrete Terms in Ontologies. Journal of Logic and Computation, 18:3. (2008) 341-359. A journal paper on the natural language semantics of continuous quantities, metric models and applications to processing botanical texts,
A summary paper (in PS) describing the overall aims of the project,
Experimental data on simple leaf shapes (in PS),
Experimental data on flower colours (in PS),
A draft paper on the semantics and its evidential base,
Introductory notes on metric spaces (in PS).

[Figure: The meaning of 'blue': a hue-saturation plot of several blue flowers.]
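As an illustration of how a colour term can be treated as a region in a metric space, the sketch below places a few colour samples in the hue-saturation plane and measures the distance between them. It is a minimal sketch under stated assumptions, not the project's method: the RGB values are invented for illustration, the helper names are hypothetical, and Euclidean distance on the colour wheel is only one possible choice of metric.

```python
import colorsys
import math

# Hypothetical 8-bit RGB samples standing in for measured flower colours.
# These values are invented for illustration; they are not the project's data.
SAMPLES = {
    "sample_a": (70, 90, 200),
    "sample_b": (100, 120, 220),
    "sample_c": (60, 60, 180),
}


def hue_saturation(rgb):
    """Convert an 8-bit RGB triple to (hue in degrees, saturation in [0, 1])."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360.0, s


def to_plane(hue_deg, sat):
    """Map (hue, saturation) to Cartesian coordinates on the colour wheel."""
    angle = math.radians(hue_deg)
    return sat * math.cos(angle), sat * math.sin(angle)


def distance(p, q):
    """Euclidean distance between two (hue, saturation) points on the wheel.

    Working in Cartesian coordinates avoids the wrap-around at 0/360 degrees.
    """
    (x1, y1), (x2, y2) = to_plane(*p), to_plane(*q)
    return math.hypot(x1 - x2, y1 - y2)


points = {name: hue_saturation(rgb) for name, rgb in SAMPLES.items()}

if __name__ == "__main__":
    for name, (hue, sat) in sorted(points.items()):
        print(f"{name}: hue={hue:.1f} deg, saturation={sat:.2f}")
```

A cluster of such points gives one concrete reading of what a term like 'blue' covers, and the distance function gives a way to ask how far a new sample lies from that cluster.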

The MultiFlora Project

Intelligent processing of natural language text information is essential if the vast legacy of taxonomic data is to be made accessible. MultiFlora aims to provide proof of concept that Information Extraction (IE) can be improved by the analysis of multiple parallel texts, and that, applied to botanical taxon descriptions, it has the potential to be a useful tool in biodiversity informatics. See the Project Website for further details.
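The idea that parallel texts can improve extraction can be sketched as follows. The sketch assumes that attribute-value pairs have already been extracted from each description of the same taxon. The source names, attributes, values, and the function `merge_parallel` are all hypothetical and invented for illustration; this is not MultiFlora's implementation.

```python
from collections import defaultdict

# Hypothetical extraction results for one taxon from three parallel texts.
# Every source name, attribute, and value here is invented for illustration.
EXTRACTED = {
    "flora_a": {"petal colour": {"yellow"}, "petal number": {"5"}},
    "flora_b": {"petal colour": {"yellow", "golden"}, "leaf shape": {"ovate"}},
    "flora_c": {"petal number": {"5"}, "leaf shape": {"ovate"}},
}


def merge_parallel(extracted):
    """Combine per-source attribute values into one record.

    For every attribute, each value is mapped to the sorted list of sources
    that report it, so agreement between texts stays visible after merging.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for source, attributes in extracted.items():
        for attribute, values in attributes.items():
            for value in values:
                merged[attribute][value].add(source)
    return {
        attribute: {value: sorted(sources) for value, sources in values.items()}
        for attribute, values in merged.items()
    }


merged = merge_parallel(EXTRACTED)

if __name__ == "__main__":
    for attribute in sorted(merged):
        print(attribute, merged[attribute])
```

In this toy data no single source covers all three attributes, while the merged record does. Values reported by several sources can also be given more weight than values reported by only one.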

Personnel

Members of the group include:

Part of the research is in association with members of the Natural History Museum, London:

Robert Huxley
David Sutton

Publications

Lydon SJ, Wood MM, Huxley R, Sutton D. 2003. Data patterns in multiple botanical descriptions: implications for automatic processing of legacy data. Systematics and Biodiversity 1: 151-157.
Wood, M. M., Lydon, S. J., Tablan, V., Maynard, D., and Cunningham, H. (2003). Using parallel texts to improve recall in IE. In Proceedings of Recent Advances in Natural Language Processing (RANLP-2003), pages 505-512, Borovetz, Bulgaria.
Wood, M. and Wang, S. (2004). Motivation for "ontology" in parallel-text information extraction. In Proceedings of ECAI-2004 Workshop on Ontology Learning and Population (ECAI-OLP), Poster, Valencia, Spain.
Wood, M., Lydon, S., Tablan, V., Maynard, D., and Cunningham, H. (2004). Populating a database from parallel texts using ontology-based information extraction. In Meziane, F. and Métais, E., editors, Proceedings of Natural Language Processing and Information Systems, 9th International Conference on Applications of Natural Languages to Information Systems, pages 254-264. Springer.
Shenghui Wang and Jeff Z. Pan. Ontology-based Representation and Query of Colour Descriptions from Botanical Documents. OTM Conferences (2) 2005: 1279-1295.
Shenghui Wang and Jeff Z. Pan. Integrating and Querying Parallel Leaf Shape Descriptions. In Proc. of the 5th International Semantic Web Conference (ISWC2006). To appear.
S. Wang, D. E. Rydeheard, and J. Z. Pan. The Semantic Processing of Continuous Quantities for Discrete Terms in Ontologies. Journal of Logic and Computation, 18:3. (2008) 341-359.
