This is a resource page for The Computational Flora Project in the School of Computer Science, The University of Manchester, UK.
It contains published and draft papers and experimental data gathered from botanical sources.
The Semantics of Botanical Texts
The project's aims are the automated and semi-automated processing of floras, that is, botanical texts that describe plants, usually for identification purposes. As texts, floras have several special features: (1) they are highly structured, using both plant classifications and the morphology of plants to structure the descriptions; (2) there are multiple 'parallel' texts, i.e. many different accounts of the same species are available; (3) they use a mix of specialised terminology, standard (if reduced) natural language, and numerical expressions; and (4) they include descriptions of values of continuous quantities and the variations of those values, covering such things as flower colour, plant height, leaf shape, stem cross-section and leaf surface texture.
These features make floras especially amenable to computational processing, but such processing requires a suitable semantics and structural representation of the natural language descriptions. Providing these is the essence of the Manchester Computational Flora Project.
Here is a collection of papers and notes on the semantic analysis and automated processing of floras:
The MultiFlora Project
Intelligent processing of natural language text information is essential if the vast legacy of taxonomic data is to be made accessible. MultiFlora aims to provide proof of concept that Information Extraction (IE) can be improved by the analysis of multiple parallel texts, and that, applied to botanical taxon descriptions, it has the potential to be a useful tool in biodiversity informatics. See the Project Website for further details.
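As a rough illustration of why parallel texts can help, the sketch below extracts a numerical size range from each of two invented descriptions of one species and merges them into a single structured value. It is not the project's method: the pattern, the names Range, extract_range and merge_ranges, and the sample sentences are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Range:
    """A continuous quantity reported as a range, normalised to centimetres."""
    low: float
    high: float
    unit: str


# Hypothetical pattern for expressions such as "30-60 cm" or "0.5-1 m".
RANGE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*[-–]\s*(\d+(?:\.\d+)?)\s*(mm|cm|m)\b")

# Conversion factors to centimetres (an assumption made for this sketch).
TO_CM = {"mm": 0.1, "cm": 1.0, "m": 100.0}


def extract_range(text: str) -> Optional[Range]:
    """Return the first size range found in the text, or None if there is none."""
    match = RANGE_RE.search(text)
    if match is None:
        return None
    factor = TO_CM[match.group(3)]
    return Range(float(match.group(1)) * factor, float(match.group(2)) * factor, "cm")


def merge_ranges(ranges: Iterable[Optional[Range]]) -> Optional[Range]:
    """Combine ranges from parallel texts into the widest range they report."""
    found = [r for r in ranges if r is not None]
    if not found:
        return None
    return Range(min(r.low for r in found), max(r.high for r in found), "cm")


# Two invented parallel descriptions of the same (unnamed) species.
parallel_texts = [
    "Stems erect, 30-60 cm tall.",
    "Plant 0.5-1 m high.",
]
merged = merge_ranges(extract_range(t) for t in parallel_texts)
print(merged)
```

Taking the widest reported range is only one possible merging policy; a real system would also need to decide which plant part each measurement refers to before combining values.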
Members of the group include:
Natural History Museum, London:
Links
Here are some useful links: