I am currently in the second year of my PhD at The University of Manchester, under the supervision of Prof. Carole Goble. The pages in this section introduce my research, which, in brief, concerns modelling e-Science provenance and how it can be used.
e-Science provenance is the metadata describing the origin of data and how those data were produced. Like experiments performed at a laboratory bench, results from an e-Science in silico experiment are of limited value if the origin, or provenance, of those results cannot be identified.
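As a rough illustration of this definition, provenance can be pictured as a record attached to each data item that names the process that produced it and the inputs it was derived from. The sketch below is only an assumption made for illustration: the `ProvenanceRecord` class, the `trace_origin` function, and the example data and service names are all hypothetical, and they are not the provenance model developed in this research.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceRecord:
    """Hypothetical record: which process produced a data item, and from what."""
    data_id: str
    produced_by: str
    inputs: List[str] = field(default_factory=list)


def trace_origin(data_id: str, records: Dict[str, ProvenanceRecord]) -> List[str]:
    """Return every data item that data_id was derived from, directly or indirectly."""
    origins: List[str] = []
    seen = set()
    stack = [data_id]
    while stack:
        current = stack.pop()
        record = records.get(current)
        if record is None:
            continue  # no record: treat the item as a raw input
        for parent in record.inputs:
            if parent not in seen:
                seen.add(parent)
                origins.append(parent)
                stack.append(parent)
    return origins


# Hypothetical in silico experiment: two sequences are aligned, then summarised.
records = {
    "alignment": ProvenanceRecord("alignment", "align_service", ["sequence_a", "sequence_b"]),
    "report": ProvenanceRecord("report", "summarise_service", ["alignment"]),
}
lineage = trace_origin("report", records)
```

Here `lineage` lists the alignment and both input sequences, which is the kind of question ("where did this result come from?") that a provenance record is meant to answer.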
In traditional lab experiments, provenance is either not well kept or not widely shared among scientific communities. Because in silico experiments automate the orchestration of experiment resources, large numbers of results and conclusions accumulate and are presented to scientists as a zipped folder of fragmented files. As a result, e-Scientists start to lose track of how their data were generated and processed.
Recent research in e-Science provenance has proposed several solutions. Some algorithms claim to compute provenance from a formalized definition of data and processes. However, they require the data to be stored in a database, a requirement that cannot always be satisfied for data generated during e-Science in silico experiments. Other work has proposed architectures and mechanisms for generating and archiving provenance. However, the lack of a systematic model of provenance and the poor representation of provenance data prevent provenance from being used to the full. Even less effort has been put into discovering the implicit knowledge held in provenance.
myGrid, a pilot e-Science project in the UK, has attempted to build a knowledge web of provenance resources using Semantic Web technologies. Although this work has achieved promising results, it has also shown that provenance which machines cannot process has limited potential for use. To solve this problem, a formalized definition of provenance and systematic algorithms for processing it are needed. This provenance work aims to contribute to the provenance research community by solving the issues identified in the myGrid project and by providing an improved framework for provenance organisation and investigation. myGrid provides the user scenario and test bed for this work.
The diagram below is an overview of the provenance pyramid, which shows four different levels of view over e-Science provenance. These views reflect the requirements of different users regarding the kind of provenance they need. The RDFS version of the provenance ontology can be found on the ontology page.