Paul Fisher is using myGrid workflows to analyse microarray data. Large-scale sequencing projects and their associated gene prediction techniques have given us good-quality genotype information for a wide range of organisms. We also have much good-quality information on the variation within these genotypes. Linking these genotypes to their associated phenotypes, however, is more difficult. Many genes have no functional annotation, and many phenotypes have no known gene correlates. Navigating from genotype to phenotype therefore requires gathering fragmentary information from many databases, including facts from the literature and newer high-throughput gene expression and proteomic resources. These fragments of evidence must be gathered, filtered and assembled into a picture of the network of associations between genotype and phenotype.
The workflow paradigm in eScience offers an opportunity to develop an infrastructure for such genotype-to-phenotype investigations. Using trypanosomiasis in mice and cattle as a case study, this project will research techniques for capturing, as workflows, the means to gather, store and analyse the data needed for genotype-to-phenotype investigations. Capturing these investigations as workflows should reveal a set of generic workflows that can be reused and repurposed for other such investigations. The contributions will be to capture bioinformatics best practice in workflows and to incorporate new data sources into workflow analyses, supporting the in silico eScience process.