Bioinformatics shims

Service name (with link to Taverna API)	Description	Input I	Output O	Relation between O and I
Classified as "List"
StringListMerge	Merge string list to string. Consumes a string list and optional separator character and emits a string formed from the concatenation of all items in the list with the separator (default newline) interposed between them.	String list (and optional separator)	Merged list	output hasPart input
FlattenList	Flatten I(I()) to I(). Consume a list of lists and emit a list containing the first level flattening of the input.	List. Set of sets?	List. Subset?	Unclassified
StringStripDuplicates	Remove duplicate strings. Consumes a string list and emits the string list with duplicate entries removed. The first occurrence of a duplicate is preserved and all subsequent ones omitted, i.e. the string list 'a,b,c,b,a,d' is converted to 'a,b,c,d'.	String list	String list (stripped)	Unclassified
EchoList	EchoList. Echo the input list to the output list, does no actual processing at all. This class is intended to be used in conjunction with nested workflows in order to split the iteration out from the previous stage in the flow.	List	List	Unclassified
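The behaviour described for the list shims above can be sketched in a few lines of Python. This is only an illustration of the descriptions in the table, not the Taverna implementation; the function names are hypothetical and simply mirror the service names.

```python
from typing import Iterable, List


def string_list_merge(items: Iterable[str], separator: str = "\n") -> str:
    """Join all items with the separator (newline by default)."""
    return separator.join(items)


def flatten_list(nested: Iterable[Iterable]) -> list:
    """Flatten one level only: a list of lists becomes a single list."""
    return [item for inner in nested for item in inner]


def string_strip_duplicates(items: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each string and drop later repeats."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```

Applied to the example from the table, `string_strip_duplicates("a,b,c,b,a,d".split(","))` keeps `a`, `b`, `c` and `d` in their original order.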
Classified as "io"
TextFileWriter	Write Text File. This processor writes the "filecontents" out to the URL specified in the "outputFile" parameter. Note that the outputMap is always empty.	filecontents?	outputFile	Unclassified
EnvVariableWorker	Get Environment Variables as XML. This processor exposes the Java environment variables as an XML document.	None	Environment variables	n/a
FileListByRegexTask	List files by regex. This processor lists the files in a given subdirectory using a regular expression.	Directory and regular expression	FileList	Directory hasPart file?
LocalCommand	Execute cmd line app This processor executes a commandline and returns the response as a String	command (string)	result (string)	Unclassified, could be anything, depends on command
FileListByExtTask	List Files By Extension This processor gets a list of files on a local directory	directory, extension	filelist	directory hasPart file?
DataRangeTask	Select Data Range From File Extracts a range of values from a two dimension data array	array, starting point, end point	array	outputArray isPartOf inputArray
TextFileReader	Read Text File reads text from a file specified by the "fileurl" attribute and returns the results in the "filecontents" item in the outputMap	fileurl	filecontents	filecontents isidentifiedby fileurl
ConcatenateFileListWorker	Concatenate files concatenates a series of text files and saves the results into the output file	filelist, outputfile, displayresults	results	results hasPart input
DataRangeColumnTask	Select Column From File extract a single column of data from a data array produced by either an ExcelFileReader or by a DelimitedFileReader	array, column	array	Unclassified, SELECT or PROJECT?
ExcelFileReader	Read Excel File reads an Excel spreadsheet and creates an ArrayList of ArrayLists containing string data. Note that formulas are not currently evaluated and thus are returned as empty strings.	filename, firstRowContainsColumnNames, dateIndexes	data	Unclassified
Classified as "Metadata"
GetLSID	Get internal LSID of input Outputs "replacelsid:input" which should be substituted for the input's lsid by the ProcessorTask. Chris Greenhalgh wrote this strips out two (or three?) of the five? components of an LSID	input	replacelsid	Unclassified: try it out
Classified as "xml"
XPathTextWorker	XPath From Text applies an arbitrary XPath expression to an XML document, and returns a nodelist containing the nodes that match the XPath expression.	xpath, xml-text	nodelist, nodelistAsXML	Unclassified
XSLTWorker	Transform XML transforms an input XML document into an output document. If an inFileURL is supplied, it will use the document located at the URL as input. If the xml-text is supplied, it will use this in-memory XML document as input. If an outputFile url is supplied, the results will be written to the output document.	xslFileURL, outFileURL, inFileURL, outputExt: @tavinput xslFileURL The complete path to XSL file, @tavinput outFileURL The complete path to the output file. (optional), @tavinput inFileURL The complete path to the input file, @tavinput xml-text The XML text to be processed. (optional), @tavinput outputExt The output file extension. Use this only if you want to add the extension to the input filename and use it as the output file name.	outputStr, @tavoutput outputStr A string containing the output text. This is useful if you want to connect this processor to another and pass the results to it.	Unclassified
XPathWorker	XPath From XML File applies an arbitrary XPath expression to an XML document, and returns a nodelist containing the nodes that match the XPath expression.	xpath, xmltext	xml-text, nodelist	unclassified
Classified as "ncbi" note parameter naming conventions follow those outlined in Entrez Programming Utilities.
SNPWorker	Get SNP XML fetches SNP records.	id, rettype, retmode Doesn't work. Tried SNP id e.g. `rs3091213` (taken from this list of SNPs) with no luck. See Get SNP example workflow (produces empty results). `rettype` defaults to `xml`, but it's not clear what values `retmode` takes and whether it's mandatory or not.	resultsXml	I `uniquelyIdentifes` O
ProteinGBSeqWorker	Get Protein GBSeq XML fetches protein data in GBSeq XML format.	id e.g. GenBank identifier without leading GI: e.g. `1293613` see GenBank Sequence example workflow	outputText	I `uniquelyIdentifes` O
NucleotideGBSeqWorker	Get Nucleotide GBSeq XML returns a GB Seq formatted record.	id e.g. GenBank identifier without leading GI: e.g. `1293613` see GenBank Sequence example workflow	outputText	I `uniquelyIdentifes` O
NucleotideFastaWorker	Get Nucleotide FASTA fetches a nucleotide sequence in FASTA format.	id Accession number e.g. `U49845` see Nucleotide FASTA workflow example.	outputText	I `uniquelyIdentifes` O
NucleotideINSDSeqXMLWorker	Get Nucleotide INSDSeq XML returns an INSD formatted nucleotide record	id e.g. the nucleotide accession `U49845`, see Nucleotide INSD workflow example.	outputText, e.g. INSD formatted nucleotide record	I `uniquelyIdentifes` O
HomoloGeneWorker	Homologene XML fetches HomoloGene data from NCBI.	term, maxRecords, outputFile, xslt, ext Example? Homologene terms? [Ancestor] Taxonomic name of common ancestor of the species represented in a HomoloGene entry. [Gene Description] Detailed description of a Gene. [Gene Id] Unique Gene Identifier. [Gene Name] Gene Aliases. [Nucleotide Accession] GenBank accession identifier of nucleotide sequence. [Nucleotide GI] Unique Nucleotide identifier. [Organism] Description of the organism or the NCBI Taxonomy ID of a species. [Protein Accession] The protein accession number of a protein. [Protein GI] Unique Protein identifier. [Text Word] Free text to be searched for in HomoloGene. [Title] Summary of HomoloGene entry [UniGene ID] Unique Unigene identifier.	resultsXml	unclassified
ProteinFastaWorker	Get Protein FASTA fetches a protein sequence in FASTA format.	id is this a GenBankIdentifier or something else?	outputText	dereferencer
LocusLinkWorker	Get LocusLink XML fetch locuslink data from the NCBI database as XML.	id, rettype, retmode is this a GenBankIdentifier or something else? What values do rettype and retmode take?	outputText	dereferencer
PubMedSearchWorker	Search PubMed downloads PubMed records in XML format. Since NCBI does not currently support a pure XML	term, database, minDate, maxDate, reldate, rettype, cmd, cmd_current, dopt, orig_db, disp_max any valid pubmed query string / term? (e.g. `apweiler`), this doesn't currently work, are all the other parameters mandatory? See PubMed search example workflow	resultsXml	unclassified
NucleotideTinySeqXMLWorker	Get Nucleotide TinySeq XML fetches a nucleotide sequence from NCBI and returns the results in the TinySeqXML format.	id is this a GenBankIdentifier or something else?	outputText	dereferencer
OMIMWorker	Get OMIM XML fetches an OMIM record from the NCBI database in XML format.	term, maxRecords, outputFile, xslt, ext Can you give example `term`s? Are the other parameters optional or mandatory?	resultsXml	unclassified
EntrezGeneWorker	Get Entrez Gene XML fetches an Entrez Gene record in XML format. It can also transform the resulting XML document.	term, maxRecords, outputFile, xslt, ext Can you give example `term`s? Are the other parameters optional or mandatory?	resultsXml	unclassified
ProteinINSDSeqXMLWorker	Get Protein INSDSeq XML fetches an INSD formatted protein record	id is this a GenBankIdentifier or something else?	outputText	dereferencer
ProteinTinySeqXMLWorker	Get Protein TinySeq XML fetches a protein in TinySeqXML format.	id is this a GenBankIdentifier or something else?	outputText	dereferencer
EntrezProteinWorker	Get Entrez Protein XML processor fetches an Entrez Protein record from NCBI.	term, maxRecords, outputFile, xslt, ext Can you give example `term`s? Are the other parameters optional and what do they do?	resultsXml	unclassified
PubMedEFetchWorker	Get PubMed XML by PMID fetches PubMed articles in XML form. Use this worker only if you already know the PubMed id	id, rettype, retmode e.g. `15262813`, see PMID workflow example	outputText	dereferencer
NucleotideXMLWorker	Get Nucleotide XML fetches Nucleotide XML documents.	term, maxRecords, outputFile, xslt, ext Can you give example `term`s? Are the other parameters optional or mandatory?	resultsXml	unclassified
PubMedESearchWorker	Search PubMed XML searches for articles in PubMed and returns their IDs in XML format	term, db, field, retstart, retmax, mindate, maxdate, rettype Can you give example `term`s? Are the other parameters optional or mandatory?	outputText	unclassified
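Since the "ncbi" parameter names (`id`, `rettype`, `retmode`, `db`) follow the Entrez Programming Utilities, the kind of request behind a "dereferencer" shim can be sketched by building an EFetch URL by hand. This is an assumption about what such a request looks like, not a description of how the Taverna workers are implemented; `efetch_url` is a hypothetical helper, and the database and format values shown are ordinary E-utilities values chosen for illustration.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities EFetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db: str, record_id: str, rettype: str, retmode: str) -> str:
    """Build an EFetch URL; no network request is made here."""
    query = urlencode(
        {"db": db, "id": record_id, "rettype": rettype, "retmode": retmode}
    )
    return f"{EFETCH_BASE}?{query}"


# Accession U49845 is the example used in the table for the nucleotide workers.
print(efetch_url("nuccore", "U49845", "fasta", "text"))
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) would return the record as text. The sketch stops at URL construction so that it runs without network access.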
Classified as "net"
ExtractImageLinks	Get image URLs from HTTP document Extract a list of all image links in the supplied html document	document	imagelinks	unclassified
SendEmail	Send an email Send an email from a workflow	to, from, subject, body, smtpserver	none	n/a
WebPageFetcher	Get web page from URL Fetch a single web page from URL	url, base	contents	dereferencer
WebImageFetcher	Get image from URL Fetch a single image from URL	url, base	image	dereferencer
Classified as "text"
ByteArrayToString `org.embl.ebi.escience.scuflworkers.java.ByteArrayToString`	Byte[] to String No description available yet. There isn't a String to Byte[] but there probably should be.	bytes 'application/octet-stream'	string 'text/plain'	syntax translator
StringSetUnion	String list union Provide the union of two lists of strings, the result being a string list containing all strings that occur in either of the input lists.	list1, list2	union	unclassified
StringConcat	Concatenate two strings Returns the result of appending firststring to secondstring	string1, string2	output	unclassified
StringSetDifference	String list difference Returns the items that are different between two sets or lists of string types where elements only exist in the output if they occur in either input, but not both	list1, list2	difference	unclassified
FilterStringList	Filter list of strings by regex Filter a list of Strings, only passing through those that match the supplied regular expression	stringlist, regex	filteredlist	unclassified
SplitByRegex	Split string into string list by regular expression Split an input string into a list of strings using the given regular expression to determine the delimiter. If the regular expression is not supplied then it will default to the ',' character	string, regex	split	unclassified
PadNumber	Pad numeral with leading 0's Pad a numeral with leading zeroes to take it up to a specified length, which defaults to seven.	input, targetlength	padded	unclassified
RegularExpressionStringList	Filter list of strings extracting match to a regex Apply a regular expression to a string, returning a group that matches if there is a match.	stringlist, regex, group	filteredlist	unclassified
StringSetIntersection	String list intersection Returns the intersection of two sets or lists of string types where elements only exist in the output if they occur in both inputs.	list, list2	intersection	unclassified
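The string-list and string-manipulation shims above are simple enough to illustrate directly. The following Python sketch follows the descriptions in the table; the function names are hypothetical, results keep first-seen order as a choice of this sketch, and the regex filter assumes a match anywhere in the string, which may differ from the actual Java behaviour.

```python
import re
from typing import List


def string_set_union(list1: List[str], list2: List[str]) -> List[str]:
    """All strings that occur in either list, without repeats."""
    return list(dict.fromkeys(list(list1) + list(list2)))


def string_set_intersection(list1: List[str], list2: List[str]) -> List[str]:
    """Strings that occur in both lists."""
    second = set(list2)
    return [s for s in dict.fromkeys(list1) if s in second]


def string_set_difference(list1: List[str], list2: List[str]) -> List[str]:
    """Strings that occur in one list but not both (symmetric difference)."""
    first, second = set(list1), set(list2)
    return [s for s in string_set_union(list1, list2)
            if (s in first) != (s in second)]


def split_by_regex(string: str, regex: str = ",") -> List[str]:
    """Split on the regex delimiter; defaults to ',' as in the table."""
    return re.split(regex, string)


def filter_string_list(stringlist: List[str], regex: str) -> List[str]:
    """Pass through only strings matching the regex (assumed: anywhere)."""
    pattern = re.compile(regex)
    return [s for s in stringlist if pattern.search(s)]


def pad_number(value: str, targetlength: int = 7) -> str:
    """Pad a numeral with leading zeroes up to targetlength (default 7)."""
    return str(value).zfill(targetlength)
```

Note that the "difference" described in the table is symmetric: an element appears in the output only if it occurs in exactly one of the two inputs.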
Classified as "biojava"
BlastParserWorker	Read BLAST results parses BLAST results and returns an XML document containing the results.	fileUrl, strict	blastresults	unclassified
TranscribeWorker	Transcribe DNA takes a DNA sequence and transcribes it into an RNA sequence	dna_seq	rna_seq	unclassified
EMBLParserWorker	Read EMBL file parses an EMBL-based file and outputs the results in Agave XML format.	fileUrl	emblFile	unclassified
TranslateWorker	Translate DNA translates a DNA sequence into a protein sequence.	dna_seq	prot_seq	unclassified
ReverseCompWorker	Reverse Complement DNA takes a raw DNA sequence and returns the reverse complement of the sequence.	rawSeq	revSeq	unclassified
GenBankParserWorker	Read GenBank file parses genbank files and outputs the results in Agave XML format.	fileUrl	genbankdata	unclassified
SwissProtParserWorker	Read SwissProt file parses a SwissProt file and outputs the results in Agave XML format	fileUrl	results	unclassified
Classified as "jdbc"
SQLQueryWorker	Execute SQL Query executes SQL prepared statements, and returns the results as an array of arrays. It can also, optionally generate an XML representation of the results.	url, driver, userid, password, sql, params, provideXml	resultsList, xmlresults	unclassified
SQLUpdateWorker	Execute SQL Update execute SQL update/insert statements	url, driver, userid, password, sql, params	resultsList	unclassified
Classified as "base64"
EncodeBase64	Encode byte[] to base64 Encode byte[] data into base64 string	bytes	base64	unclassified
DecodeBase64	Decode base64 to byte[] Decode base64 string into byte[]	base64	bytes	unclassified
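The two base64 shims above are inverse "syntax translators". A minimal Python equivalent, using the standard library rather than the Taverna classes, could look like this (function names are hypothetical):

```python
import base64


def encode_base64(data: bytes) -> str:
    """Encode raw bytes into a base64 string."""
    return base64.b64encode(data).decode("ascii")


def decode_base64(text: str) -> bytes:
    """Decode a base64 string back into raw bytes."""
    return base64.b64decode(text)
```

Decoding the output of the encoder returns the original bytes, which is the property a workflow relies on when passing binary data through text-only ports.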
Classified as "moby"
CreateMobyData	Create moby data construct a biomoby data packet from either an ID or a string content	namespace, id, value, type	mobydata	unclassified
ExtractMobyData	Parse moby data extract simple data types from biomoby data packets.	mobydata	namespace, id, value, type	unclassified
CreateMobyCollection	Create a moby collection construct a biomoby data packet from either an ID or a string content	collectionName, mobySimple1, mobysimple2, ..., mobysimple35	mobyCollection	unclassified
BioMoby services
mobycentral	Arbitrary biomoby service no description	namespace 'text/plain', id 'text/plain', article name 'text/plain'	mobyData 'text/xml'	unclassified, depends on service

References

Duncan Hull, Robert Stevens and Phillip Lord. Describing Web Services for user-oriented retrieval. Accepted paper and presentation in W3C Workshop on Frameworks for Semantics in Web Services, Digital Enterprise Research Institute (DERI), Innsbruck, Austria. 2005-06-09.
Duncan Hull, Robert Stevens, Phillip Lord, Chris Wroe and Carole Goble. Treating shimantic web syndrome with ontologies. In First Advanced Knowledge Technologies workshop on Semantic Web Services (AKT-SWS04) KMi, The Open University, Milton Keynes, UK. 2004-12-08. (See Workshop proceedings CEUR-WS.org (issn:1613-0073) Volume 122 - AKT-SWS04)

Description and classification of shims in myGrid
