Last updated on 2006-11-23 by Duncan Hull, University of Manchester, UK.
This document outlines some of the “shims” in myGrid: software components that align the input and output of closely related data. The shims have been implemented in two ways: on the client side, in Taverna, as “Local Java Widgets”, and on the server side as individual Web Services. Two classifications are presented here. The first, by input, describes the client-side shims for software developers. The second classifies both client-side and server-side shims for end-users (e.g. biologists), based on the relationship between the input and output of a shim.
DISCLAIMER: This technical report is an incomplete and ongoing work, liable to change without notice. Some of these software components have been difficult to describe because it is not clear from the documentation what inputs they take and what outputs they produce without invoking them. Such components are commented in red. Also, the Taverna workbench is a moving target, so some of these components (e.g. BIND) have become obsolete but are left here for historical purposes.
This classification aims to help end-users of Taverna find shim services when required. The shims are currently organised into four classes, based on the relationship between the input and output of the shim.
Some of these types of shim are shown in the following workflows, which can be visualised and run in the Taverna workbench.
The classification below, by Tom Oinn and Mark Fortner, is based on the type of data that the shim operates on (e.g. xml, ncbi, text, etc.). This classification is currently used to organise the shims in the Available Services Panel under Available processors > Local Services > Local Java widgets. However, it is currently of limited use to users of Taverna who are not software developers.
Service name (with link to Taverna API) | Description | Input I | Output O | Relation between O and I |
---|---|---|---|---|
Classified as "List" |
||||
StringListMerge | Merge string list to string. Consumes a string list and optional separator character and emits a string formed from the concatenation of all items in the list with the separator (default newline) interposed between them. | String list (and optional separator) | Merged list | output hasPart input |
FlattenList | Flatten I(I()) to I(). Consume a list of lists and emit a list containing the first level flattening of the input. | List. Set of sets? | List. Subset? | Unclassified |
StringStripDuplicates | Remove duplicate strings. Consumes a string list and emits the string list with duplicate entries removed. The first occurrence of a duplicate is preserved and all subsequent ones omitted, i.e. the string list 'a,b,c,b,a,d' is converted to 'a,b,c,d' | String list | String list (stripped) | Unclassified |
EchoList | EchoList. Echo the input list to the output list, does no actual processing at all. This class is intended to be used in conjunction with nested workflows in order to split the iteration out from the previous stage in the flow. | List | List | Unclassified |
Classified as "io" |
||||
TextFileWriter | Write Text File. This processor writes the "filecontents" out to the URL specified in the "outputFile" parameter. Note that the outputMap is always empty. | filecontents? | outputFile | Unclassified |
EnvVariableWorker | Get Environment Variables as XML. This processor exposes the Java environment variables as an XML document. | None | Environment variables | n/a |
FileListByRegexTask | List files by regex. This processor lists the files in a given subdirectory using a regular expression. | Directory and regular expression | FileList | Directory hasPart file? |
LocalCommand | Execute cmd line app. This processor executes a command line and returns the response as a String | command (string) | result (string) | Unclassified, could be anything, depends on command |
FileListByExtTask | List Files By Extension. This processor gets a list of files in a local directory | directory, extension | filelist | directory hasPart file? |
DataRangeTask | Select Data Range From File. Extracts a range of values from a two-dimensional data array | array, starting point, end point | array | outputArray isPartOf inputArray |
TextFileReader | Read Text File reads text from a file specified by the "fileurl" attribute and returns the results in the "filecontents" item in the outputMap | fileurl | filecontents | filecontents isidentifiedby fileurl |
ConcatenateFileListWorker | Concatenate files concatenates a series of text files and saves the results into the output file | filelist, outputfile, displayresults | results | results hasPart input |
DataRangeColumnTask | Select Column From File extract a single column of data from a data array produced by either an ExcelFileReader or by a DelimitedFileReader | array, column | array | Unclassified, SELECT or PROJECT? |
ExcelFileReader | Read Excel File reads an Excel spreadsheet and creates an ArrayList of ArrayLists containing string data. Note that formulas are not currently evaluated and thus are returned as empty strings. | filename, firstRowContainsColumnNames, dateIndexes | data | Unclassified |
Classified as "Metadata" |
||||
GetLSID | Get internal LSID of input. Outputs "replacelsid:input", which should be substituted for the input's LSID by the ProcessorTask. Chris Greenhalgh wrote this strips out two (or three?) of the five? components of an LSID | input | replacelsid | Unclassified: try it out |
Classified as "xml" |
||||
XPathTextWorker | XPath From Text applies an arbitrary XPath expression to an XML document, and returns a nodelist containing the nodes that match the XPath expression. | xpath, xml-text | nodelist, nodelistAsXML | Unclassified |
XSLTWorker | Transform XML transforms an input XML document into an output document. If an inFileURL is supplied, it will use the document located at the URL as input. If the xml-text is supplied, it will use this in-memory XML document as input. If an outputFile url is supplied, the results will be written to the output document. | xslFileURL, outFileURL, inFileURL, outputExt: @tavinput xslFileURL The complete path to XSL file, @tavinput outFileURL The complete path to the output file. (optional), @tavinput inFileURL The complete path to the input file, @tavinput xml-text The XML text to be processed. (optional), @tavinput outputExt The output file extension. Use this only if you want to add the extension to the input filename and use it as the output file name. | outputStr, @tavoutput outputStr A string containing the output text. This is useful, if you want to connect this processor to another and pass the results to it. | Unclassified |
XPathWorker | XPath From XML File applies an arbitrary XPath expression to an XML document, and returns a nodelist containing the nodes that match the XPath expression. | xpath, xmltext | xml-text, nodelist | unclassified |
Classified as "ncbi". Note: parameter naming conventions follow those outlined in Entrez Programming Utilities. |
||||
SNPWorker | Get SNP XML fetches SNP records. | id, rettype, retmode. Doesn't work: tried an SNP id, e.g. rs3091213 (taken from this list of SNPs), with no luck. See Get SNP example workflow (produces empty results). rettype defaults to xml, but it is not clear what values retmode takes or whether it is mandatory. | resultsXml | I uniquelyIdentifies O |
ProteinGBSeqWorker | Get Protein GBSeq XML fetches protein data in GBSeq XML format. | id, e.g. a GenBank identifier without the leading GI:, such as 1293613; see GenBank Sequence example workflow | outputText | I uniquelyIdentifies O |
NucleotideGBSeqWorker | Get Nucleotide GBSeq XML returns a GBSeq formatted record. | id, e.g. a GenBank identifier without the leading GI:, such as 1293613; see GenBank Sequence example workflow | outputText | I uniquelyIdentifies O |
NucleotideFastaWorker | Get Nucleotide FASTA fetches a nucleotide sequence in FASTA format. | id: accession number, e.g. U49845; see Nucleotide FASTA workflow example. | outputText | I uniquelyIdentifies O |
NucleotideINSDSeqXMLWorker | Get Nucleotide INSDSeq XML returns an INSD formatted nucleotide record | id, e.g. the nucleotide accession U49845; see Nucleotide INSD workflow example. | outputText, e.g. INSD formatted nucleotide record | I uniquelyIdentifies O |
HomoloGeneWorker | Homologene XML fetches HomoloGene data from NCBI. | term, maxRecords, outputFile, xslt, ext Example? Homologene terms? [Ancestor] Taxonomic name of common ancestor of the species represented in a HomoloGene entry. [Gene Description] Detailed description of a Gene. [Gene Id] Unique Gene Identifier. [Gene Name] Gene Aliases. [Nucleotide Accession] GenBank accession identifier of nucleotide sequence. [Nucleotide GI] Unique Nucleotide identifier. [Organism] Description of the organism or the NCBI Taxonomy ID of a species. [Protein Accession] The protein accession number of a protein. [Protein GI] Unique Protein identifier. [Text Word] Free text to be searched for in HomoloGene. [Title] Summary of HomoloGene entry [UniGene ID] Unique Unigene identifier. | resultsXml | unclassified |
ProteinFastaWorker | Get Protein FASTA fetches a protein sequence in FASTA format. | id is this a GenBankIdentifier or something else? | outputText | dereferencer |
LocusLinkWorker | Get LocusLink XML fetch locuslink data from the NCBI database as XML. | id, rettype, retmode is this a GenBankIdentifier or something else? What values do rettype and retmode take? | outputText | dereferencer |
PubMedSearchWorker | Search PubMed downloads PubMed records in XML format. Since NCBI does not currently support a pure XML | term, database, minDate, maxDate, reldate, rettype, cmd, cmd_current, dopt, orig_db, disp_max any valid pubmed query string / term? (e.g. apweiler), this doesn't currently work, are all the other parameters mandatory? See PubMed search example workflow | resultsXml | unclassified |
NucleotideTinySeqXMLWorker | Get Nucleotide TinySeq XML fetches a nucleotide sequence from NCBI and returns the results in the TinySeqXML format. | id is this a GenBankIdentifier or something else? | outputText | dereferencer |
OMIMWorker | Get OMIM XML fetches an OMIM record from the NCBI database in XML format. | term, maxRecords, outputFile, xslt, ext Can you give example terms? Are the other parameters optional or mandatory? | resultsXml | unclassified |
EntrezGeneWorker | Get Entrez Gene XML fetches an Entrez Gene record in XML format. It can also transform the resulting XML document. | term, maxRecords, outputFile, xslt, ext Can you give example terms? Are the other parameters optional or mandatory? | resultsXml | unclassified |
ProteinINSDSeqXMLWorker | Get Protein INSDSeq XML fetches an INSD formatted protein record | id is this a GenBankIdentifier or something else? | outputText | dereferencer |
ProteinTinySeqXMLWorker | Get Protein TinySeq XML fetches a protein in TinySeqXML format. | id is this a GenBankIdentifier or something else? | outputText | dereferencer |
EntrezProteinWorker | Get Entrez Protein XML processor fetches an Entrez Protein record from NCBI. | term, maxRecords, outputFile, xslt, ext Can you give example terms? Are the other parameters optional and what do they do? | resultsXml | unclassified |
PubMedEFetchWorker | Get PubMed XML by PMID fetches PubMed articles in XML form. Use this worker only if you already know the PubMed ID | id, rettype, retmode e.g. 15262813, see PMID workflow example | outputText | dereferencer |
NucleotideXMLWorker | Get Nucleotide XML fetches Nucleotide XML documents. | term, maxRecords, outputFile, xslt, ext Can you give example terms? Are the other parameters optional or mandatory? | resultsXml | unclassified |
PubMedESearchWorker | Search PubMed XML searches for articles in PubMed and returns their IDs in XML format | term, db, field, retstart, retmax, mindate, maxdate, rettype Can you give example terms? Are the other parameters optional or mandatory? | outputText | unclassified |
Classified as "net" |
||||
ExtractImageLinks | Get image URLs from HTTP document Extract a list of all image links in the supplied html document | document | imagelinks | unclassified |
SendEmail | Send an email Send an email from a workflow | to, from, subject, body, smtpserver | none | n/a |
WebPageFetcher | Get web page from URL Fetch a single web page from URL | url, base | contents | dereferencer |
WebImageFetcher | Get image from URL Fetch a single image from URL | url, base | image | dereferencer |
Classified as "text" |
||||
ByteArrayToString org.embl.ebi.escience.scuflworkers.java.ByteArrayToString | Byte[] to String No description available yet. There isn't a String to Byte[] but there probably should be. | bytes 'application/octet-stream' | string 'text/plain' | syntax translator |
StringSetUnion | String list union Provide the union of two lists of strings, the result being a string list containing all strings that occur in either of the input lists. | list1, list2 | union | unclassified |
StringConcat | Concatenate two strings Returns the result of appending firststring to secondstring | string1, string2 | output | unclassified |
StringSetDifference | String list difference Returns the items that are different between two sets or lists of string types where elements only exist in the output if they occur in either input, but not both | list1, list2 | difference | unclassified |
FilterStringList | Filter list of strings by regex Filter a list of Strings, only passing through those that match the supplied regular expression | stringlist, regex | filteredlist | unclassified |
SplitByRegex | Split string into string list by regular expression Split an input string into a list of strings using the given regular expression to determine the delimiter. If the regular expression is not supplied then it will default to the ',' character | string, regex | split | unclassified |
PadNumber | Pad numeral with leading 0's Pad a numeral with leading zeroes to take it up to a specified length, which defaults to seven. | input, targetlength | padded | unclassified |
RegularExpressionStringList | Filter list of strings extracting match to a regex Apply a regular expression to a string, returning a group that matches if there is a match. | stringlist, regex, group | filteredlist | unclassified |
StringSetIntersection | String list intersection Returns the intersection of two sets or lists of string types where elements only exist in the output if they occur in both inputs. | list, list2 | intersection | unclassified |
Classified as "biojava" |
||||
BlastParserWorker | Read BLAST results parses BLAST results and returns an XML document containing the results. | fileUrl, strict | blastresults | unclassified |
TranscribeWorker | Transcribe DNA takes a DNA sequence and transcribes it into an RNA sequence | dna_seq | rna_seq | unclassified |
EMBLParserWorker | Read EMBL file parses an EMBL-based file and outputs the results in Agave XML format. | fileUrl | emblFile | unclassified |
TranslateWorker | Translate DNA translates a DNA sequence into a protein sequence. | dna_seq | prot_seq | unclassified |
ReverseCompWorker | Reverse Complement DNA takes a raw DNA sequence and returns the reverse complement of the sequence. | rawSeq | revSeq | unclassified |
GenBankParserWorker | Read GenBank file parses genbank files and outputs the results in Agave XML format. | fileUrl | genbankdata | unclassified |
SwissProtParserWorker | Read SwissProt file parses a SwissProt file and outputs the results in Agave XML format | fileUrl | results | unclassified |
Classified as "jdbc" |
||||
SQLQueryWorker | Execute SQL Query executes SQL prepared statements, and returns the results as an array of arrays. It can also, optionally generate an XML representation of the results. | url, driver, userid, password, sql, params, provideXml | resultsList, xmlresults | unclassified |
SQLUpdateWorker | Execute SQL Update execute SQL update/insert statements | url, driver, userid, password, sql, params | resultsList | unclassified |
Classified as "base64" |
||||
EncodeBase64 | Encode byte[] to base64 Encode byte[] data into base64 string | bytes | base64 | unclassified |
DecodeBase64 | Decode base64 to byte[] Decode base64 string into byte[] | base64 | bytes | unclassified |
Classified as "moby" |
||||
CreateMobyData | Create moby data construct a biomoby data packet from either an ID or a string content | namespace, id, value, type | mobydata | unclassified |
ExtractMobyData | Parse moby data extract simple data types from biomoby data packets. | mobydata | namespace, id, value, type | unclassified |
CreateMobyCollection | Create a moby collection construct a biomoby data packet from either an ID or a string content | collectionName, mobySimple1, mobysimple2, ..., mobysimple35 | mobyCollection | unclassified |
BioMoby services |
||||
|
Arbitrary biomoby service no description | namespace 'text/plain', id 'text/plain', article name 'text/plain' | mobyData 'text/xml' | unclassified, depends on service |
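
The behaviour described for some of the "List" and "text" shims in the table above can be illustrated with a short sketch. It is written in Python for brevity, whereas the shims themselves are Java local workers in Taverna. The function names are hypothetical, and details the table does not state (for example, the order of items in the difference output) are assumptions rather than documented behaviour.

```python
import re


def string_list_merge(items, separator="\n"):
    """StringListMerge: join all items, separator defaults to newline."""
    return separator.join(items)


def flatten_list(nested):
    """FlattenList: first-level flattening only, deeper lists are kept."""
    return [item for sublist in nested for item in sublist]


def string_strip_duplicates(items):
    """StringStripDuplicates: keep the first occurrence of each string."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def string_set_difference(list1, list2):
    """StringSetDifference: items that occur in either input, but not both.

    Assumption: output follows first-seen order across list1 then list2.
    """
    set1, set2 = set(list1), set(list2)
    combined = string_strip_duplicates(list(list1) + list(list2))
    return [item for item in combined if (item in set1) != (item in set2)]


def split_by_regex(string, regex=","):
    """SplitByRegex: split on a regular expression, default delimiter ','."""
    return re.split(regex, string)


def pad_number(value, target_length=7):
    """PadNumber: pad a numeral with leading zeroes, default length seven."""
    return str(value).zfill(target_length)


if __name__ == "__main__":
    print(string_strip_duplicates("a,b,c,b,a,d".split(",")))
    print(string_list_merge(["a", "b", "c"], separator=","))
    print(pad_number(42))
```

Chaining `split_by_regex`, `string_strip_duplicates` and `string_list_merge` reproduces the 'a,b,c,b,a,d' to 'a,b,c,d' example given in the StringStripDuplicates row.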