net.sf.taverna.t2.provenance.api
Class ProvenanceAccess

java.lang.Object
  extended by net.sf.taverna.t2.provenance.api.ProvenanceAccess

public class ProvenanceAccess
extends java.lang.Object

Author:
Paolo Missier, Stuart Owen

This API is the single access point into the Taverna provenance database. Its main function is to let clients query the content of the DB, either through dedicated methods that retrieve specific entity values or through a more general XML-based query language. Examples of XML provenance queries can be found in the external package net.sf.taverna.t2.provenance.apic.client.resources. The class net.sf.taverna.t2.provenance.api.client.ProvenanceAPISampleClient provides an example of an API client that third parties would use to interact with this API.
The XML schema for the XML query language is pquery.xsd, in net.sf.taverna.t2.provenance.apic.client.resources.


Constructor Summary
ProvenanceAccess(java.lang.String connectorType)
           
 
Method Summary
 java.sql.Statement execSQLQuery(java.lang.String q)
          pass-through query method
 QueryAnswer executeQuery(Query pq)
          Executes a provenance query.
 Dependencies fetchPortData(java.lang.String wfInstance, java.lang.String workflowId, java.lang.String pname, java.lang.String port, java.lang.String iteration)
          Returns individual records from the provenance DB in response to a query that selects specific elements within the values associated with a processor port, in the context of a specific run of a workflow.
 java.util.List<WorkflowInstance> getAllWorkflowIDs()
           
 java.util.List<Arc> getArcs(java.util.Map<java.lang.String,java.lang.String> queryConstraints)
           
 java.util.List<Collection> getCollectionsForRun(java.lang.String wfInstanceID)
           
 java.lang.String getContainingCollection(LineageQueryResultRecord record)
           
 java.util.List<Workflow> getContainingWorkflowsForProcessor(java.lang.String pname)
           
 net.sf.taverna.t2.invocation.InvocationContext getInvocationContext()
           
 java.lang.String getLatestRunID()
           
 ProvenanceAnalysis getPa()
           
 java.util.List<Var> getPortsForDataflow(java.lang.String workflowID)
          lists all ports for a workflow
 java.util.List<Var> getPortsForProcessor(java.lang.String workflowID, java.lang.String processorName)
          lists all ports for a specific processor within a workflow
 ProvenanceQuery getPq()
           
 java.util.List<ProcBinding> getProcBindings(java.util.Map<java.lang.String,java.lang.String> constraints)
           
 java.util.List<ProvenanceProcessor> getProcessorsForWorkflowID(java.lang.String workflowID)
           
 java.util.Map<java.lang.String,java.util.List<ProvenanceProcessor>> getProcessorsInWorkflow(java.lang.String workflowID)
           
 ProvenanceConnector getProvenanceConnector()
           
 java.lang.String getTopLevelWorkflowID(java.lang.String runID)
           
 java.util.List<VarBinding> getVarBindings(java.util.Map<java.lang.String,java.lang.String> constraints)
           
 java.util.List<Var> getVars(java.util.Map<java.lang.String,java.lang.String> queryConstraints)
           
 java.util.List<Workflow> getWorkflowForRun(java.lang.String runID)
           
 java.util.List<java.lang.String> getWorkflowID(java.lang.String runID)
          returns a list of workflowIDs for a given runID.
 java.lang.String getWorkflowIDForExternalName(java.lang.String workflowName)
           
 void init()
           
static void initDataSource(java.lang.String driverClassName, java.lang.String jdbcUrl)
          The recommended data source initialisation method, where only a driver name and JDBC URL are required.
If the driver supports multiple connections, a connection pool is created with 10 min idle, 50 max idle, and 50 max active connections.
static void initDataSource(java.lang.String driverClassName, java.lang.String jdbcUrl, java.lang.String username, java.lang.String password, int minIdle, int maxIdle, int maxActive)
          Initialises a named JNDI DataSource if not already set up externally.
 net.sf.taverna.t2.invocation.InvocationContext initDefaultReferenceService()
          Initialises a default Reference Service for storing data and their associated references.
 net.sf.taverna.t2.invocation.InvocationContext initReferenceService(java.lang.String hibernateContext)
          Initialises the Reference Service for a given hibernate context definition.
 boolean isAttachOPMArtifactValues()
           
 boolean isIncludeProcessorOutputs()
           
 boolean isOPMGenerationActive()
           
 boolean isTopLevelDataflow(java.lang.String wfNameID)
           
 java.util.List<WorkflowInstance> listRuns(java.lang.String workflowId, java.util.Map<java.lang.String,java.lang.String> conditions)
           
 java.util.Set<java.lang.String> removeRun(java.lang.String runID)
          Removes all records that pertain to a specific run (but not the static specification of the workflow run)
 void removeWorkflow(java.lang.String wfName)
          removes all records pertaining to the static structure of a workflow.
 void setPa(ProvenanceAnalysis pa)
           
 void setPq(ProvenanceQuery pq)
           
 void setProvenanceConnector(ProvenanceConnector provenanceConnector)
           
 void toggleAttachOPMArtifactValues(boolean active)
          should actual artifact values be attached to OPM artifact nodes?
default is FALSE
THIS IS CURRENTLY UNSUPPORTED -- DEFAULTS TO FALSE
 void toggleIncludeProcessorOutputs(boolean active)
          include values of output ports in the query result? input port values are always included
default is FALSE
 void toggleOPMGeneration(boolean active)
          should an OPM graph be generated in response to a query?
default is TRUE
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ProvenanceAccess

public ProvenanceAccess(java.lang.String connectorType)
Method Detail

initDataSource

public static void initDataSource(java.lang.String driverClassName,
                                  java.lang.String jdbcUrl)
The recommended data source initialisation method, where only a driver name and JDBC URL are required.
If the driver supports multiple connections, a connection pool is created with 10 min idle, 50 max idle, and 50 max active connections.

Parameters:
driverClassName - the classname for the driver to be used
jdbcUrl - the JDBC connection URL

initDataSource

public static void initDataSource(java.lang.String driverClassName,
                                  java.lang.String jdbcUrl,
                                  java.lang.String username,
                                  java.lang.String password,
                                  int minIdle,
                                  int maxIdle,
                                  int maxActive)
Initialises a named JNDI DataSource if not already set up externally. The DataSource is named jdbc/taverna.

Parameters:
driverClassName - the classname for the driver to be used
jdbcUrl - the JDBC connection URL
username - the username, if required (otherwise null)
password - the password, if required (otherwise null)
minIdle - if the driver supports multiple connections, the minimum number of idle connections in the pool
maxIdle - if the driver supports multiple connections, the maximum number of idle connections in the pool
maxActive - if the driver supports multiple connections, the maximum number of active connections in the pool
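
The relationship between the two overloads can be sketched as follows. This is a minimal illustration, assuming that the two-argument form behaves like the seven-argument form called with null credentials and the documented pool defaults; the source does not confirm that it delegates in this way. PoolSettings and withDefaults are hypothetical names, not part of the Taverna API, and the Derby driver class and JDBC URL are illustrative values only.

```java
// Hypothetical holder for the arguments of the seven-argument initDataSource.
// Not part of the Taverna API; it only makes the documented defaults explicit.
final class PoolSettings {
    final String driverClassName;
    final String jdbcUrl;
    final String username;  // null when the database needs no credentials
    final String password;  // null when the database needs no credentials
    final int minIdle;
    final int maxIdle;
    final int maxActive;

    PoolSettings(String driverClassName, String jdbcUrl, String username,
                 String password, int minIdle, int maxIdle, int maxActive) {
        this.driverClassName = driverClassName;
        this.jdbcUrl = jdbcUrl;
        this.username = username;
        this.password = password;
        this.minIdle = minIdle;
        this.maxIdle = maxIdle;
        this.maxActive = maxActive;
    }

    // Pool sizes documented for the two-argument overload: 10 / 50 / 50.
    static PoolSettings withDefaults(String driverClassName, String jdbcUrl) {
        return new PoolSettings(driverClassName, jdbcUrl, null, null, 10, 50, 50);
    }
}

class InitDataSourceSketch {
    public static void main(String[] args) {
        PoolSettings s = PoolSettings.withDefaults(
                "org.apache.derby.jdbc.EmbeddedDriver",   // illustrative driver class
                "jdbc:derby:t2-database;create=true");    // illustrative JDBC URL

        // With Taverna on the classpath, the corresponding call would be:
        //   ProvenanceAccess.initDataSource(s.driverClassName, s.jdbcUrl, s.username,
        //           s.password, s.minIdle, s.maxIdle, s.maxActive);
        System.out.println(s.minIdle + " min idle, " + s.maxIdle + " max idle, "
                + s.maxActive + " max active");
    }
}
```

Keeping the defaults in one place makes it easy to see which values change when the longer overload is used to tune the pool.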

initDefaultReferenceService

public net.sf.taverna.t2.invocation.InvocationContext initDefaultReferenceService()
Initialises a default Reference Service for storing data and their associated references. This creates a reference service using the named JNDI Data Source 'jdbc/taverna'.
The new Reference Service is associated with the ProvenanceConnector, enabling data references to be resolved.


initReferenceService

public net.sf.taverna.t2.invocation.InvocationContext initReferenceService(java.lang.String hibernateContext)
Initialises the Reference Service for a given hibernate context definition. This mapping file must be available in the root of the classpath.

Parameters:
hibernateContext - the hibernate context definition (mapping file) to use
See Also:
initDefaultReferenceService()

init

public void init()

execSQLQuery

public java.sql.Statement execSQLQuery(java.lang.String q)
                                throws java.lang.InstantiationException,
                                       java.lang.IllegalAccessException,
                                       java.lang.ClassNotFoundException,
                                       java.sql.SQLException
pass-through query method

Parameters:
q - valid JDBC query string for the T2provenance schema
Returns:
the executed Statement if successful, null otherwise
Throws:
java.lang.InstantiationException
java.lang.IllegalAccessException
java.lang.ClassNotFoundException
java.sql.SQLException
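
Because execSQLQuery returns null on failure, callers need a null check before reading any results. The helper below is a sketch that uses only java.sql; countRows is a hypothetical name, and it assumes that the caller owns the returned Statement and is responsible for closing it, which the source does not state.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class ExecSqlQuerySketch {
    // Counts the rows in the current result of an already executed Statement.
    // Returns -1 when the statement is null (the documented failure value)
    // or when it produced no ResultSet.
    static int countRows(Statement executed) throws SQLException {
        if (executed == null) {
            return -1;
        }
        // try-with-resources closes the ResultSet (if any) and then the Statement.
        try (Statement st = executed; ResultSet rs = st.getResultSet()) {
            if (rs == null) {
                return -1;
            }
            int rows = 0;
            while (rs.next()) {
                rows++;
            }
            return rows;
        }
    }

    public static void main(String[] args) throws SQLException {
        // With Taverna on the classpath, the argument would come from
        // provenanceAccess.execSQLQuery("..."); null stands in for a failed query.
        System.out.println(countRows(null));
    }
}
```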

executeQuery

public QueryAnswer executeQuery(Query pq)
                         throws java.sql.SQLException
Executes a provenance query. See the separate documentation for the XML query language schema.

Throws:
java.sql.SQLException

fetchPortData

public Dependencies fetchPortData(java.lang.String wfInstance,
                                  java.lang.String workflowId,
                                  java.lang.String pname,
                                  java.lang.String port,
                                  java.lang.String iteration)
Returns individual records from the provenance DB in response to a query that selects specific elements within the values associated with a processor port, in the context of a specific run of a workflow.
This is used in the workbench to retrieve the "intermediate results" at various points during workflow execution, as opposed to a set of dependencies in response to a full-fledged provenance query.

Parameters:
wfInstance - lineage scope: a specific instance
pname - a specific processor [required]
port - a specific (input or output) variable [optional]
iteration - a specific iteration [optional]
Returns:
a list of records, encapsulated in a Dependencies object
Throws:
java.sql.SQLException

getContainingCollection

public java.lang.String getContainingCollection(LineageQueryResultRecord record)
Parameters:
record - a record representing a single value -- possibly within a list hierarchy
Returns:
the URI for topmost containing collection when the input record is within a list hierarchy, or null otherwise

listRuns

public java.util.List<WorkflowInstance> listRuns(java.lang.String workflowId,
                                                 java.util.Map<java.lang.String,java.lang.String> conditions)
Parameters:
workflowId - defines the scope of the query - if null then the query runs on all available workflows
conditions - additional conditions to be defined. This is a placeholder as conditions are currently ignored
Returns:
a list of WorkflowInstance objects, each representing one run of the input workflow

isTopLevelDataflow

public boolean isTopLevelDataflow(java.lang.String wfNameID)

getLatestRunID

public java.lang.String getLatestRunID()
                                throws java.sql.SQLException
Throws:
java.sql.SQLException

removeRun

public java.util.Set<java.lang.String> removeRun(java.lang.String runID)
Removes all records that pertain to a specific run (but not the static specification of the workflow run)

Parameters:
runID - the internal ID of a run. This can be obtained using listRuns(String, Map)
Returns:
the set of data references that pertain to the deleted run. This can be used by the Data Manager to ensure that no dangling references are left in the main Taverna data repository.
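
A cleanup pass over several runs can collect the references returned by each removeRun call before handing them to whatever manages the data repository. The sketch below assumes a stand-in interface, RunStore, that mirrors only the removeRun signature so that the example compiles without Taverna; RunStore, purge, and the in-memory implementation are hypothetical, and the run IDs and references are illustrative.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in mirroring ProvenanceAccess.removeRun(String).
interface RunStore {
    Set<String> removeRun(String runID);
}

class RemoveRunSketch {
    // Removes every listed run and returns the union of the data references
    // reported by removeRun, so that they can be cleaned up afterwards.
    static Set<String> purge(RunStore store, List<String> runIDs) {
        Set<String> references = new LinkedHashSet<String>();
        for (String runID : runIDs) {
            Set<String> refs = store.removeRun(runID);
            if (refs != null) {  // defensive; whether null can be returned is not documented
                references.addAll(refs);
            }
        }
        return references;
    }

    // In-memory fake, used only to make the sketch runnable.
    static RunStore inMemory(Map<String, Set<String>> refsByRun) {
        final Map<String, Set<String>> copy = new HashMap<String, Set<String>>(refsByRun);
        return runID -> {
            Set<String> refs = copy.remove(runID);
            return refs == null ? new LinkedHashSet<String>() : refs;
        };
    }

    public static void main(String[] args) {
        Map<String, Set<String>> data = new HashMap<String, Set<String>>();
        data.put("run-1", new LinkedHashSet<String>(Arrays.asList("ref-a", "ref-b")));
        data.put("run-2", new LinkedHashSet<String>(Arrays.asList("ref-b", "ref-c")));

        // With Taverna, the run IDs would be obtained through listRuns(String, Map).
        Set<String> refs = purge(inMemory(data), Arrays.asList("run-1", "run-2"));
        System.out.println(refs);
    }
}
```

Collecting the references into a set avoids handing the same reference to the data manager twice when it is reported for more than one run.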

removeWorkflow

public void removeWorkflow(java.lang.String wfName)
removes all records pertaining to the static structure of a workflow.

Parameters:
wfName - the ID (not the external name) of the workflow whose static structure is to be deleted from the DB

getWorkflowID

public java.util.List<java.lang.String> getWorkflowID(java.lang.String runID)
Returns the list of workflowIDs for a given runID. The list has a single element if the workflow has no nesting; in general it contains one workflowID for each nested workflow involved in the run.

Parameters:
runID - the internal ID for a specific workflow run
Returns:
a list of workflow IDs, one for each nested workflow involved in the input run

getWorkflowForRun

public java.util.List<Workflow> getWorkflowForRun(java.lang.String runID)
                                           throws java.sql.SQLException
Throws:
java.sql.SQLException

getTopLevelWorkflowID

public java.lang.String getTopLevelWorkflowID(java.lang.String runID)
Parameters:
runID - the internal ID for a specific workflow run
Returns:
the ID of the top-level workflow that executed during the input run
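
The results of getWorkflowID and getTopLevelWorkflowID can be combined to separate the nested workflows of a run from its top-level workflow. The sketch below works on plain values rather than on a live ProvenanceAccess instance; nestedOnly is a hypothetical helper, the IDs are illustrative, and the code assumes that the list returned by getWorkflowID includes the top-level workflow ID, which the source suggests for the non-nested case but does not state in general.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NestedWorkflowSketch {
    // Returns every workflow ID of the run except the top-level one.
    // An empty result means the run involved no nested workflows.
    static List<String> nestedOnly(List<String> workflowIDs, String topLevelID) {
        List<String> nested = new ArrayList<String>(workflowIDs);
        nested.removeIf(id -> id.equals(topLevelID));
        return nested;
    }

    public static void main(String[] args) {
        // Illustrative IDs; real values would come from
        // getWorkflowID(runID) and getTopLevelWorkflowID(runID).
        List<String> ids = Arrays.asList("wf-top", "wf-nested-1", "wf-nested-2");
        System.out.println(nestedOnly(ids, "wf-top"));
    }
}
```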

getAllWorkflowIDs

public java.util.List<WorkflowInstance> getAllWorkflowIDs()
Returns:
a list of WorkflowInstance beans, each representing the complete description of a workflow run (note that this is not just the ID of the run)

getContainingWorkflowsForProcessor

public java.util.List<Workflow> getContainingWorkflowsForProcessor(java.lang.String pname)
Parameters:
pname - a workflow processor name
Returns:
a list of Workflow beans, one for each workflow that contains a processor named pname

getProcessorsInWorkflow

public java.util.Map<java.lang.String,java.util.List<ProvenanceProcessor>> getProcessorsInWorkflow(java.lang.String workflowID)
Parameters:
workflowID -
Returns:
a Map: workflowID -> [ProvenanceProcessor]. Each entry pertains to one composing sub-workflow (if there is no nesting, the map contains only one workflow, namely the top-level one)
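
The returned map can be traversed one sub-workflow at a time. The sketch below uses String processor names in place of ProvenanceProcessor beans so that it compiles without Taverna; countProcessors is a hypothetical helper, written generically so that it would also accept a map of ProvenanceProcessor lists, and the workflow IDs and processor names are illustrative.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ProcessorsInWorkflowSketch {
    // Totals the processors across the top-level workflow and all sub-workflows.
    static <P> int countProcessors(Map<String, List<P>> processorsByWorkflow) {
        int total = 0;
        for (Map.Entry<String, List<P>> entry : processorsByWorkflow.entrySet()) {
            total += entry.getValue().size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for the result of getProcessorsInWorkflow(workflowID).
        Map<String, List<String>> result = new LinkedHashMap<String, List<String>>();
        result.put("wf-top", Arrays.asList("fetch", "align"));
        result.put("wf-nested", Arrays.asList("render"));

        for (Map.Entry<String, List<String>> entry : result.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
        System.out.println("total: " + countProcessors(result));
    }
}
```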

getVars

public java.util.List<Var> getVars(java.util.Map<java.lang.String,java.lang.String> queryConstraints)
                            throws java.sql.SQLException
Throws:
java.sql.SQLException

getArcs

public java.util.List<Arc> getArcs(java.util.Map<java.lang.String,java.lang.String> queryConstraints)
                            throws java.sql.SQLException
Throws:
java.sql.SQLException

getProcBindings

public java.util.List<ProcBinding> getProcBindings(java.util.Map<java.lang.String,java.lang.String> constraints)
                                            throws java.sql.SQLException
Throws:
java.sql.SQLException

getCollectionsForRun

public java.util.List<Collection> getCollectionsForRun(java.lang.String wfInstanceID)

getVarBindings

public java.util.List<VarBinding> getVarBindings(java.util.Map<java.lang.String,java.lang.String> constraints)
                                          throws java.sql.SQLException
Throws:
java.sql.SQLException

getPortsForDataflow

public java.util.List<Var> getPortsForDataflow(java.lang.String workflowID)
lists all ports for a workflow

Parameters:
workflowID -
Returns:
a list of Var beans, each representing an input or output port for the workflow

getPortsForProcessor

public java.util.List<Var> getPortsForProcessor(java.lang.String workflowID,
                                                java.lang.String processorName)
lists all ports for a specific processor within a workflow

Parameters:
workflowID -
processorName -
Returns:
a list of Var beans, each representing an input or output port for the input processor

toggleIncludeProcessorOutputs

public void toggleIncludeProcessorOutputs(boolean active)
include values of output ports in the query result? input port values are always included
default is FALSE


isIncludeProcessorOutputs

public boolean isIncludeProcessorOutputs()

getInvocationContext

public net.sf.taverna.t2.invocation.InvocationContext getInvocationContext()
Returns:
an instance of InvocationContext that can be used by a client to dereference a Taverna data reference

toggleOPMGeneration

public void toggleOPMGeneration(boolean active)
should an OPM graph be generated in response to a query?
default is TRUE


isOPMGenerationActive

public boolean isOPMGenerationActive()
Returns:
true if OPM is set to be generated in response to a query

toggleAttachOPMArtifactValues

public void toggleAttachOPMArtifactValues(boolean active)
should actual artifact values be attached to OPM artifact nodes?
default is FALSE
THIS IS CURRENTLY UNSUPPORTED -- DEFAULTS TO FALSE

Parameters:
active -

isAttachOPMArtifactValues

public boolean isAttachOPMArtifactValues()
Returns:
true if the OPM graph artifacts are annotated with actual values

getWorkflowIDForExternalName

public java.lang.String getWorkflowIDForExternalName(java.lang.String workflowName)

getProcessorsForWorkflowID

public java.util.List<ProvenanceProcessor> getProcessorsForWorkflowID(java.lang.String workflowID)

getProvenanceConnector

public ProvenanceConnector getProvenanceConnector()
Returns:
the singleton ProvenanceConnector used by the API to operate on the DB. Currently the MySQL (MySQLProvenanceConnector) and Derby (DerbyProvenanceConnector) connectors are supported. The set of supported connectors is extensible. The available connectors are discovered automatically by the API upon startup, and include all the connectors that are listed in the <dependencies> section of pom.xml for Maven module net.sf.taverna.t2.core.provenanceconnector

setProvenanceConnector

public void setProvenanceConnector(ProvenanceConnector provenanceConnector)
Parameters:
provenanceConnector - a specific ProvenanceConnector used by the API

getPa

public ProvenanceAnalysis getPa()
Returns:
the pa

setPa

public void setPa(ProvenanceAnalysis pa)
Parameters:
pa - the pa to set

getPq

public ProvenanceQuery getPq()
Returns:
the pq

setPq

public void setPq(ProvenanceQuery pq)
Parameters:
pq - the pq to set