Why This Matters

Scientific workflows enable large-scale computational analysis by specifying data processing activities and their dependencies, yet the diversity of workflow systems, each with specialized requirements, makes them difficult to compare and reconcile. This work contributes a systematic comparison of workflow management approaches and identifies the key features needed for effective scientific workflow execution. The survey highlights the importance of managing the full workflow lifecycle, including composition, execution tracking, fault tolerance, and data provenance, for reproducible science.

What We Did

This paper surveys scientific workflow management systems and their facilities for specifying, managing, and monitoring scientific computations. It compares systems including Kepler, Pegasus, Chimera, and others across dimensions such as composition, representation, mapping, execution, fault tolerance, metadata handling, and provenance, providing a comprehensive analysis of how different systems manage large-scale scientific computations across distributed infrastructure. A minimal sketch of the dependency-driven specification idea these systems share follows below.
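
To illustrate what such a specification looks like, the sketch below models a workflow as a directed acyclic graph (DAG) of tasks and executes it in dependency order. The task names, the placeholder functions, and the dependency table are hypothetical and do not follow any surveyed system's API; this is a minimal sketch of dependency-driven execution, not a definitive implementation.

from collections import defaultdict, deque

# Hypothetical workflow: four tasks with linear data dependencies.
# The lambdas stand in for real data processing activities.
tasks = {
    "fetch":   lambda: "raw data",
    "clean":   lambda: "cleaned data",
    "analyze": lambda: "analysis results",
    "report":  lambda: "final report",
}
depends_on = {
    "clean":   ["fetch"],
    "analyze": ["clean"],
    "report":  ["analyze"],
}

def topological_order(tasks, depends_on):
    # Kahn's algorithm: repeatedly schedule tasks whose dependencies are done.
    indegree = {t: len(depends_on.get(t, [])) for t in tasks}
    dependents = defaultdict(list)
    for task, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(task)
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in dependents[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(tasks):
        raise ValueError("dependency cycle: not a valid workflow DAG")
    return order

# Execute the workflow in an order consistent with its data dependencies.
for name in topological_order(tasks, depends_on):
    print(name, "->", tasks[name]())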

Key Results

The survey identifies key differences in how workflow systems handle composition and representation: some support graphical interfaces while others use textual specifications. It also finds wide variation in fault tolerance and metadata handling, with most systems relying on ad hoc techniques rather than integrated frameworks. The work demonstrates that effective workflow management requires addressing multiple interdependent concerns, from specification through execution monitoring to result verification; a sketch of two of these concerns follows.
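
To make the fault tolerance and provenance concerns concrete, the following sketch wraps task execution with retries and records a provenance entry per attempt. The function name run_with_retries, the max_retries parameter, and the record fields are illustrative assumptions, not features of any particular surveyed system.

import time

provenance_log = []  # one record per task attempt, success or failure

def run_with_retries(name, func, inputs=(), max_retries=3):
    # Retry-based fault tolerance with per-attempt provenance capture.
    for attempt in range(1, max_retries + 1):
        started = time.time()
        try:
            output = func(*inputs)
        except Exception as exc:
            provenance_log.append({
                "task": name, "attempt": attempt, "status": "failed",
                "error": repr(exc), "started": started, "ended": time.time(),
            })
            if attempt == max_retries:
                raise  # fault isolation: surface the failure once retries are exhausted
        else:
            provenance_log.append({
                "task": name, "attempt": attempt, "status": "succeeded",
                "inputs": inputs, "started": started, "ended": time.time(),
            })
            return output

# Example: a flaky task that succeeds on its second attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient failure")
    return "done"

print(run_with_retries("flaky-step", flaky))
print(len(provenance_log), "provenance records captured")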

Full Abstract

Scientific workflows require the coordination of data processing activities, resulting in executions driven by data dependencies. Due to the scales involved and the repetition of analysis, typically workflows are analyzed in coordinated campaigns, each execution managed and controlled by the workflow management system. In this respect, a workflow management system is required to (1) provide facilities for specifying workflows: intermediate steps, inputs/outputs, and parameters, (2) manage the execution of the workflow based on specified parameters, (3) provide facilities for managing data provenance, and (4) provide facilities to monitor the progress of the workflow, including facilities to detect anomalies, isolate faults and provide recovery actions. In this paper, part-I of a two-part series, we provide a comparison of some state-of-the-art workflow management systems with respect to these four primary requirements.

Cite This Paper

@techreport{Saxena2011,
  author = {Saxena, Tripti and Dubey, Abhishek},
  institution = {Institute for Software Integrated Systems, Vanderbilt University},
  title = {Meta-Tools For Designing Scientific Workflow Management Systems: Part-I, Survey},
  year = {2011},
  number = {ISIS-11-105},
  abstract = {Scientific workflows require the coordination of data processing activities, resulting in executions driven by data dependencies. Due to the scales involved and the repetition of analysis, typically workflows are analyzed in coordinated campaigns, each execution managed and controlled by the workflow management system. In this respect, a workflow management system is required to (1) provide facilities for specifying workflows: intermediate steps, inputs/outputs, and parameters, (2) manage the execution of the workflow based on specified parameters, (3) provide facilities for managing data provenance, and (4) provide facilities to monitor the progress of the workflow, including facilities to detect anomalies, isolate faults and provide recovery actions. In this paper, part-I of a two-part series, we provide a comparison of some state-of-the-art workflow management systems with respect to these four primary requirements.},
  attachments = {http://www.isis.vanderbilt.edu/sites/default/files/Survey-report.pdf},
  contribution = {colab},
  file = {:Saxena2011-Meta-tools_for_Designing_Scientific_Workflow_Management_Systems_Survey.pdf:PDF},
  keywords = {scientific workflows, workflow management, distributed computing, fault tolerance, data provenance, monitoring}
}
Quick Info
Year 2011
Keywords
scientific workflows, workflow management, distributed computing, fault tolerance, data provenance, monitoring
Research Areas
middleware, scalable AI