Design, editing and execution of scientific workflows

Note: Taverna is no longer maintained; this page is provided for archival purposes.

About Taverna

Taverna was a powerful, scalable, open source & domain independent Workflow Management System written in Java – a suite of tools used to design and execute scientific workflows and aid in silico experimentation comprising:

Workbench (desktop client application)
Command Line Tool (for a quick execution of workflows from a terminal)
Server (for remote execution of workflows)
Player (Web interface plugin for submitting workflows for remote execution written in Ruby on Rails)
Taverna Online (for creation of Taverna workflows from a Web browser)
Taverna Mobile (for running Taverna workflows from an Android phone)

Taverna was used for graphically composing and executing computational workflows combining diverse sets of WSDL/REST web services, command line tools, spreadsheets, R scripts and user interactions.

The core user base of Taverna was in bioinformatics, particularly for combining public genomics databases with disparate web services, but Taverna was used in many scientific domains such as astronomy, biodiversity, chemistry, data mining and digital preservation.

Developments of Taverna pioneered and experimented in many novel directions for workflow systems, including:

Semantic service descriptions
Rich structured provenance of workflow executions
Interactive user interface
Combining multiple grid backends
Using fine-structured web services
Customizable through a plugin-system
Formalization of workflow execution semantics

Often seen as influential for development of scientific workflow systems and bioinformatics practices, publications about Taverna have received many thousands of citations.

The Common Workflow Language is strongly inspired by Taverna’s execution semantics and its formalisations in the Research Object model wfdesc ontology.

History

The timeline below has been extracted from the slides 2014-10-30 Taverna as an Apache Incubator project:

2020: Taverna Project Retired

From 2014 till 2020 the Taverna code base was maintained by the Apache Incubator project Apache Taverna (incubating) (see web archive and podling status).

For several successful years, Taverna saw increased developer contributions and progress towards open development practices and graduation from the Apache Incubator. Several releases of Apache Taverna modules were made, involving many new volunteers and Google Summer of Code students.

Over these years the number of workflow platforms rose, and it proved difficult to establish a new sustainable development model: no funding was acquired for core development, and the original Taverna team shifted its emphasis to workflow interoperability. Together these took their toll. Apache Taverna Workbench 3.1 was never released, and in 2019 the Taverna PMC started considering halting further development.

In 2020 the Taverna community voted to retire Taverna as a project and withdraw the code base from the Apache Software Foundation.

The Taverna 3.x code base remains available under the Apache License 2.0, but is now simply called Taverna rather than ~~Apache Taverna (incubating)~~.

After retirement the code repository and website are being moved from Apache’s infrastructure to the taverna GitHub organisation; although the code base is no longer actively maintained, pull requests may infrequently be considered by remaining volunteer caretakers.

Currently effort is focused on archiving and documenting almost 20 years of Taverna’s history.

Archived repositories (Note: will be moved/renamed):

Archived website: https://web.archive.org/web/20200312133332/https://taverna.incubator.apache.org/

2014-2020: Taverna moves to the Apache Incubator

After almost a decade with development mainly led by the eScience Lab team at the University of Manchester, funded by multiple research projects, the Taverna community realised the need for moving to an open development model.

The Apache Software Foundation is a non-profit organization, forming a community of open-source software projects. ASF has a strong emphasis on openness, collaboration and a consensus-based development process.

Donating Taverna to ASF aimed to move to fully open development, encourage further “third-party” developer involvement, and reduce dependency on the University of Manchester as lead developers; that is, to reach collective code ownership across all developers.

In 2014, the Taverna project was proposed and accepted to join the Apache Software Foundation as a podling of the ASF Incubator.

The Taverna code base was re-licensed as Apache License 2.0, and all infrastructure moved to ASF. The Apache Taverna website was adapted for ASF incubator branding, which included this disclaimer:

Apache Taverna is an effort undergoing incubation at The Apache Software Foundation (ASF) sponsored by the Apache Incubator PMC. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

The reason for the disclaimer is that it takes some time for a developer community to adapt to The Apache Way, and in addition ASF requires IP governance checks on the donated code, e.g. for license compatibility. The aim of the incubation was to graduate to a top-level Apache project that would then be sustained and managed by its own members. Multiple ASF Members volunteered to mentor the Taverna community on this path, in particular Andy Seaborne, who was Taverna’s champion when ASF incubation was initially proposed.

A refreshed Taverna community and a podling Project Management Committee (PMC) were formed to manage the project, including current, new and former developers of Taverna.

The software was reshaped into Apache Taverna (incubating) and several modules were released under the Apache Incubator banner. Google Summer of Code students developed new features and products, and, in the Apache Way of recognizing meritocracy, several new volunteers were voted to join the Taverna PPMC with equal voting rights.

Notable Apache Taverna (incubating) releases:

These releases remain available at https://archive.apache.org/dist/incubator/taverna/ and https://repo.maven.apache.org/maven2/org/apache/taverna/ using the Maven group ID and Java package name org.apache.taverna (any fork of the archived code is recommended to change this identifier).

Unfortunately, during 2019 and 2020 developer activity for the Apache Taverna podling went quiet; with a lack of funding for paid developers, this led to the project committee voting for retirement from the ASF Incubator rather than graduating.

2012-2020: Taverna 3.x

Development during the Taverna 2.x series highlighted that maintaining our own plugin system Raven was becoming a burden. This was powerful and flexible, e.g. allowing two different plugins to use incompatible versions of the same Java library, but it then also became inflexible to work with the modularized engine and workbench.

The original intention of Taverna 2 was to allow partial upgrades of individual modules, to move to a “rolling release” model (e.g. updating only the diagram code), but in reality, because of cross-dependencies between the different modules and difficulties in testing developer builds, Taverna was always built and released as a full set of modules.

The t2flow format was also seen as a challenge, as it was tied tightly with the implementing activity modules (serializing their Java configurations 1:1), and its XML was overly complex for third-party tools to process or generate without loading the full Taverna libraries into the classpath. It was seen as important to “liberate” the workflow definition file format from the Taverna implementation.

Therefore it was decided to develop a new “abstracted” file format called SCUFL2 that standardized the definition using a combination of XML Schemas and ontologies, with a separate SCUFL2 Java API (later Taverna Language).

Replacing the custom Raven plugin system, Taverna 3.x was planned to move to the industry-standard dynamic module system OSGi, with a clearer separation between internal API interfaces and their implementations. The level of indirection offered by OSGi was envisioned to avoid the “spaghetti dependencies” previously encountered, but meant a change in how third-party developers would make plugins available for Taverna.

The previous Taverna 2.x Commandline had been made as “Workbench without the GUI” and was therefore still loading many components not used during workflow execution, e.g. service discovery. The Taverna 2 workbench was tightly bound to the t2core engine, e.g. for showing progress during execution, but this meant it could not be easily switched to remote execution. The Taverna 2.x Server was also in effect a REST API that executed the Taverna 2.x command line, and thus had no access to the workflow state while it was running.

To support many ways of remote execution, and alternate integration of the engine from other applications (e.g. from KNIME), the engine was separated into a new independent module called Taverna Platform, which provided a higher-level API that also exposed detailed provenance information of the current state. Using OSGi, this was intended to connect the desktop workbench to a Taverna Engine running on remote servers.

The refactored Taverna engine and Taverna 3.0 prototypes became the basis for the code base that was donated to Apache Taverna.

Notable Taverna 3.x releases:

Version 3.0.a1 (2013): first alpha-release workbench
Version 3.0.a2 (2013): separated engine, command line and workbench with separate plugins

2007-2014: Taverna 2.x

The first releases of Taverna 2.0 using t2core were made available as early alpha/beta versions to be tested by selected users, before the official 2.1 workbench was released.

The workflow fileformat was changed from the FreeFluo SCUFL XML to a new t2flow XML format that serialized the t2core layers, supporting the extensible and configurable Taverna 2 execution semantics.

Code maintenance was improved by moving from SourceForge CVS to GoogleCode SVN, which better supported larger refactorings. The code was modularized into different Maven subprojects to support partial releases and upgrades, utilizing the Raven plugin system also for Taverna’s internals, e.g. user interface and activity plugins.

Releases increased in number and size, with installers and JVM for multiple operating systems and domains. The releases were moved from SourceForge, Google Code (archived), BitBucket and Launchpad. As Google Code was decommissioned, source code was moved from a big single SVN tree to multiple git repositories on GitHub.

While Taverna 1.x mainly used specialized RPC web services with corresponding plugins, it was found that many users preferred the generic support for rich web services described by WSDL. Taverna loaded the WSDL descriptions and dynamically generated bindings and a user interface for them, at a time when most frameworks implemented WSDL bindings by autogenerating hardcoded source code.

Taverna then added methods for calling arbitrary REST services, although this predated later REST community development of API specifications (e.g. OpenAPI and JSON Schema), so Taverna relied on URI Templates entered by the user.
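To illustrate what a user-entered URI Template involves, the following is a minimal sketch of simple `{name}` placeholder expansion in Java. It is not Taverna's implementation: the class name `UriTemplateSketch`, the method `expand` and the example URL are hypothetical, and only plain placeholders are handled (no operators such as `{?q}` or `{+path}`). Percent-encoding is approximated with `URLEncoder`, which differs slightly from a strict URI Template implementation for a few characters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Taverna code base.
public class UriTemplateSketch {

    // Matches simple placeholders such as {id}; template operators are not supported.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([A-Za-z0-9_]+)\\}");

    // Replaces each {name} in the template with the percent-encoded value for that name.
    public static String expand(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = values.get(name);
            if (value == null) {
                throw new IllegalArgumentException("No value for placeholder: " + name);
            }
            // URLEncoder produces form encoding, so spaces become "+"; convert them to "%20".
            String encoded = URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
            matcher.appendReplacement(result, Matcher.quoteReplacement(encoded));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        // Example template and values are made up for illustration.
        String template = "https://example.org/genes/{id}/sequence?format={format}";
        System.out.println(expand(template, Map.of("id", "BRCA 2", "format", "fasta")));
        // prints https://example.org/genes/BRCA%202/sequence?format=fasta
    }
}
```

In a workflow setting, each placeholder would correspond to an input of the REST step, so the values map stands in for the data arriving on those inputs.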

Focus on third-party developers was increased, providing documentation and examples, a (perhaps too) flexible plugin system, and an extensible user interface. This welcomed multiple extensions, e.g. additional file formats, and integrations with myExperiment for workflow discovery and BioCatalogue for service discovery (WSDL+REST). Plugins could be installed and loaded on the fly from multiple web-sites, based on Maven repositories.

New additional domain-specific activity types were added and extended, and the focus changed from Taverna as a bioinformatics-specific workbench to a generic tool composition platform, for instance for astronomy in Wf4Ever, biodiversity in BioVeL and digital preservation in SCAPE.

Extensive security support was added, such as SSL certificate management UI for using secured Grid Services, developed for Globus in the caBIG project and for ARC in the KnowARC project. This development also added support for stateful webservices (WSRF).

Provenance support was reintroduced for Taverna 2; a tighter integration into the t2core engine allowed fine-grained tracing of workflow executions, e.g. inspection of intermediate values and individual start/stop times for step iterations. Provenance could be exported in RDF format according to the Open Provenance Model (OPM), and later with Taverna-PROV as Research Objects using W3C PROV. Taverna developers were active in shaping the PROV standard and Research Object model based on experience within the Wf4Ever project and the myExperiment workflow repository.

Support for executing arbitrary command line tools was added to Taverna, as well as making Taverna workflows themselves executable on the command line and through a Taverna Server REST API. This was fully utilized in projects like BioVeL, where a web portal allowed user-driven execution of “hidden” Taverna workflows with interactive steps to steer the execution.

Notable Taverna 2.x releases:

Version 2.0 (2008): t2core workflow engine, reimplemented pluggable workbench
Version 2.1.2 (2009): Improved support for 3rd-party plugins
Version 2.2 (2010): Taverna Server + ruby gem
Version 2.5 (2014): Domain-specific editions (astronomy, bioinformatics, biodiversity, digital preservation), Taverna-PROV

Note: Source code for 2.1-2.5 is archived in taverna-svn but is split across multiple modules - builds assume access to https://www.mygrid.org.uk/maven/repository/ for dependencies.

2006-2009: Productizing

From 2006, with funding for the myGrid platform and OMII-UK, the myGrid group in Manchester grew by several developers to productize the previous Taverna workbench, which had already seen a big uptake by bioinformatics users who were raising requirements for usability, extensibility and functionality.

Internally, several code refactorings were also due: to support plugins, to improve the build/test/release cycle, and to provide installers for desktop users.

Notable Taverna 1.x releases:

Version 1.4 (2006): Apache Maven-based build, executeworkflow command line
Version 1.5 (2006): Raven plugin system
Version 1.7 (2008): Taverna Remote Execution Service

During this time it also became evident that relying on the Freefluo workflow engine, now no longer actively maintained by Southampton developers, hampered Taverna’s future directions. To support better concurrency, efficiency and tighter workflow engine integration and logging, work began on the t2core workflow engine, which was designed by Tom Oinn at EMBL-EBI and Matt Hancock, and prototyped by the Manchester team.

The website taverna.org.uk was established, with extensive documentation and training materials created. The myGrid team provided Taverna training for bioinformaticians at many international events and summer schools.

2001-2006: Prototyping

In 2001 the myGrid consortium was formed from 6 academic institutions and 8 industry partners. Their challenge: Create a graphical workbench for bioinformaticians to combine data and web services.

Part of the conceived myGrid workbench was the Taverna Workbench for building workflows. Taverna was seen as participating in the Semantic Web; other parts of myGrid included catalogues of semantic service descriptions (discovered by Taverna), wrapped bioinformatics web services (which Taverna called), FreeFluo workflow enactment (which Taverna used for execution), OGSA-DAI distributed queries and the myGrid Information Repository (mIR) storing workflow definitions, projects and data with semantic metadata.

Notable Taverna Workbench prototype releases:

Version 0.1 beta (2003-2004): First Taverna Scufl workbench
Version 1.0 (2005): First stable release, using FreeFluo
Version 1.3 (2005): Production-ready release

Taverna was developed as Open Source software from the beginning, licensed as LGPL 2.1. Source and binary releases were kept on SourceForge.net; the source code was initially kept on University of Manchester CVS servers before moving to CVS on SourceForge.

Funders

This is a most likely incomplete list of funders that have directly and indirectly sponsored development of Taverna and the Taverna ecosystem.

OMII UK
Engineering and Physical Sciences Research Council (EPSRC):
Biotechnology and Biological Sciences Research Council (BBSRC)
- Ondex (BB/F006012/1)
- SynBioChem (BB/M017702/1)
- BioCatalogue, formerly “WS4LS” (BB/F01046X/1, BB/F010540/1)
Microsoft Technical Computing Initiative
- myExperiment, “Social Networking for Life Sciences”
Economic and Social Research Council (ESRC)
- MethodBox/Obesity e-Lab (ES/F029721/1, ES/J010014/1)
Joint Information Systems Committee (JISC)
- myExperiment
EU 7th Framework Programme (FP7), incl:
EU Horizon 2020 (H2020) programme
- BioExcel (H2020-INFRAEDI-02-2018-823830, H2020-EINFRA-2015-1-675728)
- IBISBA (H2020-INFRAIA-2017-1-730976, H2020-INFRADEV-2019-2-871118)