HelloWDL Tutorial
February 2020
This tutorial will teach you how to script a basic pipeline using the Workflow Description
Language (WDL) and how to run WDLs through the Cromwell execution engine.
The tutorial was last tested with Womtool v48, Cromwell v48, and GATK v4.1.4.1.
____________________________________________________________________________
First, we need to run our Docker container with a mounted bundle. The command for that is
below, but we need to edit it slightly: replace /path/ with wherever you placed the downloaded
data bundle for this workshop. See the image further below for more details on the command.
Once you have the container open, navigate to the correct directory by running:
cd /gatk/my_data
2. Run a simple WDL script using Cromwell
Open scripts/hello_world_0.wdl in a text
editor. Here, we’ve pictured SublimeText, but you
are welcome to use whichever text editor you prefer.
This simple WDL script prints out the string Hello
World to the command line. Let’s take a look at the
structured elements of the script.
● Tasks are defined separately from the workflow section. We have one task,
WriteGreeting, and the contents within the task brackets { } detail what it does.
● The task definition has two sections: a command section and an output section. The
command section contains what is run on the command line. It is essentially the “do work”
part of the script, and is exactly what you might run in a regular bash terminal. The output
section defines the results of interest for that task.
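As a rough illustration of the structure described above, here is a minimal sketch of what such a script might look like. It is not a copy of hello_world_0.wdl: the workflow name HelloWorld, the task name WriteGreeting, and the output name out are taken from elsewhere in this tutorial, while the `version 1.0` header and the exact command are assumptions.

```wdl
version 1.0

# Sketch only; the real hello_world_0.wdl may differ in its details.
workflow HelloWorld {
  # The workflow section just lists which tasks to run
  call WriteGreeting
}

task WriteGreeting {
  command {
    # Anything here is run as a normal shell command
    echo "Hello World"
  }
  output {
    # stdout() captures whatever the command printed to standard output
    File out = stdout()
  }
}
```

Because the output is declared with stdout(), the result ends up in a file named stdout inside the task's execution directory.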
The bundle we mounted earlier contains the jar files required to run Cromwell. Let’s run our first
script using the command below:
Notice that Cromwell tells you what is going on during the run. Most of the log messages are
mainly relevant to developers; we are interested in the output. Find the section that
gives you the location of the output and the workflow ID. It should look something like this:
● Every time you run a WDL script, Cromwell stores the results of that run in the
cromwell-executions folder. This keeps all your runs separate so you don’t accidentally
overwrite old runs with new ones. Output paths follow the pattern
<workflow_name>/<workflow_ID>/<call-task_name>/execution/<output_file>.
Copy the path of the output result, and use more to confirm it contains Hello World. You can also
open up the file using your computer’s file browser if you prefer.
more \
/gatk/my_data/cromwell-executions/HelloWorld/69c617ee-87d7-4e8e-abf9-9ab0daf8bbec/call-WriteGreeting/execution/stdout
3. Add a configurable variable and define it in an inputs JSON file
Open up our next script, hello_world_1.wdl. This script
swaps out the literal string in the task command with a
variable named greeting. We call such variables
parameters or keys.
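The change can be sketched as follows. This is an illustration under assumptions, not the contents of hello_world_1.wdl: it assumes WDL 1.0 syntax and that greeting is declared as a task-level input.

```wdl
version 1.0

# Sketch only; the real hello_world_1.wdl may declare the input elsewhere.
workflow HelloWorld {
  call WriteGreeting
}

task WriteGreeting {
  input {
    # Supplied at run time from the inputs JSON instead of being hard-coded
    String greeting
  }
  command {
    echo "~{greeting}"
  }
  output {
    File out = stdout()
  }
}
```

With this layout, the inputs JSON would be expected to set the value under a fully qualified key of the form <workflow_name>.<task_name>.<input_name>, which here would be HelloWorld.WriteGreeting.greeting.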
Now run this WDL script and provide the new inputs file with -i.
together into a pipeline, but here we will go over the simplest example. Open up the next script,
hello_world_2.wdl, which chains two tasks linearly. The first task creates a greeting, which
we are familiar with by now, and the second task reads the first greeting back with an
amendment, "to you too".
● In the workflow, the second task takes in the result of the first task via the variable
WriteGreeting.out. Tasks are chained together in the workflow section, not inside the
task definitions.
● When you have multiple tasks, the workflow output section highlights the results you want
Cromwell to show at the end. For scripts with many tasks, it is helpful to have Cromwell
print out only the results we are interested in. All intermediate outputs
are still created, but only the ones defined in the workflow output section are printed to
the terminal when you run the script.
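The two points above can be sketched as follows. Only WriteGreeting, its output out, and the "to you too" amendment come from this tutorial; the second task's name (ReadItBackToMe), its input and output names, and the `version 1.0` header are hypothetical and will likely differ from hello_world_2.wdl.

```wdl
version 1.0

workflow HelloWorld {
  call WriteGreeting
  # Passing WriteGreeting.out as an input is what links the two tasks
  call ReadItBackToMe {
    input:
      original_greeting = WriteGreeting.out
  }
  output {
    # Only this result is reported at the end of the run
    File outfile = ReadItBackToMe.repeated_greeting
  }
}

task WriteGreeting {
  input {
    String greeting
  }
  command {
    echo "~{greeting}"
  }
  output {
    File out = stdout()
  }
}

# Hypothetical second task: reads the first greeting back with an amendment
task ReadItBackToMe {
  input {
    File original_greeting
  }
  command {
    # Print the first greeting without its trailing newline, then the amendment
    echo "$(cat ~{original_greeting}) to you too"
  }
  output {
    File repeated_greeting = stdout()
  }
}
```

Because the second call depends on WriteGreeting.out, the engine can work out from the data dependency that WriteGreeting must finish first.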
Let’s run the WDL with the same inputs JSON file as before, then open up the result using either
the more command or your file browser.
Add an ‘s’ to String greeting. Save the script, then run the Womtool validate command.
Womtool should complain about the variable not existing, which makes sense because we
declared String greetings and then tried to use the variable greeting in our command.
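For reference, the deliberately broken task might look like the sketch below (assuming WDL 1.0 syntax and a task-level input; the real script may be laid out differently). This version is intentionally invalid and is expected to fail validation.

```wdl
version 1.0

workflow HelloWorld {
  call WriteGreeting
}

task WriteGreeting {
  input {
    # Deliberate typo: the input is declared as "greetings"...
    String greetings
  }
  command {
    # ...but the command still refers to "greeting", which is never declared
    echo "~{greeting}"
  }
  output {
    File out = stdout()
  }
}
```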
Let’s look at another error. Go back and delete the ‘s’ we previously added. Introduce a new type
of error by replacing String with Int. Save, and run the validate command. Notice that this
time you need to include the inputs JSON!
With this error, Womtool tells us it couldn’t evaluate our greeting input:
It makes sense, since we told it we wanted a number, but our inputs JSON gave it a text value.
Change the variable back to String, and when you run Womtool with a clean script, you’ll get a
Success! message.
Open the newly-created inputs file and fill in the variable with whatever greeting you like! I’ve
chosen “Hello Workshop”. Then run the hello_world_2.wdl script with your new input file.
View the output using the more command or your file browser.
With this script, we see a lot more input definitions before the command section.
● java_opt is a variable that sets the maximum amount of memory the tool is allowed to use.
● The next block of inputs, refFasta through inputBamIndex, contains the input files that
our tool, HaplotypeCaller, needs to run.
● The last input, gvcf_name, uses a WDL function called basename(). This function
takes in a file path, returns the file name without its directory, and strips off the
ending given in quotes. Here, we strip the “.bam” ending from our file name, then append
a new file type ending: “.g.vcf”.
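A small standalone sketch of the basename() call, separate from hello_gatk.wdl: the workflow name BasenameExample, the input name inputBam, the output name name, and the example path in the comment are all hypothetical.

```wdl
version 1.0

# Hypothetical workflow that only demonstrates basename()
workflow BasenameExample {
  input {
    File inputBam
  }
  # For an input such as /some/dir/mother.bam, basename(inputBam, ".bam")
  # evaluates to "mother", so gvcf_name becomes "mother.g.vcf"
  String gvcf_name = basename(inputBam, ".bam") + ".g.vcf"
  output {
    String name = gvcf_name
  }
}
```

If the second argument is omitted, basename() only removes the directory part and leaves the file ending in place.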
➤ Do you notice anything odd when you compare the variable declarations and the variables
used in the commands?
None of the supporting files (indexes and dictionaries) are used in the command, even though we
declare them as variables in our task. This is because our tool, GATK, knows to look for
supporting files of similar names in the same directory as the base file. For example, GATK
would know to look for a sample.bai file if it was handed a sample.bam file. Cromwell, however,
needs to be told that these supporting files exist so that it can pull them into the working
directory (cromwell-executions), where GATK can then find them when it goes looking.
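The pattern can be sketched like this. The names java_opt, refFasta, inputBamIndex, and gvcf_name come from this tutorial; the workflow and task names, refIndex, refDict, inputBam, the output name, and the exact HaplotypeCaller arguments are assumptions and may not match hello_gatk.wdl.

```wdl
version 1.0

# Hypothetical wrapper workflow so the sketch is complete on its own
workflow HelloGATK {
  call HaplotypeCallerERC
}

task HaplotypeCallerERC {
  input {
    String java_opt
    File refFasta
    File refIndex        # hypothetical name: FASTA index, declared but unused below
    File refDict         # hypothetical name: sequence dictionary, declared but unused below
    File inputBam
    File inputBamIndex   # declared but unused below
    String gvcf_name = basename(inputBam, ".bam") + ".g.vcf"
  }
  command {
    # Only the base files appear here; GATK looks for the index and
    # dictionary files next to them on its own
    gatk --java-options "~{java_opt}" HaplotypeCaller \
      -R ~{refFasta} \
      -I ~{inputBam} \
      -O ~{gvcf_name} \
      -ERC GVCF
  }
  output {
    File rawGVCF = "~{gvcf_name}"
  }
}
```

Declaring the supporting files as File inputs is what prompts Cromwell to localize them alongside the base files, even though the command never names them.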
Run hello_gatk.wdl.
Using the more command, view the mother.g.vcf result. Hold the [ENTER] key to scroll down
until you see GVCF blocks and eventually the records themselves! This tool worked, and
generated a proper GVCF file. (Tip: Press Q to exit when you are done looking)
Now let’s take a look at another kind of error you can cause. Break the WDL by inserting
gibberish into the HaplotypeCaller command, as if a cat had walked across your keyboard and
you didn’t notice what it changed. Try validating the WDL with Womtool.
You’ll notice that this prints out a Success! message. This is clearly a type of error that
Womtool doesn’t catch: it validates the WDL itself, not the contents of the command. When you
run the script, you’ll see that it fails. This is an error with the
tool itself, so the tool has to report back the message. Cromwell prints out the start of the error
message, but you’ll need to view the task's stderr file to see it all.
The full error message indicates that “Haploapuhr adkaliefCaller” is not a tool, and GATK
helpfully lists the tool options available to you. When you’re done, correct the error you inserted
and save.
Navigate to our featured workspace, GATKTutorials-Pipelining, and clone it. You will find all
instructions contained within the dashboard.
That's a wrap on WDL and Cromwell basics. To continue learning, do the Puzzles worksheet.