HelloWDL Tutorial

February 2020
This tutorial will teach you how to script a basic pipeline using the Workflow Description
Language (WDL) and how to run WDLs through the Cromwell execution engine.
The tutorial was last tested with Womtool v48, Cromwell v48 and GATK v4.1.4.1.
____________________________________________________________________________
1. Start a GATK docker container

If your system can run Cromwell, you can simply navigate to the gatk_bundle_2002 folder and
proceed to section 2. Otherwise, follow the directions below to start and attach to a Docker
container. You can run all tutorial commands, except those in the last section, from within a
GATK4 Docker container.
First, we need to run our Docker container with the data bundle mounted. The command is below,
but you need to edit it slightly: replace /path/ with the location where you placed the downloaded
data bundle for this workshop. The -v option mounts that folder inside the container at
/gatk/my_data, and -it gives you an interactive terminal in the container.
docker run -v /path/gatk_bundle_2002:/gatk/my_data -it broadinstitute/gatk:4.1.4.1
Once the container is running, navigate to the mounted bundle directory:
cd /gatk/my_data
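The -v option is a bind mount: the host folder on the left of the colon appears inside the container at the path on the right. As a small illustration (the helper to_container_path is hypothetical, written for this tutorial and not part of Docker), this Python sketch computes where a bundle file shows up inside the container, assuming the placeholder /path/ from the command above.

```python
from pathlib import PurePosixPath


def to_container_path(host_path: str, mount_spec: str) -> str:
    """Map a host path to its location inside the container.

    mount_spec is the HOST:CONTAINER[:OPTIONS] string given to `docker run -v`;
    any trailing options (such as ro) are ignored here.
    """
    host_root, container_root = mount_spec.split(":")[:2]
    # Raises ValueError if host_path is outside the mounted folder,
    # i.e. the file is not visible inside the container at all.
    relative = PurePosixPath(host_path).relative_to(host_root)
    return str(PurePosixPath(container_root) / relative)


spec = "/path/gatk_bundle_2002:/gatk/my_data"
print(to_container_path("/path/gatk_bundle_2002/scripts/hello_world_0.wdl", spec))
# /gatk/my_data/scripts/hello_world_0.wdl
```

Because this is a bind mount rather than a copy, files you edit in the bundle folder on your computer are the same files you see under /gatk/my_data inside the container.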
2. Run a simple WDL script using Cromwell
Open scripts/hello_world_0.wdl in a text editor. Any editor will do; Sublime Text is one option.
This simple WDL script prints the string Hello World to standard output. Let’s take a look at the
structured elements of the script.
● The workflow name is HelloWorld. The contents within the workflow braces { } define the
steps of the workflow. As this is a very simple workflow, it only calls one task, WriteGreeting.
● Tasks are defined separately from the workflow section. We have one task, WriteGreeting,
and the contents within the task braces { } detail what it does.
● The task definition has two sections: a command section and an output section. The
command section contains what is run on the command line. It is essentially the “do work”
part of the script, and is exactly what you might run in a regular bash terminal. The output
section defines the results of interest for that task.
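To make the split between the command and output sections concrete, here is a rough Python analogy, not how Cromwell is implemented: the command text is handed to a shell, and its standard output is saved to a file named stdout in an execution folder, consistent with the stdout file Cromwell reports as this task’s output. The command echo 'Hello World' is our assumption about what hello_world_0.wdl runs, and the sketch uses sh rather than bash for portability.

```python
import subprocess
import tempfile
from pathlib import Path


def run_task(command: str, execution_dir: Path) -> Path:
    """Run a task's command section in a shell and capture stdout to a file.

    Only an analogy for a task whose output is its standard output.
    """
    execution_dir.mkdir(parents=True, exist_ok=True)
    stdout_file = execution_dir / "stdout"
    with stdout_file.open("w") as out:
        # The command is plain shell: exactly what you could type in a terminal.
        subprocess.run(["sh", "-c", command], stdout=out, check=True)
    return stdout_file


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical command standing in for the WriteGreeting task.
    result = run_task("echo 'Hello World'", Path(tmp) / "execution")
    print(result.read_text(), end="")
```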
The bundle we mounted earlier contains the jar files required to run Cromwell. Let’s run our first
script using the command below:
java -jar jars/cromwell-48.jar run scripts/hello_world_0.wdl
Notice that Cromwell tells you what is going on during the run. Many of these logs are mainly
relevant to developers; what we are interested in is the output. Find the section near the end of
the log that gives the workflow ID and the location of the output.
● Every time you run a WDL script, Cromwell stores the run in the cromwell-executions
folder. This keeps all your runs separate, so you don’t accidentally overwrite old results
with new ones. Paths follow the pattern
<workflow_name>/<workflow_ID>/call-<task_name>/execution/<output_file>.
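The naming pattern can be written as a tiny Python helper (execution_output_path is a hypothetical name, not a Cromwell API). Plugging in the workflow ID from the example run reproduces the output path used in the more command in this section; on your machine the ID will be different, because each run gets a new one.

```python
from pathlib import PurePosixPath


def execution_output_path(root: str, workflow: str, workflow_id: str,
                          task: str, output_file: str) -> str:
    """Join the pieces of Cromwell's layout:
    <root>/<workflow_name>/<workflow_ID>/call-<task_name>/execution/<output_file>
    """
    return str(PurePosixPath(root, workflow, workflow_id,
                             f"call-{task}", "execution", output_file))


print(execution_output_path(
    "/gatk/my_data/cromwell-executions",
    "HelloWorld",
    "69c617ee-87d7-4e8e-abf9-9ab0daf8bbec",  # example ID; yours will differ
    "WriteGreeting",
    "stdout",
))
```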
Copy the path of the output result and use more to confirm that it contains Hello World. You can
also open the file in your computer’s file browser if you prefer. Your workflow ID will differ from
the one shown here.
more \
/gatk/my_data/cromwell-executions/HelloWorld/69c617ee-87d7-4e8e-abf9-9ab0daf8bbec/call-WriteGreeting/execution/stdout
3. Add a configurable variable and define it in an inputs JSON file
Open our next script, hello_world_1.wdl, which swaps out the literal string in the task command
for a variable named greeting. We call such variables parameters or keys.
● Above the command section, the task declares the variable and its type, here String greeting.
● The notation ${ } surrounds the variable in the task command section.
We define the variable in a separate inputs file, hello_world.inputs.json. The variable definition
is also called the value, here the string Hello World. All variables in our JSON input files are
structured as key:value pairs.
● The key is structured as "<workflow>.<task>.<variable>".
● The value is on the right side, surrounded by quotation marks.
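In Python terms, the inputs file is just a dictionary serialized as JSON. The sketch below builds it with the "<workflow>.<task>.<variable>" key; the names HelloWorld and WriteGreeting are assumptions carried over from the first script, so check them against hello_world_1.wdl. It then uses Python’s string.Template, which happens to share the ${ } placeholder syntax, as an analogy for how the value lands in the command. This is not how Cromwell performs the substitution, and the echo command is hypothetical.

```python
import json
from string import Template

# Assumed names; check them against hello_world_1.wdl.
workflow, task, variable = "HelloWorld", "WriteGreeting", "greeting"
key = f"{workflow}.{task}.{variable}"

# The inputs file is a JSON object of "<workflow>.<task>.<variable>": value pairs.
inputs_json = json.dumps({key: "Hello World"}, indent=2)
print(inputs_json)

# Analogy only: string.Template also uses ${name} placeholders.
command = Template("echo '${greeting}'")  # hypothetical command text
print(command.substitute(greeting=json.loads(inputs_json)[key]))
```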
Now run this WDL script and provide the new inputs file with -i.
java -jar jars/cromwell-48.jar run scripts/hello_world_1.wdl \
  -i scripts/hello_world.inputs.json
Confirm that the result contains Hello World by again copying the output path or opening the file
in your file browser. Try changing the greeting in the inputs JSON file and running the script again.
We use variables in our scripts so we can run them over and over with different input values,
without going back to edit the WDL script. This is a small example, but could you imagine typing
out all your file names for each task in the GATK Germline Best Practices pipeline?
4. Chain tasks together

Another important feature of WDL scripts is the ability to build a pipeline of different tasks. There
are many ways to chain tasks together into a pipeline, but here we will go over the simplest
example. Open the next script, hello_world_2.wdl, which chains two tasks linearly. The first task
creates a greeting, which we are familiar with by now, and the second task reads the first greeting
back with an amendment, "to you too".
● In the workflow, the second task takes in the result of the first task via the variable
WriteGreeting.out. This is how you chain tasks together: inside the workflow, you pass one
call’s output as another call’s input.
● When you have multiple tasks, the workflow output section specifies the results you want
Cromwell to report at the end. For scripts with many, many tasks, it helps to have Cromwell
print only the results we are interested in. All intermediate outputs are still created, but only
those defined in the workflow output section are printed to the terminal when you run the
script.
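The data flow can be sketched with plain Python functions. This is an analogy, not WDL: the function names, the file name greeting.txt and the output key final are hypothetical, and the exact wording of the second task’s output in hello_world_2.wdl may differ. What it shows is the wiring: the second step receives the first step’s output file, and the workflow reports only the value it names, while the intermediate file remains on disk.

```python
import tempfile
from pathlib import Path


def write_greeting(greeting: str, workdir: Path) -> Path:
    """First task: write the greeting to a file; the file is this task's output."""
    out = workdir / "greeting.txt"
    out.write_text(greeting + "\n")
    return out


def read_it_back(greeting_file: Path) -> str:
    """Second task: read the first task's output and add the amendment."""
    return greeting_file.read_text().strip() + " to you too"


def hello_workflow(greeting: str, workdir: Path) -> dict:
    out = write_greeting(greeting, workdir)  # like calling WriteGreeting
    reply = read_it_back(out)                # fed WriteGreeting.out
    # Only outputs listed here are reported; greeting.txt still exists on disk.
    return {"final": reply}


with tempfile.TemporaryDirectory() as tmp:
    print(hello_workflow("Hello World", Path(tmp)))
```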
Let’s run the WDL with the same inputs JSON file as before, then open the result using either
the more command or your file browser.
java -jar jars/cromwell-48.jar run scripts/hello_world_2.wdl \
  -i scripts/hello_world.inputs.json
5. Validate the WDL script

When you write a WDL script, it’s best practice to validate it before running it. Validation can
save you real headaches, such as struggling to run your script for hours only to find out you
missed a curly brace somewhere. Validation can’t catch every error, and we will look at one it
misses in a later section. First, let’s look at some syntax-based errors.
Add an ‘s’ to String greeting, so that the declaration reads String greetings. Save the script, then
run the Womtool validate command.
java -jar jars/womtool-48.jar validate scripts/hello_world_2.wdl
Womtool should complain that the variable does not exist, which makes sense: we declared
String greetings but then used the variable greeting in our command.
Let’s look at another error. Go back and delete the ‘s’ we added. Introduce a new type of error
by replacing String with Int. Save and run the validate command. Notice that this time you need
to include the inputs JSON!
java -jar jars/womtool-48.jar validate scripts/hello_world_2.wdl -i

With this error, Womtool tells us it couldn’t evaluate our greeting input. That makes sense, since
we told it we wanted a number (Int), but our inputs JSON gave it a text value.
Change the variable back to String, and when you run Womtool with a clean script, you’ll get a
Success! message.
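The two errors above, a reference to an undeclared variable and an input value of the wrong type, can be illustrated with a deliberately tiny Python checker. It is a toy that only understands ${name} references and the String and Int types, ignores WDL’s real coercion rules, and is in no way how Womtool works internally. The prefix HelloWorld.WriteGreeting is a hypothetical example.

```python
import re

# Type checks for the two WDL types used in this exercise (toy subset).
TYPE_CHECKS = {
    "String": lambda v: isinstance(v, str),
    "Int": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def undeclared_references(declarations: dict, command: str) -> set:
    """Names used as ${name} in the command but never declared in the task."""
    used = set(re.findall(r"\$\{\s*(\w+)\s*\}", command))
    return used - set(declarations)


def type_errors(declarations: dict, inputs: dict, prefix: str) -> list:
    """Compare each supplied input value with its declared type."""
    errors = []
    for name, wdl_type in declarations.items():
        key = f"{prefix}.{name}"
        if key in inputs and not TYPE_CHECKS[wdl_type](inputs[key]):
            errors.append(f"{key}: expected {wdl_type}, got {inputs[key]!r}")
    return errors


# Error 1: the declaration says greetings, the command uses greeting.
print(undeclared_references({"greetings": "String"}, "echo '${greeting}'"))

# Error 2: the declaration says Int, the inputs JSON supplies text.
inputs = {"HelloWorld.WriteGreeting.greeting": "Hello World"}
print(type_errors({"greeting": "Int"}, inputs, "HelloWorld.WriteGreeting"))
```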
6. Create an inputs JSON template with Womtool

Generate a blank inputs JSON template with the Womtool inputs function.
java -jar jars/womtool-48.jar inputs scripts/hello_world_2.wdl \
  > scripts/hello_world_2.inputs.json
Open the newly created inputs file and fill in the variable with whatever greeting you like! I’ve
chosen “Hello Workshop”. Then run the hello_world_2.wdl script with your new inputs file.
java -jar jars/cromwell-48.jar run scripts/hello_world_2.wdl \
  -i scripts/hello_world_2.inputs.json
View the output using the more command or your file browser.
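As far as we have seen, the template Womtool writes maps each input key to a placeholder naming its type (for example "String"), which you then replace with a real value. The Python sketch below imitates that shape and the fill-in step; blank_inputs_template and fill_template are hypothetical helpers, not Womtool code, and the workflow and task names are assumptions.

```python
import json


def blank_inputs_template(workflow: str, task_inputs: dict) -> str:
    """Mimic the shape of an inputs template: one "<workflow>.<task>.<variable>"
    key per input, with the declared type as a placeholder value."""
    template = {
        f"{workflow}.{task}.{name}": wdl_type
        for task, declarations in task_inputs.items()
        for name, wdl_type in declarations.items()
    }
    return json.dumps(template, indent=2)


def fill_template(template_json: str, values: dict) -> str:
    """Replace placeholders with real values, rejecting keys the template lacks."""
    template = json.loads(template_json)
    unknown = sorted(set(values) - set(template))
    if unknown:
        raise ValueError(f"keys not in template: {unknown}")
    template.update(values)
    return json.dumps(template, indent=2)


template = blank_inputs_template("HelloWorld", {"WriteGreeting": {"greeting": "String"}})
print(template)
print(fill_template(template, {"HelloWorld.WriteGreeting.greeting": "Hello Workshop"}))
```

Rejecting unknown keys is a design choice of this sketch: it turns a misspelled key into an immediate error instead of a silently ignored value.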
7. Run a GATK analysis and locate an error message
The hello_gatk.wdl script runs HaplotypeCaller in GVCF mode. It is a single-step workflow that
takes in a BAM file and produces a GVCF of variant calls. For more information on what
HaplotypeCaller does, see the HaplotypeCaller documentation on the GATK website.
With this script, we see many more input declarations before the command section.
● java_opt is a variable that sets the maximum amount of memory the tool is allowed to use.
● The next block of inputs, refFasta through inputBamIndex, contains input files that our tool,
HaplotypeCaller, needs to run.
● The last input, gvcf_name, uses a WDL function called basename(). This function takes in a
file, keeps only the file name, and strips off the ending given in quotes. Here, we strip the
“.bam” ending from our file name, then append a new file type ending: “.g.vcf”.
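WDL’s basename() behaves much like the following Python emulation (wdl_basename is our hypothetical stand-in, not a library function): it keeps only the final path component and removes the given suffix when the name ends with it. The BAM path below is a made-up example; the real one comes from the inputs file.

```python
from pathlib import PurePosixPath


def wdl_basename(path: str, suffix: str = "") -> str:
    """Keep the file name; drop `suffix` only if the name ends with it."""
    name = PurePosixPath(path).name
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


# Hypothetical input path; strip ".bam", then append the GVCF extension.
gvcf_name = wdl_basename("bams/mother.bam", ".bam") + ".g.vcf"
print(gvcf_name)  # mother.g.vcf
```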
➤ Do you notice anything odd when you compare the variable declarations and the variables
used in the commands?
None of the supporting files (indexes and dictionaries) are used in the command, even though we
declare them as variables in our task. This is because our tool, GATK, knows to look for
supporting files with matching names in the same directory as the main file. For example, GATK
knows to look for a sample.bai file if it is handed a sample.bam file. Cromwell, however, needs
to be told that these supporting files exist so that it can bring them into the task’s working
directory (under cromwell-executions), where GATK will find them when it goes looking.
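The lookup convention can be sketched in Python. This models the naming rules as we understand them (a BAM index named sample.bai or sample.bam.bai; a FASTA index ref.fasta.fai plus a sequence dictionary ref.dict) and is not GATK’s actual lookup code; the helper names are hypothetical. Running such a check on your inputs before launching is one way to catch a missing index early.

```python
import tempfile
from pathlib import Path


def companion_requirements(path: Path) -> list:
    """Companion files expected next to `path`.

    Each inner list is one requirement; any one of its names satisfies it.
    """
    if path.suffix == ".bam":
        return [[path.with_suffix(".bai"), path.with_name(path.name + ".bai")]]
    if path.suffix in (".fasta", ".fa"):
        return [[path.with_name(path.name + ".fai")], [path.with_suffix(".dict")]]
    return []


def missing_companions(path: Path) -> list:
    """Requirements for which no matching file exists in the same folder."""
    return [options for options in companion_requirements(path)
            if not any(p.exists() for p in options)]


with tempfile.TemporaryDirectory() as tmp:
    bam = Path(tmp) / "mother.bam"
    bam.touch()
    print(missing_companions(bam))  # no index yet: both index names are listed
    (Path(tmp) / "mother.bai").touch()
    print(missing_companions(bam))  # []
```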
When you open the inputs file, you’ll find it has already been filled out with relative paths to
files in the bundle we are using.
Run hello_gatk.wdl.
java -jar jars/cromwell-48.jar run scripts/hello_gatk.wdl \
  -i scripts/hello_gatk.inputs.json
Using the more command, view the mother.g.vcf result. Hold the [ENTER] key to scroll down
until you see the GVCF blocks and eventually the records themselves! The tool worked and
generated a proper GVCF file. (Tip: press Q to exit when you are done looking.)
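If scrolling with more gets tedious, a few lines of Python can summarize the file instead. This sketch assumes that the “GVCF blocks” are the ##GVCFBlock lines GATK writes into the GVCF header, and it simply counts header lines, those block definitions, and data records; summarize_vcf is a hypothetical helper. Point it at the mother.g.vcf path Cromwell reports for your run.

```python
from typing import Iterable


def summarize_vcf(lines: Iterable[str]) -> dict:
    """Count meta-information lines, ##GVCFBlock definitions and data records."""
    counts = {"meta": 0, "gvcf_blocks": 0, "records": 0}
    for line in lines:
        if line.startswith("##"):
            counts["meta"] += 1
            if line.startswith("##GVCFBlock"):
                counts["gvcf_blocks"] += 1
        elif line.startswith("#"):
            continue  # the single #CHROM column-header line
        elif line.strip():
            counts["records"] += 1
    return counts


# Usage: replace the path with the one Cromwell printed for your run.
# with open("path/to/mother.g.vcf") as vcf:
#     print(summarize_vcf(vcf))
```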
Now let’s look at another kind of error you can cause. Break the WDL by inserting gibberish into
the HaplotypeCaller command, as if a cat walked across your keyboard and you didn’t notice
what it changed. Try validating the WDL with Womtool.
java -jar jars/womtool-48.jar validate scripts/hello_gatk.wdl
You’ll notice that this prints a Success! message. This is clearly a type of error that Womtool
doesn’t catch: Womtool checks the WDL itself, not whether the command makes sense to the
tool. When you run the script, you’ll see that it fails. This is an error with the tool itself, so the tool
has to report back the message. Cromwell prints the start of the error message, but you’ll need
to view the task’s stderr file, in the task’s execution folder under cromwell-executions, to see all
of it.
The full error message indicates that “Haploapuhr adkaliefCaller” is not a tool, and GATK
helpfully lists the tools available to you. When you’re done, correct the error you inserted and
save.
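Finding that stderr file by hand means digging through workflow IDs. This Python sketch does it for you, assuming the cromwell-executions/<workflow_name>/<workflow_ID>/call-<task_name>/execution/stderr layout described in section 2 (scattered tasks add extra shard folders and are not handled). latest_stderr is a hypothetical helper, and the demo builds a throwaway folder tree rather than reading a real run.

```python
import tempfile
from pathlib import Path


def latest_stderr(executions_root: str, workflow: str) -> Path:
    """Most recently modified task stderr file for a workflow."""
    pattern = "*/call-*/execution/stderr"
    candidates = list(Path(executions_root, workflow).glob(pattern))
    if not candidates:
        raise FileNotFoundError(f"no stderr under {executions_root}/{workflow}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Demo on a throwaway tree shaped like cromwell-executions.
with tempfile.TemporaryDirectory() as root:
    execution = Path(root, "MyWorkflow", "demo-id", "call-MyTask", "execution")
    execution.mkdir(parents=True)
    (execution / "stderr").write_text("example error text\n")
    print(latest_stderr(root, "MyWorkflow").read_text(), end="")
```

On a real run, pass "cromwell-executions" as the root and the workflow name declared in hello_gatk.wdl.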
8. Run WDL On Terra

Now that we have gone through how to run WDLs locally, let’s talk about how to run them on
Terra. There are a few ways to put your WDL scripts on Terra, but today we will walk you
through the FireCloud Method Repository.
Navigate to our featured workspace, GATKTutorials-Pipelining, and clone it. You will find all the
instructions in the workspace dashboard.
That's a wrap on WDL and Cromwell basics. To continue learning, do the Puzzles worksheet.
