HelloWDL Tutorial

February 2020
This tutorial will teach you how to script a basic pipeline using the Workflow Description
Language (WDL) and how to run WDLs through the Cromwell execution engine.

The tutorial was last tested with Womtool v48, Cromwell v48 and GATK v4.1.4.1.
____________________________________________________________________________

1. Start a GATK docker container


If your system supports running Cromwell, you can simply navigate to the gatk_bundle_2002
folder and proceed to section 2. Otherwise, follow the directions below to start up and attach to a
docker container. You can run all tutorial commands except those in the last section from within
a GATK4 docker.

First we need to run our docker container with a mounted bundle. The command for that is
below, but we need to edit it slightly. Replace /path/ with wherever you placed the downloaded
data bundle for this workshop. The -v option mounts the bundle folder at /gatk/my_data inside the
container, and -it opens an interactive terminal. See the image further below for more details on
the command.

docker run -v /path/gatk_bundle_2002:/gatk/my_data -it broadinstitute/gatk:4.1.4.1

Once you have the docker open, navigate to the correct directory:

cd /gatk/my_data

2. Run a simple WDL script using Cromwell
Open scripts/hello_world_0.wdl in a text editor. Here, we’ve pictured SublimeText, but you are
welcome to use whichever text editor you prefer. This simple WDL script prints out the string
Hello World to the command line. Let’s take a look at the structured elements of the script.

● The workflow name is HelloWorld. The contents within the workflow brackets { } define the
steps of the workflow. As this is a very simple workflow, it only calls one task, WriteGreeting.

● Tasks are defined separately from the workflow section. We have one task, WriteGreeting, and
the contents within the task brackets { } detail what it does.

● The task definition has two sections: a command section and an output section. The
command section contains what is run on the command line. It is essentially the “do work”
part of the script, and is exactly what you might run in a regular bash terminal. The output
section defines the results of interest for that task.

The bundle we mounted earlier contains the jar files required to run Cromwell. Let’s run our first
script using the command below:

java -jar jars/cromwell-48.jar run scripts/hello_world_0.wdl

Notice that Cromwell tells you what is going on during the run. Many of the log messages are
mainly relevant to developers; we are interested in the output. Find the section that
gives you the location of the output and the workflow ID. It should look something like this:

● Every time you run a WDL script, Cromwell stores the run in the cromwell-executions
folder. This keeps all your runs separate so you don’t accidentally overwrite old runs
with new ones. Output paths follow the pattern
<workflow_name>/<workflow_ID>/call-<task_name>/execution/<output_file>.
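
The directory layout described above can be sketched in Python. This is only an illustration of the naming pattern, not part of the tutorial bundle: the helper name execution_path is hypothetical, and the workflow ID is the example one used in this tutorial (yours will differ on every run).

```python
from pathlib import PurePosixPath


def execution_path(root, workflow_name, workflow_id, task_name, output_file):
    """Build <root>/cromwell-executions/<workflow>/<id>/call-<task>/execution/<file>."""
    return (
        PurePosixPath(root)
        / "cromwell-executions"
        / workflow_name
        / workflow_id
        / f"call-{task_name}"
        / "execution"
        / output_file
    )


# Example values taken from this tutorial; the workflow ID changes on every run.
stdout_path = execution_path(
    "/gatk/my_data",
    "HelloWorld",
    "69c617ee-87d7-4e8e-abf9-9ab0daf8bbec",
    "WriteGreeting",
    "stdout",
)
print(stdout_path)
```

Because the workflow ID is part of the path, two runs of the same script never write to the same folder.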

Copy the path of the output result, and use more to confirm it contains Hello World. You can also
open up the file using your computer’s file browser if you prefer.

more \
/gatk/my_data/cromwell-executions/HelloWorld/69c617ee-87d7-4e8e-abf9-9ab0daf8bbec/call-WriteGreeting/execution/stdout

3. Add a configurable variable and define it in an inputs JSON file
Open up our next script, hello_world_1.wdl, which swaps out the literal string in the task command
with a variable named greeting. We call such variables parameters or keys.

● Above the command section, the task declares the variable along with its type, here
String greeting.

● The notation ${ } surrounds the variable in the task command section.

We define the variable in a separate inputs file, hello_world.inputs.json. The variable definition
is also called the value, here the string Hello World. All variables in our JSON input files are
structured as key:value pairs.

● The key is structured as "<workflow>.<task>.<variable>".

● The value is on the right side, surrounded in quotation marks.
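
As a rough illustration, the key:value structure can be reproduced with Python's json module. The key below assumes that hello_world_1.wdl keeps the workflow name HelloWorld and the task name WriteGreeting from the previous script, so check it against your copy of the bundle. The helper name make_inputs is hypothetical.

```python
import json


def make_inputs(workflow, task, variable, value):
    """Return a one-entry inputs dict keyed as "<workflow>.<task>.<variable>"."""
    return {f"{workflow}.{task}.{variable}": value}


# Assumed names: workflow HelloWorld, task WriteGreeting, variable greeting.
inputs = make_inputs("HelloWorld", "WriteGreeting", "greeting", "Hello World")
inputs_text = json.dumps(inputs, indent=2)
print(inputs_text)
```

The printed text has the same shape as an inputs JSON file: one quoted key on the left and one quoted string value on the right.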

Now run this WDL script and provide the new inputs file with -i.

java -jar jars/cromwell-48.jar run scripts/hello_world_1.wdl -i \
scripts/hello_world.inputs.json

Confirm the result contains Hello World by again copying the output path or opening the file in
your file browser. Try changing the greeting in the inputs JSON file and run again.

We use variables in our scripts so we can run them over and over again with a variety of different
input values, without needing to go back to the WDL script to edit. It’s a small sample case here,
but could you imagine typing out all your file names for each task in the GATK Germline Best
Practices pipeline?

4. Chain tasks together


Another important factor in WDL scripts is the ability to build a pipeline of different tasks.
There are many ways to chain tasks together into a pipeline, but here we will go over the simplest
example. Open up the next script, hello_world_2.wdl, which chains two tasks linearly. The first
task creates a greeting, which we are familiar with by now, and the second task reads the first
greeting back with an amendment, "to you too".

● In the workflow, the second task takes in the result of the first task via the variable
WriteGreeting.out. Tasks are chained together in the workflow section, not inside the task
definitions.

● When you have multiple tasks, the workflow output section highlights the results you want
Cromwell to show at the end. For scripts with many tasks, it is helpful to have Cromwell
print out only the results we are interested in. All intermediate outputs are still created,
but only the ones defined in the workflow output section are printed to the terminal when
you run the script.
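
For readers more at home in a general-purpose language, the chaining idea can be pictured as two Python functions, where the second receives the first's return value. This is only an analogy, not how Cromwell runs tasks. The function names and the exact wording of the second message are assumptions.

```python
def write_greeting(greeting):
    """Stand-in for the first task: its return value plays the role of WriteGreeting.out."""
    return greeting


def read_it_back(original_greeting):
    """Stand-in for the second task: repeats the greeting with the amendment."""
    return f"{original_greeting} to you too"


# The "workflow" is where the two steps are wired together.
first_out = write_greeting("Hello World")
final_out = read_it_back(first_out)
print(final_out)
```

Neither function knows about the other; only the lines at the bottom connect them, just as only the workflow section connects the two tasks.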

Let’s run the WDL with the same inputs JSON file as before, then open up the result using either
the more command or your file browser.

java -jar jars/cromwell-48.jar run scripts/hello_world_2.wdl -i \
scripts/hello_world.inputs.json

5. Validate the WDL script


When you write a WDL script, it is best practice to validate it before running it. Validation can
save you some real headaches down the line: imagine struggling to run your script for hours, only
to find out you missed a curly brace somewhere. Validation can’t catch all errors, but we will
get more into that case in a later section. First let’s take a look at some syntax-based errors.

Add an ‘s’ to String greeting so that the declaration reads String greetings. Save the script,
then run the Womtool validate command.

java -jar jars/womtool-48.jar validate scripts/hello_world_2.wdl

Womtool should complain about the variable not existing, which makes sense because we
declared String greetings and then tried to use the variable greeting in our command.

Let’s look at another error. Go back and delete the ‘s’ we previously added. Introduce a new type
of error by replacing String with Int. Save, and run the validate command. Notice that this
time you need to include the inputs JSON!

java -jar jars/womtool-48.jar validate scripts/hello_world_2.wdl -i \
scripts/hello_world.inputs.json

With this error, Womtool tells us it couldn’t evaluate our greeting input:

It makes sense, since we told it we wanted a number, but our inputs JSON gave it a text value.
Change the variable back to String, and when you run Womtool with a clean script, you’ll get a
Success! message.

6. Create an inputs JSON template with Womtool


Generate a blank inputs JSON template with the Womtool inputs command.

java -jar jars/womtool-48.jar inputs scripts/hello_world_2.wdl > \
scripts/hello_world_2.inputs.json

Open the newly-created inputs file and fill in the variable with whatever greeting you like! I’ve
chosen “Hello Workshop”. Then run the hello_world_2.wdl script with your new inputs file.

java -jar jars/cromwell-48.jar run scripts/hello_world_2.wdl -i \
scripts/hello_world_2.inputs.json

View the output using the more command or your file browser.

7. Run a GATK analysis and locate an error message

The hello_gatk.wdl script runs HaplotypeCaller in GVCF mode. It is a single-step workflow that
takes in a BAM file and produces a GVCF of variant calls. For more information on what
HaplotypeCaller does, read here.

With this script, we see a lot more input definitions before the command section.

● java_opt is a variable that sets the maximum amount of memory the tool is allowed to use.
● The next block of inputs, refFasta through inputBamIndex, contains the input files that
our tool, HaplotypeCaller, needs to run.
● The last input, gvcf_name, uses a WDL function called basename(). This function
takes in a file path, keeps only the file name, and strips off the ending given in quotes.
Here, we strip the “.bam” ending off our file name, then append a new file type ending:
“.g.vcf”
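
To make the basename() step concrete, here is a small Python sketch that mimics the behavior described above. It is an approximation written for illustration, not the WDL implementation. The function name wdl_style_basename and the input path are hypothetical; only the file name mother.bam is implied by the tutorial's mother.g.vcf result.

```python
import posixpath


def wdl_style_basename(path, suffix=""):
    """Drop the directory part, then strip `suffix` if the name ends with it."""
    name = posixpath.basename(path)
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


# Hypothetical input path; only the file name matters for the result.
input_bam = "bams/mother.bam"
gvcf_name = wdl_style_basename(input_bam, ".bam") + ".g.vcf"
print(gvcf_name)
```

Deriving the output name from the input name means the same script works for any sample without editing the WDL.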

➤ Do you notice anything odd when you compare the variable declarations and the variables
used in the commands?

None of the supporting files (indexes and dictionaries) are used in the command, even though we
declare them as variables in our task. This is because our tool, GATK, knows to look for
supporting files with similar names in the same directory as the base file. For example, GATK
would know to look for a sample.bai file if it was handed a sample.bam file. Cromwell, however,
needs to be told that these supporting files exist so that it can pull them into the working
directory (cromwell-executions), where GATK can then find them.

When you open up the inputs file, you’ll find it has already been filled out with relative paths
to files in the bundle we are using.

Run hello_gatk.wdl.

java -jar jars/cromwell-48.jar run scripts/hello_gatk.wdl -i \
scripts/hello_gatk.inputs.json

Using the more command, view the mother.g.vcf result. Hold the [ENTER] key to scroll down
until you see GVCF blocks and eventually the records themselves! This tool worked, and
generated a proper GVCF file. (Tip: Press Q to exit when you are done looking.)

Now let’s take a look at another kind of error you can cause. Break the WDL by inserting
gibberish into the HaplotypeCaller command, as if a cat had walked across your keyboard and
you didn’t notice what it changed. Try validating the WDL with Womtool.

java -jar jars/womtool-48.jar validate scripts/hello_gatk.wdl

You’ll notice that this prints out a Success! message. This is clearly an error type that
Womtool doesn’t catch, because it checks the WDL syntax, not the contents of the command. When
you run the script you’ll see that it fails. This is an error with the tool itself, so the tool
has to report back the message. Cromwell prints out the start of the error message, but you’ll
need to view the task's stderr file to see it all.
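
If you would rather not hunt for the file by hand, a short Python sketch like the one below lists candidate stderr files. It assumes you run it from the folder that contains cromwell-executions and that each task's execution folder holds a file named stderr, as described above. The helper name find_stderr_files is hypothetical.

```python
from pathlib import Path


def find_stderr_files(executions_root):
    """Return every file named 'stderr' below the given folder, newest first."""
    root = Path(executions_root)
    hits = [p for p in root.rglob("stderr") if p.is_file()]
    return sorted(hits, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # The first path printed belongs to the most recently modified stderr file.
    for path in find_stderr_files("cromwell-executions"):
        print(path)
```

Sorting by modification time puts the run you just launched at the top, which is usually the one you want to inspect.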

The full error message indicates that “Haploapuhr adkaliefCaller” is not a tool, and GATK
helpfully lists the tool options available to you. When you’re done, correct the error you inserted
and save.

8. Run WDL On Terra


Now that we have gone through how to run WDLs locally, let’s talk about how we would run
these WDLs on Terra. There are a few ways to put your WDL scripts on Terra, but today we will
be walking you through the FireCloud Method Repository.

Navigate to our featured workspace, GATKTutorials-Pipelining, and clone it. You will find all
instructions contained within the dashboard.

That's a wrap on WDL and Cromwell basics. To continue learning, do the Puzzles worksheet.
