
Azure Data Factory – Important Concepts
This document describes some of the important concepts involved in configuring ADF pipelines.

This document assumes that you have already set up the various linked services needed for your
pipeline as described in the Pre-requisites document.

Rev B – 11/13/2019
Rev A – 8/15/2019
Send feedback to MaestroTeam@microsoft.com

Contents
Overview
Connections
Datasets
Pipelines
Pipeline Parameters
Triggers
Dynamic Content
Code Views
Overview
When you are setting up Azure Data Factory pipelines, you need to be aware of some basic concepts.
These include:

•	Connections
•	Datasets
•	Pipelines and parameters
•	Triggers
•	Dynamic Content
•	Code views

Connections
These are what you use to connect to data sources and sinks. In most cases you will create Linked Services.
The Pre-requisites document describes how to set these up.
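
For reference, here is a rough sketch of what a Linked Service definition looks like in the code view. This example assumes an Azure Data Lake Storage Gen2 account; the name, URL, and credential placeholders are illustrative and are not taken from this document.

{
    "name": "AzureDataLakeStorageLS",
    "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "https://<storage-account>.dfs.core.windows.net",
            "accountKey": {
                "type": "SecureString",
                "value": "<account-key>"
            }
        }
    }
}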
Datasets
These are what you will be reading from and loading to. They require a Linked Service in order to access
the data. For ease of use we recommend storing the datasets for a pipeline in a folder with well-defined
names so it easy to keep track of them.

You need to create Datasets before you can create Pipelines that use them. Here is an example of the
steps to create an ADLS dataset.
Under Connection you will need to select one of the existing Linked Services. Make sure you test the
connection. The Browse option makes it relatively easy to find an input file, and it also verifies that your
Linked Service has access. In order to import the schema you will have to point to an actual file.
Afterwards you can dynamically generate file paths as described in the Pipeline Parameters section.
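
As a sketch of the result, a delimited-text dataset over ADLS Gen2 with a parameterized path might look like this in the code view. The dataset name, the fileDateTime parameter, and the container/folder layout are assumptions made for illustration.

{
    "name": "InputFileDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureDataLakeStorageLS",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "fileDateTime": { "type": "string" }
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "input",
                "folderPath": {
                    "value": "@formatDateTime(dataset().fileDateTime, 'yyyy/MM/dd')",
                    "type": "Expression"
                },
                "fileName": "data.csv"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}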
Pipelines
These are a series of stages for validating and processing data. The various stages will run in order as
they feed into each other. It is usually a good idea to have an initial Validation step that ensures data
exists before you actually process it.
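
As a rough sketch, a pipeline with a Validation activity feeding a Copy activity looks something like this in the code view. The pipeline, activity, and dataset names here are placeholders, and the Copy activity is trimmed to its essentials.

{
    "name": "ProcessDailyFilePipeline",
    "properties": {
        "activities": [
            {
                "name": "ValidateInputExists",
                "type": "Validation",
                "typeProperties": {
                    "dataset": { "referenceName": "SourceDataset", "type": "DatasetReference" },
                    "timeout": "0.01:00:00",
                    "sleep": 30
                }
            },
            {
                "name": "CopyInputToStaging",
                "type": "Copy",
                "dependsOn": [
                    { "activity": "ValidateInputExists", "dependencyConditions": [ "Succeeded" ] }
                ],
                "inputs": [ { "referenceName": "SourceDataset", "type": "DatasetReference" } ],
                "outputs": [ { "referenceName": "StagingDataset", "type": "DatasetReference" } ],
                "typeProperties": {
                    "source": { "type": "DelimitedTextSource" },
                    "sink": { "type": "DelimitedTextSink" }
                }
            }
        ]
    }
}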

Pipeline Parameters
You will often create parameters for your pipelines that can be used as dynamic content for generating
file names and dates for processing.

For example, you may want an inputDateTime parameter to use when generating a DateTime value.

You would define it in the “Parameters” section of your pipeline. [It will be passed in by the trigger
– see that section for the rest of the example.]
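
In the code view, that parameter definition is just a small block in the pipeline's properties. A minimal sketch (the default value shown is an assumption):

"parameters": {
    "inputDateTime": {
        "type": "string",
        "defaultValue": "2019-01-01T00:00:00Z"
    }
}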

You would then reference it as “Dynamic Content”. Here it is being used when accessing a Dataset.

Then the dataset uses that value to generate the year part of the path. [Note that dataset().fileDateTime
is the value of @pipeline().parameters.inputDateTime.]
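
Putting it together, and assuming the dataset has a fileDateTime parameter as in the earlier sketch, the activity's dataset reference passes the pipeline parameter down:

"parameters": {
    "fileDateTime": "@pipeline().parameters.inputDateTime"
}

and the year part of the dataset's folder path would be an expression such as:

@formatDateTime(dataset().fileDateTime, 'yyyy')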
Triggers
Once you have a pipeline ready to go you can schedule it with a trigger. You will usually want to use a
Tumbling Window trigger.

Your trigger can populate your pipeline parameters. Here is the inputDateTime parameter from above
being populated with the start time of the trigger.
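
For a Tumbling Window trigger, that mapping in the trigger's pipeline parameters would typically use the window start time:

"parameters": {
    "inputDateTime": "@trigger().outputs.windowStartTime"
}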

Here is an example of creating a Tumbling Window trigger from within a pipeline.
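
Here is a sketch of what the published trigger definition looks like in the code view, assuming a daily window and the pipeline and parameter names used above (the trigger name, start time, and concurrency values are illustrative):

{
    "name": "DailyTumblingWindowTrigger",
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 24,
            "startTime": "2019-11-01T00:00:00Z",
            "delay": "00:00:00",
            "maxConcurrency": 1
        },
        "pipeline": {
            "pipelineReference": {
                "referenceName": "ProcessDailyFilePipeline",
                "type": "PipelineReference"
            },
            "parameters": {
                "inputDateTime": "@trigger().outputs.windowStartTime"
            }
        }
    }
}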


After you have published the trigger you can activate it and monitor it.

Dynamic Content
It is possible to add “Dynamic Content” to certain parts of a pipeline in order to allow values to be
generated based on triggering values. Valid locations will look like this.
Once dynamic content has been entered, the field will look like this.

Here is what the dynamic content entry looks like. Note the leading @ sign. [Note that entering
what looks like dynamic content directly in a text box will NOT work.]
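
For example, a dynamic content expression that builds a date-based output path from the inputDateTime parameter might look like this (the folder and file names are made up for illustration). Note that the @ appears only once, at the start of the expression:

@concat('processed/', formatDateTime(pipeline().parameters.inputDateTime, 'yyyy/MM/dd'), '/output.csv')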
Code Views
Even though the Data Factory interface is UI-based, behind the scenes everything is stored as a
“code” bundle (JSON) that you can modify directly and save.
Note that there is a code button in the upper right-hand corner that shows the code for the entire
pipeline.
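
To give a sense of its shape, the code view of the pipeline sketched earlier would combine the parameters and activities blocks shown above into one JSON document, roughly as follows (activity bodies trimmed):

{
    "name": "ProcessDailyFilePipeline",
    "properties": {
        "parameters": {
            "inputDateTime": { "type": "string" }
        },
        "activities": [
            { "name": "ValidateInputExists", "type": "Validation" },
            { "name": "CopyInputToStaging", "type": "Copy" }
        ],
        "annotations": []
    }
}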
