About Airflow date macros, ds and execution_date

A very common pattern when developing ETL workflows in any technology is to parameterize tasks with the execution date, so that tasks can, for example, work on the right data partition. Apache Airflow lets you use Jinja templating when defining tasks, and it exposes several helpful variables and macros to aid in date manipulation.

A simple task that executes a run.sh bash script with the execution date as a parameter might look like the following:

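# Airflow 1.x import path; in Airflow 2.x it is airflow.operators.bash
from airflow.operators.bash_operator import BashOperator

# `dag` is assumed to be a DAG object defined earlier in the file
task = BashOperator(
    task_id='bash_script',
    bash_command='./run.sh {{ ds }}',
    dag=dag)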

The {{ }} delimiters tell Airflow that this is a Jinja template, and ds is a variable made available by Airflow that is replaced by the execution date in the format YYYY-MM-DD. Thus, in the DAG run stamped with 2018-06-04, this would render to:

./run.sh 2018-06-04

Another useful variable is ds_nodash, where './run.sh {{ ds_nodash }}' renders to:

./run.sh 20180604

Often, however, we might need to further manipulate dates before passing them to the underlying tasks. For this, the execution_date variable is useful, as it is a Python datetime object (and not a string like ds). Thus, we can create a date string in any format by using strftime:

'./run.sh {{ execution_date.strftime("%d-%m-%Y") }}'

which becomes:

./run.sh 04-06-2018
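To see what these template variables correspond to outside of Airflow, here is a minimal sketch in plain Python. It assumes the execution date is midnight on 2018-06-04; the names mirror Airflow's template variables, but here they are ordinary local variables, not objects provided by Airflow.

```python
from datetime import datetime

# Assumed execution date for the run used in the examples above.
execution_date = datetime(2018, 6, 4)

ds = execution_date.strftime("%Y-%m-%d")       # what {{ ds }} renders to
ds_nodash = execution_date.strftime("%Y%m%d")  # what {{ ds_nodash }} renders to
custom = execution_date.strftime("%d-%m-%Y")   # the custom format from the template above

print(ds, ds_nodash, custom)  # 2018-06-04 20180604 04-06-2018
```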

There is also a macros object, which exposes common Python functions and libraries like macros.datetime and macros.timedelta, as well as some Airflow-specific shorthand methods such as macros.ds_add and macros.ds_format. One way to, for example, subtract 5 days from the execution date would be:

'./run.sh {{ (execution_date - macros.timedelta(days=5)).strftime("%Y-%m-%d") }}'
./run.sh 2018-05-30

macros.ds_add is just a more concise way of accomplishing the same date arithmetic, as it takes the ds string and a number of days directly and returns the shifted date in the same YYYY-MM-DD format:

'./run.sh {{ macros.ds_add(ds, -5) }}'
./run.sh 2018-05-30
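For intuition, the two shorthand macros can be approximated in plain Python. The functions below are illustrative stand-ins that I wrote for this post, assuming ds_add shifts a YYYY-MM-DD string by a number of days and ds_format converts a date string from one format to another; they are not Airflow's own source code.

```python
from datetime import datetime, timedelta

def ds_add(ds: str, days: int) -> str:
    """Stand-in for macros.ds_add: shift a YYYY-MM-DD string by `days` days."""
    shifted = datetime.strptime(ds, "%Y-%m-%d") + timedelta(days=days)
    return shifted.strftime("%Y-%m-%d")

def ds_format(ds: str, input_format: str, output_format: str) -> str:
    """Stand-in for macros.ds_format: parse with one format, print with another."""
    return datetime.strptime(ds, input_format).strftime(output_format)

print(ds_add("2018-06-04", -5))                         # 2018-05-30
print(ds_format("2018-06-04", "%Y-%m-%d", "%d-%m-%Y"))  # 04-06-2018
```

Inside a template, the equivalent calls would be macros.ds_add(ds, -5) and macros.ds_format(ds, "%Y-%m-%d", "%d-%m-%Y").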

Finally, you may also come across the ts variable, which is the execution date in ISO 8601 format (other variables can be looked up in the Airflow API reference):

'./run.sh {{ ts }}'
./run.sh 2018-06-04T00:00:00+00:00
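The same string can be reproduced with the standard library, which helps when testing a script locally. This sketch assumes the execution date is a timezone-aware datetime in UTC and that ts is simply its ISO 8601 rendering.

```python
from datetime import datetime, timezone

# Assumed execution date: midnight UTC on 2018-06-04.
execution_date = datetime(2018, 6, 4, tzinfo=timezone.utc)

ts = execution_date.isoformat()
print(ts)  # 2018-06-04T00:00:00+00:00
```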

One last note: a common source of confusion in Airflow regarding dates is the fact that the run timestamped with a given date only starts when the period that it covers ends. Thus, be aware that if your DAG’s schedule_interval is set to daily, the run with id 2018-06-04 will only start after that day ends, that is, at the beginning of the 5th of June.
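The rule boils down to one addition: a run can start no earlier than its execution date plus one schedule interval. The helper below is a hypothetical illustration of that arithmetic, not an Airflow function, and it assumes a fixed-length interval expressed as a timedelta.

```python
from datetime import datetime, timedelta

def earliest_start(execution_date: datetime, schedule_interval: timedelta) -> datetime:
    """The run covers [execution_date, execution_date + interval) and starts once that period ends."""
    return execution_date + schedule_interval

# Daily schedule: the run stamped 2018-06-04 starts at the beginning of June 5th.
start = earliest_start(datetime(2018, 6, 4), timedelta(days=1))
print(start)  # 2018-06-05 00:00:00
```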

If you got this far, you might enjoy my Data Engineering Resources post, where I link to some helpful Airflow resources. Cheers!

Diogo Franco

I work in Big Data at Farfetch, and I love data, distributed systems, machine learning, code and science!