Apache Airflow 2.6.0: Features You Can't Miss
4 May, 2023
Apache Airflow is a popular open-source workflow management platform that has long been a powerful, flexible way to orchestrate complex workflows. The wait is over: Airflow 2.6.0 is here, packed with new features and improvements.
In this blog, I'll explain everything you need to know about Airflow 2.6.0, from the significant changes to minor tweaks. Let's dive in!
What's new in Apache Airflow 2.6.0?
Airflow's release notes detail everything that has been improved, introduced, or deprecated in Apache Airflow 2.6.0. Here is a brief summary of the new changes:
Major changes:
- Default permissions of file task handler log directories and files have been changed to "owner + group" writeable.
- SLA callbacks no longer add files to the DAG processor manager's queue.
- The cleanup() method in BaseTrigger is now defined as asynchronous (following the async/await pattern).
- The gauge scheduler.tasks.running no longer exists.
- Handling of tasks stuck in the queued state is consolidated under the new task_queued_timeout config option.
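To see what an asynchronous cleanup() means in practice, here is a minimal stdlib-only sketch. SketchTrigger and drive() are hypothetical stand-ins, not Airflow's BaseTrigger or triggerer; they only illustrate the pattern of a trigger whose run() yields events and whose cleanup() is a coroutine that the caller must await. If you maintain a custom trigger that overrides cleanup(), it presumably needs to become async def as well, but check the release notes before relying on that.

```python
import asyncio
from typing import AsyncIterator, List


class SketchTrigger:
    """Hypothetical stand-in for a trigger; this is not Airflow's BaseTrigger."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self.cleaned_up = False

    async def run(self) -> AsyncIterator[str]:
        # Wait without blocking the event loop, then emit a single event.
        await asyncio.sleep(self.delay)
        yield "condition met"

    async def cleanup(self) -> None:
        # Because cleanup() is a coroutine, it can await I/O such as closing a client.
        await asyncio.sleep(0)
        self.cleaned_up = True


async def drive(trigger: SketchTrigger) -> List[str]:
    """Hypothetical driver: collect the trigger's events, then always await cleanup()."""
    events: List[str] = []
    try:
        async for event in trigger.run():
            events.append(event)
    finally:
        await trigger.cleanup()
    return events


if __name__ == "__main__":
    trigger = SketchTrigger(delay=0.01)
    print(asyncio.run(drive(trigger)), trigger.cleaned_up)
```

The finally block is the design point: cleanup runs whether the trigger finishes normally or is cancelled, and because it is awaited it can release asynchronous resources properly.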
Improvements:
- Display only the running configuration in the configurations view.
- Explicit skipped states list for ExternalTaskSensor.
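ExternalTaskSensor waits on a task in another DAG. Going by the release notes, 2.6.0 lets you give it an explicit list of states that should mark the sensor as skipped (the parameter appears to be skipped_states, alongside allowed_states and failed_states; confirm in the API reference). The function below is a hypothetical, stdlib-only sketch of how such a decision could be made. The precedence order (failed, then skipped, then allowed) is an assumption, not Airflow's code.

```python
from typing import Collection


def evaluate_external_state(
    state: str,
    allowed_states: Collection[str] = ("success",),
    failed_states: Collection[str] = (),
    skipped_states: Collection[str] = (),
) -> str:
    """Hypothetical decision logic mapping an external task's state to a sensor outcome."""
    if state in failed_states:
        return "fail"  # the sensor task fails
    if state in skipped_states:
        return "skip"  # the sensor task is marked skipped instead of waiting or failing
    if state in allowed_states:
        return "success"  # the sensor's condition is met
    return "poke again"  # keep waiting and check later


if __name__ == "__main__":
    print(evaluate_external_state("skipped", skipped_states=["skipped"]))
```

Without an explicit skipped list, an upstream task that was skipped simply never reaches an allowed state, so the sensor would keep poking until it times out.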
Miscellaneous Changes:
- Handle OverflowError on exponential backoff in next_run_calculation.
- Move Hive macros to the provider.
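The OverflowError fix is easier to picture with a concrete computation. With exponential backoff, the retry delay roughly doubles on every attempt, and after enough retries the product no longer fits in a Python timedelta. The sketch below is simplified and hypothetical: Airflow's actual formula differs in details, and the one-day cap (MAX_DELAY) is an illustrative value, not Airflow's setting.

```python
from datetime import timedelta

# Hypothetical upper bound on the retry delay, chosen only for illustration.
MAX_DELAY = timedelta(days=1)


def next_retry_delay(base: timedelta, try_number: int, cap: timedelta = MAX_DELAY) -> timedelta:
    """Exponential backoff (base * 2 ** (try_number - 1)) clamped to cap, even on overflow."""
    try:
        delay = base * (2 ** (try_number - 1))
    except OverflowError:
        # timedelta cannot represent the product, so fall back to the cap.
        return cap
    return min(delay, cap)


if __name__ == "__main__":
    for attempt in (1, 3, 20, 200):
        print(attempt, next_retry_delay(timedelta(seconds=10), attempt))
```

Catching the exception and clamping keeps a task with a very high retry count schedulable instead of crashing the calculation.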
Trigger logs are visible in the webserver
Trigger logs provide information about the progress of your deferred tasks. A deferred task suspends itself and frees its worker slot while a trigger waits for a condition, then resumes later, conserving resources and money. Trigger logs are now included in the task logs and displayed alongside the rest of your task's logs, which makes it easier to troubleshoot and keep track of deferred tasks.
Supporting this feature required changes across the entire Airflow logging stack. As a result, if you use remote logging, you must upgrade your providers to take advantage of it.
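Conceptually, this feature means that log lines emitted by a trigger end up in the same log as the task that deferred to it. The sketch below illustrates that idea with Python's standard logging module only. It is not how Airflow's log handlers are implemented, and build_task_log and the logger names are hypothetical.

```python
import io
import logging
from typing import Tuple


def build_task_log(ti_id: str) -> Tuple[io.StringIO, logging.Logger, logging.Logger]:
    """Hypothetical: give a task and its trigger two loggers that share one log sink."""
    sink = io.StringIO()
    handler = logging.StreamHandler(sink)
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    loggers = []
    for source in ("task", "trigger"):
        logger = logging.getLogger(f"{ti_id}.{source}")
        logger.handlers.clear()  # avoid duplicate handlers if called twice
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep records out of the root logger
        loggers.append(logger)
    return sink, loggers[0], loggers[1]


if __name__ == "__main__":
    sink, task_log, trigger_log = build_task_log("wait_for_file")
    task_log.info("deferring to trigger")
    trigger_log.info("condition met, firing event")
    task_log.info("resumed after deferral")
    print(sink.getvalue(), end="")
```

Because both loggers write to the same sink, reading the task's log shows the deferral, the trigger's progress, and the resumption in order.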
Grid view improvements
With Apache Airflow 2.6.0, the grid view has received a number of minor enhancements. The graph tab is one of the most noticeable improvements.
The graph tab is accessible from the grid view. It provides a more integrated graph representation of the DAG: selecting a task in either the grid or the graph highlights the same task in both views.
Another improvement is the ability to filter upstream and downstream from a single task. For example, you can filter the view to show only the tasks downstream of a specific task such as 'describe_integrity'.
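Filtering downstream (or upstream) of a task is essentially graph reachability. The following stdlib sketch shows the idea with a breadth-first search over a hypothetical DAG; it is not the web UI's code, and every task name except describe_integrity is made up for illustration.

```python
from collections import deque
from typing import Dict, List, Set


def reachable(edges: Dict[str, List[str]], start: str) -> Set[str]:
    """Return start plus every task reachable from it by following edges (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(edges: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Flip downstream edges into upstream edges."""
    flipped: Dict[str, List[str]] = {}
    for parent, children in edges.items():
        for child in children:
            flipped.setdefault(child, []).append(parent)
    return flipped


# Hypothetical DAG: each task maps to its direct downstream tasks.
DOWNSTREAM = {
    "extract": ["describe_integrity", "load_raw"],
    "describe_integrity": ["validate", "report"],
    "validate": ["publish"],
    "load_raw": ["publish"],
}

if __name__ == "__main__":
    print(sorted(reachable(DOWNSTREAM, "describe_integrity")))
    print(sorted(reachable(invert(DOWNSTREAM), "describe_integrity")))
```

An upstream filter is the same search run on the inverted edges, which is why one traversal function covers both directions.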
Support for notifications
The notifications framework allows you to send messages to external systems when a task instance/DAG run changes state. For example, you can easily post a message to Slack.
Here's a snippet based on Airflow's official release blog that demonstrates this:
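from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.providers.slack.notifications.slack_notifier import send_slack_notification

# Requires the Slack provider package and a configured Slack connection.
with DAG(
    "slack_notifier_example",
    start_date=datetime(2023, 1, 1),
    # Post a Slack message when a run of this DAG succeeds
    on_success_callback=[
        send_slack_notification(
            text="The DAG {{ dag.dag_id }} succeeded",
            channel="#general",
            username="Airflow",
        )
    ],
):
    EmptyOperator(task_id="placeholder")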
At the time of the 2.6.0 release, Slack was the only supported notifier. The release blog promises more integrations in upcoming releases.
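Under the hood, a notifier passed to on_success_callback behaves like any other Airflow callback: a callable that receives the run's context. The sketch below shows that shape using only the standard library. SketchNotifier and its send parameter are hypothetical; a real notifier would deliver the message to Slack or another external system rather than to a Python function.

```python
from string import Template
from typing import Any, Callable, Dict


class SketchNotifier:
    """Hypothetical stand-in for a notifier; it does not use Airflow's classes."""

    def __init__(self, text: str, send: Callable[[str], None]) -> None:
        self.template = Template(text)
        self.send = send  # e.g. a function that posts to a chat webhook

    def __call__(self, context: Dict[str, Any]) -> None:
        # Airflow renders Jinja templates; string.Template keeps this sketch stdlib-only.
        self.send(self.template.safe_substitute(context))


if __name__ == "__main__":
    notifier = SketchNotifier("The DAG $dag_id succeeded", send=print)
    # A callback list would hold objects like this and call each one with the run context.
    notifier({"dag_id": "slack_notifier_example"})
```

Keeping the message template separate from the delivery function is what makes it cheap to add new integrations: only the send step changes.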
Rapid Questions Round
What is Apache Airflow?
Apache Airflow, often simply called "Airflow", is a platform for programmatically authoring, scheduling, and monitoring workflows. When workflows are expressed as code, they become more manageable, versionable, testable, and collaborative.
What is the use case of Airflow?
Airflow is used to create workflows as directed acyclic graphs (DAGs) of tasks. Rich command-line utilities make sophisticated DAG surgery a breeze. The Airflow scheduler executes your tasks across several workers while respecting the dependencies you specify. The intuitive user interface makes it simple to visualize pipelines in production, monitor progress, and fix issues as they arise.
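To make the DAG idea concrete, here is a small stdlib sketch (Python 3.9+, using graphlib) of how dependencies determine which tasks may run together. Each "wave" is a group of tasks whose upstream tasks have all finished, so a scheduler could hand them to different workers in parallel. This illustrates the concept only; it is not how the Airflow scheduler is implemented, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves: every task in a wave has all of its upstream tasks done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are satisfied
        waves.append(ready)
        sorter.done(*ready)  # mark them finished so their downstream tasks unlock
    return waves


# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "transform": {"extract"},
    "quality_check": {"extract"},
    "load": {"transform", "quality_check"},
}

if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(PIPELINE), start=1):
        print(number, wave)
```

The "acyclic" in DAG matters here: with a cycle, no task in the loop could ever become ready, which is why the sorter rejects such graphs up front.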
How to get started with Apache Airflow 2.6.0?
Here's how you can start with Apache Airflow 2.6.0:
-> PyPI
-> Docs
-> Release Notes
-> Sources
Closing notes and references
Apache Airflow is a community-created platform to programmatically author, schedule, and monitor workflows. The Airflow community released Airflow 2.6.0 on April 30, 2023. It is a substantial update with over 500 commits, including 42 new features, 58 improvements, 38 bug fixes, and 17 documentation changes.