Migrating Slack Airflow to Python 3 Without Disruption

Last year, we migrated Airflow at Slack from 1.8 to 1.10 (see here) and did a “big bang” upgrade because of the constraints we had. This year, with Python 2 reaching end of life, we faced another major Airflow migration, from Python 2 to 3, and we wanted to put our…

Ashwin Shankar
10 min read · beginner

Overview

This article details how Slack migrated its Apache Airflow deployment from Python 2 to Python 3 without disrupting users. It outlines the steps taken, the challenges faced, and the solutions implemented, with an emphasis on a 'red-black' deployment strategy.

What You'll Learn

1. How to set up a Python 3 virtual environment for Apache Airflow
2. How to migrate Apache Airflow DAGs from Python 2 to Python 3 seamlessly
3. Why using a 'Red-black' deployment strategy minimizes user disruption during migrations

Prerequisites & Requirements

  • Understanding of Apache Airflow and workflow management
  • Familiarity with Python and virtual environment management tools like pyenv and Poetry (optional)

Key Questions Answered

What steps are involved in migrating Apache Airflow to Python 3?
The migration involves setting up a Python 3 virtual environment, launching Python 3 workers, cleaning up unused DAGs, fixing incompatibilities in DAGs, moving DAGs to Python 3 workers, migrating Airflow services, and cleaning up Python 2 references. Each step is crucial for ensuring a smooth transition without user disruption.
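The step of moving DAGs to Python 3 workers can be sketched with Celery queues. This is a minimal sketch, not Slack's code: it assumes the CeleryExecutor, Python 3 workers that consume only a queue hypothetically named `python3`, and a hypothetical `MIGRATED_TEAMS` set that grows one team at a time. Airflow operators accept a `queue` argument, so passing it through a DAG's `default_args` sends that DAG's tasks to whichever workers listen on that queue.

```python
# Hypothetical names: the queue names and team list are illustrative, not Slack's.
PYTHON2_QUEUE = "default"  # Airflow's out-of-the-box Celery queue, served by Python 2 workers
PYTHON3_QUEUE = "python3"  # assumed queue consumed only by the new Python 3 workers

# Teams whose DAGs have been fixed and moved; grows one team at a time.
MIGRATED_TEAMS = {"data-eng", "search"}


def queue_for(team, migrated_teams=MIGRATED_TEAMS):
    """Return the Celery queue a team's tasks should run on."""
    return PYTHON3_QUEUE if team in migrated_teams else PYTHON2_QUEUE


def default_args_for(team):
    """Build default_args for a team's DAGs.

    In a DAG file this dict would be passed as DAG(..., default_args=...);
    every operator in the DAG then inherits `queue`, which selects the worker pool.
    """
    return {"owner": team, "queue": queue_for(team)}


print(default_args_for("search"))  # {'owner': 'search', 'queue': 'python3'}
print(default_args_for("growth"))  # {'owner': 'growth', 'queue': 'default'}
```

Moving a team forward, or rolling it back if its DAGs misbehave, is then a one-line change to the set, and the Python 2 workers stay available until every team has moved.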
What common issues arise during the migration of Airflow to Python 3?
Common issues include TypeErrors due to string and bytes mismatches, Snowflake connector issues, and problems with dictionary ordering. Solutions involve explicit type conversions, upgrading libraries, and using OrderedDict to maintain key order.
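A minimal sketch of two of these fixes, using only the standard library; `to_text` and `build_row` are hypothetical helpers, not Slack's code. Python 3 refuses to concatenate `str` and `bytes`, so bytes have to be decoded explicitly. `OrderedDict` states a key-order requirement outright instead of relying on how a particular interpreter orders a plain dict (plain dicts are guaranteed to preserve insertion order only from Python 3.7).

```python
from collections import OrderedDict


def to_text(value, encoding="utf-8"):
    """Return a str, decoding bytes explicitly instead of relying on implicit coercion."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_row(pairs):
    """Build a row whose column order is the order of `pairs`, stated explicitly."""
    return OrderedDict(pairs)


# Bytes, e.g. read from a subprocess pipe or a socket.
payload = b"rows=42"
try:
    message = "result: " + payload  # fine in Python 2 (bytes is str), TypeError in Python 3
except TypeError:
    message = "result: " + to_text(payload)

row = build_row([("ds", "2020-01-01"), ("team", "data"), ("count", 42)])
print(message)           # result: rows=42
print(list(row.keys()))  # ['ds', 'team', 'count']
```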

Technologies & Tools


Workflow Management
Apache Airflow
Used for scheduling and monitoring workflows at Slack.
Task Queue
Celery
Distributes task execution across worker pools; separate Python 3 workers were added to it during the migration.
Version Management
pyenv
Used to manage Python installations and create virtual environments.
Dependency Management
Poetry
Helps manage dependencies for both Python 2 and 3.

Key Actionable Insights

1. Implement a phased migration strategy to minimize risks during major upgrades.
By migrating in phases, such as team by team, you can identify and address issues incrementally, reducing the impact on overall operations.
2. Use a dependency manager such as Poetry to streamline the migration.
Keeping dependencies in a single configuration file simplifies managing both Python versions and helps keep environments consistent.
3. Test DAGs thoroughly before the full migration to Python 3.
Testing DAGs in a controlled environment catches issues early, so the move to production goes more smoothly.
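One inexpensive pre-migration test is to compile every DAG file with the Python 3 interpreter, which catches Python 2-only syntax such as `print` statements before the Python 3 scheduler ever sees it. This is a minimal, standard-library-only sketch; `find_py3_syntax_errors` and `main` are hypothetical names. It checks syntax only. Runtime problems such as `str`/`bytes` mismatches still require importing the DAGs, for example by loading them into Airflow's `DagBag` and inspecting its `import_errors`.

```python
import pathlib


def find_py3_syntax_errors(dag_folder):
    """Compile every .py file under dag_folder with the running interpreter.

    Returns a dict mapping file path -> error text for files that do not
    compile (e.g. Python 2 `print "x"` or `except Exception, e:` syntax).
    """
    errors = {}
    for path in sorted(pathlib.Path(dag_folder).rglob("*.py")):
        # Reading bytes lets compile() honor any PEP 263 coding declaration.
        source = path.read_bytes()
        try:
            compile(source, str(path), "exec")
        except (SyntaxError, ValueError) as exc:  # ValueError: e.g. null bytes on older Pythons
            errors[str(path)] = str(exc)
    return errors


def main(dag_folder="dags"):
    """Print each failing file and return a CI-friendly exit status."""
    problems = find_py3_syntax_errors(dag_folder)
    for file_name, message in problems.items():
        print(f"{file_name}: {message}")
    return 1 if problems else 0
```

Calling `main("path/to/dags")` in CI and failing the build on a nonzero return value keeps new Python 2-only syntax out once a team has migrated.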

Common Pitfalls

1. Migrating the scheduler or web server to Python 3 before fixing DAG incompatibilities leads to errors.
The Python 3 scheduler cannot parse DAG files that are not yet Python 3 compatible, so those DAGs fail to load and their tasks do not run.
2. Overlooking SQL files during the migration leads to runtime errors.
Tools like futurize convert only Python files, so Python code embedded in SQL scripts is easily missed and must be converted by hand.
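Because futurize never sees them, SQL files need their own check. The sketch below assumes the embedded Python lives in Airflow's Jinja templating blocks (`{{ ... }}` and `{% ... %}`) and flags a few Python 2-only dict methods that no longer exist on Python 3 dicts, so they fail when the template is rendered. The pattern list and function names are hypothetical and deliberately incomplete: treat hits as a to-do list for manual conversion, not an exhaustive audit.

```python
import re
from pathlib import Path

# Illustrative, not exhaustive: Python 2-only dict methods that futurize would
# rewrite in .py files but never sees inside SQL templates.
PY2_IDIOMS = {
    r"\.iteritems\(": "use .items()",
    r"\.itervalues\(": "use .values()",
    r"\.iterkeys\(": "use .keys()",
    r"\.has_key\(": "use the `in` operator",
}

# Jinja expressions {{ ... }} and statements {% ... %}, where the Python-like code lives.
JINJA_BLOCK = re.compile(r"\{\{.*?\}\}|\{%.*?%\}", re.DOTALL)


def scan_sql_text(text):
    """Return (line_number, jinja_block, hint) for each Python 2 idiom found."""
    findings = []
    for block in JINJA_BLOCK.finditer(text):
        for pattern, hint in PY2_IDIOMS.items():
            if re.search(pattern, block.group(0)):
                line = text.count("\n", 0, block.start()) + 1
                findings.append((line, block.group(0), hint))
    return findings


def scan_sql_folder(folder):
    """Map each .sql file under folder to its findings, skipping clean files."""
    results = {}
    for path in sorted(Path(folder).rglob("*.sql")):
        hits = scan_sql_text(path.read_text(encoding="utf-8"))
        if hits:
            results[str(path)] = hits
    return results
```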