Course Outline

Introduction to Apache Airflow

  • Understanding workflow orchestration
  • Key features and advantages of Apache Airflow
  • Overview of improvements in Airflow 2.x and its ecosystem

Architecture and Core Concepts

  • Roles of the scheduler, web server, and worker processes
  • Core components: DAGs, tasks, and operators
  • Executor and backend options (Local, Celery, Kubernetes)

Installation and Setup

  • Installing Airflow in local and cloud environments
  • Configuring Airflow with different executors
  • Setting up the metadata database and connections

Navigating the Airflow UI and CLI

  • Exploring the Airflow web interface
  • Monitoring DAG runs, tasks, and logs
  • Using the Airflow CLI for administrative tasks

Authoring and Managing DAGs

  • Creating DAGs using the TaskFlow API
  • Applying operators, sensors, and hooks
  • Managing task dependencies and schedule intervals

Integrating Airflow with Data and Cloud Services

  • Connecting to databases, APIs, and message queues
  • Building and running ETL pipelines with Airflow
  • Integrating with cloud services using AWS, GCP, and Azure operators

Monitoring and Observability

  • Reviewing task logs and monitoring runs in real time
  • Tracking metrics with Prometheus and Grafana
  • Setting up alerting and notifications via email or Slack

Securing Apache Airflow

  • Implementing role-based access control (RBAC)
  • Configuring authentication with LDAP, OAuth, and SSO
  • Managing secrets using Vault and cloud secret stores

Scaling Apache Airflow

  • Managing parallelism, concurrency, and task queues
  • Using CeleryExecutor and KubernetesExecutor
  • Deploying Airflow on Kubernetes using Helm

Best Practices for Production

  • Applying version control and CI/CD to DAGs
  • Testing and debugging DAGs
  • Maintaining reliability and performance at scale

Troubleshooting and Optimization

  • Debugging failed DAGs and tasks
  • Optimizing DAG performance
  • Identifying and avoiding common pitfalls

Summary and Next Steps

Requirements

  • Prior experience with Python programming
  • Familiarity with concepts in data engineering or DevOps
  • Understanding of ETL processes or workflow orchestration

Target Audience

  • Data scientists
  • Data engineers
  • DevOps and infrastructure engineers
  • Software developers

Duration: 21 hours