Course Outline
Introduction to Apache Airflow
- Understanding workflow orchestration
- Key features and advantages of Apache Airflow
- Overview of improvements in Airflow 2.x and its ecosystem
Architecture and Core Concepts
- Roles of the scheduler, web server, and worker processes
- Components including DAGs, tasks, and operators
- Executor and backend options (Local, Celery, Kubernetes)
Installation and Setup
- Installing Airflow in local and cloud settings
- Configuring Airflow with various executors
- Establishing metadata databases and connections
Navigating the Airflow UI and CLI
- Exploring the Airflow web interface
- Monitoring DAG runs, tasks, and logs
- Utilizing the Airflow CLI for administrative tasks
Authoring and Managing DAGs
- Creating DAGs using the TaskFlow API
- Applying operators, sensors, and hooks
- Managing dependencies and scheduling intervals
Integrating Airflow with Data and Cloud Services
- Connecting to databases, APIs, and message queues
- Executing ETL pipelines with Airflow
- Cloud integrations involving AWS, GCP, and Azure operators
Monitoring and Observability
- Reviewing task logs and real-time monitoring
- Tracking metrics with Prometheus and Grafana
- Setting up alerting and notifications via email or Slack
Securing Apache Airflow
- Implementing role-based access control (RBAC)
- Configuring authentication with LDAP, OAuth, and SSO
- Managing secrets using Vault and cloud secret stores
Scaling Apache Airflow
- Managing parallelism, concurrency, and task queues
- Utilizing CeleryExecutor and KubernetesExecutor
- Deploying Airflow on Kubernetes using Helm
Best Practices for Production
- Applying version control and CI/CD for DAGs
- Testing and debugging DAGs
- Maintaining reliability and performance at scale
Troubleshooting and Optimization
- Debugging failed DAGs and tasks
- Optimizing DAG performance
- Identifying and avoiding common pitfalls
Summary and Next Steps
Requirements
- Prior experience with Python programming
- Familiarity with concepts in data engineering or DevOps
- Understanding of ETL processes or workflow orchestration
Target Audience
- Data scientists
- Data engineers
- DevOps and infrastructure engineers
- Software developers
Testimonials (7)
The instructor adapted the training to the participants’ level and responded to all questions. He was very communicative, and it was easy to interact with him. I really appreciated the format of the training, which included many practical exercises. Overall, it was a very engaging and well-organized session.
Jacek Chlopik - ZAKLAD UBEZPIECZEN SPOLECZNYCH
Course - Apache Airflow: Building and Managing Data Pipelines
The training was spot on. Very useful theory and exercises.
Vladimir - PUBLIC COURSE
Course - Apache Airflow
The training was spot on in all aspects. Useful theoretical aspects and exercises.
Vladimir - PUBLIC COURSE
Course - Apache Airflow