Course Outline

Introduction:

  • Apache Spark within the Hadoop Ecosystem
  • Brief introduction to Python and Scala

Fundamentals (Theory):

  • Architecture
  • RDD (Resilient Distributed Dataset)
  • Transformations and Actions
  • Stages, Tasks, and Dependencies

Hands-on Workshop: Mastering the Basics in the Databricks Environment:

  • RDD API exercises
  • Basic transformation and action functions
  • Pair RDDs (key-value RDDs)
  • Joins
  • Caching strategies
  • DataFrame API exercises
  • SparkSQL
  • DataFrame operations: select, filter, group, and sort
  • UDFs (user-defined functions)
  • Exploring the Dataset API
  • Streaming

Hands-on Workshop: Understanding Deployment in the AWS Environment:

  • AWS Glue fundamentals
  • Comparing AWS EMR and AWS Glue
  • Sample jobs on both platforms
  • Analysis of the pros and cons of each platform

Additional Topics:

  • Introduction to Apache Airflow orchestration

Requirements

  • Programming skills (preferably in Python or Scala)
  • Basic knowledge of SQL

Duration: 21 hours
