Course Outline

Introduction

In-Depth Review of Scala Programming

  • Syntax and structure
  • Flow control and functions

Spark Internals

  • Resilient Distributed Datasets (RDD)
  • From Spark script to graph to cluster

Overview of Spark Streaming

  • Streaming architecture
  • Intervals in streaming
  • Fault tolerance

Preparing the Development Environment

  • Installing and configuring Apache Spark
  • Installing and configuring the Scala IDE
  • Installing and configuring the JDK

Spark Streaming Beginner to Advanced

  • Working with key/value RDDs
  • Filtering RDDs
  • Improving Spark scripts with regular expressions
  • Sharing data on a cluster
  • Working with network data sets
  • Implementing breadth-first search (BFS) algorithms
  • Creating Spark driver scripts
  • Tracking in real time with scripts
  • Writing continuous applications
  • Streaming linear regression
  • Using the Spark Machine Learning Library (MLlib)

Spark and Clusters

  • Bundling dependencies and Spark scripts using the SBT tool
  • Using Amazon EMR to illustrate clusters
  • Optimizing by partitioning RDDs
  • Using Spark logs

Integration in Spark Streaming

  • Integrating Apache Kafka and working with Kafka topics
  • Integrating Apache Flume and working with pull-based/push-based Flume configurations
  • Writing a custom receiver class
  • Integrating Cassandra and exposing data as real-time services

In Production

  • Packaging an application and running it with spark-submit
  • Troubleshooting, tuning, and debugging Spark jobs and clusters

Summary and Conclusion

Requirements

  • Programming and scripting experience

Audience

  • Software Engineers

Duration

  • 21 Hours
