Big Data Hadoop Developer with Spark, Scala & Kafka

Languages
English + Hindi + Kannada + Telugu
Batch Size
30-50
Duration
40 hours
Investment
$$$$$
Request Training Proposal

Course Contents

Understanding Spark, Scala & Kafka, from learning to on-the-job perspectives.

  • This Big Data and Spark training starts by helping learners understand big data systems and how a divide-and-rule approach helps solve big data problems. It then moves on to how Spark processes data in memory, which can make it much faster than the older Hadoop MapReduce architecture.
  • The course also covers the basics of the Kafka framework, which is used as a streaming platform.
  • The course also shows how to optimize your programs by using the right file formats and compression.
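
As a rough illustration of the divide-and-rule idea mentioned above, the sketch below counts words by splitting the input into chunks, counting each chunk independently, and merging the partial results. It uses plain Scala collections only; it is not Hadoop or Spark code and is not taken from the course material. The names `WordCountSketch`, `countChunk`, `merge`, `wordCount`, and `chunkSize` are hypothetical and chosen for this example.

```scala
// Illustrative sketch only: plain Scala collections, no Hadoop or Spark dependency.
object WordCountSketch {
  // "Map" step: count the words in one chunk of lines, independently of other chunks.
  def countChunk(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  // "Reduce" step: merge two partial results by adding the counts per word.
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (word, n)) =>
      acc.updated(word, acc.getOrElse(word, 0) + n)
    }

  // Divide the input into chunks, count each chunk, then combine the partial counts.
  def wordCount(lines: Seq[String], chunkSize: Int): Map[String, Int] =
    lines
      .grouped(chunkSize)
      .map(countChunk)
      .foldLeft(Map.empty[String, Int])(merge)

  def main(args: Array[String]): Unit = {
    val lines = Seq("big data", "spark and kafka", "big spark")
    println(wordCount(lines, chunkSize = 2))
  }
}
```

In a real cluster the chunks would live on different machines and the per-chunk counts would run in parallel; here everything runs in one process, so the sketch shows only the shape of the computation.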

  1. The Motivation for Hadoop
    • Problems with traditional large-scale systems
    • Requirements for a new approach
  2. HDFS - Hadoop Distributed File System
    • Learn How HDFS Works
    • HDFS Design and Architecture
    • HDFS Concepts
    • Interacting with HDFS using the command line
    • Dataflow
    • Blocks
    • Replica
    • HDFS Commands
  3. Setting up a local environment for development
    • How to use the environment if it is already set up
    • Working with Oracle VM VirtualBox
    • Installation
    • Working with HDFS commands on Cloudera
  4. Understanding the MapReduce Framework
    • The MapReduce Flow
    • Running word count program (No coding required by learners)
    • Pig concepts
    • Data loading in Pig
    • Data extraction in Pig
    • Data transformation in Pig
    • Pig UDFs
    • Pig hands-on exercise
  5. Hive
  6. Hive Concepts
  7. What is Hive?
  8. Hive Query Language
  9. External tables & Internal/Managed tables
  10. Advanced features of Hive queries
    • Partitions
      • Partition External Tables
      • Dynamic Partition
    • Bucketing
    • Serialization / De-serialization
    • Reading / Writing different File formats
  11. Joins in Hive
    • Joins basics
    • How joins work internally
  12. Unions in Hive
  13. Hive UDF
  14. Hive Query optimizations
    • Optimizing joins
    • Shuffle Joins (Common Join)
    • Map Joins
    • Bucket Map Join
    • Sort Merge Bucket Map Join
    • Skew Join
  15. Best Practices in Hive
  16. Hive hands-on exercise
  17. File formats
    • Sequence files
    • Map files
    • RC files
    • ORC files & Parquet files
    • Avro files
  18. Sqoop
    • Introduction
    • Import Data
    • Export Data
    • Database connections
    • Sqoop hands-on exercise
  19. Scala Programming
    • What is Scala
    • Scala vs Java
    • Scala Installation
    • Setting up IntelliJ for Scala
    • Scala classes & Objects
      • Singleton objects
      • Abstraction
      • Inheritance
      • Traits
    • Functional programming with Scala
      • Higher-order functions
      • Anonymous functions
      • In-line functions
    • Scala collections
    • Scala Exception Handling
    • Pattern matching and case classes
    • Hands-on session
  20. Scala Build tool (SBT)
    • Installation & Usage
    • SBT vs Maven
    • Understanding the sbt build file
    • sbt commands
    • Building project and making jars using SBT
    • Making your own project and compiling it
  21. Spark
    • Spark Basics
    • What is Apache Spark
    • Architecture and Execution Model
    • Using the Spark Shell
    • RDDs (Resilient Distributed Datasets)
    • Differences between Spark 1.6 and Spark 2.0
  22. Spark vs MapReduce
    • When to use what
  23. Working with RDDs in Spark
    • A Closer Look at RDDs
    • Map, flatMap, reduce and much more
    • Transformations & Actions
    • Lazy Evaluation
    • Other RDD Operations
    • Persistence (Caching)
    • Hands-on session
  24. Working with Key/Value Pairs
    • Creating pair RDDs
    • Transformations on pair RDDs
      • Aggregations
      • Grouping data
      • Joins
      • Sorting data
  25. Writing and Deploying Spark Applications
    • Creating a project and packaging it as a JAR file
    • Building a Spark Application
    • Running a Spark Application
    • The Spark Application Web UI
    • Configuring Spark Properties
      • Memory optimizations
    • Sizing the cluster and resources for an application
    • Logging
  26. Spark SQL & DataFrames
    • Spark SQL context
    • Creating DataFrames
    • Why DataFrames
    • Understanding DataFrame internal optimizations and processing
    • Running SQL queries programmatically
    • Loading and saving data
      • Hive
      • Parquet
      • JSON
      • From RDDs
    • Spark SQL hands-on session
  27. How to set up a Google Cloud Platform environment
  28. Creating clusters and working with Spark applications
  29. Apache Kafka
    • Understanding Kafka
    • Understanding topics, brokers, partitions, etc.
    • Kafka Producer
    • Kafka Consumer
  30. Hands-on Project
    • Understanding the project
    • Discussion of the project solution approach
    • Solving participants' problems
  31. Demonstrating the differences between PySpark and Spark using PyCharm (IDE)

  • Learners are expected to have a good grasp of programming basics and will be trained on the different APIs that Spark offers, such as Spark SQL and Spark RDD.
  • Learners should have basic knowledge of Hadoop: the initial Hadoop basics will either be skipped or covered only briefly as a refresher.
  • Every session and topic is followed by practical exercises, so all participants are expected to get their hands dirty with coding.

Instructor Profile


18+ years of professional IT corporate training experience.

This course includes:

  • 100% Online Sessions
  • Instructor-led
  • Customizable Syllabus
  • Customizable Schedule
  • Certificate of Completion
  • Training Recordings
  • Training Resources
  • Learner Assessment