Master's Program – Big Data


The world is becoming increasingly digital, which means big data is here to stay. The importance of big data and data analytics will continue to grow in the coming years. A career in big data and analytics may be the kind of role you have been looking for to meet your career expectations.

Target Audience

Big Data career opportunities are on the rise. Big Data training is best suited for IT, data management, and analytics professionals looking to gain expertise in Big Data, including:

  • Software Developers and Architects
  • Analytics Professionals
  • Senior IT Professionals
  • Testing and Mainframe Professionals
  • Data Management Professionals
  • Business Intelligence Professionals
  • Project Managers
  • Aspiring Data Scientists
  • Graduates looking to build a career in Big Data Analytics

Course Objectives

The Big Data Master’s Program will help you master skills and tools such as Cassandra Architecture, Data Model Creation, Database Interfaces, Advanced Architecture, Spark, Scala, RDD, Spark SQL, Spark Streaming, Spark ML, GraphX, Replication, Sharding, Scalability, Hadoop clusters, Storm Architecture, Ingestion, ZooKeeper, and Kafka Architecture. These skills will help prepare you for the role of a Big Data Hadoop architect.



Course Curriculum

Section 1 - Storing huge amounts of Data - Hadoop

  • Introduction to Hadoop
  • MapReduce and its example applications
  • Hadoop architecture and HDFS
  • Setting up Hadoop
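
As a flavor of the MapReduce model introduced in this section, the classic word-count application can be sketched in plain Python. This is an illustrative sketch only: it runs on a single machine, does not use Hadoop, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical names chosen for this example rather than part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, mimicking the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is here to stay"]))
```

In a real Hadoop job, the map and reduce steps run in parallel across the cluster and the framework performs the grouping; the sketch only shows how the data flows through the three phases.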

Section 2 - Introduction to programming in Scala

  • Introduction to Scala, installing Scala
  • Variable declaration, iterations, collections
  • Map and reduce functions in Scala

Section 3 - Spark

  • Introduction to Spark
  • RDDs and how they differ from Hadoop MapReduce
  • Spark architecture, Spark drivers and workers, task scheduler, SparkContext
  • Transformations and Spark actions, introduction to Spark frameworks
  • Key-value transformations, key-value RDDs, examples of programming in Spark using Scala

Section 4 - Spark Frameworks - Spark Streaming

  • Introduction and Motivation
  • Discretized stream processing, window-based transformations
  • DStreams, DStream execution, programming examples
  • Structured Streaming and its windowed execution with examples

Section 5 - Spark Frameworks - Spark SQL

  • Architecture: interfaces to Spark SQL and interaction with Spark, data model
  • DataFrame operations, querying datasets, user-defined functions
  • Execution stages: Catalyst optimizer, logical plan, plan optimization and execution, code generation

Section 6 - Hive

  • Introduction, Installation, Data types, operations and functions
  • Usage and programming examples: select queries, partition-based queries, joins and aggregations
  • Usage and programming examples: insert, array operations, MapReduce operations

Section 7 - Apache Flume and Sqoop

  • Introduction and applications
  • Flume Architecture
  • Flume Dataflow
  • Installation and configuration with an example
  • Introduction to Sqoop
  • Installation and configuration of Sqoop

Section 8 - Resource Managers

  • Introduction and motivation, Mesos and its architecture
  • YARN and its architecture
  • YARN resource allocation and usage, YARN application API
  • Sample YARN application: distributed shell
  • MapReduce and YARN, introduction to REEF

Section 9 - Replicating State Machines


  • Scaling a service, failures
  • Lock service, leased locks
  • Consensus, CAP theorem
  • Paxos, Finite State Machine
  • Replicated state machines (RSM) and consensus, replicated log
  • Viewstamped Replication (VR)
  • Raft


Section 10 - State-of-the-Art Technologies in Big Data Storage

  • Amazon EC2
  • Microsoft Azure
  • NoSQL Databases


Section 11 - Apache Kafka and Cassandra


  • Messaging systems, introduction to Kafka and its benefits
  • Kafka cluster architecture, ZooKeeper
  • Kafka installation and example: creating a topic, subscribing to a topic, starting a producer and a consumer to exchange messages, modifying a topic, deleting a topic (may be covered in two lectures)
  • Integration with Spark
  • Introduction to Cassandra and its features
  • Data replication, Cassandra Query Language
  • Cassandra installation and introduction to API