Masters Program – Big Data


The world is becoming increasingly digital, and big data is here to stay. The importance of big data and data analytics will continue to grow in the coming years, so a career in this field may be the role you have been looking for to meet your career expectations.

Target Audience

Big Data career opportunities are on the rise. Big Data training is best suited for IT, data management, and analytics professionals looking to gain expertise in Big Data, including:

  • Software Developers and Architects
  • Analytics Professionals
  • Senior IT Professionals
  • Testing and Mainframe Professionals
  • Data Management Professionals
  • Business Intelligence Professionals
  • Project Managers
  • Aspiring Data Scientists
  • Graduates looking to build a career in Big Data Analytics

Course Objectives

The Big Data Master’s Program will help you master skills and tools such as Cassandra Architecture, Data Model Creation, Database Interfaces, Advanced Architecture, Spark, Scala, RDD, SparkSQL, Spark Streaming, Spark ML, GraphX, Replication, Sharding, Scalability, Hadoop clusters, Storm Architecture, Ingestion, ZooKeeper, and Kafka Architecture. These skills will help you prepare for the role of a Big Data Hadoop architect.

Course Curriculum


Section 1 - Storing huge amounts of Data - Hadoop

  • Introduction to Hadoop
  • MapReduce and example applications
  • Hadoop architecture and HDFS
  • Setting up Hadoop
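
As a rough illustration of the MapReduce model covered in this section, the sketch below counts words in plain Python. It does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only to mirror the map, shuffle, and reduce steps.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is here to stay"]))
```

In a real Hadoop job the map and reduce steps would run on different nodes and the framework would perform the shuffle; here everything runs in one process to keep the idea visible.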


Section 2 - Introduction to programming in Scala

  • Introduction to Scala, installing Scala
  • Variable declaration, iterations, collections
  • Map and reduce functions in Scala


Section 3 - Spark

  • Introduction to Spark
  • RDDs and how they differ from Hadoop MapReduce
  • Spark architecture: drivers and workers, task scheduler, SparkContext
  • Transformations and actions, introduction to Spark frameworks
  • Key-value RDDs and key-value transformations, examples of programming in Spark using Scala


Section 4 - Spark Frameworks - Spark Streaming

  • Introduction and Motivation
  • Discretized stream processing, window-based transformations
  • DStreams, DStream execution, programming examples
  • Structured Streaming and its windowed execution with examples
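
To give a feel for window-based transformations over a discretized stream, here is a minimal sketch in plain Python rather than the Spark Streaming API. The function windowed_counts and its parameters window_length and slide_interval are hypothetical names, both measured in number of micro-batches.

```python
from collections import Counter, deque


def windowed_counts(batches, window_length, slide_interval):
    """Yield item counts over the last `window_length` micro-batches,
    emitting one result every `slide_interval` batches."""
    window = deque(maxlen=window_length)  # oldest batch drops out automatically
    for index, batch in enumerate(batches, start=1):
        window.append(Counter(batch))
        if index % slide_interval == 0:
            total = Counter()
            for counts in window:
                total.update(counts)
            yield dict(total)


if __name__ == "__main__":
    stream = [["a", "b"], ["a"], ["c"], ["a", "c"]]
    for result in windowed_counts(stream, window_length=2, slide_interval=2):
        print(result)
```

Each list in stream stands for one micro-batch; a streaming engine would receive these batches continuously instead of from a fixed list.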


Section 5 - Spark Frameworks - Spark SQL

  • Architecture: interfaces to Spark SQL and interaction with Spark, data model
  • DataFrame operations, querying datasets, user-defined functions
  • Execution stages: Catalyst optimizer, logical plan, plan optimization and execution, code generation


Section 6 - Hive

  • Introduction, Installation, Data types, operations and functions
  • Usage and programming examples: SELECT queries, partition-based queries, joins, and aggregations
  • Usage and programming examples: INSERT, array operations, MapReduce operations


Section 7 - Apache Flume and Sqoop

  • Introduction and applications
  • Flume Architecture
  • Flume Dataflow
  • Installation and configuration with an example
  • Introduction to Sqoop
  • Installation and configuration of Sqoop


Section 8 - Resource Managers

  • Introduction and motivation, Mesos and its architecture
  • YARN and its architecture
  • YARN resource allocation and usage, YARN application API
  • Sample YARN application: distributed shell
  • MapReduce and YARN, introduction to REEF


Section 9 - Replicating State Machines

  • Scaling a service, failures
  • Lock service, leased locks
  • Consensus, CAP theorem
  • Paxos, Finite State Machine
  • RSM and consensus, replicated log
  • Viewstamped Replication (VR)
  • Raft


Section 10 - State-of-the-Art Technologies in Big Data Storage

  • Amazon EC2
  • Microsoft Azure
  • NoSQL Databases


Section 11 - Apache Kafka and Cassandra

  • Messaging System, Kafka introduction, its benefits
  • Kafka Cluster Architecture, ZooKeeper
  • Kafka installation and example: creating a topic, subscribing to a topic, starter/producer/consumer messages, modifying a topic, deleting a topic (may be covered in two lectures)
  • Integration with Spark
  • Introduction to Cassandra and its features
  • Data replication, Cassandra Query Language
  • Cassandra installation and introduction to API