Masters Program – Big Data


The world is becoming increasingly digital, which means big data is here to stay. The importance of big data and data analytics will continue to grow in the coming years. A career in big data and analytics might be just the role you have been looking for to meet your career expectations.

Target Audience

Big Data career opportunities are on the rise. Big Data training is best suited for IT, data management, and analytics professionals looking to gain expertise in Big Data, including:
Software Developers and Architects
Analytics Professionals
Senior IT Professionals
Testing and Mainframe Professionals
Data Management Professionals
Business Intelligence Professionals
Project Managers
Aspiring Data Scientists
Graduates looking to build a career in Big Data Analytics

Course Objectives

The Big Data Master’s Program will help you master skills and tools such as Cassandra Architecture, Data Model Creation, Database Interfaces, Advanced Architecture, Spark, Scala, RDD, Spark SQL, Spark Streaming, Spark ML, GraphX, Replication, Sharding, Scalability, Hadoop clusters, Storm Architecture, Ingestion, ZooKeeper and Kafka Architecture. These skills will help you prepare for the role of a Big Data Hadoop architect.

Course Curriculum

Section 1 - Storing huge amounts of Data - Hadoop
Introduction to Hadoop
MapReduce and its example applications
Hadoop architecture and HDFS
Setting up Hadoop
Section 2 - Introduction to programming in Scala
Introduction to Scala, installing Scala
Variable declaration, iterations, collections
Map and reduce functions in Scala
Section 3 - Spark
Introduction to Spark
RDDs and differences from Hadoop MapReduce
Spark architecture, Spark drivers and workers, task scheduler, SparkContext
Transformations and Spark actions, introduction to Spark frameworks
Key-value transformations, key-value RDDs
Examples of programming in Spark using Scala
Section 4 - Spark Frameworks - Spark Streaming
Introduction and motivation
Discretized stream processing, window-based transformations
DStreams, DStream execution, programming examples
Structured Streaming and its windowed execution with examples
Section 5 - Spark Frameworks - Spark SQL
Architecture - interfaces to Spark SQL and interaction with Spark, data model
DataFrame operations, querying datasets, user-defined functions
Execution stages - Catalyst, optimizer generators, logical plan, plan optimization and execution, code generation
Section 6 - Hive
Introduction, installation, data types, operations and functions
Usage and programming examples - select query, partition-based query, joins and aggregations
Usage and programming examples - insert, array operations, MapReduce operations
Section 7 - Apache Flume and Sqoop
Introduction and applications
Flume architecture
Flume dataflow
Installation and configuration with an example
Introduction to Sqoop
Installation and configuration of Sqoop
Section 8 - Resource Managers
Introduction and motivation, Mesos and its architecture
YARN and its architecture
YARN resource allocation and usage, YARN application API
Sample YARN application - distributed shell
MapReduce and YARN, introduction to REEF
Section 9 - Replicated State Machines
Scaling a service, failures
Lock service, leased locks
Consensus, CAP theorem
Paxos, finite state machine
RSM and consensus, replicated log
VR
Raft
Section 10 - State-of-the-Art Technologies in Big Data Storage
Amazon EC2
Microsoft Azure
NoSQL databases
Section 11 - Apache Kafka and Cassandra
Messaging systems, introduction to Kafka and its benefits
Kafka cluster architecture, ZooKeeper
Kafka installation and example - creating a topic, subscribing to a topic, starter/producer/consumer messages, modifying a topic, deleting a topic (could be covered in two lectures)
Integration with Spark
Introduction to Cassandra and its features
Data replication, Cassandra Query Language
Cassandra installation and introduction to the API