Big Data - Hadoop

What is Big Data?

Big data is a term for data sets that are too large or complex for traditional data-processing software to handle adequately, covering both the structured and unstructured data that inundates a business on a day-to-day basis. Data with many cases (rows) offers greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data sourcing. Big data was originally associated with three key concepts: volume, variety, and velocity; veracity is often added as a fourth.

  • Who can learn Big Data?

  • Any capable technologist can pick up Hadoop fairly easily, but there are certain prerequisites a candidate should fulfil. While there is no hard-and-fast rule, a candidate should at least know the basics of Java and Linux.

  • What do I Learn?

  • Data Structures, Python, Machine Learning using Python and Hadoop, supervised and unsupervised learning, Scikit-learn and Hadoop, Hadoop Cluster Architecture, Apache Spark, Scala


    Introduction to Hadoop

    What is Big Data
    Need and significance of innovative technologies
    What is Hadoop
    3 Vs (Characteristics)
    History of Hadoop and its Uses
    Different Components of Hadoop
    Various Hadoop Distributions

    HDFS ( Hadoop Distributed File System)

    Significance of HDFS in Hadoop
    HDFS Features and Hadoop Daemons
        a) NameNode
        b) DataNode
        c) JobTracker
        d) TaskTracker
        e) Secondary NameNode

    Data Storage in HDFS
        a) Blocks
        b) Heartbeats
        c) Data Replication
        d) HDFS Federation
        e) High Availability
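The storage topics above follow directly from HDFS's block model. As a rough illustration, the sketch below computes how many blocks a file occupies and how much raw disk its replicas consume, using the Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3; the function name and figures are illustrative, not part of any Hadoop API.

```python
import math

def hdfs_storage(file_size_mb, block_size_mb=128, replication=3):
    """Return (block_count, raw_storage_mb) for a file stored in HDFS.

    HDFS splits a file into fixed-size blocks (the last block may be
    smaller) and stores `replication` copies of each block on
    different DataNodes.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    raw = file_size_mb * replication
    return blocks, raw

# A 300 MB file: 3 blocks (128 + 128 + 44 MB), 900 MB of raw storage.
print(hdfs_storage(300))  # (3, 900)
```

Note that the last block only occupies as much disk as it needs (44 MB here), which is why HDFS favours a few large files over many small ones.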

    Accessing HDFS
        a) CLI (Command Line Interface): Unix and Hadoop Commands
        b) Java Based Approach
    Data Flow
        a) Anatomy of a File Read
        b) Anatomy of a File Write
    Hadoop Archives
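The "Anatomy of a File Read" topic boils down to two round trips: the client asks the NameNode for block locations, then streams each block from a DataNode holding a replica. The toy simulation below sketches that flow under simplified assumptions; all paths, block IDs, and node names are illustrative.

```python
# Toy simulation of an HDFS file read: the client asks the NameNode
# for the ordered list of blocks, then streams each block from one
# of the DataNodes holding a replica. All names are illustrative.

namenode = {  # filename -> ordered block IDs
    "/logs/app.log": ["blk_1", "blk_2"],
}
datanodes = {  # block ID -> {datanode: block bytes}
    "blk_1": {"dn1": b"hello ", "dn2": b"hello "},
    "blk_2": {"dn2": b"world", "dn3": b"world"},
}

def read_file(path):
    data = b""
    for block_id in namenode[path]:               # 1. metadata from NameNode
        replicas = datanodes[block_id]            # 2. replica locations
        _, content = sorted(replicas.items())[0]  # 3. pick a DataNode
        data += content                           # 4. stream block contents
    return data

print(read_file("/logs/app.log"))  # b'hello world'
```

The real client picks the nearest replica by network topology rather than the first one alphabetically, but the control flow is the same.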


    Introduction to MapReduce
    MapReduce Architecture
    MapReduce Programming Model
    MapReduce Algorithm and Phases
    Data Types
    Input Splits and Records
    Blocks Vs Splits
    Basic MapReduce Program
        a) Driver Code
        b) Mapper Code
        c) Reducer Code
        d) Combiner and Shuffle
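The driver/mapper/reducer structure above is easiest to see on the classic word-count job. The sketch below expresses each phase as plain Python to show the data movement; a real job would implement Hadoop's Mapper and Reducer classes in Java, with the framework performing the shuffle.

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a record."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle phase: group values by key, as the framework does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Reduce phase: sum the counts for one word."""
    return key, sum(values)

lines = ["big data big hadoop", "hadoop big"]
mapped = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 3, 'data': 1, 'hadoop': 2}
```

A combiner is just the reducer function run on each mapper's local output before the shuffle, which cuts the data transferred across the network.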
    Creating Input and Output formats in MapReduce Jobs
        a) File Input / Output Format
        b) Text Input / Output Format
        c) Sequence File Input / Output Format,etc.
    How to Debug MapReduce Jobs in Local and Pseudo cluster mode
    The MapReduce Web UI
    Data Localization in MapReduce
    Distributed Cache
    Joins and Compression
        a) Map-Side Joins
        b) Compression Mechanisms
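A map-side (replicated) join works by loading the smaller table into every mapper's memory (via the Distributed Cache covered above), so each large-table record can be joined without a shuffle. The sketch below shows that idea with illustrative data; table contents and names are assumptions, not Hadoop APIs.

```python
# Map-side (replicated) join: the small table lives in every mapper's
# memory, so the join happens entirely in the map phase.

small_table = {"IN": "India", "US": "United States"}  # cached lookup

def map_join(record):
    """Join one large-table record against the in-memory small table."""
    user, country_code = record
    return user, small_table.get(country_code, "unknown")

large_table = [("alice", "IN"), ("bob", "US"), ("carol", "XX")]
joined = [map_join(r) for r in large_table]
print(joined)
# [('alice', 'India'), ('bob', 'United States'), ('carol', 'unknown')]
```

This only pays off when one side of the join is small enough to fit in memory; otherwise a reduce-side join, which shuffles both inputs by the join key, is required.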


    Introduction to Apache Pig
    MapReduce Vs. Apache Pig
    SQL Vs. Apache Pig
    Different Data types in Apache Pig
    Modes of Execution in Apache Pig
        a) Local Mode
        b) MapReduce or Distributed Mode
    Execution Mechanism
        a) Grunt shell
        b) Script
        c) Embedded
    Data Processing Operators
        a) Loading and Storing Data
        b) Filtering Data
        c) Grouping and Joining Data
        d) Sorting Data
        e) Combining and Splitting Data
    How to write a simple Pig Script
    UDFs in Pig
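The data processing operators listed above form a dataflow: a typical Pig script is just LOAD, FILTER, GROUP, and FOREACH ... GENERATE chained together. The sketch below mirrors that pipeline in plain Python to show what each operator does to the tuples; the records are illustrative.

```python
from itertools import groupby

# Mimic a Pig pipeline over (method, status) web-log tuples:
#   logs    = LOAD ...;
#   ok      = FILTER logs BY status == 200;
#   grouped = GROUP ok BY method;
#   counts  = FOREACH grouped GENERATE group, COUNT(ok);

records = [("GET", 200), ("POST", 500), ("GET", 200), ("GET", 404)]

filtered = [r for r in records if r[1] == 200]   # FILTER
filtered.sort(key=lambda r: r[0])                # groupby needs sorted input
grouped = groupby(filtered, key=lambda r: r[0])  # GROUP BY method
counts = {method: len(list(rows)) for method, rows in grouped}
print(counts)  # {'GET': 2}
```

Pig compiles exactly this kind of script into one or more MapReduce jobs, which is why the GROUP step corresponds to a shuffle.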


    Introduction to Sqoop
    Sqoop Architecture and Internals
    MySQL client and server installation
    How to connect to a relational database using Sqoop
    Sqoop Commands
        a) Different flavors of imports
        b) Export
        c) HIVE imports
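Sqoop parallelizes an import by querying the minimum and maximum of the split-by column and dividing that interval into one range per mapper. The sketch below reproduces that splitting logic for integer keys under simplified assumptions; the function name is illustrative and not part of Sqoop.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide [min_id, max_id] into one contiguous range per mapper,
    roughly how Sqoop turns SELECT MIN(id), MAX(id) into per-mapper
    WHERE clauses for a parallel import. Simplified to integer keys.
    """
    total = max_id - min_id + 1
    size = total // num_mappers
    extra = total % num_mappers
    ranges, start = [], min_id
    for i in range(num_mappers):
        end = start + size - 1 + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end + 1
    return ranges

print(split_ranges(1, 100, 4))  # [(1, 25), (26, 50), (51, 75), (76, 100)]
```

Each range becomes one mapper's WHERE clause, so a heavily skewed split-by column produces unbalanced mappers even though the ranges are equal in width.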


    Introduction to Hive
    The Metastore
    Comparison with Traditional Databases
        a) Schema on Read Versus Schema on Write
        b) Updates, Transactions, and Indexes
    HiveQL
        a) Data Types
        b) Operators and Functions
    Tables
        a) Managed Tables and External Tables
        b) Partitions and Buckets
        c) Storage Formats
        d) Importing Data
        e) Altering Tables
        f) Dropping Tables
    Querying Data
        a) Sorting and Aggregating
        b) MapReduce Scripts
        c) Joins
        d) Subqueries
        e) Views
    User-Defined Functions
        a) Writing a UDF
        b) Writing a UDAF
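One mechanism worth seeing concretely from the table topics above is bucketing: with CLUSTERED BY (col) INTO N BUCKETS, Hive hashes the bucketing column and takes it modulo the bucket count to pick a file. The sketch below shows that assignment for integer keys (where Hive's hash of an int is the int itself); the function name is illustrative.

```python
def bucket_for(user_id, num_buckets=4):
    """Bucket index for an integer bucketing column: hash(value) mod
    the bucket count. For small Python ints, hash(n) == n, matching
    Hive's hashing of int columns."""
    return hash(user_id) % num_buckets

rows = [101, 102, 103, 104]
print({uid: bucket_for(uid) for uid in rows})
# {101: 1, 102: 2, 103: 3, 104: 0}
```

Because every row with the same key lands in the same bucket, two tables bucketed the same way on the join key can be joined bucket-by-bucket, and SELECT with TABLESAMPLE can read just one bucket.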


    Introduction to Hbase
    HBase Vs HDFS
    Use Cases
    Basic Concepts
        a) Column families
        b) Scans
    Hbase Architecture
    ZooKeeper
    HBase Clients
        a) REST
        b) Thrift
        c) Java Based
        d) Avro
    MapReduce integration
    MapReduce over Hbase
    Schema definition
    Basic CRUD Operations
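The data model behind the CRUD operations above is a sorted map: rows are keyed, cells live under column-family:qualifier, and a scan walks rows in key order. The toy model below sketches those semantics in plain Python; the names and data are illustrative, and a real application would use the HBase Java API (or the REST/Thrift clients listed earlier).

```python
table = {}  # row key -> {"family:qualifier": value}

def put(row, column, value):                  # Create / Update
    table.setdefault(row, {})[column] = value

def get(row, column):                         # Read
    return table.get(row, {}).get(column)

def delete(row):                              # Delete
    table.pop(row, None)

def scan(start_row="", stop_row="\xff"):      # Scan a key range
    """Rows in sorted key order, start inclusive, stop exclusive."""
    return [(k, table[k]) for k in sorted(table) if start_row <= k < stop_row]

put("user1", "info:name", "Alice")
put("user2", "info:name", "Bob")
print(get("user1", "info:name"))   # Alice
print([k for k, _ in scan()])      # ['user1', 'user2']
delete("user2")
```

The key-ordered layout is why HBase row-key design matters: range scans are cheap, but keys that all start with the same prefix (like a timestamp) concentrate writes on one region.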

Course Features

  • Next Batch Saturdays
  • Duration 30 Hrs
  • Students 57
  • Certificate Yes
  • Price Call - 91485 67987

About Us

We are a leading, state-of-the-art skill enhancer in the field of professional training. Our approach to enhancing skills is built on detailed industry research, experiential training, consulting, collaborations, innovation and, importantly, experimentation.

We are proud to partner with ExcelR Solutions, one of the leading trainers in the industry.

Reach Us

Skillnest Solutions,
Rajarajeshwari Nagar, Bangalore-560098

+91 91485 67987
