Hadoop Developer - Itronix Solutions



Hadoop Developer training in Mohali and Chandigarh

1. Introduction to Hadoop

  • Data
  • Data Storage and Analysis
  • Comparison with Other Systems
  • RDBMS
  • Grid Computing
  • Volunteer Computing
  • A Brief History of Hadoop
  • Apache Hadoop and the Hadoop Ecosystem
  • Hadoop Releases

2. MapReduce

  • A Weather Dataset
  • Data Format
  • Analyzing the Data with Unix Tools
  • Analyzing the Data with Hadoop
  • Map and Reduce
  • Java MapReduce
  • Scaling Out
  • Data Flow
  • Combiner Functions
  • Running a Distributed MapReduce Job
  • Hadoop Streaming
  • Compiling and Running

3. The Hadoop Distributed File System (HDFS)

  • The Design of HDFS
  • HDFS Concepts
  • Blocks
  • Namenodes and Datanodes
  • HDFS Federation
  • HDFS High-Availability
  • The Command-Line Interface
  • Basic Filesystem Operations
  • Hadoop Filesystems
  • Interfaces
  • The Java Interface
  • Reading Data from a Hadoop URL
  • Reading Data Using the FileSystem API
  • Writing Data
  • Directories
  • Querying the Filesystem
  • Deleting Data
  • Data Flow
  • Anatomy of a File Read
  • Anatomy of a File Write
  • Coherency Model
  • Parallel Copying with distcp
  • Keeping an HDFS Cluster Balanced
  • Hadoop Archives

4. Hadoop I/O

  • Data Integrity
  • Data Integrity in HDFS
  • LocalFileSystem
  • ChecksumFileSystem
  • Compression
  • Codecs
  • Compression and Input Splits
  • Using Compression in MapReduce
  • Serialization
  • The Writable Interface
  • Writable Classes
  • File-Based Data Structures
  • SequenceFile
  • MapFile

5. Developing a MapReduce Application

  • The Configuration API
  • Combining Resources
  • Variable Expansion
  • Configuring the Development Environment
  • Managing Configuration
  • GenericOptionsParser, Tool, and ToolRunner
  • Writing a Unit Test
  • Mapper
  • Reducer
  • Running Locally on Test Data
  • Running a Job in a Local Job Runner
  • Testing the Driver
  • Running on a Cluster
  • Packaging
  • Launching a Job
  • The MapReduce Web UI
  • Retrieving the Results
  • Debugging a Job
  • Hadoop Logs
  • Tuning a Job
  • Profiling Tasks
  • MapReduce Workflows
  • Decomposing a Problem into MapReduce Jobs
  • JobControl

6. How MapReduce Works

  • Anatomy of a MapReduce Job Run
  • Classic MapReduce (MapReduce 1)
  • Failures
  • Failures in Classic MapReduce
  • Failures in YARN
  • Job Scheduling
  • The Capacity Scheduler
  • Shuffle and Sort
  • The Map Side
  • The Reduce Side
  • Configuration Tuning
  • Task Execution
  • The Task Execution Environment
  • Speculative Execution
  • Output Committers
  • Task JVM Reuse
  • Skipping Bad Records

7. MapReduce Types and Formats

  • MapReduce Types
  • The Default MapReduce Job
  • Input Formats
  • Input Splits and Records
  • Text Input
  • Binary Input
  • Multiple Inputs
  • Database Input (and Output)
  • Output Formats
  • Text Output
  • Binary Output
  • Multiple Outputs
  • Lazy Output
  • Database Output

8. MapReduce Features

  • Counters
  • Built-in Counters
  • User-Defined Java Counters
  • User-Defined Streaming Counters
  • Sorting
  • Preparation
  • Partial Sort
  • Total Sort
  • Secondary Sort
  • Joins
  • Map-Side Joins
  • Reduce-Side Joins
  • Side Data Distribution
  • Using the Job Configuration
  • Distributed Cache
  • MapReduce Library Classes

9. Setting Up a Hadoop Cluster

  • Cluster Specification
  • Network Topology
  • Cluster Setup and Installation
  • Installing Java
  • Creating a Hadoop User
  • Installing Hadoop
  • Testing the Installation
  • SSH Configuration
  • Hadoop Configuration
  • Configuration Management
  • Environment Settings
  • Important Hadoop Daemon Properties
  • Hadoop Daemon Addresses and Ports
  • Other Hadoop Properties
  • User Account Creation
  • YARN Configuration
  • Important YARN Daemon Properties
  • YARN Daemon Addresses and Ports
  • Security
  • Kerberos and Hadoop
  • Delegation Tokens
  • Other Security Enhancements
  • Benchmarking a Hadoop Cluster
  • Hadoop Benchmarks
  • User Jobs
  • Hadoop in the Cloud
  • Hadoop on Amazon EC2

10. Administering Hadoop

  • HDFS
  • Persistent Data Structures
  • Safe Mode
  • Audit Logging
  • Tools
  • Monitoring
  • Logging
  • Metrics
  • Java Management Extensions
  • Routine Administration Procedures
  • Commissioning and Decommissioning Nodes
  • Upgrades

11. Pig

  • Installing and Running Pig
  • Execution Types
  • Running Pig Programs
  • Grunt
  • Pig Latin Editors
  • An Example
  • Generating Examples
  • Comparison with Databases
  • Pig Latin
  • Structure
  • Statements
  • Expressions
  • Types
  • Schemas
  • Functions
  • Macros
  • User-Defined Functions
  • A Filter UDF
  • An Eval UDF
  • A Load UDF
  • Data Processing Operators
  • Loading and Storing Data
  • Filtering Data
  • Grouping and Joining Data
  • Sorting Data
  • Combining and Splitting Data
  • Pig in Practice
  • Parallelism
  • Parameter Substitution

12. Hive

  • Installing Hive
  • The Hive Shell
  • An Example
  • Running Hive
  • Configuring Hive
  • Hive Services
  • Comparison with Traditional Databases
  • Schema on Read Versus Schema on Write
  • Updates, Transactions, and Indexes
  • HiveQL
  • Data Types
  • Operators and Functions
  • Tables
  • Managed Tables and External Tables
  • Partitions and Buckets
  • Storage Formats
  • Importing Data
  • Altering Tables
  • Dropping Tables
  • Querying Data
  • Sorting and Aggregating
  • MapReduce Scripts
  • Joins
  • Subqueries
  • Views
  • User-Defined Functions
  • Writing a UDF
  • Writing a UDAF

13. HBase

  • Backdrop
  • Concepts
  • Whirlwind Tour of the Data Model
  • Implementation
  • Installation
  • Test Drive
  • Clients
  • Java
  • Avro, REST, and Thrift
  • Schemas
  • Loading Data
  • Web Queries
  • HBase Versus RDBMS
  • Successful Service
  • HBase

14. ZooKeeper

  • Installing and Running ZooKeeper
  • Group Membership in ZooKeeper
  • Creating the Group
  • Joining a Group
  • Listing Members in a Group
  • Deleting a Group
  • The ZooKeeper Service
  • Data Model
  • Operations
  • Implementation
  • Consistency
  • Sessions
  • States

15. Sqoop

  • Getting Sqoop
  • A Sample Import
  • Generated Code
  • Additional Serialization Systems
  • Database Imports: A Deeper Look
  • Controlling the Import
  • Imports and Consistency
  • Direct-mode Imports
  • Working with Imported Data
  • Imported Data and Hive
  • Importing Large Objects

16. Flume

  • Introduction
    • Overview
    • Architecture
  • Data flow model
  • Reliability
  • Building Flume
    • Getting the source
    • Compile/test Flume
  • Developing custom components
    • Client
      • Client SDK
      • RPC client interface
      • RPC clients – Avro and Thrift
      • Failover Client
      • Load Balancing RPC client
    • Embedded agent
    • Transaction interface
    • Sink
    • Source
    • Channel