Big Data Analyst - Itronix Solutions

Big Data Analyst

BIG DATA TRAINING IN MOHALI AND CHANDIGARH

Course Content

Introduction to Hadoop Fundamentals

  • The Motivation for Hadoop
  • Hadoop Overview
  • Data Storage: HDFS
  • Distributed Data Processing: YARN, MapReduce, and Spark
  • Data Processing and Analysis: Pig, Hive, and Impala
  • Database Integration: Sqoop
  • Other Hadoop Data Tools
  • Exercise Scenarios
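
The MapReduce model named above splits a job into three steps. A map phase emits key/value pairs, a shuffle groups those pairs by key, and a reduce phase combines each group. The word count below is a minimal in-memory Python sketch of that flow, assuming plain whitespace tokenization. It does not use Hadoop's Java or Streaming APIs, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated, lowercased word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Spark and Hadoop process data"]))
```

On a real cluster the map and reduce functions run in parallel on many nodes, and the shuffle moves data over the network. This sketch runs everything in one process purely to show the data flow.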

Introduction to Pig

  • What is Pig?
  • Pig’s Features
  • Pig Use Cases
  • Interacting with Pig

Basic Data Analysis with Pig

  • Pig Latin Syntax
  • Loading Data
  • Simple Data Types
  • Field Definitions
  • Data Output
  • Viewing the Schema
  • Filtering and Sorting Data
  • Commonly Used Functions

Processing Complex Data with Pig

  • Storage Formats
  • Complex/Nested Data Types
  • Grouping
  • Built-In Functions for Complex Data
  • Iterating Grouped Data
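
In Pig Latin, `GROUP` collects the records that share a key into a nested bag. A following `FOREACH ... GENERATE` then iterates over those groups, typically applying built-ins such as `COUNT` or `SUM` to each bag. Below is a rough Python model of that behaviour. The `Sale` records and the `region`/`amount` fields are hypothetical sample data, and Python lists merely stand in for Pig bags.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Sale(NamedTuple):
    region: str
    amount: float


def group_by_region(records: List[Sale]) -> Dict[str, List[Sale]]:
    """Like GROUP sales BY region: one entry per key holding a 'bag' (list) of records."""
    bags: Dict[str, List[Sale]] = defaultdict(list)
    for record in records:
        bags[record.region].append(record)
    return dict(bags)


def summarize(bags: Dict[str, List[Sale]]) -> List[Tuple[str, int, float]]:
    """Like FOREACH grouped GENERATE group, COUNT(sales), SUM(sales.amount)."""
    # sorted() only makes the output order deterministic; Pig does not guarantee group order.
    return [
        (region, len(bag), sum(r.amount for r in bag))
        for region, bag in sorted(bags.items())
    ]


if __name__ == "__main__":
    sales = [Sale("north", 10.0), Sale("south", 5.0), Sale("north", 2.5)]
    for row in summarize(group_by_region(sales)):
        print(row)
```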

Multi-Dataset Operations with Pig

  • Techniques for Combining Datasets
  • Joining Datasets in Pig
  • Set Operations
  • Splitting Datasets
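
The multi-dataset operations above differ mainly in how rows are matched or routed. This Python sketch models three of them on lists of tuples:

- an inner join, which is what Pig's `JOIN` does by default;
- `UNION`, which keeps duplicate rows;
- `SPLIT`, which can send a row to several outputs or to none.

The helper names and the sample `users`/`orders` data are hypothetical. Real Pig scripts run these operators as distributed jobs, not in memory.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple  # hypothetical: a row is a plain tuple of field values


def inner_join(left: List[Row], right: List[Row], lkey: int, rkey: int) -> List[Row]:
    """Like Pig's default (inner) JOIN: concatenate every pair of rows whose keys match."""
    index: Dict[object, List[Row]] = {}
    for rrow in right:
        index.setdefault(rrow[rkey], []).append(rrow)
    return [lrow + rrow for lrow in left for rrow in index.get(lrow[lkey], [])]


def union(*relations: List[Row]) -> List[Row]:
    """Like Pig's UNION: concatenate the inputs, keeping duplicates."""
    return [row for rel in relations for row in rel]


def split(rows: List[Row], conditions: Dict[str, Callable[[Row], bool]]) -> Dict[str, List[Row]]:
    """Like SPLIT ... INTO: a row goes to every output whose condition it meets."""
    return {name: [r for r in rows if cond(r)] for name, cond in conditions.items()}


if __name__ == "__main__":
    users = [(1, "ana"), (2, "bo")]
    orders = [(1, 30), (1, 5), (3, 7)]
    print(inner_join(users, orders, 0, 0))
    print(split(orders, {"big": lambda r: r[1] >= 10, "small": lambda r: r[1] < 10}))
```

User 2 has no orders and order 3 has no user, so an inner join drops both. Pig's outer join variants would keep them with nulls.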

Pig Troubleshooting and Optimization

  • Troubleshooting Pig
  • Logging
  • Using Hadoop’s Web UI
  • Data Sampling and Debugging
  • Performance Overview
  • Understanding the Execution Plan
  • Tips for Improving the Performance of Pig Jobs

Introduction to Hive and Impala

  • What is Hive?
  • What is Impala?
  • Why Use Hive and Impala?
  • Schema and Data Storage
  • Comparing Hive and Impala to Traditional Databases
  • Use Cases

Querying with Hive and Impala

  • Databases and Tables
  • Basic Hive and Impala Query Language Syntax
  • Data Types
  • Using Hue to Execute Queries
  • Using Beeline (Hive’s Shell)
  • Using the Impala Shell

Hive and Impala Data Management

  • Data Storage
  • Creating Databases and Tables
  • Loading Data
  • Altering Databases and Tables
  • Simplifying Queries with Views
  • Storing Query Results

Data Storage and Performance

  • Partitioning Tables
  • Loading Data into Partitioned Tables
  • When to Use Partitioning
  • Choosing a File Format
  • Using Avro and Parquet File Formats
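
Partitioning pays off because Hive stores each partition of a table in its own subdirectory named `column=value`. A query that filters on the partition column therefore only needs to read the matching directories (partition pruning). The Python sketch below models that layout and the pruning step using string paths. The `/warehouse/sales` location and the `year`/`month` columns are hypothetical examples, and the sketch never touches HDFS.

```python
from typing import Dict, List


def partition_path(table_dir: str, spec: Dict[str, str]) -> str:
    """Build a Hive-style partition directory, e.g. <table>/year=2024/month=01."""
    parts = [f"{col}={val}" for col, val in spec.items()]
    return "/".join([table_dir.rstrip("/")] + parts)


def prune(paths: List[str], column: str, value: str) -> List[str]:
    """Keep only directories containing the matching column=value segment.

    This mimics partition pruning: files in other partitions are never read.
    """
    wanted = f"{column}={value}"
    return [p for p in paths if wanted in p.split("/")]


if __name__ == "__main__":
    paths = [partition_path("/warehouse/sales", {"year": y, "month": "01"}) for y in ("2023", "2024")]
    print(prune(paths, "year", "2024"))
```

This layout is also why partitioning suits low-cardinality columns that queries commonly filter on. A column with millions of distinct values would produce millions of small directories.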

Relational Data Analysis with Hive and Impala

  • Joining Datasets
  • Common Built-In Functions
  • Aggregation and Windowing
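
Window functions compute a value for every row from the other rows in its partition, rather than collapsing the partition as `GROUP BY` does. The Python sketch below reproduces what these two expressions would return:

- `ROW_NUMBER() OVER (PARTITION BY customer ORDER BY day)`
- a running `SUM(amount)` over the same window

It assumes `day` values are unique within each customer. If there are ties, SQL's default `RANGE` frame would also include peer rows in the running sum. The `(customer, day, amount)` row shape is hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import List, Tuple


def running_totals(rows: List[Tuple[str, int, float]]) -> List[Tuple[str, int, float, int, float]]:
    """For each (customer, day, amount) row, append its row number and running total
    within the customer's partition, ordered by day."""
    out: List[Tuple[str, int, float, int, float]] = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # PARTITION BY customer, ORDER BY day
    for _, partition in groupby(ordered, key=itemgetter(0)):
        total = 0.0
        for row_number, row in enumerate(partition, start=1):
            total += row[2]
            out.append(row + (row_number, total))
    return out


if __name__ == "__main__":
    for r in running_totals([("b", 1, 5.0), ("a", 2, 3.0), ("a", 1, 1.0)]):
        print(r)
```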

Complex Data with Hive and Impala

  • Complex Data with Hive
  • Complex Data with Impala

Analyzing Text with Hive and Impala

  • Using Regular Expressions with Hive and Impala
  • Processing Text Data with SerDes in Hive
  • Sentiment Analysis and n-grams
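
Hive provides `sentences()` and `ngrams()` functions for this kind of analysis. For example, `ngrams(sentences(lower(review)), 2, 10)` returns the ten most frequent bigrams, where `review` is a hypothetical text column. The Python sketch below computes exact top-k n-gram counts with a simple regular-expression tokenizer to show what such a query measures. The tokenizer pattern and the sample reviews are assumptions, and they will not match Hive's sentence splitting exactly.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

TOKEN = re.compile(r"[a-z']+")  # assumption: words are runs of letters and apostrophes


def tokenize(text: str) -> List[str]:
    """Lowercase the text and extract word tokens."""
    return TOKEN.findall(text.lower())


def top_ngrams(texts: Iterable[str], n: int, k: int) -> List[Tuple[Tuple[str, ...], int]]:
    """Count n-grams within each text (never across texts) and return the k most frequent."""
    counts: Counter = Counter()
    for text in texts:
        words = tokenize(text)
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts.most_common(k)


if __name__ == "__main__":
    reviews = ["Great phone, great battery.", "Great battery life!"]
    print(top_ngrams(reviews, 2, 3))
```

In a sentiment workflow, the same counts are often taken only over n-grams that follow a fixed word such as "great", which Hive supports through its `context_ngrams()` function.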

Hive Optimization

  • Understanding Query Performance
  • Bucketing
  • Indexing Data
  • Hive on Spark

Impala Optimization

  • How Impala Executes Queries
  • Improving Impala Performance

Extending Hive and Impala

  • Custom SerDes and File Formats in Hive
  • Data Transformation with Custom Scripts in Hive
  • User-Defined Functions
  • Parameterized Queries
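
Hive's `TRANSFORM ... USING` clause streams each row to an external script as a tab-separated line on standard input. It then reads the script's tab-separated standard output back as rows. A minimal Python script for that protocol is sketched below. The two input columns (`user_id`, `comment`) and the word-count logic are hypothetical.

The script would be invoked with something like `ADD FILE count_words.py;` followed by `SELECT TRANSFORM (user_id, comment) USING 'python3 count_words.py' AS (user_id, word_count) FROM reviews;`. The file and table names there are placeholders.

```python
import sys
from typing import Iterable, Iterator


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Read tab-separated (user_id, comment) rows and emit (user_id, word_count) rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed rows instead of failing the whole query
        user_id, comment = fields
        # By default Hive writes NULL as the two characters \N; count it as zero words.
        words = 0 if comment == "\\N" else len(comment.split())
        yield f"{user_id}\t{words}"


if __name__ == "__main__":
    for out_line in transform(sys.stdin):
        print(out_line)
```

Keeping the row logic in `transform()` separate from the stdin loop makes the script easy to test locally before it is shipped to the cluster.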

Choosing the Best Tool for the Job

  • Comparing Pig, Hive, Impala, and Relational Databases
  • Which to Choose?

Conclusion