Cloudera Data Analyst Training - Using Pig, Hive, and Impala with Hadoop

4.6 out of 5 rating

Duration

4 Days

24 CPD hours

About this course

This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators.

Overview

Skills gained in this training include:
  • The features that Pig, Hive, and Impala offer for data acquisition, storage, and analysis
  • The fundamentals of Apache Hadoop and data ETL (extract, transform, load), ingestion, and processing with Hadoop
  • How Pig, Hive, and Impala improve productivity for typical analysis tasks
  • Joining diverse datasets to gain valuable business insight
  • Performing real-time, complex queries on datasets

Description

Cloudera University's four-day data analyst training course, which focuses on Apache Pig, Apache Hive, and Cloudera Impala, teaches you to apply traditional data analytics and business intelligence skills to big data.

Hadoop Fundamentals
  • The Motivation for Hadoop
  • Hadoop Overview
  • Data Storage: HDFS
  • Distributed Data Processing: YARN, MapReduce, and Spark
  • Data Processing and Analysis: Pig, Hive, and Impala
  • Data Integration: Sqoop
  • Other Hadoop Data Tools
  • Exercise Scenarios Explanation
Introduction to Pig
  • What Is Pig
  • Pig's Features
  • Pig Use Cases
  • Interacting with Pig
Basic Data Analysis with Pig
  • Pig Latin Syntax
  • Loading Data
  • Simple Data Types
  • Field Definitions
  • Data Output
  • Viewing the Schema
  • Filtering and Sorting Data
  • Commonly-Used Functions
Processing Complex Data with Pig
  • Storage Formats
  • Complex/Nested Data Types
  • Grouping
  • Built-In Functions for Complex Data
  • Iterating Grouped Data
Multi-Dataset Operations with Pig
  • Techniques for Combining Data Sets
  • Joining Data Sets in Pig
  • Set Operations
  • Splitting Data Sets
Pig Troubleshooting & Optimization
  • Troubleshooting Pig
  • Logging
  • Using Hadoop's Web UI
  • Data Sampling and Debugging
  • Performance Overview
  • Understanding the Execution Plan
  • Tips for Improving the Performance of Your Pig Jobs
Introduction to Hive & Impala
  • What Is Hive
  • What Is Impala
  • Schema and Data Storage
  • Comparing Hive to Traditional Databases
  • Hive Use Cases
Querying with Hive & Impala
  • Databases and Tables
  • Basic Hive and Impala Query Language Syntax
  • Data Types
  • Differences Between Hive and Impala Query Syntax
  • Using Hue to Execute Queries
  • Using the Impala Shell
Data Management
  • Data Storage
  • Creating Databases and Tables
  • Loading Data
  • Altering Databases and Tables
  • Simplifying Queries with Views
  • Storing Query Results
Data Storage & Performance
  • Partitioning Tables
  • Choosing a File Format
  • Managing Metadata
  • Controlling Access to Data
Relational Data Analysis with Hive & Impala
  • Joining Datasets
  • Common Built-In Functions
  • Aggregation and Windowing
Working with Impala
  • How Impala Executes Queries
  • Extending Impala with User-Defined Functions
  • Improving Impala Performance
Analyzing Text and Complex Data with Hive
  • Complex Values in Hive
  • Using Regular Expressions in Hive
  • Sentiment Analysis and N-Grams
  • Conclusion
Hive Optimization
  • Understanding Query Performance
  • Controlling Job Execution Plan
  • Bucketing
  • Indexing Data
Extending Hive
  • SerDes
  • Data Transformation with Custom Scripts
  • User-Defined Functions
  • Parameterized Queries
Choosing the Best Tool for the Job
  • Comparing MapReduce, Pig, Hive, Impala, and Relational Databases
  • Which to Choose
Additional course details:

Nexus Human's Cloudera Data Analyst Training - Using Pig, Hive, and Impala with Hadoop training program is a workshop that combines sessions, lessons, and masterclasses designed to move your learning forward.

This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts.

Guided by seasoned coaches, each session offers insights and practical skills for honing your expertise. Whether you are just starting to build professional skills or are already a seasoned professional, this comprehensive course equips you with the knowledge you need to succeed.

While we feel this is the best Cloudera Data Analyst Training - Using Pig, Hive, and Impala with Hadoop course, and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you.

Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.

Training Insurance Included!

When you organise training, we understand that there is a risk that some people may fall ill or become unavailable. To mitigate this risk, we include training insurance for each delegate enrolled on our public schedule: if the need arises, they are welcome to sit the same public class within 6 months at no charge.
