Skill Up Card - Course Bundles

Pricing is per delegate, giving you huge savings over the cost of individual courses.

  • UK = £2,000 + VAT per Skill Up Card
  • Ireland = €2,400 per Skill Up Card

Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503)

4.6 out of 5 rating | Last updated 13/12/2024 | English

Duration

3 Days

18 CPD hours

Overview

This skills-centric course is about 50% hands-on lab and 50% lecture, designed to train attendees in core big data and Spark development skills, coupling current, effective techniques with sound industry practices. Throughout the course, students are led through a series of progressively advanced topics, each consisting of lecture, group discussion, comprehensive hands-on lab exercises, and lab review.
This course provides practical grounding in the leading-edge data science technologies focused on Spark and related tools. Working in a hands-on learning environment, students will explore:
-Spark Ecosystem
-Spark Shell
-Spark Data structures (RDD, DataFrame, Dataset)
-Spark SQL
-Modern data formats and Spark
-Spark API
-Spark & Hadoop & Hive
-Spark ML overview
-GraphX
-Time-permitting: Spark Streaming
-Time-permitting: Optional Capstone Workshop

Description

Apache Spark, a significant component in the Hadoop ecosystem, is a cluster computing engine used in Big Data. Building on top of Hadoop YARN and HDFS, it offers order-of-magnitude faster processing than MapReduce for many in-memory computing tasks. It can be programmed in Java, Scala, Python, and R (the favorite languages of data scientists), along with SQL-based front ends. With advanced libraries such as Mahout and MLlib for machine learning, GraphX or Neo4j for rich graph data processing, and access to other NoSQL data stores, rule engines and other enterprise components, Spark is a linchpin of modern Big Data and data science computing.
Geared for experienced developers, Introduction to Apache Spark for Big Data & Machine Learning provides students with a comprehensive, hands-on exploration of enterprise-grade Spark programming, interacting with the significant components mentioned above to craft complete data science solutions. Students will leave this course armed with the skills they require to begin working with Spark in a practical, real-world environment.
This course is taught in Python but can also be offered in R or Java with advance notice and planning. Our team will work with you to coordinate the languages, tools and environment that will work best for your organization and needs. Please inquire for details.

Prerequisites

This foundation-level course is geared for intermediate-skilled, experienced Developers and Architects (with basic Python experience) who seek to be proficient in advanced, modern development skills working with Apache Spark in an enterprise data environment.
  • TTPS4800 Introduction to Python Programming
  • TTSQLB3 Introduction to SQL (basic familiarity is needed, not in-depth SQL skills)

Course Outline

Spark Introduction
  • Big data, Hadoop, Spark
  • Spark concepts and architecture
  • Spark components overview
The first look at Spark
  • Spark shell
  • Spark web UIs
  • Analyzing dataset part 1
Spark Data structures
  • Partitions
  • Distributed execution
  • Operations: transformations and actions
Caching
  • Caching overview
  • Various caching mechanisms available in Spark
  • In memory file systems
  • Caching use cases and best practices
DataFrames and Datasets
  • DataFrames Intro
  • Loading structured data (JSON, CSV) using DataFrames
  • Using schema
  • Specifying schema for DataFrames
Spark SQL
  • Spark SQL concepts and overview
  • Defining tables and importing datasets
  • Querying data using SQL
  • Handling various storage formats: JSON, Parquet, ORC
Spark and Hadoop
  • Hadoop Primer: HDFS, YARN
  • Hadoop + Spark architecture
  • Running Spark on Hadoop YARN
  • Processing HDFS files using Spark
  • Spark & Hive
Spark API
  • Overview of Spark APIs in Scala / Python
  • The lifecycle of a Spark application
  • Spark APIs
  • Deploying Spark applications on YARN
Spark ML Overview
  • Machine Learning primer
  • Machine Learning in Spark: MLlib / ML
  • Spark ML overview (newer Spark 2 version)
  • Algorithms overview: Clustering, Classifications, Recommendations
GraphX
  • GraphX library overview
  • GraphX APIs
  • Creating a graph and navigating it
  • Shortest distance
  • Pregel API
Time-Permitting Topics: Spark Streaming
  • Streaming concepts
  • Evaluating Streaming platforms
  • Spark streaming library overview
  • Streaming operations
  • Sliding window operations
  • Structured Streaming
  • Continuous streaming
  • Spark & Kafka streaming
Workshop
  • Attendees will work on solving real-world data analysis problems using Spark
Additional course details:

The Nexus Human Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503) training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward.

This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts.

Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're just stepping into the field or are a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success.

While we feel this is the best course for the ITS Data Analytics course and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you.

Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.

FAQ for the Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503) Course

Available Delivery Options for the Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503) training.
  • Live Instructor Led Classroom Online (Live Online)
  • Traditional Instructor Led Classroom (TILT/ILT)
  • Delivery at your offices in London or anywhere in the UK
  • Private dedicated course arranged to suit your staff
How many CPD hours does the Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503) training provide?

The 3-day Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503) training course gives you up to 18 CPD hours/structured learning hours. If you need a letter or certificate in a particular format for your association, organisation or professional body, please just ask.

Which exam does the Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503) training course prepare you for?

The Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503) prepares you for the official exam. You can take this exam at any exam centre across Ireland, including Dublin, Cork, Galway and Northern Ireland, or live online wherever you are. Exams vary in duration and, if required, you can ask the exam provider for any accommodations appropriate for you.

What is the correct audience for the Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503) training?

This foundation-level course is geared for intermediate-skilled, experienced Developers and Architects (with basic Python experience) who seek to be proficient in advanced, modern development skills working with Apache Spark in an enterprise data environment.

Do you provide training for the Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503)?

Yes, we provide corporate training, dedicated training and closed classes for the Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503). This can take place anywhere in Ireland, including Dublin, Cork, Galway and Northern Ireland, or live online, allowing teams from across Ireland or further afield to attend a single training event and saving travel and delivery expenses.

What is the duration of the Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503) program?

The Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503) training takes place over 3 days, with each day lasting approximately 8 hours including short breaks and lunch, to ensure that delegates get the most out of the day.

What other terms do people search for when looking for this course?

Popular related searches include Spark.

Why is Nexus Human the best provider for the Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503)?

Nexus Human is recognised as one of the best training companies: the company and its trainers have won and hold many awards and titles, including the Small Firms Best Trainer award, national training partner of the year for Ireland on multiple occasions, and places in the global top 30 instructor awards in 2012, 2019 and 2021. Nexus Human has also been nominated for the Tech Excellence awards multiple times and was a Learning Performance Institute (LPI) external training provider sponsor in 2024.

Is there a discount code for the Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503) training?

Yes, the discount code PENPAL5 is currently available for the Introduction to Apache Spark | Hands-on Spark for Big Data & Machine Learning (TTSK7503) training. Other discount codes may also be available but only one discount code or special offer can be used for each booking. This discount code is available for companies and individuals.

Training Insurance Included!

When you organise training, we understand that there is a risk that some people may fall ill or become unavailable. To mitigate this risk, we include training insurance for each delegate enrolled on our public schedule: they are welcome to sit the same public class within 6 months at no charge, should the need arise.
