Big Data Architecture Workshop

4.6 out of 5 rating



3 Days

18 CPD hours

This course is intended for

Senior Executives
CIOs and CTOs
Business Intelligence Executives
Marketing Executives
Data & Business Analytics Specialists
Innovation Specialists & Entrepreneurs
Academics and others interested in Big Data


Big Data Architecture Workshop (BDAW) is a learning event that addresses advanced big data architecture topics. BDAW brings technical contributors together in a group setting to design and architect solutions to a challenging business problem. The workshop addresses big data architecture problems in general, and then applies them to the design of a challenging system. Throughout the highly interactive workshop, students apply concepts to real-world examples, which leads to detailed discussions. Students learn techniques for architecting big data systems not only from Cloudera's experience but also from the experiences of fellow students.

More specifically, BDAW covers data formats, transformation, real-time, batch, and machine learning processing, scalability, fault tolerance, and security and privacy, with the aim of minimizing the risk of an unsound architecture and technology selection.

Workshop Application Use Cases
  • Oz Metropolitan
  • Architectural questions
  • Team activity: Analyze Metroz Application Use Cases
Application Vertical Slice
  • Definition
  • Minimizing risk of an unsound architecture
  • Selecting a vertical slice
  • Team activity: Identify an initial vertical slice for Metroz
Application Processing
  • Real time, near real time processing
  • Batch processing
  • Data access patterns
  • Delivery and processing guarantees
  • Machine Learning pipelines
  • Team activity: identify delivery and processing patterns in Metroz, characterize response time requirements, identify Machine Learning pipelines
Application Data
  • Three V's of Big Data
  • Data Lifecycle
  • Data Formats
  • Transforming Data
  • Team activity: Metroz Data Requirements
Scalable Applications
  • Scale up, scale out, scale to X
  • Determining if an application will scale
  • Poll: scalable airport terminal designs
  • Hadoop and Spark Scalability
  • Team activity: Scaling Metroz
Fault Tolerant Distributed Systems
  • Principles
  • Hardware vs. Software redundancy
  • Tolerating disasters
  • Stateless functional fault tolerance
  • Stateful fault tolerance
  • Replication and group consistency
  • Fault tolerance in Spark and MapReduce
  • Application tolerance for failures
  • Team activity: Identify Metroz component failures and requirements
Security and Privacy
  • Principles
  • Privacy
  • Threats
  • Technologies
  • Team activity: identify threats and security mechanisms in Metroz
Deployment
  • Cluster sizing and evolution
  • On-premises vs. Cloud
  • Edge computing
  • Team activity: select deployment for Metroz
Technology Selection
  • HDFS
  • HBase
  • Kudu
  • Relational Database Management Systems
  • MapReduce
  • Spark, including Streaming, Spark SQL, and Spark ML
  • Hive
  • Impala
  • Cloudera Search
  • Data Sets and Formats
  • Team activity: technologies relevant to Metroz
Software Architecture
  • Architecture artifacts
  • One platform or multiple, lambda architecture
  • Team activity: produce high level architecture, selected technologies, revisit vertical slice
  • Vertical Slice demonstration
Training Insurance Included!

When you organise training, we understand that there is a risk that some people may fall ill or become unavailable. To mitigate that risk, we include training insurance for each delegate enrolled on our public schedule: if a delegate cannot attend, they are welcome to sit the same public class within 6 months at no charge.
