Big data 101 v1

www.geekseat.com.au Agile Software Development
Welcome to the “Big Data” Jungle
Welly Tambunan
(welly.tambunan@danamon.co.id)
Solution and Integration Architect Lead
Analytics & Data Warehouse Department

Outline
 Big Data Overview and History
 Introduction to Hadoop
 Hadoop Ecosystem
 Hadoop Distribution
 Cloudera
 Big Data Architecture
 ETL vs ELT
 Talend for ETL/ELT Tools

Big Data Overview and History
 Google Search Engine
 Search Engine Architecture
 Crawler
 Indexer
 Search Algorithm / PageRank
 Doug Cutting and Search Engine
 Apache Lucene
 Apache Nutch
 Google File System + MapReduce
 Birth of Hadoop

Hadoop
 HDFS (Hadoop Distributed File System)
 MapReduce
 Hadoop = HDFS + MapReduce
 Hadoop = Storage + Processing
 Features
 schemaless: no predefined structure, i.e. no rigid schema with tables and columns (and column types and sizes)
 durable: once data is written, it should never be lost
 capable of handling component failure without human intervention (e.g. CPU, disk, memory, network, power supply, motherboard)
 automatically rebalanced to even out disk space consumption throughout the cluster
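
The "Processing" half of Hadoop can be illustrated with a word count, the usual first MapReduce example. This is a minimal single-machine sketch in plain Python, not Hadoop's actual API: the function names `map_phase`, `shuffle`, and `reduce_phase` are assumptions chosen for illustration, and a real cluster would run the map and reduce steps in parallel over HDFS blocks.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data", "Big Data jungle"])
print(counts)
```

The point of the split is that each map call only sees one line and each reduce call only sees one key, which is what lets a framework distribute the work across many machines.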

Hadoop Ecosystem
 SQL on Hadoop
 HIVE
 Impala
 HBase
 Hue
 Kafka
 Oozie
 Sqoop

Hadoop Ecosystem (continued)
 YARN
 ZooKeeper
 Spark
 Batch
 Streaming
 Flink
 Batch
 Streaming
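
The batch and streaming modes listed for Spark and Flink differ mainly in when the data is available. The sketch below shows the contrast in plain Python rather than in either engine's API; `batch_total` and `streaming_totals` are hypothetical names, and the event values are made up for illustration.

```python
def batch_total(events):
    """Batch: the whole bounded dataset exists before processing starts."""
    return sum(events)

def streaming_totals(events):
    """Streaming: events arrive one at a time; emit an updated result per event."""
    running = 0
    for value in events:
        running += value
        yield running

events = [5, 3, 7]
batch_result = batch_total(events)                      # one answer at the end
stream_results = list(streaming_totals(iter(events)))   # one answer per event
print(batch_result, stream_results)
```

A batch job returns a single result once all input has been read, while a streaming job keeps state and produces a fresh result as each event arrives.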

Hadoop Distribution
 Cloudera (Danamon's choice)
 Hortonworks
 MapR
 IBM
 etc.

Cloudera Demo
 Cloudera Manager
 Hue
 File
 Format
 CSV
 Parquet
 Avro
 Compression
 Gzip
 Snappy
 Deflate
 Read as a database from
 Hive
 Impala
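
File format and compression are independent choices: the same CSV bytes can be stored under different codecs. The sketch below uses only the Python standard library, so it covers CSV with Gzip and with zlib (which wraps a Deflate stream); Parquet, Avro, and Snappy need third-party libraries and are left out. The sample rows are invented for illustration.

```python
import csv
import gzip
import io
import zlib

# Hypothetical sample data: 1000 rows with a repetitive column.
rows = [{"id": i, "city": "Jakarta"} for i in range(1000)]

# Serialize the rows to CSV in memory.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "city"])
writer.writeheader()
writer.writerows(rows)
raw = buffer.getvalue().encode("utf-8")

# Compress the same bytes with two codecs.
gzipped = gzip.compress(raw)
deflated = zlib.compress(raw)

# Both codecs are lossless: decompressing restores the original CSV.
restored_from_gzip = gzip.decompress(gzipped)
restored_from_deflate = zlib.decompress(deflated)

sizes = {"csv": len(raw), "gzip": len(gzipped), "deflate": len(deflated)}
print(sizes)
```

On repetitive data like this, both compressed sizes come out well below the raw CSV size; the exact numbers depend on the data and the compression level.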

ETL vs ELT
 Extract Transform Load
 Extract Load Transform
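
The two orderings can be compared side by side. This is a minimal sketch using an in-memory SQLite database as a stand-in for the target system; the table names (`sales_etl`, `sales_raw`, `sales_elt`) and the sample rows are assumptions for illustration, not part of any Talend or Hadoop job.

```python
import sqlite3

# Hypothetical extracted source data: amounts arrive as text.
source_rows = [("alice", "100"), ("bob", "250")]

def etl(conn, rows):
    """ETL: transform in the pipeline, then load only the cleaned result."""
    cleaned = [(name.title(), int(amount)) for name, amount in rows]
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)

def elt(conn, rows):
    """ELT: load the raw data as-is, then transform inside the target engine."""
    conn.execute("CREATE TABLE sales_raw (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", rows)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT upper(substr(name, 1, 1)) || substr(name, 2) AS name, "
        "CAST(amount AS INTEGER) AS amount "
        "FROM sales_raw"
    )

conn = sqlite3.connect(":memory:")
etl(conn, source_rows)
elt(conn, source_rows)

query = "SELECT name, amount FROM {} ORDER BY name"
etl_result = conn.execute(query.format("sales_etl")).fetchall()
elt_result = conn.execute(query.format("sales_elt")).fetchall()
print(etl_result, elt_result)
```

Both paths end with the same cleaned table. The difference is where the transformation runs and whether the raw data is kept in the target: ELT retains `sales_raw`, so it can be transformed again later with different rules.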

Talend for ETL/ELT Tools
 Demo for Standard Job with Database
 Demo for Batch Job
 Demo for Streaming Job

Announcement
 https://weltam.wordpress.com/ is back with a Big Data flavor
