From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola East Japan
- 1. Coca-Cola East Japan Co., Ltd.
From a single droplet to a full bottle,
our journey to Hadoop at Coca-Cola East Japan
October 27, 2016
Information Systems, Enterprise Architect
& Innovation project manager
Damien Contreras
- 2.
In This Session
• About Coca-Cola East Japan
• Hadoop Journey at CCEJ
• Hadoop Projects
• Hadoop for the manufacturing industry
• Hadoop for CCEJ: What’s Next
- 3.
• Coca-Cola East Japan was established on Jul. 1, 2013
through the merger of four bottlers.
• On Apr. 1, 2015, it underwent further business integration with
Sendai Coca-Cola Bottling Co., Ltd.
• Announced MOU with Coca-Cola West on April 26, 2016 to
proceed with discussions/review of business integration
opportunities
• Japan's largest Coca-Cola Bottler, with an extensive
local network, selling the most popular beverage
brands in Japan
Data as of December 2015
About Coca-Cola East Japan
- 5.
CCEJ Data Landscape
DATA IN SILOS
(Datamart, ERP, DWH, Staging, Mainframe,…)
P2P INTERFACES
(No ESB, Multiple ETL & Interface Servers)
NO GOVERNANCE
(Multiple Data formats for same business
context, No Meta Data Mgt.)
BATCH ORIENTED
(File, Scheduler, …)
- 8.
Hadoop Journey: Production (March 2016)
Layers: Data source → Integration → Processing → Analytics System → Data Restitution
• Data sources: flat files, web services
• Integration: NiFi
• Processing: HDFS, MR, Yarn, Tez, Hive, Spark
• Analytics: KNIME, Zeppelin, Python Notebook
• Restitution: BW on Hana
• Platform: Centos, Active Directory, Ambari, Ranger
Cluster:
• 8 nodes (Azure D/DS13)
• 3TB of data
• 64 cores
• 448GB RAM
• Team: 2 people
- 9.
Hadoop Eco-system at CCEJ
• 13 nodes
• 20TB
• 104 cores
• 728GB RAM
• 1000+ tables
• 3 production systems
Layers: Data source → Integration → Processing → Analytics System → Data Restitution
Use cases: (1) Analytics (past data, forecast data), (2) Data Hub (aggregated data, visualization), (3) Master Data (centralize, lineage, governance)
• Data sources: SAP ECC, flat files, web services
• Integration: NiFi, Boomi
• Processing: HDFS, MR, Yarn, Tez, Hive, Spark, Presto, Drill, MySQL
• Analytics: KNIME, Zeppelin, AirPal, Python Notebook, Sparkling Water, Tensorflow
• Restitution: BW on Hana, HTML Report
• Platform: Centos, Active Directory, Ambari, Ranger
- 10.
Timeline (May 2015 – Oct 2016)
• Hadoop / NiFi platform: Platform POC → implementation
• VM Analytics POC → Forecast implementation
• POC VM Placement
• Flow implementation
• BW Report integration
• SAP integration (1) & MDM (3)
• Write-Off report (2)
- 12.
Vending Replenishment: The Business Case
• HIGH NBR. OF MACHINES: 550,000 VMs, on/offline
• NBR. OF SKUs PER VM: 25 SKUs, hot & cold
• EXTERNAL FACTORS: weather, city data, geo-location, events
• VENDING ROUTES: visit list per truck, logistics dependence
How to:
• Reduce the number of visits
• Optimize truck stock
• Avoid out-of-stocks
- 13.
Vending Replenishment Forecast: The Project
The Challenge:
• Deployment in 3 months
• 1½ hours to generate the forecast
• +20% accuracy versus the previous version
• 120 steps in the program
Daily flow: online/offline VM data → forecast generation → arbitration (yes/no) → visit plan → picking list
Hadoop Has Delivered:
• Feeds 5GB+ of new data every day
• Processes a high volume of data in memory (300GB+)
• Integrates data from different sources
• Generates more sophisticated forecasts than the legacy systems (14 million items)
- 15.
Staging: The Case of the “Write-Off” Report
Flow (Azure): 7 source systems → Drill generates SQL query → JSON → web server → HTML interface (verify & check, combine with master data into a combined report)
Pipeline: Transformation (conversion) → Enrichment → Aggregation → Comparison → Analytics
Challenges:
• Data set harmonization (Sales, Billing, Inventory)
• Data volume from source systems
• Complex computation logic
• Unclear functional requirements
Objectives:
• Aggregate a large number of datasets: 40+ flows, 4GB of data every day
• A single view of the data, anywhere, for Finance, Supply Chain & Commercial
• Dynamic transactions vs. static Excel
• Reduce manual work to zero
- 16.
MDM: Centralization and Dispatch
Flow: (1) MDM creation → (2) MDM registration (lineage) → (3) consistency check (rule engine) → (4) replicate data to external systems, event-driven (MDM repository → replication engine)
Challenges:
• Rule engine definition and
implementation
• MDM on Hadoop & ESB
integration
• MDM & SAP Synchronization
Objectives:
• Single MDM repository
• Centralized bridge tables &
Mapping table
• Standardization of MDM across
data landscape
• Targeted distribution / replication
of MDM to external systems
Realization:
• MySQL and Hadoop synchronization
300+ tables
• Replication engine with ESB
• MDM-Tool: Pilot with Customer
Master
• Full go-live: April 2017
- 17.
Use Case: SAP Integration / Sales Interface Report
Objectives:
• Leverage the most granular data already in Hadoop
• Leverage the processing power of Hadoop
Flow (Azure): vending sales data and legacy-format data from Company 1/2/3 (×9, ×4, ×7, ×9 flows) + MD & bridge tables → combine (bridge table & master) → calculate → 9 output tables in CCEJ format
Challenges:
• Many data formats requiring complex data transformations
• Wide variety of data sources &
technologies to transfer data
• Data mapping between systems
Realization:
• Data structure in Hadoop
• Logic for one type of sales
channel implemented
• Full go-live: April 2017
- 18.
Hadoop: What’s Next
Increase data velocity & create a true data lake
Improve data collection, quality, profiling & metadata, and propose a catalog of curated data to end users
Toward a Data Driven Decision Process
Develop Support & Operational Excellence
- 19.
I thank CCEJ management, who had the courage to believe in an Agile approach.
Thanks to my team member and comrade Vinay Mahadev for all the long hours we've put in together to make this project a reality.
- 22.
Integration Landscape Overview
Acquisition → Transformation → Restitution
• Acquisition: NiFi (Prod) pulls from SAP ECC (IDOCS, via Boomi), Oracle (JDBC), MySQL (JDBC), flat files (FTP), and other systems into Hadoop Prod
• Transformation: Hive
• Restitution: BW on Hana (JDBC), Drill over HTTP to an HTML interface for power users
Naming & partitioning conventions:
• Landing files: My_file_20161024.csv, My_file_20161025.csv
• External text tables: t_my_table_txt_p, partitioned by date (dt=20161024, dt=20161025)
• Bridge tables: t_my_bridge_table_txt_p
• One database per flow: Myflow-data
• Report tables stored as ORC: t_my_report_orc_p
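The partitioning convention above (external text tables suffixed _txt_p, daily partitions dt=YYYYMMDD) can be scripted; a minimal sketch in Python, where the base HDFS path is a hypothetical example and only the table name and partition format come from the slide:

```python
from datetime import date

def partition_ddl(table: str, day: date,
                  base: str = "/data/myflow") -> str:
    """Build the Hive DDL that registers one daily partition,
    following the dt=YYYYMMDD convention (base path is hypothetical)."""
    dt = day.strftime("%Y%m%d")
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (dt='{dt}') LOCATION '{base}/dt={dt}'")

ddl = partition_ddl("t_my_table_txt_p", date(2016, 10, 24))
print(ddl)
```

Generating the DDL per day keeps partition registration in step with the daily landing files.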
- 23.
Guidelines around NiFi Flows
• Source systems trigger NiFi listeners (web call); extraction over JDBC
• Separate Dev and Prod flows on each side (source systems and Azure)
• One processing group per flow, split by data type: Master Data / Transaction Data
• Encryption per flow
- 24.
Guidelines around NiFi Flows: Error Handling
Error handling per flow (Master Data / Transaction Data):
• Processor → on success: send data; on error: write to error log
• Every 5 mins: read from error log → re-process (retry) → update error log
Editor's Notes
- > Coca-Cola East Japan is responsible for producing and distributing all the products you love, from coffee to carbonated drinks like Fanta or Sprite, through vending machines
> 5 different bottlers, and soon 6
- Data in silos:
3 layers of technologies: mainframe and its ecosystem / highly customized SAP instances per bottler / single instance of SAP for CCEJ
Redundancy of capabilities at each level
Duplicated data, no single source of truth
Replication going both ways
Point to point :
2 endpoints, one or several interfaces
Multiple ETL tools / ESB tools, servers that stage data
No governance:
Data structure based on system requirements and vendors' convenience, e.g. two different vendors working on the same project: two IDs for the same object
Requirements and solutions developed by the vendors; knowledge of the meaning of the data kept on the vendor side
Master data managed in each system: no completeness of the master data, no single source of truth
Batch oriented:
Data transfer through flat files / fixed width files orchestrated by schedulers (once a day)
Little to no event driven flows
Multiple intermediate systems that receive the data / load then send to another system
- Started humbly:
Focused on analytics around vending machines & data exploration
HDP on Azure with CentOS VMs (a first in Japan): HDInsight came without Spark, which was difficult to install
Data manually uploaded to the cluster through FTP
-
Focused on analytics around vending machines & data exploration
Teradata : datamart
NiFi to integrate between our on-prem environment and the cloud environment, leveraging the site-to-site connection and using certificates
- First release and production use of Hadoop
Spark program running every night to generate forecast for all vending machines
Integration with NiFi on multiple systems to retrieve transactional data and master data
First attempt to integrate with BW using JDBC (not Vora as BW was not in the same data center and we had no requirements to push back data from BW to Hadoop)
NiFi :
ExecuteSQL modified to leverage setFetchSize to stream data (10,000 records)
Re-encoding to UTF8 in the processor
Fixed-width parser
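The fixed-width parsing and UTF-8 re-encoding steps above can be sketched as follows; the field layout and the Shift_JIS source encoding are hypothetical examples, not the real extracts' definitions:

```python
import csv
import io

# Hypothetical field layout: (name, width) pairs.
LAYOUT = [("vm_id", 8), ("sku", 6), ("qty", 4)]

def parse_fixed_width(line: str, layout=LAYOUT):
    """Slice one fixed-width record into trimmed fields."""
    fields, pos = [], 0
    for _name, width in layout:
        fields.append(line[pos:pos + width].strip())
        pos += width
    return fields

def to_utf8_csv(records, source_encoding="shift_jis"):
    """Decode legacy-encoded fixed-width records and emit CSV text,
    mirroring the re-encoding done in the customized processor."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for raw in records:
        writer.writerow(parse_fixed_width(raw.decode(source_encoding)))
    return out.getvalue()

print(to_utf8_csv([b"VM000001SKU0010012", b"VM000002SKU0020003"]))
```

In the real flow the same logic lives inside a NiFi processor rather than a standalone script.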
Governance around data:
Partition data based on delta extraction
Naming convention to easily identify data source systems
Atlas and AD for access rights
NiFi : guidelines around processing groups, e.g: master data / transactional data
NiFi: Trigger oriented with webService, error handling: bubbling errors to the top, retry logic to ensure extraction
- Scheduling & orchestration can be implemented through NiFi
Most of our data is linked to a date, therefore we partition tables by date
Keeping the tabular structure of the tables, Hive was the de facto solution
Many data transformations can be implemented directly in HQL
Presto to integrate heterogeneous systems / good performance when querying data
Drill to easily format output as JSON and rapidly integrate with web interfaces
NiFi (250 flows) & Boomi (700 flows) integration to have full access to SAP ECC functional modules & IDOCS
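Drill's standard REST endpoint (POST /query.json) returns query results as JSON, which is one way a web interface can consume it directly. A minimal sketch of building the request body; the host is hypothetical, and the table name is the naming-convention example from the landscape slide:

```python
import json

# Hypothetical Drill host; /query.json is Drill's REST query endpoint.
DRILL_URL = "http://drill-host:8047/query.json"

def drill_payload(sql: str) -> str:
    """Build the JSON body Drill's REST API expects for a SQL query."""
    return json.dumps({"queryType": "SQL", "query": sql})

body = drill_payload("SELECT * FROM t_my_report_orc_p LIMIT 10")
print(body)
```

The HTML interface would POST this body to DRILL_URL and render the returned JSON rows.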
Program profiling: Fire
Defined the use cases:
Analytics
Data hub and aggregate & processing data
Central repository for Master Data
20TB in HDFS
Pulling 20GB per day; 250 daily extractions in prod in NiFi (ExecuteSQL), others in Dev for CokeOne
Daily extraction volume per source system (GB/day, ~20GB total):
DM: 0.249885409
HHT: 3.571921535
CMOS: 0.657761362
SC: 4.140049335
DME: 10.447961232
MM: 0.260761793
104 cores
9 datanodes
486GB of RAM
HIVE tables:
Prod: 402 (BI: 148, Default: 123, DemandForecast: 42, dm: 7, mdm: 26, sc: 40, vm: 16)
Dev: 604 (bi: 36, fusion: 18, mdm: 306, rtr: 7, vm: 237)
- NiFi in production in October
- CCEJ Intro
Hadoop: not only a data lake, but an integration platform and business enabler:
1 MDM
2 Write-off
3 ERP integration
Landscape
-
1:
High number of Vending machines
Online Vending machines / Offline vending machines
2:
Line-ups of 50 products
On average 25 products per vending machine
Hot and cold product seasonality
New product every 3 months
3:External factors
Vending inside or outside (in offices / stations / sports centers)
Accessibility: on top of a mountain / on top of a building / in an underground station
Wide variety of customers: regulars, events (e.g. baseball)
4: vending routes
Limited number of trucks & fillers: go replenish the 20 VMs that really need it
Which products to put on the truck
- Processing 14M items (VM × products) every day:
5GB: today’s sales information, stock level, settlement information, …
300GB as we look into the past
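The 14M figure follows directly from the fleet and line-up numbers on the business-case slide; a quick check:

```python
vms = 550_000        # vending machines (business-case slide)
skus_per_vm = 25     # average SKUs per machine (business-case slide)
items = vms * skus_per_vm
print(f"{items:,}")  # 13,750,000, roughly the 14 million items forecast daily
```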
- Producing that report took 6 to 8 hours and had errors
- Instead of duplicating data (silos) we can reuse the same data to generate those reports
IDOCS, JDBC, flat files, fixed width files
Complex transformations, as we combine legacy data with the new master data numbering and definitions to aggregate data during the rollout period of the new systems
- Whenever possible we implement extractions triggered by a web service call from the source system
We group extractions by flow together under logical “Process Group”
We try to have logical subdivisions as well to help readability and maintainability
(e.g: extractor for Master data / transactional data or monthly extractors / daily extractors)
Site-to-site communication is encrypted using a single root CA certificate shared across all keystores
-
We always implement failover that retries the processing
We always implement error management
In each processing group we implement an output port called “OnError” that is linked to the parent “OnError” output port; this ensures that notifications are propagated up to the root canvas.
The top “OnError” is a remote output port available on the NiFi Azure side and implements a common handler to send errors
Each processor is linked to that output port; each flow branch has its own set of parameters defined at its start:
Service
Process
Priority
Those parameters are used by the error handling process to send notifications to administrators
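A minimal in-memory sketch of the retry-plus-error-log pattern described above; the processor, the log, and the parameter handling are stand-ins for the NiFi components, not NiFi APIs:

```python
# A stand-in, in-memory error log; the real flow persists its log.
error_log = []

def send_data(record):
    """Hypothetical downstream call; raises on failure."""
    if record.get("fail"):
        raise ConnectionError("downstream unavailable")

def process(record, params):
    """Try to send; on error, log the record together with the branch's
    parameters (service / process / priority) used for notifications."""
    try:
        send_data(record)
        return True
    except ConnectionError as exc:
        error_log.append({"record": record, "error": str(exc), **params})
        return False

def reprocess():
    """Periodic retry pass (every 5 minutes in the real flow): drain the
    log and re-run process(); records that fail again are re-logged."""
    pending, error_log[:] = list(error_log), []
    for entry in pending:
        params = {k: entry[k] for k in ("service", "process", "priority")}
        process(entry["record"], params)

demo = {"id": 1, "fail": True}
process(demo, {"service": "sales", "process": "extract", "priority": "high"})
print(len(error_log))  # 1 (record logged for later retry)
```

Once the downstream recovers, the periodic reprocess() pass empties the log.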
- Key Custom Features:
Leverage cursors on tables to stream batches of data (10,000 rows at a time)
Comma-delimited CSV file output; by default the filename is <yyyy-MM-dd_hh:mm:ss.SSS>.csv with LF as line separator
Default UTF-8 file encoding
Option to save the extracted file to a folder location (integrated PutFile processor functionality)
A Boolean option for the user to keep the source file or remove it from the folder location
UPDATE SQL functionality
Boolean option to output the CSV in Windows format (CRLF line separator) or Unix format (LF)
Also a fixed-width-to-CSV processor
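The cursor-streaming behaviour can be sketched with a generic DB-API cursor; sqlite3 stands in for the real JDBC sources, and the line_sep parameter mirrors the CRLF/LF output option:

```python
import csv
import io
import sqlite3

def stream_to_csv(conn, sql, batch_size=10_000, line_sep="\n"):
    """Run a query and write rows to CSV in fetchmany() batches, so the
    full result set never sits in memory at once; pass CRLF as line_sep
    for the Windows-CSV variant."""
    cur = conn.cursor()
    cur.execute(sql)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator=line_sep)
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        writer.writerows(rows)
    return out.getvalue()

# Tiny in-memory demo standing in for a source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])
print(stream_to_csv(conn, "SELECT * FROM t ORDER BY id"))
```

The real processor writes to a file (with the timestamped filename above) instead of an in-memory buffer, but the batching logic is the same.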
- Dev environment accessible to anyone (after requesting an account)