(Big) Data Driven at Eway
Tu Pham - CTO @ Eway
Journey to Google Cloud
-- Ha Noi - 03/2019 --
About Me
- CTO at Eway JSC
- Google Developer Expert on Cloud Platform
- Open source contributor, blogger, father
- 8 years of experience in Big Data and Cloud Computing
$44M transaction value in 2018
> 5M transactions
When You Have (Big) Data
How Do We Use This Data?
Use Cases
- Reporting
- Business Analytics
- Operational Analytics
- Product Features
- System Monitoring
Reporting
- Reporting to:
  - Partners
  - Advertisers
  - Publishers
Business Analytics
- Analyzing:
  - Growth
  - User behavior
  - Sign-up funnels
  - Sign-up referrals
  - ...
Operational Analytics
- Analyzing:
  - Root cause analysis
  - Latency analysis
  - Error analysis
- Better:
  - Threshold alerts
  - Security alerts
  - Capacity planning (servers, bandwidth)
Product Features
- Top Products
- Adflex publisher challenge
- Sign-up referrals
- A/B Testing
Sample: End-To-End Flow For Mining User Behavior
How do we collect this data?
Step 1: GC Compute Engine Instances Collect Raw Data
- Technology: Cloud Load Balancing, Compute Engine
- Why Cloud Load Balancing:
  - TCP/UDP Load Balancing
  - Seamless Autoscaling
  - Scalable
- Why Compute Engine:
  - High Performance
  - Scalable
  - Low Cost
  - Fast Networking
  - Custom Machine Types
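The deck does not show the collector itself, so as a rough illustration, a minimal Python sketch of a raw-event collector running on each Compute Engine instance behind the load balancer might look like this. The HTTP/JSON transport, port, and file layout are assumptions, not Eway's actual setup.

```python
# Minimal sketch: accept events as HTTP POSTs and append them to local
# newline-delimited JSON files, one file per hour. All names here are
# placeholders; the real collector, event schema, and ports are not in the deck.
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

def log_path() -> str:
    # Hypothetical local layout; a later batch job picks these files up.
    return time.strftime("raw-events-%Y%m%d-%H.jsonl")

class CollectorHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = json.loads(self.rfile.read(length))  # expect one JSON object per POST
        except ValueError:
            self.send_response(400)
            self.end_headers()
            return
        record = {"received_at": time.time(), "event": event}  # add a server-side timestamp
        with open(log_path(), "a") as f:
            f.write(json.dumps(record) + "\n")
        self.send_response(204)  # accepted, no response body needed
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), CollectorHandler).serve_forever()
```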
How do we process this data?
Step 2: GC Compute Engine Instances Convert Raw Data To Apache Parquet Files
- Technology: Compute Engine, Parquet file format
- Why Parquet:
  - Self-describing, columnar storage format
  - Language-independent
  - High query performance
  - Spark SQL is much faster with Parquet
  - High compression (up to 70%), so less disk I/O
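Since the deck calls out Spark SQL, a minimal PySpark sketch of this conversion could look like the following. The paths, the received_at field (carried over from the collector sketch above), and the partition column are assumptions for illustration.

```python
# Minimal PySpark sketch: read the collectors' newline-delimited JSON and
# rewrite it as date-partitioned Parquet. Paths and column names are
# illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("raw-to-parquet").getOrCreate()

raw = spark.read.json("/data/raw/raw-events-*.jsonl")  # schema is inferred

# Derive a partition column from the server-side timestamp (assumed field).
events = raw.withColumn(
    "event_date",
    F.to_date(F.from_unixtime(F.col("received_at").cast("long"))),
)

# Columnar, compressed output; partitioning by date keeps later scans cheap.
(events.write
    .mode("append")
    .partitionBy("event_date")
    .parquet("/data/parquet/events"))
```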
Step 3: GC Compute Engine Uploads Parquet Files To GC Cloud Storage
- Technology: Compute Engine, Parquet file format, Cloud Storage
- Why Cloud Storage:
  - Four storage classes
  - Easy to integrate
  - Object Lifecycle Management
  - Fast Networking
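A minimal upload sketch with the official google-cloud-storage Python client; the bucket name and object prefix below are placeholders, not Eway's actual layout.

```python
# Minimal sketch: push the locally written Parquet files to a Cloud Storage bucket.
from pathlib import Path
from google.cloud import storage

client = storage.Client()                    # uses the VM's service account credentials
bucket = client.bucket("example-analytics")  # hypothetical bucket name

local_root = Path("/data/parquet/events")
for path in local_root.rglob("*.parquet"):
    blob = bucket.blob(f"events/{path.relative_to(local_root)}")
    blob.upload_from_filename(str(path))     # one object per Parquet file
```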
How do we visualize this data?
Step 4a: Explore Dataset Using GC Datalab
- Technology: Cloud Datalab
- Why Datalab:
  - Integrated with Cloud BigQuery, Cloud Machine Learning Engine, Cloud Storage, and Stackdriver Monitoring
  - IPython Support & Notebook Format
  - Interactive Data Visualization
  - Multi-Language Support: Python, SQL, and JavaScript (for BigQuery user-defined functions)
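Inside a Datalab (Jupyter) notebook, a query against the dataset might look like the sketch below; the project, dataset, table, and column names are made up for illustration.

```python
# Minimal notebook sketch: run a BigQuery query and plot the result.
# The table and columns referenced here are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT event_date, COUNT(*) AS events
    FROM `my-project.analytics.user_events`
    GROUP BY event_date
    ORDER BY event_date
"""
df = client.query(sql).to_dataframe()   # results as a pandas DataFrame
df.plot(x="event_date", y="events")     # quick chart directly in the notebook
```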
Step 4b: Explore Dataset Using BI Tools
- Technology: Grafana, PowerBI
- Why:
  - Supports >40 data sources (File, Database, Log Stream, Zabbix, Google Analytics, Google Calendar, AWS CloudWatch, Jira, ...)
  - Query, visualize, alert on, and understand your metrics
  - Create, explore, and share dashboards with your team
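Dashboards are usually built in the Grafana UI, but they can also be scripted through Grafana's HTTP API. The sketch below creates an empty dashboard as a starting point; the URL, API key, and dashboard title are placeholders.

```python
# Sketch: create a new (empty) Grafana dashboard via POST /api/dashboards/db.
# Panels would normally be added afterwards in the UI or in the JSON model.
import requests

GRAFANA_URL = "https://grafana.example.com"   # placeholder
API_KEY = "REPLACE_WITH_API_KEY"              # placeholder token

payload = {
    "dashboard": {
        "id": None,                 # None => create a new dashboard
        "uid": None,
        "title": "User Events Overview",
        "panels": [],               # start empty
    },
    "overwrite": False,
}

resp = requests.post(
    f"{GRAFANA_URL}/api/dashboards/db",
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["url"])           # path of the newly created dashboard
```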
Become Geek
Where Are The AI / ML?
Create Your Principles
Principles:
- KISS (Keep It Simple, Stupid)
- DRY (Don't Repeat Yourself)
- Single Responsibility
- Low Cost
- Scalable
Be 1% Better Every Day - Tips
- Create your system principles
- Design the system architecture, data flow, data model, and data structures first
- Separate realtime and batch flows
- Separate data storage strategies by data type
- Save on network, instance, and storage costs with metric monitoring and an alert system
We Are Hiring
● Product Owner
● Backend Java Developer
● Full Stack PHP Developer
● Full Stack Python Developer
● DevOps Engineer
Thank You - Q&A
● Eway: https://eway.vn
● My Contact: tupp@eway.vn
