Big Data Driven At Eway
- 1. (Big) Data Driven at Eway
Tu Pham - CTO @ Eway
Journey to Google Cloud
-- Ha Noi - 03/2019 --
- 2. About Me
  - CTO at Eway JSC
  - Google Developer Expert on Cloud Platform
  - Open source contributor, blogger, father
  - 8 years of experience in Big Data and Cloud Computing
- 18. Use Case - Reporting
- Business Analytics
- Operational Analytics
- Product Features
- System Monitoring
- 26. Step 1: GC Compute Engine Instances Collect Raw Data
  - Technology: Cloud Load Balancing, Compute Engine
  - Why Cloud Load Balancing:
    - TCP/UDP Load Balancing
    - Seamless Autoscaling
    - Scalable
  - Why Compute Engine:
    - High Performance
    - Scalable
    - Low Cost
    - Fast Networking
    - Custom Machine Types
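Behind the load balancer, the collector VMs would typically run in a managed instance group whose autoscaler adds or removes instances to keep average utilization near a target. The sketch below is a simplified model of that target-utilization idea, assuming a CPU-based policy and hypothetical `min_size`/`max_size` bounds; Google's real autoscaler also applies stabilization periods and other signals, so this is an illustration, not its exact algorithm.

```python
import math


def recommended_size(current_size: int, avg_utilization: float,
                     target_utilization: float,
                     min_size: int = 2, max_size: int = 20) -> int:
    """Simplified target-utilization autoscaling (illustration only).

    Returns the instance count at which the current total load would run at
    about `target_utilization` per instance, clamped to [min_size, max_size].
    The default bounds are hypothetical.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Total load expressed as a number of fully busy instances.
    total_load = current_size * avg_utilization
    # Round away float noise (e.g. 6.000000000000001) before taking the ceiling.
    needed = math.ceil(round(total_load / target_utilization, 9))
    return max(min_size, min(max_size, needed))


# 4 instances at 75% CPU with a 50% target -> grow to 6 instances.
print(recommended_size(4, 0.75, 0.5))  # prints 6
```

Scaling on a utilization target rather than a fixed instance count is what lets the collector tier absorb traffic spikes without paying for peak capacity all day.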
- 29. Step 2: GC Compute Engine Instances Convert Raw Data To Apache Parquet Files
  - Technology: Compute Engine, Parquet file format
  - Why Parquet:
    - Self-describing, columnar storage format
    - Language-independent
    - High query performance
    - Spark SQL is much faster with Parquet, since it reads only the columns a query needs
    - High compression (up to 70%), so less disk I/O
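The core idea behind Parquet's columnar layout can be shown without the library: values of the same column are stored together, so a query reading one column skips the others, and repeated values compress well (Parquet uses dictionary and run-length encoding, among others). The sketch below is a conceptual illustration in plain Python with hypothetical field names (`event`, `city`, `ms`); it does not produce Parquet files, which in practice would be written with a library such as pyarrow or Spark.

```python
from typing import Any, Dict, List, Tuple


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one value list per column."""
    # Collect column names in first-seen order.
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name in row:
            columns.setdefault(name, [])
    # Fill every column for every row; missing fields become None (nullable).
    for row in rows:
        for name, values in columns.items():
            values.append(row.get(name))
    return columns


def dictionary_encode(values: List[Any]) -> Tuple[List[Any], List[int]]:
    """Store each distinct value once and replace values with small int ids."""
    dictionary: List[Any] = []
    index: Dict[Any, int] = {}
    ids: List[int] = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        ids.append(index[value])
    return dictionary, ids


rows = [
    {"event": "view", "city": "Ha Noi", "ms": 120},
    {"event": "click", "city": "Ha Noi", "ms": 80},
    {"event": "view", "city": "Da Nang"},
]
columns = to_columns(rows)
# A query on "event" only touches this one list, not whole rows.
print(dictionary_encode(columns["event"]))  # prints (['view', 'click'], [0, 1, 0])
```

Low-cardinality columns such as event type or city turn into short integer sequences, which is where most of the disk I/O savings come from.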
- 32. Step 3: GC Compute Engine Upload Parquet File To GC Cloud Storage
  - Technology: Compute Engine, Parquet file format, Cloud Storage
  - Why Cloud Storage:
    - Four storage classes
    - Easy to integrate
    - Object Lifecycle Management
    - Fast Networking
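Object Lifecycle Management is what makes the storage classes pay off for raw Parquet files: objects can move to a cheaper class as they age and be deleted after a retention period. The sketch below builds such a policy in the JSON format that `gsutil lifecycle set` accepts. The 30/90/365-day thresholds are assumptions, and it uses the current class names (STANDARD, NEARLINE, COLDLINE); buckets created in 2019 may report REGIONAL or MULTI_REGIONAL instead of STANDARD.

```python
import json
from typing import Any, Dict, List


def lifecycle_config(nearline_after_days: int = 30,
                     coldline_after_days: int = 90,
                     delete_after_days: int = 365) -> Dict[str, List[Dict[str, Any]]]:
    """Build a Cloud Storage lifecycle policy (JSON format used by
    `gsutil lifecycle set`). The day thresholds are hypothetical defaults."""
    if not 0 < nearline_after_days < coldline_after_days < delete_after_days:
        raise ValueError("thresholds must be positive and increasing")
    return {
        "rule": [
            {   # "age" counts days since object creation: move cooling data.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": nearline_after_days,
                              "matchesStorageClass": ["STANDARD"]},
            },
            {   # Older data is read rarely, e.g. only for yearly reports.
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {"age": coldline_after_days,
                              "matchesStorageClass": ["NEARLINE"]},
            },
            {   # Drop files once they are past the retention window.
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
        ]
    }


print(json.dumps(lifecycle_config(), indent=2))
```

Saved as `lifecycle.json`, the policy could be applied with `gsutil lifecycle set lifecycle.json gs://BUCKET_NAME`, where the bucket name is a placeholder.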
- 38. Step 4a: Explore Dataset Using GC Datalab
  - Technology: Cloud Datalab
  - Why Datalab:
    - Integrated with Cloud BigQuery, Cloud Machine Learning Engine, Cloud Storage, and Stackdriver Monitoring
    - IPython support and notebook format
    - Interactive data visualization
    - Multi-language support: Python, SQL, and JavaScript (for BigQuery user-defined functions)
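A typical first notebook cell when exploring a new dataset is a quick profile: how many values each field has, how many are distinct, and which are most frequent. The sketch below does this in plain Python over a small in-memory sample with hypothetical field names (`event`, `device`); in Datalab the same questions would more likely be answered with a BigQuery SQL query over the full table.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def profile(records: Iterable[Dict[str, Any]], top_n: int = 3) -> Dict[str, Dict[str, Any]]:
    """Per-field summary: non-null count, distinct values, most common values.

    Fields that are None in every record do not appear in the result.
    """
    counters: Dict[str, Counter] = {}
    for record in records:
        for field, value in record.items():
            if value is None:
                continue  # treat None as a missing value
            counters.setdefault(field, Counter())[value] += 1
    return {
        field: {
            "non_null": sum(counter.values()),
            "distinct": len(counter),
            "top": counter.most_common(top_n),
        }
        for field, counter in counters.items()
    }


sample = [
    {"event": "view", "device": "mobile"},
    {"event": "click", "device": "mobile"},
    {"event": "view", "device": "desktop"},
    {"event": "view", "device": None},
]
print(profile(sample)["event"])
# prints {'non_null': 4, 'distinct': 2, 'top': [('view', 3), ('click', 1)]}
```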
- 39. Step 4b: Explore Dataset Using BI Tools
  - Technology: Grafana, Power BI
  - Why:
    - Support for more than 40 data sources (File, Database, Log Stream, Zabbix, Google Analytics, Google Calendar, AWS CloudWatch, Jira, ...)
    - Query, visualize, alert on, and understand your metrics
    - Create, explore, and share dashboards with your team
- 48. Be 1% better every day: tips
  - Create your system principles
  - Design the system architecture, data flow, data model, and data structures first
  - Separate realtime and batch flows
  - Separate data storage strategies by data type
  - Save on network, instance, and storage costs with a metric monitoring & alerting system
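One way to read "separate realtime and batch flows": every event lands in a durable batch path (for example the Parquet files on Cloud Storage), while only the events that dashboards or alerts need right away also go to a low-latency path. The sketch below assumes hypothetical event types (`error`, `payment`) and a hypothetical `Router` class, with in-memory lists standing in for real sinks such as a message queue and object storage.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, List

# Hypothetical event types that must reach dashboards/alerts within seconds.
REALTIME_TYPES: FrozenSet[str] = frozenset({"error", "payment"})


@dataclass
class Router:
    """Fan out events to a realtime path and a batch path."""
    realtime_types: FrozenSet[str] = REALTIME_TYPES
    realtime: List[Dict[str, Any]] = field(default_factory=list)  # stand-in for a stream/queue
    batch: List[Dict[str, Any]] = field(default_factory=list)     # stand-in for files on storage

    def route(self, event: Dict[str, Any]) -> None:
        # The batch path keeps everything, so reports can always be recomputed.
        self.batch.append(event)
        # Only time-sensitive events also pay for the realtime path.
        if event.get("type") in self.realtime_types:
            self.realtime.append(event)


router = Router()
for event in [{"type": "view"}, {"type": "error"}, {"type": "payment"}, {"type": "click"}]:
    router.route(event)
print(len(router.batch), [e["type"] for e in router.realtime])  # prints 4 ['error', 'payment']
```

Keeping the realtime path small also serves the cost tip: only the events that need low latency go through the more expensive always-on pipeline.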
- 49. We Are Hiring
● Product Owner
● Backend Java Developer
● Full Stack PHP Developer
● Full Stack Python Developer
● DevOps Engineer
- 50. Thank You - Q&A
● Eway: https://eway.vn
● My Contact: tupp@eway.vn