The volume of available data is growing by the second (to an estimated 175 zettabytes by 2025), and it is becoming increasingly granular. With that change, every organization is moving toward building a data-driven culture. We at Northwestern Mutual share a similar story of driving toward data-driven decisions to improve both efficiency and effectiveness. Analysis of our legacy systems revealed bottlenecks, excesses, and duplication. Given the ever-growing need to analyze more data, our BI team decided to move to a more modern, scalable, cost-effective data platform. As a financial company, data security is as important to us as data ingestion: in addition to fast ingestion and compute, we needed a solution that supports column-level encryption and role-based access to our data lake for different teams.
In this talk we describe our journey to move hundreds of ELT jobs from our MSBI stack to Databricks and build a data lake (using the Lakehouse architecture), and how we reduced our daily data load time from 7 hours to 2 hours while gaining the capacity to ingest more data. We share our experience, challenges, learnings, architecture, and the design patterns we used in this large migration effort, along with the tools and frameworks our engineers built to ease the learning curve for engineers new to Apache Spark. You will leave this session with a better understanding of what migrating to Apache Spark/Databricks would mean for you and your organization.
Northwestern Mutual Journey – Transform BI Space to Cloud
1. Northwestern Mutual Journey – Transform BI Space to Cloud
Madhu Kotian – Vice President of Engineering
Keyuri Shah – Lead Engineer
3. For 160+ years, Northwestern Mutual has been helping families and businesses achieve financial security.
• Revenue: $31.1 billion
• #102 on FORTUNE 500
• 4.6+ million clients
• 10,500+ financial professionals
• 6,700+ employees
• Headquartered in Milwaukee, Wisconsin
Figures as of December 31, 2020.
5. Our Team – Insights (Book of Business)
• Build and manage the reporting platform
• Curate aggregated content to provide insights to our Field and Home Office users
• Generate canned reports and dashboards
• Enable our business partners to perform ad hoc analysis
6. Our World Before Migration
• Number of ETL jobs: 300
• Batch cycle time: 7 hours
• Time to market: 5–6 weeks
8. Key Architecture Pillars
▪ Performance
▪ Easy to maintain, use, and learn (config driven)
▪ Scale compute and storage as needed
▪ Ability to manage complicated dependencies between jobs
• Metadata governance
• Databricks Delta
• Support for ACID operations
• Data lake
• ELT/scheduling
• Column-level encryption
• Effective cluster management
• Role-based access to databases/tables/views
• Security
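The security pillars above (role-based access and column-level protection) can be sketched with Databricks SQL object privileges and a dynamic view. This is a minimal illustration, not the framework from the talk; the schema, table, column, and group names are assumptions.

```sql
-- Grant a reporting group read access to a curated schema
-- (schema, table, and group names are illustrative)
GRANT USAGE ON SCHEMA insights TO `reporting-team`;
GRANT SELECT ON TABLE insights.book_of_business TO `reporting-team`;

-- Column-level protection via a dynamic view: only members of the
-- pii-readers group see the raw column value, everyone else sees a mask
CREATE OR REPLACE VIEW insights.clients_masked AS
SELECT
  client_id,
  CASE WHEN is_member('pii-readers') THEN ssn ELSE '***MASKED***' END AS ssn
FROM insights.clients;
```

Dynamic views like this complement column-level encryption: even users with table access only see decrypted or unmasked values when group membership allows it.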
9. Our World After Migration
• Config files: 500
• Batch cycle time: 2 hours
• Time to market: 1–2 weeks
10. Migration Approach
• Team building
• Start with a small core group
• Learn – train – transform – repeat for team building
• Ease the learning curve by building abstraction layers
• Code migration
• Not lift and shift – redo all code (no accelerators)
• Build small shippable pieces
• Keep it simple
• Do not change the end-user experience
• Production support
• Run both environments in parallel
• Push continuously to the new environment for faster feedback
11. Challenges
• Bringing Business/Product/Security on board
• Walk through current pain points
• Explain long-term benefits
• Think security first
• Balance business priorities vs. innovation
• Show and prove progress
• Take an incremental approach: learn – build – test – repeat
• Put small chunks into production
• Keep communication open with all interested parties
12. Frameworks Built
• ELT Framework
▪ Config driven – JSON file
▪ CI/CD with approvals
▪ Column-level encryption
▪ Exec commands
▪ Talk: "Modern Config Driven ELT Framework for Building a Data Lake", scheduled 5/25, 3:50 PM to 4:20 PM
• Metadata Framework
• Config driven – YML file
• CI/CD with approvals
• Schema management
• Access management
• Talk: "Automated Metadata Management in Data Lake – A CI/CD Driven Approach", scheduled 5/26, 5:00 PM to 5:30 PM
• Airflow Framework
• Config driven – YML file
• CI/CD with approvals
• Automatic DAG management
• Dependency management
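The config-driven, dependency-managed approach the frameworks share can be sketched in a few lines of Python. This is an illustrative assumption of what such a config might look like (the JSON schema, job names, and table names are invented, not from the talk); a real framework would generate Airflow DAGs from the config rather than just a flat run order.

```python
# Minimal sketch of a config-driven job planner: parse a JSON job config
# and derive a dependency-safe execution order with a topological sort.
import json
from graphlib import TopologicalSorter

# Hypothetical config; field names and jobs are illustrative only.
config_text = """
{
  "jobs": [
    {"name": "load_clients",  "target": "curated.clients",  "depends_on": []},
    {"name": "load_policies", "target": "curated.policies", "depends_on": []},
    {"name": "book_of_business", "target": "insights.bob",
     "depends_on": ["load_clients", "load_policies"]}
  ]
}
"""

def plan_run_order(config: str) -> list[str]:
    """Return a job execution order that respects declared dependencies."""
    jobs = json.loads(config)["jobs"]
    graph = {job["name"]: set(job["depends_on"]) for job in jobs}
    # TopologicalSorter raises CycleError if the config declares a cycle.
    return list(TopologicalSorter(graph).static_order())

order = plan_run_order(config_text)
print(order)  # book_of_business comes after both load jobs
```

Keeping the pipeline definition in config files (JSON/YML) rather than code is what let non-Spark engineers add jobs without learning the framework internals.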
13. Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
Madhu Kotian: https://www.linkedin.com/in/imkotian
Keyuri Shah: https://www.linkedin.com/in/keyuri-shah