Data Engineering the Startup Way - AWS Startup Day Chicago 2018
- 1. Copyright © 2018 Uptake | 06-Sep-18 | AWS Startup Day
Data Engineering the Startup Way
| Dan Collins | 2018
- 4.
• CEO and Co-founder Brad Keywell
• President Ganesh Bell
• ~ 4 years old
• 100+ Customers
• Two-time CNBC Disruptor 50
honoree
• World Economic Forum Technology
Pioneer
• One of Chicago’s best workplaces
for 2018 by Fortune
• Ranked in the top 25 of the
2017 “Forbes Cloud 100”
- 7.
Solving Hard Problems
Industrial AI and IoT
• Predictive Analytics
• Anomaly Detection
• Label Correction
• Applications and AI UX
- 8.
Solving Hard Problems
Industrial Data
• Telematics
• SCADA Systems
• PLC / Sensor Data
• Contextual Data
• Resource Planning
• Customer Relationships
• Content Management
- 9.
Solving Hard Problems
Industrial Data is… Dirty
• Out of Order
• High Volatility
• System-wide Snapshots with no deltas
• Pre-determined Aggregation
• Duplicated, Partitioned, Compressed
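Two of the problems above, duplicated or out-of-order readings and snapshots with no deltas, can be sketched in a few lines of Python. This is a minimal illustration, not Uptake's implementation: the `Reading` fields, the use of a source sequence number, and the dictionary-shaped snapshots are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Reading:
    """One sensor reading; field names are illustrative, not from the talk."""
    asset_id: str
    seq: int       # sequence number assumed to be assigned by the source system
    value: float


def dedupe_and_order(readings: Iterable[Reading]) -> List[Reading]:
    """Drop repeated (asset_id, seq) pairs, then sort by asset and sequence."""
    seen = set()
    unique = []
    for r in readings:
        key = (r.asset_id, r.seq)
        if key not in seen:  # keep the first copy, ignore later duplicates
            seen.add(key)
            unique.append(r)
    return sorted(unique, key=lambda r: (r.asset_id, r.seq))


def snapshot_delta(previous: Dict[str, float], current: Dict[str, float]) -> Dict[str, float]:
    """Return the tags that are new or changed since the previous snapshot.

    Tags that disappeared from the current snapshot are not reported.
    """
    return {tag: v for tag, v in current.items() if previous.get(tag) != v}


raw = [Reading("pump-1", 2, 0.7), Reading("pump-1", 1, 0.5), Reading("pump-1", 2, 0.7)]
print([r.seq for r in dedupe_and_order(raw)])
print(snapshot_delta({"rpm": 900.0, "temp": 71.0}, {"rpm": 900.0, "temp": 74.5}))
```

Holding every key in memory only works for small batches; at the write rates discussed later in the talk, the same idea would need windowing or a keyed state store.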
- 10.
Solving Hard Problems
Industrial Data has a past…
• Very old systems (some > 30 years old)
• Susceptible to policy changes over time (formatting, time, etc.)
• Most integrations follow a standard, but not the same one
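One way to cope with formatting and time policies that drift over decades is to try a list of known formats and normalize everything to UTC. The sketch below assumes three hypothetical timestamp formats and assumes that timestamps without an offset are already in UTC; a real integration would need to confirm both per source.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical formats a long-lived source might have used over the years.
KNOWN_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",  # ISO 8601 with offset, e.g. 2018-09-06T14:30:00-0500
    "%Y-%m-%d %H:%M:%S",    # no offset; assumed to be UTC in this sketch
    "%m/%d/%Y %H:%M",       # US-style date without seconds; assumed UTC
)


def to_utc(raw: str) -> Optional[datetime]:
    """Parse a timestamp in any known format and return it in UTC, or None."""
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue  # not this format, try the next one
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    return None  # unknown format; the caller decides whether to quarantine


print(to_utc("2018-09-06T14:30:00-0500"))
print(to_utc("09/06/2018 14:30"))
print(to_utc("yesterday"))
```

Returning `None` instead of raising keeps the decision about unparseable records with the caller, which fits a pipeline that quarantines bad input rather than dropping it.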
- 13.
Solving Hard Problems
• ~150,000 writes/second
• Across tenants
• Across integrations
[Figure: three event streams, each numbered 1 to 10 in order, plotted against processing time T1 to T10]
- 14.
Solving Hard Problems
• How it really works
• Remember, industrial data is dirty
• We need to validate, hydrate,
quarantine, and persist updates
as they come in
• We need to be consistent or our
data science models lose their
efficacy
• At ~150,000 writes/second
[Figure: event streams as they actually arrive, with gaps, batched values, repeats, and out-of-order sequence numbers, plotted against processing time T1 to T10]
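The validate, hydrate, quarantine, and persist steps named on this slide can be sketched as a single pass over a batch. Everything concrete here is an assumption for illustration: the record shape, the `ASSET_CONTEXT` lookup, the validation rules, and the use of plain lists in place of a real store and quarantine area.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]

# Hypothetical contextual data used to hydrate raw readings.
ASSET_CONTEXT: Dict[str, Dict[str, str]] = {
    "pump-1": {"tenant": "acme", "site": "chicago"},
}


def validate(record: Record) -> Optional[str]:
    """Return a reason string if the record is unusable, or None if it is fine."""
    if record.get("asset_id") not in ASSET_CONTEXT:
        return "unknown asset"
    if not isinstance(record.get("value"), (int, float)):
        return "non-numeric value"
    return None


def hydrate(record: Record) -> Record:
    """Attach contextual fields without mutating the input record."""
    return {**record, **ASSET_CONTEXT[record["asset_id"]]}


def process(batch: List[Record], store: List[Record], quarantine: List[Record]) -> None:
    """Validate each record, quarantine the bad ones, hydrate and persist the rest."""
    for record in batch:
        reason = validate(record)
        if reason is not None:
            quarantine.append({**record, "reason": reason})  # keep it for later inspection
            continue
        store.append(hydrate(record))


store: List[Record] = []
quarantine: List[Record] = []
process(
    [
        {"asset_id": "pump-1", "value": 3.2},
        {"asset_id": "ghost", "value": 1},
        {"asset_id": "pump-1", "value": "n/a"},
    ],
    store,
    quarantine,
)
print(len(store), len(quarantine))
```

Quarantining with a recorded reason, rather than discarding, keeps bad input available for replay once the integration or the rule is fixed.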
- 16.
Solving Hard Problems
[Figure: a “Platform Instance” box repeated until the slide is covered in separate platforms. Callouts: “Oh my!” and “We did it!”]
- 17.
Solving Hard Problems
[Figure labels: Shared Platform, Configured Product, Bespoke Solution, Platform, More Feature set. Callout: “We did it!”]
- 19.
Facts of Life
Machine Learning: The
High-Interest Credit Card of
Tech Debt
- 20.
Facts of Life
What people talk about
The hard parts
- 21.
Facts of Life
Changing Anything,
Changes Everything
- 22.
Facts of Life
So, to recap:
• Take dirty data from old systems
• Scale it to > 150,000 writes/second
• Spin up data science models on top
and balance them really carefully
• What could go wrong?
xkcd.com/1838
- 24.
Continuous Evolution
1. Proof of Concept
2. Build it
3. Learn from it
4. Repeat
- 25.
Continuous Evolution – Proof of Concept
• Prototype: from “works on my machine” to “scales in the cloud”
• We create real-world working models written in R and Python
and sample data sets
• Focus on the problem, not the infrastructure, monitoring, etc.
• Use the “beefiest” boxes to find equilibrium
• AWS allows you to go all in as soon as you’re ready to start
• Quickly spin up test instances or scaffold an environment
- 26.
Continuous Evolution – Build It
• Build out for scale
• Account for real-world data sets on distributed systems
• Lean on managed services and IaaS as your foundation
• AWS managed services and elastic scaling can drastically
reduce the time it takes to get up and running
• You can be production ready very quickly
- 27.
Continuous Evolution – Build It
What people talk about
The hard parts
AWS kickstarts your data
engineering here
- 28.
Continuous Evolution – Learn from It
• Codify patterns and encourage repeatability
• From bespoke to baked in
• Review trade-offs
• Analyze compute, I/O, parallelism
• Partition the problem space
• The scientific method, AWS’ huge array of services, and some
luck let you put hindsight to work as you build
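As one illustration of partitioning the problem space, writes can be routed to a fixed number of partitions by a stable hash of the tenant and asset, so each partition can be processed in parallel while a given asset always lands in the same place. The key format and function names below are hypothetical, and the talk does not say which partitioning scheme Uptake uses.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition_for(tenant: str, asset_id: str, partitions: int) -> int:
    """Map a (tenant, asset) pair to a partition index in [0, partitions)."""
    key = f"{tenant}/{asset_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # sha256 is stable across processes, unlike Python's built-in hash() for str.
    return int.from_bytes(digest[:8], "big") % partitions


def group_by_partition(
    keys: Iterable[Tuple[str, str]], partitions: int
) -> Dict[int, List[Tuple[str, str]]]:
    """Group (tenant, asset) pairs by the partition they are routed to."""
    groups: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for tenant, asset_id in keys:
        groups[partition_for(tenant, asset_id, partitions)].append((tenant, asset_id))
    return dict(groups)


keys = [("acme", "pump-1"), ("acme", "pump-2"), ("globex", "turbine-9")]
print(group_by_partition(keys, partitions=8))
```

A fixed modulus reshuffles most keys when the partition count changes; consistent hashing is a common alternative when partitions are added or removed often.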
- 29.
Continuous Evolution – Repeat
“A program that is used and that as an
implementation of its specification reflects some
other reality, undergoes continual
change or becomes progressively less
useful. The change or decay process continues
until it is judged more cost effective to replace
the system with a recreated version.”
- Meir Lehman’s law of software evolution
- 31.
Data Engineering the Startup Way
Monolith
Microservices
Platform
• Features and efficiency fit better with each iteration
• Survival depends on flexibility and feedback
Data Science, Applications, Data Engineering
Platform
- 32.
Data Engineering the Startup Way
1. Focus on Value
2. Choose good abstractions
3. Act like an enterprise
4. Invest
5. Be Open
- 33.
Data Engineering the Startup Way – Focus on Value
• You have great ideas.
• Focus on where you have value, let others solve the less
interesting problems
• Use what’s available when it’s available; check often
• AWS and services like it can remove noise, letting you focus on
where you’re most innovative
- 34.
Data Engineering the Startup Way – Choose good abstractions
• Choose abstractions that let you take advantage of managed
services
• Don’t reinvent the wheel and don’t be afraid to change the
implementation
• Docker, microservices, test-driven development, continuous
delivery, automation, etc. can all help you here
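A small interface is one way to keep the freedom to change the implementation later. In the sketch below, application code depends only on a hypothetical `EventStore` protocol; the in-memory class is a stand-in that could be swapped for a class wrapping a managed service without touching `record_reading`. None of these names come from the talk.

```python
from typing import Dict, List, Protocol


class EventStore(Protocol):
    """The only surface application code is allowed to depend on."""

    def append(self, stream: str, event: dict) -> None: ...

    def read(self, stream: str) -> List[dict]: ...


class InMemoryEventStore:
    """Stand-in implementation for prototypes and tests."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[dict]] = {}

    def append(self, stream: str, event: dict) -> None:
        self._streams.setdefault(stream, []).append(event)

    def read(self, stream: str) -> List[dict]:
        return list(self._streams.get(stream, []))  # copy so callers cannot mutate state


def record_reading(store: EventStore, asset_id: str, value: float) -> None:
    """Application code: knows the abstraction, not the backing service."""
    store.append(f"readings/{asset_id}", {"value": value})


store = InMemoryEventStore()
record_reading(store, "pump-1", 0.5)
print(store.read("readings/pump-1"))
```

Because `Protocol` uses structural typing, a replacement only has to provide the same two methods; it does not need to inherit from anything.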
- 35.
Data Engineering the Startup Way – Act like an enterprise
• When you use world-class, global services, you get the
service levels of world-class, global services.
• Use services to enable your two-person team to operate like the
army of infra/ops they’re used to working with
• An outage is an outage no matter how small…
- 36.
Data Engineering the Startup Way – Invest
• Pairing really smart people with really great services gives you
the flexibility to be curious while you deliver
• Put down a foundation in your data platform and use managed
services where you can
• Craft your platform
• Investing in your data engineering gives you repeatability and
“paved roads” you can use to accelerate your delivery
- 37.
Data Engineering the Startup Way – Be Open
• There are a lot of smart people working on really useful
projects
• Scala, Flink, Spark, Kafka, Postgres, Docker, Airflow,
Kubernetes, Mesos, Kudu, Hive, Impala
• Get involved, share back, and use
open source
- 38.
Data Engineering the Startup Way – Oh, and Have Fun
• Don’t fight change; build systems and orgs that are flexible
• Use all the cool tech and packaged solutions to get you closer
to your vision
• And have fun!
• There’s never been a better time to be building
- 41.
In Summary
• is awesome
• There are hard problems and we’re
solving them
• You can solve your hard problems
too if you try
• AWS makes it easier, especially for
startups
• Build, Learn, Repeat
• Have fun
- 44.
Thanks!