Dipping Your Toe Into
Hadoop
How to get started in Big Data without Big
Costs
Bobby Dewitt
VP, Systems Architect
Aisle411
StampedeCon 2016
My Background
• Oracle, MySQL, and PostgreSQL DBA with 15
years of experience
• Led database, infrastructure, and business
intelligence teams to deliver highly available
data systems
• Currently responsible for design,
implementation, and operational availability of
infrastructure and systems at Aisle411
Aisle411
• Digitizing the indoor world
• Indoor maps, positioning, and analytics
• Asset and customer tracking within
locations
• Using augmented reality to make
indoor solutions more interactive
• Small company - big data
RDBMS Versus Hadoop
• Relational databases
• Very structured data
• Good for transactional and operational systems
• Difficult to scale out
• Hardware failures can be disastrous
• Hadoop
• Semistructured or unstructured data
• Good for batch and bulk processing as well as
analytic systems
• Simple to scale out
• Hardware failures are handled seamlessly
Hadoop Adoption
• Still not a reality for many companies
• Major barriers include
• Lack of skilled employees
• Difficulty getting value from the investment
• Constant changes to the ecosystem
Kick the Tires
• Play around with it
• A Hadoop cluster can reside on a single
machine
• Pre-loaded virtual machines
• Install on EC2 or another cloud VM (a quick smoke test is sketched below)
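Before committing to anything bigger, a two-line smoke test confirms a sandbox is alive. This is a minimal sketch (not from the original slides), assuming a single-node install or vendor VM with the standard Hadoop CLI on the PATH:

```python
import subprocess

# Sanity checks for a fresh sandbox: the CLI is installed and
# HDFS is up and answering requests.
subprocess.run(["hadoop", "version"], check=True)        # prints build info
subprocess.run(["hdfs", "dfs", "-ls", "/"], check=True)  # lists the HDFS root
```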
What Data Should I Use?
• Stick with what you know
• Choose a dataset that is not specific to
your company
• Try documented examples and use
cases
Example Datasets
• Apache web server logs
• Twitter feeds
• Stock market prices
• Census data
• Sports statistics
• Song data
Apache Web Log Data
• Many online resources
• Potentially large dataset
• Real business value
• Combine with other data sources (a parsing sketch follows)
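Since the slides settle on Apache access logs, here is a minimal Python sketch of the first step: pulling fields out of a combined-format log line. The regex and the sample line are illustrative only; the field names follow the standard combined log layout.

```python
import re

# Apache "combined" log format:
# host ident authuser [date] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://example.com/start.html" "Mozilla/4.08"')
print(parse_line(sample)["status"])  # prints '200'
```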
From Batch to Streaming
• Initial testing was done with a batch load using HDFS tools (sketched after this list)
• Set up streaming to provide near-real-time updates
• Used several Hadoop components
• HDFS
• Flume
• Morphlines
• Avro
• Hive
• Impala
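The batch-load step mentioned above boils down to copying files into HDFS. A hedged sketch, assuming the `hdfs dfs` CLI is on the PATH; the local log directory and HDFS target path are hypothetical stand-ins:

```python
import subprocess
from pathlib import Path

# Illustrative paths -- substitute your own log location and HDFS target.
LOCAL_LOG_DIR = Path("/var/log/apache2")
HDFS_TARGET = "/data/weblogs/raw"

def hdfs(*args):
    """Run an `hdfs dfs` subcommand, raising on failure."""
    subprocess.run(["hdfs", "dfs", *args], check=True)

# Create the target directory (idempotent), then push each rotated
# access log up to HDFS in one batch.
hdfs("-mkdir", "-p", HDFS_TARGET)
for log_file in sorted(LOCAL_LOG_DIR.glob("access.log*")):
    hdfs("-put", "-f", str(log_file), HDFS_TARGET)
```

The streaming side of the slide (Flume tailing the logs, Morphlines transforming events, Avro as the wire format) is driven by configuration files rather than application code, so it is not sketched here.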
Quick Wins
• Get data into HDFS
• Get data into Hive or Impala (see the DDL sketch below)
• Stream live data
• Combine with other data sources
• Create pretty graphs and charts
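For the "get data into Hive or Impala" win, a common pattern is an external table pointed at the raw log directory. A sketch, assuming the files loaded in the previous step; the table name and location are hypothetical, and the RegexSerDe shown ships with recent Hive versions (it requires all columns to be STRING):

```python
import subprocess

# Hypothetical external table over the raw logs loaded earlier.
# Hive's built-in RegexSerDe maps each capture group to one column.
DDL = r"""
CREATE EXTERNAL TABLE IF NOT EXISTS weblogs_raw (
  host STRING, ident STRING, auth_user STRING, log_time STRING,
  request STRING, status STRING, size STRING,
  referer STRING, agent STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\""
)
STORED AS TEXTFILE
LOCATION '/data/weblogs/raw';
"""

# Run the DDL through the Hive CLI; beeline or impala-shell work similarly.
subprocess.run(["hive", "-e", DDL], check=True)
```

Once the table exists, Impala can query it after an `INVALIDATE METADATA`, and any SQL-capable charting tool can produce the "pretty graphs."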
Costs
• Start small with a data puddle
• Use virtual machines, not the big
appliance
• Research and experimentation time
may be biggest cost
Where Am I?
• Evaluate your initial trials
• Is Hadoop everything you thought it would
be?
• Do you have a real business need to use it?
• Can you migrate any existing data or
processes?
Training
• Hortonworks University
• MapR Academy
• Cloudera quick start tutorials
• Online classes through Coursera, edX, and
others
• Conferences like StampedeCon
Hadoop Is Not For Everyone
• Your “big data” may not be big enough
• Still some work to be done with security
and tools
• Skills are being learned, but not quickly
enough
Thank You
• Questions?
rdewitt@aisle411.com
