BigData in Cloud computing
Viet-Trung Tran
@Vietstack
Sunday 1 February 15
Bio
Viet-Trung Tran
trungtv@soict.hust.edu.vn
https://www.facebook.com/groups/BigDataStartUp/
SoICT, Trendiction S.A Luxembourg, Microsoft Research Cambridge,
INRIA France, BKAV
Google trends
Google MapReduce paper 2004
BigData in science
The Data Science: The 4th Paradigm
for Scientific Discovery
Thousand years ago: description of natural phenomena
Last few hundred years: Newton’s laws, Maxwell’s equations…
$\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$
Last few decades: simulation of complex phenomena
Today and the future: data science (the 4th paradigm)
Credits: Dennis Gannon
What’s BigData
Data has always been big. What differs now, compared with the past,
is the sheer scale and accessibility of data, a direct result of how
quickly data can now be processed. Big Data is therefore an
all-encompassing term for any collection of large data sets that were
once difficult to process.
Big data requires exceptional technologies to efficiently process large
quantities of data within tolerable elapsed times.
Data mining -> BigData mining?
Simplified BigData stack
Data analytics &
visualization
Data processing frameworks
(Streaming, MapReduce, BSP
model)
Data management systems BlobSeer
BigData management
NoSQL
The last 25 years of commercial DBMS development can be summed
up in a single phrase: "one size fits all". This phrase refers to the fact
that the traditional DBMS architecture (originally designed and
optimized for business data processing) has been used to support
many data-centric applications with widely varying characteristics and
requirements. In this paper, we argue that this concept is no longer
applicable to the database market, and that the commercial
world will fracture into a collection of independent database
engines, some of which may be unified by a common front-end
Why NoSQL
“The whole point of seeking alternatives [to RDBMS systems] is that you need to
solve a problem that relational databases are a bad fit for.” Eric Evans -
Rackspace
ACID does not scale
Web applications have different needs
Scalability
Elasticity
Flexible schema/ semi-structured data
Geographically distributed
Web applications do not always need
Transactions
Strong consistency
Complex queries
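To illustrate the "flexible schema / semi-structured data" point above, here is a minimal sketch of a schemaless store. It is not any particular NoSQL product: `store`, `put`, and `find` are hypothetical names, and a plain in-memory dict stands in for a distributed database.

```python
# Hypothetical in-memory "document store": each record is a free-form dict,
# so records under the same store may carry different fields.
store = {}

def put(doc_id, doc):
    """Insert or overwrite a document; no schema is checked."""
    store[doc_id] = dict(doc)

def find(field, value):
    """Return the sorted ids of documents whose `field` equals `value`.
    Documents that lack the field are simply skipped."""
    return sorted(doc_id for doc_id, doc in store.items()
                  if doc.get(field) == value)

put("u1", {"name": "An", "city": "Hanoi"})
put("u2", {"name": "Binh", "city": "Hanoi", "tags": ["admin"]})
put("u3", {"name": "Chi"})  # no "city" field at all

hanoi_users = find("city", "Hanoi")
```

A relational table would need a schema change (or NULL columns) to accept the `tags` field; here the new field is just stored alongside the others.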
Big Data processing engines
MapReduce
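As a rough illustration of the MapReduce model named above, the following single-process sketch counts words using separate map, shuffle, and reduce steps. The function names are illustrative assumptions; a real engine such as Hadoop would run these phases in parallel across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data in cloud", "cloud enables big data"])
```

Because each map call touches one document and each reduce call touches one key, both phases can be distributed without shared state, which is what makes the model scale.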
Stream processing
Large scale graph processing
2012
2014
Vanilla Hadoop ecosystem
Hortonworks Data Platform
Hadoop ecosystem: Microsoft
HDInsight
BigData & Cloud
A Match made in heaven?
Cloud features
Data in the Clouds
IDC estimates that by 2020 about 40% of data
globally will be touched by cloud computing.
Cloud adoption is accelerating: the number of
objects stored in Amazon Web Services (AWS) S3
cloud storage jumped from 262 billion
in 2010 to over 1 trillion by mid-2012.
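As a quick check on the S3 figures quoted above, this small calculation works out the implied growth factor. Only the two object counts come from the slide; the variable names are illustrative, and "over 1 trillion" is treated as exactly 1 trillion, so the result is a lower bound.

```python
# Object counts quoted on the slide (1 trillion is a lower bound).
objects_2010 = 262_000_000_000
objects_2012 = 1_000_000_000_000

# Ratio of the two counts: how many times larger the store became.
growth_factor = objects_2012 / objects_2010
print(f"S3 object count grew at least {growth_factor:.2f}x")
```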
While enterprises often keep their most sensitive data in-house, huge
volumes of data such as social media data may be located externally.
Data that is too big to process is also too big to transfer anywhere,
so it is the analytical program that needs to be moved, not the data.
"You don't want to be shipping terabytes and petabytes around."
"Keep the data where it is, and then you move the analytics … to that
data."
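The "move the analytics to the data" idea can be sketched as follows. This is a toy model under stated assumptions: `DataNode` and `local_stats` are hypothetical names, and in-process objects stand in for remote storage nodes. The point is that only a small summary leaves each node, never the raw records.

```python
class DataNode:
    """Hypothetical storage node that keeps its records local."""

    def __init__(self, records):
        self._records = list(records)

    def run(self, analytic):
        """Execute the shipped function next to the data and
        return only its (small) result."""
        return analytic(self._records)

def local_stats(records):
    """The analytic that is shipped: a (count, sum) partial aggregate."""
    return len(records), sum(records)

nodes = [DataNode([3, 5, 8]), DataNode([2, 2]), DataNode([10])]

# Each node returns two numbers regardless of how much data it holds.
partials = [node.run(local_stats) for node in nodes]

total_count = sum(count for count, _ in partials)
total_sum = sum(subtotal for _, subtotal in partials)
global_mean = total_sum / total_count
```

The same pattern underlies MapReduce-style engines: the coordinator combines partial aggregates, so network traffic scales with the number of nodes rather than with the data volume.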
Cloud enables BigData
Some of the first adopters of big data in
cloud computing are users that deployed
Hadoop clusters in highly scalable and
elastic clouds: IBM, Azure, AWS
Cloud computing democratizes big data –
any enterprise can now work with
unstructured data at a huge scale.
Analytics-as-a-service (AaaS) models
for cloud-based big data analytics
Drivers for big data on cloud adoption
Cost reduction
Managing cloud-based big data is cost-effective, scalable, and fast to build.
Rapid provisioning/time to market
Faster provisioning is important for big data applications because the value of data
declines quickly as time goes by.
Flexibility/scalability
Big data analysis, especially in the life sciences industry, requires huge compute
power for a brief amount of time. For this type of analysis, servers need to be
provisioned in minutes.
BigData is not always
Cloud-appropriate
Low latency realtime data
Virtualization overhead
Multi-tenancy overhead
Scalability
Lack of cloud computing features to support RDBMS
Availability
“Rain cloud” incorporates clouds
Data integrity/privacy
Data can only be accessed by authorized users
Currently, encryption is utilized by most researchers to ensure data privacy in the cloud
NoSQL vs SQL in the Cloud
Data security/performance trade-offs
Distributed nodes
Distributed data
Internode communication
RPC over TCP/IP?
Encrypted IO?
Security/performance trade-offs
Cloud Architecture for Big Data
Resource scheduling and SLA for Big Data on
Cloud
Storage and computation management in Cloud for
Big Data
Large-scale data intensive workflow in support of
Big Data processing on Cloud
Multiple source data processing and integration on
Cloud
Virtualisation and visualisation of Big Data on Cloud
Fault tolerance and reliability for Big Data
processing on Cloud
MapReduce with Cloud for Big Data processing
Distributed file storage system with Cloud for Big Data
Inter-cloud technology for Big Data
Security, privacy and trust in Big Data processing on Cloud
Green, energy-efficient models and sustainability issues in Cloud for Big Data
processing
Cloud infrastructure for social networking with Big Data
User friendly Cloud access for Big Data processing
Innovative Cloud data centre networking for Big Data
Wireless and mobility support in Cloud data centre for Big Data
BigData use cases
Security Analytics
Thank you for your attention
8 big trends in big data analytics
http://www.computerworld.com/article/2690856/8-big-trends-in-big-data-analytics.html
Reference
http://www.oracle.com/us/corporate/profit/big-ideas/012314-spasalapudi-2112687.html
https://gigaom.com/2014/10/15/cloud-computing-is-going-to-absorb-your-big-data-workloads-too/
Classification of BigData
Relationship between Cloud and
BigData
Open research issues
Data staging
Distributed storage systems: NoSQL, NewSQL
Data analysis
Data security
Unfortunately, it’s not all good news.
DB administrators don’t have an easy ride. The NoSQL databases
that have appeared in the last few years, with their key-value pairs,
document stores, and missing schemas,


Viet stack 2nd meetup - BigData in Cloud Computing