Viet-Trung Tran presents information on big data and cloud computing. The document discusses key concepts like what constitutes big data, popular big data management systems like Hadoop and NoSQL databases, and how cloud computing can enable big data processing by providing scalable infrastructure. Some benefits of running big data analytics on the cloud include cost reduction, rapid provisioning, and flexibility/scalability. However, big data may not always be suitable for the cloud due to issues like data security, latency requirements, and multi-tenancy overhead.
Vietstack 2nd meetup: BigData in Cloud Computing
1. BigData in Cloud computing
Viet-Trung Tran
@Vietstack
Sunday 1 February 15
13. The Data Science: The 4th Paradigm for Scientific Discovery
Thousand years ago: description of natural phenomena
Last few hundred years: Newton's laws, Maxwell's equations… e.g. $\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$
Last few decades: simulation of complex phenomena
Today and the future: data science
Credits: Dennis Gannon
14. What’s BigData
Data has always been big. What differs now, compared with the past, is the
sheer scale and accessibility of data, a direct result of how fast data
can now be processed. Big Data is therefore an all-encompassing term for
any collection of data sets so large that they were once difficult to process.
Big data requires exceptional technologies to efficiently process large
quantities of data within tolerable elapsed times.
16. Simplified BigData stack
Data analytics & visualization
Data processing frameworks (Streaming, MapReduce, BSP model)
Data management systems (e.g. BlobSeer)
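To make the "data processing frameworks" layer concrete, here is a minimal single-process sketch of the MapReduce programming model (map, shuffle/group-by-key, reduce) using the classic word-count example. The function names are hypothetical; this is not Hadoop's API. A real framework runs the map and reduce calls in parallel across a cluster and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    # Map: emit an intermediate (word, 1) pair for every word in one input split.
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all intermediate values by key (done over the network in a real cluster).
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine every value collected for one key.
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # Each split would normally be mapped on the node that stores it.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data in the cloud", "the cloud enables big data"]))
```

Because map and reduce are pure functions of their inputs, the framework is free to rerun a failed task on another machine, which is much of what makes the model attractive on elastic cloud clusters.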
19. "The last 25 years of commercial DBMS development can be summed
up in a single phrase: 'one size fits all'. This phrase refers to the fact
that the traditional DBMS architecture (originally designed and
optimized for business data processing) has been used to support
many data-centric applications with widely varying characteristics and
requirements. In this paper, we argue that this concept is no longer
applicable to the database market, and that the commercial
world will fracture into a collection of independent database
engines, some of which may be unified by a common front-end …"
(Stonebraker and Çetintemel, "'One Size Fits All': An Idea Whose Time Has Come and Gone", ICDE 2005)
21. Why NoSQL
“The whole point of seeking alternatives [to RDBMS systems] is that you need to
solve a problem that relational databases are a bad fit for.” (Eric Evans, Rackspace)
ACID does not scale
Web applications have different needs:
  Scalability
  Elasticity
  Flexible schema / semi-structured data
  Geographical distribution
Web applications do not always need:
  Transactions
  Strong consistency
  Complex queries
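Scalability and elasticity in many distributed key-value stores rely on partitioning data so that machines can be added or removed without reshuffling everything. One widely used technique (in Dynamo-style systems, for example) is consistent hashing; the slide does not name it, so treat the following as an illustrative sketch under that assumption. `HashRing`, `vnodes` and the node names are hypothetical, and SHA-256 is used only to spread keys evenly, not for security.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    # Map a string to a stable 64-bit position on the ring.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (a sketch, not a production client)."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self.positions = []  # sorted ring positions
        self.owners = {}     # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node claims `vnodes` points on the ring to even out the load.
        for i in range(self.vnodes):
            pos = ring_position(f"{node}#{i}")
            if pos in self.owners:
                continue  # hash collision: keep the existing owner
            bisect.insort(self.positions, pos)
            self.owners[pos] = node

    def remove_node(self, node: str) -> None:
        # Drop only this node's points; keys owned by other nodes stay put.
        self.positions = [p for p in self.positions if self.owners[p] != node]
        self.owners = {p: n for p, n in self.owners.items() if n != node}

    def node_for(self, key: str) -> str:
        # A key belongs to the first node point clockwise from the key's hash.
        if not self.positions:
            raise LookupError("ring has no nodes")
        i = bisect.bisect_right(self.positions, ring_position(key))
        return self.owners[self.positions[i % len(self.positions)]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"user:{i}" for i in range(10_000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved; all to node-d:",
          all(ring.node_for(k) == "node-d" for k in moved))
```

Adding a fourth node moves only the keys that node takes over (roughly a quarter here), and every moved key goes to the new node. With naive `hash(key) % n` placement, most keys would change owner, which is a poor fit for an elastic cluster.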
38. Data in the Clouds
IDC estimated that by 2020 about 40% of the world's data
would be touched by cloud computing.
Cloud adoption is accelerating: the number of objects
stored in Amazon Web Services (AWS) S3 cloud storage
jumped from 262 billion in 2010 to over 1 trillion
by mid-2012.
39. While enterprises often keep their most sensitive data in-house, huge
volumes of data, such as social media data, may be located externally.
Data that is too big to process is also, in practice, too big to transfer
anywhere, so it is the analytical program that needs to be moved,
not the data.
"You don't want to be shipping terabytes and petabytes around.
Keep the data where it is, and then you move the analytics … to that
data."
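A quick back-of-the-envelope calculation shows why shipping petabytes around is impractical. The sketch assumes decimal units (1 PB = 10^15 bytes) and a hypothetical dedicated 10 Gbit/s link running at full nominal speed. Real transfers achieve less, which the `efficiency` parameter can model.

```python
TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes


def transfer_time_seconds(data_bytes: float, link_bits_per_second: float,
                          efficiency: float = 1.0) -> float:
    """Time to move data_bytes over a single link.

    efficiency is the fraction of nominal bandwidth actually achieved
    (1.0 = idealized, fully saturated link with no protocol overhead).
    """
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("need a positive link speed and 0 < efficiency <= 1")
    return data_bytes * 8 / (link_bits_per_second * efficiency)


if __name__ == "__main__":
    ten_gbps = 10 * 10**9  # assumed dedicated 10 Gbit/s link
    for label, size in [("1 TB", TB), ("1 PB", PB)]:
        seconds = transfer_time_seconds(size, ten_gbps)
        print(f"{label}: {seconds / 3600:,.1f} hours")
```

Even under these ideal assumptions, 1 PB needs 800,000 seconds (about 222 hours, or over nine days). An analytics job is typically megabytes of code, so moving it to the data is far cheaper.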
40. Cloud enables BigData
Some of the first adopters of big data in
cloud computing were users who deployed
Hadoop clusters in highly scalable and
elastic clouds such as IBM, Azure, and AWS.
Cloud computing democratizes big data:
any enterprise can now work with
unstructured data at a huge scale.
Analytics-as-a-service (AaaS) models
for cloud-based big data analytics
41. Drivers for big data on cloud adoption
Cost reduction
Managing cloud-based big data is cost-effective, scalable, and fast to build.
Rapid provisioning/time to market
Faster provisioning is important for big data applications because the value of data
decreases quickly over time.
Flexibility/scalability
Big data analysis, especially in the life sciences industry, requires huge compute
power for brief periods. For this type of analysis, servers need to be
provisioned in minutes.
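The cost and flexibility arguments can be illustrated with simple arithmetic. This is a sketch under stated assumptions: the job parallelizes perfectly, every node costs the same hypothetical $0.50 per node-hour, and the 2,000 node-hour job with a 4-hour deadline is invented for illustration. Real cloud pricing, billing granularity, and the cost of owned hardware all differ.

```python
import math


def nodes_for_deadline(work_node_hours: float, deadline_hours: float) -> int:
    # Smallest cluster that finishes in time, assuming perfect parallelism.
    if work_node_hours <= 0 or deadline_hours <= 0:
        raise ValueError("work and deadline must be positive")
    return math.ceil(work_node_hours / deadline_hours)


def cluster_cost(nodes: int, hours: float, price_per_node_hour: float) -> float:
    # Cost of keeping `nodes` machines running for `hours`.
    return nodes * hours * price_per_node_hour


if __name__ == "__main__":
    PRICE = 0.50     # hypothetical price per node-hour
    WORK = 2000      # hypothetical job size in node-hours
    DEADLINE = 4     # results needed within 4 hours
    MONTH = 720      # hours in a 30-day month

    nodes = nodes_for_deadline(WORK, DEADLINE)
    burst = cluster_cost(nodes, DEADLINE, PRICE)   # provision, run, release
    always_on = cluster_cost(nodes, MONTH, PRICE)  # peak-sized cluster kept all month
    print(f"{nodes} nodes; burst: ${burst:,.0f}; always-on for a month: ${always_on:,.0f}")
```

Under these assumptions, the burst run needs 500 nodes for 4 hours and costs $1,000. Keeping the same peak-sized cluster idle for the rest of the month would cost $180,000. That gap is the "huge compute power for brief periods" case in the slide.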
44. BigData is not always Cloud-appropriate
Low-latency real-time data
  Virtualization overhead
  Multi-tenancy overhead
Scalability
  Lack of cloud computing features to support RDBMS
Availability
  “Rain cloud” incorporates clouds
Data integrity/privacy
  Data must be accessible only to authorized users
  Currently, most researchers use encryption to ensure data privacy in the cloud
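On the integrity side, one standard way for a data owner to detect tampering in cloud-stored data is to compute a keyed MAC before upload and keep the key on-premises. The sketch below uses HMAC-SHA256 from Python's standard library. `sign`/`verify` are hypothetical helper names. HMAC provides integrity and authenticity only, not confidentiality. The encryption the slide mentions would need a vetted library (for example the third-party `cryptography` package), which is not shown here.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, data: bytes) -> bytes:
    # HMAC-SHA256 tag computed by the data owner before uploading `data`.
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(key: bytes, data: bytes, tag: bytes) -> bool:
    # Recompute the tag on download; compare_digest runs in constant time.
    return hmac.compare_digest(sign(key, data), tag)


if __name__ == "__main__":
    key = secrets.token_bytes(32)        # kept in-house, never uploaded
    blob = b"sample_id,reading\n17,98.6\n"
    tag = sign(key, blob)                # stored alongside the blob
    print(verify(key, blob, tag))                             # True
    print(verify(key, blob.replace(b"98.6", b"99.9"), tag))   # False
```

Because the key never leaves the owner's premises, the cloud provider can store the tag but cannot forge a valid one for modified data.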
47. Cloud Architecture for Big Data
Resource scheduling and SLA for Big Data on Cloud
Storage and computation management in Cloud for Big Data
Large-scale data intensive workflow in support of Big Data processing on Cloud
Multiple source data processing and integration on Cloud
Virtualisation and visualisation of Big Data on Cloud
Fault tolerance and reliability for Big Data processing on Cloud
MapReduce with Cloud for Big Data processing
Distributed file storage system with Cloud for Big Data
Inter-cloud technology for Big Data
Security, privacy and trust in Big Data processing on Cloud
Green, energy-efficient models and sustainability issues in Cloud for Big Data processing
Cloud infrastructure for social networking with Big Data
User friendly Cloud access for Big Data processing
Innovative Cloud data centre networking for Big Data
Wireless and mobility support in Cloud data centre for Big Data
60. Open research issues
Data staging
Distributed storage systems: NoSQL, NewSQL
Data analysis
Data security
61. Unfortunately, it's not all good news.
DB administrators don't have an easy ride. The NoSQL databases
that have appeared in the last few years, with their key-value pairs,
document stores, and missing schemas, …