BIG Data
Desai Karan A
https://in.linkedin.com/in/karan28
SYNOPSIS:
1. Handy Hands-on
2. Introduction to big data
3. Big Data Niceties
4. Specifics of Big Data
5. Big Data Management Tools
6. Practical use-cases
7. Conclusions
8. References
1. Handy Hands-On
Introduction to Big Data
2. Introduction to big data
-2.1 What is big data?
-2.2 Etymology.
-2.3 Hype and Facts.
2.1 What is big data?
• “Big data” refers to datasets whose size is
beyond the ability of typical database software
tools to capture, store, manage, and analyze.
• Big data refers to extremely large data sets that
may be analyzed computationally to reveal
patterns, trends, and associations, especially
relating to human behavior and interactions.
• There is no fixed size threshold: the term is
typically applied to data ranging from terabytes
(1,000 gigabytes) up to zettabytes.
2.2 Etymology: Word Origination
Big data is the simplest,
shortest phrase to convey that
the boundaries of computing
keep advancing, growing,
diversifying and intensifying
rapidly.
John R. Mashey, chief
scientist at Silicon Graphics, is often
credited with coining the term “Big Data”.
2.3 Hype and Facts
GLOBALLY, EVERY 60 SECONDS…
• 204 Million emails are
sent.
• 300k logins to .
• 1.3 Million views on
YouTube.
• 2 Million Google searches.
• 100k tweets.
• 62,000 hours of Music
Downloads
2.3 Hype and Facts (contd.)
• WE GENERATE 2.5 QUINTILLION BYTES
EVERY DAY
• IN 2012, THE WORLD’S INFORMATION
CROSSED 2 ZETTABYTES = 2
TRILLION GIGABYTES!
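The two headline figures (2.5 quintillion bytes a day, and 2 zettabytes = 2 trillion gigabytes) can be sanity-checked with a few lines of arithmetic. The sketch below assumes decimal (SI) prefixes and the short-scale meaning of "quintillion" (10^18); the function names are illustrative only.

```python
# Decimal (SI) unit sizes in bytes. Assumption: the slide uses SI prefixes.
GIGABYTE = 10**9
EXABYTE = 10**18
ZETTABYTE = 10**21


def zettabytes_to_gigabytes(zb):
    """Convert a whole number of zettabytes to gigabytes."""
    return zb * ZETTABYTE // GIGABYTE


def bytes_to_exabytes(n_bytes):
    """Express a byte count in exabytes."""
    return n_bytes / EXABYTE


# 2 zettabytes, the 2012 figure quoted on the slide.
world_2012_gb = zettabytes_to_gigabytes(2)

# 2.5 quintillion bytes per day (short scale: 1 quintillion = 10**18).
daily_bytes = 25 * 10**17
daily_exabytes = bytes_to_exabytes(daily_bytes)

print(world_2012_gb)   # gigabytes in 2 ZB
print(daily_exabytes)  # exabytes generated per day
```

Under these assumptions 2 ZB does come out to 2 × 10^12 (2 trillion) gigabytes, and 2.5 quintillion bytes is 2.5 exabytes.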
3. Big Data Niceties.
-3.1 Evolution of Big Data
-3.2 Why traditional tools fail?
-3.3 Utilities of Big Data
3.1 Evolution Story:
3.2 Why traditional tools fail?
• An E-TSUNAMI and heavy RAINS of DATA…
• Today's data is too BIG for
traditional data management tools.
- They can work only with small samples of
the data.
- This is like looking through a keyhole
to judge the size of a room…
• High turnaround time for meaningful
results
- This is like deciding to cross the road based on
a picture taken 5 minutes earlier!
3.3 Big data utilities:
• Dealing with real time data.
• A new level of insight and
opportunity.
• More effective, fact-based
decision making.
• A new source of business
value.
• A competitive advantage.
4. Specifics of Big Data
-4.1 Characteristics
-4.2 Life cycle
4.1 Characteristics
The four Vs of big data:
• Volume
• Variety
• Velocity
• Veracity
4.2 Big Data Life Cycle
Three stages: Manage, Enrich, Insight
• Manage: store and secure data of any size.
• Enrich: connect it with the world’s data.
• Insight: gain insights from any data, irrespective of
location.
5. Big Data Management tools.
-5.1 Cow story
-5.2 Introduction to Hadoop
-5.3 Basic Working of Hadoop.
5.1 Cow story
• Case 1 (storage device: MB/GB): “It is easy for me to handle my resources (data).”
• Case 2 (storage device: TB): “I am strong… I can handle my resources.”
• Case 3 (storage device: PB): “Oof… There are so many resources! I am not strong!”
• Case 4: “I call my friends for help.” The friends are the big data management tools.
5.2 Introduction to Hadoop
Apache Hadoop is an open-source software
framework for storage and large-scale
processing of data-sets on clusters of
commodity hardware.
Introduction to Hadoop
• Doug Cutting, together with Mike Cafarella, created Apache Hadoop.
• Hadoop's logo is a yellow elephant.
5.3 Basic working of Hadoop
Read 1 TB of data:
• 1 machine: 4 I/O channels, each channel 100 MB/s → ~45 minutes
• 10 machines: 4 I/O channels per machine, each channel 100 MB/s → ~4.5 minutes
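The read-time comparison can be reproduced with simple arithmetic. This is a minimal sketch under idealized assumptions that are not stated on the slide: the data is split evenly across machines, all channels run at full speed in parallel, there is no coordination overhead, and 1 TB is taken as 2^40 bytes. The function name `read_time_minutes` is illustrative.

```python
def read_time_minutes(data_bytes, machines, channels_per_machine=4, mb_per_s=100):
    """Idealized time to scan data split evenly across all machines."""
    # Aggregate throughput in bytes per second (1 MB taken as 10**6 bytes).
    throughput = machines * channels_per_machine * mb_per_s * 10**6
    return data_bytes / throughput / 60


ONE_TB = 2**40  # assumption: binary terabyte

single = read_time_minutes(ONE_TB, machines=1)
cluster = read_time_minutes(ONE_TB, machines=10)

print(f"1 machine:   {single:.1f} min")
print(f"10 machines: {cluster:.1f} min")
```

With these assumptions one machine needs roughly 45.8 minutes and ten machines roughly 4.6 minutes, in line with the slide's "~45" and "~4.5" figures. Real clusters fall short of this linear speed-up because of network and scheduling overhead.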
Basic architecture of
present-day Hadoop.
Schematic Working.
Hadoop:
• Application written in Java for big data processing
• Uses the “MapReduce” processing paradigm
• Optimized for distributed storage and computing
of data
• Open source
• Very low cost for acquisition and storage
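To make the MapReduce paradigm concrete, here is a minimal single-process sketch of the classic word-count example. It is not Hadoop's Java API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are illustrative assumptions, and a real cluster would run the map and reduce steps on many machines in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over all documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input: each string stands in for one split of a large file.
counts = word_count(["big data is big", "data about data"])
print(counts)
```

Because each map call sees only its own split and each reduce call sees only one key, both steps can be spread across machines, which is what makes the paradigm suit distributed storage and computing.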
Other big data management
tools: Overview…
6. Practical Use-Cases
-6.1 Big apps of Big Data tools
-6.2 How big data affects small business
-6.3 Relevance of big data in market
6.1 Big apps of big data tools.
Who is using big data?
6.2 How big data affects
small businesses?
• Every organization has a tipping point, and
most organizations – regardless of size –
will eventually reach a point where the
volume, variety and velocity of their data
will be something that they have to
address.
• This new big data world is not only about
running problems faster, but about solving
problems that were not solvable before.
6.3 Relevance of big data in
market.
7. Conclusions
Conclusions: Through pics..
8. References:
• www.microsoft.com
• http://en.wikipedia.org/wiki/Hadoop
• http://en.wikipedia.org/wiki/Big_data
• www.google.com
• www.slideshare.net
• PDF: McKinsey Global Institute
• Pdf: 101 Big data by Pradeep Vardan
• Workshop in college by ‘Ecsttasys’ on big
data

Editor's Notes

  1. © Karan Desai (follow me on Twitter at @karlmit or at https://in.linkedin.com/in/karan28). DISCLAIMER: The images, diagrams and content in this presentation are meant for educational purposes only. The author does not guarantee the originality of any media in the presentation; the author has only combined and summarized details on the topic from varied sources. No copyright violation is intended.
  2. Notes on other tools:
     • SSAS: SQL Server Analysis Services (SSAS) is an online analytical processing (OLAP), data mining and reporting tool in Microsoft SQL Server.
     • Essbase is a multidimensional database management system (MDBMS) that provides a multidimensional database platform upon which to build analytic applications.
     • IBM Cognos TM1 (formerly Applix TM1) is enterprise planning software used to implement collaborative planning, budgeting and forecasting solutions, as well as analytical and reporting applications.
     • Power Pivot is a free add-in to the 2010 version of the spreadsheet application Microsoft Excel. Power Pivot workbooks are self-contained web applications, merely requiring a 'Save as' to make them accessible in the browser as interactive solutions.
     • K is a proprietary array processing language developed by Arthur Whitney and commercialized by Kx Systems. Since then, an open-source implementation known as Kona has also been developed. ... kdb is both a database (kdb) and a vector language (q). It is used by almost every major financial institution.
     • Vertica Systems is an analytic database management software company.
     • QlikView is a business intelligence platform for turning data into knowledge.
     • TIBCO Spotfire® designs, develops and distributes in-memory analytics software for next-generation business intelligence.
     • Tableau Software is an American computer software company headquartered in Seattle, Washington. It produces a family of interactive data visualization products focused on business intelligence.
     • Omniscope is a single, in-memory, file-based application that enables agile, 'best practice' data-sharing solutions.
     • An in-memory database (IMDB; also main memory database system, MMDB, or memory-resident database) is a database management system that primarily relies on main memory for computer data storage. It is contrasted with database management systems that employ a disk storage mechanism.
Traditional relational databases are typically row-oriented: the data in each row of a table is stored together. In a columnar, or column-oriented, database, the values of each column are stored together instead.
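The difference between the two layouts can be sketched with plain Python data structures. This is only an in-memory illustration, not how any particular database stores data on disk; the table contents and the helper name `to_columns` are hypothetical.

```python
# Row-oriented layout: each record keeps all of its fields together.
# Hypothetical sample data for illustration only.
rows = [
    {"id": 1, "city": "Pune", "sales": 120},
    {"id": 2, "city": "Delhi", "sales": 300},
    {"id": 3, "city": "Pune", "sales": 80},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    return {name: [record[name] for record in records] for name in records[0]}


# Column-oriented layout: each column keeps all of its values together.
columns = to_columns(rows)

# Summing one field touches every whole record in the row layout...
total_row_store = sum(record["sales"] for record in rows)

# ...but only a single contiguous list in the column layout.
total_column_store = sum(columns["sales"])

print(columns)
print(total_row_store, total_column_store)
```

Both layouts give the same total; the column layout simply lets an analytic query read only the columns it needs, which is why many analytic databases favour it.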