Data Science 
Module 1: Introduction to Data Science
Class Recording in LMS 
24/7 Post Class Support 
Module Wise Quiz 
Project Work on Large Data Base 
Verifiable Certificate 
Slide2 
www.edureka.in/data-science 
How it Works?
Slide3 
Topics for the Day 
Big Data 
Big Data Scenarios 
Big Data Challenges 
Introduction to Data Science 
Data Science: Components 
Types of Data Scientists 
Data Science: Core Components 
Use-Cases 
Introduction to Hadoop and R 
R and Hadoop Integration 
Machine Learning with Mahout 
Assignment, Pre-work and Agenda for the Next Class 
What’s Within the LMS 
References
Objectives 
At the end of this module, you will be able to 
Understand Big Data and its challenges 
Implement Big Data in real-time scenarios
List and explain the components and prospects of Data Science
Learn the implementation of Hadoop on Big Data
Analyze some real-world use-cases with the help of the R programming language
Understand machine learning concepts
Slide5 
Data Science
Slide6 
Big Data
Slide7 
What is Big Data? 
Lots of data (terabytes or petabytes)
Systems and enterprises generate huge amounts of data, from terabytes to even petabytes of information.
http://www.today.mccombs.utexas.edu/2012/04/the-big-data-machine
Slide8 
http://www.clker.com/clipart-13967.html 
Big Data Scenarios
Slide9 
http://www.espncricinfo.com/ 
Big Data Scenarios: Sports
Slide10 
http://www.espncricinfo.com/ 
Big Data Scenarios: Sports 
Sports teams are using data for tracking ticket sales and even for tracking team strategies.
Advertising and marketing agencies are tracking social media to understand responsiveness to campaigns, promotions, and other advertising mediums.
Slide11 
Big Data Scenarios : Hospital Care 
http://www.majorprojects.vic.gov.au/our-projects/our-past-projects/austin-hospital
Slide12 
Big Data Scenarios : Hospital Care 
Hospitals are analyzing medical data and patient records to predict those patients that are likely to seek readmission within a few months of discharge. The hospital can then intervene in hopes of preventing another costly hospital stay.
A medical diagnostics company analyzed millions of lines of data to develop the first non-intrusive test for predicting coronary artery disease. To do so, researchers at the company analyzed over 100 million gene samples to ultimately identify the 23 primary predictive genes for coronary artery disease.
Slide13 
http://wp.streetwise.co/wp-content/uploads/2012/08/Amazon-Recommendations.png 
Big Data Scenarios : Amazon.com
Slide14 
http://wp.streetwise.co/wp-content/uploads/2012/08/Amazon-Recommendations.png 
Amazon has an unrivalled bank of data on online consumer purchasing behaviour that it can mine from its 152 million customer accounts.
Amazon also uses Big Data to monitor, track and secure the 1.5 billion items in its retail store that are lying around its 200 fulfilment centres around the world. Amazon stores the product catalogue data in S3.
S3 can write, read and delete objects of up to 5 TB of data each. The catalogue stored in S3 receives more than 50 million updates a week, and every 30 minutes all data received is crunched and reported back to the different warehouses and the website.
Big Data Scenarios : Amazon.com
Slide15 
http://smhttp.23575.nexcesscdn.net/80ABE1/sbmedia/blog/wp-content/uploads/2013/03/netflix-in-asia.png 
Big Data Scenarios: NetFlix
Slide16 
http://smhttp.23575.nexcesscdn.net/80ABE1/sbmedia/blog/wp-content/uploads/2013/03/netflix-in-asia.png 
Netflix uses 1 petabyte to store the videos for streaming. 
BitTorrent Sync has transferred over 30 petabytes of data since its pre-alpha release in January 2013.
The 2009 movie Avatar is reported to have taken over 1 petabyte of local storage at Weta Digital for the rendering of the 3D CGI effects.
One petabyte of average MP3-encoded songs (for mobile, roughly one megabyte per minute) would require 2000 years to play.
Big Data Scenarios: NetFlix
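The MP3 playback figure on this slide can be sanity-checked with a little arithmetic. The following is a minimal sketch in Python, assuming decimal units (1 PB = 10^9 MB) and a 365.25-day year; neither assumption is stated on the slide.

```python
# Back-of-the-envelope check of the "2000 years of MP3" figure.
# Assumptions (not from the slide): decimal units and a 365.25-day year.
MB_PER_PETABYTE = 10**9
MB_PER_MINUTE = 1  # the slide's "roughly one megabyte per minute"
MINUTES_PER_YEAR = 60 * 24 * 365.25


def playback_years(petabytes: float) -> float:
    """Years of continuous playback for the given volume of MP3 audio."""
    minutes = petabytes * MB_PER_PETABYTE / MB_PER_MINUTE
    return minutes / MINUTES_PER_YEAR


print(round(playback_years(1)))
```

Under these assumptions the result is roughly 1,900 years, which is consistent with the rounded figure of 2000 years quoted on the slide.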
Slide17 
http://www.crowdsourcing.org/article/-nasa-tries-to-free-creativity-with-big-data-challenge/19984 
Big Data Scenarios: The Large Hadron Collider
Slide18 
The experiments in the Large Hadron Collider produce about 15 petabytes of data per year, which are distributed over the Worldwide LHC Computing Grid.
One petabyte is enough to store the DNA of the entire population of the USA, and then clone it twice.
http://en.wikipedia.org/wiki/Large_Hadron_Collider 
Big Data Scenarios: The Large Hadron Collider
Slide19 
IBM’s Definition: Big Data Characteristics
http://www-01.ibm.com/software/data/bigdata/ 
Web logs 
Images 
Videos 
Audios 
Sensor Data 
VOLUME 
VELOCITY 
VARIETY
Slide20 
IBM’s Definition: Big Data Characteristics
3 Vs of Big Data
Volume: terabytes, records, transactions, tables, files
Velocity: batch, near time, real time, streams
Variety: structured, unstructured, semi-structured, all of the above
http://www-01.ibm.com/software/data/bigdata/
Slide21 
http://whatsthebigdata.files.wordpress.com/2013/11/batman-on-big-data.jpg 
What about ‘Veracity’?
Slide22 
Hello there!
My name is Annie. I love quizzes and puzzles, and I am here to make you think and answer my questions.
Annie’s Introduction
Slide23 
Map the following to the corresponding type: structured, unstructured or semi-structured.
- XML files
- Word docs, PDF files, text files
- E-mail body
- Data from enterprise systems (ERP, CRM, etc.)
Annie’s Question
Slide24 
XML Files -> Semi-structured data 
Word Docs, PDF files, Text files -> Unstructured Data 
E-Mail body -> Unstructured Data 
Data from Enterprise systems (ERP, CRM etc.) -> Structured Data 
Annie’s Answer
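The difference between the three types becomes clearer when the same information is read from each form. The sketch below uses only the Python standard library; the sample record (a customer called Asha from Pune) is made up for illustration and does not come from the course material.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
structured = "id,name,city\n7,Asha,Pune\n"
semi_structured = '<customer id="7"><name>Asha</name><note>prefers e-mail</note></customer>'
unstructured = "Hi, this is Asha from Pune. Please contact me by e-mail."

# Structured: fixed columns, so every field is addressable by name.
row = next(csv.DictReader(io.StringIO(structured)))

# Semi-structured: tags describe the data, but elements can vary per record.
root = ET.fromstring(semi_structured)

# Unstructured: no schema at all, so we fall back to pattern matching.
match = re.search(r"this is (\w+) from (\w+)", unstructured)

print(row["name"], row["city"])
print(root.findtext("name"), root.get("id"))
print(match.group(1), match.group(2))
```

The structured and semi-structured records can be queried by field or tag name, while the free text needs a hand-written pattern that breaks as soon as the wording changes.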
Slide25 
Big Data: Challenges 
http://spinnakr.com/blog/wp-content/uploads/2013/08/Using-Big-Data-.jpg
Slide26 
Data security and Privacy 
High variety of Information 
High veracity of Data 
Data Acquisition 
High velocity of processed Data 
Information search and Analytics 
High volume of Data 
Information storage and Analytics 
Big Data: Challenges
Slide27 
http://thesocietypages.org/sociologylens/files/2013/09/BIgDataDilbert_Cartoon.jpg
Slide28 
http://escience.washington.edu/blog/uw-berkeley-nyu-collaborate-378m-data-science-initiative 
Data Science
Slide29 
Data Science 
“More data usually beats better algorithms,” 
Such as: Recommending movies or music based on past preferences.
Slide30 
No matter how sophisticated your algorithm is, it can often be beaten simply by having more data (and a less sophisticated algorithm).
Good News: Big Data is here.
Bad News: We are struggling to store and analyze it.
Data Science
Slide31 
http://abstrusegoose.com/55 
Data Science: Components
Slide32 
Data Science 
Visualization 
Data Engineering 
Statistics 
Advanced Computing 
Domain Expertise 
Data Science: Components
Slide33 
Data Science: Prospects
Slide34 
Types of Data Scientists 
Based on clustering the ways that data is handled by Data Scientists, the following 4 categories can be created:
Data Businesspeople are the product and profit-focused data scientists. They’re leaders, managers, and entrepreneurs, but with a technical bent. A common educational path is an engineering degree paired with an MBA.
Data Creatives are eclectic jacks-of-all-trades, able to work with a broad range of data and tools. They may think of themselves as artists or hackers, and excel at visualization and open source technologies.
Data Developers are focused on writing software to do analytic, statistical, and machine learning tasks, often in production environments. They often have computer science degrees, and often work with so-called “big data”.
Data Researchers apply their scientific training, and the tools and techniques they learned in academia, to organizational data. They may have PhDs, and their creative applications of mathematical tools yield valuable insights and products.
http://datacommunitydc.org/blog/2013/06/there-is-more-than-one-kind-of-data-scientist/
Slide35 
Relationships -Four Categories and the Five Skill Groups 
http://datacommunitydc.org/blog/wp-content/uploads/2012/08/SkillsSelfIDMosaic-edit-500px.png
Slide36 
Data Science: Core Components 
Data Science 
Data Architecture 
Tool: Hadoop 
Machine Learning 
Tool: Mahout 
Analytics 
Tool: R
Slide37 
Use-Cases
Slide38 
No one Knows How to Use it
Slide39 
Use-Case Implementation: Techniques Used 
A Problem Dataset 
Analysis Results
Slide40 
Understanding the Machine Learning algorithm to be used 
Implementing Machine Learning in Hadoop on Big Data 
Visualisation of the analysis 
Understanding the problem statement and defining the solution 
Exploring ways to integrate R with Hadoop 
Implementing Machine Learning algorithm in R on the smaller dataset 
Use-Case Implementation: Process Flow Diagram
Slide41 
Domain of the Dataset:
Communications and Media. However, the application of the algorithm is not limited to only Communications and Media. The technique is useful for any domain which requires organizing documents to improve retrieval and support browsing.
Problem Statement:
A top media company wants to browse through the popular news from a collection that appeared on the Reuters newswire in 1987.
Clustering/grouping documents based on their contents will make the analysis easier.
Media Use-Case 
The Reuters-21578 data set composition
Slide42 
Media Use-Case: K-means Clustering 
First we will understand the implementation of the technique in R on a smaller dataset 
Then we will understand how to achieve document clustering on Big Data using Mahout libraries on Hadoop 
K-Means Clustering can be implemented on this dataset 
Communications and Media Dataset to be Clustered based on their contents 
R Implementation
Hadoop Implementation
Machine Learning Implementation
Content-wise Clustered/Grouped documents
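To make the idea of clustering documents by content concrete, here is a minimal K-means sketch in plain Python. It is not the R or Mahout implementation used in the course: the four headlines are invented (they are not taken from Reuters-21578), documents are represented as simple word-count vectors, and the `init` argument (the indices of the starting centroids) is an assumption added so that the run is deterministic.

```python
from collections import Counter


def vectorize(docs):
    """Turn each document into a word-count vector over a shared vocabulary."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    return [[Counter(doc.lower().split())[word] for word in vocab] for doc in docs]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, init, iterations=10):
    """Plain K-means: assign each point to its nearest centroid, then re-average."""
    centroids = [list(points[i]) for i in init]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical headlines, made up for illustration.
docs = [
    "oil prices rise",
    "crude oil prices fall",
    "wheat harvest grows",
    "wheat and corn harvest",
]
labels = kmeans(vectorize(docs), k=2, init=[0, 2])
print(labels)
```

The two oil headlines end up in one cluster and the two harvest headlines in the other. On a real corpus one would normally weight terms (for example with TF-IDF) and choose the starting centroids more carefully.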
Slide43 
Domain of the Dataset:
Products and Retail. However, the application of the algorithm is not limited to only Products and Retail. The technique can be applied wherever we want to discover the co-occurrence relationship amongst various activities.
Problem Statement:
Market Basket Analysis.
A retail outlet wants to understand the purchase behavior of a buyer. This information will enable the retailer to understand the buyer's needs.
The analysis might tell a retailer that customers often purchase shampoo and conditioner together, so putting both items on promotion at the same time would not create a significant increase in profit, while a promotion involving just one of the items would likely drive sales of the other.
Market Basket Use-Case 
Market Basket Analysis 
98% of people who purchased items A and B also purchased item C
Slide44 
Market Basket Use-Case: Association Rule Mining 
Product and Retail Dataset 
Understand the implementation of the technique on a smaller dataset 
Understand how to achieve the same on Big Data using Mahout libraries on Hadoop 
The technique used is Affinity Analysis or Association Rule Mining 
R Implementation
Hadoop Implementation
Machine Learning Implementation
Market Basket Analysis
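A statement such as "98% of people who purchased items A and B also purchased item C" is the confidence of an association rule. The sketch below shows how support and confidence are computed, assuming a tiny set of made-up baskets. It only evaluates a given rule; it does not search for rules the way a full algorithm such as Apriori would.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in transactions if items <= basket)


def support(transactions, items):
    """Fraction of all transactions that contain `items`."""
    return count(transactions, items) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions with `antecedent`, the fraction that also have `consequent`."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)


# Hypothetical baskets, invented for illustration.
baskets = [
    {"shampoo", "conditioner", "soap"},
    {"shampoo", "conditioner"},
    {"shampoo", "soap"},
    {"conditioner", "shampoo", "towel"},
    {"soap", "towel"},
]

print(support(baskets, {"shampoo", "conditioner"}))
print(confidence(baskets, {"shampoo"}, {"conditioner"}))
```

Here shampoo and conditioner appear together in 3 of 5 baskets (support 0.6), and 3 of the 4 shampoo baskets also contain conditioner (confidence 0.75).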
Slide45 
Domain of the Dataset:
Life Science and Health Care. However, the application of the algorithm is not limited to only Life Science and Health Care. The technique can be applied wherever we want to forecast the occurrence of an event on the basis of certain conditions.
Problem Statement:
A health care organization wants to forecast the onset of diabetes mellitus in Indians using a certain set of attributes of patients as input, such as:
Plasma glucose concentration
Diastolic blood pressure
Triceps skin fold thickness
etc.
Health Care Use-Case 
http://www.thenewstribe.com/2013/11/15/diabetes-is-killing-one-patient-every-six-seconds/
Slide46 
Understand the basic implementation of the technique on a smaller dataset using R 
Achieve parallel processing on the same algorithm using a parallel processing library provided by Revolution R. 
Understand how to achieve the same on Big Data using Mahout libraries on Hadoop 
The technique used is predictive modelling (classification) on patient attributes.
R Implementation
Hadoop Implementation
Machine Learning Implementation
Forecast the onset of diabetes mellitus in Indians 
Life Science and Health Care Dataset with some attributes of patients as input. 
Health Care Use-Case: Parallel Processing
Slide47 
Domain of the Dataset:
Social Media. However, the application of the algorithm is not limited to only Social Media. The technique can be applied wherever we want to put documents into categories without going through the contents of all the documents.
Problem Statement:
A social media research firm wants to know the trends of topics discussed on Twitter. For easy analysis it wants to classify them in the following categories:
apparel (clothes, shoes, watches, …)
art (book, DVD, music, …)
camera
event (travel, concert, …)
health (beauty, spa, …)
home (kitchen, furniture, garden, …)
tech (computer, laptop, tablet, …)
http://www.mobigyaan.com/images/stories/Miscellaneous/mobigyaan-twitter-chat.jpg 
Social Media Use-Case
Slide48 
Social Media Use-Case: Naïve Bayes Classifier 
Understand the basic implementation of the technique on a smaller dataset using R. 
Understand how to achieve the same on Big Data using Mahout libraries on Hadoop. 
The technique used is Naïve Bayes Classifier. 
Social Media dataset 
R Implementation
Hadoop Implementation
Machine Learning Implementation
Categorical classification of the tweets
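The classification step can be illustrated with a small multinomial Naïve Bayes classifier with add-one (Laplace) smoothing. This is a sketch in plain Python, not the R or Mahout implementation from the course. The four training tweets are invented, and only two of the categories listed earlier (tech and health) are used to keep the example short.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per class, documents per class, and the overall vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        words = text.lower().split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab


def classify(text, model):
    """Return the class with the highest log prior plus summed log likelihoods."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the probability.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training tweets, made up for illustration.
training = [
    ("new laptop and tablet deals", "tech"),
    ("my laptop battery died", "tech"),
    ("relaxing spa and beauty day", "health"),
    ("beauty tips for healthy skin", "health"),
]
model = train(training)
print(classify("cheap tablet and laptop", model))
print(classify("spa day", model))
```

The first tweet is labelled tech and the second health, because the words they contain are more frequent in the training tweets of those classes.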
Slide49 
Going forward with the class, we will throw some light on the concepts of Hadoop, R and Machine Learning respectively.
These topics will be covered in detail in their respective modules during the course.
Data Science: Core Components
Slide50 
Introduction to Hadoop
Slide51 
Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is an open-source data management framework with scale-out storage and distributed processing.
In 2004, Google published a paper on a process called MapReduce.
The MapReduce framework provides a parallel processing model and an associated implementation to process huge amounts of data.
An implementation of the MapReduce framework was then adopted by an Apache open source project named Hadoop.
Introduction to Hadoop
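The MapReduce processing model mentioned on this slide can be sketched with the classic word-count example. The code below runs in a single Python process and only imitates the three phases (map, shuffle, reduce). It does not use Hadoop, and the function names are illustrative rather than part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "Big Hadoop"]))
```

On a real cluster, each call to the map function can run on a different machine next to its slice of the data, and the framework performs the shuffle across the network before the reducers run.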
Slide52 
Hadoop Key Characteristics 
Scalable 
Reliable 
Economical 
Flexible 
Robust Ecosystem 
Slide53 
Hadoop Core Components
[Diagram: an admin node running the Name Node (HDFS cluster) and the Job Tracker (MapReduce engine), and four worker nodes, each running a Data Node and a Task Tracker]
Slide54 
Hadoop is a framework that allows for the distributed processing of: 
-Small Data Sets 
-Large Data Sets 
Annie’s Question
Slide55 
Large Data Sets. Hadoop is also capable of processing small data sets; however, to experience its true power one needs data in the terabyte range, because this is where an RDBMS can take hours or fail, whereas Hadoop can finish the same job in a couple of minutes.
Annie’s Answer
Slide56 
To set up Hadoop on your system, you can follow the “Hadoop Installation Guide” present in the LMS.
Slide57 
Analytics with R
Slide58 
Analytics with R 
http://www.r-project.org/
Slide59 
R: Characteristics
R is open source and free.
R has lots of packages and multiple ways of doing the same thing.
By default, R stores data in RAM.
R has the most advanced graphics, but you need much better programming skills.
R has GUIs to help make learning easier.
Customization needs the command line.
R can connect to many databases and data types.
Slide60 
Comparing R and Others
http://r4stats.com/articles/popularity/
Slide61 
Comparing R with Base SAS* / SAS Stat*
*Copyright © 2012 SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513, USA. All rights reserved.
R
Base SAS* / SAS Stat*
R is open source and free
Base SAS*, SAS/Stat*, SAS/ETS*, SAS/OR*, SAS/Graph* are relatively expensive because of annual licenses
Open source R has support from email lists, Twitter, Stack Overflow
SAS Institute* products have dedicated support and extensive documentation
R is slower on the desktop than Base SAS for datasets of ~4-5 GB
By default R stores data in RAM, so we can use the cloud
R has much better graphics
You need much better programming skills
You can create custom functions in R easily
Customization needs the command line
R has multiple GUIs that are free
SAS GUIs are more expensive
Slide62 
Annie’s Question 
R provides support in terms of:
1. Dedicated support and documentation
2. Email lists, Twitter, etc.
Slide63 
Annie’s Answer 
Answer: 
2. Email-lists, twitter, etc.
Slide64 
Annie’s Question 
Custom functions can be easily created in:
1. SAS
2. R
Slide65 
Annie’s Answer 
Answer: 
2. R
Slide66 
Annie’s Question 
Most of the functions in R are written in : 
-Java 
-R 
-C 
-Fortran
Slide67 
Annie’s Answer 
Most of the user-visible functions in R are written in R. 
It is possible for the user to interface to procedures written in the C, C++, or FORTRAN languages for efficiency.
Slide68 
Introduction to R Programming language 
www.r-project.org/about.html 
History 
Evolution 
Current State 
Open Source 
Free 
Widely Recognized 
Official Website 
R Core 
Creators 
R Journal
Slide69 
R and Hadoop Integration 
R and Hadoop are a natural match in Big Data analytics and visualization.
One of the most well-known R packages to support Hadoop functionalities is RHadoop.
RHadoop was developed by Revolution Analytics.
RHadoop is a collection of three R packages: rmr, rhdfs and rhbase.
The rmr package provides Hadoop MapReduce functionality in R, rhdfs provides HDFS file management in R, and rhbase provides HBase database management from within R.
Slide70 
To set up R on your system, you can follow the “R Installation Guide” present in the LMS under module 1.
Slide71 
Machine Learning
Slide72 
Machine Learning: Mahout 
Machine Learning is a class of algorithms which is data-driven, i.e. unlike "normal" algorithms, it is the data that "tells" what the "good answer" is.
Example:
A hypothetical non-machine-learning algorithm for face recognition in images would try to define what a face is (a round, skin-coloured disk, with dark areas where you expect the eyes, etc.).
A machine learning algorithm would not have such a coded definition, but will "learn by examples": you'll show several images of faces and not-faces, and a good algorithm will eventually learn and be able to predict whether or not an unseen image is a face.
http://endthelie.com/2012/08/24/fbi-sharing-facial-recognition-software-with-police-departments-across-america/
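The "learn by examples" idea can be shown with one of the simplest learning methods, a 1-nearest-neighbour classifier. This is only a sketch: real face recognition works on image pixels, whereas here each image is assumed to have been reduced to two made-up numeric features (labelled "roundness" and "skin-tone share" in the comments), and all values are invented.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor(examples, query):
    """Predict the label of the labelled example closest to the query."""
    _, label = min(examples, key=lambda example: sq_dist(example[0], query))
    return label


# Hypothetical labelled examples: (roundness, skin-tone share) -> label.
examples = [
    ((0.9, 0.8), "face"),
    ((0.8, 0.9), "face"),
    ((0.1, 0.2), "not-face"),
    ((0.2, 0.1), "not-face"),
]

print(nearest_neighbor(examples, (0.85, 0.7)))
print(nearest_neighbor(examples, (0.15, 0.3)))
```

No rule describing a face appears anywhere in the code. The prediction comes entirely from the labelled examples, so showing the classifier different examples changes its answers.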
Slide73
Mahout Overview
Mahout is about scalable Machine Learning 
Mahout has functionality for many of today’s common machine learning tasks 
Machine Learning is all over the web today 
MapReduce magic in action
Slide74 
Hadoop and MapReduce magic in action 
Write intelligent applications using Apache Mahout 
https://cwiki.apache.org/confluence/display/MAHOUT/Powered+By+Mahout 
LinkedIn Recommendations 
Machine Learning: LinkedIn Recommendations
Slide75 
Annie’s Question 
Mahout Algorithms for clustering, classification and collaborative filtering are implemented on top of Apache Hadoop using : 
-Flume 
-MapReduce 
-Sqoop 
-Hive
Slide76 
Annie’s Answer 
Mahout Algorithms are implemented on top of Apache Hadoop using the Map/Reduce paradigm.
Slide77 
1. Install R with the help of the “R Installation Steps” guide in the LMS. This is a step-wise guide which will help you in installing and setting up R on your system.
Assignment
Slide78 
Agenda for Next Class
In the next class you will be able to:
Understand what R is
Describe why R is used
Implement R programming concepts
Learn data import techniques
Analyze the processing of data
Slide79 
Pre-work 
Go through the “R Essentials for Data Science” section in the LMS. Watch the recordings present in the section to gain an understanding of the R environment.
Slide80 
What’s Within the LMS?
Slide81 
What’s Within the LMS? 
Recording of the Class 
Presentation 
Quiz
Slide82 
What’s Within the LMS? 
Assignment 
Installation Guide 
Pre-work
Slide83 
References 
http://www.today.mccombs.utexas.edu/2012/04/the-big-data-machine 
http://www.espncricinfo.com/ 
http://www.majorprojects.vic.gov.au/our-projects/our-past-projects/austin-hospital 
http://wp.streetwise.co/wp-content/uploads/2012/08/Amazon-Recommendations.png 
http://smhttp.23575.nexcesscdn.net/80ABE1/sbmedia/blog/wp-content/uploads/2013/03/netflix-in-asia.png
http://www.crowdsourcing.org/article/-nasa-tries-to-free-creativity-with-big-data-challenge/19984
http://whatsthebigdata.files.wordpress.com/2013/11/batman-on-big-data.jpg
http://spinnakr.com/blog/wp-content/uploads/2013/08/Using-Big-Data-.jpg
http://thesocietypages.org/sociologylens/files/2013/09/BIgDataDilbert_Cartoon.jpg
http://abstrusegoose.com/55
http://www.thenewstribe.com/2013/11/15/diabetes-is-killing-one-patient-every-six-seconds/
http://www.mobigyaan.com/images/stories/Miscellaneous/mobigyaan-twitter-chat.jpg
http://www.r-project.org/
http://endthelie.com/2012/08/24/fbi-sharing-facial-recognition-software-with-police-departments-across-america/