Big data Analytics
Introduction
• Unstructured data spans many different types of data.
• "Unstructured data" is a generic label for data that is not contained in a database or some other type of data structure.
• Unstructured data covers almost any kind of content and is present everywhere.
• More than 90% of social media data is unstructured.
Importance of Unstructured Data
• Every minute, more than 6,000 pictures are shared on social media sites and more than 200 million emails are sent.
• Analyzing social content such as tweets, Facebook posts, and transcripts of support calls gives a clear view of how customers perceive value and issues.
• Unstructured data isn't well organized or easy to access, but companies that analyze this data and integrate it into their information management landscape can significantly improve employee productivity.
Examples of Unstructured Data
• Email messages, word processing documents, videos, photos, audio files, presentations, and web pages.
• Other examples may include books, journals, documents, metadata, health records, audio, video, analog data, images, and files.
Influence of Unstructured Data on Social Media
• Social media needs to be part of the business strategy, as a channel for interacting with clients and customers.
• Relevant statistics include the number of Twitter followers, Facebook likes, LinkedIn connections, and blog subscribers.
• Social media platforms such as Facebook are growing enormously, along with the massive amount of unstructured data they collect.
• Twitter sees about 175 million tweets each day and has more than 465 million accounts.
Technologies
• Data mining
• Pattern recognition
• Operations research
• Social network analytics (Facebook, Twitter, LinkedIn)
• Natural language processing
Tools to analyze
• R language
• RapidMiner
• Weka
• Hadoop
• Python
RapidMiner
• RapidMiner provides an integrated environment for machine learning, data mining, text mining, and predictive analytics.
• It is a powerful tool with an easy-to-use, intuitive graphical interface for designing analytic processes.
• It is written in Java.
• It runs on all major platforms and operating systems.
• It saves time by identifying possible errors and suggesting quick fixes.
• It supports .csv, Excel, and binary files.
RapidMiner
Data imported from CSV and Excel files.
Statistics and charts.
Weka
• Weka is a collection of machine learning algorithms.
• It contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
• It is written in Java and runs on almost any platform.
• It offers a large collection of different data mining algorithms.
Python
• Python can be connected to R by installing the “Rserve” package.
• It is a high-level language that is easy to read.
• It is free and open source, and runs on all platforms.
R language
• R is a very effective statistical tool and well worth the effort to learn.
• R is polymorphic, which means that the same function can be applied to different types of objects.
• R has more than 4,000 packages covering various specializations, available from multiple repositories such as CRAN (Comprehensive R Archive Network).
• R can import data from CSV, Excel, and SAS files and produce output in PDF, JPG, and PNG formats as well as tables.
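The polymorphism point can be illustrated with a small sketch. R's generic functions such as summary() and print() pick their behavior from the type of the object they receive. The example below shows the same idea in Python rather than R, using functools.singledispatch; the function name summarize and the sample inputs are hypothetical and chosen only for illustration.

```python
from functools import singledispatch


@singledispatch
def summarize(obj):
    """Fallback used when no more specific version is registered."""
    return f"object of type {type(obj).__name__}"


@summarize.register
def _(obj: list):
    # Numeric vector: report the length and the mean.
    if not obj:
        return "empty vector"
    return f"{len(obj)} values, mean {sum(obj) / len(obj):.2f}"


@summarize.register
def _(obj: str):
    # Free text: report the word count.
    return f"text with {len(obj.split())} words"


@summarize.register
def _(obj: dict):
    # Table-like data: report the column names in sorted order.
    return f"table with columns {sorted(obj)}"


print(summarize([1, 2, 3]))
print(summarize("good claim service"))
print(summarize({"premium": [100, 120], "age": [30, 45]}))
```

One call site, summarize(x), covers vectors, text, and tables, which is the convenience the slide attributes to R.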
R language
Working with RStudio: loading packages and extracting tweets.
Unstructured Data Analysis for Motor Insurance
• Extracting data from social media related to the motor insurance sector.
• Searching by company names and keywords.
• Getting tweets from Twitter and analyzing the data.
• Sentiment analysis.
• User interface.
• What type of insurance can be offered, and can any fraud be detected?
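The keyword step in the list above can be sketched as a simple filter. This is a minimal illustration in Python, not the authors' implementation: the function names, the sample tweets, and the company name "ExampleInsure" are all hypothetical, and real tweets would come from the Twitter API rather than a hard-coded list.

```python
def matches_keywords(text, keywords):
    """Return True if any keyword occurs in the text, ignoring case."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def filter_tweets(tweets, keywords):
    """Keep only the tweets that mention at least one keyword."""
    return [tweet for tweet in tweets if matches_keywords(tweet, keywords)]


# Hypothetical sample data standing in for downloaded tweets.
sample_tweets = [
    "My motor insurance claim was settled in two days",
    "Lovely weather for a drive today",
    "ExampleInsure raised my car insurance premium again",
]
keywords = ["motor insurance", "car insurance", "ExampleInsure"]

relevant = filter_tweets(sample_tweets, keywords)
print(relevant)
```

Plain substring matching is the simplest choice; it will miss misspellings and hashtags written as one word, so a real pipeline would likely need more careful text normalization.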
Extracting data from Twitter using R
• A Twitter app must be created first; it provides these credentials:
• api_key
• api_token
• access_token
• access_secret
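Whatever client is used, the four credentials are best kept out of the script itself. The sketch below, in Python rather than R, loads them from environment variables and fails early if one is missing. The field names follow the slide; the TWITTER_* variable names, the TwitterCredentials class, and the load_credentials function are assumptions of this sketch, not part of any Twitter library.

```python
import os
from dataclasses import dataclass

# Credential names as listed on the slide.
REQUIRED_KEYS = ("api_key", "api_token", "access_token", "access_secret")


@dataclass(frozen=True)
class TwitterCredentials:
    api_key: str
    api_token: str
    access_token: str
    access_secret: str


def load_credentials(env=None):
    """Read the credentials from a mapping (defaults to os.environ).

    The TWITTER_* variable names are an assumption of this sketch.
    """
    env = os.environ if env is None else env
    values = {key: env.get("TWITTER_" + key.upper(), "") for key in REQUIRED_KEYS}
    missing = [key for key, value in values.items() if not value]
    if missing:
        raise ValueError("missing Twitter credentials: " + ", ".join(missing))
    return TwitterCredentials(**values)
```

A call such as load_credentials() then returns an immutable object that can be passed to the code performing the authenticated requests.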
Sentiment Analysis from Twitter
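The slide does not state which sentiment method was used. One common approach is lexicon-based scoring: count the positive and negative words in each tweet and take the difference. The sketch below shows that idea in Python under that assumption; the word lists are tiny hypothetical samples, and sentiment_score and label are illustrative names.

```python
import re

# Tiny illustrative word lists; a real analysis would use a full opinion lexicon.
POSITIVE_WORDS = {"good", "great", "fast", "helpful", "happy", "excellent"}
NEGATIVE_WORDS = {"bad", "slow", "poor", "rejected", "angry", "terrible"}


def sentiment_score(text):
    """Return (number of positive words) - (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in [
    "Great service, fast and helpful claim handling",
    "Terrible experience, my claim was rejected",
    "I renewed my policy today",
]:
    print(label(sentiment_score(tweet)), "|", tweet)
```

Word counting ignores negation and sarcasm ("not good" scores as positive), so the labels are only a rough first pass.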
Comparison between data mining tools
| Characteristic | R | RapidMiner | Weka |
| --- | --- | --- | --- |
| Purpose | Statistics, clustering, and analytics | Data mining, classification | Data mining, association |
| Data import | .xlsx, .csv, RODBC, .txt | .csv, .xlsx, binary files | .csv, .arff |
| Specialization | Has a large number of users in the fields of bioinformatics and social science | Specialized for business solutions that include predictive analysis and statistical computing | Best suited for mining association rules and machine learning techniques |
| Advantages | Purely statistical | Visualization, parameter optimization | Ease of use and machine learning |