SlideShare a Scribd company logo
NADAR SARSWATHI COLLEGE
OF ARTS AND SCIENCE
THENI
DEPARTMENT OF INFORMATION
TECHNOLOGY
INTERNET OF THINGS
USING HADOOP MAPREDUCE FOR BATCH DATA
ANALYSIS
BY:
S. SABTHAMI
II. M.Sc(IT)
What is MapReduce?
• Terms are borrowed from Functional Language (e.g., Lisp)
Sum of squares:
• (map square ‘(1 2 3 4))
– Output: (1 4 9 16)
[processes each record sequentially and independently]
• (reduce + ‘(1 4 9 16))
– (+ 16 (+ 9 (+ 4 1) ) )
– Output: 30
[processes set of all records in batches]
Map
• Process individual records to generate
intermediate key/value pairs.
Map
• Process individual records to generate
intermediate key/value pairs.
• Parallelly Process a large number of
individual records to generate intermediate
key/value pairs.

Recommended for you

MapReduce.pptx
MapReduce.pptxMapReduce.pptx
MapReduce.pptx

This document summarizes the MapReduce programming model and its associated implementation for processing large datasets across distributed systems. It describes how MapReduce allows users to express computations over large datasets in a simple way, while hiding the complexity of parallelization, fault-tolerance, and data distribution. The core abstractions of Map and Reduce are explained, along with how an implementation like Google's leverages a distributed file system to parallelize tasks across clusters and provide fault-tolerance through replication.

map reduce
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analytics

1) Stratosphere is a distributed data processing system that extends the MapReduce model by supporting more operators and advanced data flow graphs composed of operators. 2) It has components like a query parser, compiler, and optimizer that translate queries into execution plans composed of operators like Map, Reduce, Join, Cross, CoGroup, and Union. 3) Stratosphere supports arbitrary data flows while MapReduce only supports MapReduce, and Stratosphere has better performance through in-memory processing and pipelining compared to MapReduce which always writes to disk.

Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.

Abstract: The presentation describes - What is the BigData problem - How Hadoop helps to solve BigData problems - The main principles of the Hadoop architecture as a distributed computational platform - History and definition of the MapReduce computational model - Practical examples of how to write MapReduce programs and run them on Hadoop clusters The talk is targeted to a wide audience of engineers who do not have experience using Hadoop.

apache hadoopmapreducedistributed computing
Reduce
• Reduce processes and merges all intermediate
values associated per key
• Each key assigned to one Reduce
• Parallelly Processes and merges all
intermediate values by partitioning keys
Some Applications of MapReduce
Distributed Grep:
– Input: large set of files
– Output: lines that match pattern
– Map – Emits a line if it matches the supplied pattern
– Reduce – Copies the intermediate data to output
Some Applications of MapReduce
(2)
Reverse Web-Link Graph
– Input: Web graph: tuples (a, b) where (page a  page b)
– Output: For each page, list of pages that link to it
– Map – process web log and for each input <source, target>, it outputs
<target, source>
– Reduce - emits <target, list(source)>
Some Applications of MapReduce
(3)
Count of URL access frequency
– Input: Log of accessed URLs, e.g., from proxy server
– Output: For each URL, % of total accesses for that URL
– Map – Process web log and outputs <URL, 1>
– Multiple Reducers - Emits <URL, URL_count>
(So far, like Wordcount. But still need %)
– Chain another MapReduce job after above one
– Map – Processes <URL, URL_count> and outputs <1,
(<URL, URL_count> )>
�� 1 Reducer – Sums up URL_count’s to calculate
overall_count.
Emits multiple <URL, URL_count/overall_count>

Recommended for you

Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture

This presentation discusses the following topics: Introduction Components of Hadoop MapReduce Map Task Reduce Task Anatomy of a Map Reduce

hadoopbig datadata analytics
Hadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigHadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pig

This document provides an overview of MapReduce architecture and components. It discusses how MapReduce processes data using map and reduce tasks on key-value pairs. The JobTracker manages jobs by scheduling tasks on TaskTrackers. Data is partitioned and sorted during the shuffle and sort phase before being processed by reducers. Components like Hive, Pig, partitions, combiners, and HBase are described in the context of how they integrate with and optimize MapReduce processing.

Unit3 MapReduce
Unit3 MapReduceUnit3 MapReduce
Unit3 MapReduce

MapReduce is a programming model used for processing large datasets in a distributed computing environment. It consists of two main tasks - the Map task which converts input data into intermediate key-value pairs, and the Reduce task which combines these intermediate pairs into a smaller set of output pairs. The MapReduce framework operates on input and output in the form of key-value pairs, with the keys and values implemented as serializable Java objects. It divides jobs into map and reduce tasks executed in parallel on a cluster, with a JobTracker coordinating task assignment and tracking progress.

mapreducemappersplitter
Some Applications of MapReduce
(4)
Map task’s output is sorted (e.g., quicksort)
Reduce task’s input is sorted (e.g., mergesort)
Sort
– Input: Series of (key, value) pairs
– Output: Sorted <value>s
– Map – <key, value>  <value, _> (identity)
– Reducer – <key, value>  <key, value> (identity)
– Partitioning function – partition keys across reducers based
on ranges (can’t use hashing!)
• Take data distribution into account to balance reducer
tasks
Programming MapReduce
Externally: For user
1. Write a Map program (short), write a Reduce program (short)
2. Specify number of Maps and Reduces (parallelism level)
3. Submit job; wait for result
4. Need to know very little about parallel/distributed programming!
Internally: For the Paradigm and Scheduler
1. Parallelize Map
2. Transfer data from Map to Reduce
3. Parallelize Reduce
4. Implement Storage for Map input, Map output, Reduce input, and
Reduce output
(Ensure that no Reduce starts before all Maps are finished. That is,
ensure the barrier between the Map phase and Reduce phase)
Inside MapReduce
1. Parallelize Map: easy! each map task is independent of
the other!
• All Map output records with same key assigned to
same Reduce
2. Transfer data from Map to Reduce:
• All Map output records with same key assigned to
same Reduce task
• use partitioning function, e.g., hash(key)%number of
reducers
iot.pptx

Recommended for you

Meethadoop
MeethadoopMeethadoop
Meethadoop

Hadoop is an open source framework for running large-scale data processing jobs across clusters of computers. It has two main components: HDFS for reliable storage and Hadoop MapReduce for distributed processing. HDFS stores large files across nodes through replication and uses a master-slave architecture. MapReduce allows users to write map and reduce functions to process large datasets in parallel and generate results. Hadoop has seen widespread adoption for processing massive datasets due to its scalability, reliability and ease of use.

Lecture 2 part 3
Lecture 2 part 3Lecture 2 part 3
Lecture 2 part 3

The document discusses key concepts related to Hadoop including its components like HDFS, MapReduce, Pig, Hive, and HBase. It provides explanations of HDFS architecture and functions, how MapReduce works through map and reduce phases, and how higher-level tools like Pig and Hive allow for more simplified programming compared to raw MapReduce. The summary also mentions that HBase is a NoSQL database that provides fast random access to large datasets on Hadoop, while HCatalog provides a relational abstraction layer for HDFS data.

Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce

The document provides an introduction to MapReduce, including: - MapReduce is a framework for executing parallel algorithms across large datasets using commodity computers. It is based on map and reduce functions. - Mappers process input key-value pairs in parallel, and outputs are sorted and grouped by the reducers. - Examples demonstrate how MapReduce can be used for tasks like building indexes, joins, and iterative algorithms.

More Related Content

Similar to iot.pptx

Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index construction
Subhas Kumar Ghosh
 
Hadoop
HadoopHadoop
Big Data Analytics with Hadoop with @techmilind
Big Data Analytics with Hadoop with @techmilindBig Data Analytics with Hadoop with @techmilind
Big Data Analytics with Hadoop with @techmilind
EMC
 
MapReduce.pptx
MapReduce.pptxMapReduce.pptx
MapReduce.pptx
AtulYadav218546
 
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analytics
Avinash Pandu
 
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Konstantin V. Shvachko
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
Dr. C.V. Suresh Babu
 
Hadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigHadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pig
KhanKhaja1
 
Unit3 MapReduce
Unit3 MapReduceUnit3 MapReduce
Meethadoop
MeethadoopMeethadoop
Meethadoop
IIIT-H
 
Lecture 2 part 3
Lecture 2 part 3Lecture 2 part 3
Lecture 2 part 3
Jazan University
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
Chicago Hadoop Users Group
 
Large Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part ILarge Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part I
Marin Dimitrov
 
Hadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupHadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User Group
Csaba Toth
 
Hadoop classes in mumbai
Hadoop classes in mumbaiHadoop classes in mumbai
Hadoop classes in mumbai
Vibrant Technologies & Computers
 
Introduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdfIntroduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdf
BikalAdhikari4
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
Sri Prasanna
 
Scala and spark
Scala and sparkScala and spark
Scala and spark
Fabio Fumarola
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
ateeq ateeq
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
Paul Chao
 

Similar to iot.pptx (20)

Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index construction
 
Hadoop
HadoopHadoop
Hadoop
 
Big Data Analytics with Hadoop with @techmilind
Big Data Analytics with Hadoop with @techmilindBig Data Analytics with Hadoop with @techmilind
Big Data Analytics with Hadoop with @techmilind
 
MapReduce.pptx
MapReduce.pptxMapReduce.pptx
MapReduce.pptx
 
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analytics
 
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
Hadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigHadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pig
 
Unit3 MapReduce
Unit3 MapReduceUnit3 MapReduce
Unit3 MapReduce
 
Meethadoop
MeethadoopMeethadoop
Meethadoop
 
Lecture 2 part 3
Lecture 2 part 3Lecture 2 part 3
Lecture 2 part 3
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
Large Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part ILarge Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part I
 
Hadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupHadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User Group
 
Hadoop classes in mumbai
Hadoop classes in mumbaiHadoop classes in mumbai
Hadoop classes in mumbai
 
Introduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdfIntroduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdf
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Scala and spark
Scala and sparkScala and spark
Scala and spark
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
 

More from SabthamiS1

women%20empowerment11.pptx
women%20empowerment11.pptxwomen%20empowerment11.pptx
women%20empowerment11.pptx
SabthamiS1
 
big data analytics.pptx
big data analytics.pptxbig data analytics.pptx
big data analytics.pptx
SabthamiS1
 
dip.pptx
dip.pptxdip.pptx
dip.pptx
SabthamiS1
 
csc.pptx
csc.pptxcsc.pptx
csc.pptx
SabthamiS1
 
python.pptx
python.pptxpython.pptx
python.pptx
SabthamiS1
 
Data minig.pptx
Data minig.pptxData minig.pptx
Data minig.pptx
SabthamiS1
 
artificial intelligence.pptx
artificial intelligence.pptxartificial intelligence.pptx
artificial intelligence.pptx
SabthamiS1
 
distributed computing.pptx
distributed computing.pptxdistributed computing.pptx
distributed computing.pptx
SabthamiS1
 
Network and internet security
Network and internet security Network and internet security
Network and internet security
SabthamiS1
 
Java
Java Java
Java
SabthamiS1
 
Advance computer architecture
Advance computer architecture Advance computer architecture
Advance computer architecture
SabthamiS1
 
Data structure and algorithm
Data structure and algorithmData structure and algorithm
Data structure and algorithm
SabthamiS1
 

More from SabthamiS1 (12)

women%20empowerment11.pptx
women%20empowerment11.pptxwomen%20empowerment11.pptx
women%20empowerment11.pptx
 
big data analytics.pptx
big data analytics.pptxbig data analytics.pptx
big data analytics.pptx
 
dip.pptx
dip.pptxdip.pptx
dip.pptx
 
csc.pptx
csc.pptxcsc.pptx
csc.pptx
 
python.pptx
python.pptxpython.pptx
python.pptx
 
Data minig.pptx
Data minig.pptxData minig.pptx
Data minig.pptx
 
artificial intelligence.pptx
artificial intelligence.pptxartificial intelligence.pptx
artificial intelligence.pptx
 
distributed computing.pptx
distributed computing.pptxdistributed computing.pptx
distributed computing.pptx
 
Network and internet security
Network and internet security Network and internet security
Network and internet security
 
Java
Java Java
Java
 
Advance computer architecture
Advance computer architecture Advance computer architecture
Advance computer architecture
 
Data structure and algorithm
Data structure and algorithmData structure and algorithm
Data structure and algorithm
 

Recently uploaded

AI Risk Management: ISO/IEC 42001, the EU AI Act, and ISO/IEC 23894
AI Risk Management: ISO/IEC 42001, the EU AI Act, and ISO/IEC 23894AI Risk Management: ISO/IEC 42001, the EU AI Act, and ISO/IEC 23894
AI Risk Management: ISO/IEC 42001, the EU AI Act, and ISO/IEC 23894
PECB
 
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
siemaillard
 
How to Add Colour Kanban Records in Odoo 17 Notebook
How to Add Colour Kanban Records in Odoo 17 NotebookHow to Add Colour Kanban Records in Odoo 17 Notebook
How to Add Colour Kanban Records in Odoo 17 Notebook
Celine George
 
The Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdf
The Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdfThe Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdf
The Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdf
JackieSparrow3
 
Unlocking Educational Synergy-DIKSHA & Google Classroom.pptx
Unlocking Educational Synergy-DIKSHA & Google Classroom.pptxUnlocking Educational Synergy-DIKSHA & Google Classroom.pptx
Unlocking Educational Synergy-DIKSHA & Google Classroom.pptx
bipin95
 
No, it's not a robot: prompt writing for investigative journalism
No, it's not a robot: prompt writing for investigative journalismNo, it's not a robot: prompt writing for investigative journalism
No, it's not a robot: prompt writing for investigative journalism
Paul Bradshaw
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptx
heathfieldcps1
 
AI_in_HR_Presentation Part 1 2024 0703.pdf
AI_in_HR_Presentation Part 1 2024 0703.pdfAI_in_HR_Presentation Part 1 2024 0703.pdf
AI_in_HR_Presentation Part 1 2024 0703.pdf
SrimanigandanMadurai
 
2024 KWL Back 2 School Summer Conference
2024 KWL Back 2 School Summer Conference2024 KWL Back 2 School Summer Conference
2024 KWL Back 2 School Summer Conference
KlettWorldLanguages
 
SYBCOM SEM III UNIT 1 INTRODUCTION TO ADVERTISING
SYBCOM SEM III UNIT 1 INTRODUCTION TO ADVERTISINGSYBCOM SEM III UNIT 1 INTRODUCTION TO ADVERTISING
SYBCOM SEM III UNIT 1 INTRODUCTION TO ADVERTISING
Dr Vijay Vishwakarma
 
NAEYC Code of Ethical Conduct Resource Book
NAEYC Code of Ethical Conduct Resource BookNAEYC Code of Ethical Conduct Resource Book
NAEYC Code of Ethical Conduct Resource Book
lakitawilson
 
Lecture_Notes_Unit4_Chapter_8_9_10_RDBMS for the students affiliated by alaga...
Lecture_Notes_Unit4_Chapter_8_9_10_RDBMS for the students affiliated by alaga...Lecture_Notes_Unit4_Chapter_8_9_10_RDBMS for the students affiliated by alaga...
Lecture_Notes_Unit4_Chapter_8_9_10_RDBMS for the students affiliated by alaga...
Murugan Solaiyappan
 
NLC English 7 Consolidation Lesson plan for teacher
NLC English 7 Consolidation Lesson plan for teacherNLC English 7 Consolidation Lesson plan for teacher
NLC English 7 Consolidation Lesson plan for teacher
AngelicaLubrica
 
ENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUM
ENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUMENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUM
ENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUM
HappieMontevirgenCas
 
National Learning Camp( Reading Intervention for grade1)
National Learning Camp( Reading Intervention for grade1)National Learning Camp( Reading Intervention for grade1)
National Learning Camp( Reading Intervention for grade1)
SaadaGrijaldo1
 
How to Show Sample Data in Tree and Kanban View in Odoo 17
How to Show Sample Data in Tree and Kanban View in Odoo 17How to Show Sample Data in Tree and Kanban View in Odoo 17
How to Show Sample Data in Tree and Kanban View in Odoo 17
Celine George
 
Bedok NEWater Photostory - COM322 Assessment (Story 2)
Bedok NEWater Photostory - COM322 Assessment (Story 2)Bedok NEWater Photostory - COM322 Assessment (Story 2)
Bedok NEWater Photostory - COM322 Assessment (Story 2)
Liyana Rozaini
 
NLC Grade 3.................................... ppt.pptx
NLC Grade 3.................................... ppt.pptxNLC Grade 3.................................... ppt.pptx
NLC Grade 3.................................... ppt.pptx
MichelleDeLaCruz93
 
Configuring Single Sign-On (SSO) via Identity Management | MuleSoft Mysore Me...
Configuring Single Sign-On (SSO) via Identity Management | MuleSoft Mysore Me...Configuring Single Sign-On (SSO) via Identity Management | MuleSoft Mysore Me...
Configuring Single Sign-On (SSO) via Identity Management | MuleSoft Mysore Me...
MysoreMuleSoftMeetup
 
The membership Module in the Odoo 17 ERP
The membership Module in the Odoo 17 ERPThe membership Module in the Odoo 17 ERP
The membership Module in the Odoo 17 ERP
Celine George
 

Recently uploaded (20)

AI Risk Management: ISO/IEC 42001, the EU AI Act, and ISO/IEC 23894
AI Risk Management: ISO/IEC 42001, the EU AI Act, and ISO/IEC 23894AI Risk Management: ISO/IEC 42001, the EU AI Act, and ISO/IEC 23894
AI Risk Management: ISO/IEC 42001, the EU AI Act, and ISO/IEC 23894
 
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
 
How to Add Colour Kanban Records in Odoo 17 Notebook
How to Add Colour Kanban Records in Odoo 17 NotebookHow to Add Colour Kanban Records in Odoo 17 Notebook
How to Add Colour Kanban Records in Odoo 17 Notebook
 
The Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdf
The Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdfThe Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdf
The Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdf
 
Unlocking Educational Synergy-DIKSHA & Google Classroom.pptx
Unlocking Educational Synergy-DIKSHA & Google Classroom.pptxUnlocking Educational Synergy-DIKSHA & Google Classroom.pptx
Unlocking Educational Synergy-DIKSHA & Google Classroom.pptx
 
No, it's not a robot: prompt writing for investigative journalism
No, it's not a robot: prompt writing for investigative journalismNo, it's not a robot: prompt writing for investigative journalism
No, it's not a robot: prompt writing for investigative journalism
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptx
 
AI_in_HR_Presentation Part 1 2024 0703.pdf
AI_in_HR_Presentation Part 1 2024 0703.pdfAI_in_HR_Presentation Part 1 2024 0703.pdf
AI_in_HR_Presentation Part 1 2024 0703.pdf
 
2024 KWL Back 2 School Summer Conference
2024 KWL Back 2 School Summer Conference2024 KWL Back 2 School Summer Conference
2024 KWL Back 2 School Summer Conference
 
SYBCOM SEM III UNIT 1 INTRODUCTION TO ADVERTISING
SYBCOM SEM III UNIT 1 INTRODUCTION TO ADVERTISINGSYBCOM SEM III UNIT 1 INTRODUCTION TO ADVERTISING
SYBCOM SEM III UNIT 1 INTRODUCTION TO ADVERTISING
 
NAEYC Code of Ethical Conduct Resource Book
NAEYC Code of Ethical Conduct Resource BookNAEYC Code of Ethical Conduct Resource Book
NAEYC Code of Ethical Conduct Resource Book
 
Lecture_Notes_Unit4_Chapter_8_9_10_RDBMS for the students affiliated by alaga...
Lecture_Notes_Unit4_Chapter_8_9_10_RDBMS for the students affiliated by alaga...Lecture_Notes_Unit4_Chapter_8_9_10_RDBMS for the students affiliated by alaga...
Lecture_Notes_Unit4_Chapter_8_9_10_RDBMS for the students affiliated by alaga...
 
NLC English 7 Consolidation Lesson plan for teacher
NLC English 7 Consolidation Lesson plan for teacherNLC English 7 Consolidation Lesson plan for teacher
NLC English 7 Consolidation Lesson plan for teacher
 
ENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUM
ENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUMENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUM
ENGLISH-7-CURRICULUM MAP- MATATAG CURRICULUM
 
National Learning Camp( Reading Intervention for grade1)
National Learning Camp( Reading Intervention for grade1)National Learning Camp( Reading Intervention for grade1)
National Learning Camp( Reading Intervention for grade1)
 
How to Show Sample Data in Tree and Kanban View in Odoo 17
How to Show Sample Data in Tree and Kanban View in Odoo 17How to Show Sample Data in Tree and Kanban View in Odoo 17
How to Show Sample Data in Tree and Kanban View in Odoo 17
 
Bedok NEWater Photostory - COM322 Assessment (Story 2)
Bedok NEWater Photostory - COM322 Assessment (Story 2)Bedok NEWater Photostory - COM322 Assessment (Story 2)
Bedok NEWater Photostory - COM322 Assessment (Story 2)
 
NLC Grade 3.................................... ppt.pptx
NLC Grade 3.................................... ppt.pptxNLC Grade 3.................................... ppt.pptx
NLC Grade 3.................................... ppt.pptx
 
Configuring Single Sign-On (SSO) via Identity Management | MuleSoft Mysore Me...
Configuring Single Sign-On (SSO) via Identity Management | MuleSoft Mysore Me...Configuring Single Sign-On (SSO) via Identity Management | MuleSoft Mysore Me...
Configuring Single Sign-On (SSO) via Identity Management | MuleSoft Mysore Me...
 
The membership Module in the Odoo 17 ERP
The membership Module in the Odoo 17 ERPThe membership Module in the Odoo 17 ERP
The membership Module in the Odoo 17 ERP
 

iot.pptx

  • 1. NADAR SARSWATHI COLLEGE OF ARTS AND SCIENCE THENI DEPARTMENT OF INFORMATION TECHNOLOGY INTERNET OF THINGS USING HADOOP MAPREDUCE FOR BATCH DATA ANALYSIS BY: S. SABTHAMI II. M.Sc(IT)
  • 2. What is MapReduce? • Terms are borrowed from Functional Language (e.g., Lisp) Sum of squares: • (map square ‘(1 2 3 4)) – Output: (1 4 9 16) [processes each record sequentially and independently] • (reduce + ‘(1 4 9 16)) – (+ 16 (+ 9 (+ 4 1) ) ) – Output: 30 [processes set of all records in batches]
  • 3. Map • Process individual records to generate intermediate key/value pairs.
  • 4. Map • Process individual records to generate intermediate key/value pairs. • Parallelly Process a large number of individual records to generate intermediate key/value pairs.
  • 5. Reduce • Reduce processes and merges all intermediate values associated per key • Each key assigned to one Reduce • Parallelly Processes and merges all intermediate values by partitioning keys
  • 6. Some Applications of MapReduce Distributed Grep: – Input: large set of files – Output: lines that match pattern – Map – Emits a line if it matches the supplied pattern – Reduce – Copies the intermediate data to output
  • 7. Some Applications of MapReduce (2) Reverse Web-Link Graph – Input: Web graph: tuples (a, b) where (page a  page b) – Output: For each page, list of pages that link to it – Map – process web log and for each input <source, target>, it outputs <target, source> – Reduce - emits <target, list(source)>
  • 8. Some Applications of MapReduce (3) Count of URL access frequency – Input: Log of accessed URLs, e.g., from proxy server – Output: For each URL, % of total accesses for that URL – Map – Process web log and outputs <URL, 1> – Multiple Reducers - Emits <URL, URL_count> (So far, like Wordcount. But still need %) – Chain another MapReduce job after above one – Map – Processes <URL, URL_count> and outputs <1, (<URL, URL_count> )> – 1 Reducer – Sums up URL_count’s to calculate overall_count. Emits multiple <URL, URL_count/overall_count>
  • 9. Some Applications of MapReduce (4) Map task’s output is sorted (e.g., quicksort) Reduce task’s input is sorted (e.g., mergesort) Sort – Input: Series of (key, value) pairs – Output: Sorted <value>s – Map – <key, value>  <value, _> (identity) – Reducer – <key, value>  <key, value> (identity) – Partitioning function – partition keys across reducers based on ranges (can’t use hashing!) • Take data distribution into account to balance reducer tasks
  • 10. Programming MapReduce Externally: For user 1. Write a Map program (short), write a Reduce program (short) 2. Specify number of Maps and Reduces (parallelism level) 3. Submit job; wait for result 4. Need to know very little about parallel/distributed programming! Internally: For the Paradigm and Scheduler 1. Parallelize Map 2. Transfer data from Map to Reduce 3. Parallelize Reduce 4. Implement Storage for Map input, Map output, Reduce input, and Reduce output (Ensure that no Reduce starts before all Maps are finished. That is, ensure the barrier between the Map phase and Reduce phase)
  • 11. Inside MapReduce 1. Parallelize Map: easy! each map task is independent of the other! • All Map output records with same key assigned to same Reduce 2. Transfer data from Map to Reduce: • All Map output records with same key assigned to same Reduce task • use partitioning function, e.g., hash(key)%number of reducers