Distributed Computing with
Apache Hadoop
Introduction to MapReduce
Konstantin V. Shvachko
Birmingham Big Data Science Group
October 19, 2011
Computing
• The history of computing started a long time ago
• Fascination with numbers
– Vast universe with simple strict rules
– Computing devices
– Crunch numbers
• The Internet
– Universe of words, fuzzy rules
– Different type of computing
– Understand meaning of things
– Human thinking
– Errors & deviations are a part of the study
Computer History Museum, San Jose
Words vs. Numbers
• In 1997 IBM built the Deep Blue supercomputer
– Played chess against the champion G. Kasparov
– Human race was defeated
– Strict rules for chess
– Fast, deep analysis of the current state
– Still numbers
• In 2011 IBM built the Watson computer to play Jeopardy
– Questions and hints in human terms
– Analysis of texts from library and the Internet
– Human champions defeated
Big Data
• Computations that need the power of many computers
– Large datasets: hundreds of TBs, PBs
– Or use of thousands of CPUs in parallel
– Or both
• Cluster as a computer
What is a PB?
1 KB = 1000 Bytes
1 MB = 1000 KB
1 GB = 1000 MB
1 TB = 1000 GB
1 PB = 1000 TB
???? = 1000 PB

Examples – Science
• Fundamental physics: Large Hadron Collider (LHC)
– Smashing high-energy protons at nearly the speed of light
– 1 PB of event data per sec, most filtered out
– 15 PB of data per year
– 150 computing centers around the World
– 160 PB of disk + 90 PB of tape storage
• Math: Big Numbers
– 2 quadrillionth digit of π is 0 (1 quadrillion = 10^15)
– pure CPU workload
– 12 days of cluster time
– 208 years of CPU-time on a cluster with 7600 CPU cores
• Big Data – Big Science
Examples – Web
• Search engine Webmap
– Map of the Internet
– 2008 @ Yahoo, 1500 nodes, 5 PB raw storage
• Internet Search Index
– Traditional application
• Social Network Analysis
– Intelligence
– Trends
The Sorting Problem
• Classic in-memory sorting
– Complexity: number of comparisons
• External sorting
– Cannot load all data in memory
– 16 GB RAM vs. 200 GB file
– Complexity: + disk IOs (bytes read or written)
• Distributed sorting
– Cannot load data on a single server
– 12 drives * 2 TB = 24 TB disk space vs. 200 TB data set
– Complexity: + network transfers
             Worst        Average      Space
Bubble Sort  O(n^2)       O(n^2)       In-place
Quicksort    O(n^2)       O(n log n)   In-place
Merge Sort   O(n log n)   O(n log n)   Double
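As a rough illustration of the external-sorting idea, the sketch below shows only the merge phase: sorted runs are combined while holding one element per run in memory. In-memory lists stand in for run files on disk, and the class and method names (`RunMerger`, `merge`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative merge phase of an external sort; lists stand in for sorted runs on disk. */
final class RunMerger {

    /** The smallest unread element of one run, plus the rest of that run. */
    private static final class Head implements Comparable<Head> {
        final int value;
        final Iterator<Integer> rest;

        Head(int value, Iterator<Integer> rest) {
            this.value = value;
            this.rest = rest;
        }

        @Override
        public int compareTo(Head other) {
            return Integer.compare(value, other.value);
        }
    }

    /** Merges already-sorted runs; the heap never holds more than one element per run. */
    static List<Integer> merge(List<List<Integer>> sortedRuns) {
        PriorityQueue<Head> heap = new PriorityQueue<>();
        for (List<Integer> run : sortedRuns) {
            Iterator<Integer> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            out.add(smallest.value);
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> runs = Arrays.asList(
            Arrays.asList(1, 4, 9), Arrays.asList(2, 3, 10), Arrays.asList(5));
        System.out.println(merge(runs)); // [1, 2, 3, 4, 5, 9, 10]
    }
}
```

In a real external sort each run would be read from and written to disk in blocks, which is where the disk IO cost in the list above comes from.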
What do we do?
• Need a lot of computers
• How to make them work together

Hadoop
• Apache Hadoop is an ecosystem of tools for processing “Big Data”
• Started in 2005 by D. Cutting and M. Cafarella
• Consists of two main components that provide a unified view of the cluster
1. HDFS – a distributed file system
– File system API connecting thousands of drives
2. MapReduce – a framework for distributed computations
– Splitting jobs into parts executable on one node
– Scheduling and monitoring of job execution
• Widely used today: becoming a standard of distributed computing
• Hadoop is an open source project
MapReduce
• Introduced in 2004 by Jeffrey Dean and Sanjay Ghemawat (Google)
– “MapReduce: Simplified Data Processing on Large Clusters”
• Computational model
– What is a computational model?
• Examples: Turing machine, Java
– Split large input data into small enough pieces, process in parallel
• Execution framework
– Compilers, interpreters
– Scheduling, Processing, Coordination
– Failure recovery
Functional Programming
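• Map is a higher-order function
– Applies a given function to each element of a list
– Returns the list of results
• Map( f(x), X[1:n] ) -> [ f(X[1]), …, f(X[n]) ]
• Example. Map( x^2, [0,1,2,3,4,5] ) = [0,1,4,9,16,25]
• Reduce / fold is a higher-order function
– Iterates a given function over a list of elements
– Applies the function to the previous result and the current element
– Returns a single result
• Example. Reduce( x + y, [0,1,2,3,4,5] ) = (((((0 + 1) + 2) + 3) + 4) + 5) = 15
• Reduce( x * y, [0,1,2,3,4,5] ) = 0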
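A minimal Java sketch of map as a higher-order function, together with the matching left fold (reduce). The helper names `map` and `reduce` in the hypothetical class `HigherOrder` are illustrative only, and `reduce` assumes a non-empty list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Illustrative higher-order functions over lists. */
final class HigherOrder {

    /** map: applies f to every element and returns the list of results. */
    static <A, B> List<B> map(Function<A, B> f, List<A> xs) {
        return xs.stream().map(f).collect(Collectors.toList());
    }

    /** reduce (left fold): combines the running result with each element in turn. */
    static <A> A reduce(BinaryOperator<A> op, List<A> xs) {
        A acc = xs.get(0); // assumes xs is non-empty
        for (A x : xs.subList(1, xs.size())) {
            acc = op.apply(acc, x);
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(0, 1, 2, 3, 4, 5);
        List<Integer> squares = map(x -> x * x, xs);
        Integer sum = reduce((x, y) -> x + y, xs);
        Integer product = reduce((x, y) -> x * y, xs);
        System.out.println(squares); // [0, 1, 4, 9, 16, 25]
        System.out.println(sum);     // 15
        System.out.println(product); // 0
    }
}
```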
Example: Sum of Squares
• Composition of
– a map followed by
– a reduce applied to the results of the map
• Example.
– Map( x^2, [1,2,3,4,5] ) = [1,4,9,16,25]
– Reduce( x + y, [1,4,9,16,25] ) = ((((1 + 4) + 9) + 16) + 25) = 55
• Map is easily parallelizable
– Compute x^2 for 1,2,3 on one node and for 4,5 on another
• Reduce notoriously sequential
– Need all squares at one node to compute the total sum.
Square Pyramid Number
$1 + 4 + \dots + n^2 = \frac{n(n+1)(2n+1)}{6}$
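The sum-of-squares example can be sketched in plain Java as below. Each array passed to the hypothetical `sumOfSquares` plays the role of one node's share of the input. The sketch adds per-node partial sums, which works only because + is associative; all names are illustrative.

```java
/** Illustrative sum of squares as a map step followed by a reduce step. */
final class SumOfSquares {

    /** Map step on one "node": squares each element of its share of the input. */
    static long[] squareAll(int[] part) {
        long[] out = new long[part.length];
        for (int i = 0; i < part.length; i++) {
            out[i] = (long) part[i] * part[i];
        }
        return out;
    }

    /** Reduce step: left fold with +. */
    static long sum(long[] values) {
        long acc = 0;
        for (long v : values) {
            acc += v;
        }
        return acc;
    }

    /** Maps every partition independently, then adds the partial sums. */
    static long sumOfSquares(int[]... partitions) {
        long total = 0;
        for (int[] part : partitions) {
            total += sum(squareAll(part));
        }
        return total;
    }

    /** Closed form n(n+1)(2n+1)/6 for 1 + 4 + ... + n^2. */
    static long squarePyramid(long n) {
        return n * (n + 1) * (2 * n + 1) / 6;
    }

    public static void main(String[] args) {
        long viaMapReduce = sumOfSquares(new int[] {1, 2, 3}, new int[] {4, 5});
        System.out.println(viaMapReduce + " " + squarePyramid(5)); // 55 55
    }
}
```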
Computational Model
• MapReduce is a Parallel Computational Model
• Map-Reduce algorithm = job
• Operates with key-value pairs: (k, V)
– Primitive types, Strings or more complex Structures
• Map-Reduce job input and output is a list of pairs {(k, V)}
• An MR job is defined by 2 functions
– map: (k1; v1) → {(k2; v2)}
– reduce: (k2; {v2}) → {(k3; v3)}
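The two signatures above can be modelled in a single process as follows. This is only a sketch of the computational model, not of Hadoop's API: `MiniMapReduce`, `pair`, `run` and the sample job `sumOfSquares` are hypothetical names, and a sorted in-memory map stands in for the grouping between map and reduce.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

/** Illustrative single-process model of a MapReduce job. */
final class MiniMapReduce {

    static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    /** map: (k1; v1) -> {(k2; v2)}, group by k2, then reduce: (k2; {v2}) -> {(k3; v3)}. */
    static <K1, V1, K2 extends Comparable<K2>, V2, K3, V3> List<Map.Entry<K3, V3>> run(
            List<Map.Entry<K1, V1>> input,
            BiFunction<K1, V1, List<Map.Entry<K2, V2>>> map,
            BiFunction<K2, List<V2>, List<Map.Entry<K3, V3>>> reduce) {
        Map<K2, List<V2>> groups = new TreeMap<>();
        for (Map.Entry<K1, V1> in : input) {
            for (Map.Entry<K2, V2> mid : map.apply(in.getKey(), in.getValue())) {
                groups.computeIfAbsent(mid.getKey(), k -> new ArrayList<>()).add(mid.getValue());
            }
        }
        List<Map.Entry<K3, V3>> output = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> g : groups.entrySet()) {
            output.addAll(reduce.apply(g.getKey(), g.getValue()));
        }
        return output;
    }

    /** Sample job: the sum of squares expressed with key-value pairs. */
    static List<Map.Entry<String, Long>> sumOfSquares(List<Integer> xs) {
        List<Map.Entry<Integer, Integer>> input = new ArrayList<>();
        for (int i = 0; i < xs.size(); i++) {
            input.add(pair(i, xs.get(i)));
        }
        return MiniMapReduce.<Integer, Integer, String, Long, String, Long>run(
            input,
            (index, x) -> {
                long v = x;
                List<Map.Entry<String, Long>> out = new ArrayList<>();
                out.add(pair("sumOfSquares", v * v));
                return out;
            },
            (key, values) -> {
                long total = 0;
                for (long v : values) {
                    total += v;
                }
                List<Map.Entry<String, Long>> out = new ArrayList<>();
                out.add(pair(key, total));
                return out;
            });
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3, 4, 5))); // [sumOfSquares=55]
    }
}
```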

Job Workflow
[Figure: consonant (C) and vowel (V) counting job]
Input  Map output
dogs   C, 3  V, 1
like   C, 2  V, 2
cats   C, 3  V, 1
Reduce output: C, 8  V, 4
The Algorithm
Map(null, word)
    nC = Consonants(word)
    nV = Vowels(word)
    Emit(“Consonants”, nC)
    Emit(“Vowels”, nV)
Reduce(key, {n1, n2, …})
    nRes = n1 + n2 + …
    Emit(key, nRes)
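The pseudocode above might look as follows in plain Java, run in one process. The sketch assumes that only a, e, i, o, u count as vowels and that every other letter is a consonant. `ConsonantsVowels`, `Pair` and `run` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative consonant/vowel counting job. */
final class ConsonantsVowels {

    /** Intermediate (key, count) pair emitted by map. */
    static final class Pair {
        final String key;
        final int count;

        Pair(String key, int count) {
            this.key = key;
            this.count = count;
        }
    }

    /** Map(null, word): emits ("Consonants", nC) and ("Vowels", nV). */
    static List<Pair> map(String word) {
        int nC = 0;
        int nV = 0;
        for (char c : word.toLowerCase().toCharArray()) {
            if ("aeiou".indexOf(c) >= 0) {
                nV++; // assumption: only a, e, i, o, u are vowels
            } else if (Character.isLetter(c)) {
                nC++;
            }
        }
        return Arrays.asList(new Pair("Consonants", nC), new Pair("Vowels", nV));
    }

    /** Reduce(key, {n1, n2, ...}): sums the counts for one key. */
    static int reduce(String key, List<Integer> counts) {
        int nRes = 0;
        for (int n : counts) {
            nRes += n;
        }
        return nRes;
    }

    /** Runs map on every word, groups by key, then reduces each group. */
    static Map<String, Integer> run(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String w : words) {
            for (Pair p : map(w)) {
                grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.count);
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("dogs", "like", "cats"))); // {Consonants=8, Vowels=4}
    }
}
```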
Computation Framework
• Two virtual clusters: HDFS and MapReduce
– Physically tightly coupled. Designed to work together
• Hadoop Distributed File System. View data as files and directories
• MapReduce is a Parallel Computation Framework
– Job scheduling and execution framework
HDFS Architecture Principles
• The name space is a hierarchy of files and directories
• Files are divided into blocks (typically 128 MB)
• Namespace (metadata) is decoupled from data
– Fast namespace operations, not slowed down by data streaming
• Single NameNode keeps the entire name space in RAM
• DataNodes store data blocks on local drives
• Blocks are replicated on 3 DataNodes for redundancy and availability
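A small arithmetic sketch of what the block size and replication factor imply for storage. Sizes are passed in MB for simplicity, and the class and method names are hypothetical rather than HDFS API calls.

```java
/** Illustrative block and replica arithmetic for a file stored in blocks. */
final class HdfsBlockMath {

    /** Number of blocks for a file: ceil(fileMb / blockMb). */
    static long blockCount(long fileMb, long blockMb) {
        return (fileMb + blockMb - 1) / blockMb;
    }

    /** Total number of block replicas kept on DataNodes. */
    static long replicaCount(long fileMb, long blockMb, int replication) {
        return blockCount(fileMb, blockMb) * replication;
    }

    /** Raw storage used across the cluster, in MB. */
    static long rawStorageMb(long fileMb, int replication) {
        return fileMb * replication;
    }

    public static void main(String[] args) {
        // A 1000 MB file with 128 MB blocks and replication 3.
        System.out.println(blockCount(1000, 128));      // 8
        System.out.println(replicaCount(1000, 128, 3)); // 24
        System.out.println(rawStorageMb(1000, 3));      // 3000
    }
}
```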

MapReduce Framework
• Job Input is a file or a set of files in a distributed file system (HDFS)
– Input is split into blocks of roughly the same size
– Blocks are replicated to multiple nodes
– Block holds a list of key-value pairs
• Map task is scheduled to one of the nodes containing the block
– Map task input is node-local
– Map task result is node-local
• Map task results are grouped: one group per reducer
– Each group is sorted
• Reduce task is scheduled to a node
– Reduce task transfers the targeted groups from all mapper nodes
– Computes and stores results in a separate HDFS file
• Job output is a set of files in HDFS, with #files = #reducers
Map Reduce Example: Mean
• Mean
• Input: large text file
• Output: average length of words in the file µ
• Example: µ({dogs, like, cats}) = 4
$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$
Mean Mapper
• Map input is the set of words {w} in the partition
– Key = null Value = w
• Map computes
– Number of words in the partition
– Total length of the words ∑length(w)
• Map output
– <“count”, #words>
– <“length”, #totalLength>
Map(null, w)
    Emit(“count”, 1)
    Emit(“length”, length(w))
Single Mean Reducer
• Reduce input
– {<key, {value}>}, where
– key = “count”, “length”
– value is an integer
• Reduce computes
– Total number of words: N = sum of all “count” values
– Total length of words: L = sum of all “length” values
• Reduce Output
– <“count”, N>
– <“length”, L>
• The result
– µ = L / N
Reduce(key, {n1, n2, …})
    nRes = n1 + n2 + …
    Emit(key, nRes)
Analyze()
    read(“part-r-00000”)
    print(“mean = ” + L/N)

Mean: Mapper, Reducer
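import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordMean {
  private final static Text COUNT_KEY = new Text("count");
  private final static Text LENGTH_KEY = new Text("length");
  private final static LongWritable ONE = new LongWritable(1);
  // Emits <"length", word length> and <"count", 1> for every word of the input line.
  public static class WordMeanMapper
      extends Mapper<Object, Text, Text, LongWritable> {
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        String word = itr.nextToken();
        context.write(LENGTH_KEY, new LongWritable(word.length()));
        context.write(COUNT_KEY, ONE);
      }
    }
  }
  // Sums the values of each key; summation is associative, so it also works as a combiner.
  public static class WordMeanReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long sum = 0;  // long rather than int, so large inputs do not overflow
      for (LongWritable val : values)
        sum += val.get();
      context.write(key, new LongWritable(sum));
    }
  }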
. . . . . . . . . . . . . . . .
Mean: main()
. . . . . . . . . . . . . . . .
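  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordmean <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word mean");
    job.setJarByClass(WordMean.class);
    job.setMapperClass(WordMeanMapper.class);
    job.setReducerClass(WordMeanReducer.class);
    job.setCombinerClass(WordMeanReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    job.setNumReduceTasks(1);  // single reducer, so all results land in part-r-00000
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    Path outputpath = new Path(otherArgs[1]);
    FileOutputFormat.setOutputPath(job, outputpath);
    boolean result = job.waitForCompletion(true);
    analyzeResult(outputpath);
    System.exit(result ? 0 : 1);
  }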
. . . . . . . . . . . . . . . .
Mean: analyzeResult()
. . . . . . . . . . . . . . . .
  // Reads the single reducer output file and prints the mean word length.
  private static void analyzeResult(Path outDir) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path reduceFile = new Path(outDir, "part-r-00000");
    if (!fs.exists(reduceFile)) return;
    long count = 0, length = 0;
    // Assumed: the stream argument is fs.open(reduceFile); it is elided in the source.
    BufferedReader in =
        new BufferedReader(new InputStreamReader(fs.open(reduceFile)));
    String line;
    while ((line = in.readLine()) != null) {
      StringTokenizer st = new StringTokenizer(line);
      String key = st.nextToken();
      String value = st.nextToken();
      if (key.equals("count")) count = Long.parseLong(value);
      else if (key.equals("length")) length = Long.parseLong(value);
    }
    in.close();
    double average = (double) length / count;
    System.out.println("The mean is: " + average);
  }
} // end WordMean
MapReduce Implementation
• Single master JobTracker shepherds the distributed herd of TaskTrackers
1. Job scheduling and resource allocation
2. Job monitoring and job lifecycle coordination
3. Cluster health and resource tracking
• Job is defined
– Program: myJob.jar file
– Configuration: conf.xml
– Input, output paths
• JobClient submits the job to the JobTracker
– Calculates and creates splits based on the input
– Writes myJob.jar and conf.xml to HDFS

MapReduce Implementation
• JobTracker divides the job into tasks: one map task per split.
– Assigns a TaskTracker for each task, collocated with the split
• TaskTrackers execute tasks and report status to the JobTracker
– TaskTracker can run multiple map and reduce tasks
– Map and Reduce Slots
• Failed attempts reassigned to other TaskTrackers
• Job execution status and results reported back to the client
• Scheduler lets many jobs run in parallel
Example: Standard Deviation
• Standard deviation
• Input: large text file
• Output: standard deviation σ of word lengths
• Example: σ({dogs, like, cats}) = 0
• How many jobs?
$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$
Standard Deviation: Hint
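One way to see why a single job may be enough: expanding the square in the definition gives the standard identity below, so σ follows from three sums. Here N is the number of words, L the total length and S the sum of squared lengths, with n = N.

```latex
\begin{aligned}
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2
          = \frac{1}{n}\sum_{i=1}^{n}x_i^2 \;-\; 2\mu\cdot\frac{1}{n}\sum_{i=1}^{n}x_i \;+\; \mu^2 \\
         &= \frac{1}{n}\sum_{i=1}^{n}x_i^2 \;-\; \mu^2
          = \frac{S}{N} - \left(\frac{L}{N}\right)^2
\end{aligned}
```

The middle term collapses because the mean of the x_i is µ itself, which leaves −2µ² + µ² = −µ².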
Standard Deviation Mapper
• Map input is the set of words {w} in the partition
– Key = null Value = w
• Map computes
– Number of words in the partition
– Total length of the words ∑length(w)
– The sum of lengths squared ∑length(w)^2
• Map output
– <“count”, #words>
– <“length”, #totalLength>
– <“squared”, #sumLengthSquared>
Map(null, w)
    Emit(“count”, 1)
    Emit(“length”, length(w))
    Emit(“squared”, length(w)^2)

Standard Deviation Reducer
• Reduce input
– {<key, {value}>}, where
– key = “count”, “length”, “squared”
– value is an integer
• Reduce computes
– Total number of words: N = sum of all “count” values
– Total length of words: L = sum of all “length” values
– Sum of length squares: S = sum of all “squared” values
• Reduce Output
– <“count”, N>
– <“length”, L>
– <“squared”, S>
• The result
– µ = L / N
– σ = sqrt(S / N - µ^2)
Reduce(key, {n1, n2, …})
    nRes = n1 + n2 + …
    Emit(key, nRes)
Analyze()
    read(“part-r-00000”)
    print(“mean = ” + L/N)
    print(“σ = ” + sqrt(S/N - (L*L) / (N*N)))
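A single-process sketch of the same computation: one pass collects N, L and S, and the final analysis step derives µ and σ from them. The class `WordLengthStats` and its methods are hypothetical names. Clamping at zero is an added safeguard against floating-point rounding and is not part of the slides.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative mean and standard deviation of word lengths from three sums. */
final class WordLengthStats {

    /** Returns {N, L, S}: word count, total length, and sum of squared lengths. */
    static long[] sums(List<String> words) {
        long n = 0;
        long l = 0;
        long s = 0;
        for (String w : words) {
            long len = w.length();
            n += 1;
            l += len;
            s += len * len;
        }
        return new long[] {n, l, s};
    }

    /** mu = L / N. */
    static double mean(long[] nls) {
        return (double) nls[1] / nls[0];
    }

    /** sigma = sqrt(S / N - mu^2), clamped at zero against rounding error. */
    static double stdDev(long[] nls) {
        double mu = mean(nls);
        double variance = (double) nls[2] / nls[0] - mu * mu;
        return Math.sqrt(Math.max(0.0, variance));
    }

    public static void main(String[] args) {
        long[] nls = sums(Arrays.asList("dogs", "like", "cats"));
        System.out.println("mean = " + mean(nls));    // mean = 4.0
        System.out.println("sigma = " + stdDev(nls)); // sigma = 0.0
    }
}
```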
Combiner, Partitioner
• Combiners perform local aggregation before the shuffle & sort phase
– Optimization to reduce data transfers during shuffle
– In Mean example reduces transfer of many keys to only two
• Partitioners assign intermediate (map) key-value pairs to reducers
– Responsible for dividing up the intermediate key space
– Not used with single Reducer
[Figure: Input → Map → Combiner → Partitioner → Shuffle & sort → Reduce → Output]
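To make the combiner's effect concrete, the sketch below counts how many pairs one mapper would send to the shuffle with and without local aggregation, using the Mean example. `CombinerDemo`, `map` and `combine` are hypothetical names, and everything runs in one process.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative effect of a combiner on the output of a single mapper. */
final class CombinerDemo {

    /** Mean mapper: emits ("count", 1) and ("length", len) for every word. */
    static List<Map.Entry<String, Long>> map(List<String> words) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String w : words) {
            out.add(new SimpleEntry<>("count", 1L));
            out.add(new SimpleEntry<>("length", (long) w.length()));
        }
        return out;
    }

    /** Combiner: sums values per key locally, before anything crosses the network. */
    static Map<String, Long> combine(List<Map.Entry<String, Long>> mapOutput) {
        Map<String, Long> local = new TreeMap<>();
        for (Map.Entry<String, Long> e : mapOutput) {
            local.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return local;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> raw = map(Arrays.asList("dogs", "like", "cats"));
        Map<String, Long> combined = combine(raw);
        System.out.println(raw.size() + " pairs without combiner"); // 6 pairs without combiner
        System.out.println(combined.size() + " pairs with combiner: " + combined);
        // 2 pairs with combiner: {count=3, length=12}
    }
}
```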
Distributed Sorting
• Sort a dataset, which cannot be entirely stored on one node.
• Input:
– Set of files. 100 byte records.
– The first 10 bytes of each record is the key and the rest is the value.
• Output:
– Ordered list of files: f1, … fN
– Each file fi is sorted, and
– If i < j then k ≤ r for any keys k ∈ fi and r ∈ fj
– Concatenation of files in the given order must form a completely sorted record set
Naïve MapReduce Sorting
• If the output could be stored on one node
• The input to any Reducer is always sorted by key
– Shuffle sorts Map outputs
• One identity Mapper and one identity Reducer would do the trick
– Identity: <k,v> → <k,v>
[Figure: Input (dogs, like, cats) → Map → Shuffle → Reduce → Output (cats, dogs, like)]

Naïve Sorting: Multiple Maps
• Multiple identity Mappers and one identity Reducer – same result
– Does not work for multiple Reducers
[Figure: Input → Map (several mappers) → Shuffle → Reduce → Output]
Sorting: Generalization
• Define a hash function, such that
– h: {k} → [1,N]
– Preserves the order: k ≤ s → h(k) ≤ h(s)
– h(k) is a fixed-size prefix of string k (first 2 bytes)
• Identity Mapper
• With a specialized Partitioner
– Computes the hash h(k) of the key and assigns <k,v> to reducer R_h(k)
• Identity Reducer
– Number of reducers is N: R1, …, RN
– Inputs for Ri are all pairs whose key satisfies h(k) = i
– Ri is an identity reducer, which writes output to HDFS file fi
– Hash function choice guarantees that keys from fi are no greater than keys from fj if i < j
• The algorithm was implemented to win Gray’s Terasort Benchmark in 2008
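A simplified single-process sketch of the order-preserving partitioner idea. It is not the benchmark implementation: it assumes lowercase ASCII keys, uses only the first character as the prefix, and numbers reducers from 0 rather than 1. `PrefixPartitionSort` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative distributed sort with an order-preserving partitioner. */
final class PrefixPartitionSort {

    /** h(k): maps the first character to a reducer index in [0, numReducers). */
    static int partition(String key, int numReducers) {
        int first = key.isEmpty() ? 0 : key.charAt(0) - 'a';
        first = Math.max(0, Math.min(25, first)); // clamp; assumes keys start with 'a'..'z'
        return first * numReducers / 26;          // monotone in the first character
    }

    /** Returns one sorted list per reducer; list i plays the role of output file f_i. */
    static List<List<String>> sort(List<String> keys, int numReducers) {
        List<List<String>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new ArrayList<>());
        }
        for (String k : keys) {
            reducers.get(partition(k, numReducers)).add(k); // partitioner assigns the key
        }
        for (List<String> r : reducers) {
            Collections.sort(r); // stands in for the sort done by the shuffle
        }
        return reducers;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("dogs", "like", "cats", "zebra", "ant");
        System.out.println(sort(keys, 3)); // [[ant, cats, dogs], [like], [zebra]]
    }
}
```

Concatenating the per-reducer lists in index order yields a fully sorted sequence, because keys with a smaller first character never land in a later reducer.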
Undirected Graphs
• “A Discipline of Programming” E. W. Dijkstra. Ch. 23.
– Good old classics
• Graph is defined by V = {v}, E = {<v,w> | v,w ∈ V}
• Undirected graph. E is symmetrical, that is <v,w> ∈ E ≡ <w,v> ∈ E
• Different representations of E
1. Set of pairs
2. <v, {direct neighbors}>
3. Adjacency matrix
• From 1 to 2 in one MR job
– Identity Mapper
– Combiner = Reducer
– Reducer joins values for each vertex
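The conversion from representation 1 to representation 2 can be sketched in one process as below. Grouping by the first vertex stands in for the shuffle, and the set union stands in for the reducer's join. The sketch assumes the input edge set is symmetric, as stated above, and `AdjacencyJob` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative conversion from a set of pairs to <v, {direct neighbors}>. */
final class AdjacencyJob {

    /** Each edge is int[]{v, w}; the identity map emits <v, w>, reduce joins all w per v. */
    static Map<Integer, Set<Integer>> toAdjacency(List<int[]> edges) {
        Map<Integer, Set<Integer>> adjacency = new TreeMap<>();
        for (int[] e : edges) {
            adjacency.computeIfAbsent(e[0], k -> new TreeSet<>()).add(e[1]);
        }
        return adjacency;
    }

    public static void main(String[] args) {
        // Symmetric edge set for the path 1 - 2 - 3.
        List<int[]> edges = Arrays.asList(
            new int[] {1, 2}, new int[] {2, 1}, new int[] {2, 3}, new int[] {3, 2});
        System.out.println(toAdjacency(edges)); // {1=[2], 2=[1, 3], 3=[2]}
    }
}
```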
Connected Components
• Partition set of nodes V into disjoint subsets V1, …, VN
– V = V1 ∪ … ∪ VN
– No paths using E from Vi to Vj if i ≠ j
– Gi = <Vi, Ei >
• Representation of connected component
– key = min{Vi}
– value = Vi
• Chain of MR jobs
• Initial data representation
– E is partitioned into sets of records (blocks)
– <v,w> ∈ E → <min(v,w), {v,w}> = <k, C>

MR Connected Components
• Mapper / Reducer Input
– {<k, C>}, where C is a subset of V, k = min(C)
• Mapper
• Reducer
• Iterate. Stop when stabilized
Map {<k, C>}
    For all <ki, Ci> and <kj, Cj>
        if Ci ∩ Cj ≠ ∅ then
            C = Ci ∪ Cj
            Emit(min(C), C)
Reduce(k, {C1, C2, …})
    resC = C1 ∪ C2 ∪ …
    Emit(k, resC)
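The merge-until-stable logic can be sketched sequentially as follows. This is not the distributed formulation: each pass of the loop stands in for one MR job in the chain, and overlapping sets are merged greedily. `ConnectedComponents` and `components` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative connected components by repeatedly merging overlapping vertex sets. */
final class ConnectedComponents {

    /** Returns <min(Vi), Vi> for every component that contains at least one edge. */
    static Map<Integer, TreeSet<Integer>> components(List<int[]> edges) {
        // Initial representation: <v,w> -> <min(v,w), {v,w}>
        List<TreeSet<Integer>> sets = new ArrayList<>();
        for (int[] e : edges) {
            TreeSet<Integer> c = new TreeSet<>();
            c.add(e[0]);
            c.add(e[1]);
            sets.add(c);
        }
        boolean changed = true;
        while (changed) { // iterate; stop when stabilized
            changed = false;
            List<TreeSet<Integer>> next = new ArrayList<>();
            for (TreeSet<Integer> c : sets) {
                TreeSet<Integer> target = null;
                for (TreeSet<Integer> candidate : next) {
                    if (!Collections.disjoint(candidate, c)) {
                        target = candidate;
                        break;
                    }
                }
                if (target == null) {
                    next.add(new TreeSet<>(c));
                } else {
                    target.addAll(c); // C = Ci ∪ Cj
                    changed = true;
                }
            }
            sets = next;
        }
        Map<Integer, TreeSet<Integer>> result = new TreeMap<>();
        for (TreeSet<Integer> c : sets) {
            result.put(c.first(), c); // key = min(C)
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> edges = Arrays.asList(
            new int[] {1, 2}, new int[] {3, 4}, new int[] {2, 3}, new int[] {5, 6});
        System.out.println(components(edges)); // {1=[1, 2, 3, 4], 5=[5, 6]}
    }
}
```

A pass that performs no merge leaves only pairwise disjoint sets, which is the stopping condition named above.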
The End

