Mastering Map Reduce
Scott Crespo
Path to Success
Map Reduce Refresher
Optimization Strategies
CustomType Example
Applications
What’s Hadoop?
A framework that facilitates data flow through a cluster of servers
What’s Map Reduce?
A paradigm for analyzing distributed data sets
Raw Data -> (K, [V1..Vn]) -> (K, V)
What About Hive And Pig?
Use them whenever possible!
Data States in Map Reduce (Letter Count)
Input: HelloWorld
Split: Hello | World
Map: (H,1) (E,1) (L,1) (L,1) (O,1) | (W,1) (O,1) (R,1) (L,1) (D,1)
Partition/Shuffle: (H,[1]) (E,[1]) (L,[1,1,1]) (O,[1,1]) (W,[1]) (R,[1]) (D,[1])
Reduce: (H,1) (E,1) (L,3) (O,2) (W,1) (R,1) (D,1)
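The letter-count stages can be traced in a few lines of plain Python. This is only a single-process sketch of the data states, not Hadoop code; the function names (split, map_letters, shuffle, reduce_counts) and the fixed chunk size of 5 are assumptions made for illustration.

```python
from collections import defaultdict

def split(text, size):
    """Cut the input into fixed-size chunks (a stand-in for input splits)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def map_letters(chunk):
    """Emit one (letter, 1) pair per character."""
    return [(ch.upper(), 1) for ch in chunk]

def shuffle(pairs):
    """Group values by key, as the partition/shuffle step does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def reduce_counts(grouped):
    """Sum the list of values for each key."""
    return {key: sum(values) for key, values in grouped.items()}

chunks = split("HelloWorld", 5)
mapped = [pair for chunk in chunks for pair in map_letters(chunk)]
grouped = shuffle(mapped)
counts = reduce_counts(grouped)
print(counts)
```

Each intermediate variable (chunks, mapped, grouped, counts) corresponds to one of the data states listed for the letter-count example.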
Basic Map Reduce Program Structure
MyMapReduceProgram {
    MyMapperClass extends Mapper {
        map() {
            // map code
        }
    }
    MyReducerClass extends Reducer {
        reduce() {
            // reduce code
        }
    }
    main() {
        // driver code
    }
}
Advanced Optimizations
• Drivers
• Custom Types
• Setup Methods
• Partitioning
• Combiners
• Chaining
• Fault Tolerance
Generating N-Grams
• N-Grams: all runs of n consecutive elements in a sequence.
Trigrams of “The quick brown fox jumps over the lazy dog”:
(the quick brown), (quick brown fox), (brown fox jumps),
(fox jumps over), (jumps over the), (over the lazy), (the lazy dog)
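A sliding window is one simple way to generate n-grams. The sketch below is a minimal illustration in plain Python; the function name ngrams and the choice to lower-case and split on whitespace are assumptions, not part of the deck.

```python
def ngrams(text, n):
    """Return every run of n consecutive words in text, in order."""
    words = text.lower().split()
    # A text with fewer than n words yields an empty list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

trigrams = ngrams("The quick brown fox jumps over the lazy dog", 3)
for trigram in trigrams:
    print(" ".join(trigram))
```

A sentence of w words yields w - n + 1 n-grams, so the nine-word example produces seven trigrams.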
Solution Design
NGramCounter {
    NGramMapper {
        map() {
            // Tokenize and sanitize inputs
            // Create NGram
            // Output (NGram ngram, Int count)
        }
    }
    NGramCombiner {
        combine() {
            // Sum local NGram counts that share the same key
            // Output (NGram ngram, Int count)
        }
    }
    NGramReducer {
        reduce() {
            // Sum NGram counts that share the same key
            // Output (NGram ngram, Int count)
        }
    }
}
CustomType!
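The mapper, combiner and reducer roles in this design can be imitated in a single Python process. This is a rough sketch under stated assumptions: the two input splits are made-up data, tuples stand in for the custom NGram key, and the function names are hypothetical rather than Hadoop APIs.

```python
from collections import Counter

def map_trigrams(line):
    """Mapper: emit (trigram, 1) for each three-word window in a line."""
    words = line.lower().split()
    return [((words[i], words[i + 1], words[i + 2]), 1)
            for i in range(len(words) - 2)]

def combine(pairs):
    """Combiner: sum counts per key within one mapper's output."""
    local = Counter()
    for key, count in pairs:
        local[key] += count
    return list(local.items())

def reduce_all(partials):
    """Reducer: sum the partial counts for each key across all mappers."""
    total = Counter()
    for pairs in partials:
        for key, count in pairs:
            total[key] += count
    return dict(total)

# Two hypothetical input splits, each handled by its own mapper.
split_a = ["the quick brown fox", "the quick brown dog"]
split_b = ["the quick brown cat"]

partials = []
for split in (split_a, split_b):
    mapped = [pair for line in split for pair in map_trigrams(line)]
    partials.append(combine(mapped))

totals = reduce_all(partials)
print(totals[("the", "quick", "brown")])
```

The combiner shrinks the first split's four map outputs to three pairs before anything would cross the network. Because summing is associative and commutative, the same summing logic can serve as both combiner and reducer.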
Work Flow
• Prototype (Python)
• Custom Type (Trigram)
• Unit Tests
• Mapper
• Reducer
Prototype
Quick and Dirty Python
Prototype
import sys

def test_mapper():
    lines = ["the quick brown fox jumped over the lazy dog", "the quick brown"]
    for line in lines:
        words = line.split()
        length = len(words)
        sys.stdout.write("\nLength of %d \n-------------------\n" % length)
        i = 0
        # slide a three-word window across the line
        while i + 2 < length:
            first = words[i]
            second = words[i+1]
            third = words[i+2]
            trigram = "%s %s %s \n" % (first, second, third)
            sys.stdout.write(trigram)
            i += 1
Output
Length of 9
-------------------
the quick brown
quick brown fox
brown fox jumped
fox jumped over
jumped over the
over the lazy
the lazy dog
Length of 3
-------------------
the quick brown
Custom DataTypes
Custom KeyTypes
Must implement Hadoop's WritableComparable interface
• Writable: The key can be serialized and transmitted across a network
• Comparable: The key can be compared to other keys and combined/sorted for the reduce phase
Methods: write(), readFields(), compareTo(), hashCode(), toString(), equals()
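To make the contract concrete, here is a Python analogue of a three-word key type. It is not the Hadoop API: the method names write and read_fields merely mirror write() and readFields(), and the length-prefixed UTF-8 encoding is an assumption chosen for the sketch.

```python
import io
import struct
from functools import total_ordering

@total_ordering
class Trigram:
    """Hypothetical three-word key that can be serialized, compared and hashed."""

    def __init__(self, first="", second="", third=""):
        self.first, self.second, self.third = first, second, third

    def _fields(self):
        return (self.first, self.second, self.third)

    def write(self, out):
        """Serialize each word as a 4-byte length followed by UTF-8 bytes."""
        for word in self._fields():
            data = word.encode("utf-8")
            out.write(struct.pack(">I", len(data)))
            out.write(data)

    def read_fields(self, inp):
        """Deserialize in the same order that write() used."""
        words = []
        for _ in range(3):
            (size,) = struct.unpack(">I", inp.read(4))
            words.append(inp.read(size).decode("utf-8"))
        self.first, self.second, self.third = words

    def __eq__(self, other):
        return isinstance(other, Trigram) and self._fields() == other._fields()

    def __lt__(self, other):
        return self._fields() < other._fields()

    def __hash__(self):
        return hash(self._fields())

    def __str__(self):
        return " ".join(self._fields())

buffer = io.BytesIO()
original = Trigram("the", "quick", "brown")
original.write(buffer)
buffer.seek(0)
copy = Trigram()
copy.read_fields(buffer)
print(copy, copy == original)
```

The round trip through the byte buffer plays the part of sending a key across the network. Equality, ordering and hashing all derive from the same three fields, which keeps the methods consistent with one another.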
Trigram.java
public class Trigram implements WritableComparable<Trigram> {
    …
    // Order trigrams word by word: first, then second, then third.
    @Override
    public int compareTo(Trigram other) {
        int compared = first.compareTo(other.first);
        if (compared != 0) {
            return compared;
        }
        compared = second.compareTo(other.second);
        if (compared != 0) {
            return compared;
        }
        return third.compareTo(other.third);
    }

    // Used by the default partitioner to assign keys to reducers.
    @Override
    public int hashCode() {
        return first.hashCode()*163 + second.hashCode() + third.hashCode();
    }
}
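The hash code matters because Hadoop's default hash partitioner picks a reducer from it by clearing the sign bit and taking the modulus by the number of reduce tasks. The Python sketch below imitates that arithmetic with 32-bit wraparound. One loud assumption: the fields are hashed with an emulation of java.lang.String.hashCode() purely as a stand-in, since the slide does not show the field types and Hadoop's Text hashes differently.

```python
def to_int32(x):
    """Wrap a Python int to a signed 32-bit value, as Java int arithmetic does."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

def java_string_hash(s):
    """Stand-in field hash: emulates java.lang.String.hashCode() for ASCII text."""
    h = 0
    for ch in s:
        h = to_int32(31 * h + ord(ch))
    return h

def trigram_hash(first, second, third):
    """The slide's formula: first*163 + second + third, with int overflow."""
    return to_int32(java_string_hash(first) * 163
                    + java_string_hash(second)
                    + java_string_hash(third))

def partition(hash_code, num_reducers):
    """Hash partitioning: clear the sign bit, then take the modulus."""
    return (hash_code & 0x7FFFFFFF) % num_reducers

a = trigram_hash("the", "quick", "brown")
b = trigram_hash("the", "brown", "quick")
print(partition(a, 4), a == b)
```

Because the second and third hashes are simply added, two trigrams that differ only in the order of their last two words get the same hash. That does not break correctness, since compareTo() still tells them apart, but it means such keys always land on the same reducer.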
Map Reduce Program
TrigramMapper
public static class TrigramMapper
        extends Mapper<Object, Text, Trigram, IntWritable> {
    …
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString().toLowerCase(); // lower-case the input line
        line = line.replaceAll("[^a-z\\s]", ""); // keep only letters and whitespace
        String[] words = line.trim().split("\\s+"); // split the line into words
        int len = words.length; // needed for the loop condition
        // lines with fewer than three words never enter the loop
        for (int i = 0; i + 2 < len; i++) {
            first.set(words[i]);
            second.set(words[i+1]);
            third.set(words[i+2]);
            trigram.set(first, second, third);
            context.write(trigram, one);
        }
    }
}
TrigramReducer
public static class TrigramReducer
        extends Reducer<Trigram, IntWritable, Trigram, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Trigram key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // add up every count emitted for this trigram
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
    }
    …
}
Driver
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "Trigram Count");
    job.setJarByClass(TrigramCount.class);
    job.setMapperClass(TrigramMapper.class);
    job.setMapOutputKeyClass(Trigram.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setReducerClass(TrigramReducer.class);
    job.setCombinerClass(TrigramReducer.class);
    job.setOutputKeyClass(Trigram.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Applications
Speech Recognition
(Trigram1, 90)
(Trigram2, 76)
(Trigram3, 8)
(Trigram4, 1)
Other Applications
• Blog Posts
• Stocks
• GIS Coordinates
Any object with multiple attributes!
Stock
Attributes
Text timeStamp;
Text ticker;
Float price;
Conclusion
Custom DataTypes Can:
• Improve Runtime Performance
• Result in Reusable Code
• Provide a Consistent Interface
Thank You!
Scott Crespo
scott@orlandods.com

Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
