Questions tagged [mapreduce]
MapReduce is a programming model for processing huge datasets on certain kinds of distributable problems using a large number of nodes.
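As a rough illustration of the model, word counting can be split into a map step, a grouping ("shuffle") step, and a reduce step. This is a single-process sketch, not how Hadoop or any real framework implements it; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster each document (or split) would be mapped on a different node.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big nodes", "data on many nodes"])
print(counts)
```

Because each map call looks at only one record and each reduce call at only one key, both steps can in principle be spread across many machines.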
12,166
questions
0
votes
0
answers
12
views
HBase 2.0.0 with Hadoop 2.6.5: HBase shell fails with unhandled Java exception java.lang.IncompatibleClassChangeError
I am running Hadoop 2.6.5 with HBase 2.0.0.
When I try to open the HBase shell with $ hbase shell, I get the error below.
I tried HBase 2.5.8 with Hadoop 2.6.5 and hit the same issue.
I tried HBase 1.3.1 with ...
0
votes
0
answers
7
views
Hive 'explain' query plan / meaning of Backup Stage
My complex query returned the 'explain' output below. The query deals with a huge dataset. What does "backup stage" mean on line 4 (Stage-63 has a backup stage: Stage-2)?
STAGE ...
0
votes
0
answers
29
views
Hadoop MapReduce job hangs while waiting for the AM container to be allocated
I used my teacher's guide to install and configure Hadoop (it's my first time using it). It says it is for Ubuntu, but it should work for all Linux distros since it just compiles Hadoop from source. Since ...
0
votes
1
answer
65
views
C++23 tbb::parallel_reduce with std::multiplies as reduction
I want to multiply all the elements of a vector. However, the following code snippet
long double sum = parallel_reduce(
blocked_range<long double>(0, sum1.size()), 1.0L /* Identity for ...
0
votes
0
answers
14
views
Apache Pig throws NullPointerException when filtering using a parameter (ERROR 2000: Error processing rule PartitionFilterOptimizer)
I am trying to store data in Apache Hive using a filter operation, where MYVARIABLE is a parameter passed in.
Filtering using the variable throws a NullPointerException. This happens when I try to store data in ...
0
votes
1
answer
44
views
Error while scanning intermediate done dir - Dataproc Spark job
Our Spark aggregation jobs are taking a long time to complete. They are supposed to finish in 5 minutes but take 30 to 40 minutes.
The Dataproc cluster logs say it's trying to scan ...
0
votes
0
answers
19
views
Hadoop always gets stuck when running a jar file
Every time I run a jar file using Hadoop, it gets stuck at the last line shown here.
I also tried the Foil Jar located in the Hadoop directory itself, but with the same result: it gets stuck at the last line, ...
0
votes
0
answers
31
views
PySpark with RDD - How to calculate and compare averages?
I need to solve a problem where a company wants to offer k different users free use (a kind of coupon) of their application for two months. The goal is to identify users who are likely to churn (leave ...
0
votes
0
answers
13
views
Streaming command failed inside Hadoop
I have the following code, and when I run it inside Hadoop I get an error message.
#!/usr/local/hadoop/bin/hdfs dfs -rm -r /users/spatel/output/
!/usr/local/hadoop/bin/hadoop jar /usr/local/...
0
votes
0
answers
16
views
MapReduce doesn't complete INSERT / CREATE TABLE from existing table operations
I created 2 tables in beeline (Hive) and it worked quickly.
However, I am not able to create a new table from those two. MapReduce is taking forever.
Also,
The url to track the job: http://...
0
votes
0
answers
23
views
AWS EMR MapReduce job logs are in stderr
I'm running an MR job in EMR and all my logs are in the stderr section (when I open the job logs from the ResourceManager UI). How can I move them to stdout or syslog?
1
vote
0
answers
40
views
MongoDB Map-Reduce: perform multiple aggregations
Let's say that I have a collection with documents of this form:
{
  id: id1,
  name: foo,
  value: 64
},
{
  id: id1,
  name: bar,
  value: 37
},
{
  id: id1,
  name: bar,
  value: ...
0
votes
0
answers
36
views
Hadoop MapReduce word count - "/tmp/user is not recognized as an internal or external command, operable program or batch file"
I'm new to Hadoop and I need to use word count for a school project. Everything went fine with the Hadoop installation until this error, which appeared when I ran the mapreducer.jar program to count word ...
0
votes
0
answers
17
views
Unable to Submit MapReduce Job from Java Client to Hadoop Cluster Running in Pseudo-Distributed Mode
I'm working on a project where I need to perform aggregations on the result of an HBase table scan using MapReduce and store the result in another HBase table. To achieve this, I've set up a Hadoop ...
0
votes
0
answers
26
views
How does XGBoost aggregate models being trained in a distributed fashion across n machines?
I am trying to understand how XGBoost distributed training works. The best explanation I've found so far is in this paper: https://ml-pai-learn.oss-cn-beijing.aliyuncs.com/%E6%9C%BA%E5%99%A8%E5%AD%A6%...