
Questions tagged [mapreduce]

MapReduce is a programming model for processing huge datasets on certain kinds of distributable problems using a large number of nodes.
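As a rough illustration of the idea, the sketch below runs the map, shuffle, and reduce phases of a word count in a single Python process. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical and are not taken from any of the questions listed here; a real framework such as Hadoop would distribute these phases across nodes.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, reduce(lambda a, b: a + b, values, 0)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially (single process, no cluster)."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    print(word_count(["the quick brown fox", "the lazy dog", "The fox"]))
```

Because each key is reduced independently of the others, the reduce step can in principle run in parallel on different nodes, which is what makes this class of problems distributable.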

mapreduce
0 votes
0 answers
12 views

HBase 2.0.0 with Hadoop 2.6.5: HBase shell unhandled Java exception: java.lang.IncompatibleClassChangeError

I am running Hadoop 2.6.5 with HBase 2.0.0. When I try to log into HBase with $hbase shell, I get the error below. I tried HBase 2.5.8 with Hadoop 2.6.5 and hit the same issue. I tried HBase 1.3.1 with ...
user3884278
0 votes
0 answers
7 views

Hive 'explain' query plan / meaning of Backup Stage

My complex query returned the 'explain' command result below. This query deals with a huge dataset. What does backup stage mean in line number 4 (Stage-63 has a backup stage: Stage-2)? STAGE ...
Gyanendra Dwivedi
0 votes
0 answers
29 views

Hadoop cannot run a MapReduce job because it hangs waiting for the AM container to be allocated

I used my teacher's guide to install and configure Hadoop (it's my first time using it). It says it is for Ubuntu, but it should work for all Linux distros since it just compiles Hadoop from source. Since ...
Alberto
0 votes
1 answer
65 views

C++23 tbb::parallel_reduce with std::multiplies as reduction

I want to multiply all the elements of a vector. However, the following code snippet long double sum = parallel_reduce( blocked_range<long double>(0, sum1.size()), 1.0L /* Identity for ...
khteh
  • 3,734
0 votes
0 answers
14 views

Apache Pig throwing NullPointerException when filtering using a parameter: ERROR 2000: Error processing rule PartitionFilterOptimizer

I am trying to store data in Apache Hive using a filter operation where MYVARIABLE is a parameter passed in. Filtering using the variable throws a NullPointerException. This happens when I try to store data in ...
Luff li
  • 133
0 votes
1 answer
44 views

Error while scanning intermediate done dir - dataproc spark job

Our Spark aggregation jobs are taking a long time to complete. They are supposed to complete in 5 minutes but are taking 30 to 40 minutes. The Dataproc cluster logs say it's trying to scan ...
vikrant rana
  • 4,557
0 votes
0 answers
19 views

Every run of a jar file using Hadoop gets stuck

Every time I run a jar file using Hadoop, it gets stuck here at the last line. I also tried the Foil Jar located in the Hadoop files themselves, but with the same result: it gets stuck at the last line, ...
Noor Khalil
0 votes
0 answers
31 views

PySpark with RDD - How to calculate and compare averages?

I need to solve a problem where a company wants to offer k different users free use (a kind of coupon) of their application for two months. The goal is to identify users who are likely to churn (leave ...
Yoel Ha
0 votes
0 answers
13 views

Streaming command failed inside Hadoop

I have the following code, and when I run it inside Hadoop I get an error message. #!/usr/local/hadoop/bin/hdfs dfs -rm -r /users/spatel/output/ !/usr/local/hadoop/bin/hadoop jar /usr/local/...
Shekhar
0 votes
0 answers
16 views

MapReduce doesn't successfully complete INSERT / CREATE TABLE operations from existing tables

I created 2 tables in Beeline (Hive) and it worked quickly. However, I am not able to create a new table from those two. MapReduce is taking forever. Also, the URL to track the job: http://...
Anushka
0 votes
0 answers
23 views

AWS EMR MapReduce job logs are in stderr

I'm running an MR job in EMR and all my logs are in the stderr section (when I go into the job logs from the Resource Manager UI). How can I move them to stdout or syslog?
Stefan Ss
1 vote
0 answers
40 views

MongoDB Map-Reduce: perform multiple aggregations

Let's say that I have a collection with documents of this form: { id: id1, name: foo, value: 64 }, { id: id1, name: bar, value: 37 }, { id: id1, name: bar, value: ...
Julio Sanz Rodríguez
0 votes
0 answers
36 views

Hadoop MapReduce word count - /tmp/user is not recognized as an internal or external command, operable program or batch file

I'm new to Hadoop and I need to use a word counter for a school project. Everything went fine with the Hadoop installation until this error, which showed up when I ran the mapreducer.jar program to count word ...
Natália Gálová
0 votes
0 answers
17 views

Unable to Submit MapReduce Job from Java Client to Hadoop Cluster Running in Pseudo-Distributed Mode

I'm working on a project where I need to perform aggregations on the result of an HBase table scan using MapReduce and store the result in another HBase table. To achieve this, I've set up a Hadoop ...
Pedro Gomes
0 votes
0 answers
26 views

How does XGBoost aggregate models being trained in a distributed fashion across n machines?

I am trying to understand how XGBoost distributed training works. The best explanation I've found so far is in this paper: https://ml-pai-learn.oss-cn-beijing.aliyuncs.com/%E6%9C%BA%E5%99%A8%E5%AD%A6%...
Altamash Rafiq
