All Questions
1,332 questions
0 votes, 0 answers, 12 views
"Hadoop archive -archiveName directoryname.har -p /source/hdfs/path /destination/hdfs/path" doesn't work via spark-submit but works in spark-shell
I am developing Spark code in Scala in IntelliJ. I built the package with mvn and ran spark-submit on the cluster, but the YARN log shows a warning saying "warn ioc.client: exception encountered while ...
0 votes, 0 answers, 20 views
Spark Job Hangs for a While at "Sending RPC" Log
I have a Spark job. When I run it on a YARN cluster (HDP 3.1), after a long time (about 1 hour) I get this message in the trace log while the job does nothing. After that, the job creates executors and starts running ...
0 votes, 1 answer, 94 views
How to delete a key for all commits in a HUDI table (history)?
For a HUDI table, the goal is to comply with GDPR by deleting a key from the table.
I'm only able to delete data from the latest commit of the table.
How can I make sure the key is deleted for all commits on the ...
0 votes, 0 answers, 40 views
How to get the name of the file that was just written by a Spark Job?
I have this simple Hadoop application in Scala. I'm repartitioning and writing to 2 files. However, I need to know the name of the file that was just written.
package com.scala.sparkscalaplayground
import ...
0 votes, 0 answers, 94 views
Unable to run Scala test file due to a problem installing a Java package
When I ran the Scala test code in IntelliJ, I got this error:
Testing started ...
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
24/01/03 22:20:24 ...
0 votes, 1 answer, 238 views
Spark-submit yarn-client mode hangs even though spark task completed (pyspark 3.4.1)
I recently set up a dockerized environment along with CDP for submitting yarn-client mode Spark jobs on a kerberized Hadoop cluster, and I am seeing inconsistent behavior with the application lifecycle. Scenario:
...
0 votes, 2 answers, 118 views
Alternative to InMemoryFileIndex to list files in a folder using Spark Scala
The task I would like to solve:
I have a constant influx of files in a specific folder held on Azure storage. I would like to periodically list the files in this folder in order to copy them to a ...
-1 votes, 1 answer, 99 views
How to copy a zip file from HDFS to an SFTP server
I have a zip file named "FileName.zip" in an HDFS location. I want to copy this zip file to an SFTP server.
The zip folder structure is shown below (when downloaded to local):
FileName.zip
- file....
1 vote, 0 answers, 210 views
NoSuchMethodError while trying to save parquet files to an S3 bucket
I am trying to save data into an S3 bucket using Scala code. I always get the following error. How can I resolve this error?
After changing the hadoop-aws jar version from 3.3.0 to 3.3.1, the ...
0 votes, 0 answers, 126 views
How to read an ORC file from a GCS bucket
To read an ORC file from a GCS bucket, I'm using the code snippet below, where I create a Hadoop configuration and set the file system attributes required to use the GCS bucket:
val hadoopConf = new ...
4 votes, 4 answers, 1k views
How to resolve harmless "java.nio.file.NoSuchFileException: xxx/hadoop-client-api-3.3.4.jar" error in Spark when running `sbt run`?
I have a simple Spark application in Scala 2.12.
My App
find-retired-people-scala/project/build.properties
sbt.version=1.8.2
find-retired-people-scala/src/main/scala/com/hongbomiao/FindRetiredPeople....
1 vote, 1 answer, 615 views
java.lang.NoSuchMethodError: org.apache.hadoop.hive.common.FileUtils.mkdir while trying to save a table to Hive
I am trying to read a Kafka stream and save it to Hive as a table.
The consumer code is:
import org.apache.spark.sql.{DataFrame, Dataset, SaveMode, SparkSession}
import org.apache.spark.sql.functions....
0 votes, 1 answer, 105 views
Submitting Multiple Jobs in Sequence
I'm having some trouble understanding how Spark schedules jobs. I have a series of jobs I'd like to run in sequence. From what I've read, I can submit any number of jobs to spark-submit ...
0 votes, 1 answer, 1k views
Spark Shell on Kubernetes with a Kerberos-enabled Cluster
I am having a hard time getting the Spark shell (3.3.1) on Kubernetes to work with Kerberos. It works in cluster mode and client mode for submit. Here is what we did to get it to work:
cluster mode (works ...
1 vote, 0 answers, 150 views
Dropping an external table in Spark also drops the location or data
import org.apache.hadoop.fs.{Path,FileSystem}
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.{current_date, ...