All Questions tagged with scala and apache-spark-sql (6,828 questions)
0 votes · 1 answer · 51 views
spark.sql() giving error : org.apache.spark.sql.catalyst.parser.ParseException: Syntax error at or near '('(line 2, pos 52)
I have a class LowerCaseColumn.scala in which one function is defined as below:
override def registerSQL(): Unit = spark.sql(
"""
|CREATE OR REPLACE TEMPORARY ...
0 votes · 2 answers · 38 views
Determine if a condition is ever true in an aggregated dataset with the Scala Spark SQL library
I'm trying to aggregate a dataset and determine if a condition is ever true for a row in the dataset.
Suppose I have a dataset with these values
| cust_id | travel_type | distance_travelled |
| ------- | ----------- | ------------------ |
| 1       | car         | 10                 |
| 1       | ...
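The usual shape for "was the condition ever true" is a group-by followed by an existence check; in Spark SQL this is typically `max(when(cond, 1).otherwise(0))` per group. A minimal sketch of the logic on plain Scala collections (the `Trip` class and sample rows are hypothetical, mirroring the table above):

```scala
// "Was the condition ever true for this customer?" = group by key, then exists.
case class Trip(custId: Int, travelType: String, distance: Int)

val trips = List(
  Trip(1, "car", 10),
  Trip(1, "train", 25),
  Trip(2, "bus", 5)
)

// true for a customer iff at least one of their rows used a car
val everByCar: Map[Int, Boolean] =
  trips.groupBy(_.custId).map { case (id, ts) => id -> ts.exists(_.travelType == "car") }
```

In a DataFrame the same aggregation would be `df.groupBy("cust_id").agg(...)` with a `max`-over-flag expression standing in for `exists`.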
0 votes · 0 answers · 42 views
Spark recomputes the cached DataFrames
I'm working on a Spark application written in Scala. It has six functions; each takes two DataFrames as input, processes them, and emits one result DF. I am caching the result of each function's ...
0 votes · 1 answer · 138 views
java.lang.OutOfMemoryError: UTF16 String size exceeding default value
I was trying to load TSV files from URLs (the max file size was 1.05 GB, or 1129672402 bytes).
I used java.net.URL for it.
But it threw the below error (for the largest one):
java.lang.OutOfMemoryError:...
0 votes · 2 answers · 39 views
Spark DataFrame: check whether all the elements of an array column match a given value
I have created a Spark DataFrame using Scala; here is sample data:
emp_id|result
1000 | [true,true,true]
1001 | [true,false,true]
1002 | [true,true,true]
The result column is an array.
I would like to ...
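One way to test "all elements are true" is `forall` over the array; Spark 3 exposes the same idea as the `forall` higher-order SQL function. A sketch on plain Scala collections, using rows that mirror the sample data above:

```scala
val rows = List(
  (1000, Array(true, true, true)),
  (1001, Array(true, false, true)),
  (1002, Array(true, true, true))
)

// For each emp_id, check whether every element of the result array is true.
val allTrue: List[(Int, Boolean)] =
  rows.map { case (id, result) => (id, result.forall(identity)) }
```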
1 vote · 1 answer · 36 views
Adding a new column of Array[String] type to a DataFrame based on a condition, in Spark Scala
I have the following dataframe:
| colA | colB |
| ---- | ---- |
| A1   | B1   |
| A2   | B2   |
| A3   | B3   |
colA: String, colB: String
Also, I have a Map[String, Array[String]].
I want to add a new column 'colC' containing values of the Map ...
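The core step is a `getOrElse` on the map keyed by colA; in Spark this is usually done with a UDF, or by turning the map into a DataFrame and joining. A sketch with hypothetical map contents (A3 is deliberately absent to show the default case):

```scala
// Hypothetical lookup table.
val lookup: Map[String, Array[String]] = Map(
  "A1" -> Array("x", "y"),
  "A2" -> Array("z")
)

val rows = List(("A1", "B1"), ("A2", "B2"), ("A3", "B3"))

// colC = lookup(colA), falling back to an empty array when the key is absent
val withColC = rows.map { case (a, b) => (a, b, lookup.getOrElse(a, Array.empty[String])) }
```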
0 votes · 0 answers · 74 views
Compare 2 Lists/Arrays in Scala Spark
I have 2 lists:
# time taken
x1 = List(10, 20, 30, 40, 50)
# time alloted
y1 = List(15, 30)
Here are some more examples:
+-------------------+----------------------+----------+
| time_taken |...
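The question is truncated, but a common pattern for comparing two lists of different lengths is `zipAll`, which pads the shorter list with a default. A sketch using the two lists above (the padding value 0 and the `<=` comparison are assumptions about the intended comparison):

```scala
val x1 = List(10, 20, 30, 40, 50) // time taken
val y1 = List(15, 30)             // time allotted

// Pair elements positionally, padding the shorter list with 0.
val paired = x1.zipAll(y1, 0, 0)

// Was each task finished within its allotted time? Unmatched tasks compare against 0.
val withinAllotted = paired.map { case (taken, allotted) => taken <= allotted }
```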
0 votes · 0 answers · 63 views
Convert nested avro structures to flat schema in Apache Spark
I have a use case where I have to read data from Kafka and write to a sink. The data in Kafka is in Avro and the fields are wrapped in an Avro map. The map will not always have the same keys and will vary ...
0 votes · 0 answers · 31 views
Scala Spark: average of difference
Given input dataframe with structure:
| machine_id | process_id | activity_type | timestamp |
| ---------- | ---------- | ------------- | --------- |
| 0 | 0 | start | ...
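Averaging start/end differences takes two group-bys: one per (machine, process) pair to compute each difference, then one per machine to average them. A sketch on plain Scala collections with hypothetical integer timestamps:

```scala
case class Activity(machineId: Int, processId: Int, activityType: String, timestamp: Int)

val acts = List(
  Activity(0, 0, "start", 1), Activity(0, 0, "end", 3), // difference 2
  Activity(0, 1, "start", 5), Activity(0, 1, "end", 9)  // difference 4
)

// Per (machine, process): end - start
val perProcess: List[(Int, Int)] = acts
  .groupBy(a => (a.machineId, a.processId))
  .toList
  .map { case ((machine, _), xs) =>
    val start = xs.find(_.activityType == "start").get.timestamp
    val end   = xs.find(_.activityType == "end").get.timestamp
    (machine, end - start)
  }

// Per machine: average of the differences
val avgByMachine: Map[Int, Double] =
  perProcess.groupBy(_._1).map { case (m, ds) => m -> ds.map(_._2).sum.toDouble / ds.size }
```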
0 votes · 0 answers · 24 views
Spark SQL - performance degradation after adding a new column
My code is in Scala and I'm using Spark SQL syntax to union 3 DataFrames.
Currently I am working on adding a new field. It's applicable to only one of the DataFrames, so the ...
-1 votes · 1 answer · 41 views
Not able to create a CSV using a Spark DataFrame and Scala; instead it creates a folder with `.csv` in the folder name
I am not able to write or create a CSV using a Spark DataFrame; instead it is creating a directory for me. This is my code:
package com.package.dssupplier
import org.apache.spark.sql.{SaveMode, SparkSession}
...
2 votes · 0 answers · 50 views
How can I replace values in an array of structs with other values using Spark?
I have a Hive table named student_details; it has the below format:
| Date | Name | Age | Subject | Students ...
0 votes · 1 answer · 46 views
How to call a class inside another Scala object?
I have a class DFHelper which helps with getting the DataFrame keys.
I want to maintain it as generic code and call it from another main Scala object. E.g. the first code section I am defining for generic ...
0 votes · 1 answer · 31 views
How to get the keys from an org.apache.spark.sql.Column type in Scala and put them into a list variable?
I am trying to get the keys from an org.apache.spark.sql.Column type variable and put them into a list so that I can do some schema comparison.
inputFieldMap: org.apache.spark.sql.Column = keys:[customerID,...
-1 votes · 1 answer · 25 views
Filter out and log null values from Spark dataframe
I have this dataframe:
+------+-------------------+-----------+
|brand |original_timestamp |weight |
+------+-------------------+-----------+
|BR1 |1632899456 |4.0 |
|BR2 |...
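A common shape for "filter out and log" is to split the rows once so the null rows can be logged while the valid rows flow on; in Spark the split is `df.filter(col("weight").isNull)` versus `.isNotNull`. A sketch on plain Scala collections, with `Option` standing in for the nullable weight (sample rows are hypothetical):

```scala
// None models a null weight.
val rows = List(
  ("BR1", 1632899456L, Some(4.0)),
  ("BR2", 1632899457L, None)
)

// Split once: nullRows go to logging, validRows continue downstream.
val (nullRows, validRows) = rows.partition(_._3.isEmpty)

nullRows.foreach { case (brand, ts, _) =>
  println(s"dropped row with null weight: brand=$brand timestamp=$ts")
}
```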