All Questions tagged with scala and apache-spark-sql (6,828 questions)
0 votes · 1 answer · 51 views
spark.sql() giving error : org.apache.spark.sql.catalyst.parser.ParseException: Syntax error at or near '('(line 2, pos 52)
I have a class LowerCaseColumn.scala in which one function is defined as below:
override def registerSQL(): Unit = spark.sql(
"""
|CREATE OR REPLACE TEMPORARY ...
0 votes · 2 answers · 38 views
Determine if a condition is ever true in an aggregated dataset with the Scala Spark SQL library
I'm trying to aggregate a dataset and determine if a condition is ever true for a row in the dataset.
Suppose I have a dataset with these values
| cust_id | travel_type | distance_travelled |
| ------- | ----------- | ------------------ |
| 1       | car         | 10                 |
| 1       | ...
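The usual shape for "was the condition ever true" is a group-by followed by an existence check; in Spark SQL this is typically `max(when(cond, 1).otherwise(0))` per group. A minimal sketch of the logic on plain Scala collections (the `Trip` class and sample rows are hypothetical, mirroring the table above):

```scala
// "Was the condition ever true for this customer?" = group by key, then exists.
case class Trip(custId: Int, travelType: String, distance: Int)

val trips = List(
  Trip(1, "car", 10),
  Trip(1, "train", 25),
  Trip(2, "bus", 5)
)

// true for a customer iff at least one of their rows used a car
val everByCar: Map[Int, Boolean] =
  trips.groupBy(_.custId).map { case (id, ts) => id -> ts.exists(_.travelType == "car") }
```

In a DataFrame the same aggregation would be `df.groupBy("cust_id").agg(...)` with a `max`-over-flag expression standing in for `exists`.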
0 votes · 0 answers · 42 views
Spark recomputes the cached DataFrames
I'm working on a Spark application written in Scala. It has six functions; each takes two DataFrames as input, processes them, and emits one result DF. I am caching the result of each function's ...
0 votes · 1 answer · 138 views
java.lang.OutOfMemoryError: UTF16 String size exceeding default value
I was trying to load TSV files from URLs (the max file size was 1.05 GB, or 1129672402 bytes).
I used java.net.URL for it.
But it threw the below error (for the largest one):
java.lang.OutOfMemoryError:...
0 votes · 2 answers · 39 views
Spark DataFrame: check whether all the elements of an array column match a given value
I have created a Spark DataFrame using Scala; here is sample data:
emp_id|result
1000 | [true,true,true]
1001 | [true,false,true]
1002 | [true,true,true]
The result column is an array.
I would like to ...
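One way to test "all elements are true" is `forall` over the array; Spark 3 exposes the same idea as the `forall` higher-order SQL function. A sketch on plain Scala collections, using rows that mirror the sample data above:

```scala
val rows = List(
  (1000, Array(true, true, true)),
  (1001, Array(true, false, true)),
  (1002, Array(true, true, true))
)

// For each emp_id, check whether every element of the result array is true.
val allTrue: List[(Int, Boolean)] =
  rows.map { case (id, result) => (id, result.forall(identity)) }
```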
1 vote · 1 answer · 36 views
Adding a new column of Array[String] type to a DataFrame based on a condition, in Spark Scala
I have the following dataframe:
| colA | colB |
| ---- | ---- |
| A1   | B1   |
| A2   | B2   |
| A3   | B3   |
colA: String, colB: String
Also, I have a Map[String, Array[String]].
I want to add a new column 'colC' containing values of the Map ...
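The core step is a `getOrElse` on the map keyed by colA; in Spark this is usually done with a UDF, or by turning the map into a DataFrame and joining. A sketch with hypothetical map contents (A3 is deliberately absent to show the default case):

```scala
// Hypothetical lookup table.
val lookup: Map[String, Array[String]] = Map(
  "A1" -> Array("x", "y"),
  "A2" -> Array("z")
)

val rows = List(("A1", "B1"), ("A2", "B2"), ("A3", "B3"))

// colC = lookup(colA), falling back to an empty array when the key is absent
val withColC = rows.map { case (a, b) => (a, b, lookup.getOrElse(a, Array.empty[String])) }
```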
0 votes · 0 answers · 74 views
Compare 2 Lists/Arrays in Scala Spark
I have 2 lists:
# time taken
x1 = List(10, 20, 30, 40, 50)
# time alloted
y1 = List(15, 30)
Here are some more examples:
+-------------------+----------------------+----------+
| time_taken |...
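The question is truncated, but a common pattern for comparing two lists of different lengths is `zipAll`, which pads the shorter list with a default. A sketch using the two lists above (the padding value 0 and the `<=` comparison are assumptions about the intended comparison):

```scala
val x1 = List(10, 20, 30, 40, 50) // time taken
val y1 = List(15, 30)             // time allotted

// Pair elements positionally, padding the shorter list with 0.
val paired = x1.zipAll(y1, 0, 0)

// Was each task finished within its allotted time? Unmatched tasks compare against 0.
val withinAllotted = paired.map { case (taken, allotted) => taken <= allotted }
```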
0 votes · 0 answers · 63 views
Convert nested avro structures to flat schema in Apache Spark
I have a use case where I have to read data from Kafka and write to a sink. The data in Kafka is in Avro and the fields are wrapped in an Avro map. The map will not always have the same keys and will vary ...
0 votes · 0 answers · 31 views
Scala Spark: average of difference
Given input dataframe with structure:
| machine_id | process_id | activity_type | timestamp |
| ---------- | ---------- | ------------- | --------- |
| 0 | 0 | start | ...
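Averaging start/end differences takes two group-bys: one per (machine, process) pair to compute each difference, then one per machine to average them. A sketch on plain Scala collections with hypothetical integer timestamps:

```scala
case class Activity(machineId: Int, processId: Int, activityType: String, timestamp: Int)

val acts = List(
  Activity(0, 0, "start", 1), Activity(0, 0, "end", 3), // difference 2
  Activity(0, 1, "start", 5), Activity(0, 1, "end", 9)  // difference 4
)

// Per (machine, process): end - start
val perProcess: List[(Int, Int)] = acts
  .groupBy(a => (a.machineId, a.processId))
  .toList
  .map { case ((machine, _), xs) =>
    val start = xs.find(_.activityType == "start").get.timestamp
    val end   = xs.find(_.activityType == "end").get.timestamp
    (machine, end - start)
  }

// Per machine: average of the differences
val avgByMachine: Map[Int, Double] =
  perProcess.groupBy(_._1).map { case (m, ds) => m -> ds.map(_._2).sum.toDouble / ds.size }
```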
0 votes · 0 answers · 24 views
Spark SQL - performance degradation after adding a new column
My code is in Scala and I'm using Spark SQL syntax to union 3 DataFrames.
Currently I am working on adding a new field. It's applicable to only one of the DataFrames, so the ...
-1 votes · 1 answer · 41 views
Not able to create a CSV using a Spark DataFrame and Scala; instead it creates a folder with `.csv` in the folder name
I am not able to write or create a CSV using a Spark DataFrame; instead it is creating a directory for me. This is my code:
package com.package.dssupplier
import org.apache.spark.sql.{SaveMode, SparkSession}
...
2 votes · 0 answers · 50 views
How can I replace values in an array of structs with other values using Spark?
I have a Hive table named student_details; it has the below format:
| Date | Name | Age | Subject | Students ...
0 votes · 1 answer · 46 views
How to call a class inside another Scala object?
I have a class DFHelper which helps with getting the DataFrame keys.
I want to maintain it as generic code and call it from another main Scala object. E.g. the first code section I am defining for generic ...
0 votes · 1 answer · 31 views
How to get the keys from an org.apache.spark.sql.Column type in Scala and put them into a list variable?
I am trying to get the keys from an org.apache.spark.sql.Column type variable and put them into a list so that I can do some schema comparison.
inputFieldMap: org.apache.spark.sql.Column = keys:[customerID,...
-1 votes · 1 answer · 25 views
Filter out and log null values from Spark dataframe
I have this dataframe:
+------+-------------------+-----------+
|brand |original_timestamp |weight |
+------+-------------------+-----------+
|BR1 |1632899456 |4.0 |
|BR2 |...
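A common shape for "filter out and log" is to split the rows once so the null rows can be logged while the valid rows flow on; in Spark the split is `df.filter(col("weight").isNull)` versus `.isNotNull`. A sketch on plain Scala collections, with `Option` standing in for the nullable weight (sample rows are hypothetical):

```scala
// None models a null weight.
val rows = List(
  ("BR1", 1632899456L, Some(4.0)),
  ("BR2", 1632899457L, None)
)

// Split once: nullRows go to logging, validRows continue downstream.
val (nullRows, validRows) = rows.partition(_._3.isEmpty)

nullRows.foreach { case (brand, ts, _) =>
  println(s"dropped row with null weight: brand=$brand timestamp=$ts")
}
```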