Questions tagged [aws-glue]


AWS Glue is a fully managed ETL (extract, transform, and load) service that can categorize your data, clean it, enrich it, and move it between various data stores. AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there's no infrastructure to manage.

4,215 questions

0 votes

0 answers

15 views

Delta table column mapping support in Athena/Glue

I'm confused by the AWS documentation regarding compatibility with Delta tables. We need to delete a column, which uses the "column mapping" feature supported in delta-lake 1.2.0, and we do it ...

Sergii V.

asked 2 days ago

0 votes

2 answers

20 views

Glue: Extracting Bucket Name and Key from AWS Event Triggered NotifyEvent payload in CloudTrail

I have an EventBridge trigger set on an S3 bucket, and every time we upload an object, it triggers a NotifyEvent in CloudTrail. I am trying to extract the bucket name and key from the payload.

Manish

asked Jul 18 at 2:38

0 votes

1 answer

33 views

Is there a better way to optimize my AWS Glue Script?

I am new to AWS Glue and PySpark, I am unable to resolve a problem I am facing, and I would appreciate the community's help. Task: I was tasked with creating a script on AWS Glue using PySpark ...

Gokul Subramanian

asked Jul 17 at 13:21

0 votes

1 answer

30 views

AWS Glue Python script doesn't install wheel from S3 when adding a Glue connection

I'm running a Glue Python shell script, and I include extra-py-files that are S3 paths to wheels I've built for the script. These are installed as expected. When I attach a Glue Connection to the ...

Nevermore

7,319

asked Jul 17 at 13:00

1 vote

1 answer

38 views

PySpark: efficient ways to iterate over 1M columns

I have a PySpark DataFrame as below:
+--------+-------------+---------+---------+---------+
|    code|    updatedAt|S0x223433|S1yd33333|S4r256467|
+--------+-------------+---------+---------+---------+...

datawiz879

asked Jul 15 at 22:34

0 votes

0 answers

22 views

Apache Iceberg - long merge time

I have an AWS Glue job that merges data into an Apache Iceberg table partitioned by product_id. What I'm trying to achieve is to be able to run concurrent merge operations using AWS Glue jobs ...

P.Zaw

asked Jul 12 at 12:47

3 votes

0 answers

38 views

Getting "An error occurred while calling o110.pyWriteDynamicFrame. Exception thrown in awaitResult:" in AWS Glue job

I am getting "An error occurred while calling o110.pyWriteDynamicFrame. Exception thrown in awaitResult:" in an AWS Glue job. The size of my source data in S3 is around 60 GB. I am reading ...

Nikhil Khandelwal

asked Jul 12 at 6:06

0 votes

0 answers

28 views

Botocore.exceptions.DataNotFoundError: unable to load data points for : glue

I'm working on migrating AWS Glue ETL jobs from v2 to v4. While creating a Glue client, I'm getting the error botocore.exception data not found: unable to load data points. It fails at botocore -> loaders....

Malvika Garg

asked Jul 12 at 4:48

0 votes

1 answer

31 views

AWS Athena Error: Modifying Hive table rows is only supported for transactional tables

I am not able to delete rows from AWS Athena tables. It throws the following error: NOT_SUPPORTED: Modifying Hive table rows is only supported for transactional tables This query ran ...

Lakshay

asked Jul 11 at 12:04

1 vote

0 answers

26 views

Flattening a DataFrame in Spark after resolving nested field datatypes

I'm having issues flattening a nested DataFrame in Spark, which I have solved using a custom function, but I am wondering if there is a better way to go about this. The workflow is simple, the files ...

BurgerTown

asked Jul 11 at 8:50

3 votes

1 answer

48 views

How do I format a string date so that the AWS Glue crawler/DataFrame correctly identifies it as a date field?

I have some JSON data (sample below). The AWS Glue crawler reads this data, creates a Glue Data Catalog database with a table, and sets the date field as a string field. Is there a way I can format date ...

kishi

asked Jul 11 at 1:29

0 votes

0 answers

20 views

What are the character limitations of a SchemaName in the AWS Schema Registry?

What are the limitations of the SchemaName within AWS Glue Schema Registry? In particular, I would like to know: What is the maximum character limit for a schema name? I ask because the ...

vab2048

1,195

asked Jul 10 at 21:57

0 votes

0 answers

24 views

How do I check logs for a Glue trigger job?

I have AWS Glue resources set up that work when I run them manually from the browser. To automate this, I added an aws_glue_trigger resource with a condition that, if the crawler succeeds, then I fire off ...

kishi

asked Jul 10 at 20:51

0 votes

0 answers

35 views

How to execute SQL statements against an on-premises Oracle DB in AWS Glue

I am trying to update a table in my Oracle DB using AWS Glue. I know that PySpark is not really meant for updating or inserting into tables; it is more for reading. Still, I am trying to update all rows for a ...

rz01

asked Jul 9 at 22:59

0 votes

1 answer

33 views

Invalid Identifier error while using standard_hash function in oracle 11g

I'm trying to generate a hash based on a field but got the following error: Query: select standard_hash(pk_time) from schema.table Error: "STANDARD_HASH": invalid identifier The column type ...

Gocht

10.2k

asked Jul 9 at 20:19
