Questions tagged [aws-glue]
AWS Glue is a fully managed ETL (extract, transform, and load) service that can categorize your data, clean it, enrich it, and move it between various data stores. AWS Glue consists of a central data repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python code, and a scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there's no infrastructure to manage.
aws-glue: 4,215 questions
0 votes, 0 answers, 15 views
Delta table column mapping support in Athena/Glue
I'm confused by the AWS documentation regarding compatibility with Delta tables.
We need to delete a column, which relies on the "column mapping" feature supported in Delta Lake 1.2.0, and we do it ...
0 votes, 2 answers, 20 views
Glue: Extracting bucket name and key from an AWS event-triggered NotifyEvent payload in CloudTrail
I have an EventBridge trigger set on an S3 bucket, and every time we upload an object, it triggers a NotifyEvent in CloudTrail. I am trying to extract the bucket name and key from the payload.
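The question does not show the NotifyEvent payload, so this sketch assumes one of two common EventBridge shapes for S3 uploads: a native S3 "Object Created" event (detail.bucket.name, detail.object.key) or a CloudTrail API-call record (detail.requestParameters.bucketName, detail.requestParameters.key). The helper extract_bucket_and_key is hypothetical, and how the event reaches the Glue job is not covered here.

```python
import json
from typing import Optional, Tuple


def extract_bucket_and_key(event: dict) -> Optional[Tuple[str, str]]:
    """Return (bucket, key) from an S3-related EventBridge event, or None.

    Assumed payload shapes (hypothetical for NotifyEvent, not confirmed):
      * S3 "Object Created" event: detail.bucket.name / detail.object.key
      * CloudTrail API-call record: detail.requestParameters.bucketName / .key
    """
    detail = event.get("detail") or {}

    # Shape 1: native S3 event delivered through EventBridge.
    bucket = (detail.get("bucket") or {}).get("name")
    key = (detail.get("object") or {}).get("key")
    if bucket and key:
        return bucket, key

    # Shape 2: CloudTrail record for an S3 API call (e.g. PutObject).
    params = detail.get("requestParameters") or {}
    bucket, key = params.get("bucketName"), params.get("key")
    if bucket and key:
        return bucket, key

    # Neither shape matched.
    return None


if __name__ == "__main__":
    sample = json.loads(
        '{"source": "aws.s3", "detail-type": "Object Created",'
        ' "detail": {"bucket": {"name": "example-bucket"},'
        ' "object": {"key": "incoming/data.csv"}}}'
    )
    print(extract_bucket_and_key(sample))  # ('example-bucket', 'incoming/data.csv')
```

Checking the native S3 shape first and falling back to the CloudTrail shape lets one helper cope with either delivery path; if the real payload differs, only the two lookup blocks need changing.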
0 votes, 1 answer, 33 views
Is there a better way to optimize my AWS Glue Script?
I am a novice with AWS Glue and PySpark. I am unable to resolve the problem I am facing and would appreciate the community's help.
Task: I was tasked with creating a script on AWS Glue using PySpark ...
0 votes, 1 answer, 30 views
AWS Glue Python script doesn't install wheel from S3 when adding a Glue connection
I'm running a Glue Python shell script, and I include extra-py-files that are S3 paths to wheels I've built for the script. These are installed as expected.
When I attach a Glue Connection to the ...
1 vote, 1 answer, 38 views
PySpark: efficient ways to iterate over 1M columns
I have a PySpark DataFrame like the one below:
+--------+-------------+---------+---------+---------+
| code| updatedAt|S0x223433|S1yd33333|S4r256467|
+--------+-------------+---------+---------+---------+...
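Instead of looping over the columns one at a time in Python, one option is to unpivot the wide columns into (column_name, value) rows in a single projection with Spark SQL's stack function. This sketch only builds that expression as a string; build_stack_expr and the output column names are hypothetical. stack requires all unpivoted columns to share a compatible type, and whether an expression over a million columns stays within Spark's plan-size limits is untested. On Spark 3.4 and later, DataFrame.unpivot is a built-in alternative.

```python
from typing import Iterable


def quote_ident(name: str) -> str:
    """Backtick-quote a Spark SQL identifier, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_literal(name: str) -> str:
    """Single-quote a string literal.

    Spark SQL's default parser treats backslash as an escape character,
    so backslashes and single quotes are backslash-escaped.
    """
    return "'" + name.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_stack_expr(value_cols: Iterable[str],
                     key_alias: str = "column_name",
                     value_alias: str = "value") -> str:
    """Build 'stack(n, 'c1', `c1`, ...) as (key, value)' for selectExpr."""
    cols = list(value_cols)
    if not cols:
        raise ValueError("need at least one column to unpivot")
    # Each column contributes its name (as a literal) and its value.
    pairs = ", ".join(f"{quote_literal(c)}, {quote_ident(c)}" for c in cols)
    return f"stack({len(cols)}, {pairs}) as ({key_alias}, {value_alias})"


if __name__ == "__main__":
    print(build_stack_expr(["S0x223433", "S1yd33333", "S4r256467"]))
    # Hypothetical usage inside a Glue/PySpark job (not executed here):
    # value_cols = [c for c in df.columns if c not in ("code", "updatedAt")]
    # long_df = df.selectExpr("code", "updatedAt", build_stack_expr(value_cols))
```

Once the data is long rather than wide, per-column logic becomes ordinary groupBy/filter operations on column_name, which Spark can distribute instead of a driver-side loop.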
0 votes, 0 answers, 22 views
Apache Iceberg - long merge time
I have an AWS Glue job that merges data into an Apache Iceberg table partitioned by product_id.
What I'm trying to achieve is to be able to run concurrent merge operations using AWS Glue jobs ...
3 votes, 0 answers, 38 views
Getting : "An error occurred while calling o110.pyWriteDynamicFrame. Exception thrown in awaitResult:" in AWS Glue Job
I am getting "An error occurred while calling o110.pyWriteDynamicFrame. Exception thrown in awaitResult:" in an AWS Glue job.
The size of my source data in s3 is around 60 GB.
I am reading ...
0 votes, 0 answers, 28 views
Botocore.exceptions.DataNotFoundError: unable to load data points for : glue
I'm working on migrating AWS Glue ETL jobs from v2 to v4.
While creating the Glue client, I'm getting a botocore DataNotFoundError: "unable to load data points".
It fails at botocore -> loaders....
0 votes, 1 answer, 31 views
AWS Athena Error: Modifying Hive table rows is only supported for transactional tables
I am not able to delete rows from AWS Athena tables. It throws the following error:
NOT_SUPPORTED: Modifying Hive table rows is only supported for transactional tables
This query ran ...
1 vote, 0 answers, 26 views
Flattening a DataFrame in Spark after resolving nested field datatypes
I'm having issues flattening a nested DataFrame in Spark. I have solved it using a custom function, but I am wondering if there is a better way to go about this.
The workflow is simple: the files ...
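One common flattening pattern is to walk the schema recursively and select every leaf field by its dotted path with an underscore alias. This sketch uses a plain nested dict as a stand-in for df.schema so it runs without Spark; flatten_schema is a hypothetical helper. In PySpark the same recursion would test isinstance(field.dataType, StructType) and pass the pairs to df.select. Arrays are not handled (they would need explode), and field names that themselves contain dots would need backtick quoting.

```python
from typing import Any, Dict, List, Tuple


def flatten_schema(schema: Dict[str, Any], prefix: str = "",
                   sep: str = "_") -> List[Tuple[str, str]]:
    """Return (dotted_path, alias) pairs for every leaf field.

    A nested dict stands in for a Spark StructType: dict values are
    sub-structs, any other value is a leaf type name.
    """
    columns: List[Tuple[str, str]] = []
    for name, dtype in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(dtype, dict):
            # Recurse into the nested struct, carrying the path so far.
            columns.extend(flatten_schema(dtype, path, sep))
        else:
            # Leaf: select by dotted path, alias with separators.
            columns.append((path, path.replace(".", sep)))
    return columns


if __name__ == "__main__":
    schema = {"id": "int",
              "user": {"name": "string", "address": {"city": "string"}}}
    for path, alias in flatten_schema(schema):
        print(path, "->", alias)
    # Hypothetical PySpark usage (not run here):
    # df.select([col(p).alias(a) for p, a in flatten_schema_from_struct(df.schema)])
```

Because the whole flat column list is computed up front, the result is a single select rather than repeated withColumn calls, which keeps the query plan small.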
3 votes, 1 answer, 48 views
How to format a string date so the AWS Glue crawler/DataFrame correctly identifies it as a date field?
I have some JSON data (sample below). The AWS Glue crawler reads this data, creates a Glue Data Catalog database with a table, and sets the date field as a string field. Is there a way I can format date ...
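Since the sample JSON is not shown, this sketch assumes dates arrive as day/month/year strings (a hypothetical format) and rewrites them to ISO 8601 (YYYY-MM-DD) before the crawler runs. Whether the crawler then classifies the field as a date is not confirmed here; casting in the ETL step instead, for example with Spark's to_date(column, "dd/MM/yyyy"), avoids depending on crawler inference. normalize_date_field is a hypothetical helper.

```python
from datetime import datetime
from typing import Any, Dict


def normalize_date_field(record: Dict[str, Any], field: str,
                         input_format: str = "%d/%m/%Y") -> Dict[str, Any]:
    """Return a copy of record with field rewritten as YYYY-MM-DD.

    input_format is a hypothetical source format; adjust to the real data.
    Missing or null fields are left untouched.
    """
    value = record.get(field)
    if value is None:
        return dict(record)
    # strptime raises ValueError if the string does not match input_format.
    parsed = datetime.strptime(value, input_format).date()
    out = dict(record)
    out[field] = parsed.isoformat()
    return out


if __name__ == "__main__":
    print(normalize_date_field({"id": 1, "event_date": "31/12/2023"}, "event_date"))
```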
0 votes, 0 answers, 20 views
What are the character limitations of a SchemaName in the AWS Schema Registry?
What are the limitations of the SchemaName within AWS Glue Schema Registry? In particular, I would like to know:
What is the maximum character limit for a schema name?
I ask because the ...
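A sketch of a client-side check for schema names. The limits used here (1 to 255 characters, drawn from letters, digits, and the characters - _ $ # .) are recalled from the AWS Glue API reference for SchemaName and should be treated as an assumption to verify against the current documentation; is_valid_schema_name is a hypothetical helper.

```python
import re

# Assumed constraints, recalled from the AWS Glue API reference for
# SchemaName; verify against current documentation before relying on them.
SCHEMA_NAME_MIN_LEN = 1
SCHEMA_NAME_MAX_LEN = 255
SCHEMA_NAME_PATTERN = re.compile(r"[a-zA-Z0-9\-_$#.]+")


def is_valid_schema_name(name: str) -> bool:
    """Check length bounds and the allowed character set."""
    return (SCHEMA_NAME_MIN_LEN <= len(name) <= SCHEMA_NAME_MAX_LEN
            and SCHEMA_NAME_PATTERN.fullmatch(name) is not None)


if __name__ == "__main__":
    for candidate in ["orders-v1.avro", "has space", "x" * 256]:
        print(candidate[:20], is_valid_schema_name(candidate))
```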
0 votes, 0 answers, 24 views
How to check logs for a Glue trigger job?
I have AWS Glue resources set up, which work if I run them manually from the browser. To automate this, I added an aws_glue_trigger resource with a condition that, if the crawler succeeds, then I fire off ...
0 votes, 0 answers, 35 views
How to execute SQL statements against on premise Oracle DB in AWS Glue
I am trying to update a table in my Oracle DB using AWS Glue. I know that PySpark is not really meant for updating or inserting into tables; it is more for reading. Still, I am trying to update all rows for a ...
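A minimal sketch of running row-level UPDATE statements from plain Python rather than through Spark, assuming a DB-API 2.0 driver is available to the job (python-oracledb is one such driver for Oracle; making it available in Glue is an assumption not covered here). sqlite3 stands in for Oracle so the sketch runs anywhere; the table customers, its columns, and update_rows are hypothetical.

```python
import sqlite3
from typing import Any, Iterable, Mapping

# Hypothetical statement; named :placeholders are accepted by both
# sqlite3 and python-oracledb.
UPDATE_SQL = "UPDATE customers SET status = :status WHERE id = :id"


def update_rows(conn: Any, rows: Iterable[Mapping[str, Any]]) -> None:
    """Apply one parameterized UPDATE per row and commit once.

    Rolls back the whole batch if any statement fails.
    """
    cur = conn.cursor()
    try:
        cur.executemany(UPDATE_SQL, list(rows))
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "new"), (2, "new")])
    conn.commit()
    update_rows(conn, [{"id": 1, "status": "active"}, {"id": 2, "status": "closed"}])
    print(conn.execute("SELECT id, status FROM customers ORDER BY id").fetchall())
```

Bind parameters keep values out of the SQL text, and executemany sends the batch in one call, which matters when the rows to update come from a large DataFrame collected on the driver.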
0 votes, 1 answer, 33 views
Invalid identifier error while using standard_hash function in Oracle 11g
I'm trying to generate a hash based on a field but got the following error:
Query:
select standard_hash(pk_time) from schema.table
Error:
"STANDARD_HASH": invalid identifier
The column type ...