Questions tagged [apache-beam]
Apache Beam is a unified SDK for batch and stream processing. It lets you specify large-scale data-processing workflows in a Beam-specific DSL. Beam workflows can be executed on different runners such as Apache Flink, Apache Spark, or Google Cloud Dataflow (a cloud service).
4,849 questions
0 votes · 1 answer · 22 views
How to handle data skew in Apache Beam? Is this achievable? If yes, then how?
I am a Data Engineer.
I have used PySpark for a long time and am now moving to Apache Beam/Dataflow.
Since this is a managed service, we don't have to do much.
But there is one question I want to know, ...
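A common mitigation for the key skew asked about above is key salting: split each hot key into several sub-keys so that no single worker has to aggregate all of its values, then merge the partial results in a second pass. Below is a plain-Python sketch of the idea (the function names are invented for illustration; in a real Beam pipeline each stage would be a ParDo followed by a GroupByKey/CombinePerKey):

```python
import random

NUM_SHARDS = 4  # how many sub-keys each original key is spread across


def salt_key(key, num_shards=NUM_SHARDS):
    """Append a random shard suffix so one hot key becomes several keys."""
    return f"{key}#{random.randrange(num_shards)}"


def unsalt_key(salted_key):
    """Strip the shard suffix to recover the original key."""
    return salted_key.rsplit("#", 1)[0]


def skew_tolerant_sum(pairs):
    """Two-stage aggregation: per-shard partial sums, then a final merge."""
    partial = {}
    for key, value in pairs:
        sk = salt_key(key)
        partial[sk] = partial.get(sk, 0) + value
    final = {}
    for sk, value in partial.items():
        key = unsalt_key(sk)
        final[key] = final.get(key, 0) + value
    return final
```

The first stage spreads a hot key over `NUM_SHARDS` shards; the second stage only merges `NUM_SHARDS` partial sums per key, so the skew never concentrates on one worker.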
0 votes · 0 answers · 14 views
Not able to create job in Dataflow for streaming data
I am executing my Apache Beam code in Google Cloud Shell. The code runs without errors, but no jobs are created in Dataflow.
Below are the roles I assigned to the service account:
Dataflow Worker, ...
0 votes · 0 answers · 32 views
Beam RunInference and sentence-transformers from huggingface
I am trying to use the RunInference Beam transform together with the stsb-xlm-r-multilingual model, like so:
inferences = (
formatted_examples
| "Run Inference" >> ...
0 votes · 1 answer · 36 views
Fixed windowing not producing synchronous output
I am using the Apache Beam Python SDK to create a Dataflow pipeline.
Steps:
ingest synchronous Pub/Sub messages
window into 1-second windows with 600 ms allowed lateness and a 2-second processing ...
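For context on the question above: Beam's FixedWindows assigns each element to a window purely from its event timestamp, independent of when it is processed. The assignment arithmetic can be shown in plain Python (an illustrative sketch of the semantics, not SDK code):

```python
def assign_fixed_window(timestamp, size, offset=0):
    """Return the (start, end) of the fixed window containing `timestamp`,
    mirroring the arithmetic behind Beam's FixedWindows: back-to-back
    windows of `size` seconds, aligned to `offset`."""
    start = timestamp - ((timestamp - offset) % size)
    return (start, start + size)
```

Because assignment depends only on the timestamp, elements arriving out of order still land in the right window; whether their pane is emitted depends on the trigger and allowed lateness, which is why output need not look synchronous with the input.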
0 votes · 2 answers · 50 views
Apache Beam streaming process with time-based windows
I have a Dataflow pipeline that reads messages from Kafka, processes them, and inserts them into BigQuery.
I want the processing / BigQuery insertion to happen in time-based batches, so that on ...
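The time-based batching asked for above is conceptually what Beam's fixed windows (or GroupIntoBatches with a buffering duration) provide: each record's timestamp selects a bucket, and the sink flushes one bucket at a time instead of inserting row by row. A plain-Python sketch of that bucketing (illustrative only; the function name is invented):

```python
def batch_by_time(records, batch_seconds):
    """Group (timestamp, value) records into time-aligned batches, e.g. for
    bulk BigQuery inserts. Each batch covers one `batch_seconds` interval,
    keyed by the start of that interval."""
    batches = {}
    for ts, value in records:
        bucket = ts - (ts % batch_seconds)  # start of the interval
        batches.setdefault(bucket, []).append(value)
    return batches
```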
-1 votes · 0 answers · 28 views
Apache Beam pipeline | Kafka to Dataproc
I have created a Beam pipeline to read data from a Kafka topic and then insert it into Hive tables in a Dataproc cluster.
I consumed the data and converted it to an HCatRecord as below:
p....
0 votes · 1 answer · 20 views
Firestore Write from Beam
Is it possible to write to 2 different Firestore databases from the same Beam job? One database is the 'default' and the other is a different one.
When building the Beam pipeline, it is picking ...
0 votes · 0 answers · 26 views
How do I specify a field having keyword or fielddata=true in ElasticSearchIO?
We are using an Apache Beam pipeline and ElasticSearchIO in Java. How can I make a field a keyword or use fielddata=true? Preferably, we would like to use a keyword. Thanks.
We tried the following ...
0 votes · 0 answers · 40 views
Dataflow Job Fails with Cannot create PoolableConnectionFactory and PERMISSION_DENIED Errors
I'm working on a personal data-migration project where I need to transfer multiple tables from the Microsoft SQL Server AdventureWorks2019 database to Google BigQuery using the Dataflow SQL Server to BQ ...
2 votes · 1 answer · 54k views
ERROR DockerEnvironmentFactory: Docker container xxxxx logs, when trying to run Apache Beam pipeline written in Go using Spark runner
I have a pipeline written in Go that I want to execute with the Spark runner; Spark Standalone is installed on my local machine.
Apache Beam 2.56.0
Apache Spark 3.2.2
I started the Spark master and ...
0 votes · 0 answers · 39 views
How to configure GCP Spanner ChangeStream read duration in java
We are using the GCP Spanner change stream in our Apache Beam Java Dataflow job, and we configured it using the SpannerIO connector. The code is below:
static class Read extends PTransform<PBegin, ...
0 votes · 0 answers · 21 views
Apache Beam Parallel Shared State
Consider an Apache Beam pipeline with 2 parallel transformations:
# Transformation 1
p | read_from_pubsub_subscription_1 | save_current_state | write_to_pubsub
# Transformation 2
p | ...
0 votes · 1 answer · 27 views
Apache Beam code not running, giving an error
I am trying to write a custom flex template to extract data from JDBC and write it into a GCS bucket. What am I doing wrong in the code?
The code produces this error:
Error message from worker: Traceback (...
0 votes · 0 answers · 15 views
Why is the Beam AfterCount trigger behaving differently? Can anyone explain the output?
I am learning Apache Beam triggers.
I have written Apache Beam code which has a 30-second fixed window, an AfterCount trigger of 3, and accumulation_mode set to trigger.AccumulationMode.ACCUMULATING.
...
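The accumulating behaviour the question above asks about can be illustrated without Beam at all: with AccumulationMode.ACCUMULATING, each firing of AfterCount(n) re-emits everything seen so far in the window, and a final pane fires when the window closes. A plain-Python simulation of that pane logic (a sketch of the semantics, not SDK code; the function name is invented):

```python
def simulate_accumulating_panes(elements, after_count):
    """Fire a pane every `after_count` elements; in ACCUMULATING mode each
    pane contains all elements seen so far in the window, plus a final
    pane at window close if anything arrived after the last firing."""
    panes, seen = [], []
    for i, element in enumerate(elements, start=1):
        seen.append(element)
        if i % after_count == 0:
            panes.append(list(seen))
    if not panes or len(panes[-1]) != len(seen):
        panes.append(list(seen))  # closing pane with the stragglers
    return panes
```

So 7 elements in one 30-second window with AfterCount(3) produce panes of 3, 6, and 7 elements: the counts grow rather than reset, which is the difference from DISCARDING mode.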
1 vote · 0 answers · 40 views
Error "Unable to parse" custom Dataflow template
I am trying to create a custom Dataflow template for a JDBC connection; however, when importing the template (Python code converted to JSON), it gives an error/warning in the console.
Error/Warning: Fail to ...