
Questions tagged [apache-beam]

Apache Beam is a unified SDK for batch and stream processing. It lets you specify large-scale data processing workflows with a Beam-specific DSL. Beam workflows can be executed on different runners such as Apache Flink, Apache Spark, or Google Cloud Dataflow (a cloud service).

0 votes · 1 answer · 22 views

How to handle data skew in Apache Beam? Is this achievable? If yes, then how?

I am a Data Engineer. I have used PySpark for a long time and am now moving to Apache Beam/Dataflow. Since this is a managed service, we don't have to do much. But there is one question I want to ask, ...
asked by amarjeet kushwaha
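A common way to mitigate key skew ahead of a `GroupByKey` is "salting": split each hot key into several sub-keys so its elements spread across workers, aggregate per sub-key, then strip the salt and re-combine. The sketch below is a stdlib-only illustration of that idea; the function names, shard count, and `hot_keys` set are illustrative, not Beam APIs:

```python
import random

def salt_key(key, value, num_shards=4, hot_keys=frozenset({"hot"})):
    """Spread a known-hot key across num_shards sub-keys; leave cold keys alone."""
    if key in hot_keys:
        return (f"{key}#{random.randrange(num_shards)}", value)
    return (key, value)

def unsalt_key(salted_key):
    """Strip the '#N' salt so partial aggregates can be re-combined per original key."""
    return salted_key.split("#", 1)[0]
```

In a Beam pipeline this would correspond to something like a `beam.Map(lambda kv: salt_key(*kv))` before the grouping step, a combine per salted key, then a second map/combine after unsalting, so no single worker receives all elements of the hot key.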
0 votes · 0 answers · 14 views

Not able to create a job in Dataflow for streaming data

I am executing my Apache Beam code in Google Cloud Shell. I am able to execute the code without errors, but no jobs are created in Dataflow. **Below are the roles I assigned to the service account:** Dataflow Worker, ...
asked by Sai3554
0 votes · 0 answers · 32 views

Beam RunInference and sentence-transformers from Hugging Face

I am trying to use the Beam RunInference transform together with the stsb-xlm-r-multilingual model, like so: inferences = ( formatted_examples | "Run Inference" >> ...
asked by yuvalon
0 votes · 1 answer · 36 views

Fixed windowing not producing synchronous output

I am using the Apache Beam Python SDK to create a Dataflow pipeline. Steps: ingest synchronous Pub/Sub messages; window into 1-second windows with 600 ms allowed lateness and a 2-second processing ...
asked by Joe Moore
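Beam's `FixedWindows` deterministically maps each element's event timestamp to exactly one window, and an element is only droppably late once the watermark passes the window end plus the allowed lateness. A stdlib-only sketch of that arithmetic (function names and the 600 ms default are illustrative, not Beam APIs):

```python
def fixed_window(timestamp_s, size_s=1.0):
    """Return the [start, end) bounds of the fixed window containing timestamp_s,
    mirroring how fixed windowing maps event time to a window."""
    start = (timestamp_s // size_s) * size_s
    return (start, start + size_s)

def is_droppable(event_ts, watermark, size_s=1.0, allowed_lateness_s=0.6):
    """An element is dropped once the watermark passes window end + allowed lateness."""
    _, end = fixed_window(event_ts, size_s)
    return watermark > end + allowed_lateness_s
```

For example, an element with timestamp 3.7 s lands in the [3.0, 4.0) window and is still accepted while the watermark is at 4.5 s, but dropped once it passes 4.6 s.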
0 votes · 2 answers · 50 views

Apache Beam streaming process with time-based windows

I have a Dataflow pipeline that reads messages from Kafka, processes them, and inserts them into BigQuery. I want the processing / BigQuery insertion to happen in time-based batches, so that on ...
asked by asafal
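Grouping a stream into time-based batches before a sink insert amounts to bucketing each element by the fixed window its timestamp falls into, then writing one batch per window. A stdlib-only sketch of that bucketing (the function name and 60-second default are illustrative; in Beam this role is played by windowing or a batching transform, not this code):

```python
from collections import defaultdict

def batch_by_window(events, window_s=60):
    """Group (timestamp, payload) events into fixed time-based batches,
    keyed by window start, similar in spirit to windowing before a
    batched BigQuery insert."""
    batches = defaultdict(list)
    for ts, payload in events:
        window_start = ts - (ts % window_s)
        batches[window_start].append(payload)
    return dict(batches)
```

Each resulting batch can then be flushed to the sink as one insert, so writes happen per window rather than per element.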
-1 votes · 0 answers · 28 views

Apache Beam pipeline | Kafka to Dataproc

I have created a Beam pipeline to read data from a Kafka topic and then insert it into Hive tables in a Dataproc cluster. I consumed the data and converted it to an HCatRecord as below: p....
asked by Dulanga Heshan
0 votes · 1 answer · 20 views

Firestore Write from Beam

Is it possible to write to 2 different Firestore databases from the same Beam job? One database is the 'default' and the other is a different one. When building the Beam pipeline, it is picking ...
asked by Kaling
0 votes · 0 answers · 26 views

How do I specify a field having keyword or fielddata=true in ElasticsearchIO?

We are using an Apache Beam pipeline and ElasticsearchIO in Java. How can I make a field a keyword or use fielddata=true? Preferably, we would like to use a keyword. Thanks. We tried the following ...
asked by Phil McLachlan
0 votes · 0 answers · 40 views

Dataflow Job Fails with Cannot create PoolableConnectionFactory and PERMISSION_DENIED Errors

I'm working on a personal data-migration project where I need to transfer multiple tables from the Microsoft SQL Server AdventureWorks2019 database to Google BigQuery using the Dataflow SQL Server to BQ ...
asked by Cjizzle
2 votes · 1 answer · 54k views

ERROR DockerEnvironmentFactory: Docker container xxxxx logs, when trying to run Apache Beam pipeline written in Go using Spark runner

I have a pipeline written in Go that I want to execute with the Spark runner; Spark Standalone is installed on my local machine. Apache Beam 2.56.0, Apache Spark 3.2.2. I started the Spark master and ...
asked by David Undersit
0 votes · 0 answers · 39 views

How to configure GCP Spanner ChangeStream read duration in Java

We are using a GCP Spanner change stream in our Apache Beam Java Dataflow job, and we configured it using the SpannerIO connector. Code is below: static class Read extends PTransform&lt;PBegin, ...
asked by Shetty
0 votes · 0 answers · 21 views

Apache Beam Parallel Shared State

Considering an Apache Beam pipeline with 2 parallel transformations: # Transformation 1 p | read_from_pubsub_subscription_1 | save_current_state | write_to_pubsub # Transformation 2 p | ...
asked by Yak O'Poe
0 votes · 1 answer · 27 views

Apache Beam code not running, giving an error

I am trying to write a custom flex template to extract data from JDBC and write it into a GCS bucket. What am I doing wrong in the code? The code produces an error: Error message from worker: Traceback (...
asked by DKM
0 votes · 0 answers · 15 views

Why is the Beam AfterCount trigger behaving differently? Can anyone explain the output?

I am learning Apache Beam triggers. I have written Apache Beam code which has a 30-second fixed window, an AfterCount trigger of 3, and accumulation_mode set to trigger.AccumulationMode.ACCUMULATING. ...
asked by Amar Jeet
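A frequent source of confusion with `AccumulationMode.ACCUMULATING` is that every firing re-emits everything seen so far in the window, so successive panes overlap rather than containing only new elements. The stdlib-only sketch below is a rough model of repeated count-based firings within a single window plus a final pane at window close; real Beam trigger semantics also depend on how the trigger is composed (e.g. wrapping in `Repeatedly`), so treat this as an illustration, not the actual runner behavior:

```python
def accumulating_panes(elements, after_count=3):
    """Simulate a count-based trigger in ACCUMULATING mode within one window:
    fire a pane every `after_count` elements, re-emitting the whole buffer,
    and fire a final pane at window close for any leftover elements."""
    buffer, panes = [], []
    for elem in elements:
        buffer.append(elem)
        if len(buffer) % after_count == 0:
            panes.append(list(buffer))  # pane repeats all buffered elements
    if len(buffer) % after_count != 0:
        panes.append(list(buffer))  # final pane at window close
    return panes
```

So five elements with a count of 3 yield two panes, `[1, 2, 3]` and `[1, 2, 3, 4, 5]`: the second pane repeats the first three elements, which is exactly the "duplicate" output that accumulating mode produces by design.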
1 vote · 0 answers · 40 views

Error "Unable to parse" in custom Dataflow template

I am trying to create a custom Dataflow template for a JDBC connection; however, when importing the template (Python code converted to JSON), it gives an error/warning in the console. Error/Warning: Fail to ...
asked by DKM
