Google Cloud Storage + Function: System architecture at scale

Question

Background

I have a processing chain with 3 steps. I am going to design my application for very high throughput.

Getting into details

The system solves incoming tasks. Each step of the processing chain (A, B and C) has an input and an output:

A's input is a task to be solved. A's output is a list of sub-tasks to be solved. A produces multiple outputs for a single input (all related to the same task).

B's input is a sub-task to be solved. B's output is a single task targeted at C.

C's input is a list of messages, aggregated by the "parent task". Once all the items for a specific task are completely solved, C marks the task as completed.

One possible architecture, using Google Cloud, is to write a Google Cloud Storage object into a bucket for every new incoming task, and to enable a Cloud Function trigger for each newly created storage object. This function will do the work of A (from the processing chain). Its output will be written into a different bucket, which will trigger another function (B). B's output will be written into a third bucket for processing by C.

Note: When a function processes a task, it also deletes it at the end.

Let's assume that a specific task created 10 items to process in Function B. So, in bucket C you will find, at the end, 10 different objects. Function C's mission is to detect the exact time when ALL the items (A's output) for a specific task have been completely executed. Once all the items have been executed, C has to mark the task as completed.
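To make the flow concrete, here is a minimal in-memory sketch of the three-bucket chain described above. It does not use the real Cloud Storage or Cloud Functions APIs: the Bucket class, the bucket names and the payload format (a task whose content is the number of sub-tasks) are all assumptions made for illustration, and the triggers fire synchronously here, whereas real storage triggers run asynchronously and concurrently.

```python
from typing import Callable, Dict, Optional


class Bucket:
    """In-memory stand-in for a storage bucket with an 'object created' trigger."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: Dict[str, str] = {}
        self.on_create: Optional[Callable[["Bucket", str], None]] = None

    def write(self, key: str, data: str) -> None:
        self.objects[key] = data
        if self.on_create is not None:
            self.on_create(self, key)  # simulate the creation notification

    def delete(self, key: str) -> None:
        del self.objects[key]


# Hypothetical bucket names, one per stage of the chain.
bucket_a = Bucket("tasks-in")
bucket_b = Bucket("sub-tasks")
bucket_c = Bucket("results")


def function_a(bucket: Bucket, key: str) -> None:
    """Step A: split one task into sub-tasks, then delete the input object."""
    count = int(bucket.objects[key])  # assumed payload: number of sub-tasks
    for i in range(count):
        bucket_b.write(f"{key}/{i}", f"item {i}")
    bucket.delete(key)


def function_b(bucket: Bucket, key: str) -> None:
    """Step B: solve one sub-task, write the result for C, delete the input."""
    result = bucket.objects[key].upper()  # placeholder for the real work
    bucket_c.write(key, result)
    bucket.delete(key)


bucket_a.on_create = function_a
bucket_b.on_create = function_b

bucket_a.write("task-1", "10")
print(len(bucket_c.objects))  # 10
```

After the run, the first two buckets are empty and the third holds ten objects named task-1/0 to task-1/9. Nothing in those objects tells C that ten is the full set, which is the gap the rest of the question is about.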

The Problem

It sounds like we have to count how many outputs A had, and compare that to how many inputs C had.

Is it possible to change the system design to avoid the need for "counting messages"?
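For reference, the counting approach that the question wants to avoid might look roughly like the sketch below. The CompletionCounter class and its method names are hypothetical, and the counters live in process memory; a real deployment would need a shared store with atomic increments, which is not shown here.

```python
from collections import defaultdict
from typing import Dict


class CompletionCounter:
    """Compares the number of outputs A produced with the inputs C has seen."""

    def __init__(self) -> None:
        self.expected: Dict[str, int] = {}
        self.received: Dict[str, int] = defaultdict(int)

    def register(self, task_id: str, sub_task_count: int) -> None:
        """Called by A: record how many outputs it produced for this task."""
        self.expected[task_id] = sub_task_count

    def record_result(self, task_id: str) -> bool:
        """Called by C for each input; True when the count reaches the target."""
        self.received[task_id] += 1
        return self.received[task_id] == self.expected.get(task_id)


counter = CompletionCounter()
counter.register("task-1", 3)
results = [counter.record_result("task-1") for _ in range(3)]
print(results)  # [False, False, True]
```

A plain counter like this is fragile: if the same message is delivered twice, the count overshoots and the task is never reported as complete.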

guillaume blaquiere · Accepted Answer · 2021-03-02 14:30:15Z

I recommend you have a look at the Cloud Workflows product. In your design:

Keep function A, but update it to return a JSON list of the B tasks to run.
Then iterate over the JSON array returned by A and use a parallel executor to run the B tasks in parallel. I wrote an article on this.
When all the B tasks are finished, the C function can be called by Workflows.
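The shape of the three steps above is a fan-out followed by a fan-in. The sketch below is a local Python analogy of that shape, not Cloud Workflows syntax: a thread pool stands in for the parallel executor, and the three functions, their payloads and the comma-separated task format are placeholders assumed for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def function_a(task: str) -> List[str]:
    """Step A: return the list of B sub-tasks (assumed: split on commas)."""
    return [part.strip() for part in task.split(",")]


def function_b(sub_task: str) -> str:
    """Step B: solve a single sub-task (placeholder work)."""
    return sub_task.upper()


def function_c(task: str, results: List[str]) -> dict:
    """Step C: runs once, only after every B result is available."""
    return {"task": task, "status": "completed", "results": results}


def run_workflow(task: str) -> dict:
    """Orchestrator: fan out to B in parallel, then fan in to C."""
    sub_tasks = function_a(task)
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Consuming map() blocks until every B call has returned,
        # and results come back in the order of the inputs.
        results = list(pool.map(function_b, sub_tasks))
    return function_c(task, results)


outcome = run_workflow("resize, crop, upload")
print(outcome["status"], outcome["results"])
```

Because the orchestrator itself launches the B calls, it knows when the last one has returned, so C never has to count messages.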

al-dann · Accepted Answer · 2021-03-02 09:29:56Z

I am not sure I understand all the context, requirements and scope restrictions/complications, but I would suggest going through a couple of Stack Overflow questions and a Medium article to start with.

How to combine multiple files in GCS bucket with Cloud Function trigger

How to concatenate sharded files on Google Cloud Storage automatically using Cloud Functions

Google Cloud Platform solution for serverless log ingestion (files downloading) from a SFTP server

I think those questions (and discussions) won't provide a complete answer to your question, but they should expose some ideas on how a "state machine" can be supported in the idempotent world of cloud functions.
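As one illustration of the "state machine" idea, the sketch below tracks the set of pending sub-task IDs per parent task instead of a bare count, so handling the same event twice does not change the outcome. The TaskState class and its methods are hypothetical and keep state in memory; in a real system the state would have to live in a shared, transactional store, which this sketch does not cover.

```python
from typing import Dict, Iterable, Set


class TaskState:
    """Tracks which sub-task IDs are still pending for each parent task."""

    def __init__(self) -> None:
        self.pending: Dict[str, Set[str]] = {}
        self.completed: Set[str] = set()

    def start(self, task_id: str, sub_task_ids: Iterable[str]) -> None:
        """Register the sub-tasks of a task; a repeated call is ignored."""
        if task_id not in self.completed:
            # setdefault keeps the first registration if A runs again.
            self.pending.setdefault(task_id, set(sub_task_ids))

    def finish(self, task_id: str, sub_task_id: str) -> bool:
        """Record one finished sub-task; True means the parent is complete."""
        if task_id in self.completed:
            return True
        remaining = self.pending.get(task_id)
        if remaining is None:
            return False  # nothing registered for this task yet
        remaining.discard(sub_task_id)  # discarding twice is a no-op
        if not remaining:
            self.completed.add(task_id)
            del self.pending[task_id]
            return True
        return False


state = TaskState()
state.start("task-1", ["a", "b"])
events = [
    state.finish("task-1", "a"),
    state.finish("task-1", "a"),  # duplicate delivery, no effect
    state.finish("task-1", "b"),
]
print(events)  # [False, False, True]
```

This still records progress per task, but a retried or duplicated invocation can no longer push the state past the completion point.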
