
I have a list of arguments that can be processed in any order. I have written one function that parses each argument and writes the output, a CSV file, to an S3 bucket.

import concurrent.futures

def func(arg):
    # code to parse the argument
    writeToS3(arg)

if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=6) as executor:
        # Submit the work in slices of argumentsList, then wait on each batch.
        futures1 = [executor.submit(func, arg) for arg in argumentsList[1:250]]
        futures2 = [executor.submit(func, arg) for arg in argumentsList[250:500]]
        futures3 = [executor.submit(func, arg) for arg in argumentsList[500:750]]
        # ...
        concurrent.futures.wait(futures1)
        concurrent.futures.wait(futures2)
        concurrent.futures.wait(futures3)
        # ...
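
writeToS3 is something along these lines (simplified; the boto3 call, bucket name, and key format below are placeholders, not the exact code):

import csv
import io

import boto3

s3 = boto3.client("s3")

def writeToS3(arg):
    # Build the CSV in memory and upload it to the bucket.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([arg])  # the real code writes the parsed rows here
    s3.put_object(
        Bucket="my-output-bucket",          # placeholder bucket name
        Key="results/{}.csv".format(arg),   # placeholder key naming
        Body=buffer.getvalue(),
    )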

I changed the chunk size (i.e., sliced argumentsList into pieces of 100, 200, etc.), but over the course of execution it still slows down.

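In sketch form, the chunking I experimented with amounts to the following, with chunk_size standing in for the sizes I tried (100, 200, 250, ...):

import concurrent.futures

chunk_size = 250  # also tried 100, 200, etc.

with concurrent.futures.ThreadPoolExecutor(max_workers=6) as executor:
    batches = []
    for start in range(0, len(argumentsList), chunk_size):
        chunk = argumentsList[start:start + chunk_size]
        batches.append([executor.submit(func, arg) for arg in chunk])
    # Wait on each batch in turn, mirroring futures1, futures2, futures3 above.
    for batch in batches:
        concurrent.futures.wait(batch)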

No matter the chunk size, it gets slower as the run goes on. Any advice on how to speed up execution?

Appreciate your help.
