I have a list of arguments that can be processed in any order. I have written a function that parses each argument and writes the resulting CSV file to an S3 bucket.
import concurrent.futures

def func(arg):
    # code to parse the argument, then upload the result
    writeToS3(arg)

if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=6) as executor:
        futures1 = [executor.submit(func, arg) for arg in argumentsList[0:250]]
        futures2 = [executor.submit(func, arg) for arg in argumentsList[250:500]]
        futures3 = [executor.submit(func, arg) for arg in argumentsList[500:750]]
        # ...
        concurrent.futures.wait(futures1)
        concurrent.futures.wait(futures2)
        concurrent.futures.wait(futures3)
        # ...
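For context, here is a minimal sketch of what my parse-and-upload step looks like. The CSV layout, bucket name, and the boto3 client usage here are simplified placeholders, not the exact production code:

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize a list of row tuples to CSV text in memory."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows(rows)
    return buf.getvalue()

def writeToS3(arg, rows, bucket="my-bucket"):  # bucket name is a placeholder
    """Upload the parsed rows for one argument as a CSV object."""
    import boto3  # assumed dependency
    s3 = boto3.client("s3")  # each thread creates its own client
    s3.put_object(Bucket=bucket, Key=f"{arg}.csv", Body=rows_to_csv(rows))
```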
I experimented with the chunk size, slicing argumentsList into chunks of 100, 200, etc., but execution still slows down over time.
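The pattern I'm using can be stated generically as follows; this is just a restatement of the manual slicing above, with the chunk size pulled out as a parameter so it is clear what I varied:

```python
import concurrent.futures

def run_in_chunks(func, items, chunk_size=250, max_workers=6):
    """Submit one chunk of work at a time and wait for it to finish
    before submitting the next chunk."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        for start in range(0, len(items), chunk_size):
            futures = [executor.submit(func, arg)
                       for arg in items[start:start + chunk_size]]
            concurrent.futures.wait(futures)
```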
Regardless of the chunk size, execution gradually slows down while running. Any advice on how to speed it up?
Thanks in advance.