
pandas has an apply method that applies a synchronous function to each value in a column, for example:

import numpy as np
import pandas as pd

def fun(x):
    return x * 2

df = pd.DataFrame(np.arange(10), columns=['old'])

df['new'] = df['old'].apply(fun)

What is the fastest way to do something similar when the function to apply is an async function, fun2? My current approach awaits each call in a loop:

import asyncio
import numpy as np
import pandas as pd

async def fun2(x):
    return x * 2

async def main():
    df = pd.DataFrame(np.arange(10), columns=['old'])
    df['new'] = 0
    for i in range(len(df)):
        # .loc avoids chained assignment, which may not update df itself
        df.loc[i, 'new'] = await fun2(df.loc[i, 'old'])
    print(df)

asyncio.run(main())

1 Answer


Use asyncio.gather and overwrite the whole column when complete.

import asyncio

import numpy as np
import pandas as pd


async def fun2(x):
    return x * 2


async def main():
    df = pd.DataFrame(np.arange(10), columns=['old'])
    df['new'] = await asyncio.gather(*(fun2(v) for v in df['old']))
    print(df)


asyncio.run(main())

This passes each value in the column to the async function and schedules all of the calls concurrently. When fun2 actually waits on something (network I/O, for example), this is much faster than awaiting each result sequentially in a loop; for a coroutine that never yields, such as the toy fun2 here, there is little to gain.
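To make the difference visible, here is a minimal timing sketch. It assumes a hypothetical slow_double coroutine that uses asyncio.sleep as a stand-in for real I/O, with an assumed 0.05 s delay per call; neither is part of the original question, and actual timings will vary by machine.

```python
import asyncio
import time

import numpy as np
import pandas as pd

DELAY = 0.05  # assumed per-call latency, standing in for real I/O


async def slow_double(x):
    # Hypothetical stand-in for fun2 that actually waits on something
    await asyncio.sleep(DELAY)
    return x * 2


async def sequential(values):
    # Await each call before starting the next one
    return [await slow_double(v) for v in values]


async def concurrent(values):
    # Schedule all calls, then wait for them together
    return await asyncio.gather(*(slow_double(v) for v in values))


async def compare():
    df = pd.DataFrame(np.arange(10), columns=['old'])

    start = time.perf_counter()
    seq = await sequential(df['old'])
    seq_time = time.perf_counter() - start

    start = time.perf_counter()
    con = await concurrent(df['old'])
    con_time = time.perf_counter() - start

    return seq, con, seq_time, con_time


seq, con, seq_time, con_time = asyncio.run(compare())
print(f"sequential: {seq_time:.2f}s, gather: {con_time:.2f}s")
```

With ten rows, the sequential version should take roughly ten times the delay, while the gather version should take roughly one delay, since all the sleeps overlap.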

Note: asyncio.gather returns results in the same order as the awaitables passed to it, so row order is preserved. The column is assigned only after all awaitables have completed successfully.

Resulting output DataFrame:

   old  new
0    0    0
1    1    2
2    2    4
3    3    6
4    4    8
5    5   10
6    6   12
7    7   14
8    8   16
9    9   18
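The note above says the column is assigned only once every awaitable has succeeded; by default, a single exception propagates out of asyncio.gather and nothing is assigned. If some calls may fail, one option is the return_exceptions=True parameter of asyncio.gather, which returns exceptions in place of results at the matching positions. The sketch below assumes a hypothetical flaky_double coroutine that fails for one input, and it chooses to record failures as NaN; both are illustrative choices, not part of the original answer.

```python
import asyncio

import numpy as np
import pandas as pd


async def flaky_double(x):
    # Hypothetical variant of fun2 that fails for one input
    if x == 3:
        raise ValueError(f"cannot process {x}")
    return x * 2


async def main():
    df = pd.DataFrame(np.arange(5), columns=['old'])
    # Exceptions are returned in place of results, in the original order
    results = await asyncio.gather(
        *(flaky_double(v) for v in df['old']),
        return_exceptions=True,
    )
    # Record failures as NaN so the column stays numeric (becomes float)
    df['new'] = [np.nan if isinstance(r, Exception) else r for r in results]
    return df


df = asyncio.run(main())
print(df)
```

Because the results keep their positions, the failed row lines up with its input value, and the remaining rows still get their results.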
  • How do we make the output into a pandas DataFrame?
    – snow
    Commented Jan 21, 2022 at 6:54
  • @snow The output is a list of the results of fun2() over the df['old'] Series. @Henry's answer just assigns that list to a new column (Series) in the original df.
    Commented Jan 27, 2022 at 18:36
  • One of the most ingenious ways to mix pandas with asyncio I've seen. Actually had to hunt for it in my browser's history. LOL
    Commented Jan 27, 2022 at 18:37
  • Will the correct order of the elements always be preserved?
    Commented Feb 23, 2022 at 14:25
  • @BulatIbragimov Yes. From the linked gather docs: "The order of result values corresponds to the order of awaitables in aws."
    – Henry Ecker
    Commented Feb 23, 2022 at 15:59
