Create my own method for DataFrames (python)

Question

I want to create a module for my own projects and call my functions as DataFrame methods. For example, I want to write:

import numpy as np
import pandas as pd
from mymodule import *
df = pd.DataFrame(np.random.randn(4,4))
df.mymethod()

The problem is that I can't call .mymethod(), since I believe methods are only available on classes I've created myself. A workaround is to make mymethod a plain function that takes a pandas.DataFrame as an argument:

mymethod(df)

I'd rather not do this. Is there any way to implement the first form?

Why don't you want to make it a function? Otherwise you'll have to subclass or patch the data frame. — jonrsharpe, Commented Apr 19, 2017 at 19:04
Depending on what the function does, you may be able to use apply, for example df.apply(myfunc). I realize this doesn't create a new method, but perhaps it gets you what you need. At the very least, you can chain calls this way: df.apply(myfunc).apply(myotherfunc)... — johnchase, Commented Apr 19, 2017 at 19:10
What about just using the apply method? How complex is your method? — boot-scootin, Commented Apr 19, 2017 at 19:10
As noted in an answer below, the pandas documentation provides a "way to extend pandas objects without subclassing them" using the decorator pandas.api.extensions.register_dataframe_accessor(). There is a long list of extensions in the pandas ecosystem page. — Paul Rougieux, Commented Nov 16, 2021 at 14:07

Ivan Mishalkin · Accepted Answer · 2018-12-05 10:24:45Z

38

A nice solution can be found in the ffn package. Here is what its authors do:

from pandas.core.base import PandasObject
def your_fun(df):
    ...
PandasObject.your_fun = your_fun

After that, your function your_fun becomes a method of pandas.DataFrame objects, and you can write:

df.your_fun()

This method works on both DataFrame and Series objects, because both inherit from PandasObject.
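A minimal sketch of this patching approach, with a hypothetical helper null_share (not part of pandas or ffn) standing in for your_fun. As an assumption made for illustration, it assigns the function to pd.DataFrame directly rather than to PandasObject, so the method does not also appear on Series:

```python
import numpy as np
import pandas as pd


def null_share(df):
    """Return the fraction of missing values per column (hypothetical helper)."""
    return df.isna().mean()


# Attach the function to the DataFrame class only; Series is left untouched.
pd.DataFrame.null_share = null_share

df = pd.DataFrame({"a": [1.0, np.nan], "b": [1.0, 2.0]})
shares = df.null_share()  # equivalent to null_share(df)
```

Because the function is an ordinary attribute on the class, null_share(df) keeps working as a plain function call as well.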

1

Does this technique or way of coding have a name? I am trying to understand how and why it works, and I'm not sure I grasp it.
– monkey intern
Commented Aug 1, 2019 at 11:00
3

@monkeyintern It is called "monkey-patching" in the outdated docs at pandas.pydata.org/pandas-docs/version/0.15/… . I also found a general, not pandas-specific, way to add methods here: medium.com/@mgarod/…
– Ivan Mishalkin
Commented Aug 1, 2019 at 11:47
1

After experimenting, this seems to add the method to all pandas objects, including Series (columns). That may not be what you want, because "self" (here "df") can then be a Series rather than a DataFrame, and you would have to stop users from calling the method where it doesn't belong. The pandas API now lets you extend objects in other ways: pandas.pydata.org/docs/development/extending.html. Take a look at pandichef's answer.
– Carl F. Corneil
Commented Feb 5, 2022 at 19:16
Note that this can also be done with an anonymous function (e.g. pd.Series.vc = lambda x: x.value_counts(dropna=False)).
– Raisin
Commented Jun 24, 2022 at 16:02

Stephen Rauch · Accepted Answer · 2017-04-19 20:21:59Z

If you really need to add a method to a pandas.DataFrame you can inherit from it. Something like:

mymodule:

import pandas as pd

class MyDataFrame(pd.DataFrame):
    def mymethod(self):
        """Do my stuff"""

Use mymodule:

import numpy as np
from mymodule import *
df = MyDataFrame(np.random.randn(4,4))
df.mymethod()

To preserve your custom dataframe class:

Pandas routinely returns new dataframes from operations on dataframes. To keep your subclass, you need pandas to return your class whenever it operates on an instance of it. You can do that by providing a _constructor property:

class MyDataFrame(pd.DataFrame):

    @property
    def _constructor(self):
        return MyDataFrame

    def mymethod(self):
        """Do my stuff"""

Test Code:

class MyDataFrame(pd.DataFrame):

    @property
    def _constructor(self):
        return MyDataFrame

df = MyDataFrame([1])
print(type(df))
df = df.rename(columns={})
print(type(df))

Test Results:

<class '__main__.MyDataFrame'>
<class '__main__.MyDataFrame'>
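As a rough sketch of how the two pieces fit together, the subclass below pairs the _constructor property with a method that does some actual work. The method column_ranges and the sample data are hypothetical, chosen only for illustration:

```python
import pandas as pd


class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # Tell pandas to build results of operations with this subclass.
        return MyDataFrame

    def column_ranges(self):
        """Hypothetical method: max minus min for each column."""
        return self.max() - self.min()


df = MyDataFrame({"a": [1, 4], "b": [10, 30]})
renamed = df.rename(columns={"a": "x"})  # result keeps the subclass type
ranges = renamed.column_ranges()         # so the custom method is still there
```

Without the _constructor property, the call on renamed would be expected to fail, since rename would hand back a plain DataFrame.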

Plus one for effort. But won't this be difficult, because pandas will just return a plain DataFrame in most cases? You would have to do some additional trickery to override every pd.DataFrame method that returns a pd.DataFrame. Otherwise, the method can be used only once and you are back to a pd.DataFrame... most likely. — piRSquared, Commented Apr 19, 2017 at 19:21
@piRSquared, you are correct as usual. But there appears to be an easy workaround. — Stephen Rauch, Commented Apr 19, 2017 at 19:43

pandichef · Accepted Answer · 2020-06-06 00:23:08Z

12

This topic is well documented as of Nov 2019: Extending pandas

Note that the most obvious technique, monkey patching as in Ivan Mishalkin's answer, was removed from the official documentation at some point, probably for good reason.

Monkey patching works fine for small projects, but it has a serious drawback in a large-scale project: IDEs like PyCharm can't introspect the patched-in methods. If you right-click and choose "Go to declaration", PyCharm simply says "cannot find declaration to go to". It gets old fast if you're an IDE junkie.

I confirmed that PyCharm CAN introspect both the "custom accessors" and "subclassing" methods discussed in the official documentation.
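For reference, a minimal sketch of the "custom accessors" route using pandas.api.extensions.register_dataframe_accessor. The accessor name "mytools" and the method column_ranges are hypothetical and chosen only for illustration:

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("mytools")
class MyToolsAccessor:
    """Hypothetical accessor that groups custom methods under df.mytools."""

    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is used on.
        self._obj = pandas_obj

    def column_ranges(self):
        """Max minus min for each column."""
        return self._obj.max() - self._obj.min()


df = pd.DataFrame({"a": [1, 4], "b": [10, 30]})
ranges = df.mytools.column_ranges()
```

The custom methods live in their own namespace (df.mytools) instead of directly on the DataFrame, which avoids name clashes with existing pandas attributes and only affects DataFrame, not Series.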

edited Jun 6, 2020 at 0:23

answered Nov 5, 2019 at 6:53

2

This is now the best answer!
– n8yoder
Commented Sep 8, 2023 at 15:58

Amir Py · Accepted Answer · 2023-02-21 18:24:05Z

I have used Ivan Mishalkin's handy solution extensively in our in-house Python library. At some point I decided it would be better to wrap it in a decorator. The only restriction is that the first argument of the decorated function must be a DataFrame:

from copy import deepcopy
from functools import wraps
import pandas as pd
from pandas.core.base import PandasObject

def as_method(func):
    """
    This decorator makes a function also available as a method.
    The first passed argument must be a DataFrame.
    """

    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*deepcopy(args), **deepcopy(kwargs))

    setattr(PandasObject, wrapper.__name__, wrapper)

    return wrapper


@as_method
def augment_x(DF, x):
    """We will be able to see this docstring if we run ??augment_x"""
    DF[f"column_{x}"] = x

    return DF

Example:

df = pd.DataFrame({"A": [1, 2]})
df
   A
0  1
1  2

df.augment_x(10)
   A  column_10
0  1         10
1  2         10

The original DataFrame is not changed, because the wrapper deep-copies its arguments before calling the function. It behaves as if there were an inplace=False option.

You can still use augment_x as a plain function:

augment_x(df, 2)
   A  column_2
0  1         2
1  2         2
