68

Given a dataclass like below:

class MessageHeader(BaseModel):
    message_id: uuid.UUID

    def dict(self, **kwargs):
        return json.loads(self.json())

I would like to get a dictionary of string literals when I call dict on MessageHeader. The desired dictionary looks like this:

{'message_id': '383b0bfc-743e-4738-8361-27e6a0753b5a'}

I want to avoid using a third-party library like pydantic, and I do not want to use json.loads(self.json()) because of the extra round trip.

Is there a better way to convert a dataclass to a dictionary with string values like the one above?

3
  • 1
    Is uuid.UUID already a string or some other type?
    – pho
    Commented Jun 13, 2022 at 14:57
  • 1
    When I call dict from pydantic, it returns uuid.UUID as it is. I need the UUID as a string in dictionary
    – Unknown
    Commented Jun 13, 2022 at 15:00
  • If you don't wanna use 3rd party libraries or even builtins, you probably came to Python from Assembler
    Commented Mar 23 at 14:24

4 Answers

104

You can use dataclasses.asdict:

import uuid
from dataclasses import dataclass, asdict

@dataclass
class MessageHeader:
    message_id: uuid.UUID

    def dict(self):
        # asdict() only accepts dataclass instances; str() is applied to every value.
        return {k: str(v) for k, v in asdict(self).items()}

If you're sure that your class only has string values, you can skip the dictionary comprehension entirely:

@dataclass
class MessageHeader:
    message_id: str  # every field is already a string here

    dict = asdict
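
Note that the comprehension above turns every value into a string, including numbers. As a minimal sketch of a narrower variant, assuming you only want UUID values stringified, asdict accepts a dict_factory callable that receives the (name, value) pairs. The helper name stringify_uuids and the extra retries field are made up for illustration:

```python
import uuid
from dataclasses import dataclass, asdict


def stringify_uuids(pairs):
    # dict_factory receives an iterable of (field name, value) pairs.
    return {k: str(v) if isinstance(v, uuid.UUID) else v for k, v in pairs}


@dataclass
class MessageHeader:
    message_id: uuid.UUID
    retries: int = 0  # hypothetical extra field, to show non-UUID values are kept

    def dict(self):
        return asdict(self, dict_factory=stringify_uuids)


header = MessageHeader(uuid.UUID('383b0bfc-743e-4738-8361-27e6a0753b5a'))
result = header.dict()
print(result)
```

Because the factory is also applied to nested dataclasses, UUIDs inside nested dataclass fields are converted as well.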
8
  • 5
    Note that asdict only returns keys that are defined at init
    – crypdick
    Commented Jan 4 at 21:23
  • @crypdick. That's an important note indeed
    Commented Jan 5 at 3:56
  • Getting the following error: TypeError: asdict() should be called on dataclass instances
    – Jeremy
    Commented Apr 1 at 22:45
  • @Jeremy. That's because the object you are calling it on is not a dataclass
    Commented Apr 2 at 2:53
  • if this solution is fragile and you won't even notice the problem if passing the dataclass object as **kwargs why is it getting all the votes and not the other solutions that don't have that fragility? e.g. stackoverflow.com/a/76410147/9201239
    – stason
    Commented Apr 6 at 2:20
17

For absolute pure, unadulterated speed and boundless efficiency, the kinds of which could even cause the likes of Chuck Norris to take pause and helplessly look on in awe, I humbly recommend this remarkably well planned-out approach with __dict__:

def dict(self):
    _dict = self.__dict__.copy()
    _dict['message_id'] = str(_dict['message_id'])
    return _dict

For a class that defines a __slots__ attribute, such as with @dataclass(slots=True), the above approach most likely won't work, as the __dict__ attribute won't be available on class instances. In that case, a highly efficient "shoot for the moon" approach such as below could instead be viable:

def dict(self):
    body_lines = ','.join(f"'{f}':" + (f'str(self.{f})' if f == 'message_id'
                                       else f'self.{f}') for f in self.__slots__)
    # Compute the text of the entire function.
    txt = f'def dict(self):\n return {{{body_lines}}}'
    ns = {}
    exec(txt, locals(), ns)
    _dict_fn = self.__class__.dict = ns['dict']
    return _dict_fn(self)

In case anyone's teetering at the edge of their seats right now (I know, this is really incredible, breakthrough-level stuff) - I've added my personal timings via the timeit module below, which should hopefully shed a little more light on the performance aspect of things.

FYI, in the timings below, the approaches that use __dict__ directly are roughly ten times faster than dataclasses.asdict().

Note: Even though __dict__ works better in this particular case, dataclasses.asdict() will likely be better for composite dictionaries, such as ones with nested dataclasses, or values with mutable types such as dict or list.

from dataclasses import dataclass, asdict, field
from uuid import UUID, uuid4


class DictMixin:
    """Mixin class to add a `dict()` method on classes that define a __slots__ attribute"""

    def dict(self):
        body_lines = ','.join(f"'{f}':" + (f'str(self.{f})' if f == 'message_id'
                                           else f'self.{f}') for f in self.__slots__)
        # Compute the text of the entire function.
        txt = f'def dict(self):\n return {{{body_lines}}}'
        ns = {}
        exec(txt, locals(), ns)
        _dict_fn = self.__class__.dict = ns['dict']
        return _dict_fn(self)


@dataclass
class MessageHeader:
    message_id: UUID = field(default_factory=uuid4)
    string: str = 'a string'
    integer: int = 1000
    floating: float = 1.0

    def dict1(self):
        _dict = self.__dict__.copy()
        _dict['message_id'] = str(_dict['message_id'])
        return _dict

    def dict2(self):
        return {k: str(v) if k == 'message_id' else v
                for k, v in self.__dict__.items()}

    def dict3(self):
        return {k: str(v) if k == 'message_id' else v
                for k, v in asdict(self).items()}


@dataclass(slots=True)
class MessageHeaderWithSlots(DictMixin):
    message_id: UUID = field(default_factory=uuid4)
    string: str = 'a string'
    integer: int = 1000
    floating: float = 1.0

    def dict2(self):
        return {k: str(v) if k == 'message_id' else v
                for k, v in asdict(self).items()}


if __name__ == '__main__':
    from timeit import timeit

    header = MessageHeader()
    header_with_slots = MessageHeaderWithSlots()

    n = 10000
    print('dict1():  ', timeit('header.dict1()', number=n, globals=globals()))
    print('dict2():  ', timeit('header.dict2()', number=n, globals=globals()))
    print('dict3():  ', timeit('header.dict3()', number=n, globals=globals()))

    print('slots -> dict():  ', timeit('header_with_slots.dict()', number=n, globals=globals()))
    print('slots -> dict2(): ', timeit('header_with_slots.dict2()', number=n, globals=globals()))

    print()

    dict__ = header.dict1()
    print(dict__)

    asdict__ = header.dict3()
    print(asdict__)

    assert isinstance(dict__['message_id'], str)
    assert isinstance(dict__['integer'], int)

    assert header.dict1() == header.dict2() == header.dict3()
    assert header_with_slots.dict() == header_with_slots.dict2()

Results on my Mac M1 laptop:

dict1():   0.005992999998852611
dict2():   0.00800508284009993
dict3():   0.07069579092785716
slots -> dict():   0.00583599996753037
slots -> dict2():  0.07395245810039341

{'message_id': 'b4e17ef9-1a58-4007-9cef-39158b094da2', 'string': 'a string', 'integer': 1000, 'floating': 1.0}
{'message_id': 'b4e17ef9-1a58-4007-9cef-39158b094da2', 'string': 'a string', 'integer': 1000, 'floating': 1.0}
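
To illustrate the earlier note about composite data: copying __dict__ is shallow, so nested dataclasses and mutable values are shared with the instance, whereas asdict() recurses and copies. A small sketch, with Sender and Envelope as made-up classes:

```python
from dataclasses import dataclass, asdict, field


@dataclass
class Sender:  # hypothetical nested dataclass
    name: str


@dataclass
class Envelope:  # hypothetical outer dataclass
    sender: Sender
    tags: list = field(default_factory=list)


env = Envelope(Sender('alice'), ['urgent'])

shallow = env.__dict__.copy()  # nested objects are shared, not converted
deep = asdict(env)             # nested dataclass becomes a dict, list is copied

print(type(shallow['sender']).__name__)  # Sender
print(deep['sender'])                    # {'name': 'alice'}

shallow['tags'].append('shared')  # also changes env.tags
deep['tags'].append('private')    # does not touch env.tags
print(env.tags)                   # ['urgent', 'shared']
```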

Note: For a more "complete" implementation of DictMixin (named SerializableMixin), check out a related answer I also added.

8
  • Any idea what asdict is doing to slow it down so much?
    Commented Sep 30, 2022 at 3:06
  • 3
    @KarlKnechtel I'm not entirely sure, but my money's on the copy.deepcopy() call. If you look at the dataclasses source code for asdict, you can see it calls deepcopy on any complex or unknown type, which in this case would likely be the UUID object.
    Commented Sep 30, 2022 at 3:09
  • 2
    This is the correct answer. You may add a note that while it works better in this case, asdict will likely be better for composite dictionaries.
    Commented Sep 30, 2022 at 13:57
  • 1
    @RyanDeschamps done. agreed that was something that should be mentioned at least.
    Commented Sep 30, 2022 at 15:19
  • 1
    This won't work with the slots=True dataclass parameter introduced in python 3.10
    – G. Ghez
    Commented Oct 8, 2022 at 22:04
13

This is a top Google result for "dataclass to dict", and the answers above are overly complicated. You're probably looking for this:

from dataclasses import dataclass

@dataclass
class MessageHeader:
    uuid: str = "abcd"

vars(MessageHeader())  # or MessageHeader().__dict__
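
One thing to keep in mind: vars() returns the instance's own __dict__, not a copy, so changing the result changes the object. A small sketch of the difference, wrapping the result in dict() when an independent snapshot is wanted:

```python
from dataclasses import dataclass


@dataclass
class MessageHeader:
    uuid: str = "abcd"


header = MessageHeader()

live = vars(header)            # the very same dict object as header.__dict__
snapshot = dict(vars(header))  # an independent shallow copy

live['uuid'] = 'changed'       # this also rebinds header.uuid
print(header.uuid)             # changed
print(snapshot)                # {'uuid': 'abcd'}
```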
1
  • 2
    What about slots=True
    Commented Jul 30, 2023 at 2:30
2

Inspired by @rv.kvetch's answer, I wrote this decorator, which generates the code for an asdict method on the fly based on the class definition. It also supports subclassing: a subclass inherits the attributes of its superclasses.

Decorator:

import typing


def generate_dict_method(
        __source: typing.Literal["slots", "annotations"],
        __name: str,
        /,
        **custom_mappings: typing.Callable[[typing.Any], typing.Any]
):
    # **custom_mappings is always a dict (empty when no keyword arguments are passed),
    # so no None check is needed here.

    def decorator(cls):
        attributes = set()
        for mc in cls.__mro__:
            if __source == 'annotations':
                attrs = getattr(mc, "__annotations__", None)
                if attrs:
                    attrs = attrs.keys()
            elif __source == "slots":
                attrs = getattr(mc, "__slots__", None)
            else:
                raise NotImplementedError(__source)
            if attrs:
                attributes.update(attrs)

        if not attributes:
            raise RuntimeError(
                f"Unable to generate `{__name}` method for `{cls.__qualname__}` class: "
                "no attributes found."
            )

        funclocals = {}
        mapping_to_funcname = {}

        for attrname, f in custom_mappings.items():
            funcname = f'__parse_{attrname}'
            funclocals[funcname] = f
            mapping_to_funcname[attrname] = funcname

        body_lines = ','.join([
            f'"{attrname}": ' + (f'self.{attrname}' if attrname not in custom_mappings
                                 else f'{mapping_to_funcname[attrname]}(self.{attrname})')
            for attrname in attributes
        ])
        txt = f'def {__name}(self):\n return {{{body_lines}}}'
        d = dict()
        exec(txt, funclocals, d)
        setattr(cls, __name, d[__name])
        return cls

    return decorator

Usage:


from dataclasses import dataclass
import json


@dataclass(slots=True, kw_only=True)
class TestBase:
    i1: int
    i2: int


@generate_dict_method("annotations", "asdict", d=(lambda x: "FUNNY" + json.dumps(x) + "JSON"))
@dataclass(slots=True, kw_only=True)
class Test(TestBase):
    i: int
    b: bool
    s: str
    d: dict


a = Test(i=1, b=True, s="test", d={"test": "test"}, i1=2, i2=3)
print(a.asdict())

Output:

{'d': 'FUNNY{"test": "test"}JSON', 'i': 1, 'i1': 2, 'b': True, 's': 'test', 'i2': 3}

As you can see, you only need to pass a custom parser as a keyword argument named after the attribute it should transform (these are collected in **custom_mappings). This way you can transform the attribute in any way you see fit.

In your case you can provide the str function for the message_id attribute.
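
For the question's class, that could look roughly like the sketch below. To keep the example self-contained it uses a condensed stand-in for the decorator that handles only the "annotations" source and keeps attributes in definition order; the retries field is made up for illustration:

```python
import uuid
from dataclasses import dataclass


def generate_dict_method(source, name, /, **custom_mappings):
    """Condensed stand-in for the decorator above; handles "annotations" only."""
    if source != "annotations":
        raise NotImplementedError(source)

    def decorator(cls):
        # Walk the MRO base-first so inherited attributes come first;
        # a list keeps the order stable and skips duplicates.
        attributes = []
        for klass in reversed(cls.__mro__):
            for attr in getattr(klass, "__annotations__", {}):
                if attr not in attributes:
                    attributes.append(attr)

        # Converters are exposed to the generated function as globals.
        helpers = {f"_parse_{attr}": fn for attr, fn in custom_mappings.items()}
        body = ", ".join(
            f"{attr!r}: _parse_{attr}(self.{attr})" if attr in custom_mappings
            else f"{attr!r}: self.{attr}"
            for attr in attributes
        )
        namespace = {}
        exec(f"def {name}(self):\n    return {{{body}}}", helpers, namespace)
        setattr(cls, name, namespace[name])
        return cls

    return decorator


@generate_dict_method("annotations", "asdict", message_id=str)
@dataclass
class MessageHeader:
    message_id: uuid.UUID
    retries: int = 0  # hypothetical extra field


header = MessageHeader(uuid.UUID("383b0bfc-743e-4738-8361-27e6a0753b5a"))
result = header.asdict()
print(result)
```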
