27

I have a dictionary with the following structure:

D = {
   'rows': 11,
   'cols': 13,
   (i, j): {
              'meta': 'random string',
              'walls': {
                  'E': True,
                  'O': False,
                  'N': True,
                  'S': True
              }
           }
}
# i ranging in {0 .. D['rows']-1}
# j ranging in {0 .. D['cols']-1}

I want to write a function that takes an arbitrary object as argument and checks if it has that structure. This is what I wrote:

def well_formed(L):
    if type(L) != dict:
        return False
    if 'rows' not in L:
        return False
    if 'cols' not in L:
        return False

    nr, nc = L['rows'], L['cols']

    # I should also check the int-ness of nr and nc ...

    if len(L) != nr*nc + 2:
        return False

    for i in range(nr):
        for j in range(nc):
            if not ((i, j) in L
                and 'meta' in L[i, j]
                and  'walls' in L[i, j]
                and type(L[i, j]['meta']) == str
                and type(L[i, j]['walls'])  == dict
                and     'E' in L[i, j]['walls']
                and     'N' in L[i, j]['walls']
                and     'O' in L[i, j]['walls']
                and     'S' in L[i, j]['walls']
                and type(L[i, j]['walls']['E']) == bool
                and type(L[i, j]['walls']['N']) == bool
                and type(L[i, j]['walls']['O']) == bool
                and type(L[i, j]['walls']['S']) == bool):
                return False

    return True

Although it works, I don't like it at all. Is there a Pythonic way to do it?

I'm allowed to use just the standard library.

12
  • 7
    I'd look at writing a JSON Schema for your dictionary and then using the Python module to validate it.
    – tzaman
    Commented May 6, 2017 at 13:52
  • 1
    I think raising exceptions would be a better idea that only return true or false, since it have many test done here. It may be easier to know why the dict doesn't pass the tests
    – Loïc G.
    Commented May 6, 2017 at 13:53
  • @tzaman converting that dictionary to JSON would be tricky, since one of the keys is a tuple, and JSON only natively supports strings as object keys. You'd have to encode the tuple as a string, which can potentially create ambiguity (e.g. if the object being checked has string keys in the same format you're using for the tuples), I'd consider that invalid, but a JSON validator wouldn't be able to tell it apart from a valid tuple that was encoded as a string.
    – tavnab
    Commented May 6, 2017 at 13:58
  • see related question. stackoverflow.com/q/6011881/7724457
    – tell k
    Commented May 6, 2017 at 14:03
  • At the very least, use is instead of == when comparing types.
    – chepner
    Commented May 6, 2017 at 14:15

6 Answers 6

19

Firstly, I think the more 'Pythonic' thing might be to ask forgiveness rather than permission - to check when you needed a property whether the data structure had that property.

But on the other hand, that doesn't help if you've been asked to create something to check it's well-formed. :)

So if you need to check, you can use something like the schema library to define what your data structure should look like, and then check other data structures against that schema.

8

In Python, the exact identity of the types involved is less important than how the values behave. For a defined use of such an object, would the object suffice? This means L doesn't have to be a dict, it just has support __getitem__; L[(i,j)]['meta'] doesn't have to be a str, it just has to support conversion to a string via str(L[(i,j)]['meta']); etc.

Given that relaxation, I would simply try to catch any errors raised by attempting such actions, and return False if any occur. For example,

def well_formed(L):
    try:
        nr = L['rows']
        nc = L['cols']
    except KeyError:
        return False

    try:
        for i in range(nr):
            for j in range(nc):
                str(L[(i,j)]['meta'])
                walls = L[(i,j)]['walls']
                for direction in walls:
                    # Necessary?
                    if direction not in "ENOS":
                        return False
                    if walls[direction] not in (True, False):
                        return False
    except KeyError:
        return False

    return True

Given that any object has a boolean value, it seemed pointless to attempt bool(walls[direction]); rather, if having exactly True or False as a value isn't a hard requirement, I would just any test of the value altogether. Likewise, additional walls may or may not be an issue, and don't have to be explicitly tested for.

3
  • Why two try blocks instead of trying everything?
    – Mephy
    Commented May 6, 2017 at 20:17
  • To be honest, I'm not sure. I was trying a lot of different approaches, and I think at one point it made sense for the first two checks to be in their own try statement. With the version seen here, not so much.
    – chepner
    Commented May 6, 2017 at 21:24
  • 2
    I don't agree with your point about str(). Firstly everything supports str() just like everything supports bool(). Secondly if the value isn't a string it indicates that (1) the code that produced the dict might have an error and (2) the code that uses the dict may need to know how to handle both string and nonstring values.
    – Alex Hall
    Commented May 6, 2017 at 23:03
4

You can compose validation like this (idea from Scala extractors). The advantage is that the validator structure is similar to the structure to test.

A disadvantage is that the many function calls may make it much slower.

class Mapping:
    def __init__(self, **kwargs):
        self.key_values = [KeyValue(k, v) for k, v in kwargs.items()]

    def validate(self, to_validate):
        if not isinstance(to_validate, dict):
            return False

        for validator in self.key_values:
            if not validator.validate(to_validate):
                return False
        return True


class KeyValue:
    def __init__(self, key, value):
        self.key = key
        self.value = value

    def validate(self, to_validate):
        return self.key in to_validate and self.value.validate(to_validate[self.key])


class Boolean:
    def validate(self, to_validate):
        return isinstance(to_validate, bool)


class Integer:
    def validate(self, to_validate):
        return isinstance(to_validate, int)


class String:
    def validate(self, to_validate):
        return isinstance(to_validate, str)


class CustomValidator:
    def validate(self, to_validate):
        if not Mapping(rows=Integer(), cols=Integer()).validate(to_validate):
            return False
        element_validator = Mapping(meta=String(), walls=Mapping(**{k: Boolean() for k in "EONS"}))
        for i in range(to_validate['rows']):
            for j in range(to_validate['cols']):
                if not KeyValue((i, j), element_validator).validate(to_validate):
                    return False
        return True


d = {
    'rows': 11,
    'cols': 13,
}
d.update({(i, j): {
    'meta': 'random string',
    'walls': {
        'E': True,
        'O': False,
        'N': True,
        'S': True
    }
} for i in range(11) for j in range(13)})

assert CustomValidator().validate(d)

Same thing with overriding isinstance(tested with Python 3.5)

class IsInstanceCustomMeta(type):
    def __instancecheck__(self, instance):
        return self.validate(instance)

def create_custom_isinstance_class(f):
    class IsInstanceCustomClass(metaclass=IsInstanceCustomMeta):
        validate = f
    return IsInstanceCustomClass

def Mapping(**kwargs):
    key_values = [KeyValue(k, v) for k, v in kwargs.items()]

    def validate(to_validate):
        if not isinstance(to_validate, dict):
            return False

        for validator in key_values:
            if not isinstance(to_validate, validator):
                return False
        return True

    return create_custom_isinstance_class(validate)

def KeyValue(key, value):
    return create_custom_isinstance_class(lambda to_validate: key in to_validate and isinstance(to_validate[key], value))

def my_format_validation(to_validate):
    if not isinstance(to_validate, Mapping(rows=int, cols=int)):
        return False
    element_validator = Mapping(meta=str, walls=Mapping(**{k: bool for k in "EONS"}))
    for i in range(to_validate['rows']):
        for j in range(to_validate['cols']):
            if not isinstance(to_validate, KeyValue((i, j), element_validator)):
                return False
    return True

MyFormat = create_custom_isinstance_class(my_format_validation)

d = {
    'rows': 11,
    'cols': 13,
}
d.update({(i, j): {
    'meta': 'random string',
    'walls': {
        'E': True,
        'O': False,
        'N': True,
        'S': True
    }
} for i in range(11) for j in range(13)})

assert isinstance(d, MyFormat)
3

If your format were simpler, I'd agree with the other answer/comments to use existing schema validation libraries, like schema and voluptuous. But, given your specific case of having to check a dictionary with tuple keys, and those tuples' values depending on the values of other members of your dict, I think you'd be better off writing your own validator than trying to coax a schema to fit your format.

1
from itertools import product

def isvalid(d):
    try:
        for key in product(range(d['rows']), range(d['cols'])):
            sub = d[key]
            assert (isinstance(sub['meta'], str) and
                    all(isinstance(sub['walls'][c], bool)
                        for c in 'EONS'))
    except (KeyError, TypeError, AssertionError):
        return False
    return True

If Python 2 compatibility is important or it's necessary to assert that there are no additional keys, let me know.

1

If you use something like this:

def get_deep_keys(d, depth=2):
    """Gets a representation of all dictionary keys to a set depth.

    If a (sub)dictionary contains all non-dictionary values, a list of keys
    will be returned.

    If a dictionary contains a mix of types, a dictionary of dicts/lists/types
    will be returned.
    """
    if isinstance(d, dict):
        if depth > 0 and any(isinstance(v, dict) for v in d.values()):
            return {k: get_deep_keys(v, depth=depth - 1) for k, v in d.items()}
        else:
            return set(d.keys())

    else:
        return type(d)

Then you can do:

assert get_deep_keys(D[i, j]) == {
    'meta': str, 'walls': {'E', 'N', 'O', 'S'}}

Inside a loop over i, j. This is pretty easy to modify to return the types of the bottom level elements instead:

def get_deep_keys(d, depth=2):
    """Gets a representation of all dictionary keys to a set depth, with types.
    """
    if isinstance(d, dict) and depth > 0:
        return {k: get_deep_keys(v, depth=depth - 1) for k, v in d.items()}

    return type(d)


get_deep_keys(D)
 # {'meta': str, 'walls': {'E': bool, 'O': bool, 'N': bool, 'S': bool}}

Not the answer you're looking for? Browse other questions tagged or ask your own question.