11

There is already a multi key dict in python and also a multivalued dict. I needed a python dictionary which is both:

example:

# probabilistically fetch any one of baloon, toy or car
d['red','blue','green']== "baloon" or "car" or "toy"  

Probability of d['red']==d['green'] is high and Probability of d['red']!=d['red'] is low but possible

the single output value should be probabilistically determined (fuzzy) based on a rule from keys eg:in above case rule could be if keys have both "red" and "blue" then return "baloon" 80% of time if only blue then return "toy" 15% of time else "car" 5% of time.

The setitem method should be designed such that following is possible:

d["red", "blue"] =[
    ("baloon",haseither('red','green'),0.8),
    ("toy",.....)
    ,....
]

Above assigns multiple values to the dictionary with a predicate function and corresponding probability. And instead of the assignment list above even a dictionary as assignment would be preferable:

d["red", "blue"] ={ 
    "baloon": haseither('red','green',0.8),
    "toy": hasonly("blue",0.15),
    "car": default(0.05)
}

In the above baloon will be returned 80% of time if "red" or green is present , return toy 15% of time if blue present and return car 5% of time without any condition.

Are there any existing data structures which already satisfy the above requirements in python? if no then how can multikeydict code be modified to meet the above requirements in python?

if using dictionary then there can be a configuration file or use of appropriate nested decorators which configures the above probabilistic predicate logics without having to hard code if \else statements .

Note: Above is a useful automata for a rule based auto responder application hence do let me know if any similar rule based framework is available in python even if it does not use the dictionary structure?

18
  • From what I understand, the multi-key part is like a "synonym" key: different ways to refer to the same thing, like 1000, k, kilo in the multi-key-dict readme. My question is: when you say multi-value, Are you saying synonym values? or many different values?
    – chapelo
    Commented May 1, 2015 at 19:56
  • What do you mean by d['red']==d['green'] ? Is that because you've looked up a multi-key including both of those before? I understood your question to mean that d['red'] != d['red'] necessarily... Commented May 2, 2015 at 2:30
  • @chapelo ideally it should have provision for both synonym values as well as different values , to be configured by a "setting" in the dictionary.
    – stackit
    Commented May 2, 2015 at 6:50
  • 1
    WoJ has the right idea, there's no need for this to be dict-like - it's a function. Making it dict-like is obscurification. :( Commented May 2, 2015 at 7:03
  • 1
    What about setting an item? Say you have d['red','blue','green']= "baloon" or "car" or "toy" with the probabilities set for those things. What happens with d['red']='ball'?
    – dawg
    Commented May 2, 2015 at 15:38

4 Answers 4

4
+100

Simulated MultiKey Dictionary

multi_key_dict did not allow __getitem__() with multiple keys at onces...

(e.g. d["red", "green"])

A multi key can be simulated with tuple or set keys. If order does not matter, set seems the best (actually the hashable frozen set, so that ["red", "blue"] is the same a ["blue", "red"].

Simulated MultiVal Dictionary

Multi values are inherent by using certain datatypes, it can be any storage element that may be conveniently indexed. A standard dict should provide that.

Non-determinism

Using a probability distribution defined by the rules and assumptions1, non-deterministic selection is performed using this recipe from the python docs.

MultiKeyMultiValNonDeterministicDict Class

What a name.   \o/-nice!

This class takes multiple keys that define a probabilistic rule set of multiple values. During item creation (__setitem__()) all value probabilities are precomputed for all combinations of keys1. During item access (__getitem__()) the precomputed probability distribution is selected and the result is evaluated based on a random weighted selection.

Definition

import random
import operator
import bisect
import itertools

# or use itertools.accumulate in python 3
def accumulate(iterable, func=operator.add):
    'Return running totals'
    # accumulate([1,2,3,4,5]) --> 1 3 6 10 15
    # accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        return
    yield total
    for element in it:
        total = func(total, element)
        yield total

class MultiKeyMultiValNonDeterministicDict(dict):

    def key_combinations(self, keys):
        """get all combinations of keys"""
        return [frozenset(subset) for L in range(0, len(keys)+1) for subset in itertools.combinations(keys, L)]

    def multi_val_rule_prob(self, rules, rule):
        """
        assign probabilities for each value, 
        spreading undefined result probabilities
        uniformly over the leftover results not defined by rule.
        """
        all_results = set([result for result_probs in rules.values() for result in result_probs])
        prob = rules[rule]
        leftover_prob = 1.0 - sum([x for x in prob.values()])
        leftover_results = len(all_results) - len(prob)
        for result in all_results:
            if result not in prob:
                # spread undefined prob uniformly over leftover results
                prob[result] = leftover_prob/leftover_results
        return prob

    def multi_key_rule_prob(self, key, val):
        """
        assign probability distributions for every combination of keys,
        using the default for combinations not defined in rule set
        """ 
        combo_probs = {}
        for combo in self.key_combinations(key):
            if combo in val:
                result_probs = self.multi_val_rule_prob(val, combo).items()
            else:
                result_probs = self.multi_val_rule_prob(val, frozenset([])).items()
            combo_probs[combo] = result_probs
        return combo_probs

    def weighted_random_choice(self, weighted_choices):
        """make choice from weighted distribution"""
        choices, weights = zip(*weighted_choices)
        cumdist = list(accumulate(weights))
        return choices[bisect.bisect(cumdist, random.random() * cumdist[-1])]

    def __setitem__(self, key, val):
        """
        set item in dictionary, 
        assigns values to keys with precomputed probability distributions
        """

        precompute_val_probs = self.multi_key_rule_prob(key, val)        
        # use to show ALL precomputed probabilities for key's rule set
        # print precompute_val_probs        

        dict.__setitem__(self, frozenset(key), precompute_val_probs)

    def __getitem__(self, key):
        """
        get item from dictionary, 
        randomly select value based on rule probability
        """
        key = frozenset([key]) if isinstance(key, str) else frozenset(key)             
        val = None
        weighted_val = None        
        if key in self.keys():
            val = dict.__getitem__(self, key)
            weighted_val = val[key]
        else:
            for k in self.keys():
                if key.issubset(k):
                    val = dict.__getitem__(self, k)
                    weighted_val = val[key]

        # used to show probabality for key
        # print weighted_val

        if weighted_val:
            prob_results = self.weighted_random_choice(weighted_val)
        else:
            prob_results = None
        return prob_results

Usage

d = MultiKeyMultiValNonDeterministicDict()

d["red","blue","green"] = {
    # {rule_set} : {result: probability}
    frozenset(["red", "green"]): {"ballon": 0.8},
    frozenset(["blue"]): {"toy": 0.15},
    frozenset([]): {"car": 0.05}
}

Testing

Check the probabilities

N = 10000
red_green_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
red_blue_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
blue_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
red_blue_green_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
default_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}

for _ in xrange(N):
    red_green_test[d["red","green"]] += 1.0
    red_blue_test[d["red","blue"]] += 1.0
    blue_test[d["blue"]] += 1.0
    default_test[d["green"]] += 1.0
    red_blue_green_test[d["red","blue","green"]] += 1.0

print 'red,green test      =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_green_test.items())
print 'red,blue test       =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_blue_test.items())
print 'blue test           =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in blue_test.items())
print 'default test        =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in default_test.items())
print 'red,blue,green test =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_blue_green_test.items())

red,green test      = car: 09.89% toy: 10.06% ballon: 80.05%
red,blue test       = car: 05.30% toy: 47.71% ballon: 46.99%
blue test           = car: 41.69% toy: 15.02% ballon: 43.29%
default test        = car: 05.03% toy: 47.16% ballon: 47.81%
red,blue,green test = car: 04.85% toy: 49.20% ballon: 45.95%

Probabilities match rules!


Footnotes

  1. Distribution Assumption

    Since the rule set is not fully defined, assumptions are made about the probability distributions, most of this is done in multi_val_rule_prob(). Basically any undefined probability will be spread uniformly over the remaining values. This is done for all combinations of keys, and creates a generalized key interface for the random weighted selection.

    Given the example rule set

    d["red","blue","green"] = {
        # {rule_set} : {result: probability}
        frozenset(["red", "green"]): {"ballon": 0.8},
        frozenset(["blue"]): {"toy": 0.15},
        frozenset([]): {"car": 0.05}
    }
    

    this will create the following distributions

    'red'           = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
    'green'         = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
    'blue'          = [('car', 0.425), ('toy', 0.150), ('ballon', 0.425)]
    'blue,red'      = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
    'green,red'     = [('car', 0.098), ('toy', 0.098), ('ballon', 0.800)]
    'blue,green'    = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
    'blue,green,red'= [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
     default        = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
    

    If this is incorrect, please advise.

12
  • that looks to be a great implementation, thanks, I will soon check it and share doubts if any. I had many praises for this but comment system does not allow it
    – stackit
    Commented May 8, 2015 at 18:05
  • in the default case , you used "green" key what makes you assume that "green" is default key?
    – stackit
    Commented May 8, 2015 at 18:24
  • 1 also the dictionary keys should be frozenset as the order of keys dont matter and it needs to be hashable. I had used that construct for illustrative example.
    – stackit
    Commented May 8, 2015 at 18:31
  • 2 what if another dict assignment comes later like d["blue"]== {frozenset(["blue"]): {"doll": 0.55} ...} how will this be handled?
    – stackit
    Commented May 8, 2015 at 18:36
  • and will this program be tractable if the keys are say 10 keys ? 10!=3628800
    – stackit
    Commented May 8, 2015 at 18:52
4

the single output value should be probabilistically determined (fuzzy) based on a rule from keys eg:in above case rule could be if keys have both "red" and "blue" then return "baloon" 80% of time if only blue then return "toy" 15% of time else "car" 5% of time.

Bare in mind your case analysis is not complete, and it's ambiguous, but you can do the following "in spirit" (fleshing out the desired results):

import random

def randomly_return(*colors):
    colors = set(*colors)
    if 'red' in colors and 'blue' in colors:
        if random.random() < 0.8:  # 80 % of the time
            return "baloon"

    if 'blue' in colors and len(colors) == 1:  # only blue in colors
        if random.random() < 0.15:
            return "toy"
        else:
            if random.random() < 0.05:
                return "car"

# other cases to consider

I would keep this as a function, because it is a function! But if you insist to make it dict-like, then python let's you do this by overriding __getitem__ (IMO it's not pythonic).

class RandomlyReturn(object):
    def __getitem__(self, *colors):
        return randomly_return(*colors)

>>> r = RandomlyReturn()
>>> r["red", "blue"]  # 80% of the time it'll return "baloon"
"baloon"

From your clarification, OP wants to pass and generate:

randreturn((haseither(red,blue),baloon:0.8),((hasonly(blue),toy:0.15)),(default(‌​),car:0.05)))

you want to generate a function as follows:

funcs = {"haseither": lambda needles, haystack: any(n in haystack for n in needles),
         "hasonly": lambda needles, haystack: len(needles) == 1 and needles[1] in haystack}

def make_random_return(crits, default):
    def random_return(*colors):
        colors = set(*colors)
        for c in crits:
            if funcs[c["func"]](c["args"], colors) and random.random() > c["with_prob"]:
                return c["return_value"]
        return default
    return random_return

where the crit and default in this case would be:

crit = [{"func": "haseither", "args": ("red", "blue"), "return_value": "baloon", "with_prob": 0.8}, ...]
default = "car"  # ??
my_random_return = make_random_return(crits, default)

As I say, your probabilities are ambiguous/don't add up, so you're most likely going to need to tweak this...

You can extend the class definition by passing crit and default upon instantiation:

class RandomlyReturn(object):
    def __init__(self, crit, default):
        self.randomly_return = make_random_return(crit, default)
    def __getitem__(self, *colors):
        return self.randomly_return(*colors)

>>> r = RandomlyReturn(crit, default)
>>> r["red", "blue"]  # 80% of the time it'll return "baloon"
"baloon"
25
  • Thanks ,but it needs to not use logics in the function , but independently specified either as a decorator or seperate configuration eg:a decorator randreturn(baloon:0.8,toy:0.15, car:0.05)
    – stackit
    Commented May 2, 2015 at 7:53
  • the getitem will have the above decorator
    – stackit
    Commented May 2, 2015 at 7:55
  • randreturn((has(red,blue),baloon:0.8),(...))
    – stackit
    Commented May 2, 2015 at 7:58
  • above is a nested decorator to be used for getitem function in the dictionary defination
    – stackit
    Commented May 2, 2015 at 8:00
  • 1
    @NizamMohamed you can always modify above code or write a new answer(preferable)
    – stackit
    Commented May 7, 2015 at 14:16
1

If it is possible to change the data structure, it would be simpler to have a function returning the data you need. This will be completely flexible and could accommodate any kind of data, should you need to change them later.

import random

def myfunc(*args):
    if 'red' in args:
        return 'blue'
    elif 'green' in args or 'violet' in args:
        return 'violet'
    else:
        r = random.random()
        if 0 < r < 0.2:
            return 'blue'
        else:
            return 'green'

print(myfunc('green', 'blue'))
print(myfunc('yellow'))

output (the second line obviously changes):

violet
blue
11
  • Yes it is possible , you can give an example
    – stackit
    Commented May 1, 2015 at 5:00
  • Thanks , but how to incorporate this in the multidict structure
    – stackit
    Commented May 1, 2015 at 12:45
  • It is not possible, these data containers are expected to hold static (deterministic) values. Is there a specific reason why you do not want to go for a function-based approach?
    – WoJ
    Commented May 1, 2015 at 18:44
  • 1
    classes can also have this function, how would you incorporate above in multikey dict
    – stackit
    Commented May 1, 2015 at 19:13
  • I am not sure i understand: the functions in classes (methods) are "normal" functions, just in a specific context. Again: a dict holds deterministic values, so your "fuzzy" answer cannot be implemented directly (you could pre-compute the values (once, or on a timely basis) so that whatever is stored in the dict is still deterministic, but with a value which changes from time to time). But I have a hard time understanding why you want to use a dict for that, as opposed to a function.
    – WoJ
    Commented May 1, 2015 at 19:17
1

The OP wants as follows,

d["red", "blue"] ={ 
    "baloon": haseither('red','green',0.8),
    "toy": hasonly("blue",0.15),
    "car": default(0.05)
}  

but this is data with embeded logic. It's very tedious to define a function for every value. What I suggest is to seprate the data and logic.

Python has a data type for this, that's class. A callable instance of a class can be assigned to the dict and let the dict pass the keys and call the object to return the result.

I've inherited and extended multiple_key_dict to support multi-key fetch and to pass keys to the object and call the object which has been stored in the dict.

I assume data is recalculated per rule. This is Rule class, it has list of rules. A rule is a Python expressions and it has access to len function and keys list. So one can write a rule like len(keys) == 1 and 'blue' in keys.

class Rule(object):

    def __init__(self, rule, data):
        self.rule = rule
        self.data = data

This is Data class which has both set of data and rules.

class Data(object):
    def __init__(self, rules):
        self.rules= rules

    def make_choice(self, data):
        data = tuple(self.make_list_of_values(data))
        return random.choice(data)

    def make_list_of_values(self, data):
        for val, weight in data:
            percent = int(weight * 100)
            for v in [val] * percent:
                yield v

    def __call__(self, keys):
        for rule in self.rules:
            if eval(rule.rule,dict(keys=keys)):
                return self.make_choice(rule.data)

This is RuleDict, but non-callables can not be fetched.

class RuleDict(multi_key_dict):
    def __init__(self, *args, **kwargs):
        multi_key_dict.__init__(self, *args, **kwargs)

    def __getitem__(self, keys):
        if isinstance(keys, str):
            keys = (keys, )
        keys_set = frozenset(keys)
        for key in self.keys():
            key = frozenset(key)
            if keys_set <= key:
                return multi_key_dict.__getitem__(self,keys[0])(keys)
        raise KeyError(keys)

usage example,

d = RuleDict()
rule1 = Rule('"red" in keys and "green" in keys',(('baloon',0.8), ('car',0.05), ('toy',0.15)))
rule2 = Rule('len(keys) ==1 and "blue" in keys',(('baloon',0.25), ('car',0.35), ('toy',0.15)))
data = Data((rule1, rule2))
d['red','blue','green'] = data

print(d['red','green'])  

d['red','green'] calls the object, with keys, that was assigned and return the result.

Another approach is, to make the dict callable. This one seems a sound approach, because data and logic are separate. By this, you pass the keys and the logic, a callable, to the dict and return the result. f.e.,

def f(keys, data):
    pass # do the logic and return data

d['red','blue','green'] = ('baloon', 'car', 'toy')

Now call the dict

d(('red','blue'),f)

This is callable dict. If no callable is given, just returns the whole data.

class callable_mkd(multi_key_dict):
    def __init__(self, *args, **kwargs):
        multi_key_dict.__init__(self, *args, **kwargs)

    def __call__(self, keys, process=None):
        keys_set = frozenset(keys)
        for key in self.keys():
            key = frozenset(key)
            if keys_set <= key:
                if process:
                    return process(keys, self[keys[0]])
                return self[keys[0]]
        raise KeyError(keys)

Not the answer you're looking for? Browse other questions tagged or ask your own question.